11 commits

Author SHA1 Message Date
Timur Kristóf
14a5021aff aco/gfx10: Refactor of GFX10 wave64 bpermute.
The emulated GFX10 wave64 bpermute no longer needs a linear_vgpr,
so we don't consider it a reduction anymore. Additionally, the
code is slightly reorganized in preparation for the GFX6 emulated
bpermute.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5223>
2020-06-02 21:12:12 +00:00
Samuel Pitoiset
8ece71507d aco: allocate a temp VGPR for some 8-bit/16-bit reduction ops on GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5148>
2020-05-29 11:20:58 +00:00
Samuel Pitoiset
c76595aec2 aco: use a temporary SGPR for 8-bit/16-bit literal reduction identities
Otherwise, the compiler overwrites s0 which contains the exec mask.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4494>
2020-05-21 15:06:48 +00:00
Rhys Perry
20eb1acb6f aco: fix gfx10_wave64_bpermute
Since 9254fb4fc7, the pass replaced the SCC clobber with the scalar
identity temporary. Just skip most of the temporary setup, since we don't
need it for gfx10_wave64_bpermute.

Although shuffles are disabled on GFX10, Detroit: Become Human seems to
use them anyway.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Fixes: 9254fb4fc7 ('aco: don't use a scalar temporary for reductions on GFX10')

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3683>
2020-02-06 16:43:03 +00:00
Daniel Schürmann
f895a8b1df aco: implement (clustered) reductions for SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
9254fb4fc7 aco: don't use a scalar temporary for reductions on GFX10
This patch also adds the scalar temporary for scans on SI/CI

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Timur Kristóf
e0bcefc3a0 aco/wave32: Use lane mask regclass for exec/vcc.
Currently all usages of exec and vcc are hardcoded to use s2 regclass.
This commit makes it possible to use s1 in wave32 mode and
s2 in wave64 mode.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Rhys Perry
56c06c79fc aco: implement 64-bit integer reductions
The multiplication reduction is larger than it could be, but it should be
easier to implement this way.

No failures with dEQP-VK.subgroups.*int64* except those caused by LLVM
being used for other stages.

v2: don't call setFixed() for v_add carry-out, since setHint sets physReg
v3: add and use emit_vadd32() helper
v4: use num_opcodes instead of last_opcode

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v3)
2019-11-19 18:58:04 +00:00
Timur Kristóf
d59f702e26 aco: Implement subgroup shuffle in GFX10 wave64 mode.
Previously subgroup shuffle was implemented using the bpermute
instruction, which on GFX10 only works within each half-wave, so by
itself it's not suitable for implementing subgroup shuffle when the
shader is running in wave64 mode.

This commit adds a trick using shared VGPRs that makes it possible to
implement subgroup shuffle relatively efficiently in this mode.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-28 23:52:50 +00:00
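
The limitation described in d59f702e26 can be illustrated with a small lane-level model. This is a sketch only, not the ACO code or its instruction sequence: `half_bpermute` and `wave64_shuffle` are hypothetical names, and it assumes the role of the shared VGPRs is to make each half-wave's data visible to the other half.

```python
HALF = 32   # lanes per half-wave
WAVE = 64   # lanes in a wave64 subgroup


def half_bpermute(data, index):
    """Model a bpermute that can only read lanes in the caller's own half-wave."""
    out = []
    for lane in range(WAVE):
        base = (lane // HALF) * HALF  # first lane of this lane's half-wave
        out.append(data[base + index[lane] % HALF])
    return out


def wave64_shuffle(data, index):
    """Emulate a full 64-lane shuffle using only half-wave bpermutes."""
    # Assumption: exchanging the halves stands in for the shared-VGPR trick.
    swapped = data[HALF:] + data[:HALF]
    same = half_bpermute(data, index)      # correct if the source is in our half
    other = half_bpermute(swapped, index)  # correct if the source is in the other half
    return [
        same[lane] if index[lane] // HALF == lane // HALF else other[lane]
        for lane in range(WAVE)
    ]
```

With `data[lane] = lane`, lane 0 asking for lane 40 gets lane 8 from the plain half-wave permute, while the emulated shuffle returns lane 40's value. The cost in this model is two permutes plus a per-lane select.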
Rhys Perry
3865448012 aco: Fix reductions on GFX10.
Fixes p_reduce (all cluster sizes), p_inclusive_scan and p_exclusive_scan
with all reduction operations.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-28 23:52:50 +00:00
Daniel Schürmann
93c8ebfa78 aco: Initial commit of independent AMD compiler
ACO (short for AMD Compiler) is a new compiler backend with the goal of
replacing LLVM for Radeon hardware in the RADV driver.

ACO currently supports only VS, PS and CS on VI and Vega.
There are some optimizations missing because of unmerged NIR changes
which may decrease performance.

Full commit history can be found at
https://github.com/daniel-schuermann/mesa/commits/backend

Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Co-authored-by: Rhys Perry <pendingchaos02@gmail.com>
Co-authored-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Co-authored-by: Connor Abbott <cwabbott0@gmail.com>
Co-authored-by: Michael Schellenberger Costa <mschellenbergercosta@googlemail.com>
Co-authored-by: Timur Kristóf <timur.kristof@gmail.com>

Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-19 12:10:00 +02:00