Previously subgroup shuffle was implemented using the bpermute
instruction, which only works accross half-waves, so by itself it's
not suitable for implementing subgroup shuffle when the shader is
running in wave64 mode.
This commit adds a trick using shared VGPRs that allows to implement
subgroup shuffle still relatively effectively in this mode.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes p_reduce (all cluster sizes), p_inclusive_scan and p_exclusive_scan
with all reduction operations.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
ACO (short for AMD Compiler) is a new compiler backend with the goal to replace
LLVM for Radeon hardware for the RADV driver.
ACO currently supports only VS, PS and CS on VI and Vega.
There are some optimizations missing because of unmerged NIR changes
which may decrease performance.
Full commit history can be found at
https://github.com/daniel-schuermann/mesa/commits/backend
Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Co-authored-by: Rhys Perry <pendingchaos02@gmail.com>
Co-authored-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Co-authored-by: Connor Abbott <cwabbott0@gmail.com>
Co-authored-by: Michael Schellenberger Costa <mschellenbergercosta@googlemail.com>
Co-authored-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>