mesa/src/amd/llvm
Marek Olšák 13acbaecd8 radeonsi: rewrite the prefix sum computation for shader culling
Instead of storing the vertex mask per wave into LDS and then computing
the prefix sum, store 8-bit bitcounts (vertex counts) of the vertex masks
into LDS. This allows us to compute the sum using v_sad_u8, which sums
all 4 components of an i8vec4 in one instruction.
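
As an illustration of the idea above, here is a minimal CPU-side sketch in C, not the code from this commit. It models v_sad_u8 as a sum of absolute byte differences plus an accumulator, so passing zero as the second operand sums the 4 packed counts. The helper names `sad_u8` and `vertices_before_wave` are hypothetical, and the sketch assumes the counts of waves 0..3 are packed into one dword with wave 0 in the lowest byte.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of v_sad_u8 (hypothetical helper, not a Mesa function):
 * sum of absolute differences of the 4 bytes of a and b, plus acc. */
static uint32_t sad_u8(uint32_t a, uint32_t b, uint32_t acc)
{
   for (int i = 0; i < 4; i++) {
      int x = (int)((a >> (8 * i)) & 0xff);
      int y = (int)((b >> (8 * i)) & 0xff);
      acc += (uint32_t)(x > y ? x - y : y - x);
   }
   return acc;
}

/* Exclusive prefix sum for one wave: the number of vertices in all waves
 * with a lower index. Assumes byte i of `packed` holds the count of wave i. */
static uint32_t vertices_before_wave(uint32_t packed, unsigned wave)
{
   assert(wave < 4);
   /* Keep only the bytes of the preceding waves, then sum them. */
   uint32_t mask = (1u << (8 * wave)) - 1u;
   return sad_u8(packed & mask, 0, 0);
}
```

With `packed = 0x04030201` (counts 1, 2, 3, 4 for waves 0..3), `sad_u8(packed, 0, 0)` gives the total and `vertices_before_wave(packed, 2)` gives the offset at which wave 2 starts. The sum is 32 bits wide, so four full wave64 counts (4 x 64) do not overflow even though each count is stored in 8 bits.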

Each i8vec4 of vertex counts is loaded by a separate thread (one dword
per thread) instead of all of them being loaded by thread 0, and readlane
copies them to SGPRs instead of readfirstlane.

LDS is no longer initialized before culling. Instead, the counts for
inactive waves are masked with AND later.
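
The masking step can be sketched as follows. This is an illustrative C model under assumptions, not the code from this commit: `keep_active_counts` is a hypothetical helper, and the sketch assumes active waves occupy the lowest bytes of a packed dword, so bytes belonging to inactive waves may hold stale LDS contents.

```c
#include <assert.h>
#include <stdint.h>

/* Zero the bytes of inactive waves in a dword of packed 8-bit counts.
 * Assumes waves 0..num_active_waves-1 are active and byte i holds the
 * count of wave i; the remaining bytes are treated as garbage. */
static uint32_t keep_active_counts(uint32_t packed, unsigned num_active_waves)
{
   assert(num_active_waves <= 4);
   if (num_active_waves == 4)
      return packed; /* all 4 bytes are valid, nothing to mask */
   return packed & ((1u << (8 * num_active_waves)) - 1u);
}
```

For example, with 2 active waves and a dword read as `0xDEAD0703`, the AND leaves `0x00000703`, so the uninitialized upper bytes cannot contribute to the sum.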

Incorrect old comments are also fixed.

This change removes 80 bytes from the code size, and it allows increasing
the workgroup size from 128 to 256, which is the main motivation for it.

Now changing the workgroup size with wave64 has no effect on the code size.
Switching to wave32 with 8 waves even generates slightly smaller code than
wave64 with 4 waves.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10813>
2021-05-25 16:15:44 +00:00
ac_llvm_build.c radeonsi: rewrite the prefix sum computation for shader culling 2021-05-25 16:15:44 +00:00
ac_llvm_build.h radeonsi: rewrite the prefix sum computation for shader culling 2021-05-25 16:15:44 +00:00
ac_llvm_cull.c ac: add build_alloca with an initializer 2020-11-18 06:19:59 +00:00
ac_llvm_cull.h amd/llvm: switch to 3-spaces style 2020-09-07 10:00:20 +02:00
ac_llvm_helper.cpp amd: drop support for LLVM 9 2021-04-16 09:25:19 +00:00
ac_llvm_util.c ac/llvm: set target features per function instead of per target machine 2021-05-25 16:15:44 +00:00
ac_llvm_util.h ac/llvm: set target features per function instead of per target machine 2021-05-25 16:15:44 +00:00
ac_nir_to_llvm.c ac/llvm: allow ac_build_optimization_barrier with SGPRs, pointers, and metadata 2021-05-25 16:15:44 +00:00
ac_nir_to_llvm.h ac: move ac_lower_indirect_derefs() outside of the LLVM dir 2021-04-23 11:52:01 +02:00
ac_shader_abi.h ac/llvm: Implement new Geometry Shader intrinsics. 2021-03-17 12:42:23 +00:00
meson.build meson: use gnu_symbol_visibility argument 2020-06-01 18:59:18 +00:00