aco: Enable constant exec mask based optimization on compute shaders.

We know for sure exec is initially -1 when the shader always has full subgroups.
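
As an illustration of the reasoning above, here is a minimal standalone sketch (not ACO code). It assumes that "full subgroups" means the workgroup size is a multiple of the wave size, and that the hardware only enables the leading lanes of a partial last wave. The helper names full_exec_mask and last_wave_exec are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: EXEC value with one bit set per lane,
// i.e. "-1" in the lane-mask width (32 bits for wave32, 64 bits for wave64).
constexpr uint64_t full_exec_mask(unsigned wave_size)
{
   return wave_size == 64 ? ~uint64_t{0} : (uint64_t{1} << wave_size) - 1;
}

// Hypothetical helper: initial EXEC of the last wave in a workgroup.
// Assumption: a partial last wave starts with only its leading lanes enabled.
constexpr uint64_t last_wave_exec(unsigned workgroup_size, unsigned wave_size)
{
   const unsigned rem = workgroup_size % wave_size;
   return rem == 0 ? full_exec_mask(wave_size) : (uint64_t{1} << rem) - 1;
}

// A workgroup of 128 invocations on wave64 has only full waves, so the
// initial EXEC of every wave is a known constant.
static_assert(last_wave_exec(128, 64) == full_exec_mask(64), "full subgroups");

// A workgroup of 100 invocations on wave64 leaves 36 lanes in the last wave,
// so the initial EXEC is not -1 and cannot be treated as all-ones there.
static_assert(last_wave_exec(100, 64) != full_exec_mask(64), "partial subgroup");
```

Under these assumptions, the initial exec can be replaced by a constant only when every wave is full, which is the case the change below restricts itself to.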

Fossil DB stats on GFX11:
Totals from 3884 (2.88% of 134913) affected shaders:
SpillSGPRs: 1673 -> 1697 (+1.43%); split: -1.67%, +3.11%
SpillVGPRs: 2316 -> 2310 (-0.26%); split: -0.65%, +0.39%
CodeSize: 19584436 -> 19567156 (-0.09%); split: -0.13%, +0.04%
Scratch: 217088 -> 216832 (-0.12%)
Instrs: 3784596 -> 3780303 (-0.11%); split: -0.15%, +0.03%
Latency: 39971204 -> 39794967 (-0.44%); split: -0.47%, +0.03%
InvThroughput: 7885552 -> 7801247 (-1.07%); split: -1.14%, +0.07%
VClause: 74654 -> 74611 (-0.06%); split: -0.07%, +0.01%
SClause: 103139 -> 103043 (-0.09%); split: -0.13%, +0.04%
Copies: 279864 -> 281995 (+0.76%); split: -0.72%, +1.48%
Branches: 92082 -> 92084 (+0.00%); split: -0.03%, +0.03%
PreSGPRs: 155637 -> 149491 (-3.95%)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20670>
Timur Kristóf 2023-01-10 19:15:25 +01:00 committed by Marge Bot
parent 39448c8e9c
commit 81620fc7b0

@@ -268,6 +268,12 @@ add_coupling_code(exec_ctx& ctx, Block* block, std::vector<aco_ptr<Instruction>>
bld.copy(Definition(exec, bld.lm), start_exec);
}
/* EXEC is automatically initialized by the HW for compute shaders.
* We know for sure exec is initially -1 when the shader always has full subgroups.
*/
if (ctx.program->stage == compute_cs && ctx.program->info.cs.uses_full_subgroups)
start_exec = Operand::c32_or_c64(-1u, bld.lm == s2);
if (ctx.handle_wqm) {
ctx.info[0].exec.emplace_back(start_exec, mask_type_global | mask_type_exact);
/* if this block needs WQM, initialize already */