From 09f89d15e4c983bf8762e53e0f7d62f32480ecfd Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Timur=20Krist=C3=B3f?= Date: Fri, 24 Sep 2021 17:37:45 +0200 Subject: [PATCH] ac/nir/nggc: Don't reuse uniform values from divergent control flow. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit With NGG culling, the shaders are split into two parts: the top part that computes just the position output, and the bottom part which produces the other outputs. To reduce redundancy between the two, I added some code to reuse uniform variables between them. However, there is an edge case I didn't think about: because of vertex repacking, it is possible for the bottom part to process a different vertex. Therefore it can take a different divergent code path (though it must still take the same uniform code path). Due to this, when a uniform value comes from divergent control flow, this may be undefined in the bottom part. This commit stops reusing uniform variables from divergent control flow, to fix issues that arise from this. Fossil DB stats on Sienna Cichlid with NGGC on: Totals from 1723 (1.34% of 128647) affected shaders: VGPRs: 89312 -> 89184 (-0.14%); split: -0.15%, +0.01% SpillSGPRs: 4575 -> 120 (-97.38%) CodeSize: 10846424 -> 10873836 (+0.25%); split: -0.68%, +0.93% MaxWaves: 34582 -> 34602 (+0.06%); split: +0.06%, -0.01% Instrs: 2124471 -> 2128835 (+0.21%); split: -0.51%, +0.72% Latency: 7274569 -> 7293899 (+0.27%); split: -0.22%, +0.48% InvThroughput: 1637130 -> 1635490 (-0.10%); split: -0.17%, +0.07% VClause: 25141 -> 25414 (+1.09%); split: -0.02%, +1.10% SClause: 56367 -> 59503 (+5.56%); split: -1.36%, +6.93% Copies: 230704 -> 219313 (-4.94%); split: -5.49%, +0.55% Branches: 72781 -> 72681 (-0.14%); split: -0.21%, +0.07% PreSGPRs: 118766 -> 100176 (-15.65%); split: -15.70%, +0.05% PreVGPRs: 76876 -> 76833 (-0.06%) Fixes: 0bb543bb60f93bea5b1c0ed6382ced49e659273e Signed-off-by: Timur Kristóf Reviewed-by: Daniel Schürmann Part-of: --- src/amd/common/ac_nir_lower_ngg.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/src/amd/common/ac_nir_lower_ngg.c b/src/amd/common/ac_nir_lower_ngg.c index 233d20f5f47..88d6865198d 100644 --- a/src/amd/common/ac_nir_lower_ngg.c +++ b/src/amd/common/ac_nir_lower_ngg.c @@ -937,7 +937,23 @@ save_reusable_variables(nir_builder *b, lower_ngg_nogs_state *nogs_state) nir_cf_node *next_cf_node = nir_cf_node_next(&block->cf_node); if (next_cf_node) { /* It makes no sense to try to reuse things from within loops. */ - if (next_cf_node->type == nir_cf_node_loop) { + bool next_is_loop = next_cf_node->type == nir_cf_node_loop; + + /* Don't reuse if we're in divergent control flow. + * + * Thanks to vertex repacking, the same shader invocation may process a different vertex + * in the top and bottom part, and it's even possible that this different vertex was initially + * processed in a different wave. So the two parts may take a different divergent code path. + * Therefore, these variables in divergent control flow may stay undefined. + * + * Note that this problem doesn't exist if vertices are not repacked or if the + * workgroup only has a single wave. + */ + bool next_is_divergent_if = + next_cf_node->type == nir_cf_node_if && + nir_cf_node_as_if(next_cf_node)->condition.ssa->divergent; + + if (next_is_loop || next_is_divergent_if) { block = nir_cf_node_cf_tree_next(next_cf_node); continue; }