From c6e831ac44e46e6359b5b11bca43852e8d9e90f0 Mon Sep 17 00:00:00 2001 From: Faith Ekstrand Date: Thu, 21 Aug 2025 13:38:27 -0400 Subject: [PATCH] nak,nir: Use a simpler version of phis_to_regs_block in lower_cf The original lower_phis_to_regs_block() is a little too clever. It crawls up the predecessor tree until it finds a cross edge and places the register writes as deep as it can. This breaks nak_nir_lower_cf(). Say you have a shader like... con %0 = load_uniform() con loop { if div { } else { } break; } con %1 = phi %0 The original lower_phis_to_regs_block() will turn it into con %0 = load_uniform() con %r = decl_reg(); con loop { if div { reg_store(%r, %0) } else { reg_store(%r, %0) } break; } con %1 = reg_load(%r) We then convert it into unstructured control-flow and run regs_to_ssa() to get our phis back, which lowers each of the registers we inserted to a phi tree. When we try to recover divergence information on phis by looking at their sources, this works fine if each source maps directly to a reg_store() whic maps directly to a phi in the original IR. However, because the reg_store() instructions are placed deeper, it may introduce false divergence. Switch to the simple version of nir_lower_phis_to_regs_block() which places reg writes directly in phi predecessor blocks. We could probably be more conservative and just avoid placing writes to uniform regs in divergent control-flow but it's more robust to make the load/store_reg intrinsics match the original phis directly. This fixes some shaders in Horizon: Zero Dawn Remastered Fixes: b013d54e4ff9 ("nak/lower_cf: Flag phis as convergent when possible") Reviewed-by: Mel Henning Part-of: --- src/nouveau/compiler/nak_nir_lower_cf.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/src/nouveau/compiler/nak_nir_lower_cf.c b/src/nouveau/compiler/nak_nir_lower_cf.c index f4a808412a4..b859b1964a9 100644 --- a/src/nouveau/compiler/nak_nir_lower_cf.c +++ b/src/nouveau/compiler/nak_nir_lower_cf.c @@ -445,10 +445,14 @@ lower_cf_func(nir_function *func) nir_metadata_require(old_impl, nir_metadata_dominance | nir_metadata_divergence); /* First, we temporarily get rid of SSA. This will make all our block - * motion way easier. + * motion way easier. Ask the pass to place reg writes directly in the + * immediate predecessors of the phis instead of trying to be clever. + * This will ensure that we never get a write to a uniform register from + * non-uniform control flow and makes our divergence reconstruction for + * phis more reliable. */ nir_foreach_block(block, old_impl) - nir_lower_phis_to_regs_block(block, false); + nir_lower_phis_to_regs_block(block, true); /* We create a whole new nir_function_impl and copy the contents over */ func->impl = NULL;