From 3570e94bcc187512490ac0871086fb101dc1c9d6 Mon Sep 17 00:00:00 2001
From: Alyssa Rosenzweig
Date: Sat, 22 Oct 2022 14:17:54 -0400
Subject: [PATCH] agx: Use agx_nir_opt_preamble
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that everything is in place, we can actually take advantage of
preambles. This wins us a crude form of UBO pushing (accounting for most
of the win here), as well as its intended purpose of optimizing
uniform-on-uniform arithmetic.

shader-db results are excellent. The shader that's regressed for
instruction count is a fragment shader that solely consists of
`gl_FragColor = uniform`, which goes from a vectorized UBO load to four
scalar moves. That's more instructions (and more bytes) but presumably
faster, since ALU should be much cheaper than load/store.

total instructions in shared programs: 6502 -> 5764 (-11.35%)
instructions in affected programs: 5136 -> 4398 (-14.37%)
helped: 60
HURT: 1
helped stats (abs) min: 2.0 max: 47.0 x̄: 12.33 x̃: 8
helped stats (rel) min: 0.84% max: 34.48% x̄: 18.69% x̃: 21.05%
HURT stats (abs)   min: 2.0 max: 2.0 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 33.33% max: 33.33% x̄: 33.33% x̃: 33.33%
95% mean confidence interval for instructions value: -14.69 -9.51
95% mean confidence interval for instructions %-change: -20.49% -15.20%
Instructions are helped.

total bytes in shared programs: 42186 -> 38310 (-9.19%)
bytes in affected programs: 33182 -> 29306 (-11.68%)
helped: 60
HURT: 1
helped stats (abs) min: 10.0 max: 272.0 x̄: 64.83 x̃: 50
helped stats (rel) min: 0.72% max: 30.00% x̄: 15.16% x̃: 16.67%
HURT stats (abs)   min: 14.0 max: 14.0 x̄: 14.00 x̃: 14
HURT stats (rel)   min: 31.82% max: 31.82% x̄: 31.82% x̃: 31.82%
95% mean confidence interval for bytes value: -77.73 -49.35
95% mean confidence interval for bytes %-change: -16.66% -12.11%
Bytes are helped.

total halfregs in shared programs: 2370 -> 1639 (-30.84%)
halfregs in affected programs: 1804 -> 1073 (-40.52%)
helped: 60
HURT: 0
helped stats (abs) min: 1.0 max: 40.0 x̄: 12.18 x̃: 8
helped stats (rel) min: 3.85% max: 72.73% x̄: 41.37% x̃: 36.17%
95% mean confidence interval for halfregs value: -14.77 -9.60
95% mean confidence interval for halfregs %-change: -46.00% -36.75%
Halfregs are helped.

Total CPU time (seconds): 2.71 -> 2.80 (3.32%)

Signed-off-by: Alyssa Rosenzweig
Part-of:
---
 src/asahi/compiler/agx_compile.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/asahi/compiler/agx_compile.c b/src/asahi/compiler/agx_compile.c
index bb920bc293a..7016eefe043 100644
--- a/src/asahi/compiler/agx_compile.c
+++ b/src/asahi/compiler/agx_compile.c
@@ -1646,7 +1646,7 @@ agx_lower_aligned_offsets(struct nir_builder *b,
 }
 
 static void
-agx_optimize_nir(nir_shader *nir)
+agx_optimize_nir(nir_shader *nir, unsigned *preamble_size)
 {
    bool progress;
 
@@ -1687,6 +1687,7 @@ agx_optimize_nir(nir_shader *nir)
       NIR_PASS(progress, nir, nir_opt_loop_unroll);
    } while (progress);
 
+   NIR_PASS_V(nir, agx_nir_opt_preamble, preamble_size);
    NIR_PASS_V(nir, nir_opt_algebraic_late);
    NIR_PASS_V(nir, nir_opt_constant_folding);
    NIR_PASS_V(nir, nir_copy_prop);
@@ -1935,7 +1936,7 @@ agx_compile_shader_nir(nir_shader *nir,
    NIR_PASS_V(nir, agx_lower_resinfo);
    NIR_PASS_V(nir, nir_legalize_16bit_sampler_srcs, tex_constraints);
 
-   agx_optimize_nir(nir);
+   agx_optimize_nir(nir, &out->push_count);
 
    /* Implement conditional discard with real control flow like Metal */
    NIR_PASS_V(nir, nir_lower_discard_if, (nir_lower_discard_if_to_cf |