agx: Use agx_nir_opt_preamble

Now that everything is in place, we can actually take advantage of
preambles. This wins us a crude form of UBO pushing (accounting for most
of the win here), as well as its intended purpose of optimizing
uniform-on-uniform arithmetic.

shader-db results are excellent. The shader that's regressed for instruction
count is a fragment shader that solely consists of `gl_FragColor = uniform`,
which goes from a vectorized UBO load to four scalar moves. That's more
instructions (and more bytes) but presumably faster, since ALU should be much
cheaper than load/store.

total instructions in shared programs: 6502 -> 5764 (-11.35%)
instructions in affected programs: 5136 -> 4398 (-14.37%)
helped: 60
HURT: 1
helped stats (abs) min: 2.0 max: 47.0 x̄: 12.33 x̃: 8
helped stats (rel) min: 0.84% max: 34.48% x̄: 18.69% x̃: 21.05%
HURT stats (abs)   min: 2.0 max: 2.0 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 33.33% max: 33.33% x̄: 33.33% x̃: 33.33%
95% mean confidence interval for instructions value: -14.69 -9.51
95% mean confidence interval for instructions %-change: -20.49% -15.20%
Instructions are helped.

total bytes in shared programs: 42186 -> 38310 (-9.19%)
bytes in affected programs: 33182 -> 29306 (-11.68%)
helped: 60
HURT: 1
helped stats (abs) min: 10.0 max: 272.0 x̄: 64.83 x̃: 50
helped stats (rel) min: 0.72% max: 30.00% x̄: 15.16% x̃: 16.67%
HURT stats (abs)   min: 14.0 max: 14.0 x̄: 14.00 x̃: 14
HURT stats (rel)   min: 31.82% max: 31.82% x̄: 31.82% x̃: 31.82%
95% mean confidence interval for bytes value: -77.73 -49.35
95% mean confidence interval for bytes %-change: -16.66% -12.11%
Bytes are helped.

total halfregs in shared programs: 2370 -> 1639 (-30.84%)
halfregs in affected programs: 1804 -> 1073 (-40.52%)
helped: 60
HURT: 0
helped stats (abs) min: 1.0 max: 40.0 x̄: 12.18 x̃: 8
helped stats (rel) min: 3.85% max: 72.73% x̄: 41.37% x̃: 36.17%
95% mean confidence interval for halfregs value: -14.77 -9.60
95% mean confidence interval for halfregs %-change: -46.00% -36.75%
Halfregs are helped.

Total CPU time (seconds): 2.71 -> 2.80 (3.32%)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18813>
This commit is contained in:
Alyssa Rosenzweig 2022-10-22 14:17:54 -04:00
parent 5e8b0289c3
commit 3570e94bcc

View file

@ -1646,7 +1646,7 @@ agx_lower_aligned_offsets(struct nir_builder *b,
}
static void
agx_optimize_nir(nir_shader *nir)
agx_optimize_nir(nir_shader *nir, unsigned *preamble_size)
{
bool progress;
@ -1687,6 +1687,7 @@ agx_optimize_nir(nir_shader *nir)
NIR_PASS(progress, nir, nir_opt_loop_unroll);
} while (progress);
NIR_PASS_V(nir, agx_nir_opt_preamble, preamble_size);
NIR_PASS_V(nir, nir_opt_algebraic_late);
NIR_PASS_V(nir, nir_opt_constant_folding);
NIR_PASS_V(nir, nir_copy_prop);
@ -1935,7 +1936,7 @@ agx_compile_shader_nir(nir_shader *nir,
NIR_PASS_V(nir, agx_lower_resinfo);
NIR_PASS_V(nir, nir_legalize_16bit_sampler_srcs, tex_constraints);
agx_optimize_nir(nir);
agx_optimize_nir(nir, &out->push_count);
/* Implement conditional discard with real control flow like Metal */
NIR_PASS_V(nir, nir_lower_discard_if, (nir_lower_discard_if_to_cf |