turnip: Enable lowering of mediump temps/CS shared to 16-bit.

In Aztec Ruins, we end up storing some big shared-mem arrays as 16-bit,
cutting shared mem size in half across many shaders while also reducing
conversions.  gfxbench vk-5-normal perf +0.364983% +/- 0.189764% (n=4).

fossil-db:
Totals from 448 (2.99% of 14988) affected shaders:
MaxWaves: 6154 -> 6390 (+3.83%); split: +3.96%, -0.13%
Instrs: 174554 -> 165045 (-5.45%); split: -6.45%, +1.01%
CodeSize: 364224 -> 345558 (-5.12%); split: -6.03%, +0.90%
NOPs: 48224 -> 48024 (-0.41%); split: -3.33%, +2.91%
MOVs: 6985 -> 6104 (-12.61%); split: -19.11%, +6.50%
Full: 4577 -> 4101 (-10.40%); split: -11.08%, +0.68%
(ss): 3428 -> 3335 (-2.71%); split: -4.17%, +1.46%
(sy): 1250 -> 1205 (-3.60%); split: -4.72%, +1.12%
(ss)-stall: 14695 -> 14528 (-1.14%); split: -2.25%, +1.12%
(sy)-stall: 19565 -> 17998 (-8.01%); split: -9.55%, +1.54%
STPs: 1086 -> 870 (-19.89%)
LDPs: 162 -> 108 (-33.33%)
Cat0: 51400 -> 51120 (-0.54%); split: -3.31%, +2.76%
Cat1: 16861 -> 14688 (-12.89%); split: -18.18%, +5.30%
Cat2: 71161 -> 68454 (-3.80%); split: -4.52%, +0.72%
Cat3: 29572 -> 25306 (-14.43%); split: -14.49%, +0.06%
Cat4: 3128 -> 3131 (+0.10%)
Cat5: 1502 -> 1506 (+0.27%)
Cat6: 840 -> 750 (-10.71%)

aztec ruins is a big winner with the ldp/stp reductions.  summoners_war
racks up an astounding 41% reduction in instructions and +15% max_waves.
Most affected apps show a minor win in instrs, with
fallout_shelter_online, and aztec ruins on ANGLE taking minor hits.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18259>
This commit is contained in:
Emma Anholt 2022-08-25 14:34:20 -07:00 committed by Marge Bot
parent e1588cdf9e
commit 08548650bd

View file

@ -104,9 +104,22 @@ tu_spirv_to_nir(struct tu_device *dev,
NIR_PASS_V(nir, nir_lower_sysvals_to_varyings, &sysvals_to_varyings);
NIR_PASS_V(nir, nir_lower_global_vars_to_local);
/* Older glslang missing bf6efd0316d8 ("SPV: Fix #2293: keep relaxed
* precision on arg passed to relaxed param") will pass function args through
* a highp temporary, so we need the nir_opt_find_array_copies() and a copy
* prop before we lower mediump vars, or you'll be unable to optimize out
* array copies after lowering. We do this before splitting copies, since
* that works against nir_opt_find_array_copies().
* */
NIR_PASS_V(nir, nir_opt_find_array_copies);
NIR_PASS_V(nir, nir_opt_copy_prop_vars);
NIR_PASS_V(nir, nir_opt_dce);
NIR_PASS_V(nir, nir_split_var_copies);
NIR_PASS_V(nir, nir_lower_var_copies);
NIR_PASS_V(nir, nir_lower_mediump_vars, nir_var_function_temp | nir_var_shader_temp | nir_var_mem_shared);
NIR_PASS_V(nir, nir_opt_copy_prop_vars);
NIR_PASS_V(nir, nir_opt_combine_stores, nir_var_all);