freedreno/ir3: Consistently lower mediump inputs to 16-bit (when we can).

If every use was a conversion to 16, then ir3_cf would fold it into the
bary instruction.  But if something had generated a highp comparison of
the mediump input with a mediump op result, it would get stuck as highp,
even though we could have used 16-bit values without upconverting.

This fixes dEQP-GLES2.functional.shaders.algorithm.rgb_to_hsl_fragment on
ANGLE on turnip, closing #7043.  fossil-db results are mixed:

fossil-db:
Totals from 697 (4.65% of 14988) affected shaders:
MaxWaves: 10712 -> 10736 (+0.22%)
Instrs: 82394 -> 83572 (+1.43%); split: -1.31%, +2.74%
CodeSize: 178280 -> 180118 (+1.03%); split: -0.46%, +1.49%
NOPs: 15887 -> 16067 (+1.13%); split: -7.48%, +8.61%
MOVs: 1297 -> 1328 (+2.39%); split: -6.86%, +9.25%
Full: 3730 -> 3842 (+3.00%); split: -1.80%, +4.80%
(ss): 1877 -> 1849 (-1.49%); split: -5.59%, +4.10%
(sy): 1249 -> 1255 (+0.48%); split: -1.04%, +1.52%
(ss)-stall: 6809 -> 6364 (-6.54%); split: -13.85%, +7.31%
(sy)-stall: 17059 -> 17257 (+1.16%); split: -6.51%, +7.67%
Cat0: 17220 -> 17400 (+1.05%); split: -6.90%, +7.94%
Cat1: 5307 -> 6366 (+19.95%); split: -6.93%, +26.89%
Cat2: 39138 -> 39101 (-0.09%); split: -0.31%, +0.22%
Cat3: 16772 -> 16741 (-0.18%)
Cat5: 1269 -> 1276 (+0.55%)

I tried to pick some apps to test that looked the most impacted, and
indeed the results are mixed:

cookie_run_kingdom:         +0.275514% +/- 0.0883816% (n=68)
trex_200:                   +0.0943847% +/- 0.0297073% (n=1463)
command_and_conquer_rivals: no difference (n=131)
war_planet_online:          no difference (n=120)
lego_legacy:                -0.192131% +/- 0.152083% (n=99)
among_us:                   -0.625227% +/- 0.385419% (n=60)

Given that the perf results are small and go both ways, and apparently
we're an outlier in not always lowering mediump inputs to 16-bit, just do
it for consistency with other drivers.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18506>
This commit is contained in:
Emma Anholt 2022-08-09 17:48:14 -07:00 committed by Marge Bot
parent f4857591e1
commit a1c765a44c
3 changed files with 39 additions and 3 deletions

View file

@ -1178,6 +1178,8 @@ image_from_yuv_textures,-1
image_scale_aligned,-1
imagealphathreshold_image,-1
imageblur,-1
# A few pixels at the edge of the blur, nothing more noticeable at those points than the jaggedness elsewhere.
imageblurclampmode,180
imagefilters_xfermodes,-1
imagefiltersbase,-1
imagefiltersclipped,-1

View file

@ -207,7 +207,7 @@ traces:
freedreno-a530:
checksum: 70e18ba06d56fea277cd3fb000729879
freedreno-a630:
checksum: 9eb6d261c0c5946d8aeb0f41b2b7c1b1
checksum: 92944a4996ea019f51cfcdeaf4fc6926
glmark2/desktop:windows=4:effect=shadow-v2.trace:
freedreno-a306:
@ -223,7 +223,7 @@ traces:
freedreno-a530:
checksum: 7712c8143a244c56922124f4ac207722
freedreno-a630:
checksum: 7712c8143a244c56922124f4ac207722
checksum: 8a3b04aff4848fd36de9d956f24fac2f
glmark2/effect2d:kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;-v2.trace:
freedreno-a306:
@ -231,7 +231,7 @@ traces:
freedreno-a530:
checksum: 50f1841f2bb96905c9fcd1815c4a95c0
freedreno-a630:
checksum: 50f1841f2bb96905c9fcd1815c4a95c0
checksum: 05e7d72609840c897f734dca03748ead
glmark2/function:fragment-steps=5:fragment-complexity=low-v2.trace:
freedreno-a306:

View file

@ -463,6 +463,40 @@ ir3_nir_post_finalize(struct ir3_shader *shader)
if (compiler->gen >= 6 && s->info.stage == MESA_SHADER_FRAGMENT &&
!(ir3_shader_debug & IR3_DBG_NOFP16)) {
/* Lower FS mediump inputs to 16-bit. If you declared it mediump, you
* probably want 16-bit instructions (and have set
* mediump/RelaxedPrecision on most of the rest of the shader's
* instructions). If we don't lower it in NIR, then comparisons of the
* results of mediump ALU ops with the mediump input will happen in highp,
* causing extra conversions (and, incidentally, causing
* dEQP-GLES2.functional.shaders.algorithm.rgb_to_hsl_fragment on ANGLE to
* fail)
*
* However, we can't do flat inputs because flat.b doesn't have the
* destination type for how to downconvert the
* 32-bit-in-the-varyings-interpolator value. (also, even if it did, watch
* out for how gl_nir_lower_packed_varyings packs all flat-interpolated
* things together as ivec4s, so when we lower a formerly-float input
* you'd end up with an incorrect f2f16(i2i32(load_input())) instead of
* load_input).
*/
uint64_t mediump_varyings = 0;
nir_foreach_shader_in_variable(var, s) {
if ((var->data.precision == GLSL_PRECISION_MEDIUM ||
var->data.precision == GLSL_PRECISION_LOW) &&
var->data.interpolation != INTERP_MODE_FLAT) {
mediump_varyings |= BITFIELD64_BIT(var->data.location);
}
}
if (mediump_varyings) {
NIR_PASS_V(s, nir_lower_mediump_io,
nir_var_shader_in,
mediump_varyings,
false);
}
/* This should come after input lowering, to opportunistically lower non-mediump outputs. */
NIR_PASS_V(s, nir_lower_mediump_io, nir_var_shader_out, 0, false);
}