nir/range_analysis: Teach range analysis about fdot opcodes

This really, really helps on platforms where fabs() isn't free.  A great
many shaders use a * frsq(fabs(fdot(a, a))) to normalize a vector.
Since the result of the fdot must be non-negative, the fabs can be
eliminated by an existing algebraic rule.

shader-db results:

r300 (run on R420 - X800XL)
total instructions in shared programs: 1369807 -> 1368550 (-0.09%)
instructions in affected programs: 59986 -> 58729 (-2.10%)
helped: 609
HURT: 0

total vinst in shared programs: 512899 -> 512861 (<.01%)
vinst in affected programs: 1522 -> 1484 (-2.50%)
helped: 36
HURT: 0

total sinst in shared programs: 260690 -> 260570 (-0.05%)
sinst in affected programs: 1419 -> 1299 (-8.46%)
helped: 120
HURT: 0

total consts in shared programs: 957295 -> 957230 (<.01%)
consts in affected programs: 849 -> 784 (-7.66%)
helped: 65
HURT: 0

LOST:   0
GAINED: 3

The 3 gained shaders are all vertex shaders from XCom: Enemy Unknown.
I'm guessing that game is never going to run on my X800XL. :)

i915
total instructions in shared programs: 791121 -> 780843 (-1.30%)
instructions in affected programs: 220170 -> 209892 (-4.67%)
helped: 2085
HURT: 0

total temps in shared programs: 47765 -> 47766 (<.01%)
temps in affected programs: 9 -> 10 (11.11%)
helped: 0
HURT: 1

total const in shared programs: 93048 -> 92983 (-0.07%)
const in affected programs: 784 -> 719 (-8.29%)
helped: 65
HURT: 0

LOST:   0
GAINED: 36

Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 16702250 -> 16697908 (-0.03%)
instructions in affected programs: 119277 -> 114935 (-3.64%)
helped: 1065
HURT: 0
helped stats (abs) min: 1 max: 20 x̄: 4.08 x̃: 4
helped stats (rel) min: 0.48% max: 10.17% x̄: 3.66% x̃: 3.94%
95% mean confidence interval for instructions value: -4.26 -3.89
95% mean confidence interval for instructions %-change: -3.76% -3.56%
Instructions are helped.

total cycles in shared programs: 880772068 -> 880734134 (<.01%)
cycles in affected programs: 2134456 -> 2096522 (-1.78%)
helped: 941
HURT: 324
helped stats (abs) min: 2 max: 2180 x̄: 123.06 x̃: 44
helped stats (rel) min: 0.04% max: 49.96% x̄: 7.08% x̃: 3.81%
HURT stats (abs)   min: 2 max: 2098 x̄: 240.33 x̃: 35
HURT stats (rel)   min: 0.04% max: 77.07% x̄: 12.34% x̃: 3.00%
95% mean confidence interval for cycles value: -47.93 -12.04
95% mean confidence interval for cycles %-change: -2.87% -1.34%
Cycles are helped.

No shader-db changes on any other Intel platform.

Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17181>
This commit is contained in:
Ian Romanick 2022-06-21 16:47:31 -07:00 committed by Marge Bot
parent f50fe9b0b6
commit 6689fa2ab4

View file

@ -1046,6 +1046,37 @@ analyze_expression(const nir_alu_instr *instr, unsigned src,
r = (struct ssa_result_range){le_zero, false, true, false};
break;
case nir_op_fdot2:
case nir_op_fdot3:
case nir_op_fdot4:
case nir_op_fdot8:
case nir_op_fdot16:
case nir_op_fdot2_replicated:
case nir_op_fdot3_replicated:
case nir_op_fdot4_replicated:
case nir_op_fdot8_replicated:
case nir_op_fdot16_replicated: {
const struct ssa_result_range left =
analyze_expression(alu, 0, ht, nir_alu_src_type(alu, 0));
/* If the two sources are the same SSA value, then the result is either
* NaN or some number >= 0. If one source is the negation of the other,
* the result is either NaN or some number <= 0.
*
* In either of these two cases, if one source is a number, then the
* other must also be a number. Since it should not be possible to get
* Inf-Inf in the dot-product, the result must also be a number.
*/
if (nir_alu_srcs_equal(alu, alu, 0, 1)) {
r = (struct ssa_result_range){ge_zero, false, left.is_a_number, false };
} else if (nir_alu_srcs_negative_equal(alu, alu, 0, 1)) {
r = (struct ssa_result_range){le_zero, false, left.is_a_number, false };
} else {
r = (struct ssa_result_range){unknown, false, false, false};
}
break;
}
case nir_op_fpow: {
/* Due to flush-to-zero semanatics of floating-point numbers with very
* small mangnitudes, we can never really be sure a result will be