fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-16 09:48:16 +02:00

Author	SHA1	Message	Date
Jason Ekstrand	c74b98486a	nir: Add a helper for fetching the SSA def from an instruction Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-16 16:05:16 +00:00
Jason Ekstrand	0ba508d7a3	nir,intel: Add support for lowering 64-bit nir_opt_extract_* We need this when doing full software 64-bit emulation. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110309 Fixes: `cbad201c2b` "nir/algebraic: Add missing 64-bit extract_[iu]8..." Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-07-15 16:08:37 -05:00
Jason Ekstrand	7a19e05e8c	nir/opt_if: Clean up single-src phis in opt_if_loop_terminator Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111071 Fixes: `2a74296f24` "nir: add opt_if_loop_terminator()" Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-15 19:58:51 +00:00
Andres Gomez	9aadd5d688	nir/compiler: keep same bit size when lowering with flrp This was probably not caught before because no supported test was exercising the flrp lowering with a bit size other than 32. With the arrival of VK_KHR_shader_float_controls we will have some of those and, unless we keep the bit size, we will end up with something like: ../src/compiler/nir/nir_builder.h:420: nir_builder_alu_instr_finish_and_insert: Assertion `src_bit_size == bit_size' failed. Fixes: `158370ed2a` ("nir/flrp: Add new lowering pass for flrp instructions") Fixes: `ae02622d8f` ("nir/flrp: Lower flrp(a, b, c) differently if another flrp(_, b, c) exists") Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrnd.net>	2019-07-12 16:15:20 +00:00
Iago Toral Quiroga	b0eec9e27d	nir: add a new v3d-specific intrinsic for tile buffer color reads This is intended to be used, for example, with OpenGL logic operations. It takes a render target as source and a sample index in the base index for MSAA color reads. v2: drop the CAN_ELIMINATE and CAN_REORDER flags (Eric). Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Ian Romanick	ef7b4fdf3f	nir/algebraic: Recognize open-coded flrp(a, b, a) No shader-db changes Ice Lake, Iron Lake, or GM45 as these platforms lack a LRP instruction. v2: Remove flrp@64 cases. Since Gen11 removes flrp@32, it seems unlikely that we'll ever have a flrp@64. Should that occur, the cases can be added back. All Gen6-Gen9 platforms had similar results. (Skylake shown) total instructions in shared programs: 15041996 -> 15041184 (<.01%) instructions in affected programs: 71776 -> 70964 (-1.13%) helped: 312 HURT: 0 helped stats (abs) min: 2 max: 3 x̄: 2.60 x̃: 3 helped stats (rel) min: 0.36% max: 4.55% x̄: 1.75% x̃: 1.28% 95% mean confidence interval for instructions value: -2.66 -2.55 95% mean confidence interval for instructions %-change: -1.89% -1.61% Instructions are helped. total cycles in shared programs: 354303333 -> 354301807 (<.01%) cycles in affected programs: 433742 -> 432216 (-0.35%) helped: 206 HURT: 78 helped stats (abs) min: 2 max: 244 x̄: 21.02 x̃: 8 helped stats (rel) min: 0.06% max: 19.59% x̄: 1.72% x̃: 0.82% HURT stats (abs) min: 1 max: 220 x̄: 35.95 x̃: 10 HURT stats (rel) min: 0.07% max: 30.48% x̄: 2.53% x̃: 0.56% 95% mean confidence interval for cycles value: -10.68 -0.06 95% mean confidence interval for cycles %-change: -0.99% -0.12% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-11 10:20:03 -07:00
Ian Romanick	0c2b3a7fc0	nir/algebraic: Rearrange 1-((1-a) * (1-b)) into flrp-friendly form No shader-db changes Ice Lake, Iron Lake, or GM45 as these platforms lack a LRP instruction. v2: Convert the pattern directly to flrp. There were negligible improvements on Gen4 and Gen5, and Gen11 was actually hurt. I believe the problem is this optimization conflicts with the (1-x)*y => ffma(-x, y, y) optimization on Gen11. Skylake total instructions in shared programs: 15046487 -> 15041996 (-0.03%) instructions in affected programs: 194681 -> 190190 (-2.31%) helped: 880 HURT: 20 helped stats (abs) min: 1 max: 19 x̄: 5.13 x̃: 4 helped stats (rel) min: 0.19% max: 36.36% x̄: 4.85% x̃: 3.33% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.11% max: 1.06% x̄: 0.28% x̃: 0.17% 95% mean confidence interval for instructions value: -5.25 -4.73 95% mean confidence interval for instructions %-change: -5.11% -4.36% Instructions are helped. total cycles in shared programs: 354340839 -> 354303333 (-0.01%) cycles in affected programs: 1753622 -> 1716116 (-2.14%) helped: 786 HURT: 182 helped stats (abs) min: 1 max: 1842 x̄: 56.52 x̃: 22 helped stats (rel) min: 0.03% max: 43.17% x̄: 3.90% x̃: 2.84% HURT stats (abs) min: 1 max: 440 x̄: 37.99 x̃: 9 HURT stats (rel) min: 0.03% max: 29.37% x̄: 1.96% x̃: 0.32% 95% mean confidence interval for cycles value: -45.90 -31.59 95% mean confidence interval for cycles %-change: -3.09% -2.50% Cycles are helped. All Gen6-Gen8 platforms had similar results. 
(Broadwell shown) total instructions in shared programs: 15055907 -> 15051466 (-0.03%) instructions in affected programs: 196370 -> 191929 (-2.26%) helped: 871 HURT: 26 helped stats (abs) min: 1 max: 19 x̄: 5.13 x̃: 4 helped stats (rel) min: 0.19% max: 36.36% x̄: 4.76% x̃: 3.27% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.11% max: 1.06% x̄: 0.24% x̃: 0.12% 95% mean confidence interval for instructions value: -5.21 -4.69 95% mean confidence interval for instructions %-change: -4.99% -4.24% Instructions are helped. total cycles in shared programs: 387729170 -> 387699745 (<.01%) cycles in affected programs: 1816409 -> 1786984 (-1.62%) helped: 788 HURT: 172 helped stats (abs) min: 1 max: 662 x̄: 47.29 x̃: 22 helped stats (rel) min: 0.03% max: 31.26% x̄: 3.55% x̃: 2.76% HURT stats (abs) min: 1 max: 404 x̄: 45.59 x̃: 14 HURT stats (rel) min: 0.03% max: 22.92% x̄: 1.53% x̃: 0.43% 95% mean confidence interval for cycles value: -35.69 -25.61 95% mean confidence interval for cycles %-change: -2.88% -2.40% Cycles are helped. total fills in shared programs: 34712 -> 34710 (<.01%) fills in affected programs: 7 -> 5 (-28.57%) helped: 1 HURT: 0 LOST: 0 GAINED: 2 Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-11 10:20:03 -07:00
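The rearrangement named in commit 0c2b3a7fc0 rests on a simple identity: 1-((1-a)*(1-b)) expands to a + b - a*b, which is a linear interpolation toward 1. A minimal sketch of that check follows, assuming the usual flrp definition x*(1-t) + y*t. The helper names are hypothetical, and the listing does not show which of the two equivalent flrp forms the pass actually emits.

```python
from fractions import Fraction
from itertools import product


def flrp(x, y, t):
    """Linear interpolation, assumed here to be x * (1 - t) + y * t."""
    return x * (1 - t) + y * t


def open_coded(a, b):
    """The expression from the commit subject: 1 - ((1 - a) * (1 - b))."""
    return 1 - (1 - a) * (1 - b)


def identity_holds(values):
    """Check the identity for every pair of sample values."""
    return all(
        open_coded(a, b) == flrp(a, 1, b) == flrp(b, 1, a)
        for a, b in product(values, repeat=2)
    )


# Exact rationals avoid rounding, so equality here is algebraic, not approximate.
SAMPLES = [Fraction(n, 4) for n in range(-4, 9)]
```

With real floating-point values the two sides can differ by rounding, so a compiler may only apply a rewrite like this where inexact transformations are allowed.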
Ian Romanick	09705747d7	nir/algebraic: Reassociate fadd into fmul in DPH-like pattern Moving the add to the other end of the sequence allows it to be fused into an FMA. Ice Lake total instructions in shared programs: 17173074 -> 16933147 (-1.40%) instructions in affected programs: 7938745 -> 7698818 (-3.02%) helped: 35583 HURT: 90 helped stats (abs) min: 1 max: 716 x̄: 6.75 x̃: 6 helped stats (rel) min: 0.10% max: 53.04% x̄: 5.29% x̃: 3.45% HURT stats (abs) min: 1 max: 41 x̄: 2.46 x̃: 1 HURT stats (rel) min: 0.32% max: 8.33% x̄: 1.41% x̃: 0.77% 95% mean confidence interval for instructions value: -6.80 -6.65 95% mean confidence interval for instructions %-change: -5.32% -5.22% Instructions are helped. total cycles in shared programs: 360881386 -> 359533568 (-0.37%) cycles in affected programs: 189489144 -> 188141326 (-0.71%) helped: 27250 HURT: 6707 helped stats (abs) min: 1 max: 21997 x̄: 62.15 x̃: 16 helped stats (rel) min: <.01% max: 70.69% x̄: 4.04% x̃: 2.35% HURT stats (abs) min: 1 max: 3507 x̄: 51.56 x̃: 14 HURT stats (rel) min: <.01% max: 77.26% x̄: 2.72% x̃: 1.27% 95% mean confidence interval for cycles value: -44.70 -34.68 95% mean confidence interval for cycles %-change: -2.75% -2.65% Cycles are helped. total spills in shared programs: 8943 -> 8829 (-1.27%) spills in affected programs: 625 -> 511 (-18.24%) helped: 6 HURT: 3 total fills in shared programs: 21815 -> 21719 (-0.44%) fills in affected programs: 1653 -> 1557 (-5.81%) helped: 7 HURT: 10 LOST: 11 GAINED: 3 Skylake and Broadwell had similar results. 
(Skylake shown) total instructions in shared programs: 15271996 -> 15040882 (-1.51%) instructions in affected programs: 7193699 -> 6962585 (-3.21%) helped: 33985 HURT: 30 helped stats (abs) min: 1 max: 260 x̄: 6.80 x̃: 6 helped stats (rel) min: 0.10% max: 30.00% x̄: 5.54% x̃: 3.85% HURT stats (abs) min: 1 max: 41 x̄: 4.00 x̃: 3 HURT stats (rel) min: 0.20% max: 2.16% x̄: 1.46% x̃: 1.72% 95% mean confidence interval for instructions value: -6.87 -6.72 95% mean confidence interval for instructions %-change: -5.59% -5.48% Instructions are helped. total cycles in shared programs: 355520785 -> 354253799 (-0.36%) cycles in affected programs: 185869148 -> 184602162 (-0.68%) helped: 25824 HURT: 6287 helped stats (abs) min: 1 max: 21997 x̄: 61.66 x̃: 16 helped stats (rel) min: <.01% max: 42.05% x̄: 4.18% x̃: 2.41% HURT stats (abs) min: 1 max: 3327 x̄: 51.76 x̃: 14 HURT stats (rel) min: <.01% max: 101.62% x̄: 2.80% x̃: 1.28% 95% mean confidence interval for cycles value: -44.70 -34.21 95% mean confidence interval for cycles %-change: -2.87% -2.76% Cycles are helped. total spills in shared programs: 8835 -> 8818 (-0.19%) spills in affected programs: 613 -> 596 (-2.77%) helped: 5 HURT: 2 total fills in shared programs: 21738 -> 21744 (0.03%) fills in affected programs: 1348 -> 1354 (0.45%) helped: 5 HURT: 11 LOST: 0 GAINED: 12 Haswell total instructions in shared programs: 13447102 -> 13381508 (-0.49%) instructions in affected programs: 3770735 -> 3705141 (-1.74%) helped: 11999 HURT: 29 helped stats (abs) min: 1 max: 409 x̄: 5.60 x̃: 3 helped stats (rel) min: 0.10% max: 20.00% x̄: 2.38% x̃: 1.87% HURT stats (abs) min: 3 max: 750 x̄: 54.90 x̃: 3 HURT stats (rel) min: 0.12% max: 125.30% x̄: 9.96% x̃: 1.82% 95% mean confidence interval for instructions value: -5.71 -5.19 95% mean confidence interval for instructions %-change: -2.39% -2.30% Instructions are helped. 
total cycles in shared programs: 376342236 -> 375690458 (-0.17%) cycles in affected programs: 155699021 -> 155047243 (-0.42%) helped: 8397 HURT: 2876 helped stats (abs) min: 1 max: 20248 x̄: 109.87 x̃: 18 helped stats (rel) min: <.01% max: 40.71% x̄: 2.23% x̃: 1.49% HURT stats (abs) min: 1 max: 15414 x̄: 94.15 x̃: 22 HURT stats (rel) min: <.01% max: 432.49% x̄: 3.15% x̃: 1.41% 95% mean confidence interval for cycles value: -67.64 -48.00 95% mean confidence interval for cycles %-change: -0.99% -0.74% Cycles are helped. total spills in shared programs: 23134 -> 23184 (0.22%) spills in affected programs: 1675 -> 1725 (2.99%) helped: 13 HURT: 11 total fills in shared programs: 34550 -> 34686 (0.39%) fills in affected programs: 1421 -> 1557 (9.57%) helped: 13 HURT: 11 LOST: 0 GAINED: 11 Ivy Bridge total instructions in shared programs: 12019642 -> 11987285 (-0.27%) instructions in affected programs: 1532236 -> 1499879 (-2.11%) helped: 5522 HURT: 110 helped stats (abs) min: 1 max: 312 x̄: 6.22 x̃: 3 helped stats (rel) min: 0.16% max: 20.00% x̄: 2.46% x̃: 1.88% HURT stats (abs) min: 1 max: 750 x̄: 18.07 x̃: 3 HURT stats (rel) min: 0.09% max: 125.30% x̄: 3.42% x̃: 1.15% 95% mean confidence interval for instructions value: -6.25 -5.24 95% mean confidence interval for instructions %-change: -2.43% -2.26% Instructions are helped. total cycles in shared programs: 180214667 -> 179761900 (-0.25%) cycles in affected programs: 31448723 -> 30995956 (-1.44%) helped: 7191 HURT: 2838 helped stats (abs) min: 1 max: 17680 x̄: 88.47 x̃: 17 helped stats (rel) min: <.01% max: 50.45% x̄: 2.16% x̃: 1.40% HURT stats (abs) min: 1 max: 15540 x̄: 64.63 x̃: 24 HURT stats (rel) min: 0.02% max: 435.17% x̄: 3.10% x̃: 1.51% 95% mean confidence interval for cycles value: -53.34 -36.95 95% mean confidence interval for cycles %-change: -0.81% -0.53% Cycles are helped. 
total spills in shared programs: 3599 -> 3642 (1.19%) spills in affected programs: 1180 -> 1223 (3.64%) helped: 12 HURT: 2 total fills in shared programs: 4031 -> 4162 (3.25%) fills in affected programs: 876 -> 1007 (14.95%) helped: 12 HURT: 2 LOST: 6 GAINED: 5 Sandy Bridge total instructions in shared programs: 10850686 -> 10822890 (-0.26%) instructions in affected programs: 1247986 -> 1220190 (-2.23%) helped: 4699 HURT: 102 helped stats (abs) min: 1 max: 104 x̄: 6.02 x̃: 3 helped stats (rel) min: 0.15% max: 17.65% x̄: 2.44% x̃: 1.88% HURT stats (abs) min: 1 max: 16 x̄: 4.70 x̃: 3 HURT stats (rel) min: 0.09% max: 3.85% x̄: 1.11% x̃: 1.10% 95% mean confidence interval for instructions value: -6.10 -5.47 95% mean confidence interval for instructions %-change: -2.42% -2.30% Instructions are helped. total cycles in shared programs: 154044149 -> 153920095 (-0.08%) cycles in affected programs: 26037392 -> 25913338 (-0.48%) helped: 5974 HURT: 2521 helped stats (abs) min: 1 max: 1802 x̄: 35.42 x̃: 16 helped stats (rel) min: <.01% max: 35.80% x̄: 1.43% x̃: 0.84% HURT stats (abs) min: 1 max: 862 x̄: 34.73 x̃: 20 HURT stats (rel) min: 0.01% max: 36.33% x̄: 1.67% x̃: 0.85% 95% mean confidence interval for cycles value: -16.31 -12.90 95% mean confidence interval for cycles %-change: -0.56% -0.45% Cycles are helped. total spills in shared programs: 2876 -> 2957 (2.82%) spills in affected programs: 592 -> 673 (13.68%) helped: 6 HURT: 35 total fills in shared programs: 3157 -> 3134 (-0.73%) fills in affected programs: 402 -> 379 (-5.72%) helped: 6 HURT: 0 LOST: 5 GAINED: 11 Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-11 10:20:03 -07:00
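Commit 09705747d7 says that moving the add to the other end of the sequence lets it fuse into an FMA. The sketch below illustrates why that saves an instruction. It assumes a "DPH-like" sequence means three products plus one extra addend, which is a guess at the shape and not the exact pattern from the patch. The `OpCounter` class and both function names are hypothetical.

```python
from fractions import Fraction


class OpCounter:
    """Toy ALU that records which operations were issued."""

    def __init__(self):
        self.ops = []

    def fmul(self, x, y):
        self.ops.append("fmul")
        return x * y

    def fadd(self, x, y):
        self.ops.append("fadd")
        return x + y

    def ffma(self, x, y, z):
        self.ops.append("ffma")
        return x * y + z


def dph_add_last(hw, a, b, c, d, e, f, g):
    """((a*b + c*d) + e*f) + g: the trailing add has no multiply to fuse with."""
    acc = hw.fmul(a, b)
    acc = hw.ffma(c, d, acc)
    acc = hw.ffma(e, f, acc)
    return hw.fadd(acc, g)


def dph_add_first(hw, a, b, c, d, e, f, g):
    """Same sum with g folded into the first product, so every step is an ffma."""
    acc = hw.ffma(a, b, g)
    acc = hw.ffma(c, d, acc)
    return hw.ffma(e, f, acc)


ARGS = [Fraction(n, 3) for n in (1, 2, 4, 5, 7, 8, 10)]
```

Running both versions on `ARGS` with separate counters gives the same exact value with four operations in the first form and three in the second. In floating point the reassociated sum can round differently, which is why a rewrite of this kind counts as an inexact optimization.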
Ian Romanick	ff9f526de3	nir/algebraic: Recognize open-coded flrp(-1, 1, a) and flrp(1, -1, a) v2: Remove flrp@64 cases. Since Gen11 removes flrp@32, it seems unlikely that we'll ever have a flrp@64. Should that occur, the cases can be added back. v3: Add a couple more patterns that just move the negation around. No shader-db changes Ice Lake, Iron Lake, or GM45 as these platforms lack a LRP instruction. Skylake total instructions in shared programs: 15279687 -> 15256058 (-0.15%) instructions in affected programs: 4344440 -> 4320811 (-0.54%) helped: 23455 HURT: 18 helped stats (abs) min: 1 max: 21 x̄: 1.01 x̃: 1 helped stats (rel) min: 0.02% max: 13.33% x̄: 0.86% x̃: 0.65% HURT stats (abs) min: 1 max: 2 x̄: 1.06 x̃: 1 HURT stats (rel) min: 0.13% max: 1.16% x̄: 0.43% x̃: 0.34% 95% mean confidence interval for instructions value: -1.01 -1.00 95% mean confidence interval for instructions %-change: -0.87% -0.85% Instructions are helped. total cycles in shared programs: 355593755 -> 355339981 (-0.07%) cycles in affected programs: 162089552 -> 161835778 (-0.16%) helped: 20467 HURT: 7158 helped stats (abs) min: 1 max: 2074 x̄: 29.00 x̃: 6 helped stats (rel) min: <.01% max: 35.71% x̄: 1.71% x̃: 0.58% HURT stats (abs) min: 1 max: 4814 x̄: 47.46 x̃: 11 HURT stats (rel) min: <.01% max: 125.43% x̄: 2.88% x̃: 0.98% 95% mean confidence interval for cycles value: -10.39 -7.98 95% mean confidence interval for cycles %-change: -0.57% -0.47% Cycles are helped. 
total spills in shared programs: 8843 -> 8835 (-0.09%) spills in affected programs: 190 -> 182 (-4.21%) helped: 2 HURT: 0 total fills in shared programs: 21738 -> 21738 (0.00%) fills in affected programs: 372 -> 372 (0.00%) helped: 1 HURT: 1 LOST: 12 GAINED: 22 Broadwell total instructions in shared programs: 15290523 -> 15266818 (-0.16%) instructions in affected programs: 4314738 -> 4291033 (-0.55%) helped: 23391 HURT: 11 helped stats (abs) min: 1 max: 119 x̄: 1.02 x̃: 1 helped stats (rel) min: 0.02% max: 13.33% x̄: 0.86% x̃: 0.65% HURT stats (abs) min: 1 max: 189 x̄: 18.09 x̃: 1 HURT stats (rel) min: 0.11% max: 5.39% x̄: 0.98% x̃: 0.50% 95% mean confidence interval for instructions value: -1.04 -0.99 95% mean confidence interval for instructions %-change: -0.87% -0.85% Instructions are helped. total cycles in shared programs: 388911660 -> 388830827 (-0.02%) cycles in affected programs: 172903324 -> 172822491 (-0.05%) helped: 15601 HURT: 13269 helped stats (abs) min: 1 max: 1986 x̄: 29.18 x̃: 6 helped stats (rel) min: <.01% max: 36.60% x̄: 1.74% x̃: 0.55% HURT stats (abs) min: 1 max: 14904 x̄: 28.21 x̃: 6 HURT stats (rel) min: <.01% max: 102.58% x̄: 1.77% x̃: 0.60% 95% mean confidence interval for cycles value: -4.20 -1.40 95% mean confidence interval for cycles %-change: -0.17% -0.08% Cycles are helped. 
total spills in shared programs: 23110 -> 23069 (-0.18%) spills in affected programs: 656 -> 615 (-6.25%) helped: 3 HURT: 1 total fills in shared programs: 34399 -> 34398 (<.01%) fills in affected programs: 905 -> 904 (-0.11%) helped: 3 HURT: 1 LOST: 6 GAINED: 23 Haswell total instructions in shared programs: 13465303 -> 13441142 (-0.18%) instructions in affected programs: 3726999 -> 3702838 (-0.65%) helped: 22139 HURT: 347 helped stats (abs) min: 1 max: 43 x̄: 1.11 x̃: 1 helped stats (rel) min: 0.03% max: 10.00% x̄: 1.01% x̃: 0.75% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.35% max: 11.11% x̄: 1.48% x̃: 1.12% 95% mean confidence interval for instructions value: -1.08 -1.07 95% mean confidence interval for instructions %-change: -0.99% -0.96% Instructions are helped. total cycles in shared programs: 376271308 -> 376273090 (<.01%) cycles in affected programs: 167496811 -> 167498593 (<.01%) helped: 13206 HURT: 13281 helped stats (abs) min: 1 max: 3864 x̄: 35.39 x̃: 8 helped stats (rel) min: <.01% max: 53.10% x̄: 2.31% x̃: 0.80% HURT stats (abs) min: 1 max: 3828 x̄: 35.32 x̃: 8 HURT stats (rel) min: <.01% max: 117.85% x̄: 2.88% x̃: 0.61% 95% mean confidence interval for cycles value: -1.33 1.47 95% mean confidence interval for cycles %-change: 0.22% 0.36% Inconclusive result (value mean confidence interval includes 0). 
total spills in shared programs: 23158 -> 23134 (-0.10%) spills in affected programs: 24 -> 0 helped: 3 HURT: 0 total fills in shared programs: 34580 -> 34550 (-0.09%) fills in affected programs: 30 -> 0 helped: 3 HURT: 0 LOST: 23 GAINED: 13 Ivy Bridge total instructions in shared programs: 12034154 -> 12014301 (-0.16%) instructions in affected programs: 3636209 -> 3616356 (-0.55%) helped: 18771 HURT: 459 helped stats (abs) min: 1 max: 43 x̄: 1.08 x̃: 1 helped stats (rel) min: 0.03% max: 10.00% x̄: 0.91% x̃: 0.68% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.34% max: 8.33% x̄: 1.43% x̃: 1.11% 95% mean confidence interval for instructions value: -1.04 -1.02 95% mean confidence interval for instructions %-change: -0.86% -0.84% Instructions are helped. total cycles in shared programs: 180186960 -> 180175147 (<.01%) cycles in affected programs: 44652745 -> 44640932 (-0.03%) helped: 12979 HURT: 11033 helped stats (abs) min: 1 max: 5836 x̄: 32.88 x̃: 6 helped stats (rel) min: <.01% max: 53.10% x̄: 2.19% x̃: 0.74% HURT stats (abs) min: 1 max: 4811 x̄: 37.61 x̃: 9 HURT stats (rel) min: <.01% max: 115.18% x̄: 2.99% x̃: 0.69% 95% mean confidence interval for cycles value: -2.29 1.31 95% mean confidence interval for cycles %-change: 0.11% 0.26% Inconclusive result (value mean confidence interval includes 0). 
total spills in shared programs: 3623 -> 3599 (-0.66%) spills in affected programs: 24 -> 0 helped: 3 HURT: 0 total fills in shared programs: 4061 -> 4031 (-0.74%) fills in affected programs: 30 -> 0 helped: 3 HURT: 0 LOST: 17 GAINED: 18 Sandy Bridge total instructions in shared programs: 10853968 -> 10834932 (-0.18%) instructions in affected programs: 3769957 -> 3750921 (-0.50%) helped: 17944 HURT: 204 helped stats (abs) min: 1 max: 3 x̄: 1.07 x̃: 1 helped stats (rel) min: 0.02% max: 10.00% x̄: 0.83% x̃: 0.60% HURT stats (abs) min: 1 max: 2 x̄: 1.01 x̃: 1 HURT stats (rel) min: 0.31% max: 9.09% x̄: 1.83% x̃: 0.93% 95% mean confidence interval for instructions value: -1.05 -1.04 95% mean confidence interval for instructions %-change: -0.81% -0.78% Instructions are helped. total cycles in shared programs: 153894864 -> 153885988 (<.01%) cycles in affected programs: 50643925 -> 50635049 (-0.02%) helped: 9361 HURT: 10534 helped stats (abs) min: 1 max: 1966 x̄: 19.42 x̃: 4 helped stats (rel) min: <.01% max: 34.97% x̄: 0.90% x̃: 0.22% HURT stats (abs) min: 1 max: 1371 x̄: 16.42 x̃: 5 HURT stats (rel) min: <.01% max: 55.10% x̄: 0.81% x̃: 0.27% 95% mean confidence interval for cycles value: -1.27 0.38 95% mean confidence interval for cycles %-change: -0.03% 0.04% Inconclusive result (value mean confidence interval includes 0). LOST: 6 GAINED: 24 Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-11 10:20:03 -07:00
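For commit ff9f526de3, the two interpolations in the subject reduce to very small expressions: flrp(-1, 1, a) is 2*a - 1 and flrp(1, -1, a) is 1 - 2*a. A minimal sketch of that reduction, assuming flrp(x, y, t) = x*(1-t) + y*t. The function names are hypothetical, and the exact open-coded spellings the pass matches are not shown in this listing.

```python
from fractions import Fraction


def flrp(x, y, t):
    """Linear interpolation, assumed here to be x * (1 - t) + y * t."""
    return x * (1 - t) + y * t


def scale_to_signed(a):
    """Maps [0, 1] onto [-1, 1]; one plausible open-coded form of flrp(-1, 1, a)."""
    return 2 * a - 1


def scale_to_signed_flipped(a):
    """Maps [0, 1] onto [1, -1]; one plausible open-coded form of flrp(1, -1, a)."""
    return 1 - 2 * a


# Exact rationals keep the comparison free of rounding effects.
SAMPLES = [Fraction(n, 8) for n in range(-8, 17)]
```

The "couple more patterns that just move the negation around" in the v3 note fit this picture, since 1 - 2*a is simply the negation of 2*a - 1.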
Ian Romanick	1259f6d802	nir: intel/vec4: Add flag to disable some algebraic optimizations A couple patches later in this series use the flag to avoid a few thousand shader-db regressions on all vec4 platforms. I'm not particularly enamored with the name of this flag. However, I suspect the Intel vec4 backend is the only backend that will benefit from it. Specifically, the cases where this helps are all cases where we want to prevent nir_opt_algebraic from rearranging instructions to create 3-source instructions, such as ffma and flrp, with additional immediate value or uniform sources. The earlier commit "intel/vec4: Try to emit a single load for multiple 3-src instruction operands" solves most of the problems caused by additional immediate values, but the restrictions on register strides that cause problems for uniforms and shader inputs persist. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-11 10:20:03 -07:00
Connor Abbott	133273aa22	nir/lower_io: Don't use variable to get deref mode Drivers only use lower_io for modes where pointers don't have a meaningful value, and dereferences can always be traced back to a variable. But there can be other modes, like global mode with VK_EXT_buffer_device_address, where pointers cannot be traced back to a variable, and lower_io would segfault on loads/stores of these since nir_deref_instr_get_variable() would return NULL. Just use the mode on the deref itself to filter out these modes before we try to get the variable. Fixes: `118a66df99` ("radv: Use NIR barycentric coordinates") Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-10 12:31:41 +02:00
Jason Ekstrand	7e0fcea727	nir/loop_analyze: Pass nir_const_values directly to helpers Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	ff972c7a3a	nir/loop_analyze: Properly handle swizzles in loop conditions This commit re-plumbs all of nir_loop_analyze to use nir_ssa_scalar for all intermediate values so that we can properly handle swizzles. Even though if conditions are required to be scalars, they may still consume swizzles so you could have ((a.yzw < b.zzx).xz && c.xx).y == 0 as your loop termination condition. The old code would just bail the moment it saw its first non-zero swizzle but we can now properly chase the scalar from the if condition all the way to a, b, and c. Shader-db results on Kaby Lake: total loops in shared programs: 4388 -> 4364 (-0.55%) loops in affected programs: 29 -> 5 (-82.76%) helped: 29 HURT: 5 Shader-db results on Haswell: total loops in shared programs: 4370 -> 4373 (0.07%) loops in affected programs: 2 -> 5 (150.00%) helped: 2 HURT: 5 Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
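The swizzle example in commit ff972c7a3a can be traced by hand: component y of the outer expression picks z from the comparison and x from c, and z of the comparison picks w from a and x from b. The sketch below models that walk with a toy expression tree. The `Var`, `Alu`, and `chase` names are hypothetical, this is not the nir_ssa_scalar API, it assumes every operation is component-wise, and it leaves out the final comparison against 0.

```python
from dataclasses import dataclass

COMPONENTS = "xyzw"


@dataclass(frozen=True)
class Var:
    """A leaf vector value such as a, b, or c."""
    name: str


@dataclass(frozen=True)
class Alu:
    """A component-wise operation; each source is a (value, swizzle) pair."""
    op: str
    srcs: tuple


def chase(value, comp):
    """Return the scalar expression that feeds component `comp` of `value`."""
    if isinstance(value, Var):
        return f"{value.name}.{COMPONENTS[comp]}"
    parts = []
    for src, swizzle in value.srcs:
        # The swizzle says which source component lands in output slot `comp`.
        parts.append(chase(src, COMPONENTS.index(swizzle[comp])))
    return "(" + f" {value.op} ".join(parts) + ")"


a, b, c = Var("a"), Var("b"), Var("c")
less = Alu("<", ((a, "yzw"), (b, "zzx")))
cond = Alu("&&", ((less, "xz"), (c, "xx")))
```

Calling `chase(cond, 1)` follows the `.y` selection down to the leaves and yields `((a.w < b.x) && c.x)`, which is the scalar view the commit message describes.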
Jason Ekstrand	0333649e63	nir/loop_analyze: Refactor detection of limit vars This commit reworks both get_induction_and_limit_vars() and try_find_trip_count_vars_in_iand to return true on success and not modify their output parameters on failure. This makes their callers significantly simpler. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	8f7405ed9d	nir: Add some helpers for chasing SSA values properly There are various cases in which we want to chase SSA values through ALU ops ranging from hand-written optimizations to back-end translation code. In all these cases, it can be very tricky to do properly because of swizzles. This set of helpers lets you easily work with a single component of an SSA def and chase through ALU ops safely. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	9a3cb6f5fe	nir/loop_analyze: Bail if we encounter swizzles None of the current code knows what to do with swizzles. Take the safe option for now and bail if we see one. This does have a small shader-db impact but it is at least safe. Shader-db results on Kaby Lake: total loops in shared programs: 4364 -> 4388 (0.55%) loops in affected programs: 5 -> 29 (480.00%) helped: 5 HURT: 29 Shader-db results on Haswell: total loops in shared programs: 4373 -> 4370 (-0.07%) loops in affected programs: 5 -> 2 (-60.00%) helped: 5 HURT: 2 Fixes: `6772a17acc` "nir: Add a loop analysis pass" Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	6455fa9710	nir/loop_analyze: Use new eval_const_* helpers in test_iterations Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	268ad47c11	nir/loop_analyze: Handle bit sizes correctly in calculate_iterations The current code assumes everything is 32-bit which is very likely true but not guaranteed by any means. Instead, use nir_eval_const_opcode to do the calculations in a bit-size-agnostic way. We also use the new constant constructors to build the correct size constants. Fixes: `6772a17acc` "nir: Add a loop analysis pass" Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	9f7ffe41dd	nir/loop_analyze: Fix phi-of-identical-alu detection One issue was that the original version didn't check that swizzles matched when comparing ALU instructions so it could end up matching very different instructions. Using the nir_instrs_equal function from nir_instr_set.c which we use for CSE should be much more reliable. Another was that the loop assumes it will only run two iterations which may not be true. If there's something which guarantees that this case only happens for phis after ifs, it wasn't documented. Fixes: `9e6b39e1d5` "nir: detect more induction variables" Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	6e984bcb92	nir/instr_set: Expose nir_instrs_equal() Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	64328f947e	nir/builder: Use nir_const_value_for_* for constructing immediates Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	3acddc733f	nir: Refactor nir_src_as_* constant functions Now that we have the nir_const_value_as_* helpers, every one of these functions is effectively the same except for the suffix they use so we can easily define them with a repeated macro. This also means that they're inline and the fact that the nir_src is being passed by-value should no longer really hurt anything. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	ce5581e23e	nir: Add more helpers for working with const values Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Alyssa Rosenzweig	15000c79da	nir: Add Panfrost-specific blending intrinsic This gives more flexibility than the normal store_deref/store_output versions (particularly, it allows us to abuse the type system in awful ways, which is necessary for efficient format conversion in blend shaders.) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Acked-by: Karol Herbst <kherbst@redhat.com>	2019-07-09 14:07:23 -07:00
Alyssa Rosenzweig	4a4b48fb05	nir: Add nir_imm_vec4_16 We already have nir_imm_float16 and nir_imm_vec4; let's add the ability to easily make immediate fp16 vectors as well, now that fp16 support is maturing in NIR/GLSL. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-09 18:43:07 +00:00
Connor Abbott	86968327df	nir/lower_io_to_temporaries: Fix hash table leak Fixes: `c45f5db527` ("nir/lower_io_to_temporaries: Handle interpolation intrinsics") Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-09 10:39:37 +02:00
Ian Romanick	5450fd7a36	nir: Allow nir_ssa_alu_instr_src_components to operate on non-SSA destinations Existing users only operate on instructions with SSA destinations. Some later patches add new direct calls and indirect calls (via existing NIR functions) on instructions after going out of SSA. At the very least, these calls are added by: intel/vec4: Try to emit a VF source in try_immediate_source intel/vec4: Try to emit a single load for multiple 3-src instruction operands The first commit adds direct calls, and the second adds calls via nir_alu_srcs_equal and nir_alu_srcs_negative_equal. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-08 11:30:11 -07:00
Ian Romanick	12217de08c	nir: Handle swizzle in nir_alu_srcs_negative_equal When I added this function, I was not sure if swizzles of immediate values were a thing that occurred in NIR. The only existing user of these functions is the partial redundancy elimination for compares. Since comparison instructions are inherently scalar, this does not occur. However, a couple later patches, "nir/algebraic: Recognize open-coded flrp(-1, 1, a) and flrp(1, -1, a)" combined with "intel/vec4: Try to emit a single load for multiple 3-src instruction operands", collaborate to create a few thousand instances. No shader-db changes on any Intel platform. v2: Handle the swizzle in nir_alu_srcs_negative_equal and leave nir_const_value_negative_equal unchanged. Suggested by Jason. v3: Correctly handle write masks. Add note (and assertion) that the caller is responsible for various compatibility checks. The single existing caller only calls this for combinations of scalar fadd and float comparison instructions, so all of the requirements are met. A later patch (intel/vec4: Try to emit a single load for multiple 3-src instruction operands) will call this for sources of the same instruction, so all of the requirements are met. v4: Add unit test for nir_opt_comparison_pre that is fixed by this commit. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-08 11:30:11 -07:00
Ian Romanick	ad50e812a3	nir: nir_const_value_negative_equal compares one value at a time Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-08 11:30:10 -07:00
Ian Romanick	bcd22b740c	nir: Port some const_value_negative_equal tests to alu_src_negative_equal The next commit will make the existing tests irrelevant. Reviewed-by: Matt Turner <mattst88@gmail.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-08 11:30:10 -07:00
Ian Romanick	ec96c289ea	nir: Pass fully qualified type to nir_const_value_negative_equal Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-08 11:30:10 -07:00
Ian Romanick	0ac5ff9ecb	nir: Use nir_src_bit_size instead of alu1->dest.dest.ssa.bit_size This is important because, for example nir_op_fne has dest.dest.ssa.bit_size == 1, but the source operands can be 16-, 32-, or 64-bits. Fixing this helps partial redundancy elimination for compares in a few more shaders. v2: Add unit tests for nir_opt_comparison_pre that are fixed by this commit. All Intel platforms had similar results. total instructions in shared programs: 17179408 -> 17179081 (<.01%) instructions in affected programs: 43958 -> 43631 (-0.74%) helped: 118 HURT: 2 helped stats (abs) min: 1 max: 5 x̄: 2.87 x̃: 2 helped stats (rel) min: 0.06% max: 4.12% x̄: 1.19% x̃: 0.81% HURT stats (abs) min: 6 max: 6 x̄: 6.00 x̃: 6 HURT stats (rel) min: 5.83% max: 6.06% x̄: 5.94% x̃: 5.94% 95% mean confidence interval for instructions value: -3.08 -2.37 95% mean confidence interval for instructions %-change: -1.30% -0.85% Instructions are helped. total cycles in shared programs: 360959066 -> 360942386 (<.01%) cycles in affected programs: 774274 -> 757594 (-2.15%) helped: 111 HURT: 4 helped stats (abs) min: 1 max: 1591 x̄: 169.49 x̃: 36 helped stats (rel) min: <.01% max: 24.43% x̄: 8.86% x̃: 2.24% HURT stats (abs) min: 1 max: 2068 x̄: 533.25 x̃: 32 HURT stats (rel) min: 0.02% max: 5.10% x̄: 3.06% x̃: 3.56% 95% mean confidence interval for cycles value: -200.61 -89.47 95% mean confidence interval for cycles %-change: -10.32% -6.58% Cycles are helped. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v1] Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com> Fixes: `be1cc3552b` ("nir: Add nir_const_value_negative_equal")	2019-07-08 11:30:10 -07:00
Ian Romanick	b08d704051	nir: Add unit tests for nir_opt_comparison_pre Each test has a comment with the expected before and after NIR. The tests don't actually check this. The tests only check whether or not the optimization pass reported progress. I couldn't think of a robust, future-proof way to check the before and after code. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-08 11:30:10 -07:00
Caio Marcelo de Oliveira Filho	2614319259	nir: print ptr_stride for deref_casts Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-08 10:05:56 -07:00
Caio Marcelo de Oliveira Filho	a42e8f0ed1	nir: Add demote and is_helper_invocation intrinsics From SPV_EXT_demote_to_helper_invocation. Demote will be implemented as a variant of discard, so mark uses_discard if it is used. v2: Add CAN_ELIMINATE flag to the new intrinsic. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-08 08:57:25 -07:00
Connor Abbott	e5536aa584	compiler: Add color system value This is nice to have with radeonsi, where color varyings are handled specially to avoid recompiles. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-08 14:18:34 +02:00
Connor Abbott	6b28808b22	intel/nir: Extract add_const_offset_to_base Pretty much every driver using nir_lower_io_to_temporaries followed by nir_lower_io is going to want this. In particular, radv and radeonsi in the next commits. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-08 14:14:53 +02:00
Connor Abbott	c45f5db527	nir/lower_io_to_temporaries: Handle interpolation intrinsics These weren't properly supported. This does pretty much the same thing that the radv code did. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-08 14:14:53 +02:00
Connor Abbott	3a2ea2af9d	nir: Avoid coalescing vars created by lower_io_to_temporaries Right now nir_copy_prop_vars is effectively undoing nir_lower_io_to_temporaries for inputs by propagating the original variable through the copy created in lower_io_to_temporaries. A theoretical variable coalescing pass would have the same issue with output variables, although that doesn't exist yet. To fix this, add a new bit to nir_variable, and disable copy propagation when it's set. This doesn't seem to affect any drivers now, probably since no one uses lower_io_to_temporaries for inputs as well as copy_prop_vars, but it will fix radv once we flip on lower_io_to_temporaries for fs inputs. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-08 14:14:53 +02:00
Connor Abbott	f3e2c65041	nir: Return correct size in nir_assign_io_var_locations() It was double-counting cases where multiple variables were assigned to the same slot, and not handling the case where the last variable is a compact variable. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-08 14:14:53 +02:00
Connor Abbott	dd81d8808d	nir: Handle compact variables when assigning i/o locations These are used in Vulkan for clip/cull distances, instead of the GLSL lowering when the clip/cull arrays are shared. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-08 14:14:53 +02:00
Connor Abbott	fd5ed6b9d6	nir: Move st_nir_assign_var_locations() to common code It isn't really doing anything Gallium-specific, and it's needed for handling component packing, overlapping, etc. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-08 14:15:06 +02:00
Connor Abbott	27f0c3c15e	radv: Make FragCoord a sysval load_fragcoord is already handled in common code for radeonsi, so we don't need to do anything to handle it. However, there were some passes creating NIR with the varying, so we switch them over to the sysval. In the case of nir_lower_input_attachments which is used by both radv and anv, we add handling for both until intel switches to using a sysval. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-08 14:14:53 +02:00
Daniel Schürmann	c31f470066	anv,nir: Move lower_input_attachments pass from ANV to NIR. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-08 14:02:50 +02:00
Rob Clark	5787a2dfe3	nir: add pass to lower load_interpolated_input Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-02 16:15:25 +00:00
Sagar Ghuge	80117117bd	nir: Add optimization to use ROR/ROL instructions v2: 1) Add more optimization rules for ROL/ROR (Matt Turner) 2) Add lowering rules for ROL/ROR (Matt Turner) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-01 10:14:22 -07:00
Sagar Ghuge	81d342e2a1	nir: Add urol and uror opcodes Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-01 10:14:22 -07:00
Alejandro Piñeiro	12355c7e91	nir: add is_in_ubo/ssbo/block helpers Equivalent to the already existing ir_variable is_in_buffer_block and is_in_shader_storage_block, adding the uniform buffer object one. I'm using the short forms (ssbo, ubo) to avoid having method names too long. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-06-30 16:58:26 -05:00
Ian Romanick	02c6cd8481	nir/search: Increase maximum commutative expressions from 4 to 8 No shader-db change on any Intel platform. No shader-db run-time difference on a certain 36-core / 72-thread system at 95% confidence (n=20). Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-06-28 18:56:19 -07:00
Ian Romanick	1a43cf9a40	nir/algebraic: Don't mark expression with duplicate sources as commutative There is no reason to mark the fmul in the expression ('fmul', ('fadd', a, b), ('fadd', a, b)) as commutative. If a source of an instruction doesn't match one of the ('fadd', a, b) patterns, it won't match the other either. This change is enough to make this pattern work: ('~fadd@32', ('fmul', ('fadd', 1.0, ('fneg', a)), ('fadd', 1.0, ('fneg', a))), ('fmul', ('flrp', a, 1.0, a), b)) This pattern has 5 commutative expressions (versus a limit of 4), but the first fmul does not need to be commutative. No shader-db change on any Intel platform. No shader-db run-time difference on a certain 36-core / 72-thread system at 95% confidence (n=20). There are more subpatterns that could be marked as non-commutative, but detecting these is more challenging. For example, this fadd: ('fadd', ('fmul', a, b), ('fmul', a, c)) The first fadd: ('fmul', ('fadd', a, b), ('fadd', a, b)) And this fadd: ('flt', ('fadd', a, b), 0.0) This last case may be easier to detect. If all sources are variables and they are the only instances of those variables, then the pattern can be marked as non-commutative. It's probably not worth the effort now, but if we end up with some patterns that bump up on the limit again, it may be worth revisiting. v2: Update the comment about the explicit "len(self.sources)" check to be more clear about why it is necessary. Requested by Connor. Many Python style / idiom fixes suggested by Dylan. Add missing (!!!) opcode check in Expression::__eq__ method. This bug is the reason the expected number of commutative expressions in the bitfield_reverse pattern changed from 61 to 45 in the first version of this patch. v3: Use all() in Expression::__eq__ method. Suggested by Connor. Revert away from using __eq__ overloads. 
The "equality" implementation of Constant and Variable needed for commutativity pruning is weaker than the one needed for propagating and validating bit sizes. Using actual equality caused the pruning to fail for my ('fmul', ('fadd', 1, a), ('fadd', 1, a)) case. I changed the name to "equivalent" rather than the previous "same_as" to further differentiate it from __eq__. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-06-28 18:56:19 -07:00
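
The pruning idea described in commit 1a43cf9a40 can be illustrated with a small standalone sketch. This is not the code from nir_algebraic.py: patterns are modelled here as plain Python tuples with string variables, and the names `COMMUTATIVE_OPS`, `equivalent`, and `count_commutative` are hypothetical, chosen only for this example. The set of commutative opcodes is likewise an assumed subset.

```python
# Hypothetical sketch of commutativity pruning; not Mesa's actual implementation.
# Patterns are tuples: (opcode, src0, src1, ...). Variables are strings,
# constants are numbers.

# Assumed subset of commutative opcodes, for illustration only.
COMMUTATIVE_OPS = {"fadd", "fmul", "iadd", "imul"}


def base_opcode(name):
    """Strip the inexact marker ('~') and bit-size suffix ('@32') from an opcode."""
    return name.lstrip("~").split("@")[0]


def equivalent(a, b):
    """Structural comparison of two pattern sources.

    Expressions must agree on opcode, source count, and every source;
    variables and constants are compared by value.
    """
    if isinstance(a, tuple) and isinstance(b, tuple):
        return (len(a) == len(b)
                and a[0] == b[0]
                and all(equivalent(x, y) for x, y in zip(a[1:], b[1:])))
    return a == b


def count_commutative(expr, prune=True):
    """Count sub-expressions that would need both source orderings tried.

    With prune=True, an expression whose first two sources are equivalent
    is not counted, since swapping them cannot produce a new match.
    """
    if not isinstance(expr, tuple):
        return 0
    opcode, *srcs = expr
    count = sum(count_commutative(s, prune) for s in srcs)
    if base_opcode(opcode) in COMMUTATIVE_OPS and len(srcs) >= 2:
        if not (prune and equivalent(srcs[0], srcs[1])):
            count += 1
    return count


# The pattern quoted in the commit message, with 'a' and 'b' as variables.
pattern = ("~fadd@32",
           ("fmul", ("fadd", 1.0, ("fneg", "a")), ("fadd", 1.0, ("fneg", "a"))),
           ("fmul", ("flrp", "a", 1.0, "a"), "b"))

print(count_commutative(pattern, prune=False))  # every commutative opcode counted
print(count_commutative(pattern, prune=True))   # first fmul excluded
```

Under these assumptions the unpruned count for the quoted pattern is 5 and the pruned count is 4, which matches the numbers given in the commit message: only the first fmul, whose two sources are identical, drops out.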

1702 commits