fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 22:20:14 +01:00

Author	SHA1	Message	Date
Ian Romanick	acd7796a07	intel/vec4: Try to emit a VF source in try_immediate_source This commit is also a pre-requisite for the next commit. No shader-db changes on any Gen8+ platform as these platforms do not use the vec4 backend. v2: Massive rebase on `eeebeb211f` ("intel/vec4: Try emitting non-scalar immediates"). This change is a lot less helpful since that commit landed (previously helped 1934 shaders on HSW) because, apparently, a lot of the cases helped by that commit were things like vector loads of { 1.0, 1.0, 1.0 } that were also helped by this commit. Haswell total instructions in shared programs: 13480095 -> 13478598 (-0.01%) instructions in affected programs: 229534 -> 228037 (-0.65%) helped: 1006 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 1.49 x̃: 1 helped stats (rel) min: 0.04% max: 3.45% x̄: 1.11% x̃: 1.09% 95% mean confidence interval for instructions value: -1.54 -1.43 95% mean confidence interval for instructions %-change: -1.15% -1.07% Instructions are helped. total cycles in shared programs: 376385734 -> 376386916 (<.01%) cycles in affected programs: 14101380 -> 14102562 (<.01%) helped: 941 HURT: 56 helped stats (abs) min: 2 max: 322 x̄: 5.62 x̃: 2 helped stats (rel) min: <.01% max: 7.74% x̄: 0.51% x̃: 0.42% HURT stats (abs) min: 2 max: 618 x̄: 115.50 x̃: 32 HURT stats (rel) min: 0.03% max: 4.62% x̄: 0.83% x̃: 0.44% 95% mean confidence interval for cycles value: -2.06 4.43 95% mean confidence interval for cycles %-change: -0.47% -0.39% Inconclusive result (value mean confidence interval includes 0). Ivy Bridge total instructions in shared programs: 12048004 -> 12046589 (-0.01%) instructions in affected programs: 217072 -> 215657 (-0.65%) helped: 934 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 1.51 x̃: 1 helped stats (rel) min: 0.04% max: 3.45% x̄: 1.14% x̃: 1.11% 95% mean confidence interval for instructions value: -1.57 -1.46 95% mean confidence interval for instructions %-change: -1.18% -1.10% Instructions are helped. total cycles in shared programs: 180285854 -> 180287608 (<.01%) cycles in affected programs: 14103824 -> 14105578 (0.01%) helped: 871 HURT: 53 helped stats (abs) min: 2 max: 322 x̄: 5.51 x̃: 2 helped stats (rel) min: <.01% max: 7.67% x̄: 0.50% x̃: 0.42% HURT stats (abs) min: 2 max: 618 x̄: 123.66 x̃: 32 HURT stats (rel) min: 0.03% max: 4.47% x̄: 0.92% x̃: 0.46% 95% mean confidence interval for cycles value: -1.60 5.39 95% mean confidence interval for cycles %-change: -0.46% -0.37% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge total instructions in shared programs: 10861227 -> 10860328 (<.01%) instructions in affected programs: 92969 -> 92070 (-0.97%) helped: 624 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 1.44 x̃: 1 helped stats (rel) min: 0.11% max: 3.45% x̄: 1.05% x̃: 0.95% 95% mean confidence interval for instructions value: -1.52 -1.36 95% mean confidence interval for instructions %-change: -1.09% -1.01% Instructions are helped. total cycles in shared programs: 153944316 -> 153942720 (<.01%) cycles in affected programs: 1640956 -> 1639360 (-0.10%) helped: 601 HURT: 15 helped stats (abs) min: 2 max: 120 x̄: 3.56 x̃: 2 helped stats (rel) min: 0.02% max: 6.33% x̄: 0.18% x̃: 0.08% HURT stats (abs) min: 2 max: 72 x̄: 36.13 x̃: 36 HURT stats (rel) min: 0.05% max: 3.84% x̄: 1.95% x̃: 2.00% 95% mean confidence interval for cycles value: -3.44 -1.74 95% mean confidence interval for cycles %-change: -0.18% -0.09% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8139924 -> 8139378 (<.01%) instructions in affected programs: 69776 -> 69230 (-0.78%) helped: 322 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 1.70 x̃: 1 helped stats (rel) min: 0.27% max: 3.23% x̄: 0.79% x̃: 0.54% 95% mean confidence interval for instructions value: -1.88 -1.51 95% mean confidence interval for instructions %-change: -0.85% -0.72% Instructions are helped. total cycles in shared programs: 188542864 -> 188541756 (<.01%) cycles in affected programs: 3031532 -> 3030424 (-0.04%) helped: 320 HURT: 0 helped stats (abs) min: 2 max: 20 x̄: 3.46 x̃: 2 helped stats (rel) min: <.01% max: 0.69% x̄: 0.06% x̃: 0.06% 95% mean confidence interval for cycles value: -3.85 -3.07 95% mean confidence interval for cycles %-change: -0.06% -0.05% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-11 10:20:03 -07:00
Ian Romanick	365b45d571	intel/vec4: Try to emit a single load for multiple 3-src instruction operands If a 3-source instruction uses immediate values 1.0 and -1.0, just load 1.0 into a register. Use the negation source modifier to get -1.0. This has trivial impact now, but it prevents a few thousand regressions on vec4 platforms with "nir/algebraic: Recognize open-coded flrp(-1, 1, a) and flrp(1, -1, a)" All Gen6 and Gen7 platforms had similar results. (Haswell shown) total instructions in shared programs: 13487412 -> 13487406 (<.01%) instructions in affected programs: 541 -> 535 (-1.11%) helped: 6 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.36% max: 2.08% x̄: 1.65% x̃: 1.80% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -2.33% -0.97% Instructions are helped. total cycles in shared programs: 376402564 -> 376402500 (<.01%) cycles in affected programs: 10348 -> 10284 (-0.62%) helped: 10 HURT: 1 helped stats (abs) min: 2 max: 26 x̄: 7.00 x̃: 2 helped stats (rel) min: 0.13% max: 2.05% x̄: 0.89% x̃: 0.79% HURT stats (abs) min: 6 max: 6 x̄: 6.00 x̃: 6 HURT stats (rel) min: 0.29% max: 0.29% x̄: 0.29% x̃: 0.29% 95% mean confidence interval for cycles value: -11.72 0.08 95% mean confidence interval for cycles %-change: -1.20% -0.36% Inconclusive result (value mean confidence interval includes 0). No shader-db changes on any other Intel platform. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-11 10:20:03 -07:00
Ian Romanick	6f6bc842f6	intel/vec4: Refactor operand fixing for ffma and flrp Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-11 10:20:03 -07:00
Ian Romanick	dd2dc7e707	intel/vec4: Delete vec4_visitor::emit_lrp Effectivley unused since `dd7135d55d` ("intel/compiler: Use the flrp lowering pass for all stages on Gen4 and Gen5"). I had intended to remove this code as part of that series, but I forgot. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-08 11:30:11 -07:00
Ian Romanick	b04beaf41d	intel/vec4: Try both sources as candidates for being immediates For some reason, when I first wrote try_immediate_source, I thought the sources had already been ordered so that the immediate value was the second source. That's rubbish. The generator assumes neither source is immediate, and it relies on later copy/constant propagation passes to do the reordering. For this reason, the changes to try_immediate_source have to go to some efforts to reorder the operands and tell the caller when it reordered them. The generator for comparison instructions uses this to determine when the comparison needs to change (e.g., from GT to LT). No changes on any Gen8 or later platform because those platforms do not use the vec4 backend. Haswell total instructions in shared programs: 13484431 -> 13480500 (-0.03%) instructions in affected programs: 441138 -> 437207 (-0.89%) helped: 1883 HURT: 0 helped stats (abs) min: 1 max: 49 x̄: 2.09 x̃: 1 helped stats (rel) min: 0.07% max: 8.91% x̄: 1.10% x̃: 0.90% 95% mean confidence interval for instructions value: -2.19 -1.98 95% mean confidence interval for instructions %-change: -1.14% -1.06% Instructions are helped. total cycles in shared programs: 376420286 -> 376406400 (<.01%) cycles in affected programs: 15995668 -> 15981782 (-0.09%) helped: 1692 HURT: 219 helped stats (abs) min: 2 max: 764 x̄: 13.78 x̃: 4 helped stats (rel) min: <.01% max: 9.69% x̄: 0.69% x̃: 0.35% HURT stats (abs) min: 2 max: 516 x̄: 43.09 x̃: 22 HURT stats (rel) min: 0.02% max: 12.09% x̄: 2.30% x̃: 1.13% 95% mean confidence interval for cycles value: -9.70 -4.83 95% mean confidence interval for cycles %-change: -0.42% -0.28% Cycles are helped. total spills in shared programs: 23166 -> 23158 (-0.03%) spills in affected programs: 66 -> 58 (-12.12%) helped: 2 HURT: 0 total fills in shared programs: 34592 -> 34580 (-0.03%) fills in affected programs: 75 -> 63 (-16.00%) helped: 2 HURT: 0 Ivy Bridge total instructions in shared programs: 12051590 -> 12048513 (-0.03%) instructions in affected programs: 355911 -> 352834 (-0.86%) helped: 1481 HURT: 0 helped stats (abs) min: 1 max: 12 x̄: 2.08 x̃: 1 helped stats (rel) min: 0.07% max: 4.92% x̄: 1.08% x̃: 0.90% 95% mean confidence interval for instructions value: -2.17 -1.98 95% mean confidence interval for instructions %-change: -1.12% -1.04% Instructions are helped. total cycles in shared programs: 180319624 -> 180307642 (<.01%) cycles in affected programs: 15591028 -> 15579046 (-0.08%) helped: 1340 HURT: 174 helped stats (abs) min: 2 max: 764 x̄: 14.19 x̃: 2 helped stats (rel) min: <.01% max: 8.68% x̄: 0.64% x̃: 0.32% HURT stats (abs) min: 2 max: 518 x̄: 40.41 x̃: 14 HURT stats (rel) min: 0.02% max: 8.37% x̄: 1.59% x̃: 0.67% 95% mean confidence interval for cycles value: -10.85 -4.97 95% mean confidence interval for cycles %-change: -0.45% -0.31% Cycles are helped. All Gen6 and earlier platforms had simlar results. (Sandy Bridge shown) total instructions in shared programs: 10863159 -> 10861462 (-0.02%) instructions in affected programs: 157839 -> 156142 (-1.08%) helped: 715 HURT: 0 helped stats (abs) min: 1 max: 12 x̄: 2.37 x̃: 2 helped stats (rel) min: 0.23% max: 4.33% x̄: 1.07% x̃: 0.85% 95% mean confidence interval for instructions value: -2.53 -2.21 95% mean confidence interval for instructions %-change: -1.13% -1.02% Instructions are helped. total cycles in shared programs: 153957782 -> 153948778 (<.01%) cycles in affected programs: 3171648 -> 3162644 (-0.28%) helped: 696 HURT: 62 helped stats (abs) min: 2 max: 390 x̄: 15.72 x̃: 4 helped stats (rel) min: 0.02% max: 10.57% x̄: 0.57% x̃: 0.12% HURT stats (abs) min: 2 max: 300 x̄: 31.29 x̃: 2 HURT stats (rel) min: 0.11% max: 7.23% x̄: 0.83% x̃: 0.34% 95% mean confidence interval for cycles value: -15.65 -8.11 95% mean confidence interval for cycles %-change: -0.56% -0.36% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-06-28 18:13:18 -07:00
Ian Romanick	379cf3bb87	intel/vec4: Try immediate sources for dot products too No changes on any Gen8 or later platform because those platforms do not use the vec4 backend. All Haswell and earlier platforms has similar results. (Haswell shown) total instructions in shared programs: 13484467 -> 13484431 (<.01%) instructions in affected programs: 8540 -> 8504 (-0.42%) helped: 33 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.09 x̃: 1 helped stats (rel) min: 0.31% max: 1.53% x̄: 0.49% x̃: 0.35% 95% mean confidence interval for instructions value: -1.19 -0.99 95% mean confidence interval for instructions %-change: -0.60% -0.38% Instructions are helped. total cycles in shared programs: 376420572 -> 376420286 (<.01%) cycles in affected programs: 56260 -> 55974 (-0.51%) helped: 26 HURT: 5 helped stats (abs) min: 2 max: 204 x̄: 11.85 x̃: 2 helped stats (rel) min: 0.11% max: 3.08% x̄: 0.39% x̃: 0.13% HURT stats (abs) min: 2 max: 6 x̄: 4.40 x̃: 6 HURT stats (rel) min: 0.03% max: 0.35% x̄: 0.24% x̃: 0.35% 95% mean confidence interval for cycles value: -22.91 4.45 95% mean confidence interval for cycles %-change: -0.56% -0.02% Inconclusive result (value mean confidence interval includes 0). Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-06-28 17:16:16 -07:00
Ian Romanick	eeebeb211f	intel/vec4: Try emitting non-scalar immediates Sometimes an instruction has a vector as a source, but all of the components have the same value. For example, vec3 32 ssa_16 = load_const (1.0, 1.0, 1.0) ... vec3 32 ssa_82 = fadd ssa_16, -ssa_81.xyz No changes on any Gen8 or later platform because those platforms do not use the vec4 backend. Haswell total instructions in shared programs: 13487811 -> 13484467 (-0.02%) instructions in affected programs: 421981 -> 418637 (-0.79%) helped: 1859 HURT: 0 helped stats (abs) min: 1 max: 15 x̄: 1.80 x̃: 1 helped stats (rel) min: 0.04% max: 9.80% x̄: 1.04% x̃: 0.84% 95% mean confidence interval for instructions value: -1.85 -1.74 95% mean confidence interval for instructions %-change: -1.07% -1.00% Instructions are helped. total cycles in shared programs: 376423252 -> 376420572 (<.01%) cycles in affected programs: 14800970 -> 14798290 (-0.02%) helped: 1519 HURT: 329 helped stats (abs) min: 2 max: 462 x̄: 10.59 x̃: 4 helped stats (rel) min: 0.03% max: 16.73% x̄: 0.79% x̃: 0.36% HURT stats (abs) min: 2 max: 598 x̄: 40.74 x̃: 16 HURT stats (rel) min: <.01% max: 10.32% x̄: 2.56% x̃: 0.98% 95% mean confidence interval for cycles value: -3.53 0.63 95% mean confidence interval for cycles %-change: -0.30% -0.09% Inconclusive result (value mean confidence interval includes 0). total fills in shared programs: 34601 -> 34592 (-0.03%) fills in affected programs: 91 -> 82 (-9.89%) helped: 9 HURT: 0 Ivy Bridge total instructions in shared programs: 12053565 -> 12051626 (-0.02%) instructions in affected programs: 298103 -> 296164 (-0.65%) helped: 1228 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 1.58 x̃: 1 helped stats (rel) min: 0.04% max: 3.57% x̄: 0.91% x̃: 0.81% 95% mean confidence interval for instructions value: -1.63 -1.53 95% mean confidence interval for instructions %-change: -0.95% -0.88% Instructions are helped. total cycles in shared programs: 180322270 -> 180319922 (<.01%) cycles in affected programs: 14123840 -> 14121492 (-0.02%) helped: 1036 HURT: 195 helped stats (abs) min: 2 max: 462 x̄: 11.93 x̃: 2 helped stats (rel) min: 0.03% max: 14.05% x̄: 0.82% x̃: 0.35% HURT stats (abs) min: 2 max: 598 x̄: 51.33 x̃: 16 HURT stats (rel) min: <.01% max: 9.68% x̄: 3.02% x̃: 0.72% 95% mean confidence interval for cycles value: -4.92 1.10 95% mean confidence interval for cycles %-change: -0.35% -0.07% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge total instructions in shared programs: 10864286 -> 10863189 (-0.01%) instructions in affected programs: 159722 -> 158625 (-0.69%) helped: 724 HURT: 0 helped stats (abs) min: 1 max: 4 x̄: 1.52 x̃: 1 helped stats (rel) min: 0.10% max: 2.91% x̄: 0.79% x̃: 0.62% 95% mean confidence interval for instructions value: -1.58 -1.46 95% mean confidence interval for instructions %-change: -0.82% -0.75% Instructions are helped. total cycles in shared programs: 153967938 -> 153957926 (<.01%) cycles in affected programs: 1923186 -> 1913174 (-0.52%) helped: 654 HURT: 56 helped stats (abs) min: 2 max: 170 x̄: 20.00 x̃: 4 helped stats (rel) min: 0.03% max: 11.82% x̄: 0.89% x̃: 0.18% HURT stats (abs) min: 2 max: 390 x̄: 54.75 x̃: 32 HURT stats (rel) min: 0.05% max: 6.92% x̄: 3.09% x̃: 2.92% 95% mean confidence interval for cycles value: -17.42 -10.78 95% mean confidence interval for cycles %-change: -0.76% -0.40% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8142677 -> 8141721 (-0.01%) instructions in affected programs: 139511 -> 138555 (-0.69%) helped: 588 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 1.63 x̃: 1 helped stats (rel) min: 0.21% max: 4.39% x̄: 0.84% x̃: 0.46% 95% mean confidence interval for instructions value: -1.70 -1.55 95% mean confidence interval for instructions %-change: -0.89% -0.78% Instructions are helped. total cycles in shared programs: 188549394 -> 188547676 (<.01%) cycles in affected programs: 3171960 -> 3170242 (-0.05%) helped: 527 HURT: 0 helped stats (abs) min: 2 max: 18 x̄: 3.26 x̃: 2 helped stats (rel) min: <.01% max: 0.80% x̄: 0.08% x̃: 0.06% 95% mean confidence interval for cycles value: -3.49 -3.03 95% mean confidence interval for cycles %-change: -0.09% -0.07% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-06-28 17:16:06 -07:00
Jason Ekstrand	859de4a748	intel/fs,vec4: Use g0 as the header for MFENCE We set header_present but then pass it some random garbage. Give it g0 instead. I'm not actually sure this does anything but g0 is the usual header data and this is what the windows driver does so it seems like a good idea. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-30 14:00:26 +00:00
Jason Ekstrand	f2dc0f2872	nir: Drop imov/fmov in favor of one mov instruction The difference between imov and fmov has been a constant source of confusion in NIR for years. No one really knows why we have two or when to use one vs. the other. The real reason is that they do different things in the presence of source and destination modifiers. However, without modifiers (which many back-ends don't have), they are identical. Now that we've reworked nir_lower_to_source_mods to leave one abs/neg instruction in place rather than replacing them with imov or fmov instructions, we don't need two different instructions at all anymore. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Acked-by: Rob Clark <robdclark@chromium.org>	2019-05-24 08:38:11 -05:00
Jason Ekstrand	8ffbb54405	intel: Implement abs, neg, and sat in the back-end Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-24 08:38:11 -05:00
Karol Herbst	14531d676b	nir: make nir_const_value scalar v2: remove & operator in a couple of memsets add some memsets v3: fixup lima Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)	2019-04-14 22:25:56 +02:00
Jason Ekstrand	6b1c398bcb	intel/nir: Take a nir_tex_instr and src index in brw_texture_offset This makes things a bit simpler and it's also more robust because it no longer has a hard dependency on the offset being a 32-bit value.	2019-04-14 22:25:56 +02:00
Jason Ekstrand	61e009d2c4	spirv: Use the same types for resource indices as pointers We need more space than just a 32-bit scalar and we have to burn all that space anyway so we may as well expose it to the driver. This also fixes a subtle bug when UBOs and SSBOs have different pointer types. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:06:50 -06:00
Ian Romanick	d2056ab993	intel/vec4: Emit constants for some ALU sources as immediate values In some cases of flow control, the constant propagation is not able to determine that the source of an instruction must be a constant value. When we still have NIR SSA values, we can easily determine this. Emit the immediate value during code generation to possible avoid spurious loads of constants into registers. I wrote this patch to prevent a couple trivial regressions in vec4 shaders caused by "nir/algebraic: Replace i2b used by bcsel or if-statement with comparison". The final result was quite a bit better than that... No shader-db changes on any Gen8+ platform. v2: Assert that we never get a negation source modifier on Gen8+. Suggested by Ken. This should never happen because we don't normally use vec4 for Gen8+ (requires and environment variable to force it), and there's no code to generate these negations. Still, erring on the side of caution is better. Haswell total instructions in shared programs: 13776218 -> 13764783 (-0.08%) instructions in affected programs: 663931 -> 652496 (-1.72%) helped: 3495 HURT: 1 helped stats (abs) min: 1 max: 30 x̄: 3.28 x̃: 2 helped stats (rel) min: 0.21% max: 10.00% x̄: 1.79% x̃: 1.49% HURT stats (abs) min: 24 max: 24 x̄: 24.00 x̃: 24 HURT stats (rel) min: 12.24% max: 12.24% x̄: 12.24% x̃: 12.24% 95% mean confidence interval for instructions value: -3.39 -3.15 95% mean confidence interval for instructions %-change: -1.84% -1.75% Instructions are helped. total cycles in shared programs: 386818984 -> 386511910 (-0.08%) cycles in affected programs: 20379636 -> 20072562 (-1.51%) helped: 3052 HURT: 476 helped stats (abs) min: 2 max: 12516 x̄: 110.40 x̃: 6 helped stats (rel) min: 0.05% max: 24.68% x̄: 1.58% x̃: 0.69% HURT stats (abs) min: 2 max: 416 x̄: 62.76 x̃: 24 HURT stats (rel) min: 0.10% max: 10.75% x̄: 4.03% x̃: 2.18% 95% mean confidence interval for cycles value: -115.57 -58.51 95% mean confidence interval for cycles %-change: -0.93% -0.73% Cycles are helped. total spills in shared programs: 100482 -> 100480 (<.01%) spills in affected programs: 79 -> 77 (-2.53%) helped: 3 HURT: 1 total fills in shared programs: 96883 -> 96877 (<.01%) fills in affected programs: 85 -> 79 (-7.06%) helped: 4 HURT: 0 Ivy Bridge total instructions in shared programs: 12000562 -> 11990113 (-0.09%) instructions in affected programs: 572581 -> 562132 (-1.82%) helped: 3106 HURT: 0 helped stats (abs) min: 1 max: 30 x̄: 3.36 x̃: 2 helped stats (rel) min: 0.21% max: 10.00% x̄: 1.86% x̃: 1.49% 95% mean confidence interval for instructions value: -3.49 -3.23 95% mean confidence interval for instructions %-change: -1.91% -1.81% Instructions are helped. total cycles in shared programs: 180958504 -> 180664500 (-0.16%) cycles in affected programs: 19991810 -> 19697806 (-1.47%) helped: 2654 HURT: 486 helped stats (abs) min: 2 max: 12516 x̄: 121.61 x̃: 6 helped stats (rel) min: 0.05% max: 20.66% x̄: 1.48% x̃: 0.68% HURT stats (abs) min: 2 max: 396 x̄: 59.18 x̃: 24 HURT stats (rel) min: 0.05% max: 9.62% x̄: 3.82% x̃: 2.16% 95% mean confidence interval for cycles value: -125.62 -61.64 95% mean confidence interval for cycles %-change: -0.76% -0.56% Cycles are helped. Sandy Bridge total instructions in shared programs: 10842336 -> 10835438 (-0.06%) instructions in affected programs: 395340 -> 388442 (-1.74%) helped: 1926 HURT: 0 helped stats (abs) min: 1 max: 22 x̄: 3.58 x̃: 2 helped stats (rel) min: 0.10% max: 9.68% x̄: 1.78% x̃: 1.42% 95% mean confidence interval for instructions value: -3.73 -3.43 95% mean confidence interval for instructions %-change: -1.84% -1.72% Instructions are helped. total cycles in shared programs: 154590074 -> 154569050 (-0.01%) cycles in affected programs: 8159932 -> 8138908 (-0.26%) helped: 1670 HURT: 228 helped stats (abs) min: 2 max: 260 x̄: 18.13 x̃: 6 helped stats (rel) min: 0.02% max: 8.70% x̄: 0.74% x̃: 0.28% HURT stats (abs) min: 2 max: 1798 x̄: 40.58 x̃: 14 HURT stats (rel) min: 0.03% max: 12.97% x̄: 1.04% x̃: 0.31% 95% mean confidence interval for cycles value: -13.51 -8.64 95% mean confidence interval for cycles %-change: -0.60% -0.46% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8212357 -> 8206587 (-0.07%) instructions in affected programs: 323664 -> 317894 (-1.78%) helped: 1457 HURT: 0 helped stats (abs) min: 1 max: 12 x̄: 3.96 x̃: 3 helped stats (rel) min: 0.33% max: 11.49% x̄: 1.86% x̃: 1.44% 95% mean confidence interval for instructions value: -4.14 -3.78 95% mean confidence interval for instructions %-change: -1.93% -1.78% Instructions are helped. total cycles in shared programs: 187668016 -> 187657422 (<.01%) cycles in affected programs: 14856234 -> 14845640 (-0.07%) helped: 1372 HURT: 83 helped stats (abs) min: 2 max: 24 x̄: 7.92 x̃: 6 helped stats (rel) min: 0.02% max: 1.14% x̄: 0.12% x̃: 0.08% HURT stats (abs) min: 2 max: 14 x̄: 3.20 x̃: 2 HURT stats (rel) min: 0.03% max: 0.60% x̄: 0.12% x̃: 0.12% 95% mean confidence interval for cycles value: -7.65 -6.91 95% mean confidence interval for cycles %-change: -0.11% -0.10% Cycles are helped. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:41:46 -08:00
Eric Anholt	8f3694e1ab	intel: Use the NIR lowering for isign. Drops one instruction from fs-sign-int.shader_test. No change in shader-db due to it having 0 instances of sign(genIType). This may hurt isign64 if algebraic runs before int64 lowering, but I wasn't sure how to mark the algebraic opt as "every bit size but 64". v2: Update commit message about shader-db. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)	2019-02-14 00:32:30 +00:00
Kenneth Graunke	04c2f12ab2	i965: Drop mark_surface_used mechanism. The original idea was that the backend compiler could eliminate surfaces, so we would have it mark which ones are actually used, then shrink the binding table accordingly. Unfortunately, it's a pretty blunt mechanism - it can only prune things from the end, not the middle - since we decide the layout before we even start the backend compiler, and only limit the size. It also basically gives up if it sees indirect array access. Besides, we do the vast majority of our surface elimination in NIR anyway, not the backend - and I don't see that trend changing any time soon. Vulkan abandoned this plan a long time ago, and I don't use it in Iris, but it's still been kicking around in i965. I hacked shader-db to print the binding table size in bytes, and observed no changes with this patch. So, this code appears to do nothing useful. Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-13 09:35:32 -08:00
Jason Ekstrand	80e8dfe9de	nir: Rename Boolean-related opcodes to include 32 in the name This is a squash of a bunch of individual changes: nir/builder: Generate 32-bit bool opcodes transparently nir/algebraic: Remap Boolean opcodes to the 32-bit variant Use 32-bit opcodes in the NIR producers and optimizations Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' */.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' */.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' */.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' */.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' */.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' */.c Use 32-bit opcodes in the NIR back-ends Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' */.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' */.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' */.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' */.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' */.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' */.c Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	455ec7327d	i965/vec4: Implement nir_op_uadd_sat Reviewed-by: Ian Romanick ian.d.romanick@intel.com	2018-12-13 17:49:48 +00:00
Jason Ekstrand	dca6cd9ce6	nir: Make boolean conversions sized just like the others Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is one if 8, 16, 32, or 64. This leads to having a few more opcodes but now everything is consistent and booleans aren't a weird special case anymore. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2018-12-05 15:03:07 -06:00
Jason Ekstrand	dca35c598d	intel/fs,vec4: Fix a compiler warning ../src/intel/compiler/brw_fs_nir.cpp:3534:46: warning: comparison of integer expressions of different signedness: ‘unsigned int’ and ‘int’ [-Wsign-compare] assert(nir_intrinsic_write_mask(instr) == ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~ (1 << instr->num_components) - 1); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This was caused by `6339aba775` which added these completely valid checks. However clang likes to complain about signedness mismatches. Fixes: `6339aba775` "intel/compiler: Lower SSBO and shared..." Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2018-11-19 09:57:41 -06:00
Jason Ekstrand	6339aba775	intel/compiler: Lower SSBO and shared loads/stores in NIR We have a bunch of code to do this in the back-end compiler but it's fairly specific to typed surface messages and the way we emit them. This breaks it out into NIR were it's easier to do things a bit more generally. It also means we can easily share the code between the vec4 and FS back-ends if we wish. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-11-15 19:59:49 -06:00
Jason Ekstrand	1413512b4c	intel/vec4: Use the new nir_src_is_const and friends As of this commit, all uses of const sources either go through a nir_src_as_<type> helper which handles bit sizes correctly or else are accompanied by a nir_src_bit_size() == 32 assertion to assert that we have the size we think we have. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-11-08 10:09:25 -06:00
Jason Ekstrand	6b2918709a	intel/fs,vec4: Clean up a repeated pattern with SSBOs Everywhere we handle SSBO intrinsics, we have exactly the same pattern for computing the index so we may as well make a helper for it. We also add a get_nir_src_imm to vec4 and use it for SSBO offsets. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-11-08 10:09:06 -06:00
Jason Ekstrand	d3a0d8b750	intel/compiler: Stop assuming the entrypoint is called "main" This isn't true for Vulkan so we have to whack it to "main" in anv which is silly. Instead of walking the list of functions and asserting that everything is named "main" and hoping there's only one function named "main", just use the nir_shader_get_entrypoint() helper which has better assertions anyway. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-10-30 20:14:52 -05:00
Jason Ekstrand	0e0dc596a2	intel/vec4: Fix nir_op_b2[fi] with 64-bit result This is valid NIR but you can't actually hit this case today. GLSL IR doesn't have a bool to double opcode; it does f2d(b2f(x)). In SPIR-V we don't have any to/from bool conversion opcodes at all. However, the next commit will make us start generating it so we should be ready. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-10-11 15:21:19 -05:00
Ian Romanick	b44c9292b7	intel/compiler: Don't handle fsign.sat No shader-db or CI changes on any Intel platform. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-10-09 13:56:42 -07:00
Ian Romanick	c836326a29	i965/vec4: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1 No shader-db changes on any Intel platform. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-28 15:35:50 -07:00
Ian Romanick	9626ea497d	i965/vec4: Properly handle sign(-abs(x)) This is achived by copying the sign(abs(x)) optimization from the FS backend. On Gen7 an earlier platforms, this fixes new piglit tests: - glsl-1.10/execution/vs-sign-neg-abs.shader_test - glsl-1.10/execution/vs-sign-sat-neg-abs.shader_test Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-07-06 16:20:07 -07:00
Ian Romanick	965a06dbd7	i965/vec4: Make the vec4_visitor::nir_emit_instr default case unreachable The bug fixed by the previous commit went undetected because extra stderr messages are not flagged by the CI. Copy the solution from fs_visitor::nir_emit_instr and mark the default case unreachable. An alternate solution is to delete the default case so that the compiler will issue a warning. That may require more work since there are other (impossible) cases that exist. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-07-05 21:13:32 -07:00
Kenneth Graunke	a1afef8de0	i965: Combine {VS,FS}_OPCODE_GET_BUFFER_SIZE opcodes. These are the same, we don't need a separate opcode enum per backend. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-30 20:30:34 -08:00
Iago Toral Quiroga	8620f7ebbc	i965/vec4: use a temp register to compute offsets for pull loads 64-bit pull loads are implemented by emitting 2 separate 32-bit pull load messages, where the second message loads from an offset at +16B. That addition of 16B to the original offset should not alter the original offset register used as source for the pull load instruction though, since the compiler might use that same offset register in other instructions (for example, for other pull loads in the shader code that take that same offset as reference). If the pull load is 32-bit then we only need to emit one message and we don't need to do offset calculations, but in that case the optimizer should be able to drop the redundant MOV. Fixes the following test on Haswell: KHR-GL45.gpu_shader_fp64.fp64.max_uniform_components Reviewed-by: Matt Turner <mattst88@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103007	2017-11-30 07:57:53 +01:00
Kenneth Graunke	ff964916dc	i965: Use nir_lower_atomics_to_ssbos and delete ABO compiler code. We use the same hardware mechanism for both atomic counters and SSBO atomics, so there's really no benefit to maintaining separate code to handle each case. Instead, we can just use Rob's shiny new NIR pass to convert atomic_uints to SSBOs, and delete piles of code. The ssbo_start section of the binding table becomes a combined ABO and SSBO section, with ABOs first, then SSBOs. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-11-15 09:37:32 -08:00
Samuel Iglesias Gonsálvez	9e515cf381	i965/vec4: remove setting default LOD in the backend It is already done in NIR. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-10-20 08:29:53 +02:00
Lionel Landwerlin	d3acc240d0	intel: compiler: vec4: add missing default 0 lod We set a similar default value for LOD in the fs backend for TXS/TXL. Without this we end up generating invalid MOV with a null src. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: "17.2 17.1" <mesa-stable@lists.freedesktop.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-10-03 22:50:46 +01:00
Kenneth Graunke	a62fe34098	i965/vec4: Actually handle atomic op intrinsics. Embarassingly, someone enabled the ARB_shader_atomic_counter_ops extension for Gen7+ but never added the intrinsics to the switch statement in the vec4 backend, so they just hit an unreachable() call and died. Fixes: `40dd45d0c6` (i965: Enable ARB_shader_atomic_counter_ops) Cc: "17.2 17.1 17.0 13.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2017-09-26 15:35:06 -07:00
Matt Turner	d37d9f84ac	i965: Mark functions static Cuts 300 bytes of .text Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2017-08-21 14:45:44 -07:00
Samuel Iglesias Gonsálvez	8aa6ada838	i965/vec4: fix swizzle and writemask when loading an uniform with constant offset It was setting XYWZ swizzle and writemask to all uniforms, no matter if they were a vector or scalar, so this can lead to problems when loading them to the push constant buffer. Moreover, 'shift' calculation was designed to calculate the offset in DWORDS, but it doesn't take into account DFs, so the calculated swizzle for the later ones was wrong. The indirect case is not changed because MOV INDIRECT will write to all components. Added an assert to verify that these uniforms are aligned. v2: - Fix 'shift' calculation (Curro) - Set both swizzle and writemask. - Add assert(shift == 0) for the indirect case. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: "17.1" <mesa-stable@lists.freedesktop.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-05-18 06:49:54 +02:00
Jason Ekstrand	037ce253b1	i965/vec4: Delete the system value infastructure The only thing still using it is INVOCATION_ID for geometry shaders. That's easily enough inlined into the nir_intrinsic_load_invocation_id handling code. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:08:07 -07:00
Jason Ekstrand	0d5f89cdc3	i965/vec4: Use NIR remapping for VS attributes The NIR pass already handles remapping system values to attributes for us so we delete the system value code as part of the conversion. We also change nir_lower_vs_inputs to take an explicit inputs_read bitmask and pass in the inputs_read from prog_data instead from pulling it out of NIR. This is because the version in prog_data may get EDGEFLAG added to it on some old platforms. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:08:06 -07:00
Jason Ekstrand	b86dba8a0e	nir: Embed the shader_info in the nir_shader again Commit `e1af20f18a` changed the shader_info from being embedded into being just a pointer. The idea was that sharing the shader_info between NIR and GLSL would be easier if it were a pointer pointing to the same shader_info struct. This, however, has caused a few problems: 1) There are many things which generate NIR without GLSL. This means we have to support both NIR shaders which come from GLSL and ones that don't and need to have an info elsewhere. 2) The solution to (1) raises all sorts of ownership issues which have to be resolved with ralloc_parent checks. 3) Ever since `00620782c9`, we've been using nir_gather_info to fill out the final shader_info. Thanks to cloning and the above ownership issues, the nir_shader::info may not point back to the gl_shader anymore and so we have to do a copy of the shader_info from NIR back to GLSL anyway. All of these issues go away if we just embed the shader_info in the nir_shader. There's a little downside of having to copy it back after calling nir_gather_info but, as explained above, we have to do that anyway. Acked-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:07:47 -07:00
Samuel Iglesias Gonsálvez	f030aaf2fb	i965/vec4: use vec4_builder to emit instructions in setup_imm_df() Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Drop useless vec4_visitor dependencies. Demote to static stand-alone function. Don't write unused components in the result. Use vec4_builder interface for register allocation. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:09 -07:00
Samuel Iglesias Gonsálvez	6e3265eae5	i965/vec4: split VEC4_OPCODE_FROM_DOUBLE into one opcode per destination's type This way we can set the destination type as double to all these new opcodes, avoiding any optimizer's confusion that was happening before. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [ Francisco Jerez: Drop no_spill workaround originally needed due to the bogus destination type of VEC4_OPCODE_FROM_DOUBLE. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez	50a5217637	i965/vec4: split d2x conversion and data gathering from one opcode to two explicit ones When doing a 64-bit to a smaller data type size conversion, the destination should be aligned to 64-bits. Because of that, we need to gather the data after the actual conversion. Until now, these two operations were done by VEC4_OPCODE_FROM_DOUBLE but now we split them explicitely in two different instructions: VEC4_OPCODE_FROM_DOUBLE just do the conversion and VEC4_OPCODE_PICK_LOW_32BIT will gather the data. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2017-04-14 14:56:08 -07:00
Jason Ekstrand	405ef7bb33	intel/vec4: Add some fall through comments Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-04-03 16:58:35 -07:00
Jason Ekstrand	762a6333f2	nir: Rework conversion opcodes The NIR story on conversion opcodes is a mess. We've had way too many of them, naming is inconsistent, and which ones have explicit sizes was sort-of random. This commit re-organizes things and makes them all consistent: - All non-bool conversion opcodes now have the explicit size in the destination and are named <src_type>2<dst_type><size>. - Integer <-> integer conversion opcodes now only come in i2i and u2u forms (i2u and u2i have been removed) since the only difference between the different integer conversions is whether or not they sign-extend when up-converting. - Boolean conversion opcodes all have the explicit size on the bool and are named <src_type>2<dst_type>. Making things consistent also allows nir_type_conversion_op to be moved to nir_opcodes.c and auto-generated using mako. This will make adding int8, int16, and float16 versions much easier when the time comes. Reviewed-by: Eric Anholt <eric@anholt.net>	2017-03-14 07:36:40 -07:00
Jason Ekstrand	bab4610e9c	i965/vec4: Get rid of the type parameter from to/from_double Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-03-14 07:36:40 -07:00
Jason Ekstrand	700bebb958	i965: Move the back-end compiler to src/intel/compiler Mostly a dummy git mv with a couple of noticable parts: - With the earlier header cleanups, nothing in src/intel depends files from src/mesa/drivers/dri/i965/ - Both Autoconf and Android builds are addressed. Thanks to Mauro and Tapani for the fixups in the latter - brw_util.[ch] is not really compiler specific, so it's moved to i965. v2: - move brw_eu_defines.h instead of brw_defines.h - remove no-longer applicable includes - add missing vulkan/ prefix in the Android build (thanks Tapani) v3: - don't list brw_defines.h in src/intel/Makefile.sources (Jason) - rebase on top of the oa patches [Emil Velikov: commit message, various small fixes througout] Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-13 11:16:34 +00:00

47 commits