fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 07:10:09 +01:00

Author	SHA1	Message	Date
Rohan Garg	7f48c70bab	intel/compiler: construct masks instead of using magic values Signed-off-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23933>	2023-06-30 09:19:57 +00:00
Caio Oliveira	59cc77f0fa	compiler: Move from nir_scope to mesa_scope Just moving the enum and performing renames, no behavior change. Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Yonggang Luo <luoyonggang@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23328>	2023-06-19 23:29:26 +00:00
Ian Romanick	96cde9cc01	intel/fs: Emit better code for bfi(..., 0) DG2, Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown) total instructions in shared programs: 20570141 -> 20570063 (<.01%) instructions in affected programs: 30679 -> 30601 (-0.25%) helped: 77 / HURT: 0 total cycles in shared programs: 902113977 -> 902118723 (<.01%) cycles in affected programs: 3255958 -> 3260704 (0.15%) helped: 60 / HURT: 19 Broadwell total instructions in shared programs: 18524633 -> 18524547 (<.01%) instructions in affected programs: 34095 -> 34009 (-0.25%) helped: 75 / HURT: 2 total cycles in shared programs: 949532394 -> 949543761 (<.01%) cycles in affected programs: 3419107 -> 3430474 (0.33%) helped: 57 / HURT: 24 total spills in shared programs: 22484 -> 22484 (0.00%) spills in affected programs: 516 -> 516 (0.00%) helped: 2 / HURT: 2 total fills in shared programs: 29346 -> 29338 (-0.03%) fills in affected programs: 572 -> 564 (-1.40%) helped: 4 / HURT: 0 Haswell total instructions in shared programs: 17331356 -> 17331523 (<.01%) instructions in affected programs: 27920 -> 28087 (0.60%) helped: 41 / HURT: 4 total cycles in shared programs: 936603192 -> 936574664 (<.01%) cycles in affected programs: 3417695 -> 3389167 (-0.83%) helped: 28 / HURT: 21 total spills in shared programs: 19718 -> 19756 (0.19%) spills in affected programs: 436 -> 474 (8.72%) helped: 0 / HURT: 4 total fills in shared programs: 22547 -> 22607 (0.27%) fills in affected programs: 444 -> 504 (13.51%) helped: 0 / HURT: 4 Ivy Bridge total cycles in shared programs: 463451277 -> 463451273 (<.01%) cycles in affected programs: 95870 -> 95866 (<.01%) helped: 3 / HURT: 2 DG2, Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown) Totals: Instrs: 152825278 -> 152819969 (-0.00%); split: -0.00%, +0.00% Cycles: 15014075626 -> 15014628652 (+0.00%); split: -0.01%, +0.01% Subgroup size: 8528536 -> 8528560 (+0.00%) Send messages: 7711431 -> 7711464 (+0.00%) Spill count: 99907 -> 
99509 (-0.40%); split: -0.40%, +0.00% Fill count: 202459 -> 201598 (-0.43%); split: -0.43%, +0.00% Scratch Memory Size: 4376576 -> 4371456 (-0.12%) Totals from 2915 (0.44% of 662497) affected shaders: Instrs: 2288842 -> 2283533 (-0.23%); split: -0.24%, +0.01% Cycles: 471633295 -> 472186321 (+0.12%); split: -0.27%, +0.39% Subgroup size: 27488 -> 27512 (+0.09%) Send messages: 151344 -> 151377 (+0.02%) Spill count: 48091 -> 47693 (-0.83%); split: -0.83%, +0.00% Fill count: 59053 -> 58192 (-1.46%); split: -1.46%, +0.00% Scratch Memory Size: 1827840 -> 1822720 (-0.28%) Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>	2023-06-14 18:49:53 +00:00
Lionel Landwerlin	6b9f838d62	intel/fs: handle load_global_constant_uniform_block_intel Again, load the data just once in GRF, share it across lanes. Shader-db on dg2: total instructions in shared programs: 23214555 -> 23215400 (<.01%) instructions in affected programs: 199977 -> 200822 (0.42%) helped: 3 HURT: 38 helped stats (abs) min: 5 max: 670 x̄: 283.67 x̃: 176 helped stats (rel) min: 1.34% max: 49.41% x̄: 22.15% x̃: 15.70% HURT stats (abs) min: 1 max: 185 x̄: 44.63 x̃: 32 HURT stats (rel) min: 0.13% max: 42.86% x̄: 10.25% x̃: 9.30% 95% mean confidence interval for instructions value: -18.65 59.87 95% mean confidence interval for instructions %-change: 3.29% 12.47% Inconclusive result (value mean confidence interval includes 0). total loops in shared programs: 5928 -> 5928 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 851137495 -> 851152449 (<.01%) cycles in affected programs: 16406137 -> 16421091 (0.09%) helped: 9 HURT: 32 helped stats (abs) min: 10 max: 13498 x̄: 6443.22 x̃: 5581 helped stats (rel) min: 0.11% max: 4.75% x̄: 1.45% x̃: 0.34% HURT stats (abs) min: 3 max: 15056 x̄: 2279.47 x̃: 735 HURT stats (rel) min: 0.10% max: 23.71% x̄: 4.58% x̃: 4.65% 95% mean confidence interval for cycles value: -1315.40 2044.87 95% mean confidence interval for cycles %-change: 1.71% 4.80% Inconclusive result (value mean confidence interval includes 0). 
total spills in shared programs: 11856 -> 11825 (-0.26%) spills in affected programs: 2368 -> 2337 (-1.31%) helped: 4 HURT: 0 total fills in shared programs: 16258 -> 16207 (-0.31%) fills in affected programs: 2930 -> 2879 (-1.74%) helped: 4 HURT: 0 total sends in shared programs: 1038194 -> 1038185 (<.01%) sends in affected programs: 40 -> 31 (-22.50%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 4 x̄: 2.25 x̃: 2 helped stats (rel) min: 10.00% max: 33.33% x̄: 21.46% x̃: 21.25% 95% mean confidence interval for sends value: -4.64 0.14 95% mean confidence interval for sends %-change: -40.41% -2.51% Inconclusive result (value mean confidence interval includes 0). LOST: 0 GAINED: 0 Some VK/DX titles result (on DG2 only), it's mostly additional instruction counts except for the unity spaceship demo where a CS shader gets additional SIMDness. The reason for additional instructions is that since we're doing block loads, we need to find the live channels in control flow to select a single lane value that is valid. 
aztec_ruins_high: Totals from 3 (1.12% of 269) affected shaders: Instrs: 17732 -> 17896 (+0.92%) Cycles: 796518 -> 819302 (+2.86%) cyberpunk_2077: Totals from 17 (0.17% of 10301) affected shaders: Instrs: 10848 -> 11658 (+7.47%) Cycles: 248243 -> 259168 (+4.40%); split: -0.57%, +4.97% fallout_4_dxvk_g2: Totals from 2 (0.12% of 1638) affected shaders: Instrs: 3157 -> 3368 (+6.68%) Cycles: 487807 -> 490426 (+0.54%); split: -0.26%, +0.79% Max live registers: 139 -> 141 (+1.44%) red_dead_redemption2: Totals from 68 (1.14% of 5970) affected shaders: Instrs: 34871 -> 36486 (+4.63%) Cycles: 551430 -> 565211 (+2.50%) Send messages: 2074 -> 2072 (-0.10%) Max live registers: 5078 -> 5077 (-0.02%) total_war_warhammer2: Totals from 5 (1.05% of 478) affected shaders: Instrs: 6905 -> 6971 (+0.96%); split: -0.16%, +1.12% Cycles: 97035 -> 97989 (+0.98%); split: -0.07%, +1.05% unity spaceship demo (instruction count going up due to a CS shader bump from SIMD8->16): Totals from 53 (9.71% of 546) affected shaders: Instrs: 223748 -> 233223 (+4.23%); split: -0.01%, +4.25% Cycles: 23134697 -> 25207080 (+8.96%); split: -0.17%, +9.13% Subgroup size: 480 -> 488 (+1.67%) Spill count: 2156 -> 2242 (+3.99%); split: -0.19%, +4.17% Fill count: 4617 -> 4845 (+4.94%); split: -0.09%, +5.02% Max live registers: 5991 -> 6050 (+0.98%); split: -0.40%, +1.39% Max dispatch width: 480 -> 488 (+1.67%) witcher_3_dxvk_g2: Totals from 27 (2.51% of 1074) affected shaders: Instrs: 57067 -> 57677 (+1.07%); split: -0.03%, +1.10% Cycles: 1397871 -> 1436704 (+2.78%); split: -0.35%, +3.13% Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23477>	2023-06-14 12:04:05 +00:00
Lionel Landwerlin	5ae8a78d8c	intel/fs: make use of load_ubo_uniform_block_intel The principle is the same as the load_ssbo_uniform_block_intel. Whenever we see a uniform offset, load the data only once in GRFs to reduce register pressure. Iris shader-db run on DG2 : total instructions in shared programs: 23001325 -> 23094969 (0.41%) instructions in affected programs: 1775989 -> 1869633 (5.27%) helped: 764 HURT: 2097 helped stats (abs) min: 1 max: 102 x̄: 6.96 x̃: 2 helped stats (rel) min: 0.03% max: 16.91% x̄: 1.36% x̃: 0.63% HURT stats (abs) min: 1 max: 2461 x̄: 47.19 x̃: 7 HURT stats (rel) min: <.01% max: 199.34% x̄: 5.91% x̃: 2.60% 95% mean confidence interval for instructions value: 25.43 40.03 95% mean confidence interval for instructions %-change: 3.60% 4.33% Instructions are HURT. total loops in shared programs: 5847 -> 5847 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 839329852 -> 845491482 (0.73%) cycles in affected programs: 130229434 -> 136391064 (4.73%) helped: 1098 HURT: 2228 helped stats (abs) min: 1 max: 130102 x̄: 1340.64 x̃: 22 helped stats (rel) min: <.01% max: 64.25% x̄: 4.03% x̃: 0.71% HURT stats (abs) min: 1 max: 185309 x̄: 3426.24 x̃: 87 HURT stats (rel) min: <.01% max: 92.85% x̄: 8.12% x̃: 3.82% 95% mean confidence interval for cycles value: 1342.16 2362.97 95% mean confidence interval for cycles %-change: 3.70% 4.52% Cycles are HURT. 
total spills in shared programs: 10768 -> 11856 (10.10%) spills in affected programs: 9717 -> 10805 (11.20%) helped: 25 HURT: 28 total fills in shared programs: 13720 -> 16258 (18.50%) fills in affected programs: 12016 -> 14554 (21.12%) helped: 25 HURT: 28 total sends in shared programs: 1034790 -> 1031266 (-0.34%) sends in affected programs: 33416 -> 29892 (-10.55%) helped: 1005 HURT: 0 helped stats (abs) min: 1 max: 22 x̄: 3.51 x̃: 3 helped stats (rel) min: 1.69% max: 60.00% x̄: 15.20% x̃: 14.08% 95% mean confidence interval for sends value: -3.72 -3.29 95% mean confidence interval for sends %-change: -15.82% -14.57% Sends are helped. LOST: 26 GAINED: 183 shader-db on a number of VK/DX titles on DG2 : PERCENTAGE DELTAS Shaders Instrs Cycles age_of_wonders_III 1928 +0.02% -0.19% PERCENTAGE DELTAS Shaders Instrs Cycles Subgroup size Send messages Spill count Fill count Max live registers Max dispatch width assassins_creed_odyssey 2119 +1.12% -0.42% -0.03% -0.29% -9.10% -4.26% -0.64% +0.65% PERCENTAGE DELTAS Shaders Instrs Cycles Spill count Fill count Max live registers aztec_ruins_high 269 -0.05% -0.45% -0.29% -7.27% -0.33% PERCENTAGE DELTAS Shaders Instrs Cycles Max live registers Max dispatch width dark_souls_3_dxvk_g2 1420 +0.09% +0.24% +0.21% +0.12% (stats look bad, but it's just one shader affected) PERCENTAGE DELTAS Shaders Instrs Cycles Spill count Fill count Scratch Memory Size Max live registers fallout_4_dxvk_g2 1638 +0.67% +8.32% +16.02% +7.17% +100.00% +0.48% PERCENTAGE DELTAS Shaders Instrs Cycles Send messages Spill count Fill count Max live registers Max dispatch width red_dead_redemption2 5969 +0.16% -0.04% -0.04% +0.01% +0.05% -0.20% +0.04% PERCENTAGE DELTAS Shaders Instrs Cycles Send messages Max live registers Max dispatch width rise_of_the_tomb_raider_g2 12129 +2.19% +1.36% -1.23% -0.36% +2.04% PERCENTAGE DELTAS Shaders Instrs Cycles Send messages Max live registers shooter-game 693 +0.07% -0.89% -0.09% -0.09% PERCENTAGE DELTAS Shaders Instrs 
Cycles Send messages Max live registers Max dispatch width talos_g2 1140 +0.37% +3.80% -0.86% -0.67% +0.19% PERCENTAGE DELTAS Shaders Instrs Cycles Max live registers Max dispatch width total_war_warhammer2 477 +0.25% +0.66% -0.17% +0.10% PERCENTAGE DELTAS Shaders Instrs Cycles Send messages Max live registers Max dispatch width witcher_3_dxvk_g2 1074 +0.75% -10.45% -0.15% -0.16% -0.16% PERCENTAGE DELTAS Shaders Instrs Cycles Send messages Max live registers wolfenstein_youngblood 1111 +0.52% +0.66% -0.59% -0.03% Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23477>	2023-06-14 12:04:05 +00:00
Lionel Landwerlin	7eb1e2a690	intel/fs: avoid reusing the VGRF for uniform load_ubo Only found 3 shaders affected in Red Dead Redemption : Totals from 3 (0.05% of 5969) affected shaders: Instrs: 2246 -> 2230 (-0.71%) Cycles: 156506 -> 148402 (-5.18%); split: -5.23%, +0.05% This will have a larger effect when we add the load_ubo_uniform_block_intel intrinsic where we will have larger blocks (vec8/vec16 vs vec4 only now). Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23477>	2023-06-14 12:04:05 +00:00
Lionel Landwerlin	0cd9f0c3d3	intel/fs: fix bindless/shared surface mistake Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `068bf1378d` ("intel/fs: enable SSBO accesses through the bindless heap") Tested-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23536>	2023-06-14 07:42:57 +00:00
Lionel Landwerlin	04777171e0	intel/fs: try to rematerialize surface computation code This helps a lot with accessing surface handles in control flow. Our resource_intel intrinsic has a non_uniform flag, in which case we cannot apply this optimization. But in uniform cases, this is just a massive win. We drop all kind of pipeline stalls due to find_live_channel. We also reduce register pressure by doing the surface handle computation in a single GRF (instead of 2 or 4). There are some regressions in max dispatch width but those I think are only on SIMD32 and due to the current heuristic disabling it after throughput comparison with SIMD16. We know this heuristic is not perfect, it should probably be updated in another change. Here are some stats (all titles seem to have similar gains) : PERCENTAGE DELTAS Shaders Instrs Cycles Subgroup size Send messages Spill count Fill count Scratch Memory Size Max live registers Max dispatch width red_dead_redemption2 5860 -36.80% -5.67% +0.77% +0.06% -81.26% -79.16% -70.62% -8.63% -6.93% --------------------------------------------------------------------------------------------------------------------------------------------------------------- All affected 4716 -37.29% -5.67% +0.95% +0.07% -81.26% -79.16% -70.62% -9.15% -8.47% --------------------------------------------------------------------------------------------------------------------------------------------------------------- Total 5860 -36.80% -5.67% +0.77% +0.06% -81.26% -79.16% -70.62% -8.63% -6.93% PERCENTAGE DELTAS Shaders Instrs Cycles Subgroup size Send messages Spill count Fill count Scratch Memory Size Max live registers Max dispatch width rise_of_the_tomb_raider_g2 12010 -37.19% -22.12% +0.01% +0.00% -99.01% -99.14% -98.65% -7.62% -4.96% --------------------------------------------------------------------------------------------------------------------------------------------------------------------- All affected 11732 -37.27% -22.14% +0.01% +0.00% -99.01% 
-99.14% -98.65% -7.67% -5.11% --------------------------------------------------------------------------------------------------------------------------------------------------------------------- Total 12010 -37.19% -22.12% +0.01% +0.00% -99.01% -99.14% -98.65% -7.62% -4.96% PERCENTAGE DELTAS Shaders Instrs Cycles Spill count Fill count Scratch Memory Size Max live registers Max dispatch width total_war_warhammer2 462 -27.45% -12.42% -82.35% -88.46% -66.67% -5.52% -5.62% ----------------------------------------------------------------------------------------------------------------------------------- All affected 335 -28.31% -12.77% -82.35% -88.46% -66.67% -6.25% -7.24% ----------------------------------------------------------------------------------------------------------------------------------- Total 462 -27.45% -12.42% -82.35% -88.46% -66.67% -5.52% -5.62% PERCENTAGE DELTAS Shaders Instrs Cycles Subgroup size Send messages Spill count Fill count Scratch Memory Size Max live registers Max dispatch width witcher_3_dxvk_g2 1049 -36.94% -57.82% +0.06% +0.01% -98.52% -97.29% -98.10% -7.81% -1.00% ------------------------------------------------------------------------------------------------------------------------------------------------------------ All affected 693 -41.93% -58.45% +0.09% +0.01% -98.52% -97.29% -98.10% -10.25% -1.33% ------------------------------------------------------------------------------------------------------------------------------------------------------------ Total 1049 -36.94% -57.82% +0.06% +0.01% -98.52% -97.29% -98.10% -7.81% -1.00% Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>	2023-05-30 06:36:37 +00:00
Lionel Landwerlin	b28609a756	intel/fs: enable uniform block accesses through bindless heap Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>	2023-05-30 06:36:37 +00:00
Lionel Landwerlin	01fc9a06bd	intel/fs: enable get_buffer_size on bindless heap Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>	2023-05-30 06:36:37 +00:00
Lionel Landwerlin	ad9bc1ffb5	intel/fs: enable UBO accesses through bindless heap Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>	2023-05-30 06:36:37 +00:00
Lionel Landwerlin	068bf1378d	intel/fs: enable SSBO accesses through the bindless heap Using the information coming from surface_index_intel, we can tell whether we should use the BTI or bindless heap for a particular SSBO access. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>	2023-05-30 06:36:37 +00:00
Lionel Landwerlin	3d0cc3f63b	intel/fs: keep track of new resource_intel information Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>	2023-05-30 06:36:37 +00:00
Lionel Landwerlin	86e9943b00	intel/fs: teach ubo range analysis pass about resource_intel Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>	2023-05-30 06:36:37 +00:00
Lionel Landwerlin	e09cfda0de	intel/fs: lower get_buffer_size like other logical sends This will also enable the use of the bindless heap. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>	2023-05-30 06:36:36 +00:00
Lionel Landwerlin	429ef02f83	intel/fs: make tcs input_vertices dynamic We need to do 3 things to accomplish this: 1. make all register accesses consider the maximal case when the vertex count is unknown at compile time 2. move the clamping of load_per_vertex_input prior to lowering nir_intrinsic_load_patch_vertices_in (in the dynamic cases, the clamping will use nir_intrinsic_load_patch_vertices_in to clamp), meaning clamping using derefs rather than the lowered nir_intrinsic_load_per_vertex_input 3. in the known cases, lower nir_intrinsic_load_patch_vertices_in in NIR (so that the clamped elements can still be vectorized to the smallest number of URB read messages) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22378>	2023-05-24 18:32:07 +00:00
Lionel Landwerlin	952a523abb	intel: switch over to unified atomics Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23004>	2023-05-15 16:32:21 +00:00
Lionel Landwerlin	fb13360546	intel/fs: reduce register usage for relocated constants Commit `bb8e31b7ed` ("anv: avoid hardcoding instruction VA constant in shaders") had a slight negative impact on shaders (Red Dead Redemption 2 in particular), dropping a few shaders from SIMD32 to SIMD16. This change brings back all the dropped SIMD32 shaders. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22872>	2023-05-07 19:38:04 +00:00
Kenneth Graunke	9dd6fcd9ec	intel/compiler: UNDEF SubgroupInvocation's register This value takes a few instructions to create, involving expanding V-immediates, adding 8 for SIMD16, and so on. We can mark it UNDEF so that it's clear that although these are partial writes, we are actually defining the entire value. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22835>	2023-05-04 18:17:26 +00:00
Kenneth Graunke	4913f54a1f	intel/compiler: UNDEF comparisons with smaller than 32-bit Comparisons which produce 32-bit boolean results (0 or 0xFFFFFFFF) but operate on 16-bit types would first generate a CMP instruction with W or HF types, before expanding it out. This CMP is a partial write, which leads us to think the register may contain some prior contents still. When placed in a loop, this causes its live range to extend beyond its real life time. Mark the register with UNDEF first so that we know that no prior contents exist and need to be preserved. This affects: flt32, fge32, feq32, fneu32, ilt32, ult32, ige32, uge32, ieq32, ine32 On one of Cyberpunk 2077's most complex compute shaders, this reduces the maximum live registers from 696 to 537 (22.8%). Together with the next patch, Cyberpunk's spills and fills are cut by 10.23% and 9.19%, respectively. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22835>	2023-05-04 18:17:26 +00:00
Lionel Landwerlin	daa8003e45	intel/fs: use nomask for setting cr0 for float controls The instructions manipulating cr0 use the default mask on lane0. So if for some reason that lane is disabled in some of the dispatches, we can end up not executing the instructions. Fixes flakiness in dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.uniform_float_32_to_16.uniform_matrix_float_rtz_frag Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22314>	2023-04-11 11:01:31 +00:00
Kenneth Graunke	98bcf650f1	intel/compiler: Use nir_dest_bit_size() for ballot bit size check There's no guarantee that this is an SSA value. Use the helper to handle both SSA values and registers correctly. Otherwise we read trash when we encounter a register and make bad decisions on types, possibly leading to our destination being UQ typed when the VGRF is only 32-bit. Fixes compilation with -Dintel-clc=enabled since `7f6491b76d` (nir: Combine if_uses with instruction uses) but the bug is much older than that, circa 2017. We were just getting lucky before. Fixes: `069bf7c907` ("i965/fs: Match destination type to size for ballot") Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22374>	2023-04-07 19:28:56 -07:00
Jordan Justen	eef7a117a1	intel/compiler: Support fmul_fsign opt for fp64 when int64 isn't supported MTL supports fp64, but not int64. The fsign(double(x))*FOO optimization would try to use a 64-bit int xor operation to conditionally toggle the sign bit of the result. Since this only affects the high bit of the result, we can do a 32-bit move of the low dword, and a 32-bit xor on the high dword. Fixes dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp64.input_args.modf_denorm_flush_to_zero on MTL. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22259>	2023-04-05 18:48:21 +00:00
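The dword split described in eef7a117a1 can be illustrated on the CPU. The following is only a minimal sketch and not the Mesa implementation: the function name `mul_by_sign_32bit` is hypothetical, it assumes an IEEE 754 binary64 `double` on a little-endian host (so the high dword is at index 1), and it ignores the zero and NaN cases that a real fsign lowering has to handle.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not Mesa code): returns y with its sign flipped when
 * x is negative, i.e. fsign(x) * y for non-zero, non-NaN x, using only
 * 32-bit integer operations. Assumes a little-endian host. */
static double mul_by_sign_32bit(double x, double y)
{
   uint32_t xw[2], yw[2], rw[2];
   double r;

   assert(sizeof(double) == 2 * sizeof(uint32_t));
   memcpy(xw, &x, sizeof(xw));
   memcpy(yw, &y, sizeof(yw));

   /* Low dword: the sign bit does not live here, so a plain 32-bit move. */
   rw[0] = yw[0];
   /* High dword: xor in the sign bit of x (bit 31 of the high dword). */
   rw[1] = yw[1] ^ (xw[1] & 0x80000000u);

   memcpy(&r, rw, sizeof(r));
   return r;
}
```

For example, `mul_by_sign_32bit(-2.0, 3.0)` yields `-3.0` without ever forming a 64-bit integer value, which is the property the commit relies on for hardware that has fp64 but no int64.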
Lionel Landwerlin	a358b97c58	intel/fs: optimize uniform SSBO & shared loads Using divergence analysis, figure out when SSBO & shared memory loads are uniform and carry the data only once in register space. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>	2023-04-05 12:32:56 +00:00
Sagar Ghuge	cece2aa2c1	intel/compiler: Add Wa_14014063774 for slm_fence Before an SLM fence, the compiler needs to insert SYNC.ALLWR in order to avoid an SLM data race. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22050>	2023-03-25 00:45:04 +00:00
Mark Janes	33d03e57ad	intel/fs: use generated helpers for Wa_14013363432 / Wa_14012688258 Wa_14013363432 is a clone of Wa_14012688258. It does not apply to all gfx 12.5 platforms. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21745>	2023-03-23 19:13:09 +00:00
Lionel Landwerlin	56474fae93	intel/fs: fix subgroup invocation read bounds checking nir->info.subgroup_size can be set to an enum : SUBGROUP_SIZE_VARYING = 0 SUBGROUP_SIZE_UNIFORM = 1 SUBGROUP_SIZE_API_CONSTANT = 2 SUBGROUP_SIZE_FULL_SUBGROUPS = 3 So compute the API subgroup size value and compare it to the dispatch size to determine whether we need some bound checking. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `9ac192d79d` ("intel/fs: bound subgroup invocation read to dispatch size") Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21856>	2023-03-14 12:15:48 +00:00
Ian Romanick	28311f9d02	nir: intel/compiler: Move ufind_msb lowering to NIR Fossil-db results: All Intel platforms had similar results. (Ice Lake shown) Cycles in all programs: 9098346105 -> 9098333765 (-0.0%) Cycles helped: 6 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>	2023-03-10 15:27:17 +00:00
Ian Romanick	08ca862ef8	intel/compiler: Tighter src and dest size bounds checking for some opcodes Enforce the sizes listed in the Skylake PRM: BFREV: source types: D destination types: D CBIT: source types: UB, UW, UD destination types: UD FBH: source types: D, UD destination types: UD FBL: source types: UD destination types: UD LZD: source types: D, UD destination types: UD v2: Update BFREV commit message documentation. Suggested by Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>	2023-03-10 15:27:17 +00:00
Ian Romanick	0cc7bf63b7	nir: intel/compiler: Move ifind_msb lowering to NIR Unlike ufind_msb, ifind_msb is only defined in NIR for 32-bit values, so no @32 annotation is required. No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>	2023-03-10 15:27:17 +00:00
Ian Romanick	15c6c859cf	intel/compiler: Lower find_lsb in NIR No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>	2023-03-10 15:27:17 +00:00
Kenneth Graunke	f5e5705c91	intel/fs: Use F32TO16/F16TO32 helpers in fquantize16 handling I originally thought that we were intentionally emitting the legacy opcodes here to make them opaque to the optimizer, so that it wouldn't eliminate the explicit type conversions, as they're actually required to do the quantization. But...we don't actually optimize those away currently anyway. So...go ahead and use the helpers for consistency. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>	2023-03-09 23:26:17 +00:00
Kenneth Graunke	309ec3725a	intel/fs: Use new F16TO32 helpers for unpack_half_split_* opcodes This gets us a MOV at the IR level on Gfx8+ which should be more optimizable than F16TO32. It also removes confusion about which pipe the instruction will run on. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>	2023-03-09 23:26:17 +00:00
Kenneth Graunke	78bf53904e	intel/fs: Delete a TODO about using brw_F32TO16. We can just use the new builder helpers to get the optimization advantages of a MOV on Gfx8+ while also getting the necessary F32TO16 on Gfx7.x and yet not worry too hard about it. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>	2023-03-09 23:26:17 +00:00
Caio Oliveira	c92d589597	intel/compiler: Drop non-scoped barrier handling Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21634>	2023-03-07 00:41:13 +00:00
Caio Oliveira	db0a09c9e2	intel/fs: Handle scoped barriers with execution scope Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21634>	2023-03-07 00:41:13 +00:00
Faith Ekstrand	83fd7a5ed1	intel: Use nir_lower_tex_options::lower_index_to_offset Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21546>	2023-03-06 21:38:32 +00:00
Caio Oliveira	c80268a20d	intel/compiler: Mark various memory barriers intrinsics unreachable Now that both SPIR-V and GLSL are using scoped barriers, we can stop handling the specialized ones. Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3339>	2023-02-27 20:24:01 +00:00
Daniel Schürmann	2bb369dd8d	nir: add assertions that loops don't have a Continue Construct Hoping that I didn't miss any, this should add assertions to all functions and passes which explicitly handle 'nir_loop'. Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13962>	2023-02-21 10:41:11 +00:00
Lionel Landwerlin	9ac192d79d	intel/fs: bound subgroup invocation read to dispatch size This is to avoid out of bound register accesses (potentially leading to hangs) when the dispatch size is smaller than what is reported in the NIR subgroup_size. v2: Implement bounding with a mask (since workgroup sizes are powers of 2) (Faith) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `530de844ef` ("intel,anv,iris,crocus: Drop subgroup size from the shader key") Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21282>	2023-02-14 21:29:42 +00:00
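The mask-based bounding mentioned in the v2 note of 9ac192d79d can be sketched as follows. This is an illustration under assumptions rather than the code from the commit: `bound_invocation_index` and its parameters are hypothetical names, and the only property taken from the commit message is that the size is a power of two, which lets a single AND replace a compare-and-select.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not Mesa code): clamps an invocation index into
 * [0, dispatch_width) by masking. Only valid when dispatch_width is a
 * power of two, in which case dispatch_width - 1 has all the low bits set. */
static uint32_t bound_invocation_index(uint32_t index, uint32_t dispatch_width)
{
   /* Precondition: non-zero power of two (e.g. 8, 16, 32). */
   assert(dispatch_width != 0 &&
          (dispatch_width & (dispatch_width - 1)) == 0);

   return index & (dispatch_width - 1);
}
```

With a dispatch width of 16, an in-range index such as 5 is returned unchanged, while an out-of-range index such as 17 wraps to 1, so the read can never land outside the registers covered by the dispatch.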
Jason Ekstrand	949b42c4dc	intel/compiler: Convert wm_prog_key::multisample_fbo to a tri-state This allows us to communicate to the back-end that we don't actually know if the framebuffer is multisampled or not. No drivers set anything but ALWAYS/NEVER and we still have a few ALWAYS/NEVER assumptions but those should be asserted. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21094>	2023-02-06 09:12:18 +00:00
Ian Romanick	ea413e826b	nir: Eliminate nir_op_f2b Builds on the work of !15121. This gets to delete even more code because many drivers shared a lot of code for i2b and f2b. No shader-db or fossil-db changes on any Intel platform. v2: Rebase on `1a35acd8d9`. v3: Update a comment in nir_opcodes_c.py. Suggested by Konstantin. v4: Another rebase. Remove f2b stuff from Midgard. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20509>	2023-02-03 22:39:57 +00:00
Sagar Ghuge	0c083d29a5	intel/fs: Always stall between the fences on Gen11+ Be conservative in Gfx11+ and always stall in a fence. Since there are two different fences, and shader might want to synchronize between them. This change also brings back the original code block for the stall between the fence and comment from the commit `b390ff3517`. v2: (Caio) - Re-arrange code block. - Adjust comment. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6958 Fixes: `f7262462` ("intel/fs: Rework fence handling in brw_fs_nir.cpp") Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Tested-by: Mark Janes <markjanes@swizzler.org> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20996>	2023-02-02 00:21:21 +00:00
Amber	ab4c2990ed	intel/compiler: use lower_image_samples_to_one Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewer-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Signed-off-by: Amber Amber <amber@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20813>	2023-02-01 19:52:49 +00:00
Lionel Landwerlin	13cca48920	intel/fs: drop FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GFX7 We can lower FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD into other more generic sends and drop this internal opcode. The idea behind this change is to allow bindless surfaces to be used for UBO pulls and why it's interesting to be able to reuse setup_surface_descriptors(). But that will come in a later change. No shader-db changes on TGL & DG2. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20416>	2023-01-26 11:26:53 +00:00
Marcin Ślusarz	9bb18a4f9e	intel/compiler: fix generation of vec8/vec16 alu instruction I stumbled on this when I inserted some suboptimal lowering code after all optimizations. Adding certain subset of optimizations after my lowering code actually avoided this bug, so I think it's not possible to hit this on upstream. Let's fix this for the next person generating suboptimal code... Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20857>	2023-01-24 13:15:58 +00:00
Kenneth Graunke	16b66ab659	intel/compiler: Drop dest checking in atomic code NIR atomic operation intrinsics all have destinations. This is just copy and pasted from other generic intrinsic handling where that may or may not be the case. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>	2023-01-19 08:42:22 +00:00
Kenneth Graunke	780f3e2e6b	intel/compiler: Delete all the A64 atomic variants for type sizes These are handled identically in almost all cases. There is one place in the legacy surface lowering that was obtaining the bitsize from the opcode, but the LSC-based lowering uses (type_sz(inst->dst.type) * 8) for that and works just fine. If we just do that in the legacy lowering too, then we don't need this plethora of opcodes. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>	2023-01-19 08:42:22 +00:00
Kenneth Graunke	03ddde1230	intel/compiler: Combine nir_emit_{ssbo,shared}_atomic into one helper These are basically identical save for: - shared has surface hardcoded to SLM rather than an SSBO index - shared has to handle adding the 'base' const_index (SSBO have none) - the NIR source index for data is shifted by one It's not worth copy and pasting the entire function for this. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>	2023-01-19 08:42:22 +00:00
Kenneth Graunke	b84939c678	intel/compiler: Delete fs_visitor::nir_emit_{ssbo,shared}_atomic_float() These are now basically identical to their non-float counterparts. The only thing that differed was the opcode checking to determine which operands existed. Now that we have a unified opcode enum and a helper for the number of data operands, we can just use that. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>	2023-01-19 08:42:22 +00:00


532 commits