fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 09:20:12 +01:00

Author	SHA1	Message	Date
Kenneth Graunke	03f948f5fd	brw: Skip fetching unread leading components of UBO loads We were already skipping unread trailing components, but now we skip them on both ends. About -3.5% spills on Shadow of the Tomb Raider on Alchemist (mostly a wash elsewhere, but it will help additional shaders with later patches). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888>	2025-01-10 22:44:09 +00:00
Kenneth Graunke	4ab04799ee	brw: Delete assign_constant_locations and push_constant_loc[] The push_constant_loc[] array is always an identity mapping these days, so it's kind of pointless. Just use the original uniform number and skip the unnecessary "remap" step. With that gone, and shrinking UBO ranges gone, assign_constant_locations() is now empty and can be removed as well. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32841>	2025-01-06 12:45:47 +00:00
Caio Oliveira	e1aebf8a0c	intel/brw: Remove 'fs' prefix from passes and related functions Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32813>	2025-01-02 18:11:05 +00:00
Ian Romanick	0f3a350087	brw/nir: Don't generate scalar byte to float conversions on DG2+ in optimize_extract_to_float The lowering code does not generate efficient code. It is better to just not emit the bad thing in the first place. The shaders that I examined had blocks of NIR like: con 32 %527 = extract_u8 %456.o, %5 (0x0) con 32 %528 = extract_u8 %456.o, %35 (0x1) con 32 %529 = extract_u8 %456.o, %14 (0x2) con 32 %530 = extract_u8 %456.o, %11 (0x3) con 32 %531 = u2f32 %527 con 32 %532 = u2f32 %528 con 32 %533 = u2f32 %529 con 32 %534 = u2f32 %530 In some cases the u2f results are multiplied with 1/255. There may be a slightly more efficient way to do this by doing something like mov(8) g40<1>UW g12.1<32,8,4>UB mov(8) g41<1>UW g12.2<32,8,4>UB mov(8) g42<1>UW g12.3<32,8,4>UB mov(8) g60<1>F g12<32,8,4>UB mov(8) g61<1>F g40<1,1,0>UW mov(8) g62<1>F g41<1,1,0>UW mov(8) g63<1>F g42<1,1,0>UW In SIMD16 and SIMD32 that would save temporary register space. It could save a register in SIMD8 by using g40.8 instead of g42. Making that happen might be tricky. Maybe we should just add a special NIR opcode that converts a packed uint32 to a vec4? v2: Add a bunch of documentation explaining what's going on. Suggested by Ken. shader-db: Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown) total instructions in shared programs: 18228689 -> 18228720 (<.01%) instructions in affected programs: 43091 -> 43122 (0.07%) helped: 0 / HURT: 30 total cycles in shared programs: 932542994 -> 932544290 (<.01%) cycles in affected programs: 8150758 -> 8152054 (0.02%) helped: 15 / HURT: 17 fossil-db: Lunar Lake, Meteor Lake, and DG2 had similar results. 
(Lunar Lake shown) Totals: Instrs: 142890605 -> 142890392 (-0.00%); split: -0.00%, +0.00% Cycle count: 21655049536 -> 21654693720 (-0.00%); split: -0.00%, +0.00% Totals from 181 (0.03% of 553251) affected shaders: Instrs: 188022 -> 187809 (-0.11%); split: -0.12%, +0.01% Cycle count: 85291658 -> 84935842 (-0.42%); split: -0.47%, +0.05% Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown) Totals: Instrs: 154438050 -> 154436980 (-0.00%) Cycle count: 15334650326 -> 15334644375 (-0.00%); split: -0.00%, +0.00% Spill count: 56754 -> 56706 (-0.08%) Fill count: 95919 -> 95808 (-0.12%) Scratch Memory Size: 2306048 -> 2304000 (-0.09%) Max live registers: 32469924 -> 32469899 (-0.00%) Totals from 112 (0.02% of 642922) affected shaders: Instrs: 156186 -> 155116 (-0.69%) Cycle count: 11111478 -> 11105527 (-0.05%); split: -0.62%, +0.56% Spill count: 1766 -> 1718 (-2.72%) Fill count: 2815 -> 2704 (-3.94%) Scratch Memory Size: 78848 -> 76800 (-2.60%) Max live registers: 11526 -> 11501 (-0.22%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:59 -08:00
Ian Romanick	1a7593ed36	brw/nir: Treat some ballot as convergent v2: Fix for Xe2. v3: Add a comment explaining the use of bld instead of xbld. Suggested by Ken. Fix a bug in handling is_scalar source. Noticed by me while applying Ken's review feedback. shader-db: Lunar Lake, Meteor Lake, DG2, and Tiger Lake had similar results. (Lunar Lake shown) total instructions in shared programs: 18228657 -> 18228689 (<.01%) instructions in affected programs: 9333 -> 9365 (0.34%) helped: 2 / HURT: 26 total cycles in shared programs: 932511560 -> 932542994 (<.01%) cycles in affected programs: 2263040 -> 2294474 (1.39%) helped: 7 / HURT: 27 Ice Lake and Skylake had similar results. (Ice Lake shown) total instructions in shared programs: 20700370 -> 20700392 (<.01%) instructions in affected programs: 18579 -> 18601 (0.12%) helped: 1 / HURT: 28 total cycles in shared programs: 888385851 -> 888386325 (<.01%) cycles in affected programs: 2571368 -> 2571842 (0.02%) helped: 14 / HURT: 6 total spills in shared programs: 4373 -> 4371 (-0.05%) spills in affected programs: 71 -> 69 (-2.82%) helped: 1 / HURT: 0 total fills in shared programs: 4657 -> 4653 (-0.09%) fills in affected programs: 196 -> 192 (-2.04%) helped: 1 / HURT: 0 fossil-db: Lunar Lake Totals: Instrs: 142887258 -> 142890605 (+0.00%); split: -0.00%, +0.00% Cycle count: 21653599282 -> 21655049536 (+0.01%); split: -0.00%, +0.01% Max live registers: 47942973 -> 47942837 (-0.00%) Totals from 22209 (4.01% of 553251) affected shaders: Instrs: 4337679 -> 4341026 (+0.08%); split: -0.00%, +0.08% Cycle count: 261852040 -> 263302294 (+0.55%); split: -0.38%, +0.93% Max live registers: 1299670 -> 1299534 (-0.01%) Meteor Lake, DG2, Tiger Lake, and Skylake had similar results. 
(Meteor Lake shown) Totals: Instrs: 156599915 -> 156590882 (-0.01%); split: -0.01%, +0.00% Cycle count: 16940072009 -> 16940902317 (+0.00%); split: -0.01%, +0.01% Max live registers: 32610801 -> 32610488 (-0.00%) Max dispatch width: 5730736 -> 5731744 (+0.02%); split: +0.12%, -0.11% Totals from 35528 (5.52% of 643617) affected shaders: Instrs: 6175409 -> 6166376 (-0.15%); split: -0.21%, +0.06% Cycle count: 230679923 -> 231510231 (+0.36%); split: -0.46%, +0.82% Max live registers: 1354716 -> 1354403 (-0.02%) Max dispatch width: 167648 -> 168656 (+0.60%); split: +4.26%, -3.66% Ice Lake Totals: Instrs: 155330276 -> 155318037 (-0.01%); split: -0.01%, +0.00% Cycle count: 15019092327 -> 15019637026 (+0.00%); split: -0.00%, +0.01% Max live registers: 32640341 -> 32637305 (-0.01%) Max dispatch width: 5780720 -> 5780688 (-0.00%); split: +0.02%, -0.02% Totals from 37773 (5.85% of 645641) affected shaders: Instrs: 6643030 -> 6630791 (-0.18%); split: -0.24%, +0.05% Cycle count: 223589025 -> 224133724 (+0.24%); split: -0.29%, +0.53% Max live registers: 1491781 -> 1488745 (-0.20%) Max dispatch width: 167600 -> 167568 (-0.02%); split: +0.75%, -0.77% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:59 -08:00
Ian Romanick	f2d2014636	brw/nir: Simplify get_nir_image_intrinsic_image and get_nir_buffer_intrinsic_index shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 20041625 -> 20041634 (<.01%) instructions in affected programs: 1206 -> 1215 (0.75%) helped: 0 / HURT: 5 total cycles in shared programs: 929993812 -> 929993816 (<.01%) cycles in affected programs: 10930 -> 10934 (0.04%) helped: 1 / HURT: 2 fossil-db: Lunar Lake Totals: Instrs: 142892951 -> 142893049 (+0.00%) Send messages: 6591165 -> 6591186 (+0.00%) Cycle count: 21653727624 -> 21653732470 (+0.00%); split: -0.00%, +0.00% Scratch Memory Size: 5664768 -> 5660672 (-0.07%) Max live registers: 47944999 -> 47944983 (-0.00%) Totals from 19 (0.00% of 553292) affected shaders: Instrs: 10671 -> 10769 (+0.92%) Send messages: 697 -> 718 (+3.01%) Cycle count: 234508 -> 239354 (+2.07%); split: -0.01%, +2.08% Scratch Memory Size: 38912 -> 34816 (-10.53%) Max live registers: 2203 -> 2187 (-0.73%) Meteor Lake and DG2 had similar results. (Meteor Lake shown) Totals: Instrs: 156744203 -> 156743428 (-0.00%); split: -0.00%, +0.00% Send messages: 7654787 -> 7654808 (+0.00%) Cycle count: 16942341318 -> 16942329195 (-0.00%); split: -0.00%, +0.00% Spill count: 75549 -> 75499 (-0.07%) Fill count: 140094 -> 140012 (-0.06%) Scratch Memory Size: 3945472 -> 3944448 (-0.03%) Max live registers: 32642020 -> 32642009 (-0.00%) Totals from 19 (0.00% of 644000) affected shaders: Instrs: 12489 -> 11714 (-6.21%); split: -7.00%, +0.79% Send messages: 697 -> 718 (+3.01%) Cycle count: 203873 -> 191750 (-5.95%); split: -6.77%, +0.82% Spill count: 50 -> 0 (-inf%) Fill count: 82 -> 0 (-inf%) Scratch Memory Size: 25600 -> 24576 (-4.00%) Max live registers: 1150 -> 1139 (-0.96%) No fossil-db changes on Tiger Lake, Ice Lake, or Skylake. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:59 -08:00
Ian Romanick	9a967c5ec4	brw/nir: Don't try to optimize around emit_uniformize Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:59 -08:00
Ian Romanick	63e395fa87	brw/nir: Eliminate nir_to_brw_state::uniform_values No shader-db changes on any Intel platform. No fossil-db changes on Tiger Lake, Ice Lake, or Skylake. fossil-db: Lunar Lake Totals: Cycle count: 21653230858 -> 21653230518 (-0.00%); split: -0.00%, +0.00% Max live registers: 47941741 -> 47941737 (-0.00%) Totals from 17 (0.00% of 553202) affected shaders: Cycle count: 201232 -> 200892 (-0.17%); split: -0.19%, +0.02% Max live registers: 1354 -> 1350 (-0.30%) Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown) Totals: Instrs: 156455123 -> 156453396 (-0.00%); split: -0.00%, +0.00% Cycle count: 16904545026 -> 16904393943 (-0.00%); split: -0.00%, +0.00% Max live registers: 32638039 -> 32638035 (-0.00%) Totals from 1201 (0.19% of 643905) affected shaders: Instrs: 509360 -> 507633 (-0.34%); split: -0.34%, +0.00% Cycle count: 1579931758 -> 1579780675 (-0.01%); split: -0.01%, +0.00% Max live registers: 59633 -> 59629 (-0.01%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:59 -08:00
Ian Romanick	a13244e57b	brw/nir: Treat some resource_intel as convergent No shader-db changes on any Intel platform. No fossil-db changes on Ice Lake or Skylake. fossil-db: Lunar Lake Totals: Cycle count: 21653232202 -> 21653230858 (-0.00%); split: -0.00%, +0.00% Totals from 4 (0.00% of 553202) affected shaders: Cycle count: 14276568 -> 14275224 (-0.01%); split: -0.01%, +0.00% Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown) Totals: Instrs: 156453398 -> 156455123 (+0.00%); split: -0.00%, +0.00% Cycle count: 16904394153 -> 16904545026 (+0.00%); split: -0.00%, +0.00% Totals from 1189 (0.18% of 643905) affected shaders: Instrs: 502891 -> 504616 (+0.34%); split: -0.00%, +0.34% Cycle count: 1579688485 -> 1579839358 (+0.01%); split: -0.00%, +0.01% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:59 -08:00
Ian Romanick	1b24612c57	brw/nir: Treat load_*_uniform_block_intel as convergent Between 5 and 10 shaders (depending on the platform) from Blender are massively helped for spills and fills (e.g., from 45 spills to 0, and 180 fills to 0). Previously this commit caused a lot of spill and fill damage to Wolfenstein Youngblood and Red Dead Redemption 2. I believe due to !32041 and !32097, this is no longer the case. RDR2 is helped, and Wolfenstein Youngblood has no changes. However, q2rtx/q2rtx-rt-pipeline is hurt: Spill count: 126 -> 175 (+38.89%); split: -0.79%, +39.68% Fill count: 156 -> 235 (+50.64%); split: -1.92%, +52.56% By the end of this series this damage is fixed, and q2rtx is helped overall by -0.79% spills and -1.92% fills. v2: Fix for Xe2. v3: Just keep using bld for the group(1, 0) call. Suggested by Ken. v4: Major re-write. Pass bld and xbld to fs_emit_memory_access. The big fix is changing the way srcs[MEMORY_LOGICAL_ADDRESS] is calculated (around line 7180). In previous versions of the commit, the address would be calculated using bld (which is now xbld) even if the address source was not is_scalar. This could cause the emit_uniformize (later in the function) to fetch garbage. This also drops the special case handling of constant offset. Constant propagation and algebraic will handle this. v5: Fix a subtle bug that was ultimately caused by the removal of offset_to_component. The MEMORY_LOGICAL_ADDRESS for load_shared_uniform_block_intel was being calculated as SIMD16 on LNL, but the later emit_uniformize would treat it as SIMD32. This caused GPU hangs in Assassin's Creed Valhalla. v6: Fix a bug in D16 to D16U32 expansion. Noticed by Ken. Add a comment explaining bld vs xbld vs ubld in fs_nir_emit_memory_access. Suggested by Ken. v7: Revert some of the v6 changes related to D16 to D16U32 expansion. This code was mostly correct. xbld is correct because DATA0 needs to be generated in size of the eventual SEND instruction. 
Using offset(nir_src, xbld, c) will cause offset() to correctly add component(..., 0) if nir_src.is_scalar but xbld is not scalar_group(). v8: nir_intrinsic_load_shared_uniform_block_intel was removed. This caused reproducible hangs in Assassin's Creed: Valhalla. There are some other compiler issues related to this game, and we're not yet sure exactly what the cause of any of it is. shader-db: Lunar Lake total instructions in shared programs: 18058270 -> 18068886 (0.06%) instructions in affected programs: 5196846 -> 5207462 (0.20%) helped: 4442 / HURT: 11416 total cycles in shared programs: 921324492 -> 919819398 (-0.16%) cycles in affected programs: 733274162 -> 731769068 (-0.21%) helped: 11312 / HURT: 31788 total spills in shared programs: 3633 -> 3585 (-1.32%) spills in affected programs: 48 -> 0 helped: 5 / HURT: 0 total fills in shared programs: 2277 -> 2198 (-3.47%) fills in affected programs: 79 -> 0 helped: 5 / HURT: 0 LOST: 123 GAINED: 377 Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown) total instructions in shared programs: 19703458 -> 19699173 (-0.02%) instructions in affected programs: 5885251 -> 5880966 (-0.07%) helped: 4545 / HURT: 14971 total cycles in shared programs: 903497253 -> 902054570 (-0.16%) cycles in affected programs: 691762248 -> 690319565 (-0.21%) helped: 16412 / HURT: 28080 total spills in shared programs: 4894 -> 4646 (-5.07%) spills in affected programs: 248 -> 0 helped: 7 / HURT: 0 total fills in shared programs: 6638 -> 5581 (-15.92%) fills in affected programs: 1057 -> 0 helped: 7 / HURT: 0 LOST: 427 GAINED: 978 Ice Lake and Skylake had similar results. 
(Ice Lake shown) total instructions in shared programs: 20384200 -> 20384889 (<.01%) instructions in affected programs: 5295084 -> 5295773 (0.01%) helped: 5309 / HURT: 12564 total cycles in shared programs: 873002832 -> 872515246 (-0.06%) cycles in affected programs: 463413458 -> 462925872 (-0.11%) helped: 16079 / HURT: 13339 total spills in shared programs: 4552 -> 4373 (-3.93%) spills in affected programs: 546 -> 367 (-32.78%) helped: 11 / HURT: 0 total fills in shared programs: 5298 -> 4657 (-12.10%) fills in affected programs: 1798 -> 1157 (-35.65%) helped: 10 / HURT: 0 LOST: 380 GAINED: 925 fossil-db: All Intel platforms had similar results. (Lunar Lake shown) Totals: Instrs: 141528822 -> 141728392 (+0.14%); split: -0.21%, +0.35% Subgroup size: 10968048 -> 10968144 (+0.00%) Send messages: 6567930 -> 6567909 (-0.00%) Cycle count: 22165780202 -> 21624534624 (-2.44%); split: -3.09%, +0.65% Spill count: 69890 -> 66665 (-4.61%); split: -5.06%, +0.44% Fill count: 128331 -> 120189 (-6.34%); split: -7.44%, +1.09% Scratch Memory Size: 5829632 -> 5664768 (-2.83%); split: -2.86%, +0.04% Max live registers: 47928290 -> 47611371 (-0.66%); split: -0.71%, +0.05% Totals from 364369 (66.18% of 550563) affected shaders: Instrs: 113448842 -> 113648412 (+0.18%); split: -0.26%, +0.44% Subgroup size: 7694080 -> 7694176 (+0.00%) Send messages: 5308287 -> 5308266 (-0.00%) Cycle count: 21885237842 -> 21343992264 (-2.47%); split: -3.13%, +0.65% Spill count: 65152 -> 61927 (-4.95%); split: -5.42%, +0.47% Fill count: 122811 -> 114669 (-6.63%); split: -7.77%, +1.14% Scratch Memory Size: 5438464 -> 5273600 (-3.03%); split: -3.07%, +0.04% Max live registers: 34355310 -> 34038391 (-0.92%); split: -1.00%, +0.07% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:59 -08:00
Ian Romanick	db2b1e4d76	brw/nir: Treat load_btd_{global,local}_arg_addr_intel and load_btd_shader_type_intel as convergent No shader-db changes on any Intel platform. No fossil-db changes on Tiger Lake, Ice Lake, or Skylake. fossil-db: Lunar Lake Totals: Instrs: 141808714 -> 141808513 (-0.00%); split: -0.00%, +0.00% Cycle count: 22177889310 -> 22181410192 (+0.02%); split: -0.00%, +0.02% Spill count: 69892 -> 69890 (-0.00%); split: -0.01%, +0.01% Fill count: 128313 -> 128331 (+0.01%) Max live registers: 48052083 -> 48052742 (+0.00%); split: -0.00%, +0.00% Totals from 549 (0.10% of 551446) affected shaders: Instrs: 911251 -> 911050 (-0.02%); split: -0.10%, +0.07% Cycle count: 1244153266 -> 1247674148 (+0.28%); split: -0.04%, +0.32% Spill count: 15849 -> 15847 (-0.01%); split: -0.04%, +0.03% Fill count: 35087 -> 35105 (+0.05%) Max live registers: 68047 -> 68706 (+0.97%); split: -0.25%, +1.22% Meteor Lake Totals: Instrs: 152744298 -> 152741241 (-0.00%); split: -0.00%, +0.00% Cycle count: 17410258529 -> 17405949054 (-0.02%); split: -0.04%, +0.01% Spill count: 78528 -> 78598 (+0.09%); split: -0.01%, +0.09% Fill count: 147893 -> 147978 (+0.06%); split: -0.00%, +0.06% Scratch Memory Size: 3962880 -> 3969024 (+0.16%) Max live registers: 31887206 -> 31887413 (+0.00%); split: -0.00%, +0.00% Totals from 552 (0.09% of 633315) affected shaders: Instrs: 907279 -> 904222 (-0.34%); split: -0.48%, +0.15% Cycle count: 1152358569 -> 1148049094 (-0.37%); split: -0.56%, +0.19% Spill count: 15290 -> 15360 (+0.46%); split: -0.03%, +0.48% Fill count: 35313 -> 35398 (+0.24%); split: -0.02%, +0.26% Scratch Memory Size: 1313792 -> 1319936 (+0.47%) Max live registers: 34218 -> 34425 (+0.60%); split: -0.47%, +1.08% DG2 Totals: Instrs: 152766492 -> 152763061 (-0.00%); split: -0.00%, +0.00% Cycle count: 17406058608 -> 17406396943 (+0.00%); split: -0.02%, +0.02% Spill count: 78626 -> 78624 (-0.00%); split: -0.01%, +0.01% Fill count: 147956 -> 148007 (+0.03%); split: -0.01%, +0.04% Scratch Memory 
Size: 3962880 -> 3969024 (+0.16%) Max live registers: 31887158 -> 31887365 (+0.00%); split: -0.00%, +0.00% Totals from 552 (0.09% of 633315) affected shaders: Instrs: 908513 -> 905082 (-0.38%); split: -0.47%, +0.09% Cycle count: 1148162185 -> 1148500520 (+0.03%); split: -0.23%, +0.26% Spill count: 15364 -> 15362 (-0.01%); split: -0.07%, +0.06% Fill count: 35343 -> 35394 (+0.14%); split: -0.03%, +0.17% Scratch Memory Size: 1313792 -> 1319936 (+0.47%) Max live registers: 34218 -> 34425 (+0.60%); split: -0.47%, +1.08% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:59 -08:00
Ian Romanick	f3593df877	brw/nir: Treat load_reloc_const_intel as convergent shader-db: Lunar Lake, Meteor Lake, DG2, and Tiger Lake had similar results. (Lunar Lake shown) Lunar Lake total instructions in shared programs: 18096549 -> 18096537 (<.01%) instructions in affected programs: 26128 -> 26116 (-0.05%) helped: 7 / HURT: 2 total cycles in shared programs: 922073090 -> 922093922 (<.01%) cycles in affected programs: 10574198 -> 10595030 (0.20%) helped: 19 / HURT: 76 Ice Lake and Skylake had similar results. (Ice Lake shown) total instructions in shared programs: 20503943 -> 20504053 (<.01%) instructions in affected programs: 23378 -> 23488 (0.47%) helped: 6 / HURT: 5 total cycles in shared programs: 875477036 -> 875480112 (<.01%) cycles in affected programs: 13840528 -> 13843604 (0.02%) helped: 22 / HURT: 55 total spills in shared programs: 4546 -> 4552 (0.13%) spills in affected programs: 8 -> 14 (75.00%) helped: 0 / HURT: 1 total fills in shared programs: 5280 -> 5298 (0.34%) fills in affected programs: 24 -> 42 (75.00%) helped: 0 / HURT: 1 One compute shader in Tomb Raider was hurt for spills and fills. fossil-db: Lunar Lake Totals: Instrs: 141808815 -> 141808714 (-0.00%); split: -0.00%, +0.00% Cycle count: 22185066952 -> 22177889310 (-0.03%); split: -0.05%, +0.02% Spill count: 69859 -> 69892 (+0.05%); split: -0.03%, +0.07% Fill count: 128344 -> 128313 (-0.02%); split: -0.04%, +0.01% Scratch Memory Size: 5833728 -> 5829632 (-0.07%) Totals from 13384 (2.43% of 551446) affected shaders: Instrs: 13852162 -> 13852061 (-0.00%); split: -0.00%, +0.00% Cycle count: 7691993336 -> 7684815694 (-0.09%); split: -0.15%, +0.06% Spill count: 53266 -> 53299 (+0.06%); split: -0.03%, +0.10% Fill count: 96492 -> 96461 (-0.03%); split: -0.05%, +0.02% Scratch Memory Size: 3827712 -> 3823616 (-0.11%) Meteor Lake and DG2 had similar results. 
(Meteor Lake shown) Totals: Instrs: 152744735 -> 152744298 (-0.00%); split: -0.00%, +0.00% Cycle count: 17400199290 -> 17410258529 (+0.06%); split: -0.01%, +0.07% Max live registers: 31887208 -> 31887206 (-0.00%) Totals from 12435 (1.96% of 633315) affected shaders: Instrs: 13445310 -> 13444873 (-0.00%); split: -0.00%, +0.00% Cycle count: 6941685096 -> 6951744335 (+0.14%); split: -0.03%, +0.18% Max live registers: 1071302 -> 1071300 (-0.00%) Tiger Lake and Ice Lake had similar results. (Tiger Lake shown) Totals: Instrs: 150644063 -> 150643944 (-0.00%); split: -0.00%, +0.00% Cycle count: 15618718733 -> 15622092285 (+0.02%); split: -0.01%, +0.03% Spill count: 58816 -> 58790 (-0.04%) Fill count: 101054 -> 101065 (+0.01%) Max live registers: 31792771 -> 31792766 (-0.00%); split: -0.00%, +0.00% Totals from 13383 (2.12% of 632544) affected shaders: Instrs: 12016285 -> 12016166 (-0.00%); split: -0.00%, +0.00% Cycle count: 5239956851 -> 5243330403 (+0.06%); split: -0.02%, +0.08% Spill count: 28977 -> 28951 (-0.09%) Fill count: 47568 -> 47579 (+0.02%) Max live registers: 1001554 -> 1001549 (-0.00%); split: -0.00%, +0.00% Skylake Totals: Instrs: 140943195 -> 140943154 (-0.00%); split: -0.00%, +0.00% Cycle count: 14818940190 -> 14816706154 (-0.02%); split: -0.02%, +0.00% Max live registers: 31663173 -> 31663168 (-0.00%); split: -0.00%, +0.00% Totals from 12625 (2.01% of 629351) affected shaders: Instrs: 11598223 -> 11598182 (-0.00%); split: -0.00%, +0.00% Cycle count: 4519027823 -> 4516793787 (-0.05%); split: -0.05%, +0.00% Max live registers: 970275 -> 970270 (-0.00%); split: -0.00%, +0.00% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:59 -08:00
Ian Romanick	fb9b363376	brw/nir: Treat load_inline_data_intel as convergent No shader-db changes on any Intel platform. fossil-db: Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown) Totals: Instrs: 141808595 -> 141808815 (+0.00%); split: -0.00%, +0.00% Cycle count: 22181300418 -> 22185066952 (+0.02%); split: -0.01%, +0.03% Max live registers: 48052077 -> 48052083 (+0.00%) Totals from 720 (0.13% of 551446) affected shaders: Instrs: 116778 -> 116998 (+0.19%); split: -0.01%, +0.20% Cycle count: 1197931082 -> 1201697616 (+0.31%); split: -0.21%, +0.53% Max live registers: 56552 -> 56558 (+0.01%) No fossil-db changes on any other Intel platform. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:59 -08:00
Ian Romanick	3e63920ca5	brw/nir: Treat some load_ubo as convergent v2: Fix for Xe2. No changes in shader-db or fossil-db on Lunar Lake, Meteor Lake, or DG2. shader-db: Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown) total instructions in shared programs: 19626547 -> 19634353 (0.04%) instructions in affected programs: 1591181 -> 1598987 (0.49%) helped: 925 / HURT: 3595 total cycles in shared programs: 865236718 -> 866682659 (0.17%) cycles in affected programs: 151284264 -> 152730205 (0.96%) helped: 3430 / HURT: 5510 total sends in shared programs: 1032237 -> 1032233 (<.01%) sends in affected programs: 20 -> 16 (-20.00%) helped: 4 / HURT: 0 LOST: 48 GAINED: 141 fossil-db: Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown) Totals: Instrs: 150662952 -> 150641175 (-0.01%); split: -0.03%, +0.02% Subgroup size: 7768880 -> 7768888 (+0.00%) Send messages: 7502265 -> 7502044 (-0.00%) Cycle count: 15621785298 -> 15618640525 (-0.02%); split: -0.06%, +0.04% Spill count: 58818 -> 58816 (-0.00%) Fill count: 101063 -> 101054 (-0.01%) Max live registers: 31795403 -> 31792179 (-0.01%); split: -0.01%, +0.00% Max dispatch width: 5572160 -> 5571488 (-0.01%); split: +0.00%, -0.01% Totals from 10278 (1.62% of 632539) affected shaders: Instrs: 5276493 -> 5254716 (-0.41%); split: -0.89%, +0.48% Subgroup size: 156432 -> 156440 (+0.01%) Send messages: 279259 -> 279038 (-0.08%) Cycle count: 6483576378 -> 6480431605 (-0.05%); split: -0.16%, +0.11% Spill count: 27133 -> 27131 (-0.01%) Fill count: 49384 -> 49375 (-0.02%) Max live registers: 675781 -> 672557 (-0.48%); split: -0.49%, +0.01% Max dispatch width: 97256 -> 96584 (-0.69%); split: +0.08%, -0.77% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:59 -08:00
Ian Romanick	c48570d2b2	brw/nir: Treat some ALU results as convergent v2: Fix for Xe2. v3: Fix handling of 64-bit CMP results. v4: Scalarize 16-bit comparison temporary destination when used as a source (as was already done for 64-bit). Suggested by Ken. shader-db: Lunar Lake total instructions in shared programs: 18096500 -> 18096549 (<.01%) instructions in affected programs: 15919 -> 15968 (0.31%) helped: 8 / HURT: 21 total cycles in shared programs: 921841300 -> 922073090 (0.03%) cycles in affected programs: 115946336 -> 116178126 (0.20%) helped: 386 / HURT: 135 Meteor Lake and DG2 (Meteor Lake shown) total instructions in shared programs: 19836053 -> 19836016 (<.01%) instructions in affected programs: 19547 -> 19510 (-0.19%) helped: 21 / HURT: 18 total cycles in shared programs: 906713777 -> 906588541 (-0.01%) cycles in affected programs: 96914584 -> 96789348 (-0.13%) helped: 335 / HURT: 134 total fills in shared programs: 6712 -> 6710 (-0.03%) fills in affected programs: 52 -> 50 (-3.85%) helped: 1 / HURT: 0 LOST: 1 GAINED: 1 Tiger Lake total instructions in shared programs: 19641284 -> 19641278 (<.01%) instructions in affected programs: 12358 -> 12352 (-0.05%) helped: 10 / HURT: 19 total cycles in shared programs: 865413131 -> 865460513 (<.01%) cycles in affected programs: 74641489 -> 74688871 (0.06%) helped: 388 / HURT: 100 total spills in shared programs: 3899 -> 3898 (-0.03%) spills in affected programs: 17 -> 16 (-5.88%) helped: 1 / HURT: 0 total fills in shared programs: 3249 -> 3245 (-0.12%) fills in affected programs: 51 -> 47 (-7.84%) helped: 1 / HURT: 0 LOST: 1 GAINED: 1 Ice Lake and Skylake had similar results. 
(Ice Lake shown) total instructions in shared programs: 20495826 -> 20496111 (<.01%) instructions in affected programs: 53220 -> 53505 (0.54%) helped: 28 / HURT: 16 total cycles in shared programs: 875173550 -> 875243910 (<.01%) cycles in affected programs: 51700652 -> 51771012 (0.14%) helped: 400 / HURT: 39 total spills in shared programs: 4546 -> 4546 (0.00%) spills in affected programs: 288 -> 288 (0.00%) helped: 1 / HURT: 2 total fills in shared programs: 5224 -> 5280 (1.07%) fills in affected programs: 795 -> 851 (7.04%) helped: 0 / HURT: 4 LOST: 1 GAINED: 1 fossil-db: Lunar Lake Totals: Instrs: 141811551 -> 141807640 (-0.00%); split: -0.00%, +0.00% Cycle count: 22183128332 -> 22181285594 (-0.01%); split: -0.06%, +0.05% Spill count: 69890 -> 69859 (-0.04%); split: -0.09%, +0.04% Fill count: 128877 -> 128344 (-0.41%); split: -0.42%, +0.00% Max live registers: 48053415 -> 48051613 (-0.00%); split: -0.00%, +0.00% Totals from 6817 (1.24% of 551443) affected shaders: Instrs: 4300169 -> 4296258 (-0.09%); split: -0.14%, +0.05% Cycle count: 17263755610 -> 17261912872 (-0.01%); split: -0.08%, +0.07% Spill count: 41822 -> 41791 (-0.07%); split: -0.15%, +0.07% Fill count: 75523 -> 74990 (-0.71%); split: -0.71%, +0.01% Max live registers: 733647 -> 731845 (-0.25%); split: -0.29%, +0.04% Meteor Lake and all older Intel platforms had similar results. 
(Meteor Lake shown) Totals: Instrs: 152735305 -> 152735801 (+0.00%); split: -0.00%, +0.00% Subgroup size: 7733536 -> 7733616 (+0.00%) Cycle count: 17398725539 -> 17400873100 (+0.01%); split: -0.00%, +0.02% Max live registers: 31887018 -> 31885742 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 5561696 -> 5561712 (+0.00%) Totals from 5672 (0.90% of 633314) affected shaders: Instrs: 2817606 -> 2818102 (+0.02%); split: -0.05%, +0.07% Subgroup size: 81128 -> 81208 (+0.10%) Cycle count: 10021470543 -> 10023618104 (+0.02%); split: -0.01%, +0.03% Max live registers: 306520 -> 305244 (-0.42%); split: -0.43%, +0.01% Max dispatch width: 74136 -> 74152 (+0.02%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:59 -08:00
Ian Romanick	7eab2cb67e	brw/nir: Treat load_workgroup_id as convergent v2: Fix for Xe2. shader-db: Lunar Lake, Meteor Lake, DG2, and Tiger Lake had similar results. (Lunar Lake shown) total instructions in shared programs: 18096526 -> 18096500 (<.01%) instructions in affected programs: 6759 -> 6733 (-0.38%) helped: 9 / HURT: 3 total cycles in shared programs: 921727804 -> 921841300 (0.01%) cycles in affected programs: 110049730 -> 110163226 (0.10%) helped: 90 / HURT: 372 Ice Lake and Skylake had similar results. (Ice Lake shown) total instructions in shared programs: 20496591 -> 20496402 (<.01%) instructions in affected programs: 48757 -> 48568 (-0.39%) helped: 25 / HURT: 8 total cycles in shared programs: 875253948 -> 875237902 (<.01%) cycles in affected programs: 56760140 -> 56744094 (-0.03%) helped: 363 / HURT: 34 total spills in shared programs: 4555 -> 4546 (-0.20%) spills in affected programs: 174 -> 165 (-5.17%) helped: 2 / HURT: 0 total fills in shared programs: 5243 -> 5224 (-0.36%) fills in affected programs: 382 -> 363 (-4.97%) helped: 2 / HURT: 0 fossil-db: All Intel platforms had similar results. (Lunar Lake shown) Totals: Instrs: 141811577 -> 141811551 (-0.00%); split: -0.00%, +0.00% Cycle count: 22173792370 -> 22183128332 (+0.04%); split: -0.00%, +0.04% Max live registers: 48053498 -> 48053415 (-0.00%) Totals from 3911 (0.71% of 551443) affected shaders: Instrs: 2164804 -> 2164778 (-0.00%); split: -0.00%, +0.00% Cycle count: 2404062476 -> 2413398438 (+0.39%); split: -0.02%, +0.41% Max live registers: 413583 -> 413500 (-0.02%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:59 -08:00
Ian Romanick	6fab1b77c2	brw/nir: Treat some load_uniform as convergent No shader-db changes on any Intel platform. v2: Fix for Xe2. v3: Rework the way that we determine that an intrinsic can actually be convergent. This will now depend on whether or not the important sources have previously been determined to be convergent. Fixes intermittent failures in some test cases (including dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.push_constant_float_16_to_32.scalar_frag). v4: s/the it/it/ in a comment. Noticed by Ken. fossil-db: No fossil-db changes on Lunar Lake. Meteor Lake and DG2 had similar results. (Meteor Lake shown) Totals: Instrs: 152743449 -> 152743161 (-0.00%) Cycle count: 17399179660 -> 17399193488 (+0.00%) Totals from 144 (0.02% of 633314) affected shaders: Instrs: 5936 -> 5648 (-4.85%) Cycle count: 51616 -> 65444 (+26.79%) Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown) Totals: Instrs: 150646195 -> 150645907 (-0.00%) Cycle count: 15618427818 -> 15618428942 (+0.00%) Totals from 144 (0.02% of 632567) affected shaders: Instrs: 6218 -> 5930 (-4.63%) Cycle count: 39968 -> 41092 (+2.81%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:59 -08:00
Ian Romanick	341e5117ec	brw/nir: Treat load_const as convergent opt_combine_constants goes to great effort to pack 8 constants into a single register, this can't have much effect. There is a lot of fossil-db variation among platforms, but the results are generally positive. v2: Fix for Xe2. shader-db: Lunar Lake total instructions in shared programs: 18095100 -> 18092845 (-0.01%) instructions in affected programs: 158931 -> 156676 (-1.42%) helped: 423 / HURT: 0 total cycles in shared programs: 921523326 -> 921522784 (<.01%) cycles in affected programs: 7522774 -> 7522232 (<.01%) helped: 225 / HURT: 228 LOST: 1 GAINED: 7 Meteor Lake and all older Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19820211 -> 19820303 (<.01%) instructions in affected programs: 53087 -> 53179 (0.17%) helped: 135 / HURT: 1 total cycles in shared programs: 906380523 -> 906383031 (<.01%) cycles in affected programs: 1402315 -> 1404823 (0.18%) helped: 156 / HURT: 100 LOST: 1 GAINED: 16 fossil-db: Lunar Lake Totals: Instrs: 141876801 -> 141783010 (-0.07%); split: -0.07%, +0.00% Subgroup size: 10994624 -> 10994704 (+0.00%) Cycle count: 22173441950 -> 22172949188 (-0.00%); split: -0.01%, +0.01% Spill count: 69850 -> 69890 (+0.06%); split: -0.00%, +0.06% Fill count: 129285 -> 128877 (-0.32%) Max live registers: 48047900 -> 48043650 (-0.01%); split: -0.01%, +0.00% Totals from 29837 (5.41% of 551396) affected shaders: Instrs: 7842512 -> 7748721 (-1.20%); split: -1.23%, +0.03% Subgroup size: 940320 -> 940400 (+0.01%) Cycle count: 3444846368 -> 3444353606 (-0.01%); split: -0.09%, +0.08% Spill count: 23358 -> 23398 (+0.17%); split: -0.01%, +0.18% Fill count: 52296 -> 51888 (-0.78%) Max live registers: 3183481 -> 3179231 (-0.13%); split: -0.16%, +0.03% Meteor Lake Totals: Instrs: 152709353 -> 152666543 (-0.03%); split: -0.03%, +0.00% Cycle count: 17397176906 -> 17397668904 (+0.00%); split: -0.00%, +0.01% Fill count: 147896 -> 147893 (-0.00%) Max live 
registers: 31862891 -> 31861888 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 5559664 -> 5561776 (+0.04%); split: +0.08%, -0.04% Totals from 20913 (3.30% of 633046) affected shaders: Instrs: 6676676 -> 6633866 (-0.64%); split: -0.64%, +0.00% Cycle count: 1498330125 -> 1498822123 (+0.03%); split: -0.06%, +0.09% Fill count: 41010 -> 41007 (-0.01%) Max live registers: 1799295 -> 1798292 (-0.06%); split: -0.06%, +0.00% Max dispatch width: 12880 -> 14992 (+16.40%); split: +33.29%, -16.89% DG2 and Tiger Lake had similar results. (DG2 shown) Totals: Instrs: 152730878 -> 152688139 (-0.03%); split: -0.03%, +0.00% Cycle count: 17394835605 -> 17394179808 (-0.00%); split: -0.01%, +0.00% Max live registers: 31862843 -> 31861840 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 5559664 -> 5561776 (+0.04%); split: +0.08%, -0.04% Totals from 20912 (3.30% of 633046) affected shaders: Instrs: 6563021 -> 6520282 (-0.65%); split: -0.65%, +0.00% Cycle count: 1201999616 -> 1201343819 (-0.05%); split: -0.08%, +0.03% Max live registers: 1798392 -> 1797389 (-0.06%); split: -0.06%, +0.00% Max dispatch width: 12872 -> 14984 (+16.41%); split: +33.31%, -16.90% Ice Lake Totals: Instrs: 151914872 -> 151868108 (-0.03%) Cycle count: 15262958696 -> 15262665082 (-0.00%); split: -0.00%, +0.00% Max live registers: 32194225 -> 32193192 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 5650880 -> 5650608 (-0.00%); split: +0.02%, -0.03% Totals from 22192 (3.48% of 637223) affected shaders: Instrs: 6419739 -> 6372975 (-0.73%) Cycle count: 184733818 -> 184440204 (-0.16%); split: -0.36%, +0.20% Max live registers: 1989950 -> 1988917 (-0.05%); split: -0.05%, +0.00% Max dispatch width: 5744 -> 5472 (-4.74%); split: +23.40%, -28.13% Skylake Totals: Instrs: 141027379 -> 140811741 (-0.15%) Cycle count: 14817704293 -> 14817418611 (-0.00%); split: -0.01%, +0.01% Max live registers: 31628796 -> 31627791 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 5535176 -> 5539880 (+0.08%); split: +0.14%, -0.06% 
Totals from 22218 (3.53% of 628840) affected shaders: Instrs: 5944856 -> 5729218 (-3.63%) Cycle count: 182845101 -> 182559419 (-0.16%); split: -0.60%, +0.44% Max live registers: 1974576 -> 1973571 (-0.05%); split: -0.07%, +0.02% Max dispatch width: 16912 -> 21616 (+27.81%); split: +46.93%, -19.11% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:58 -08:00
Ian Romanick	5ea9ed4798	brw/nir: Prepare try_rebuild_source for scalar values Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:58 -08:00
Ian Romanick	d5d7ae22ae	brw/nir: Fix up handling of sources that might be convergent vectors Sources that are scalars (almost all sources) and convergent generally want <0,1,0> source stride. Sources that are vectors (e.g., texture coordinates, SSBO write data, etc.) and convergent want no extra strides applied. In nearly all cases LOAD_PAYLOAD lowering will do the right thing. v2: Use VEC in emit_pixel_interpolater_send. Suggested by Ken. v3: With the elimination of offset_to_component(), offset() may not convert an is_scalar source to have a zero stride. Explicitly do this in get_nir_src and prepare_alu_destination_and_sources. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:58 -08:00
Caio Oliveira	93dfe504f2	intel/brw: Add SHADER_OPCODE_READ_FROM_CHANNEL and LIVE_CHANNEL Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32412>	2024-12-14 11:38:14 -08:00
Paulo Zanoni	0dc2a5808e	brw: don't forget the base when emitting SHADER_OPCODE_MOV_RELOC_IMM The last argument seems to be used as brw_shader_reloc::delta (from brw_add_reloc), and we're unconditionally setting it to 0 here, while the other place where we handle nir_intrinsic_load_reloc_const_intel seems to be setting the base appropriately. I found this by inspection while debugging a bug related to this code, so I'm not aware of any workloads that get improved by this patch. Related patches: - `ecbec25e84` ("intel/nir: add reloc delta to load_reloc_const_intel intrinsic") - `99047451c9` ("intel/fs: add plumbing for embedded samplers") Fixes: `ecbec25e84` ("intel/nir: add reloc delta to load_reloc_const_intel intrinsic") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32531>	2024-12-09 15:45:49 +00:00
Sagar Ghuge	9afb0480c4	intel/compiler: Extend nir_intrinsic_load_topology_id_intel for xe3 Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32426>	2024-12-04 19:20:51 +00:00
Kenneth Graunke	01680a66a9	brw: Simplify choose_oword_block_size_dwords() Just calculate the block size using util_logbase2() - it's simpler. Also drop the name "oword" as this refers to legacy HDC messages, rather than the newer LSC "vector size" field. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>	2024-12-03 02:02:33 +00:00
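The log2-based block size calculation mentioned in the entry above might look roughly like the sketch below. This is not the Mesa implementation: `floor_log2()` is a hypothetical stand-in for `util_logbase2()`, and the function name, the clamp, and the `min_block`/`max_block` parameters are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for util_logbase2(): floor(log2(n)), with 0 treated as 1. */
static unsigned floor_log2(uint32_t n)
{
   unsigned log = 0;
   while (n > 1) {
      n >>= 1;
      log++;
   }
   return log;
}

/* Hypothetical helper: pick the largest power-of-two block (in dwords) that
 * does not exceed `dwords`, then clamp it to an assumed [min_block, max_block]
 * range. The limits are illustrative, not taken from the hardware docs.
 */
static unsigned block_size_dwords(unsigned dwords, unsigned min_block,
                                  unsigned max_block)
{
   unsigned block = 1u << floor_log2(dwords);
   if (block < min_block)
      return min_block;
   if (block > max_block)
      return max_block;
   return block;
}
```

Under these assumptions, a request for 24 dwords with limits of 8 and 64 rounds down to a 16-dword block.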
Kenneth Graunke	e703ff5e02	brw: Only consider components read for UBO loads This will matter more with overfetching, where we may suggest loading additional data that we don't actually need for vectorization purposes. We want to make sure that push ranges have the data we actually need; any extra padding is irrelevant. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>	2024-12-03 02:02:33 +00:00
Kenneth Graunke	8c795af0b8	brw: Drop a few crocus references in comments crocus no longer uses brw. It uses elk. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>	2024-12-03 02:02:32 +00:00
Lionel Landwerlin	ba3ff8b3bb	brw: move barycentric_mode enum to intel_shader_enums.h Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32329>	2024-11-26 13:05:30 +00:00
Lionel Landwerlin	bfcb9bf276	brw: rename brw_sometimes to intel_sometimes Moving it to intel_shader_enums.h The plan is to make it visible to OpenCL shaders. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32329>	2024-11-26 13:05:30 +00:00
Caio Oliveira	8474dc853d	intel/brw: Add SHADER_OPCODE_QUAD_SWAP For the horizontal, vertical and diagonal variants. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31053>	2024-11-22 00:27:01 +00:00
Caio Oliveira	2bd7592b0b	intel/brw: Add SHADER_OPCODE_BALLOT Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31052>	2024-11-21 19:32:59 +00:00
Kenneth Graunke	5848035443	brw: Fix try_rebuild_source's ult32/ushr handling to use unsigned types We were accidentally doing a signed integer comparison here for ult32, or a sign-extending shift for ushr. One notable bit of fallout was that load_global_uniform_block_intel address calculations broke on platforms that don't have native 64-bit integer support, as the iadd64 lowering for "do I need to carry?" was using ult32...and performing the wrong comparison. We spotted this in Borderlands 3 on Alchemist once we turned on other optimizations. Thanks to Lionel Landwerlin for helping spot the problem! Fixes: `c7b312ad45` ("brw: factor out source extraction for rematerialization") Fixes: `339630ab05` ("brw: enable A64 loads source rematerialization") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31995>	2024-11-18 12:55:47 +00:00
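The carry problem described in the entry above can be reproduced outside the compiler. The sketch below is only an illustration: `add64_from_halves()` is a hypothetical helper, not Mesa's iadd64 lowering, and the signed path relies on the usual two's-complement conversion to `int32_t`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: add two 64-bit values supplied as 32-bit halves,
 * the way a lowering for hardware without native 64-bit integers would.
 * use_signed_compare selects the buggy variant of the carry test.
 */
static uint64_t add64_from_halves(uint32_t a_lo, uint32_t a_hi,
                                  uint32_t b_lo, uint32_t b_hi,
                                  bool use_signed_compare)
{
   uint32_t lo = a_lo + b_lo; /* wraps modulo 2^32 */
   bool carry;

   if (use_signed_compare) {
      /* Buggy: compares the bit patterns as signed values. Assumes the
       * common two's-complement conversion to int32_t.
       */
      carry = (int32_t)lo < (int32_t)a_lo;
   } else {
      /* Correct: the low half wrapped if and only if lo < a_lo, unsigned. */
      carry = lo < a_lo;
   }

   uint32_t hi = a_hi + b_hi + (carry ? 1u : 0u);
   return ((uint64_t)hi << 32) | lo;
}
```

With `a_lo = 0x7fffffff` and `b_lo = 1`, the unsigned test reports no carry, while the signed test sees the sum as negative and adds a spurious carry into the high half. With `a_lo = 0xffffffff` and `b_lo = 1`, the signed test misses a real carry.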
Ian Romanick	2a57568ebd	brw/build: Add scalar_group() helper Some uses of the old pattern still exist. The use in brw_fs_nir.cpp is deleted by commits !29884. The use in brw_lower_logical_sends.cpp seems different, so I decided to keep it. The next commit wants to use this. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>	2024-11-08 17:46:45 +00:00
Caio Oliveira	019770f026	intel/brw: Add SHADER_OPCODE_VOTE_* Add opcodes for VOTE_ALL, VOTE_ANY and VOTE_EQUAL. The first two are also used for the quad variants. Move their lowering from NIR conversion to brw_lower_subgroup_ops. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31029>	2024-10-19 02:44:20 +00:00
Caio Oliveira	d97381efd8	intel/brw: Add fs_builder::BROADCAST() helper The helper takes care of using exec_all() and taking the first component of the result. Both are expected by SHADER_OPCODE_BROADCAST. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31029>	2024-10-19 02:44:20 +00:00
Lionel Landwerlin	97b17aa0b1	brw/nir: rework inline_data_intel to work with compute This intrinsic was initially dedicated to mesh/task shaders, but the mechanism it exposes also exists in the compute shaders on Gfx12.5+. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>	2024-10-17 19:35:59 +00:00
Lionel Landwerlin	b2c5ca0ade	brw: remove rebuild single element special case No shader-db difference on DG2. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>	2024-10-17 19:35:59 +00:00
Lionel Landwerlin	19eb601cfc	brw: avoid clashing nested loop indices Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>	2024-10-17 19:35:59 +00:00
Kenneth Graunke	dea61b7399	intel/brw: Fix register and builder size in emit_barrier() for Xe2 We were manually allocating 1 REG_SIZE for the barrier payload, which is only half a register on Xe2. This should eventually get allocated to a whole register anyway, but it's awkward in the meantime. Also, we were zero-initializing the header using group(8, 0) which only initialized half the register. The rest of the fields are Reserved MBZ, so they're likely unused and unread anyway - but it's better to zero-initialize them so we don't get random undefined, miserable-to-debug behavior. Backport-to: 24.2 Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31499>	2024-10-15 18:14:37 +00:00
Kenneth Graunke	7c9eb8b289	intel/brw: Make a ubld temporary in emit_barrier() Saves typing .exec_all() in a lot of places. Backport-to: 24.2 Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31499>	2024-10-15 18:14:37 +00:00
Kenneth Graunke	a9d9488788	intel/brw: Delete Gfx7-8 code from emit_barrier() Those are supported by elk, not brw. Backport-to: 24.2 Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31499>	2024-10-15 18:14:37 +00:00
Kenneth Graunke	c747c1e1f4	intel/brw: Fix spill/fill count for load/store_scratch in SIMD32 Honestly, I don't know what I was thinking - we are emitting a single spill/fill message here, but were counting it as 2 spill/fills in SIMD32 shaders. So our eventual shader stat reporting would subtract the number of spills and fills from send_count, and get a negative number, wrapping around to just shy of UINT32_MAX. That's way too many sends. This is especially noticeable on Xe2 which often uses SIMD32 shaders. Backport-to: 24.2 Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31499>	2024-10-15 18:14:37 +00:00
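The wraparound described in the entry above is ordinary unsigned arithmetic. A minimal sketch, assuming the statistic is computed by plain subtraction on `uint32_t` counters; `non_spill_sends()` is a hypothetical name, not a function from the driver:

```c
#include <stdint.h>

/* Hypothetical helper: number of send messages that are not spills or fills.
 * All counters are unsigned, so an over-counted spill/fill total wraps
 * around instead of going negative.
 */
static uint32_t non_spill_sends(uint32_t send_count, uint32_t spills,
                                uint32_t fills)
{
   return send_count - spills - fills;
}
```

For a shader with three sends, one of them a spill and one a fill, the correct counts give `non_spill_sends(3, 1, 1) == 1`. Counting each message twice gives `non_spill_sends(3, 2, 2)`, which wraps to `UINT32_MAX`.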
Caio Oliveira	0ba1159b0a	intel/brw: Add SHADER_OPCODE_*_SCAN Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>	2024-10-11 06:40:29 +00:00
Caio Oliveira	9537b62759	intel/brw: Add SHADER_OPCODE_REDUCE Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>	2024-10-11 06:40:29 +00:00
Caio Oliveira	affa7567c2	intel/brw: Add phases to backend The general idea is to be able to validate that certain instructions were lowered and certain restrictions were already handled. Passes can now assert their expectations, i.e. if a pass is meant to run after certain lowerings or not. The actual phases are an initial stab and as we re-organize the passes, we may remove/add phases. This commit just adds some phase steps, later commits will make use of them. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>	2024-10-11 06:40:29 +00:00
Sviatoslav Peleshko	57344052b6	intel/brw: Don't apply discard_if condition opt if it can change results We can't just always negate the alu instruction's cmod, because negating it can produce different results when the argument is NaN float. We can still do that if the condition is == or !=. Fixes: `0ba9497e` ("intel/fs: Improve discard_if code generation") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11800 Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31042>	2024-09-27 11:52:27 +00:00
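The NaN issue in the entry above can be checked with plain C floats, which follow the same IEEE 754 comparison rules. The sketch below is an illustration only; the function names are hypothetical and unrelated to the brw conditional-modifier code.

```c
#include <math.h>
#include <stdbool.h>

/* Logical negation of "a < b": true whenever the comparison fails,
 * including when either operand is NaN.
 */
static bool not_lt(float a, float b) { return !(a < b); }

/* The "negated" comparison an optimizer might substitute for !(a < b).
 * Any ordered comparison involving NaN is false, so this differs for NaN.
 */
static bool ge(float a, float b) { return a >= b; }

/* For equality the two forms always agree, even with NaN operands. */
static bool not_eq(float a, float b) { return !(a == b); }
static bool ne(float a, float b) { return a != b; }
```

With `a = NAN` and `b = 0.0f`, `not_lt()` returns true while `ge()` returns false, so swapping one for the other changes the result. `not_eq()` and `ne()` both return true, which is why the == and != cases remain safe to negate.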
Lionel Landwerlin	eeb5f6e8c8	brw: make sampler message emission more generic We can generalize the simd8-16bits case by just rounding to a physical register. We also take the opportunity to limit the register allocation to a single physical GRF for the residency data. Signed-off-by: Lionel Landwerlin <llandwerlin@gmail.com> Fixes: `0116430d39` ("intel/brw: Handle 16-bit sampler return payloads") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31307>	2024-09-25 10:22:40 +00:00
Lionel Landwerlin	45377dc5c4	brw: fix vecN rebuilds When loading a 64bit address from the push constants, we'll load a vec2, so we need to allocate 2 GRFs and MOV each component. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11831 Fixes: `339630ab05` ("brw: enable A64 loads source rematerialization") Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31010>	2024-09-17 14:22:23 +00:00
Lionel Landwerlin	c16b27f66f	brw: use a builder of the size of the physical register for uniforms Should avoid any partial write nonsense on Xe2+. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `339630ab05` ("brw: enable A64 loads source rematerialization") Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31010>	2024-09-17 14:22:23 +00:00
Kenneth Graunke	7090578c35	intel/brw: Switch load_ubo_uniform_block_intel over to memory intrinsics While there are many cases that turn into the *_PULL_CONSTANT_LOAD ops or push constants, this one piece was emitting surface block loads. Switch it over to use the new intrinsics to delete a bunch of code. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Acked-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>	2024-09-12 20:54:36 +00:00
Kenneth Graunke	b55f77161d	intel/brw: Switch to emitting MEMORY_*_LOGICAL opcodes We introduce a new fs_nir_emit_memory_access() helper that can handle image, bindless image, SSBO, shared, global, and scratch memory, and handles loads, stores, atomics, and block loads. It translates each of these NIR intrinsics into the new MEMORY_*_LOGICAL intrinsics. As a result, we delete a lot of similar surface access emitter code. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Acked-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>	2024-09-12 20:54:36 +00:00

750 commits