fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-23 00:20:09 +01:00

Author	SHA1	Message	Date
Kenneth Graunke	a0b1e07976	brw: Make get_nir_src_imm() usable for non-32-bit-sizes. We return an immediate for 32-bit constant values, but fall back to calling get_nir_src() for other values, as 64-bit, and even 8-bit immediates have odd restrictions. We could probably support 16-bit here without too many issues, but we leave it be for now. This makes it usable for cases where we'd like to get constants for 32-bit values but where it may be a different bit-size too. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888>	2025-01-10 22:44:09 +00:00
Kenneth Graunke	03f948f5fd	brw: Skip fetching unread leading components of UBO loads We were already skipping unread trailing components, but now we skip them on both ends. About -3.5% spills on Shadow of the Tomb Raider on Alchemist (mostly a wash elsewhere, but it will help additional shaders with later patches). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888>	2025-01-10 22:44:09 +00:00
Kenneth Graunke	c8b2ab041e	brw: Add more safeguards against misaligned OWord Block messages HDC doesn't support block loads/stores with sub-DWord (<4B) aligned offsets, and shared local memory has to use the Aligned OWord Block messages which require OWord (16B) alignment. Make the validator detect this case and say no. Also make the lowering code assert that the alignment is valid as a second line of defense. LSC has no such restrictions. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888>	2025-01-10 22:44:09 +00:00
Caio Oliveira	7fadd864dd	intel/elk: Fix typo in assertion Just assert that the array will fit whatever the MAX is for a given Gfx version. Fixes: `172c1ab984` ("intel/elk: Add ELK_MAX_MRF_ALL for static allocating arrays") Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32978>	2025-01-10 20:16:59 +00:00
Caio Oliveira	c9e667b7ad	intel/elk: Remove uses of VLAs Was causing trouble in some build configurations, we don't really need them. Unless there's a good reason, default to using ralloc for consistency with the larger codebase. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Antonio Ospite <None> Reviewed-by: Kenneth Graunke <None> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32916>	2025-01-10 07:05:35 +00:00
Caio Oliveira	172c1ab984	intel/elk: Add ELK_MAX_MRF_ALL for static allocating arrays Replace usage of variable length arrays. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Antonio Ospite <None> Reviewed-by: Kenneth Graunke <None> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32916>	2025-01-10 07:05:35 +00:00
Caio Oliveira	4d43ee0dd6	intel/brw: Remove uses of VLAs Was causing trouble in some build configurations, we don't really need them. Use ralloc for consistency. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Antonio Ospite <None> Reviewed-by: Kenneth Graunke <None> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32916>	2025-01-10 07:05:35 +00:00
Caio Oliveira	faf4c35b74	intel/compiler: Use linear allocator for ACP trees in copy-prop Replace usage of variable length array. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Antonio Ospite <None> Reviewed-by: Kenneth Graunke <None> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32916>	2025-01-10 07:05:35 +00:00
Caio Oliveira	e6a3770433	intel/compiler: Use INFINITY spill cost to represent no_spill Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Antonio Ospite <None> Reviewed-by: Kenneth Graunke <None> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32916>	2025-01-10 07:05:35 +00:00
Vinson Lee	83809f06a7	intel/elk: Fix assert with side effect Fix defect reported by Coverity Scan. Side effect in assertion (ASSERT_SIDE_EFFECT) assert_side_effect: Argument ++eot_count of assert() has a side effect. The containing function might work differently in a non-debug build. Fixes: `ebd6738260` ("intel/elk/chv: Implement WaClearArfDependenciesBeforeEot") Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32884>	2025-01-09 04:07:42 +00:00
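The defect class fixed in 83809f06a7 can be sketched in a few lines of C. This is a minimal illustration, not the actual elk code: `count_eot` and its `is_eot` flag array are hypothetical names, and only `eot_count` comes from the Coverity report quoted in the commit message. The point is that the increment is moved out of `assert()`, so it still happens when `NDEBUG` compiles the assertion away.

```c
#include <assert.h>
#include <stdbool.h>

/* Problematic pattern (shown only as a comment):
 *
 *     assert(++eot_count == 1);
 *
 * With NDEBUG defined, assert() expands to nothing, so the increment
 * is lost and release builds behave differently from debug builds.
 */

/* Hypothetical helper: counts EOT markers in an array of flags. */
static int count_eot(const bool *is_eot, int n)
{
    int eot_count = 0;

    for (int i = 0; i < n; i++) {
        if (is_eot[i]) {
            eot_count++;             /* side effect happens unconditionally */
            assert(eot_count == 1);  /* assertion only reads the value */
        }
    }

    return eot_count;
}
```

With this shape, `count_eot` returns the same value in debug and release builds; the assertion only adds a check in debug builds.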
Kenneth Graunke	35f175301d	brw: Fix vectorizer hole_size condition after signedness change Marek recently changed hole_size to be signed, rather than unsigned. A negative hole_size means that the two loads overlap - and thus are prime candidates to be combined. My original hole_size handling was: if hole_size > 4 * (8 - low->num_components) then don't vectorize For non-overlapping loads, this worked: NIR's largest vector is vec16, and if low was already a vec16, combining it with anything would exceed that, so it'd never be considered. That meant low would always be a vec8 or less, so (8 - low->num_components) was a positive number. Now that we see overlapping loads, we can see a vec16 low, vec4 high, and also a negative hole size, giving us fun comparisons like: -16 > 4 * (8 - 16) => -16 > -32 => true, don't vectorize Which is absolutely the wrong thing to do, because the high load's data is entirely included within the former load's data. The idea here was to make sure the second load would be able to pack at least one component into the first's V8 result. But even this isn't the best, because...even if it's simply adjacent, doing one V16 load is more efficient than requesting two back to back V8 loads. So, we just simplify down to a static check: if there's an entire V8 of hole, don't vectorize. This already won't happen because the core pass has max_hole set to 28 bytes (7 32-bit components), but that could change based on the needs of other drivers, so let's be defensive. 
fossil-db results on Alchemist: Instrs: 161533978 -> 161295137 (-0.15%); split: -0.20%, +0.05% Subgroup size: 8092544 -> 8092568 (+0.00%) Send messages: 7915233 -> 7844503 (-0.89%); split: -0.94%, +0.05% Cycle count: 16577700697 -> 16702609256 (+0.75%); split: -0.59%, +1.35% Spill count: 72338 -> 67226 (-7.07%); split: -7.36%, +0.29% Fill count: 134058 -> 125980 (-6.03%); split: -6.83%, +0.80% Scratch Memory Size: 4092928 -> 3786752 (-7.48%); split: -7.53%, +0.05% Max live registers: 33031460 -> 32945994 (-0.26%); split: -0.27%, +0.01% Max dispatch width: 5778384 -> 5778536 (+0.00%); split: +0.26%, -0.26% Non SSA regs after NIR: 179809505 -> 152735471 (-15.06%); split: -15.08%, +0.03% Fixes: `c21bc65ba7` ("nir/opt_load_store_vectorize: make hole_size signed to indicate overlapping loads") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32932>	2025-01-08 00:19:54 +00:00
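The arithmetic in 35f175301d can be checked with a small C sketch. The function names are hypothetical, and the exact form of the new comparison (`>=` against 8 * 4 bytes) is an assumption based on the phrase "an entire V8 of hole"; the real callback in the driver may be written differently.

```c
#include <stdbool.h>

/* Old heuristic as described in the commit message:
 * reject when hole_size > 4 * (8 - low_components).
 * Once hole_size can be negative (overlapping loads) and low can be
 * a vec16, the right-hand side goes negative too.
 */
static bool old_rejects(int hole_size, int low_components)
{
    return hole_size > 4 * (8 - low_components);
}

/* Simplified static check (assumed form): reject only when the hole
 * covers an entire V8, i.e. eight 32-bit components = 32 bytes.
 */
static bool new_rejects(int hole_size)
{
    return hole_size >= 8 * 4;
}
```

For the case from the commit message (vec16 low, hole_size of -16), `old_rejects(-16, 16)` evaluates `-16 > -32` and wrongly refuses to vectorize, while `new_rejects(-16)` lets the overlapping loads be combined. A 28-byte hole, the core pass's current `max_hole`, is also accepted by the new check.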
Caio Oliveira	868016d92c	intel/brw/xe2+: Do not use $.dst or $.src SWSB annotations in SENDs When a SEND instruction is a EOT, the scoreboard lowering will not allocate a new SBID for it, since nothing needs to wait for it. In Gfx12 this allowed the SEND to get out-of-order $.dst or $.src dependencies. Starting on Xe2+ this is not supported anymore, in favor of supporting more combined modes. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32712>	2025-01-07 22:23:59 +00:00
Tapani Pälli	1cc17e9ce9	intel/compiler: take reg_unit size into account with ubo ranges Fixes: `1ab4fe2dd6` ("brw: Don't shrink UBO push ranges in the backend") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12423 Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32925>	2025-01-07 21:38:06 +00:00
Kenneth Graunke	4ab04799ee	brw: Delete assign_constant_locations and push_constant_loc[] The push_constant_loc[] array is always an identity mapping these days, so it's kind of pointless. Just use the original uniform number and skip the unnecessary "remap" step. With that gone, and shrinking UBO ranges gone, assign_constant_locations() is now empty and can be removed as well. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32841>	2025-01-06 12:45:47 +00:00
Kenneth Graunke	93e186e1a4	brw: Delete pull constant lowering Now that we never shrink ranges in the backend, we never lower push constants to pull constants late in the backend either. get_pull_loc will never return true, and so all of brw_lower_constant_loads becomes a noop. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32841>	2025-01-06 12:45:47 +00:00
Kenneth Graunke	1ab4fe2dd6	brw: Don't shrink UBO push ranges in the backend Back in the bad old days (vec4?) we had a bunch of smarts in the backend to dead code eliminate unused vector components and re-pack regular uniforms, so we really couldn't decide how much data we were pushing until very late in the backend. Nowadays we have none of that - we do all of our elimination and packing in NIR. anv shrinks ranges to deal with Vulkan API push constants, and iris treats everything as a UBO and as of the previous commit will also shrink appropriately. So we don't need to do this anymore...which will let us simplify quite a bit of code. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32841>	2025-01-06 12:45:47 +00:00
Kenneth Graunke	583ad35455	brw: Limit maximum push UBO ranges to 64 registers in the NIR pass. anv already does this limiting, since it needs to handle non-UBO push constants as well. iris treats everything as a UBO, but doesn't have a limiter and was relying on the backend to handle it. Do this in the NIR pass so that we can eliminate the backend code. It's not necessary for anv, but handling it here is simple and less error prone for iris, which calls this in a number of places. We know we need to limit things to this much; anv can limit more if needed. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32841>	2025-01-06 12:45:47 +00:00
Caio Oliveira	6968794c50	intel/brw: Add missing bits in 3-src SWSB encoding for Xe2+ Fix invalid SWSB annotation in dEQP-VK.glsl.builtin.precision.mix.mediump.vec4 for LNL. Fixes: `4a24f49b57` ("intel/compiler/xe2: Implement codegen of three-source instructions.") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32846>	2025-01-03 21:19:26 +00:00
Caio Oliveira	e1aebf8a0c	intel/brw: Remove 'fs' prefix from passes and related functions Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32813>	2025-01-02 18:11:05 +00:00
Caio Oliveira	25384dccc0	intel/brw: Remove 'fs' prefix from passes filenames Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32813>	2025-01-02 18:11:05 +00:00
Marek Olšák	c21bc65ba7	nir/opt_load_store_vectorize: make hole_size signed to indicate overlapping loads A negative hole size means the loads overlap. This will be used by drivers to handle overlapping loads in the callback easily. Reviewed-by: Mel Henning <drawoc@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32699>	2025-01-01 00:03:55 +00:00
Caio Oliveira	056b14b882	intel/brw: Move two NIR passes to brw_nir.c Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32799>	2024-12-30 20:18:23 +00:00
Caio Oliveira	1154b07d09	intel/brw: Add missing call to invalidate analysis Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32798>	2024-12-30 19:01:40 +00:00
Caio Oliveira	3ca6fa7487	intel/brw: Gather brw_reg related implementations in brw_reg.cpp Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32800>	2024-12-30 18:26:59 +00:00
Caio Oliveira	5860e07f92	intel/brw: Rename brw_compact_inst_* helpers to brw_eu_compact_inst_* Acked-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32643>	2024-12-30 17:16:15 +00:00
Caio Oliveira	228aba779f	intel/brw: Rename brw_inst_* helpers to brw_eu_inst_* Acked-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32643>	2024-12-30 17:16:15 +00:00
Caio Oliveira	3031b22a8a	intel/brw: Rename brw_inst_bits/set_bits to brw_eu_inst_bits/set_bits Acked-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32643>	2024-12-30 17:16:15 +00:00
Caio Oliveira	06ccaad5f1	intel/brw: Rename brw_compact_inst to brw_eu_compact_inst Consistent with brw_eu_inst. Acked-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32643>	2024-12-30 17:16:15 +00:00
Caio Oliveira	3c3f4a1235	intel/brw: Rename brw_inst to brw_eu_inst Free the old name for the BRW IR instruction. Acked-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32643>	2024-12-30 17:16:15 +00:00
Caio Oliveira	9caa845e0f	intel/brw: Rename brw_inst.h to brw_eu_inst.h Acked-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32643>	2024-12-30 17:16:15 +00:00
Ian Romanick	0f3a350087	brw/nir: Don't generate scalar byte to float conversions on DG2+ in optimize_extract_to_float The lowering code does not generate efficient code. It is better to just not emit the bad thing in the first place. The shaders that I examined had blocks of NIR like: con 32 %527 = extract_u8 %456.o, %5 (0x0) con 32 %528 = extract_u8 %456.o, %35 (0x1) con 32 %529 = extract_u8 %456.o, %14 (0x2) con 32 %530 = extract_u8 %456.o, %11 (0x3) con 32 %531 = u2f32 %527 con 32 %532 = u2f32 %528 con 32 %533 = u2f32 %529 con 32 %534 = u2f32 %530 In some cases the u2f results are multiplied with 1/255. There may be a slightly more efficient way to do this by doing something like mov(8) g40<1>UW g12.1<32,8,4>UB mov(8) g41<1>UW g12.2<32,8,4>UB mov(8) g42<1>UW g12.3<32,8,4>UB mov(8) g60<1>F g12<32,8,4>UB mov(8) g61<1>F g40<1,1,0>UW mov(8) g62<1>F g41<1,1,0>UW mov(8) g63<1>F g42<1,1,0>UW In SIMD16 and SIMD32 that would save temporary register space. It could save a register in SIMD8 by using g40.8 instead of g42. Making that happen might be tricky. Maybe we should just add a special NIR opcode that converts a packed uint32 to a vec4? v2: Add a bunch of documentation explaining what's going on. Suggested by Ken. shader-db: Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown) total instructions in shared programs: 18228689 -> 18228720 (<.01%) instructions in affected programs: 43091 -> 43122 (0.07%) helped: 0 / HURT: 30 total cycles in shared programs: 932542994 -> 932544290 (<.01%) cycles in affected programs: 8150758 -> 8152054 (0.02%) helped: 15 / HURT: 17 fossil-db: Lunar Lake, Meteor Lake, and DG2 had similar results. 
(Lunar Lake shown) Totals: Instrs: 142890605 -> 142890392 (-0.00%); split: -0.00%, +0.00% Cycle count: 21655049536 -> 21654693720 (-0.00%); split: -0.00%, +0.00% Totals from 181 (0.03% of 553251) affected shaders: Instrs: 188022 -> 187809 (-0.11%); split: -0.12%, +0.01% Cycle count: 85291658 -> 84935842 (-0.42%); split: -0.47%, +0.05% Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown) Totals: Instrs: 154438050 -> 154436980 (-0.00%) Cycle count: 15334650326 -> 15334644375 (-0.00%); split: -0.00%, +0.00% Spill count: 56754 -> 56706 (-0.08%) Fill count: 95919 -> 95808 (-0.12%) Scratch Memory Size: 2306048 -> 2304000 (-0.09%) Max live registers: 32469924 -> 32469899 (-0.00%) Totals from 112 (0.02% of 642922) affected shaders: Instrs: 156186 -> 155116 (-0.69%) Cycle count: 11111478 -> 11105527 (-0.05%); split: -0.62%, +0.56% Spill count: 1766 -> 1718 (-2.72%) Fill count: 2815 -> 2704 (-3.94%) Scratch Memory Size: 78848 -> 76800 (-2.60%) Max live registers: 11526 -> 11501 (-0.22%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:59 -08:00
Ian Romanick	1a7593ed36	brw/nir: Treat some ballot as convergent v2: Fix for Xe2. v3: Add a comment explaining the use of bld instead of xbld. Suggested by Ken. Fix a bug in handling is_scalar source. Noticed by me while applying Ken's review feedback. shader-db: Lunar Lake, Meteor Lake, DG2, and Tiger Lake had similar results. (Lunar Lake shown) total instructions in shared programs: 18228657 -> 18228689 (<.01%) instructions in affected programs: 9333 -> 9365 (0.34%) helped: 2 / HURT: 26 total cycles in shared programs: 932511560 -> 932542994 (<.01%) cycles in affected programs: 2263040 -> 2294474 (1.39%) helped: 7 / HURT: 27 Ice Lake and Skylake had similar results. (Ice Lake shown) total instructions in shared programs: 20700370 -> 20700392 (<.01%) instructions in affected programs: 18579 -> 18601 (0.12%) helped: 1 / HURT: 28 total cycles in shared programs: 888385851 -> 888386325 (<.01%) cycles in affected programs: 2571368 -> 2571842 (0.02%) helped: 14 / HURT: 6 total spills in shared programs: 4373 -> 4371 (-0.05%) spills in affected programs: 71 -> 69 (-2.82%) helped: 1 / HURT: 0 total fills in shared programs: 4657 -> 4653 (-0.09%) fills in affected programs: 196 -> 192 (-2.04%) helped: 1 / HURT: 0 fossil-db: Lunar Lake Totals: Instrs: 142887258 -> 142890605 (+0.00%); split: -0.00%, +0.00% Cycle count: 21653599282 -> 21655049536 (+0.01%); split: -0.00%, +0.01% Max live registers: 47942973 -> 47942837 (-0.00%) Totals from 22209 (4.01% of 553251) affected shaders: Instrs: 4337679 -> 4341026 (+0.08%); split: -0.00%, +0.08% Cycle count: 261852040 -> 263302294 (+0.55%); split: -0.38%, +0.93% Max live registers: 1299670 -> 1299534 (-0.01%) Meteor Lake, DG2, Tiger Lake, and Skylake had similar results. 
(Meteor Lake shown) Totals: Instrs: 156599915 -> 156590882 (-0.01%); split: -0.01%, +0.00% Cycle count: 16940072009 -> 16940902317 (+0.00%); split: -0.01%, +0.01% Max live registers: 32610801 -> 32610488 (-0.00%) Max dispatch width: 5730736 -> 5731744 (+0.02%); split: +0.12%, -0.11% Totals from 35528 (5.52% of 643617) affected shaders: Instrs: 6175409 -> 6166376 (-0.15%); split: -0.21%, +0.06% Cycle count: 230679923 -> 231510231 (+0.36%); split: -0.46%, +0.82% Max live registers: 1354716 -> 1354403 (-0.02%) Max dispatch width: 167648 -> 168656 (+0.60%); split: +4.26%, -3.66% Ice Lake Totals: Instrs: 155330276 -> 155318037 (-0.01%); split: -0.01%, +0.00% Cycle count: 15019092327 -> 15019637026 (+0.00%); split: -0.00%, +0.01% Max live registers: 32640341 -> 32637305 (-0.01%) Max dispatch width: 5780720 -> 5780688 (-0.00%); split: +0.02%, -0.02% Totals from 37773 (5.85% of 645641) affected shaders: Instrs: 6643030 -> 6630791 (-0.18%); split: -0.24%, +0.05% Cycle count: 223589025 -> 224133724 (+0.24%); split: -0.29%, +0.53% Max live registers: 1491781 -> 1488745 (-0.20%) Max dispatch width: 167600 -> 167568 (-0.02%); split: +0.75%, -0.77% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:59 -08:00
Ian Romanick	f2d2014636	brw/nir: Simplify get_nir_image_intrinsic_image and get_nir_buffer_intrinsic_index shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 20041625 -> 20041634 (<.01%) instructions in affected programs: 1206 -> 1215 (0.75%) helped: 0 / HURT: 5 total cycles in shared programs: 929993812 -> 929993816 (<.01%) cycles in affected programs: 10930 -> 10934 (0.04%) helped: 1 / HURT: 2 fossil-db: Lunar Lake Totals: Instrs: 142892951 -> 142893049 (+0.00%) Send messages: 6591165 -> 6591186 (+0.00%) Cycle count: 21653727624 -> 21653732470 (+0.00%); split: -0.00%, +0.00% Scratch Memory Size: 5664768 -> 5660672 (-0.07%) Max live registers: 47944999 -> 47944983 (-0.00%) Totals from 19 (0.00% of 553292) affected shaders: Instrs: 10671 -> 10769 (+0.92%) Send messages: 697 -> 718 (+3.01%) Cycle count: 234508 -> 239354 (+2.07%); split: -0.01%, +2.08% Scratch Memory Size: 38912 -> 34816 (-10.53%) Max live registers: 2203 -> 2187 (-0.73%) Meteor Lake and DG2 had similar results. (Meteor Lake shown) Totals: Instrs: 156744203 -> 156743428 (-0.00%); split: -0.00%, +0.00% Send messages: 7654787 -> 7654808 (+0.00%) Cycle count: 16942341318 -> 16942329195 (-0.00%); split: -0.00%, +0.00% Spill count: 75549 -> 75499 (-0.07%) Fill count: 140094 -> 140012 (-0.06%) Scratch Memory Size: 3945472 -> 3944448 (-0.03%) Max live registers: 32642020 -> 32642009 (-0.00%) Totals from 19 (0.00% of 644000) affected shaders: Instrs: 12489 -> 11714 (-6.21%); split: -7.00%, +0.79% Send messages: 697 -> 718 (+3.01%) Cycle count: 203873 -> 191750 (-5.95%); split: -6.77%, +0.82% Spill count: 50 -> 0 (-inf%) Fill count: 82 -> 0 (-inf%) Scratch Memory Size: 25600 -> 24576 (-4.00%) Max live registers: 1150 -> 1139 (-0.96%) No fossil-db changes on Tiger Lake, Ice Lake, or Skylake. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:59 -08:00
Ian Romanick	9a967c5ec4	brw/nir: Don't try to optimize around emit_uniformize Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:59 -08:00
Ian Romanick	63e395fa87	brw/nir: Eliminate nir_to_brw_state::uniform_values No shader-db changes on any Intel platform. No fossil-db changes on Tiger Lake, Ice Lake, or Skylake. fossil-db: Lunar Lake Totals: Cycle count: 21653230858 -> 21653230518 (-0.00%); split: -0.00%, +0.00% Max live registers: 47941741 -> 47941737 (-0.00%) Totals from 17 (0.00% of 553202) affected shaders: Cycle count: 201232 -> 200892 (-0.17%); split: -0.19%, +0.02% Max live registers: 1354 -> 1350 (-0.30%) Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown) Totals: Instrs: 156455123 -> 156453396 (-0.00%); split: -0.00%, +0.00% Cycle count: 16904545026 -> 16904393943 (-0.00%); split: -0.00%, +0.00% Max live registers: 32638039 -> 32638035 (-0.00%) Totals from 1201 (0.19% of 643905) affected shaders: Instrs: 509360 -> 507633 (-0.34%); split: -0.34%, +0.00% Cycle count: 1579931758 -> 1579780675 (-0.01%); split: -0.01%, +0.00% Max live registers: 59633 -> 59629 (-0.01%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:59 -08:00
Ian Romanick	a13244e57b	brw/nir: Treat some resource_intel as convergent No shader-db changes on any Intel platform. No fossil-db changes on Ice Lake or Skylake. fossil-db: Lunar Lake Totals: Cycle count: 21653232202 -> 21653230858 (-0.00%); split: -0.00%, +0.00% Totals from 4 (0.00% of 553202) affected shaders: Cycle count: 14276568 -> 14275224 (-0.01%); split: -0.01%, +0.00% Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown) Totals: Instrs: 156453398 -> 156455123 (+0.00%); split: -0.00%, +0.00% Cycle count: 16904394153 -> 16904545026 (+0.00%); split: -0.00%, +0.00% Totals from 1189 (0.18% of 643905) affected shaders: Instrs: 502891 -> 504616 (+0.34%); split: -0.00%, +0.34% Cycle count: 1579688485 -> 1579839358 (+0.01%); split: -0.00%, +0.01% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:59 -08:00
Ian Romanick	1b24612c57	brw/nir: Treat load_*_uniform_block_intel as convergent Between 5 and 10 shaders (depending on the platform) from Blender are massively helped for spills and fills (e.g., from 45 spills to 0, and 180 fills to 0). Previously this commit caused a lot of spill and fill damage to Wolfenstein Youngblood and Red Dead Redemption 2. I believe due to !32041 and !32097, this is no longer the case. RDR2 is helped, and Wolfenstein Youngblood has no changes. However, q2rtx/q2rtx-rt-pipeline is hurt: Spill count: 126 -> 175 (+38.89%); split: -0.79%, +39.68% Fill count: 156 -> 235 (+50.64%); split: -1.92%, +52.56% By the end of this series this damage is fixed, and q2rtx is helped overall by -0.79% spills and -1.92% fills. v2: Fix for Xe2. v3: Just keep using bld for the group(1, 0) call. Suggested by Ken. v4: Major re-write. Pass bld and xbld to fs_emit_memory_access. The big fix is changing the way srcs[MEMORY_LOGICAL_ADDRESS] is calculated (around line 7180). In previous versions of the commit, the address would be calculated using bld (which is now xbld) even if the address source was not is_scalar. This could cause the emit_uniformize (later in the function) to fetch garbage. This also drops the special case handling of constant offset. Constant propagation and algebraic will handle this. v5: Fix a subtle bug that was ultimately caused by the removal of offset_to_component. The MEMORY_LOGICAL_ADDRESS for load_shared_uniform_block_intel was being calculated as SIMD16 on LNL, but the later emit_uniformize would treat it as SIMD32. This caused GPU hangs in Assassin's Creed Valhalla. v6: Fix a bug in D16 to D16U32 expansion. Noticed by Ken. Add a comment explaining bld vs xbld vs ubld in fs_nir_emit_memory_access. Suggested by Ken. v7: Revert some of the v6 changes related to D16 to D16U32 expansion. This code was mostly correct. xbld is correct because DATA0 needs to be generated in size of the eventual SEND instruction. 
Using offset(nir_src, xbld, c) will cause offset() to correctly add component(..., 0) if nir_src.is_scalar but xbld is not scalar_group(). v8: nir_intrinsic_load_shared_uniform_block_intel was removed. This caused reproducible hangs in Assassin's Creed: Valhalla. There are some other compiler issues related to this game, and we're not yet sure exactly what the cause of any of it is. shader-db: Lunar Lake total instructions in shared programs: 18058270 -> 18068886 (0.06%) instructions in affected programs: 5196846 -> 5207462 (0.20%) helped: 4442 / HURT: 11416 total cycles in shared programs: 921324492 -> 919819398 (-0.16%) cycles in affected programs: 733274162 -> 731769068 (-0.21%) helped: 11312 / HURT: 31788 total spills in shared programs: 3633 -> 3585 (-1.32%) spills in affected programs: 48 -> 0 helped: 5 / HURT: 0 total fills in shared programs: 2277 -> 2198 (-3.47%) fills in affected programs: 79 -> 0 helped: 5 / HURT: 0 LOST: 123 GAINED: 377 Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown) total instructions in shared programs: 19703458 -> 19699173 (-0.02%) instructions in affected programs: 5885251 -> 5880966 (-0.07%) helped: 4545 / HURT: 14971 total cycles in shared programs: 903497253 -> 902054570 (-0.16%) cycles in affected programs: 691762248 -> 690319565 (-0.21%) helped: 16412 / HURT: 28080 total spills in shared programs: 4894 -> 4646 (-5.07%) spills in affected programs: 248 -> 0 helped: 7 / HURT: 0 total fills in shared programs: 6638 -> 5581 (-15.92%) fills in affected programs: 1057 -> 0 helped: 7 / HURT: 0 LOST: 427 GAINED: 978 Ice Lake and Skylake had similar results. 
(Ice Lake shown) total instructions in shared programs: 20384200 -> 20384889 (<.01%) instructions in affected programs: 5295084 -> 5295773 (0.01%) helped: 5309 / HURT: 12564 total cycles in shared programs: 873002832 -> 872515246 (-0.06%) cycles in affected programs: 463413458 -> 462925872 (-0.11%) helped: 16079 / HURT: 13339 total spills in shared programs: 4552 -> 4373 (-3.93%) spills in affected programs: 546 -> 367 (-32.78%) helped: 11 / HURT: 0 total fills in shared programs: 5298 -> 4657 (-12.10%) fills in affected programs: 1798 -> 1157 (-35.65%) helped: 10 / HURT: 0 LOST: 380 GAINED: 925 fossil-db: All Intel platforms had similar results. (Lunar Lake shown) Totals: Instrs: 141528822 -> 141728392 (+0.14%); split: -0.21%, +0.35% Subgroup size: 10968048 -> 10968144 (+0.00%) Send messages: 6567930 -> 6567909 (-0.00%) Cycle count: 22165780202 -> 21624534624 (-2.44%); split: -3.09%, +0.65% Spill count: 69890 -> 66665 (-4.61%); split: -5.06%, +0.44% Fill count: 128331 -> 120189 (-6.34%); split: -7.44%, +1.09% Scratch Memory Size: 5829632 -> 5664768 (-2.83%); split: -2.86%, +0.04% Max live registers: 47928290 -> 47611371 (-0.66%); split: -0.71%, +0.05% Totals from 364369 (66.18% of 550563) affected shaders: Instrs: 113448842 -> 113648412 (+0.18%); split: -0.26%, +0.44% Subgroup size: 7694080 -> 7694176 (+0.00%) Send messages: 5308287 -> 5308266 (-0.00%) Cycle count: 21885237842 -> 21343992264 (-2.47%); split: -3.13%, +0.65% Spill count: 65152 -> 61927 (-4.95%); split: -5.42%, +0.47% Fill count: 122811 -> 114669 (-6.63%); split: -7.77%, +1.14% Scratch Memory Size: 5438464 -> 5273600 (-3.03%); split: -3.07%, +0.04% Max live registers: 34355310 -> 34038391 (-0.92%); split: -1.00%, +0.07% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:59 -08:00
Ian Romanick	db2b1e4d76	brw/nir: Treat load_btd_{global,local}_arg_addr_intel and load_btd_shader_type_intel as convergent No shader-db changes on any Intel platform. No fossil-db changes on Tiger Lake, Ice Lake, or Skylake. fossil-db: Lunar Lake Totals: Instrs: 141808714 -> 141808513 (-0.00%); split: -0.00%, +0.00% Cycle count: 22177889310 -> 22181410192 (+0.02%); split: -0.00%, +0.02% Spill count: 69892 -> 69890 (-0.00%); split: -0.01%, +0.01% Fill count: 128313 -> 128331 (+0.01%) Max live registers: 48052083 -> 48052742 (+0.00%); split: -0.00%, +0.00% Totals from 549 (0.10% of 551446) affected shaders: Instrs: 911251 -> 911050 (-0.02%); split: -0.10%, +0.07% Cycle count: 1244153266 -> 1247674148 (+0.28%); split: -0.04%, +0.32% Spill count: 15849 -> 15847 (-0.01%); split: -0.04%, +0.03% Fill count: 35087 -> 35105 (+0.05%) Max live registers: 68047 -> 68706 (+0.97%); split: -0.25%, +1.22% Meteor Lake Totals: Instrs: 152744298 -> 152741241 (-0.00%); split: -0.00%, +0.00% Cycle count: 17410258529 -> 17405949054 (-0.02%); split: -0.04%, +0.01% Spill count: 78528 -> 78598 (+0.09%); split: -0.01%, +0.09% Fill count: 147893 -> 147978 (+0.06%); split: -0.00%, +0.06% Scratch Memory Size: 3962880 -> 3969024 (+0.16%) Max live registers: 31887206 -> 31887413 (+0.00%); split: -0.00%, +0.00% Totals from 552 (0.09% of 633315) affected shaders: Instrs: 907279 -> 904222 (-0.34%); split: -0.48%, +0.15% Cycle count: 1152358569 -> 1148049094 (-0.37%); split: -0.56%, +0.19% Spill count: 15290 -> 15360 (+0.46%); split: -0.03%, +0.48% Fill count: 35313 -> 35398 (+0.24%); split: -0.02%, +0.26% Scratch Memory Size: 1313792 -> 1319936 (+0.47%) Max live registers: 34218 -> 34425 (+0.60%); split: -0.47%, +1.08% DG2 Totals: Instrs: 152766492 -> 152763061 (-0.00%); split: -0.00%, +0.00% Cycle count: 17406058608 -> 17406396943 (+0.00%); split: -0.02%, +0.02% Spill count: 78626 -> 78624 (-0.00%); split: -0.01%, +0.01% Fill count: 147956 -> 148007 (+0.03%); split: -0.01%, +0.04% Scratch Memory 
Size: 3962880 -> 3969024 (+0.16%) Max live registers: 31887158 -> 31887365 (+0.00%); split: -0.00%, +0.00% Totals from 552 (0.09% of 633315) affected shaders: Instrs: 908513 -> 905082 (-0.38%); split: -0.47%, +0.09% Cycle count: 1148162185 -> 1148500520 (+0.03%); split: -0.23%, +0.26% Spill count: 15364 -> 15362 (-0.01%); split: -0.07%, +0.06% Fill count: 35343 -> 35394 (+0.14%); split: -0.03%, +0.17% Scratch Memory Size: 1313792 -> 1319936 (+0.47%) Max live registers: 34218 -> 34425 (+0.60%); split: -0.47%, +1.08% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:59 -08:00
Ian Romanick	f3593df877	brw/nir: Treat load_reloc_const_intel as convergent shader-db: Lunar Lake, Meteor Lake, DG2, and Tiger Lake had similar results. (Lunar Lake shown) Lunar Lake total instructions in shared programs: 18096549 -> 18096537 (<.01%) instructions in affected programs: 26128 -> 26116 (-0.05%) helped: 7 / HURT: 2 total cycles in shared programs: 922073090 -> 922093922 (<.01%) cycles in affected programs: 10574198 -> 10595030 (0.20%) helped: 19 / HURT: 76 Ice Lake and Skylake had similar results. (Ice Lake shown) total instructions in shared programs: 20503943 -> 20504053 (<.01%) instructions in affected programs: 23378 -> 23488 (0.47%) helped: 6 / HURT: 5 total cycles in shared programs: 875477036 -> 875480112 (<.01%) cycles in affected programs: 13840528 -> 13843604 (0.02%) helped: 22 / HURT: 55 total spills in shared programs: 4546 -> 4552 (0.13%) spills in affected programs: 8 -> 14 (75.00%) helped: 0 / HURT: 1 total fills in shared programs: 5280 -> 5298 (0.34%) fills in affected programs: 24 -> 42 (75.00%) helped: 0 / HURT: 1 One compute shader in Tomb Raider was hurt for spills and fills. fossil-db: Lunar Lake Totals: Instrs: 141808815 -> 141808714 (-0.00%); split: -0.00%, +0.00% Cycle count: 22185066952 -> 22177889310 (-0.03%); split: -0.05%, +0.02% Spill count: 69859 -> 69892 (+0.05%); split: -0.03%, +0.07% Fill count: 128344 -> 128313 (-0.02%); split: -0.04%, +0.01% Scratch Memory Size: 5833728 -> 5829632 (-0.07%) Totals from 13384 (2.43% of 551446) affected shaders: Instrs: 13852162 -> 13852061 (-0.00%); split: -0.00%, +0.00% Cycle count: 7691993336 -> 7684815694 (-0.09%); split: -0.15%, +0.06% Spill count: 53266 -> 53299 (+0.06%); split: -0.03%, +0.10% Fill count: 96492 -> 96461 (-0.03%); split: -0.05%, +0.02% Scratch Memory Size: 3827712 -> 3823616 (-0.11%) Meteor Lake and DG2 had similar results. 
(Meteor Lake shown) Totals: Instrs: 152744735 -> 152744298 (-0.00%); split: -0.00%, +0.00% Cycle count: 17400199290 -> 17410258529 (+0.06%); split: -0.01%, +0.07% Max live registers: 31887208 -> 31887206 (-0.00%) Totals from 12435 (1.96% of 633315) affected shaders: Instrs: 13445310 -> 13444873 (-0.00%); split: -0.00%, +0.00% Cycle count: 6941685096 -> 6951744335 (+0.14%); split: -0.03%, +0.18% Max live registers: 1071302 -> 1071300 (-0.00%) Tiger Lake and Ice Lake had similar results. (Tiger Lake shown) Totals: Instrs: 150644063 -> 150643944 (-0.00%); split: -0.00%, +0.00% Cycle count: 15618718733 -> 15622092285 (+0.02%); split: -0.01%, +0.03% Spill count: 58816 -> 58790 (-0.04%) Fill count: 101054 -> 101065 (+0.01%) Max live registers: 31792771 -> 31792766 (-0.00%); split: -0.00%, +0.00% Totals from 13383 (2.12% of 632544) affected shaders: Instrs: 12016285 -> 12016166 (-0.00%); split: -0.00%, +0.00% Cycle count: 5239956851 -> 5243330403 (+0.06%); split: -0.02%, +0.08% Spill count: 28977 -> 28951 (-0.09%) Fill count: 47568 -> 47579 (+0.02%) Max live registers: 1001554 -> 1001549 (-0.00%); split: -0.00%, +0.00% Skylake Totals: Instrs: 140943195 -> 140943154 (-0.00%); split: -0.00%, +0.00% Cycle count: 14818940190 -> 14816706154 (-0.02%); split: -0.02%, +0.00% Max live registers: 31663173 -> 31663168 (-0.00%); split: -0.00%, +0.00% Totals from 12625 (2.01% of 629351) affected shaders: Instrs: 11598223 -> 11598182 (-0.00%); split: -0.00%, +0.00% Cycle count: 4519027823 -> 4516793787 (-0.05%); split: -0.05%, +0.00% Max live registers: 970275 -> 970270 (-0.00%); split: -0.00%, +0.00% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:59 -08:00
Ian Romanick	fb9b363376	brw/nir: Treat load_inline_data_intel as convergent No shader-db changes on any Intel platform. fossil-db: Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown) Totals: Instrs: 141808595 -> 141808815 (+0.00%); split: -0.00%, +0.00% Cycle count: 22181300418 -> 22185066952 (+0.02%); split: -0.01%, +0.03% Max live registers: 48052077 -> 48052083 (+0.00%) Totals from 720 (0.13% of 551446) affected shaders: Instrs: 116778 -> 116998 (+0.19%); split: -0.01%, +0.20% Cycle count: 1197931082 -> 1201697616 (+0.31%); split: -0.21%, +0.53% Max live registers: 56552 -> 56558 (+0.01%) No fossil-db changes on any other Intel platform. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:59 -08:00
Ian Romanick	3e63920ca5	brw/nir: Treat some load_ubo as convergent v2: Fix for Xe2. No changes in shader-db or fossil-db on Lunar Lake, Meteor Lake, or DG2. shader-db: Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown) total instructions in shared programs: 19626547 -> 19634353 (0.04%) instructions in affected programs: 1591181 -> 1598987 (0.49%) helped: 925 / HURT: 3595 total cycles in shared programs: 865236718 -> 866682659 (0.17%) cycles in affected programs: 151284264 -> 152730205 (0.96%) helped: 3430 / HURT: 5510 total sends in shared programs: 1032237 -> 1032233 (<.01%) sends in affected programs: 20 -> 16 (-20.00%) helped: 4 / HURT: 0 LOST: 48 GAINED: 141 fossil-db: Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown) Totals: Instrs: 150662952 -> 150641175 (-0.01%); split: -0.03%, +0.02% Subgroup size: 7768880 -> 7768888 (+0.00%) Send messages: 7502265 -> 7502044 (-0.00%) Cycle count: 15621785298 -> 15618640525 (-0.02%); split: -0.06%, +0.04% Spill count: 58818 -> 58816 (-0.00%) Fill count: 101063 -> 101054 (-0.01%) Max live registers: 31795403 -> 31792179 (-0.01%); split: -0.01%, +0.00% Max dispatch width: 5572160 -> 5571488 (-0.01%); split: +0.00%, -0.01% Totals from 10278 (1.62% of 632539) affected shaders: Instrs: 5276493 -> 5254716 (-0.41%); split: -0.89%, +0.48% Subgroup size: 156432 -> 156440 (+0.01%) Send messages: 279259 -> 279038 (-0.08%) Cycle count: 6483576378 -> 6480431605 (-0.05%); split: -0.16%, +0.11% Spill count: 27133 -> 27131 (-0.01%) Fill count: 49384 -> 49375 (-0.02%) Max live registers: 675781 -> 672557 (-0.48%); split: -0.49%, +0.01% Max dispatch width: 97256 -> 96584 (-0.69%); split: +0.08%, -0.77% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:59 -08:00
Ian Romanick	c48570d2b2	brw/nir: Treat some ALU results as convergent v2: Fix for Xe2. v3: Fix handling of 64-bit CMP results. v4: Scalarize 16-bit comparison temporary destination when used as a source (as was already done for 64-bit). Suggested by Ken. shader-db: Lunar Lake total instructions in shared programs: 18096500 -> 18096549 (<.01%) instructions in affected programs: 15919 -> 15968 (0.31%) helped: 8 / HURT: 21 total cycles in shared programs: 921841300 -> 922073090 (0.03%) cycles in affected programs: 115946336 -> 116178126 (0.20%) helped: 386 / HURT: 135 Meteor Lake and DG2 (Meteor Lake shown) total instructions in shared programs: 19836053 -> 19836016 (<.01%) instructions in affected programs: 19547 -> 19510 (-0.19%) helped: 21 / HURT: 18 total cycles in shared programs: 906713777 -> 906588541 (-0.01%) cycles in affected programs: 96914584 -> 96789348 (-0.13%) helped: 335 / HURT: 134 total fills in shared programs: 6712 -> 6710 (-0.03%) fills in affected programs: 52 -> 50 (-3.85%) helped: 1 / HURT: 0 LOST: 1 GAINED: 1 Tiger Lake total instructions in shared programs: 19641284 -> 19641278 (<.01%) instructions in affected programs: 12358 -> 12352 (-0.05%) helped: 10 / HURT: 19 total cycles in shared programs: 865413131 -> 865460513 (<.01%) cycles in affected programs: 74641489 -> 74688871 (0.06%) helped: 388 / HURT: 100 total spills in shared programs: 3899 -> 3898 (-0.03%) spills in affected programs: 17 -> 16 (-5.88%) helped: 1 / HURT: 0 total fills in shared programs: 3249 -> 3245 (-0.12%) fills in affected programs: 51 -> 47 (-7.84%) helped: 1 / HURT: 0 LOST: 1 GAINED: 1 Ice Lake and Skylake had similar results. 
(Ice Lake shown) total instructions in shared programs: 20495826 -> 20496111 (<.01%) instructions in affected programs: 53220 -> 53505 (0.54%) helped: 28 / HURT: 16 total cycles in shared programs: 875173550 -> 875243910 (<.01%) cycles in affected programs: 51700652 -> 51771012 (0.14%) helped: 400 / HURT: 39 total spills in shared programs: 4546 -> 4546 (0.00%) spills in affected programs: 288 -> 288 (0.00%) helped: 1 / HURT: 2 total fills in shared programs: 5224 -> 5280 (1.07%) fills in affected programs: 795 -> 851 (7.04%) helped: 0 / HURT: 4 LOST: 1 GAINED: 1 fossil-db: Lunar Lake Totals: Instrs: 141811551 -> 141807640 (-0.00%); split: -0.00%, +0.00% Cycle count: 22183128332 -> 22181285594 (-0.01%); split: -0.06%, +0.05% Spill count: 69890 -> 69859 (-0.04%); split: -0.09%, +0.04% Fill count: 128877 -> 128344 (-0.41%); split: -0.42%, +0.00% Max live registers: 48053415 -> 48051613 (-0.00%); split: -0.00%, +0.00% Totals from 6817 (1.24% of 551443) affected shaders: Instrs: 4300169 -> 4296258 (-0.09%); split: -0.14%, +0.05% Cycle count: 17263755610 -> 17261912872 (-0.01%); split: -0.08%, +0.07% Spill count: 41822 -> 41791 (-0.07%); split: -0.15%, +0.07% Fill count: 75523 -> 74990 (-0.71%); split: -0.71%, +0.01% Max live registers: 733647 -> 731845 (-0.25%); split: -0.29%, +0.04% Meteor Lake and all older Intel platforms had similar results. 
(Meteor Lake shown) Totals: Instrs: 152735305 -> 152735801 (+0.00%); split: -0.00%, +0.00% Subgroup size: 7733536 -> 7733616 (+0.00%) Cycle count: 17398725539 -> 17400873100 (+0.01%); split: -0.00%, +0.02% Max live registers: 31887018 -> 31885742 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 5561696 -> 5561712 (+0.00%) Totals from 5672 (0.90% of 633314) affected shaders: Instrs: 2817606 -> 2818102 (+0.02%); split: -0.05%, +0.07% Subgroup size: 81128 -> 81208 (+0.10%) Cycle count: 10021470543 -> 10023618104 (+0.02%); split: -0.01%, +0.03% Max live registers: 306520 -> 305244 (-0.42%); split: -0.43%, +0.01% Max dispatch width: 74136 -> 74152 (+0.02%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:59 -08:00
Ian Romanick	7eab2cb67e	brw/nir: Treat load_workgroup_id as convergent v2: Fix for Xe2. shader-db: Lunar Lake, Meteor Lake, DG2, and Tiger Lake had similar results. (Lunar Lake shown) total instructions in shared programs: 18096526 -> 18096500 (<.01%) instructions in affected programs: 6759 -> 6733 (-0.38%) helped: 9 / HURT: 3 total cycles in shared programs: 921727804 -> 921841300 (0.01%) cycles in affected programs: 110049730 -> 110163226 (0.10%) helped: 90 / HURT: 372 Ice Lake and Skylake had similar results. (Ice Lake shown) total instructions in shared programs: 20496591 -> 20496402 (<.01%) instructions in affected programs: 48757 -> 48568 (-0.39%) helped: 25 / HURT: 8 total cycles in shared programs: 875253948 -> 875237902 (<.01%) cycles in affected programs: 56760140 -> 56744094 (-0.03%) helped: 363 / HURT: 34 total spills in shared programs: 4555 -> 4546 (-0.20%) spills in affected programs: 174 -> 165 (-5.17%) helped: 2 / HURT: 0 total fills in shared programs: 5243 -> 5224 (-0.36%) fills in affected programs: 382 -> 363 (-4.97%) helped: 2 / HURT: 0 fossil-db: All Intel platforms had similar results. (Lunar Lake shown) Totals: Instrs: 141811577 -> 141811551 (-0.00%); split: -0.00%, +0.00% Cycle count: 22173792370 -> 22183128332 (+0.04%); split: -0.00%, +0.04% Max live registers: 48053498 -> 48053415 (-0.00%) Totals from 3911 (0.71% of 551443) affected shaders: Instrs: 2164804 -> 2164778 (-0.00%); split: -0.00%, +0.00% Cycle count: 2404062476 -> 2413398438 (+0.39%); split: -0.02%, +0.41% Max live registers: 413583 -> 413500 (-0.02%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:59 -08:00
Ian Romanick	6fab1b77c2	brw/nir: Treat some load_uniform as convergent No shader-db changes on any Intel platform. v2: Fix for Xe2. v3: Rework the way that we determine that an intrinsic can actually be convergent. This will now depend on whether or not the important sources have previously been determined to be convergent. Fixes intermittent failures in some test cases (including dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.push_constant_float_16_to_32.scalar_frag). v4: s/the it/it/ in a comment. Noticed by Ken. fossil-db: No fossil-db changes on Lunar Lake. Meteor Lake and DG2 had similar results. (Meteor Lake shown) Totals: Instrs: 152743449 -> 152743161 (-0.00%) Cycle count: 17399179660 -> 17399193488 (+0.00%) Totals from 144 (0.02% of 633314) affected shaders: Instrs: 5936 -> 5648 (-4.85%) Cycle count: 51616 -> 65444 (+26.79%) Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown) Totals: Instrs: 150646195 -> 150645907 (-0.00%) Cycle count: 15618427818 -> 15618428942 (+0.00%) Totals from 144 (0.02% of 632567) affected shaders: Instrs: 6218 -> 5930 (-4.63%) Cycle count: 39968 -> 41092 (+2.81%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:59 -08:00
Ian Romanick	341e5117ec	brw/nir: Treat load_const as convergent opt_combine_constants goes to great effort to pack 8 constants into a single register, this can't have much effect. There is a lot of fossil-db variation among platforms, but the results are generally positive. v2: Fix for Xe2. shader-db: Lunar Lake total instructions in shared programs: 18095100 -> 18092845 (-0.01%) instructions in affected programs: 158931 -> 156676 (-1.42%) helped: 423 / HURT: 0 total cycles in shared programs: 921523326 -> 921522784 (<.01%) cycles in affected programs: 7522774 -> 7522232 (<.01%) helped: 225 / HURT: 228 LOST: 1 GAINED: 7 Meteor Lake and all older Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19820211 -> 19820303 (<.01%) instructions in affected programs: 53087 -> 53179 (0.17%) helped: 135 / HURT: 1 total cycles in shared programs: 906380523 -> 906383031 (<.01%) cycles in affected programs: 1402315 -> 1404823 (0.18%) helped: 156 / HURT: 100 LOST: 1 GAINED: 16 fossil-db: Lunar Lake Totals: Instrs: 141876801 -> 141783010 (-0.07%); split: -0.07%, +0.00% Subgroup size: 10994624 -> 10994704 (+0.00%) Cycle count: 22173441950 -> 22172949188 (-0.00%); split: -0.01%, +0.01% Spill count: 69850 -> 69890 (+0.06%); split: -0.00%, +0.06% Fill count: 129285 -> 128877 (-0.32%) Max live registers: 48047900 -> 48043650 (-0.01%); split: -0.01%, +0.00% Totals from 29837 (5.41% of 551396) affected shaders: Instrs: 7842512 -> 7748721 (-1.20%); split: -1.23%, +0.03% Subgroup size: 940320 -> 940400 (+0.01%) Cycle count: 3444846368 -> 3444353606 (-0.01%); split: -0.09%, +0.08% Spill count: 23358 -> 23398 (+0.17%); split: -0.01%, +0.18% Fill count: 52296 -> 51888 (-0.78%) Max live registers: 3183481 -> 3179231 (-0.13%); split: -0.16%, +0.03% Meteor Lake Totals: Instrs: 152709353 -> 152666543 (-0.03%); split: -0.03%, +0.00% Cycle count: 17397176906 -> 17397668904 (+0.00%); split: -0.00%, +0.01% Fill count: 147896 -> 147893 (-0.00%) Max live 
registers: 31862891 -> 31861888 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 5559664 -> 5561776 (+0.04%); split: +0.08%, -0.04% Totals from 20913 (3.30% of 633046) affected shaders: Instrs: 6676676 -> 6633866 (-0.64%); split: -0.64%, +0.00% Cycle count: 1498330125 -> 1498822123 (+0.03%); split: -0.06%, +0.09% Fill count: 41010 -> 41007 (-0.01%) Max live registers: 1799295 -> 1798292 (-0.06%); split: -0.06%, +0.00% Max dispatch width: 12880 -> 14992 (+16.40%); split: +33.29%, -16.89% DG2 and Tiger Lake had similar results. (DG2 shown) Totals: Instrs: 152730878 -> 152688139 (-0.03%); split: -0.03%, +0.00% Cycle count: 17394835605 -> 17394179808 (-0.00%); split: -0.01%, +0.00% Max live registers: 31862843 -> 31861840 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 5559664 -> 5561776 (+0.04%); split: +0.08%, -0.04% Totals from 20912 (3.30% of 633046) affected shaders: Instrs: 6563021 -> 6520282 (-0.65%); split: -0.65%, +0.00% Cycle count: 1201999616 -> 1201343819 (-0.05%); split: -0.08%, +0.03% Max live registers: 1798392 -> 1797389 (-0.06%); split: -0.06%, +0.00% Max dispatch width: 12872 -> 14984 (+16.41%); split: +33.31%, -16.90% Ice Lake Totals: Instrs: 151914872 -> 151868108 (-0.03%) Cycle count: 15262958696 -> 15262665082 (-0.00%); split: -0.00%, +0.00% Max live registers: 32194225 -> 32193192 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 5650880 -> 5650608 (-0.00%); split: +0.02%, -0.03% Totals from 22192 (3.48% of 637223) affected shaders: Instrs: 6419739 -> 6372975 (-0.73%) Cycle count: 184733818 -> 184440204 (-0.16%); split: -0.36%, +0.20% Max live registers: 1989950 -> 1988917 (-0.05%); split: -0.05%, +0.00% Max dispatch width: 5744 -> 5472 (-4.74%); split: +23.40%, -28.13% Skylake Totals: Instrs: 141027379 -> 140811741 (-0.15%) Cycle count: 14817704293 -> 14817418611 (-0.00%); split: -0.01%, +0.01% Max live registers: 31628796 -> 31627791 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 5535176 -> 5539880 (+0.08%); split: +0.14%, -0.06% 
Totals from 22218 (3.53% of 628840) affected shaders: Instrs: 5944856 -> 5729218 (-3.63%) Cycle count: 182845101 -> 182559419 (-0.16%); split: -0.60%, +0.44% Max live registers: 1974576 -> 1973571 (-0.05%); split: -0.07%, +0.02% Max dispatch width: 16912 -> 21616 (+27.81%); split: +46.93%, -19.11% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:58 -08:00
Ian Romanick	d0f1a94e3d	brw/build: Prepare BROADCAST for scalar values Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:58 -08:00
Ian Romanick	5ea9ed4798	brw/nir: Prepare try_rebuild_source for scalar values Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:58 -08:00
Ian Romanick	59f66b4150	brw/emit: Allow scalar sources to HF math instructions on Xe2 v2: Add a comment explaining the context of the workaround. Suggested by Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:58 -08:00
Ian Romanick	4457073c32	brw/lower: Properly handle UNIFORM globals address in lower_trace_ray_logical_send v2: Don't shadow previous declaration of globals_addr. Suggested by Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:58 -08:00
Ian Romanick	007c92b2ac	brw/lower: Adjust source stride on DF is_scalar sources to MAD on Gfx9 This commit used to be "brw/emit: Allow scalar sources to 64-bit 3-source instructions". These instructions were fixed up in brw_eu_emit. There seems to be some conflict with the <0,1,0> stride and post-RA scheduling. The only difference between the passing code generated by this commit and the failing code generated by the older commit is some post-RA scheduling. v2: Change the stride of a MAD even if the instruction isn't lowered. MAD instructions that are already SIMD8 have to follow the same rules. 🤦 v3: Pull the lowering out to its own pass. Update the comment in brw_fs_validate. Suggested by Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:58 -08:00

4222 commits