fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-22 00:30:13 +01:00

Author	SHA1	Message	Date
Caio Oliveira	0b310ae4d8	intel/brw: Rename fs_generator to brw_generator Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32844>	2025-01-17 00:04:41 +00:00
Kenneth Graunke	894393470a	brw: Fix Xe2 spilling code to limit to SIMD32 rather than SIMD16 LSC can do native SIMD32 messages on Xe2. Cuts spill/fills on Lunarlake: - q2rtx-rt-pipeline: -20.83% / -16.85% - Borderlands 3 DX12: -18.26% / -2.09% - Cyberpunk 2077: -2.18% / -0.11% Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32986>	2025-01-11 09:33:09 +00:00
Lionel Landwerlin	8ac7802ac8	brw: move final send lowering up into the IR Because we emit the final send message form in code generation, a lot of emissions look like this: add(8) vgrf0, u0, 0x100 mov(1) a0.1, vgrf0 # emitted by the generator send(8) ..., a0.1 By moving address register manipulation into the IR, we can get this down to: add(1) a0.1, u0, 0x100 send(8) ..., a0.1 This reduces register pressure around some send messages by 1 vgrf. All lost shaders in the below results are fragment SIMD32, due to the throughput estimator. If turned off, we lose no SIMD32 shaders with this change. DG2 results: Assassin's Creed Valhalla: Totals from 2044 (96.87% of 2110) affected shaders: Instrs: 852879 -> 832044 (-2.44%); split: -2.45%, +0.00% Subgroup size: 23832 -> 23824 (-0.03%) Cycle count: 53345742 -> 52144277 (-2.25%); split: -5.08%, +2.82% Spill count: 729 -> 554 (-24.01%); split: -28.40%, +4.39% Fill count: 2005 -> 1256 (-37.36%) Scratch Memory Size: 25600 -> 19456 (-24.00%); split: -32.00%, +8.00% Max live registers: 116765 -> 115058 (-1.46%) Max dispatch width: 19152 -> 18872 (-1.46%); split: +0.21%, -1.67% Cyberpunk 2077: Totals from 1181 (93.43% of 1264) affected shaders: Instrs: 667192 -> 663615 (-0.54%); split: -0.55%, +0.01% Subgroup size: 13016 -> 13032 (+0.12%) Cycle count: 17383539 -> 17986073 (+3.47%); split: -0.93%, +4.39% Spill count: 12 -> 8 (-33.33%) Fill count: 9 -> 6 (-33.33%) Dota2: Totals from 173 (11.59% of 1493) affected shaders: Cycle count: 274403 -> 280817 (+2.34%); split: -0.01%, +2.34% Max live registers: 5787 -> 5779 (-0.14%) Max dispatch width: 1344 -> 1152 (-14.29%) Hitman3: Totals from 5072 (95.39% of 5317) affected shaders: Instrs: 2879952 -> 2841804 (-1.32%); split: -1.32%, +0.00% Cycle count: 153208505 -> 165860401 (+8.26%); split: -2.22%, +10.48% Spill count: 3942 -> 3200 (-18.82%) Fill count: 10158 -> 8846 (-12.92%) Scratch Memory Size: 257024 -> 223232 (-13.15%) Max live registers: 328467 -> 324631 (-1.17%) Max dispatch 
width: 43928 -> 42768 (-2.64%); split: +0.09%, -2.73% Fortnite: Totals from 360 (4.82% of 7472) affected shaders: Instrs: 778068 -> 777925 (-0.02%) Subgroup size: 3128 -> 3136 (+0.26%) Cycle count: 38684183 -> 38734579 (+0.13%); split: -0.06%, +0.19% Max live registers: 50689 -> 50658 (-0.06%) Hogwarts Legacy: Totals from 1376 (84.00% of 1638) affected shaders: Instrs: 758810 -> 749727 (-1.20%); split: -1.23%, +0.03% Cycle count: 27778983 -> 28805469 (+3.70%); split: -1.42%, +5.12% Spill count: 2475 -> 2299 (-7.11%); split: -7.47%, +0.36% Fill count: 2677 -> 2445 (-8.67%); split: -9.90%, +1.23% Scratch Memory Size: 99328 -> 89088 (-10.31%) Max live registers: 84969 -> 84671 (-0.35%); split: -0.58%, +0.23% Max dispatch width: 11848 -> 11920 (+0.61%) Metro Exodus: Totals from 92 (0.21% of 43072) affected shaders: Instrs: 262995 -> 262968 (-0.01%) Cycle count: 13818007 -> 13851266 (+0.24%); split: -0.01%, +0.25% Max live registers: 11152 -> 11140 (-0.11%) Red Dead Redemption 2 : Totals from 451 (7.71% of 5847) affected shaders: Instrs: 754178 -> 753811 (-0.05%); split: -0.05%, +0.00% Cycle count: 3484078523 -> 3484111965 (+0.00%); split: -0.00%, +0.00% Max live registers: 42294 -> 42185 (-0.26%) Spiderman Remastered: Totals from 6820 (98.02% of 6958) affected shaders: Instrs: 6921500 -> 6747933 (-2.51%); split: -4.16%, +1.65% Cycle count: 234400692460 -> 236846720707 (+1.04%); split: -0.20%, +1.25% Spill count: 72971 -> 72622 (-0.48%); split: -8.08%, +7.61% Fill count: 212921 -> 198483 (-6.78%); split: -12.37%, +5.58% Scratch Memory Size: 3491840 -> 3410944 (-2.32%); split: -12.05%, +9.74% Max live registers: 493149 -> 487458 (-1.15%) Max dispatch width: 56936 -> 56856 (-0.14%); split: +0.06%, -0.20% Strange Brigade: Totals from 3769 (91.21% of 4132) affected shaders: Instrs: 1354476 -> 1321474 (-2.44%) Cycle count: 25351530 -> 25339190 (-0.05%); split: -1.64%, +1.59% Max live registers: 199057 -> 193656 (-2.71%) Max dispatch width: 30272 -> 30240 (-0.11%) Witcher 3: 
Totals from 25 (2.40% of 1041) affected shaders: Instrs: 24621 -> 24606 (-0.06%) Cycle count: 2218793 -> 2217503 (-0.06%); split: -0.11%, +0.05% Max live registers: 1963 -> 1955 (-0.41%) LNL results: Assassin's Creed Valhalla: Totals from 1928 (98.02% of 1967) affected shaders: Instrs: 856107 -> 835756 (-2.38%); split: -2.48%, +0.11% Subgroup size: 41264 -> 41280 (+0.04%) Cycle count: 64606590 -> 62371700 (-3.46%); split: -5.57%, +2.11% Spill count: 915 -> 669 (-26.89%); split: -32.79%, +5.90% Fill count: 2414 -> 1617 (-33.02%); split: -36.62%, +3.60% Scratch Memory Size: 62464 -> 44032 (-29.51%); split: -36.07%, +6.56% Max live registers: 205483 -> 202192 (-1.60%) Cyberpunk 2077: Totals from 1177 (96.40% of 1221) affected shaders: Instrs: 682237 -> 678931 (-0.48%); split: -0.51%, +0.03% Subgroup size: 24912 -> 24944 (+0.13%) Cycle count: 24355928 -> 25089292 (+3.01%); split: -0.80%, +3.81% Spill count: 8 -> 3 (-62.50%) Fill count: 6 -> 3 (-50.00%) Max live registers: 126922 -> 125472 (-1.14%) Dota2: Totals from 428 (32.47% of 1318) affected shaders: Instrs: 89355 -> 89740 (+0.43%) Cycle count: 1152412 -> 1152706 (+0.03%); split: -0.52%, +0.55% Max live registers: 32863 -> 32847 (-0.05%) Fortnite: Totals from 5354 (81.72% of 6552) affected shaders: Instrs: 4135059 -> 4239015 (+2.51%); split: -0.01%, +2.53% Cycle count: 132557506 -> 132427302 (-0.10%); split: -0.75%, +0.65% Spill count: 7144 -> 7234 (+1.26%); split: -0.46%, +1.72% Fill count: 12086 -> 12403 (+2.62%); split: -0.73%, +3.35% Scratch Memory Size: 600064 -> 604160 (+0.68%); split: -1.02%, +1.71% Hitman3: Totals from 4912 (97.09% of 5059) affected shaders: Instrs: 2952124 -> 2916824 (-1.20%); split: -1.20%, +0.00% Cycle count: 179985656 -> 189175250 (+5.11%); split: -2.44%, +7.55% Spill count: 3739 -> 3136 (-16.13%) Fill count: 10657 -> 9564 (-10.26%) Scratch Memory Size: 373760 -> 318464 (-14.79%) Max live registers: 597566 -> 589460 (-1.36%) Hogwarts Legacy: Totals from 1471 (96.33% of 1527) affected 
shaders: Instrs: 748749 -> 766214 (+2.33%); split: -0.71%, +3.05% Cycle count: 33301528 -> 34426308 (+3.38%); split: -1.30%, +4.68% Spill count: 3278 -> 3070 (-6.35%); split: -8.30%, +1.95% Fill count: 4553 -> 4097 (-10.02%); split: -10.85%, +0.83% Scratch Memory Size: 251904 -> 217088 (-13.82%) Max live registers: 168911 -> 168106 (-0.48%); split: -0.59%, +0.12% Metro Exodus: Totals from 18356 (49.81% of 36854) affected shaders: Instrs: 7559386 -> 7621591 (+0.82%); split: -0.01%, +0.83% Cycle count: 195240612 -> 196455186 (+0.62%); split: -1.22%, +1.84% Spill count: 595 -> 546 (-8.24%) Fill count: 1604 -> 1408 (-12.22%) Max live registers: 2086937 -> 2086933 (-0.00%) Red Dead Redemption 2: Totals from 4171 (79.31% of 5259) affected shaders: Instrs: 2619392 -> 2719587 (+3.83%); split: -0.00%, +3.83% Subgroup size: 86416 -> 86432 (+0.02%) Cycle count: 8542836160 -> 8531976886 (-0.13%); split: -0.65%, +0.53% Fill count: 12949 -> 12970 (+0.16%); split: -0.43%, +0.59% Scratch Memory Size: 401408 -> 385024 (-4.08%) Spiderman Remastered: Totals from 6639 (98.94% of 6710) affected shaders: Instrs: 6877980 -> 6800592 (-1.13%); split: -3.11%, +1.98% Cycle count: 282183352210 -> 282100051824 (-0.03%); split: -0.62%, +0.59% Spill count: 63147 -> 64218 (+1.70%); split: -7.12%, +8.82% Fill count: 184931 -> 175591 (-5.05%); split: -10.81%, +5.76% Scratch Memory Size: 5318656 -> 5970944 (+12.26%); split: -5.91%, +18.17% Max live registers: 918240 -> 906604 (-1.27%) Strange Brigade: Totals from 3675 (92.24% of 3984) affected shaders: Instrs: 1462231 -> 1429345 (-2.25%); split: -2.25%, +0.00% Cycle count: 37404050 -> 37345292 (-0.16%); split: -1.25%, +1.09% Max live registers: 361849 -> 351265 (-2.92%) Witcher 3: Totals from 13 (46.43% of 28) affected shaders: Instrs: 593 -> 660 (+11.30%) Cycle count: 28302 -> 28714 (+1.46%) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Caio Oliveira 
<caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>	2025-01-11 08:41:42 +00:00
Caio Oliveira	4d43ee0dd6	intel/brw: Remove uses of VLAs They were causing trouble in some build configurations, and we don't really need them. Use ralloc for consistency. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Antonio Ospite <None> Reviewed-by: Kenneth Graunke <None> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32916>	2025-01-10 07:05:35 +00:00
Caio Oliveira	e6a3770433	intel/compiler: Use INFINITY spill cost to represent no_spill Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Antonio Ospite <None> Reviewed-by: Kenneth Graunke <None> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32916>	2025-01-10 07:05:35 +00:00
Ian Romanick	ef3dc401da	brw: Add devinfo parameter to fs_inst::regs_read This isn't used now, but future commits will add uses. Doing this as a separate commit removes a lot of "just typing" churn from commits that have real changes to review. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>	2024-12-24 18:09:58 -08:00
Francisco Jerez	43d59c6186	intel/brw/xe3+: Relax SEND EOT register assignment restrictions. These restrictions have been removed from the hardware. Make the code enforcing and validating them conditional. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32758>	2024-12-20 14:03:15 -08:00
Caio Oliveira	c8f6d8154f	intel/brw: Remove overloads for brw_print_instruction/s functions Almost all cases now handled with default arguments. The only real extra work that was being done was pushed to the client code in debug_optimizer(). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32596>	2024-12-12 22:01:48 +00:00
Lionel Landwerlin	69edf4144a	brw: use transpose unspill messages when possible This simplifies the unspill messages quite a bit. A/B testing on DG2 : BlackOps3 : +0.96% TotalWarPharaoh: +0.31% DG2 shader changes : Assassin's Creed Valhalla: Totals from 19 (0.89% of 2131) affected shaders: Instrs: 70542 -> 64369 (-8.75%) Cycle count: 18810945 -> 18560169 (-1.33%); split: -1.40%, +0.06% Black Ops 3: Totals from 55 (3.41% of 1612) affected shaders: Instrs: 389549 -> 350646 (-9.99%) Cycle count: 344168275 -> 340652311 (-1.02%); split: -1.17%, +0.15% Control: Totals from 1 (0.11% of 878) affected shaders: Instrs: 3409 -> 3212 (-5.78%) Cycle count: 255991 -> 250411 (-2.18%) Cyberpunk 2077: Totals from 1 (0.08% of 1264) affected shaders: Instrs: 2363 -> 2337 (-1.10%) Cycle count: 69283 -> 69186 (-0.14%) Fallout 4: Totals from 1 (0.06% of 1601) affected shaders: Instrs: 27946 -> 20056 (-28.23%) Cycle count: 2391398 -> 2153658 (-9.94%) Fortnite: Totals from 273 (3.65% of 7470) affected shaders: Instrs: 634377 -> 601519 (-5.18%) Cycle count: 31870433 -> 31624089 (-0.77%); split: -0.78%, +0.01% Hogwarts Legacy: Totals from 50 (3.02% of 1656) affected shaders: Instrs: 110455 -> 103339 (-6.44%) Cycle count: 6613728 -> 6530832 (-1.25%); split: -1.28%, +0.03% Metro Exodus: Totals from 70 (0.16% of 43076) affected shaders: Instrs: 253847 -> 245321 (-3.36%) Cycle count: 13269473 -> 13209131 (-0.45%) Spill count: 1111 -> 1108 (-0.27%) Fill count: 2868 -> 2865 (-0.10%) Red Dead Redemption 2: Totals from 139 (2.38% of 5847) affected shaders: Instrs: 496551 -> 450180 (-9.34%) Cycle count: 43233944 -> 40947386 (-5.29%); split: -5.33%, +0.04% Spill count: 6322 -> 6326 (+0.06%) Fill count: 15558 -> 15568 (+0.06%) Rise Of The Tomb Raider: Totals from 1 (0.56% of 178) affected shaders: Instrs: 1682 -> 1437 (-14.57%) Cycle count: 603670 -> 586766 (-2.80%) Spiderman Remastered: Totals from 820 (11.77% of 6965) affected shaders: Instrs: 4622877 -> 3984893 (-13.80%) Cycle count: 235094963186 -> 
234483925430 (-0.26%); split: -0.42%, +0.16% Spill count: 73414 -> 73581 (+0.23%); split: -0.02%, +0.25% Fill count: 215090 -> 215627 (+0.25%); split: -0.02%, +0.27% Scratch Memory Size: 3520512 -> 3528704 (+0.23%); split: -0.12%, +0.35% Some of the stats show spilling changes, which indicates that our spill code is not adequate. Some of the spilled values are probably being respilled, which shouldn't be the case. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32110>	2024-12-04 08:59:07 +00:00
Lionel Landwerlin	a21cd8c5b6	brw: allocate physical register sizes for spilling All of the spilling code should work with physical register units because for example SEND messages will expect a physical register as destination. So always allocate a full physical register for the spilled/unspilled values and adjust the offsets of the registers to physical sizes too. Cc: mesa-stable Fixes: `aa494cba` ("brw: align spilling offsets to physical register sizes") Closes: mesa/mesa#11967 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Found-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32124>	2024-11-14 08:44:03 +00:00
Paulo Zanoni	c0bceaf057	brw: don't emit instruction to add zero in spilling code When the spill_offset is zero, don't emit an instruction that adds zero. Results on MTL: - Shaders: instructions helped: shaders/blender/581.shader_test FS SIMD8: 6760 -> 6759 (-0.01%) (scheduled: none) instructions helped: shaders/blender/1017.shader_test FS SIMD8: 6760 -> 6759 (-0.01%) (scheduled: none) instructions helped: shaders/blender/1045.shader_test FS SIMD8: 6474 -> 6473 (-0.02%) (scheduled: none) instructions helped: shaders/blender/723.shader_test FS SIMD8: 6458 -> 6457 (-0.02%) (scheduled: none) instructions helped: shaders/blender/1042.shader_test FS SIMD8: 6458 -> 6457 (-0.02%) (scheduled: none) instructions helped: shaders/blender/917.shader_test FS SIMD8: 4900 -> 4897 (-0.06%) (scheduled: none) instructions helped: shaders/blender/455.shader_test FS SIMD8: 4832 -> 4829 (-0.06%) (scheduled: none) cycles helped: shaders/blender/917.shader_test FS SIMD8: 891856 -> 891832 (<.01%) (scheduled: none) cycles helped: shaders/blender/455.shader_test FS SIMD8: 894692 -> 894660 (<.01%) (scheduled: none) total instructions in shared programs: 1596934 -> 1596923 (<.01%) instructions in affected programs: 42642 -> 42631 (-0.03%) helped: 7 HURT: 0 - Fossils: Instrs: 151744378 -> 151741213 (-0.00%) Cycle count: 16007811131 -> 16007643963 (-0.00%); split: -0.00%, +0.00% Totals from 1353 (0.21% of 632545) affected shaders: Instrs: 3925143 -> 3921978 (-0.08%) Cycle count: 2292838118 -> 2292670950 (-0.01%); split: -0.01%, +0.00% RELATIVE IMPROVEMENTS - Instrs Before After Delta Percentage mesa/benchmarks/gravity_mark/3e9c48cebaddf012/cs/0 1947 1941 -6 -0.31% mesa/steam-native/red_dead_redemption2/571534e21fb7bd2a/fs.8/0 3431 3421 -10 -0.29% mesa/steam-dxvk/batman_arkham_city_goty/d783eacc9ebe324d/fs.8/0 717 715 -2 -0.28% mesa/steam-dxvk/batman_arkham_city_goty/14e0878a6a9605c9/fs.8/0 724 722 -2 -0.28% mesa/steam-dxvk/batman_arkham_city_goty/d859c2ae858269dc/fs.8/0 744 742 -2 -0.27% 
mesa/steam-dxvk/total_war_warhammer3/18b9d4a3b1961616/vs/0 1539 1535 -4 -0.26% mesa/steam-dxvk/total_war_warhammer3/a21827ce57dc0e29/vs/0 1539 1535 -4 -0.26% (and a bunch of others where the delta is -2, -4 or -6) Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31694>	2024-10-23 20:19:48 +00:00
Caio Oliveira	4361a08254	intel/brw: Reduce scope of has_source_and_destination_hazard This predicate at the moment is only relevant during register allocation, so move it there and the code can ignore virtual instructions that were already lowered previously. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>	2024-10-11 06:40:29 +00:00
Caio Oliveira	6db7d1af16	intel/compiler: Rename shader_stats structs Add the `brw_` and `elk_` prefixes to the structs to avoid compilation failure building with LTO ("violates the C++ One Definition Rule") when the structs diverge. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>	2024-10-11 06:40:29 +00:00
Lionel Landwerlin	aa494cbacf	brw: align spilling offsets to physical register sizes In commit `fe3d90aedf` ("intel/fs/xe2+: Fix calculation of spill message width for Xe2 regs.") we aligned the width of scratch messages to physical register sizes (32B prior to Xe2, 64B for Xe2+). But our spilling offsets are computed using the register allocation sizes, which are in units of 32B. That means on Xe2, you can end up spilling a virtual register allocated at 32B (which we use for surface state computations with exec_all) and then the spilling of that register will be emitted in SIMD16, having the upper 8 lanes overwriting the next spilled register. We could potentially limit spills to SIMD8 messages on Xe2 (only writing 32B of data), but we're also unlikely to have all 32B virtual registers spilled next to one another. And if not tightly packed, we would have 64B registers stored on 2 different cachelines, which sounds inefficient. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `fe3d90aedf` ("intel/fs/xe2+: Fix calculation of spill message width for Xe2 regs.") Backport-to: 24.2 Reviewed-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30983>	2024-09-04 23:05:31 +00:00
Kenneth Graunke	8bca7e520c	intel/brw: Only force g0's liveness to be the whole program if spilling We don't actually need to extend g0's live range to the EOT message generally - most messages that end a shader are headerless. The main implicit use of g0 is for constructing scratch headers. With the last two patches, we now consider scratch access that may exist in the IR and already extend the liveness appropriately. There is one remaining problem: spilling. The register allocator will create new scratch messages when spilling a register, which need to create scratch headers, which need g0. So, every new spill or fill might extend the live range of g0, which would create new interference, altering the graph. This can be problematic. However, when compiling SIMD16 or SIMD32 fragment shaders, we don't allow spilling anyway. So, why not allow the use of g0? Also, when trying various scheduling modes, we first try allocation without spilling. If it works, great, if not, we try a (hopefully) less aggressive schedule, and only allow spilling on the lowest-pressure schedule. So, even for regular SIMD8 shaders, we can potentially gain the use of g0 on the first few tries at scheduling+allocation. Once we try to allocate with spilling, we go back to reserving g0 for the entire program, so that we can construct scratch headers at any point. We could possibly do better here, but this is simple and reliable with some benefit. Thanks to Ian Romanick for suggesting I try this approach. 
fossil-db on Alchemist shows some more spill/fill improvements: Totals: Instrs: 149062395 -> 149053010 (-0.01%); split: -0.01%, +0.00% Cycles: 12609496913 -> 12611652181 (+0.02%); split: -0.45%, +0.47% Spill count: 52891 -> 52471 (-0.79%) Fill count: 101599 -> 100818 (-0.77%) Scratch Memory Size: 3292160 -> 3197952 (-2.86%) Totals from 416541 (66.59% of 625484) affected shaders: Instrs: 124058587 -> 124049202 (-0.01%); split: -0.01%, +0.01% Cycles: 3567164271 -> 3569319539 (+0.06%); split: -1.61%, +1.67% Spill count: 420 -> 0 (-inf%) Fill count: 781 -> 0 (-inf%) Scratch Memory Size: 94208 -> 0 (-inf%) Witcher 3 shows a 33% reduction in scratch memory size, for example. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>	2024-08-01 16:37:34 -07:00
Kenneth Graunke	4ca4b064cf	intel/brw: Record g0 as live for sends with send_ex_desc_scratch set brw_send_indirect_split_message() implicitly reads g0 to construct the extended message descriptor for certain send messages when this is set. Record that liveness explicitly. Thanks to Francisco Jerez for reminding me about this use of g0. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>	2024-08-01 16:37:32 -07:00
Kenneth Graunke	9200fb966c	intel/brw: Record that SHADER_OPCODE_SCRATCH_HEADER uses g0 The generator code for emitting legacy scratch headers was implicitly using g0 as a source. But the IR wasn't indicating any usage of g0, which means the liveness isn't properly tracked at the IR level. It works because we reserve g0 as permanently live for the whole program. In order to stop doing that, we need to record it properly. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>	2024-08-01 16:37:31 -07:00
Kenneth Graunke	545f20419f	intel/brw: Delete fs_reg_alloc::discard_interference_graph() Unused since commit `50519598ff`. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>	2024-08-01 16:37:28 -07:00
Caio Oliveira	b98930c770	intel/brw: Move regalloc and scheduling functions out of fs_visitor Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Caio Oliveira	17b7e49089	intel/brw: Move out of fs_visitor and rename print instructions They use the brw_print prefix now. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>	2024-07-25 15:37:13 +00:00
Matt Turner	5e24c21625	intel/brw: Use REG_CLASS_COUNT Fixes: `5d87f41a54` ("intel/fs/ra: Define REG_CLASS_COUNT constant specifying the number of register classes.") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30314>	2024-07-25 14:55:09 +00:00
Kenneth Graunke	c429d5025e	intel/brw: Don't force g1's live range to be the entire program The idea here was that pixel shader framebuffer writes used the g0 and g1 thread payload register values to construct the message header. However, most messages are headerless and don't use either. There's a 2012-era comment that the simulator at one point had a bug where certain headerless messages would incorrectly take the values from the g0/g1 register contents rather than using sideband. But, that was likely fixed eons ago. So we really don't need to do this. Furthermore, there are many more shader stages these days: - VS: r1 contains output URB handles - TCS: r1 contains ICP handles - TES: r1 contains gl_TessCoord.x (r4 contains output URB handles) - GS: r1 contains output URB handles - CS: r1 contains LocalID.X on DG2+ but nothing on older hardware - Task/Mesh: r1 contains LocalID.X - BS: r1 contains bindless stack handles Vertex and geometry aren't likely to benefit here because r1 is needed for their output messages, which are also what terminate the shader. TES will definitely benefit because we were making a value pointlessly live for the whole program. Same for TCS, to a lesser extent. Compute prior to DG2 was the worst, as g1 literally has no meaningful content, so there is no point to keeping it live. 
fossil-db on Alchemist shows substantial spill/fill improvements: Totals: Instrs: 148782351 -> 148741996 (-0.03%); split: -0.03%, +0.01% Cycles: 12602907531 -> 12605795191 (+0.02%); split: -0.70%, +0.72% Subgroup size: 7518608 -> 7518632 (+0.00%) Send messages: 7341727 -> 7341762 (+0.00%) Spill count: 54633 -> 52575 (-3.77%) Fill count: 104694 -> 100680 (-3.83%) Scratch Memory Size: 3375104 -> 3287040 (-2.61%) Totals from 301172 (48.21% of 624670) affected shaders: Instrs: 95531927 -> 95491572 (-0.04%); split: -0.05%, +0.01% Cycles: 9643531593 -> 9646419253 (+0.03%); split: -0.91%, +0.94% Subgroup size: 4492512 -> 4492536 (+0.00%) Send messages: 4399737 -> 4399772 (+0.00%) Spill count: 20034 -> 17976 (-10.27%) Fill count: 41530 -> 37516 (-9.67%) Scratch Memory Size: 1522688 -> 1434624 (-5.78%) Assassin's Creed Odyssey in particular has 20% fewer fills. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30146>	2024-07-23 02:26:52 +00:00
Caio Oliveira	3670c24740	intel/brw: Replace uses of fs_reg with brw_reg And remove the fs_reg alias. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:19 +00:00
Caio Oliveira	d00329e821	intel/brw: Replace some fs_reg constructors with functions Create three helper functions for ATTR, UNIFORM and VGRF creation. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>	2024-07-03 02:53:18 +00:00
Kenneth Graunke	50519598ff	intel/brw: Skip discarding the interference graph We no longer need to reserve registers for constructing spill/fill messages. We have split sends and construct message headers in new temporary registers with a very short lifespan which are simply added to the existing interference graph as new nodes and allocated via the normal mechanism. This means that when we need to spill for the first time, we can avoid discarding and recomputing the entire interference graph. We also avoid needing to recreate all spill candidate information once ra_allocate() fails, because the graph remains valid, and none of the existing nodes had any changes to their interference. The existing spill candidates remain valid. This will slightly help improve compile time when needing to spill. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25811>	2024-06-20 09:47:18 +00:00
Kenneth Graunke	29d6264627	intel/brw: Build the scratch header on the fly for pre-LSC systems Instead of reserving a register to contain the spill header, which gets marked live for the entire program, we can just emit the ALU instructions to build it on the fly. (This is similar to the way we handle scratch on Alchemist with the newer LSC data port.) There are a couple of downsides that make this not obviously a win. First, in order to construct the scratch header on Gfx9-12, we have to use fields from g0, which will have to remain live anywhere that scratch access is required. This could negate the register pressure benefits of creating the header on the fly. However, g0 is oft used in other places anyway, so it may already be there. Another is that it's a non-trivial number of ALU instructions to construct the value. Still, trading lower pressure (so fewer spills, less memory access and stalls) for more cheap ALU seems like it ought to be a win. There is another valuable benefit: by not reserving a register, we eliminate the need to reconstruct the interference graph. (The next patch will actually do so.) shader-db on Icelake shows spills/fills at 54/53 helped, 4/10 hurt, and an 8% increase in ALU on affected shaders. Synmark's OglCSDof (a benchmark that spills) performance remains the same on Alderlake. fossil-db on Icelake shows a 5.6%/5.1% reduction in spills/fills and a 4% reduction in scratch memory size on affected shaders. Instruction counts go up by 11.07%, but cycle estimates only increase by 0.57%. Assassin's Creed Odyssey and Wolfenstein Youngblood both see 20-30% reductions in spills/fills, a significant improvement. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25811>	2024-06-20 09:47:18 +00:00
Francisco Jerez	eebc4ec264	intel/brw/xe2+: Round up spill/unspill data size to nearest reg_size multiple. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28283>	2024-05-15 17:16:52 +00:00
Kenneth Graunke	545bb8fb6f	intel/brw: Replace type_sz and brw_reg_type_to_size with brw_type_size_* Both of these helpers do the same thing. We now have brw_type_size_bits and brw_type_size_bytes and can use whichever makes sense in that place. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>	2024-04-25 11:41:48 +00:00
Kenneth Graunke	873fcdff38	intel/brw: Stop using long BRW_REGISTER_TYPE enum names s/BRW_REGISTER_TYPE/BRW_TYPE/g Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>	2024-04-25 11:41:48 +00:00
Francisco Jerez	74efde7663	intel/brw/xehp+: Drop redundant arguments of lsc_msg_desc*(). Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>	2024-04-01 00:00:03 +00:00
Francisco Jerez	fa96274a87	intel/brw/xehp+: Replace lsc_msg_desc_dest_len()/lsc_msg_desc_src0_len() with helpers to do the computation. We cannot rely on the immediate message descriptor having accurate values for mlen and rlen at the IR level, since they are updated at codegen time via 'inst->mlen' and 'inst->size_written', which could end up with values inconsistent with the message descriptor if e.g. the split sends optimization had an effect. Instead, define helpers that do the computation without relying on the message descriptor, and use the pre-existing brw_message_desc_mlen()/brw_message_desc_rlen() helpers (fully equivalent to the lsc helpers deleted here) during disassembly. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>	2024-04-01 00:00:03 +00:00
Kenneth Graunke	91252c98a8	intel/brw: Add assertions that EOT messages live in g112+ The validator already catches this, but asserting here makes it easier to catch the problem earlier in a debugger. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27876>	2024-03-05 11:39:26 +00:00
Kenneth Graunke	97bf3d3b2d	intel/brw: Replace CS_OPCODE_CS_TERMINATE with SHADER_OPCODE_SEND There's no need for special handling here, it's just a send message with a trivial g0 header and descriptor. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27924>	2024-03-05 11:16:20 +00:00
Caio Oliveira	865ef36609	intel/brw: Remove brw_shader.h Find a better home for its existing content. Some functions are now just static functions at the usage sites. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27861>	2024-02-29 19:28:06 +00:00
Caio Oliveira	559d94cd0d	intel/brw: Use fs_visitor instead of backend_shader in various passes And since we are touching them, rename a couple of passes to follow the same naming convention as the existing ones. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27861>	2024-02-29 19:28:05 +00:00
Caio Oliveira	35b07ab035	intel/brw: Use a single register set Different sets were needed for SIMD8/SIMD16 in old Gfx versions, but now we can use a single one regardless of the SIMD size. Suggested by Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:39 +00:00
Caio Oliveira	8f3c52c1da	intel/brw: Remove MRF type Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:39 +00:00
Caio Oliveira	5c93a0e125	intel/brw: Remove Gfx8- remaining opcodes Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:39 +00:00
Caio Oliveira	ed6f0665e0	intel/brw: Remove Gfx8- code from register allocator Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>	2024-02-28 05:45:38 +00:00
Caio Oliveira	cf730adc58	intel/compiler: Make fs_builder include fs_visitor and not the other way This will allow fs_builder have a reference to an fs_visitor (a "fs_shader" really), instead of a reference to a backend_shader. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>	2023-12-12 19:36:14 +00:00
Caio Oliveira	2d6240ab14	intel/compiler: Don't use fs_visitor::bld in fs_reg_alloc Just set up the builder without relying on the pre-existing one. Moves one step close to remove bld from fs_visitor. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26301>	2023-11-28 19:53:51 +00:00
Jordan Justen	c28539a2fe	intel/compiler: Use enum xe2_lsc_cache_load on xe2 Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25195>	2023-09-27 23:57:25 +00:00
Francisco Jerez	cef4d53daf	intel/xe2+: Round up size to reg_unit() in fs_reg_alloc::alloc_spill_reg(). Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>	2023-09-20 17:19:36 -07:00
Francisco Jerez	fe3d90aedf	intel/fs/xe2+: Fix calculation of spill message width for Xe2 regs. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>	2023-09-20 17:19:36 -07:00
Francisco Jerez	571ddf8516	intel/fs/xe2+: Fix payload node live range calculations for change in register size. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>	2023-09-20 17:19:36 -07:00
Francisco Jerez	2b7419d090	intel/fs: Fix signedness of payload_node_count argument of calculate_payload_ranges(). Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>	2023-09-20 17:19:36 -07:00
Francisco Jerez	6d39b3d6ae	intel/fs/ra/xe2: Scale up register allocation granularity by 2x on Xe2+ platforms. v2: Fix spill register allocation. Switch to brw_reg::nr representation in fake 256b units. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>	2023-09-20 17:19:36 -07:00
Francisco Jerez	bd98df5d8e	intel/compiler: Make MAX_VGRF_SIZE macro depend on devinfo and update it for Xe2. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>	2023-09-20 17:19:36 -07:00
Francisco Jerez	5d87f41a54	intel/fs/ra: Define REG_CLASS_COUNT constant specifying the number of register classes. Rework: * Jordan: 16=>20 following `d33aff783d` ("intel/fs: add support for sparse accesses") Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>	2023-09-20 17:19:35 -07:00
Jason Ekstrand	739e21fa9a	intel/fs: Add a parameter to speed up register spilling Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24299>	2023-07-28 14:51:42 +00:00


117 commits