Commit graph

117 commits

Author SHA1 Message Date
Caio Oliveira
0b310ae4d8 intel/brw: Rename fs_generator to brw_generator
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32844>
2025-01-17 00:04:41 +00:00
Kenneth Graunke
894393470a brw: Fix Xe2 spilling code to limit to SIMD32 rather than SIMD16
LSC can do native SIMD32 messages on Xe2.

Cuts spill/fills on Lunarlake:
- q2rtx-rt-pipeline: -20.83% / -16.85%
- Borderlands 3 DX12: -18.26% / -2.09%
- Cyberpunk 2077: -2.18% / -0.11%

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32986>
2025-01-11 09:33:09 +00:00
Lionel Landwerlin
8ac7802ac8 brw: move final send lowering up into the IR
Because we do emit the final send message form in code generation, a
lot of emissions look like this :

  add(8)  vgrf0,    u0, 0x100
  mov(1)   a0.1, vgrf0          # emitted by the generator
  send(8)   ...,  a0.1

By moving address register manipulation in the IR, we can get this
down to :

  add(1)  a0.1,   u0, 0x100
  send(8)  ..., a0.1

This reduce register pressure around some send messages by 1 vgrf.

All lost shaders in the below results are fragment SIMD32, due to the
throughput estimator. If turned off, we loose no SIMD32 shaders with
this change.

DG2 results:

  Assassin's Creed Valhalla:
  Totals from 2044 (96.87% of 2110) affected shaders:
  Instrs: 852879 -> 832044 (-2.44%); split: -2.45%, +0.00%
  Subgroup size: 23832 -> 23824 (-0.03%)
  Cycle count: 53345742 -> 52144277 (-2.25%); split: -5.08%, +2.82%
  Spill count: 729 -> 554 (-24.01%); split: -28.40%, +4.39%
  Fill count: 2005 -> 1256 (-37.36%)
  Scratch Memory Size: 25600 -> 19456 (-24.00%); split: -32.00%, +8.00%
  Max live registers: 116765 -> 115058 (-1.46%)
  Max dispatch width: 19152 -> 18872 (-1.46%); split: +0.21%, -1.67%

  Cyberpunk 2077:
  Totals from 1181 (93.43% of 1264) affected shaders:
  Instrs: 667192 -> 663615 (-0.54%); split: -0.55%, +0.01%
  Subgroup size: 13016 -> 13032 (+0.12%)
  Cycle count: 17383539 -> 17986073 (+3.47%); split: -0.93%, +4.39%
  Spill count: 12 -> 8 (-33.33%)
  Fill count: 9 -> 6 (-33.33%)

  Dota2:
  Totals from 173 (11.59% of 1493) affected shaders:
  Cycle count: 274403 -> 280817 (+2.34%); split: -0.01%, +2.34%
  Max live registers: 5787 -> 5779 (-0.14%)
  Max dispatch width: 1344 -> 1152 (-14.29%)

  Hitman3:
  Totals from 5072 (95.39% of 5317) affected shaders:
  Instrs: 2879952 -> 2841804 (-1.32%); split: -1.32%, +0.00%
  Cycle count: 153208505 -> 165860401 (+8.26%); split: -2.22%, +10.48%
  Spill count: 3942 -> 3200 (-18.82%)
  Fill count: 10158 -> 8846 (-12.92%)
  Scratch Memory Size: 257024 -> 223232 (-13.15%)
  Max live registers: 328467 -> 324631 (-1.17%)
  Max dispatch width: 43928 -> 42768 (-2.64%); split: +0.09%, -2.73%

  Fortnite:
  Totals from 360 (4.82% of 7472) affected shaders:
  Instrs: 778068 -> 777925 (-0.02%)
  Subgroup size: 3128 -> 3136 (+0.26%)
  Cycle count: 38684183 -> 38734579 (+0.13%); split: -0.06%, +0.19%
  Max live registers: 50689 -> 50658 (-0.06%)

  Hogwarts Legacy:
  Totals from 1376 (84.00% of 1638) affected shaders:
  Instrs: 758810 -> 749727 (-1.20%); split: -1.23%, +0.03%
  Cycle count: 27778983 -> 28805469 (+3.70%); split: -1.42%, +5.12%
  Spill count: 2475 -> 2299 (-7.11%); split: -7.47%, +0.36%
  Fill count: 2677 -> 2445 (-8.67%); split: -9.90%, +1.23%
  Scratch Memory Size: 99328 -> 89088 (-10.31%)
  Max live registers: 84969 -> 84671 (-0.35%); split: -0.58%, +0.23%
  Max dispatch width: 11848 -> 11920 (+0.61%)

  Metro Exodus:
  Totals from 92 (0.21% of 43072) affected shaders:
  Instrs: 262995 -> 262968 (-0.01%)
  Cycle count: 13818007 -> 13851266 (+0.24%); split: -0.01%, +0.25%
  Max live registers: 11152 -> 11140 (-0.11%)

  Red Dead Redemption 2 :
  Totals from 451 (7.71% of 5847) affected shaders:
  Instrs: 754178 -> 753811 (-0.05%); split: -0.05%, +0.00%
  Cycle count: 3484078523 -> 3484111965 (+0.00%); split: -0.00%, +0.00%
  Max live registers: 42294 -> 42185 (-0.26%)

  Spiderman Remastered:
  Totals from 6820 (98.02% of 6958) affected shaders:
  Instrs: 6921500 -> 6747933 (-2.51%); split: -4.16%, +1.65%
  Cycle count: 234400692460 -> 236846720707 (+1.04%); split: -0.20%, +1.25%
  Spill count: 72971 -> 72622 (-0.48%); split: -8.08%, +7.61%
  Fill count: 212921 -> 198483 (-6.78%); split: -12.37%, +5.58%
  Scratch Memory Size: 3491840 -> 3410944 (-2.32%); split: -12.05%, +9.74%
  Max live registers: 493149 -> 487458 (-1.15%)
  Max dispatch width: 56936 -> 56856 (-0.14%); split: +0.06%, -0.20%

  Strange Brigade:
  Totals from 3769 (91.21% of 4132) affected shaders:
  Instrs: 1354476 -> 1321474 (-2.44%)
  Cycle count: 25351530 -> 25339190 (-0.05%); split: -1.64%, +1.59%
  Max live registers: 199057 -> 193656 (-2.71%)
  Max dispatch width: 30272 -> 30240 (-0.11%)

  Witcher 3:
  Totals from 25 (2.40% of 1041) affected shaders:
  Instrs: 24621 -> 24606 (-0.06%)
  Cycle count: 2218793 -> 2217503 (-0.06%); split: -0.11%, +0.05%
  Max live registers: 1963 -> 1955 (-0.41%)

LNL results:

  Assassin's Creed Valhalla:
  Totals from 1928 (98.02% of 1967) affected shaders:
  Instrs: 856107 -> 835756 (-2.38%); split: -2.48%, +0.11%
  Subgroup size: 41264 -> 41280 (+0.04%)
  Cycle count: 64606590 -> 62371700 (-3.46%); split: -5.57%, +2.11%
  Spill count: 915 -> 669 (-26.89%); split: -32.79%, +5.90%
  Fill count: 2414 -> 1617 (-33.02%); split: -36.62%, +3.60%
  Scratch Memory Size: 62464 -> 44032 (-29.51%); split: -36.07%, +6.56%
  Max live registers: 205483 -> 202192 (-1.60%)

  Cyberpunk 2077:
  Totals from 1177 (96.40% of 1221) affected shaders:
  Instrs: 682237 -> 678931 (-0.48%); split: -0.51%, +0.03%
  Subgroup size: 24912 -> 24944 (+0.13%)
  Cycle count: 24355928 -> 25089292 (+3.01%); split: -0.80%, +3.81%
  Spill count: 8 -> 3 (-62.50%)
  Fill count: 6 -> 3 (-50.00%)
  Max live registers: 126922 -> 125472 (-1.14%)

  Dota2:
  Totals from 428 (32.47% of 1318) affected shaders:
  Instrs: 89355 -> 89740 (+0.43%)
  Cycle count: 1152412 -> 1152706 (+0.03%); split: -0.52%, +0.55%
  Max live registers: 32863 -> 32847 (-0.05%)

  Fortnite:
  Totals from 5354 (81.72% of 6552) affected shaders:
  Instrs: 4135059 -> 4239015 (+2.51%); split: -0.01%, +2.53%
  Cycle count: 132557506 -> 132427302 (-0.10%); split: -0.75%, +0.65%
  Spill count: 7144 -> 7234 (+1.26%); split: -0.46%, +1.72%
  Fill count: 12086 -> 12403 (+2.62%); split: -0.73%, +3.35%
  Scratch Memory Size: 600064 -> 604160 (+0.68%); split: -1.02%, +1.71%

  Hitman3:
  Totals from 4912 (97.09% of 5059) affected shaders:
  Instrs: 2952124 -> 2916824 (-1.20%); split: -1.20%, +0.00%
  Cycle count: 179985656 -> 189175250 (+5.11%); split: -2.44%, +7.55%
  Spill count: 3739 -> 3136 (-16.13%)
  Fill count: 10657 -> 9564 (-10.26%)
  Scratch Memory Size: 373760 -> 318464 (-14.79%)
  Max live registers: 597566 -> 589460 (-1.36%)

  Hogwarts Legacy:
  Totals from 1471 (96.33% of 1527) affected shaders:
  Instrs: 748749 -> 766214 (+2.33%); split: -0.71%, +3.05%
  Cycle count: 33301528 -> 34426308 (+3.38%); split: -1.30%, +4.68%
  Spill count: 3278 -> 3070 (-6.35%); split: -8.30%, +1.95%
  Fill count: 4553 -> 4097 (-10.02%); split: -10.85%, +0.83%
  Scratch Memory Size: 251904 -> 217088 (-13.82%)
  Max live registers: 168911 -> 168106 (-0.48%); split: -0.59%, +0.12%

  Metro Exodus:
  Totals from 18356 (49.81% of 36854) affected shaders:
  Instrs: 7559386 -> 7621591 (+0.82%); split: -0.01%, +0.83%
  Cycle count: 195240612 -> 196455186 (+0.62%); split: -1.22%, +1.84%
  Spill count: 595 -> 546 (-8.24%)
  Fill count: 1604 -> 1408 (-12.22%)
  Max live registers: 2086937 -> 2086933 (-0.00%)

  Red Dead Redemption 2:
  Totals from 4171 (79.31% of 5259) affected shaders:
  Instrs: 2619392 -> 2719587 (+3.83%); split: -0.00%, +3.83%
  Subgroup size: 86416 -> 86432 (+0.02%)
  Cycle count: 8542836160 -> 8531976886 (-0.13%); split: -0.65%, +0.53%
  Fill count: 12949 -> 12970 (+0.16%); split: -0.43%, +0.59%
  Scratch Memory Size: 401408 -> 385024 (-4.08%)

  Spiderman Remastered:
  Totals from 6639 (98.94% of 6710) affected shaders:
  Instrs: 6877980 -> 6800592 (-1.13%); split: -3.11%, +1.98%
  Cycle count: 282183352210 -> 282100051824 (-0.03%); split: -0.62%, +0.59%
  Spill count: 63147 -> 64218 (+1.70%); split: -7.12%, +8.82%
  Fill count: 184931 -> 175591 (-5.05%); split: -10.81%, +5.76%
  Scratch Memory Size: 5318656 -> 5970944 (+12.26%); split: -5.91%, +18.17%
  Max live registers: 918240 -> 906604 (-1.27%)

  Strange Brigade:
  Totals from 3675 (92.24% of 3984) affected shaders:
  Instrs: 1462231 -> 1429345 (-2.25%); split: -2.25%, +0.00%
  Cycle count: 37404050 -> 37345292 (-0.16%); split: -1.25%, +1.09%
  Max live registers: 361849 -> 351265 (-2.92%)

  Witcher 3:
  Totals from 13 (46.43% of 28) affected shaders:
  Instrs: 593 -> 660 (+11.30%)
  Cycle count: 28302 -> 28714 (+1.46%)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>
2025-01-11 08:41:42 +00:00
Caio Oliveira
4d43ee0dd6 intel/brw: Remove uses of VLAs
Was causing trouble in some build configurations, we don't really need
them.  Use ralloc for consistency.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Antonio Ospite <None>
Reviewed-by: Kenneth Graunke <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32916>
2025-01-10 07:05:35 +00:00
Caio Oliveira
e6a3770433 intel/compiler: Use INFINITY spill cost to represent no_spill
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Antonio Ospite <None>
Reviewed-by: Kenneth Graunke <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32916>
2025-01-10 07:05:35 +00:00
Ian Romanick
ef3dc401da brw: Add devinfo parameter to fs_inst::regs_read
This isn't used now, but future commits will add uses. Doing this as a
separate commit removes a lot of "just typing" churn from commits that
have real changes to review.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:58 -08:00
Francisco Jerez
43d59c6186 intel/brw/xe3+: Relax SEND EOT register assignment restrictions.
These restrictions have been removed from the hardware.  Make the code
enforcing and validating them conditional.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32758>
2024-12-20 14:03:15 -08:00
Caio Oliveira
c8f6d8154f intel/brw: Remove overloads for brw_print_instruction/s functions
Almost all cases now handled with default arguments.  The only real
extra work that was being done was pushed to the client code in
debug_optimizer().

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32596>
2024-12-12 22:01:48 +00:00
Lionel Landwerlin
69edf4144a brw: use transpose unspill messages when possible
This simplifies the unspill messages quite a bit.

A/B testing on DG2 :

BlackOps3 : +0.96%
TotalWarPharaoh: +0.31%

DG2 shader changes :

  Assassin's Creed Valhalla:
  Totals from 19 (0.89% of 2131) affected shaders:
  Instrs: 70542 -> 64369 (-8.75%)
  Cycle count: 18810945 -> 18560169 (-1.33%); split: -1.40%, +0.06%

  Black Ops 3:
  Totals from 55 (3.41% of 1612) affected shaders:
  Instrs: 389549 -> 350646 (-9.99%)
  Cycle count: 344168275 -> 340652311 (-1.02%); split: -1.17%, +0.15%

  Control:
  Totals from 1 (0.11% of 878) affected shaders:
  Instrs: 3409 -> 3212 (-5.78%)
  Cycle count: 255991 -> 250411 (-2.18%)

  Cyberpunk 2077:
  Totals from 1 (0.08% of 1264) affected shaders:
  Instrs: 2363 -> 2337 (-1.10%)
  Cycle count: 69283 -> 69186 (-0.14%)

  Fallout 4:
  Totals from 1 (0.06% of 1601) affected shaders:
  Instrs: 27946 -> 20056 (-28.23%)
  Cycle count: 2391398 -> 2153658 (-9.94%)

  Fortnite:
  Totals from 273 (3.65% of 7470) affected shaders:
  Instrs: 634377 -> 601519 (-5.18%)
  Cycle count: 31870433 -> 31624089 (-0.77%); split: -0.78%, +0.01%

  Hogwarts Legacy:
  Totals from 50 (3.02% of 1656) affected shaders:
  Instrs: 110455 -> 103339 (-6.44%)
  Cycle count: 6613728 -> 6530832 (-1.25%); split: -1.28%, +0.03%

  Metro Exodus:
  Totals from 70 (0.16% of 43076) affected shaders:
  Instrs: 253847 -> 245321 (-3.36%)
  Cycle count: 13269473 -> 13209131 (-0.45%)
  Spill count: 1111 -> 1108 (-0.27%)
  Fill count: 2868 -> 2865 (-0.10%)

  Red Dead Redemption 2:
  Totals from 139 (2.38% of 5847) affected shaders:
  Instrs: 496551 -> 450180 (-9.34%)
  Cycle count: 43233944 -> 40947386 (-5.29%); split: -5.33%, +0.04%
  Spill count: 6322 -> 6326 (+0.06%)
  Fill count: 15558 -> 15568 (+0.06%)

  Rise Of The Tomb Raider:
  Totals from 1 (0.56% of 178) affected shaders:
  Instrs: 1682 -> 1437 (-14.57%)
  Cycle count: 603670 -> 586766 (-2.80%)

  Spiderman Remastered:
  Totals from 820 (11.77% of 6965) affected shaders:
  Instrs: 4622877 -> 3984893 (-13.80%)
  Cycle count: 235094963186 -> 234483925430 (-0.26%); split: -0.42%, +0.16%
  Spill count: 73414 -> 73581 (+0.23%); split: -0.02%, +0.25%
  Fill count: 215090 -> 215627 (+0.25%); split: -0.02%, +0.27%
  Scratch Memory Size: 3520512 -> 3528704 (+0.23%); split: -0.12%, +0.35%

Some of stats show spilling changes which is telling of how our spill
code is not adequate. Some of the spilled values are probably being
respilled which shouldn't be the case.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32110>
2024-12-04 08:59:07 +00:00
Lionel Landwerlin
a21cd8c5b6 brw: allocate physical register sizes for spilling
All of the spilling code should work with physical register units
because for example SEND messages will expect a physical register as
destination.

So always allocate a full physical register for the spilled/unspilled
values and adjust the offsets of the registers to physical sizes too.

Cc: mesa-stable
Fixes: aa494cba ("brw: align spilling offsets to physical register sizes")
Closes: mesa/mesa#11967

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Found-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32124>
2024-11-14 08:44:03 +00:00
Paulo Zanoni
c0bceaf057 brw: don't emit instruction to add zero in spilling code
When the spill_offset is zero, don't emit an instruction that adds
zero.

Results on MTL:

- Shaders:
instructions helped:   shaders/blender/581.shader_test FS SIMD8:         6760 -> 6759 (-0.01%) (scheduled: none)
instructions helped:   shaders/blender/1017.shader_test FS SIMD8:        6760 -> 6759 (-0.01%) (scheduled: none)
instructions helped:   shaders/blender/1045.shader_test FS SIMD8:        6474 -> 6473 (-0.02%) (scheduled: none)
instructions helped:   shaders/blender/723.shader_test FS SIMD8:         6458 -> 6457 (-0.02%) (scheduled: none)
instructions helped:   shaders/blender/1042.shader_test FS SIMD8:        6458 -> 6457 (-0.02%) (scheduled: none)
instructions helped:   shaders/blender/917.shader_test FS SIMD8:         4900 -> 4897 (-0.06%) (scheduled: none)
instructions helped:   shaders/blender/455.shader_test FS SIMD8:         4832 -> 4829 (-0.06%) (scheduled: none)

cycles helped:   shaders/blender/917.shader_test FS SIMD8:         891856 -> 891832 (<.01%) (scheduled: none)
cycles helped:   shaders/blender/455.shader_test FS SIMD8:         894692 -> 894660 (<.01%) (scheduled: none)

total instructions in shared programs: 1596934 -> 1596923 (<.01%)
instructions in affected programs: 42642 -> 42631 (-0.03%)
helped: 7
HURT: 0

- Fossils:

Instrs: 151744378 -> 151741213 (-0.00%)
Cycle count: 16007811131 -> 16007643963 (-0.00%); split: -0.00%, +0.00%

Totals from 1353 (0.21% of 632545) affected shaders:
Instrs: 3925143 -> 3921978 (-0.08%)
Cycle count: 2292838118 -> 2292670950 (-0.01%); split: -0.01%, +0.00%

 RELATIVE IMPROVEMENTS - Instrs                                      Before     After      Delta      Percentage
 mesa/benchmarks/gravity_mark/3e9c48cebaddf012/cs/0                  1947       1941       -6         -0.31%
 mesa/steam-native/red_dead_redemption2/571534e21fb7bd2a/fs.8/0      3431       3421       -10        -0.29%
 mesa/steam-dxvk/batman_arkham_city_goty/d783eacc9ebe324d/fs.8/0     717        715        -2         -0.28%
 mesa/steam-dxvk/batman_arkham_city_goty/14e0878a6a9605c9/fs.8/0     724        722        -2         -0.28%
 mesa/steam-dxvk/batman_arkham_city_goty/d859c2ae858269dc/fs.8/0     744        742        -2         -0.27%
 mesa/steam-dxvk/total_war_warhammer3/18b9d4a3b1961616/vs/0          1539       1535       -4         -0.26%
 mesa/steam-dxvk/total_war_warhammer3/a21827ce57dc0e29/vs/0          1539       1535       -4         -0.26%
(and a bunch of others where the delta is -2, -4 or -6)

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31694>
2024-10-23 20:19:48 +00:00
Caio Oliveira
4361a08254 intel/brw: Reduce scope of has_source_and_destination_hazard
This predicate at the moment is only relevant during register
allocation, so move it there and the code can ignore virtual
instructions that were already lowered previously.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>
2024-10-11 06:40:29 +00:00
Caio Oliveira
6db7d1af16 intel/compiler: Rename shader_stats structs
Add the `brw_` and `elk_` prefixes to the structs to avoid compilation
failure building with LTO ("violates the C++ One Definition Rule") when
the structs diverge.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>
2024-10-11 06:40:29 +00:00
Lionel Landwerlin
aa494cbacf brw: align spilling offsets to physical register sizes
In commit fe3d90aedf ("intel/fs/xe2+: Fix calculation of spill message
width for Xe2 regs.") we aligned the width of scratch messages to
physical register sizes (32B prior to Xe2, 64B for Xe2+).

But our spilling offsets are computed using the register allocations
sizes which are in units of 32B. That means on Xe2, you can end up
spilling a virtual register allocated at 32B (which we use for surface
state computations with exec_all) and then the spilling of that
register will be emitted in SIMD16, having the upper 8 lanes
overwriting the next spilled register.

We could potentially limit spills to SIMD8 messages on Xe2 (only
writing 32B of data), but we're also unlikely to have all 32B virtual
register spilled next to one another. And if not tightly packed, we
would have 64B registers stored on 2 different cachelines which sounds
inefficient.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: fe3d90aedf ("intel/fs/xe2+: Fix calculation of spill message width for Xe2 regs.")
Backport-to: 24.2
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30983>
2024-09-04 23:05:31 +00:00
Kenneth Graunke
8bca7e520c intel/brw: Only force g0's liveness to be the whole program if spilling
We don't actually need to extend g0's live range to the EOT message
generally - most messages that end a shader are headerless.  The main
implicit use of g0 is for constructing scratch headers.  With the last
two patches, we now consider scratch access that may exist in the IR
and already extend the liveness appropriately.

There is one remaining problem: spilling.  The register allocator will
create new scratch messages when spilling a register, which need to
create scratch headers, which need g0.  So, every new spill or fill
might extend the live range of g0, which would create new interference,
altering the graph.  This can be problematic.

However, when compiling SIMD16 or SIMD32 fragment shaders, we don't
allow spilling anyway.  So, why not use allow g0?  Also, when trying
various scheduling modes, we first try allocation without spilling.
If it works, great, if not, we try a (hopefully) less aggressive
schedule, and only allow spilling on the lowest-pressure schedule.

So, even for regular SIMD8 shaders, we can potentially gain the use
of g0 on the first few tries at scheduling+allocation.

Once we try to allocate with spilling, we go back to reserving g0
for the entire program, so that we can construct scratch headers at
any point.  We could possibly do better here, but this is simple and
reliable with some benefit.

Thanks to Ian Romanick for suggesting I try this approach.

fossil-db on Alchemist shows some more spill/fill improvements:

   Totals:
   Instrs: 149062395 -> 149053010 (-0.01%); split: -0.01%, +0.00%
   Cycles: 12609496913 -> 12611652181 (+0.02%); split: -0.45%, +0.47%
   Spill count: 52891 -> 52471 (-0.79%)
   Fill count: 101599 -> 100818 (-0.77%)
   Scratch Memory Size: 3292160 -> 3197952 (-2.86%)

   Totals from 416541 (66.59% of 625484) affected shaders:
   Instrs: 124058587 -> 124049202 (-0.01%); split: -0.01%, +0.01%
   Cycles: 3567164271 -> 3569319539 (+0.06%); split: -1.61%, +1.67%
   Spill count: 420 -> 0 (-inf%)
   Fill count: 781 -> 0 (-inf%)
   Scratch Memory Size: 94208 -> 0 (-inf%)

Witcher 3 shows a 33% reduction in scratch memory size, for example.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>
2024-08-01 16:37:34 -07:00
Kenneth Graunke
4ca4b064cf intel/brw: Record g0 as live for sends with send_ex_desc_scratch set
brw_send_indirect_split_message() implicitly reads g0 to construct the
extended message descriptor for certain send messages when this is set.

Record that liveness explicitly.

Thanks to Francisco Jerez for reminding me about this use of g0.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>
2024-08-01 16:37:32 -07:00
Kenneth Graunke
9200fb966c intel/brw: Record that SHADER_OPCODE_SCRATCH_HEADER uses g0
The generator code for emitting legacy scratch headers was implicitly
using g0 as a source.  But the IR wasn't indicating any usage of g0,
which means the liveness isn't properly tracked at the IR level.

It works because we reserve g0 as permanently live for the whole
program.  In order to stop doing that, we need to record it properly.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>
2024-08-01 16:37:31 -07:00
Kenneth Graunke
545f20419f intel/brw: Delete fs_reg_alloc::discard_interference_graph()
Unused since commit 50519598ff.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>
2024-08-01 16:37:28 -07:00
Caio Oliveira
b98930c770 intel/brw: Move regalloc and scheduling functions out of fs_visitor
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
17b7e49089 intel/brw: Move out of fs_visitor and rename print instructions
They use the brw_print prefix now.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Matt Turner
5e24c21625 intel/brw: Use REG_CLASS_COUNT
Fixes: 5d87f41a54 ("intel/fs/ra: Define REG_CLASS_COUNT constant specifying the number of register classes.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30314>
2024-07-25 14:55:09 +00:00
Kenneth Graunke
c429d5025e intel/brw: Don't force g1's live range to be the entire program
The idea here was that pixel shader framebuffer writes used the g0 and
g1 thread payload register values to construct the message header.
However, most messages are headerless and don't use either.  There's a
2012-era comment that the simulator at one point had a bug where certain
headerless messages would incorrectly take the values from the g0/g1
register contents rather than using sideband.  But, that was likely
fixed eons ago.  So we really don't need to do this.

Furthermore, there are many more shader stages these days:
- VS: r1 contains output URB handles
- TCS: r1 contains ICP handles
- TES: r1 contains gl_TessCoord.x (r4 contains output URB handles)
- GS: r1 contains output URB handles
- CS: r1 contains LocalID.X on DG2+ but nothing on older hardware
- Task/Mesh: r1 contains LocalID.X
- BS: r1 contains bindless stack handles

Vertex and geometry aren't likely to benefit here because r1 is needed
for their output messages, which are also what terminate the shader.
TES will definitely benefit because we were making a value pointlessly
live for the whole program.  Same for TCS, to a lesser extent.  Compute
prior to DG2 was the worst, as g1 literally has no meaningful content,
so there is no point to keeping it live.

fossil-db on Alchemist shows substantial spill/fill improvements:

   Totals:
   Instrs: 148782351 -> 148741996 (-0.03%); split: -0.03%, +0.01%
   Cycles: 12602907531 -> 12605795191 (+0.02%); split: -0.70%, +0.72%
   Subgroup size: 7518608 -> 7518632 (+0.00%)
   Send messages: 7341727 -> 7341762 (+0.00%)
   Spill count: 54633 -> 52575 (-3.77%)
   Fill count: 104694 -> 100680 (-3.83%)
   Scratch Memory Size: 3375104 -> 3287040 (-2.61%)

   Totals from 301172 (48.21% of 624670) affected shaders:
   Instrs: 95531927 -> 95491572 (-0.04%); split: -0.05%, +0.01%
   Cycles: 9643531593 -> 9646419253 (+0.03%); split: -0.91%, +0.94%
   Subgroup size: 4492512 -> 4492536 (+0.00%)
   Send messages: 4399737 -> 4399772 (+0.00%)
   Spill count: 20034 -> 17976 (-10.27%)
   Fill count: 41530 -> 37516 (-9.67%)
   Scratch Memory Size: 1522688 -> 1434624 (-5.78%)

Assassin's Creed Odyssey in particular has 20% fewer fills.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30146>
2024-07-23 02:26:52 +00:00
Caio Oliveira
3670c24740 intel/brw: Replace uses of fs_reg with brw_reg
And remove the fs_reg alias.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>
2024-07-03 02:53:19 +00:00
Caio Oliveira
d00329e821 intel/brw: Replace some fs_reg constructors with functions
Create three helper functions for ATTR, UNIFORM and VGRF creation.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>
2024-07-03 02:53:18 +00:00
Kenneth Graunke
50519598ff intel/brw: Skip discarding the interference graph
We no longer need to reserve registers for constructing spill/fill
messages.  We have split sends and construct message headers in new
temporary registers with a very short lifespan which are simply added
to the existing interference graph as new nodes and allocated via the
normal mechanism.

This means that when we need to spill for the first time, we can avoid
discarding and recomputing the entire interference graph.  We also avoid
needing to recreate all spill candidate information once ra_allocate()
fails, because the graph remains valid, and none of the existing nodes
had any changes to their interference.  The existing spill candidates
remain valid.

This will slightly help improve compile time when needing to spill.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25811>
2024-06-20 09:47:18 +00:00
Kenneth Graunke
29d6264627 intel/brw: Build the scratch header on the fly for pre-LSC systems
Instead of reserving a register to contain the spill header, which
gets marked live for the entire program, we can just emit the ALU
instructions to build it on the fly.  (This is similar to the way
we handle scratch on Alchemist with the newer LSC data port.)

There are a couple of downsides that make this not obviously a win.
First, in order to construct the scratch header on Gfx9-12, we have
to use fields from g0, which will have to remain live anywhere that
scratch access is required.  This could negate the register pressure
benefits of creating the header on the fly.  However, g0 is oft used
in other places anyway, so it may already be there.  Another is that
it's a non-trivial number of ALU instructions to construct the value.
Still, trading lower pressure (so fewer spills, less memory access
and stalls) for more cheap ALU seems like it ought to be a win.

There is another valuable benefit: by not reserving a register, we
eliminate the need to reconstruct the interference graph.  (The next
patch will actually do so.)

shader-db on Icelake shows spills/fills at 54/53 helped, 4/10 hurt,
and an 8% increase in ALU on affected shaders.  Synmark's OglCSDof
(a benchmark that spills) performance remains the same on Alderlake.

fossil-db on Icelake shows a 5.6%/5.1% reduction in spills/fills and a
4% reduction in scratch memory size on affected shaders.  Instruction
counts go up by 11.07%, but cycle estimates only increase by 0.57%.
Assassin's Creed Odyssey and Wolfenstein Youngblood both see 20-30%
reductions in spills/fills, a significant improvement.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25811>
2024-06-20 09:47:18 +00:00
Francisco Jerez
eebc4ec264 intel/brw/xe2+: Round up spill/unspill data size to nearest reg_size multiple.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28283>
2024-05-15 17:16:52 +00:00
Kenneth Graunke
545bb8fb6f intel/brw: Replace type_sz and brw_reg_type_to_size with brw_type_size_*
Both of these helpers do the same thing.  We now have brw_type_size_bits
and brw_type_size_bytes and can use whichever makes sense in that place.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Kenneth Graunke
873fcdff38 intel/brw: Stop using long BRW_REGISTER_TYPE enum names
s/BRW_REGISTER_TYPE/BRW_TYPE/g

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Francisco Jerez
74efde7663 intel/brw/xehp+: Drop redundant arguments of lsc_msg_desc*().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
2024-04-01 00:00:03 +00:00
Francisco Jerez
fa96274a87 intel/brw/xehp+: Replace lsc_msg_desc_dest_len()/lsc_msg_desc_src0_len() with helpers to do the computation.
We cannot rely on the immediate message descriptor having accurate
values for mlen and rlen at the IR level, since they are updated at
codegen time via 'inst->mlen' and 'inst->size_written', which could
end up with values inconsistent with the message descriptor if
e.g. the split sends optimization had an effect.  Instead, define
helpers that do the computation without relying on the message
descriptor, and use the pre-existing
brw_message_desc_mlen()/brw_message_desc_rlen() helpers (fully
equivalent to the lsc helpers deleted here) during disassembly.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
2024-04-01 00:00:03 +00:00
Kenneth Graunke
91252c98a8 intel/brw: Add assertions that EOT messages live in g112+
The validator already catches this, but asserting here makes it easier
to catch the problem earlier in a debugger.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27876>
2024-03-05 11:39:26 +00:00
Kenneth Graunke
97bf3d3b2d intel/brw: Replace CS_OPCODE_CS_TERMINATE with SHADER_OPCODE_SEND
There's no need for special handling here, it's just a send message
with a trivial g0 header and descriptor.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27924>
2024-03-05 11:16:20 +00:00
Caio Oliveira
865ef36609 intel/brw: Remove brw_shader.h
Find a better home for its existing content.  Some functions are
now just static functions at the usage sites.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27861>
2024-02-29 19:28:06 +00:00
Caio Oliveira
559d94cd0d intel/brw: Use fs_visitor instead of backend_shader in various passes
And since we are touching them, rename a couple of passes
to follow same name convention as existing ones.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27861>
2024-02-29 19:28:05 +00:00
Caio Oliveira
35b07ab035 intel/brw: Use a single register set
Different sets were needed for SIMD8/SIMD16 in old Gfx versions, but now
we can use a single one regardless of the SIMD size.

Suggested by Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:39 +00:00
Caio Oliveira
8f3c52c1da intel/brw: Remove MRF type
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:39 +00:00
Caio Oliveira
5c93a0e125 intel/brw: Remove Gfx8- remaining opcodes
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:39 +00:00
Caio Oliveira
ed6f0665e0 intel/brw: Remove Gfx8- code from register allocator
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:38 +00:00
Caio Oliveira
cf730adc58 intel/compiler: Make fs_builder include fs_visitor and not the other way
This will allow fs_builder have a reference to an fs_visitor (a
"fs_shader" really), instead of a reference to a backend_shader.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
2023-12-12 19:36:14 +00:00
Caio Oliveira
2d6240ab14 intel/compiler: Don't use fs_visitor::bld in fs_reg_alloc
Just set up the builder without relying on the pre-existing one.  Moves
one step close to remove bld from fs_visitor.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26301>
2023-11-28 19:53:51 +00:00
Jordan Justen
c28539a2fe intel/compiler: Use enum xe2_lsc_cache_load on xe2
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25195>
2023-09-27 23:57:25 +00:00
Francisco Jerez
cef4d53daf intel/xe2+: Round up size to reg_unit() in fs_reg_alloc::alloc_spill_reg().
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
fe3d90aedf intel/fs/xe2+: Fix calculation of spill message width for Xe2 regs.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
571ddf8516 intel/fs/xe2+: Fix payload node live range calculations for change in register size.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
2b7419d090 intel/fs: Fix signedness of payload_node_count argument of calculate_payload_ranges().
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
6d39b3d6ae intel/fs/ra/xe2: Scale up register allocation granularity by 2x on Xe2+ platforms.
v2: Fix spill register allocation.  Switch to brw_reg::nr
    representation in fake 256b units.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
bd98df5d8e intel/compiler: Make MAX_VGRF_SIZE macro depend on devinfo and update it for Xe2.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
5d87f41a54 intel/fs/ra: Define REG_CLASS_COUNT constant specifying the number of register classes.
Rework:
 * Jordan: 16=>20 following d33aff783d ("intel/fs: add support for
   sparse accesses")

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:35 -07:00
Jason Ekstrand
739e21fa9a intel/fs: Add a parameter to speed up register spilling
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24299>
2023-07-28 14:51:42 +00:00