Commit graph

3995 commits

Author SHA1 Message Date
Caio Oliveira
f0fe0026c0 intel/brw: Remove extra wrapping around fs_visitor in tests
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33100>
2025-01-18 07:41:35 -08:00
Caio Oliveira
94fa449318 intel/brw: Add missing cases to flags_written()
These virtual opcodes will write the whole flag set, either directly
(via brw_fill_flag()) or indirectly by using LOAD_LIVE_CHANNELS.

Issue was found when analysing a hang that would disappear
if the lowering of those opcodes was pulled all the way up
right before brw_opt_cmod_propagation (which uses the
flags_written).

Fixes: 019770f026 ("intel/brw: Add SHADER_OPCODE_VOTE_*")
Fixes: 2bd7592b0b ("intel/brw: Add SHADER_OPCODE_BALLOT")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12347
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12479
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33085>
2025-01-18 05:30:23 +00:00
Lionel Landwerlin
d63b5fc8c5 brw: handle load_printf_buffer_size intrinsic
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33067>
2025-01-17 18:09:45 +00:00
Alyssa Rosenzweig
e7a1d704d0 intel: set max_buffer_size to nir_lower_printf
instead of relying on an implicit value which doesn't make much sense.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33067>
2025-01-17 18:09:45 +00:00
Caio Oliveira
0b310ae4d8 intel/brw: Rename fs_generator to brw_generator
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32844>
2025-01-17 00:04:41 +00:00
Caio Oliveira
3659934862 intel/brw: Add brw_generator.h header
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32844>
2025-01-17 00:04:41 +00:00
Caio Oliveira
a5a9f42a39 intel/brw: Rename brw_fs_generator.cpp to brw_generator.cpp
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32844>
2025-01-17 00:04:41 +00:00
Lionel Landwerlin
2774fb32e6 brw: fix coarse_z computation on Xe2+
The payload format changed and we forgot to update this path.

Putting a Fixes: commit that is kind of related but probably not the
source of the issue.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12031
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11871
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12042
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12339
Fixes: 4672fcbc76 ("intel/fs: Fix PS thread payload setup for depth_w_coef_reg.")
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33029>
2025-01-16 07:19:57 +00:00
Caio Oliveira
634daf2827 intel/brw: Rename brw_fs_validate to brw_validate
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32843>
2025-01-13 23:56:22 +00:00
Kenneth Graunke
894393470a brw: Fix Xe2 spilling code to limit to SIMD32 rather than SIMD16
LSC can do native SIMD32 messages on Xe2.

Cuts spill/fills on Lunarlake:
- q2rtx-rt-pipeline: -20.83% / -16.85%
- Borderlands 3 DX12: -18.26% / -2.09%
- Cyberpunk 2077: -2.18% / -0.11%

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32986>
2025-01-11 09:33:09 +00:00
Lionel Landwerlin
8ac7802ac8 brw: move final send lowering up into the IR
Because we do emit the final send message form in code generation, a
lot of emissions look like this :

  add(8)  vgrf0,    u0, 0x100
  mov(1)   a0.1, vgrf0          # emitted by the generator
  send(8)   ...,  a0.1

By moving address register manipulation in the IR, we can get this
down to :

  add(1)  a0.1,   u0, 0x100
  send(8)  ..., a0.1

This reduce register pressure around some send messages by 1 vgrf.

All lost shaders in the below results are fragment SIMD32, due to the
throughput estimator. If turned off, we loose no SIMD32 shaders with
this change.

DG2 results:

  Assassin's Creed Valhalla:
  Totals from 2044 (96.87% of 2110) affected shaders:
  Instrs: 852879 -> 832044 (-2.44%); split: -2.45%, +0.00%
  Subgroup size: 23832 -> 23824 (-0.03%)
  Cycle count: 53345742 -> 52144277 (-2.25%); split: -5.08%, +2.82%
  Spill count: 729 -> 554 (-24.01%); split: -28.40%, +4.39%
  Fill count: 2005 -> 1256 (-37.36%)
  Scratch Memory Size: 25600 -> 19456 (-24.00%); split: -32.00%, +8.00%
  Max live registers: 116765 -> 115058 (-1.46%)
  Max dispatch width: 19152 -> 18872 (-1.46%); split: +0.21%, -1.67%

  Cyberpunk 2077:
  Totals from 1181 (93.43% of 1264) affected shaders:
  Instrs: 667192 -> 663615 (-0.54%); split: -0.55%, +0.01%
  Subgroup size: 13016 -> 13032 (+0.12%)
  Cycle count: 17383539 -> 17986073 (+3.47%); split: -0.93%, +4.39%
  Spill count: 12 -> 8 (-33.33%)
  Fill count: 9 -> 6 (-33.33%)

  Dota2:
  Totals from 173 (11.59% of 1493) affected shaders:
  Cycle count: 274403 -> 280817 (+2.34%); split: -0.01%, +2.34%
  Max live registers: 5787 -> 5779 (-0.14%)
  Max dispatch width: 1344 -> 1152 (-14.29%)

  Hitman3:
  Totals from 5072 (95.39% of 5317) affected shaders:
  Instrs: 2879952 -> 2841804 (-1.32%); split: -1.32%, +0.00%
  Cycle count: 153208505 -> 165860401 (+8.26%); split: -2.22%, +10.48%
  Spill count: 3942 -> 3200 (-18.82%)
  Fill count: 10158 -> 8846 (-12.92%)
  Scratch Memory Size: 257024 -> 223232 (-13.15%)
  Max live registers: 328467 -> 324631 (-1.17%)
  Max dispatch width: 43928 -> 42768 (-2.64%); split: +0.09%, -2.73%

  Fortnite:
  Totals from 360 (4.82% of 7472) affected shaders:
  Instrs: 778068 -> 777925 (-0.02%)
  Subgroup size: 3128 -> 3136 (+0.26%)
  Cycle count: 38684183 -> 38734579 (+0.13%); split: -0.06%, +0.19%
  Max live registers: 50689 -> 50658 (-0.06%)

  Hogwarts Legacy:
  Totals from 1376 (84.00% of 1638) affected shaders:
  Instrs: 758810 -> 749727 (-1.20%); split: -1.23%, +0.03%
  Cycle count: 27778983 -> 28805469 (+3.70%); split: -1.42%, +5.12%
  Spill count: 2475 -> 2299 (-7.11%); split: -7.47%, +0.36%
  Fill count: 2677 -> 2445 (-8.67%); split: -9.90%, +1.23%
  Scratch Memory Size: 99328 -> 89088 (-10.31%)
  Max live registers: 84969 -> 84671 (-0.35%); split: -0.58%, +0.23%
  Max dispatch width: 11848 -> 11920 (+0.61%)

  Metro Exodus:
  Totals from 92 (0.21% of 43072) affected shaders:
  Instrs: 262995 -> 262968 (-0.01%)
  Cycle count: 13818007 -> 13851266 (+0.24%); split: -0.01%, +0.25%
  Max live registers: 11152 -> 11140 (-0.11%)

  Red Dead Redemption 2 :
  Totals from 451 (7.71% of 5847) affected shaders:
  Instrs: 754178 -> 753811 (-0.05%); split: -0.05%, +0.00%
  Cycle count: 3484078523 -> 3484111965 (+0.00%); split: -0.00%, +0.00%
  Max live registers: 42294 -> 42185 (-0.26%)

  Spiderman Remastered:
  Totals from 6820 (98.02% of 6958) affected shaders:
  Instrs: 6921500 -> 6747933 (-2.51%); split: -4.16%, +1.65%
  Cycle count: 234400692460 -> 236846720707 (+1.04%); split: -0.20%, +1.25%
  Spill count: 72971 -> 72622 (-0.48%); split: -8.08%, +7.61%
  Fill count: 212921 -> 198483 (-6.78%); split: -12.37%, +5.58%
  Scratch Memory Size: 3491840 -> 3410944 (-2.32%); split: -12.05%, +9.74%
  Max live registers: 493149 -> 487458 (-1.15%)
  Max dispatch width: 56936 -> 56856 (-0.14%); split: +0.06%, -0.20%

  Strange Brigade:
  Totals from 3769 (91.21% of 4132) affected shaders:
  Instrs: 1354476 -> 1321474 (-2.44%)
  Cycle count: 25351530 -> 25339190 (-0.05%); split: -1.64%, +1.59%
  Max live registers: 199057 -> 193656 (-2.71%)
  Max dispatch width: 30272 -> 30240 (-0.11%)

  Witcher 3:
  Totals from 25 (2.40% of 1041) affected shaders:
  Instrs: 24621 -> 24606 (-0.06%)
  Cycle count: 2218793 -> 2217503 (-0.06%); split: -0.11%, +0.05%
  Max live registers: 1963 -> 1955 (-0.41%)

LNL results:

  Assassin's Creed Valhalla:
  Totals from 1928 (98.02% of 1967) affected shaders:
  Instrs: 856107 -> 835756 (-2.38%); split: -2.48%, +0.11%
  Subgroup size: 41264 -> 41280 (+0.04%)
  Cycle count: 64606590 -> 62371700 (-3.46%); split: -5.57%, +2.11%
  Spill count: 915 -> 669 (-26.89%); split: -32.79%, +5.90%
  Fill count: 2414 -> 1617 (-33.02%); split: -36.62%, +3.60%
  Scratch Memory Size: 62464 -> 44032 (-29.51%); split: -36.07%, +6.56%
  Max live registers: 205483 -> 202192 (-1.60%)

  Cyberpunk 2077:
  Totals from 1177 (96.40% of 1221) affected shaders:
  Instrs: 682237 -> 678931 (-0.48%); split: -0.51%, +0.03%
  Subgroup size: 24912 -> 24944 (+0.13%)
  Cycle count: 24355928 -> 25089292 (+3.01%); split: -0.80%, +3.81%
  Spill count: 8 -> 3 (-62.50%)
  Fill count: 6 -> 3 (-50.00%)
  Max live registers: 126922 -> 125472 (-1.14%)

  Dota2:
  Totals from 428 (32.47% of 1318) affected shaders:
  Instrs: 89355 -> 89740 (+0.43%)
  Cycle count: 1152412 -> 1152706 (+0.03%); split: -0.52%, +0.55%
  Max live registers: 32863 -> 32847 (-0.05%)

  Fortnite:
  Totals from 5354 (81.72% of 6552) affected shaders:
  Instrs: 4135059 -> 4239015 (+2.51%); split: -0.01%, +2.53%
  Cycle count: 132557506 -> 132427302 (-0.10%); split: -0.75%, +0.65%
  Spill count: 7144 -> 7234 (+1.26%); split: -0.46%, +1.72%
  Fill count: 12086 -> 12403 (+2.62%); split: -0.73%, +3.35%
  Scratch Memory Size: 600064 -> 604160 (+0.68%); split: -1.02%, +1.71%

  Hitman3:
  Totals from 4912 (97.09% of 5059) affected shaders:
  Instrs: 2952124 -> 2916824 (-1.20%); split: -1.20%, +0.00%
  Cycle count: 179985656 -> 189175250 (+5.11%); split: -2.44%, +7.55%
  Spill count: 3739 -> 3136 (-16.13%)
  Fill count: 10657 -> 9564 (-10.26%)
  Scratch Memory Size: 373760 -> 318464 (-14.79%)
  Max live registers: 597566 -> 589460 (-1.36%)

  Hogwarts Legacy:
  Totals from 1471 (96.33% of 1527) affected shaders:
  Instrs: 748749 -> 766214 (+2.33%); split: -0.71%, +3.05%
  Cycle count: 33301528 -> 34426308 (+3.38%); split: -1.30%, +4.68%
  Spill count: 3278 -> 3070 (-6.35%); split: -8.30%, +1.95%
  Fill count: 4553 -> 4097 (-10.02%); split: -10.85%, +0.83%
  Scratch Memory Size: 251904 -> 217088 (-13.82%)
  Max live registers: 168911 -> 168106 (-0.48%); split: -0.59%, +0.12%

  Metro Exodus:
  Totals from 18356 (49.81% of 36854) affected shaders:
  Instrs: 7559386 -> 7621591 (+0.82%); split: -0.01%, +0.83%
  Cycle count: 195240612 -> 196455186 (+0.62%); split: -1.22%, +1.84%
  Spill count: 595 -> 546 (-8.24%)
  Fill count: 1604 -> 1408 (-12.22%)
  Max live registers: 2086937 -> 2086933 (-0.00%)

  Red Dead Redemption 2:
  Totals from 4171 (79.31% of 5259) affected shaders:
  Instrs: 2619392 -> 2719587 (+3.83%); split: -0.00%, +3.83%
  Subgroup size: 86416 -> 86432 (+0.02%)
  Cycle count: 8542836160 -> 8531976886 (-0.13%); split: -0.65%, +0.53%
  Fill count: 12949 -> 12970 (+0.16%); split: -0.43%, +0.59%
  Scratch Memory Size: 401408 -> 385024 (-4.08%)

  Spiderman Remastered:
  Totals from 6639 (98.94% of 6710) affected shaders:
  Instrs: 6877980 -> 6800592 (-1.13%); split: -3.11%, +1.98%
  Cycle count: 282183352210 -> 282100051824 (-0.03%); split: -0.62%, +0.59%
  Spill count: 63147 -> 64218 (+1.70%); split: -7.12%, +8.82%
  Fill count: 184931 -> 175591 (-5.05%); split: -10.81%, +5.76%
  Scratch Memory Size: 5318656 -> 5970944 (+12.26%); split: -5.91%, +18.17%
  Max live registers: 918240 -> 906604 (-1.27%)

  Strange Brigade:
  Totals from 3675 (92.24% of 3984) affected shaders:
  Instrs: 1462231 -> 1429345 (-2.25%); split: -2.25%, +0.00%
  Cycle count: 37404050 -> 37345292 (-0.16%); split: -1.25%, +1.09%
  Max live registers: 361849 -> 351265 (-2.92%)

  Witcher 3:
  Totals from 13 (46.43% of 28) affected shaders:
  Instrs: 593 -> 660 (+11.30%)
  Cycle count: 28302 -> 28714 (+1.46%)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>
2025-01-11 08:41:42 +00:00
Lionel Landwerlin
a27d98e933 brw: avoid having the scratch surface handle partially written
Allows it to be visible through the def_analysis.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>
2025-01-11 08:41:42 +00:00
Lionel Landwerlin
aac906c16c brw: add scheduler support for address registers
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>
2025-01-11 08:41:42 +00:00
Lionel Landwerlin
0a5bdf1199 brw: add infra to make use of the address register in the IR
This limits the address register to simple cases inside a block.

Validation ensures that the address register is only written once and
read once.

Instruction scheduling makes sure that instructions using the address
register in the generator are not scheduled while there is an usage of
the register in the IR.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>
2025-01-11 08:41:42 +00:00
Lionel Landwerlin
c9fa235c28 brw: split validation iteration into blocks
No functional change.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>
2025-01-11 08:41:42 +00:00
Lionel Landwerlin
9b73a73a6e brw: use phys_nr() more in generation
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>
2025-01-11 08:41:42 +00:00
Lionel Landwerlin
b110b06447 brw: introduce a new register type for the address register
We want to reuse the brw::nr field as a virtual address register
identifer. So we can't use brw::file=ARF brw::nr=ADDRESS.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>
2025-01-11 08:41:42 +00:00
Kenneth Graunke
de1eaa4019 brw: Always use MEMORY_LOAD for load_ubo_uniform_block_intel intrinsics
Rather than emitting FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD to do block
loads that were cacheline aligned, loading entire cachelines at a time,
we now rely on NIR passes to group, CSE, and vectorize things into
appropriately sized blocks.  This means that we'll usually still load
a cacheline, but we may load only 32B if we don't actually need anything
from the full 64B.  Prior to Xe2, this saves us registers, and it ought
to save us some bandwidth as well as the response length can be lowered.

The cacheline-aligning hack was the main reason not to simply call
fs_nir_emit_memory_access(), so now we do that instead, porting yet
one more thing to the common memory opcode framework.

We unfortunately still emit the old FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD
opcode for non-block intrinsics.  We'd have to clean up 16-bit handling
among other things in order to eliminate this, but we should in the
future.

fossil-db results on Alchemist for this and the previous patch together:

   Instrs: 161481888 -> 161297588 (-0.11%); split: -0.12%, +0.01%
   Subgroup size: 8102976 -> 8103000 (+0.00%)
   Send messages: 7895489 -> 7846178 (-0.62%); split: -0.67%, +0.05%
   Cycle count: 16583127302 -> 16703162264 (+0.72%); split: -0.57%, +1.29%
   Spill count: 72316 -> 67212 (-7.06%); split: -7.25%, +0.19%
   Fill count: 134457 -> 125970 (-6.31%); split: -6.83%, +0.52%
   Scratch Memory Size: 4093952 -> 3787776 (-7.48%); split: -7.53%, +0.05%
   Max live registers: 33037765 -> 32947425 (-0.27%); split: -0.28%, +0.00%
   Max dispatch width: 5780288 -> 5778536 (-0.03%); split: +0.17%, -0.20%
   Non SSA regs after NIR: 177862542 -> 178816944 (+0.54%); split: -0.06%, +0.60%

In particular, several titles see incredible reductions in spill/fills:

   Shadow of the Tomb Raider: -65.96% / -65.44%
   Batman: Arkham City GOTY:  -53.49% / -28.57%
   Witcher 3:                 -16.33% / -14.29%
   Total War: Warhammer III:   -9.60% / -10.14%
   Assassins Creed Odyssey:    -6.50% /  -9.92%
   Red Dead Redemption 2:      -6.77% /  -8.88%
   Far Cry: New Dawn:          -7.97% /  -4.53%

Improves performance in many games on Arc A750:

   Cyberpunk 2077: 5.8%
   Witcher 3: 4%
   Shadow of the Tomb Raider: 3.3%
   Assassins Creed: Valhalla: 3%
   Spiderman Remastered: 2.75%

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888>
2025-01-10 22:44:09 +00:00
Kenneth Graunke
21636ff9fa brw: Align and combine constant-offset UBO loads in NIR
The hope here is to replace our backend handling for loading whole
cachelines at a time from UBOs into NIR-based handling, which plays
nicely with the NIR load/store vectorizer.

Rounding down offsets to multiples of 64B allows us to globally CSE
UBO loads across basic blocks.  This is really useful.  However, blindly
rounding down the offset to a multiple of 64B can trigger anti-patterns
where...a single unaligned memory load could have hit all the necessary
data, but rounding it down split it into two loads.

By moving this to NIR, we gain more control of the interplay between
nir_opt_load_store_vectorize and this rebasing and CSE'ing.  The backend
can then simply load between nir_def_{first,last}_component_read() and
trust that our NIR has the loads blockified appropriately.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888>
2025-01-10 22:44:09 +00:00
Kenneth Graunke
36d0485ae4 brw: Allow CSE of MEMORY_MODE_CONSTANT loads
This matches the behavior of FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888>
2025-01-10 22:44:09 +00:00
Kenneth Graunke
7ce66e2b61 brw: Add a new MEMORY_MODE_CONSTANT option
This will translate to HDC Constant Cache loads or LSC UGM loads.

On LSC, MEMORY_MODE_UNTYPED would be fine, but for HDC we need to
distinguish between the regular and constant cache data ports.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888>
2025-01-10 22:44:09 +00:00
Kenneth Graunke
cfbb5ebcdd brw: Skip unread leading/trailing components in convergent block loads
The NIR vectorizer may produce block loads with unread trailing
components.  Upcoming passes may produce unread leading components
as well.  With a bit of finesse, we can skip loading those, and only
bother with the ones we actually need.  This can sometimes save us on
loads and MOVs.

v2: Skip this for SLM reads on pre-LSC platforms (caught by Lionel).

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888>
2025-01-10 22:44:09 +00:00
Kenneth Graunke
4f0c852a4e brw: Skip unnecessary work for trivial emit_uniformize of IMMs
If we pass an immediate, just trivially return that immediate.

This preserves the property that if x was an IMM, emit_uniformize(x)
will also be an IMM, without the need for optimizations to eliminate
unnecessary operations.  That way, you can call emit_uniformize() on
a value and still check whether it's constant afterwards.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888>
2025-01-10 22:44:09 +00:00
Kenneth Graunke
a0b1e07976 brw: Make get_nir_src_imm() usable for non-32-bit-sizes.
We return an immediate for 32-bit constant values, but fall back to
calling get_nir_src() for other values, as 64-bit, and even 8-bit
immediates have odd restrictions.  We could probably support 16-bit
here without too many issues, but we leave it be for now.

This makes it usable for case where we'd like to get constants for
32-bit values but where it may be a different bit-size too.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888>
2025-01-10 22:44:09 +00:00
Kenneth Graunke
03f948f5fd brw: Skip fetching unread leading components of UBO loads
We were already skipping unread trailing components, but now we skip
them on both ends.

About -3.5% spills on Shadow of the Tomb Raider on Alchemist (mostly a
wash elsewhere, but it will help additional shaders with later patches).

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888>
2025-01-10 22:44:09 +00:00
Kenneth Graunke
c8b2ab041e brw: Add more safeguards against misaligned OWord Block messages
HDC doesn't support block loads/stores with sub-DWord (<4B) aligned
offsets, and shared local memory has to use the Aligned OWord Block
messages which require OWord (16B) alignment.

Make the validator detect this case and say no.  Also make the lowering
code assert that the alignment is valid as a second line of defense.

LSC has no such restrictions.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888>
2025-01-10 22:44:09 +00:00
Caio Oliveira
7fadd864dd intel/elk: Fix typo in assertion
Just assert that the array will fit whatever the MAX is for a given
Gfx version.

Fixes: 172c1ab984 ("intel/elk: Add ELK_MAX_MRF_ALL for static allocating arrays")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32978>
2025-01-10 20:16:59 +00:00
Caio Oliveira
c9e667b7ad intel/elk: Remove uses of VLAs
Was causing trouble in some build configurations, we don't really need
them.  Unless there's a good reason, defaults to use ralloc for
consistency with the larger codebase.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Antonio Ospite <None>
Reviewed-by: Kenneth Graunke <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32916>
2025-01-10 07:05:35 +00:00
Caio Oliveira
172c1ab984 intel/elk: Add ELK_MAX_MRF_ALL for static allocating arrays
Replace usage of variable length arrays.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Antonio Ospite <None>
Reviewed-by: Kenneth Graunke <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32916>
2025-01-10 07:05:35 +00:00
Caio Oliveira
4d43ee0dd6 intel/brw: Remove uses of VLAs
Was causing trouble in some build configurations, we don't really need
them.  Use ralloc for consistency.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Antonio Ospite <None>
Reviewed-by: Kenneth Graunke <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32916>
2025-01-10 07:05:35 +00:00
Caio Oliveira
faf4c35b74 intel/compiler: Use linear allocator for ACP trees in copy-prop
Replace usage of variable length array.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Antonio Ospite <None>
Reviewed-by: Kenneth Graunke <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32916>
2025-01-10 07:05:35 +00:00
Caio Oliveira
e6a3770433 intel/compiler: Use INFINITY spill cost to represent no_spill
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Antonio Ospite <None>
Reviewed-by: Kenneth Graunke <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32916>
2025-01-10 07:05:35 +00:00
Vinson Lee
83809f06a7 intel/elk: Fix assert with side effect
Fix defect reported by Coverity Scan.

Side effect in assertion (ASSERT_SIDE_EFFECT)
assert_side_effect: Argument ++eot_count of assert() has a side effect.
The containing function might work differently in a non-debug build.

Fixes: ebd6738260 ("intel/elk/chv: Implement WaClearArfDependenciesBeforeEot")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32884>
2025-01-09 04:07:42 +00:00
Kenneth Graunke
35f175301d brw: Fix vectorizer hole_size condition after signedness change
Marek recently changed hole_size to be signed, rather than unsigned.
A negative hole_size means that the two loads overlap - and thus are
prime candidates to be combined.

My original hole_size handling was:

   if hole_size > 4 * (8 - low->num_components) then don't vectorize

For non-overlapping loads, this worked: NIR's largest vector is vec16,
and if low was already a vec16, combining it with anything would exceed
that, so it'd never be considered.  That meant low would always be a
vec8 or less, so (8 - low->num_components) was a positive number.

Now that we see overlapping loads, we can see a vec16 low, vec4 high,
and also a negative hole size, giving us fun comparisons like:

   -16 > 4 * (8 - 16)   =>   -16 > -32   => true, don't vectorize

Which is absolutely the wrong thing to do, because the high load's data
is entirely included within the former load's data.

The idea here was to make sure the second load would be able to pack at
least one component into the first's V8 result.  But even this isn't the
best, because...even if it's simply adjacent, doing one V16 load is more
efficient than requesting two back to back V8 loads.

So, we just simplify down to a static check: if there's an entire V8 of
hole, don't vectorize.  This already won't happen because the core pass
has max_hole set to 28 bytes (7 32-bit components), but that could
change based on the needs of other drivers, so let's be defensive.

fossil-db results on Alchemist:

   Instrs: 161533978 -> 161295137 (-0.15%); split: -0.20%, +0.05%
   Subgroup size: 8092544 -> 8092568 (+0.00%)
   Send messages: 7915233 -> 7844503 (-0.89%); split: -0.94%, +0.05%
   Cycle count: 16577700697 -> 16702609256 (+0.75%); split: -0.59%, +1.35%
   Spill count: 72338 -> 67226 (-7.07%); split: -7.36%, +0.29%
   Fill count: 134058 -> 125980 (-6.03%); split: -6.83%, +0.80%
   Scratch Memory Size: 4092928 -> 3786752 (-7.48%); split: -7.53%, +0.05%
   Max live registers: 33031460 -> 32945994 (-0.26%); split: -0.27%, +0.01%
   Max dispatch width: 5778384 -> 5778536 (+0.00%); split: +0.26%, -0.26%
   Non SSA regs after NIR: 179809505 -> 152735471 (-15.06%); split: -15.08%, +0.03%

Fixes: c21bc65ba7 ("nir/opt_load_store_vectorize: make hole_size signed to indicate overlapping loads")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32932>
2025-01-08 00:19:54 +00:00
Caio Oliveira
868016d92c intel/brw/xe2+: Do not use $.dst or $.src SWSB annotations in SENDs
When a SEND instruction is a EOT, the scoreboard lowering will not
allocate a new SBID for it, since nothing needs to wait for it.  In
Gfx12 this allowed the SEND to get out-of-order $.dst or $.src
dependencies.

Starting on Xe2+ this is not supported anymore, in favor of supporting
more combined modes.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32712>
2025-01-07 22:23:59 +00:00
Tapani Pälli
1cc17e9ce9 intel/compiler: take reg_unit size into account with ubo ranges
Fixes: 1ab4fe2dd6 ("brw: Don't shrink UBO push ranges in the backend")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12423
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32925>
2025-01-07 21:38:06 +00:00
Kenneth Graunke
4ab04799ee brw: Delete assign_constant_locations and push_constant_loc[]
The push_constant_loc[] array is always an identity mapping these days,
so it's kind of pointless.  Just use the original uniform number and
skip the unnecessary "remap" step.  With that gone, and shrinking UBO
ranges gone, assign_constant_locations() is now empty and can be removed
as well.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32841>
2025-01-06 12:45:47 +00:00
Kenneth Graunke
93e186e1a4 brw: Delete pull constant lowering
Now that we never shrink ranges in the backend, we never lower push
constants to pull constants late in the backend either.  get_pull_loc
will never return true, and so all of brw_lower_constant_loads becomes
a noop.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32841>
2025-01-06 12:45:47 +00:00
Kenneth Graunke
1ab4fe2dd6 brw: Don't shrink UBO push ranges in the backend
Back in the bad old days (vec4?) we had a bunch of smarts in the backend
to dead code eliminate unused vector components and re-pack regular
uniforms, so we really couldn't decide how much data we were pushing
until very late in the backend.  Nowadays we have none of that - we do
all of our elimination and packing in NIR.  anv shrinks ranges to deal
with Vulkan API push constants, and iris treats everything as a UBO and
as of the previous commit will also shrink appropriately.

So we don't need to do this anymore...which will let us simplify quite
a bit of code.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32841>
2025-01-06 12:45:47 +00:00
Kenneth Graunke
583ad35455 brw: Limit maximum push UBO ranges to 64 registers in the NIR pass.
anv already does this limiting, since it needs to handle non-UBO push
constants as well.  iris treats everything as a UBO, but doesn't have
a limiter and was relying on the backend to handle it.

Do this in the NIR pass so that we can eliminate the backend code.
It's not necessary for anv, but handling it here is simple and less
error prone for iris, which calls this in a number of places.  We know
we need to limit things to this much; anv can limit more if needed.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32841>
2025-01-06 12:45:47 +00:00
Caio Oliveira
6968794c50 intel/brw: Add missing bits in 3-src SWSB encoding for Xe2+
Fix invalid SWSB annotation in dEQP-VK.glsl.builtin.precision.mix.mediump.vec4 for LNL.

Fixes: 4a24f49b57 ("intel/compiler/xe2: Implement codegen of three-source instructions.")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32846>
2025-01-03 21:19:26 +00:00
Caio Oliveira
e1aebf8a0c intel/brw: Remove 'fs' prefix from passes and related functions
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32813>
2025-01-02 18:11:05 +00:00
Caio Oliveira
25384dccc0 intel/brw: Remove 'fs' prefix from passes filenames
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32813>
2025-01-02 18:11:05 +00:00
Marek Olšák
c21bc65ba7 nir/opt_load_store_vectorize: make hole_size signed to indicate overlapping loads
A negative hole size means the loads overlap. This will be used by drivers
to handle overlapping loads in the callback easily.

Reviewed-by: Mel Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32699>
2025-01-01 00:03:55 +00:00
Caio Oliveira
056b14b882 intel/brw: Move two NIR passes to brw_nir.c
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32799>
2024-12-30 20:18:23 +00:00
Caio Oliveira
1154b07d09 intel/brw: Add missing call to invalidate analysis
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32798>
2024-12-30 19:01:40 +00:00
Caio Oliveira
3ca6fa7487 intel/brw: Gather brw_reg related implementations in brw_reg.cpp
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32800>
2024-12-30 18:26:59 +00:00
Caio Oliveira
5860e07f92 intel/brw: Rename brw_compact_inst_* helpers to brw_eu_compact_inst_*
Acked-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32643>
2024-12-30 17:16:15 +00:00
Caio Oliveira
228aba779f intel/brw: Rename brw_inst_* helpers to brw_eu_inst_*
Acked-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32643>
2024-12-30 17:16:15 +00:00
Caio Oliveira
3031b22a8a intel/brw: Rename brw_inst_bits/set_bits to brw_eu_inst_bits/set_bits
Acked-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32643>
2024-12-30 17:16:15 +00:00