Commit graph

5257 commits

Author SHA1 Message Date
Kenneth Graunke
f873cfd7a0 brw: Pass devinfo to lower_bit_size, not compiler
We only need devinfo.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39839>
2026-02-16 21:33:48 +00:00
Kenneth Graunke
1df2158f50 brw: Delete use_bindless_sampler_offset flag
No drivers use this.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39839>
2026-02-16 21:33:48 +00:00
Kenneth Graunke
4bdef9824a anv, brw: Consolidate ex_bso bits to a static devinfo inline
If we have extended bindless surface offset (ExBSO) support, we want to
use it.  Consolidate the anv_physical_device and brw_compiler bits into
a single static inline that take devinfo.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39839>
2026-02-16 21:33:47 +00:00
Kenneth Graunke
aa939db0c5 iris: Move recompile debugging to work on iris program keys
iris decides to do recompiles or not based on its own program keys,
not the brw or elk keys.  So, it makes sense to handle the "why did
we have to recompile a new variant" debugging based on those keys as
well.  It also unifies the code, eliminating a brw/elk split, so it's
actually less code.

Additionally, this was the only remaining user of the brw code, so we
can delete that, resulting in even larger cleanups.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39839>
2026-02-16 21:33:42 +00:00
Kenneth Graunke
d013ef4c0f brw: Make use_tcs_multi_patch a static inline taking devinfo
This simplifies some iris wrapping for multiple compilers and also
saves some space in the brw_compiler singleton.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39839>
2026-02-16 21:33:42 +00:00
Kenneth Graunke
9531c6b89e brw: Make indirect_ubos_use_sampler a static inline bool taking devinfo
Having the named field allowed us to indicate that our code conditions
are referring to the specific decision about how we handle indirect
UBOs, rather than some other arbitrary hardware change.

Still, there's no need to store this in a singleton struct - we can
easily have a static inline bool that does the devinfo check for us.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39839>
2026-02-16 21:33:42 +00:00
Kenneth Graunke
de03b38daa intel/elk, hasvk: Drop indirect_ubos_use_sampler option and DP code
This is always set to true for elk platforms.  No need for the option.

crocus also assumes that we take the sampler path.  hasvk had support
for both paths (leftover from when the driver still supported Gfx12).

We started using HDC messages for indirect UBO access on Tigerlake
(Gfx12.x) because of cache reworks that made it more viable.  On all
prior platforms, we used the sampler because it has additional L1/L2
caches that the dataport lacks.  Additionally, Ivybridge and nearby
platforms had notoriously slow L3 access in some very common cases.

Note that we do use the dataport for constant-offset UBO access,
since we can combine many reads into larger block loads.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39839>
2026-02-16 21:33:42 +00:00
Ian Romanick
df704bd38e elk: Call nir_opt_algebraic_late in elk_postprocess_nir
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Make sure that lowering undone in elk_nir_optimize are reapplied.

No shader-db or fossil-db changes on any Intel platform. This is most
likely to impact either Gfx8 on ANV or Gfx7.5 on HASVK. I don't
fossil-db test either of those platforms.

I tried doing a similar thing here as is done in BRW (previous commit),
but that caused a couple Haswell shaders to fall off a performance
cliff:

total spills in shared programs: 8247 -> 8311 (0.78%)
spills in affected programs: 6 -> 70 (1066.67%)
helped: 0 / HURT: 2

total fills in shared programs: 8558 -> 8910 (4.11%)
fills in affected programs: 6 -> 358 (5866.67%)
helped: 0 / HURT: 2

Fixes: 442daeb54a ("nir/opt_algebraic: use fcanonicalize")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39567>
2026-02-14 02:06:59 +00:00
Ian Romanick
11b96a84b0 brw: Call nir_opt_algebraic_late later in brw_postprocess_nir_opts
Move the call to nir_opt_algebraic_late after the last time
brw_nir_optimize might be called. nir_opt_algebraic_distribute_src_mods
works together with the late algebraic optimizations, so move it also.

shader-db:

Lunar Lake
total instructions in shared programs: 17081222 -> 17080842 (<.01%)
instructions in affected programs: 419931 -> 419551 (-0.09%)
helped: 545 / HURT: 826

total cycles in shared programs: 878437752 -> 879236226 (0.09%)
cycles in affected programs: 506003142 -> 506801616 (0.16%)
helped: 3091 / HURT: 3189

LOST:   18
GAINED: 16

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total instructions in shared programs: 19994270 -> 19993231 (<.01%)
instructions in affected programs: 490499 -> 489460 (-0.21%)
helped: 660 / HURT: 800

total cycles in shared programs: 882498776 -> 882834186 (0.04%)
cycles in affected programs: 477858602 -> 478194012 (0.07%)
helped: 3458 / HURT: 3564

total fills in shared programs: 4371 -> 4370 (-0.02%)
fills in affected programs: 7 -> 6 (-14.29%)
helped: 1 / HURT: 0

LOST:   28
GAINED: 10

Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
total instructions in shared programs: 19943849 -> 19942782 (<.01%)
instructions in affected programs: 467384 -> 466317 (-0.23%)
helped: 655 / HURT: 796

total cycles in shared programs: 860085674 -> 861410289 (0.15%)
cycles in affected programs: 426900998 -> 428225613 (0.31%)
helped: 3250 / HURT: 3441

LOST:   19
GAINED: 14

fossil-db:

Lunar Lake
Totals:
Instrs: 926472091 -> 926204838 (-0.03%); split: -0.04%, +0.01%
CodeSize: 14845921056 -> 14842776112 (-0.02%); split: -0.10%, +0.08%
Send messages: 41459570 -> 41459574 (+0.00%); split: -0.00%, +0.00%
Cycle count: 104481085069 -> 104583692712 (+0.10%); split: -0.14%, +0.24%
Spill count: 3454651 -> 3457340 (+0.08%); split: -0.15%, +0.23%
Fill count: 4958779 -> 4958487 (-0.01%); split: -0.46%, +0.45%
Max live registers: 193805970 -> 193839002 (+0.02%); split: -0.00%, +0.02%
Max dispatch width: 49114416 -> 49113776 (-0.00%); split: +0.01%, -0.01%
Non SSA regs after NIR: 142953905 -> 142800740 (-0.11%); split: -0.12%, +0.01%

Totals from 420256 (20.80% of 2020128) affected shaders:
Instrs: 448571327 -> 448304074 (-0.06%); split: -0.09%, +0.03%
CodeSize: 7312002800 -> 7308857856 (-0.04%); split: -0.21%, +0.17%
Send messages: 17716494 -> 17716498 (+0.00%); split: -0.00%, +0.00%
Cycle count: 52178854998 -> 52281462641 (+0.20%); split: -0.28%, +0.48%
Spill count: 2945654 -> 2948343 (+0.09%); split: -0.17%, +0.26%
Fill count: 4404768 -> 4404476 (-0.01%); split: -0.51%, +0.51%
Max live registers: 60875448 -> 60908480 (+0.05%); split: -0.01%, +0.06%
Max dispatch width: 9455280 -> 9454640 (-0.01%); split: +0.04%, -0.04%
Non SSA regs after NIR: 60542740 -> 60389575 (-0.25%); split: -0.28%, +0.02%

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 1000081384 -> 999726726 (-0.04%); split: -0.05%, +0.01%
CodeSize: 16764458080 -> 16761624256 (-0.02%); split: -0.09%, +0.07%
Subgroup size: 27599528 -> 27599544 (+0.00%)
Send messages: 45538933 -> 45538951 (+0.00%); split: -0.00%, +0.00%
Cycle count: 93303830912 -> 93370118192 (+0.07%); split: -0.19%, +0.26%
Spill count: 3739306 -> 3739719 (+0.01%); split: -0.22%, +0.23%
Fill count: 5089719 -> 5083626 (-0.12%); split: -0.56%, +0.44%
Max live registers: 122041364 -> 122055848 (+0.01%); split: -0.00%, +0.01%
Max dispatch width: 38117296 -> 38127200 (+0.03%); split: +0.06%, -0.03%
Non SSA regs after NIR: 164296197 -> 164299306 (+0.00%); split: -0.01%, +0.01%

Totals from 338754 (14.82% of 2285730) affected shaders:
Instrs: 452723479 -> 452368821 (-0.08%); split: -0.10%, +0.03%
CodeSize: 7861878032 -> 7859044208 (-0.04%); split: -0.19%, +0.16%
Subgroup size: 16 -> 32 (+100.00%)
Send messages: 17050010 -> 17050028 (+0.00%); split: -0.00%, +0.00%
Cycle count: 52881801997 -> 52948089277 (+0.13%); split: -0.33%, +0.46%
Spill count: 3271458 -> 3271871 (+0.01%); split: -0.25%, +0.26%
Fill count: 4628422 -> 4622329 (-0.13%); split: -0.61%, +0.48%
Max live registers: 30738902 -> 30753386 (+0.05%); split: -0.01%, +0.06%
Max dispatch width: 4787264 -> 4797168 (+0.21%); split: +0.47%, -0.26%
Non SSA regs after NIR: 61748026 -> 61751135 (+0.01%); split: -0.03%, +0.03%

Tiger Lake
Totals:
Instrs: 1011068379 -> 1010977290 (-0.01%); split: -0.03%, +0.02%
CodeSize: 14197751744 -> 14197683040 (-0.00%); split: -0.07%, +0.07%
Send messages: 46431228 -> 46431220 (-0.00%); split: -0.00%, +0.00%
Cycle count: 85066526419 -> 85085088071 (+0.02%); split: -0.16%, +0.18%
Spill count: 3853750 -> 3855185 (+0.04%); split: -0.15%, +0.19%
Fill count: 6716746 -> 6719594 (+0.04%); split: -0.25%, +0.29%
Max live registers: 122307387 -> 122326083 (+0.02%); split: -0.00%, +0.02%
Max dispatch width: 38009632 -> 38003280 (-0.02%); split: +0.03%, -0.05%
Non SSA regs after NIR: 158403572 -> 158415390 (+0.01%); split: -0.01%, +0.02%

Totals from 277728 (12.17% of 2281577) affected shaders:
Instrs: 349206856 -> 349115767 (-0.03%); split: -0.07%, +0.05%
CodeSize: 5042621104 -> 5042552400 (-0.00%); split: -0.20%, +0.20%
Send messages: 13132243 -> 13132235 (-0.00%); split: -0.00%, +0.00%
Cycle count: 36183327716 -> 36201889368 (+0.05%); split: -0.38%, +0.43%
Spill count: 2210072 -> 2211507 (+0.06%); split: -0.26%, +0.33%
Fill count: 4188439 -> 4191287 (+0.07%); split: -0.39%, +0.46%
Max live registers: 24956695 -> 24975391 (+0.07%); split: -0.02%, +0.09%
Max dispatch width: 3948832 -> 3942480 (-0.16%); split: +0.32%, -0.48%
Non SSA regs after NIR: 45616425 -> 45628243 (+0.03%); split: -0.04%, +0.06%

Ice Lake
Totals:
Instrs: 1009584306 -> 1009411757 (-0.02%); split: -0.02%, +0.01%
CodeSize: 12593466880 -> 12592958096 (-0.00%); split: -0.01%, +0.01%
Send messages: 47274203 -> 47274171 (-0.00%); split: -0.00%, +0.00%
Cycle count: 84920281455 -> 84914027301 (-0.01%); split: -0.05%, +0.04%
Spill count: 2988523 -> 2986191 (-0.08%); split: -0.14%, +0.07%
Fill count: 5296078 -> 5288737 (-0.14%); split: -0.21%, +0.07%
Max live registers: 125429384 -> 125444786 (+0.01%); split: -0.00%, +0.02%
Max dispatch width: 41269072 -> 41267312 (-0.00%); split: +0.03%, -0.03%
Non SSA regs after NIR: 163223895 -> 163236623 (+0.01%); split: -0.01%, +0.02%

Totals from 243818 (10.45% of 2334244) affected shaders:
Instrs: 296953759 -> 296781210 (-0.06%); split: -0.08%, +0.02%
CodeSize: 3643224480 -> 3642715696 (-0.01%); split: -0.04%, +0.03%
Send messages: 11518671 -> 11518639 (-0.00%); split: -0.00%, +0.00%
Cycle count: 33065548412 -> 33059294258 (-0.02%); split: -0.13%, +0.11%
Spill count: 1346515 -> 1344183 (-0.17%); split: -0.32%, +0.15%
Fill count: 2537906 -> 2530565 (-0.29%); split: -0.43%, +0.14%
Max live registers: 21476776 -> 21492178 (+0.07%); split: -0.02%, +0.09%
Max dispatch width: 3727288 -> 3725528 (-0.05%); split: +0.31%, -0.35%
Non SSA regs after NIR: 41050474 -> 41063202 (+0.03%); split: -0.04%, +0.07%

Skylake
Totals:
Instrs: 513573157 -> 513462971 (-0.02%); split: -0.02%, +0.00%
CodeSize: 5950280672 -> 5950001392 (-0.00%); split: -0.01%, +0.00%
Send messages: 24909757 -> 24909758 (+0.00%); split: -0.00%, +0.00%
Cycle count: 57636102242 -> 57634726342 (-0.00%); split: -0.03%, +0.03%
Spill count: 627286 -> 627241 (-0.01%); split: -0.01%, +0.00%
Fill count: 837888 -> 837804 (-0.01%); split: -0.01%, +0.00%
Max live registers: 87272271 -> 87284192 (+0.01%); split: -0.00%, +0.02%
Max dispatch width: 32278832 -> 32271800 (-0.02%); split: +0.02%, -0.04%
Non SSA regs after NIR: 87387713 -> 87387614 (-0.00%); split: -0.00%, +0.00%

Totals from 177432 (10.30% of 1722906) affected shaders:
Instrs: 127170648 -> 127060462 (-0.09%); split: -0.10%, +0.01%
CodeSize: 1443406368 -> 1443127088 (-0.02%); split: -0.03%, +0.01%
Send messages: 5444220 -> 5444221 (+0.00%); split: -0.00%, +0.00%
Cycle count: 15423028495 -> 15421652595 (-0.01%); split: -0.10%, +0.10%
Spill count: 235844 -> 235799 (-0.02%); split: -0.03%, +0.01%
Fill count: 333783 -> 333699 (-0.03%); split: -0.03%, +0.01%
Max live registers: 13765573 -> 13777494 (+0.09%); split: -0.01%, +0.10%
Max dispatch width: 3086880 -> 3079848 (-0.23%); split: +0.24%, -0.47%
Non SSA regs after NIR: 17623772 -> 17623673 (-0.00%); split: -0.00%, +0.00%

Fixes: 442daeb54a ("nir/opt_algebraic: use fcanonicalize")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39567>
2026-02-14 02:06:59 +00:00
Ian Romanick
5af0b8bd09 brw: Call nir_opt_algebraic_late in brw_nir_create_raygen_trampoline
Make sure that lowering undone in brw_nir_optimize are reapplied.

No shader-db changes on any Intel platform.

Why are there fossil-db changes on platforms that don't support ray tracing?

Lunar Lake
Totals:
Instrs: 926636441 -> 926636313 (-0.00%); split: -0.00%, +0.00%
Send messages: 41510729 -> 41510723 (-0.00%); split: -0.00%, +0.00%
Cycle count: 104509492613 -> 104509490569 (-0.00%); split: -0.00%, +0.00%
Max live registers: 193792922 -> 193792890 (-0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 150091934 -> 150092170 (+0.00%); split: -0.00%, +0.00%

Totals from 10 (0.00% of 2020428) affected shaders:
Instrs: 8142 -> 8014 (-1.57%); split: -3.14%, +1.57%
Send messages: 192 -> 186 (-3.12%); split: -7.29%, +4.17%
Cycle count: 131892 -> 129848 (-1.55%); split: -6.93%, +5.38%
Max live registers: 1442 -> 1410 (-2.22%); split: -3.05%, +0.83%
Non SSA regs after NIR: 950 -> 1186 (+24.84%); split: -26.95%, +51.79%

Meteor Lake
Totals:
Instrs: 1000805547 -> 1000805543 (-0.00%); split: -0.00%, +0.00%
Cycle count: 93131592265 -> 93131619619 (+0.00%); split: -0.00%, +0.00%
Max live registers: 122081268 -> 122081244 (-0.00%); split: -0.00%, +0.00%

Totals from 16 (0.00% of 2286241) affected shaders:
Instrs: 18652 -> 18648 (-0.02%); split: -1.39%, +1.37%
Cycle count: 369520 -> 396874 (+7.40%); split: -2.94%, +10.34%
Max live registers: 1350 -> 1326 (-1.78%); split: -4.15%, +2.37%

DG2
Totals:
Instrs: 999834626 -> 999834651 (+0.00%); split: -0.00%, +0.00%
Send messages: 45719398 -> 45719403 (+0.00%); split: -0.00%, +0.00%
Cycle count: 93118238139 -> 93118269557 (+0.00%); split: -0.00%, +0.00%
Max live registers: 122098944 -> 122098936 (-0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 169413734 -> 169413661 (-0.00%); split: -0.00%, +0.00%

Totals from 13 (0.00% of 2286795) affected shaders:
Instrs: 18799 -> 18824 (+0.13%); split: -1.04%, +1.18%
Send messages: 492 -> 497 (+1.02%); split: -2.44%, +3.46%
Cycle count: 352838 -> 384256 (+8.90%); split: -1.08%, +9.98%
Max live registers: 1237 -> 1229 (-0.65%); split: -2.91%, +2.26%
Non SSA regs after NIR: 2191 -> 2118 (-3.33%); split: -20.86%, +17.53%

Tiger Lake
Totals:
Instrs: 1011816778 -> 1011816714 (-0.00%); split: -0.00%, +0.00%
Send messages: 46515289 -> 46515285 (-0.00%); split: -0.00%, +0.00%
Cycle count: 85148902406 -> 85148894668 (-0.00%); split: -0.00%, +0.00%
Max live registers: 122362180 -> 122362172 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 38036160 -> 38036176 (+0.00%)
Non SSA regs after NIR: 160317521 -> 160317649 (+0.00%); split: -0.00%, +0.00%

Totals from 6 (0.00% of 2282318) affected shaders:
Instrs: 9204 -> 9140 (-0.70%); split: -1.43%, +0.74%
Send messages: 258 -> 254 (-1.55%); split: -3.10%, +1.55%
Cycle count: 287652 -> 279914 (-2.69%); split: -3.29%, +0.60%
Max live registers: 552 -> 544 (-1.45%); split: -2.90%, +1.45%
Max dispatch width: 48 -> 64 (+33.33%)
Non SSA regs after NIR: 914 -> 1042 (+14.00%); split: -14.00%, +28.01%

Ice Lake
Totals:
Instrs: 1012203285 -> 1012203249 (-0.00%); split: -0.00%, +0.00%
Send messages: 47358859 -> 47358858 (-0.00%); split: -0.00%, +0.00%
Cycle count: 85112165276 -> 85112171905 (+0.00%); split: -0.00%, +0.00%
Max live registers: 125545002 -> 125544992 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 41335696 -> 41335656 (-0.00%)
Non SSA regs after NIR: 166448597 -> 166448602 (+0.00%); split: -0.00%, +0.00%

Totals from 13 (0.00% of 2335519) affected shaders:
Instrs: 16486 -> 16450 (-0.22%); split: -1.67%, +1.46%
Send messages: 368 -> 367 (-0.27%); split: -4.89%, +4.62%
Cycle count: 347643 -> 354272 (+1.91%); split: -1.34%, +3.25%
Max live registers: 1104 -> 1094 (-0.91%); split: -3.80%, +2.90%
Max dispatch width: 192 -> 152 (-20.83%)
Non SSA regs after NIR: 2100 -> 2105 (+0.24%); split: -21.76%, +22.00%

Skylake
Totals:
Instrs: 504548665 -> 504548057 (-0.00%); split: -0.00%, +0.00%
Send messages: 24479148 -> 24479118 (-0.00%); split: -0.00%, +0.00%
Cycle count: 57575198140 -> 57575179256 (-0.00%); split: -0.00%, +0.00%
Max live registers: 85570671 -> 85570575 (-0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 85097646 -> 85098486 (+0.00%); split: -0.00%, +0.00%

Totals from 22 (0.00% of 1703671) affected shaders:
Instrs: 19866 -> 19258 (-3.06%); split: -3.72%, +0.66%
Send messages: 464 -> 434 (-6.47%); split: -8.19%, +1.72%
Cycle count: 250854 -> 231970 (-7.53%); split: -9.23%, +1.70%
Max live registers: 2024 -> 1928 (-4.74%); split: -5.53%, +0.79%
Non SSA regs after NIR: 2498 -> 3338 (+33.63%); split: -8.33%, +41.95%

Fixes: 442daeb54a ("nir/opt_algebraic: use fcanonicalize")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39567>
2026-02-14 02:06:59 +00:00
Ian Romanick
fd29183901 elk: Use F16TO32 for nir_op_f2f32 of float16 source
This matches the behavior of nir_op_unpack_half_2x16_split_x. Gfx7
uses a special opcode for this conversion. Fixes numerous assertion
failures in shader-db on Ivy Bridge and Haswell.

I am not sure why this was never encountered previously.

Fixes: 609c46cf23 ("nir/lower_alu_width: emit f2f32 for unpack_half_2x16")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39567>
2026-02-14 02:06:59 +00:00
Alyssa Rosenzweig
bd5ebbb2f8 brw: drop buggy SLM optimization
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This was incorrect for OpenCL due to the possibility of variable shared memory
existing despite shared_size == 0. Fortunately the optimization it was trying to
do should be done in NIR via nir_opt_barrier_modes so we can just drop the brw
code and move on with our merry lives. Fixes OpenCL tests on Iris:

non_uniform_work_group non_uniform_3d_barriers
basic async_strided_copy_local_to_global

Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39795>
2026-02-13 20:28:28 +00:00
Lionel Landwerlin
1f1f484570 brw/iris: move ubo range analysis pass to iris
Anv isn't using this pass anymore.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35160>
2026-02-12 16:45:26 +00:00
Lionel Landwerlin
d1a1e98e4e brw: handle non-GRF aligned pushed UBO masking
Right now all the drivers align push data to GRF (32B pre Xe2, 64B
post Xe2) but the push constant delivery mechanism can actually pack
32B ranges so alignment is not required.

Off course we need the push UBO masking to deal with unaligned pushed
ranges.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Calder Young <cgiacun@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35160>
2026-02-12 16:45:25 +00:00
Lionel Landwerlin
c1c9048dbf anv: add a couple of surfaces to read descriptors
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35160>
2026-02-12 16:45:25 +00:00
Sagar Ghuge
1fb8435b77 nir: Add nir_resource_intel_internal entry
Will use the load/store_ssbo with nir_resource_intel_internal later in
this series.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35160>
2026-02-12 16:45:22 +00:00
Lionel Landwerlin
2ef29502ed brw: enable ex_bso for LSC_SS
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35160>
2026-02-12 16:45:22 +00:00
Lionel Landwerlin
9bb152c9a9 brw: make PULL_CONSTANT opcodes more like MEMORY opcodes
Using binding & binding_type sources.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35160>
2026-02-12 16:45:22 +00:00
Matt Turner
14c65322e8 elk/cse: use copies in operands_match instead of in-place modification
`operands_match` was modifying instruction source operands in-place
(through the `elk_fs_reg *src` pointer member) and relying on a
save/restore pattern to undo the modifications. Work on local copies
instead, which is simpler and avoids mutating shared state in a
comparison function.

Fixes: 47c4b38540 ("i965/fs: Allow CSE to handle MULs with negated arguments.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39814>
2026-02-11 18:43:03 +00:00
Matt Turner
93f39f87c4 elk/cse: fix operands_match corrupting non-IMM register data
The MUL case in `operands_match` was reading and writing the `.f` union
member unconditionally, even when the register's `.file != IMM`. In that
case `.f` aliases the struct containing `.nr`/`.swizzle`/etc, so the
`fabsf()` call could corrupt the `.nr` by clearing bit 31.

Guard all `.f` accesses with `.file == IMM` checks.

Fixes: 47c4b38540 ("i965/fs: Allow CSE to handle MULs with negated arguments.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39814>
2026-02-11 18:43:03 +00:00
Matt Turner
b302faad8b brw/cse: use copies in operands_match instead of in-place modification
`operands_match` was modifying instruction source operands in-place
(through the `brw_reg *src` pointer member) and relying on a
save/restore pattern to undo the modifications. Work on local copies
instead, which is simpler and avoids mutating shared state in a
comparison function.

Fixes: 47c4b38540 ("i965/fs: Allow CSE to handle MULs with negated arguments.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39814>
2026-02-11 18:43:02 +00:00
Matt Turner
f5e0f63216 brw/cse: fix operands_match corrupting non-IMM register data
The MUL case in `operands_match` was reading and writing the `.f` union
member unconditionally, even when the register's `.file != IMM`. In that
case `.f` aliases the struct containing `.nr`/`.swizzle`/etc, so the
`fabsf()` call could corrupt the `.nr` by clearing bit 31.

Guard all `.f` accesses with `.file == IMM` checks.

Fixes: 47c4b38540 ("i965/fs: Allow CSE to handle MULs with negated arguments.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39814>
2026-02-11 18:43:02 +00:00
Georg Lehmann
5926209996 brw/nir_lower_fsign: try to fix NaN correctness
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641>
2026-02-10 18:42:03 +00:00
Kenneth Graunke
05ed18a37b elk: Delete mesh shader remnants
This compiler does not support mesh shaders.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39791>
2026-02-09 21:56:05 +00:00
Kenneth Graunke
3b4af8907f brw: Delete wm_prog_data::urb_setup_channel[]
The entire array is always initialized to zero and never modified.

Cuts the size of brw_wm_prog_data by 32%.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39791>
2026-02-09 21:56:04 +00:00
Caio Oliveira
6b0e29bc77 brw: Fix cooperative matrix constant sources other than src0
Code was wrongly using src0 to pick the constant value.

Fixes: bf9ad36f2d ("brw: Properly handle cooperative matrices created with constants")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39769>
2026-02-09 19:52:16 +00:00
Kenneth Graunke
c5859b2d40 intel: Rename wm_prog_key to fs_prog_key
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This is the shader key for the fragment shader.  Nobody even knows
what the windowizer/masker unit is or does anymore.  Even on Gen4-6,
"fs" is still clearer.  This makes the codebase easier to read.

This is only about 15 years overdue.

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39748>
2026-02-06 20:52:01 -08:00
Kenneth Graunke
56e638be81 intel: Rename wm_prog_data to fs_prog_data
This is the program data for the fragment shader.  Nobody even knows
what the windowizer/masker unit is or does anymore.  Even on Gen4-6,
"fs" is still clearer.  This makes the codebase easier to read.

This is only about 15 years overdue.

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39748>
2026-02-06 20:51:59 -08:00
Kenneth Graunke
beb4b78fe7 intel: Rename intel_msaa_flags to intel_fs_config
This started out as dynamic configuration for MSAA related state, but
has since expanded to cover many dynamic fragment shader options.

We rename it to intel_fs_config, similar to intel_tess_config, to
better indicate its purpose.

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39748>
2026-02-06 20:51:43 -08:00
Georg Lehmann
d71db17e53 elk: remove unpack_half support
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39511>
2026-02-06 06:12:36 +00:00
Georg Lehmann
d8391d70fe elk/lower_storage_image: use f2f32 instead of unpack_half
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39511>
2026-02-06 06:12:36 +00:00
Georg Lehmann
e5f1e08f3e brw: remove unpack_half support
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39511>
2026-02-06 06:12:36 +00:00
Georg Lehmann
caf982218d brw/lower_storage_image: use f2f32 instead of unpack_half
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39511>
2026-02-06 06:12:36 +00:00
Caio Oliveira
06251fcc24 brw/print: Don't print extra space at the end
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reviewed-by: Caleb Callaway <caleb.callaway@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39597>
2026-02-06 01:00:31 +00:00
Kenneth Graunke
6fbe201a12 brw: Convert VS/TES/GS outputs to URB intrinsics.
For VS/TES/GS, we lower all outputs to temporaries and emit copies at the
end of the shader (or for GS, at each EmitVertex() call) from those
temporaries back to real outputs.  We use vec8 URB writes without
writemasking, since our output area's contents are undefined anyhow.

This is simpler than what TCS and Mesh do, which allow for output
variables to be read/written at a per-component level at any time,
with the output memory being used for cross-thread communication.

Rather than using the complicated TCS/Mesh handling and relying on
vectorization, we port the emit_urb_writes() approach to NIR.  This
also takes care of emitting the VUE header with default values when
fields aren't explicitly written by the shader.

We also handle multiview in the process.  It simplifies things, and
also drops another case of non-semantic IO in brw.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39666>
2026-02-03 19:11:21 +00:00
Kenneth Graunke
52341b8b9c brw: Split EOT handling out of emit_urb_writes()
The TES workaround code is still going to be needed even after
we rework URB output handling for VS/TES/GS to use NIR intrinsics.

For VS, we know at least one URB write will have been emitted at
the end of the program, so we can just tag it.

GS already handles EOT via emit_gs_thread_end().

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39666>
2026-02-03 19:11:21 +00:00
Kenneth Graunke
1f0773e951 brw: Add VUE header varyings to io_component()
This is needed for VS/TES/GS outputs.  Mesh takes a different path
because those are per-primitive.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39666>
2026-02-03 19:11:21 +00:00
Kenneth Graunke
54def4020c brw: Set a valid varying_to_slot for VUE header fields other than PSIZ
This lets us look up things in varying_to_slot[] without having to
special case VIEWPORT, LAYER, and PRIMITIVE_SHADING_RATE.  All of them
map to the same slot as PSIZ, slot 0, the VUE header.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39666>
2026-02-03 19:11:20 +00:00
Kenneth Graunke
076a183b8f brw: Move TES VUE map calculation before lowering outputs
We'll need the VUE map when we convert to using URB intrinsics.
Prepare for that by reordering VUE map setup before IO lowering.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39666>
2026-02-03 19:11:20 +00:00
Kenneth Graunke
2af44670ed brw: Implement load_urb_output_handle_intel for VS/GS stages
Simply get the payload field.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39666>
2026-02-03 19:11:20 +00:00
Kenneth Graunke
0cbf49aa8f brw: Drop urb_handle parameter from store_urb()
We always store to outputs, never inputs.  Just use the output handle.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39666>
2026-02-03 19:11:19 +00:00
Alyssa Rosenzweig
bc69e4364f intel: report code size in shader stats
This is missing from ANV's statistics currently.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39633>
2026-02-02 23:30:24 +00:00
Alyssa Rosenzweig
fc53da9c39 intel: simplify shader stats names
This brings what ANV reports closer to what Iris reports, and is mostly dropping
redundancies.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39633>
2026-02-02 23:30:24 +00:00
Alyssa Rosenzweig
3d5170c705 intel: add scheduling mode statistic
This is for parity with what we do in the current GL shader-db path.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39633>
2026-02-02 23:30:24 +00:00
José Roberto de Souza
1f61b1c367 intel/brw: Add BRW_DEPENDENCY_INSTRUCTIONS invalidation when instructions are added or removed in brw_opt_split_virtual_grfs()
This fix a brw_ip_ranges shader analysis, were it fails because there is a
different number of instructions than expected after brw_opt_split_virtual_grfs()
optimization.

Reproduced in Piglit test spec@arb_sample_shading@builtin-gl-sample-mask 0:
   arb_sample_shading-builtin-gl-sample-mask: ../src/intel/compiler/brw_analysis.h:150: T& brw_analysis<T, C>::require() [with T = brw_ip_ranges; C = brw_shader]: Assertion `p->validate(c)' failed.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39629>
2026-02-02 14:46:50 +00:00
Caio Oliveira
db4bc5407f brw: Print "GRF registers" in INTEL_DEBUG=shaders output
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39601>
2026-01-29 20:16:48 +00:00
Caio Oliveira
0d19fc8256 brw: Fix "GRF registers" stats output
Pick the value from the brw_shader instead of from the prog_data, since
when there are multiple variants, the prog_data one will have the
maximum value.  Picking the wrong value also caused compute shaders
that had a single variant to report 0 GRFs since the prog_data was
being filled after the generate_code() call.

Issue spotted by Felix DeGrood.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39601>
2026-01-29 20:16:48 +00:00
Kenneth Graunke
bfca9d32d3 brw: Fix geometry shaders with non-constant vertex indices
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Geometry shaders load from separate handles for each vertex, so they
don't incorporate the vertex index in the URB offset like tessellation
shaders do.  This means we can have a constant offset (within a vertex's
section) but not have a constant vertex index.

Prior to 41d7debcfe we were emitting non-folded ALU so we thought the
offset was non-constant at this point.  Now we can properly detect
constant offsets...but still don't want to use push inputs for
non-constant vertex indices.

Fixes: 41d7debcfe ("brw: Use nir_imul_imm in per-vertex/per-primitive offset calculation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39603>
2026-01-29 00:18:20 +00:00
Caio Oliveira
cc06e1ebe2 brw: Remove outdated comment about remove_dead_variables
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This now also removes dead variables created by split_array_vars,
and in the future it is reasonable other optimizations inside the
optimization loop to make temp variables dead.

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39596>
2026-01-28 22:26:43 +00:00
Caio Oliveira
354dbbe3ae brw: Use the "early break" loop macros when possible
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This macro will stop the loop early if there's no chance to make further
progress.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39504>
2026-01-28 19:52:02 +00:00