Commit graph

5187 commits

Author SHA1 Message Date
Alyssa Rosenzweig
712719a2ae jay: do moves on the float pipe where possible
this allows us to use accumulators more.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
2026-05-06 23:25:25 +00:00
Alyssa Rosenzweig
6f2b1cece6 jay: model MAC
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
2026-05-06 23:25:25 +00:00
Alyssa Rosenzweig
b6e88ab904 jay/to_binary: fix packing of simd-split accumulators
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
2026-05-06 23:25:25 +00:00
Rhys Perry
ec59b59b97 nir: rename nir_src_parent_instr to nir_src_use_instr
sed -i "s/nir_src_parent_instr/nir_src_use_instr/" `find ./ -type f`
sed -i "s/nir_src_parent_if/nir_src_use_if/" `find ./ -type f`
sed -i "s/nir_src_set_parent/nir_src_set_use/" `find ./ -type f`

There are two kinds of "parent" in relation to a src/def:
- the instruction where the def or src's def is defined
- the instruction which the src is a part of and where the def is used

Clarify that the parent here is where the src's def is used, not where
it's defined.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41344>
2026-05-06 17:09:22 +00:00
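To illustrate the two kinds of "parent" that ec59b59b97 distinguishes, here is a minimal sketch in C. The `toy_*` types and helpers are hypothetical stand-ins, not NIR's real structs or API; they only show that a source leads to two different instructions: the one that defines its value and the one that uses it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy types for illustration only; these are not NIR's real structs. */
struct toy_instr { const char *name; };

/* A def knows the instruction that produces it. */
struct toy_def { struct toy_instr *parent_instr; };

/* A src points at the def it reads and at the instruction it belongs to. */
struct toy_src {
   struct toy_def *def;
   struct toy_instr *use_instr;
};

/* The instruction where the src's def is defined. */
static struct toy_instr *
toy_src_def_instr(const struct toy_src *src)
{
   assert(src != NULL && src->def != NULL);
   return src->def->parent_instr;
}

/* The instruction the src is part of, i.e. where the def is used. */
static struct toy_instr *
toy_src_use_instr(const struct toy_src *src)
{
   assert(src != NULL);
   return src->use_instr;
}
```

With a producer and a consumer instruction, the two helpers return different instructions for the same source, which is the ambiguity the rename is meant to remove.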
Lionel Landwerlin
6f5d30c0a2 anv: add apply_layout support for device bindable shaders/pipelines
We consider them like bindless stages (no binding table) as much as
possible.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31384>
2026-05-06 09:49:44 +00:00
Lionel Landwerlin
1281e2b9a0 anv/intel: add device generated commands shaders
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31384>
2026-05-06 09:49:43 +00:00
Lionel Landwerlin
c30a4d4fdb anv/brw/nir: fix wa_18019110168
Several things were wrong:
  - incorrect offset in the FS push constant data
  - incorrect encoding of the 32bit values with 2 fields (remap table offset & provoking vertex)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31384>
2026-05-06 09:49:41 +00:00
Lionel Landwerlin
f309f0b1a0 intel: add resource intrinsic support for heaps
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39478>
2026-05-05 18:21:16 +00:00
Lionel Landwerlin
25bc517ef5 brw: add heap support to brw_lower_storage_image
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39478>
2026-05-05 18:21:16 +00:00
Lionel Landwerlin
5ec7d31e20 brw/lower_texel_address: add heap support
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39478>
2026-05-05 18:21:16 +00:00
Calder Young
4120ae4963 brw: Avoid vectorizing loads in NIR if it could extend into a different page
Took inspiration from RADV to make nir_opt_load_store_vectorize robust against
page faults, by checking the align_offset and align_mul to see if any extra
components could be overlapping into a different page.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40149>
2026-05-01 19:51:41 +00:00
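The alignment check described in 4120ae4963 can be sketched roughly as follows. This is not the Mesa implementation: `load_stays_in_page` is a hypothetical helper, and it assumes the address is known to satisfy `addr % align_mul == align_offset` with both `align_mul` and the page size being powers of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not taken from Mesa.
 * Returns true if a load of `bytes` bytes at any address with
 * addr % align_mul == align_offset can never extend into the next page. */
static bool
load_stays_in_page(uint32_t align_mul, uint32_t align_offset,
                   uint32_t bytes, uint32_t page_size)
{
   assert(align_mul != 0 && (align_mul & (align_mul - 1)) == 0);
   assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

   /* Alignment beyond the page size says nothing more about pages. */
   uint32_t window = align_mul < page_size ? align_mul : page_size;
   uint32_t start = align_offset % window;

   /* The load starts `start` bytes into a window-aligned block. Such
    * blocks never straddle a page, so the load is safe if it fits. */
   return start + bytes <= window;
}
```

For example, a 16-byte load that is only 4-byte aligned could start 4 bytes before the end of a page, so the helper rejects it, while the same load with 16-byte alignment is accepted.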
Calder Young
3ac6233655 brw: Avoid rounding every convergent block load up to a full register
To simplify things, our backend rounds convergent block loads up to a full
register. This causes page faults with the scratch page disabled since the
address is not always aligned to a register size. Loading smaller blocks is
slightly more difficult because the SEND instruction can only write back a
multiple of full registers, even if the actual data is smaller.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40149>
2026-05-01 19:51:41 +00:00
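A small sketch of why the rounding in 3ac6233655 can fault: the write-back is counted in whole registers, so a rounded-up fetch may reach past the page that holds the requested data. Both helpers are hypothetical, and the register and page sizes are parameters rather than values taken from the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of whole registers needed to hold `bytes` of returned data. */
static uint32_t
writeback_regs(uint32_t bytes, uint32_t reg_size)
{
   assert(reg_size != 0);
   return (bytes + reg_size - 1) / reg_size;
}

/* True if the requested bytes stay inside the page but a fetch rounded
 * up to whole registers would read into the following page. */
static bool
rounding_crosses_page(uint64_t addr, uint32_t bytes,
                      uint32_t reg_size, uint32_t page_size)
{
   assert(page_size != 0);
   uint64_t page_end = (addr / page_size + 1) * page_size;
   uint64_t rounded = (uint64_t)writeback_regs(bytes, reg_size) * reg_size;

   return addr + bytes <= page_end && addr + rounded > page_end;
}
```

Assuming 64-byte registers and 4096-byte pages, a 16-byte load from the last 16 bytes of a page is fine on its own but crosses the page once rounded up to 64 bytes.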
Calder Young
8ce98fedc4 anv: Make sure robust UBO access does not fault
We can just conditionally replace the address with an address to a zero
initialized cacheline if the read is going to go out of bounds.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40149>
2026-05-01 19:51:41 +00:00
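The address replacement described in 8ce98fedc4 amounts to a select between the real address and the address of a zeroed cacheline. The sketch below is an assumption about the shape of that logic, written as plain C rather than as the NIR the driver would emit; `robust_ubo_address` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pick the address a robust UBO load should use.
 * Out-of-bounds reads are redirected to a zero-initialized cacheline so
 * they return zeros instead of touching unmapped memory. */
static uint64_t
robust_ubo_address(uint64_t base, uint32_t offset, uint32_t load_bytes,
                   uint32_t ubo_size, uint64_t zero_line_addr)
{
   assert(load_bytes != 0);

   /* 64-bit sum so that offset + load_bytes cannot wrap around. */
   uint64_t end = (uint64_t)offset + load_bytes;

   return end > ubo_size ? zero_line_addr : base + offset;
}
```

Because the choice is a plain select on the address, the load itself stays unconditional, and only where it reads from changes.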
Caio Oliveira
1ebc14bcb9 brw: Stop tracking inline parameter usage in prog_key/prog_data
Since inline parameter is the last field of the thread payload, the
backend can always assume they may exist.  They won't affect the
position of other payload fields and the register allocator will
reuse any unused space.

In Anv, also update EmitInlineParameter for Task/Mesh/CS to reflect
previous changes in inline parameter setup.  Remove/Update some stale
comments since we are here.

Finally, remove the prog_key/prog_data bits that tracked whether inline
data or a push address was needed.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41230>
2026-04-30 16:39:22 +00:00
Caio Oliveira
e1745e0bd9 brw: Fix max_dispatch_width collection for CS with variable size
The intention of the original commit was to make all the shaders report
the same max_dispatch_width.  When CS has multiple variants, this was
not happening as expected.

Fixes: 2acc2f18ea ("intel/compiler: report max dispatch width statistic")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41209>
2026-04-29 15:52:04 +00:00
Alyssa Rosenzweig
a78634ccb0 jay/to_binary: rename grf -> phys_reg
since it covers accumulators too

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
ab87a035c9 jay: drop a bunch of stale TODO and XXX
These are either done, or never going to be done, or otherwise stale or
silly or unnecessary. Drop a bunch.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
70d09d97ef jay: predicate NoMask instructions in uniform IF's
Totals:
Instrs: 4742391 -> 4742257 (-0.00%)
CodeSize: 70245120 -> 70243520 (-0.00%); split: -0.00%, +0.00%

Totals from 81 (3.06% of 2647) affected shaders:
Instrs: 337727 -> 337593 (-0.04%)
CodeSize: 4992992 -> 4991392 (-0.03%); split: -0.03%, +0.00%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
f199f00564 jay: adjust flag replication
Now instructions still read/write UFLAG, which preserves the information about
lane 0 we need for proper predication etc.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
930d36b54a jay: smarten predication pass
Merge the empty else optimization, the then-block predication, and the
break-while fusion into a unified "try to predicate each side of an if, peephole
optimizing control flow" optimization. This is simpler and more general.

Totals:
Instrs: 4783809 -> 4775647 (-0.17%)
CodeSize: 70766656 -> 70674064 (-0.13%); split: -0.13%, +0.00%

Totals from 1109 (41.90% of 2647) affected shaders:
Instrs: 4130644 -> 4122482 (-0.20%)
CodeSize: 61180848 -> 61088256 (-0.15%); split: -0.15%, +0.00%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
80081ef7b2 jay: check for inverse-ballots in jay_uses_flag
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
86f19bc983 jay: propagate inverse-ballots only locally
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
d7283a25d7 jay: do not copyprop ballots globally
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
5828b66b65 jay: convert to LCSSA
for correctness with loops.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
fed6b7bea0 jay: drop UGPR->UMEM spilling path
This is totally broken now that we have a physical CFG for UGPRs. And of course,
UGPRs generally were totally broken without the physical CFG. So I conclude
this code basically never worked. Which is good because it was also basically
always dead too. Just delete it and replace with a clear error message, instead
of pretending it works and either randomly splatting validation or just straight
up miscompiling silently or whatever.

We might need an alternative UGPR->GPR spill path some day but that day is not
today.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
ad040f2fbb jay: introduce a physical control flow graph
Consider:

   u0 = foo()

   if (divergent) {
      u0 = bar()
      r0 = baz(u0)
   } else {
      r0 = quux(u0)
   }

Logically, this is fine, there is no interference between bar() and u0. But
physically, both sides of the if execute so the bar() write to u0 overwrites the
variable the else reads. So this is a miscompile.

The solution is to model the extra edges in the physical control flow graph,
which lives next to the existing logical control flow graph. Liveness for UGPRs
now follows the physical CFG, while liveness for GPRs continues to follow the
logical CFG. That models the interference properly, while still allowing phis to
work as before (since phis writing UGPRs follow uniform bits of control flow
that are necessarily critical edge free for the same reason the logical CFG is).

Because our RA copies shuffled registers back at block ends (following
Colombet), there's no issue with live range splits here (unlike aco which
inserts phis for this case and then needs to worry about critical edges around
those phis).

There might still be an extremely-challenging-to-hit bug here with UGPR spilling
which I need to think more about. It might be fine as-is? Not convinced though.
But this is big enough and strictly less broken than what we have right now and
the full solution will build on this, so here we are.

Fixes artefacting in SuperTuxKart and Celestia knows what else.

Totals:
Instrs: 2770938 -> 2771269 (+0.01%); split: -0.00%, +0.02%
CodeSize: 40133712 -> 40138480 (+0.01%); split: -0.01%, +0.02%

Totals from 158 (5.97% of 2647) affected shaders:
Instrs: 514523 -> 514854 (+0.06%); split: -0.02%, +0.09%
CodeSize: 7603040 -> 7607808 (+0.06%); split: -0.03%, +0.09%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
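The example from ad040f2fbb can be replayed with a tiny liveness computation. This is a sketch under stated assumptions, not jay's data structures: `a` stands for the result of `foo()`, `b` for the result of `bar()`, and the only difference between the two graphs is an extra then-to-else edge in the physical one, since both sides of a divergent if execute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { ENTRY, THEN, ELSE, MERGE, NUM_BLOCKS };
enum { VAL_A = 1u << 0, VAL_B = 1u << 1 }; /* a = foo(), b = bar() */

struct block {
   uint32_t def;   /* values written in the block */
   uint32_t use;   /* values read before any write in the block */
   uint32_t succs; /* bitmask of successor blocks */
};

/* Standard backwards liveness, iterated to a fixed point. */
static void
compute_live_out(const struct block *blocks, uint32_t *live_out)
{
   assert(blocks != 0 && live_out != 0);
   uint32_t live_in[NUM_BLOCKS] = {0};
   for (int b = 0; b < NUM_BLOCKS; ++b)
      live_out[b] = 0;

   bool progress;
   do {
      progress = false;
      for (int b = NUM_BLOCKS - 1; b >= 0; --b) {
         uint32_t out = 0;
         for (int s = 0; s < NUM_BLOCKS; ++s) {
            if (blocks[b].succs & (1u << s))
               out |= live_in[s];
         }
         uint32_t in = blocks[b].use | (out & ~blocks[b].def);
         if (out != live_out[b] || in != live_in[b]) {
            live_out[b] = out;
            live_in[b] = in;
            progress = true;
         }
      }
   } while (progress);
}

/* b is written in THEN, so it interferes with a if a is live out of THEN. */
static bool
a_interferes_with_b(bool physical)
{
   struct block blocks[NUM_BLOCKS] = {
      [ENTRY] = { .def = VAL_A, .succs = (1u << THEN) | (1u << ELSE) },
      [THEN]  = { .def = VAL_B, .succs = 1u << MERGE },
      [ELSE]  = { .use = VAL_A, .succs = 1u << MERGE },
      [MERGE] = { 0 },
   };
   if (physical)
      blocks[THEN].succs |= 1u << ELSE; /* then falls into else */

   uint32_t live_out[NUM_BLOCKS];
   compute_live_out(blocks, live_out);
   return (live_out[THEN] & VAL_A) != 0;
}
```

On the logical graph the two values do not interfere, so a register allocator could give both the name `u0`; on the physical graph they do, which is the interference the commit says UGPR liveness now sees.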
Alyssa Rosenzweig
fadb826515 jay/opt_propagate: disable f64 opts for now
could be done but would need more work.

No stats change.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
8e4145948f jay/opt_propagate: fold uflag copies
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
b9f8f2477e jay: inline jay_control()
This accessor is more opaque imho.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
978d20e5fe jay: drop jay_exec_mask
this strategy is panning out nicely.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
238c4ecf40 jay: fix 16-bit predicated compares
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
0bd4f1b874 jay: consolidate file prefixes
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
15365f8ea2 jay: jayize swsb print
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
fccd68625c jay: shrink stack allocation
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Kenneth Graunke
0a5c748e19 jay: Don't forget UACCUM!
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
3308626e12 jay/assign_flags: don't burn a flag for ballots
Increases GPR pressure somehow but it's obviously the right thing to do.

SIMD16:

   Totals:
   Instrs: 2767536 -> 2767381 (-0.01%); split: -0.01%, +0.00%
   CodeSize: 44323392 -> 40075680 (-9.58%); split: -9.58%, +0.00%

   Totals from 2147 (81.11% of 2647) affected shaders:
   Instrs: 2704498 -> 2704343 (-0.01%); split: -0.01%, +0.00%
   CodeSize: 43477568 -> 39229856 (-9.77%); split: -9.77%, +0.00%

SIMD32:

   Totals:
   Instrs: 4731031 -> 4746775 (+0.33%); split: -0.33%, +0.67%
   CodeSize: 76609152 -> 70004080 (-8.62%); split: -8.68%, +0.06%
   Number of spill instructions: 50110 -> 50187 (+0.15%); split: -0.00%, +0.16%
   Number of fill instructions: 51341 -> 51804 (+0.90%); split: -0.00%, +0.91%

   Totals from 2136 (80.70% of 2647) affected shaders:
   Instrs: 4666677 -> 4682421 (+0.34%); split: -0.34%, +0.67%
   CodeSize: 75735136 -> 69130064 (-8.72%); split: -8.78%, +0.06%
   Number of spill instructions: 50108 -> 50185 (+0.15%); split: -0.00%, +0.16%
   Number of fill instructions: 51339 -> 51802 (+0.90%); split: -0.00%, +0.91%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
2c77717e5c jay/assign_flags: don't burn a null flag
SIMD32:

   Totals from 423 (15.98% of 2647) affected shaders:
   Instrs: 740042 -> 736360 (-0.50%); split: -1.25%, +0.75%
   CodeSize: 11984176 -> 11925888 (-0.49%); split: -1.23%, +0.74%
   Number of spill instructions: 4675 -> 4676 (+0.02%)
   Number of fill instructions: 5698 -> 5684 (-0.25%); split: -0.28%, +0.04%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
796886f72c jay/assign_flags: refactor for next commit
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Georg Lehmann
26ec32dada intel/nir_opt_peephole_ffma: fix fp_math_ctlr for modifiers
If abs/neg don't preserve nan/inf/sz, the whole expression won't.

Fixes: 1b0808adf3 ("intel/nir: Make ffma peephole optimization preserve fp_fast_math flags")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41101>
2026-04-28 18:26:58 +00:00
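One way to read the rule in 26ec32dada is that the fused result may only keep a "preserve" guarantee if every instruction folded into it keeps that guarantee, modifiers included. The sketch below models this as a bitwise intersection; the flag names and the helper are hypothetical and are not NIR's actual enum or the pass's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits, named after the nan/inf/sz properties. */
enum {
   PRESERVE_NAN = 1u << 0,
   PRESERVE_INF = 1u << 1,
   PRESERVE_SZ  = 1u << 2,
};

/* Intersect the flags of every instruction folded into the fused op:
 * the multiply, the add, and any abs/neg modifier instructions. */
static uint32_t
fused_preserve_flags(const uint32_t *flags, unsigned count)
{
   assert(flags != 0 && count > 0);

   uint32_t result = flags[0];
   for (unsigned i = 1; i < count; ++i)
      result &= flags[i];

   return result;
}
```

If the multiply and add preserve everything but a folded negate does not preserve signed zero, the fused result drops that guarantee as well.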
Ian Romanick
e301817753 brw: Don't lower phis involved in DPAS instructions to scalar
On my Arc A380 (DG2), this more than doubles the performance of Jeff
Bolz's cooperative matrix benchmark. With llama.cpp modified to use
cooperative matrix on DG2, performance is improved by 37%.

Closes: #15311
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Matt Corallo <git@bluematt.me>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41172>
2026-04-27 18:09:16 +00:00
Ian Romanick
09b43966ba brw: Lower all phis to scalar
The next commit will cause some very specific phis to not be lowered to
scalar, and that's the reason the callback is used instead of
nir_lower_all_phis_to_scalar.

It's worth noting that the comment in nir_lower_phis_to_scalar.c
specifically calls out Deus Ex as the reason some phis should not be
lowered. At least on current BRW, zero shaders from Deus Ex trace were
affected for spills or fills on any Intel platform.

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17050005 -> 17051449 (<.01%)
instructions in affected programs: 41032 -> 42476 (3.52%)
helped: 29 / HURT: 159

total cycles in shared programs: 876411976 -> 876433702 (<.01%)
cycles in affected programs: 1455550 -> 1477276 (1.49%)
helped: 40 / HURT: 150

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 916599633 -> 916694854 (+0.01%); split: -0.00%, +0.01%
CodeSize: 14705971792 -> 14708302384 (+0.02%); split: -0.00%, +0.02%
Send messages: 40870114 -> 40870113 (-0.00%)
Cycle count: 102360965889 -> 102364169753 (+0.00%); split: -0.00%, +0.01%
Spill count: 3460669 -> 3460240 (-0.01%)
Fill count: 4988325 -> 4987891 (-0.01%)
Max live registers: 192914542 -> 192918153 (+0.00%); split: -0.00%, +0.00%
Max dispatch width: 48848112 -> 48848128 (+0.00%)
Non SSA regs after NIR: 141633613 -> 141671589 (+0.03%); split: -0.00%, +0.03%

Totals from 5713 (0.28% of 2010434) affected shaders:
Instrs: 5215921 -> 5311142 (+1.83%); split: -0.09%, +1.91%
CodeSize: 88940784 -> 91271376 (+2.62%); split: -0.20%, +2.82%
Send messages: 284751 -> 284750 (-0.00%)
Cycle count: 275671864 -> 278875728 (+1.16%); split: -0.74%, +1.90%
Spill count: 857 -> 428 (-50.06%)
Fill count: 845 -> 411 (-51.36%)
Max live registers: 667776 -> 671387 (+0.54%); split: -0.86%, +1.40%
Max dispatch width: 160416 -> 160432 (+0.01%)
Non SSA regs after NIR: 1127904 -> 1165880 (+3.37%); split: -0.10%, +3.47%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Matt Corallo <git@bluematt.me>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41172>
2026-04-27 18:09:16 +00:00
Alyssa Rosenzweig
bccaeb28bb brw/nir_lower_cs_intrinsics: do some math at 16-bit
There are fewer than 2^16 lanes within a threadgroup, so it is safe to do
all math at 16-bit. This allows us to use 16-bit integer division which is
much faster than 32-bit integer division (in terms of the lowerings).

In a "hello world" kernel with variable wg size, simd32 goes 72 inst -> 57
inst on jay and 82 -> 67 inst on brw.

OTOH it's a loss for non-variable wg size, so do it only there to avoid
unwelcome stats regressions on Vulkan.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41084>
2026-04-24 17:13:24 +00:00
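As a rough picture of the kind of math bccaeb28bb narrows, the sketch below splits a flat lane index into a 3D local ID using only 16-bit values. It is an illustration, not the pass itself: `local_id_from_index` is a hypothetical helper, and in C the operands are promoted to `int` for the arithmetic, so the point is only that every input and result fits in 16 bits.

```c
#include <assert.h>
#include <stdint.h>

struct local_id {
   uint16_t x, y, z;
};

/* Hypothetical helper: decompose a flat index into x/y/z for a workgroup
 * whose sizes are only known at run time. All values fit in 16 bits
 * because a threadgroup has fewer than 2^16 lanes. */
static struct local_id
local_id_from_index(uint16_t index, uint16_t size_x, uint16_t size_y)
{
   assert(size_x != 0 && size_y != 0);

   struct local_id id;
   uint16_t row = (uint16_t)(index / size_x);

   id.x = (uint16_t)(index % size_x);
   id.y = (uint16_t)(row % size_y);
   id.z = (uint16_t)(row / size_y);
   return id;
}
```

With a variable workgroup size the divisors are not compile-time constants, so the divisions are real ones, which is where a cheaper 16-bit lowering pays off.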
Caio Oliveira
0422165d9a brw: Remove various unused fields
These are a mix of fields whose last use was removed or fields that were
never used, possibly because they remained in a patch while the rest of the
code changed before landing.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41139>
2026-04-24 15:04:25 +00:00
Caio Oliveira
26ef12f7c1 brw: Use brw prefix to LSC helpers tied to brw
Mapping from BRW ops to LSC ops.  And the len() helpers
that use the REG_SIZE as unit -- which is a BRW convention.

Acked-by: Iván Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41006>
2026-04-22 18:25:41 +00:00
Caio Oliveira
9329da6d88 brw: Don't set saturate for SYNC instruction
This helper might be used as part of another instruction's emission,
which itself might have set the saturate bit in the default
state.  This might result in the SYNC being created with the
saturate bit already set.

Since SYNC doesn't have saturate, clear that field
instead of sometimes having it set.

Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41005>
2026-04-22 16:06:42 +00:00
Sagar Ghuge
620835926d brw: Pass write back register for ray query messages
DG2 (Bspec 47937) has the same programming note as Xe2+:

   "When this bit is set in the header, Trace Ray Message behaves like a
   Ray Query. This message requires a write-back message indicating
   RayQuery for all valid Rays (SIMD lanes) have completed."

So this patch just passes a write-back destination register when we
have a ray query message.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41039>
2026-04-21 23:16:09 +00:00
José Roberto de Souza
64bc538f5e intel/brw: Explicitly upcast UB to UW for SHR with vector immediates
HW does not allow instructions with vector immediates to cross a GRF boundary if
it has a stride.

Under register pressure, the register allocator may place a temporary register
across such a boundary.

To resolve this, we now explicitly emit a MOV to upcast the UB payload into a
UW VGRF.
This ensures the SHR instruction operates on a dense, well-aligned region that
satisfies hardware alignment constraints.

Below is the portion of the shader exhibiting this issue:

Native code for unnamed fragment shader GLSL6 (src_hash 0x9c84a007) (sha1 48745e7dae90d08f8a9bbe4dbf837de23440c841f0344e669cb8af9df79bce58)
SIMD32 shader: 44 instructions. 0 loops. 354 cycles. 0:0 spills:fills, 2 sends, scheduled with mode latency-sensitive. Promoted 0 constants. GRF registers: 22. Non-SSA regs (after NIR): 11. Compacted 800 to 800 bytes (0%)
mov(1)          f1<1>UW         g0.30<0,1,0>UW                  { align1 WE_all 1N };
mov(1)          f1.1<1>UW       g1.30<0,1,0>UW                  { align1 WE_all 1N I@1 };
mov(32)         g2<2>UW         g0.20<2,8,0>UW                  { align1 WE_all };
mov(32)         g4<2>UW         g0.21<2,8,0>UW                  { align1 WE_all };
mov(32)         g8<2>UW         g1.20<2,8,0>UW                  { align1 WE_all };
mov(32)         g10<2>UW        g1.21<2,8,0>UW                  { align1 WE_all };
mov(16)         g12<4>UB        g0.60<1,8,0>UB                  { align1 1H };
mov(16)         g13<4>UB        g1.60<1,8,0>UB                  { align1 2H };
add(32)         g0<1>UW         g2<16,8,2>UW    0x01000100V     { align1 WE_all I@6 };
add(32)         g1<1>UW         g4<16,8,2>UW    0x01010000V     { align1 WE_all I@6 };
add(32)         g2<1>UW         g8<16,8,2>UW    0x01000100V     { align1 WE_all I@6 };
add(32)         g3<1>UW         g10<16,8,2>UW   0x01010000V     { align1 WE_all I@6 };
shr(16)         g4<1>UW         g12<32,8,4>UB   0x76543210V     { align1 1H I@6 };
mov(16)         g14.32<4>UB     g13<32,8,4>UB                   { align1 2H I@6 };
sync nop(1)                     null<0,1,0>UB                   { align1 WE_all 1N I@6 };
mov(16)         g5<1>UW         g0<16,8,2>UW                    { align1 1H };
sync nop(1)                     null<0,1,0>UB                   { align1 WE_all 1N I@6 };
mov(16)         g0<1>UW         g1<16,8,2>UW                    { align1 1H };
sync nop(1)                     null<0,1,0>UB                   { align1 WE_all 5N I@6 };
mov(16)         g5.16<1>UW      g2<16,8,2>UW                    { align1 2H };
sync nop(1)                     null<0,1,0>UB                   { align1 WE_all 5N I@6 };
mov(16)         g0.16<1>UW      g3<16,8,2>UW                    { align1 2H };
shr(16)         g4.16<1>UW      g14.32<32,8,4>UB 0x76543210V    { align1 2H I@5 };
    ERROR: Invalid register region for source 0.  See special restrictions section.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40856>
2026-04-21 22:51:45 +00:00
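The failing region in 64bc538f5e can be checked by hand with a small helper that walks the channels of a `<vstride,width,hstride>` region. This is a sketch, not the brw validator: `region_crosses_grf` is a hypothetical name, strides are taken in units of elements, and the 64-byte GRF size is an assumption inferred from the register numbering in the dump above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if any channel of the region touches a GRF
 * other than the one the region starts in. `start_byte` is the byte
 * offset within the first GRF; strides are counted in elements. */
static bool
region_crosses_grf(uint32_t start_byte, uint32_t vstride, uint32_t width,
                   uint32_t hstride, uint32_t type_size,
                   uint32_t exec_size, uint32_t grf_size)
{
   assert(width != 0 && exec_size != 0 && grf_size != 0 && type_size != 0);

   uint32_t first_grf = start_byte / grf_size;
   for (uint32_t ch = 0; ch < exec_size; ++ch) {
      uint32_t elem = (ch / width) * vstride + (ch % width) * hstride;
      uint32_t last_byte = start_byte + elem * type_size + type_size - 1;
      if (last_byte / grf_size != first_grf)
         return true;
   }
   return false;
}
```

Under those assumptions, `g12<32,8,4>UB` with 16 channels stays within one register, while `g14.32<32,8,4>UB` reaches byte 64 on its ninth channel and spills into the next one. A dense UW source such as `<8,8,1>UW` starting at byte 32 fits again, which matches the intent of the upcast.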
Jordan Justen
fa784fffd0 brw: Don't set header_size at init since it will be re-set in later code
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Ref: efcba73b49 ("brw: switch to new sampler payload description scheme")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41035>
2026-04-21 19:23:41 +00:00
Lionel Landwerlin
0539f26065 brw: track push constants shader stats
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39451>
2026-04-21 16:29:14 +00:00
Sagar Ghuge
7a627fa8f3 anv: Fix Wa_14021821874, Wa_14018813551, Wa_14026600921
StackSizePerRay is the RTDispatchGlobals::AsyncStackSize and
DisableRTGlobalsKnownValues is to interpret how many Max BVH levels we
need to use. It's not relevant to Vulkan, since we have just 2 fixed BVH
levels.

Fixes: cb423ee6 ("anv: Fix Wa_14021821874, Wa_14018813551, Wa_14026600921")
Fixes: c1a44e8d ("anv: force StackIDControl value for Wa_14021821874")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41012>
2026-04-21 01:38:34 +00:00