Commit graph

5202 commits

Author SHA1 Message Date
Paulo Zanoni
7eab94d542 intel/nir: fix sparse shadow comparison for BRW
While Jay overwrites sparse_tex->op with the newer opcodes that only
return red and the sparse stuff, BRW keeps using the original opcode
of the cloned instruction, so it can't change def->num_components.

This was not previously detectable since we did not have sparse
enabled for depth/stencil on Anv for a while. A patch to re-enable
that was proposed a while ago (MR !37423), never merged, but then a
recent attempt to try to merge it (by me) detected this regression.
Let's fix the regression first, then we can finally re-enable sparse
depth/stencil support in Anv, hopefully.

Fixes: 7468261d3d ("intel/nir: Make intel_nir_lower_sparse work for either brw or jay")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37423>
2026-05-07 23:47:51 +00:00
Kenneth Graunke
2729b1608f brw: Limit SIMD width based on NIR rather than first backend compile
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
I originally added this mechanism to have the first (SIMD8) compile
note that certain features were in use which would prevent SIMD16/32
from compiling, so we could skip the work of trying those.

But these days, there aren't many cases, and the ones we have are
easily detectable based on the NIR.  We can detect it earlier without
even having to do the SIMD8 compile.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>
2026-05-07 08:29:40 +00:00
Kenneth Graunke
c5928d40ae brw: Drop dead code from dispatch limit check for dual source blending
We checked that ver is 11 or 12.  It can't be >= 20.  This is dead code.

Dual source blending on Xe2 does not have native SIMD32 RT write message
support, but SIMD splitting is currently lowering it to low/high SIMD16
message pairs when using SIMD32 dispatch.  I'm not aware of any of the
hardware errata from previous platform still applying.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>
2026-05-07 08:29:40 +00:00
Kenneth Graunke
599d26db00 brw: Set prog_data::dual_src_blend from NIR outputs written bitfield
Simpler and set earlier.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>
2026-05-07 08:29:40 +00:00
Kenneth Graunke
afb97ff2af brw: Switch FS outputs to semantic IO and FRAG_RESULT_DUAL_SRC_BLEND
The new FRAG_RESULT_DUAL_SRC_BLEND option is easier to work with than
looking for FRAG_RESULT_DATA0 with an index of 1.  This also means we
no longer care about the dual source blend index, and can just use the
FRAG_RESULT location.  That cascades to meaning we no longer have to
store a tuple in driver_location.  And, if we just need location, we
can avoid populating that at all and use nir_io_semantics to get it.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>
2026-05-07 08:29:40 +00:00
Kenneth Graunke
fbaa5ad0c3 iris: Implement force_dual_color_blend_by_location via NIR
We can just have iris look at its own program key and change the
fragment shader output variable's location/index in the NIR.  By
doing this before lowering fragment shader outputs, the rest of
the output lowering does the right thing, and the backend no longer
has to consider hacks for broken OpenGL apps.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>
2026-05-07 08:29:40 +00:00
Alyssa Rosenzweig
5636a57f60 jay/lower_scoreboard: use SYNC.allrd/allwr
This collapses piles of silliness.

Totals:
CodeSize: 71626288 -> 70710000 (-1.28%)

Totals from 1634 (61.73% of 2647) affected shaders:
CodeSize: 66319376 -> 65403088 (-1.38%)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
2026-05-06 23:25:26 +00:00
Alyssa Rosenzweig
c1dc9d3b1a jay/lower_scoreboard: be the sole emitter of SYNC
this gets closer to something we can schedule and avoids some pointless syncs.

Totals from 491 (18.55% of 2647) affected shaders:
Instrs: 602994 -> 602946 (-0.01%)
CodeSize: 9063888 -> 9015904 (-0.53%)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
2026-05-06 23:25:26 +00:00
Alyssa Rosenzweig
0885ed10f5 jay/lower_scoreboard: use .src annotations
This is less heavy handed, avoiding unnecessary stalls after SENDs in a
bunch of common cases. The stats (SIMD32) are:

Totals:
CodeSize: 70345392 -> 71674272 (+1.89%)

Totals from 1774 (67.02% of 2647) affected shaders:
CodeSize: 67359248 -> 68688128 (+1.97%)

What's happening here is we are inserting extra SYNC.nop instructions in a
bunch of cases for the .src preceding the eventual .dst. However, putting aside
the i-cache impact for a moment, this is showing the optimization doing what it
should (deferring dst syncs and inserting cheaper src syncs first). So this
should be positive in reality despite the negative stat impact.

The most hurt shaders are pooling up SYNC.nop's at the end of blocks due to
local-only SWSB and lack of SYNC.allwr optimization. The latter is added later
in this MR. The former is planned.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
2026-05-06 23:25:25 +00:00
Alyssa Rosenzweig
130e724d5e jay/lower_scoreboard: refactor SYNC.nop insertion
for next commit

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
2026-05-06 23:25:25 +00:00
Alyssa Rosenzweig
1ecd75a397 jay/lower_scoreboard: fix tracking for A@* and *@7
update the tracking with what we actually waited on, not what we ideally wanted
to wait on. reduces extra annotations in some cases.

SIMD32:

Totals from 194 (7.33% of 2647) affected shaders:
CodeSize: 14473840 -> 14469088 (-0.03%)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
2026-05-06 23:25:25 +00:00
Alyssa Rosenzweig
93edf9a3fd jay/lower_scoreboard: refactor wait pipe code
for next commit.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
2026-05-06 23:25:25 +00:00
Alyssa Rosenzweig
18e09858eb jay/lower_scoreboard: elide more dependencies
IGC does these optimizations and I think they should be safe given my mental
model. Given a sequence like:

   r0 = add.f32 r1, r2
   r1 = add.f32 r3, r4

Each ALU pipe is pipelined but in-order. Therefore, the second add cannot
possibly complete before the first add, so it cannot write r1 before the first
add reads r1, so we can elide the write-after-read dependency. That in term
avoids a pipeline bubble between the two instructions. Ditto for
write-after-write.

Similarly if the distance is too great within an in-order pipe since there is a
maximum pipeline length, it's not infinite.

Note that if there was cross-pipe dependencies we do need the annotation since
the pipes themselves are parallel.

SIMD32:

Totals from 58 (2.19% of 2647) affected shaders:
CodeSize: 3316592 -> 3315056 (-0.05%); split: -0.05%, +0.00%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
2026-05-06 23:25:25 +00:00
Alyssa Rosenzweig
e4dc161277 jay: assign accumulators post-RA
Greedy post-RA substitution pass, similar to IGC's AccSubstitution pass.
Stats together with the previous commits.

SIMD16:

   Totals from 2209 (83.45% of 2647) affected shaders:
   Instrs: 2701029 -> 2696350 (-0.17%)
   CodeSize: 39166720 -> 40372272 (+3.08%); split: -0.36%, +3.44%

SIMD32:

   Totals from 2211 (83.53% of 2647) affected shaders:
   Instrs: 4691165 -> 4641188 (-1.07%)
   CodeSize: 69365792 -> 69341616 (-0.03%); split: -0.50%, +0.47%

The instruction count reduction is from RA shuffle code getting coalesced via
accumulators. The code size changes are from:

* Fewer moves from the instr count reduction (helped)
* Smaller MADs encoded as MACs (helped)
* Fewer SYNC.nop due to fewer scoreboarding annotations (helped)
* Less compaction due to explicit accumulator operands (hurt)

I expect significant cycle count changes from this but we don't have a cycle
model wired up yet, so reading the assembly will have to do.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
2026-05-06 23:25:25 +00:00
Alyssa Rosenzweig
8b324591d1 jay: move simd32 deswizzling to float pipe
for more accumulator usage.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
2026-05-06 23:25:25 +00:00
Alyssa Rosenzweig
712719a2ae jay: do moves on the float pipe where possible
this allows us to use accumulators more.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
2026-05-06 23:25:25 +00:00
Alyssa Rosenzweig
6f2b1cece6 jay: model MAC
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
2026-05-06 23:25:25 +00:00
Alyssa Rosenzweig
b6e88ab904 jay/to_binary: fix packing of simd-split accumulators
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
2026-05-06 23:25:25 +00:00
Rhys Perry
ec59b59b97 nir: rename nir_src_parent_instr to nir_src_use_instr
sed -i "s/nir_src_parent_instr/nir_src_use_instr/" `find ./ -type f`
sed -i "s/nir_src_parent_if/nir_src_use_if/" `find ./ -type f`
sed -i "s/nir_src_set_parent/nir_src_set_use/" `find ./ -type f`

There are two kinds of "parent" in relation to a src/def:
- the instruction where the def or src's def is defined
- the instruction which the src is a part of and where the def is used

Clarify that the parent here is where the src's def is used, not where
it's defined.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41344>
2026-05-06 17:09:22 +00:00
Lionel Landwerlin
6f5d30c0a2 anv: add apply_layout support for device bindable shaders/pipelines
We consider them like bindless stages (no binding table) as much as
possible.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31384>
2026-05-06 09:49:44 +00:00
Lionel Landwerlin
1281e2b9a0 anv/intel: add device generated commands shaders
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31384>
2026-05-06 09:49:43 +00:00
Lionel Landwerlin
c30a4d4fdb anv/brw/nir: fix wa_18019110168
Several things were wrong :
  - incorrect offset in the FS push constant data
  - incorrect encoding of the 32bit values with 2 fields (remap table offset & provoking vertex)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31384>
2026-05-06 09:49:41 +00:00
Lionel Landwerlin
f309f0b1a0 intel: add resource intrinsic support for heaps
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39478>
2026-05-05 18:21:16 +00:00
Lionel Landwerlin
25bc517ef5 brw: add heap support to brw_lower_storage_image
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39478>
2026-05-05 18:21:16 +00:00
Lionel Landwerlin
5ec7d31e20 brw/lower_texel_address: add heap support
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39478>
2026-05-05 18:21:16 +00:00
Calder Young
4120ae4963 brw: Avoid vectorizing loads in NIR if it could extend into a different page
Took inspiration from RADV to make nir_opt_load_store_vectorize robust against
page faults, by checking the align_offset and align_mul to see if any extra
components could be overlapping into a different page.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40149>
2026-05-01 19:51:41 +00:00
Calder Young
3ac6233655 brw: Avoid rounding every convergent block load up to a full register
To simplify things, our backend rounds convergent block loads up to a full
register. This causes page faults with the scratch page disabled since the
address is not always aligned to a register size. Loading smaller blocks is
slightly more difficult because the SEND instruction can only write back a
multiple of full registers, even if the actual data is smaller.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40149>
2026-05-01 19:51:41 +00:00
Calder Young
8ce98fedc4 anv: Make sure robust UBO access does not fault
We can just conditionally replace the address with an address to a zero
initialized cacheline if the read is going to go out of bounds.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40149>
2026-05-01 19:51:41 +00:00
Caio Oliveira
1ebc14bcb9 brw: Stop tracking inline parameter usage in prog_key/prog_data
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Since inline parameter is the last field of the thread payload, the
backend can always assume they may exist.  They won't affect the
position of other payload fields and the register allocator will
reuse any unused space.

In Anv, also update EmitInlineParameter for Task/Mesh/CS to reflect
previous changes in inline parameter setup.  Remove/Update some stale
comments since we are here.

Finally, remove the prog_key/prog_data bits that tracked whether inline
data or a push address was needed.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41230>
2026-04-30 16:39:22 +00:00
Caio Oliveira
e1745e0bd9 brw: Fix max_dispatch_width collection for CS with variable size
The intention of the original commit was to make all the shaders report
the same max_dispatch_width.  When CS has multiple variants, this was
not happening as expected.

Fixes: 2acc2f18ea ("intel/compiler: report max dispatch width statistic")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41209>
2026-04-29 15:52:04 +00:00
Alyssa Rosenzweig
a78634ccb0 jay/to_binary: rename grf -> phys_reg
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
since it covers accumulators to

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
ab87a035c9 jay: drop a bunch of stale TODO and XXX
These are either done, or never going to be done, or otherwise stale or
silly or unnecessary. Drop a bunch.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
70d09d97ef jay: predicate NoMask instructions in uniform IF's
Totals:
Instrs: 4742391 -> 4742257 (-0.00%)
CodeSize: 70245120 -> 70243520 (-0.00%); split: -0.00%, +0.00%

Totals from 81 (3.06% of 2647) affected shaders:
Instrs: 337727 -> 337593 (-0.04%)
CodeSize: 4992992 -> 4991392 (-0.03%); split: -0.03%, +0.00%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
f199f00564 jay: adjust flag replication
Now instructions still read/write UFLAG, which preserves the information about
lane 0 we need for proper predication etc.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
930d36b54a jay: smarten predication pass
Merge the empty else optimization, the then-block predication, and the
break-while fusion into a unified "try to predicate each side of an if, peephole
optimizing control flow" optimization. This is simpler and more general.

Totals:
Instrs: 4783809 -> 4775647 (-0.17%)
CodeSize: 70766656 -> 70674064 (-0.13%); split: -0.13%, +0.00%

Totals from 1109 (41.90% of 2647) affected shaders:
Instrs: 4130644 -> 4122482 (-0.20%)
CodeSize: 61180848 -> 61088256 (-0.15%); split: -0.15%, +0.00%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
80081ef7b2 jay: check for inverse-ballots in jay_uses_flag
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
86f19bc983 jay: propagate inverse-ballots only locally
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
d7283a25d7 jay: do not copyprop ballots globally
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
5828b66b65 jay: convert to LCSSA
for correctness with loops.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
fed6b7bea0 jay: drop UGPR->UMEM spilling path
This is totally broken now that we have a physical CFG for UGPRs. And of course,
UGPRs generally were totally broken without the physical CFG. So I conclude
this code basically never worked. Which is good because it was also basically
always dead too. Just delete it and replace with a clear error message, instead
of pretending it works and either randomly splatting validation or just straight
up miscompiling silently or whatever.

We might need an alternative UGPR->GPR spill path some day but that day is not
today.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
ad040f2fbb jay: introduce a physical control flow graph
Consider:

   u0 = foo()

   if (divergent) {
      u0 = bar()
      r0 = baz(u0)
   } else {
      r0 = quux(u0)
   }

Logically, this is fine, there is no interference between bar() and u0. But
physically, both sides of the if execute so the bar() write to u0 overwrites the
variable the else reads. So this is a miscompile.

The solution is to model the extra edges in the physical control flow graph,
which lives next to the existing logical control flow graph. Liveness for UGPRs
now follows the physical CFG, while liveness for GPRs continues to follow the
logical CFG. That models the interference properly, while still allowing phis to
work as before (since phis writing UGPRs follow uniform bits of control flow
that are necessarily critical edge free for the same reason the logical CFG is).

Because our RA copies shuffled registers back at block ends (following
Colombet), there's no issue with live range splits here (unlike aco which
inserts phis for this case and then needs to worry about critical edges around
those phis).

There might still be an extremely-challenging-to-hit bug here with UGPR spilling
which I need to think more about. It might be fine as-is? Not convinced though.
But this is big enough and strictly less broken than what we have right now and
the full solution will build on this, so here we are.

Fixes artefating in SuperTuxKart and Celestia knows what else.

Totals:
Instrs: 2770938 -> 2771269 (+0.01%); split: -0.00%, +0.02%
CodeSize: 40133712 -> 40138480 (+0.01%); split: -0.01%, +0.02%

Totals from 158 (5.97% of 2647) affected shaders:
Instrs: 514523 -> 514854 (+0.06%); split: -0.02%, +0.09%
CodeSize: 7603040 -> 7607808 (+0.06%); split: -0.03%, +0.09%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
fadb826515 jay/opt_propagate: disable f64 opts for now
could be done but would need more work.

No stats change.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
8e4145948f jay/opt_propagate: fold uflag copies
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
b9f8f2477e jay: inline jay_control()
This accessor is more opaque imho.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
978d20e5fe jay: drop jay_exec_mask
this strategy is panning out nicely.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
238c4ecf40 jay: fix 16-bit predicated compares
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
0bd4f1b874 jay: consolidate file prefixes
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
15365f8ea2 jay: jayize swsb print
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
fccd68625c jay: shrink stack allocation
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Kenneth Graunke
0a5c748e19 jay: Don't forget UACCUM!
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00