Commit graph

5257 commits

Author SHA1 Message Date
Alyssa Rosenzweig
238c4ecf40 jay: fix 16-bit predicated compares
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
0bd4f1b874 jay: consolidate file prefixes
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
15365f8ea2 jay: jayize swsb print
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
fccd68625c jay: shrink stack allocation
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Kenneth Graunke
0a5c748e19 jay: Don't forget UACCUM!
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
3308626e12 jay/assign_flags: don't burn a flag for ballots
Increases GPR pressure somehow but it's obviously the right thing to do.

SIMD16:

   Totals:
   Instrs: 2767536 -> 2767381 (-0.01%); split: -0.01%, +0.00%
   CodeSize: 44323392 -> 40075680 (-9.58%); split: -9.58%, +0.00%

   Totals from 2147 (81.11% of 2647) affected shaders:
   Instrs: 2704498 -> 2704343 (-0.01%); split: -0.01%, +0.00%
   CodeSize: 43477568 -> 39229856 (-9.77%); split: -9.77%, +0.00%

SIMD32:

   Totals:
   Instrs: 4731031 -> 4746775 (+0.33%); split: -0.33%, +0.67%
   CodeSize: 76609152 -> 70004080 (-8.62%); split: -8.68%, +0.06%
   Number of spill instructions: 50110 -> 50187 (+0.15%); split: -0.00%, +0.16%
   Number of fill instructions: 51341 -> 51804 (+0.90%); split: -0.00%, +0.91%

   Totals from 2136 (80.70% of 2647) affected shaders:
   Instrs: 4666677 -> 4682421 (+0.34%); split: -0.34%, +0.67%
   CodeSize: 75735136 -> 69130064 (-8.72%); split: -8.78%, +0.06%
   Number of spill instructions: 50108 -> 50185 (+0.15%); split: -0.00%, +0.16%
   Number of fill instructions: 51339 -> 51802 (+0.90%); split: -0.00%, +0.91%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
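The percentages in the statistics for 3308626e12 are relative changes against the "before" value. A minimal sketch of that arithmetic, assuming the report uses the usual (after - before) / before formula; the function name `percent_change` is hypothetical and not part of any Mesa tool:

```c
#include <assert.h>

/* Relative change in percent. Assumption: this matches how the
 * shader statistics report derives its "(+x.xx%)" figures. */
static double
percent_change(double before, double after)
{
   assert(before != 0.0); /* relative change is undefined for a zero baseline */
   return (after - before) / before * 100.0;
}
```

For the SIMD16 totals, `percent_change(44323392, 40075680)` is roughly -9.58, matching the CodeSize line, and for the SIMD32 totals `percent_change(4731031, 4746775)` is roughly +0.33, matching the Instrs line.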
Alyssa Rosenzweig
2c77717e5c jay/assign_flags: don't burn a null flag
SIMD32:

   Totals from 423 (15.98% of 2647) affected shaders:
   Instrs: 740042 -> 736360 (-0.50%); split: -1.25%, +0.75%
   CodeSize: 11984176 -> 11925888 (-0.49%); split: -1.23%, +0.74%
   Number of spill instructions: 4675 -> 4676 (+0.02%)
   Number of fill instructions: 5698 -> 5684 (-0.25%); split: -0.28%, +0.04%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Alyssa Rosenzweig
796886f72c jay/assign_flags: refactor for next commit
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
2026-04-28 23:13:50 +00:00
Georg Lehmann
26ec32dada intel/nir_opt_peephole_ffma: fix fp_math_ctlr for modifiers
If abs/neg don't preserve nan/inf/sz, the whole expression won't either.

Fixes: 1b0808adf3 ("intel/nir: Make ffma peephole optimization preserve fp_fast_math flags")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41101>
2026-04-28 18:26:58 +00:00
Ian Romanick
e301817753 brw: Don't lower phis involved in DPAS instructions to scalar
On my Arc A380 (DG2), this more than doubles the performance of Jeff
Bolz's cooperative matrix benchmark. With llama.cpp modified to use
cooperative matrix on DG2, performance is improved by 37%.

Closes: #15311
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Matt Corallo <git@bluematt.me>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41172>
2026-04-27 18:09:16 +00:00
Ian Romanick
09b43966ba brw: Lower all phis to scalar
The next commit will cause some very specific phis to not be lowered to
scalar, and that's the reason the callback is used instead of
nir_lower_all_phis_to_scalar.

It's worth noting that the comment in nir_lower_phis_to_scalar.c
specifically calls out Deus Ex as the reason some phis should not be
lowered. At least on current BRW, zero shaders from the Deus Ex trace were
affected for spills or fills on any Intel platform.

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17050005 -> 17051449 (<.01%)
instructions in affected programs: 41032 -> 42476 (3.52%)
helped: 29 / HURT: 159

total cycles in shared programs: 876411976 -> 876433702 (<.01%)
cycles in affected programs: 1455550 -> 1477276 (1.49%)
helped: 40 / HURT: 150

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 916599633 -> 916694854 (+0.01%); split: -0.00%, +0.01%
CodeSize: 14705971792 -> 14708302384 (+0.02%); split: -0.00%, +0.02%
Send messages: 40870114 -> 40870113 (-0.00%)
Cycle count: 102360965889 -> 102364169753 (+0.00%); split: -0.00%, +0.01%
Spill count: 3460669 -> 3460240 (-0.01%)
Fill count: 4988325 -> 4987891 (-0.01%)
Max live registers: 192914542 -> 192918153 (+0.00%); split: -0.00%, +0.00%
Max dispatch width: 48848112 -> 48848128 (+0.00%)
Non SSA regs after NIR: 141633613 -> 141671589 (+0.03%); split: -0.00%, +0.03%

Totals from 5713 (0.28% of 2010434) affected shaders:
Instrs: 5215921 -> 5311142 (+1.83%); split: -0.09%, +1.91%
CodeSize: 88940784 -> 91271376 (+2.62%); split: -0.20%, +2.82%
Send messages: 284751 -> 284750 (-0.00%)
Cycle count: 275671864 -> 278875728 (+1.16%); split: -0.74%, +1.90%
Spill count: 857 -> 428 (-50.06%)
Fill count: 845 -> 411 (-51.36%)
Max live registers: 667776 -> 671387 (+0.54%); split: -0.86%, +1.40%
Max dispatch width: 160416 -> 160432 (+0.01%)
Non SSA regs after NIR: 1127904 -> 1165880 (+3.37%); split: -0.10%, +3.47%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Matt Corallo <git@bluematt.me>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41172>
2026-04-27 18:09:16 +00:00
Alyssa Rosenzweig
bccaeb28bb brw/nir_lower_cs_intrinsics: do some math at 16-bit
There are fewer than 2^16 lanes within a threadgroup, so it is safe to do
all math at 16-bit. This allows us to use 16-bit integer division which is
much faster than 32-bit integer division (in terms of the lowerings).

In a "hello world" kernel with variable wg size, simd32 goes 72 inst -> 57
inst on jay and 82 -> 67 inst on brw.

OTOH it's a loss for non-variable wg size, so do it only there to avoid
unwelcome stats regressions on Vulkan.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41084>
2026-04-24 17:13:24 +00:00
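The reasoning in bccaeb28bb (every lane index within a threadgroup fits in 16 bits) can be illustrated with a small sketch. This is not the NIR lowering itself; the struct and function names are hypothetical, and it only shows that splitting a linear lane index into a 3D local ID needs nothing wider than `uint16_t`:

```c
#include <assert.h>
#include <stdint.h>

struct local_id {
   uint16_t x, y, z;
};

/* Hypothetical helper: split a linear lane index into (x, y, z).
 * All operands fit in 16 bits because a threadgroup has fewer than
 * 2^16 lanes, so the divisions can be done at 16-bit. */
static struct local_id
local_id_from_index(uint16_t index, uint16_t size_x, uint16_t size_y)
{
   assert(size_x != 0 && size_y != 0);

   struct local_id id;
   uint16_t row = index / size_x; /* 16-bit division */

   id.x = index % size_x;
   id.y = row % size_y;
   id.z = row / size_y;
   return id;
}
```

With a 4x2 workgroup slice, index 13 maps to (1, 1, 1): 13 % 4 = 1, 13 / 4 = 3, 3 % 2 = 1, 3 / 2 = 1.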
Caio Oliveira
0422165d9a brw: Remove various unused fields
These are a mix of fields whose last use was removed or fields that were
never used, possibly because they remained in a patch while the rest of the
code changed before landing.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41139>
2026-04-24 15:04:25 +00:00
Caio Oliveira
26ef12f7c1 brw: Use brw prefix to LSC helpers tied to brw
This covers the mapping from BRW ops to LSC ops, and the len() helpers
that use REG_SIZE as their unit -- which is a BRW convention.

Acked-by: Iván Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41006>
2026-04-22 18:25:41 +00:00
Caio Oliveira
9329da6d88 brw: Don't set saturate for SYNC instruction
This helper might be used as part of another instruction's emission,
which itself might have set the saturate bit in the default
state.  This might result in the SYNC being created with the
saturate bit already set.

Since SYNC doesn't have saturate, clear that field
instead of sometimes having it set.

Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41005>
2026-04-22 16:06:42 +00:00
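The problem described in 9329da6d88 is a general one for builders that copy default state into every new instruction. A minimal sketch, assuming a toy builder with a single inherited `saturate` flag; none of these types or functions are the real brw builder API:

```c
#include <assert.h>
#include <stdbool.h>

enum opcode { OP_ADD, OP_SYNC };

struct inst {
   enum opcode op;
   bool saturate;
};

/* Toy builder: every emitted instruction inherits the default state. */
struct builder {
   bool default_saturate;
};

static struct inst
emit(const struct builder *b, enum opcode op)
{
   struct inst i = { .op = op, .saturate = b->default_saturate };
   return i;
}

/* SYNC has no saturate field, so clear whatever was inherited
 * instead of leaving it dependent on the caller's default state. */
static struct inst
emit_sync(const struct builder *b)
{
   struct inst i = emit(b, OP_SYNC);
   assert(i.op == OP_SYNC);
   i.saturate = false;
   return i;
}
```

Clearing the bit on the emitted instruction, rather than on the builder, leaves the caller's default state untouched for the instructions it emits afterwards.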
Sagar Ghuge
620835926d brw: Pass write back register for ray query messages
DG2 (Bspec 47937) has the same programming note as Xe2+:

   "When this bit is set in the header, Trace Ray Message behaves like a
   Ray Query. This message requires a write-back message indicating
   RayQuery for all valid Rays (SIMD lanes) have completed."

So this patch just passes a write-back destination register when we
have a ray query message.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41039>
2026-04-21 23:16:09 +00:00
José Roberto de Souza
64bc538f5e intel/brw: Explicitly upcast UB to UW for SHR with vector immediates
HW does not allow an instruction with a vector immediate to have a strided
source region that crosses a GRF boundary.

Under register pressure, the register allocator may place a temporary register
across such a boundary.

To resolve this, we now explicitly emit a MOV to upcast the UB payload into a
UW VGRF.
This ensures the SHR instruction operates on a dense, well-aligned region that
satisfies hardware alignment constraints.

Below is the portion of the shader exhibiting this issue:

Native code for unnamed fragment shader GLSL6 (src_hash 0x9c84a007) (sha1 48745e7dae90d08f8a9bbe4dbf837de23440c841f0344e669cb8af9df79bce58)
SIMD32 shader: 44 instructions. 0 loops. 354 cycles. 0:0 spills:fills, 2 sends, scheduled with mode latency-sensitive. Promoted 0 constants. GRF registers: 22. Non-SSA regs (after NIR): 11. Compacted 800 to 800 bytes (0%)
mov(1)          f1<1>UW         g0.30<0,1,0>UW                  { align1 WE_all 1N };
mov(1)          f1.1<1>UW       g1.30<0,1,0>UW                  { align1 WE_all 1N I@1 };
mov(32)         g2<2>UW         g0.20<2,8,0>UW                  { align1 WE_all };
mov(32)         g4<2>UW         g0.21<2,8,0>UW                  { align1 WE_all };
mov(32)         g8<2>UW         g1.20<2,8,0>UW                  { align1 WE_all };
mov(32)         g10<2>UW        g1.21<2,8,0>UW                  { align1 WE_all };
mov(16)         g12<4>UB        g0.60<1,8,0>UB                  { align1 1H };
mov(16)         g13<4>UB        g1.60<1,8,0>UB                  { align1 2H };
add(32)         g0<1>UW         g2<16,8,2>UW    0x01000100V     { align1 WE_all I@6 };
add(32)         g1<1>UW         g4<16,8,2>UW    0x01010000V     { align1 WE_all I@6 };
add(32)         g2<1>UW         g8<16,8,2>UW    0x01000100V     { align1 WE_all I@6 };
add(32)         g3<1>UW         g10<16,8,2>UW   0x01010000V     { align1 WE_all I@6 };
shr(16)         g4<1>UW         g12<32,8,4>UB   0x76543210V     { align1 1H I@6 };
mov(16)         g14.32<4>UB     g13<32,8,4>UB                   { align1 2H I@6 };
sync nop(1)                     null<0,1,0>UB                   { align1 WE_all 1N I@6 };
mov(16)         g5<1>UW         g0<16,8,2>UW                    { align1 1H };
sync nop(1)                     null<0,1,0>UB                   { align1 WE_all 1N I@6 };
mov(16)         g0<1>UW         g1<16,8,2>UW                    { align1 1H };
sync nop(1)                     null<0,1,0>UB                   { align1 WE_all 5N I@6 };
mov(16)         g5.16<1>UW      g2<16,8,2>UW                    { align1 2H };
sync nop(1)                     null<0,1,0>UB                   { align1 WE_all 5N I@6 };
mov(16)         g0.16<1>UW      g3<16,8,2>UW                    { align1 2H };
shr(16)         g4.16<1>UW      g14.32<32,8,4>UB 0x76543210V    { align1 2H I@5 };
    ERROR: Invalid register region for source 0.  See special restrictions section.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40856>
2026-04-21 22:51:45 +00:00
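The failing `shr` in 64bc538f5e can be checked by hand. The sketch below rests on two assumptions that are not stated in the commit: that a GRF is 64 bytes on the platform in the dump (suggested by sub-register offsets such as `g14.32` for UB), and that the `V` immediate type packs eight signed 4-bit values. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define V_IMM_LANES 8

/* Decode a packed vector immediate such as 0x76543210V.
 * Assumption: eight signed 4-bit values, lane 0 in the low nibble. */
static void
decode_v_imm(uint32_t imm, int8_t out[V_IMM_LANES])
{
   assert(out != NULL);
   for (unsigned i = 0; i < V_IMM_LANES; i++) {
      int8_t v = (imm >> (4 * i)) & 0xf;
      out[i] = (v & 0x8) ? v - 16 : v; /* sign-extend the nibble */
   }
}

/* Does a <vstride,width,hstride> region starting at start_byte (relative
 * to the start of its GRF) extend into the next GRF?  Strides are in
 * elements and assumed non-negative. */
static bool
region_crosses_grf(unsigned start_byte, unsigned exec_size,
                   unsigned vstride, unsigned width, unsigned hstride,
                   unsigned type_bytes, unsigned grf_bytes)
{
   assert(exec_size != 0 && width != 0 && grf_bytes != 0);

   unsigned last = exec_size - 1;
   unsigned last_elem = (last / width) * vstride + (last % width) * hstride;
   unsigned last_byte = start_byte + last_elem * type_bytes + type_bytes - 1;

   return start_byte / grf_bytes != last_byte / grf_bytes;
}
```

Under those assumptions, `g14.32<32,8,4>UB` at SIMD16 spans bytes 32 through 92 and crosses the boundary, while `g12<32,8,4>UB` spans bytes 0 through 60 and does not. A dense UW region of 16 elements starting at byte 32 ends at byte 63, which is consistent with the commit's point that upcasting to a dense UW VGRF avoids the restriction.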
Jordan Justen
fa784fffd0 brw: Don't set header_size at init since it will be re-set in later code
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Ref: efcba73b49 ("brw: switch to new sampler payload description scheme")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41035>
2026-04-21 19:23:41 +00:00
Lionel Landwerlin
0539f26065 brw: track push constants shader stats
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39451>
2026-04-21 16:29:14 +00:00
Sagar Ghuge
7a627fa8f3 anv: Fix Wa_14021821874, Wa_14018813551, Wa_14026600921
StackSizePerRay is the RTDispatchGlobals::AsyncStackSize, and
DisableRTGlobalsKnownValues determines how many Max BVH levels we
need to use. It's not relevant to Vulkan, since we have just 2 fixed BVH
levels.

Fixes: cb423ee6 ("anv: Fix Wa_14021821874, Wa_14018813551, Wa_14026600921")
Fixes: c1a44e8d ("anv: force StackIDControl value for Wa_14021821874")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41012>
2026-04-21 01:38:34 +00:00
Alyssa Rosenzweig
fd46a48ccc jay/ra: only use stride=4 temps
SIMD16:

   Totals from 56 (2.12% of 2647) affected shaders:
   Instrs: 541831 -> 542004 (+0.03%); split: -0.40%, +0.44%
   CodeSize: 8597680 -> 8597248 (-0.01%); split: -0.45%, +0.44%

SIMD32:

   Totals:
   Instrs: 4858179 -> 4734713 (-2.54%); split: -2.78%, +0.24%
   CodeSize: 78651424 -> 76667440 (-2.52%); split: -2.76%, +0.24%

   Totals from 1108 (41.86% of 2647) affected shaders:
   Instrs: 4241312 -> 4117846 (-2.91%); split: -3.18%, +0.27%
   CodeSize: 68753152 -> 66769168 (-2.89%); split: -3.16%, +0.27%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:12 +00:00
Alyssa Rosenzweig
1f62da938b jay/ra: drop memory copy reordering
No shader-db changes, and no longer required for correctness.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:12 +00:00
Alyssa Rosenzweig
45845ea7f2 jay/ra: use accumulator for stride=4 swaps
SIMD16:

   Totals:
   Instrs: 2767930 -> 2767190 (-0.03%)
   CodeSize: 44327408 -> 44312304 (-0.03%); split: -0.04%, +0.00%

   Totals from 142 (5.36% of 2647) affected shaders:
   Instrs: 658928 -> 658188 (-0.11%)
   CodeSize: 10514512 -> 10499408 (-0.14%); split: -0.16%, +0.01%

SIMD32:

   Totals:
   Instrs: 4884039 -> 4858179 (-0.53%)
   CodeSize: 79079008 -> 78651424 (-0.54%); split: -0.54%, +0.00%

   Totals from 761 (28.75% of 2647) affected shaders:
   Instrs: 3803274 -> 3777414 (-0.68%)
   CodeSize: 61707728 -> 61280144 (-0.69%); split: -0.70%, +0.00%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:12 +00:00
Alyssa Rosenzweig
489f883277 jay/ra: use accumulator for memory swaps
SIMD16:

   Totals from 34 (1.28% of 2647) affected shaders:
   Instrs: 427731 -> 434349 (+1.55%); split: -0.03%, +1.58%
   CodeSize: 6773248 -> 6881136 (+1.59%); split: -0.04%, +1.63%
   Number of spill instructions: 1833 -> 1700 (-7.26%)
   Number of fill instructions: 2095 -> 1944 (-7.21%)

SIMD32:

   Totals from 621 (23.46% of 2647) affected shaders:
   Instrs: 3663406 -> 3739089 (+2.07%); split: -0.62%, +2.68%
   CodeSize: 59392464 -> 60624704 (+2.07%); split: -0.61%, +2.68%
   Number of spill instructions: 52115 -> 50109 (-3.85%); split: -3.90%, +0.05%
   Number of fill instructions: 53864 -> 51355 (-4.66%)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:11 +00:00
Alyssa Rosenzweig
2e5fd6da42 jay/ra: use accumulator for memory copies
SIMD16:

   Totals from 34 (1.28% of 2647) affected shaders:
   Instrs: 424527 -> 427731 (+0.75%); split: -0.03%, +0.78%
   CodeSize: 6720896 -> 6773248 (+0.78%); split: -0.04%, +0.82%
   Number of spill instructions: 1967 -> 1833 (-6.81%)
   Number of fill instructions: 2247 -> 2095 (-6.76%)

SIMD32:

   Totals:
   Instrs: 4691989 -> 4808356 (+2.48%); split: -0.46%, +2.94%
   CodeSize: 76011248 -> 77884320 (+2.46%); split: -0.46%, +2.92%
   Number of spill instructions: 54223 -> 52115 (-3.89%); split: -4.08%, +0.19%
   Number of fill instructions: 56519 -> 53864 (-4.70%)

   Totals from 606 (22.89% of 2647) affected shaders:
   Instrs: 3509511 -> 3625878 (+3.32%); split: -0.61%, +3.93%
   CodeSize: 56909488 -> 58782560 (+3.29%); split: -0.61%, +3.90%
   Number of spill instructions: 54223 -> 52115 (-3.89%); split: -4.08%, +0.19%
   Number of fill instructions: 56519 -> 53864 (-4.70%)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:11 +00:00
Alyssa Rosenzweig
7d2a88a9e5 jay/ra: don't reserve registers when not spilling
No changes at SIMD16. At SIMD32:

Totals:
Instrs: 4691895 -> 4691989 (+0.00%); split: -0.03%, +0.03%
CodeSize: 76010880 -> 76011248 (+0.00%); split: -0.03%, +0.03%
Number of spill instructions: 54369 -> 54223 (-0.27%)
Number of fill instructions: 56668 -> 56519 (-0.26%)

Totals from 71 (2.68% of 2647) affected shaders:
Instrs: 75963 -> 76057 (+0.12%); split: -1.67%, +1.79%
CodeSize: 1229792 -> 1230160 (+0.03%); split: -1.71%, +1.74%
Number of spill instructions: 146 -> 0 (-inf%)
Number of fill instructions: 149 -> 0 (-inf%)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:11 +00:00
Alyssa Rosenzweig
e5bf153d4f jay/lower_post_ra: drop old 2<-->8 lowering
this XOR based lowering is no longer needed.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:10 +00:00
Alyssa Rosenzweig
915af8e121 jay/lower_post_ra: remove SWAP macro
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:10 +00:00
Alyssa Rosenzweig
4c5ad7a832 jay/register_allocate: start using accumulators
this lets us lower away 8<-->2 copies/swaps in a faster, more straightforward
way by (ab)using accumulators. I think as an edge case this plays nicely enough
with my plans to profit from accs for normal fma-heavy code.

SIMD16:

   Totals:
   Instrs: 2761525 -> 2758108 (-0.12%)
   CodeSize: 44222384 -> 44167168 (-0.12%)

   Totals from 33 (1.25% of 2647) affected shaders:
   Instrs: 422130 -> 418713 (-0.81%)
   CodeSize: 6713680 -> 6658464 (-0.82%)

SIMD32:

   Totals:
   Instrs: 4911601 -> 4691895 (-4.47%)
   CodeSize: 79553984 -> 76010880 (-4.45%)

   Totals from 947 (35.78% of 2647) affected shaders:
   Instrs: 4143501 -> 3923795 (-5.30%)
   CodeSize: 67174592 -> 63631488 (-5.27%)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:10 +00:00
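Commit 4c5ad7a832 replaces the XOR-based swap lowering (dropped in e5bf153d4f) with one that goes through accumulators. The difference between the two strategies can be sketched in plain C, with an ordinary array standing in for the accumulator; this is an illustration of the general technique only, not of jay's actual lowering, and the function names and lane count are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8

/* XOR swap: needs no scratch storage, but costs three dependent
 * ALU operations per lane and breaks if a and b alias. */
static void
swap_xor(uint32_t *a, uint32_t *b)
{
   assert(a != b);
   for (unsigned i = 0; i < LANES; i++) {
      a[i] ^= b[i];
      b[i] ^= a[i];
      a[i] ^= b[i];
   }
}

/* Swap through a scratch register (standing in for an accumulator):
 * three plain copies, with no arithmetic on the data itself. */
static void
swap_via_scratch(uint32_t *a, uint32_t *b, uint32_t *acc)
{
   for (unsigned i = 0; i < LANES; i++)
      acc[i] = a[i];
   for (unsigned i = 0; i < LANES; i++)
      a[i] = b[i];
   for (unsigned i = 0; i < LANES; i++)
      b[i] = acc[i];
}
```

The scratch version only works when a free temporary exists, which is presumably why having accumulators available to the register allocator makes the simpler form possible.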
Alyssa Rosenzweig
53c1c076a8 jay: validate non-SSA accumulators
just enough for us to do parallel copy lowering with them.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:09 +00:00
Alyssa Rosenzweig
28cf0f52c1 jay/to_binary: handle packing accumulators
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:09 +00:00
Alyssa Rosenzweig
aa37d8b248 jay/print: deal with bare r0 copies
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:09 +00:00
Kenneth Graunke
e55af8793f jay: Add missing ROR case
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:09 +00:00
Alyssa Rosenzweig
6c862b1951 jay: fix SEL types
SEL.f32 flushes denorms but SEL.u32 does not. That means changing the type of
the SEL is only justified if we know we're used as a float. This fixes
miscompilation in cases like:

   ieq(1, bcsel(a, fneg(b), c))

Previously we'd be too greedy and form

   (a) SEL.f32 t, -b, c
   cmp.u32 t, 1

But that would inadvertently flush c which is an integer here. So just set the
type based on what we're used as. Some regressions due to is_only_used_as_float
not seeing through phis (..could probably be fixed?).

Totals:
Instrs: 2760796 -> 2761525 (+0.03%); split: -0.06%, +0.08%
CodeSize: 44244128 -> 44222384 (-0.05%); split: -0.13%, +0.08%

Totals from 945 (35.70% of 2647) affected shaders:
Instrs: 1968645 -> 1969374 (+0.04%); split: -0.08%, +0.11%
CodeSize: 31721968 -> 31700224 (-0.07%); split: -0.17%, +0.11%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:09 +00:00
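The miscompilation fixed in 6c862b1951 can be reproduced in miniature. The integer 1 has the bit pattern 0x00000001, which is a denormal when read as a 32-bit float, so a float-typed select that flushes denorms turns it into 0. The sketch below models the flush in software; `sel_u32`, `sel_f32`, and `flush_denorm_bits` are hypothetical stand-ins for the hardware behaviour the commit describes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flush a denormal float bit pattern to a signed zero. */
static uint32_t
flush_denorm_bits(uint32_t bits)
{
   uint32_t exponent = bits & 0x7f800000u;
   uint32_t mantissa = bits & 0x007fffffu;

   if (exponent == 0 && mantissa != 0)
      return bits & 0x80000000u; /* keep only the sign bit */
   return bits;
}

/* Integer select: passes the chosen bits through unchanged. */
static uint32_t
sel_u32(bool cond, uint32_t a, uint32_t b)
{
   return cond ? a : b;
}

/* Float select as modelled here: the result is denorm-flushed. */
static uint32_t
sel_f32(bool cond, uint32_t a, uint32_t b)
{
   return flush_denorm_bits(cond ? a : b);
}

/* Mirrors ieq(1, bcsel(a, fneg(b), c)) with a == false and c == 1. */
static void
check_sel_example(void)
{
   uint32_t neg_b = 0x3f800000u ^ 0x80000000u; /* fneg(1.0f) as bits */

   assert(sel_u32(false, neg_b, 1) == 1); /* integer c survives */
   assert(sel_f32(false, neg_b, 1) == 0); /* integer c is flushed */
}
```

Normal float values such as -1.0f pass through both selects unchanged, so the difference only shows up when an integer operand happens to look like a denormal.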
Alyssa Rosenzweig
b5898a418b jay: relax mov type check
prevents regression with next patch which turns u32 into s32.

Totals:
Instrs: 2764288 -> 2760796 (-0.13%)
CodeSize: 44299920 -> 44244128 (-0.13%); split: -0.13%, +0.00%

Totals from 193 (7.29% of 2647) affected shaders:
Instrs: 255455 -> 251963 (-1.37%)
CodeSize: 4160400 -> 4104608 (-1.34%); split: -1.34%, +0.00%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:07 +00:00
Alyssa Rosenzweig
1b648326ac jay: refuse to propagate ADDRESS copies
at least until we have address RA..

Totals:
Instrs: 2764282 -> 2764288 (+0.00%)
CodeSize: 44299872 -> 44299920 (+0.00%)

Totals from 2 (0.08% of 2647) affected shaders:
Instrs: 4215 -> 4221 (+0.14%)
CodeSize: 67456 -> 67504 (+0.07%)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:07 +00:00
Alyssa Rosenzweig
56ffad0c3a jay: call DCE an extra time
Totals:
Instrs: 2767235 -> 2765908 (-0.05%); split: -0.10%, +0.05%
CodeSize: 44349488 -> 44328688 (-0.05%); split: -0.10%, +0.06%

Totals from 347 (13.11% of 2647) affected shaders:
Instrs: 718067 -> 716740 (-0.18%); split: -0.39%, +0.20%
CodeSize: 11626032 -> 11605232 (-0.18%); split: -0.39%, +0.21%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:06 +00:00
Alyssa Rosenzweig
d85eb51e17 jay/register_allocate: don't depend on indexing
this can get messed up by optimizations.

Totals:
Instrs: 2768612 -> 2764317 (-0.16%); split: -0.29%, +0.13%
CodeSize: 44367648 -> 44300352 (-0.15%); split: -0.28%, +0.13%

Totals from 867 (32.75% of 2647) affected shaders:
Instrs: 1694745 -> 1690450 (-0.25%); split: -0.47%, +0.22%
CodeSize: 27387648 -> 27320352 (-0.25%); split: -0.46%, +0.21%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:06 +00:00
Alyssa Rosenzweig
a964f321a5 jay: don't print internal without the flag
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:06 +00:00
Alyssa Rosenzweig
3a73c76373 jay: fix spiller coupling code
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:05 +00:00
Alyssa Rosenzweig
cd6c5a2f90 jay: improve spiller debug
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:05 +00:00
Alyssa Rosenzweig
d637554418 jay: fix simd32 deswizzle
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:05 +00:00
Alyssa Rosenzweig
f728e3cb05 jay: test logic op fusing
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:04 +00:00
Alyssa Rosenzweig
698223ccd1 jay/test-optimizer: fuse before/after cases
new macro to DRY.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:04 +00:00
Alyssa Rosenzweig
99796bff04 jay: fold logic ops
Totals:
Instrs: 2798036 -> 2784419 (-0.49%); split: -0.58%, +0.10%
CodeSize: 44815024 -> 44614000 (-0.45%); split: -0.56%, +0.11%
Number of fill instructions: 2270 -> 2280 (+0.44%)

Totals from 1298 (49.04% of 2647) affected shaders:
Instrs: 2165338 -> 2151721 (-0.63%); split: -0.75%, +0.13%
CodeSize: 34865440 -> 34664416 (-0.58%); split: -0.72%, +0.15%
Number of fill instructions: 1571 -> 1581 (+0.64%)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:04 +00:00
Alyssa Rosenzweig
5d22e9d2a5 jay: allow predication of pure-flag instrs
i.e. compares

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:03 +00:00
Alyssa Rosenzweig
2ab8a614dd jay/register_allocate: tie predicated-defaults
(if we can)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:03 +00:00
Alyssa Rosenzweig
d74ada78c0 jay/assign_flags: handle predicated CMP
the optimizer will generate this soon, so make sure flag RA can deal.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:02 +00:00
Alyssa Rosenzweig
375945ea0b jay/lower_pre_ra: skip predication
otherwise the assert blows up

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:02 +00:00
Alyssa Rosenzweig
176b9a0f0c jay/opt_dead_code: handle predication
otherwise we'll get validation splat soon.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41064>
2026-04-20 22:32:02 +00:00