Commit graph

222863 commits

Author SHA1 Message Date
Alyssa Rosenzweig
bc22a37d98 jay: schedule for pressure
Implement a simple pre-RA bottom-up list scheduler with the goal of decreasing
register pressure. On Xe2, this significantly reduces spilling.

SSA form allows us to estimate register demand cheaply and accurately, which
theoretically [1] gives this algorithm the two Hippocratic properties:

1. Shaders with low register pressure are unaffected.
2. Register pressure can only be decreased, never increased.

In other words: first, do no harm.

The heuristic itself is very simple: greedily choose instructions that decrease
liveness using a backwards list scheduler. This is far from optimal! But thanks
to the above properties, even a heuristic that picked random instructions would
be a win overall - by construction, we can only ever win.

In other words: this scheduler is your older brother powering off the game
console any time he's about to lose a game, maintaining a 100% win rate.

[1] In reality, neither property is strictly satisfied due to the messy details
of mapping our clean logical model onto Intel's many weird physical register
files. Nevertheless, the algorithm is well-motivated and the empirical results
on Xe2 are excellent.
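
The greedy heuristic described above can be sketched roughly as follows. This is a toy model, not jay's implementation: the `instr` struct, the fixed array sizes, and every function name here are assumptions made for illustration, and only SSA data dependencies within a single block are modelled (no memory ordering, no physical register files).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_INSTRS 16
#define MAX_VALUES 16
#define NO_VALUE   (-1)

/* Hypothetical toy SSA instruction: one destination, up to two sources. */
struct instr {
   int dest;
   int src[2];
};

/* Cross I bottom-up: its destination dies, its sources become live.
 * Updates live[] and returns the change in the number of live values. */
static int
cross_backwards(const struct instr *I, bool *live)
{
   int delta = 0;

   if (I->dest != NO_VALUE && live[I->dest]) {
      live[I->dest] = false;
      delta--;
   }
   for (int s = 0; s < 2; s++) {
      int v = I->src[s];
      if (v != NO_VALUE && !live[v]) {
         live[v] = true;
         delta++;
      }
   }
   return delta;
}

/* Fill order[] from the last slot to the first, greedily taking the ready
 * instruction with the smallest liveness delta. Returns the peak number of
 * live values between instructions in the resulting order. */
static int
schedule_for_pressure(const struct instr *instrs, int n, const bool *live_out,
                      int *order)
{
   bool live[MAX_VALUES], done[MAX_INSTRS] = {false};
   int pending_uses[MAX_VALUES] = {0};
   int pressure = 0;

   memcpy(live, live_out, sizeof(live));
   for (int v = 0; v < MAX_VALUES; v++)
      pressure += live[v];
   for (int i = 0; i < n; i++)
      for (int s = 0; s < 2; s++)
         if (instrs[i].src[s] != NO_VALUE)
            pending_uses[instrs[i].src[s]]++;

   int peak = pressure;
   for (int slot = n - 1; slot >= 0; slot--) {
      int best = -1, best_delta = 0;

      for (int i = n - 1; i >= 0; i--) {
         /* Ready: every user of the destination has already been placed. */
         int d = instrs[i].dest;
         if (done[i] || (d != NO_VALUE && pending_uses[d] > 0))
            continue;

         bool trial[MAX_VALUES];
         memcpy(trial, live, sizeof(live));
         int delta = cross_backwards(&instrs[i], trial);
         if (best < 0 || delta < best_delta) {
            best = i;
            best_delta = delta;
         }
      }

      assert(best >= 0); /* an acyclic block always has a ready instruction */
      pressure += cross_backwards(&instrs[best], live);
      if (pressure > peak)
         peak = pressure;
      for (int s = 0; s < 2; s++)
         if (instrs[best].src[s] != NO_VALUE)
            pending_uses[instrs[best].src[s]]--;
      done[best] = true;
      order[slot] = best;
   }
   return peak;
}

/* Four loads feeding an add tree; only the final sum (value 6) is live-out. */
static int demo_order[7];

static int
demo_peak_pressure(void)
{
   const struct instr block[7] = {
      {0, {NO_VALUE, NO_VALUE}}, {1, {NO_VALUE, NO_VALUE}},
      {2, {NO_VALUE, NO_VALUE}}, {3, {NO_VALUE, NO_VALUE}},
      {4, {0, 1}}, {5, {2, 3}}, {6, {4, 5}},
   };
   bool live_out[MAX_VALUES] = {false};

   live_out[6] = true;
   return schedule_for_pressure(block, 7, live_out, demo_order);
}
```

In the original order of this toy block, all four loaded values are live at once before the first add. The sketch moves the first add up next to its two loads, so at most three values are live at any point.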

SIMD16:

   Totals:
   Instrs: 2754194 -> 2753957 (-0.01%); split: -0.23%, +0.22%
   CodeSize: 41094768 -> 41092768 (-0.00%); split: -0.23%, +0.23%
   Number of spill instructions: 1724 -> 1129 (-34.51%)
   Number of fill instructions: 1912 -> 1119 (-41.47%)

   Totals from 168 (6.35% of 2647) affected shaders:
   Instrs: 850994 -> 850757 (-0.03%); split: -0.75%, +0.73%
   CodeSize: 12825680 -> 12823680 (-0.02%); split: -0.74%, +0.73%
   Number of spill instructions: 1724 -> 1129 (-34.51%)
   Number of fill instructions: 1912 -> 1119 (-41.47%)

SIMD32:

   Totals:
   Instrs: 4688858 -> 4557800 (-2.80%); split: -3.53%, +0.74%
   CodeSize: 70177200 -> 68214816 (-2.80%); split: -3.53%, +0.74%
   Number of spill instructions: 50316 -> 45795 (-8.99%); split: -9.56%, +0.57%
   Number of fill instructions: 51526 -> 45075 (-12.52%); split: -13.23%, +0.71%

   Totals from 819 (30.94% of 2647) affected shaders:
   Instrs: 3810182 -> 3679124 (-3.44%); split: -4.35%, +0.91%
   CodeSize: 57044000 -> 55081616 (-3.44%); split: -4.35%, +0.91%
   Number of spill instructions: 49264 -> 44743 (-9.18%); split: -9.76%, +0.58%
   Number of fill instructions: 50182 -> 43731 (-12.86%); split: -13.58%, +0.73%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig
81e21a8756 jay: factor jay_op_(starts,ends)_block queries
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig
e72ffb0046 jay: annotate pure sends
for scheduling, CSE, etc

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig
c069b7e47c jay/opt_propagate: avoid branching on poison
logically it doesn't matter because we'll bail on a later check, but this is
still UB and therefore releases nasal demons.

i am jealous of Faith's Rust compilers. there, I said it.

==107281== Conditional jump or move depends on uninitialised value(s)
==107281==    at 0x7069768: propagate_backwards (jay_opt_propagate.c:327)
==107281==    by 0x7069768: jay_opt_propagate_backwards (jay_opt_propagate.c:367)
==107281==    by 0x7058960: jay_compile (jay_from_nir.c:2677)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig
4b0c3f5c32 jay/lower_scoreboard: add asserts on key bounds
if these are botched you get UB (-:

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig
4c97493b69 jay/lower_scoreboard: handle accumulator hazard
Challenging to hit but fixes
dEQP-GLES3.functional.shaders.swizzle_math_operations.vector_multiply.mediump_ivec4_wzyx_zyxw_fragment
with scheduling changes.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig
9a68101bc2 jay/liveness: drop redundant source filtering
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig
9b68b4e7a1 jay/liveness: speed up physical CFG merging
on top of scheduler changes, compile-time of shaders/blender/1017.shader_test:

Difference at 95.0% confidence
	-0.00173202 +/- 0.00116931
	-0.791537% +/- 0.532384%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig
1b50d3eed2 jay/liveness: remove pointless bitset init
dup initializes it.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig
5da3b57605 jay: insert simd32 deswizzle in a dedicated pass
we don't actually need the DESWIZZLE pseudo instruction, and the pseudo op
complicates pre-RA scheduling.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
2026-05-21 15:34:46 +00:00
Alyssa Rosenzweig
47c6601d5e jay: relax fragment payload layout
this isn't optimal but it should unblock bring up.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Co-authored-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
2026-05-21 15:34:46 +00:00
Kenneth Graunke
cb75c9f962 brw: Lower sample_pos for non-per-sample shaders in NIR
We generalize the sample_mask_in lowering to handle this too.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
2026-05-21 15:34:45 +00:00
Mike Blumenkrantz
58308b7580 zink: add another anv/adl flake
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41728>
2026-05-21 14:58:52 +00:00
Mike Blumenkrantz
64be743fbe zink: fix unbinding vertex buffers from null VS state
num_bindings doesn't encompass all the bound buffers if bindings reuse
the same buffers

Fixes: f8c96df9d2 ("zink: move vbo unbind to bind_vertex_state")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41728>
2026-05-21 14:58:52 +00:00
Samuel Pitoiset
07754c960a radv: validate drirc option names at compile time
This prevents typos and catches options that are backported incorrectly
in the future.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41700>
2026-05-21 14:26:28 +00:00
Samuel Pitoiset
ccb669a05f util: add very basic way to validate drirc files
This just checks for option names that don't exist. This is something
that already happened in the past with RADV.
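
As a rough illustration of the kind of check described, and not the actual Mesa script or API, the sketch below counts option names that match nothing in a table of known options. The option names and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option names, for illustration only. */
static const char *const known_options[] = {
   "example_enable_foo",
   "example_disable_bar",
};

/* True if name matches one of the known options exactly. */
static bool
option_exists(const char *name)
{
   size_t n = sizeof(known_options) / sizeof(known_options[0]);

   for (size_t i = 0; i < n; i++) {
      if (strcmp(known_options[i], name) == 0)
         return true;
   }
   return false;
}

/* Count names used in a configuration that match no known option. */
static size_t
count_unknown_options(const char *const *used, size_t n_used)
{
   size_t unknown = 0;

   for (size_t i = 0; i < n_used; i++) {
      if (!option_exists(used[i]))
         unknown++;
   }
   return unknown;
}
```

A misspelled name such as `example_enabel_foo` would be counted as unknown, which is the class of mistake the commit message refers to.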

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41700>
2026-05-21 14:26:28 +00:00
Samuel Pitoiset
e685f8d6aa radv/ci: cleanup list of expected failures
Triage invalid tests to make it easier to see real failures.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41717>
2026-05-21 14:03:22 +00:00
Samuel Pitoiset
91cf0a6e6d radv: use the new generation script for drirc
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41634>
2026-05-21 12:57:43 +00:00
Samuel Pitoiset
bf787fd91b radv: rename few drirc options for consistency
So that the option name matches everywhere.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41634>
2026-05-21 12:57:41 +00:00
Lucas Francisco Fryzek
7b84183201 util/u_trace: Don't use empty initializer list
Modify empty initializer list to use a zero initializer so we aren't
relying on the gnu extension.
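
For context, a minimal sketch of the portable form. The struct and function names are made up for illustration and do not come from u_trace.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct, standing in for whatever is being initialized. */
struct example_state {
   int count;
   void *ptr;
   float scale;
};

static struct example_state
make_zeroed_state(void)
{
   /* `= {}` only became standard C in C23; older compilers accept it as an
    * extension. `= {0}` is valid in every C standard: the first member is
    * set to zero explicitly and the rest are zero-initialized implicitly. */
   struct example_state state = {0};

   return state;
}
```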

Fixes: 690d9b0d00 ("util/u_trace: Rework resource management")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41715>
2026-05-21 12:18:07 +00:00
Benjamin Gaignard
0e91cf34af pan/format: Advertise support for AFBC(32x8,sparse)
Some video decoders spit out AFBC(32x8,sparse) images. Advertise
support for this modifier so we can import such images.

Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40886>
2026-05-21 11:50:16 +00:00
Daniel Stone
4203b770b4 pan/afbc: Properly validate format/parameter combinations
AFBC has a number of superblock sizes and valid layouts, with differing
combinations allowed.

It's quite clear that 16x16 is ambivalent about whether or not
block-split mode is used. 64x4 prohibits block-split mode, and 32x8
either requires or prohibits it depending on the format.

Add proper handling so we filter out the right combinations.
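
The rules stated above could be expressed along these lines. This is only a sketch: the enum, the function name, and the `format_requires_split_32x8` parameter are assumptions, since the message does not say which formats require block-split at 32x8.

```c
#include <assert.h>
#include <stdbool.h>

enum superblock_size {
   SUPERBLOCK_16x16,
   SUPERBLOCK_32x8,
   SUPERBLOCK_64x4,
};

/* format_requires_split_32x8 is a hypothetical per-format property that a
 * real implementation would derive from the pixel format. */
static bool
afbc_split_is_valid(enum superblock_size size, bool split,
                    bool format_requires_split_32x8)
{
   switch (size) {
   case SUPERBLOCK_16x16:
      return true; /* block-split is allowed either way */
   case SUPERBLOCK_64x4:
      return !split; /* block-split is prohibited */
   case SUPERBLOCK_32x8:
      return split == format_requires_split_32x8; /* depends on format */
   }
   return false;
}
```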

Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40886>
2026-05-21 11:50:16 +00:00
Daniel Stone
c5415c7aed pan/mod: Reorder linear modifier checks
As with AFBC, split the checks into 'can this ever work' vs. 'can this
work for what I want it to?'.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40886>
2026-05-21 11:50:16 +00:00
Daniel Stone
4364f5352a pan/mod: Protect against no usage flags for 64k
This doesn't happen now, but it will later.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40886>
2026-05-21 11:50:15 +00:00
Daniel Stone
0fb529053b pan/afbc: Code motion for split modifier queries
Reorder the AFBC modifier checking code to first query whether the
device can do the mode at all, then to query whether or not the format +
modifier is supported at all, then to query whether the specific image
usage is OK, then to query whether or not it's optimal.

This will come in useful later when we want to split modifier queries
into: can this modifier ever be used, what can this modifier be used
for, and is this the best modifier for this usage.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40886>
2026-05-21 11:50:15 +00:00
Karol Herbst
48ec237bf9 zink: properly advertise keep_weak_ffma for fp16
Zink never sets the fp16 screen cap, but the caps also are initialized
after zink_screen_init_compiler. So just replicate the check to be safe
here.

Fixes: 2146e09962 ("zink: keep ffma_weak and use GLSLstd450Fma for it")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41722>
2026-05-21 10:50:32 +00:00
squidbus
b1c72223af kk: Support VK_KHR_unified_image_layouts
Metal has no concept of image layouts, and we don't care about them.

Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41613>
2026-05-21 09:59:38 +00:00
squidbus
f52f7bf8d5 kk: Support attachment feedback loop extensions
Metal GPU image optimization is disabled for attachment feedback
usage since it causes some CTS flakes.

Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41613>
2026-05-21 09:59:38 +00:00
squidbus
2a119991f6 kk: Support VK_KHR_shader_fma
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41692>
2026-05-21 09:36:35 +00:00
squidbus
33ce3040e6 kk: Support VK_EXT_host_image_copy
Metal provides straightforward ways to copy an image to/from memory,
and image-to-image copies can be implemented by chaining them.

Note that host copy of combined depth-stencil is not supported, as
Metal does not allow CPU copy for these formats. Additionally, GPU
optimized contents are not allowed with host image copy usage; CTS
directly initializes the raw memory of optimized images to random
invalid data, which appears to decompress differently on GPU vs CPU
and fail.

Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41714>
2026-05-21 02:06:46 -07:00
squidbus
76125cb7af kk: Separate linear and GPU optimized image layout properties
`linear` controls whether the created image is in linear layout, and
`optimized_layout` controls only the `allowGPUOptimizedContents`
Metal property.

Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41714>
2026-05-21 02:06:46 -07:00
Valentine Burley
8cc3ca6231 turnip/ci: Add nightly Android CTS job
The job runs the following modules with ANGLE:
 - CtsGraphicsTestCases
 - CtsNativeHardwareTestCases
 - CtsSkQPTestCases

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41440>
2026-05-21 06:21:02 +00:00
Valentine Burley
03a84a1e03 ci/android: Add arm64 support for Android CTS
Android CTS for both arm64 and x86_64 Android targets always ships with
an x86_64 host JDK. Tradefed supports running on arm64 hosts though, so
provide a native JDK by installing Debian's openjdk-21-jdk-headless
package on arm64.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41440>
2026-05-21 06:21:02 +00:00
Valentine Burley
aeb40ed23b ci/android: Update Android CTS to android-cts-16.0_r5
The latest Android 16 release.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41440>
2026-05-21 06:21:02 +00:00
Valentine Burley
5dacbdc3ce ci: Bump ci-deb-repo revision to update aapt
Update aapt from the Android 14-based version in Trixie to a custom
fork based on the upstream Android 16 QPR2 branch, which fixes the
following error spam on arm64:

E aapt2   : Entry offset at index 0 points outside the Type's boundaries
E aapt2   : Entry offset at index 1 points outside the Type's boundaries
E aapt2   : Entry offset at index 2 points outside the Type's boundaries
E aapt2   : Entry offset at index 3 points outside the Type's boundaries
E aapt2   : Entry offset at index 4 points outside the Type's boundaries
E aapt2   : Entry offset at index 5 points outside the Type's boundaries
E aapt2   : Entry offset at index 6 points outside the Type's boundaries
E aapt2   : Entry offset at index 7 points outside the Type's boundaries
E aapt2   : Entry offset at index 8 points outside the Type's boundaries
E aapt2   : Entry offset at index 9 points outside the Type's boundaries
E aapt2   : Entry offset at index 10 points outside the Type's boundaries
E aapt2   : Entry offset at index 11 points outside the Type's boundaries
E aapt2   : Entry offset at index 12 points outside the Type's boundaries
E aapt2   : Entry offset at index 13 points outside the Type's boundaries
E aapt2   : Entry at index 14 is too small (0)
E aapt2   : Index 15 points to entry with unaligned offset 0x03080001

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41440>
2026-05-21 06:21:02 +00:00
Timothy Arceri
ca88f851c8 ac/nir/lower_tex_coord: basic lower tex coord test
Tests issue from: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15494

Assisted-by: ChatGPT (GPT-5.5)
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41666>
2026-05-21 00:54:56 +00:00
Mike Blumenkrantz
eb5bb61f87 lavapipe: enable some forgotten ds3 states
these are already used by shader objects

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41683>
2026-05-20 22:53:35 +00:00
Collabora's Gfx CI Team
18ba81e5b6 Uprev Piglit to 6fd29fe44f8857b876a67bee962919635f22ecc8
11ce9eb56e...6fd29fe44f

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40989>
2026-05-20 21:37:44 +00:00
Sergi Blanch Torne
7831892158 xfiles: update before uprev
When running jobs for the uprev, some results don't come from the uprev itself
but are already present in the mesa nightly run:
https://gitlab.freedesktop.org/mesa/mesa/-/pipelines/1670759.

Signed-off-by: Sergi Blanch Torne <sergi.blanch.torne@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40989>
2026-05-20 21:37:44 +00:00
Mike Blumenkrantz
c7758681f3 zink: rework custom sample locations
this is more consistent and comprehensible

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41707>
2026-05-20 21:15:48 +00:00
Christoph Neuhauser
7eba054c5b anv: Add compute only divergent atomics fusion optimization for Blender
Blender uses atomic operations as part of its virtual shadow mapping
implementation. Virtual shadow mapping page tagging in compute shaders
benefits from divergent atomics fusion, while fragment shaders doing the
atomic raster step in general have worse performance with this
optimization turned on.
Thus, an option is added to only apply divergent atomics fusion to compute
shaders in ANV, and this option is enabled for Blender.

Initial support for divergent atomics fusion optimization in ANV was added
in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40631.

Signed-off-by: Christoph Neuhauser <christoph.neuhauser@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41706>
2026-05-20 19:29:15 +00:00
Jordan Justen
28f6a442c6 brw/compact: Precompact using 2src fields on 3src instructions
In shader-db, with `-p skl`, shaders/0ad/12.shader_test does not
compact an instruction because precompact overwrites portions of the
instruction. (Treating the three source instruction as a two source
when accessing instruction fields.)

This instruction could be compacted:

mad(8)          g65<1>F         g61<4,4,1>F     g64<4,4,1>F     -g17<4,4,1>F { align16 1Q };

But, since precompact erroneously sets bits, the instruction isn't
compacted.

Fossil testing:

 * Tested with 0a3f3fd193 ("brw: drop unused color_outputs_valid
   key") reverted, as fossils are currently producing inconsistent
   results otherwise.

 * Tested skl, icl, dg2, mtl, lnl, bmg and ptl. Only skl had a change.

SKL:

Totals:
CodeSize: 8335219296 -> 8320248992 (-0.18%)

Totals from 359508 (14.42% of 2492689) affected shaders:
CodeSize: 2838254352 -> 2823284048 (-0.53%)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41588>
2026-05-20 11:52:52 -07:00
Mike Blumenkrantz
65b75137b4 zink: disable implicit sync handling for qcom proprietary
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41710>
2026-05-20 17:58:16 +00:00
Karol Herbst
8735aa72a1 nak: optimize iadds with a uniform operand in iadds of address calculations
Instead of doing the iadd manually we can use the uniform slot of the
ld/st/atom instruction getting rid of the iadd altogether.

Additionally for global memory we can also consume a 32 bit offset instead
of requiring it to be 64 bit.

Totals from 158539 (13.07% of 1212873) affected shaders:
CodeSize: 2308216336 -> 2242231136 (-2.86%); split: -2.86%, +0.00%
Number of GPRs: 8682436 -> 8662675 (-0.23%); split: -0.26%, +0.04%
SLM Size: 238816 -> 238604 (-0.09%)
Static cycle count: 2169063422 -> 2147747544 (-0.98%); split: -0.99%, +0.01%
Spills to memory: 25845 -> 25799 (-0.18%); split: -0.20%, +0.02%
Fills from memory: 25845 -> 25799 (-0.18%); split: -0.20%, +0.02%
Spills to reg: 45053 -> 45273 (+0.49%); split: -0.04%, +0.53%
Fills from reg: 36385 -> 36757 (+1.02%); split: -0.04%, +1.06%
Max warps/SM: 6027232 -> 6034616 (+0.12%); split: +0.12%, -0.00%

Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39384>
2026-05-20 17:23:33 +00:00
Karol Herbst
4cebda7f66 nak: add UGPR/GPR lowering for load/store/atom instructions
This tries to handle all combinations we might run into. We should rely
on previous optimizations so that the more difficult cases never happen.

As a side benefit instead of lowering a UGPR to a GPR, it will now be
moved to the UGPR slot.

Totals from 258010 (21.27% of 1212873) affected shaders:
CodeSize: 3742700224 -> 3576740928 (-4.43%); split: -4.44%, +0.01%
Number of GPRs: 13606055 -> 13496463 (-0.81%); split: -0.86%, +0.05%
SLM Size: 589740 -> 589660 (-0.01%)
Static cycle count: 3271547493 -> 3272550831 (+0.03%); split: -0.47%, +0.50%
Spills to memory: 56180 -> 56136 (-0.08%)
Fills from memory: 56180 -> 56136 (-0.08%)
Spills to reg: 108211 -> 110013 (+1.67%); split: -0.63%, +2.30%
Fills from reg: 99216 -> 100471 (+1.26%); split: -0.30%, +1.56%
Max warps/SM: 9921228 -> 9972060 (+0.51%); split: +0.52%, -0.00%

Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39384>
2026-05-20 17:23:33 +00:00
Karol Herbst
273204e24e nir: add uniform address to nvidia IO intrinsics
Adding the zero constants has a minor impact on stats due to some unlucky
interactions with nir_opt_cse, opt_instr_sched_prepass and assign_regs.

Totals from 61 (0.01% of 1212873) affected shaders:
CodeSize: 1044720 -> 1047472 (+0.26%); split: -0.00%, +0.27%
Static cycle count: 1198932 -> 1198490 (-0.04%); split: -0.07%, +0.04%

Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39384>
2026-05-20 17:23:33 +00:00
Karol Herbst
22daaddd67 nak: wire up UGPR Ld/St/Atom encoding
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39384>
2026-05-20 17:23:33 +00:00
Karol Herbst
b1323de44a nak/sm70: add helper for memory load store addresses
This also makes the selection of 32 vs 64 bit addresses based on the
actual source in the IR.

Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39384>
2026-05-20 17:23:33 +00:00
Karol Herbst
32fd51687d nir: add nir_intrinsic_cmat_load_shared_nv to nir_get_io_offset_src_number
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39384>
2026-05-20 17:23:32 +00:00
Konstantin Seurer
cfdaa26a64 vulkan,spirv: Update spec to 1.4.352
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41682>
2026-05-20 15:36:39 +00:00