Commit graph

219661 commits

Author SHA1 Message Date
Mary Guillemard
8f2eeee7ba vulkan: Do not override the shader_flags in case of no task shader
This should be doing a bitwise OR, not an assignment.
This fixes issues on NVK with mesh stages on DGC.

Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: 9308e8d90d ("vulkan: Add generic graphics and compute VkPipeline implementations")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40266>
2026-03-10 20:03:56 +00:00
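
The OR-versus-assign distinction described in the shader_flags fix above can be sketched in isolation. This is a minimal illustration only: the flag names and values are hypothetical and are not the ones used by the Vulkan runtime code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only (not Mesa's real values). */
enum {
    MOCK_FLAG_ALREADY_SET = 1 << 0,
    MOCK_FLAG_NO_TASK     = 1 << 1
};

/* Buggy shape: assignment discards whatever flags were already set. */
uint32_t add_flag_by_assign(uint32_t flags, uint32_t extra)
{
    flags = extra;
    return flags;
}

/* Fixed shape: bitwise OR keeps the existing flags and adds the new one. */
uint32_t add_flag_by_or(uint32_t flags, uint32_t extra)
{
    flags |= extra;
    return flags;
}

/* Small self-check showing the difference between the two shapes. */
void shader_flags_demo(void)
{
    uint32_t flags = MOCK_FLAG_ALREADY_SET;

    /* The assignment loses MOCK_FLAG_ALREADY_SET. */
    assert((add_flag_by_assign(flags, MOCK_FLAG_NO_TASK) & MOCK_FLAG_ALREADY_SET) == 0);

    /* The OR preserves it. */
    assert((add_flag_by_or(flags, MOCK_FLAG_NO_TASK) & MOCK_FLAG_ALREADY_SET) != 0);
}
```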
Antonino Maniscalco
d526bbc29b zink: don't care about generated gs output primitive
Zink uses the output primitive of the last vertex stage when deciding
the raster primitive. When we generate the gs the output primitive
depends on the raster primitive.

Not only does the generated gs output primitive have no value in choosing
the raster primitive, it can also get us stuck with the last raster
primitive which is of course incorrect.

Ignore it for generated shaders.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32399>
2026-03-10 19:21:08 +00:00
Lionel Landwerlin
df06d117c5 anv: fix internal compute shader constant data pull
Forgot to update this path that must now use the new intrinsic.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/15012
Fixes: 9f2215b480 ("anv/brw: remove push constant load emulation from the backend compiler")
Tested-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40308>
2026-03-10 18:24:04 +00:00
Lionel Landwerlin
f508c6acbb brw/nir: improve shader_indirect_data_intel handling
Use is_scalar to know if we can do transpose loading.

Also enable vectorization if 2 intrinsics share the same source (it
means the only difference is the base).

Fixes: e14d6b535c ("brw/nir: add new intrinsics to load data from the indirect address")
Tested-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40308>
2026-03-10 18:24:04 +00:00
Samuel Pitoiset
6c1d9612ef radv: only emit FORCE_S_VALID(1) for MSAA depth/stencil images
This affects GFX12 only.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40303>
2026-03-10 17:38:55 +00:00
Samuel Pitoiset
7cd3d40f86 radv: set {color,ds}_samples for inherited rendering state
There is no distinction for secondaries.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40303>
2026-03-10 17:38:55 +00:00
Samuel Pitoiset
0da3714bd3 ac,radv,radeonsi: add has_db_force_stencil_valid_bug
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40303>
2026-03-10 17:38:54 +00:00
Samuel Pitoiset
cb5f2a0521 radv: initialize HiZ for UNDEFINED transitions on transfer queue
This doesn't consider layers/mips because it doesn't seem possible,
but it doesn't hurt correctness either, it just means HiZ is disabled.

This fixes dEQP-VK.api.copy_and_blit.core.use_after_copy.*_tq on GFX12.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40304>
2026-03-10 16:51:57 +00:00
Eric Engestrom
2b7077b8ba freedreno: fix a few missed afuc -> qrisc renames
Fixes: 6e3d805735 ("freedreno: Rename afuc to QRisc")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40322>
2026-03-10 16:05:24 +00:00
Silvio Vilerino
b629487a6a d3d12: Implement trim notification residency eviction
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40129>
2026-03-10 15:06:44 +00:00
Silvio Vilerino
e54e8fceec ci: Bump DirectX-Headers and Agility SDK dependencies to v1.619.1
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40129>
2026-03-10 15:06:44 +00:00
Georg Lehmann
d7348ea501 aco/ra: don't tie definition when the operand is in a preserved reg
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40225>
2026-03-10 14:21:56 +00:00
Georg Lehmann
444eb3dce5 aco/ra: try to allocate registers for dot2 to allow VOPD
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40225>
2026-03-10 14:21:56 +00:00
Georg Lehmann
788aafba2a aco/sched_vopd: create dot2acc from VOP3P dot2
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40225>
2026-03-10 14:21:56 +00:00
Georg Lehmann
47599b2c38 aco/opt_postRA: remove try_convert_fma_to_vop2
This is now done directly in the VOPD scheduler.

Foz-DB GFX1201:
Totals from 600 (0.52% of 114655) affected shaders:
no stats changed

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40225>
2026-03-10 14:21:56 +00:00
Georg Lehmann
6cef434478 aco/sched_vopd: convert fma with inline constants to fmamk/fmaak
This optimization was previously done in the post-RA optimizer,
but it is more fitting for the vopd scheduler.

Doing it here also has the benefit that we don't unnecessarily use
the constant bus when VOPD can't be used.

No Foz-DB changes on GFX12 until the next commit.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40225>
2026-03-10 14:21:56 +00:00
Georg Lehmann
1ae9931145 aco/scheld_vopd: make VOPDInfo more flexible by adding a swizzle
No Foz-DB changes on GFX1201.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40225>
2026-03-10 14:21:55 +00:00
Erik Faye-Lund
e9e1d9a721 pan/ci: update traces result
This optimization changed the rendered result for 11 pixels, all with
less than 1% change. Neither the old nor the new is obviously more
correct than the other, and the CTS is fine. So let's assume this change
is unproblematic, and accept the new result.

Fixes: 3d304d5647 ("nir/opt_algebraic: remove is_used_once on outer instruction")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40321>
2026-03-10 13:54:01 +00:00
Samuel Pitoiset
e470f9df7f zink/ci: update traces expectations for VANGOGH/GFX1201
They render correctly, just slightly differently.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40249>
2026-03-10 12:07:46 +00:00
Samuel Pitoiset
1d3a7fe191 zink/ci: update the lists for CEZANNE and VANGOGH
The guardband changes fixed these but I don't know why.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40249>
2026-03-10 12:07:46 +00:00
Samuel Pitoiset
e293993fff radv: optimize clipping performance with PA_SU_HARDWARE_SCREEN_OFFSET
This optimization was missing in RADV for a very long time.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6492
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40249>
2026-03-10 12:07:46 +00:00
Samuel Pitoiset
c7cfa5324d radv: use common guardband computations
That shouldn't change anything.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40249>
2026-03-10 12:07:46 +00:00
Samuel Pitoiset
bcccd49368 ac,radeonsi: move guardband computations to common code
Added a comment from Marek Olšák explaining this.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40249>
2026-03-10 12:07:46 +00:00
Samuel Pitoiset
2ca7d93519 ac,radeonsi: pre-compute some raster config in ac_gpu_info
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40249>
2026-03-10 12:07:46 +00:00
Samuel Pitoiset
3e8e31add7 amd/drm-shim: bump version_minor to 52
This is required to make sure that conformant_trunc_coord is correctly
enabled/disabled. Otherwise, it might be disabled on GFX11 GPUs with
drm-shim.

Bumping the minor version shouldn't have any other effects.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40313>
2026-03-10 11:19:33 +00:00
Samuel Pitoiset
db905159fd amd/drm-shim: add phoenix
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40313>
2026-03-10 11:19:33 +00:00
Samuel Pitoiset
7fd114b563 amd/drm-shim: add rembrandt
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40313>
2026-03-10 11:19:33 +00:00
Valentine Burley
8b706f4c0f ci: Update kernel to pick up new network adapter
The only change in this kernel is enabling CONFIG_IGB for upcoming jobs,
with no impact on current jobs.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39993>
2026-03-10 11:12:26 +01:00
Valentine Burley
2addcc1dce venus/ci: Add an Android Venus on Turnip job on a618
Add a nightly job running Cuttlefish with Venus on Turnip.

Similar to the existing Venus-on-ANV jobs, this uses Cuttlefish's
'venus_guest_angle' mode to run deqp-vk and deqp-egl with ANGLE and
Venus inside the Android guest, with Turnip on the host.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39993>
2026-03-10 09:54:03 +01:00
Valentine Burley
3d00926006 ci: Add test-android container for arm64
Introduce the arm64 counterpart of the debian/x86_64_test-android
container/rootfs.

Building Android arm64 targets is complicated by the fact that Google
only provides the Android NDK for x86_64 hosts. Because of this, the
debian/arm64_test-android setup is split into two parts:

debian/arm64_test-android-tools
Despite the name, this is a native x86_64 container used to build
ANGLE, dEQP, and deqp-runner for Android arm64 targets. The resulting
artifacts are uploaded to S3 and later consumed by the final image.

debian/arm64_test-android
This is the final arm64 container/rootfs. It downloads the previously
built tools and installs the Cuttlefish Debian package.
The Cuttlefish guest image and additional host tools are not included
in this image. It is currently only used in LAVA, where Cuttlefish
artifacts can be deployed separately and kept cached across container
rebuilds.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39993>
2026-03-10 09:54:03 +01:00
Valentine Burley
e26a8f0e76 ci/container: Prepare test-android for multi-arch support
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39993>
2026-03-10 09:54:03 +01:00
Valentine Burley
9cd5239c01 ci/container: Generalize debian/x86_64_test-android container
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39993>
2026-03-10 09:54:03 +01:00
Valentine Burley
f50a358569 ci/android: Update Cuttlefish build
The new version has the following changes:
 - Working display on WebRTC with drm_hwcomposer after Wayland dmabuf
   server fixes
 - arm64 support for Venus GPU mode
 - Updated virglrenderer to latest main, 85c9cc77 ("vkr: enable
   VK_KHR_shader_fma")
 - Improved boot times
 - New DRM native context GPU modes

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39993>
2026-03-10 09:54:01 +01:00
Valentine Burley
12277d3f75 ci/android: Disable wifi for Cuttlefish
Wifi can occasionally cause crashes on the host, and we don't need it for
graphics testing.

[  401.084158] Unable to handle kernel paging request at virtual address ff800099ff80ffb2
[  401.092309] Mem abort info:
[  401.095190]   ESR = 0x0000000096000004
[  401.099045]   EC = 0x25: DABT (current EL), IL = 32 bits
[  401.104501]   SET = 0, FnV = 0
[  401.107640]   EA = 0, S1PTW = 0
[  401.110875]   FSC = 0x04: level 0 translation fault
[  401.115885] Data abort info:
[  401.118850]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[  401.124489]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  401.129684]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  401.135140] [ff800099ff80ffb2] address between user and kernel address ranges
[  401.142468] Internal error: Oops: 0000000096000004 [#1]  SMP
[  401.148283] Modules linked in: vhost_vsock vhost vhost_iotlb ipv6
[  401.154556] CPU: 2 UID: 0 PID: 718 Comm: Wi-Fi HwsimMsg  Tainted: G        W           6.17.3-gddf65230edb2 #1 PREEMPT

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39993>
2026-03-10 09:53:30 +01:00
Valentine Burley
60a8785bb8 ci: Strip qemu from rootfs
Cuttlefish installs qemu as a dependency, but we don't use it.
Remove it to save space.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39993>
2026-03-10 09:53:30 +01:00
Valentine Burley
0478046036 venus/ci: Remove hanging timeout override for ADL and TGL jobs
New deqp-runner version prints messages more frequently.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39993>
2026-03-10 09:53:29 +01:00
Georg Lehmann
452025f75e nir: add free bits in nir_io_semantics for future use
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40299>
2026-03-10 07:46:22 +00:00
Georg Lehmann
a25f00eaed nir: merge xfb and xfb2 into one 64bit intrinsic index
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40299>
2026-03-10 07:46:22 +00:00
Georg Lehmann
4ba581887e nir: support intrinsic indicies larger than 32 bits
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40299>
2026-03-10 07:46:21 +00:00
Georg Lehmann
abfd6a4df9 nir: don't assume indicies are always 32bit when accessing them as raw data
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40299>
2026-03-10 07:46:20 +00:00
Georg Lehmann
aa831b6690 nir/opt_algebraic: skip more redundant alignment iand
Useful for smaller/larger loads. Also there is no reason to be bitsize
specific here if we use a signed constant.

Foz-DB Navi48:
Totals from 8 (0.01% of 114655) affected shaders:
Instrs: 7629 -> 7612 (-0.22%)
CodeSize: 40772 -> 40692 (-0.20%)
Latency: 54880 -> 54944 (+0.12%)
InvThroughput: 8879 -> 8880 (+0.01%); split: -0.08%, +0.09%
VALU: 4029 -> 4027 (-0.05%); split: -0.15%, +0.10%
SALU: 1260 -> 1249 (-0.87%)

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40292>
2026-03-10 06:57:50 +00:00
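
As a rough illustration of the idea behind the alignment iand commit above (not the actual nir_opt_algebraic pattern), the sketch below shows why masking with a negative power-of-two constant is a no-op on a value that is already at least that aligned, and why one signed constant works at any bit size. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Align down with a signed constant: in two's complement, -align has every
 * bit set from log2(align) upwards, so the AND clears only the low bits. */
int32_t align_down32(int32_t value, int32_t align)
{
    assert(align > 0 && (align & (align - 1)) == 0); /* power of two */
    return value & -align;
}

/* The same signed constant sign-extends to 64 bits, so no bit-size-specific
 * mask such as 0xfffffffc is needed. */
int64_t align_down64(int64_t value, int32_t align)
{
    assert(align > 0 && (align & (align - 1)) == 0); /* power of two */
    return value & (int64_t)-align;
}

/* A second mask is redundant when the value is already known to be aligned
 * to at least the mask's alignment (both powers of two). */
int iand_is_redundant(uint32_t known_align, uint32_t mask_align)
{
    return known_align >= mask_align;
}
```

For example, a value already aligned down to 16 is unchanged by a further mask with -4, which is the kind of instruction an optimizer can drop.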
Tapani Pälli
8fb5614ba0 intel/dev: implement urb handle limits for Wa_16025326720
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40300>
2026-03-10 05:44:15 +00:00
Timothy Arceri
bd42f62b0f glx: guard glx_screen frontend_screen member
Guards workaround code with the same conditions as glx_screen's
frontend_screen member.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Fixes: 67eeee43e0 ("driconf: add a way to override GLX_CONTEXT_RESET_ISOLATION_BIT_ARB")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/15021
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40290>
2026-03-10 01:53:15 +00:00
Paulo Zanoni
85751506ab elk: don't use instr->const_index[] directly
From what I understand, use of const_index[] by the driver is
dangerous and should be avoided, as commits such as a6330ed4d0
("nir: add ACCESS to load_uniforms") may result in the indexes
changing, breaking the driver. Switch to using the parameter names in
order to make the code more future-proof.

For elk_fs_nir.cpp and elk_vec4_tes.cpp we can verify in the generated
nir_intrinsics.c that the wanted value is actually
nir_intrinsic_base().

For elk_nir.c, according to Caio Oliveira:

  "The code is checking for certain load/store via the is_input() and
   is_output() checks a few lines above. I've checked all them have
   BASE at 0."

Thanks to Ian Romanick for his guidance regarding this patch.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39438>
2026-03-10 01:03:42 +00:00
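
The concern in the elk commit above can be sketched with a simplified mock. The types and layouts below are hypothetical stand-ins, not the real NIR definitions: they only model the assumption that each intrinsic records which slot of its raw index array holds which named parameter, so a named accessor keeps working when the slot order changes while a hard-coded slot does not.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for illustration; not the NIR types. */
enum mock_index_kind {
    MOCK_INDEX_BASE,
    MOCK_INDEX_RANGE,
    MOCK_INDEX_ACCESS,
    MOCK_NUM_INDEX_KINDS
};

struct mock_intrinsic_info {
    /* index_map[kind] is the slot in const_index plus one, or 0 if absent. */
    unsigned char index_map[MOCK_NUM_INDEX_KINDS];
};

struct mock_intrinsic {
    const struct mock_intrinsic_info *info;
    int const_index[4];
};

/* Layout before a new parameter is added: BASE lives in slot 0. */
const struct mock_intrinsic_info mock_layout_old = {
    .index_map = { [MOCK_INDEX_BASE] = 1, [MOCK_INDEX_RANGE] = 2 }
};

/* Layout after ACCESS is added in front: BASE moves to slot 1. */
const struct mock_intrinsic_info mock_layout_new = {
    .index_map = { [MOCK_INDEX_ACCESS] = 1, [MOCK_INDEX_BASE] = 2,
                   [MOCK_INDEX_RANGE] = 3 }
};

/* Named accessor: looks the slot up instead of hard-coding it. */
int mock_intrinsic_base(const struct mock_intrinsic *instr)
{
    unsigned slot = instr->info->index_map[MOCK_INDEX_BASE];
    assert(slot > 0); /* the intrinsic must actually have a BASE index */
    return instr->const_index[slot - 1];
}
```

With the new layout, reading `const_index[0]` directly would return the ACCESS value rather than BASE, whereas `mock_intrinsic_base()` returns the same BASE under both layouts.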
Karol Herbst
bd552b41cc nvk: skip lowering load_global_constant_bounded on turing inside lower_load_intrinsic
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40272>
2026-03-10 00:10:05 +00:00
Karol Herbst
f7ad45e5fc nak: support has_load_global_bounded on turing and newer
Totals:
CodeSize: 9401446416 -> 8663482432 (-7.85%); split: -7.85%, +0.00%
Number of GPRs: 47297665 -> 47508294 (+0.45%); split: -0.14%, +0.59%
SLM Size: 1202912 -> 1203000 (+0.01%); split: -0.09%, +0.10%
Static cycle count: 5984801035 -> 4714013561 (-21.23%); split: -21.24%, +0.00%
Spills to memory: 44482 -> 45073 (+1.33%); split: -1.68%, +3.01%
Fills from memory: 44482 -> 45073 (+1.33%); split: -1.68%, +3.01%
Spills to reg: 184822 -> 149129 (-19.31%); split: -21.54%, +2.23%
Fills from reg: 223885 -> 170692 (-23.76%); split: -25.49%, +1.73%
Max warps/SM: 50642520 -> 50564740 (-0.15%); split: +0.03%, -0.19%

Totals from 185510 (15.95% of 1163204) affected shaders:
CodeSize: 3910084048 -> 3172120064 (-18.87%); split: -18.88%, +0.01%
Number of GPRs: 10625243 -> 10835872 (+1.98%); split: -0.63%, +2.61%
SLM Size: 659568 -> 659656 (+0.01%); split: -0.17%, +0.19%
Static cycle count: 3920553863 -> 2649766389 (-32.41%); split: -32.42%, +0.01%
Spills to memory: 8498 -> 9089 (+6.95%); split: -8.81%, +15.77%
Fills from memory: 8498 -> 9089 (+6.95%); split: -8.81%, +15.77%
Spills to reg: 109049 -> 73356 (-32.73%); split: -36.51%, +3.77%
Fills from reg: 116031 -> 62838 (-45.84%); split: -49.18%, +3.34%
Max warps/SM: 6885584 -> 6807804 (-1.13%); split: +0.25%, -1.38%

This also helps significantly reduce shader compile times since it reduces
the number of basic blocks. With Dragon Age: The Veilguard, it reduces
shader compile times by around 20%.

Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40272>
2026-03-10 00:10:05 +00:00
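
The percentages in the statistics above follow from simple arithmetic, which can be checked with a small sketch. The helper names are hypothetical, and this is not how the shader-db tooling itself is implemented.

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent. */
double percent_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}

/* Share of `part` in `total`, in percent. */
double percent_of(double part, double total)
{
    assert(total != 0.0);
    return part / total * 100.0;
}
```

For the CodeSize totals quoted above, `percent_change(9401446416.0, 8663482432.0)` is roughly -7.85, and `percent_of(185510.0, 1163204.0)` is roughly 15.95, matching the reported figures.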
Karol Herbst
7722bde53b nak: use ldg input predicate in nak_nir_lower_non_uniform_ldcx
Totals:
CodeSize: 9442133184 -> 9401446416 (-0.43%); split: -0.43%, +0.00%
Number of GPRs: 47300490 -> 47297665 (-0.01%); split: -0.01%, +0.00%
Static cycle count: 6120907718 -> 5984801035 (-2.22%); split: -2.22%, +0.00%
Spills to reg: 184810 -> 184822 (+0.01%); split: -0.01%, +0.02%
Fills from reg: 223860 -> 223885 (+0.01%); split: -0.01%, +0.02%
Max warps/SM: 50641540 -> 50642520 (+0.00%); split: +0.00%, -0.00%

Totals from 12079 (1.04% of 1163204) affected shaders:
CodeSize: 461892048 -> 421205280 (-8.81%); split: -8.81%, +0.00%
Number of GPRs: 1060493 -> 1057668 (-0.27%); split: -0.43%, +0.16%
Static cycle count: 922257513 -> 786150830 (-14.76%); split: -14.76%, +0.00%
Spills to reg: 14704 -> 14716 (+0.08%); split: -0.14%, +0.22%
Fills from reg: 24213 -> 24238 (+0.10%); split: -0.08%, +0.19%
Max warps/SM: 320540 -> 321520 (+0.31%); split: +0.39%, -0.08%

Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40272>
2026-03-10 00:10:05 +00:00
Karol Herbst
9d90cbc314 nak: add input predicate to load_global_nv and OpLd
This is new in SM75 (Turing). Let's use it because it allows us to get rid
of the if/else around bound checked global loads.

There are some changes in fossils, but it seems that's mostly due to CFG
optimizations doing things a bit differently?

Totals:
CodeSize: 9442152688 -> 9442133184 (-0.00%); split: -0.00%, +0.00%
Static cycle count: 6120910991 -> 6120907718 (-0.00%); split: -0.00%, +0.00%
Spills to reg: 184789 -> 184810 (+0.01%)
Fills from reg: 223831 -> 223860 (+0.01%); split: -0.00%, +0.01%

Totals from 334 (0.03% of 1163204) affected shaders:
CodeSize: 22020752 -> 22001248 (-0.09%); split: -0.10%, +0.01%
Static cycle count: 26582978 -> 26579705 (-0.01%); split: -0.01%, +0.00%
Spills to reg: 3110 -> 3131 (+0.68%)
Fills from reg: 3401 -> 3430 (+0.85%); split: -0.03%, +0.88%

Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40272>
2026-03-10 00:10:05 +00:00
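
The effect of the input predicate described above can be modelled in plain C. This is only a sketch of the control-flow shape, not NAK code: `predicated_load` is a hypothetical stand-in for a load instruction that takes a predicate, and returning zero for an out-of-bounds access is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branchy shape: the bounds check needs an if/else around the load, which
 * in a compiler IR means extra basic blocks. */
uint32_t load_bounded_branch(const uint32_t *buf, size_t len, size_t idx)
{
    assert(buf != NULL);
    if (idx < len)
        return buf[idx];
    else
        return 0;
}

/* Hypothetical model of a load with an input predicate: the memory access
 * only happens when the predicate is true, otherwise the result is zero. */
uint32_t predicated_load(const uint32_t *buf, size_t idx, int pred)
{
    return pred ? buf[idx] : 0;
}

/* Predicated shape: the bounds check becomes a value fed to a single load,
 * so the caller has straight-line code with no if/else. */
uint32_t load_bounded_pred(const uint32_t *buf, size_t len, size_t idx)
{
    assert(buf != NULL);
    int in_bounds = idx < len;
    return predicated_load(buf, idx, in_bounds);
}
```

Both shapes return the same values; the difference is that the second one expresses the check as data rather than control flow.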
Karol Herbst
d2bf824baf nak: replace legalize_ext_instr with explicit lowering
legalize_ext_instr wasn't doing anything besides lowering uniform sources
and panicking on a bunch of Source types.

Having a common helper looping over all sources doesn't make much sense,
because all the instructions are wildly different with regard to UGPRs. The
panics will be hit while emitting the sources as well, so this helper
provided little help and wasn't flexible enough for what we need.

Furthermore some instructions like LDG also take an additional input
predicate that legalize_ext_instr can't handle.

Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40272>
2026-03-10 00:10:04 +00:00
Karol Herbst
95f19bd5eb nak: invalidate loop analysis with nak_nir_lower_load_store
We'll start to lower load_global_bounded there and that will invalidate
loop analysis, because the number of instructions will change within a
block.

Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40272>
2026-03-10 00:10:04 +00:00