Commit graph

224098 commits

Author SHA1 Message Date
José Roberto de Souza
dbf64e9ad5 anv: Use anv_device_get_general_state_pool()
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42133>
2026-06-10 22:49:10 +00:00
José Roberto de Souza
9d06679d89 anv: Add function to get each anv_state_pool
Xe3P will allow us to reduce the number of anv_state_pool in use. This will
improve performance, as it will result in fewer uAPI calls to allocate memory
and less memory wasted in anv_state_pools that see little use.

As this will be a run-time decision, here I'm adding a function to get each
anv_state_pool, then we can just change the function and all the callers will
use the correct anv_state_pool.

The next patches will replace direct accesses to each anv_state_pool with
a function call.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42133>
2026-06-10 22:49:10 +00:00
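
The accessor pattern described in 9d06679d89 can be sketched roughly as below. Only the function name anv_device_get_general_state_pool() comes from the commit subjects above. Its signature, the struct layouts, the `use_single_pool` flag and the `shared_state_pool` member are assumptions made for illustration and are not taken from the anv sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the real anv_state_pool. */
struct anv_state_pool {
   uint32_t block_size;
};

/* Hypothetical, heavily reduced device struct. */
struct anv_device {
   bool use_single_pool;                      /* assumed run-time decision */
   struct anv_state_pool general_state_pool;
   struct anv_state_pool shared_state_pool;   /* assumed merged pool */
};

/* Callers go through the accessor, so the pool selection can change
 * in one place without touching any call site. */
static inline struct anv_state_pool *
anv_device_get_general_state_pool(struct anv_device *device)
{
   assert(device != NULL);
   if (device->use_single_pool)
      return &device->shared_state_pool;
   return &device->general_state_pool;
}
```

With this shape, a later change of the pool layout only has to edit the body of the accessor.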
Julien Schueller
fd616bab71 glx: avoid crash on glXBindTexImageEXT when no texture target set
If a GLXPixmap is created without GLX_TEXTURE_TARGET_EXT,
textureTarget remains 0. Calling glXBindTexImageEXT on such a
drawable would pass 0 to _mesa_get_current_tex_object(), triggering
an internal implementation error and a null-pointer segfault.

Return early when textureTarget is 0 - the drawable was never set
up for texturing, so bind is a no-op.

Reviewed-by: Adam Jackson <ajax@redhat.com>
Assisted-by: DeepSeek V4 Flash
Closes: #58
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42093>
2026-06-10 20:57:35 +00:00
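
The early return described in fd616bab71 follows a common guard pattern, sketched here under stated assumptions. The struct and function names are hypothetical; only the `textureTarget` field name and the "0 means never set up for texturing" convention come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified drawable. */
struct sketch_drawable {
   unsigned textureTarget;   /* 0 if GLX_TEXTURE_TARGET_EXT was never given */
   bool bound;               /* stands in for the real bind side effects */
};

static inline void
sketch_bind_tex_image(struct sketch_drawable *draw)
{
   assert(draw != NULL);

   /* The drawable was never set up for texturing, so bind is a no-op.
    * Returning here avoids passing target 0 to the texture lookup. */
   if (draw->textureTarget == 0)
      return;

   draw->bound = true;   /* placeholder for the texture lookup and bind */
}
```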
Benjamin Cheng
69c7f6d456 radv/video: Use {min,max}_qp caps from ac
Signed-off-by: Benjamin Cheng <benjamin.cheng@amd.com>
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42136>
2026-06-10 20:34:33 +00:00
Benjamin Cheng
880fbcbeee ac/video: Add {min,max}_qp to video enc caps
Signed-off-by: Benjamin Cheng <benjamin.cheng@amd.com>
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42136>
2026-06-10 20:34:33 +00:00
Benjamin Cheng
c2e76e111d radv/video: Report MULTIPLE_SLICE_SEGMENTS_PER_TILE_BIT
VCN supports one tile only, but with multiple slice segments.

Cc: mesa-stable
Signed-off-by: Benjamin Cheng <benjamin.cheng@amd.com>
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42136>
2026-06-10 20:34:33 +00:00
Benjamin Cheng
b8b8035c6b radv/video: Set accurate minQp/QIndex
The spec requires us to follow the constantQp/base_q_idx from the app,
which is constrained by the caps. Report the more accurate caps.

Cc: mesa-stable
Signed-off-by: Benjamin Cheng <benjamin.cheng@amd.com>
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42136>
2026-06-10 20:34:33 +00:00
Job Noorman
11334c438a ir3: fix possible signed overflow in ir3_link_add
`1 << 31` is undefined since `1` is a signed integer.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 1f9839907a ("ir3: Skip missing VS outputs in VS out map when linking")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42147>
2026-06-10 20:07:01 +00:00
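
As background for 11334c438a: where `int` is 32 bits wide, `1 << 31` shifts a set bit into the sign bit of a signed type, which the C standard leaves undefined. Making the left operand unsigned avoids this. The helper name `bit32` below is hypothetical and is not the code changed by the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Return a 32-bit mask with only bit n set. The unsigned 32-bit
 * constant keeps the shift well-defined for every n in 0..31. */
static inline uint32_t
bit32(unsigned n)
{
   assert(n < 32);
   return UINT32_C(1) << n;
}
```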
Christian Gmeiner
b83f446642 panvk: Advertise VK_KHR_shader_fma
vtn lowers OpFmaKHR to nir_op_ffma and every Mali has a native fused
multiply-add, so there is nothing to do in the backend.

fp16 is gated on shaderFloat16. A 16-bit OpFmaKHR also needs the Float16
capability and only shaderFloat16 turns that on, so without it the bit
would not be usable. Mali has no fp64, so that one stays off.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42075>
2026-06-10 19:42:49 +00:00
Danylo Piliaiev
67471fed86 tu: Enable texel buffer / SSBO emulation for known problematic games
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41477>
2026-06-10 18:15:01 +00:00
Job Noorman
9b32234726 tu: Add option to raise the maximum SSBO size
Emulates SSBOs via global memory; the real SSBO size and global base address
are stored in the descriptor. The size can be accessed using resbase;
the base address is parsed manually from the descriptor by passing the
bindless base address into the shader via a driver UBO or const file.

nir_lower_ssbo is used to lower SSBO accesses to global memory when the
buffer size exceeds the limit. We also use it to insert bounds checks on
global memory. The final code for SSBO accesses looks like this:

if (@get_ssbo_size >= max_storage_buffer_range_bytes) {
    if (offset < @get_ssbo_size) {
        // global memory access using base (from resbase) + offset
    } else {
        // do nothing (stores) or return 0 (loads)
    }
} else {
    // original SSBO access
}

A new pass is added to lower @load_ssbo_address generated by
nir_lower_ssbo. We set native_offset=true for nir_lower_ssbo to make
sure it doesn't generate 64 bit address math. The new pass then
transforms @load/store_global into @load/store_global_ir3 passing the 32
bit offset from @load_ssbo_address.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Co-authored-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41477>
2026-06-10 18:15:01 +00:00
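
The control flow shown in 9b32234726 can be modelled on the CPU as a rough sketch. All names below are hypothetical, and a plain host pointer stands in for the global base address. The extra check that the whole 32-bit value fits inside the buffer is an addition of this sketch to keep the host-side read safe; the commit message only shows `offset < @get_ssbo_size`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum sketch_path { SKETCH_PATH_NATIVE, SKETCH_PATH_GLOBAL };

/* Buffers at or above the native limit take the global memory path. */
static inline enum sketch_path
sketch_select_path(uint32_t ssbo_size, uint32_t max_range_bytes)
{
   return ssbo_size >= max_range_bytes ? SKETCH_PATH_GLOBAL
                                       : SKETCH_PATH_NATIVE;
}

/* Bounds-checked load on the global memory path:
 * out-of-bounds loads return 0. */
static inline uint32_t
sketch_global_load_u32(const uint8_t *base, uint32_t ssbo_size,
                       uint32_t offset)
{
   uint32_t value = 0;
   assert(base != NULL);
   if (offset < ssbo_size && ssbo_size - offset >= sizeof(value))
      memcpy(&value, base + offset, sizeof(value));
   return value;
}
```

A store would follow the same shape, with the out-of-bounds branch doing nothing.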
Danylo Piliaiev
dc1bb7bbf4 tu: Add option to raise the maximum texel buffer size
Emulates texel buffers via 3D image access. The real texel buffer
size and start offset (due to image alignment requirements) are
stored in the descriptor and accessed via resbase.

- Read-only access: isam.a.1d to read as 3d image.
- RW access: stib.b.typed.3d/ldib.b.typed.3d to access as 3d image.

Verified that the proprietary D3D12 driver uses the same workaround.
The only difference is that the proprietary driver uses an arrayed 2d load
for read-only access instead of a 3d load, but the benefits are not verified.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41477>
2026-06-10 18:15:01 +00:00
Danylo Piliaiev
652864e385 tu/a8xx: Set real storage/texel buffer size limits
From tests, A8XX seems to fix the limits that were incompatible with D3D12.
However, the proprietary driver still exposes the old texel buffer element limit.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41477>
2026-06-10 18:15:01 +00:00
Danylo Piliaiev
d18b637a7c tu: Specify max texel buffer and storage buffer limits via GPU props
A8XX has different storage buffer range limit.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41477>
2026-06-10 18:15:01 +00:00
Danylo Piliaiev
fd99d813af tu: Add allow_oob_indirect_ubo_loads to device cache uuid
Fixes: f4c40fc89c ("tu: Add workaround for D3D11 games accessing UBO out of bounds")

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41477>
2026-06-10 18:15:01 +00:00
Danylo Piliaiev
3c36e3b7b1 ir3: Add resbase_ir3 intrinsic
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41477>
2026-06-10 18:15:01 +00:00
Job Noorman
2fee7ac87f nir/lower_ssbo: add option to insert bounds checks
This is mostly useful in combination with `min_ssbo_size` when the
native SSBO access instructions do the bounds check in HW so we don't
want to add bounds checks for all SSBO accesses.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41477>
2026-06-10 18:15:01 +00:00
Job Noorman
7b2dfdf15d nir/lower_ssbo: add option to only lower large SSBOs
Some HW may have native SSBO instructions that only support a limited
buffer size. It may be beneficial to use those instructions for small
SSBOs and only fall back to global memory accesses for large ones.

This commit adds an option (min_ssbo_size) that, if non-zero, will cause
code like this to be emitted:

if (@get_ssbo_size >= min_ssbo_size) {
    // global memory access
} else {
    // original SSBO access
}

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41477>
2026-06-10 18:15:01 +00:00
Job Noorman
ca9c01ddc5 nir/lower_ssbo: take offset_shift into account
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41477>
2026-06-10 18:15:01 +00:00
Job Noorman
c0c1a2b0af nir/get_io_index_src_number: support @load_ssbo_address
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41477>
2026-06-10 18:15:01 +00:00
squidbus
6e5773687f kk,wsi/metal: Support VK_(KHR/EXT)_swapchain_maintenance1
Primary additions are support for releasing images and changing
present mode in the Metal WSI backend.

Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42062>
2026-06-10 17:41:01 +00:00
squidbus
5882459c45 kk,wsi/metal: Support VK_EXT_hdr_metadata
HDR metadata is packed and passed through as the `CAMetalLayer` `EDRMetadata`.

Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42045>
2026-06-10 17:07:45 +00:00
squidbus
621b816aeb wsi/metal: Support HDR10 color spaces
HDR color spaces also should enable `wantsExtendedDynamicRangeContent`.

Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42045>
2026-06-10 17:07:45 +00:00
Michel Dänzer
178a3d7396 egl/gbm: Eliminate local variable "max_age" in get_back_bo
Use dri2_surf->back->age instead.

No functional change intended.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42111>
2026-06-10 16:21:59 +00:00
Michel Dänzer
a668971bb5 egl/gbm: Use continue instead of nested block
Suggested by Daniel Stone.

No functional change intended.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42111>
2026-06-10 16:21:59 +00:00
Michel Dänzer
aa3ef4dd42 egl/gbm: Eliminate local variable "age" in get_back_bo
Use buffer->age instead.

No functional change intended.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42111>
2026-06-10 16:21:59 +00:00
Michel Dänzer
5fdaab9ef8 egl/gbm: Use local variable for better readability in get_back_bo
No functional change intended.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42111>
2026-06-10 16:21:58 +00:00
Michel Dänzer
ab8e57cf31 egl/gbm: Ignore current front buffer in get_back_bo
Even if the front buffer isn't locked yet, it will normally get locked,
so we can't reuse it as a back buffer.

Pointed out by Daniel Stone.

Fixes: 4a976b60b1 ("egl_dri2: use gbm_surface as the native window type in drm platform")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42111>
2026-06-10 16:21:58 +00:00
Michel Dänzer
962fd789c8 egl/gbm: Ignore buffers with no BO for destroying excess BOs
Can't destroy a BO that doesn't exist in the first place.

Should fix the crash described in
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41845#note_3510521 .

Fixes: dd7ae41091 ("egl/gbm: Destroy excess BOs")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42111>
2026-06-10 16:21:58 +00:00
Jose Maria Casanova Crespo
519f631e6b v3dv: gate Dawn-required limits and features behind V3D_WEBGPU_OVERRIDE
Exposes higher limits than the ones supported by the HW and several
ArrayDynamicIndexing features not yet implemented so the Dawn WebGPU
implementation can be used while it doesn't exercise these limits or
features.

The override is enabled using the V3D_WEBGPU_OVERRIDE=1 envvar. When
it is enabled it:

- Increases the framebuffer dimension limit from the real HW value
  (4096 on RPi4, 7680 on RPi5) to 8192.
- Bumps the advertised maxMipLevels reported per format from 13 to
  14 to match the bumped 8192-wide images and 15 for non-2D images.
  The TMU HW already supports that for sampling.
- Increases max_varying_components from 64 to 72 (HW limit is 64).
- Exposes features that are not actually implemented; CTS tests that
  exercise them will hit asserts in debug builds:
  - shaderUniformBufferArrayDynamicIndexing
  - shaderSampledImageArrayDynamicIndexing
  - shaderStorageBufferArrayDynamicIndexing
  - shaderStorageImageArrayDynamicIndexing
- Increases maxImageDimension1D   from 4096 to 16384
- Increases maxImageDimension2D   from 4096 to  8192
- Increases maxImageDimension3D   from 4096 to 16384
- Increases maxImageDimensionCube from 4096 to 16384

When V3D_WEBGPU_OVERRIDE is unset (the default), the driver
advertises the real HW limits already set up by the preceding
"use real HW framebuffer limit" change, so Vulkan CTS conformance
is unaffected.

To help diagnose applications that hit the over-advertised paths,
mesa_loge errors are emitted from three places:

- lower_vulkan_resource_index() warns before the existing UNREACHABLE
  for dynamic descriptor indexing, so the cause is visible in release
  builds where the assertion is compiled out.
- create_image() warns when vkCreateImage is called with attachment
  usage and dimensions above the real HW framebuffer limit. Storage
  and sampled-only images above that limit work fine via the TMU.
- job_compute_frame_tiling() errors when a render job width/height
  exceeds the real HW framebuffer limit.

The per-plane slices[] array in struct v3dv_image is sized at
V3D_MAX_MIP_LEVELS + 2 so the override case (which advertises 14/15
mip levels for the bumped 8192-wide 2D images and 16384 for 1D/3D images)
still fits without enlarging the default array.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42117>
2026-06-10 15:22:27 +00:00
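
The mip level counts quoted in 519f631e6b follow from the usual full-chain rule: an image whose largest dimension is N has floor(log2(N)) + 1 mip levels. A minimal sketch of that arithmetic, with a hypothetical helper name that is not part of v3dv:

```c
#include <assert.h>
#include <stdint.h>

/* Number of levels in a full mip chain for the given largest
 * dimension: halve until nothing is left, counting each step. */
static inline unsigned
sketch_mip_levels(uint32_t max_dim)
{
   unsigned levels = 0;
   assert(max_dim > 0);
   while (max_dim != 0) {
      levels++;
      max_dim >>= 1;
   }
   return levels;
}
```

This gives 13 levels for 4096, 14 for 8192 and 15 for 16384, matching the numbers in the commit message.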
Jose Maria Casanova Crespo
0ae28c9056 broadcom: raise framebuffer size to 7680 on V3D 7.1
Add a new max_framebuffer_size to devinfo so V3D 4.2 and V3D 7.1 can
expose different framebuffer dimensions: 4096 on RPi4 and 7680 on RPi5.
This is bounded by the maximum clip size supported by the framebuffer.

Take advantage of this to also raise maxImageDimensions* to
max_framebuffer_size.

A non-power-of-two framebuffer means framebuffer_size_for_pixel_count can
compute a height larger than max_framebuffer_size. Clamp the height to the
maximum and recompute the width from the division so w * h <= num_pixels.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42117>
2026-06-10 15:22:27 +00:00
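
The clamping step described in 0ae28c9056 can be illustrated with a rough sketch. The function name and the initial power-of-two width guess are assumptions chosen only to reproduce the overshoot with a non-power-of-two limit; the real framebuffer_size_for_pixel_count may pick its starting size differently.

```c
#include <assert.h>
#include <stdint.h>

static inline void
sketch_fb_size_for_pixel_count(uint32_t num_pixels, uint32_t max_size,
                               uint32_t *width, uint32_t *height)
{
   assert(num_pixels > 0 && max_size > 0);

   /* Assumed starting point: largest power-of-two width <= max_size. */
   uint32_t w = 1;
   while (w <= max_size / 2)
      w *= 2;
   if (w > num_pixels)
      w = num_pixels;

   uint32_t h = num_pixels / w;

   /* With a non-power-of-two limit such as 7680, h can overshoot.
    * Clamp it and recompute w by division so w * h <= num_pixels. */
   if (h > max_size) {
      h = max_size;
      w = num_pixels / h;
      if (w > max_size)
         w = max_size;
   }

   *width = w;
   *height = h;
}
```

For example, 7680 * 7680 pixels with a limit of 7680 first yields 4096 x 14400, which the clamp turns into 7680 x 7680.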
Jose Maria Casanova Crespo
5242d4c171 broadcom: add and use max_render_targets to devinfo
Use the new devinfo value instead of the V3D_MAX_RENDER_TARGETS
macro.

The macro is now only used in devinfo initialization and in the
versioned V3D file src/gallium/drivers/v3d/v3dx_state.c

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42117>
2026-06-10 15:22:27 +00:00
Samuel Pitoiset
f21a95f890 radv/ci: skip all WSI tests also on NAVI21/NAVI31
This makes sure pre-merge jobs don't hit the random issue; the tests are
still run by nightly jobs.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42146>
2026-06-10 14:58:31 +00:00
Jose Maria Casanova Crespo
94abf86561 v3dv: set non-zero array stride in null texture descriptor state
pack_null_texture_state(), introduced to support VK_KHR_robustness2
nullDescriptor for image bindings, left the TEXTURE_SHADER_STATE
"Array Stride (64-byte aligned)" field at 0.

On real V3D HW this is fine: a TMU read against a null descriptor
returns zero regardless of the descriptor contents. The V3D simulator,
however, validates the TMU array stride before issuing the read.

Setting array_stride_64_byte_aligned = 1 (64 bytes raw) fixes the failing
dEQP-VK.robustness.robustness2.bind.*.null_descriptor.samples_1.3d.*
test cases under the simulator.

Fixes: 990d76eae6 ("v3dv: Implement and enable nullDescriptor support")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42112>
2026-06-10 14:38:50 +00:00
Eric Engestrom
127acbb126 ci: bump fedora from 42 to 44
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41620>
2026-06-10 13:53:26 +00:00
Eric Engestrom
4ebf2e3baa ci: bump bindgen version from 0.71.1 to 0.72.1
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41620>
2026-06-10 13:53:26 +00:00
Eric Engestrom
dae8bc711d ci: bump rust version from 1.90 to 1.96
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41620>
2026-06-10 13:53:25 +00:00
Eric Engestrom
47570e74ec meson: exclude known buggy versions of bindgen
Backport-to: *
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41620>
2026-06-10 13:53:25 +00:00
Eric Engestrom
09ea05cf23 rusticl: skip bindgen for pipe_shader_state_from_tgsi
bindgen up to at least 0.72.1 generates invalid code (see below) and
that function is not used, so simply skip it.

    src/gallium/frontends/rusticl/rusticl_mesa_bindings.c:795:81: error: duplicate ‘const’ declaration specifier [-Werror=duplicate-decl-specifier]
      795 | void pipe_shader_state_from_tgsi__extern(struct pipe_shader_state *state, const const struct tgsi_token *tokens) { pipe_shader_state_from_tgsi(state, tokens); }
          |                                                                                 ^~~~~

Backport-to: *
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41620>
2026-06-10 13:53:25 +00:00
yserrr
38a98a4803 v3d: remove duplicate util_blitter_save_so_targets() call
v3d_blitter_save() saves the stream output targets twice.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42139>
2026-06-10 13:36:26 +00:00
Rhys Perry
addc719ec2 radv: workaround has_smem_partial_oob_access_bug
Just use an existing flag to increase the bo size slightly.

Fixes a ring gfx timeout with
dEQP-VK.spirv_assembly.instruction.compute.opfma.fp32.vec3.undef.denorm_flush.directed
on vega10.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: *
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41937>
2026-06-10 13:01:47 +00:00
Rhys Perry
f7a3884278 ac/gpu_info: add has_smem_partial_oob_access_bug
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: *
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41937>
2026-06-10 13:01:46 +00:00
Rhys Perry
bed7ba2780 aco: schedule split barriers
Move the s_barrier_signal earlier and the s_barrier_wait later.

fossil-db (gfx1201):
Totals from 2152 (1.03% of 208640) affected shaders:
Instrs: 1463236 -> 1463248 (+0.00%); split: -0.00%, +0.01%
CodeSize: 7710732 -> 7710720 (-0.00%); split: -0.00%, +0.00%
Latency: 7164883 -> 7159042 (-0.08%); split: -0.10%, +0.01%
InvThroughput: 1593643 -> 1593651 (+0.00%); split: -0.00%, +0.00%
VClause: 30170 -> 30166 (-0.01%)
SClause: 26771 -> 26772 (+0.00%)
Copies: 123002 -> 123004 (+0.00%)
SALU: 221966 -> 221967 (+0.00%)
VOPD: 1680 -> 1681 (+0.06%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41364>
2026-06-10 12:13:19 +00:00
Rhys Perry
26b942c306 aco: use split barrier instructions
fossil-db (gfx1201):
Totals from 135 (0.06% of 208640) affected shaders:
Instrs: 155940 -> 155932 (-0.01%); split: -0.02%, +0.02%
CodeSize: 905460 -> 905432 (-0.00%); split: -0.02%, +0.01%
Latency: 1910087 -> 1909703 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 886321 -> 886280 (-0.00%)
Copies: 12025 -> 12024 (-0.01%)
VALU: 89681 -> 89679 (-0.00%)
VOPD: 177 -> 178 (+0.56%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41364>
2026-06-10 12:13:19 +00:00
Rhys Perry
a95f841125 aco: add split barrier instructions
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41364>
2026-06-10 12:13:19 +00:00
Rhys Perry
7c6be36cf4 aco: don't emit waitcnts before subgroup-scope execution barriers
This delays the waitcnt for has_attr_ring_wait_bug by a few instructions.

fossil-db (gfx1201):
Totals from 9 (0.00% of 208640) affected shaders:
Instrs: 19352 -> 19506 (+0.80%)
CodeSize: 101180 -> 101716 (+0.53%)
Latency: 660221 -> 678782 (+2.81%); split: -0.00%, +2.81%
InvThroughput: 95106 -> 97398 (+2.41%)

fossil-db (navi33):
Totals from 58834 (28.20% of 208626) affected shaders:
Instrs: 22424304 -> 22424571 (+0.00%)
CodeSize: 110198112 -> 110199184 (+0.00%)
Latency: 115894319 -> 126491124 (+9.14%); split: -0.00%, +9.14%
InvThroughput: 19424631 -> 19754358 (+1.70%); split: -0.00%, +1.70%

I don't think the stats are very accurate. This seems to often move the
s_waitcnt down into a divergent branch, but the wait still happens later
if the branch isn't taken, so the wait is counted twice.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41364>
2026-06-10 12:13:18 +00:00
Rhys Perry
3676c3860e aco: only assume load/store with semantic_atomic is atomic
ACCESS_ATOMIC was added a while ago.

fossil-db (gfx1201):
Totals from 84 (0.04% of 208640) affected shaders:
Instrs: 74569 -> 74402 (-0.22%)
CodeSize: 379220 -> 378552 (-0.18%)
Latency: 589791 -> 575984 (-2.34%)
InvThroughput: 56042 -> 54921 (-2.00%)

fossil-db (navi33):
Totals from 79 (0.04% of 208626) affected shaders:
Instrs: 69170 -> 69015 (-0.22%)
CodeSize: 349580 -> 348928 (-0.19%)
Latency: 563270 -> 549156 (-2.51%)
InvThroughput: 61245 -> 59887 (-2.22%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41364>
2026-06-10 12:13:18 +00:00
Rhys Perry
a0d5c117fc aco: optimize redundant s_wait_alu vm_vsrc(0) during waitcnt insertion
fossil-db (gfx1201):
Totals from 143 (0.07% of 208640) affected shaders:
Instrs: 104804 -> 104588 (-0.21%)
CodeSize: 543148 -> 542320 (-0.15%)
Latency: 751702 -> 751446 (-0.03%); split: -0.04%, +0.00%
InvThroughput: 78599 -> 78588 (-0.01%); split: -0.02%, +0.00%

fossil-db (navi33):
Totals from 170 (0.08% of 208626) affected shaders:
Instrs: 107230 -> 106983 (-0.23%)
CodeSize: 554952 -> 553940 (-0.18%)
Latency: 746901 -> 746628 (-0.04%); split: -0.04%, +0.00%
InvThroughput: 102412 -> 102390 (-0.02%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41364>
2026-06-10 12:13:18 +00:00
Rhys Perry
650715b077 aco: fix printing of primitive exports
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41364>
2026-06-10 12:13:17 +00:00
Rhys Perry
c815c51dcb aco/waitcnt: always use uint32_t for event masks
This shouldn't fix anything, because event_vmem_bvh was never used here.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41364>
2026-06-10 12:13:17 +00:00