Compare commits

...

334 commits

Author SHA1 Message Date
Eric Engestrom
efd5b779da VERSION: bump for 26.0.2 2026-03-12 12:56:33 +01:00
Eric Engestrom
3646899ffd docs: add release notes for 26.0.2 2026-03-12 12:56:33 +01:00
Mike Blumenkrantz
5cf88188bd egl/device: fix the fix for explicit sw rejection in non-sw EGL_PLATFORM=device
"explicit sw" means llvmpipe, which cannot be a real drm device. this requires also
returning only a single device so as to avoid leaking non-sw drivers

should fix LIBGL_ALWAYS_SOFTWARE=1 eglinfo

Fixes: 8a339cdebc ("egl: fix sw fallback rejection in non-sw EGL_PLATFORM=device")
(cherry picked from commit c9b2986607)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:12 +01:00
Job Noorman
1f89a0fb96 ir3: don't predicate vote_all/vote_any
These get lowered to control flow which isn't allowed inside predicated
blocks.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 39088571f0 ("ir3: add support for predication")
(cherry picked from commit 5e4a7d01fe)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:12 +01:00
Job Noorman
d78e309e4d ir3: update context builder after ir3_get_predicate
If we are currently inserting instructions after the src of the
predicate conversion, uses of the predicate will be inserted before its
def (the conversion). Fix this by updating the context builder to point
to after the conversion.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: fda91b49d7 ("ir3: refactor builders to use ir3_builder API")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/15043
(cherry picked from commit f88e8b778d)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:12 +01:00
Samuel Pitoiset
0e31cb83ce radv: fix missing L2 cache invalidation with streamout on GFX12
COPY_DATA emitted from the CP isn't coherent with L2, in case the
buffer filled size needs to be copied.

This fixes rare and random flickering with Mafia 3 Definitive Edition
on RDNA4.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14697
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit d9420eed9e)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:12 +01:00
Sagar Ghuge
ada713b32f anv: Fix Wa_14021821874, Wa_14018813551, Wa_14026600921
WA states that we need to allocate maximum number of stackIDs per DSS
from RT_DISPATCH_GLOBALS to 2048.

We can still throttle/control the CFE_STATE::StackID to be in range
specified by the field.

This does impact performance, since CFE_STATE::stackIDs is capped to 2K
by default. The more outstanding ray queries there are, the larger the
working set and the greater the impact on cache hit rate.

This affects performance on Xe2 and later:
* Boundary Benchmark:            36.2%
* Solar Bay extreme:             9.8%
* Hitman world of assassination: 3.9%

Fixes: c1a44e8d43 ("anv: force StackIDControl value for Wa_14021821874")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit cb423ee636)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:12 +01:00
Tapani Pälli
446fab4a4a anv: add handling for Wa_14026600921
This is the Xe3 version of the earlier workaround.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 840e6e855b)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:12 +01:00
Tapani Pälli
77add2d8f2 intel/dev: update mesa_defs.json from workaround database
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit c75309c8f1)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:12 +01:00
Faith Ekstrand
7054ea6d45 pan/bi: Be more careful about bit sizes in b2f lowering
Fixes: 21bdee7bcc ("pan/bi: Switch to lower_bool_to_bitsize")
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
(cherry picked from commit 08c437f644)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:12 +01:00
Faith Ekstrand
9c2b19219a nir/lower_bool_to_bitsize: Make all bN_csel sources match
Previously, we assumed that the selector for bcsel could be whatever,
regardless of the bit sizes of the data and we'd just fix it in the
back-end.  This works okay for scalars but falls over the moment we
vectorize because all our vector handling assumes bit sizes match.
Since matching bit sizes is what the hardware wants anyway, it's better
to do the right thing in NIR and hope copy-propagation can fold in
conversions if needed.

Unfortunately, copy prop isn't that smart yet so this does hurt a bit:

    Instrs: 1193679 -> 1198086 (+0.37%); split: -0.06%, +0.43%
    CodeSize: 11915136 -> 11950592 (+0.30%); split: -0.05%, +0.34%
    Full: 160985 -> 160941 (-0.03%); split: -0.04%, +0.01%
    Estimated normalized CVT cycles: 4456.938557000181 -> 4480.876069000186 (+0.54%); split: -0.13%, +0.67%
    Estimated normalized SFU cycles: 6350.9375 -> 6392.21875 (+0.65%)
    Estimated normalized Load/Store cycles: 205773.0 -> 205795.0 (+0.01%)
    Maximum number of threads: 12864 -> 12863 (-0.01%)
    Number of spill instructions: 22487 -> 22489 (+0.01%)
    Number of fill instructions: 52179 -> 52219 (+0.08%)

Hurt shaders:

    google-meet-clvk/BgBlur
    google-meet-clvk/Relight
    parallel-rdp/small_subgroup
    parallel-rdp/small_uber_subgroup

The proper solution here is to teach copy-prop about this stuff so that
it can propagate swizzles into ALU ops when they're supported:
https://gitlab.freedesktop.org/panfrost/mesa/-/issues/265

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14945
Cc: mesa-stable
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
(cherry picked from commit 3fd471dca5)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:12 +01:00
Faith Ekstrand
740734ac72 etnaviv: Call lower_bool_to_int32 not to_bitsize
It calls both for some reason but never handles any other booleans than
32-bit.  This was probably a mistake.

Fixes: e63a7882a0 ("etnaviv: call nir_lower_bool_to_bitsize")
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
(cherry picked from commit 6fb3995659)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:12 +01:00
Mary Guillemard
f550eb1903 vulkan: Do not override the shader_flags in case of no task shader
This should be doing an OR and not an assignment.
This fixes issues on NVK with mesh stages on DGC.

Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: 9308e8d90d ("vulkan: Add generic graphics and compute VkPipeline implementations")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 8f2eeee7ba)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:11 +01:00
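The difference between assigning and OR-ing flags described in f550eb1903 can be sketched as follows. This is a minimal illustration, not the Vulkan runtime code: `FLAG_A`, `FLAG_B`, and both helper functions are hypothetical names invented for the example.

```python
# Hypothetical flag bits, named here only for illustration.
FLAG_A = 0x1  # a flag that was already set before the "no task shader" path
FLAG_B = 0x2  # the flag that the "no task shader" path wants to add


def assign_flags(existing: int) -> int:
    """Buggy shape: a plain assignment discards the flags already set."""
    flags = existing
    flags = FLAG_B
    return flags


def or_flags(existing: int) -> int:
    """Fixed shape: OR-ing keeps the earlier flags and adds the new one."""
    flags = existing
    flags |= FLAG_B
    return flags
```

With `FLAG_A` already set, `assign_flags` loses it while `or_flags` returns both bits.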
Antonino Maniscalco
7b2af9e15a zink: don't care about generated gs output primitive
Zink uses the output primitive of the last vertex stage when deciding
the raster primitive. When we generate the gs the output primitive
depends on the raster primitive.

Not only does the generated gs output primitive have no value in choosing
the raster primitive, it can also get us stuck with the last raster
primitive which is of course incorrect.

Ignore it for generated shaders.

Cc: mesa-stable
(cherry picked from commit d526bbc29b)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:11 +01:00
Timothy Arceri
b5304ffef7 glx: guard glx_screen frontend_screen member
Guards workaround code with the same conditions as glx_screen's
frontend_screen member.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Fixes: 67eeee43e0 ("driconf: add a way to override GLX_CONTEXT_RESET_ISOLATION_BIT_ARB")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/15021
(cherry picked from commit bd42f62b0f)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:11 +01:00
Iván Briano
836a22d1a2 anv: don't try to fast clear D/S with multiview
If multiview is enabled on the render pass, baseLayer and layerCount
will be 0 and 1 respectively and throw us off.
We can still fast clear if view_mask == 1, but anything else hits the
BLORP_BATCH_NO_EMIT_DEPTH_STENCIL restriction.

Fixes: e488773b29 ("anv: Fast clear depth/stencil surface in vkCmdClearAttachments")

Signed-off-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
(cherry picked from commit 5d22f307d5)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:11 +01:00
Ian Romanick
2a2dba1bc7 elk/algebraic: Don't optimize SEL.L.SAT or SEL.G.SAT
shader-db:

Broadwell
total instructions in shared programs: 18607516 -> 18607530 (<.01%)
instructions in affected programs: 2095 -> 2109 (0.67%)
helped: 0 / HURT: 8

total cycles in shared programs: 955704436 -> 955702925 (<.01%)
cycles in affected programs: 34299 -> 32788 (-4.41%)
helped: 2 / HURT: 6

All Haswell and older platforms had similar results. (Haswell shown)
total instructions in shared programs: 16989200 -> 16989201 (<.01%)
instructions in affected programs: 461 -> 462 (0.22%)
helped: 0 / HURT: 1

total cycles in shared programs: 946537070 -> 946537035 (<.01%)
cycles in affected programs: 16378 -> 16343 (-0.21%)
helped: 1 / HURT: 0

Test: piglit!1100
Reported-by: Georg Lehmann
Fixes: ca675b73d3 ("i965/fs: Optimize saturating SEL.L(E) with imm val >= 1.0.")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
(cherry picked from commit 64c60582b5)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:11 +01:00
Ian Romanick
829e5ccc84 brw/algebraic: Don't optimize SEL.L.SAT or SEL.G.SAT
This optimization was added in October 2013, and the error was only just
now discovered. Removing the SEL.G.SAT optimization affected zero
shader-db shaders, and it affected 9 fossil-db shaders for instruction
size only.

I haven't checked to see if any of the hurt shaders are helped by
!39987.

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17093041 -> 17093055 (<.01%)
instructions in affected programs: 2072 -> 2086 (0.68%)
helped: 0 / HURT: 8

total cycles in shared programs: 876739578 -> 876739154 (<.01%)
cycles in affected programs: 18946 -> 18522 (-2.24%)
helped: 2 / HURT: 6

fossil-db:

Lunar Lake
Totals:
Instrs: 906230557 -> 906240487 (+0.00%); split: -0.00%, +0.00%
CodeSize: 14498856128 -> 14499003168 (+0.00%); split: -0.00%, +0.00%
Send messages: 40667184 -> 40667205 (+0.00%); split: -0.00%, +0.00%
Cycle count: 104068494103 -> 104068561943 (+0.00%); split: -0.00%, +0.00%
Max live registers: 189570192 -> 189570204 (+0.00%); split: -0.00%, +0.00%
Max dispatch width: 48157648 -> 48157552 (-0.00%)
Non SSA regs after NIR: 139823587 -> 139823016 (-0.00%); split: -0.00%, +0.00%

Totals from 9172 (0.46% of 1985212) affected shaders:
Instrs: 10774709 -> 10784639 (+0.09%); split: -0.00%, +0.09%
CodeSize: 177868384 -> 178015424 (+0.08%); split: -0.08%, +0.17%
Send messages: 311154 -> 311175 (+0.01%); split: -0.00%, +0.01%
Cycle count: 232471392 -> 232539232 (+0.03%); split: -0.15%, +0.18%
Max live registers: 1243549 -> 1243561 (+0.00%); split: -0.00%, +0.01%
Max dispatch width: 196672 -> 196576 (-0.05%)
Non SSA regs after NIR: 509663 -> 509092 (-0.11%); split: -0.19%, +0.08%

Test: piglit!1100
Reported-by: Georg Lehmann
Fixes: ca675b73d3 ("i965/fs: Optimize saturating SEL.L(E) with imm val >= 1.0.")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
(cherry picked from commit 6c6c6ce054)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:11 +01:00
Eric R. Smith
63a6e0ffc9 pco: fix a typo in the check for optimization looping
The count isn't incremented anywhere else.

Signed-off-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Fixes: f1b24267d2 ("pco: rework nir processing and passes")
(cherry picked from commit 8521051cfa)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:11 +01:00
Pavel Ondračka
eea697b179 r300: disable clip-discard watermark for triangles
Commit 0d4aa5f55f introduced the watermark to optimize the guardband
state changes and always computed new_distance as MAX2(distance,
watermark).

That is correct for point/line paths where distance > 0, but it keeps a
non-zero discard distance alive when the next draw sets distance = 0
(triangles). This leaks wide point/line clip-discard state into later
triangle draws and can clip away large parts of geometry (as observed in
Sauerbraten). Only apply the watermark when distance > 0 and reset it to
zero otherwise so triangle draws disable clip-discard as intended.

Fixes: 0d4aa5f55f ("r300: pop-free clipping")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14959
(cherry picked from commit ce33f82f83)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:11 +01:00
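The watermark behaviour described in eea697b179 can be modelled roughly as below. This is a sketch under assumptions, not the r300 driver code: the function names are hypothetical, and updating the watermark to the newly chosen distance is a guess at how the state is carried between draws.

```python
def old_discard_distance(distance: float, watermark: float) -> float:
    """Old behaviour as described: always MAX2(distance, watermark)."""
    return max(distance, watermark)


def new_discard_distance(distance: float, watermark: float) -> tuple:
    """Fixed behaviour: returns (new_distance, new_watermark).

    The watermark only applies to point/line draws (distance > 0).
    Triangle draws (distance == 0) disable clip-discard and reset it.
    """
    if distance > 0:
        new_distance = max(distance, watermark)
        # Assumption: the watermark tracks the largest distance seen so far.
        return new_distance, new_distance
    return 0.0, 0.0
```

For a triangle draw following a wide-line draw, `old_discard_distance(0.0, 4.0)` still yields a non-zero distance, while `new_discard_distance(0.0, 4.0)` resets both values to zero.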
Samuel Pitoiset
ecb7bf7b68 radv: fix local invocation index for mesh/task and quad derivatives on GFX12
It must be lowered.

This fixes
dEQP-VK.spirv_assembly.instruction.compute.compute_shader_derivatives.{mesh,task}.*.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 3c4cb16159)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:11 +01:00
Samuel Pitoiset
f858d2238e radv: fix a GPU hang with PS epilogs and secondary command buffers
If the secondary changes the fragment output state and if the same
PS epilog used before ExecuteCommands() is re-bound immediately after
that call, the PS epilog state wouldn't be re-emitted.

Apply the same change for VS prologs, although the logic is slightly
different and the bug shouldn't occur. The whole logic of secondaries
should be completely rewritten because it's definitely not robust.

This fixes a GPU hang in Where Winds Meet, see
https://github.com/doitsujin/dxvk/issues/5436.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 1a00587c44)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:11 +01:00
Yiwei Zhang
2b6e7f0be2 lvp: avoid advertising dmabuf support for kms_swrast
Lavapipe relies on true udmabuf support for dmabuf export allocation.
This change aligns the behavior with both llvmpipe_allocate_memory_fd
and llvmpipe_import_memory_fd.

Fixes: 7d0a631f20 ("llvmpipe: export dmabuf caps for kms_swrast")
Reviewed-by: Lucas Fryzek <lfryzek@igalia.com>
(cherry picked from commit 5ab8c8a439)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:11 +01:00
Mel Henning
60e29a07c0 driconf: force_vk_vendor on No Man's Sky + NVK
Cc: mesa-stable
Reviewed-by: Mary Guillemard <mary@mary.zone>
(cherry picked from commit bfde63e4d8)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:11 +01:00
Georg Lehmann
8f6c3dcc90 nir/opt_algebraic: fix frsq clamp pattern
This is not NaN correct.
And also make the pattern 32bit only because the constant is hard coded
FLT_MAX.

Fixes: 780b5c1037 ("nir/algebraic: Simplify some Inf and NaN avoidance code")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit ab773fc5d4)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:11 +01:00
Danylo Piliaiev
4a4a86390b tu: Don't read .patch_input_gmem of unused attachment
There was duplicated code to set unscaled_input_fragcoord and a read
from VK_ATTACHMENT_UNUSED attachment, which incorrectly updated
builder->unscaled_input_fragcoord.

ubsan:
 tu_pipeline.cc:4734:44: runtime error: load of value 127, which is not a valid value for type 'bool'

Seen in:
 dEQP-VK.renderpasses.renderpass1.custom_resolve.monolithic.stencil_only_s8

Fixes: 97da0a7734 ("tu: Rewrite to use common Vulkan dynamic state")

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
(cherry picked from commit 81a76be861)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:11 +01:00
Danylo Piliaiev
ace5f6c88d tu: Store gmem attachments after custom resolve in dyn RP
For dynamic renderpasses we create a fake second subpass, which is
used by CmdBeginCustomResolveEXT. However, CmdBeginCustomResolveEXT
doesn't trigger tile stores, and attachments didn't know they should
be stored after the fake custom resolve subpass.

Fixes: 520e3f3a47 ("tu: Implement VK_EXT_custom_resolve")

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
(cherry picked from commit 67c54c4465)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:11 +01:00
Caio Oliveira
8355670805 nir: Fix constant folding for iadd_sat
Use INT_MIN instead of INT_MAX for underflow.

Fixes: cc4b50b023 ("nir/opcodes: use u_overflow to fix incorrect checks")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pelloux@gmail.com>
(cherry picked from commit da57fbfb07)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:11 +01:00
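The saturation rule behind 8355670805 can be illustrated with a small model of a signed 32-bit saturating add. This is a sketch, not the NIR constant-folding code: `iadd_sat32` is a hypothetical helper, and the 32-bit width is chosen only for the example.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def iadd_sat32(a: int, b: int) -> int:
    """Signed 32-bit saturating add (illustrative model only)."""
    total = a + b  # Python ints do not wrap, so the exact sum is available
    if total > INT32_MAX:
        return INT32_MAX  # overflow clamps to the largest value
    if total < INT32_MIN:
        return INT32_MIN  # underflow clamps to the smallest value, not the largest
    return total
```

The underflow branch is the case the commit corrects: a sum below the representable range must saturate to the minimum.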
Connor Abbott
725626858d tu: Fix setting will_be_resolved with MSRTSS
We were setting it on the user's attachments, which become
resolve/unresolve attachments, but it should be set on the color
and depth/stencil attachments.

Cc: mesa-stable
(cherry picked from commit d0be4ab2ab)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:10 +01:00
Connor Abbott
9a361c3801 tu: Set polygon mode when blitting
Noticed by inspection.

Cc: mesa-stable
(cherry picked from commit 1d167ffe77)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:10 +01:00
Yiwei Zhang
b88c8f37e4 pan: fix to not clear out of bitset range
Fixes: 617f0562bb ("pan: Use bitset instead of bool array in bi_find_loop_blocks")
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
(cherry picked from commit ec24d1afb6)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:10 +01:00
Lucas Fryzek
d7ee1e68df vulkan/wsi: Check that xshm can be attached
Cc: mesa-stable
Co-authored-by: Carlos Lopez <clopez@igalia.com>
(cherry picked from commit 4933e60bc2)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:10 +01:00
Lucas Fryzek
5f4eccf1fb glx: Check that xshm can be attached
Cc: mesa-stable
Co-authored-by: Carlos Lopez <clopez@igalia.com>
(cherry picked from commit a67af81944)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:10 +01:00
Lucas Fryzek
2c4c7fbfa9 egl/dri: Check that xshm can be attached
Cc: mesa-stable
Co-authored-by: Carlos Lopez <clopez@igalia.com>
(cherry picked from commit 5f481dd89d)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:10 +01:00
Lucas Fryzek
23b88ba221 x11: Add helper util to check for xshm support
Cc: mesa-stable
(cherry picked from commit 9e1671dea9)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:10 +01:00
Lucas Fryzek
8d313e5d1c drisw: Properly mark shmid as -1 when alloc fails
Cc: mesa-stable
(cherry picked from commit b93bf19d94)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:10 +01:00
Timothy Arceri
681de5a641 st/glsl_to_nir: update state var locations earlier
We need to update the state var locations before the
st_serialize_base_nir() calls otherwise
_mesa_optimize_state_parameters() can alter params such that
variants won't be able to find the correct match when calling
_mesa_lookup_state_param_idx().

Prior to 891d46f5 this worked because after failing to match
we would end up adding additional params back in that we had
just attempted to optimise.

Fixes: a6fcc2835e ("st/glsl_to_nir: make sure the variant has the correct locations set")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14837

Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
(cherry picked from commit 6c60f423b3)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:10 +01:00
Timothy Arceri
0edb7039cb mesa/st: use same path for setting state ref locations
After the fix in a6fcc2835e we can now take the same path whether
allow_st_finalize_nir_twice is set or not.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit b59c3ac82a)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:10 +01:00
Caio Oliveira
b2a34da82f spirv: Fix spec constant to handle Select for non-native floats
There was an assumption that if the instruction had non-native float
as a source, the first source would have such type.  This doesn't
hold for Select, and the code failed in two ways

- The boolean source of Select was being converted to the non-native
  float type.

- The loop that resolves the bit-size for unsized operands would
  trip at `assert(i == 0)` because Select has more than one source.

Re-organize the code to track the types of the sources independently,
and fix both issues above.

Fixes: 90e1b12890 ("spirv: Add bfloat16 support to SpecConstantOp")
Fixes: 51d3c4c889 ("spirv: support float8 spec constant op")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
(cherry picked from commit 6affcb43a7)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:10 +01:00
Caio Oliveira
4588b025c8 spirv: Pull constant source fixup to the existing loop
Backport-to: 26.0
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
(cherry picked from commit b0c3b20bff)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:10 +01:00
Caio Oliveira
0775d0f1b5 spirv: Refactor ALU opcode translation to take bit sizes
Only used by Convert operations, so just pass 0 from callers that
are not Convert and clarify that in the code.

Backport-to: 26.0
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
(cherry picked from commit 1c3c987d5c)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:10 +01:00
Timothy Arceri
a66a9280fb glsl: add workaround for MDK2 HD
Allows a shader to compile that uses an embedded struct declaration,
which is not allowed in GLSL 1.20+

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14986
(cherry picked from commit f109bfc3f1)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:10 +01:00
Rhys Perry
1d66a995ce nir/range_analysis: set deleted key
If (uintptr_t)&deleted_key is small enough, inserting entries into the
hash table might not work correctly.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 26.0
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit c0079e09ca)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:10 +01:00
Ian Romanick
0d52c7941e brw: Also check for ADDRESS file in update_for_reads
Like accumulators and ARF address registers, the virtual address
registers are not tracked in a way the defs analysis can know
about. This could actually be fixed, but that is future work.

Fixes: b110b06447 ("brw: introduce a new register type for the address register")
Suggested-by: Lionel
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 8624da56ee)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:10 +01:00
Ian Romanick
815691378b brw: Use brw_reg_is_arf in update_for_reads
brw_reg::nr encodes both which ARF it is and which instance of that
ARF. In other words, nr for acc0 and acc2 have some bits that say
BRW_ARF_ACCUMULATOR and some bits that say 0 vs 2. The previous test
would only detect acc0.

Fixes: 0d144821f0 ("intel/brw: Add a new def analysis pass")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 366410e913)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:09 +01:00
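The register-number encoding issue described in 815691378b can be sketched as follows. The concrete bit layout is an assumption made for illustration (ARF kind in the high nibble, instance in the low nibble, accumulator kind `0x20`), and both helpers are hypothetical, not the brw compiler's functions.

```python
# Assumed encoding, for illustration only: the high nibble of nr selects
# the ARF kind and the low nibble selects the instance of that kind.
ARF_ACCUMULATOR = 0x20


def is_accumulator_exact(nr: int) -> bool:
    """Shape of the previous test: only matches instance 0 (acc0)."""
    return nr == ARF_ACCUMULATOR


def is_arf(nr: int, arf: int) -> bool:
    """Masks off the instance bits before comparing the ARF kind."""
    return (nr & 0xF0) == arf
```

Under this layout acc2 is `0x22`: the exact comparison misses it, while the masked comparison matches both acc0 and acc2.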
Ian Romanick
f21bc439a1 brw: Don't mark_invalid in update_for_reads for non-VGRF destination
This can occur if NULL or an accumulator is an explicit destination.
update_for_reads still needs to process the sources.

v2: Pass a brw_reg to ::mark_invalid, and do the VGRF check in that one
place.

Fixes: 0d144821f0 ("intel/brw: Add a new def analysis pass")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit a548466186)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:09 +01:00
Jose Maria Casanova Crespo
31ea1923de v3d: reject fast TLB blit when RT formats don't match
v3d_tlb_blit_fast includes the blit onto a pending job that writes
to the source resource. The TLB data is already unpacked according to
the job's RT format, so storing it with a different RT format performs
a channel reinterpretation rather than a raw byte copy, corrupting the
data.

So when copying from RGB10_A2UI to RG16UI with glCopyImageSubData,
the copy_image path remaps both formats to R16G16_UNORM for a raw
32-bit copy. The fast TLB blit found the pending clear job
(RGB10_A2UI, 4 channels: 10-10-10-2) and stored its TLB data as RG16UI
(2 channels: 16-16), writing the unpacked 10-bit R and G channel values
into 16-bit fields instead of preserving the raw packed bits.

Previous internal_type/bpp check was insufficient: both RGB10_A2UI
and RG16UI share internal_type=16UI and the source bpp (64) exceeds
the destination bpp (32), but their channel layouts are different.

Add a check that the job's source surface RT format matches the blit
destination RT format before allowing the fast path.

Fixes: 66de8b4b5c ("v3d: add a faster TLB blit path")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
(cherry picked from commit 5454221cfb)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:09 +01:00
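The channel reinterpretation described in 31ea1923de can be reproduced with plain bit packing. This is a minimal sketch, not v3d code: the helper names are hypothetical, and the layouts assume the usual convention of the first channel in the lowest bits.

```python
def pack_rgb10_a2(r: int, g: int, b: int, a: int) -> int:
    """Pack four channels as 10-10-10-2 bits into one 32-bit word."""
    return (r & 0x3FF) | ((g & 0x3FF) << 10) | ((b & 0x3FF) << 20) | ((a & 0x3) << 30)


def unpack_rgb10_a2(word: int) -> tuple:
    """Split a 32-bit word back into its 10-10-10-2 channels."""
    return (word & 0x3FF, (word >> 10) & 0x3FF, (word >> 20) & 0x3FF, (word >> 30) & 0x3)


def pack_rg16(r: int, g: int) -> int:
    """Pack two channels as 16-16 bits into one 32-bit word."""
    return (r & 0xFFFF) | ((g & 0xFFFF) << 16)


def store_unpacked_as_rg16(word: int) -> int:
    """Model of the bug: unpack as RGB10_A2UI, then store R and G as RG16UI."""
    r, g, _b, _a = unpack_rgb10_a2(word)
    return pack_rg16(r, g)
```

A raw 32-bit copy would leave the word unchanged, whereas `store_unpacked_as_rg16` drops the B and A bits and moves G to a different bit position, so the stored word differs from the source.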
Marek Olšák
f7d391f851 ac: set the correct number of Z planes for ALLOW_EXPCLEAR
This is an old driver bug that could cause Z corruption on gfx8-11.5.

v2: handle allow_expclear differently

Cc: mesa-stable

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> (v1)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v2)
(cherry picked from commit 4cfe08e583)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:09 +01:00
Karol Herbst
d29063d4f2 nir: fix nir_round_int_to_float for fp16
fp16 has quite the limited value range and with bigger integers
nir_round_int_to_float might return Inf where it shouldn't depending on
the rounding mode.

Fixes conversions half_rt[npz]_(u)?(int|long) CL CTS tests.

Cc: mesa-stable
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Rob Clark <rob.clark@oss.qualcomm.com>
(cherry picked from commit e1ed7de274)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:09 +01:00
Karol Herbst
3d8ff40d58 nir: fix nir_alu_type_range_contains_type_range for fp16 to int
The special value "Inf" doesn't fit into an int and therefore we have to
clamp regardless of whether all the other values would fit. And because
f2u32 and f2u64 define out-of-range conversions as UB in nir, we need to
clamp.

This change should have no effect for non saturating conversions.

Fixes "conversions long_sat_*half" CL CTS tests

Cc: mesa-stable
Suggested-by: Rob Clark <rob.clark@oss.qualcomm.com>
Reviewed-by: Rob Clark <rob.clark@oss.qualcomm.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
(cherry picked from commit 8e8fb2ebaa)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:09 +01:00
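The reasoning in 3d8ff40d58 can be illustrated with a small model of a saturating float-to-int conversion. This is a sketch, not NIR code: `f2i32_sat` is a hypothetical helper, the 32-bit target is chosen for the example, and mapping NaN to 0 is an assumption borrowed from OpenCL's saturating conversions.

```python
import math

INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1
FP16_MAX = 65504.0  # largest finite half-precision value


def f2i32_sat(x: float) -> int:
    """Saturating float -> signed 32-bit conversion (illustrative model)."""
    if math.isnan(x):
        return 0  # assumption: NaN converts to 0
    if x >= INT32_MAX:
        return INT32_MAX  # also catches +Inf
    if x <= INT32_MIN:
        return INT32_MIN  # also catches -Inf
    return int(x)  # truncates toward zero
```

Every finite half value lies well inside the 32-bit range, which is why a pure range check would conclude that no clamp is needed. The infinities are the inputs that still require it.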
Boris Brezillon
7ee55d3a5f pan/kmod: Allow mmap() on foreign buffers
If the BO comes from a different subsystem
(args.extra_flags & DRM_PANTHOR_BO_IS_IMPORTED), we should normally
add extra DMA_BUF_IOCTL_SYNC calls around CPU accesses to ensure the
CPU mapping consistency, but this is something we never worried about
(we've always assumed exporters were exposing uncached mappings with
NOP {begin,end}_cpu_access() implementations), and it worked fine until
now.

The long term plan is to hook up DMA_BUF_IOCTL_SYNC, but this requires
more work, and we need a quick fix that can be backported easily, hence
this revert+FIXME.

Fixes: b5e47ba894 ("pan/kmod: Add new helpers to sync BO CPU mappings")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14963
Closes: https://gitlab.freedesktop.org/panfrost/mesa/-/issues/282
Closes: https://gitlab.freedesktop.org/wayland/weston/-/issues/1101
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Acked-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
(cherry picked from commit 30f1d5bab9)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:09 +01:00
Pierre-Eric Pelloux-Prayer
b299e0323a mesa: don't wraparound st_context::work_counter
st->release_counter is initialized to 0, so if we happen to call
st_add_releasebuf with a non-NULL releasebuf when st->work_counter
is 0 due to wraparound in st_context_add_work, we might end up never
calling st_prune_releasebufs.

Since st_context_add_work and st_add_releasebuf both use work_counter
as a "some work was done" and don't care about the actual value, we
can remove the wraparound which will fix the buffer not being released
issue.

Fixes: b3133e250e ("gallium: add pipe_context::resource_release to eliminate buffer refcounting")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14955
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14499
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 10d32feae8)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:09 +01:00
Christoph Pillmayer
d6ea90b495 pan/bi: Move FAUs to memory for memory phis
We can have PHIs like this: m10 = PHI u2, 3.
For these, insert_coupling_code will spill the sources but that doesn't
work properly for FAU values before this commit because bi_index_as_mem
asserts that index.type == BI_INDEX_NORMAL and we also can't look up an
FAU index in ctx->S_exit or ctx->remat.

Fixes: 6c64ad93 ("panfrost: spill registers in SSA form")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
(cherry picked from commit 8a4d8d490b)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:09 +01:00
Christoph Pillmayer
955a82bb83 pan/bi: Fix coupling spill placement
In the following arrangement the old logic leads to the following:
                       |
                       v
            +----------+------------+
            |block5                 |
            |m815 = PHI m1034, m860 |<-----------+
            |343 = FMA.f32 ...      |            |
            +----------+------------+            |
                       |                         |
        +--------------+                         |
        |              |                         |
        v              v                         |
     +-----+        +-----+                      |
     |b6   |        |b7,8 |                      |
     |     |        |     |                      |
     +-----+        +--+--+                      |
        |    +---+     |    +---+                |
        +----|b9 +-----+----|b10+---+            |
        v    +---+          +---+   v            |
+-------+-------------+     +-------+---------+  |
|block12              |     |block11          |  |
|m882 = PHI m815, m860|     |m860 = MEMMOV 343+--+
+---------+-----------+     +-----------------+
          v

The spill of / into m860 (corresponding to 343) ends up in block11 when
insert_coupling_code(succ=block5, pred=block11) because of the memory
phi in block5. Later, in insert_coupling_code(block12, block9), we
reject inserting the spill after ca9c9957. As a result, m860 is
undefined along block5 -> block7,8 -> block9 -> block12.

When the spill position is chosen first, ctx->block is block5 so
choose_spill_position falsely returns the fallback position. The issue
can be fixed by explicitly passing the "current block".

Fixes: ca9c9957 ("pan: Avoid some redundant SSA spills")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
(cherry picked from commit 09e1ba28e5)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:09 +01:00
Timothy Arceri
734e53c96b glsl: relax precision matching on unused uniforms ES
0886be09 ("glsl: Allow precision mismatch on dead data with GLSL ES 1.00")
allowed precision mismatches on uniforms, however if you lower precision on
16-bit consts, then this error triggers instead.

So here we relax the type matching and just make sure we match int vs
float.

Fixes: 0886be09 ("glsl: Allow precision mismatch on dead data with GLSL ES 1.00")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5337
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 73bc604128)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:09 +01:00
Pavel Ondračka
02f422a145 r300: disable HiZ for PIPE_FUNC_ALWAYS
AMD docs support this:
R5xx Acceleration v1.5 says safest handling for ZFUNC changes is to disable
HiZ except specific LESS/LEQUAL and GREATER/GEQUAL transitions.
ATI OpenGL Programming and Optimization Guide advises avoiding ALWAYS when
trying to benefit from HiZ so that would imply fglrx also disables HiZ
there.

On RV530 this fixes the following dEQPs:
dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.43
dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.74

Fixes: 12dcbd5954 ("r300g: enable Hyper-Z by default on r500")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8093
(cherry picked from commit b0f019f8cf)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:09 +01:00
David Rosca
c001485f3b vl: Also disable MPEG2 Main profile when mpeg12 decode is disabled
Fixes: f4959c16c8 ("meson: add mpeg12dec as a video-codec")
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
(cherry picked from commit 55bab89951)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:09 +01:00
Jose Maria Casanova Crespo
7d25d214f5 vc4: flush write jobs before BO replacement in DISCARD_WHOLE path
The DISCARD_WHOLE_RESOURCE path in vc4_map_usage_prep() replaces the
resource's BO with vc4_resource_bo_alloc(). As the RCL resolves
rsc->bo at job submit in vc4_submit_setup_rcl_surface(), any pending
write job would store to the new BO instead of the old one, corrupting
the newly written data.

This is the same bug that was fixed in v3d in the previous commit.

Fixes: 18ccda7b86 ("vc4: When asked to discard-map a whole resource, discard it.")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
(cherry picked from commit ecb6c5d555)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:09 +01:00
Jose Maria Casanova Crespo
fb8f81a1d8 v3d: flush write jobs before BO replacement in DISCARD_WHOLE path
The DISCARD_WHOLE_RESOURCE path in v3d_map_usage_prep() replaces the
resource's BO with v3d_resource_bo_alloc(). As the RCL resolves
rsc->bo at job submit in emit_rcl() any pending write job would
store to the new BO instead of the old one, corrupting the newly
written data.

This is addressed by flushing all pending write jobs affecting the
resource before replacing its BO.
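
The ordering problem can be sketched with a small Python model. It only assumes what the message states (a queued job looks up the resource's BO when it is submitted, not when it is queued); the classes, method names, and the use of a dict as a stand-in BO are hypothetical.

```python
class Resource:
    def __init__(self):
        self.bo = {"data": None}  # stand-in for a buffer object


class Context:
    def __init__(self):
        self.pending = []  # queued (resource, value) write jobs

    def queue_clear(self, rsc, value):
        self.pending.append((rsc, value))

    def flush_writers(self, rsc):
        # Submit every queued job that writes this resource.
        remaining = []
        for job_rsc, value in self.pending:
            if job_rsc is rsc:
                job_rsc.bo["data"] = value  # BO is looked up at submit time
            else:
                remaining.append((job_rsc, value))
        self.pending = remaining

    def flush_all(self):
        for job_rsc, value in self.pending:
            job_rsc.bo["data"] = value  # BO is looked up at submit time
        self.pending = []

    def discard_map_write(self, rsc, data, flush_first):
        if flush_first:
            self.flush_writers(rsc)
        rsc.bo = {"data": None}  # discard-whole-resource: swap in a fresh BO
        rsc.bo["data"] = data    # CPU upload lands in the new BO


def scenario(flush_first):
    ctx, rsc = Context(), Resource()
    ctx.queue_clear(rsc, "clear")
    ctx.discard_map_write(rsc, "copy", flush_first)
    ctx.flush_all()
    return rsc.bo["data"]
```

Without the flush, the late clear overwrites the uploaded data in the new BO; with the flush, the clear lands in the old BO before it is replaced.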

This fixes multiple tests where data copied to a renderbuffer was
overwritten by a previous GPU clear. The tests are from the subgroup:
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.*

Fixes: 45bb8f2957 ("broadcom: Add V3D 3.3 gallium driver called "vc5", for BCM7268.")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
(cherry picked from commit 1eaa46da09)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:09 +01:00
Jesse Natalie
5bf2bcd81e d3d12: Fix importing external resources
Fixes: 97061dd7 ("d3d12: Add support for Xbox GDK.")
(cherry picked from commit 9e277ed2b6)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:09 +01:00
Samuel Pitoiset
f1f583b3bc radv: fix copying images with different swizzle modes on SDMA7
Swizzle modes must match on SDMA7 (GFX12), and the micro tile mode
doesn't exist.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit cc21e61e43)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:09 +01:00
Rhys Perry
223af79274 aco: perform dce for blocks skipped for process_block()
We might need to DCE users of dead instructions removed by
process_block().

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 9e8ba10447 ("aco/vn: remove dead instructions early")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
(cherry picked from commit 17b18496f6)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:08 +01:00
Erik Faye-Lund
6e5d08c8e5 gallium/dri: set LIBVA_DRIVERS_PATH in devenv
We're setting this in the non-DRI codepath, but this was missed when we
started embedding the VA driver into libgallium. This means we were no
longer able to use VA-API from meson devenv, like we could before.

Fixes: 212d57f7e6 ("targets/va: Build va driver into libgallium when building with dri")
(cherry picked from commit 7e4744909b)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:08 +01:00
Patrick Lerda
6f28830365 r600: fix cs atomic operations when the shader is called multiple times
This change is useful when the compute shader is called multiple
times with the atomic operations enabled. It fixes some data
coherency issues. This is done by moving
evergreen_emit_atomic_buffer_setup() after r600_flush_emit().

This change is also a partial fix for compute_shader.pipeline-compute-chain.
In this specific case, it makes the memory barrier work.

This change was tested on cayman and barts; it makes these tests
fully deterministic:
khr-gl4[2-6]/shader_atomic_counters/advanced-usage-many-dispatches: fail pass
khr-gles31/core/shader_atomic_counters/advanced-usage-many-dispatches: fail pass
deqp-gles31/functional/synchronization/inter_call/without_memory_barrier/atomic_counter_dispatch_.*_calls_.*_invocations: fail pass

Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
(cherry picked from commit dad942b468)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:08 +01:00
Pavel Ondračka
b1775f660a r300: copy target when merging alpha output instruction
The alpha instruction always wrote to the same rendertarget as the rgb and the
original target was ignored (surprisingly the HW docs explicitly allow rgb and
alpha to write to different targets). This makes tesseract rendering a bit
better, but there are still some remaining issues.

Fixes: 1c2c4ddbd1 ("r300g: copy the compiler from r300c")
Reviewed-by: Filip Gawin <filip@gawin.net>
(cherry picked from commit 87a881558f)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:08 +01:00
Pierre-Eric Pelloux-Prayer
f1a3aa4036 frontends/va: fix undefined ref error
When building with "-Dvideo-codecs=h264dec,h265dec,av1dec" va/encode.c
won't be built but it's still required because it's used from
picture.c

Fixes: c4f05bdf60 ("frontends/va: include picture_*.c based on selected codec")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 82a51ba9b3)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:08 +01:00
Mike Blumenkrantz
020b960d03 radv: fix multiview fast clears
this was only clearing layer0 because it was ignoring the viewmask

cc: mesa-stable

(cherry picked from commit b8ee6f3d30)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:08 +01:00
Lionel Landwerlin
195fbfb2f1 anv: dirty all push constant stages in simple shader
Above we're reprogramming push constants as well as a couple of
workarounds that require dirtying all stages.

cmd_buffer->state.gfx.push_constant_stages was already set in the
above function.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4fa1eddb4c ("anv: optimize binding table flushing")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14953
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 38ef732169)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:08 +01:00
Icenowy Zheng
049caf1696 pvr: only specially handle gfx subcmd for BeginQuery
Among all subcommands, only gfx subcommands are bound to a query pool,
other subcommands seem to need no special handling.

In addition, if a ResetQuery is done before BeginQuery, the last
subcommand will be an event one, which fails the current assert that
assumes it's a gfx one.

Change the assertion of the subcommand being a gfx one to an additional
check of whether the subcommand is a gfx one.

This fixes a crash in the Vulkan CTS 1.4.5.1 test
dEQP-VK.query_pool.discard.normal.no_depth.none.discard.

Backport-to: 26.0
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
(cherry picked from commit 5a497316d4)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:08 +01:00
Benjamin Cheng
3611d60cea radeonsi/vcn: Use full pitch for pre-encode input
In 1f83e73145, the pre-encode input picture size was also reduced.
However it was recently discovered that VCN FW uses the input picture
pitch as the pitch for this, which means that previous change broke
pre-encode.

Fixes: 1f83e73145 ("radeonsi/vcn: Reduce allocated size for pre-encode recon pics")
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
(cherry picked from commit 2b2b1d405a)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:08 +01:00
Connor Abbott
ebfdb11193 ir3: Fix constlen trimming when more than one stage is trimmed
The logic is supposed to find the stage with the maximum constlen to
trim for each time we have to trim a stage. But by not resetting
max_constlen each time, we would "trim" the same stage repeatedly,
leaving us thinking the total is below the limit when it actually isn't.
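
A minimal Python sketch of the loop shape described above, showing why the maximum has to be reset on every pass. The function name, the dictionary of per-stage constlens, and the `safe_constlen` value are hypothetical and not the ir3 implementation.

```python
def trim_stages(constlens, limit, safe_constlen, reset_max):
    """Trim the largest stage until the believed total fits under limit.

    Returns (per-stage constlens after trimming, total the loop believes).
    """
    lens = dict(constlens)
    believed_total = sum(lens.values())
    max_len, max_stage = 0, None
    while believed_total > limit:
        if reset_max:
            max_len, max_stage = 0, None  # the fix: start each search fresh
        for stage, n in lens.items():
            if n > max_len:
                max_len, max_stage = n, stage
        if max_stage is None or max_len <= safe_constlen:
            break  # nothing left that can be trimmed
        # Account for trimming the chosen stage down to the safe size.
        believed_total -= max_len - safe_constlen
        lens[max_stage] = safe_constlen
    return lens, believed_total
```

With stages `{"vs": 200, "fs": 180, "gs": 100}`, a limit of 300 and a safe size of 64, the version without the reset picks `vs` twice: it believes the total is 208 while the real total is still 344.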

Cc: mesa-stable
(cherry picked from commit ae8928b638)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:08 +01:00
Connor Abbott
5840bf0b1b tu: Use HW offset 0 in 3d loads/clears with FDM
The HW uses ViewportIndex to select which GRAS_BIN_FOVEAT offset to use.
For normal 3d draws, either the ViewportIndex equals the view/layer or
we make the offset the same for all viewports/layers, but we aren't
aware of this in the 3d path and we always use viewport 0.

Use the HW offset 0 when subtracting the HW offset. This is a bit of a
hack, but it should work. This fixes LOAD_OP_LOAD with FDM.

Fixes: b34b089ca1 ("tu: Use GRAS bin offset registers")
(cherry picked from commit 68c0031f56)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:08 +01:00
Lionel Landwerlin
98ec831d58 anv: add missing handling for attachment locations in secondaries
Fixes:
  dEQP-VK.renderpasses.dynamic_rendering.partial_secondary_cmd_buff.local_read.interaction_with_shader_object
  dEQP-VK.renderpasses.dynamic_rendering.partial_secondary_cmd_buff.local_read.remap_single_attachment_shader_object

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: d2f7b6d5 ("anv: implement VK_KHR_dynamic_rendering_local_read")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 095c470d25)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:08 +01:00
Luigi Santivetti
ec658ea317 zink: fix format conversion logic for the alpha emulation case
cc: mesa-stable

Signed-off-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Fixes: 252bff0f ("zink: use real A8_UNORM when possible")
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
(cherry picked from commit 640bc3bc53)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:08 +01:00
Georg Lehmann
6c7f739b9d aco/insert_fp_mode: don't skip setting round for fract
fract(-FLT_MIN) is < 1.0 with rtz but 1.0 with rtne.
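
The claim can be checked with exact rational arithmetic. The Python sketch below assumes fract(x) = x - floor(x) and models float32 rounding only for results in [0.5, 1], where the spacing between representable values is 2^-24; the helper names are made up for illustration.

```python
from fractions import Fraction
import math

FLT_MIN = Fraction(1, 2**126)  # smallest positive normal float32


def fract_exact(x):
    # Infinitely precise fract: x - floor(x).
    return x - math.floor(x)


def round_to_f32(value, mode):
    # Only valid for 0.5 <= value <= 1, where float32 spacing is 2**-24.
    assert Fraction(1, 2) <= value <= 1
    scaled = value * 2**24
    if mode == "rtz":
        q = math.floor(scaled)  # toward zero (value is positive)
    else:
        q = round(scaled)       # nearest, ties to even
    return Fraction(q, 2**24)


exact = fract_exact(-FLT_MIN)  # 1 - 2**-126, not representable in float32
rtz_result = round_to_f32(exact, "rtz")
rtne_result = round_to_f32(exact, "rtne")
```

Here `rtz_result` is the largest float32 below 1.0 and `rtne_result` is exactly 1.0, so the instruction's result does depend on the rounding mode.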

Fixes: 7212a75c5e ("aco/insert_fp_mode: exclude some instructions that will never round")

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
(cherry picked from commit 8f4de30d05)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:08 +01:00
Mike Blumenkrantz
f1a64582dd st/bitmap: only release YUV samplerviews
this is consistent with other callers of st_get_sampler_views() and
avoids desync in the sv cache

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14934
Fixes: 73da0dcddc ("gallium: eliminate frontend refcounting from samplerviews")

Acked-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 1a5c660ef5)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:08 +01:00
Mike Blumenkrantz
0486f6bf8b zink: add TRANSFER_WRITE -> HOST_READ sync to end of batch
this is technically required by spec, even though at a practical level
it probably has no effect

cc: mesa-stable

(cherry picked from commit 3ba275aa70)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:08 +01:00
Georg Lehmann
5a61e04572 ci: disable debian-ppc64el and debian-s390x
They failed a lot today, no idea why. But having flakes in pre merge CI sucks.

(cherry picked from commit b05271f16c)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:08 +01:00
Eric Engestrom
6788336325 .pick_status.json: Update to 73dba1e151
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-03-11 23:21:07 +01:00
Eric Engestrom
b602b7f01e fixup! docs: add release notes for 26.0.1
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40359>
2026-02-26 19:18:41 +01:00
Eric Engestrom
51fe0abad8 docs: add sha sum for 26.0.1
2026-02-25 17:35:18 +01:00
Eric Engestrom
bf5998be6e VERSION: bump for 26.0.1 2026-02-25 16:54:24 +01:00
Eric Engestrom
ed6f967681 docs: add release notes for 26.0.1 2026-02-25 16:54:23 +01:00
Leon Perianu
f1a2f841f2 pvr: fix format table properties duplicate
- RGBA8888_* is a preprocessor alias for R8G8B8A8_* in u_format.yaml.
- Both entries in the format tables collide on the same enum value, and
   RGBA8888 overwrites R8G8B8A8.
- The fix reverts to the version from commit 39e949434c, which used a
   different format that did not cause any collisions.
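
The collision in the list above can be illustrated with a small Python model of a table indexed by enum value. The numeric value 7, the table size, and the property strings are invented for the example.

```python
# Hypothetical enum values: the alias shares the canonical name's value.
R8G8B8A8_UNORM = 7
RGBA8888_UNORM = R8G8B8A8_UNORM  # alias, same value


def build_table(entries, size=16):
    table = [None] * size
    for fmt, props in entries:
        table[fmt] = props  # a later entry silently replaces an earlier one
    return table


table = build_table([
    (R8G8B8A8_UNORM, "canonical properties"),
    (RGBA8888_UNORM, "alias properties"),
])
```

After the build, `table[R8G8B8A8_UNORM]` holds the alias entry, which mirrors how a second initializer for the same index wins.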

dEQP fixes:
   dEQP-VK.api.info.format_properties.r8g8b8a8_sint
   dEQP-VK.api.info.format_properties.r8g8b8a8_snorm
   dEQP-VK.api.info.format_properties.r8g8b8a8_uint
   dEQP-VK.api.info.format_properties.r8g8b8a8_unorm

Fixes: 9f740b26a6 ("pvr: Fix bugs in the format table")
Signed-off-by: Leon Perianu <leon.perianu@imgtec.com>
Tested-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
(cherry picked from commit 7c6dbb099a)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:24 +01:00
Mary Guillemard
977f0409b2 hk: Fix crash in hk_handle_passthrough_gs
We should be returning if no GS is needed and no GS shader is bound.
This fixes various segfaults introduced by the original fix.

Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: e10f29399f ("hk: fix passthrough GS key invalidation")
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Janne Grunau <j@jannau.net>
(cherry picked from commit 6d040df750)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:24 +01:00
Lionel Landwerlin
03847a6f0b anv: remove snprintf for aux op transition
With perfetto, that string is processed later, leading to a
use-after-free.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit 413e169f45)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:23 +01:00
Lionel Landwerlin
77f3279c37 anv: dirty descriptors after blorp operations
Blorp emits 3DSTATE_BINDING_TABLE_POINTER_* instructions in 3D mode.

At the moment we're saved by the push constants reemitting the btp but
we'll drop that in the next commit.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit 533c748b34)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:23 +01:00
Samuel Pitoiset
87a238d829 radv: fix potential GPU hangs with secondaries on transfer queue
Cache flushes should be skipped on SDMA. In practice,
radv_emit_cache_flush() should only be called on GFX/ACE.

SDMA NOP packets are emitted in barriers directly.

This fixes recent VKCTS coverage
dEQP-VK.api.command_buffers.secondary_on_transfer_queue.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit c4d5090d69)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:23 +01:00
Samuel Pitoiset
6d7f2e3fbd ac/nir: fix writemask for dual source blending on GFX11+
This should definitely be an OR operation if MRT0 and MRT1 don't write
the same channels. This also requires setting the writemask manually
because when it's 0 (in case a dual-source output is missing), the
intrinsic computes the mask itself with the number of components.
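
A small Python illustration of the two points above, using 4-bit channel masks. Both helper names are hypothetical, and the default-mask rule is modeled only as the message describes it (a mask of 0 is replaced by one bit per component).

```python
def dual_source_writemask(mrt0_mask, mrt1_mask):
    # Union of the channels written by either output.
    return mrt0_mask | mrt1_mask


def effective_writemask(write_mask, num_components):
    # Assumed default: a mask of 0 becomes one bit per component.
    if write_mask:
        return write_mask
    return (1 << num_components) - 1
```

For example, outputs writing `0b0011` and `0b1100` combine to `0b1111`, and a mask left at 0 for a 4-component value would silently become `0b1111`, which is why it has to be set explicitly.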

No fossils-db changes on NAVI33.

Fixes: 45d8cd037a ("ac/nir: rewrite ac_nir_lower_ps epilog to fix dual src blending with mono PS")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14878
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 2eb9420061)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:23 +01:00
Nick Hamilton
6edfd32388 pvr: Add support for fragment pass through shader
On the Rogue architecture add support for using a fragment passthrough
shader when there is no fragment shader present in a graphics
pipeline but the sample mask is required.

fix:
dEQP-VK.pipeline.monolithic.empty_fs.masked_samples

Backport-to: 26.0

Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Co-authored-by: Simon Perretta <simon.perretta@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
(cherry picked from commit 14508b4c9a)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:23 +01:00
Nick Hamilton
23cd27b129 pvr: Update CI fails list after render pass fixes
Backport-to: 26.0

Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
(cherry picked from commit b87d995d32)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:23 +01:00
Jarred Davies
5000c31573 pvr: Add missing support for tile buffers to SPM EOT programs
Configure the EOT setup for SPM EOT programs so that the generated
programs load the tile buffer into the output buffer before doing
the emit

Partial fix for:
dEQP-VK.renderpass.*.attachment_allocation.input_output.71

Backport-to: 26.0

Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
(cherry picked from commit d1f2ad17dd)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:23 +01:00
Nick Hamilton
022e34b5f3 pvr: Add missing support for preserve attachments
In subpasses preserve attachments are not used by the subpass but
their contents must be preserved throughout the subpass.

Add a list for the preserve attachments info specified by a subpass
and, when determining a subpass attachment's total uses, check the
preserve attachments list and add its uses to the total.

Partial fix for:
dEQP-VK.renderpass.*.attachment_allocation.input_output.71

Backport-to: 26.0

Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
(cherry picked from commit 0e01b9ef2d)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:23 +01:00
Nick Hamilton
a965c71ec6 pvr: Rename pvr_render_input_attachment
The struct will also be used for preserve attachments in the next
commit.

Backport-to: 26.0

Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
(cherry picked from commit e18670347a)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:23 +01:00
Jarred Davies
fb1ba13c57 pvr: Fix allocating the required scratch buffer space for tile buffers
When calculating the dwords per pixel the output registers should
always be taken into account in addition to the number of tile buffers.

Fixes incorrect scratch buffer space calculation when both output
registers and tile buffers are emitted by a render.

Partial fix for:
dEQP-VK.renderpass.*.attachment_allocation.input_output.71

Fixes: 3457f8083a ("pvr: Acquire scratch buffer on framebuffer creation.")
Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
(cherry picked from commit df445dc9b9)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:23 +01:00
Nick Hamilton
9ad2e48819 pvr: Fix incorrect subpass merging optimisation
The subpass merging optimisation check for when subpasses are using
tile buffers was in the incorrect location.

The current check is in a function called from two places but only
the first of these should have been doing the optimisation check.

This was incorrectly affecting the number of renders that subpass
merging could avoid.

Partial fix for:
dEQP-VK.renderpass.*.attachment_allocation.input_output.71

Fixes: 10b6a0d567 ("pvr: Add support for generating render pass hw setup data.")
Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
(cherry picked from commit 0640ac7e3b)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:23 +01:00
Danylo Piliaiev
ac49313d06 ir3: Align TCS per-patch output to 64 bytes to prevent stale reads
Empirically, TCS outputs have to be aligned to 64 bytes,
otherwise stale data may be read in rare cases. The exact
reason is not clear, but tests and proprietary driver behavior
strongly point at the need for 64 byte alignment.

Fixes tessellation issues in at least "Conan Exiles" but likely in many
more cases.

CC: mesa-stable

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
(cherry picked from commit 47251b2e2d)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:23 +01:00
Rhys Perry
ba82a16761 aco: resolve hazards before calls
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 26.0
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
(cherry picked from commit 613b4fe407)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:23 +01:00
Rhys Perry
697fbaddb5 aco: reset all vgpr_used_by_vmem_ in resolve_all_gfx11
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 26.0
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
(cherry picked from commit dfda890ae8)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:23 +01:00
Benjamin Otte
d7607b6a4e lavapipe: Fix features for nonsubsampled ycbcr formats
The Vulkan spec says about VkFormatFeatureFlagBits:

  If a format does not incorporate chroma downsampling (it is
  not a “422” or “420” format) but the implementation supports
  sampler Y′CBCR conversion for this format, the implementation
  must set VK_FORMAT_FEATURE_MIDPOINT_CHROMA_SAMPLES_BIT.
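
The quoted rule can be written as a small decision function. This Python sketch uses an illustrative flag enum whose values are not the Vulkan bit values, and the parameter names are assumptions rather than lavapipe code.

```python
from enum import Flag, auto


class Feature(Flag):
    NONE = 0
    MIDPOINT_CHROMA_SAMPLES = auto()  # illustrative value, not Vulkan's


def chroma_features(subsampled, supports_ycbcr_conversion,
                    supports_midpoint_siting):
    features = Feature.NONE
    if not supports_ycbcr_conversion:
        return features
    # Non-subsampled formats must always report the midpoint bit;
    # subsampled ("422"/"420") formats report it only if supported.
    if not subsampled or supports_midpoint_siting:
        features |= Feature.MIDPOINT_CHROMA_SAMPLES
    return features
```

The case the commit addresses is `subsampled=False` with conversion supported, where the bit is mandatory regardless of the siting argument.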

Fixes: af062126ae
Signed-off-by: Benjamin Otte <otte@redhat.com>
(cherry picked from commit 0b6dd167ac)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:23 +01:00
Robert Mader
a163dec3ff lavapipe: enable dmabuf import for planar drm formats
Like e.g. NV12. This just requires some minor fixes around offset
handling.

(cherry picked from commit 0b6340fd94)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:23 +01:00
Mike Blumenkrantz
499a74569f zink: only do pre-sync transfer barrier after a renderpass
this is otherwise pointless and (for swapchain images) broken
(because they may never have acquired an image)

discovered by @valentine

cc: mesa-stable

(cherry picked from commit d47ba92d42)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:23 +01:00
Samuel Pitoiset
545509553a radv/meta: fix depth/stencil resolves with different regions
This is possible since VK_KHR_maintenance10.

This fixes new VKCTS coverage in
dEQP-VK.pipeline.*.multisample.m10_resolve.*.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit ab6147e8ef)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:23 +01:00
Tapani Pälli
befb9af14b util: bring back fix to avoid strict aliasing bugs in xxhash
This is commit b9e163fa67 that got lost in xxhash upgrade 070bf8986c.

Fixes graphics artifacts seen in multiple workloads with Intel driver
when using clang compiler.

Fixes also CTS tests:

 dEQP-GLES31.functional.geometry_shading.layered.fragment_layer_cubemap
 dEQP-GLES31.functional.geometry_shading.layered.fragment_layer_3d
 dEQP-GLES31.functional.geometry_shading.layered.fragment_layer_2d_array
 dEQP-GLES31.functional.geometry_shading.layered.fragment_layer_2d_multisample_array

v2: pass arguments from meson.build instead of hardcoding
    (Eric Engestrom)

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14684
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14107
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13895
Fixes: 070bf8986c ("util: Upgrade xxhash.h to v0.8.3")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit d2351b3d04)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:23 +01:00
Faith Ekstrand
a457021d67 panvk: Also load output attachments with LOAD_OP_NONE+STORE_OP_NONE
We already had this for LOAD_OP_DONT_CARE but we also need it for
LOAD_OP_NONE.

Cc: mesa-stable
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
(cherry picked from commit 44ff0c4707)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:23 +01:00
Faith Ekstrand
262e7feab9 panvk/jm: Refactor BeginRendering()
The old code was all out of order and made no sense.  There's a reason
it made no sense. It was wrong.  Cleaning this up fixes a solid 1/3 of
the remaining Bifrost CTS fails in CI.

Cc: mesa-stable
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
(cherry picked from commit 962d1f33e1)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:23 +01:00
Faith Ekstrand
e29de2865e panvk/preload: Stop assuming 32 registers
cc: mesa-stable

Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
(cherry picked from commit 3bb7d929f4)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:23 +01:00
Faith Ekstrand
37191db342 panvk: Create both Z/S descriptors, even for separate Z/S
The Vulkan spec says that aspects are ignored for Z/S attachments so we
shouldn't consider that as a factor when deciding whether or not to
create other aspect descriptors.  This will be irrelevant in a couple of
commits but we need it for the backport anyway.

Cc: mesa-stable
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
(cherry picked from commit 19ad26a8de)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:23 +01:00
Faith Ekstrand
3a92074d8c nir/gather_info: Add support for panfrost tile load/store intrinsics
Fixes: 6fc1030e4f ("nir: Add some new panfrost fragment shader intrinsics")
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
(cherry picked from commit 88ad8bc75d)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:23 +01:00
Faith Ekstrand
897f5814ed pan/clear: Stop packing undefined bits in colors
The util code doesn't actually fill things with zeros so the high bits
are undefined.  If we really want things replicated, we need to mask off
just the bits we care about.
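
A short Python sketch of why the mask matters when replicating a packed value across a 32-bit word. The function names and the 16-bit example are assumptions for illustration only.

```python
def replicate_unmasked(packed, bits):
    # Naive version: garbage above `bits` leaks into every copy.
    out = 0
    for shift in range(0, 32, bits):
        out |= (packed << shift) & 0xFFFFFFFF
    return out


def replicate_masked(packed, bits):
    # Keep only the defined low bits before replicating.
    value = packed & ((1 << bits) - 1)
    out = 0
    for shift in range(0, 32, bits):
        out |= value << shift
    return out
```

With a 16-bit color `0x1234` and undefined high bits `0xDEAD`, the masked version yields `0x12341234`, while the unmasked one yields `0xDEBD1234`.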

Cc: mesa-stable
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
(cherry picked from commit 4d8551552e)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:22 +01:00
Emma Anholt
61f09295f3 ir3/ra: Fix DOUBLE_ONLY limit pressure computation.
As the comment says, we want to limit our pressure based on underlying HW
reg file size, not max it out to HW reg file size.  This caused us not to
spill when we should have whenever the HW reg size was bigger than the ISA
reg file size, leading to OOB writes in RA when it tried to allocate to the
limit pressure we spilled to.

Fixes segfaults in llama.cpp's test-backend-ops.

Fixes: e6e34883a9 ("ir3: Add wavesize control")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14846
(cherry picked from commit 0c6da326f8)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:22 +01:00
José Roberto de Souza
b7752ddbc3 intel/perf: Add HSW verx10 to intel_perf_query_result_write_mdapi()
HSW is verx10 75, and when we switched from ver to verx10 I forgot to add
case 75.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: a097a3d214 ("intel/perf: Change mdapi switch cases from ver to verx")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14902
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
(cherry picked from commit 48c685ee39)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:22 +01:00
Natalie Vock
71145cb846 radv/nir: Correctly handle workgroup sizes not aligned to 32
Since the stride is always 32 dwords, we need to treat the workgroup
size as multiples of that value. Using MAX2() only works for cases where
the workgroup size is less than 32, which was hit by some CTS with 1x1
workgroups.
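
A minimal sketch of the difference, assuming the quantity being padded is a
plain element count. The function names are hypothetical and the exact
value rounded in the driver is not shown in this message.

```c
#include <assert.h>
#include <stdint.h>

#define STRIDE_DWORDS 32u

/* Old behaviour: only correct when the size is at most the stride. */
static uint32_t
padded_size_max(uint32_t wg_size)
{
   assert(wg_size > 0);
   return wg_size > STRIDE_DWORDS ? wg_size : STRIDE_DWORDS;
}

/* Fixed behaviour: round up to the next multiple of the stride. */
static uint32_t
padded_size_align(uint32_t wg_size)
{
   assert(wg_size > 0);
   return (wg_size + STRIDE_DWORDS - 1u) & ~(STRIDE_DWORDS - 1u);
}
```

Both agree for a 1x1 workgroup (32), but for a size of 48 the MAX2-style
version yields 48 while the aligned version yields 64.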

Cc: mesa-stable
(cherry picked from commit b08f9f192c)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:22 +01:00
Samuel Pitoiset
54293d4fdd radv: fix potential corruption after FMASK decompression on GFX6-8
While reworking image resolves completely in RADV, I found a very weird
bug where the only fix was to emit caches immediately after
decompressing the source resolve image (after FMASK_DECOMPRESS).

I struggled with this for a few hours and figured out that it was
related to context rolls (i.e. as long as the context was rolled,
emitting the flushes immediately was required).

It turns out this is a known hardware bug on GFX6 with a workaround
implemented in PAL. PAL only applies it on GFX6, but GFX7-8 are also
affected based on my testing. Note that RadeonSI flushes CB_META too.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 837078b8d5)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:22 +01:00
Lionel Landwerlin
6f75431e98 anv: disable ccs modifier reporting when ccs modifiers are disabled
Reporting the modifiers when we're going to disable them in the back
hits various asserts in anv_image.c

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 2418c91537 ("anv/drirc: disable Xe2 CCS drm modifiers for GTK engine")
Helps: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14853
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 4f38b5c888)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:22 +01:00
Lionel Landwerlin
5fa6c15b36 anv: apply the same ccs disabling for Xe3 than Xe2
The new compression scheme introduced in Xe2 also applies to Xe3, so
we're liable for the same bugs.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 2418c91537 ("anv/drirc: disable Xe2 CCS drm modifiers for GTK engine")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 4ac47f8dde)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:22 +01:00
Rhys Perry
849cdbcf72 aco: fix gfx6-8 store_scratch() with function calls
Might happen with radv_emulate_rt=true.

Fixes the_great_circle/a6079328b8df7712 with polaris10.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: e006f68b11 ("aco/isel: Don't add scratch offset as gfx8- soffset if no offsets exist")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
(cherry picked from commit 75722da909)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:22 +01:00
Ian Romanick
bfeb230f9b elk/cmod: Don't propagate from CMP to ADD if there is a write between
If either source of the CMP is modified before an appropriate ADD is
found, the ADD and the CMP will not have the same result.

No shader-db changes on any ELK platform. I suspect the problematic
cases only occur after scheduling has rearranged instructions. This is
likely the reason BRW didn't experience this problem until 09450faf.

Fixes: 020b0055e7 ("i965/fs: Propagate conditional modifiers from compares to adds")
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit da1fd9786b)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:22 +01:00
Ian Romanick
024c5de569 elk/cmod: Don't propagate from CMP to possible Inf + (-Inf)
This is a backport of BRW e26270249b.

shader-db:

All Intel platforms had similar results. (Broadwell shown)
total instructions in shared programs: 18623918 -> 18624594 (<.01%)
instructions in affected programs: 125179 -> 125855 (0.54%)
helped: 0 / HURT: 139

total cycles in shared programs: 957073100 -> 957072484 (<.01%)
cycles in affected programs: 16534168 -> 16533552 (<.01%)
helped: 42 / HURT: 68

Fixes: 020b0055e7 ("i965/fs: Propagate conditional modifiers from compares to adds")
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit bdbfe8de4d)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:22 +01:00
Ian Romanick
d68b3091b2 brw/cmod: Don't propagate from CMP to ADD if there is a write between
If either source of the CMP is modified before an appropriate ADD is
found, the ADD and the CMP will not have the same result.

shader-db:

Lunar Lake
total instructions in shared programs: 17098815 -> 17098818 (<.01%)
instructions in affected programs: 1187 -> 1190 (0.25%)
helped: 0 / HURT: 3

total cycles in shared programs: 876858960 -> 876858968 (<.01%)
cycles in affected programs: 6878 -> 6886 (0.12%)
helped: 0 / HURT: 1

Meteor Lake, DG2, Tiger Lake, Ice Lake, and Skylake had similar results. (Meteor Lake shown)
total instructions in shared programs: 20034973 -> 20034984 (<.01%)
instructions in affected programs: 4599 -> 4610 (0.24%)
helped: 0 / HURT: 11

total cycles in shared programs: 881033088 -> 881033108 (<.01%)
cycles in affected programs: 57872 -> 57892 (0.03%)
helped: 0 / HURT: 5

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 918873064 -> 918873269 (+0.00%)
CodeSize: 14747338416 -> 14747339360 (+0.00%); split: -0.00%, +0.00%
Cycle count: 104141836677 -> 104141840371 (+0.00%); split: -0.00%, +0.00%

Totals from 205 (0.01% of 2011421) affected shaders:
Instrs: 290415 -> 290620 (+0.07%)
CodeSize: 4280704 -> 4281648 (+0.02%); split: -0.01%, +0.03%
Cycle count: 18166526 -> 18170220 (+0.02%); split: -0.00%, +0.02%

Closes: #14874
Fixes: 020b0055e7 ("i965/fs: Propagate conditional modifiers from compares to adds")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit d1614cd6db)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:22 +01:00
Frank Binns
e1ae66262f pvr: Fix alloc callbacks usage when freeing frame buffers
When creating frame buffers the alloc callbacks are used in the host
allocations, those same alloc callbacks need to be used when freeing
those allocations but are missing in some places causing the CTS to
report memory leaks in certain test cases.

Fixes: 146364ab9f ("pvr: add support for VK_KHR_dynamic_rendering")

fix:
dEQP-VK.api.object_management.alloc_callback_fail.framebuffer
dEQP-VK.api.object_management.single_alloc_callbacks.framebuffer

Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
(cherry picked from commit 05ef9f01a7)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:22 +01:00
Frank Binns
dea37352ba pvr/ci: move some timing out tests from fails to skips
Some of these test cases were already in the skip list.

Signed-off-by: Frank Binns <frank.binns@imgtec.com>
(cherry picked from commit 74fd985c6c)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:22 +01:00
Yiwei Zhang
22c27bd3ea venus: sync protocol for strict aliasing compliance
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124148 for details.

Backport log: headers are generated from the protocol used by 26.0
              branch with the strict aliasing fix

(cherry picked from commit 6411ee0c2d)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:22 +01:00
Aitor Camacho
40cf87c35a kk: Fix graphics pipeline serialization
Bundles all graphics pipeline creation information required by Metal into
the vertex shader so we can later rebuild the pipeline. This allows us to
correctly create pipelines from caches that were loaded from files.

Signed-off-by: Aitor Camacho <aitor@lunarg.com>
(cherry picked from commit cdbf7242f3)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:22 +01:00
Aitor Camacho
358c8f257a kk: Move gfx pipeline data to the info struct within kk_shader
Makes it easier to serialize and add data specific to the gfx pipeline.

Signed-off-by: Aitor Camacho <aitor@lunarg.com>
(cherry picked from commit 99d8246d1c)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:22 +01:00
Aitor Camacho
6152bf1cfb kk: Fix compute pipeline cache
When deserializing the compute shader from a blob, we need to recreate the
pipeline because the blob may have been loaded from file and therefore the
reference to the Metal resource will be invalid.

Signed-off-by: Aitor Camacho <aitor@lunarg.com>
(cherry picked from commit 75f6f46c0f)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:21 +01:00
Aitor Camacho
024143cca4 kk: Correctly release pipeline handles at shader destroy
The condition to release Metal pipelines incorrectly checks which shader
stage we are destroying, leading to leaks when graphics pipelines had to
be released.

Signed-off-by: Aitor Camacho <aitor@lunarg.com>
(cherry picked from commit 622ebba476)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:21 +01:00
Aitor Camacho
9a63c20469 kk: Fix shader uint32_t value serialization
We need to write with blob_write_uint32 if we are using blob_read_uint32
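
One way such a mismatch can bite, sketched with a toy buffer. Everything
here (`toy_blob`, `toy_write_uint32`, and so on) is hypothetical, and the
sketch assumes the uint32 helpers align the cursor to 4 bytes while the raw
byte writer does not. The actual failure mode in kk is not described in
this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct toy_blob { uint8_t data[64]; size_t pos; };

static void toy_align(struct toy_blob *b, size_t a)
{
   b->pos = (b->pos + a - 1) & ~(a - 1);
}

static void toy_write_bytes(struct toy_blob *b, const void *src, size_t n)
{
   assert(b->pos + n <= sizeof(b->data));
   memcpy(b->data + b->pos, src, n);
   b->pos += n;
}

/* Aligned writer: the counterpart of toy_read_uint32(). */
static void toy_write_uint32(struct toy_blob *b, uint32_t v)
{
   toy_align(b, sizeof(v));
   toy_write_bytes(b, &v, sizeof(v));
}

static uint32_t toy_read_uint32(struct toy_blob *b)
{
   uint32_t v;
   toy_align(b, sizeof(v));
   memcpy(&v, b->data + b->pos, sizeof(v));
   b->pos += sizeof(v);
   return v;
}

/* Write a 1-byte tag, then a uint32, then read the uint32 back. */
static uint32_t roundtrip(uint32_t v, int use_matching_writer)
{
   struct toy_blob b;
   memset(&b, 0, sizeof(b));
   uint8_t tag = 1;
   toy_write_bytes(&b, &tag, 1);            /* cursor is now unaligned */
   if (use_matching_writer)
      toy_write_uint32(&b, v);              /* lands at offset 4 */
   else
      toy_write_bytes(&b, &v, sizeof(v));   /* lands at offset 1 */
   b.pos = 1;                               /* rewind to just after the tag */
   return toy_read_uint32(&b);              /* always reads at offset 4 */
}
```

With the matching writer the value survives the round trip. With the raw
byte writer the reader looks at the wrong offset.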

Signed-off-by: Aitor Camacho <aitor@lunarg.com>
(cherry picked from commit 15c0dd39fc)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:21 +01:00
Aitor Camacho
a3f872630b kk: Fill pipelineUUID
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
(cherry picked from commit b350f059f5)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:21 +01:00
Natalie Vock
6f88b07e5d radv: Initialize nir_lower_io_to_scalar progress variable
The NIR_PASS macro only overwrites this when the pass actually makes
progress. If the pass doesn't make progress, the variable stays
uninitialized.

Clang correctly spots this and warns about it.
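
A minimal sketch of the pattern. `TOY_PASS` is a hypothetical stand-in that
mimics the behaviour described above (it only writes the variable when the
pass reports progress); it is not the real NIR_PASS definition.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in: leaves `progress` untouched when the pass does nothing. */
#define TOY_PASS(progress, pass, arg) \
   do {                               \
      if (pass(arg))                  \
         (progress) = true;           \
   } while (0)

/* Hypothetical pass that "makes progress" only on odd inputs. */
static bool
toy_pass_if_odd(int x)
{
   return (x & 1) != 0;
}

static bool
run_pass(int input)
{
   bool progress = false; /* without this, the no-progress path reads garbage */
   TOY_PASS(progress, toy_pass_if_odd, input);
   return progress;
}
```

Dropping the `= false` initializer makes `run_pass(2)` return an
indeterminate value, which is what the compiler warning points at.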

Cc: mesa-stable
(cherry picked from commit 47e4a68a83)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:21 +01:00
Mike Blumenkrantz
641a3ea0d9 zink: fix broken compiler assert
cc: mesa-stable

(cherry picked from commit 44f2c40830)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:21 +01:00
Natalie Vock
c4bb652871 radv/rt: Only use ds_bvh_stack_rtn if the stack base is possible to encode
The hardware only provides 13 bits for encoding the stack base (in
dwords). That translates to the stack base being required to be below
8192 dwords, or 32kB. It's possible to exceed this - LDS is 64kB after
all. Add an explicit check to make sure we don't end up with offsets
that overflow the hw's address fields. This fixes Metro Exodus Enhanced
Edition, which was using ray queries in a 1024-thread sized workgroup,
resulting in exactly 64kB of LDS being required for the stack.

This check isn't required for RT pipelines as we always use 32 or 64
wide workgroups with no other LDS used, so it's impossible to reach this
stack base limit.
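
The arithmetic above can be sketched as a small predicate. The function
name is hypothetical, and whether the driver compares in bytes or dwords is
an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STACK_BASE_BITS 13u

/* True if a byte offset into LDS fits the 13-bit dword stack base field. */
static bool
stack_base_encodable(uint32_t stack_base_bytes)
{
   assert((stack_base_bytes % 4u) == 0); /* base is expressed in dwords */
   uint32_t stack_base_dwords = stack_base_bytes / 4u;
   return stack_base_dwords < (1u << STACK_BASE_BITS); /* < 8192 dwords */
}
```

A base of 32764 bytes (8191 dwords) still fits, while 32768 bytes (8192
dwords, i.e. 32kB) does not.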

Cc: mesa-stable
(cherry picked from commit 59a397793e)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:21 +01:00
Olivia Lee
47caf527e3 hk: fix passthrough GS key invalidation
Just seeing that a passthrough GS was already bound is not sufficient to
know that it is a *matching* passthrough GS. If the application binds a
new VS that requires a different passthrough GS key than the previous
VS, then we need to bind a different passthrough GS.

Fixes: 5bc8284816 ("hk: add Vulkan driver for Apple GPUs")
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
(cherry picked from commit e10f29399f)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:21 +01:00
Janne Grunau
3397d3995f hk: Use aligned vector fill in hk_CmdFillBuffer if possible
30% faster with 16KB buffers, more than twice as fast with 8MB and
larger buffers.

(cherry picked from commit 651a321ee2)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:21 +01:00
Janne Grunau
1ce5b5b361 asahi: Implement clear_buffer using libagx_fill*
Use either libagx_fill_uint4 or libagx_fill based on size and object
alignment for clear_sizes which are a power of two up to 16.
Reported fill rate for 256MB buffers on a M1 Ultra (G13D) in
gpu-ratemeter is 355 GB/s for 16 byte aligned buffers and 155 GB/s for
4 byte aligned buffers.

Signed-off-by: Janne Grunau <janne-fdr@jannau.net>
(cherry picked from commit 5c2d62c030)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:21 +01:00
Janne Grunau
37a269e303 asahi: Use GPU for buffer copies in resource_copy_region()
Use a compute shader to copy PIPE_BUFFERs. Based on hk's hk_cmd_copy().
For large copy sizes (>= 128MB) it achieves 3/4 of the available memory
bandwidth on a M1 Ultra (G13D). `gpu-ratemeter gl.bufbw` reports
~625 GB/s for 256MB buffer size. Apple specifies the memory bandwidth of
the M1 Ultra as 819.2 GB/s.

Signed-off-by: Janne Grunau <j@jannau.net>
(cherry picked from commit 3f5497ded8)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:21 +01:00
Pavel Ondračka
0f21dc1bd4 mesa: implement FRAMEBUFFER_RENDERABLE internalformat query
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Erik Faye-Lund <erik-faye-lund@collabora.com>
Cc: mesa-stable
(cherry picked from commit 2b76f2e4a7)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:21 +01:00
Jianxun Zhang
372c7545e6 anv: Limit modifier disabling workaround to specific GTK versions
The issue that caused us to add a switch to disable (Xe2) drm modifiers
in 2418c91537 is fixed in GTK 4.20.3,
so we can enable the modifiers with this and newer GTK releases.

GTK https://gitlab.gnome.org/GNOME/gtk/-/merge_requests/9164:
b2a42d5a6e Revert "vulkan: Wait for device to be idle before
           create/recreating swapchain"
270735a151 vulkan: Rework swapchain present implementation

The hex values represent the GTK version range: [4.0.0, 4.20.2] for
VK_MAKE_VERSION(), refer to:
f493f5c88d
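
For reference, a sketch of how the range endpoints map to hex values.
`make_version` mirrors the bit layout of Vulkan's VK_MAKE_VERSION (major in
bits 22 and up, minor in bits 12-21, patch in bits 0-11); the range check
function is hypothetical, and how the driver obtains the version to compare
is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same packing as VK_MAKE_VERSION(major, minor, patch). */
static uint32_t
make_version(uint32_t major, uint32_t minor, uint32_t patch)
{
   assert(minor < 1024u && patch < 4096u);
   return (major << 22) | (minor << 12) | patch;
}

/* Hypothetical check for the affected range [4.0.0, 4.20.2]. */
static bool
gtk_version_needs_workaround(uint32_t version)
{
   return version >= make_version(4, 0, 0) &&
          version <= make_version(4, 20, 2);
}
```

With this packing, 4.0.0 is 0x01000000 and 4.20.2 is 0x01014002, so 4.20.3
(0x01014003) falls just outside the range.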

Cc: mesa-stable
Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit df7d333656)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:21 +01:00
Wei Hao
f60b93b454 radeonsi: fix threaded shader compilation finishing after context is destroyed
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit ec6d077351)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:21 +01:00
Ryan Zhang
96ee7156af panvk: guard against NULL pointers to avoid crash
The VK CTS simulate_oom caselist deliberately makes allocations fail,
which caused panvk to crash. Guard the driver against dereferencing
NULL pointers.

Fixes: 598a8d9d11 ("panvk: Collect allocated push sets at the command level")

Fixed:
dEQP-VK.wsi.wayland.swapchain.simulate_oom.*

Signed-off-by: Ryan Zhang <ryan.zhang@nxp.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
(cherry picked from commit 418e6c4ed9)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:21 +01:00
Lars-Ivar Hesselberg Simonsen
11db64a7d3 pan/genxml/v13: Fix HSR Prepass typo
Fixes: ece01443e1 ("pan/genxml: Add v13 definition")
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
(cherry picked from commit 71500a32fa)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:21 +01:00
Lars-Ivar Hesselberg Simonsen
43b9a2ea5e panvk: Fix dcd_flags1 dirty bit
dcd_flags1 was not counted as dirty in case the color attachment map was
updated. This could lead to an outdated value for render_target_mask.

Fixes: a4670a67e0 ("panvk/csf: Set the correct DCD_FLAGS_1.render_rarget_mask")
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
(cherry picked from commit 75242b1862)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:21 +01:00
Pavel Ondračka
98e2234eb4 r300: align macro-tiled stride-addressed textures in X
Odd macro-tile counts in X trigger flaky rendering/readback in
parallel stress runs with macro-tiled NPOT textures (for example
piglit draw-pixel-with-texture -auto -fbo).

When a texture is macro-tiled and uses stride addressing, align the
width to two macro tiles. This keeps the stride at an even number of
macro tiles in X and avoids the corruption without disabling
macrotiling.
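
A minimal sketch of the rounding described above. The function name is
hypothetical, and the macro tile width is passed in as a parameter because
the real value depends on the texture format and is not stated here.

```c
#include <assert.h>
#include <stdint.h>

/* Round a width up so that it spans an even number of macro tiles in X. */
static uint32_t
align_width_to_two_macrotiles(uint32_t width, uint32_t macrotile_width)
{
   assert(macrotile_width > 0);
   uint32_t tiles = (width + macrotile_width - 1u) / macrotile_width;
   tiles += tiles & 1u; /* odd tile counts are bumped to the next even one */
   return tiles * macrotile_width;
}
```

Assuming a 64-pixel-wide macro tile purely for illustration, a width of 129
(three tiles) becomes 256 (four tiles), while 128 (two tiles) is unchanged.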

I was not able to find anything about this in the docs.

Cc: mesa-stable
(cherry picked from commit 0763fb947a)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:21 +01:00
Yiwei Zhang
7c0b97be73 venus: workaround a gcc-15 dead store elimination (DSE) bug
No issue with clang or gcc-14.x (or earlier versions). The issue only
shows up since gcc-15.1. The compiler somehow fails to consider those
cs helpers dereferencing the pointer from the pNext chain for reads,
and thus has falsely optimized away the pNext store. This change works
around this with a no-op memory clobber.
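
A sketch of what a no-op memory clobber looks like with GCC/Clang inline
asm. The struct and function names are hypothetical and only mimic a
pNext-style chain; in this toy case the store is kept anyway, and the
barrier merely marks where such a workaround would sit.

```c
#include <assert.h>
#include <stddef.h>

struct toy_node {
   const struct toy_node *next;
   int value;
};

/* Reads through the chain, like a helper walking pNext. */
static int
read_through_chain(const struct toy_node *head)
{
   return head->next ? head->next->value : -1;
}

static int
build_and_read(int v)
{
   struct toy_node tail = { NULL, v };
   struct toy_node head = { NULL, 0 };

   head.next = &tail; /* the kind of store that must not be dropped */

   /* Emits no instructions; tells the compiler memory may be accessed. */
   __asm__ volatile("" : : : "memory");

   return read_through_chain(&head);
}
```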

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13242
Cc: mesa-stable
(cherry picked from commit b0397b967d)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:21 +01:00
Timothy Arceri
6fb7a07c79 st/glsl_to_nir: make sure the variant has the correct locations set
For drivers that set allow_st_finalize_nir_twice, locations are set
when the variable is created. But for variants, we update the
locations here in case the parameter opt pass or something else
changed them.

Fixes: 891d46f517 ("st/glsl_to_nir: dont add duplicate state tokens")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14837

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
(cherry picked from commit a6fcc2835e)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:21 +01:00
Timothy Arceri
d7fa6a4deb mesa: add _mesa_lookup_state_param_idx() helper
This will be used in the following patch.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
(cherry picked from commit c3aae0714c)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:21 +01:00
Ian Romanick
0710d042db elk: Call nir_opt_algebraic_late in elk_postprocess_nir
Make sure that lowerings undone in elk_nir_optimize are reapplied.

No shader-db or fossil-db changes on any Intel platform. This is most
likely to impact either Gfx8 on ANV or Gfx7.5 on HASVK. I don't
fossil-db test either of those platforms.

I tried doing a similar thing here as is done in BRW (previous commit),
but that caused a couple Haswell shaders to fall off a performance
cliff:

total spills in shared programs: 8247 -> 8311 (0.78%)
spills in affected programs: 6 -> 70 (1066.67%)
helped: 0 / HURT: 2

total fills in shared programs: 8558 -> 8910 (4.11%)
fills in affected programs: 6 -> 358 (5866.67%)
helped: 0 / HURT: 2

Fixes: 442daeb54a ("nir/opt_algebraic: use fcanonicalize")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit df704bd38e)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:20 +01:00
Ian Romanick
1f65b768a1 brw: Call nir_opt_algebraic_late later in brw_postprocess_nir_opts
Move the call to nir_opt_algebraic_late after the last time
brw_nir_optimize might be called. nir_opt_algebraic_distribute_src_mods
works together with the late algebraic optimizations, so move it also.

shader-db:

Lunar Lake
total instructions in shared programs: 17081222 -> 17080842 (<.01%)
instructions in affected programs: 419931 -> 419551 (-0.09%)
helped: 545 / HURT: 826

total cycles in shared programs: 878437752 -> 879236226 (0.09%)
cycles in affected programs: 506003142 -> 506801616 (0.16%)
helped: 3091 / HURT: 3189

LOST:   18
GAINED: 16

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total instructions in shared programs: 19994270 -> 19993231 (<.01%)
instructions in affected programs: 490499 -> 489460 (-0.21%)
helped: 660 / HURT: 800

total cycles in shared programs: 882498776 -> 882834186 (0.04%)
cycles in affected programs: 477858602 -> 478194012 (0.07%)
helped: 3458 / HURT: 3564

total fills in shared programs: 4371 -> 4370 (-0.02%)
fills in affected programs: 7 -> 6 (-14.29%)
helped: 1 / HURT: 0

LOST:   28
GAINED: 10

Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
total instructions in shared programs: 19943849 -> 19942782 (<.01%)
instructions in affected programs: 467384 -> 466317 (-0.23%)
helped: 655 / HURT: 796

total cycles in shared programs: 860085674 -> 861410289 (0.15%)
cycles in affected programs: 426900998 -> 428225613 (0.31%)
helped: 3250 / HURT: 3441

LOST:   19
GAINED: 14

fossil-db:

Lunar Lake
Totals:
Instrs: 926472091 -> 926204838 (-0.03%); split: -0.04%, +0.01%
CodeSize: 14845921056 -> 14842776112 (-0.02%); split: -0.10%, +0.08%
Send messages: 41459570 -> 41459574 (+0.00%); split: -0.00%, +0.00%
Cycle count: 104481085069 -> 104583692712 (+0.10%); split: -0.14%, +0.24%
Spill count: 3454651 -> 3457340 (+0.08%); split: -0.15%, +0.23%
Fill count: 4958779 -> 4958487 (-0.01%); split: -0.46%, +0.45%
Max live registers: 193805970 -> 193839002 (+0.02%); split: -0.00%, +0.02%
Max dispatch width: 49114416 -> 49113776 (-0.00%); split: +0.01%, -0.01%
Non SSA regs after NIR: 142953905 -> 142800740 (-0.11%); split: -0.12%, +0.01%

Totals from 420256 (20.80% of 2020128) affected shaders:
Instrs: 448571327 -> 448304074 (-0.06%); split: -0.09%, +0.03%
CodeSize: 7312002800 -> 7308857856 (-0.04%); split: -0.21%, +0.17%
Send messages: 17716494 -> 17716498 (+0.00%); split: -0.00%, +0.00%
Cycle count: 52178854998 -> 52281462641 (+0.20%); split: -0.28%, +0.48%
Spill count: 2945654 -> 2948343 (+0.09%); split: -0.17%, +0.26%
Fill count: 4404768 -> 4404476 (-0.01%); split: -0.51%, +0.51%
Max live registers: 60875448 -> 60908480 (+0.05%); split: -0.01%, +0.06%
Max dispatch width: 9455280 -> 9454640 (-0.01%); split: +0.04%, -0.04%
Non SSA regs after NIR: 60542740 -> 60389575 (-0.25%); split: -0.28%, +0.02%

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 1000081384 -> 999726726 (-0.04%); split: -0.05%, +0.01%
CodeSize: 16764458080 -> 16761624256 (-0.02%); split: -0.09%, +0.07%
Subgroup size: 27599528 -> 27599544 (+0.00%)
Send messages: 45538933 -> 45538951 (+0.00%); split: -0.00%, +0.00%
Cycle count: 93303830912 -> 93370118192 (+0.07%); split: -0.19%, +0.26%
Spill count: 3739306 -> 3739719 (+0.01%); split: -0.22%, +0.23%
Fill count: 5089719 -> 5083626 (-0.12%); split: -0.56%, +0.44%
Max live registers: 122041364 -> 122055848 (+0.01%); split: -0.00%, +0.01%
Max dispatch width: 38117296 -> 38127200 (+0.03%); split: +0.06%, -0.03%
Non SSA regs after NIR: 164296197 -> 164299306 (+0.00%); split: -0.01%, +0.01%

Totals from 338754 (14.82% of 2285730) affected shaders:
Instrs: 452723479 -> 452368821 (-0.08%); split: -0.10%, +0.03%
CodeSize: 7861878032 -> 7859044208 (-0.04%); split: -0.19%, +0.16%
Subgroup size: 16 -> 32 (+100.00%)
Send messages: 17050010 -> 17050028 (+0.00%); split: -0.00%, +0.00%
Cycle count: 52881801997 -> 52948089277 (+0.13%); split: -0.33%, +0.46%
Spill count: 3271458 -> 3271871 (+0.01%); split: -0.25%, +0.26%
Fill count: 4628422 -> 4622329 (-0.13%); split: -0.61%, +0.48%
Max live registers: 30738902 -> 30753386 (+0.05%); split: -0.01%, +0.06%
Max dispatch width: 4787264 -> 4797168 (+0.21%); split: +0.47%, -0.26%
Non SSA regs after NIR: 61748026 -> 61751135 (+0.01%); split: -0.03%, +0.03%

Tiger Lake
Totals:
Instrs: 1011068379 -> 1010977290 (-0.01%); split: -0.03%, +0.02%
CodeSize: 14197751744 -> 14197683040 (-0.00%); split: -0.07%, +0.07%
Send messages: 46431228 -> 46431220 (-0.00%); split: -0.00%, +0.00%
Cycle count: 85066526419 -> 85085088071 (+0.02%); split: -0.16%, +0.18%
Spill count: 3853750 -> 3855185 (+0.04%); split: -0.15%, +0.19%
Fill count: 6716746 -> 6719594 (+0.04%); split: -0.25%, +0.29%
Max live registers: 122307387 -> 122326083 (+0.02%); split: -0.00%, +0.02%
Max dispatch width: 38009632 -> 38003280 (-0.02%); split: +0.03%, -0.05%
Non SSA regs after NIR: 158403572 -> 158415390 (+0.01%); split: -0.01%, +0.02%

Totals from 277728 (12.17% of 2281577) affected shaders:
Instrs: 349206856 -> 349115767 (-0.03%); split: -0.07%, +0.05%
CodeSize: 5042621104 -> 5042552400 (-0.00%); split: -0.20%, +0.20%
Send messages: 13132243 -> 13132235 (-0.00%); split: -0.00%, +0.00%
Cycle count: 36183327716 -> 36201889368 (+0.05%); split: -0.38%, +0.43%
Spill count: 2210072 -> 2211507 (+0.06%); split: -0.26%, +0.33%
Fill count: 4188439 -> 4191287 (+0.07%); split: -0.39%, +0.46%
Max live registers: 24956695 -> 24975391 (+0.07%); split: -0.02%, +0.09%
Max dispatch width: 3948832 -> 3942480 (-0.16%); split: +0.32%, -0.48%
Non SSA regs after NIR: 45616425 -> 45628243 (+0.03%); split: -0.04%, +0.06%

Ice Lake
Totals:
Instrs: 1009584306 -> 1009411757 (-0.02%); split: -0.02%, +0.01%
CodeSize: 12593466880 -> 12592958096 (-0.00%); split: -0.01%, +0.01%
Send messages: 47274203 -> 47274171 (-0.00%); split: -0.00%, +0.00%
Cycle count: 84920281455 -> 84914027301 (-0.01%); split: -0.05%, +0.04%
Spill count: 2988523 -> 2986191 (-0.08%); split: -0.14%, +0.07%
Fill count: 5296078 -> 5288737 (-0.14%); split: -0.21%, +0.07%
Max live registers: 125429384 -> 125444786 (+0.01%); split: -0.00%, +0.02%
Max dispatch width: 41269072 -> 41267312 (-0.00%); split: +0.03%, -0.03%
Non SSA regs after NIR: 163223895 -> 163236623 (+0.01%); split: -0.01%, +0.02%

Totals from 243818 (10.45% of 2334244) affected shaders:
Instrs: 296953759 -> 296781210 (-0.06%); split: -0.08%, +0.02%
CodeSize: 3643224480 -> 3642715696 (-0.01%); split: -0.04%, +0.03%
Send messages: 11518671 -> 11518639 (-0.00%); split: -0.00%, +0.00%
Cycle count: 33065548412 -> 33059294258 (-0.02%); split: -0.13%, +0.11%
Spill count: 1346515 -> 1344183 (-0.17%); split: -0.32%, +0.15%
Fill count: 2537906 -> 2530565 (-0.29%); split: -0.43%, +0.14%
Max live registers: 21476776 -> 21492178 (+0.07%); split: -0.02%, +0.09%
Max dispatch width: 3727288 -> 3725528 (-0.05%); split: +0.31%, -0.35%
Non SSA regs after NIR: 41050474 -> 41063202 (+0.03%); split: -0.04%, +0.07%

Skylake
Totals:
Instrs: 513573157 -> 513462971 (-0.02%); split: -0.02%, +0.00%
CodeSize: 5950280672 -> 5950001392 (-0.00%); split: -0.01%, +0.00%
Send messages: 24909757 -> 24909758 (+0.00%); split: -0.00%, +0.00%
Cycle count: 57636102242 -> 57634726342 (-0.00%); split: -0.03%, +0.03%
Spill count: 627286 -> 627241 (-0.01%); split: -0.01%, +0.00%
Fill count: 837888 -> 837804 (-0.01%); split: -0.01%, +0.00%
Max live registers: 87272271 -> 87284192 (+0.01%); split: -0.00%, +0.02%
Max dispatch width: 32278832 -> 32271800 (-0.02%); split: +0.02%, -0.04%
Non SSA regs after NIR: 87387713 -> 87387614 (-0.00%); split: -0.00%, +0.00%

Totals from 177432 (10.30% of 1722906) affected shaders:
Instrs: 127170648 -> 127060462 (-0.09%); split: -0.10%, +0.01%
CodeSize: 1443406368 -> 1443127088 (-0.02%); split: -0.03%, +0.01%
Send messages: 5444220 -> 5444221 (+0.00%); split: -0.00%, +0.00%
Cycle count: 15423028495 -> 15421652595 (-0.01%); split: -0.10%, +0.10%
Spill count: 235844 -> 235799 (-0.02%); split: -0.03%, +0.01%
Fill count: 333783 -> 333699 (-0.03%); split: -0.03%, +0.01%
Max live registers: 13765573 -> 13777494 (+0.09%); split: -0.01%, +0.10%
Max dispatch width: 3086880 -> 3079848 (-0.23%); split: +0.24%, -0.47%
Non SSA regs after NIR: 17623772 -> 17623673 (-0.00%); split: -0.00%, +0.00%

Fixes: 442daeb54a ("nir/opt_algebraic: use fcanonicalize")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit 11b96a84b0)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:20 +01:00
Ian Romanick
2874160ce2 brw: Call nir_opt_algebraic_late in brw_nir_create_raygen_trampoline
Make sure that lowerings undone in brw_nir_optimize are reapplied.

No shader-db changes on any Intel platform.

Why are there fossil-db changes on platforms that don't support ray tracing?

Lunar Lake
Totals:
Instrs: 926636441 -> 926636313 (-0.00%); split: -0.00%, +0.00%
Send messages: 41510729 -> 41510723 (-0.00%); split: -0.00%, +0.00%
Cycle count: 104509492613 -> 104509490569 (-0.00%); split: -0.00%, +0.00%
Max live registers: 193792922 -> 193792890 (-0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 150091934 -> 150092170 (+0.00%); split: -0.00%, +0.00%

Totals from 10 (0.00% of 2020428) affected shaders:
Instrs: 8142 -> 8014 (-1.57%); split: -3.14%, +1.57%
Send messages: 192 -> 186 (-3.12%); split: -7.29%, +4.17%
Cycle count: 131892 -> 129848 (-1.55%); split: -6.93%, +5.38%
Max live registers: 1442 -> 1410 (-2.22%); split: -3.05%, +0.83%
Non SSA regs after NIR: 950 -> 1186 (+24.84%); split: -26.95%, +51.79%

Meteor Lake
Totals:
Instrs: 1000805547 -> 1000805543 (-0.00%); split: -0.00%, +0.00%
Cycle count: 93131592265 -> 93131619619 (+0.00%); split: -0.00%, +0.00%
Max live registers: 122081268 -> 122081244 (-0.00%); split: -0.00%, +0.00%

Totals from 16 (0.00% of 2286241) affected shaders:
Instrs: 18652 -> 18648 (-0.02%); split: -1.39%, +1.37%
Cycle count: 369520 -> 396874 (+7.40%); split: -2.94%, +10.34%
Max live registers: 1350 -> 1326 (-1.78%); split: -4.15%, +2.37%

DG2
Totals:
Instrs: 999834626 -> 999834651 (+0.00%); split: -0.00%, +0.00%
Send messages: 45719398 -> 45719403 (+0.00%); split: -0.00%, +0.00%
Cycle count: 93118238139 -> 93118269557 (+0.00%); split: -0.00%, +0.00%
Max live registers: 122098944 -> 122098936 (-0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 169413734 -> 169413661 (-0.00%); split: -0.00%, +0.00%

Totals from 13 (0.00% of 2286795) affected shaders:
Instrs: 18799 -> 18824 (+0.13%); split: -1.04%, +1.18%
Send messages: 492 -> 497 (+1.02%); split: -2.44%, +3.46%
Cycle count: 352838 -> 384256 (+8.90%); split: -1.08%, +9.98%
Max live registers: 1237 -> 1229 (-0.65%); split: -2.91%, +2.26%
Non SSA regs after NIR: 2191 -> 2118 (-3.33%); split: -20.86%, +17.53%

Tiger Lake
Totals:
Instrs: 1011816778 -> 1011816714 (-0.00%); split: -0.00%, +0.00%
Send messages: 46515289 -> 46515285 (-0.00%); split: -0.00%, +0.00%
Cycle count: 85148902406 -> 85148894668 (-0.00%); split: -0.00%, +0.00%
Max live registers: 122362180 -> 122362172 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 38036160 -> 38036176 (+0.00%)
Non SSA regs after NIR: 160317521 -> 160317649 (+0.00%); split: -0.00%, +0.00%

Totals from 6 (0.00% of 2282318) affected shaders:
Instrs: 9204 -> 9140 (-0.70%); split: -1.43%, +0.74%
Send messages: 258 -> 254 (-1.55%); split: -3.10%, +1.55%
Cycle count: 287652 -> 279914 (-2.69%); split: -3.29%, +0.60%
Max live registers: 552 -> 544 (-1.45%); split: -2.90%, +1.45%
Max dispatch width: 48 -> 64 (+33.33%)
Non SSA regs after NIR: 914 -> 1042 (+14.00%); split: -14.00%, +28.01%

Ice Lake
Totals:
Instrs: 1012203285 -> 1012203249 (-0.00%); split: -0.00%, +0.00%
Send messages: 47358859 -> 47358858 (-0.00%); split: -0.00%, +0.00%
Cycle count: 85112165276 -> 85112171905 (+0.00%); split: -0.00%, +0.00%
Max live registers: 125545002 -> 125544992 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 41335696 -> 41335656 (-0.00%)
Non SSA regs after NIR: 166448597 -> 166448602 (+0.00%); split: -0.00%, +0.00%

Totals from 13 (0.00% of 2335519) affected shaders:
Instrs: 16486 -> 16450 (-0.22%); split: -1.67%, +1.46%
Send messages: 368 -> 367 (-0.27%); split: -4.89%, +4.62%
Cycle count: 347643 -> 354272 (+1.91%); split: -1.34%, +3.25%
Max live registers: 1104 -> 1094 (-0.91%); split: -3.80%, +2.90%
Max dispatch width: 192 -> 152 (-20.83%)
Non SSA regs after NIR: 2100 -> 2105 (+0.24%); split: -21.76%, +22.00%

Skylake
Totals:
Instrs: 504548665 -> 504548057 (-0.00%); split: -0.00%, +0.00%
Send messages: 24479148 -> 24479118 (-0.00%); split: -0.00%, +0.00%
Cycle count: 57575198140 -> 57575179256 (-0.00%); split: -0.00%, +0.00%
Max live registers: 85570671 -> 85570575 (-0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 85097646 -> 85098486 (+0.00%); split: -0.00%, +0.00%

Totals from 22 (0.00% of 1703671) affected shaders:
Instrs: 19866 -> 19258 (-3.06%); split: -3.72%, +0.66%
Send messages: 464 -> 434 (-6.47%); split: -8.19%, +1.72%
Cycle count: 250854 -> 231970 (-7.53%); split: -9.23%, +1.70%
Max live registers: 2024 -> 1928 (-4.74%); split: -5.53%, +0.79%
Non SSA regs after NIR: 2498 -> 3338 (+33.63%); split: -8.33%, +41.95%

Fixes: 442daeb54a ("nir/opt_algebraic: use fcanonicalize")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit 5af0b8bd09)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:20 +01:00
Konstantin Seurer
3ffe4b257b vulkan/cmd_queue: Fixup stride for multi draws
Copying the draw infos packs them so the stride needs to be set to the
struct size.
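
The stride rule above can be illustrated with a small sketch. `struct draw_info` and `copy_draw_infos` are hypothetical stand-ins (the struct mirrors the two fields of `VkMultiDrawInfoEXT`); this is not the actual cmd_queue code, only an assumption about the shape of the problem: once strided elements are copied into a tightly packed array, the only stride that is valid for the copy is the element size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for VkMultiDrawInfoEXT. */
struct draw_info {
   uint32_t first_vertex;
   uint32_t vertex_count;
};

/* Copy `count` draw infos that sit `src_stride` bytes apart into a tightly
 * packed array, and return the stride that is valid for the copy. */
static uint32_t
copy_draw_infos(struct draw_info *dst, const void *src,
                uint32_t count, uint32_t src_stride)
{
   assert(src_stride >= sizeof(struct draw_info));

   const uint8_t *bytes = src;
   for (uint32_t i = 0; i < count; i++)
      memcpy(&dst[i], bytes + (size_t)i * src_stride, sizeof(dst[i]));

   /* The copy is packed, so the caller's original stride no longer applies. */
   return sizeof(struct draw_info);
}
```

Replaying the copy with the caller's original stride would step over packed elements, which is the kind of mismatch the commit title describes.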

cc: mesa-stable

Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
(cherry picked from commit be5ab80de1)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:20 +01:00
Ian Romanick
45ce75f3bc nir: Use STACK_ARRAY instead of NIR_VLA
The number of fields comes from the shader, so it could be a value large
enough that using alloca would be problematic.
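
As a rough illustration of why an unbounded alloca is risky, here is a minimal sketch of the general idea: keep a small fixed-size buffer on the stack and fall back to the heap for large counts. `SMALL_COUNT` and `sum_doubled` are made up for this example and are not Mesa's STACK_ARRAY implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SMALL_COUNT 8 /* illustrative threshold, not Mesa's value */

/* Uses scratch storage sized by `count`, which may come from untrusted
 * input such as a shader. Small counts stay on the stack; large counts
 * go to the heap instead of growing the stack without bound. */
static uint64_t
sum_doubled(const uint32_t *values, size_t count)
{
   uint64_t small[SMALL_COUNT];
   uint64_t *scratch = small;

   if (count > SMALL_COUNT) {
      scratch = malloc(count * sizeof(*scratch));
      assert(scratch != NULL); /* a real caller would report the failure */
   }

   uint64_t total = 0;
   for (size_t i = 0; i < count; i++) {
      scratch[i] = 2 * (uint64_t)values[i];
      total += scratch[i];
   }

   if (scratch != small)
      free(scratch);
   return total;
}
```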

Fixes: c11833ab24 ("nir,spirv: Rework function calls")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ryan Neph <ryanneph@google.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 9017d37e84)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:20 +01:00
Ian Romanick
978fd42b4b spirv: Use STACK_ARRAY instead of NIR_VLA
The number of fields comes from the shader, so it could be a value large
enough that using alloca would be problematic.

Fixes: 2a023f30a6 ("nir/spirv: Add basic support for types")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ryan Neph <ryanneph@google.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 3da828d2dd)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:20 +01:00
Jesse Natalie
5048a2ed1c meson: Include DirectX-Headers dependency for all VK Windows builds
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14839
Cc: mesa-stable
Reviewed-by: Eric Engestrom <eric@igalia.com>
(cherry picked from commit f0066a3150)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:20 +01:00
Alyssa Rosenzweig
806f0a35a4 brw: drop buggy SLM optimization
This was incorrect for OpenCL due to the possibility of variable shared memory
existing despite shared_size == 0. Fortunately the optimization it was trying to
do should be done in NIR via nir_opt_barrier_modes so we can just drop the brw
code and move on with our merry lives. Fixes OpenCL tests on Iris:

non_uniform_work_group non_uniform_3d_barriers
basic async_strided_copy_local_to_global

Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit bd5ebbb2f8)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:20 +01:00
Anna Maniscalco
6278aa107a freedreno/common: set has_astc_hdr true for a7xx targets
Fixes: dc07473524 ("freedreno/fdl: add astc hdr formats")
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
(cherry picked from commit e959dd0dd7)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:20 +01:00
Daniel Schürmann
7fda785505 nir/clone: Fix cloning indirect call instructions
Fixes: bb40284f76 ('nir: Add indirect calls')
(cherry picked from commit 88b4221519)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:20 +01:00
Samuel Pitoiset
a2ad1789fa ac,radv,radeonsi: use correct swizzle/pitch for depth-only images with SDMA
This fixes new VKCTS coverage
dEQP-VK.api.copy_and_blit.core.use_after_copy.*.

is_stencil isn't set for RadeonSI because it doesn't do SDMA copies
with Z/S.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 1be4ffdff9)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:20 +01:00
Eric Engestrom
88e238de07 .pick_status.json: Mark 7dd7731ac7 as denominated
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:20 +01:00
Aitor Camacho
4229b57783 wsi/metal: Expose additional color spaces if instance extension enabled
Caught through VVL test NegativeWsi.SwapchainImageFormatList. The test
would try to create a swapchain with a color space from
VK_EXT_swapchain_colorspace without enabling the extension. This is
because wsi would expose those color spaces even when the extension was
not enabled.

Fixes: fd045ac99c ("wsi/metal: add support for color spaces")

Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
(cherry picked from commit e6f118f12b)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:20 +01:00
Lionel Landwerlin
1994d93542 isl: fix 32bit math with 4GB buffer size
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit d956957153)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:20 +01:00
Lionel Landwerlin
af97f7fe38 anv: add missing constant cache invalidation for descriptor buffers
A descriptor buffer promoted to push constants requires a constant
cache invalidation if it is modified on the device.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 42b70cf05a)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:20 +01:00
Lionel Landwerlin
12da136c07 anv: fix nested command buffer relocations
When executing 3 command buffers :

vkCmdExecuteCommands(CB_B, CB_C);
vkCmdExecuteCommands(CB_A, CB_B);

vkQueueSubmit(CB_A);

We're not correctly transferring the relocations of CB_C from CB_B to
CB_A.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit e64889635c)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:20 +01:00
Konstantin Seurer
f8ce75c40c radv: Fix setting the viewport for depth stencil FS resolves
Fixes: 704fbbb ("radv/meta: rework depth/stencil resolves using graphics")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit f574de2249)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:20 +01:00
Lionel Landwerlin
2abdb028dd anv: flush render caches on first pipeline select
Given a situation like this :
  - CB_A: begin, renderDepthA, end
  - CB_B: begin, computeA, barrier (depth), computeB, end

The depth cache is not being flushed between renderDepthA & computeB
because :
  - it's not flushed at the end of CB_A (it's not required)
  - when CB_B starts, we're still on GFX pipeline mode but do not
    flush render caches because pipeline mode is unknown
  - when the barrier in CB_B is executed, we're already in compute
    pipeline mode and HW cannot flush depth.

The fix is to flush RT/depth caches when switching from unknown
pipeline mode to any pipeline mode.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e6dae6ef5f ("vulkan: Optimize implicit end_subpass barrier")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14816
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Tested-by: David Gow <david@davidgow.net>
(cherry picked from commit 888ac904a3)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:20 +01:00
Juston Li
6dbd6ee94b anv: set missing protected bit for protected depth/stencil surfaces
This bit is set in mocs for other protected attachment types by
anv_image_fill_surface_state(), but it was omitted for depth/stencil
attachments here.

Without the protected bit set, attaching a protected depth attachment
image to a framebuffer causes heavy black artifacting.

Fixes: 794b0496e9 ("anv: enable protected memory")
Signed-off-by: Juston Li <justonli@google.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit f84ed620c2)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:20 +01:00
Matt Turner
2c250e6235 elk/cse: use copies in operands_match instead of in-place modification
`operands_match` was modifying instruction source operands in-place
(through the `elk_fs_reg *src` pointer member) and relying on a
save/restore pattern to undo the modifications. Work on local copies
instead, which is simpler and avoids mutating shared state in a
comparison function.

Fixes: 47c4b38540 ("i965/fs: Allow CSE to handle MULs with negated arguments.")
(cherry picked from commit 14c65322e8)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:20 +01:00
Matt Turner
03e6f285e5 elk/cse: fix operands_match corrupting non-IMM register data
The MUL case in `operands_match` was reading and writing the `.f` union
member unconditionally, even when the register's `.file != IMM`. In that
case `.f` aliases the struct containing `.nr`/`.swizzle`/etc, so the
`fabsf()` call could corrupt the `.nr` by clearing bit 31.

Guard all `.f` accesses with `.file == IMM` checks.
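
The aliasing hazard described above can be reproduced with a simplified register type. `struct reg`, `abs_unguarded` and `abs_guarded` are hypothetical and do not match the real `brw_reg`/`elk_fs_reg` layout; the sketch only assumes that the float payload shares storage with non-float fields.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical, simplified register: not the real brw_reg layout. */
enum reg_file { FILE_GRF, FILE_IMM };

struct reg {
   enum reg_file file;
   union {
      float f;       /* meaningful only when file == FILE_IMM */
      uint32_t bits; /* stands in for the packed nr/swizzle fields */
   };
};

/* Buggy pattern: treats the payload as a float regardless of the file,
 * so fabsf() clears bit 31 of whatever is stored there. */
static struct reg
abs_unguarded(struct reg r)
{
   r.f = fabsf(r.f);
   return r;
}

/* Guarded pattern: only immediates are interpreted as floats. */
static struct reg
abs_guarded(struct reg r)
{
   if (r.file == FILE_IMM)
      r.f = fabsf(r.f);
   return r;
}
```

With a non-immediate register whose payload is `0xC0000005`, the unguarded version yields `0x40000005`, while the guarded version leaves the payload untouched.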

Fixes: 47c4b38540 ("i965/fs: Allow CSE to handle MULs with negated arguments.")
(cherry picked from commit 93f39f87c4)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:20 +01:00
Matt Turner
2b221e5a1a brw/cse: use copies in operands_match instead of in-place modification
`operands_match` was modifying instruction source operands in-place
(through the `brw_reg *src` pointer member) and relying on a
save/restore pattern to undo the modifications. Work on local copies
instead, which is simpler and avoids mutating shared state in a
comparison function.

Fixes: 47c4b38540 ("i965/fs: Allow CSE to handle MULs with negated arguments.")
(cherry picked from commit b302faad8b)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:20 +01:00
Matt Turner
14c7d820cd brw/cse: fix operands_match corrupting non-IMM register data
The MUL case in `operands_match` was reading and writing the `.f` union
member unconditionally, even when the register's `.file != IMM`. In that
case `.f` aliases the struct containing `.nr`/`.swizzle`/etc, so the
`fabsf()` call could corrupt the `.nr` by clearing bit 31.

Guard all `.f` accesses with `.file == IMM` checks.

Fixes: 47c4b38540 ("i965/fs: Allow CSE to handle MULs with negated arguments.")
(cherry picked from commit f5e0f63216)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:19 +01:00
Eric Engestrom
547fd52a66 pick-ui: add Backport-to: * as a synonym to Cc: mesa-stable
(cherry picked from commit b2d99b9378)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:19 +01:00
Eric Engestrom
9a0d13be9a bin/gen_release_notes: fix support for python 3.14
There is no default event loop anymore, so we need to create one if we
want one.

Cc: mesa-stable
(cherry picked from commit c7603a11de)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:19 +01:00
Eric Engestrom
e5fb4a0682 .pick_status.json: Update to 03d2cc2b2a
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40092>
2026-02-25 14:22:19 +01:00
Eric Engestrom
8794fced82 docs: add sha sum for 26.0.0
2026-02-11 19:19:12 +01:00
Eric Engestrom
c10cba7efa VERSION: bump for 26.0.0 2026-02-11 19:07:29 +01:00
Eric Engestrom
e0f7bc0024 docs: add release notes for 26.0.0 2026-02-11 19:07:29 +01:00
Georg Lehmann
3062621cf6 aco/opt_postRA: don't optimize across calls
Could do better by checking which registers are clobbered/preserved,
but that's unlikely to be useful anyway.

Backport-to: 26.0

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
(cherry picked from commit fc7b5d7eed)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:48 +00:00
Georg Lehmann
33ca80ea38 aco: handle all SALU that modifies PC in needs_exec_mask
Calls use swappc.

Backport-to: 26.0

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
(cherry picked from commit 10b12a6ee2)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:48 +00:00
Georg Lehmann
d8acb10c56 aco/lower_branches: consider jump target of conditional branches based on vcc
Cc: mesa-stable

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
(cherry picked from commit 421a4dacf0)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:48 +00:00
Karol Herbst
acdbdcc53b vtn: set default fp_math_ctrl values for kernels
The kernel capability has the `FPFastMathMode` decoration, but not the
`FPFastMathDefault` execution mode, so a SPIR-V module not using
`SPV_KHR_float_controls2` has no way of setting any defaults.

Fixes: 9da2d21804 ("vtn: implement default fp_math_ctrl without using execution mode")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Tested-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit faf3a93e8f)

[Eric: adjusted commit because of missing 46a617884e, as suggested by the author
at https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39790#note_3325830]

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:48 +00:00
Dave Airlie
b53dbb573a gallivm: handle u16 correct on const loads.
I somehow screwed this up on my previous attempt at fixing this bug.

This should fix the loop limiter bug on big endian properly.

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Cc: mesa-stable
Fixes: e28cfb2bad ("gallivm: handle u8/u16 const loads properly on big-endian.")
(cherry picked from commit c016346b50)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:48 +00:00
Eric R. Smith
160efe917e mesa: do not unbind general point when different indexed points are deleted
When a buffer is deleted, we have to remove it from all binding points.
We were re-using the code for BindBufferRange for this; however, this
caused the general binding point to be unbound (bound to NULL)
unconditionally, even if a different buffer is bound there. Fix this by
inlining the various bind calls into the delete buffers code.
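
The intended behaviour can be sketched with a toy binding table. `struct binding_state` and `unbind_deleted` are hypothetical names, buffer name 0 is assumed to mean "nothing bound", and the real GL state tracking is considerably more involved. The point is only that the general binding point is cleared when it refers to the deleted buffer, not unconditionally.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_INDEXED 4 /* illustrative size */

/* Hypothetical binding table: 0 means "no buffer bound". */
struct binding_state {
   unsigned general;
   unsigned indexed[NUM_INDEXED];
};

/* Remove `buf` from every binding point that currently refers to it. */
static void
unbind_deleted(struct binding_state *s, unsigned buf)
{
   assert(s != NULL);

   for (size_t i = 0; i < NUM_INDEXED; i++) {
      if (s->indexed[i] == buf)
         s->indexed[i] = 0;
   }

   /* Only touch the general point if the deleted buffer is bound there. */
   if (s->general == buf)
      s->general = 0;
}
```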

cc: mesa-stable

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14755
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
(cherry picked from commit fa418f1e73)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:48 +00:00
Samuel Pitoiset
5f6d1e4b44 radv/meta: fix CmdCopyBufferToImage2() on compute queue with compressed HTILE
Only for partial copies because image stores don't decompress on writes
(i.e. HTILE isn't updated by image stores).

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 9f5a20abde)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:48 +00:00
Karol Herbst
dc8a39037b vtn/opencl: flush denorms for cbrt()
libclc doesn't, so we have to. Fixes math_bruteforce cbrt on Iris.

Co-authored-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
(cherry picked from commit af954427bf)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:48 +00:00
OPNA2608
890ff49038 rocket: Fix printing of rknpu_mem_create.dma_addr
The Linux kernel's __u64 isn't always implemented as a long long, and there's no nice define for printing it like with uint64_t.

(cherry picked from commit 41b9dc3a2c)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:48 +00:00
OPNA2608
00db096003 vc4: Fix printing of get_tiling.modifier
The Linux kernel's __u64 isn't always implemented as a long long, and there's no nice define for printing it like with uint64_t.

(cherry picked from commit 4c699087d4)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:48 +00:00
José Expósito
ede2a1ce84 venus: Fix error log on PPC
On the ppc64le architecture the error log fails to compile with the error:

    ../src/virtio/vulkan/vn_renderer_virtgpu.c: In function ‘virtgpu_ioctl_map’:
    ../src/virtio/vulkan/vn_renderer_virtgpu.c:751:66: error: format ‘%llu’ expects argument of type ‘long long unsigned int’, but argument 6 has type ‘__u64’ {aka ‘long unsigned int’} [-Werror=format=]
    751 |          "mmap failed: gpu_fd=%d, handle=%u, size=%zu, offset=%llu, err=%s",
        |                                                               ~~~^
        |                                                                  |
        |                                                                  long long unsigned int
        |                                                               %lu
    752 |          gpu->fd, gem_handle, size, args.offset, strerror(errno));
        |                                     ~~~~~~~~~~~
        |                                         |
        |                                         __u64 {aka long unsigned int}
    cc1: some warnings being treated as errors

Parse the parameters to fix the failure.
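
One portable way to make this kind of call compile on every ABI is to cast the value to a type with a matching format macro. The sketch below is an assumption about the general technique, not the exact change made in this commit; `fake_u64` and `format_offset` are hypothetical, with `fake_u64` standing in for a `__u64` that is defined as `unsigned long`.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the kernel's __u64 on an ABI where it is `unsigned long`. */
typedef unsigned long fake_u64;

/* Format an offset without depending on how the 64-bit typedef is spelled. */
static int
format_offset(char *buf, size_t len, fake_u64 offset)
{
   assert(buf != NULL && len > 0);

   /* The cast makes the argument type match PRIu64 on every platform. */
   return snprintf(buf, len, "offset=%" PRIu64, (uint64_t)offset);
}
```

Casting to `unsigned long long` and keeping `%llu` would work equally well; the choice is mostly a matter of style.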

Fixes: a49b7adad8 ("venus: add error log coverage for virtgpu backend")
(cherry picked from commit dd3fe2d671)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:48 +00:00
José Expósito
ee5075f221 winsys/amdgpu: Fix userq job info log on PPC
On the ppc64le architecture the macro printing the userq job info fails
to compile with error:

   In file included from ../src/gallium/winsys/amdgpu/drm/amdgpu_cs.cpp:11:
   ../src/gallium/winsys/amdgpu/drm/amdgpu_cs.cpp: In function ‘int amdgpu_cs_submit_ib_userq(amdgpu_userq*, amdgpu_cs*, uint32_t*, unsigned int, uint32_t*, unsigned int, uint64_t*, uint64_t)’:
   ../src/gallium/winsys/amdgpu/drm/amdgpu_cs.cpp:1652:20: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 6 has type ‘__u64’ {aka ‘long unsigned int’} [-Werror=format=]
   1652 |          mesa_logi("amdgpu: uq_log: %s:  num_wait_fences=%d  uq_va=%llx  job=%llx\n",
   1653 |                    amdgpu_userq_str[acs->queue_index], userq_wait_data.num_fences, fence_info[i].va,
         |                                                                                    ~~~~~~~~~~~~~~~~
         |                                                                                                  |
         |                                                                                                  __u64 {aka long unsigned int}
   ../src/util/log.h:78:70: note: in definition of macro ‘mesa_logi’
      78 | #define mesa_logi(fmt, ...) mesa_log(MESA_LOG_INFO, (MESA_LOG_TAG), (fmt), ##__VA_ARGS__)
         |                                                                      ^~~
   ../src/gallium/winsys/amdgpu/drm/amdgpu_cs.cpp:1652:20: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 7 has type ‘__u64’ {aka ‘long unsigned int’} [-Werror=format=]
   1652 |          mesa_logi("amdgpu: uq_log: %s:  num_wait_fences=%d  uq_va=%llx  job=%llx\n",
   1653 |                    amdgpu_userq_str[acs->queue_index], userq_wait_data.num_fences, fence_info[i].va,
   1654 |                    fence_info[i].value);
         |                    ~~~~~~~~~~~~~~~~~~~
         |                                  |
         |                                  __u64 {aka long unsigned int}
   ../src/util/log.h:78:70: note: in definition of macro ‘mesa_logi’
      78 | #define mesa_logi(fmt, ...) mesa_log(MESA_LOG_INFO, (MESA_LOG_TAG), (fmt), ##__VA_ARGS__)
         |                                                                      ^~~

Parse the parameters to fix the failure.

Fixes: 2547fd0f59 ("winsys/amdgpu: print userq job info")
(cherry picked from commit 757ae04bd9)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:48 +00:00
Caio Oliveira
99bb93440f brw: Fix cooperative matrix constant sources other than src0
Code was wrongly using src0 to pick the constant value.

Fixes: bf9ad36f2d ("brw: Properly handle cooperative matrices created with constants")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 6b0e29bc77)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:48 +00:00
Faith Ekstrand
e0682b4317 pan/bi: Don't attempt to fuse AND(ICMP, ICMP) if the AND is swizzled
There might be cases under which we can make this work but they're
tricky at best.  For now, don't even try.

Cc: mesa-stable
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
(cherry picked from commit 918624174b)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:48 +00:00
Faith Ekstrand
4dd76ad0a1 pan/bi: Run lower_alu_width after opt_algebraic_late
It can generate extract instructions which we expect to be scalar.

Cc: mesa-stable
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
(cherry picked from commit deb9244436)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:48 +00:00
Faith Ekstrand
6eded1a7d0 nir/lower_bool_to_bit_size: Use the correct num_components for conversions
There's a nice little comment here saying we use the same write mask (an
out of date term in NIR) and swizzle but we're no longer actually doing
that.  Depending on nir_builder magic, we may actually generate a scalar
when we really want a vector.  The fix is to use more builder helpers
and just eat the potential copy.

Fixes: 3180656bbc ("nir: don't use nir_build_alu() with incomplete sources")
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
(cherry picked from commit 711b3358a8)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:48 +00:00
Karol Herbst
c0e5d821e1 rusticl/mesa: only use resource_from_user_memory if the cap is advertised
Fixes some buffer tests on some iris configurations.

Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Tested-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit 240bae6b23)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:48 +00:00
Alyssa Rosenzweig
cf716d4586 nir: disable fast-math for lowering conversions
The lowerings for e.g. f2f16_rtp have carefully written sequences using
Infinity. nir_opt_algebraic will stomp right through this. `feq x, inf`
without an exact flag is basically always a bug. Disable fast math here.
Fixes OpenCL CTS test_half on Iris.

Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
(cherry picked from commit 91550d0709)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:47 +00:00
Yiwei Zhang
74d8362ebd pan/kmod: drop pan_kmod_bo_check_import_flags validation
The passed flags are always zero on the import paths:
- panfrost_bo_import
- panvk_AllocateMemory
- panvk_GetMemoryFdPropertiesKHR

Fixes: 1c7793ea0b ("panvk: Advertise a HOST_CACHED memory type if we have WC maps")
Tested-by: Valentine Burley <valentine.burley@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
(cherry picked from commit 8d25f9821b)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:47 +00:00
Vinson Lee
f25e220841 freedreno/decode: Fix const correctness in get_tex_count
Fix compiler error:

../src/freedreno/decode/cffdec.c:580:7: error: assigning to 'char *'
from 'const char *' discards qualifiers
[-Werror,-Wincompatible-pointer-types-discards-qualifiers]
  580 |    p = strstr(name, "CONST");
      |      ^ ~~~~~~~~~~~~~~~~~~~~~

glibc now provides C23-style type-generic string functions. strstr
returns const char * when passed a const char * argument. Update p
declaration to const since it's only used for offset calculation.
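
A minimal sketch of the pattern, using a hypothetical helper `find_offset` rather than the actual cffdec.c code: when the result of `strstr` is only used to compute an offset, declaring it `const char *` compiles both with older C libraries and with the C23-style const-preserving definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the byte offset of `needle` inside `name`, or -1 if it is absent. */
static ptrdiff_t
find_offset(const char *name, const char *needle)
{
   assert(name != NULL && needle != NULL);

   /* const-qualified: matches what strstr returns for a const input. */
   const char *p = strstr(name, needle);
   return p ? p - name : -1;
}
```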

Fixes: 1ea4ef0d3b ("freedreno: slurp in decode tools")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
(cherry picked from commit bc34a122f3)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:47 +00:00
Khem Raj
bdfae9cc8b glx: fix const qualifier warnings found with C23 glibc support
glibc master has been C23'fying these functions, which is resulting in errors.

Several functions assigned results of bsearch/strstr/strpbrk/memchr to
non-const pointers, triggering -Wincompatible-pointer-types-discards-qualifiers
under clang/gcc with -Werror. Cast bsearch return values where needed and
propagate const correctness for strstr/strpbrk/memchr results.

Removes build failures with strict warning flags without changing behavior.

Signed-off-by: Khem Raj <raj.khem@gmail.com>

[Eric: changed the glxglvnd.c hunk to add the missing `const` instead of casting it away]

Cc: mesa-stable
(cherry picked from commit 268e19378f)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:47 +00:00
Samuel Pitoiset
6febbade40 radv: fix late decompressions for fbfetch with more corner cases
With layers, or custom sample locations for depth.
Found this by inspection.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit ce3539b54f)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:47 +00:00
Iago Toral Quiroga
6e899b3eba nir/opt_vectorize_load_store: allow sizes unaligned with high offset for loads
This was added specifically for vectorized stores, so allow for loads.

Without this, the pass will fail to vectorize 2 consecutive 16-bit loads
into a single 32-bit load.

Fixes: 2ed79f80ba ("nir/load_store_vectorize: Skip new bit-sizes that are unaligned with high_offset")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
(cherry picked from commit f6a2d14008)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:47 +00:00
Tapani Pälli
3442ccdcba anv: skip compressed flag for bo if not supported by modifier
This was not a problem before the compression hint was given to the
kernel, but now that we set it we hit problems when allocating a bo if
the modifier does not support compression.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14625
Fixes: f91de58818 ("anv: Add support to DRM_XE_GEM_CREATE_FLAG_NO_COMPRESSION")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
(cherry picked from commit fc814fa828)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:47 +00:00
Reilly Brogan
66cd1f224c amd,compiler: fix const errors found with C23 glibc support
In glibc 2.43 the strstr function now propagates const to the output, triggering -Wincompatible-pointer-types-discards-qualifiers
under clang/gcc with -Werror.

Fix two of these cases by adding the const qualifier.

cc: mesa-stable

(cherry picked from commit ece5f671b3)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:47 +00:00
Vinson Lee
a871c42e39 compiler/clc: Fix const correctness in libclc_add_generic_variants
Fix compiler error:

../src/compiler/clc/nir_load_libclc.c:266:13: error: initializing
'char *' with an expression of type 'const char *' discards qualifiers
[-Werror,-Wincompatible-pointer-types-discards-qualifiers]
  266 |       char *U3AS1 = strstr(func->name, "U3AS1");
      |             ^       ~~~~~~~~~~~~~~~~~~~~~~~~~~~

glibc now provides C23-style type-generic string functions. strstr
returns const char * when passed a const char * argument. Update U3AS1
declaration to const since it's only used for offset calculation.

Fixes: 4a08ee7ecf ("spirv/libclc: Add generic versions of arithmetic functions")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
(cherry picked from commit 85fd63068e)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:47 +00:00
Christian Gmeiner
cf20e1610c pan/compiler: Fix progress reporting in pan_nir_lower_store_component
lower_store_component() always returns false even though it modifies
NIR instructions (rewrites sources, creates new SSA defs, removes
previous stores). This triggers the "NIR changed but no progress
reported" assertion in nir_shader_intrinsics_pass.

Return true when a store_output or store_per_view_output intrinsic is
processed, since the function always modifies the shader in that case.
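
The contract can be sketched with a toy pass. All names here (`struct instr`, `lower_instr`, `run_pass`) are hypothetical and far simpler than NIR; the sketch only assumes that a per-instruction callback must return true whenever it modifies anything, so that the driver loop can report progress accurately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature IR, not NIR. */
enum op { OP_STORE_OUTPUT, OP_OTHER };

struct instr {
   enum op op;
   int value;
};

/* Returns true exactly when the instruction was modified. */
static bool
lower_instr(struct instr *instr)
{
   if (instr->op != OP_STORE_OUTPUT)
      return false;

   instr->value *= 2; /* placeholder for the real rewrite */
   return true;
}

/* Driver loop: progress is the OR of every callback result. */
static bool
run_pass(struct instr *instrs, size_t count)
{
   assert(instrs != NULL || count == 0);

   bool progress = false;
   for (size_t i = 0; i < count; i++)
      progress |= lower_instr(&instrs[i]);
   return progress;
}
```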

Closes: https://gitlab.freedesktop.org/panfrost/mesa/-/issues/274
Cc: mesa-stable
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
(cherry picked from commit 4938ad435e)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:47 +00:00
Yiwei Zhang
3235b9a3dc venus: remove obsolete asserts for ANB image creation
Those have long been supported by vn_image_deferred_info_init because of
AHB support. For non-aliased ANB image, those are directly passed from
the platform swapchain create info as well. So we just need to drop the
obsolete asserts to make newer Android platform and ANGLE happy.

Cc: mesa-stable
(cherry picked from commit 091c4f43ff)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:47 +00:00
Rudi Heitbaum
6de99bcc31 mesa: retain const qualifier from pointer
Since glibc-2.43:

For ISO C23, the functions bsearch, memchr, strchr, strpbrk, strrchr, strstr, wcschr, wcspbrk, wcsrchr, wcsstr and wmemchr that return pointers into their input arrays now have definitions as macros that return a pointer to a const-qualified type when the input argument is a pointer to a const-qualified type.

https://lists.gnu.org/archive/html/info-gnu/2026-01/msg00005.html

Resolves the following warnings:
    src/mesa/glapi/glapi/gen/enums.c: In function '_mesa_enum_to_string':
    src/mesa/glapi/glapi/gen/enums.c:7799:8: warning: assignment discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
     7799 |    elt = bsearch(& nr, enum_string_table_offsets,
          |        ^

    ../src/egl/main/egldispatchstubs.c: In function 'FindProcIndex':
    ../src/egl/main/egldispatchstubs.c:52:7: warning: initialization discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
       52 |       bsearch(name, __EGL_DISPATCH_FUNC_NAMES, __EGL_DISPATCH_COUNT,
          |       ^~~~~~~
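
A minimal sketch of the kind of change involved, using a hypothetical sorted table `toy_names` and lookup `toy_find_index` (not the Mesa code): the pointer returned by `bsearch` over a const table is stored in a const-qualified pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sorted table of entry point names. */
static const char *const toy_names[] = { "eglBind", "eglCreate", "eglQuery" };

/* Comparator: `key` is the searched string, `elem` points at a table slot. */
static int toy_cmp_name(const void *key, const void *elem)
{
   return strcmp((const char *)key, *(const char *const *)elem);
}

/* Returns the index of `name` in toy_names, or -1 if it is not present. */
int toy_find_index(const char *name)
{
   assert(name != NULL);

   /* The table is const, so the result keeps the const qualifier. */
   const char *const *hit =
      bsearch(name, toy_names, sizeof(toy_names) / sizeof(toy_names[0]),
              sizeof(toy_names[0]), toy_cmp_name);

   return hit ? (int)(hit - toy_names) : -1;
}
```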

Signed-off-by: Rudi Heitbaum <rudi@heitbaum.com>
(cherry picked from commit 1acc96b8cb)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:47 +00:00
Arjob Mukherjee
e8ec898e98 pvr: Fixup for deqp-vk.api 2d.optimal.* conformance
It's no longer an error for depth and stencil formats to have an invalid
accumulator format.

Fixes the following tests:
* dEQP-VK.api.info.image_format_properties.2d.optimal.d16_unorm
* dEQP-VK.api.info.image_format_properties.2d.optimal.d24_unorm_s8_uint
* dEQP-VK.api.info.image_format_properties.2d.optimal.d32_sfloat
* dEQP-VK.api.info.image_format_properties.2d.optimal.d32_sfloat_s8_uint
* dEQP-VK.api.info.image_format_properties.2d.optimal.s8_uint
* dEQP-VK.api.info.image_format_properties.2d.optimal.x8_d24_unorm_pack32

Backport-to: 26.0
Signed-off-by: Arjob Mukherjee <arjob.mukherjee@imgtec.com>
Tested-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
(cherry picked from commit 58c7437d3a)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:47 +00:00
Samuel Pitoiset
5773c7bda6 radv/meta: fix the key for DCC decompress on compute
This could return the graphics DCC pipeline if it was created before,
and crash or potentially hang the GPU.

Found this while working on in-progress VKCTS coverage.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit ad7151f4bf)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:47 +00:00
Zan Dobersek
aba5409384 tu/kgsl: wait-only submit handling should not ignore sparse bind commands
Commit cf4bd2e412 added a fast path for handling no-command submits to
accommodate a kernel behavior quirk. Sparse support was complete before
that change but landed afterwards, so sparse submits that have no
command buffers but do have sparse bind commands took that fast path and
their bind commands went unhandled. The condition for the fast path is
fixed to address that.
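
The corrected condition can be sketched as follows. The struct and field names are hypothetical and do not reflect the actual turnip/kgsl data structures; the point is only that a submit qualifies as wait-only when it carries neither command buffers nor sparse bind commands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a queue submit. */
struct toy_submit {
   unsigned cmd_buffer_count;
   unsigned sparse_bind_count;
   unsigned wait_count;
};

/* A submit may take the wait-only fast path only if it has no work at all:
 * checking cmd_buffer_count alone would also catch sparse-only submits. */
bool toy_is_wait_only(const struct toy_submit *submit)
{
   assert(submit != NULL);
   return submit->cmd_buffer_count == 0 && submit->sparse_bind_count == 0;
}
```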

Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Fixes: 71ef46717c ("tu/kgsl: Add support for sparse binding")
(cherry picked from commit 5b33ee9f0b)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:47 +00:00
Calder Young
cd67481fd2 anv: Avoid dumping BVH before command buffer is submitted
Fixes a race condition where a BVH will be dumped before its command buffer is
actually submitted if a different command buffer completes between the time the
BVH dump is recorded and the time the command buffer is actually submitted.

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Fixes: 1b55f101 ("anv/bvh: Dump BVH synchronously upon command buffer completion")
(cherry picked from commit 95e471e558)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:47 +00:00
Mel Henning
c87b0a77a1 zink: Emit float controls for preserve_denorms too
Fixes: 6afa1b3bad ("zink: handle denorm preserve execution modes")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
(cherry picked from commit 9189a70598)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:47 +00:00
Tapani Pälli
bb9fef071e iris: set DisableAnyMCTRresponsefix to zero on init
This is to make sure early culling related Wa_16020518922 is enabled
properly.

Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 331238e44e)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:47 +00:00
Tapani Pälli
e1446659f8 anv: set DisableAnyMCTRresponsefix to zero on init
This is to make sure early culling related Wa_16020518922 is enabled
properly.

Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14204
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 9aaed82543)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:47 +00:00
Tapani Pälli
3988bebbe9 intel/genxml: add CHICKEN_RASTER_2 with required bit for Xe3
Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 61b5e91bba)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:47 +00:00
Mary Guillemard
3e4b65fc7a nvk: Reenable compression support with nouveau 1.4.2
Now that the small/large pages race is fixed, we can safely re-enable it
when the kernel side reports 1.4.2 support.

Fixes: f3c53cf66b ("nvk: Disable large pages for now")
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
(cherry picked from commit b524bf368e)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:47 +00:00
Aitor Camacho
eefede5549 kk: Fix disabling workaround 4
Fixes: 67d05f71e9 ("kk: Track fragment helper status since Metal does not correctly demote them")

Signed-off-by: Aitor Camacho <aitor@lunarg.com>
(cherry picked from commit 29900e8229)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39828>
2026-02-11 14:54:47 +00:00
Erico Nunes
5b7b66e43b Revert "ci: lima farm maintenance"
This reverts commit ca1d59d813.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39696>
(cherry picked from commit f3131bc145)
2026-02-11 15:51:45 +01:00
Eric Engestrom
3fe056f178 .pick_status.json: Update to d7814bcad0 2026-02-11 14:21:56 +01:00
Eric Engestrom
4ac24ba7e5 VERSION: bump for 26.0.0-rc3
2026-02-04 19:35:07 +01:00
Bernd Kuhls
f83e86c29f blake3: add blake3_neon.c only for little endian archs
Fixes build error on big endian archs:

Build machine cpu family: x86_64
Build machine cpu: x86_64
Host machine cpu family: aarch64
Host machine cpu: cortex-a53
Target machine cpu family: aarch64
Target machine cpu: cortex-a53
[...]
../src/util/blake3/blake3_neon.c:6:2: error: #error "This implementation only supports little-endian ARM."
    6 | #error "This implementation only supports little-endian ARM."

as detected by buildroot autobuilders:
https://autobuild.buildroot.net/results/efd/efd07d97df4e0c1ceb07fc26e17898afef5435b9/build-end.log

For reference:
$ grep -i endian output/build/mesa3d-25.3.4/buildroot-build/cross-compilation.conf
endian = 'big'

Signed-off-by: Bernd Kuhls <bernd@kuhls.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39681>
(cherry picked from commit 248b818407)
2026-02-04 18:39:35 +01:00
Samuel Pitoiset
1e415d1bdf radv: emit pending flushes after late decompressions with fbfetch
This is needed if the rendering state is inherited in the secondary;
otherwise nothing waits for the pending flushes after a decompression
pass. One more argument to stop delaying this.

Fixes
dEQP-VK.renderpasses.dynamic_rendering.partial_secondary_cmd_buff.local_read.*

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39678>
(cherry picked from commit 13c9e529bd)
2026-02-04 18:39:35 +01:00
Samuel Pitoiset
870140c527 radv: disable unordered submits when SQTT queue events are enabled
Otherwise the QueuePresent event is missing and RGP is confused.

Fixes: 82d06b58ad ("radv: use vk_drm_syncobj_copy_payloads")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39158>
(cherry picked from commit 83ca338e37)
2026-02-04 18:39:35 +01:00
Hyunjun Ko
a7d0da012e anv/video: disable encoder on untested platforms
The encoder has not been tested enough on platforms newer than Gen12.
It turns out not to work on DG2, for example.

Cc: mesa-stable
Closes: #14449

Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39676>
(cherry picked from commit d2c24a0d8b)
2026-02-04 18:39:35 +01:00
Loïc Molinari
51ed940bb8 panfrost: Fix clean_pixel_write_enable forced check for AFBC
Clean tiles must actually be written back for AFBC buffers (color,
z/s) when either of the effective tile size dimensions is smaller
than the superblock dimension. This commit fixes the current check,
which compares the effective tile size to the superblock size.
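
A per-dimension check along these lines can be sketched as below. The function name and parameters are hypothetical, and comparing width against width and height against height is an assumption about how the dimensions pair up.

```c
#include <stdbool.h>

/* Hypothetical check: force clean-tile write-back when the effective tile
 * is narrower or shorter than the AFBC superblock. */
bool toy_force_clean_tile_write(unsigned tile_w, unsigned tile_h,
                                unsigned sb_w, unsigned sb_h)
{
   /* Each dimension is compared on its own; an overall size comparison
    * could miss a tile that is large in one dimension only. */
   return tile_w < sb_w || tile_h < sb_h;
}
```

For illustration, a 32x8 tile against a 16x16 superblock covers the same number of pixels, yet its height is smaller than the superblock height, so the per-dimension check returns true.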

Fixes: 762a0f4133 ("panfrost: Add the concept of render block")
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38422>
(cherry picked from commit 098b69a05c)
2026-02-04 18:39:35 +01:00
Valentine Burley
db6cbb8410 tu: Fix memory leak of patchpoints_ctx in dynamic rendering
tu_CmdBeginRendering was unconditionally allocating a new
patchpoints_ctx. When resuming a render pass chain, this overwrote the
existing context from the suspended pass, leaking it and all associated
FDM patchpoints.
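
The leak pattern and one way to avoid it can be sketched with a toy command buffer. Everything named `toy_*` is hypothetical, and the actual fix in turnip may be structured differently; the sketch only shows that a resumed pass reuses the existing context instead of overwriting the pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical command buffer holding one heap-allocated context. */
struct toy_cmd_buffer {
   void *patchpoints_ctx;
   unsigned ctx_allocations; /* bookkeeping for the example only */
};

/* Returns false on allocation failure. */
bool toy_begin_rendering(struct toy_cmd_buffer *cmd, bool resuming)
{
   assert(cmd != NULL);

   /* Resuming a suspended pass: keep the context that already exists. */
   if (resuming && cmd->patchpoints_ctx != NULL)
      return true;

   /* A fresh chain is assumed to start without a live context. */
   assert(cmd->patchpoints_ctx == NULL);
   cmd->patchpoints_ctx = malloc(64);
   if (cmd->patchpoints_ctx == NULL)
      return false;

   cmd->ctx_allocations++;
   return true;
}

/* Releases the context once the whole chain has finished. */
void toy_end_rendering_chain(struct toy_cmd_buffer *cmd)
{
   assert(cmd != NULL);
   free(cmd->patchpoints_ctx);
   cmd->patchpoints_ctx = NULL;
}
```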

Fixes: 0dd06c74d6 ("tu: Fix FDM patchpoint memory leak")
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39639>
(cherry picked from commit d4ad50752f)
2026-02-04 18:39:35 +01:00
Konstantin Seurer
b32cd7c265 radv/bvh: Make sure internal nodes are collapsed when possible
Avoiding NaNs should have the same effect but it's good practice to not
rely on float OPs for correctness.

Fixes: 95a89f7 ("radv: Report smaller bvh sizes when possible")
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39640>
(cherry picked from commit 24a1e3d8c2)
2026-02-04 18:39:35 +01:00
Konstantin Seurer
4b86b5e53d vulkan: Make sure no NaNs end up in the BVH
Fixes: 2032268 ("vulkan: Avoid NAN in the IR BVH")
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39640>
(cherry picked from commit 60c1e4e3e6)
2026-02-04 18:39:35 +01:00
Konstantin Seurer
cd1a3b7482 radv/rra: Fix nullptr dereference
cc: mesa-stable

Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39640>
(cherry picked from commit 2f3a9c10f4)
2026-02-04 18:39:35 +01:00
Lucas Stach
4cae263356 etnaviv: idle the pipe before flushing texture caches
As seen in the Vivante kernel driver function gckHARDWARE_Flush(),
GPUs without gcvFEATURE_TEX_CACHE_FLUSH_FIX, which translates to
all GPUs before halti5, need a full stall of the GPU pipeline
before flushing the texture caches.

This fixes sporadic GPU hangs observed in use-cases where texture
data updates are intermixed with draws without any state changes
that might necessitate a stall.

Cc: mesa-stable
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39673>
(cherry picked from commit 643ba9a784)
2026-02-04 18:39:35 +01:00
Emma Anholt
4107091cfe ci/tu: Clear stale xfails from the nightlies.
Fixes: 63243bcc3e ("tu: Fix TU_DRAW_STATE_VB size")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39568>
(cherry picked from commit c0a4d3ef1e)
2026-02-04 18:39:35 +01:00
Emma Anholt
26a8c34ff4 lima/ci: Remove erroneous skips.
When you get UnexpectedResult(skip), that means take your xfail out
because it's now skipping.  Which is the fix, instead of "take the xfail
out and add it to manual skips".

Fixes: e54440d15e ("Uprev Piglit to a3826de3c26a279599d15b018a9a3e75ca46f4f8")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39568>
(cherry picked from commit 42e17a948e)
2026-02-04 18:39:35 +01:00
Juan A. Suarez Romero
86f442db75 broadcom/cle: bump up gen version for v3d
The generation version for V3D XML package was marked as 3.3, but
actually we removed all the code supporting this generation, and the
generations we support now are from 4.2 onwards.

So we bump up the generation version.

Fixes: 9c4829473a ("broadcom/cle: remove v33 and v41 from xml definition")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39577>
(cherry picked from commit 5a85b3d9f4)
2026-02-04 18:39:35 +01:00
Qiang Yu
67ad90c108 radeonsi: fix mesh shader outputs kill
Mesh shader uses store per vertex output for point size
and store per primitive output for layer id.

This fixes gpu-ratemeter running slowly in the point size and
layer id kill cases when a mono shader is used, which expects
these outputs to be killed.

Also gather fragment shader per primitive input info
to kill mesh shader per primitive output.

Fixes: e6e21dfbf2 ("radeonsi: kill outputs for mesh shader")
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39644>
(cherry picked from commit f20cd07e21)
2026-02-04 18:39:34 +01:00
Nanley Chery
a7ace43e9a anv: Don't set the display flag on WSI blit sources
These images are never used with scanout hardware.

Fixes: 2c00b7d1e6 ("anv: flag WSI images as scanout images for ISL")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39618>
(cherry picked from commit c429d7479e)
2026-02-04 18:39:34 +01:00
Nanley Chery
d6d5071a84 anv: Treat non-WSI PRESENT_SRC as TRANSFER_SRC
For non-WSI images, explicitly map VK_IMAGE_LAYOUT_PRESENT_SRC_KHR to
VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL in anv_layout_to_aux_state().

Before this patch, the function passed PRESENT_SRC into
vk_image_layout_to_usage_flags() and got a return value of 0 from it
(that function expects that layout to be explicitly handled by the
caller). This caused the logic dependent on the return value to be
unreliable.
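
The remapping can be sketched with toy enums. These are not the Vulkan or anv definitions, and how WSI images are handled is left out; the sketch only shows the layout being substituted before the usage lookup so that the lookup never sees the layout it cannot translate.

```c
#include <stdbool.h>

/* Toy stand-ins for image layouts and usage bits. */
enum toy_layout {
   TOY_LAYOUT_GENERAL,
   TOY_LAYOUT_TRANSFER_SRC,
   TOY_LAYOUT_PRESENT_SRC,
};

enum {
   TOY_USAGE_TRANSFER_SRC = 1 << 0,
   TOY_USAGE_SAMPLED = 1 << 1,
};

/* Generic lookup: returns 0 for PRESENT_SRC, which callers must handle. */
unsigned toy_layout_to_usage(enum toy_layout layout)
{
   switch (layout) {
   case TOY_LAYOUT_GENERAL:
      return TOY_USAGE_TRANSFER_SRC | TOY_USAGE_SAMPLED;
   case TOY_LAYOUT_TRANSFER_SRC:
      return TOY_USAGE_TRANSFER_SRC;
   case TOY_LAYOUT_PRESENT_SRC:
      return 0;
   }
   return 0;
}

/* Caller: substitutes TRANSFER_SRC for PRESENT_SRC on non-WSI images. */
unsigned toy_usage_for_aux_state(enum toy_layout layout, bool is_wsi_image)
{
   if (layout == TOY_LAYOUT_PRESENT_SRC && !is_wsi_image)
      layout = TOY_LAYOUT_TRANSFER_SRC;

   return toy_layout_to_usage(layout);
}
```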

Fixes: c5cad407f8 ("anv: handle non-wsi images in anv_layout_to_aux_state")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39618>
(cherry picked from commit f616d4fb2a)
2026-02-04 18:39:34 +01:00
Nanley Chery
7571128959 anv: Fix clear state of WSI blit sources during presentation
On gfx12+, this fixes assert failures in hybrid GPU scenarios.

Fixes: 811c413f98 ("anv: Don't return the Xe2+ fast-clear type early")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39618>
(cherry picked from commit 476f461ce7)
2026-02-04 18:39:34 +01:00
Nanley Chery
f4e0da9e07 anv: Don't return the Xe2+ fast-clear type early
Don't return early from anv_layout_to_fast_clear_type() for Xe2+. We'll
need to make more use of the function for some MCS changes in later
commits.

Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
(cherry picked from commit 811c413f98)
2026-02-04 18:39:34 +01:00
Patrick Lerda
14615a0add r600: improve vs_as_ls switch reliability
This change updates the vs_as_ls switch logic to make it
reliable. It resets the dirty flag when the switch
happens. It also uses evergreen_emit_vs_constant_buffers()
to try to update again some of the states which could be
lost otherwise.

This change fixes some "flakes". These tests needed previously
to be executed twice to set the hardware in the proper state
for the test to pass. It also fixes the main issue of the
texture_view.view_sampling test.

This change was tested on palm and cayman. Here are the tests
which are now utterly fixed:
khr-gl4[3-6]/stencil_texturing/functional: fail pass
khr-gl4[4-6]/texture_cube_map_array/texture_size_tesselation_ev_sh: fail pass
khr-gles31/core/texture_cube_map_array/texture_size_tesselation_ev_sh: fail pass
khr-glesext/texture_cube_map_array/texture_size_tesselation_ev_sh: fail pass

Fixes: 25f96c1120 ("r600: hook up constants/samplers/sampler view for tessellation")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Acked-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39269>
(cherry picked from commit 9c5e15e6f5)
2026-02-04 18:39:34 +01:00
Christian Gmeiner
8f6282d846 meson: Restore .clang-format for ninja clang-format target
The empty .clang-format file in the project root is required for meson
to generate the clang-format target. It was accidentally deleted.

Fixes: efe60d2940 ("intel: remove unused show_shader_stage debug option")

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39648>
(cherry picked from commit 449261b6ba)
2026-02-04 18:39:34 +01:00
Mel Henning
05889250e6 nvk: Report additional host_image_copy layouts
Fixes dEQP-VK.image.host_image_copy.properties.properties
on VK CTS 1.4.5

Fixes: d5df263ac9 ("nvk: Enable VK_EXT_host_image_copy")
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39634>
(cherry picked from commit 0e9d29f518)
2026-02-04 18:39:34 +01:00
Natalie Vock
872c536a88 radv/rt: Fix discardable attributes on chit and traversal shaders
It was incorrect to mark chit/miss arguments as discardable without
the equivalent in the traversal shader. Also, tail calls with modified
parameters that aren't marked discardable are incorrect.

This could lead to random corruption by clobbering parameter values
across two levels of nested calls: A Raygen shader calls traversal,
expecting e.g. the ray tMax parameter to be preserved. Traversal
overwrites the parameter's register with the hit t and tail-calls chit,
which immediately returns to raygen. Now the raygen shader still has the
clobbered tMax (which is actually the ray hit t) - if it calls traversal
multiple times, the second traversal iteration may use the previous
ray's hit t as tMax instead of the intended value.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39579>
(cherry picked from commit 3275be503c)
2026-02-04 18:39:34 +01:00
Natalie Vock
8ab3b18cd7 radv/rt: Fix some tail-call compatibility checks
There were two issues here:
1. Tail calls where the tail-callee receives modified parameters are
hazardous and only work if the parameter is return or discardable.
Otherwise, the caller of the function that executes the tail-call may
not expect some of the parameters to be clobbered.
2. There was also an indexing confusion with the call instruction vs.
call signature parameters. The call instruction has not been adapted
to the new lowered signatures, where the system args are prepended. To
make things clearer, split the loop into two, one iterating over
parameters in the call signature and one for parameters of the call
instruction.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39579>
(cherry picked from commit 0d7705c206)
2026-02-04 18:39:34 +01:00
Natalie Vock
0753012766 aco: Don't exclude discardable parameters from register preservation
The original semantic of discardable parameters was "okay, nothing
actually uses this parameter, feel free to clobber it", but we were
only using it with tail calls from a function without discardable
parameters, which was broken.

Instead, slightly change the use-case and utilize the "discardable"
attribute to mark parameters that the callee will clobber in a tail
call. This makes doing tail calls safe when the tail callee receives a
modified set of parameters.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39579>
(cherry picked from commit ad23e02a28)
2026-02-04 18:39:34 +01:00
Natalie Vock
7b1c9adfea radv/rt: Refactor shader group stack size calculation to include traversal stack
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39579>
(cherry picked from commit 62254ab0be)
2026-02-04 18:39:34 +01:00
Mel Henning
d46967fafb nvk: Initialize SET_ALPHA_TO_COVERAGE_OVERRIDE
This matches the initialization that the proprietary driver does.

Fixes dEQP-VK.query_pool.discard.*.alpha_to_coverage* on vk cts 1.4.5

Cc: mesa-stable
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39621>
(cherry picked from commit 8d7f14620b)
2026-02-04 18:39:34 +01:00
Konstantin Seurer
55043ae265 vulkan: Limit the number of LBVH invocations
Fixes: 0817551 ("vulkan: Handle inactive primitives with LBVH builds")
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39569>
(cherry picked from commit 529c83a134)
2026-02-04 18:39:34 +01:00
Valentine Burley
102b3d8008 tu: Handle VkDrmFormatModifierPropertiesList2EXT
Expose DRM format modifiers via VkDrmFormatModifierPropertiesList2EXT.
VVL is one notable user.

This is required for VK_EXT_image_drm_format_modifier when
VK_KHR_format_feature_flags2 is supported.

Cc: mesa-stable
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39600>
(cherry picked from commit e185f40fc3)
2026-02-04 18:39:34 +01:00
Karol Herbst
79f909808c clc: fix compile compatibility with LLVM-22
See d090311aa7

Cc: mesa-stable
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39374>
(cherry picked from commit dc03f94e07)
2026-02-04 18:39:34 +01:00
Karol Herbst
ca428e3b3c nir: fix nir_fixup_is_exported for LLVM-22
Starting with LLVM-22 we won't see the kernel wrapper anymore, and this
is a trivial fix to get around this.

See: 5458eb2511

Cc: mesa-stable
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39374>
(cherry picked from commit 24d20df3d6)
2026-02-04 18:39:34 +01:00
Karol Herbst
84566763c2 clc: enable generic address space and seq_cst and device scope atomic features
This is going to be required with LLVM-22.

See 423bdb2bf2

Cc: mesa-stable
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39374>
(cherry picked from commit 6eda573a8a)
2026-02-04 18:39:33 +01:00
Karol Herbst
05c679d37b clc: support some atomic and generic address space features
Cc: mesa-stable
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39374>
(cherry picked from commit 01e1392139)
2026-02-04 18:39:33 +01:00
Karol Herbst
c6f8d2ef92 clc: reorder headers to fix compilation errors due to UNUSED
Cc: mesa-stable
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39374>
(cherry picked from commit 7f9a7ed553)
2026-02-04 18:39:33 +01:00
Lars-Ivar Hesselberg Simonsen
4a3a3a7d84 panfrost/bi: Fix unbound texel buffers
In case of texel buffers that are read in the shader, but not bound by
the application, the current implementation would incorrectly try to
read from non-existent buffers.

To ensure this does not happen, this change sets the format for any
unbound attributes to CONST_0000, which will kill any actual
reads/writes and always return zeroes.

This fixes the following two tests:
- spec@arb_shading_language_420pack@active sampler conflict
- spec@arb_texture_buffer_object@render-no-bo

Fixes: a21ee564e2 ("pan/bi: Make texel buffers use Attribute Buffers")
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39431>
(cherry picked from commit aec8132b8b)
2026-02-04 18:39:33 +01:00
David Rosca
525cce7c2a radv/video: Fix maxActiveReferencePictures for H265 decode
Also change to use H265 constant for maxDpbSlots (both values for H264 and H265
are the same).

Fixes: ee535aa039 ("radv: video: rework maxActiveReferenceSlot/MaxDpbSlots")
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39609>
(cherry picked from commit 7607aeefa6)
2026-02-04 18:39:33 +01:00
Eric Engestrom
5ec65a4378 Revert "meson: static link spirv-tools for darwin"
This reverts commit f21d0f2cbe.

This causes issues with other platforms trying to do static builds.

A better option is for everyone to use `meson setup --prefer-static`.

Fixes: f21d0f2cbe ("meson: static link spirv-tools for darwin")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14751
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39613>
(cherry picked from commit 342a5ba44e)
2026-02-04 18:39:33 +01:00
Samuel Pitoiset
a5d92b79fa radv: fix tracking of pipelines used in secondaries
This is just wrong if the secondary uses ESO because the emitted
pipelines would be NULL in the secondary, but if the app re-binds
the same pipeline in the primary it would consider it as already
emitted. A sequence like this would break:

CmdBindPipeline(compute)
CmdDispatch()
CmdExecuteCommands() --> with ESO compute
CmdBindPipeline(compute)
CmdDispatch()

This tracking is probably useless anyways because it's unlikely that
apps will rebind the same pipeline right after CmdExecuteCommands() but
let's keep it because this is a bugfix.
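
The failure mode in the sequence above can be modelled with a toy state tracker. The names are hypothetical and the real radv fix may work differently; here the cached "already emitted" pipeline is simply dropped after executing a secondary, so a later rebind of the same pipeline is emitted again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical primary command buffer state. */
struct toy_cmd_state {
   const void *bound_compute;   /* what the application bound last */
   const void *emitted_compute; /* what was last programmed */
   unsigned emits;              /* number of pipeline emissions */
};

void toy_bind_compute(struct toy_cmd_state *st, const void *pipeline)
{
   assert(st != NULL);
   st->bound_compute = pipeline;
}

/* Emits the bound pipeline only if it differs from the cached one. */
void toy_dispatch(struct toy_cmd_state *st)
{
   assert(st != NULL);
   if (st->bound_compute != st->emitted_compute) {
      st->emitted_compute = st->bound_compute;
      st->emits++;
   }
}

/* The secondary may have programmed other shaders (for example shader
 * objects), so the cached value can no longer be trusted afterwards. */
void toy_execute_secondary(struct toy_cmd_state *st)
{
   assert(st != NULL);
   st->emitted_compute = NULL;
}
```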

Fixes
dEQP-VK.api.command_buffers.pipeline_shader_object_mix_with_secondaries.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39587>
(cherry picked from commit 9ad02b5724)
2026-02-04 18:39:33 +01:00
Samuel Pitoiset
7eb9d75017 radv: zero-initialize image view objects
Mostly to make sure that color/depth descriptors are zero-initialized
in case applications are missing the usage flags. In this case, they
will be considered as null descriptors.

This hides the issue in
https://gitlab.freedesktop.org/mesa/mesa/-/issues/14637
but the real fix has to be in the Steam Overlay.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39585>
(cherry picked from commit fa4da581c6)
2026-02-04 18:39:33 +01:00
Hyunjun Ko
e73e4e1554 anv/video: Compute AV1 tile positions internally
The pMiColStarts/pMiRowStarts arrays from applications may have
incorrect units. Instead of using them directly, compute the tile
start positions in superblock units internally based on the tile
dimensions.

Cc: mesa-stable
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39471>
(cherry picked from commit 8e9fec8e40)
2026-02-04 18:39:33 +01:00
Hyunjun Ko
162ef4da2c anv/video: fix a typo in Vulkan AV1 decoding.
Cc: mesa-stable
Fixes: e510efed05d ("anv: support in-loop super resolution for AV1 decoding")
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39471>
(cherry picked from commit 8004f46466)
2026-02-04 18:39:33 +01:00
Rhys Perry
d3a67ee1d9 radv: fix when incomplete rt pipeline libraries are loaded from cache
It might be that the radv_pipeline_cache_lookup_nir_handle() in
radv_ray_tracing_pipeline_cache_search() fails but we will later need the
NIR. If rt_stages[i].shader was non-NULL, then we would not have created
the NIR.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.2
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38263>
(cherry picked from commit 89eefdcadb)
2026-02-04 18:39:33 +01:00
Olivia Lee
dc140f5500 hk: fix hk_passthrough_gs_key size computation
The non-dynamic members of xfb_info are already included in
sizeof(hk_passthrough_gs_key), so adding nir_xfb_info_size counts them
twice. Because of this we were including uninitialized memory in the key
in hk_handle_passthrough_gs, which is undefined behavior.
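
The double counting can be illustrated with a toy key that ends in a flexible array. The struct layout below is hypothetical and simplified compared to hk_passthrough_gs_key; it only shows that `sizeof` already covers the fixed fields, so only the dynamic tail is added.

```c
#include <assert.h>
#include <stddef.h>

struct toy_xfb_output {
   unsigned buffer;
   unsigned offset;
};

/* Hypothetical key: fixed fields followed by a dynamic tail. */
struct toy_key {
   unsigned prim;
   unsigned output_count;           /* fixed part, counted by sizeof */
   struct toy_xfb_output outputs[]; /* dynamic part, not counted by sizeof */
};

/* Bytes that are actually initialized and should be hashed or compared. */
size_t toy_key_size(unsigned output_count)
{
   /* Adding a size that also includes the fixed fields would count them
    * twice and extend the key into uninitialized memory. */
   return sizeof(struct toy_key) +
          (size_t)output_count * sizeof(struct toy_xfb_output);
}
```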

Fixes: 5bc8284816 ("hk: add Vulkan driver for Apple GPUs")
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39574>
(cherry picked from commit d6745b358d)
2026-02-04 18:39:33 +01:00
Tapani Pälli
d89eceaa2c anv: route clear operations on compute to companion
This fixes a bunch of CTS tests hitting issues when attempting
anv_image_mcs_op with compute.

Fixes: ab9d3528dc ("anv: fix queue check in anv_blorp_execute_on_companion on xe3")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39581>
(cherry picked from commit 85978ccd28)
2026-02-04 18:39:33 +01:00
Zan Dobersek
be9d5d6508 tu: allocate transient attachments used for LRZ
When proceeding with rendering, any transient attachment that will be used
as LRZ buffer should also be allocated. With GMEM rendering, these
attachments otherwise remained unloaded and subsequent LRZ clears produced
GPU faults.

Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Fixes: 764b3d9161 ("tu: Implement transient attachments and lazily allocated memory")
Fixes: #14604
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39535>
(cherry picked from commit b6a049ea4b)
2026-02-04 18:39:33 +01:00
Mike Blumenkrantz
5dddf74a34 ntv: emit ViewIndex with flat for fragment stage
cc: mesa-stable

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39606>
(cherry picked from commit 999aaac12e)
2026-02-04 18:39:33 +01:00
Nick Hamilton
d7a47c1627 pvr: Fix the isp samples per tile calculation
The samples per tile calculation was incorrect for sample count 4 and 8.

Fix:
dEQP-VK.pipeline.monolithic.multisample.std_sample_locations.draw.depth.samples_4.*
dEQP-VK.pipeline.monolithic.multisample.std_sample_locations.draw.stencil.samples_4.*

Backport-to: 26.0

Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39580>
(cherry picked from commit 9f9788330e)
2026-02-04 18:39:33 +01:00
Lionel Landwerlin
844a79b474 vulkan/wsi/direct: remove VkDisplay created from GetDrmDisplayEXT on ReleaseDisplayEXT
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39556>
(cherry picked from commit 1112c1461d)
2026-02-04 18:39:33 +01:00
Georg Lehmann
1f5f2cc952 nir/opt_algebraic: use correct syntax to create exact fsat
Fixes: 3b06824e4c ("nir/opt_algebraic: optimize some post peephole select patterns")

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39586>
(cherry picked from commit d8ef28671d)
2026-02-04 18:39:33 +01:00
Tomeu Vizoso
8111b41eb4 dril: don't build a rocket_dri.so
As Rocket has no graphics capability.

Fixes: 5b829658f7 ("rocket: Initial commit of a driver for Rockchip's NPU")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38532>
(cherry picked from commit a5daecafd3)
2026-02-04 18:39:32 +01:00
Eric Engestrom
1c4642663b .pick_status.json: Mark a66d19b691 as denominated 2026-02-04 18:39:32 +01:00
Eric Engestrom
7c3ff4cecc .pick_status.json: Update to 248b818407 2026-02-04 18:39:32 +01:00
Eric Engestrom
42f03572d1 VERSION: bump for 26.0.0-rc2
2026-01-28 17:41:52 +01:00
Ella Stanforth
8808ec23fa pvr/csbgen: fix packing multiple addresses
Cc: mesa-stable
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39231>
(cherry picked from commit 7be87ca82a)
2026-01-28 16:18:00 +01:00
Nick Hamilton
28f3e82f2d pco: Fix for atomic operations on an image buffer
Within the driver, buffers are treated as 2D, as sampling them as 1D
will run into HW restrictions on max size.

The compiler does the same; however, for atomic image ops the address
is manually calculated, and doing this via the 2D path leads to
incorrect offsets.

The fix is to treat buffers as 1D for atomic ops which calculates
the correct offsets for the operations.

Fix deqp:
dEQP-VK.image.atomic_operations.add.buffer.*
dEQP-VK.image.atomic_operations.and.buffer.*
dEQP-VK.image.atomic_operations.compare_exchange.buffer.*
dEQP-VK.image.atomic_operations.dec.buffer.*
dEQP-VK.image.atomic_operations.exchange.buffer.*
dEQP-VK.image.atomic_operations.inc.buffer.*
dEQP-VK.image.atomic_operations.max.buffer.*
dEQP-VK.image.atomic_operations.min.buffer.*
dEQP-VK.image.atomic_operations.or.buffer.*
dEQP-VK.image.atomic_operations.sub.buffer.*
dEQP-VK.image.atomic_operations.xor.buffer.*

Fixes: 6dc5e1e109 ("pco: fully support Vulkan 1.2 image atomics")

Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39521>
(cherry picked from commit 079377c767)
2026-01-28 16:18:00 +01:00
Olivia Lee
6f2d97ef41 Revert "panvk: advertise VK_EXT_primitives_generated_query on v10+"
This reverts commit 6eadcaa851.

VK_EXT_primitives_generated_query has a dependency on
VK_EXT_transform_feedback, which we do not implement yet. This is
breaking the android CTS. It will be reenabled once transform feedback
is in.

Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39547>
(cherry picked from commit 4959f45e99)
2026-01-28 16:17:59 +01:00
Iván Briano
cb8a069e24 brw: fix local_invocation_index with quad derivatives on mesh/task shaders
For mesh/task shaders, the thread payload provides a local invocation
index, but it's always linear so it doesn't give the correct value when
quad derivatives are in use.
The lowering pass where all of this is done correctly for compute
shaders assumes load_local_invocation_index will be lowered in the
backend for mesh/task. It calculates the values for the quads correctly
but then avoids replacing the original intrinsic, so we are left with
the wrong results.

Add an Intel-specific intrinsic and always lower the generic one to that
(or whatever else was calculated) to avoid ambiguities and fix the value
for quad derivatives.

Fixes future CTS tests using mesh/task shaders under:
dEQP-VK.spirv_assembly.instruction.compute.compute_shader_derivatives.*

Fixes: d89bfb1ff7 ("intel/brw: Reorganize lowering of LocalID/Index to handle Mesh/Task")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39276>
(cherry picked from commit 5b48805b42)
2026-01-28 16:17:59 +01:00
Georg Lehmann
8d9349e75b aco: disable DPP for rev integer subs and shifts
It is not documented anywhere, but at least on gfx12 and gfx10.3
DPP is applied to src1 instead of src0.
This might be useful for shifts, but to be safe just disable DPP
completely for now.

Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14739

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39516>
(cherry picked from commit 140ca3bb50)
2026-01-28 16:17:59 +01:00
Georg Lehmann
6553c4ce40 aco: add a helper function for non supported DPP opcodes
Cc: mesa-stable

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39516>
(cherry picked from commit 8e99bf5380)
2026-01-28 16:17:59 +01:00
Eric Engestrom
e68f96eb1f nir/meson: fix cpp_args of nir_opt_algebraic_pattern_tests
Fixes: 4c30c44b75 ("nir: Generate unit tests for nir_opt_algebraic")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39550>
(cherry picked from commit d12e3454e6)
2026-01-28 16:17:59 +01:00
Nanley Chery
c2eca1a1cc anv: Fix the fast clear type for FCV writes
We started allowing non-default clear colors with FCV in commit
cd8e120b97. When rendering to an image with FCV, set the fast-clear
type to ANV_FAST_CLEAR_ANY if the image properties allow such
fast-clears.

Fixes: cd8e120b97 ("anv: Allow more single subresource fast-clears with FCV")
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
(cherry picked from commit ce196c9de5)
2026-01-28 16:17:59 +01:00
Nanley Chery
f3db65d95e anv: Update predicated resolve documentation
* Don't mention gfx7-8 due to the hasvk split.
* Account for the array of clear colors.

Fixes: 0e6b132a75 ("anv: Access more colors in fast_clear_memory_range")
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
(cherry picked from commit e7854d06a5)
2026-01-28 16:17:59 +01:00
Nanley Chery
943fd8152a iris: Use the CLEAR state on Xe2+ for MCS
On Xe2+, HSD 14011946253 and the related documents explain that MCS
still only supports a single clear color.

Fixes: df006bba02 ("iris: Update aux state for color fast clears (xe2)")
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
(cherry picked from commit 6c6b2d8f30)
2026-01-28 16:17:59 +01:00
Nanley Chery
f3adaccb4b iris: Set missing flags on clear color changes
When changing the clear color without a fast clear, use dirty bits to
ensure that surfaces with inline clear colors are updated and that
partial resolves are done as needed.

Remove the flags at the bottom of fast_clear_color() as
blorp_fast_clear() already sets them for us.

Fixes: 64d861b700 ("iris: Skip some fast-clears even on color changes")
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
(cherry picked from commit 3b642f7456)
2026-01-28 16:17:59 +01:00
Nanley Chery
a680c20d40 intel/isl: Fix QPitch of arrayed MCS
From RENDER_SURFACE_STATE::AuxiliarySurfaceQPitch on BDW+,

   This field must be set to an integer multiple of the Surface
   Vertical Alignment

Accomplish this by aligning the height of each MCS layer to main
surface's vertical alignment. Prevents the following test group from
failing on Xe2 when a future commit enables multi-layer fast-clears in
anv:

   dEQP-VK.api.image_clearing.*.
   clear_color_attachment.multiple_layers.
   *_clamp_input_sample_count_*

The main test I used to debug this:

   dEQP-VK.api.image_clearing.core.
   clear_color_attachment.multiple_layers.
   a8b8g8r8_unorm_pack32_64x11_clamp_input_sample_count_2
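
The alignment rule quoted above can be sketched as follows. This is a
minimal illustration, not the isl code: the helper names are hypothetical,
the pitch is counted in rows, and any other padding is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Round `value` up to the next integer multiple of `alignment`.
 * Hypothetical helper for illustration; not the isl implementation. */
static uint32_t align_up_u32(uint32_t value, uint32_t alignment)
{
   assert(alignment != 0);
   return (value + alignment - 1) / alignment * alignment;
}

/* Per-layer pitch (in rows) of an arrayed MCS when each layer is padded
 * to the main surface's vertical alignment. */
static uint32_t mcs_qpitch_rows(uint32_t layer_height, uint32_t main_valign)
{
   return align_up_u32(layer_height, main_valign);
}
```

With an 11-row layer (as in the 64x11 test above) and an assumed vertical
alignment of 4, the per-layer pitch becomes 12 rows, which is an integer
multiple of the alignment.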

Backport-to: 25.3
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
(cherry picked from commit eb4a581e44)
2026-01-28 16:17:59 +01:00
Mel Henning
d20d30442c nvk: Disable large pages for now
Reviewed-by: Mary Guillemard <mary@mary.zone>
Fixes: cabfdb4404 ("nvk: Enable compression")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39364>
(cherry picked from commit f3c53cf66b)
2026-01-28 16:17:59 +01:00
Georg Lehmann
7e42c6e949 aco: fix demote in header of single iteration loop
The control flow is not divergent before a divergent break in a single iteration loop,
but we already pushed the loop mask on the stack.

Fixes: 90faadae72 ("aco/insert_exec_mask: don't disable dead quads on demote in divergent CF")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14733
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39528>
(cherry picked from commit 4b1996b1c7)
2026-01-28 16:17:59 +01:00
Tapani Pälli
41026e14f9 blorp: fix asserts hit with msaa blorp blits on xe3
Tested on PTL, fixes various copy_and_blit tests that utilize compute
after ab9d3528dc that exposed this to them.

Fixes: ab9d3528dc ("anv: fix queue check in anv_blorp_execute_on_companion on xe3")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39548>
(cherry picked from commit bb84773c81)
2026-01-28 16:17:59 +01:00
Caterina Shablia
174aa7ed66 panvk: fix sparse image non-opaque binds
I have no idea how this passed CTS.

Fixes: 5326c451 ("panvk/csf: implement sparse image non-opaque binds")
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39546>
(cherry picked from commit a3ec5ece8b)
2026-01-28 16:17:59 +01:00
Samuel Pitoiset
362faeb15e radv: add a workaround for a synchronization bug in Strange Brigade Vulkan
This game has broken synchronization reported by VVL and it indeed
doesn't wait for idle right before present. Work around this by
injecting a full barrier (easier than rewriting the dep struct).

This only applies to the Vulkan backend.

Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14705
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39480>
(cherry picked from commit 14d3fb5f1b)
2026-01-28 16:17:59 +01:00
Samuel Pitoiset
33fbf9bf61 radv: fix applying radv_ssbo_non_uniform=true for Crysis 2/3 remastered
These are DX11 games that use Vulkan interop for RT with a broken and
too generic app/engine name. This is very specific to these two games.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14718
Fixes: 56813236f4 ("radv: use app names instead of exec name for shader based drirc workarounds")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39518>
(cherry picked from commit d679236e09)
2026-01-28 16:17:59 +01:00
Rob Clark
cda3f42323 freedreno/a6xx: Better program state size calc
Most of the time we were significantly over-allocating the size of
program stateobjs.  Except when the shader had a very large # of
immediates, in which case we were under-allocating (and crashing).

Fixes: 598928d7e7 ("nir/loop_analyze: determine whether all control flow gets eliminated upon loop unrolling")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14731
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39545>
(cherry picked from commit 670ded35c1)
2026-01-28 16:17:59 +01:00
Konstantin Seurer
3ef0b4b27a vulkan: Avoid NAN in the IR BVH
Build and encoding stages should be able to assume that AABBs don't have
NANs. This commit covers all possible sources of NAN.
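
The invariant stated above can be expressed as a simple predicate. This is
only a sketch: the struct layout and function name are hypothetical and are
not taken from the BVH build shaders.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical AABB layout used only for this illustration. */
struct aabb {
   float min[3];
   float max[3];
};

/* Returns true if any component of the box is NaN, i.e. the box
 * violates the "no NaN in the IR BVH" assumption. */
static bool aabb_has_nan(const struct aabb *box)
{
   for (int i = 0; i < 3; i++) {
      if (isnan(box->min[i]) || isnan(box->max[i]))
         return true;
   }
   return false;
}
```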

Fixes: 091b43b ("radv: Use HPLOC for TLAS builds")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14696
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39508>
(cherry picked from commit 20322687e0)
2026-01-28 16:17:59 +01:00
Konstantin Seurer
1f1da9bc5a vulkan: Handle inactive primitives with LBVH builds
cc: mesa-stable

Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39378>
(cherry picked from commit 0817551f00)
2026-01-28 16:17:59 +01:00
Nanley Chery
0d3857c832 blorp: Fix Tile64 clear redescription assertion
Prevent assert failures in a future commit where Tile64 will be selected
more often.

Fixes: 42ef23ecd1 ("intel/blorp: Don't redescribe some Tile64 clears")
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38063>
(cherry picked from commit 6fc0e5c0aa)
2026-01-28 16:17:59 +01:00
Nanley Chery
cec72c7a29 intel/isl: Fix miptail selection for compressed textures
When determining if an LOD can fit within a miptail, we must minify in
pixel space and then convert to elements.
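
The ordering matters because rounding to whole blocks and halving do not
commute. A minimal sketch, using hypothetical helper names rather than the
actual isl functions, and looking at the width only:

```c
#include <assert.h>
#include <stdint.h>

/* Halve `n` once per level, never going below 1. */
static uint32_t minify_u32(uint32_t n, uint32_t levels)
{
   assert(levels < 32);
   uint32_t m = n >> levels;
   return m > 0 ? m : 1;
}

static uint32_t div_round_up_u32(uint32_t n, uint32_t d)
{
   assert(d != 0);
   return (n + d - 1) / d;
}

/* Order described in the commit: minify in pixels, then convert to
 * elements (compression blocks). */
static uint32_t lod_width_el(uint32_t width_px, uint32_t block_w, uint32_t lod)
{
   return div_round_up_u32(minify_u32(width_px, lod), block_w);
}

/* Opposite order, shown only for comparison: convert to elements first,
 * then minify. This can underestimate the LOD size. */
static uint32_t lod_width_el_swapped(uint32_t width_px, uint32_t block_w,
                                     uint32_t lod)
{
   return minify_u32(div_round_up_u32(width_px, block_w), lod);
}
```

For an assumed 20-pixel-wide level 0 with 8-pixel-wide blocks (as in
ASTC 8x5), LOD 1 is 10 pixels wide and needs 2 elements, while the swapped
order yields only 1.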

Prevents the following test case from failing when Yf is force-enabled:

   dEQP-VK.image.texel_view_compatible.graphic.extended.3d_image.texture_read.astc_8x5_srgb_block.r32g32b32a32_uint

Fixes: 46f45d62d1 ("intel/isl: Start using miptails")
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38063>
(cherry picked from commit add742fca6)
2026-01-28 16:17:59 +01:00
Mike Blumenkrantz
e2bf4b9007 ntv: emit demote extension/capability when emitting demote
this is cleaner and more accurate

cc: mesa-stable

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39540>
(cherry picked from commit a842e641d9)
2026-01-28 16:17:59 +01:00
Mel Henning
03c90bcd1f nvk: Ignore meta ops in occlusion queries
Fixes: 052bbd65c9 ("nvk: Implement pipeline statistics and occlusion queries")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39510>
(cherry picked from commit e32bfc5efe)
2026-01-28 16:17:59 +01:00
Faith Ekstrand
e8f33e8ffb nvk: Enable ZPASS_PIXEL_COUNT in draw_state_init()
Fixes: 052bbd65c9 ("nvk: Implement pipeline statistics and occlusion queries")
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39510>
(cherry picked from commit c081ab864f)
2026-01-28 16:17:59 +01:00
Patrick Lerda
4a1133e769 r600: update cubearray imagesize calculation
The previous method to calculate imageSize().z was
incorrect for a cubearray view.

This change was tested on palm and cayman. Here is the test fixed:
spec/arb_texture_view/rendering-layers-image/layers rendering of imagecubearray: fail pass

Fixes: 6c1432f0be ("r600/eg: fix cube map array buffer images.")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39063>
(cherry picked from commit 0b8d8f2b17)
2026-01-28 16:17:59 +01:00
Benjamin Cheng
4e1f5fda4a radv/video: Use a more reliable way of computing tile sizes
Some apps (old FFmpeg, contemporary CTS) send down pMi{Col,Row}Starts in
SB units, not MI units. Instead of depending on those values, which
could be unreliable, derive the tile sizes in SB using other parameters.

Cc: mesa-stable
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39492>
(cherry picked from commit c10ebb0fda)
2026-01-28 16:17:59 +01:00
Patrick Lerda
fd0ec1af2b r600: fix rv770 clamp to max_texel_buffer_elements
This change fixes the clamp to max_texel_buffer_elements
issue related to rv770 and older gpus.

Here are the tests fixed on rv770:
spec/arb_texture_buffer_object/texture-buffer-size-clamp/r8ui_texture_buffer_size_via_sampler: fail pass
spec/arb_texture_buffer_object/texture-buffer-size-clamp/rg8ui_texture_buffer_size_via_sampler: fail pass
spec/arb_texture_buffer_object/texture-buffer-size-clamp/rgba8ui_texture_buffer_size_via_sampler: fail pass

Fixes: 1a441ad5cb ("r600: clamp to max_texel_buffer_elements")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39385>
(cherry picked from commit afcead9158)
2026-01-28 16:17:58 +01:00
Patrick Lerda
161f3c2144 r600: make vertex r10g10b10a2_sscaled conformant on palm and beyond
This is a gl4.3 issue very similar to e8fa3b4950.

The mode r10g10b10a2_sscaled processed as vertex on palm at the
hardware level doesn't follow the current standard. Indeed, the .w
component (2-bits) is not calculated as expected. The table below
describes the situation.

This change fixes this issue by adding two gpu instructions at
the vertex fetch shader stage. An equivalent C representation and
a gpu asm dump of the generated sequence are available below.

.w(2-bits)	expected	palm		cypress
0		 0		0		 0
1		 1		1		 1
2		-2		2		-2
3		-1		3		-1

w_out = w_in - (w_in > 1. ? 4. : 0.);

0002 00000024 A0040000  ALU 2 @72
 0072 801F2C0A 600004C0     1 w:     SETGT*4                __.w,  R10.w, 1.0
 0074 839FCC0A 61400010     2 w:     ADD                    R10.w,  R10.w, -PV.w

Note: cypress returns the expected value, and does not need
this correction.
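
As a standalone check of the C representation above against the table, the
correction can be wrapped in a small function. The function name is
hypothetical; the formula is the one given in this message.

```c
#include <assert.h>

/* Map the raw 2-bit .w value returned by palm (0..3) to the expected
 * signed value (0, 1, -2, -1). */
static float fix_sscaled_w(float w_in)
{
   assert(w_in >= 0.0f && w_in <= 3.0f);
   return w_in - (w_in > 1.0f ? 4.0f : 0.0f);
}
```

Inputs 0 and 1 pass through unchanged, while 2 and 3 become -2 and -1,
matching the "expected" column.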

This change was tested on palm, barts and cayman. Here are the tests fixed:
khr-gl4[3-6]/vertex_attrib_binding/basic-input-case6: fail pass
khr-gles31/core/vertex_attrib_binding/basic-input-case6: fail pass

Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38849>
(cherry picked from commit 2ed761021f)
2026-01-28 16:17:58 +01:00
Patrick Lerda
88ae449dbc r600: fix rv770 dot4 operations
Using a PV register which is not PV.x, after a dot4 operation,
does not work on rv770. It does work on evergreen, but this is
not documented.

This change updates this behavior for all the r600 gpus,
which fixes the issue on rv770. It also covers max4, which has the
same requirement, in case max4 gets implemented.

Here are some of the affected tests on rv770:
piglit/bin/fp-abs-01 -auto -fbo
glcts --deqp-case=KHR-GL31.buffer_objects.triangles
piglit/bin/shader_runner generated_tests/spec/glsl-1.10/execution/built-in-functions/fs-distance-vec2-vec2.shader_test -auto -fbo

Fixes: 942e6af40b ("r600/sfn: use PS and PV inline registers when possible")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39101>
(cherry picked from commit da1108dcc4)
2026-01-28 16:17:58 +01:00
Patrick Lerda
3231523878 r600: fix cayman msaa shading behavior
The functionality was working properly at glMinSampleShading(0.)
and glMinSampleShading(1.). The issue was with the intermediate
values. This change makes this function compatible with the
evergreen setup.

Note: this was one of the few functionalities which were working
properly on evergreen but not on cayman.

Here are the tests fixed:
spec/arb_sample_shading/samplemask 4 all/0.500000 partition: fail pass
spec/arb_sample_shading/samplemask 4/0.500000 partition: fail pass
spec/arb_sample_shading/samplemask 6 all/0.250000 partition: fail pass
spec/arb_sample_shading/samplemask 6 all/0.500000 partition: fail pass
spec/arb_sample_shading/samplemask 6/0.250000 partition: fail pass
spec/arb_sample_shading/samplemask 6/0.500000 partition: fail pass
spec/arb_sample_shading/samplemask 8 all/0.250000 partition: fail pass
spec/arb_sample_shading/samplemask 8 all/0.500000 partition: fail pass
spec/arb_sample_shading/samplemask 8/0.250000 partition: fail pass
spec/arb_sample_shading/samplemask 8/0.500000 partition: fail pass
deqp-gles31/functional/shaders/sample_variables/sample_mask_in/bit_count_per_two_samples/multisample_rbo_4: fail pass
deqp-gles31/functional/shaders/sample_variables/sample_mask_in/bit_count_per_two_samples/multisample_rbo_8: fail pass
deqp-gles31/functional/shaders/sample_variables/sample_mask_in/bit_count_per_two_samples/multisample_texture_4: fail pass
deqp-gles31/functional/shaders/sample_variables/sample_mask_in/bit_count_per_two_samples/multisample_texture_8: fail pass

Fixes: f7796a966d ("radeonsi: add basic code for overrasterization")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38615>
(cherry picked from commit d5d844bfc4)
2026-01-28 16:17:58 +01:00
Georg Lehmann
6303313da0 aco/optimizer: fix parsing salu p_insert as shift
Fixes: 88f7e3fff3 ("aco/optimizer: parse pseudo alu instructions")

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39412>
(cherry picked from commit ba73792de0)
2026-01-28 16:17:58 +01:00
Rhys Perry
ca22a66dd9 aco/insert_fp_mode: remove incorrect assertion
This can happen if a loop has no continues, and the later code should work
fine in this situation.

This fixes war_thunder/0013a69e097b2471 on navi21.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Fixes: 6b9d28ab9b ("aco/insert_fp_mode: insert fp mode in reverse")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39481>
(cherry picked from commit e59a0df302)
2026-01-28 16:17:58 +01:00
Zan Dobersek
cfdaa05349 tu: handle DS_DEPTH_BOUNDS_TEST_BOUNDS state under TU_DYNAMIC_STATE_RB_DEPTH_CNTL
MESA_VK_DYNAMIC_DS_DEPTH_BOUNDS_TEST_BOUNDS state should be emitted as part
of TU_DYNAMIC_STATE_RB_DEPTH_CNTL along with other depth state, and not as
part of dynamic stencil state.

Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Fixes: 979cf7bac0 ("tu: Merge depth/stencil draw states")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39323>
(cherry picked from commit 3cb4776ede)
2026-01-28 16:17:58 +01:00
Sushma Venkatesh Reddy
6c6ed2a9e6 brw: Use lookup tables for Gfx12+ 3src type encoding/decoding
The previous Gfx12+ implementation using bit masking fails for FP8
types, so replace it with explicit lookup tables.
For float types, the encoding now aligns with brw_data_type_float, ensuring
correct behavior for DPAS and other 3-source instructions.

Fixes: d1d4e3d530 ("brw: Add EU assembler support for float8")

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39448>
(cherry picked from commit 0ce4e8ba6f)
2026-01-28 16:17:58 +01:00
Calder Young
0148f7f746 Revert "anv,brw: Allow multiple ray queries without spilling to a shadow stack"
This optimization doesn't work when the ray query index isn't uniform across
the subgroup, which is something the spec allows. While there are some smart
ways to fix this and still avoid unnecessary spilling, it's not worth investing
the time until we find a realtime raytracing workload that actually needs to
use multiple live ray queries for something.

Fixes: 1f1de7eb ("anv,brw: Allow multiple ray queries without spilling to a shadow stack")
Acked-by: Sagar Ghuge <sagar.ghuge@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39445>
(cherry picked from commit 895ff7fe92)
2026-01-28 16:17:58 +01:00
Rob Clark
14887b7f03 freedreno/lrz: Correct lrz fc layout for gen8
Fixes: 14a23e8b3e ("freedreno/lrz: Add gen8 lrz layout support")
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39375>
(cherry picked from commit 1d715662de)
2026-01-28 16:17:58 +01:00
Gurchetan Singh
98afd0c2f7 gallium: fix sometimes-uninitialized warning
Otherwise:

gallium/auxiliary/gallivm/lp_bld_nir_soa.c:2394:7:
 error: variable 'opname' is used uninitialized whenever switch default is taken

is observed.

Reviewed-by: @LingMan
Fixes: 12bceb228a ("gallivm: let reduce ops use llvm intrinsics")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39418>
(cherry picked from commit 0f582b0268)
2026-01-28 16:17:58 +01:00
Danylo Piliaiev
ca25229f90 tu: Fix typo in min bounds calculation of FDM scissors
Fixes: fec372dfa5 ("tu: Implement FDM viewport patching")

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39461>
(cherry picked from commit 1d6fe66989)
2026-01-28 16:17:58 +01:00
Rob Clark
4aa5731f09 freedreno: Force single wavesize if double threadsize is unsupported
Turns out ir3 isn't enforcing this itself.

Fixes: c323848b0b ("ir3, tu: Plumb through support for per-shader robustness")
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39470>
(cherry picked from commit 455b692e4f)
2026-01-28 16:17:58 +01:00
Rob Clark
e1dae01299 freedreno/common: Fix gen8 EFU float control
This reg should be programmed to zero like previous gens.

Fixes: 6e3598177b ("freedreno/common: Add A840 and X2-85")
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39467>
(cherry picked from commit 53b879ac58)
2026-01-28 16:17:58 +01:00
Silvio Vilerino
00632c8dfc d3d12: Add HAVE_GALLIUM_D3D12_VIDEO guards for d3d12_video_encoder_set_max_async_queue_depth/d3d12_video_encoder_get_last_slice_completion_fence
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14709
Fixes: e55b2b5064 ("d3d12: Add get_video_enc_last_slice_completion_fence interop")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39457>
(cherry picked from commit 4b366f8824)
2026-01-28 16:17:58 +01:00
Silvio Vilerino
944bcc85a0 d3d12: Add missing using Microsoft::WRL::ComPtr in d3d12_context_common
Fixes: b06b2fbaba ("d3d12: Remove Agility v717 guards for features now available in v618")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39457>
(cherry picked from commit 237313a243)
2026-01-28 16:17:58 +01:00
Lionel Landwerlin
fefa2b1e68 iris: fix incorrect intrinsic usage on ELK
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: faa857a061 ("intel: rework push constant handling")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14708
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39443>
(cherry picked from commit 21661f66fc)
2026-01-28 16:17:58 +01:00
Nick Hamilton
861c689517 pvr: Temporarily disable the buffer device address extension
The extension is optional in Vulkan 1.2 and is causing crashes in
multiple CTS tests.

Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Backport-to: 26.0
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39351>
(cherry picked from commit 3aacc324bc)
2026-01-28 16:17:58 +01:00
Natalie Vock
b055af7ceb aco: Fix parameter stack size calculation
This only accounted for 1/32 (or 1/64) of the actual parameter size. In
some cases this meant that some threads were smashing other threads'
stacks.
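
One reading of the 1/32 (or 1/64) factor is that a per-lane size was used
where a size for the whole wave was needed. Under that assumption, and with
a hypothetical function name, the arithmetic looks like this:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of stack needed for parameters across a whole wave, given the
 * per-lane parameter size. Illustration only; not the aco code. */
static uint32_t param_stack_bytes(uint32_t per_lane_bytes, uint32_t wave_size)
{
   assert(wave_size == 32 || wave_size == 64);
   return per_lane_bytes * wave_size;
}
```

With 16 bytes of parameters per lane, a wave32 shader would need 512 bytes
and a wave64 shader 1024 bytes, so reserving only 16 bytes lets lanes
overwrite each other's data.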

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39455>
(cherry picked from commit 15328a5ef3)
2026-01-28 16:17:58 +01:00
Mike Blumenkrantz
b12d9282c9 zink: re-allow transient images during blitting
now that transient images are a more complete mechanism, this should
in theory be okay and also accounts for the case where
a framebuffer contains mixed msrtt textures and plain multisampled textures

(cherry picked from commit 6474af3b42)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39469>
2026-01-28 16:17:58 +01:00
Yiwei Zhang
2f53818f7a venus: refactor Android ANB tracking to avoid confusions with WSI
WSI used to track something similar for aliased wsi image creation, but
that was later deprecated. So let's rename wsi.memory to wsi.anb_mem and
drop wsi.memory_owned to avoid confusion with common wsi related tracking.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39401>
(cherry picked from commit 481df22209)
2026-01-28 16:17:58 +01:00
Yiwei Zhang
f299be5193 venus: properly handle wsi implicit in-fence
Vulkan is supposed to operate in explicit synchronization mode. However,
for legacy compositors that only support implicit fencing, we have to
extract the compositor implicit fence (release fence) and resolve it
properly.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39401>
(cherry picked from commit 849e3552e8)
2026-01-28 16:17:58 +01:00
Yiwei Zhang
e0af337416 venus: refactor vn_AcquireNextImage2KHR
Prepare for valid implicit in-fence.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39401>
(cherry picked from commit 211c21725c)
2026-01-28 16:17:58 +01:00
Yiwei Zhang
29b37e4484 venus: add vn_renderer_bo_export_sync_file helper
...and a renderer internal helper shared by virtgpu and vtest backend
when supported.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39401>
(cherry picked from commit 9718847dbf)
2026-01-28 16:17:58 +01:00
Yiwei Zhang
960a4d667b venus: track dedicated image during mem alloc
Need this because the new common wsi interface only returns the wsi
memory from the acquired image index.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39401>
(cherry picked from commit 3fca8423c9)
2026-01-28 16:17:58 +01:00
Yiwei Zhang
48c28ee238 venus: track prime blit dst buffer memory in the wsi image
This is to prepare for handling WSI implicit acquire fence.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39401>
(cherry picked from commit eb709cba47)
2026-01-28 16:17:58 +01:00
Simon Perretta
1b1229d3b2 pco: update formatless skip check
The skip check should only be checking the format rather than the entire
packed word.

Fixes: 52ddc40a75 ("pco: restrict shadow sampler comparator clamping to unorm formats")
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39428>
(cherry picked from commit c5b70dcb48)
2026-01-28 16:17:58 +01:00
Samuel Pitoiset
f585d2fadc vulkan: fix missing begin debug marker for HPLOC
This fixes capturing with RGP.

Fixes: 091b43b970 ("radv: Use HPLOC for TLAS builds")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39427>
(cherry picked from commit 873008f274)
2026-01-28 16:17:58 +01:00
Kitlith
a09bbbf3e1 pvr: Free drm device in can_present_on_device
Fixes: 6bda88bfdb ("pvr: copy WSI can_present_on_device function from PanVK")
Signed-off-by: Kitlith <kitlith@kitl.pw>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39415>
(cherry picked from commit b18b52e61d)
2026-01-28 16:17:57 +01:00
Kitlith
6d4b68c748 panvk: Free drm device in can_present_on_device
Fixes: 08da41f2f1 ("panvk: override can_present_on_device")
Signed-off-by: Kitlith <kitlith@kitl.pw>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39415>
(cherry picked from commit 4de41bf27d)
2026-01-28 16:17:57 +01:00
jaap aarts
700f6c3214 radv/sqtt: Prevent concurrent submit when sqtt is enabled
cc: mesa-stable

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39090>
(cherry picked from commit 8f7941f92d)
2026-01-28 16:17:57 +01:00
Aitor Camacho
f4e56b61da hk: Handle unbound sets that contain dynamic buffers
The offset for the dynamic buffers needs to be computed with the currently
bound pipeline layout. This change fixes incorrectly selecting the offset
for a dynamic buffer if a descriptor with a lower index than the one
currently being bound contains a dynamic buffer but said descriptor hasn't
been bound yet. It also prevents the binding from overriding the dynamic
buffers, in order to preserve the already bound dynamic descriptors.

Signed-off-by: Aitor Camacho <aitor@lunarg.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
(cherry picked from commit aaf4405507)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39440>
2026-01-28 16:17:57 +01:00
Aitor Camacho
d2bc79c260 nvk: Handle unbound sets that contain dynamic buffers
The offset for the dynamic buffers needs to be computed with the currently
bound pipeline layout. This change fixes incorrectly selecting the offset
for a dynamic buffer if a descriptor with a lower index than the one
currently being bound contains a dynamic buffer but said descriptor hasn't
been bound yet. It also prevents the binding from overriding the dynamic
buffers, in order to preserve the already bound dynamic descriptors.

Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
(cherry picked from commit 80a076f5d0)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39440>
2026-01-28 16:17:57 +01:00
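
The two commit messages above (hk and nvk) describe the same offset problem. The following is a minimal sketch of one plausible reading of it, not the driver code: the function names, the list-of-counts layout representation, and the `bound` set are all hypothetical and only serve to contrast deriving the offset from the pipeline layout with deriving it from whatever happens to be bound.

```python
def layout_based_start(dynamic_counts: list[int], set_index: int) -> int:
    """First dynamic-buffer slot for `set_index`, derived from the layout alone.

    `dynamic_counts[i]` is the number of dynamic buffers that set `i`
    declares in the (hypothetical) pipeline layout.
    """
    return sum(dynamic_counts[:set_index])


def bound_only_start(dynamic_counts: list[int], bound: set[int], set_index: int) -> int:
    """Illustrative faulty variant: lower sets that are not bound yet are skipped."""
    return sum(n for i, n in enumerate(dynamic_counts[:set_index]) if i in bound)


# Hypothetical layout: set 0 declares two dynamic buffers, set 1 declares one.
layout = [2, 1]

# Binding set 1 while set 0 is still unbound.
correct = layout_based_start(layout, 1)
faulty = bound_only_start(layout, set(), 1)
```

In this sketch the layout-based variant places set 1 after the two slots reserved for set 0, whereas the bound-only variant would place it at slot 0 and collide with set 0 once that set is bound.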
Dylan Baker
1ed4f69065 bin/pick: When the main widget is replaced, trigger a redraw
The docs clearly say this, and though it used to just work that seems to
have been a coincidence rather than being correct.

CC: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39459>
(cherry picked from commit 0380c1228e)
2026-01-28 16:17:57 +01:00
Eric Engestrom
b317162543 pick-ui: update for python 3.14 support
```
Traceback (most recent call last):
  File "bin/pick-ui.py", line 31, in <module>
    loop = urwid.MainLoop(u.render(), PALETTE, event_loop=evl, handle_mouse=False)
                          ~~~~~~~~^^
  File "bin/pick/ui.py", line 196, in render
    asyncio.ensure_future(self.update())
    ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.14/asyncio/tasks.py", line 730, in ensure_future
    loop = events.get_event_loop()
  File "/usr/lib64/python3.14/asyncio/events.py", line 715, in get_event_loop
    raise RuntimeError('There is no current event loop in thread %r.'
                       % threading.current_thread().name)
RuntimeError: There is no current event loop in thread 'MainThread'.
```

Of the 3 dependencies, only urwid actually needs to be updated, but
while at it let's pick the latest of each.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39452>
(cherry picked from commit 21829c9f7e)
2026-01-28 16:17:57 +01:00
Eric Engestrom
4141851e8a .pick_status.json: Update to bed1576b14 2026-01-28 16:17:57 +01:00
Eric Engestrom
a2b03c2117 VERSION: bump for 26.0.0-rc1
2026-01-21 19:28:32 +01:00
374 changed files with 33306 additions and 4708 deletions

0
.clang-format Normal file
View file

@@ -774,7 +774,7 @@ debian-riscv64:
 # While s390 is dead, s390x is very much alive, and one of the last major
 # big-endian platforms, so it provides useful coverage.
 # In case of issues with this job, contact @ajax
-debian-s390x:
+.debian-s390x:
   extends:
     - .meson-cross
     - .use-debian/s390x_build
@@ -789,7 +789,7 @@ debian-s390x:
     DRI_LOADERS:
       -D glvnd=disabled
-debian-ppc64el:
+.debian-ppc64el:
   extends:
     - .meson-cross
     - .use-debian/ppc64el_build

22752
.pick_status.json Normal file

File diff suppressed because it is too large

View file

@@ -1 +1 @@
-26.0.0-devel
+26.0.2

View file

@@ -385,5 +385,5 @@ async def main() -> None:
 if __name__ == "__main__":
-    loop = asyncio.get_event_loop()
+    loop = asyncio.new_event_loop()
     loop.run_until_complete(main())

View file

@@ -27,7 +27,9 @@ from pick.ui import UI, PALETTE
 if __name__ == "__main__":
     u = UI()
-    evl = urwid.AsyncioEventLoop(loop=asyncio.new_event_loop())
+    asyncio_loop = asyncio.new_event_loop()
+    asyncio.set_event_loop(asyncio_loop)
+    evl = urwid.AsyncioEventLoop(loop=asyncio_loop)
     loop = urwid.MainLoop(u.render(), PALETTE, event_loop=evl, handle_mouse=False)
     u.mainloop = loop
     loop.run()
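
The change above creates the asyncio loop explicitly and registers it as the current loop before anything calls `asyncio.ensure_future()`. A minimal standalone sketch of that pattern follows, without urwid; the `update()` coroutine and `schedule_before_loop_runs()` helper are hypothetical stand-ins, not code from `bin/pick`.

```python
import asyncio


async def update() -> str:
    """Stand-in for a background coroutine scheduled during UI setup (hypothetical)."""
    await asyncio.sleep(0)
    return "updated"


def schedule_before_loop_runs() -> str:
    # Create the loop explicitly instead of relying on an implicit one.
    loop = asyncio.new_event_loop()
    # Register it as the current loop so that ensure_future(), which is
    # called here before the loop is running, can find it.
    asyncio.set_event_loop(loop)
    try:
        task = asyncio.ensure_future(update())
        return loop.run_until_complete(task)
    finally:
        # Unregister and close the loop once the work is done.
        asyncio.set_event_loop(None)
        loop.close()


result = schedule_before_loop_runs()
```

Without the `set_event_loop()` call, scheduling the coroutine before the loop runs is what produces the `RuntimeError: There is no current event loop` shown in the traceback of the commit message.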

View file

@@ -52,7 +52,7 @@ IS_FIX = re.compile(r'^\s*fixes:\s*([a-f0-9]{6,40})', flags=re.MULTILINE | re.IG
 IS_CC = re.compile(r'^\s*cc:\s*["\']?([0-9]{2}\.[0-9])?["\']?\s*["\']?([0-9]{2}\.[0-9])?["\']?\s*\<?mesa-stable',
                    flags=re.MULTILINE | re.IGNORECASE)
 IS_REVERT = re.compile(r'This reverts commit ([0-9a-f]{40})')
-IS_BACKPORT = re.compile(r'^\s*backport-to:\s*(\d{2}\.\d),?\s*(\d{2}\.\d)?',
+IS_BACKPORT = re.compile(r'^\s*backport-to:\s*(?:(\d{2}\.\d),?\s*(\d{2}\.\d)?|(\*))',
                          flags=re.MULTILINE | re.IGNORECASE)
 # XXX: hack
@@ -295,7 +295,7 @@ async def resolve_nomination(commit: 'Commit', version: str) -> 'Commit':
     if backport_to := IS_BACKPORT.findall(commit_message):
         for match in backport_to:
-            if any(Version(version) >= Version(backport_version)
+            if any(backport_version == '*' or Version(version) >= Version(backport_version)
                    for backport_version in match if backport_version != ''):
                 commit.nominated = True
                 commit.nomination_type = NominationType.BACKPORT
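
To see why the extra capture group changes the shape of every `findall()` result, the updated pattern can be exercised on its own. This is a sketch under stated assumptions: the regular expression is copied from the hunk above, but `as_tuple()` and `is_nominated()` are hypothetical helpers, and the tuple comparison is a simplified stand-in for the `Version` comparison that the real code performs.

```python
import re

# Pattern copied from the hunk above: two optional version groups, or a '*'.
IS_BACKPORT = re.compile(
    r'^\s*backport-to:\s*(?:(\d{2}\.\d),?\s*(\d{2}\.\d)?|(\*))',
    flags=re.MULTILINE | re.IGNORECASE)


def as_tuple(version: str) -> tuple[int, ...]:
    """Simplified stand-in for a real version parser (assumption)."""
    return tuple(int(part) for part in version.split('.'))


def is_nominated(message: str, branch: str) -> bool:
    """Hypothetical helper mirroring the nomination check in the hunk above."""
    for match in IS_BACKPORT.findall(message):
        # Each match is a 3-tuple; groups that did not participate are ''.
        if any(target == '*' or as_tuple(branch) >= as_tuple(target)
               for target in match if target != ''):
            return True
    return False


single = IS_BACKPORT.findall('Backport-to: 19.2')
pair = IS_BACKPORT.findall('Backport-to: 19.1, 19.2')
star = IS_BACKPORT.findall('Backport-to: *')
```

Because the pattern now has three groups, a plain version tag yields a trailing empty string, which is why the expected tuples in the regex tests gain a third element.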

View file

@@ -263,7 +263,7 @@ class TestRE:
         """)
         backport_to = core.IS_BACKPORT.findall(message)
-        assert backport_to == [('19.2', '')]
+        assert backport_to == [('19.2', '', '')]
     def test_multiple_release_space(self):
         """Tests commit with more than one branch specified"""
@@ -278,7 +278,7 @@ class TestRE:
         """)
         backport_to = core.IS_BACKPORT.findall(message)
-        assert backport_to == [('19.1', '19.2')]
+        assert backport_to == [('19.1', '19.2', '')]
     def test_multiple_release_comma(self):
         """Tests commit with more than one branch specified"""
@@ -293,7 +293,7 @@ class TestRE:
         """)
         backport_to = core.IS_BACKPORT.findall(message)
-        assert backport_to == [('19.1', '19.2')]
+        assert backport_to == [('19.1', '19.2', '')]
     def test_multiple_release_lines(self):
         """Tests commit with more than one branch specified in mulitple tags"""
@@ -305,7 +305,7 @@ class TestRE:
         """)
         backport_to = core.IS_BACKPORT.findall(message)
-        assert backport_to == [('19.0', ''), ('19.1', '19.2')]
+        assert backport_to == [('19.0', '', ''), ('19.1', '19.2', '')]
 class TestResolveNomination:
class TestResolveNomination:
@ -405,6 +405,17 @@ class TestResolveNomination:
assert c.nominated
assert c.nomination_type is core.NominationType.BACKPORT
@pytest.mark.asyncio
async def test_backport_all_is_nominated(self):
s = self.FakeSubprocess(b'Backport-to: *')
c = core.Commit('abcdef1234567890', 'a commit')
with mock.patch('bin.pick.core.asyncio.create_subprocess_exec', s.mock):
await core.resolve_nomination(c, '0.0')
assert c.nominated
assert c.nomination_type is core.NominationType.BACKPORT
@pytest.mark.asyncio
async def test_backport_is_nominated_after(self):
s = self.FakeSubprocess(b'Backport-to: 16.2')

View file

@@ -1,3 +1,3 @@
-attrs==23.1.0
-packaging==25.0
-urwid==2.1.2
+attrs==25.4.0
+packaging==26.0
+urwid==3.0.3

View file

@ -224,6 +224,7 @@ class UI:
if commit.nominated and commit.resolution is core.Resolution.UNRESOLVED:
b = urwid.AttrMap(CommitWidget(self, commit), None, focus_map='reversed')
self.commit_list.append(b)
self.mainloop.draw_screen()
self.save()
async def feedback(self, text: str) -> None:
@ -236,6 +237,7 @@ class UI:
if c.base_widget is commit:
del self.commit_list[i]
break
self.mainloop.draw_screen()
def save(self):
core.save(itertools.chain(self.new_commits, self.previous_commits))
@ -246,6 +248,7 @@ class UI:
def reset_cb(_) -> None:
self.mainloop.widget = o
self.mainloop.draw_screen()
async def apply_cb(edit: urwid.Edit) -> None:
text: str = edit.get_edit_text()
@ -263,6 +266,7 @@ class UI:
raise RuntimeError(f"Couldn't find {sha}")
await commit.apply(self)
self.mainloop.draw_screen()
q = urwid.Edit("Commit sha\n")
ok_btn = urwid.Button('Ok')
@ -279,12 +283,14 @@ class UI:
self.mainloop.widget = urwid.Overlay(
urwid.Filler(box), o, 'center', ('relative', 50), 'middle', ('relative', 50)
)
self.mainloop.draw_screen()
def chp_failed(self, commit: 'CommitWidget', err: str) -> None:
o = self.mainloop.widget
def reset_cb(_) -> None:
self.mainloop.widget = o
self.mainloop.draw_screen()
t = urwid.Text(textwrap.dedent(f"""
Failed to apply {commit.commit.sha} {commit.commit.description} with the following error:
@ -313,3 +319,4 @@ class UI:
self.mainloop.widget = urwid.Overlay(
urwid.Filler(box), o, 'center', ('relative', 50), 'middle', ('relative', 50)
)
self.mainloop.draw_screen()

View file

@ -3,6 +3,9 @@ Release Notes
The release notes summarize what's new or changed in each Mesa release.
- :doc:`26.0.2 release notes <relnotes/26.0.2>`
- :doc:`26.0.1 release notes <relnotes/26.0.1>`
- :doc:`26.0.0 release notes <relnotes/26.0.0>`
- :doc:`25.3.3 release notes <relnotes/25.3.3>`
- :doc:`25.3.2 release notes <relnotes/25.3.2>`
- :doc:`25.2.8 release notes <relnotes/25.2.8>`
@ -473,6 +476,9 @@ The release notes summarize what's new or changed in each Mesa release.
:maxdepth: 1
:hidden:
26.0.2 <relnotes/26.0.2>
26.0.1 <relnotes/26.0.1>
26.0.0 <relnotes/26.0.0>
25.3.3 <relnotes/25.3.3>
25.3.2 <relnotes/25.3.2>
25.2.8 <relnotes/25.2.8>

4765
docs/relnotes/26.0.0.rst Normal file

File diff suppressed because it is too large

247
docs/relnotes/26.0.1.rst Normal file
View file

@ -0,0 +1,247 @@
Mesa 26.0.1 Release Notes / 2026-02-25
======================================
Mesa 26.0.1 is a bug fix release which fixes bugs found since the 26.0.0 release.
Mesa 26.0.1 implements the OpenGL 4.6 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.6. OpenGL
4.6 is **only** available if requested at context creation.
Compatibility contexts may report a lower version depending on each driver.
Mesa 26.0.1 implements the Vulkan 1.4 API, but the version reported by
the apiVersion property of the VkPhysicalDeviceProperties struct
depends on the particular driver being used.
SHA checksums
-------------
::
SHA256: bb5104f9f9a46c9b5175c24e601e0ef1ab44ce2d0fdbe81548b59adc8b385dcc mesa-26.0.1.tar.xz
SHA512: d47072257035acfa8a5594c0cda831b4e5178169dea8a06c6657268a441e32271f8798486e837cea23f35ce3f0b4b9520a4ea4ed26b0e1267b02da4c649bc9f9 mesa-26.0.1.tar.xz
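
A downloaded tarball can be checked against a SHA256 value like the one listed above. The following is a minimal sketch using only the Python standard library; the helper names and the chunk size are assumptions for illustration, not part of any Mesa tooling.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        # Read fixed-size chunks so large tarballs are not loaded at once.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches_release_notes(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with the value copied from the release notes."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_release_notes('mesa-26.0.1.tar.xz', '<SHA256 value from the notes>')` would return `True` only if the local file matches the published checksum.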
New features
------------
- None
Bug fixes
---------
- Missing Haswell case after a097a3d214eda7fb7b9ff63176754b7260e09e03 leads to bogus assert in intel_perf_mdapi.c
- Question: Does building Lavapipe on Windows require building "microsoft-experimental" as well?
- [ANV]: Regression in dxvk Greedfall
- [ANV][BMG] Building Mesa with Clang causes Missing Skin Textures in UE games - Tekken 8
- [ANV][DG2][Regression]: Flickering water "boxes" in Civilization VII
- [RADV] Killer7 has a blue tint with RDNA3/4
- [bisected] Xe3 regression with piglit tess/barrier-patch.shader_test after cmod prop change
- [radeonsi] Regression: GL_FEEDBACK returns 0.0 for X-coordinates (Legacy GL)
- anv, bisected: Genshin Impact wrong shadows, flickering grass
- turnip: llama.cpp: Running test-backend-ops results in segmentation fault
- venus crashes in vn_CreateDevice() with latest mesa/main [bisected]
Changes
-------
Aitor Camacho (7):
- wsi/metal: Expose additional color spaces if instance extension enabled
- kk: Fill pipelineUUID
- kk: Fix shader uint32_t value serialization
- kk: Correctly release pipeline handles at shader destroy
- kk: Fix compute pipeline cache
- kk: Move gfx pipeline data to the info struct within kk_shader
- kk: Fix graphics pipeline serialization
Alyssa Rosenzweig (1):
- brw: drop buggy SLM optimization
Anna Maniscalco (1):
- freedreno/common: set has_astc_hdr true for a7xx targets
Benjamin Otte (1):
- lavapipe: Fix features for nonsubsampled ycbcr formats
Daniel Schürmann (1):
- nir/clone: Fix cloning indirect call instructions
Danylo Piliaiev (1):
- ir3: Align TCS per-patch output to 64 bytes to prevent stale reads
Emma Anholt (1):
- ir3/ra: Fix DOUBLE_ONLY limit pressure computation.
Eric Engestrom (5):
- docs: add sha sum for 26.0.0
- .pick_status.json: Update to 03d2cc2b2ae5341409ee1fab74e98134a6df0511
- bin/gen_release_notes: fix support for python 3.14
- pick-ui: add \`Backport-to: \*` as a synonym to \`Cc: mesa-stable`
- .pick_status.json: Mark 7dd7731ac710b0c7213f6bb466b33f62eca80604 as denominated
Faith Ekstrand (6):
- pan/clear: Stop packing undefined bits in colors
- nir/gather_info: Add support for panfrost tile load/store intrinsics
- panvk: Create both Z/S descriptors, even for separate Z/S
- panvk/preload: Stop assuming 32 registers
- panvk/jm: Refactor BeginRendering()
- panvk: Also load output attachments with LOAD_OP_NONE+STORE_OP_NONE
Frank Binns (2):
- pvr/ci: move some timing out tests from fails to skips
- pvr: Fix alloc callbacks usage when freeing frame buffers
Ian Romanick (8):
- spirv: Use STACK_ARRAY instead of NIR_VLA
- nir: Use STACK_ARRAY instead of NIR_VLA
- brw: Call nir_opt_algebraic_late in brw_nir_create_raygen_trampoline
- brw: Call nir_opt_algebraic_late later in brw_postprocess_nir_opts
- elk: Call nir_opt_algebraic_late in elk_postprocess_nir
- brw/cmod: Don't propagate from CMP to ADD if there is a write between
- elk/cmod: Don't propagate from CMP to possible Inf + (-Inf)
- elk/cmod: Don't propagate from CMP to ADD if there is a write between
Janne Grunau (3):
- asahi: Use GPU for buffer copies in resource_copy_region()
- asahi: Implement clear_buffer using libagx_fill*
- hk: Use aligned vector fill in hk_CmdFillBuffer if possible
Jarred Davies (2):
- pvr: Fix allocating the required scratch buffer space for tile buffers
- pvr: Add missing support for tile buffers to SPM EOT programs
Jesse Natalie (1):
- meson: Include DirectX-Headers dependency for all VK Windows builds
Jianxun Zhang (1):
- anv: Limit modifier disabling workaround to specific GTK versions
José Roberto de Souza (1):
- intel/perf: Add HSW verx10 to intel_perf_query_result_write_mdapi()
Juston Li (1):
- anv: set missing protected bit for protected depth/stencil surfaces
Konstantin Seurer (2):
- radv: Fix setting the viewport for depth stencil FS resolves
- vulkan/cmd_queue: Fixup stride for multi draws
Lars-Ivar Hesselberg Simonsen (2):
- panvk: Fix dcd_flags1 dirty bit
- pan/genxml/v13: Fix HSR Prepass typo
Leon Perianu (1):
- pvr: fix format table properties duplicate
Lionel Landwerlin (8):
- anv: flush render caches on first pipeline select
- anv: fix nested command buffer relocations
- anv: add missing constant cache invalidation for descriptor buffers
- isl: fix 32bit math with 4GB buffer size
- anv: apply the same ccs disabling for Xe3 than Xe2
- anv: disable ccs modifier reporting when ccs modifiers are disabled
- anv: dirty descriptors after blorp operations
- anv: remove snprintf for aux op transition
Mary Guillemard (1):
- hk: Fix crash in hk_handle_passthrough_gs
Matt Turner (4):
- brw/cse: fix \`operands_match` corrupting non-IMM register data
- brw/cse: use copies in \`operands_match` instead of in-place modification
- elk/cse: fix \`operands_match` corrupting non-IMM register data
- elk/cse: use copies in \`operands_match` instead of in-place modification
Mike Blumenkrantz (2):
- zink: fix broken compiler assert
- zink: only do pre-sync transfer barrier after a renderpass
Natalie Vock (3):
- radv/rt: Only use ds_bvh_stack_rtn if the stack base is possible to encode
- radv: Initialize nir_lower_io_to_scalar progress variable
- radv/nir: Correctly handle workgroup sizes not aligned to 32
Nick Hamilton (5):
- pvr: Fix incorrect subpass merging optimisation
- pvr: Rename pvr_render_input_attachment
- pvr: Add missing support for preserve attachments
- pvr: Update CI fails list after render pass fixes
- pvr: Add support for fragment pass through shader
Olivia Lee (1):
- hk: fix passthrough GS key invalidation
Pavel Ondračka (2):
- r300: align macro-tiled stride-addressed textures in X
- mesa: implement FRAMEBUFFER_RENDERABLE internalformat query
Rhys Perry (3):
- aco: fix gfx6-8 store_scratch() with function calls
- aco: reset all vgpr_used_by_vmem\_ in resolve_all_gfx11
- aco: resolve hazards before calls
Robert Mader (1):
- lavapipe: enable dmabuf import for planar drm formats
Ryan Zhang (1):
- panvk: guard against NULL pointers to avoid crash
Samuel Pitoiset (5):
- ac,radv,radeonsi: use correct swizzle/pitch for depth-only images with SDMA
- radv: fix potential corruption after FMASK decompression on GFX6-8
- radv/meta: fix depth/stencil resolves with different regions
- ac/nir: fix writemask for dual source blending on GFX11+
- radv: fix potential GPU hangs with secondaries on transfer queue
Tapani Pälli (1):
- util: bring back fix to avoid strict aliasing bugs in xxhash
Timothy Arceri (2):
- mesa: add _mesa_lookup_state_param_idx() helper
- st/glsl_to_nir: make sure the variant has the correct locations set
Wei Hao (1):
- radeonsi: fix threaded shader compilation finishing after context is destroyed
Yiwei Zhang (2):
- venus: workaround a gcc-15 dead store elimination (DSE) bug
- venus: sync protocol for strict aliasing compliance

238
docs/relnotes/26.0.2.rst Normal file
View file

@ -0,0 +1,238 @@
Mesa 26.0.2 Release Notes / 2026-03-12
======================================
Mesa 26.0.2 is a bug fix release which fixes bugs found since the 26.0.1 release.
Mesa 26.0.2 implements the OpenGL 4.6 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.6. OpenGL
4.6 is **only** available if requested at context creation.
Compatibility contexts may report a lower version depending on each driver.
Mesa 26.0.2 implements the Vulkan 1.4 API, but the version reported by
the apiVersion property of the VkPhysicalDeviceProperties struct
depends on the particular driver being used.
SHA checksums
-------------
::
TBD.
New features
------------
- None
Bug fixes
---------
- 26.0.1 fails to build: \`create_context.c: error: 'struct glx_screen' has no member named 'frontend_screen'`
- A770: Counter-Strike 2 visual glitches (regression)
- Bisected regression: Assertion texObj->pt == view->texture failed.
- Kodi regression with panthor >= 1.7 after updating to Linux 7.0-rc1
- MDK2 HD (opengl) has most elements rendered as black
- Mesa 25.3 amdgpu memory issue
- OpenGL 4.1 VRAM Memory Leak with setting uniform variables
- Panfrost Bifrost compiler assertion failure: wrong vectorization in bi_alu_src_index (Mesa 26.0.0)
- RADV: RDNA4 visual corruption in DX11 (DXVK) Mafia III character model glitches, AMDVLK renders correctly (9070XT)
- [radeonsi] Regression: GL_FEEDBACK returns 0.0 for X-coordinates (Legacy GL)
- glsl: spec\@glsl-es-1.00\@linker\@glsl-mismatched-uniform-precision-unused broken
- ir3: ir3_get_predicate() vs &ctx->build
- r300 , regression , bisected : Glitches with Sauerbraten
- r300: HiZ related dEQP failures
Changes
-------
Anna Maniscalco (1):
- zink: don't care about generated gs output primitive
Benjamin Cheng (1):
- radeonsi/vcn: Use full pitch for pre-encode input
Boris Brezillon (1):
- pan/kmod: Allow mmap() on foreign buffers
Caio Oliveira (4):
- spirv: Refactor ALU opcode translation to take bit sizes
- spirv: Pull constant source fixup to the existing loop
- spirv: Fix spec constant to handle Select for non-native floats
- nir: Fix constant folding for iadd_sat
Christoph Pillmayer (2):
- pan/bi: Fix coupling spill placement
- pan/bi: Move FAUs to memory for memory phis
Connor Abbott (4):
- tu: Use HW offset 0 in 3d loads/clears with FDM
- ir3: Fix constlen trimming when more than one stage is trimmed
- tu: Set polygon mode when blitting
- tu: Fix setting will_be_resolved with MSRTSS
Danylo Piliaiev (2):
- tu: Store gmem attachments after custom resolve in dyn RP
- tu: Don't read .patch_input_gmem of unused attachment
David Rosca (1):
- vl: Also disable MPEG2 Main profile when mpeg12 decode is disabled
Eric Engestrom (3):
- docs: add sha sum for 26.0.1
- fixup! docs: add release notes for 26.0.1
- .pick_status.json: Update to 73dba1e15173ff6109925de9615f9d9f5cccc2be
Eric R. Smith (1):
- pco: fix a typo in the check for optimization looping
Erik Faye-Lund (1):
- gallium/dri: set LIBVA_DRIVERS_PATH in devenv
Faith Ekstrand (3):
- etnaviv: Call lower_bool_to_int32 not to_bitsize
- nir/lower_bool_to_bitsize: Make all bN_csel sources match
- pan/bi: Be more careful about bit sizes in b2f lowering
Georg Lehmann (3):
- ci: disable debian-ppc64el and debian-s390x
- aco/insert_fp_mode: don't skip setting round for fract
- nir/opt_algebraic: fix frsq clamp pattern
Ian Romanick (5):
- brw: Don't mark_invalid in update_for_reads for non-VGRF destination
- brw: Use brw_reg_is_arf in update_for_reads
- brw: Also check for ADDRESS file in update_for_reads
- brw/algebraic: Don't optimize SEL.L.SAT or SEL.G.SAT
- elk/algebraic: Don't optimize SEL.L.SAT or SEL.G.SAT
Icenowy Zheng (1):
- pvr: only specially handle gfx subcmd for BeginQuery
Iván Briano (1):
- anv: don't try to fast clear D/S with multiview
Jesse Natalie (1):
- d3d12: Fix importing external resources
Job Noorman (2):
- ir3: update context builder after ir3_get_predicate
- ir3: don't predicate vote_all/vote_any
Jose Maria Casanova Crespo (3):
- v3d: flush write jobs before BO replacement in DISCARD_WHOLE path
- vc4: flush write jobs before BO replacement in DISCARD_WHOLE path
- v3d: reject fast TLB blit when RT formats don't match
Karol Herbst (2):
- nir: fix nir_alu_type_range_contains_type_range for fp16 to int
- nir: fix nir_round_int_to_float for fp16
Lionel Landwerlin (2):
- anv: add missing handling for attachment locations in secondaries
- anv: dirty all push constant stages in simple shader
Lucas Fryzek (5):
- drisw: Properly mark shmid as -1 when alloc fails
- x11: Add helper util to check for xshm support
- egl/dri: Check that xshm can be attached
- glx: Check that xshm can be attached
- vulkan/wsi: Check that xshm can be attached
Luigi Santivetti (1):
- zink: fix format conversion logic for the alpha emulation case
Marek Olšák (1):
- ac: set the correct number of Z planes for ALLOW_EXPCLEAR
Mary Guillemard (1):
- vulkan: Do not override the shader_flags in case of no task shader
Mel Henning (1):
- driconf: force_vk_vendor on No Man's Sky + NVK
Mike Blumenkrantz (4):
- zink: add TRANSFER_WRITE -> HOST_READ sync to end of batch
- st/bitmap: only release YUV samplerviews
- radv: fix multiview fast clears
- egl/device: fix the fix for explicit sw rejection in non-sw EGL_PLATFORM=device
Patrick Lerda (1):
- r600: fix cs atomic operations when the shader is called multiple times
Pavel Ondračka (3):
- r300: copy target when merging alpha output instruction
- r300: disable HiZ for PIPE_FUNC_ALWAYS
- r300: disable clip-discard watermark for triangles
Pierre-Eric Pelloux-Prayer (2):
- frontends/va: fix undefined ref error
- mesa: don't wraparound st_context::work_counter
Rhys Perry (2):
- aco: perform dce for blocks skipped for process_block()
- nir/range_analysis: set deleted key
Sagar Ghuge (1):
- anv: Fix Wa_14021821874, Wa_14018813551, Wa_14026600921
Samuel Pitoiset (4):
- radv: fix copying images with different swizzle modes on SDMA7
- radv: fix a GPU hang with PS epilogs and secondary command buffers
- radv: fix local invocation index for mesh/task and quad derivatives on GFX12
- radv: fix missing L2 cache invalidation with streamout on GFX12
Tapani Pälli (2):
- intel/dev: update mesa_defs.json from workaround database
- anv: add handling for Wa_14026600921
Timothy Arceri (5):
- glsl: relax precision matching on unused uniforms ES
- glsl: add workaround for MDK2 HD
- mesa/st: use same path for setting state ref locations
- st/glsl_to_nir: update state var locations earlier
- glx: guard glx_screen frontend_screen member
Yiwei Zhang (2):
- pan: fix to not clear out of bitset range
- lvp: avoid advertising dmabuf support for kms_swrast

View file

@@ -1,32 +0,0 @@
-VK_KHR_relaxed_block_layout on pvr
-VK_KHR_storage_buffer_storage_class on pvr
-VK_EXT_external_memory_acquire_unmodified on panvk
-VK_EXT_discard_rectangles on NVK
-VK_KHR_present_id on HoneyKrisp
-VK_KHR_present_id2 on HoneyKrisp
-VK_KHR_present_wait on HoneyKrisp
-VK_KHR_present_wait2 on HoneyKrisp
-VK_KHR_maintenance10 on ANV, NVK, RADV
-VK_EXT_shader_uniform_buffer_unsized_array on ANV, HK, NVK, RADV
-VK_EXT_device_memory_report on panvk
-VK_VALVE_video_encode_rgb_conversion on radv
-VK_EXT_custom_resolve on RADV
-GL_EXT_shader_pixel_local_storage on Panfrost v6+
-VK_EXT_image_drm_format_modifier on panvk/v7
-VK_KHR_sampler_ycbcr_conversion on panvk/v7
-sparseResidencyImage2D on panvk v10+
-sparseResidencyStandard2DBlockShape on panvk v10+
-VK_KHR_surface_maintenance1 promotion everywhere EXT is exposed
-VK_KHR_swapchain_maintenance1 promotion everywhere EXT is exposed
-VK_KHR_dynamic_rendering on PowerVR
-VK_EXT_multisampled_render_to_single_sampled on panvk
-VK_KHR_pipeline_binary on HoneyKrisp
-VK_KHR_incremental_present on pvr
-VK_KHR_xcb_surface on pvr
-VK_KHR_xlib_surface on pvr
-VK_KHR_robustness2 on panvk v10+
-VK_KHR_robustness2 on HoneyKrisp
-VK_KHR_robustness2 on hasvk
-VK_KHR_robustness2 on NVK
-VK_KHR_robustness2 on Turnip
-VK_KHR_robustness2 on lavapipe

View file

@@ -197,6 +197,9 @@ following example::
 This will backport the commit to the 21.0 branch, as well as any more recent
 stable branch. Multiple ``Backport-to:`` lines are allowed, but only the
 lowest number mentioned actually matters, so for clarity, please only use one.
+You can also use the special ``Backport-to: *`` which will nominate the commit
+to be backported to every active stable branch, making it a synonym to the ``Cc:
+mesa-stable`` below.
 The last option is deprecated and mostly here for historical reasons
 dating back to when patch submission was done via emails: using a ``Cc:``

View file

@@ -642,7 +642,7 @@ if with_dri
 endif
 dep_dxheaders = null_dep
-if with_gallium_d3d12 or with_microsoft_clc or with_microsoft_vk or with_gfxstream_vk and host_machine.system() == 'windows'
+if with_gallium_d3d12 or with_microsoft_clc or with_microsoft_vk or (with_any_vk and host_machine.system() == 'windows')
   dep_dxheaders = dependency('directx-headers', required : false)
   if not dep_dxheaders.found()
     dep_dxheaders = dependency('DirectX-Headers',
@@ -1931,7 +1931,6 @@ dep_spirv_tools = dependency(
   'SPIRV-Tools',
   required : with_spirv_tools,
   version : '>= 2024.1',
-  static : host_machine.system() == 'darwin',
 )
 if dep_spirv_tools.found()
   pre_args += '-DHAVE_SPIRV_TOOLS'
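
The `dep_dxheaders` condition in the first hunk mixes `or` and `and`. The sketch below restates both versions in Python, on the assumption that Meson, like Python, lets `and` bind more tightly than `or`; the function and parameter names are hypothetical stand-ins for the Meson variables.

```python
def old_condition(d3d12: bool, clc: bool, microsoft_vk: bool,
                  gfxstream_vk: bool, is_windows: bool) -> bool:
    # `and` binds tighter than `or`, so only gfxstream_vk is gated on Windows.
    return d3d12 or clc or microsoft_vk or gfxstream_vk and is_windows


def new_condition(d3d12: bool, clc: bool, microsoft_vk: bool,
                  any_vk: bool, is_windows: bool) -> bool:
    # The parentheses make the grouping explicit, and the Windows-only
    # term now covers any Vulkan driver instead of gfxstream alone.
    return d3d12 or clc or microsoft_vk or (any_vk and is_windows)


# A Vulkan driver on a non-Windows host does not pull in the headers.
vk_on_linux = new_condition(False, False, False, True, False)
# The same driver on Windows does.
vk_on_windows = new_condition(False, False, False, True, True)
```

Under that precedence assumption the parentheses do not change how the expression is grouped; the functional change is the switch from `with_gfxstream_vk` to `with_any_vk`.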

View file

@@ -401,7 +401,6 @@ spec@egl 1.4@eglterminate then unbind context,Fail
 spec@egl_khr_surfaceless_context@viewport,Fail
 spec@egl_mesa_configless_context@basic,Fail
 spec@ext_external_objects@vk-ping-pong-single-sem,Crash
-spec@glsl-es-1.00@linker@glsl-mismatched-uniform-precision-unused,Fail
 spec@glsl-es-3.00@execution@built-in-functions@fs-packhalf2x16,Fail
 spec@glsl-es-3.00@execution@built-in-functions@vs-packhalf2x16,Fail
 spec@khr_texture_compression_astc@miptree-gles srgb-fp,Fail

View file

@@ -71,5 +71,4 @@ program@run kernel with max work item sizes,Fail
 # uprev Piglit in Mesa
 spec@ext_external_objects@vk-semaphores,Crash
 spec@ext_external_objects@vk-semaphores-2,Crash
-spec@glsl-es-1.00@linker@glsl-mismatched-uniform-precision-unused,Fail

View file

@@ -121,7 +121,6 @@ spec@ext_texture_srgb@texwrap formats-s3tc bordercolor-swizzled@GL_COMPRESSED_SR
 spec@ext_texture_srgb@texwrap formats-s3tc bordercolor-swizzled@GL_COMPRESSED_SRGB_S3TC_DXT1_EXT- swizzled- border color only,Fail
 spec@glsl-1.50@execution@geometry@tri-strip-ordering-with-prim-restart gl_triangle_strip_adjacency ffs,Fail
 spec@glsl-1.50@execution@geometry@tri-strip-ordering-with-prim-restart gl_triangle_strip_adjacency other,Fail
-spec@glsl-es-1.00@linker@glsl-mismatched-uniform-precision-unused,Fail
 spec@glsl-es-3.00@execution@built-in-functions@fs-packhalf2x16,Fail
 spec@glsl-es-3.00@execution@built-in-functions@vs-packhalf2x16,Fail
 spec@khr_texture_compression_astc@miptree-gl srgb-fp,Fail

View file

@@ -14,7 +14,6 @@ spec@egl_khr_surfaceless_context@viewport,Fail
 spec@ext_external_objects@vk-image-display,Crash
 spec@ext_external_objects@vk-semaphores,Crash
 spec@ext_external_objects@vk-semaphores-2,Crash
-spec@glsl-es-1.00@linker@glsl-mismatched-uniform-precision-unused,Fail
 spec@glsl-es-3.00@execution@built-in-functions@fs-packhalf2x16,Fail
 spec@glsl-es-3.00@execution@built-in-functions@vs-packhalf2x16,Fail
 spec@khr_texture_compression_astc@miptree-gles srgb-fp,Fail

View file

@@ -222,10 +222,12 @@ static uint32_t
 ac_sdma_get_tiled_info_dword(const struct radeon_info *info,
                              const struct ac_sdma_surf_tiled *tiled)
 {
-   const uint32_t swizzle_mode = tiled->surf->has_stencil ? tiled->surf->u.gfx9.zs.stencil_swizzle_mode
-                                                          : tiled->surf->u.gfx9.swizzle_mode;
-   const uint16_t epitch = tiled->surf->has_stencil ? tiled->surf->u.gfx9.zs.stencil_epitch
-                                                    : tiled->surf->u.gfx9.epitch;
+   const uint32_t swizzle_mode =
+      tiled->is_stencil ? tiled->surf->u.gfx9.zs.stencil_swizzle_mode
+                        : tiled->surf->u.gfx9.swizzle_mode;
+   const uint16_t epitch =
+      tiled->is_stencil ? tiled->surf->u.gfx9.zs.stencil_epitch
+                        : tiled->surf->u.gfx9.epitch;
    const enum gfx9_resource_type dimension =
       ac_sdma_get_tiled_resource_dim(info->sdma_ip_version, tiled);
    const uint32_t mip_max = MAX2(tiled->num_levels, 1);
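
The hunk above switches the selector from "the surface has a stencil plane" to "this copy targets the stencil plane". A small Python model of that difference follows; the `Surf` dataclass, the function names, and the numeric values are hypothetical and only mirror the field names visible in the C code.

```python
from dataclasses import dataclass


@dataclass
class Surf:
    """Hypothetical stand-in for the fields read from the surface."""
    has_stencil: bool
    swizzle_mode: int
    epitch: int
    stencil_swizzle_mode: int
    stencil_epitch: int


def select_by_surface(surf: Surf) -> tuple[int, int]:
    """Old selection: keyed on whether the surface has a stencil plane at all."""
    if surf.has_stencil:
        return surf.stencil_swizzle_mode, surf.stencil_epitch
    return surf.swizzle_mode, surf.epitch


def select_by_aspect(surf: Surf, is_stencil: bool) -> tuple[int, int]:
    """New selection: keyed on whether this particular copy is of the stencil plane."""
    if is_stencil:
        return surf.stencil_swizzle_mode, surf.stencil_epitch
    return surf.swizzle_mode, surf.epitch


# Made-up values for a combined depth/stencil surface.
ds = Surf(has_stencil=True, swizzle_mode=4, epitch=255,
          stencil_swizzle_mode=9, stencil_epitch=127)

depth_copy_old = select_by_surface(ds)
depth_copy_new = select_by_aspect(ds, is_stencil=False)
```

With the old selection, a depth-plane copy from a combined depth/stencil surface picks up the stencil parameters; keying on a per-copy flag avoids that.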

View file

@@ -61,6 +61,7 @@ struct ac_sdma_surf_tiled {
    uint64_t va;
    enum pipe_format format;
    uint32_t bpp;
+   bool is_stencil;
    struct {
       uint32_t x;

View file

@ -1055,8 +1055,15 @@ ac_init_ds_surface(const struct radeon_info *info, const struct ac_ds_state *sta
static unsigned
ac_get_decompress_on_z_planes(const struct radeon_info *info, enum pipe_format format, uint8_t log_num_samples,
bool htile_stencil_disabled, bool no_d16_compression)
bool tc_compat_htile_enabled, bool htile_stencil_disabled, bool no_d16_compression,
bool z_allow_expclear)
{
if (info->gfx_level < GFX8)
return 0;
if (!tc_compat_htile_enabled)
return z_allow_expclear ? 15 : 0;
uint32_t max_zplanes = 0;
if (info->gfx_level >= GFX9) {
@ -1073,6 +1080,7 @@ ac_get_decompress_on_z_planes(const struct radeon_info *info, enum pipe_format f
max_zplanes = 1;
max_zplanes++;
assert(max_zplanes != 1); /* 1 is invalid and can cause corruption on gfx11-11.5 */
} else {
if (format == PIPE_FORMAT_Z16_UNORM && no_d16_compression) {
/* Do not enable Z plane compression for 16-bit depth
@ -1093,6 +1101,7 @@ ac_get_decompress_on_z_planes(const struct radeon_info *info, enum pipe_format f
}
}
assert(max_zplanes != 10 && max_zplanes != 13); /* disallowed values */
return max_zplanes;
}
@ -1115,14 +1124,18 @@ ac_set_mutable_ds_surface_fields(const struct radeon_info *info, const struct ac
log_num_samples = G_028040_NUM_SAMPLES(ds->db_z_info);
}
bool z_allow_expclear = info->gfx_level <= GFX11_5 &&
G_028038_ALLOW_EXPCLEAR(ds->db_z_info);
const uint32_t max_zplanes =
ac_get_decompress_on_z_planes(info, state->format, log_num_samples,
tile_stencil_disable, state->no_d16_compression);
state->tc_compat_htile_enabled, tile_stencil_disable,
state->no_d16_compression, z_allow_expclear);
if (info->gfx_level >= GFX9) {
if (state->tc_compat_htile_enabled) {
ds->db_z_info |= S_028038_DECOMPRESS_ON_N_ZPLANES(max_zplanes);
ds->db_z_info |= S_028038_DECOMPRESS_ON_N_ZPLANES(max_zplanes);
if (state->tc_compat_htile_enabled) {
if (info->gfx_level >= GFX10) {
const bool iterate256 = log_num_samples >= 1;
@ -1138,12 +1151,13 @@ ac_set_mutable_ds_surface_fields(const struct radeon_info *info, const struct ac
ds->db_z_info |= S_028038_ZRANGE_PRECISION(state->zrange_precision);
} else {
if (state->tc_compat_htile_enabled) {
ds->u.gfx6.db_htile_surface |= S_028ABC_TC_COMPATIBLE(1);
if (info->gfx_level >= GFX8)
ds->db_z_info |= S_028040_DECOMPRESS_ON_N_ZPLANES(max_zplanes);
} else {
if (state->tc_compat_htile_enabled)
ds->u.gfx6.db_htile_surface |= S_028ABC_TC_COMPATIBLE(1);
else
ds->u.gfx6.db_depth_info |= S_02803C_ADDR5_SWIZZLE_MASK(1);
}
ds->db_z_info |= S_028040_ZRANGE_PRECISION(state->zrange_precision);
}
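Review note: the two early-outs added to `ac_get_decompress_on_z_planes` can be read in isolation. The sketch below is a minimal model, not the Mesa function: `sketch_gfx_level` and `sketch_zplanes_early_out` are hypothetical names, the enum only mimics the ordering of the real `amd_gfx_level`, and the `-1` return is an invented marker meaning "fall through to the per-format computation", which is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Mesa's amd_gfx_level; only the ordering matters. */
enum sketch_gfx_level { SKETCH_GFX6, SKETCH_GFX7, SKETCH_GFX8, SKETCH_GFX9 };

/* Returns the value chosen by the early-outs, or -1 (an invented marker)
 * when the per-format computation would run instead. */
static int
sketch_zplanes_early_out(enum sketch_gfx_level level, bool tc_compat_htile_enabled,
                         bool z_allow_expclear)
{
   /* The field is not programmed before GFX8. */
   if (level < SKETCH_GFX8)
      return 0;

   /* Without TC-compatible HTILE the value only depends on ALLOW_EXPCLEAR. */
   if (!tc_compat_htile_enabled)
      return z_allow_expclear ? 15 : 0;

   return -1;
}
```

With this shape the caller can write the field unconditionally, which matches the hunk that moves `S_028038_DECOMPRESS_ON_N_ZPLANES` out of the `tc_compat_htile_enabled` branch.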

@ -49,6 +49,8 @@ ac_sqtt_get_data_va(const struct radeon_info *rad_info, const struct ac_sqtt *da
void
ac_sqtt_init(struct ac_sqtt *data)
{
simple_mtx_init(&data->lock, mtx_plain);
list_inithead(&data->rgp_pso_correlation.record);
simple_mtx_init(&data->rgp_pso_correlation.lock, mtx_plain);
@ -71,6 +73,8 @@ ac_sqtt_init(struct ac_sqtt *data)
void
ac_sqtt_finish(struct ac_sqtt *data)
{
simple_mtx_destroy(&data->lock);
assert(data->rgp_pso_correlation.record_count == 0);
simple_mtx_destroy(&data->rgp_pso_correlation.lock);

@ -15,6 +15,7 @@
#include "ac_pm4.h"
#include "ac_rgp.h"
#include "amd_family.h"
#include "util/simple_mtx.h"
#define SQTT_BUFFER_ALIGN_SHIFT 12
@ -61,6 +62,8 @@ struct ac_sqtt {
struct rgp_clock_calibration rgp_clock_calibration;
struct hash_table_u64 *pipeline_bos;
simple_mtx_t lock;
};
struct ac_sqtt_data_info {

@ -443,10 +443,14 @@ emit_ps_color_export(nir_builder *b, lower_ps_state *s, unsigned output_index, u
}
}
s->exp[s->exp_num++] = nir_export_amd(b, nir_vec(b, outputs, 4),
.base = target,
.write_mask = write_mask,
.flags = flags);
nir_intrinsic_instr *exp = nir_export_amd(b, nir_vec(b, outputs, 4),
.base = target,
.flags = flags);
/* Set the writemask explicitly because write_mask=0 means full write mask. */
nir_intrinsic_set_write_mask(exp, write_mask);
s->exp[s->exp_num++] = exp;
return true;
}
@ -483,7 +487,7 @@ emit_ps_dual_src_blend_swizzle(nir_builder *b, lower_ps_state *s, unsigned first
uint32_t mrt0_write_mask = nir_intrinsic_write_mask(mrt0_exp);
uint32_t mrt1_write_mask = nir_intrinsic_write_mask(mrt1_exp);
uint32_t write_mask = mrt0_write_mask & mrt1_write_mask;
uint32_t write_mask = mrt0_write_mask | mrt1_write_mask;
nir_def *mrt0_arg = mrt0_exp->src[0].ssa;
nir_def *mrt1_arg = mrt1_exp->src[0].ssa;
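Review note: both write-mask changes in this file come down to simple bit arithmetic. The sketch below is a hypothetical model, assuming (as the comment in the hunk states) that a builder-supplied mask of 0 is expanded to the full mask. None of these helpers exist in NIR; they only illustrate why the mask is now set explicitly and why the dual-source path takes the union of the two masks.

```c
#include <stdint.h>

/* Hypothetical model of a builder argument where 0 is read as "unspecified"
 * and expanded to the full mask for num_components channels. */
static uint32_t
sketch_builder_write_mask(uint32_t requested, unsigned num_components)
{
   uint32_t full = (1u << num_components) - 1u;
   return requested ? requested : full;
}

/* Setting the mask after the instruction is created keeps an empty mask empty. */
static uint32_t
sketch_explicit_write_mask(uint32_t requested)
{
   return requested;
}

/* Dual-source blending: the combined mask covers channels written by either export. */
static uint32_t
sketch_dual_src_write_mask(uint32_t mrt0_mask, uint32_t mrt1_mask)
{
   return mrt0_mask | mrt1_mask;
}
```

For example, masks `0x3` and `0xc` intersect to `0`, which under the model above would be misread as a full mask, whereas their union is `0xf`.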

@ -216,6 +216,11 @@ the correct layout is:
VOP2 `v_pk_fmac_f16`. But like all other packed math opcodes, DPP does not function in practice.
RDNA1 and RDNA2 support `v_pk_fmac_f16_dpp`.
## DPP with integer `subrev` and shifts
No documentation mentions this, but DPP is seemingly applied to src1 instead of src0 for
integer reverse subtract and shift opcodes.
## ds_swizzle_b32 rotate/fft modes
These are first mentioned in the GFX9 (Vega) ISA doc, information from the LLVM bug tracker

@ -1867,6 +1867,8 @@ resolve_all_gfx11(State& state, NOP_ctx_gfx11& ctx,
ctx.vgpr_used_by_vmem_bvh.any()) {
waitcnt_depctr &= 0xffe3;
ctx.vgpr_used_by_vmem_load.reset();
ctx.vgpr_used_by_vmem_sample.reset();
ctx.vgpr_used_by_vmem_bvh.reset();
ctx.vgpr_used_by_vmem_store.reset();
ctx.vgpr_used_by_ds.reset();
}
@ -1912,7 +1914,9 @@ handle_block(Program* program, Ctx& ctx, Block& block)
Handle(state, ctx, instr, block.instructions);
/* Resolve all possible hazards (we don't know what s_setpc_b64 jumps to). */
if (instr->opcode == aco_opcode::s_setpc_b64) {
if (instr->opcode == aco_opcode::s_setpc_b64 || instr->opcode == aco_opcode::s_swappc_b64 ||
instr->opcode == aco_opcode::s_call_b64) {
found_end |= instr->opcode == aco_opcode::s_setpc_b64;
block.instructions.emplace_back(std::move(instr));
std::vector<aco_ptr<Instruction>> resolve_instrs;
@ -1920,8 +1924,6 @@ handle_block(Program* program, Ctx& ctx, Block& block)
block.instructions.insert(std::prev(block.instructions.end()),
std::move_iterator(resolve_instrs.begin()),
std::move_iterator(resolve_instrs.end()));
found_end = true;
continue;
}

@ -484,10 +484,17 @@ process_instructions(exec_ctx& ctx, Block* block, std::vector<aco_ptr<Instructio
Operand exit_cond = Operand(exec, bld.lm);
if (state == Exact) {
assert(info.exec.size() == 1);
bld.sop2(Builder::s_andn2, Definition(exec, bld.lm), bld.def(s1, scc), info.exec[0].op,
src);
info.exec[0].op = Operand(exec, bld.lm);
bld.sop2(Builder::s_andn2, Definition(exec, bld.lm), bld.def(s1, scc),
info.exec.back().op, src);
info.exec.back().op = Operand(exec, bld.lm);
/* Although this is in uniform CF, it might be a loop without back-edge.
* Update the loop restore mask as well.
*/
for (unsigned i = 0; i < info.exec.size() - 1; i++) {
assert(info.exec[i + 1].type & mask_type_loop);
info.exec[i].op = bld.copy(bld.def(bld.lm), Operand(exec, bld.lm));
}
} else {
Temp cond = bld.tmp(s1);
info.exec[0].op = bld.sop2(Builder::s_andn2, bld.def(bld.lm), Definition(cond, scc),

@ -233,9 +233,6 @@ instr_ignores_round_mode(const Instruction* instr)
case aco_opcode::v_rndne_f64:
case aco_opcode::v_rndne_f32:
case aco_opcode::v_rndne_f16:
case aco_opcode::v_fract_f64:
case aco_opcode::v_fract_f32:
case aco_opcode::v_fract_f16:
case aco_opcode::s_min_f32:
case aco_opcode::s_min_f16:
case aco_opcode::s_max_f32:
@ -454,16 +451,16 @@ emit_set_mode_block(fp_mode_ctx* ctx, Block* block)
for (uint32_t pred : block->linear_preds)
max_pred = MAX2(max_pred, pred);
assert(max_pred != 0);
mode_mask to_set = 0;
/* Check if any mode was changed during the loop. */
u_foreach_bit (i, fp_state.required) {
if (ctx->last_set[i] <= max_pred)
to_set |= BITFIELD_BIT(i);
if (max_pred >= block->index) {
mode_mask to_set = 0;
/* Check if any mode was changed during the loop. */
u_foreach_bit (i, fp_state.required) {
if (ctx->last_set[i] <= max_pred)
to_set |= BITFIELD_BIT(i);
}
if (to_set)
set_mode(ctx, block, fp_state, 0, to_set);
}
if (to_set)
set_mode(ctx, block, fp_state, 0, to_set);
}
ctx->block_states[block->index] = fp_state;
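Review note: the new `max_pred >= block->index` guard appears to test whether the block actually has a back-edge before re-setting modes. A minimal sketch of that test follows. It assumes that blocks are indexed in program order, so a predecessor with an index at or above the block's own index can only be reached through a back-edge. `sketch_has_back_edge` is a hypothetical helper, not an ACO function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: blocks are numbered in program order, so a predecessor whose
 * index is >= the block's own index must be a back-edge source. */
static bool
sketch_has_back_edge(uint32_t block_index, const uint32_t *linear_preds, unsigned num_preds)
{
   uint32_t max_pred = 0;
   for (unsigned i = 0; i < num_preds; i++) {
      if (linear_preds[i] > max_pred)
         max_pred = linear_preds[i];
   }
   return max_pred >= block_index;
}
```

A loop header whose only predecessor is the preheader (a loop without a back-edge) returns `false` here, so no mode would be re-emitted for it.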

@ -391,6 +391,65 @@ convert_to_SDWA(amd_gfx_level gfx_level, aco_ptr<Instruction>& instr)
return tmp;
}
bool
opcode_supports_dpp(amd_gfx_level gfx_level, aco_opcode opcode, bool vop3p)
{
switch (opcode) {
/* reverse integer subtract and shift seem to apply dpp to src1 instead of src0 */
case aco_opcode::v_subrev_co_u32:
case aco_opcode::v_subrev_co_u32_e64:
case aco_opcode::v_subbrev_co_u32:
case aco_opcode::v_subrev_u16:
case aco_opcode::v_subrev_u32:
case aco_opcode::v_ashrrev_i32:
case aco_opcode::v_lshrrev_b32:
case aco_opcode::v_lshlrev_b32:
case aco_opcode::v_ashrrev_i16:
case aco_opcode::v_lshrrev_b16:
case aco_opcode::v_lshlrev_b16:
case aco_opcode::v_ashrrev_i16_e64:
case aco_opcode::v_lshrrev_b16_e64:
case aco_opcode::v_lshlrev_b16_e64: return false;
case aco_opcode::v_pk_fmac_f16: return gfx_level < GFX11;
/* there are more cases but those all take 64-bit inputs */
case aco_opcode::v_madmk_f32:
case aco_opcode::v_madak_f32:
case aco_opcode::v_madmk_f16:
case aco_opcode::v_madak_f16:
case aco_opcode::v_fmamk_f32:
case aco_opcode::v_fmaak_f32:
case aco_opcode::v_fmamk_f16:
case aco_opcode::v_fmaak_f16:
case aco_opcode::v_readfirstlane_b32:
case aco_opcode::v_cvt_f64_i32:
case aco_opcode::v_cvt_f64_f32:
case aco_opcode::v_cvt_f64_u32:
case aco_opcode::v_mul_lo_u32:
case aco_opcode::v_mul_lo_i32:
case aco_opcode::v_mul_hi_u32:
case aco_opcode::v_mul_hi_i32:
case aco_opcode::v_qsad_pk_u16_u8:
case aco_opcode::v_mqsad_pk_u16_u8:
case aco_opcode::v_mqsad_u32_u8:
case aco_opcode::v_mad_u64_u32:
case aco_opcode::v_mad_i64_i32:
case aco_opcode::v_permlane16_b32:
case aco_opcode::v_permlanex16_b32:
case aco_opcode::v_permlane64_b32:
case aco_opcode::v_readlane_b32_e64:
case aco_opcode::v_writelane_b32_e64: return false;
/* simpler than listing all VOP3P opcodes which do not support DPP */
case aco_opcode::v_fma_mix_f32:
case aco_opcode::v_fma_mixlo_f16:
case aco_opcode::v_fma_mixhi_f16:
case aco_opcode::p_v_fma_mixlo_f16_rtz:
case aco_opcode::p_v_fma_mixhi_f16_rtz:
case aco_opcode::v_dot2_f32_f16:
case aco_opcode::v_dot2_f32_bf16: return gfx_level >= GFX11;
default: return !vop3p;
}
}
bool
can_use_DPP(amd_gfx_level gfx_level, const aco_ptr<Instruction>& instr, bool dpp8)
{
@ -433,41 +492,7 @@ can_use_DPP(amd_gfx_level gfx_level, const aco_ptr<Instruction>& instr, bool dpp
if (instr->writes_exec())
return false;
/* simpler than listing all VOP3P opcodes which do not support DPP */
if (instr->isVOP3P()) {
return instr->opcode == aco_opcode::v_fma_mix_f32 ||
instr->opcode == aco_opcode::v_fma_mixlo_f16 ||
instr->opcode == aco_opcode::v_fma_mixhi_f16 ||
instr->opcode == aco_opcode::p_v_fma_mixlo_f16_rtz ||
instr->opcode == aco_opcode::p_v_fma_mixhi_f16_rtz ||
instr->opcode == aco_opcode::v_dot2_f32_f16 ||
instr->opcode == aco_opcode::v_dot2_f32_bf16;
}
if (instr->opcode == aco_opcode::v_pk_fmac_f16)
return gfx_level < GFX11;
/* there are more cases but those all take 64-bit inputs */
return instr->opcode != aco_opcode::v_madmk_f32 && instr->opcode != aco_opcode::v_madak_f32 &&
instr->opcode != aco_opcode::v_madmk_f16 && instr->opcode != aco_opcode::v_madak_f16 &&
instr->opcode != aco_opcode::v_fmamk_f32 && instr->opcode != aco_opcode::v_fmaak_f32 &&
instr->opcode != aco_opcode::v_fmamk_f16 && instr->opcode != aco_opcode::v_fmaak_f16 &&
instr->opcode != aco_opcode::v_readfirstlane_b32 &&
instr->opcode != aco_opcode::v_cvt_f64_i32 &&
instr->opcode != aco_opcode::v_cvt_f64_f32 &&
instr->opcode != aco_opcode::v_cvt_f64_u32 && instr->opcode != aco_opcode::v_mul_lo_u32 &&
instr->opcode != aco_opcode::v_mul_lo_i32 && instr->opcode != aco_opcode::v_mul_hi_u32 &&
instr->opcode != aco_opcode::v_mul_hi_i32 &&
instr->opcode != aco_opcode::v_qsad_pk_u16_u8 &&
instr->opcode != aco_opcode::v_mqsad_pk_u16_u8 &&
instr->opcode != aco_opcode::v_mqsad_u32_u8 &&
instr->opcode != aco_opcode::v_mad_u64_u32 &&
instr->opcode != aco_opcode::v_mad_i64_i32 &&
instr->opcode != aco_opcode::v_permlane16_b32 &&
instr->opcode != aco_opcode::v_permlanex16_b32 &&
instr->opcode != aco_opcode::v_permlane64_b32 &&
instr->opcode != aco_opcode::v_readlane_b32_e64 &&
instr->opcode != aco_opcode::v_writelane_b32_e64;
return opcode_supports_dpp(gfx_level, instr->opcode, instr->isVOP3P());
}
aco_ptr<Instruction>
@ -889,7 +914,9 @@ needs_exec_mask(const Instruction* instr)
if (instr->isSALU() || instr->isBranch() || instr->isSMEM() || instr->isBarrier())
return instr->opcode == aco_opcode::s_cbranch_execz ||
instr->opcode == aco_opcode::s_cbranch_execnz ||
instr->opcode == aco_opcode::s_setpc_b64 || instr->reads_exec();
instr->opcode == aco_opcode::s_setpc_b64 ||
instr->opcode == aco_opcode::s_swappc_b64 || instr->opcode == aco_opcode::s_call_b64 ||
instr->reads_exec();
if (instr->isPseudo()) {
switch (instr->opcode) {

@ -2040,6 +2040,8 @@ bool can_use_opsel(amd_gfx_level gfx_level, aco_opcode op, int idx);
bool instr_is_16bit(amd_gfx_level gfx_level, aco_opcode op);
uint8_t get_gfx11_true16_mask(aco_opcode op);
bool can_use_SDWA(amd_gfx_level gfx_level, const aco_ptr<Instruction>& instr, bool pre_ra);
bool opcode_supports_dpp(amd_gfx_level gfx_level, aco_opcode opcode, bool vop3p);
bool can_use_DPP(amd_gfx_level gfx_level, const aco_ptr<Instruction>& instr, bool dpp8);
bool can_use_DPP(amd_gfx_level gfx_level, const aco_ptr<Instruction>& instr, bool dpp8);
bool can_write_m0(const aco_ptr<Instruction>& instr);
/* updates "instr" and returns the old instruction (or NULL if no update was needed) */

@ -298,7 +298,9 @@ eliminate_useless_exec_writes_in_block(branch_ctx& ctx, Block& block)
/* blocks_incoming_exec_used is initialized to true, so this is correct even for loops. */
if (instr->opcode == aco_opcode::s_cbranch_scc0 ||
instr->opcode == aco_opcode::s_cbranch_scc1) {
instr->opcode == aco_opcode::s_cbranch_scc1 ||
instr->opcode == aco_opcode::s_cbranch_vccz ||
instr->opcode == aco_opcode::s_cbranch_vccnz) {
exec_write_used |= ctx.blocks_incoming_exec_used[instr->salu().imm];
}

@ -22,7 +22,13 @@ enum aco_nir_function_attribs {
};
enum aco_nir_parameter_attribs {
/* Parameter value is not used by any callee and does not need to be preserved */
/* This parameter's value may not be preserved across a callee. Unlike return parameters, the
* parameter's value is undefined on return. Callers must back up values of discardable
* parameters separately.
* Mostly used for tail calls, where parameters to the tail callee have different values than
* for the caller. In that case, on function return, the parameters will have been overwritten
* with the tail callee parameter values.
*/
ACO_NIR_PARAM_ATTRIB_DISCARDABLE = 0x1,
};

@ -427,6 +427,21 @@ process_block(vn_ctx& ctx, Block& block)
block.instructions = std::move(new_instructions);
}
void
dce_instructions(vn_ctx& ctx, Block& block)
{
std::vector<aco_ptr<Instruction>> new_instructions;
new_instructions.reserve(block.instructions.size());
for (aco_ptr<Instruction>& instr : block.instructions) {
if (is_dead(ctx.uses, instr.get()))
continue;
new_instructions.emplace_back(std::move(instr));
}
block.instructions = std::move(new_instructions);
}
void
rename_phi_operands(Block& block, aco::unordered_map<uint32_t, Temp>& renames)
{
@ -467,10 +482,12 @@ value_numbering(Program* program)
if (block.logical_idom == (int)block.index)
ctx.expr_values.clear();
if (block.logical_idom != -1)
if (block.logical_idom != -1) {
process_block(ctx, block);
else
} else {
dce_instructions(ctx, block);
rename_phi_operands(block, ctx.renames);
}
/* increment exec_id when entering nested control flow */
if (block.kind & block_kind_branch || block.kind & block_kind_loop_preheader ||

@ -1190,7 +1190,7 @@ alu_opt_gather_info(opt_ctx& ctx, Instruction* instr, alu_opt_info& info)
info.operands.push_back({instr->operands[0]});
if (instr->definitions[0].regClass() == s1) {
info.defs.push_back(instr->definitions[1]);
info.opcode = aco_opcode::v_lshl_b32;
info.opcode = aco_opcode::s_lshl_b32;
info.format = Format::SOP2;
std::swap(info.operands[0], info.operands[1]);
} else {

@ -142,6 +142,10 @@ save_reg_writes(pr_opt_ctx& ctx, aco_ptr<Instruction>& instr)
ctx.instr_idx_by_regs[ctx.current_block->index][instr->pseudo().scratch_sgpr] =
overwritten_unknown_instr;
}
if (instr->isCall()) {
std::fill(ctx.instr_idx_by_regs[ctx.current_block->index].begin(),
ctx.instr_idx_by_regs[ctx.current_block->index].end(), overwritten_unknown_instr);
}
}
Idx
@ -862,6 +866,8 @@ instr_overwrites(Instruction* instr, PhysReg reg, unsigned size)
if (scratch_reg >= reg && reg + size > scratch_reg)
return true;
}
if (instr->isCall())
return true;
return false;
}

@ -672,7 +672,7 @@ build_end_with_regs(isel_context* ctx, std::vector<Operand>& regs)
Instruction*
add_startpgm(struct isel_context* ctx, bool is_callee)
{
ctx->program->scratch_arg_size += ctx->callee_info.scratch_param_size;
ctx->program->scratch_arg_size += ctx->callee_info.scratch_param_size * ctx->program->wave_size;
unsigned def_count = 0;
for (unsigned i = 0; i < ctx->args->arg_count; i++) {
@ -1034,8 +1034,7 @@ find_param_regs(Program* program, const ABI& abi, callee_info& info,
param_demand += Temp(0, it2->rc);
it2->dst_info->needs_explicit_preservation =
regs == clobbered_regs && !it2->dst_info->discardable;
it2->dst_info->needs_explicit_preservation = regs == clobbered_regs;
it2->dst_info->def.setPrecolored(*next_reg);
for (unsigned i = 0; i < it2->rc.size(); ++i)
BITSET_CLEAR(regs, next_reg->reg() + i);
@ -1051,8 +1050,7 @@ find_param_regs(Program* program, const ABI& abi, callee_info& info,
next_reg = next_reg->advance(required_padding * 4);
}
if (next_reg) {
params.back().dst_info->needs_explicit_preservation =
regs == clobbered_regs && !params.back().dst_info->discardable;
params.back().dst_info->needs_explicit_preservation = regs == clobbered_regs;
param_demand += Temp(0, params.back().rc);
params.back().dst_info->def.setPrecolored(*next_reg);
BITSET_CLEAR_COUNT(regs, next_reg->reg(), params.back().rc.size());
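Review note: the `add_startpgm` change scales the callee's scratch parameter size by the wave size. The arithmetic can be sketched as below, under the assumption (not stated in the diff) that `scratch_param_size` is a per-lane quantity and `scratch_arg_size` a per-wave one. `sketch_scratch_arg_size` is a hypothetical helper.

```c
#include <stdint.h>

/* Assumption: the parameter size is per lane, so a wave needs one copy per lane. */
static uint32_t
sketch_scratch_arg_size(uint32_t scratch_param_size_per_lane, uint32_t wave_size)
{
   return scratch_param_size_per_lane * wave_size;
}
```

Under that assumption, 16 units per lane become 512 for a wave32 program and 1024 for a wave64 program.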

@ -3392,7 +3392,10 @@ visit_store_scratch(isel_context* ctx, nir_intrinsic_instr* instr)
offset = as_vgpr(ctx, offset);
for (unsigned i = 0; i < write_count; i++) {
aco_opcode op = get_buffer_store_op(write_datas[i].bytes());
Instruction* mubuf = bld.mubuf(op, rsrc, offset, ctx->program->scratch_offsets.back(),
Operand soffset = Operand::c32(0);
if (!ctx->program->scratch_offsets.empty())
soffset = Operand(ctx->program->scratch_offsets.back());
Instruction* mubuf = bld.mubuf(op, rsrc, offset, soffset,
write_datas[i], offsets[i], true);
mubuf->mubuf().sync = memory_sync_info(storage_scratch, semantic_private);
enum ac_access_type type =

@ -145,7 +145,7 @@ main()
ir_id_to_offset(children[i]))).aabb;
float surface_area = aabb_surface_area(bounds);
if (surface_area > largest_surface_area) {
if (surface_area > largest_surface_area || collapsed_child_index == -1) {
largest_surface_area = surface_area;
collapsed_child_index = i;
}
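Review note: the extra `collapsed_child_index == -1` condition guarantees that some child is picked even when no surface area compares greater than the running maximum. The C sketch below models that selection loop. The initial value of the maximum is not visible in this hunk, so it is passed in as a parameter, and `sketch_pick_collapsed_child` is a hypothetical name. The real shader may also filter which children are considered.

```c
/* Picks the index of the largest area; falls back to the first candidate
 * when no area is strictly greater than the initial maximum. */
static int
sketch_pick_collapsed_child(const float *surface_areas, int count, float initial_largest)
{
   float largest = initial_largest;
   int picked = -1;

   for (int i = 0; i < count; i++) {
      if (surface_areas[i] > largest || picked == -1) {
         largest = surface_areas[i];
         picked = i;
      }
   }
   return picked;
}
```

With areas `{0.0f, 0.0f}` and an initial maximum of `0.0f`, the strict comparison alone would leave the index at `-1`; the fallback returns index `0`.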

@ -778,8 +778,11 @@ sqtt_QueueSubmit2(VkQueue _queue, uint32_t submitCount, const VkSubmitInfo2 *pSu
if (queue->sqtt_present)
return radv_sqtt_wsi_submit(_queue, submitCount, pSubmits, _fence);
if (instance->vk.trace_per_submit)
if (instance->vk.trace_per_submit) {
/* Make sure to lock in case of multithreaded submissions. */
simple_mtx_lock(&device->sqtt.lock);
radv_sqtt_start_capturing(queue);
}
for (uint32_t i = 0; i < submitCount; i++) {
const VkSubmitInfo2 *pSubmit = &pSubmits[i];
@ -863,12 +866,17 @@ sqtt_QueueSubmit2(VkQueue _queue, uint32_t submitCount, const VkSubmitInfo2 *pSu
"radv: Failed to capture RGP for this submit because the buffer is too small and auto-resizing "
"is disabled. See RADV_THREAD_TRACE_BUFFER_SIZE for increasing the size.\n");
}
simple_mtx_unlock(&device->sqtt.lock);
}
return result;
fail:
FREE(new_cmdbufs);
if (instance->vk.trace_per_submit) {
simple_mtx_unlock(&device->sqtt.lock);
}
return result;
}
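Review note: the locking added to `sqtt_QueueSubmit2` has to stay balanced on both the success path and the `fail:` path. The sketch below models that control flow with a counter in place of a real mutex. `sketch_lock`, `sketch_submit` and the `fail_midway` flag are hypothetical and exist only to make the balance checkable.

```c
#include <stdbool.h>

/* Counter standing in for simple_mtx_t: depth > 0 means "held". */
struct sketch_lock {
   int depth;
};

static void sketch_lock_acquire(struct sketch_lock *l) { l->depth++; }
static void sketch_lock_release(struct sketch_lock *l) { l->depth--; }

/* Returns 0 on success and -1 on failure; the lock is released on both paths. */
static int
sketch_submit(struct sketch_lock *lock, bool trace_per_submit, bool fail_midway)
{
   int result = 0;

   if (trace_per_submit)
      sketch_lock_acquire(lock);

   if (fail_midway) {
      result = -1;
      goto fail;
   }

   if (trace_per_submit)
      sketch_lock_release(lock);
   return result;

fail:
   if (trace_per_submit)
      sketch_lock_release(lock);
   return result;
}
```

The lock is only taken when per-submit tracing is enabled, so both release sites are guarded by the same condition.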

@ -0,0 +1,31 @@
/*
* Copyright © 2026 Valve Corporation
*
* SPDX-License-Identifier: MIT
*/
#include "radv_cmd_buffer.h"
#include "radv_device.h"
#include "radv_entrypoints.h"
VKAPI_ATTR void VKAPI_CALL
strange_brigade_CmdPipelineBarrier2(VkCommandBuffer commandBuffer, const VkDependencyInfo *pDependencyInfo)
{
VK_FROM_HANDLE(radv_cmd_buffer, cmd_buffer, commandBuffer);
struct radv_device *device = radv_cmd_buffer_device(cmd_buffer);
for (uint32_t i = 0; i < pDependencyInfo->imageMemoryBarrierCount; i++) {
VkImageMemoryBarrier2 *barrier = (VkImageMemoryBarrier2 *)&pDependencyInfo->pImageMemoryBarriers[i];
if (barrier->newLayout == VK_IMAGE_LAYOUT_PRESENT_SRC_KHR &&
barrier->srcAccessMask == VK_ACCESS_COLOR_ATTACHMENT_READ_BIT) {
/* This game has a broken barrier right before present that causes rendering issues. Fix it
* by modifying the src access mask.
*/
barrier->srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
break;
}
}
device->layer_dispatch.app.CmdPipelineBarrier2(commandBuffer, pDependencyInfo);
}

@ -22,6 +22,7 @@ radv_entrypoints_gen_command += [
'--device-prefix', 'rage2',
'--device-prefix', 'quantic_dream',
'--device-prefix', 'no_mans_sky',
'--device-prefix', 'strange_brigade',
# Command buffer annotation layer entrypoints
'--device-prefix', 'annotate',
@ -42,6 +43,7 @@ libradv_files = files(
'layers/radv_rage2.c',
'layers/radv_quantic_dream.c',
'layers/radv_no_mans_sky.c',
'layers/radv_strange_brigade.c',
'layers/radv_rmv_layer.c',
'layers/radv_rra_layer.c',
'layers/radv_sqtt_layer.c',

@ -97,6 +97,7 @@ enum radv_meta_object_key_type {
RADV_META_OBJECT_KEY_CLEAR_HIZ,
RADV_META_OBJECT_KEY_FAST_CLEAR_ELIMINATE,
RADV_META_OBJECT_KEY_DCC_DECOMPRESS,
RADV_META_OBJECT_KEY_DCC_DECOMPRESS_CS,
RADV_META_OBJECT_KEY_DCC_RETILE,
RADV_META_OBJECT_KEY_HTILE_EXPAND_GFX,
RADV_META_OBJECT_KEY_HTILE_EXPAND_CS,

@ -1475,7 +1475,8 @@ radv_can_fast_clear_color(struct radv_cmd_buffer *cmd_buffer, const struct radv_
static void
radv_fast_clear_color(struct radv_cmd_buffer *cmd_buffer, const struct radv_image_view *iview,
const VkClearAttachment *clear_att, const VkClearRect *clear_rect,
enum radv_cmd_flush_bits *pre_flush, enum radv_cmd_flush_bits *post_flush)
enum radv_cmd_flush_bits *pre_flush, enum radv_cmd_flush_bits *post_flush,
uint32_t view_mask)
{
struct radv_device *device = radv_cmd_buffer_device(cmd_buffer);
const struct radv_physical_device *pdev = radv_device_physical(device);
@ -1488,7 +1489,8 @@ radv_fast_clear_color(struct radv_cmd_buffer *cmd_buffer, const struct radv_imag
.baseMipLevel = iview->vk.base_mip_level,
.levelCount = iview->vk.level_count,
.baseArrayLayer = iview->vk.base_array_layer + clear_rect->baseArrayLayer,
.layerCount = clear_rect->layerCount,
/* radv_can_fast_clear_color blocks multiview fast clears unless the viewmask contains all layers */
.layerCount = view_mask ? iview->vk.layer_count : clear_rect->layerCount,
};
if (pre_flush) {
@ -1575,7 +1577,7 @@ emit_clear(struct radv_cmd_buffer *cmd_buffer, const VkClearAttachment *clear_at
if (radv_can_fast_clear_color(cmd_buffer, color_att->iview, color_att->layout, clear_rect, clear_value,
view_mask)) {
radv_fast_clear_color(cmd_buffer, color_att->iview, clear_att, clear_rect, pre_flush, post_flush);
radv_fast_clear_color(cmd_buffer, color_att->iview, clear_att, clear_rect, pre_flush, post_flush, view_mask);
} else {
emit_color_clear(cmd_buffer, clear_att, clear_rect, view_mask);
}
@ -1877,7 +1879,7 @@ radv_fast_clear_range(struct radv_cmd_buffer *cmd_buffer, struct radv_image *ima
if (vk_format_is_color(format)) {
if (radv_can_fast_clear_color(cmd_buffer, &iview, image_layout, &clear_rect, clear_att.clearValue.color, 0)) {
radv_fast_clear_color(cmd_buffer, &iview, &clear_att, &clear_rect, NULL, NULL);
radv_fast_clear_color(cmd_buffer, &iview, &clear_att, &clear_rect, NULL, NULL, 0);
fast_cleared = true;
}
} else {
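Review note: the layer-count selection added to `radv_fast_clear_color` is a single conditional. The sketch below restates it with plain integers. `sketch_fast_clear_layer_count` is a hypothetical helper, and it relies on the condition quoted in the hunk's comment, namely that a multiview fast clear is only allowed when the view mask covers all layers.

```c
#include <stdint.h>

/* With multiview (non-zero view mask) the whole image view is cleared;
 * otherwise the layer count comes from the clear rectangle. */
static uint32_t
sketch_fast_clear_layer_count(uint32_t view_mask, uint32_t iview_layer_count,
                              uint32_t rect_layer_count)
{
   return view_mask ? iview_layer_count : rect_layer_count;
}
```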

@ -144,6 +144,40 @@ gfx_or_compute_copy_memory_to_image(struct radv_cmd_buffer *cmd_buffer, uint64_t
(use_compute ? RADV_META_SAVE_COMPUTE_PIPELINE : RADV_META_SAVE_GRAPHICS_PIPELINE) |
RADV_META_SAVE_CONSTANTS | RADV_META_SAVE_DESCRIPTORS);
if (use_compute) {
/* For partial copies, HTILE is decompressed beforehand because image stores don't write the
* uncompressed DWORD to HTILE. HTILE then needs to be re-initialized to its uncompressed
* state after the copy.
*/
const bool is_partial_copy = region->imageOffset.x || region->imageOffset.y || region->imageOffset.z ||
region->imageExtent.width != image->vk.extent.width ||
region->imageExtent.height != image->vk.extent.height ||
region->imageExtent.depth != image->vk.extent.depth;
uint32_t queue_mask = radv_image_queue_family_mask(image, cmd_buffer->qf, cmd_buffer->qf);
if (radv_layout_is_htile_compressed(device, image, region->imageSubresource.mipLevel, layout, queue_mask) &&
is_partial_copy) {
radv_describe_barrier_start(cmd_buffer, RGP_BARRIER_UNKNOWN_REASON);
u_foreach_bit (i, region->imageSubresource.aspectMask) {
unsigned aspect_mask = 1u << i;
radv_expand_depth_stencil(
cmd_buffer, image,
&(VkImageSubresourceRange){
.aspectMask = aspect_mask,
.baseMipLevel = region->imageSubresource.mipLevel,
.levelCount = 1,
.baseArrayLayer = region->imageSubresource.baseArrayLayer,
.layerCount = vk_image_subresource_layer_count(&image->vk, &region->imageSubresource),
},
NULL);
}
radv_describe_barrier_end(cmd_buffer);
}
}
/**
* From the Vulkan 1.0.6 spec: 18.3 Copying Data Between Images
* extent is the size in texels of the source image to copy in width,
@ -222,6 +256,27 @@ gfx_or_compute_copy_memory_to_image(struct radv_cmd_buffer *cmd_buffer, uint64_t
slice_array++;
}
if (use_compute) {
/* Fixup HTILE after a copy on compute. */
uint32_t queue_mask = radv_image_queue_family_mask(image, cmd_buffer->qf, cmd_buffer->qf);
if (radv_layout_is_htile_compressed(device, image, region->imageSubresource.mipLevel, layout, queue_mask)) {
cmd_buffer->state.flush_bits |= RADV_CMD_FLAG_CS_PARTIAL_FLUSH | RADV_CMD_FLAG_INV_VCACHE;
VkImageSubresourceRange range = {
.aspectMask = region->imageSubresource.aspectMask,
.baseMipLevel = region->imageSubresource.mipLevel,
.levelCount = 1,
.baseArrayLayer = region->imageSubresource.baseArrayLayer,
.layerCount = vk_image_subresource_layer_count(&image->vk, &region->imageSubresource),
};
uint32_t htile_value = radv_get_htile_initial_value(device, image);
cmd_buffer->state.flush_bits |= radv_clear_htile(cmd_buffer, image, &range, htile_value, false);
}
}
radv_meta_restore(&saved_state, cmd_buffer);
}
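Review note: the `is_partial_copy` predicate in the first hunk compares the copy region against the image extent. A self-contained restatement follows. The struct and function names are hypothetical stand-ins for `VkOffset3D`, `VkExtent3D` and the inline expression in the diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for VkOffset3D and VkExtent3D. */
struct sketch_offset3d {
   int32_t x, y, z;
};
struct sketch_extent3d {
   uint32_t width, height, depth;
};

/* A copy is partial when it does not start at the origin or does not
 * cover the full image extent. */
static bool
sketch_is_partial_copy(struct sketch_offset3d image_offset, struct sketch_extent3d copy_extent,
                       struct sketch_extent3d image_extent)
{
   return image_offset.x || image_offset.y || image_offset.z ||
          copy_extent.width != image_extent.width ||
          copy_extent.height != image_extent.height ||
          copy_extent.depth != image_extent.depth;
}
```

Only partial copies trigger the up-front `radv_expand_depth_stencil` call, while the HTILE re-initialization after the copy runs for every compute copy into an HTILE-compressed layout.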

@ -8,6 +8,7 @@
#include <stdbool.h>
#include "nir/radv_meta_nir.h"
#include "radv_cs.h"
#include "radv_meta.h"
enum radv_color_op {
@ -19,7 +20,7 @@ enum radv_color_op {
static VkResult
get_dcc_decompress_compute_pipeline(struct radv_device *device, VkPipeline *pipeline_out, VkPipelineLayout *layout_out)
{
enum radv_meta_object_key_type key = RADV_META_OBJECT_KEY_DCC_DECOMPRESS;
enum radv_meta_object_key_type key = RADV_META_OBJECT_KEY_DCC_DECOMPRESS_CS;
VkResult result;
const VkDescriptorSetLayoutBinding bindings[] = {
@ -241,6 +242,7 @@ radv_process_color_image_layer(struct radv_cmd_buffer *cmd_buffer, struct radv_i
const VkImageSubresourceRange *range, int level, int layer, enum radv_color_op op)
{
struct radv_device *device = radv_cmd_buffer_device(cmd_buffer);
const struct radv_physical_device *pdev = radv_device_physical(device);
struct radv_image_view iview;
uint32_t width, height;
@ -303,9 +305,23 @@ radv_process_color_image_layer(struct radv_cmd_buffer *cmd_buffer, struct radv_i
radv_CmdDraw(radv_cmd_buffer_to_handle(cmd_buffer), 3, 1, 0, 0);
if (op == FMASK_DECOMPRESS || op == DCC_DECOMPRESS)
if (op == FMASK_DECOMPRESS || op == DCC_DECOMPRESS) {
/* On GFX6-8, the CB FMASK cache writes corrupted data if cache lines are flushed after their
* context has been retired. To avoid this, we must flush the CB metadata caches immediately
* after every FMASK decompress.
*
* PAL only applies this workaround on GFX6 but GFX7-8 are also affected and that matches
* RadeonSI.
*/
if (pdev->info.gfx_level <= GFX8 && op == FMASK_DECOMPRESS) {
radeon_begin(cmd_buffer->cs);
radeon_event_write(V_028A90_FLUSH_AND_INV_CB_META);
radeon_end();
}
cmd_buffer->state.flush_bits |= radv_src_access_flush(cmd_buffer, VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT,
VK_ACCESS_2_COLOR_ATTACHMENT_WRITE_BIT, 0, image, range);
}
const VkRenderingEndInfoKHR end_info = {
.sType = VK_STRUCTURE_TYPE_RENDERING_END_INFO_KHR,

@ -467,7 +467,9 @@ radv_meta_resolve_depth_stencil_cs(struct radv_cmd_buffer *cmd_buffer, struct ra
radv_CmdBindPipeline(radv_cmd_buffer_to_handle(cmd_buffer), VK_PIPELINE_BIND_POINT_COMPUTE, pipeline);
const uint32_t push_constants[2] = {region->srcOffset.x, region->srcOffset.y};
const uint32_t push_constants[5] = {
region->srcOffset.x, region->srcOffset.y, region->dstOffset.x, region->dstOffset.y, region->dstOffset.z,
};
const VkPushConstantsInfoKHR pc_info = {
.sType = VK_STRUCTURE_TYPE_PUSH_CONSTANTS_INFO_KHR,

@ -669,8 +669,8 @@ radv_meta_resolve_depth_stencil_fs(struct radv_cmd_buffer *cmd_buffer, struct ra
radv_CmdSetViewport(radv_cmd_buffer_to_handle(cmd_buffer), 0, 1,
&(VkViewport){
.x = region->srcOffset.x,
.y = region->srcOffset.y,
.x = region->dstOffset.x,
.y = region->dstOffset.y,
.width = region->extent.width,
.height = region->extent.height,
.minDepth = 0.0f,
@ -679,6 +679,22 @@ radv_meta_resolve_depth_stencil_fs(struct radv_cmd_buffer *cmd_buffer, struct ra
radv_CmdSetScissor(radv_cmd_buffer_to_handle(cmd_buffer), 0, 1, &resolve_area);
const uint32_t push_constants[2] = {
region->srcOffset.x - region->dstOffset.x,
region->srcOffset.y - region->dstOffset.y,
};
const VkPushConstantsInfoKHR push_constants_info = {
.sType = VK_STRUCTURE_TYPE_PUSH_CONSTANTS_INFO,
.layout = layout,
.stageFlags = VK_SHADER_STAGE_FRAGMENT_BIT,
.offset = 0,
.size = sizeof(push_constants),
.pValues = push_constants,
};
radv_CmdPushConstants2(radv_cmd_buffer_to_handle(cmd_buffer), &push_constants_info);
radv_CmdDraw(radv_cmd_buffer_to_handle(cmd_buffer), 3, 1, 0, 0);
const VkRenderingEndInfoKHR end_info = {

@ -1395,19 +1395,21 @@ radv_meta_nir_build_depth_stencil_resolve_compute_shader(struct radv_device *dev
nir_def *global_id = radv_meta_nir_get_global_ids(&b, 3);
nir_def *offset = nir_load_push_constant(&b, 2, 32, nir_imm_int(&b, 0), .range = 8);
nir_def *src_offset = nir_load_push_constant(&b, 2, 32, nir_imm_int(&b, 0), .range = 8);
nir_def *dst_offset = nir_load_push_constant(&b, 3, 32, nir_imm_int(&b, 8), .range = 20);
nir_def *resolve_coord = nir_iadd(&b, nir_trim_vector(&b, global_id, 2), offset);
nir_def *src_coord = nir_iadd(&b, nir_trim_vector(&b, global_id, 2), src_offset);
nir_def *dst_coord = nir_iadd(&b, global_id, dst_offset);
nir_def *img_coord =
nir_vec3(&b, nir_channel(&b, resolve_coord, 0), nir_channel(&b, resolve_coord, 1), nir_channel(&b, global_id, 2));
nir_def *src_img_coord =
nir_vec3(&b, nir_channel(&b, src_coord, 0), nir_channel(&b, src_coord, 1), nir_channel(&b, global_id, 2));
nir_deref_instr *input_img_deref = nir_build_deref_var(&b, input_img);
nir_def *outval = nir_txf_ms(&b, img_coord, nir_imm_int(&b, 0), .texture_deref = input_img_deref);
nir_def *outval = nir_txf_ms(&b, src_img_coord, nir_imm_int(&b, 0), .texture_deref = input_img_deref);
if (resolve_mode != VK_RESOLVE_MODE_SAMPLE_ZERO_BIT) {
for (int i = 1; i < samples; i++) {
nir_def *si = nir_txf_ms(&b, img_coord, nir_imm_int(&b, i), .texture_deref = input_img_deref);
nir_def *si = nir_txf_ms(&b, src_img_coord, nir_imm_int(&b, i), .texture_deref = input_img_deref);
switch (resolve_mode) {
case VK_RESOLVE_MODE_AVERAGE_BIT:
@ -1435,8 +1437,8 @@ radv_meta_nir_build_depth_stencil_resolve_compute_shader(struct radv_device *dev
outval = nir_fdiv_imm(&b, outval, samples);
}
nir_def *coord = nir_vec4(&b, nir_channel(&b, img_coord, 0), nir_channel(&b, img_coord, 1),
nir_channel(&b, img_coord, 2), nir_undef(&b, 1, 32));
nir_def *coord = nir_vec4(&b, nir_channel(&b, dst_coord, 0), nir_channel(&b, dst_coord, 1),
nir_channel(&b, dst_coord, 2), nir_undef(&b, 1, 32));
nir_image_deref_store(&b, &nir_build_deref_var(&b, output_img)->def, coord, nir_undef(&b, 1, 32), outval,
nir_imm_int(&b, 0), .image_dim = GLSL_SAMPLER_DIM_2D, .image_array = true);
return b.shader;
@ -1495,10 +1497,11 @@ radv_meta_nir_build_depth_stencil_resolve_fragment_shader(struct radv_device *de
fs_out->data.location = index == RADV_META_DEPTH_RESOLVE ? FRAG_RESULT_DEPTH : FRAG_RESULT_STENCIL;
nir_def *pos_in = nir_trim_vector(&b, nir_load_frag_coord(&b), 2);
nir_def *src_offset = nir_load_push_constant(&b, 2, 32, nir_imm_int(&b, 0), .range = 8);
nir_def *pos_int = nir_f2i32(&b, pos_in);
nir_def *img_coord = nir_trim_vector(&b, pos_int, 2);
nir_def *img_coord = nir_trim_vector(&b, nir_iadd(&b, pos_int, src_offset), 2);
nir_deref_instr *input_img_deref = nir_build_deref_var(&b, input_img);
nir_def *outval = nir_txf_ms(&b, img_coord, nir_imm_int(&b, 0), .texture_deref = input_img_deref);
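Review note: the compute and fragment resolve paths now derive source and destination coordinates differently but should address the same texels. The sketch below checks that equivalence with plain integer vectors. It assumes the fragment position is already a destination coordinate because the viewport is placed at `dstOffset`, and that the fragment push constant holds `srcOffset - dstOffset`, as in the hunks above. All names are hypothetical.

```c
#include <stdint.h>

struct sketch_ivec2 {
   int32_t x, y;
};

static struct sketch_ivec2
sketch_add(struct sketch_ivec2 a, struct sketch_ivec2 b)
{
   struct sketch_ivec2 r = {a.x + b.x, a.y + b.y};
   return r;
}

/* Compute path: the invocation id is relative to the resolve region. */
static struct sketch_ivec2
sketch_cs_src_coord(struct sketch_ivec2 global_id, struct sketch_ivec2 src_offset)
{
   return sketch_add(global_id, src_offset);
}

static struct sketch_ivec2
sketch_cs_dst_coord(struct sketch_ivec2 global_id, struct sketch_ivec2 dst_offset)
{
   return sketch_add(global_id, dst_offset);
}

/* Fragment path: frag_pos is a destination coordinate, so the source texel
 * is found by adding (srcOffset - dstOffset). */
static struct sketch_ivec2
sketch_fs_src_coord(struct sketch_ivec2 frag_pos, struct sketch_ivec2 src_minus_dst)
{
   return sketch_add(frag_pos, src_minus_dst);
}
```

For an invocation at (2, 3) with `srcOffset` (10, 20) and `dstOffset` (100, 200), both paths read source texel (12, 23) and write destination texel (102, 203).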

@ -114,11 +114,32 @@ gather_tail_call_instrs_block(nir_function *caller, const struct nir_block *bloc
if (call->callee->num_params != caller->num_params)
return;
for (unsigned i = 0; i < call->num_params; ++i) {
for (unsigned i = 0; i < call->callee->num_params; ++i) {
if (call->callee->params[i].is_return != caller->params[i].is_return)
return;
if ((call->callee->params[i].driver_attributes & ACO_NIR_PARAM_ATTRIB_DISCARDABLE) &&
!(caller->params[i].driver_attributes & ACO_NIR_PARAM_ATTRIB_DISCARDABLE))
return;
bool has_preserved_regs =
(caller->driver_attributes & ACO_NIR_FUNCTION_ATTRIB_ABI_MASK) == ACO_NIR_CALL_ABI_AHIT_ISEC;
if (has_preserved_regs && ((call->callee->params[i].driver_attributes & ACO_NIR_PARAM_ATTRIB_DISCARDABLE) !=
(caller->params[i].driver_attributes & ACO_NIR_PARAM_ATTRIB_DISCARDABLE)))
return;
if (call->callee->params[i].is_uniform != caller->params[i].is_uniform)
return;
if (call->callee->params[i].bit_size != caller->params[i].bit_size)
return;
if (call->callee->params[i].num_components != caller->params[i].num_components)
return;
}
/* The call instruction itself has not been lowered to the new signature yet, so do this in a separate loop and
* adjust parameter indices for the caller.
*/
for (unsigned i = 0; i < call->num_params; ++i) {
unsigned caller_param_idx = i + ACO_NIR_CALL_SYSTEM_ARG_COUNT;
/* We can only do tail calls if the caller returns exactly the callee return values */
if (caller->params[i].is_return) {
if (caller->params[caller_param_idx].is_return) {
assert(nir_def_as_deref_or_null(call->params[i].ssa));
nir_deref_instr *deref_root = nir_def_as_deref(call->params[i].ssa);
while (nir_deref_instr_parent(deref_root))
@ -129,16 +150,18 @@ gather_tail_call_instrs_block(nir_function *caller, const struct nir_block *bloc
nir_intrinsic_instr *intrin = nir_def_as_intrinsic_or_null(deref_root->parent.ssa);
if (!intrin || intrin->intrinsic != nir_intrinsic_load_param)
return;
/* The call parameters aren't lowered at this point, we need to add the call arg count here */
if (nir_intrinsic_param_idx(intrin) != i + ACO_NIR_CALL_SYSTEM_ARG_COUNT)
if (nir_intrinsic_param_idx(intrin) != caller_param_idx)
return;
} else if (!(caller->params[caller_param_idx].driver_attributes & ACO_NIR_PARAM_ATTRIB_DISCARDABLE)) {
/* If the parameter is not marked as discardable, then we have to preserve the caller's value. Passing
* a modified value to a tail call leaves us unable to restore the original value, so bail out if we have
* modified parameters.
*/
nir_intrinsic_instr *intrin = nir_def_as_intrinsic_or_null(call->params[i].ssa);
if (!intrin || intrin->intrinsic != nir_intrinsic_load_param ||
nir_intrinsic_param_idx(intrin) != caller_param_idx)
return;
}
if (call->callee->params[i].is_uniform != caller->params[i].is_uniform)
return;
if (call->callee->params[i].bit_size != caller->params[i].bit_size)
return;
if (call->callee->params[i].num_components != caller->params[i].num_components)
return;
}
_mesa_set_add(tail_calls, instr);


@ -144,6 +144,7 @@ radv_get_ray_query_type()
struct ray_query_vars {
nir_variable *var;
bool use_bvh_stack_rtn;
bool shared_stack;
uint32_t shared_base;
uint32_t stack_entries;
@ -162,13 +163,21 @@ init_ray_query_vars(nir_shader *shader, const glsl_type *opaque_type, struct ray
uint32_t shared_stack_entries = shader->info.ray_queries == 1 ? 16 : 8;
/* ds_bvh_stack* instructions use a fixed stride of 32 dwords. */
if (radv_use_bvh_stack_rtn(pdev))
workgroup_size = MAX2(workgroup_size, 32);
workgroup_size = align(workgroup_size, 32);
uint32_t shared_stack_size = workgroup_size * shared_stack_entries * 4;
uint32_t shared_offset = align(shader->info.shared_size, 4);
if (shader->info.stage != MESA_SHADER_COMPUTE || glsl_type_is_array(opaque_type) ||
shared_offset + shared_stack_size > pdev->max_shared_size) {
dst->stack_entries = MAX_SCRATCH_STACK_ENTRY_COUNT;
} else {
if (radv_use_bvh_stack_rtn(pdev)) {
/* The hardware ds_bvh_stack_rtn address can only encode a stack base up to 8191 dwords. */
uint32_t num_wave32_groups = workgroup_size / 32;
uint32_t max_group_stack_base = (num_wave32_groups - 1) * 32 * shared_stack_entries;
uint32_t max_stack_base = (shared_offset / 4) + max_group_stack_base;
dst->use_bvh_stack_rtn = max_stack_base < 8192;
}
dst->shared_stack = true;
dst->shared_base = shared_offset;
dst->stack_entries = shared_stack_entries;
@ -303,7 +312,7 @@ lower_rq_initialize(nir_builder *b, nir_intrinsic_instr *instr, struct ray_query
if (vars->shared_stack) {
nir_def *stack_idx = nir_load_local_invocation_index(b);
if (radv_use_bvh_stack_rtn(pdev)) {
if (vars->use_bvh_stack_rtn) {
uint32_t workgroup_size =
b->shader->info.workgroup_size[0] * b->shader->info.workgroup_size[1] * b->shader->info.workgroup_size[2];
nir_def *addr =
@ -563,7 +572,7 @@ lower_rq_proceed(nir_builder *b, nir_intrinsic_instr *instr, struct ray_query_va
};
if (vars->shared_stack) {
args.use_bvh_stack_rtn = radv_use_bvh_stack_rtn(pdev);
args.use_bvh_stack_rtn = vars->use_bvh_stack_rtn;
if (args.use_bvh_stack_rtn) {
args.stack_stride = 1;
args.stack_base = 0;
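
Note on the `use_bvh_stack_rtn` hunks above: the new per-query flag is derived from the largest stack base any wave32 group in the workgroup could need. The following is a minimal sketch of that arithmetic, assuming the device supports the instruction at all. The helper names `align_u32` and `can_use_bvh_stack_rtn` are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Mesa's align(); rounds v up to a multiple of a. */
static uint32_t
align_u32(uint32_t v, uint32_t a)
{
   return (v + a - 1) / a * a;
}

/* Hypothetical helper mirroring the check added to init_ray_query_vars().
 * shared_offset_bytes is the byte offset of the stack in shared memory,
 * stack_entries is the per-invocation entry count (16 or 8 in the patch).
 */
static bool
can_use_bvh_stack_rtn(uint32_t workgroup_size, uint32_t shared_offset_bytes, uint32_t stack_entries)
{
   assert(workgroup_size > 0 && stack_entries > 0);

   /* ds_bvh_stack* instructions use a fixed stride of 32 dwords. */
   workgroup_size = align_u32(workgroup_size, 32);

   uint32_t num_wave32_groups = workgroup_size / 32;
   uint32_t max_group_stack_base = (num_wave32_groups - 1) * 32 * stack_entries;
   uint32_t max_stack_base = shared_offset_bytes / 4 + max_group_stack_base;

   /* The address can only encode a stack base up to 8191 dwords. */
   return max_stack_base < 8192;
}
```

Under these assumptions a 1024-invocation workgroup with 16 entries exceeds the limit (31 * 32 * 16 = 15872 dwords), while the same workgroup with 8 entries and no preceding shared memory stays just under it (7936 dwords).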


@ -39,7 +39,7 @@ radv_nir_init_traversal_params(nir_function *function, unsigned payload_size)
function->params = rzalloc_array_size(function->shader, sizeof(nir_parameter), function->num_params);
radv_nir_init_common_rt_params(function);
radv_nir_param_from_type(function->params + TRAVERSAL_ARG_TRAVERSAL_ADDR, glsl_uint64_t_type(), true, 0);
radv_nir_param_from_type(function->params + TRAVERSAL_ARG_SHADER_RECORD_PTR, glsl_uint64_t_type(), false, 0);
radv_nir_param_from_type(function->params + TRAVERSAL_ARG_SHADER_RECORD_PTR, glsl_uint64_t_type(), false, ACO_NIR_PARAM_ATTRIB_DISCARDABLE);
radv_nir_param_from_type(function->params + TRAVERSAL_ARG_ACCEL_STRUCT, glsl_uint64_t_type(), false, 0);
radv_nir_param_from_type(function->params + TRAVERSAL_ARG_CULL_MASK_AND_FLAGS, glsl_uint_type(), false, 0);
radv_nir_param_from_type(function->params + TRAVERSAL_ARG_SBT_OFFSET, glsl_uint_type(), false, 0);
@ -49,12 +49,13 @@ radv_nir_init_traversal_params(nir_function *function, unsigned payload_size)
radv_nir_param_from_type(function->params + TRAVERSAL_ARG_RAY_TMIN, glsl_float_type(), false, 0);
radv_nir_param_from_type(function->params + TRAVERSAL_ARG_RAY_DIRECTION, glsl_vector_type(GLSL_TYPE_UINT, 3), false,
0);
radv_nir_param_from_type(function->params + TRAVERSAL_ARG_RAY_TMAX, glsl_float_type(), false, 0);
radv_nir_param_from_type(function->params + TRAVERSAL_ARG_PRIMITIVE_ADDR, glsl_uint64_t_type(), false, 0);
radv_nir_param_from_type(function->params + TRAVERSAL_ARG_PRIMITIVE_ID, glsl_uint_type(), false, 0);
radv_nir_param_from_type(function->params + TRAVERSAL_ARG_INSTANCE_ADDR, glsl_uint64_t_type(), false, 0);
radv_nir_param_from_type(function->params + TRAVERSAL_ARG_GEOMETRY_ID_AND_FLAGS, glsl_uint_type(), false, 0);
radv_nir_param_from_type(function->params + TRAVERSAL_ARG_HIT_KIND, glsl_uint_type(), false, 0);
radv_nir_param_from_type(function->params + TRAVERSAL_ARG_RAY_TMAX, glsl_float_type(), false,
ACO_NIR_PARAM_ATTRIB_DISCARDABLE);
radv_nir_param_from_type(function->params + TRAVERSAL_ARG_PRIMITIVE_ADDR, glsl_uint64_t_type(), false, ACO_NIR_PARAM_ATTRIB_DISCARDABLE);
radv_nir_param_from_type(function->params + TRAVERSAL_ARG_PRIMITIVE_ID, glsl_uint_type(), false, ACO_NIR_PARAM_ATTRIB_DISCARDABLE);
radv_nir_param_from_type(function->params + TRAVERSAL_ARG_INSTANCE_ADDR, glsl_uint64_t_type(), false, ACO_NIR_PARAM_ATTRIB_DISCARDABLE);
radv_nir_param_from_type(function->params + TRAVERSAL_ARG_GEOMETRY_ID_AND_FLAGS, glsl_uint_type(), false, ACO_NIR_PARAM_ATTRIB_DISCARDABLE);
radv_nir_param_from_type(function->params + TRAVERSAL_ARG_HIT_KIND, glsl_uint_type(), false, ACO_NIR_PARAM_ATTRIB_DISCARDABLE);
for (unsigned i = 0; i < DIV_ROUND_UP(payload_size, 4); ++i) {
radv_nir_return_param_from_type(function->params + TRAVERSAL_ARG_PAYLOAD_BASE + i, glsl_uint_type(), false, 0);
}
@ -128,15 +129,11 @@ radv_nir_init_rt_function_params(nir_function *function, mesa_shader_stage stage
radv_nir_init_common_rt_params(function);
radv_nir_param_from_type(function->params + CHIT_MISS_ARG_TRAVERSAL_ADDR, glsl_uint64_t_type(), true, 0);
radv_nir_param_from_type(function->params + CHIT_MISS_ARG_SHADER_RECORD_PTR, glsl_uint64_t_type(), false, 0);
radv_nir_param_from_type(function->params + CHIT_MISS_ARG_ACCEL_STRUCT, glsl_uint64_t_type(), false,
ACO_NIR_PARAM_ATTRIB_DISCARDABLE);
radv_nir_param_from_type(function->params + CHIT_MISS_ARG_ACCEL_STRUCT, glsl_uint64_t_type(), false, 0);
radv_nir_param_from_type(function->params + CHIT_MISS_ARG_CULL_MASK_AND_FLAGS, glsl_uint_type(), false, 0);
radv_nir_param_from_type(function->params + CHIT_MISS_ARG_SBT_OFFSET, glsl_uint_type(), false,
ACO_NIR_PARAM_ATTRIB_DISCARDABLE);
radv_nir_param_from_type(function->params + CHIT_MISS_ARG_SBT_STRIDE, glsl_uint_type(), false,
ACO_NIR_PARAM_ATTRIB_DISCARDABLE);
radv_nir_param_from_type(function->params + CHIT_MISS_ARG_MISS_INDEX, glsl_uint_type(), false,
ACO_NIR_PARAM_ATTRIB_DISCARDABLE);
radv_nir_param_from_type(function->params + CHIT_MISS_ARG_SBT_OFFSET, glsl_uint_type(), false, 0);
radv_nir_param_from_type(function->params + CHIT_MISS_ARG_SBT_STRIDE, glsl_uint_type(), false, 0);
radv_nir_param_from_type(function->params + CHIT_MISS_ARG_MISS_INDEX, glsl_uint_type(), false, 0);
radv_nir_param_from_type(function->params + CHIT_MISS_ARG_RAY_ORIGIN, glsl_vector_type(GLSL_TYPE_UINT, 3), false,
0);
radv_nir_param_from_type(function->params + CHIT_MISS_ARG_RAY_TMIN, glsl_float_type(), false, 0);


@ -9550,9 +9550,9 @@ radv_handle_color_fbfetch_output(struct radv_cmd_buffer *cmd_buffer, uint32_t in
radv_describe_barrier_start(cmd_buffer, RGP_BARRIER_UNKNOWN_REASON);
/* Force a transition to FEEDBACK_LOOP_OPTIMAL to decompress DCC. */
radv_handle_image_transition(cmd_buffer, att->iview->image, att->layout,
VK_IMAGE_LAYOUT_ATTACHMENT_FEEDBACK_LOOP_OPTIMAL_EXT, RADV_QUEUE_GENERAL,
RADV_QUEUE_GENERAL, &range, NULL);
radv_handle_rendering_image_transition(
cmd_buffer, att->iview, render->layer_count, render->view_mask, att->layout, VK_IMAGE_LAYOUT_UNDEFINED,
VK_IMAGE_LAYOUT_ATTACHMENT_FEEDBACK_LOOP_OPTIMAL_EXT, VK_IMAGE_LAYOUT_UNDEFINED, NULL);
radv_describe_barrier_end(cmd_buffer);
@ -9597,9 +9597,10 @@ radv_handle_depth_fbfetch_output(struct radv_cmd_buffer *cmd_buffer)
radv_describe_barrier_start(cmd_buffer, RGP_BARRIER_UNKNOWN_REASON);
/* Force a transition to FEEDBACK_LOOP_OPTIMAL to decompress HTILE. */
radv_handle_image_transition(cmd_buffer, att->iview->image, att->layout,
VK_IMAGE_LAYOUT_ATTACHMENT_FEEDBACK_LOOP_OPTIMAL_EXT, RADV_QUEUE_GENERAL,
RADV_QUEUE_GENERAL, &range, NULL);
radv_handle_rendering_image_transition(cmd_buffer, att->iview, render->layer_count, render->view_mask, att->layout,
att->stencil_layout, VK_IMAGE_LAYOUT_ATTACHMENT_FEEDBACK_LOOP_OPTIMAL_EXT,
VK_IMAGE_LAYOUT_ATTACHMENT_FEEDBACK_LOOP_OPTIMAL_EXT,
render->sample_locations.count > 0 ? &render->sample_locations : NULL);
radv_describe_barrier_end(cmd_buffer);
@ -9642,16 +9643,19 @@ radv_CmdExecuteCommands(VkCommandBuffer commandBuffer, uint32_t commandBufferCou
VK_FROM_HANDLE(radv_cmd_buffer, primary, commandBuffer);
struct radv_device *device = radv_cmd_buffer_device(primary);
const struct radv_physical_device *pdev = radv_device_physical(device);
const bool is_gfx_or_ace = primary->qf == RADV_QUEUE_GENERAL || primary->qf == RADV_QUEUE_COMPUTE;
assert(commandBufferCount > 0);
radv_emit_mip_change_flush_default(primary);
if (is_gfx_or_ace) {
radv_emit_mip_change_flush_default(primary);
/* Emit pending flushes on primary prior to executing secondary */
radv_emit_cache_flush(primary);
/* Emit pending flushes on primary prior to executing secondary */
radv_emit_cache_flush(primary);
/* Make sure CP DMA is idle on primary prior to executing secondary. */
radv_cp_dma_wait_for_idle(primary);
/* Make sure CP DMA is idle on primary prior to executing secondary. */
radv_cp_dma_wait_for_idle(primary);
}
for (uint32_t i = 0; i < commandBufferCount; i++) {
VK_FROM_HANDLE(radv_cmd_buffer, secondary, pCmdBuffers[i]);
@ -9694,6 +9698,9 @@ radv_CmdExecuteCommands(VkCommandBuffer commandBuffer, uint32_t commandBufferCou
if (primary->state.dirty & RADV_CMD_DIRTY_FBFETCH_OUTPUT) {
radv_handle_fbfetch_output(primary);
primary->state.dirty &= ~RADV_CMD_DIRTY_FBFETCH_OUTPUT;
/* Emit pending flushes if a late decompression was performed. */
radv_emit_cache_flush(primary);
}
if (primary->state.render.active && (primary->state.dirty & RADV_CMD_DIRTY_FRAMEBUFFER)) {
@ -9769,23 +9776,12 @@ radv_CmdExecuteCommands(VkCommandBuffer commandBuffer, uint32_t commandBufferCou
device->ws->cs_execute_secondary(primary_cs->b, secondary_cs->b, allow_ib2);
/* When the secondary command buffer is compute only we don't
* need to re-emit the current graphics pipeline.
*/
if (secondary->state.emitted_graphics_pipeline) {
primary->state.emitted_graphics_pipeline = secondary->state.emitted_graphics_pipeline;
}
primary->state.emitted_graphics_pipeline = secondary->state.emitted_graphics_pipeline;
primary->state.emitted_compute_pipeline = secondary->state.emitted_compute_pipeline;
primary->state.emitted_rt_pipeline = secondary->state.emitted_rt_pipeline;
/* When the secondary command buffer is graphics only we don't
* need to re-emit the current compute pipeline.
*/
if (secondary->state.emitted_compute_pipeline) {
primary->state.emitted_compute_pipeline = secondary->state.emitted_compute_pipeline;
}
if (secondary->state.emitted_rt_pipeline) {
primary->state.emitted_rt_pipeline = secondary->state.emitted_rt_pipeline;
}
primary->state.ps_epilog = secondary->state.ps_epilog;
primary->state.emitted_vs_prolog = secondary->state.emitted_vs_prolog;
if (secondary->state.last_ia_multi_vgt_param) {
primary->state.last_ia_multi_vgt_param = secondary->state.last_ia_multi_vgt_param;
@ -15174,10 +15170,19 @@ radv_CmdBeginTransformFeedbackEXT(VkCommandBuffer commandBuffer, uint32_t firstC
assert(firstCounterBuffer + counterBufferCount <= MAX_SO_BUFFERS);
if (pdev->info.gfx_level >= GFX12)
if (pdev->info.gfx_level >= GFX12) {
radv_init_streamout_state(cmd_buffer);
else if (!pdev->use_ngg_streamout)
/* Invalidate L2 in case the buffer filled size needs to be saved because COPY_DATA isn't
* coherent with L2.
*/
if (pdev->info.cp_sdma_ge_use_system_memory_scope) {
cmd_buffer->state.flush_bits |= RADV_CMD_FLAG_INV_L2;
radv_emit_cache_flush(cmd_buffer);
}
} else if (!pdev->use_ngg_streamout) {
radv_flush_vgt_streamout(cmd_buffer);
}
ASSERTED unsigned cdw_max = radeon_check_space(device->ws, cs->b, MAX_SO_BUFFERS * 10);


@ -390,8 +390,8 @@ static void
radv_add_split_disasm(const char *disasm, uint64_t start_addr, unsigned *num, struct radv_shader_inst *instructions)
{
struct radv_shader_inst *last_inst = *num ? &instructions[*num - 1] : NULL;
char *next;
char *repeat = strstr(disasm, "then repeated");
const char *next;
const char *repeat = strstr(disasm, "then repeated");
while ((next = strchr(disasm, '\n'))) {
struct radv_shader_inst *inst = &instructions[*num];


@ -786,6 +786,8 @@ init_dispatch_tables(struct radv_device *device, struct radv_physical_device *pd
add_entrypoints(&b, &quantic_dream_device_entrypoints, RADV_APP_DISPATCH_TABLE);
} else if (!strcmp(instance->drirc.debug.app_layer, "no_mans_sky")) {
add_entrypoints(&b, &no_mans_sky_device_entrypoints, RADV_APP_DISPATCH_TABLE);
} else if (!strcmp(instance->drirc.debug.app_layer, "strange_brigade")) {
add_entrypoints(&b, &strange_brigade_device_entrypoints, RADV_APP_DISPATCH_TABLE);
}
if (instance->vk.trace_mode & RADV_TRACE_MODE_RGP)
@ -1239,7 +1241,13 @@ radv_CreateDevice(VkPhysicalDevice physicalDevice, const VkDeviceCreateInfo *pCr
device->ws = pdev->ws;
device->vk.sync = device->ws->get_sync_provider(device->ws);
device->vk.copy_sync_payloads = pdev->ws->copy_sync_payloads;
/* Disable unordered submits when SQTT queue events are enabled because queue present events
* might be missing otherwise.
*/
device->vk.copy_sync_payloads = ((instance->vk.trace_mode & RADV_TRACE_MODE_RGP) && radv_sqtt_queue_events_enabled())
? NULL
: pdev->ws->copy_sync_payloads;
/* Enable the global BO list by default. */
/* TODO: Remove the per cmdbuf BO list tracking after few Mesa releases if no blockers. */


@ -500,9 +500,9 @@ radv_image_view_init(struct radv_image_view *iview, struct radv_device *device,
if (!extra_create_info || !extra_create_info->from_client)
assert(pCreateInfo->flags & VK_IMAGE_VIEW_CREATE_DRIVER_INTERNAL_BIT_MESA);
vk_image_view_init(&device->vk, &iview->vk, pCreateInfo);
memset(&iview->descriptor, 0, sizeof(iview->descriptor));
memset(iview, 0, sizeof(*iview));
vk_image_view_init(&device->vk, &iview->vk, pCreateInfo);
iview->image = image;
iview->plane_id = radv_plane_from_aspect(pCreateInfo->subresourceRange.aspectMask);
@ -664,13 +664,13 @@ radv_hiz_image_view_init(struct radv_image_view *iview, struct radv_device *devi
VK_FROM_HANDLE(radv_image, image, pCreateInfo->image);
assert(pCreateInfo->flags & VK_IMAGE_VIEW_CREATE_DRIVER_INTERNAL_BIT_MESA);
memset(iview, 0, sizeof(*iview));
vk_image_view_init(&device->vk, &iview->vk, pCreateInfo);
assert(vk_format_has_depth(image->vk.format) && vk_format_has_stencil(image->vk.format));
assert(iview->vk.aspects == VK_IMAGE_ASPECT_DEPTH_BIT);
memset(&iview->descriptor, 0, sizeof(iview->descriptor));
iview->image = image;
const uint32_t type =


@ -1662,7 +1662,7 @@ radv_graphics_shaders_link_varyings(struct radv_shader_stage *stages, enum amd_g
/* Scalarize all I/O, because nir_opt_varyings and nir_opt_vectorize_io expect all I/O to be scalarized. */
nir_variable_mode sca_mode = nir_var_shader_in;
bool sca_progress;
bool sca_progress = false;
if (s != MESA_SHADER_FRAGMENT)
sca_mode |= nir_var_shader_out;


@ -674,7 +674,7 @@ radv_rt_compile_shaders(struct radv_device *device, struct vk_pipeline_cache *ca
bool can_use_monolithic = !library && pipeline->stage_count < 50;
for (uint32_t i = 0; i < pCreateInfo->stageCount; i++) {
if (rt_stages[i].shader || rt_stages[i].nir)
if (rt_stages[i].nir)
continue;
int64_t stage_start = os_time_get_nano();
@ -749,7 +749,7 @@ radv_rt_compile_shaders(struct radv_device *device, struct vk_pipeline_cache *ca
inline_any_hit_shaders |= raygen_lowering_mode == RADV_RT_LOWERING_MODE_MONOLITHIC && !raygen_imported;
for (uint32_t idx = 0; idx < pCreateInfo->stageCount; idx++) {
if (rt_stages[idx].shader || rt_stages[idx].nir)
if (rt_stages[idx].nir)
continue;
int64_t stage_start = os_time_get_nano();
@ -1462,17 +1462,39 @@ radv_GetRayTracingShaderGroupStackSizeKHR(VkDevice device, VkPipeline _pipeline,
VK_FROM_HANDLE(radv_pipeline, pipeline, _pipeline);
struct radv_ray_tracing_pipeline *rt_pipeline = radv_pipeline_to_ray_tracing(pipeline);
struct radv_ray_tracing_group *rt_group = &rt_pipeline->groups[group];
struct radv_ray_tracing_stage *shader_stage;
switch (groupShader) {
case VK_SHADER_GROUP_SHADER_GENERAL_KHR:
case VK_SHADER_GROUP_SHADER_CLOSEST_HIT_KHR:
return rt_pipeline->stages[rt_group->recursive_shader].stack_size;
shader_stage = &rt_pipeline->stages[rt_group->recursive_shader];
break;
case VK_SHADER_GROUP_SHADER_ANY_HIT_KHR:
return rt_pipeline->stages[rt_group->any_hit_shader].stack_size;
/* If the any-hit shader is inlined into an intersection shader, there is no stack specific to the any-hit shader
* and all stack will be allocated for the intersection shader instead.
*/
if (rt_group->intersection_shader != VK_SHADER_UNUSED_KHR)
return 0;
shader_stage = &rt_pipeline->stages[rt_group->any_hit_shader];
break;
case VK_SHADER_GROUP_SHADER_INTERSECTION_KHR:
return rt_pipeline->stages[rt_group->intersection_shader].stack_size;
shader_stage = &rt_pipeline->stages[rt_group->intersection_shader];
break;
default:
return 0;
}
uint32_t stack_size = shader_stage->stack_size;
/* Applications need to allocate stack for the traversal shader, too. The API doesn't intend for a constant
* traversal stack size, so add the stack size to every shader potentially called by the traversal shader.
* Applications are expected to max() shader stages together, so this shouldn't result in any unnecessary stack
* usage.
*/
if (shader_stage->stage == MESA_SHADER_CLOSEST_HIT || shader_stage->stage == MESA_SHADER_ANY_HIT ||
shader_stage->stage == MESA_SHADER_INTERSECTION || shader_stage->stage == MESA_SHADER_MISS)
stack_size += rt_pipeline->traversal_stack_size;
return stack_size;
}
VKAPI_ATTR VkResult VKAPI_CALL


@ -790,7 +790,7 @@ rra_map_accel_struct_data(struct rra_copy_context *ctx, uint32_t i)
if (radv_GetEventStatus(ctx->device, data->build_event) != VK_EVENT_SET)
return NULL;
if (data->buffer->memory) {
if (data->buffer && data->buffer->memory) {
VkMemoryMapInfo memory_map_info = {
.sType = VK_STRUCTURE_TYPE_MEMORY_MAP_INFO,
.memory = data->buffer->memory,


@ -216,6 +216,7 @@ radv_sdma_get_surf(const struct radv_device *const device, const struct radv_ima
.texel_scale = radv_sdma_get_texel_scale(image),
.is_linear = surf->is_linear,
.is_3d = surf->u.gfx9.resource_type == RADEON_RESOURCE_3D,
.is_stencil = subresource.aspectMask == VK_IMAGE_ASPECT_STENCIL_BIT,
};
const uint64_t surf_offset = (subresource.aspectMask == VK_IMAGE_ASPECT_STENCIL_BIT) ? surf->u.gfx9.zs.stencil_offset
@ -371,6 +372,7 @@ radv_sdma_emit_copy_tiled_sub_window(const struct radv_device *device, struct ra
.va = tiled->va,
.format = radv_format_to_pipe_format(tiled->aspect_format),
.bpp = tiled->bpp,
.is_stencil = tiled->is_stencil,
.offset =
{
.x = tiled_off.x,
@ -414,6 +416,7 @@ radv_sdma_emit_copy_t2t_sub_window(const struct radv_device *device, struct radv
.va = src->va,
.format = radv_format_to_pipe_format(src->aspect_format),
.bpp = src->bpp,
.is_stencil = src->is_stencil,
.offset =
{
.x = src_off.x,
@ -439,6 +442,7 @@ radv_sdma_emit_copy_t2t_sub_window(const struct radv_device *device, struct radv
.va = dst->va,
.format = radv_format_to_pipe_format(dst->aspect_format),
.bpp = dst->bpp,
.is_stencil = dst->is_stencil,
.offset =
{
.x = dst_off.x,
@ -606,12 +610,6 @@ radv_sdma_use_t2t_scanline_copy(const struct radv_device *device, const struct r
return true;
}
/* The two images can have a different block size,
* but must have the same swizzle mode.
*/
if (src->micro_tile_mode != dst->micro_tile_mode)
return true;
/* The T2T subwindow copy packet only has fields for one metadata configuration.
* It can either compress or decompress, or copy uncompressed images, but it
* can't copy from a compressed image to another.
@ -619,6 +617,16 @@ radv_sdma_use_t2t_scanline_copy(const struct radv_device *device, const struct r
if (src->is_compressed && dst->is_compressed)
return true;
if (ver >= SDMA_7_0) {
/* No support for tiling format transformation at all. */
if (src->surf->u.gfx9.swizzle_mode != dst->surf->u.gfx9.swizzle_mode)
return true;
} else {
/* The two images can have a different block size, but must have the same swizzle mode. */
if (src->micro_tile_mode != dst->micro_tile_mode)
return true;
}
const bool needs_3d_alignment = src->is_3d && (src->micro_tile_mode == RADEON_MICRO_MODE_DISPLAY ||
src->micro_tile_mode == RADEON_MICRO_MODE_STANDARD);
const unsigned log2bpp = util_logbase2(src->bpp);


@ -31,6 +31,7 @@ struct radv_sdma_surf {
uint8_t texel_scale; /* Texel scale for 96-bit formats */
bool is_linear; /* Whether the image is linear. */
bool is_3d; /* Whether the image is 3-dimensional. */
bool is_stencil; /* Whether the image is stencil only. */
union {
/* linear images only */


@ -655,15 +655,24 @@ radv_shader_spirv_to_nir(struct radv_device *device, const struct radv_shader_st
NIR_PASS(_, nir, nir_lower_compute_system_values, &csv_options);
}
bool lower_local_invocation_index = false;
if (nir->info.derivative_group == DERIVATIVE_GROUP_QUADS &&
((nir->info.stage == MESA_SHADER_COMPUTE || nir->info.stage == MESA_SHADER_TASK ||
(nir->info.stage == MESA_SHADER_MESH && pdev->info.mesh_fast_launch_2)))) {
lower_local_invocation_index = true;
} else if (nir->info.stage == MESA_SHADER_COMPUTE &&
(((nir->info.workgroup_size[0] == 1) + (nir->info.workgroup_size[1] == 1) +
(nir->info.workgroup_size[2] == 1)) == 2)) {
lower_local_invocation_index = true;
}
nir_lower_compute_system_values_options csv_options = {
/* Mesh shaders run as NGG which can implement local_invocation_index from
* the wave ID in merged_wave_info, but they don't have local_invocation_ids on GFX10.3.
*/
.lower_cs_local_id_to_index = nir->info.stage == MESA_SHADER_MESH && !pdev->info.mesh_fast_launch_2,
.lower_local_invocation_index = nir->info.stage == MESA_SHADER_COMPUTE &&
((((nir->info.workgroup_size[0] == 1) + (nir->info.workgroup_size[1] == 1) +
(nir->info.workgroup_size[2] == 1)) == 2) ||
nir->info.derivative_group == DERIVATIVE_GROUP_QUADS),
.lower_local_invocation_index = lower_local_invocation_index,
};
NIR_PASS(_, nir, nir_lower_compute_system_values, &csv_options);


@ -950,8 +950,8 @@ radv_GetPhysicalDeviceVideoCapabilitiesKHR(VkPhysicalDevice physicalDevice, cons
struct VkVideoDecodeH265CapabilitiesKHR *ext =
vk_find_struct(pCapabilities->pNext, VIDEO_DECODE_H265_CAPABILITIES_KHR);
pCapabilities->maxDpbSlots = RADV_VIDEO_H264_MAX_DPB_SLOTS;
pCapabilities->maxActiveReferencePictures = RADV_VIDEO_H264_MAX_NUM_REF_FRAME;
pCapabilities->maxDpbSlots = RADV_VIDEO_H265_MAX_DPB_SLOTS;
pCapabilities->maxActiveReferencePictures = RADV_VIDEO_H265_MAX_NUM_REF_FRAME;
/* for h265 on navi21+ separate dpb images should work */
if (radv_enable_tier2(pdev))
pCapabilities->flags |= VK_VIDEO_CAPABILITY_SEPARATE_REFERENCE_IMAGES_BIT_KHR;
@ -2320,22 +2320,6 @@ get_av1_msg(struct radv_device *device, struct radv_video_session *vid, struct v
result.tx_mode = pi->TxMode;
result.reference_mode = (pi->flags.reference_select == 1) ? 2 : 0;
if (pi->pTileInfo) {
result.tile_cols = pi->pTileInfo->TileCols;
result.tile_rows = pi->pTileInfo->TileRows;
result.tile_size_bytes = pi->pTileInfo->tile_size_bytes_minus_1;
result.context_update_tile_id = pi->pTileInfo->context_update_tile_id;
for (i = 0; i < result.tile_cols; i++)
result.tile_col_start_sb[i] = pi->pTileInfo->pMiColStarts[i];
result.tile_col_start_sb[result.tile_cols] =
result.tile_col_start_sb[result.tile_cols - 1] + pi->pTileInfo->pWidthInSbsMinus1[result.tile_cols - 1] + 1;
for (i = 0; i < pi->pTileInfo->TileRows; i++)
result.tile_row_start_sb[i] = pi->pTileInfo->pMiRowStarts[i];
result.tile_row_start_sb[result.tile_rows] =
result.tile_row_start_sb[result.tile_rows - 1] + pi->pTileInfo->pHeightInSbsMinus1[result.tile_rows - 1] + 1;
}
result.max_width = seq_hdr->max_frame_width_minus_1 + 1;
result.max_height = seq_hdr->max_frame_height_minus_1 + 1;
VkExtent2D frameExtent = frame_info->dstPictureResource.codedExtent;
@ -2351,6 +2335,44 @@ get_av1_msg(struct radv_device *device, struct radv_video_session *vid, struct v
result.superres_upscaled_width = frameExtent.width;
if (pi->pTileInfo) {
result.tile_cols = pi->pTileInfo->TileCols;
result.tile_rows = pi->pTileInfo->TileRows;
result.tile_size_bytes = pi->pTileInfo->tile_size_bytes_minus_1;
result.context_update_tile_id = pi->pTileInfo->context_update_tile_id;
/* pMi{Row,Col}Starts is unreliable, some apps send SB, some send MI, so use
* p{Width,Height}InSbsMinus1 instead. But for uniform_tile_spacing_flag,
* those are not defined by spec. */
if (pi->pTileInfo->flags.uniform_tile_spacing_flag) {
const unsigned sb_size = seq_hdr->flags.use_128x128_superblock ? 128 : 64;
const unsigned sb_width = DIV_ROUND_UP(result.width, sb_size);
const unsigned sb_height = DIV_ROUND_UP(result.height, sb_size);
const unsigned tile_width_sb = DIV_ROUND_UP(sb_width, result.tile_cols);
const unsigned tile_height_sb = DIV_ROUND_UP(sb_height, result.tile_rows);
result.tile_col_start_sb[0] = 0;
for (i = 1; i < result.tile_cols; ++i)
result.tile_col_start_sb[i] = result.tile_col_start_sb[i - 1] + tile_width_sb;
result.tile_col_start_sb[i] = sb_width;
result.tile_row_start_sb[0] = 0;
for (i = 1; i < result.tile_rows; ++i)
result.tile_row_start_sb[i] = result.tile_row_start_sb[i - 1] + tile_height_sb;
result.tile_row_start_sb[i] = sb_height;
} else {
result.tile_col_start_sb[0] = 0;
assert(pi->pTileInfo->pMiColStarts[0] == 0);
for (i = 0; i < result.tile_cols; ++i)
result.tile_col_start_sb[i + 1] = result.tile_col_start_sb[i] + pi->pTileInfo->pWidthInSbsMinus1[i] + 1;
result.tile_row_start_sb[0] = 0;
assert(pi->pTileInfo->pMiRowStarts[0] == 0);
for (i = 0; i < result.tile_rows; ++i)
result.tile_row_start_sb[i + 1] = result.tile_row_start_sb[i] + pi->pTileInfo->pHeightInSbsMinus1[i] + 1;
}
}
result.order_hint_bits = seq_hdr->order_hint_bits_minus_1 + 1;
/* The VCN FW will evict references that aren't specified in
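
Note on the `uniform_tile_spacing_flag` branch above: tile start positions are rebuilt from the frame size because the per-tile size arrays are not defined in that case. The following is a minimal sketch of the same arithmetic for one dimension. The function name `uniform_tile_starts` and its standalone signature are hypothetical, and it mirrors the hunk rather than any particular wording of the AV1 specification.

```c
#include <assert.h>
#include <stdint.h>

#define DIV_ROUND_UP(a, b) (((a) + (b) - 1) / (b))

/* Hypothetical helper: fills starts[0..tiles] with superblock offsets.
 * frame_px is the frame width or height in pixels, sb_size is 64 or 128,
 * and starts must have room for tiles + 1 entries.
 */
static void
uniform_tile_starts(unsigned frame_px, unsigned sb_size, unsigned tiles, uint32_t *starts)
{
   assert(tiles > 0 && sb_size > 0);

   const unsigned sb_count = DIV_ROUND_UP(frame_px, sb_size);
   const unsigned tile_sb = DIV_ROUND_UP(sb_count, tiles);
   unsigned i;

   starts[0] = 0;
   for (i = 1; i < tiles; ++i)
      starts[i] = starts[i - 1] + tile_sb;

   /* The loop exits with i == tiles; the final entry is the frame end. */
   starts[i] = sb_count;
}
```

For a 1920-pixel dimension with 64-pixel superblocks and 4 tiles, this gives 30 superblocks, 8 per tile, and starts of 0, 8, 16, 24, 30.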


@ -376,13 +376,15 @@ hk_bind_descriptor_sets(UNUSED struct hk_cmd_buffer *cmd,
*
* This means that, if some earlier set gets bound in such a way that
* it changes set_dynamic_buffer_start[s], this binding is implicitly
* invalidated. Therefore, we can always look at the current value
* of set_dynamic_buffer_start[s] as the base of our dynamic buffer
* range and it's only our responsibility to adjust all
* set_dynamic_buffer_start[p] for p > s as needed.
* invalidated.
*/
uint8_t dyn_buffer_start =
desc->root.set_dynamic_buffer_start[info->firstSet];
uint8_t dyn_buffer_start = 0u;
for (uint32_t i = 0u; i < info->firstSet; ++i) {
const struct hk_descriptor_set_layout *set_layout =
vk_to_hk_descriptor_set_layout(pipeline_layout->set_layouts[i]);
if (set_layout)
dyn_buffer_start += set_layout->dynamic_buffer_count;
}
uint32_t next_dyn_offset = 0;
for (uint32_t i = 0; i < info->descriptorSetCount; ++i) {
@ -427,10 +429,6 @@ hk_bind_descriptor_sets(UNUSED struct hk_cmd_buffer *cmd,
assert(dyn_buffer_start <= HK_MAX_DYNAMIC_BUFFERS);
assert(next_dyn_offset <= info->dynamicOffsetCount);
for (uint32_t s = info->firstSet + info->descriptorSetCount; s < HK_MAX_SETS;
s++)
desc->root.set_dynamic_buffer_start[s] = dyn_buffer_start;
desc->root_dirty = true;
}
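
Note on the `dyn_buffer_start` change above: the base index is now recomputed from the pipeline layout instead of being read back from cached per-set state. The following is a minimal sketch of that prefix sum, assuming the per-set dynamic buffer counts are available as a plain array. The name `dyn_buffer_start_for_set` is hypothetical, and a missing set layout is modelled as a count of 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sums the dynamic buffer counts of all sets that
 * precede first_set in the pipeline layout.
 */
static uint8_t
dyn_buffer_start_for_set(const uint8_t *dynamic_buffer_counts, uint32_t first_set)
{
   assert(dynamic_buffer_counts != NULL || first_set == 0);

   uint8_t start = 0;
   for (uint32_t i = 0; i < first_set; ++i)
      start += dynamic_buffer_counts[i];

   return start;
}
```

Because the result depends only on the layout, binding sets in any order yields the same base for a given `firstSet`.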


@ -3212,6 +3212,9 @@ hk_handle_passthrough_gs(struct hk_cmd_buffer *cmd, struct agx_draw draw)
struct hk_graphics_state *gfx = &cmd->state.gfx;
struct hk_api_shader *gs = gfx->shaders[MESA_SHADER_GEOMETRY];
if (!IS_SHADER_DIRTY(VERTEX) && !IS_SHADER_DIRTY(GEOMETRY))
return;
/* If there's an application geometry shader, there's nothing to un/bind */
if (gs && !gs->is_passthrough)
return;
@ -3221,20 +3224,17 @@ hk_handle_passthrough_gs(struct hk_cmd_buffer *cmd, struct agx_draw draw)
uint32_t xfb_outputs = last_sw->info.xfb_info.output_count;
bool needs_gs = xfb_outputs;
/* If we already have a matching GS configuration, we're done */
if ((gs != NULL) == needs_gs)
return;
/* If we don't need a GS but we do have a passthrough, unbind it */
if (gs) {
assert(!needs_gs && gs->is_passthrough);
hk_cmd_bind_graphics_shader(cmd, MESA_SHADER_GEOMETRY, NULL);
if (!needs_gs) {
if (gs != NULL) {
assert(gs->is_passthrough);
hk_cmd_bind_graphics_shader(cmd, MESA_SHADER_GEOMETRY, NULL);
}
return;
}
/* Else, we need to bind a passthrough GS */
size_t key_size =
sizeof(struct hk_passthrough_gs_key) + nir_xfb_info_size(xfb_outputs);
size_t key_size = hk_passthrough_gs_key_size(xfb_outputs);
struct hk_passthrough_gs_key *key = alloca(key_size);
*key = (struct hk_passthrough_gs_key){


@ -1493,7 +1493,12 @@ hk_CmdFillBuffer(VkCommandBuffer commandBuffer, VkBuffer dstBuffer,
uint64_t addr =
vk_meta_buffer_address(&dev->vk, dstBuffer, dstOffset, dstRange);
libagx_fill(cmd, agx_1d(range / 4), AGX_BARRIER_ALL, addr, data);
if (util_is_aligned(addr, 16) && util_is_aligned(range, 16)) {
libagx_fill_uint4(cmd, agx_2d(range / 16, 1), AGX_BARRIER_ALL,
addr, 0, data, data, data, data);
} else {
libagx_fill(cmd, agx_1d(range / 4), AGX_BARRIER_ALL, addr, data);
}
}
VKAPI_ATTR void VKAPI_CALL


@ -387,8 +387,16 @@ struct hk_passthrough_gs_key {
/* Decomposed primitive */
enum mesa_prim prim;
/* Transform feedback info. Must add nir_xfb_info_size to get the key size */
/* Transform feedback info. Must use hk_passthrough_gs_key_size to get the
* key size */
nir_xfb_info xfb_info;
};
static inline size_t
hk_passthrough_gs_key_size(uint16_t output_count)
{
return (sizeof(struct hk_passthrough_gs_key) - sizeof(nir_xfb_info)) +
nir_xfb_info_size(output_count);
}
void hk_nir_passthrough_gs(struct nir_builder *b, const void *key_);
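
Note on `hk_passthrough_gs_key_size` above: the key ends in a variable-length transform feedback struct, so its size is the fixed part plus the variable tail, with the tail's header counted once. The following is a minimal sketch using stand-in types. `struct xfb_output`, `struct xfb_info`, `struct gs_key` and all function names are hypothetical simplifications, not the real NIR or hk definitions, and nesting a struct with a flexible array member relies on a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in types (assumptions, much smaller than the real ones). */
struct xfb_output { uint16_t buffer; uint16_t offset; };
struct xfb_info { uint16_t output_count; struct xfb_output outputs[]; };
struct gs_key { uint32_t prim; struct xfb_info xfb; };

/* Header plus n trailing outputs, in the spirit of nir_xfb_info_size(). */
static size_t
xfb_info_size(uint16_t n)
{
   return sizeof(struct xfb_info) + sizeof(struct xfb_output) * n;
}

/* Old expression from the diff: counts the xfb_info header twice. */
static size_t
gs_key_size_old(uint16_t n)
{
   return sizeof(struct gs_key) + xfb_info_size(n);
}

/* New expression from the diff: subtracts the embedded header first. */
static size_t
gs_key_size(uint16_t n)
{
   size_t size = (sizeof(struct gs_key) - sizeof(struct xfb_info)) + xfb_info_size(n);

   /* The result must still cover every trailing output. */
   assert(size >= offsetof(struct gs_key, xfb.outputs) + sizeof(struct xfb_output) * n);
   return size;
}
```

With these stand-ins the two expressions differ by exactly `sizeof(struct xfb_info)`, and the new one equals `sizeof(struct gs_key)` when there are no outputs.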


@ -853,7 +853,6 @@ spec@!opengl 1.1@polygon-mode-offset@config 6: Expected blue pixel in center,Fai
spec@!opengl 1.1@polygon-mode-offset@config 6: Expected white pixel on right edge,Fail
spec@!opengl 1.1@polygon-mode-offset@config 6: Expected white pixel on top edge,Fail
spec@!opengl 1.1@texsubimage-unpack,Fail
spec@!opengl 1.1@texwrap 2d proj,Fail
spec@!opengl 1.1@texwrap 2d proj@GL_RGBA8- NPOT- projected,Fail
spec@!opengl 1.1@texwrap 2d proj@GL_RGBA8- projected,Fail
@ -953,7 +952,6 @@ spec@arb_occlusion_query@occlusion_query_conform,Fail
spec@arb_occlusion_query@occlusion_query_conform@GetObjivAval_multi2,Fail
spec@arb_pixel_buffer_object@fbo-pbo-readpixels-small,Fail
spec@arb_pixel_buffer_object@pbo-getteximage,Fail
spec@arb_pixel_buffer_object@texsubimage-unpack pbo,Fail
spec@arb_point_sprite@arb_point_sprite-mipmap,Fail
spec@arb_provoking_vertex@arb-provoking-vertex-render,Fail
spec@arb_sampler_objects@sampler-objects,Fail


@ -861,93 +861,6 @@ ubsan-dEQP-VK.image.mutable.2d_array.r16g16b16a16_sfloat_r16g16b16a16_uint_draw_
ubsan-dEQP-VK.image.mutable.2d_array.r32_uint_r8g8b8a8_sint_draw_copy_resolve_mutable_color_att,Fail
ubsan-dEQP-VK.pipeline.monolithic.logic_op_na_formats.r16g16_sfloat.nand_blend,Fail
# New failures with ES CTS 3.2.13.0
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_128_bits.rgba32i_rgba32i.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_128_bits.rgba32i_rgba32i.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_128_bits.rgba32ui_rgba32ui.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_128_bits.rgba32ui_rgba32ui.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_16_bits.r16i_r16i.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_16_bits.r16i_r16i.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_16_bits.r16ui_r16ui.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_16_bits.r16ui_r16ui.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_16_bits.rg8i_rg8i.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_16_bits.rg8i_rg8i.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_16_bits.rg8_rg8.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_16_bits.rg8_rg8.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_16_bits.rg8ui_rg8ui.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_16_bits.rg8ui_rg8ui.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_24_bits.rgb8_rgb8.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_24_bits.rgb8_rgb8.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.r32i_r32i.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.r32i_r32i.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.r32ui_r32ui.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.r32ui_r32ui.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rg16i_rg16i.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rg16i_rg16i.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rg16ui_rg16ui.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rg16ui_rg16ui.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2_rgb10_a2.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2_rgb10_a2.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2ui_rg16f.renderbuffer_to_texture2d,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2ui_rg16i.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2ui_rg16i.renderbuffer_to_texture2d,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2ui_rg16ui.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2ui_rg16ui.renderbuffer_to_texture2d,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2ui_rgb10_a2ui.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2ui_rgb10_a2ui.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgba8i_rgba8i.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgba8i_rgba8i.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgba8_rgba8.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgba8_rgba8.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgba8ui_rgba8ui.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgba8ui_rgba8ui.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.srgb8_alpha8_srgb8_alpha8.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_64_bits.rg32i_rg32i.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_64_bits.rg32i_rg32i.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_64_bits.rg32ui_rg32ui.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_64_bits.rg32ui_rg32ui.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_64_bits.rgba16i_rgba16i.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_64_bits.rgba16i_rgba16i.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_64_bits.rgba16ui_rgba16ui.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_64_bits.rgba16ui_rgba16ui.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_8_bits.r8i_r8i.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_8_bits.r8i_r8i.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_8_bits.r8_r8.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_8_bits.r8ui_r8ui.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_8_bits.r8ui_r8ui.texture2d_to_renderbuffer,Fail
arm32-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_128_bits.rgba32ui_rgba32ui.renderbuffer_to_renderbuffer,Fail
arm32-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_16_bits.r16i_r16i.renderbuffer_to_renderbuffer,Fail
arm32-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_16_bits.rg8_rg8.renderbuffer_to_renderbuffer,Fail
arm32-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_16_bits.rg8i_rg8i.renderbuffer_to_renderbuffer,Fail
arm32-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_16_bits.rg8ui_rg8ui.texture2d_to_renderbuffer,Fail
arm32-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_24_bits.rgb8_rgb8.renderbuffer_to_renderbuffer,Fail
arm32-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_24_bits.rgb8_rgb8.texture2d_to_renderbuffer,Fail
arm32-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.r32i_r32i.renderbuffer_to_renderbuffer,Fail
arm32-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.r32i_r32i.texture2d_to_renderbuffer,Fail
arm32-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.r32ui_r32ui.texture2d_to_renderbuffer,Fail
arm32-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rg16ui_rg16ui.texture2d_to_renderbuffer,Fail
arm32-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2_rgb10_a2.renderbuffer_to_renderbuffer,Fail
arm32-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2_rgb10_a2.texture2d_to_renderbuffer,Fail
arm32-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2ui_rg16f.renderbuffer_to_texture2d,Fail
arm32-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2ui_rg16i.renderbuffer_to_texture2d,Fail
arm32-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2ui_rg16ui.renderbuffer_to_texture2d,Fail
arm32-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2ui_rgb10_a2ui.renderbuffer_to_renderbuffer,Fail
arm32-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgba8_rgba8.renderbuffer_to_renderbuffer,Fail
arm32-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgba8i_rgba8i.texture2d_to_renderbuffer,Fail
arm32-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgba8ui_rgba8ui.texture2d_to_renderbuffer,Fail
arm32-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.srgb8_alpha8_srgb8_alpha8.texture2d_to_renderbuffer,Fail
arm32-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_64_bits.rg32i_rg32i.renderbuffer_to_renderbuffer,Fail
arm32-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_64_bits.rg32i_rg32i.texture2d_to_renderbuffer,Fail
arm32-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_64_bits.rg32ui_rg32ui.renderbuffer_to_renderbuffer,Fail
arm32-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_64_bits.rgba16i_rgba16i.texture2d_to_renderbuffer,Fail
arm32-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_8_bits.r8_r8.texture2d_to_renderbuffer,Fail
arm32-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_8_bits.r8ui_r8ui.renderbuffer_to_renderbuffer,Fail
ubsan-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_128_bits.rgba32ui_rgba32ui.renderbuffer_to_renderbuffer,Fail
ubsan-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_24_bits.rgb8_rgb8.renderbuffer_to_renderbuffer,Fail
ubsan-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.r32ui_r32ui.texture2d_to_renderbuffer,Fail
ubsan-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2_rgb10_a2.texture2d_to_renderbuffer,Fail
ubsan-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2ui_rgb10_a2ui.renderbuffer_to_renderbuffer,Fail
# SKQP failing tests
ES2BlendWithNoTexture,Fail
SRGBReadWritePixels,Fail


@ -701,84 +701,6 @@ dEQP-VK.binding_model.unused_invalid_descriptor.write.unused.storage_buffer,Cras
dEQP-VK.binding_model.unused_invalid_descriptor.write.unused.uniform_buffer,Crash
asan-dEQP-VK.binding_model.unused_invalid_descriptor.write.invalid.combined_image_sampler,Crash
# New failures with ES CTS 3.2.13.0
asan-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_128_bits.rgba32i_rgba32i.texture2d_to_renderbuffer,Fail
asan-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_128_bits.rgba32ui_rgba32ui.renderbuffer_to_renderbuffer,Fail
asan-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_24_bits.rgb8_rgb8.renderbuffer_to_renderbuffer,Fail
asan-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_24_bits.rgb8_rgb8.texture2d_to_renderbuffer,Fail
asan-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.r32i_r32i.renderbuffer_to_renderbuffer,Fail
asan-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.r32ui_r32ui.texture2d_to_renderbuffer,Fail
asan-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rg16i_rg16i.renderbuffer_to_renderbuffer,Fail
asan-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rg16i_rg16i.texture2d_to_renderbuffer,Fail
asan-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rg16ui_rg16ui.renderbuffer_to_renderbuffer,Fail
asan-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2_rgb10_a2.texture2d_to_renderbuffer,Fail
asan-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2ui_rg16f.renderbuffer_to_texture2d,Fail
asan-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2ui_rg16i.renderbuffer_to_texture2d,Fail
asan-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2ui_rgb10_a2ui.renderbuffer_to_renderbuffer,Fail
asan-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2ui_rgb10_a2ui.texture2d_to_renderbuffer,Fail
asan-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgba8_rgba8.renderbuffer_to_renderbuffer,Fail
asan-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgba8i_rgba8i.texture2d_to_renderbuffer,Fail
asan-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgba8ui_rgba8ui.texture2d_to_renderbuffer,Fail
asan-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_64_bits.rg32i_rg32i.renderbuffer_to_renderbuffer,Fail
asan-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_64_bits.rg32ui_rg32ui.renderbuffer_to_renderbuffer,Fail
asan-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_64_bits.rgba16i_rgba16i.texture2d_to_renderbuffer,Fail
asan-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_64_bits.rgba16ui_rgba16ui.texture2d_to_renderbuffer,Fail
asan-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_8_bits.r8ui_r8ui.renderbuffer_to_renderbuffer,Fail
asan-dEQP-GLES31.functional.copy_image.non_compressed.viewclass_8_bits.r8ui_r8ui.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_128_bits.rgba32i_rgba32i.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_128_bits.rgba32i_rgba32i.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_128_bits.rgba32ui_rgba32ui.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_128_bits.rgba32ui_rgba32ui.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_16_bits.r16i_r16i.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_16_bits.r16i_r16i.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_16_bits.r16ui_r16ui.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_16_bits.r16ui_r16ui.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_16_bits.rg8_rg8.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_16_bits.rg8_rg8.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_16_bits.rg8i_rg8i.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_16_bits.rg8i_rg8i.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_16_bits.rg8ui_rg8ui.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_16_bits.rg8ui_rg8ui.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_24_bits.rgb8_rgb8.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_24_bits.rgb8_rgb8.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.r32i_r32i.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.r32i_r32i.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.r32ui_r32ui.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.r32ui_r32ui.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rg16i_rg16i.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rg16i_rg16i.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rg16ui_rg16ui.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rg16ui_rg16ui.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2_rgb10_a2.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2_rgb10_a2.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2ui_rg16f.renderbuffer_to_texture2d,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2ui_rg16i.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2ui_rg16i.renderbuffer_to_texture2d,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2ui_rg16ui.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2ui_rg16ui.renderbuffer_to_texture2d,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2ui_rgb10_a2ui.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgb10_a2ui_rgb10_a2ui.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgba8_rgba8.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgba8_rgba8.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgba8i_rgba8i.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgba8i_rgba8i.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgba8ui_rgba8ui.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.rgba8ui_rgba8ui.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.srgb8_alpha8_srgb8_alpha8.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_64_bits.rg32i_rg32i.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_64_bits.rg32i_rg32i.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_64_bits.rg32ui_rg32ui.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_64_bits.rg32ui_rg32ui.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_64_bits.rgba16i_rgba16i.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_64_bits.rgba16i_rgba16i.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_64_bits.rgba16ui_rgba16ui.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_64_bits.rgba16ui_rgba16ui.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_8_bits.r8_r8.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_8_bits.r8i_r8i.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_8_bits.r8i_r8i.texture2d_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_8_bits.r8ui_r8ui.renderbuffer_to_renderbuffer,Fail
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_8_bits.r8ui_r8ui.texture2d_to_renderbuffer,Fail
# SKQP failing tests
ES2BlendWithNoTexture,Fail
SRGBReadWritePixels,Fail


@ -1,4 +1,4 @@
<vcxml gen="3.3" min_ver="42" max_ver="71">
<vcxml gen="4.2" min_ver="42" max_ver="71">
<enum name="Compare Function" prefix="V3D_COMPARE_FUNC">
<value name="NEVER" value="0"/>


@ -50,9 +50,12 @@ enum clc_spirv_version {
};
struct clc_optional_features {
bool atomic_order_seq_cst;
bool atomic_scope_device;
bool extended_bit_ops;
bool fp16;
bool fp64;
bool generic_address_space;
bool int64;
bool images;
bool images_depth;


@ -28,8 +28,6 @@
#include <sstream>
#include <mutex>
#include "util/ralloc.h"
#include "util/set.h"
#include <llvm/ADT/ArrayRef.h>
#include <llvm/IR/DiagnosticPrinter.h>
#include <llvm/IR/DiagnosticInfo.h>
@ -68,7 +66,17 @@
#include <llvm/Support/VirtualFileSystem.h>
#endif
#if LLVM_VERSION_MAJOR >= 22
#include <clang/Options/OptionUtils.h>
#endif
/* We have to include our own headers after LLVM/clang as they seem to use
* `UNUSED` within enum definitions:
* https://github.com/llvm/llvm-project/blob/ea443eeb2ab8ed49ffb783c2025fed6629a36f10/clang/include/clang/Basic/OffloadArch.h#L19
*/
#include "util/macros.h"
#include "util/ralloc.h"
#include "util/set.h"
#include "util/u_dl.h"
#include "glsl_types.h"
@ -915,7 +923,9 @@ clc_compile_to_llvm_module(LLVMContext &llvm_ctx,
// GetResourcePath is a way to retrieve the actual libclang resource dir based on a given binary
// or library.
auto tmp_res_path =
#if LLVM_VERSION_MAJOR >= 20
#if LLVM_VERSION_MAJOR >= 22
clang::GetResourcesPath(std::string(clang_path));
#elif LLVM_VERSION_MAJOR >= 20
Driver::GetResourcesPath(std::string(clang_path));
#else
Driver::GetResourcesPath(std::string(clang_path), CLANG_RESOURCE_DIR);
@ -959,6 +969,12 @@ clc_compile_to_llvm_module(LLVMContext &llvm_ctx,
c->getPreprocessorOpts().addMacroDef("cl_khr_expect_assume=1");
bool needs_opencl_c_h = false;
if (args->features.atomic_order_seq_cst) {
c->getTargetOpts().OpenCLExtensionsAsWritten.push_back("+__opencl_c_atomic_order_seq_cst");
}
if (args->features.atomic_scope_device) {
c->getTargetOpts().OpenCLExtensionsAsWritten.push_back("+__opencl_c_atomic_scope_device");
}
if (args->features.extended_bit_ops) {
c->getPreprocessorOpts().addMacroDef("cl_khr_extended_bit_ops=1");
}
@ -969,6 +985,9 @@ clc_compile_to_llvm_module(LLVMContext &llvm_ctx,
c->getTargetOpts().OpenCLExtensionsAsWritten.push_back("+cl_khr_fp64");
c->getTargetOpts().OpenCLExtensionsAsWritten.push_back("+__opencl_c_fp64");
}
if (args->features.generic_address_space) {
c->getTargetOpts().OpenCLExtensionsAsWritten.push_back("+__opencl_c_generic_address_space");
}
if (args->features.int64) {
c->getTargetOpts().OpenCLExtensionsAsWritten.push_back("+cles_khr_int64");
c->getTargetOpts().OpenCLExtensionsAsWritten.push_back("+__opencl_c_int64");


@ -134,6 +134,11 @@ main(int argc, char **argv)
.args = util_dynarray_begin(&clang_args),
.num_args = util_dynarray_num_elements(&clang_args, char *),
.c_compatible = true,
.features = {
.atomic_order_seq_cst = true,
.atomic_scope_device = true,
.generic_address_space = true,
},
};
/* Enable all features, we don't know the target here and it is the


@ -263,7 +263,7 @@ libclc_add_generic_variants(nir_shader *shader)
if (strstr(func->name, "async_work_group_strided_copy"))
continue;
char *U3AS1 = strstr(func->name, "U3AS1");
const char *U3AS1 = strstr(func->name, "U3AS1");
if (U3AS1 == NULL)
continue;


@ -7667,6 +7667,7 @@ ast_process_struct_or_iface_block_members(ir_exec_list *instructions,
* embedded structures in 1.10 only.
*/
if (state->language_version != 110 &&
!state->allow_glsl_embedded_structure_declarations &&
decl_list->type->specifier->structure != NULL)
_mesa_glsl_error(&loc, state,
"embedded structure declarations are not allowed");


@ -1684,12 +1684,27 @@ cross_validate_globals(void *mem_ctx, const struct gl_constants *consts,
existing->data.mode == nir_var_mem_ssbo &&
existing->data.from_ssbo_unsized_array &&
glsl_get_gl_type(var->type) == glsl_get_gl_type(existing->type))) {
linker_error(prog, "%s `%s' declared as type "
"`%s' and type `%s'\n",
gl_nir_mode_string(var),
var->name, glsl_get_type_name(var->type),
glsl_get_type_name(existing->type));
return;
/* Relax precision matching on unused uniforms for early ES shaders */
if (prog->IsES && !var->interface_type &&
!(existing->data.used && var->data.used) &&
glsl_base_type_is_integer(glsl_get_gl_type(var->type)) == glsl_base_type_is_integer(glsl_get_gl_type(existing->type)) &&
glsl_base_type_is_float(glsl_get_gl_type(var->type)) == glsl_base_type_is_float(glsl_get_gl_type(existing->type)) &&
prog->GLSL_Version < 300) {
linker_warning(prog, "%s `%s' declared as type "
"`%s' and type `%s'\n",
gl_nir_mode_string(var),
var->name, glsl_get_type_name(var->type),
glsl_get_type_name(existing->type));
} else {
linker_error(prog, "%s `%s' declared as type "
"`%s' and type `%s'\n",
gl_nir_mode_string(var),
var->name, glsl_get_type_name(var->type),
glsl_get_type_name(existing->type));
return;
}
}
}
}


@ -329,6 +329,8 @@ _mesa_glsl_parse_state::_mesa_glsl_parse_state(struct gl_context *_ctx,
ctx->Const.AllowVertexTextureBias;
this->allow_glsl_120_subset_in_110 =
ctx->Const.AllowGLSL120SubsetIn110;
this->allow_glsl_embedded_structure_declarations =
ctx->Const.AllowGLSLEmbeddedStructureDeclarations;
this->allow_builtin_variable_redeclaration =
ctx->Const.AllowGLSLBuiltinVariableRedeclaration;
this->ignore_write_to_readonly_var =


@ -1023,6 +1023,7 @@ struct _mesa_glsl_parse_state {
char *alias_shader_extension;
bool allow_vertex_texture_bias;
bool allow_glsl_120_subset_in_110;
bool allow_glsl_embedded_structure_declarations;
bool allow_builtin_variable_redeclaration;
bool ignore_write_to_readonly_var;


@ -676,6 +676,14 @@ glsl_type_is_e5m2(const glsl_type *t)
return t->base_type == GLSL_TYPE_FLOAT_E5M2;
}
static inline bool
glsl_type_is_nonnative_float(const glsl_type *t)
{
return t->base_type == GLSL_TYPE_BFLOAT16 ||
t->base_type == GLSL_TYPE_FLOAT_E4M3FN ||
t->base_type == GLSL_TYPE_FLOAT_E5M2;
}
static inline bool
glsl_type_is_int_16_32_64(const glsl_type *t)
{


@ -416,10 +416,9 @@ if with_tests
nir_opt_algebraic_pattern_tests += static_library(
'nir_opt_algebraic_pattern_test_@0@'.format(i),
nir_opt_algebraic_pattern_test_cpp,
cpp_args : [cpp_msvc_compat_args, msvc_bigobj],
override_options: [msvc_designated_initializer],
gnu_symbol_visibility : 'hidden',
cpp_args : '-DSUBSET=@0@'.format(i),
cpp_args : [cpp_msvc_compat_args, msvc_bigobj, '-DSUBSET=@0@'.format(i)],
include_directories : [inc_include, inc_src],
dependencies : [dep_thread, idep_gtest, idep_nir, idep_mesautil],
)


@ -484,6 +484,8 @@ clone_call(clone_state *state, const nir_call_instr *call)
for (unsigned i = 0; i < ncall->num_params; i++)
__clone_src(state, ncall, &ncall->params[i], &call->params[i]);
if (call->indirect_callee.ssa)
__clone_src(state, ncall, &ncall->indirect_callee, &call->indirect_callee);
return ncall;
}


@ -24,6 +24,7 @@
#ifndef NIR_CONVERSION_BUILDER_H
#define NIR_CONVERSION_BUILDER_H
#include "util/half_float.h"
#include "util/u_math.h"
#include "nir_builder.h"
#include "nir_builtin_builder.h"
@ -162,6 +163,29 @@ nir_round_int_to_float(nir_builder *b, nir_def *src,
}
UNREACHABLE("unexpected rounding mode");
} else {
/* For conversions to FP16 we need to clamp the input against the fp16
* max value when rounding towards zero or down. The reason for that is
* that for integer values outside of FP16 finite value range we could
* get Infinity, which would be incorrect rounding in those cases.
*
* Furthermore, we only need to do the clamping for integers of 32 bits
* or more, because the lowering below will already clamp 16 bit integers
* correctly.
*
* This isn't a problem for FP32 or FP64 floats as integers can't exceed
* the finite value ranges.
*/
if (dest_bit_size == 16 && src->bit_size >= 32) {
switch (round) {
case nir_rounding_mode_rtz:
case nir_rounding_mode_rd:
src = nir_umin_imm(b, src, FP16_MAX_F);
break;
default:
break;
}
}
nir_def *mantissa_bit_size = nir_imm_int(b, mantissa_bits);
nir_def *msb = nir_imax(b, nir_ufind_msb(b, src), mantissa_bit_size);
nir_def *bits_to_lose = nir_isub(b, msb, mantissa_bit_size);
@ -207,11 +231,6 @@ nir_alu_type_range_contains_type_range(nir_alu_type a, nir_alu_type b)
a_bit_size > b_bit_size)
return true;
/* 16-bit floats fit in 32-bit integers */
if (a_base_type == nir_type_int && a_bit_size >= 32 &&
b == nir_type_float16)
return true;
/* All signed or unsigned ints can fit in float or above. A uint8 can fit
* in a float16.
*/
@ -486,6 +505,15 @@ nir_convert_with_rounding(nir_builder *b,
if (trivial_convert)
return nir_type_convert(b, src, src_type, dest_type, round);
/* Nontrivial float conversions have special infinity handling when
* clamping, so we can't have fast math enabled.
*/
unsigned old_fp_ctrl = b->fp_math_ctrl;
if (src_base_type == nir_type_float || dest_base_type == nir_type_float) {
b->fp_math_ctrl = nir_fp_no_fast_math;
}
nir_def *dest = src;
/* clamp the result into range */
@ -514,6 +542,7 @@ nir_convert_with_rounding(nir_builder *b,
if (clamp_after_conversion)
dest = nir_clamp_to_type_range(b, dest, dest_type, src, src_type, dest_type);
b->fp_math_ctrl = old_fp_ctrl;
return dest;
}
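
Note on the FP16 clamp in nir_round_int_to_float above: the largest finite binary16 value is 65504, so a wider unsigned integer above that must be clamped before a round-toward-zero or round-down conversion, otherwise the result can become infinity. A minimal scalar sketch of the clamp follows. The demo_* names are hypothetical, and the constant is written out here rather than taken from util/half_float.h.

```c
#include <stdint.h>

/* Largest finite IEEE 754 binary16 value. */
#define DEMO_FP16_MAX 65504u

/* Scalar equivalent of an unsigned min against the FP16 maximum: values
 * that already fit are returned unchanged, larger ones saturate to the
 * biggest finite half-float instead of overflowing to +Inf later. */
static inline uint32_t
demo_clamp_for_fp16_rtz(uint32_t src)
{
   return src < DEMO_FP16_MAX ? src : DEMO_FP16_MAX;
}
```

For example, 70000 clamps to 65504, which is the expected round-toward-zero result. Round-to-nearest would overflow to infinity for such inputs, which matches the diff applying the clamp only to the rtz and rd modes.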


@ -1021,6 +1021,7 @@ visit_intrinsic(nir_intrinsic_instr *instr, struct divergence_state *state)
case nir_intrinsic_atest_pan:
case nir_intrinsic_zs_emit_pan:
case nir_intrinsic_load_return_param_amd:
case nir_intrinsic_load_local_invocation_index_intel:
is_divergent = true;
break;


@ -48,6 +48,15 @@ nir_fixup_is_exported(nir_shader *nir)
nir_foreach_function(func, nir) {
if (_mesa_set_search(shadowed, func->name)) {
func->is_exported = func->is_entrypoint;
} else {
/* Starting with LLVM-22 we don't see the wrappers anymore, so we
* can simply export every entrypoint.
*
* We could do an LLVM version check here, but that's going to be a
* mess making nir depending on LLVM in any way and this seems to work
* for both situations.
*/
func->is_exported |= func->is_entrypoint;
}
if (func->name[0] == '_') {


@ -22,10 +22,10 @@
*/
#include "util/u_printf.h"
#include "util/stack_array.h"
#include "nir.h"
#include "nir_builder.h"
#include "nir_control_flow.h"
#include "nir_vla.h"
/*
* TODO: write a proper inliner for GPUs.
@ -240,12 +240,13 @@ inline_functions_pass(nir_builder *b,
* to an SSA value first.
*/
const unsigned num_params = call->num_params;
NIR_VLA(nir_def *, params, num_params);
STACK_ARRAY(nir_def *, params, num_params);
for (unsigned i = 0; i < num_params; i++) {
params[i] = call->params[i].ssa;
}
nir_inline_function_impl(b, call->callee->impl, params, NULL);
STACK_ARRAY_FINISH(params);
return true;
}


@ -850,6 +850,23 @@ gather_intrinsic_info(nir_intrinsic_instr *instr, nir_shader *shader)
shader->info.outputs_written |= BITFIELD64_BIT(FRAG_RESULT_SAMPLE_MASK);
break;
case nir_intrinsic_load_tile_pan:
case nir_intrinsic_load_tile_res_pan: {
const nir_io_semantics io = nir_intrinsic_io_semantics(instr);
shader->info.outputs_read |=
BITFIELD64_RANGE(io.location, io.num_slots);
break;
}
case nir_intrinsic_blend_pan:
case nir_intrinsic_blend2_pan:
case nir_intrinsic_store_tile_pan: {
const nir_io_semantics io = nir_intrinsic_io_semantics(instr);
shader->info.outputs_written |=
BITFIELD64_RANGE(io.location, io.num_slots);
break;
}
case nir_intrinsic_demote_samples:
shader->info.fs.uses_discard = true;
break;
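
Note on the gather_intrinsic_info hunk above: each tile load or store marks io.num_slots consecutive output slots starting at io.location. A minimal sketch of such a range mask follows. demo_bit_range64 is a hypothetical helper, not Mesa's BITFIELD64_RANGE macro, and it assumes start < 64 and start + count <= 64.

```c
#include <stdint.h>

/* Build a 64-bit mask with bits [start, start + count) set.
 * The count == 64 case is handled separately because shifting a 64-bit
 * value by 64 is undefined behaviour in C. */
static inline uint64_t
demo_bit_range64(unsigned start, unsigned count)
{
   uint64_t low = (count >= 64) ? ~0ull : ((1ull << count) - 1);
   return low << start;
}
```

A caller would OR the result into a 64-bit outputs_read or outputs_written style mask, for example mask |= demo_bit_range64(location, num_slots).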


@ -2633,6 +2633,9 @@ system_value("fs_msaa_intel", 1)
# Per primitive remapping table offset.
system_value("per_primitive_remap_intel", 1)
# The (linear) local invocation index provided in the payload of mesh/task shaders.
system_value("local_invocation_index_intel", 1)
# Intrinsics for Intel bindless thread dispatch
# BASE=brw_topoloy_id
system_value("topology_id_intel", 1, indices=[BASE])


@ -57,6 +57,32 @@ get_bool_convert_opcode(uint32_t dst_bit_size)
}
}
static void
resize_bool_alu_source(nir_builder *b, nir_alu_instr *alu,
uint32_t src_idx, uint32_t bit_size)
{
if (nir_src_bit_size(alu->src[src_idx].src) == bit_size)
return;
b->cursor = nir_before_instr(&alu->instr);
nir_op convert_op = get_bool_convert_opcode(bit_size);
/* Retain the number of components and swizzle of the original
* instruction so that we don't unnecessarily create a vectorized
* instruction.
*/
nir_def *new_src =
nir_build_alu1(b, convert_op, nir_ssa_for_alu_src(b, alu, src_idx));
nir_src_rewrite(&alu->src[src_idx].src, new_src);
/* The swizzle will have been handled by the conversion instruction
* so we can reset it back to the default
*/
for (unsigned j = 0; j < NIR_MAX_VEC_COMPONENTS; j++)
alu->src[src_idx].swizzle[j] = j;
}
static void
make_sources_canonical(nir_builder *b, nir_alu_instr *alu, uint32_t start_idx)
{
@ -65,29 +91,8 @@ make_sources_canonical(nir_builder *b, nir_alu_instr *alu, uint32_t start_idx)
*/
const nir_op_info *op_info = &nir_op_infos[alu->op];
uint32_t bit_size = nir_src_bit_size(alu->src[start_idx].src);
for (uint32_t i = start_idx + 1; i < op_info->num_inputs; i++) {
if (nir_src_bit_size(alu->src[i].src) != bit_size) {
b->cursor = nir_before_instr(&alu->instr);
nir_op convert_op = get_bool_convert_opcode(bit_size);
nir_alu_instr *conv_instr = nir_alu_instr_create(b->shader, convert_op);
conv_instr->src[0].src = nir_src_for_ssa(alu->src[i].src.ssa);
/* Retain the write mask and swizzle of the original instruction so
* that we dont unnecessarily create a vectorized instruction.
*/
memcpy(conv_instr->src[0].swizzle,
alu->src[i].swizzle,
sizeof(conv_instr->src[0].swizzle));
nir_def *new_src = nir_builder_alu_instr_finish_and_insert(b, conv_instr);
nir_src_rewrite(&alu->src[i].src, new_src);
/* The swizzle will have been handled by the conversion instruction
* so we can reset it back to the default
*/
for (unsigned j = 0; j < NIR_MAX_VEC_COMPONENTS; j++)
alu->src[i].swizzle[j] = j;
}
}
for (uint32_t i = start_idx + 1; i < op_info->num_inputs; i++)
resize_bool_alu_source(b, alu, i, bit_size);
}
static bool
@ -134,7 +139,9 @@ lower_alu_instr(nir_builder *b, nir_alu_instr *alu)
case nir_op_bcsel:
/* bcsel may be choosing between boolean sources too */
if (alu->def.bit_size == 1)
make_sources_canonical(b, alu, 1);
make_sources_canonical(b, alu, 0);
else
resize_bool_alu_source(b, alu, 0, alu->def.bit_size);
break;
default:

View file

@ -696,7 +696,7 @@ if (nir_is_rounding_mode_rtz(execution_mode, bit_size)) {
binop("iadd", tint, _2src_commutative + associative, "(uint64_t)src0 + (uint64_t)src1")
binop("iadd_sat", tint, _2src_commutative, """
util_add_check_overflow({dest_type}, src0, src1) ?
(src1 < 0 ? u_intN_max(bit_size) : u_uintN_max(bit_size)) : (src0 + src1)
(src1 < 0 ? u_intN_min(bit_size) : u_intN_max(bit_size)) : (src0 + src1)
""", "", True)
binop("uadd_sat", tuint, _2src_commutative,
"util_add_check_overflow({dest_type}, src0, src1) ? u_uintN_max(sizeof(src0) * 8) : (src0 + src1)",

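Note on the iadd_sat hunk above: on overflow the old expression picked u_intN_max for a negative src1 and u_uintN_max otherwise; the new one picks u_intN_min and u_intN_max, which is the signed clamp. A minimal 8-bit sketch of the intended semantics follows, computed in a wider type rather than with util_add_check_overflow. The function name is hypothetical.

```c
#include <stdint.h>

/* Signed saturating add for 8-bit operands. */
static int8_t
sketch_iadd_sat8(int8_t a, int8_t b)
{
   /* Compute in a wider type so the true sum is representable. */
   int sum = (int)a + (int)b;

   if (sum > INT8_MAX)
      return INT8_MAX;   /* positive overflow clamps to the signed max */
   if (sum < INT8_MIN)
      return INT8_MIN;   /* negative overflow clamps to the signed min */
   return (int8_t)sum;
}
```

Positive overflow can only happen when the second operand is positive and negative overflow only when it is negative, which is why testing the sign of src1 is enough to choose the clamp value.
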
View file

@ -783,8 +783,8 @@ optimizations.extend([
(('bcsel(is_only_used_as_float)', ('feq', a, 'b(is_not_zero)'), b, a), a),
(('bcsel(is_only_used_as_float)', ('fneu', a, 'b(is_not_zero)'), a, b), a),
(('bcsel', ignore_exact('feq', a, 0), 0, ('fsat', ('fmul', a, 'b(is_a_number)'))), ('fsat!', ('fmul', a, b))),
(('bcsel', ignore_exact('fneu', a, 0), ('fsat', ('fmul', a, 'b(is_a_number)')), 0), ('fsat!', ('fmul', a, b))),
(('bcsel', ignore_exact('feq', a, 0), 0, ('fsat', ('fmul', a, 'b(is_a_number)'))), ('!fsat', ('fmul', a, b))),
(('bcsel', ignore_exact('fneu', a, 0), ('fsat', ('fmul', a, 'b(is_a_number)')), 0), ('!fsat', ('fmul', a, b))),
(('bcsel', ignore_exact('feq', a, 0), b, ('fadd', a, 'b(is_not_zero)')), ('fadd', a, b)),
(('bcsel', ignore_exact('fneu', a, 0), ('fadd', a, 'b(is_not_zero)'), b), ('fadd', a, b)),
@ -2507,7 +2507,7 @@ optimizations.extend([
('ior', ('ior', ('ilt', a, 0), ('ilt', b, 0)), ('ige', ('iadd', a, b), 0)),
('iadd', a, b),
0x7fffffffffffffff)),
'(options->lower_int64_options & nir_lower_iadd_sat64) != 0', TestStatus.XFAIL),
'(options->lower_int64_options & nir_lower_iadd_sat64) != 0'),
# int64_t sum = a - b;
#
@ -2936,7 +2936,7 @@ for bit_size in [8, 16, 32, 64]:
optimizations += [
(('iadd_sat@' + str(bit_size), a, b),
('bcsel', ('ige', b, 1), ('bcsel', ('ilt', ('iadd', a, b), a), intmax, ('iadd', a, b)),
('bcsel', ('ilt', a, ('iadd', a, b)), intmin, ('iadd', a, b))), 'options->lower_iadd_sat', TestStatus.XFAIL if bit_size in [8, 64] else TestStatus.PASS),
('bcsel', ('ilt', a, ('iadd', a, b)), intmin, ('iadd', a, b))), 'options->lower_iadd_sat'),
(('isub_sat@' + str(bit_size), a, b),
('bcsel', ('ilt', b, 0), ('bcsel', ('ilt', ('isub', a, b), a), intmax, ('isub', a, b)),
('bcsel', ('ilt', a, ('isub', a, b)), intmin, ('isub', a, b))), 'options->lower_iadd_sat'),
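
Note on the iadd_sat lowering above: the pattern detects overflow by comparing the wrapped sum against a, choosing the direction from the sign of b. A minimal 8-bit sketch of that same selection logic follows, together with an exhaustive comparison against a clamped wide add. Function names are hypothetical; the wrapping add stands in for NIR's iadd.

```c
#include <stdint.h>

/* Two's-complement wrapping 8-bit add, written without relying on
 * implementation-defined narrowing conversions. */
static int8_t
sketch_wrap_add8(int8_t a, int8_t b)
{
   int sum = (int)a + (int)b;
   if (sum > INT8_MAX)
      sum -= 256;
   else if (sum < INT8_MIN)
      sum += 256;
   return (int8_t)sum;
}

/* Mirrors: bcsel(b >= 1, bcsel(a + b < a, intmax, a + b),
 *                        bcsel(a < a + b, intmin, a + b)) */
static int8_t
sketch_lowered_iadd_sat8(int8_t a, int8_t b)
{
   int8_t sum = sketch_wrap_add8(a, b);
   if (b >= 1)
      return sum < a ? INT8_MAX : sum;
   return a < sum ? INT8_MIN : sum;
}

/* Returns 1 if the lowering agrees with a clamped wide add for all pairs. */
static int
sketch_lowering_matches_clamp(void)
{
   for (int a = INT8_MIN; a <= INT8_MAX; a++) {
      for (int b = INT8_MIN; b <= INT8_MAX; b++) {
         int wide = a + b;
         int expect = wide > INT8_MAX ? INT8_MAX
                    : wide < INT8_MIN ? INT8_MIN : wide;
         if (sketch_lowered_iadd_sat8((int8_t)a, (int8_t)b) != expect)
            return 0;
      }
   }
   return 1;
}
```

With b >= 1 a wrapped sum smaller than a can only come from positive overflow, and with b <= 0 a wrapped sum larger than a can only come from negative overflow, so the two comparisons cover both clamp cases.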
@ -3910,7 +3910,7 @@ late_optimizations.extend([
# Putting this in 'optimizations' interferes with the bcsel(a, op(b, c),
# op(b, d)) => op(b, bcsel(a, c, d)) transformations. I do not know why.
(('bcsel', ('feq', ('fsqrt', 'a(is_not_negative)'), 0.0), intBitsToFloat(0x7f7fffff), ('frsq', a)),
(('bcsel@32', ('feq', ('fsqrt', 'a(is_a_number_not_negative)'), 0.0), intBitsToFloat(0x7f7fffff), ('frsq', a)),
('fmin', ('frsq', a), intBitsToFloat(0x7f7fffff))),
# Things that look like DPH in the source shader may get expanded to

View file

@ -821,7 +821,7 @@ new_bitsize_acceptable(struct vectorize_ctx *ctx, unsigned new_bit_size,
unsigned high_offset = get_offset_diff(low, high);
/* This can cause issues when combining store data. */
if (high_offset % (new_bit_size / 8) != 0)
if (low->is_store && (high_offset % (new_bit_size / 8) != 0))
return false;
/* check nir_extract_bits limitations */

View file

@ -2197,6 +2197,7 @@ nir_unsigned_upper_bound(nir_shader *shader, struct hash_table *range_ht,
push_scalar_query(&state, scalar);
_mesa_hash_table_set_deleted_key(range_ht, (void *)(uintptr_t)UINT32_MAX);
return perform_analysis(&state);
}
@ -2588,5 +2589,6 @@ nir_def_num_lsb_zero(struct hash_table *numlsb_ht, nir_scalar def)
push_scalar_query(&state, def);
_mesa_hash_table_set_deleted_key(numlsb_ht, (void *)(uintptr_t)UINT32_MAX);
return perform_analysis(&state);
}

View file

@ -27,7 +27,6 @@
#include "glsl_types.h"
#include "vtn_private.h"
#include "nir/nir_vla.h"
#include "nir/nir_control_flow.h"
#include "nir/nir_constant_expressions.h"
#include "nir/nir_deref.h"
@ -42,6 +41,7 @@
#include "util/mesa-blake3.h"
#include "util/bfloat.h"
#include "util/float8.h"
#include "util/stack_array.h"
#include <stdio.h>
@ -1404,7 +1404,7 @@ vtn_type_get_nir_type(struct vtn_builder *b, struct vtn_type *type,
case vtn_base_type_struct: {
bool need_new_struct = false;
const uint32_t num_fields = type->length;
NIR_VLA(struct glsl_struct_field, fields, num_fields);
STACK_ARRAY(struct glsl_struct_field, fields, num_fields);
for (unsigned i = 0; i < num_fields; i++) {
fields[i] = *glsl_get_struct_field_data(type->type, i);
const struct glsl_type *field_nir_type =
@ -1414,20 +1414,25 @@ vtn_type_get_nir_type(struct vtn_builder *b, struct vtn_type *type,
need_new_struct = true;
}
}
const struct glsl_type *result;
if (need_new_struct) {
if (glsl_type_is_interface(type->type)) {
return glsl_interface_type(fields, num_fields,
/* packing */ 0, false,
glsl_get_type_name(type->type));
result = glsl_interface_type(fields, num_fields,
/* packing */ 0, false,
glsl_get_type_name(type->type));
} else {
return glsl_struct_type(fields, num_fields,
glsl_get_type_name(type->type),
glsl_struct_type_is_packed(type->type));
result = glsl_struct_type(fields, num_fields,
glsl_get_type_name(type->type),
glsl_struct_type_is_packed(type->type));
}
} else {
/* No changes, just pass it on */
return type->type;
result = type->type;
}
STACK_ARRAY_FINISH(fields);
return result;
}
case vtn_base_type_image:
@ -2073,7 +2078,7 @@ vtn_handle_type(struct vtn_builder *b, SpvOp opcode,
val->type->offsets = vtn_alloc_array(b, unsigned, num_fields);
val->type->packed = false;
NIR_VLA(struct glsl_struct_field, fields, count);
STACK_ARRAY(struct glsl_struct_field, fields, count);
for (unsigned i = 0; i < num_fields; i++) {
val->type->members[i] = vtn_get_type(b, w[i + 2]);
const char *name = NULL;
@ -2129,6 +2134,8 @@ vtn_handle_type(struct vtn_builder *b, SpvOp opcode,
name ? name : "struct",
val->type->packed);
}
STACK_ARRAY_FINISH(fields);
break;
}
@ -2858,60 +2865,66 @@ vtn_handle_constant(struct vtn_builder *b, SpvOp opcode,
default: {
bool swap;
const glsl_type *org_dst_type = val->type->type;
const glsl_type *org_src_type = org_dst_type;
const glsl_type *dst_type = val->type->type;
const bool saturate = vtn_has_decoration(b, val, SpvDecorationSaturatedToLargestFloat8NormalConversionEXT);
unsigned num_components = glsl_get_vector_elements(val->type->type);
vtn_assert(count <= 7);
const unsigned src_count = count - 4;
struct vtn_value *src_val[3] = {0};
const glsl_type *src_type[3] = {0};
for (unsigned i = 0; i < src_count; i++) {
src_val[i] = vtn_value(b, w[4 + i], vtn_value_type_constant);
src_type[i] = src_val[i]->type->type;
}
unsigned conv_src_bit_size;
switch (opcode) {
case SpvOpConvertFToU:
case SpvOpConvertFToS:
case SpvOpConvertSToF:
case SpvOpConvertUToF:
case SpvOpSConvert:
case SpvOpFConvert:
case SpvOpUConvert:
/* We have a different source type in a conversion. */
org_src_type = vtn_get_value_type(b, w[4])->type;
conv_src_bit_size =
glsl_type_is_nonnative_float(src_type[0]) ? 32 : glsl_get_bit_size(src_type[0]);
break;
default:
/* When picking ALU ops, bit-size is only used for Convert
* operations.
*/
conv_src_bit_size = 0;
break;
};
const glsl_type *dst_type = org_dst_type;
if (glsl_type_is_bfloat_16(dst_type) || glsl_type_is_e4m3fn(dst_type) || glsl_type_is_e5m2(dst_type))
dst_type = glsl_float_type();
const glsl_type *src_type = org_src_type;
if (glsl_type_is_bfloat_16(src_type) || glsl_type_is_e4m3fn(src_type) || glsl_type_is_e5m2(src_type))
src_type = glsl_float_type();
const unsigned dst_bit_size =
glsl_type_is_nonnative_float(dst_type) ? 32 : glsl_get_bit_size(dst_type);
bool exact;
nir_op op = vtn_nir_alu_op_for_spirv_opcode(b, opcode, &swap, &exact,
src_type, dst_type);
conv_src_bit_size, dst_bit_size);
/* No SPIR-V opcodes handled through this path should set exact.
* Since it is ignored, assert on it.
*/
assert(!exact);
unsigned bit_size = glsl_get_bit_size(dst_type);
unsigned resolved_bit_size = dst_bit_size;
nir_const_value src[3][NIR_MAX_VEC_COMPONENTS];
for (unsigned i = 0; i < count - 4; i++) {
struct vtn_value *src_val =
vtn_value(b, w[4 + i], vtn_value_type_constant);
for (unsigned i = 0; i < src_count; i++) {
/* If this is an unsized source, pull the bit size from the
* source; otherwise, we'll use the bit size from the destination.
*/
if (!nir_alu_type_get_type_size(nir_op_infos[op].input_types[i])) {
if (org_src_type != src_type) {
/* Small float conversion. */
assert(i == 0);
bit_size = glsl_get_bit_size(src_type);
} else {
bit_size = glsl_get_bit_size(src_val->type->type);
}
resolved_bit_size = glsl_type_is_nonnative_float(src_type[i]) ?
32 : glsl_get_bit_size(src_type[i]);
}
unsigned src_comps = nir_op_infos[op].input_sizes[i] ?
@ -2920,53 +2933,55 @@ vtn_handle_constant(struct vtn_builder *b, SpvOp opcode,
unsigned j = swap ? 1 - i : i;
for (unsigned c = 0; c < src_comps; c++) {
src[j][c] = src_val->constant->values[c];
if (glsl_type_is_bfloat_16(org_src_type))
src[j][c] = src_val[i]->constant->values[c];
if (glsl_type_is_bfloat_16(src_type[i]))
src[j][c].f32 = _mesa_bfloat16_bits_to_float(src[j][c].u16);
else if (glsl_type_is_e4m3fn(org_src_type))
else if (glsl_type_is_e4m3fn(src_type[i]))
src[j][c].f32 = _mesa_e4m3fn_to_float(src[j][c].u8);
else if (glsl_type_is_e5m2(org_src_type))
else if (glsl_type_is_e5m2(src_type[i]))
src[j][c].f32 = _mesa_e5m2_to_float(src[j][c].u8);
}
}
/* fix up fixed size sources */
switch (op) {
case nir_op_ishl:
case nir_op_ishr:
case nir_op_ushr: {
if (bit_size == 32)
break;
for (unsigned i = 0; i < num_components; ++i) {
switch (bit_size) {
case 64: src[1][i].u32 = src[1][i].u64; break;
case 16: src[1][i].u32 = src[1][i].u16; break;
case 8: src[1][i].u32 = src[1][i].u8; break;
/* Fix up source to respect NIR expected sizes. */
switch (op) {
case nir_op_ishl:
case nir_op_ishr:
case nir_op_ushr: {
/* Shift amount in NIR ops must be 32-bit. */
vtn_assert(!swap);
const unsigned shift_idx = 1;
const unsigned shift_bit_size = glsl_get_bit_size(src_type[i]);
if (i != shift_idx || shift_bit_size == 32)
break;
for (unsigned c = 0; c < src_comps; c++) {
nir_const_value *shift = &src[shift_idx][c];
*shift = nir_const_value_for_uint(
nir_const_value_as_uint(*shift, shift_bit_size), 32);
}
break;
}
default:
break;
}
break;
}
default:
break;
}
nir_const_value *srcs[3] = {
src[0], src[1], src[2],
};
nir_eval_const_opcode(op, val->constant->values, NULL,
num_components, bit_size, srcs,
num_components, resolved_bit_size, srcs,
b->shader->info.float_controls_execution_mode);
for (int i = 0; i < num_components; i++) {
uint16_t conv;
if (glsl_type_is_bfloat_16(org_dst_type)) {
if (glsl_type_is_bfloat_16(dst_type)) {
conv = _mesa_float_to_bfloat16_bits_rte(val->constant->values[i].f32);
} else if (glsl_type_is_e4m3fn(org_dst_type)) {
} else if (glsl_type_is_e4m3fn(dst_type)) {
if (saturate)
conv = _mesa_float_to_e4m3fn_sat(val->constant->values[i].f32);
else
conv = _mesa_float_to_e4m3fn(val->constant->values[i].f32);
} else if (glsl_type_is_e5m2(org_dst_type)) {
} else if (glsl_type_is_e5m2(dst_type)) {
if (saturate)
conv = _mesa_float_to_e5m2_sat(val->constant->values[i].f32);
else
@ -2975,7 +2990,7 @@ vtn_handle_constant(struct vtn_builder *b, SpvOp opcode,
continue;
}
val->constant->values[i] = nir_const_value_for_raw_uint(conv, glsl_get_bit_size(org_dst_type));
val->constant->values[i] = nir_const_value_for_raw_uint(conv, glsl_get_bit_size(dst_type));
}
break;
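
Note on the constant-folding hunks above: bfloat16, e4m3fn and e5m2 sources are widened to 32-bit float before nir_eval_const_opcode runs, and results are narrowed again afterwards. A minimal sketch of the bfloat16 widening step follows, assuming the usual bfloat16 layout (the upper 16 bits of an IEEE binary32 value). The function name is hypothetical and this is not the body of _mesa_bfloat16_bits_to_float.

```c
#include <stdint.h>
#include <string.h>

/* Widen raw bfloat16 bits to a 32-bit float by placing them in the
 * upper half of the binary32 encoding. */
static float
sketch_bf16_bits_to_float(uint16_t bits)
{
   uint32_t u = (uint32_t)bits << 16;
   float f;
   memcpy(&f, &u, sizeof(f));   /* type-pun without aliasing issues */
   return f;
}
```

Widening is exact because every bfloat16 value is representable as a float; only the narrowing direction needs a rounding choice, which is where the diff uses the _rte and _sat variants.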
@ -5248,6 +5263,12 @@ vtn_handle_entry_point(struct vtn_builder *b, const uint32_t *w,
b->interface_ids = vtn_alloc_array(b, uint32_t, b->interface_ids_count);
memcpy(b->interface_ids, &w[start], b->interface_ids_count * 4);
qsort(b->interface_ids, b->interface_ids_count, 4, cmp_uint32_t);
if (stage == MESA_SHADER_KERNEL) {
b->fp_math_ctrl_fp16 |= nir_fp_preserve_sz_inf_nan;
b->fp_math_ctrl_fp32 |= nir_fp_preserve_sz_inf_nan;
b->fp_math_ctrl_fp64 |= nir_fp_preserve_sz_inf_nan;
}
}
static bool

View file

@ -280,12 +280,9 @@ vtn_convert_op_dst_type(SpvOp opcode)
nir_op
vtn_nir_alu_op_for_spirv_opcode(struct vtn_builder *b,
SpvOp opcode, bool *swap, bool *exact,
const glsl_type *src_type,
const glsl_type *dst_type)
unsigned conv_src_bit_size,
unsigned conv_dst_bit_size)
{
const unsigned src_bit_size = glsl_get_bit_size(src_type);
const unsigned dst_bit_size = glsl_get_bit_size(dst_type);
/* Indicates that the first two arguments should be swapped. This is
* used for implementing greater-than and less-than-or-equal.
*/
@ -382,8 +379,12 @@ vtn_nir_alu_op_for_spirv_opcode(struct vtn_builder *b,
case SpvOpConvertUToF:
case SpvOpSConvert:
case SpvOpFConvert: {
nir_alu_type src_type = vtn_convert_op_src_type(opcode) | src_bit_size;
nir_alu_type dst_type = vtn_convert_op_dst_type(opcode) | dst_bit_size;
vtn_fail_if(conv_src_bit_size == 0,
"Need src bit_size to translate from SPIR-V convert opcodes to NIR.");
vtn_fail_if(conv_dst_bit_size == 0,
"Need dst bit_size to translate from SPIR-V convert opcodes to NIR.");
nir_alu_type src_type = vtn_convert_op_src_type(opcode) | conv_src_bit_size;
nir_alu_type dst_type = vtn_convert_op_dst_type(opcode) | conv_dst_bit_size;
return nir_type_conversion_op(src_type, dst_type, nir_rounding_mode_undef);
}
@ -909,8 +910,7 @@ vtn_handle_alu(struct vtn_builder *b, SpvOp opcode,
bool swap;
bool unused_exact;
nir_op op = vtn_nir_alu_op_for_spirv_opcode(b, opcode, &swap,
&unused_exact,
vtn_src[0]->type, dest_type);
&unused_exact, 0, 0);
if (swap) {
nir_def *tmp = src[0];
@ -986,8 +986,7 @@ vtn_handle_alu(struct vtn_builder *b, SpvOp opcode,
case SpvOpShiftRightLogical: {
bool swap;
bool exact;
nir_op op = vtn_nir_alu_op_for_spirv_opcode(b, opcode, &swap, &exact,
vtn_src[0]->type, dest_type);
nir_op op = vtn_nir_alu_op_for_spirv_opcode(b, opcode, &swap, &exact, 0, 0);
assert(!exact);
@ -1046,7 +1045,8 @@ vtn_handle_alu(struct vtn_builder *b, SpvOp opcode,
bool exact;
nir_op op = vtn_nir_alu_op_for_spirv_opcode(b, opcode, &swap,
&exact,
vtn_src[0]->type, dest_type);
glsl_get_bit_size(vtn_src[0]->type),
glsl_get_bit_size(dest_type));
if (swap) {
nir_def *tmp = src[0];

View file

@ -320,9 +320,7 @@ vtn_handle_cooperative_alu(struct vtn_builder *b, struct vtn_value *dest_val,
nir_deref_instr *src = vtn_get_cmat_deref(b, w[3]);
bool ignored = false;
nir_op op = vtn_nir_alu_op_for_spirv_opcode(b, opcode, &ignored, &ignored,
glsl_get_cmat_element(src->type),
glsl_get_cmat_element(dst_type->type));
nir_op op = vtn_nir_alu_op_for_spirv_opcode(b, opcode, &ignored, &ignored, 0, 0);
nir_deref_instr *dst = vtn_create_cmat_temporary(b, dst_type->type, "cmat_unary");
nir_cmat_unary_op(&b->nb, &dst->def, &src->def,
@ -346,9 +344,7 @@ vtn_handle_cooperative_alu(struct vtn_builder *b, struct vtn_value *dest_val,
nir_deref_instr *mat_a = vtn_get_cmat_deref(b, w[3]);
nir_deref_instr *mat_b = vtn_get_cmat_deref(b, w[4]);
nir_op op = vtn_nir_alu_op_for_spirv_opcode(b, opcode, &ignored, &ignored,
glsl_get_cmat_element(mat_a->type),
glsl_get_cmat_element(dst_type->type));
nir_op op = vtn_nir_alu_op_for_spirv_opcode(b, opcode, &ignored, &ignored, 0, 0);
nir_deref_instr *dst = vtn_create_cmat_temporary(b, dst_type->type, "cmat_binary");
nir_cmat_binary_op(&b->nb, &dst->def, &mat_a->def, &mat_b->def,

View file

@ -725,6 +725,13 @@ handle_special(struct vtn_builder *b, uint32_t opcode,
if (!ret)
vtn_fail("No NIR equivalent");
/* libclc's cbrt() implementation fails to flush subnormal numbers to zero
* even when flush-to-zero is required. Manually flush its output.
*/
if (opcode == OpenCLstd_Cbrt) {
ret = nir_fcanonicalize(nb, ret);
}
return ret;
}

Some files were not shown because too many files have changed in this diff.