Compare commits

...

203 commits

Author SHA1 Message Date
Dylan Baker
f7aeb0d677 VERSION: bump for 25.3.0
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
2025-11-14 13:25:42 -08:00
Dylan Baker
523eea18c5 docs: add release notes for 25.3.0 2025-11-14 13:24:41 -08:00
Samuel Pitoiset
48b0dd2892 radv: add vk_wsi_disable_unordered_submits and enable for GTK
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
GTK is missing a semaphore between QueueSubmit() and QueuePresent()
causing the WSI submit to be "unordered" and to immediately signal the
semaphores (because it's missing a wait semaphore in QueuePresent()).

The workaround is to disable unordered WSI submits until GTK fixes it
properly.

Cc: "25.3"
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14087
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 0d9d45db4e)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-14 08:36:53 -08:00
Dylan Baker
25abf47e3e .pick_status.json: Update to 8f13905c5e
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-14 08:36:53 -08:00
Mario Kleiner
28ca4a48d6 wsi/wayland: Zero min_luminance, max_luminance HDR light levels are valid.
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
CTA-861-G section 6.9.1 Static Metadata Type 1 declares that zero values
for different groups of HDR Metadata properties are allowed, including
zero nits values for max display mastering luminance, max content light
level, max frame-average light level and min display mastering luminance.

A zero value is meant to be treated by the video sink as "undefined" /
"unknown", and handled accordingly. This is common for dynamically
generated visual content.

The is_hdr_metadata_legal() function in the Vulkan/WSI/Wayland HDR backend
currently declares HDR light level metadata as invalid if the mastering
display min_luminance and max_luminance light levels are set to the legal
level of zero nits. This causes valid HDR metadata as set by the client
via vkSetHdrMetadata() to be not sent to the compositor.

Fix this by skipping checks that don't apply if min_luminance or
max_luminance are zero. If max_luminance is zero then we skip sending
of mastering display min/max luminance to Wayland, as sending a a
max_luminance <= min_luminance would trigger a protocol error. All
other valid data is still send, ie. color primaries, white-point,
content light levels.

Fixes: cb7726bb2c ("vulkan/wsi: validate HDR metadata to not cause protocol errors")
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Co-authored-by: Michel Dänzer <michel@daenzer.net>
Reviewed-by: Xaver Hugl <xaver.hugl@kde.org>
(cherry picked from commit 490f05f82c)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:38:01 -08:00
Lars-Ivar Hesselberg Simonsen
23665f9bd9 pan/format: Disable PAN_BIND_STORAGE_IMAGE for RGBA4/BGRA4
The RGBA4/BGRA4 formats had the PAN_BIND_STORAGE_IMAGE set, but we
cannot support that.

Fixes: d95423686f ("pan/format: Add PAN_BIND_STORAGE_IMAGE flag")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
(cherry picked from commit 15868cf6e9)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:38:00 -08:00
Lars-Ivar Hesselberg Simonsen
b7ce6abb6a pan/format: Fix mapping for I16F
This was mapped to RG16F, while R16F should be correct.

Cc: mesa-stable
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
(cherry picked from commit 1e2ca4dad6)

Conflicts:
	src/panfrost/ci/panfrost-g610-fails.txt
	src/panfrost/ci/panfrost-g610-flakes.txt

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:37:59 -08:00
Ludvig Lindau
0fbf00af9b panfrost: Make instrs_equal check res table/index
Add resource table and index check to instruction equality function.
This prevents CSE from mistakenly eliminating LEA_BUF_IMM instructions
that load from different resources, but with the same buffer offset.

Cc: mesa-stable
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
(cherry picked from commit 00b5275fe8)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:40 -08:00
Gert Wollny
e2164fbc11 r600/sfn: Don't start a new ALU-CF if LDS pipeline loads are pending
Fixes: e57643cf (r600/sfn: Add handling for R600 indirect access alias handling)

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
(cherry picked from commit 79e4323cf0)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:39 -08:00
Joshua Simmons
f651443a74 vtn: Fix OpCopyLogical destination type
Previously the type info for nested values was copied from the source
operand, rather than propagating the new type from the destination
operand.

Fixes: 4c363acf94 ("vtn: Allow for OpCopyLogical with different but compatible types")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
(cherry picked from commit 7ac1f7777d)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:38 -08:00
Marek Olšák
b5a4245193 gallium/noop: don't unref buffers passed to set_vertex_buffers to fix crashes
this code is invalid after the refcounting rework

Fixes: b3133e250e - gallium: add pipe_context::resource_release to eliminate buffer refcounting

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
(cherry picked from commit 7d22e4c7ba)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:37 -08:00
Lionel Landwerlin
76d66b72db anv: disable software detiling on Xe2+ for image atomics 64bits
This is what happens when you leave MR unreviewed for months.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: d39e443ef8 ("anv: add infrastructure for common vk_pipeline")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit c4e2878537)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:37 -08:00
Timur Kristóf
09b856c367 ac/nir/ngg: Fix scratch space for NGG GS streamout
For GS streamout, we need the following LDS scratch space:

- Repacking streamout vertices takes 1 dword per 4 waves per stream
  (max 16 bytes for Wave64, max 32 bytes for Wave32)
- 1 dword per stream for buffer info
  (16 bytes)
- 1 dword per buffer for buffer info
  (16 bytes)

Previously, the space used for buffer info aliased with the
space for repacking the output vertices in ngg_gs_finale(),
and there was no barrier in between, which caused a race
condition, resulting in random failure.

Fix this by allocating a few more LDS dwords so that aliasing
is not required, which also allows us to remove an extra
workgroup barrier.

Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12705
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
(cherry picked from commit 8f99d736d0)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:36 -08:00
Christian Gmeiner
e5f9980d50 meson: require sysprof-capture-4 >= 4.49.0
When Mesa is compiled with sysprof support, applications can crash with a
segfault during shutdown. This happens because sysprof_collector_mark()
registers thread-local storage destructors that get called after the library
containing the destructor code has been unloaded.

The problem was fixed in sysprof https://gitlab.gnome.org/GNOME/sysprof/-/merge_requests/152

CC: mesa-stable
Closes: mesa/mesa#13571
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
(cherry picked from commit e9341568fa)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:35 -08:00
Dmitry Baryshkov
7f68450a6c ci: drop google-freedreno remnants
Drop remnants of the  google-freedreno lab entries.

Fixes: 6541b911bd ("freedreno/ci: Remove baremetal job templates")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
(cherry picked from commit 9a33edca35)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:34 -08:00
Natalie Vock
feccefbc86 radv: Fix PSO history with RT pipelines
1. The prolog needs to have a null check. Libraries don't have prologs.
2. We only need to print the shaders actually included in this pipeline.
   Libraries were already printed separately.
3. The traversal shader was wrongly omitted from the output.

Cc: mesa-stable
(cherry picked from commit 73a31dafbc)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:33 -08:00
Sviatoslav Peleshko
a407dc6d83 driconf: Add vertex_program_default_out option for Penumbra: Overture
Penumbra's vertex program Diffuse_EnvMap_Reflect_vp.cg produces 3-component
texture coordinates and primitive colors while using the FF fragment
program. Add this WA to fix the misrenderings.

Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14170
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
(cherry picked from commit 5af8abbf8b)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:32 -08:00
Sviatoslav Peleshko
4a0cb910e4 mesa,driconf: Add WA to initialize vertex program outputs to vec4(0,0,0,1)
Per ARB_vertex_program spec result registers are 4-component and initially
undefined, and the FF fragment program expects its intputs to be
4-component too. So, if the client's vertex program does not write the
whole vector it will cause misrenderings unless the same client also
supplies fragment program that expects less than 4 componens.

This commit adds a workaround that initializes results to vec4(0, 0, 0, 1)
which seems to be an expected behavior for such clients.

Cc: mesa-stable
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
(cherry picked from commit f03432c81a)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:31 -08:00
Faith Ekstrand
8ffc19f935 nir: Add a couple panfrost sysvals to divergence analysis
Fixes: 2af6e4beeb ("pan: Don't pretend we support load_{vertex_id_zero_base,first_vertex}")
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayern@arm.com>
(cherry picked from commit 0e9fcb33c3)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:30 -08:00
Tapani Pälli
de44196bfb anv: fix issues found with indirect data stride
Use tristate for the aligned setting, otherwise it is always
first disabled which contributes to the condition if we set the
new stride active.

v2: set ByteStride in dword units and take secondary cmdbuf
    in to account (Lionel)

Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Nataraj Deshpande <nataraj.deshpande@intel.com>
(cherry picked from commit 2741ddd75a)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:30 -08:00
Timothy Arceri
f0aeb824b9 glsl: assign block indices in the order they appear
The hash lookup should be negligible. This makes things
predictable rather than having hash table modifications causing
the order to change, and fixes things for some seemingly buggy games.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13802
Fixes: be5a15f11d ("util/hash_table: start with 16 entries to reduce reallocations")

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 595a2fdbd2)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:29 -08:00
Sagar Ghuge
2cb191d7d1 intel/common: Consider 0 threads while setting TG
In ray tracing dispatch, we have dispatch.threads set to 0 since we
calculate the local_size_x/y/z based on the launch sizes.

This change takes 0 threads into an account and returh the TG size 8 in
such scenarios. Before this change, we were setting TG size to 2.

Fixes: 0c4e1c9efc ("intel/common: Add helper for compute thread group dispatch size")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
(cherry picked from commit 16f66ffe55)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:28 -08:00
Patrick Lerda
b074ea9fe8 r600: limit pre-evergreen predicate ready size
With the current stack configuration the rv770 seems to be unable
to go beyond three with the "vs-output-array-float-index-wr-before-gs.shader_test"
test. Anyway, the value four seems to be sufficient for the other tests.

This issue was triggered on rv770, for instance, with:
"piglit/bin/shader_runner tests/spec/glsl-1.50/execution/variable-indexing/gs-output-array-float-index-wr.shader_test -auto -fbo"
"piglit/bin/shader_runner tests/spec/glsl-1.50/execution/variable-indexing/vs-output-array-float-index-wr-before-gs.shader_test -auto -fbo"

Fixes: 713edb5998 ("r600/sfn: handle the IF predicate in the scheduler")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
(cherry picked from commit ae049f6fea)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:27 -08:00
Karol Herbst
42726a2afa rusticl/spirv: preserve signed zeroes by default
Cc: mesa-stable
(cherry picked from commit 92a4ae0ab2)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:26 -08:00
Karol Herbst
e42294ba9f rusticl/kernel: take no kernel_info reference inside the launch closure
Otherwise patterns like this wouldn't work:

clCreateKernel(prog)
clEnqueueNDRangeKernel
clReleaseKernel
clBuildProgram(prog)

Fixes: bb2453c649 ("rusticl/kernel: move most of the code in launch inside the closure")
(cherry picked from commit df344f12cc)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:26 -08:00
Karol Herbst
3bb77d6906 rusticl/queue: fix error code for invalid sampler kernel arg
Fixes: 5795ee0e08 ("rusticl: translate spirv to nir and first steps to kernel arg handling")
(cherry picked from commit c0f0baeaca)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:25 -08:00
Karol Herbst
efd2f1d61c rusticl/queue: fix error code for invalid queue properties part 2
Fixes: 2c202eb787 ("rusticl: verify validity of property names and values")
(cherry picked from commit e98abe35c0)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:24 -08:00
Karol Herbst
2cd6bc199a rusticl/queue: fix error code for invalid queue properties part 1
Cc: mesa-stable
(cherry picked from commit e83400cab2)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:23 -08:00
Eric Engestrom
5c3427b1fe ci: track src/android_stub/ changes
Fixes: 932f51d593 ("ci: Include enough Android headers to let us compile test EGL")
Suggested-by: Yonggang Luo <luoyonggang@gmail.com>
(cherry picked from commit f689322d27)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:23 -08:00
Eric Engestrom
b6ae45d326 ci: track src/c11/ changes
It's used by mesa_util, so let's just consider changes to it can affect
any job.

Fixes: b2ddec4e98 ("c11: Implement c11/time.h with c11/impl/time.c")
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
(cherry picked from commit 2ec3e536fd)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:22 -08:00
Lionel Landwerlin
14097ed79d anv/blorp/iris: rework Wa_14025112257
Drivers already have to track this workaround, so remove the logic
from Blorp and let the driver manage this.

Also in Anv don't accumulate this workaround, emit it directly in
place right after COMPUTE_WALKER. Accumulating can be problematic when
you want to dispatch concurrent compute shaders that do not need any
cache flush interaction (typical example with the internal
simple_shader framework).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 3e0ad0176b ("anv: Emit state cache invalidation after every compute dispatch")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
(cherry picked from commit c478b6355a)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:21 -08:00
Dave Airlie
0ef221f4a4 c11/threads: fix build on c23
C23/glibc is now including once_init in stdlib.h

https://patchwork.sourceware.org/project/glibc/patch/78061085-f04a-0c45-107b-5a8a15521083@redhat.com/#213088

Just fix up our use of it.

Cc: mesa-stable
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
(cherry picked from commit 179e744f75)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:20 -08:00
Karol Herbst
263e1823d2 st/interop: fix fence leak
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14050
Fixes: 1396dc1c38 ("mesa/st, dri2, wgl, glx: Modify flush_objects interop func to export a fence_fd")
Tested-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
(cherry picked from commit 87550fc657)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:19 -08:00
Faith Ekstrand
e334938384 nil: Add support for Blackwell 8 and 16-bit modifiers
Backport-to: 25.2
Reviewed-by: James Jones <jajones@nvidia.com>
(cherry picked from commit f1cb63a21d)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:18 -08:00
Faith Ekstrand
f299249d8b drm-uapi: Import the new NVIDIA modifiers
Imported from kernel commit eef295a850820 of drm-misc-fixes

Backport-to: 25.2
Reviewed-by: James Jones <jajones@nvidia.com>
(cherry picked from commit 3247452b2c)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:17 -08:00
Sagar Ghuge
594ae17ec9 anv: Drop unwanted untyped flush for AS query
CmdWriteAccelerationStructuresPropertiesKHR writes the data with MI
commands, we no longer dispatch shaders to write the properties.
As a result, we don't need to flush untyped cache.

Fixes: f0e18c475b ("intel: remove GRL/intel-clc")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 14194e59a4)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:15 -08:00
Benjamin Cheng
02ba16ec03 radv/video: Fix dummy DPB addresses
This fixes the VVL PositiveVideoDecodeAV1.* tests, which trigger error
concealment. These DPB addresses would not be normally used, but get
used by the error concealment path.

Fixes: d103b76ad6 ("radv/video: add VK_KHR_video_decode_av1 support.")
Reviewed-by: David Rosca <david.rosca@amd.com>
(cherry picked from commit 82d944b388)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:14 -08:00
Lars-Ivar Hesselberg Simonsen
1460a0319f panvk: Fix IUB decode
The base address used for bounds checking the entry was wrong. Directly
pass the end_of_entry address instead.

Fixes: db4bcd48d7 ("panvk: Fix IUB decode")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
(cherry picked from commit 89293120f0)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:09 -08:00
Dylan Baker
d48e4a3f3b .pick_status.json: Update to 294e72e2b5
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-13 08:10:00 -08:00
Christian Gmeiner
d3d820d6ef anv: Fix needs_temp_copy() incorrectly matching depth/stencil formats
The needs_temp_copy() function was incorrectly identifying some
depth/stencil formats as needing RGB<->RGBA conversion.

VK_FORMAT_D32_SFLOAT_S8_UINT maps to PIPE_FORMAT_Z32_FLOAT_S8X24_UINT,
which has 3 channels (F32 depth, UP8 stencil, X24 padding). The
component count check (== 3) was matching this as an RGB color format,
causing depth/stencil images to incorrectly use the RGB conversion path.

Add an explicit vk_format_is_depth_or_stencil() check before the
component count test to ensure depth/stencil formats always use the
direct copy path.

Fixes: f97b51186f ("anv: intermediate RGB <-> RGBX copy for HIC")
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 0be53b2ed8)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:55 -08:00
Mario Kleiner
70e3af188d wsi/display: Allow atomic modeset for change of Colorspace or HDR poperties
At least some drivers need a full modeset to change the Colorspace
property or to en-/disable HDR mode. E.g., at least amdgpu-kms as
tested under Linux 6.8 on Polaris needs it. Otherwise the atomic
commit for disabling HDR in _wsi_display_cleanup_state() will fail,
and the connector stays stuck in HDR mode after vkDestroySwapchainKHR().

Fixes: 1ed78dd7ec ("wsi/display: Clean up DRM hdr/color state on swapchain destruction")
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Autumn Ashton <misyl@froggi.es>
(cherry picked from commit ba82d36dce)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:55 -08:00
Mario Kleiner
b777384e99 wsi/display: Initially set default HDR metadata from EDID for HDR modes
For a selected non-default imageColorSpace during swapchain creation,
make sure that proper HDR setup also works even if a client app does not
explicitly call vkSetHdrMetadataEXT() in time.

Assign the EDID provided metadata here, so the 1st atomic commit will
set Colorspace and HDR metadata properties on the connector, to make sure
HDR or other wide color gamut modes get enabled.

Without this, the chain->color_outcome_serial would stay at zero and
the properties would not ever get assigned during drm_atomic_commit(),
leaving HDR disabled on the display sink.

Fixes: 13137393f6 ("wsi/display: Expose HDR10 colorspace based on EDID")
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Autumn Ashton <misyl@froggi.es>
(cherry picked from commit 19b2e3b81b)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:54 -08:00
Mario Kleiner
c4e0f4d917 wsi/display: Accept 0 nits for HDR light level properties for "undefined"
CTA-861-G section 6.9.1 Static Metadata Type 1 declares that zero values
for different groups of HDR Metadata properties are allowed, including
zero nits values for max display mastering luminance, max content light
level, max frame-average light level and min display mastering luminance.

A zero value is meant to be treated by the video sink as "undefined" /
"unknown", and handled accordingly. This is common for dynamically
generated visual content.

Therefore don't assert on some minimum nits level > 0, but only check for
a non-negative level.

Fixes: b4176393a0 ("wsi/display: Implement VK_EXT_hdr_metadata on KHR_display swapchain")
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Autumn Ashton <misyl@froggi.es>
(cherry picked from commit 19dc09aded)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:54 -08:00
Danylo Piliaiev
f3268818d5 tu: Use cmd->rp_trace u_trace for draw calls
Fixes: 707c97f634 ("tu: Add tracepoints around draws, with shader sha1s.")

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
(cherry picked from commit c04e375588)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:53 -08:00
Yiwei Zhang
9e809986f6 panvk: fix mem alloc size for VkBuffer backed by imported blob AHB
For AHB VkBuffer import, the allocationSize comes from the raw external
AHB props query and it can be larger than the underlying buffer memory
requirement. So we must respect the allocationSize for the actual mem
import to support mapping the whole AHB size, and the dedicated buffer
info has to be stripped to obey the spec.

Test: CtsNativeHardwareTestCases no longer crashes on debug build panvk

Fixes: 66bbd9eec8 ("panvk: implement AHB image deferred init and memory alloc")
Tested-by: Valentine Burley <valentine.burley@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
(cherry picked from commit 4ec2a921d3)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:53 -08:00
Lionel Landwerlin
8a813632c3 vulkan/runtime: simplify robustness state hashing
We're doing the same in vk_pipeline_precomp_shader_create().

Also fixes valgrind warning due to uninitialized fields

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
(cherry picked from commit fc6d17a290)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:52 -08:00
Daniel Schürmann
b17381dc8d radv/null_device: set more options which affect compilation
Cc: mesa-stable
(cherry picked from commit 23ef756496)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:52 -08:00
David Rosca
7aa2c70759 radv/video: Add NULL checks for picture parameters
Fixes vk_layer_validation_tests PositiveVideoDecode.* and
PositiveVideoDecode*.InlineSessionParams

Cc: mesa-stable
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
(cherry picked from commit bd151bf8b2)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:51 -08:00
David Rosca
11a4adec73 radv/video: Correctly handle no feedback query for encode
Fixes vk_layer_validation_tests PositiveVideoEncodeAV1.*

Cc: mesa-stable
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
(cherry picked from commit 23a3587aa6)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:50 -08:00
David Rosca
e21d417234 vulkan/video: Avoid NULL pointers in session parameters
Always copy parameters that are not guarded by a flag, zero init
the structs if not provided by application.

Fixes vk_layer_validation_tests PositiveVideoEncode*.GetEncodedSessionParams

Cc: mesa-stable
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
(cherry picked from commit 6a1c6ab95b)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:50 -08:00
Benjamin Cheng
dafde3434a vulkan/video: NULL check codec-specific chain
It seems applications are allowed to do no-op updates by not passing any
codec-specific extension structures.

Cc: mesa-stable
Reviewed-by: David Rosca <david.rosca@amd.com>
(cherry picked from commit 4d22427079)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:49 -08:00
Timothy Arceri
38e258d5d4 mesa: skip redundant uniform update optimisation if unsafe
If multiple contexts are updating uniform values we can't assume
a uniform update can skip flushing.

Fixes: b32e20e630 ("mesa: skip redundant uniform updates for glUniformHandle")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14129

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 34db720660)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:48 -08:00
Lionel Landwerlin
478d92171d anv: avoid invalid timestamp generation due to skipped commands
We skip the stall emission for STATE_BASE_ADDRESS since this one can
be skipped on Gfx12.5+ and instead add a new sba tracepoint that has
valid timestamps.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 0147908a89 ("anv: predicate emission of STATE_BASE_ADDRESS")
Reviewed-by: Casey Bowman <casey.g.bowman@intel.com>
(cherry picked from commit cff047280a)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:48 -08:00
Emma Anholt
96d959986c v3dv: Fix assertion failure for not-found primary_fd during enumeration.
Found when I had v3dv built in my aarch64 turnip setup.

Fixes: 451a0bd490 ("v3dv: use v3d primary node for VK_EXT_physical_device_drm")
(cherry picked from commit bb532a7a39)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:47 -08:00
Yiwei Zhang
c167b0a816 glcpp/meson: fix libglcpp generated header dependency
Explicitly declare glcpp-parse.h as a file dependency to ensure
glcpp_parse custom target completes before compiling glcpp-lex.c.

Cc: mesa-stable
(cherry picked from commit 53482178ef)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:47 -08:00
Lionel Landwerlin
ec8518d123 brw: avoid invalid URB messages
Some new CTS tests have geometry shader looking like this :

   void main()
   {
      gl_Position = gl_in[0].gl_Position;
      EmitVertex();
      EndPrimitive();
      // <-- some storage buffer write
   }

The generate shader has :
   - a message to write the position
   - a message to write to the storage buffer
   - a final message to end the thread

This generates an empty EOT URB messages which is apparently not legal
(simulation complains, HW hangs) :

send(8)         nullUD          g126UD          nullUD          0x04088007                0x00000000
                urb MsgDesc: offset 0 SIMD8 write masked  mlen 2 ex_mlen 0 rlen 0 { align1 1Q A@1 EOT };

Instead emit a write with actual data and the mask set at 0 to discard
the effect :

mov(8)          g127<1>UD       0x00000000UD                    { align1 WE_all 1Q };
mov(8)          g125<1>UD       0x00000000UD                    { align1 1Q };
send(8)         nullUD          g126UD          g125UD          0x04088007                0x00000040
                urb MsgDesc: offset 0 SIMD8 write masked  mlen 2 ex_mlen 1 rlen 0 { align1 1Q A@1 EOT };

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit ff57c31696)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:46 -08:00
Ian Romanick
269a6fe030 brw: Correctly generate conditional modifier for BFN
Fixes: 4193895145 ("brw/cmod: Enable limited cmod propagation for BFN")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
(cherry picked from commit 34fe598b39)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:46 -08:00
Georg Lehmann
4b46e87296 aco/gfx10_3: work around NSA hazard
4+ dword NSA can hang if exec becomes non-zero again directly before
the instruction.

Foz-DB Navi21:
Totals from 608 (0.74% of 82161) affected shaders:
Instrs: 945138 -> 946431 (+0.14%)
CodeSize: 5171580 -> 5176864 (+0.10%)
Latency: 13356895 -> 13357113 (+0.00%)
InvThroughput: 3043234 -> 3043236 (+0.00%); split: -0.00%, +0.00%

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9852
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13981
Cc: mesa-stable

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
(cherry picked from commit b2172467d1)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:45 -08:00
David Rosca
3f169d14d2 radv/video: Fix AV1 bidir compound encode with order_hint disabled
Cc: mesa-stable
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
(cherry picked from commit bcb6e6b6e6)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:45 -08:00
David Rosca
3a63355583 radv/video: Don't require encode FW version >= interface version
Otherwise this breaks backwards compatibility when bumping interface
version for new features.

Cc: mesa-stable
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
(cherry picked from commit 96db490318)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:44 -08:00
David Rosca
7fb0030c06 radeonsi/vcn: Fix AV1 bidir compound encode with order_hint disabled
Cc: mesa-stable
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
(cherry picked from commit 1a8a8db8c5)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:44 -08:00
Yiwei Zhang
e96e71fb79 llvmpipe: misc fixes for sparse binding
This change:
1. Move size validation within sparse binding, but not escape to
   non-sparse code path.
2. Error out if sparse is requested on unsupported platforms.

Fixes: d747c4a874 ("lavapipe: Implement sparse buffers and images")
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
(cherry picked from commit e0acc5c2b4)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:43 -08:00
Yiwei Zhang
eface4be0d llvmpipe: add a missing alloc error handling in fd import
Fixes: d74ea2c117 ("llvmpipe: Implement dmabuf handling")
Suggested-by: Christian Gmeiner <cgmeiner@igalia.com>
(cherry picked from commit 66414c6b70)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:42 -08:00
Yiwei Zhang
36aff3454b llvmpipe: fix udmabuf mmap error check
Upon failing to mmap, MAP_FAILED (void *)-1 is returned instead of NULL.

Fixes: d74ea2c117 ("llvmpipe: Implement dmabuf handling")
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
(cherry picked from commit 3e07f57d4a)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:42 -08:00
Yiwei Zhang
3d9d9ca09b llvmpipe: zero is also a valid fd
Fixes: a062544d3d ("llvmpipe: Use an anonymous file for memory allocations")
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
(cherry picked from commit 3a655c212b)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:41 -08:00
Samuel Pitoiset
e817b525d8 radv,aco: wait for all VMEM loads when the prolog loads large 64-bit attributes
Not the most optimal solution but 64-bit vertex attributes are rarely
used. Could still revisit if we find a real use case that matters.

This fixes recent VKCTS coverage:

dEQP-VK.pipeline.fast_linked_library.vertex_input.component_mismatch.r64g64b64.*_to_dvec2
dEQP-VK.pipeline.shader_object_.*.vertex_input.component_mismatch.r64g64b64.*_to_dvec2

Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14243
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit a0d607bfdb)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:41 -08:00
Samuel Pitoiset
8eec239517 aco: fix reserving VGPRs for 64-bit attributes in VS prologs
Otherwise the fetch index would be overwritten if the attribute format
is 64-bit and more than 2 components are loaded.

Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14242
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit ba5bf81aa2)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:40 -08:00
Sagar Ghuge
e9f677dff9 anv: Use correct engine class for companion RCS
Fixes: 6f138fe723 ("anv: avoid null pointer access in utrace copies on CCS")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 43d98a3f1a)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:38 -08:00
Dylan Baker
4f5c1c6c75 .pick_status.json: Update to 04a0d512fa
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2025-11-07 08:14:36 -08:00
Dylan Baker
944ec88ca5 VERSION: bump for 25.3.0-rc4
Some checks failed
macOS-CI / macOS-CI (dri) (push) Has been cancelled
macOS-CI / macOS-CI (xlib) (push) Has been cancelled
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
2025-11-06 09:02:18 -08:00
Georg Lehmann
7d4557bae8 radv: do not report wave32 in gl_SubgroupSize for Doom Dark Ages
Some checks failed
macOS-CI / macOS-CI (dri) (push) Has been cancelled
macOS-CI / macOS-CI (xlib) (push) Has been cancelled
The shaders in question use:

(memory_load + (gl_SubgroupSize - 1)) & ~(gl_SubgroupSize - 1)

My guess is that this is supposed to be the subgroup size of whatever
produced the value, not the subgroup size in this shader.
And because in the consumer the workgroup size is 32, we use wave32.

Fixes: a2d3cbac2a ("radv: determine subgroup/wave size early")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14187

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 83e9ae2d5c)

Conflicts:
	src/amd/vulkan/radv_instance.c
	src/amd/vulkan/radv_instance.h
	src/util/driconf.h

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-05 10:18:26 -08:00
Yiwei Zhang
4482281292 panvk: fix sample shading of internal blend shader for MSAA
Align with gallium side. When fixed-function blending is not available,
the internal blend shader is used. This is handled by a single ST_TILE
in the blend shader with the current sample ID, which requires sample
shading enablement.

Cc: mesa-stable
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
(cherry picked from commit 763d2418b8)

CI fails removed from cherry-pick as the file doesn't exist on stable,
and the main branch change has only removals.

Conflicts:
	src/panfrost/ci/panfrost-g925-fails.txt

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-05 10:16:55 -08:00
Connor Abbott
1dadb38d7b tu: Fix attachment stores with subpasses with partial views
Subpasses can have different view masks, although this isn't often used.
So we can't use the view mask of the last subpass when deciding what to
store, instead we have to use the same used_views field that's used by
loads and clears.

Noticed by upcoming tests for VK_QCOM_multiview_per_view_render_areas.

Cc: mesa-stable
(cherry picked from commit c0b5c04b84)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 14:12:21 -08:00
Connor Abbott
ef9457d119 tu: Rename tu_render_pass_attachment::clear_views to used_views
It's not just used for clears, it was already used for loads and it
needs to be used for stores too so clear_views was a confusing name.

Cc: mesa-stable
(cherry picked from commit 6c3ed74ed2)

Conflicts:
	src/freedreno/vulkan/tu_pass.cc
	src/freedreno/vulkan/tu_pass.h

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 14:12:20 -08:00
Lionel Landwerlin
9696921018 anv: avoid null pointer access in utrace copies on CCS
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 3e0ad0176b ("anv: Emit state cache invalidation after every compute dispatch")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 6f138fe723)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Lionel Landwerlin
30678337dd u_trace: reserve chunk space before emitting copies
Some implementations can emit tracepoints when copying u_trace
buffers. It's important to reserve the slots we want to copy into
before emitting the copies so that both processes don't clash with one
another.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
(cherry picked from commit df5f92d114)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Danylo Piliaiev
eb2668aad7 vulkan: Always fill DS state for EXT_dynamic_rendering_unused_attachments
If renderpass has D/S attachment, but pipeline has D/S as UNDEFINED,
D/S should be properly disabled for the pipeline. The easiest way is to
ensure that D/S state is valid when pipeline's D/S format is UNDEFINED.
So we always create VkPipelineDepthStencilStateCreateInfo.

CC: mesa-stable

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
(cherry picked from commit 2798ef7bfd)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Ryan Houdek
e23c722170 freedreno/fdl: Fix typo in tiled_to_linear_2cpp
The non-aarch64 path was copying in the wrong direction.

Fixes: 7a5a33e0e3 ("freedreno/fdl: Add tiling/untiling implementation for a6xx/a7xx")
(cherry picked from commit 455eb2c751)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Mel Henning
2ad12150e8 nak/opt_lop: Don't handle modifiers in dedup_srcs
The handling in dedup_srcs was incorrect because it would apply the
modifier from srcs[i] to the LUT without removing the modifier from the
instruction. We can fix and simplify this code by removing all modifiers
before the dedup_srcs() call, which we were doing immediately after the
call anyway.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13966
Fixes: 66c9c40f68 ("nak: Handle modifiers in dedup_srcs() in opt_lop()")
Reviewed-by: Seán de Búrca <sdeburca@fastmail.net>
Reviewed-by: Lorenzo Rossi <git@rossilorenzo.dev>
(cherry picked from commit 041216e605)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Konstantin Seurer
3109237d7c llvmpipe: Always recompute 1/w
The value depends on the tgsi_interpolate_loc which is not constant for
the loop. llvm should be able to cse in cases where they are the same.

cc: mesa-stable

Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit aa28fcb610)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Konstantin Seurer
775652e08b gallivm/nir/soa: Use the sign of src1 for imod
This is the behavior specified by the nir opcode, the spirv spec and
required by maintenance8.

cc: mesa-stable

Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 4d30da6599)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Konstantin Seurer
51285c6715 lavapipe: Bump MAX_DESCRIPTOR_UNIFORM_BLOCK_SIZE
cc: mesa-stable

Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 25e678a37d)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Konstantin Seurer
c70fd7f766 lavapipe: Zero image null descriptors
The size queries for images do not use function pointers so we need to
be careful that width, height and depth are 0.

cc: mesa-stable

Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit d6dd96e1c7)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Konstantin Seurer
d527eedb15 lavapipe: Bump maxPrimitiveCount
The vulkan spec requires at least 2^29-1.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14212
cc: mesa-stable

Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit ff145d2ddc)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Marek Olšák
7c18540961 Revert ABI breakage "amd: Add user queue HQD count to hw_ip info"
This reverts commit 56d758d321.

It broke ABI between Mesa and libdrm, causing crashes due to stack smashing.

See: https://gitlab.freedesktop.org/mesa/libdrm/-/issues/121#note_3172362

Fixes: 56d758d321
(cherry picked from commit 5d92c92ce5)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Janne Grunau
90b6c3a8ac hk: Report the correct plane count in VkDrmFormatModifierProperties2?EXT
Fixes import of planar formats like NV12 in gtk4. Allows
`gst-launch-1.0 v4l2src ! gtk4paintablesink` to use vulkan instead of
falling back to OpenGL.

Closes: #14217
Cc: mesa-stable
Signed-off-by: Janne Grunau <j@jannau.net>
(cherry picked from commit 83b97379dc)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Alyssa Rosenzweig
3dab73159b asahi,ail: fix multi-plane imports
We need to handle plane offsets everywhere. I noticed this broken before but
didn't realize it was a GL driver issue. Fix is easy, wrote this on my sofa
while waking up in the morning.

Fixes gst-launch-1.0 v4l2src ! glimagesink

Note that cheese & snapshot both still hang for some reason due to
libgstpipewire, but the Mesa side should be fine now.

Closes: #14217
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Cc: mesa-stable
(cherry picked from commit aa9f937116)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Ian Romanick
55a37838b9 elk: Apply vgrf127 workaround in more cases
No shader-db changes on Broadwell. Older platforms were not tested.

Fixes: e7b7d572b3 ("intel/fs/ra: Re-arrange interference setup")
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
(cherry picked from commit 2e8b89ec60)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Ian Romanick
9bad1beb98 brw: Apply Gfx9 vgrf127 workaround in more cases
No shader-db changes on any Intel platform.

fossil-db:

Skylake
Intel(R) HD Graphics 530 (SKL GT2)
Totals:
Cycle count: 57669758527 -> 57669757913 (-0.00%); split: -0.00%, +0.00%

Totals from 10 (0.00% of 1736875) affected shaders:
Cycle count: 274949 -> 274335 (-0.22%); split: -0.36%, +0.14%

This change is likely due to subtle differences of different registers
being allocated.

In addition, fossils/google-meet-clvk/BgBlur.1f58fdf742c27594.1.foz and
fossils/google-meet-clvk/Relight.1f58fdf742c27594.1.foz stopped failing
EU validation on Gfx9 platforms.

Closes: #14171
Fixes: e7b7d572b3 ("intel/fs/ra: Re-arrange interference setup")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
(cherry picked from commit 3e6af6c5bb)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
Dylan Baker
3086692bcd .pick_status.json: Update to 27d9e4ec2a
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-04 10:16:42 -08:00
David Rosca
ce6c6a7a57 radv/video: Only use write_memory for encode feedback with full support
write_memory is used after encoding every frame to mark the feedback
buffer as ready. Only use it when write_memory can work without PCIe
atomics support.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 874e02003a)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-03 13:17:41 +01:00
David Rosca
629a0a4dcc radv/video: Introduce two levels of write_memory support
Print warning when using write_memory with firmwares that require
PCIe atomics support.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 8e1d74bbb4)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-11-03 13:17:38 +01:00
Icenowy Zheng
12c82aaa82 gallivm: orcjit: remember Context in addition to ThreadSafeContext
The llvm::orc::ThreadSafeContext object wraps an llvm::Context and keeps
its reference.

As we are no longer able to squeeze out Context from ThreadSafeContext
in LLVM 21, do not let ThreadSafeContext create Context implicitly for
LLVM 21, instead explicitly create Context and then remember it.

This also eliminates the code creating a Context that is never disposed.

Fixes: cd129dbf8a ("gallivm: support LLVM 21")
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
(cherry picked from commit cc60a7a39d)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-31 12:07:14 -07:00
Samuel Pitoiset
1e885e7a88 radv: add a workaround for illegal depth/stencil descriptors with No Man's Sky
Using descriptors with both depth and stencil aspects is illegal in
Vulkan and this hangs the GPU.

Use NULL descriptors to mitigate the issue. Note that AMDVLK also
ignores them.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13325
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit cb4e0c4140)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-31 12:07:13 -07:00
Tapani Pälli
3ddddf78b4 anv: bring back some lost game drirc workarounds for subgroups
Fixes: d39e443ef8 (" anv: add infrastructure for common vk_pipeline")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit f48df6f45c)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-31 12:07:12 -07:00
Gert Wollny
86313f9571 r600/sfn: AR loads are not dependend on the future and other code blocks
If the AR is loaded from a register changing that register in a loop was
resulting in a scheduling failure because the AR load was made dependend
on a later instruction. Fix the dependencies by only using dependencies on
older instruuctions in the same block.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14114
Fixes: d21054b4bc ("r600/sfn: Add pass to split addess and index register loads")

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
(cherry picked from commit 43d9765e35)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-31 12:07:12 -07:00
Paul Gofman
a46307a732 driconf: add a workaround for Investigation Stories : gunsound
CC: mesa-stable
(cherry picked from commit 63aec75981)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-31 12:07:11 -07:00
Dylan Baker
0a0d08dfe0 .pick_status.json: Update to e44a776f47
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-31 12:07:09 -07:00
Faith Ekstrand
182877f3c8 nvk: Don't re-initialize the descriptor writer if the set matches
The logic here before was wrong.  In the case where the set is the same,
it would avoid the flush but then re-initialize anyway, loosing the
dirty information and causing us not to actually flush out all the
descriptors.

Fixes: 1f0fda22f7 ("nvk: Flush descriptor set maps")
(cherry picked from commit 2f6b3b6b91)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:14:05 -07:00
Eric Engestrom
9aeac1e0a7 util/meson: don't build libmesa_util_clflush unless needed
Fixes: efbecd93ba ("util: Build util/cache_ops_x86.c with -msse2")
(cherry picked from commit 0fe0acd4c3)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:14:05 -07:00
Eric Engestrom
46f0422165 util/meson: don't build libmesa_util_clflushopt unless needed
Fixes: 555881e574 ("util/cache_ops: Add some cache flush helpers")
(cherry picked from commit ccf33664e8)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:14:04 -07:00
Samuel Pitoiset
f69d1abfcf radv: ignore dual-source blending when blending isn't enabled for MRT0
The Vulkan spec says:
    "VUID-vkCmdDraw-maxFragmentDualSrcAttachments-09239
     If blending is enabled for any attachment where either the source
     or destination blend factors for that attachment use the secondary
     color input, the maximum value of Location for any output attachment
     statically used in the Fragment Execution Model executed by this
     command must be less than maxFragmentDualSrcAttachments"

Which means it must be disabled.

Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14190
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit b2badb2b24)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:14:00 -07:00
Eric Engestrom
770e095766 asahi/virtio: fix memleak
Fixes: c64a2bbff5 ("asahi: port to stable uAPI")
(cherry picked from commit fdef10916e)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:13:58 -07:00
Dmitry Osipenko
205fe1a245 virtio/vdrm: Fix varying offsets of struct vdrm_device members
Struct virgl_renderer_capset_drm has a varying size depending on whether
AMDGPU driver is enabled or not. This breaks offset of struct vdrm_device
members for non-AMD drivers when Mesa is built with multiple native context
drivers including the AMD driver. Place varying capsets in the end struct
vdrm_device to mitigate the issue.

Fixes: 5736280730 ("virtio/vdrm: add ENABLE_DRM_AMDGPU for c_args")
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
(cherry picked from commit bd8377bb04)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:13:58 -07:00
Mike Blumenkrantz
093c7d9d8e zink: don't destroy old push layout when enabling fbfetch descriptor
this may be in use by programs, and adding tracking/refcounting just to
delete a descriptor layout isn't worth the effort

cc: mesa-stable

(cherry picked from commit 272cf1db8e)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:13:57 -07:00
Gert Wollny
2c67b0fac6 r600/sfn: make sure kill and update_exec don't happen in one group
v2: - Correctly test in multi-slot split whether the group has kill if
      we want to add a multi-slot op.
    - update group_has_predicate if an according vector op was added

Fixes: 359bfc3138 ("r600/sfn: make sure that kill and update pred are not in the same group")

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
(cherry picked from commit 317345cc98)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:13:57 -07:00
Gert Wollny
e082f6b6c0 r600/sfn: Track whether a ALU group has a exec flag update
Fixes: 359bfc3138 ("r600/sfn: make sure that kill and update pred are not in the same group")

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
(cherry picked from commit 0d065a2421)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:13:56 -07:00
Gert Wollny
a12369eb3d r600/sfn: move some common code into try_readport
Fixes: 359bfc3138 ("r600/sfn: make sure that kill and update pred are not in the same group")

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
(cherry picked from commit 51e7c477d6)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:13:55 -07:00
Gert Wollny
6670d0742b r600/sfn: extract function to update group after instr insert
Fixes: 359bfc3138 ("r600/sfn: make sure that kill and update pred are not in the same group")

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
(cherry picked from commit a7f477b51f)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:13:55 -07:00
Mike Blumenkrantz
a7a020dde6 zink: collapse mesh pipeline fetching and binding conditionals
this avoids taking the wrong conditional if a pipeline fetch fails

cc: mesa-stable

(cherry picked from commit 343eef990e)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:13:54 -07:00
Mike Blumenkrantz
7e15070ee1 zink: collapse gfx pipeline fetching and binding conditionals
this avoids taking the wrong conditional if a pipeline fetch fails

cc: mesa-stable

(cherry picked from commit 0b24fd174a)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:13:54 -07:00
Sagar Ghuge
0edb1852a7 vulkan/runtime: Fix typo in stack size calculation
Fixes: 69a04151db ("vulkan/runtime: add ray tracing pipeline support")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
(cherry picked from commit a00560f763)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:13:53 -07:00
Alyssa Rosenzweig
3ce875a2d0 anv: use D3D-compatible texturing for Proton
Intel & AMD Direct3D drivers modify their rounding behaviour for texturing to
match Direct3D expectations. Such behaviour is not conformant in Vulkan, and
Intel hardware lacks a reasonable way to get NVIDIA's behaviour (which uniquely
works for Vulkan & Direct3D). The second best choice is to use
Direct3D-compatible behaviour for Proton (via driconf) and our current
Vulkan-conformant behaviour everywhere else. Given the APIs diverge and there is
no Vulkan extension to control the behaviour explicitly, driconf'ing on the
engineName is the reasonable solution.

anv already has a anv_force_filter_addr_rounding driconf option to force
Direct3D behaviour for certain Direct3D titles. Here we simply apply it to all
D3D10+ titles, aligning us with the Windows driver.

Note that D3D9 does not have this behaviour. We therefore use standard Vulkan
behaviour for D3D9 to avoid breaking D3D9 titles, even though the engineName is
the same as D3D10+.

This is the same solution radv uses, they call it radv_disable_trunc_coord. We
could unify the driconf entries later.

See https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38098#note_3166306
for a more detailed analysis, as well as the linked references:

   https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27337
   https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25911
   https://github.com/HansKristian-Work/vkd3d-proton/pull/1884

This fixes misrendering in piles of Direct3D games run on anv via Proton,
including Assassin's Creed Valhalla.

Cc: mesa-stable
Closes: #13886
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Co-authored-by: Calder Young <cgiacun@gmail.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
(cherry picked from commit 7a71952762)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:13:52 -07:00
Dylan Baker
fd777ce645 .pick_status.json: Update to 3334284845
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38268>
2025-10-30 12:13:41 -07:00
Dylan Baker
315b688976 VERSION: bump for rc3
Some checks failed
macOS-CI / macOS-CI (dri) (push) Has been cancelled
macOS-CI / macOS-CI (xlib) (push) Has been cancelled
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
2025-10-30 11:35:06 -07:00
Job Noorman
3a71d94735 spirv: don't set in_bounds for structs
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The arr::in_bounds field was set unconditionally for every deref created
for a chain. For struct derefs, which don't have this field, this would
write to an unused memory location, which is probably why this never
caused issues.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: f19cbe98e3 ("nir,spirv: Preserve inbounds access information")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit 0ac55b786a)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-28 12:53:18 -07:00
Benjamin Cheng
8a2bf930bb radv/video: Override H265 SPS unaligned resolutions
VCN requires 64x16 alignment for HEVC. When the app requests non-aligned
resolutions, make up for it with conformance window cropping.

Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Cc: mesa-stable
(cherry picked from commit cef8eff74d)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-28 12:53:18 -07:00
Benjamin Cheng
ac492d42be radv/video: Override H265 SPS block size parameters
VCN only supports this set of parameters.

Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Cc: mesa-stable
(cherry picked from commit 84b6d8e0d7)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-28 12:53:17 -07:00
Lionel Landwerlin
2e17fd0cb2 vulkan/render_pass: Add a missing sType
Fixes: 3a204d5cf3 ("vulkan/render_pass: Add a better helper for render pass inheritance")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
(cherry picked from commit c5740c2548)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-28 12:53:16 -07:00
Marek Olšák
9311f170c4 zink: fix mesh and task shader pipeline statistics
Fixes: 9d0e73335a - zink: enable GL_EXT_mesh_shader
(cherry picked from commit 41a8c4d37c)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-28 12:53:15 -07:00
Dylan Baker
3e227a04b1 .pick_status.json: Update to 32b646c597
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-28 12:53:10 -07:00
Sagar Ghuge
f63a5df30b brw/rt: fix ray_object_(direction|origin) for closest-hit shaders
We were returning world BVH level for origin/direction, this commit
fixes by retuning correct object BVH level origin/direction.

Fixes: aaff191356 ("brw/rt: fix ray_object_(direction|origin) for closest-hit shaders")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 89fbcc8c34)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Sagar Ghuge
9ba765e3e3 brw/rt: Move nir_build_vec3_mat_mult_col_major helper to header
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 3edeb1e191)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Mike Blumenkrantz
8010d0cd39 zink: disable primitiveFragmentShadingRateMeshShader feature
features are auto-enabled, but some of them cause validation errors
which are simple to work around

Fixes: 90f3c57337 ("zink: hook up VK_EXT_mesh_shader")
(cherry picked from commit a2ef369abf)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Job Noorman
f1f32d557e ir3/ra: fix assert during file start reset
While accounting for an input register's merge set when resetting the
file start after the preamble, we implicitly assume that the allocated
register is the preferred one by asserting that the register's merge set
offset is not smaller than its physreg (to prevent an underflow).
However, inputs are not guaranteed to have their preferred register
allocated which causes the assert to get triggered.

Fix this by only taking the whole merge set into account for inputs that
actually got their preferred register allocated.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 9d4ba885bb ("ir3/ra: make main shader reg select independent of preamble")
(cherry picked from commit f84d85790e)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Natalie Vock
05e5db1a4d nir/lower_shader_calls: Repair SSA after wrap_instrs
Wrapping jump instructions that are located inside ifs can break SSA
invariants because the else block no longer dominates the merge block.
Repair the SSA to make the validator happy again.

Cc: mesa-stable
(cherry picked from commit 50e65dac79)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Taras Pisetskyi
5ae8474029 drirc/anv: force_vk_vendor=-1 for Wuthering Waves
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12459

Signed-off-by: Taras Pisetskyi <taras.pisetskyi@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit dcd9b90aff)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Mary Guillemard
b3470359bf hk: Allocate the temp tile buffer in copy_image_to_image_cpu
We may require a bigger more than 16KiB to handle the image copy.
We now always allocate a buffer to handle it properly fixing the
remaining failures on VKCTS 1.4.4.0 for HIC.

Fixes: 5bc8284816 ("hk: add Vulkan driver for Apple GPUs")
Signed-off-by: Mary Guillemard <mary@mary.zone>
(cherry picked from commit d37ba302d0)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Mary Guillemard
5e1a88cea0 hk: Make width and height per block in HIC
We were assuming that every formats used for HIC had a block widgh and
height of 1x1.

This is wrong for compressed formats like BC5, ASTC, ect.

Fixes: 5bc8284816 ("hk: add Vulkan driver for Apple GPUs")
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Eric Engestrom <eric@igalia.com>
(cherry picked from commit 887f06a966)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Sagar Ghuge
040453857b anv: Call brw_nir_lower_rt_intrinsics_pre_trace lowering pass
Call this pass before nir_lower_shader_calls().

Fixes: d39e443e ("anv: add infrastructure for common vk_pipeline")
Cc: mesa-stable
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 006085e676)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Mary Guillemard
28e172e956 hk: Remove unused allocation in queue_submit
Unused and leaking memory, found with address sanitizer.

Fixes: c64a2bbff5 ("asahi: port to stable uAPI")
Signed-off-by: Mary Guillemard <mary@mary.zone>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit 64131475a8)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Mary Guillemard
74880f8954 hk: Disable 1x in sampleLocationsSampleCounts
We don't support it, everyone dropped support for that, let's not expose it.

Fixes: 5bc8284816 ("hk: add Vulkan driver for Apple GPUs")
Signed-off-by: Mary Guillemard <mary@mary.zone>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit 7e636d52f1)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Mary Guillemard
f02f5e217f hk: Fix maxVariableDescriptorCount with inline uniform block
Same problem as NVK on VKCTS 1.4.4.0

Fixes: 5bc8284816 ("hk: add Vulkan driver for Apple GPUs")
Signed-off-by: Mary Guillemard <mary@mary.zone>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit 8447b99f61)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Dylan Baker
d9636807f7 intel/compiler/brw: Add assert that we don't have a negative value
Coverity notices that `nir_get_io_index_src_number` could return -1, and
that we use it to index an array. It cannot understand that -1 only
happens for unhandled enum values, but all of these are handled. Add an
assert to help it out.

CID: 1667234
Fixes: 37a9c5411f ("brw: serialize messages on Gfx12.x if required")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit a5b9f428f9)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Dylan Baker
b768139858 .pick_status.json: Update to 45a762727c
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Olivia Lee
498a25cfb8 hk: fix data race when initializing poly_heap
hk_heap is called during command buffer recording, which may be
concurrent, so writing dev->heap without synchronization is a data race.

Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Fixes: 5bc8284816 ("hk: add Vulkan driver for Apple GPUs")
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
(cherry picked from commit bca29b1c92)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-27 09:34:33 -07:00
Connor Abbott
9728bbf7b0 tu: Also disable stencil load for attachments not in GMEM
We were accidentally still emitting loads for D32S8 resolve attachments.

Cc: mesa-stable
(cherry picked from commit a3652af380)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 08:07:30 -07:00
Connor Abbott
f142fdc273 tu: Fix 3d load path with D24S8 on a7xx
We need to always use the FMT6_Z24S8_AS_R8G8B8A8 format for GMEM even if
UBWC is disabled, as already done for the 2d store path. Because we
use the pre-baked RB_MRT_BUF_INFO register value, this means we have to
override it.

Cc: mesa-stable
(cherry picked from commit 9417ce287c)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 08:07:28 -07:00
Connor Abbott
1c52a94428 tu: Don't patch GMEM for input attachments never in GMEM
This can happen if we resolve to a resolve attachment and then use that
resolve attachment as an input attachment in a later subpass. We don't
need to put it in GMEM, but it's still considered "written" because
input attachment reads need a dependency after the resolve.

MSRTSS input attachment tests effectively created such a scenario after
lowering to transient multisample attachments and inserting resolves.

Cc: mesa-stable
(cherry picked from commit d491a79027)

Conflicts:
	src/freedreno/vulkan/tu_pass.cc

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 08:07:28 -07:00
Faith Ekstrand
2cfd3c52b2 panvk/shader: Use the right copy size for deserializing dynamic UBOs/SSBOs
Fixes: 563823c9ca ("panvk: Implement vk_shader")
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
(cherry picked from commit 64ad337036)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:55:08 -07:00
Faith Ekstrand
606ebb042e panvk/shader: [de]serialize desc_info.max_varying_loads
Fixes: de86641d3f ("panvk: Limit AD allocation to max var loads in v9+")
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
(cherry picked from commit a546484ed9)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:55:07 -07:00
Samuel Pitoiset
424f37b348 radv: dirty dynamic descriptors when required
The user SGPRS might be different and dynamic descriptors need to be
re-emitted again

This fixes a regression with ANGLE, and VCKTS is currently missing
coverage.

Fixes: a47952d495 ("radv: upload and emit dynamic descriptors separately from push constants")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14146
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 54a6c81d3a)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:55:06 -07:00
Faith Ekstrand
7f75931019 nvk: Capture/replay buffer addresses for EDB capture/replay
Fixes: 3f1c3f04be ("nvk: Advertise VK_EXT_descriptor_buffer")
(cherry picked from commit 998dbd43d3)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:55:05 -07:00
Faith Ekstrand
ba107091c2 nvk: Look at the right pointer in GetDescriptorInfo for SSBOs
It doesn't actually matter but we shouldn't poke at the wrong union
field.

Fixes: 77db71db7d ("nvk: Implement GetDescriptorEXT")
(cherry picked from commit a13474939d)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:55:05 -07:00
Faith Ekstrand
b74000dbce nvk: Emit inactive vertex attributes
VK_KHR_maintenance9 requires that vertex attributes in shaders which map
to vertex attributes that aren't bound at the API return a consistent
value.  In order to do this, we need toemit SET_VERTEX_ATTRIBUTE_A, even
for unused attributes.  The RGBA32F format was chosen to ensure we
return (0, 0, 0, 0) from unbound attributes.

Fixes: 7692d3c0e1 ("nvk: Advertise VK_KHR_maintenance9")
(cherry picked from commit d39221cef3)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:55:04 -07:00
Mauro Rossi
fb2273df78 util: Fix gnu-empty-initializer error
Fixes the following building error happening with clang:

../src/util/os_file.c:291:29: error: use of GNU empty initializer extension [-Werror,-Wgnu-empty-initializer]
   struct epoll_event evt = {};
                            ^
1 error generated.

Fixes: 17e28652 ("util: mimic KCMP_FILE via epoll when KCMP is missing")
Cc: "25.3"
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
(cherry picked from commit 7bbbfa6670)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:55:03 -07:00
Connor Abbott
65eb3aed4b tu: Fix RT count with remapped color attachments
The index of each RT is the remapped color attachment index, so we have
to use the remapped indices when telling the HW the number of RTs.

This fixes KHR-GLES3.framebuffer_blit.scissor_blit on ANGLE once we
enabled VK_EXT_multisampled_render_to_single_sampled, which switched
ANGLE to using dynamic rendering with
VK_KHR_dynamic_rendering_local_read.

Fixes: d50eef5b06 ("tu: Support color attachment remapping")
(cherry picked from commit 8d276e0d70)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:55:02 -07:00
Lionel Landwerlin
a9653fa019 anv: destroy sets when destroying pool
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14169
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
(cherry picked from commit 2689056c82)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:55:02 -07:00
Lionel Landwerlin
159d397437 anv/brw: fix output tcs vertices
brw_prog_tcs_data::instances can be divided by vertices per threads on
earlier generations.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: a91e0e0d61 ("brw: add support for separate tessellation shader compilation")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
(cherry picked from commit e450297ea9)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:55:01 -07:00
Xaver Hugl
6a7effe059 vulkan/wsi: remove support for VK_COLOR_SPACE_EXTENDED_SRGB_NONLINEAR_EXT
It's not really clear whether or not it should use gamma 2.2 or the piece-wise
transfer function, or how clients would use it for wider gamut in general.
Currently no compositors I know of support ext_srgb, so this shouldn't affect
applications in practice.

Signed-off-by: Xaver Hugl <xaver.hugl@kde.org>
Fixes: 4b663d56 ("vulkan/wsi: implement support for VK_EXT_hdr_metadata on Wayland")
(cherry picked from commit 14fcf145e3)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:55:00 -07:00
Konstantin Seurer
2a0a2cc5b0 aco: Fixup out_launch_size_y in the RT prolog for 1D dispatch
launch_size_y is set to ACO_RT_CONVERTED_2D_LAUNCH_SIZE for 1D
dispatches. The prolog needs to set it to 1 so that the app shader
loads the correct value.

cc: mesa-stable

(cherry picked from commit 47ffe2ecd4)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:54:59 -07:00
Faith Ekstrand
3f9f4d79d3 nvk: Disable sampleLocationsSampleCounts for 1x MSAA
Suggested-by: Mel Henning <mhenning@darkrefraction.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14108
Fixes: a34edc7500 ("nvk: Fill out sample locations on Maxwell B+")
(cherry picked from commit aa0f404f7b)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:54:59 -07:00
Faith Ekstrand
cd253df92a nvk: Include the chipset in the pipeline/binary cache UUID
Cc: mesa-stable
(cherry picked from commit d1793c7a59)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:54:58 -07:00
Lionel Landwerlin
bfd09d9891 nir/lower_io: add missing levels intrinsics to get_io_index_src_number
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c7ac46a1d8 ("nir/lower_io: add get_io_index_src_number support for image intrinsics")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit aa929ea706)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:54:57 -07:00
Lionel Landwerlin
dcecd8fd1e brw: handle GLSL/GLSL tessellation parameters
Apparently various tessellation parameters come specified from
TESS_EVAL stage in GLSL while they come from the TESS_CTRL stage in
HLSL.

We switch to store the tesselation params more like shader_info with 0
values for unspecified fields. That let's us merge it with a simple OR
with values from from tcs/tes and the resulting merge can be used for
state programming.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: a91e0e0d61 ("brw: add support for separate tessellation shader compilation")
Fixes: 50fd669294 ("anv: prep work for separate tessellation shaders")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
(cherry picked from commit f3df267735)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:54:56 -07:00
Lionel Landwerlin
1648f759c1 anv: rename structure holding 3DSTATE_WM_DEPTH_STENCIL state
Cc stable for the next commit.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
(cherry picked from commit 8d05b7b72e)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:54:55 -07:00
Valentine Burley
d5f7261ce5 tu: Fix maxVariableDescriptorCount with inline uniform blocks
It must not be larger than maxInlineUniformBlockSize.

Fixes VKCTS 1.4.4.0's
dEQP-VK.api.maintenance3_check.support_count_inline_uniform_block*.

Cc: mesa-stable

Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
(cherry picked from commit fd2fa0fbc9)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:54:54 -07:00
Valentine Burley
2c1c52a8c8 tu: Fix indexing with variable descriptor count
Based on RADV.
The Vulkan spec says:
    "If bindingCount is zero or if this structure is not included in
     the pNext chain, the VkDescriptorBindingFlags for each descriptor
     set layout binding is considered to be zero. Otherwise, the
     descriptor set layout binding at
     VkDescriptorSetLayoutCreateInfo::pBindings[i] uses the flags in
     pBindingFlags[i]."

Fixes dEQP-VK.api.maintenance3_check.* in VKCTS 1.4.4.0.

Cc: mesa-stable

Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
(cherry picked from commit 17e25b4983)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:54:52 -07:00
Dylan Baker
fe3a3b08c9 .pick_status.json: Update to fd55e874ed
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38167>
2025-10-24 07:54:46 -07:00
Dylan Baker
d9812eaea8 VERSION: bump for rc2
Some checks failed
macOS-CI / macOS-CI (dri) (push) Has been cancelled
macOS-CI / macOS-CI (xlib) (push) Has been cancelled
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
2025-10-22 16:13:33 -07:00
Benjamin Cheng
be191ceff7 radv/video_enc: Cleanup slice count assert
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This was left over when first enabling multiple slice encoding.

Fixes: 63e952ff2c ("radv/video: Support encoding multiple slices")
Reviewed-by: David Rosca <david.rosca@amd.com>
(cherry picked from commit b6d6c1af73)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-22 09:03:39 -07:00
Pierre-Eric Pelloux-Prayer
49bfddbd11 radeonsi: propagate shader updates for merged shaders
In case of merged shaders (eg: VS+GS), a change to VS should trigger
a GS update.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13935
Fixes: b1a34ac95d ("radeonsi: change do_update_shaders boolean to a bitmask")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 90103fe618)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-22 09:03:39 -07:00
Faith Ekstrand
0182cde848 util: Build util/cache_ops_x86.c with -msse2
__builtin_ia32_clflush() requires -msse2 so we need to set -msse2 at
least for building that file.  Fortunately, there are no GPUs that
actually need userspace cache flushing that can ever be bolted onto a
pre-SSE2 x86 CPUs.

Fixes: 555881e574 ("util/cache_ops: Add some cache flush helpers")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14134
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
(cherry picked from commit efbecd93ba)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-22 09:03:38 -07:00
Faith Ekstrand
94ec7c686d util: Don't advertise cache ops on x86 without SSE2
Fixes: 555881e574 ("util/cache_ops: Add some cache flush helpers")
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
(cherry picked from commit 3739d7a90c)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-22 09:03:37 -07:00
Olivia Lee
4202ea6c7f panfrost: fix cl_local_size for precompiled shaders
nir_lower_compute_system_values will attempt to lower
load_workgroup_size unless workgroup_size_variable is set. For precomp
shaders, the workgroup size is set statically for each entrypoint by
nir_precompiled_build_variant. Because we call
lower_compute_system_values early, it sets the workgroup size to zero.
Temporarily setting workgroup_size_variable while we are still
processing all the entrypoints together inhibits this.

Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Fixes: 20970bcd96 ("panfrost: Add base of OpenCL C infrastructure")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit a410d90fd2)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-22 09:03:37 -07:00
Rhys Perry
10475e8ac1 amd/lower_mem_access_bit_sizes: fix shared access when bytes<bit_size/8
This can happen with (for example) 32x2 loads with
align_mul=4,align_offset=2.

This patch does bit_size=min(bit_size,bytes) to prevent num_components
from being 0.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 52cd5f7e69 ("ac/nir_lower_mem_access_bit_sizes: Split unsupported shared memory instructions")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
(cherry picked from commit b18421ae3d)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-22 09:03:36 -07:00
Rhys Perry
c1cf6e75ae amd/lower_mem_access_bit_sizes: be more careful with 8/16-bit scratch load
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.3
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
(cherry picked from commit e89b22280f)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-22 09:03:34 -07:00
Rhys Perry
2b8675fd86 amd/lower_mem_access_bit_sizes: improve subdword/unaligned SMEM lowering
Summary of changes:
- handle unaligned 16-bit scalar loads when supported_dword=true
- increases the size of 8/16/32/64-bit buffer loads which are not dword
  aligned, which can create less SMEM loads.
- handles when "bytes" is less than "bit_size / 8"

fossil-db (gfx1201):
Totals from 26 (0.03% of 79839) affected shaders:
Instrs: 12676 -> 12710 (+0.27%); split: -0.30%, +0.57%
CodeSize: 67272 -> 67384 (+0.17%); split: -0.24%, +0.40%
Latency: 44399 -> 44375 (-0.05%); split: -0.09%, +0.04%
SClause: 352 -> 344 (-2.27%)
SALU: 3972 -> 3992 (+0.50%)
SMEM: 554 -> 528 (-4.69%)

fossil-db (navi21):
Totals from 6 (0.01% of 79825) affected shaders:
Instrs: 2192 -> 2186 (-0.27%)
CodeSize: 12188 -> 12140 (-0.39%)
Latency: 10037 -> 10033 (-0.04%); split: -0.12%, +0.08%
SMEM: 124 -> 118 (-4.84%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: fbf0399517 ("amd/lower_mem_access_bit_sizes: lower all SMEM instructions to supported sizes")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
(cherry picked from commit 8829fc3bd6)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-22 09:03:32 -07:00
Rhys Perry
e967da84a8 amd/lower_mem_access_bit_sizes: don't create subdword UBO loads with LLVM
These are unsupported.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14127
Fixes: fbf0399517 ("amd/lower_mem_access_bit_sizes: lower all SMEM instructions to supported sizes")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
(cherry picked from commit 79b2fa785d)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-22 09:03:31 -07:00
Dylan Baker
2a8f2ff397 .pick_status.json: Update to e38491eb18
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-22 09:03:27 -07:00
Mel Henning
7a30a71c45 nvk: VK_DEPENDENCY_ASYMMETRIC_EVENT_BIT_KHR
This was missed in the original maintenance9 MR.

Fixes the flakes in test
dEQP-VK.synchronization2.op.single_queue.event.write_ssbo_compute_read_ssbo_compute.buffer_16384_maintenance9

Fixes: 7692d3c0 ("nvk: Advertise VK_KHR_maintenance9")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
(cherry picked from commit 28fbc6addb)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:55 -07:00
Karol Herbst
9c57c0a194 nak: fix MMA latencies on Ampere
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Fixes: 7a01953a39 ("nak: Add Ampere and Ada latency information")
(cherry picked from commit e7dca5a6ca)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:54 -07:00
Karol Herbst
425c49ebf2 nak: ensure deref has a ptr_stride in cmat load/store lowering
With untyped pointer we might get a deref_cast with a 0 ptr_stride. But we
were supposed to ignore the stride information on the pointer anyway, so
let's do that properly now.

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Fixes: 05dca16143 ("nak: extract nir_intrinsic_cmat_load lowering into a function")
(cherry picked from commit 3bbf3f7826)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:54 -07:00
Karol Herbst
7b7cb63a14 nak: extract cmat load/store element offset calculation
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Fixes: 05dca16143 ("nak: extract nir_intrinsic_cmat_load lowering into a function")
(cherry picked from commit f632bfc715)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:53 -07:00
Faith Ekstrand
1941ada4a6 panvk: Fix integer dot product properties
We already set has_[su]dot_4x8[_sat] in nir_shader_compiler_options so
we're already getting the opcodes.  We just need to advertise the
features properly.  If bifrost_compile.h is to be believed, those are
all available starting at gen 9.

Closes: https://gitlab.freedesktop.org/panfrost/mesa/-/issues/218
Closes: https://gitlab.freedesktop.org/panfrost/mesa/-/issues/219
Fixes: f7f9b3d170 ("panvk: Move to vk_properties")
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
(cherry picked from commit 38950083ae)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:52 -07:00
Lionel Landwerlin
e982234bb6 nir/divergence: fix handling of intel uniform block load
Those are normally uniform always, but for the purpose of fused
threads handling, we need to check their sources.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: ca1533cd03 ("nir/divergence: add a new mode to cover fused threads on Intel HW")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit 255d1e883d)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:52 -07:00
Lionel Landwerlin
dbbadebe13 brw: fix ballot() type operations in shaders with HALT instructions
Fixes dEQP-VK.reconvergence.terminate_invocation.bit_count

LNL fossildb stats:

 Totals from 16489 (3.36% of 490184) affected shaders:
 Instrs: 3710499 -> 3710500 (+0.00%)
 Cycle count: 91601018 -> 90305642 (-1.41%); split: -1.81%, +0.40%
 Max dispatch width: 523936 -> 523952 (+0.00%); split: +0.02%, -0.01%

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit 757c042e39)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:51 -07:00
Lionel Landwerlin
0d100cc078 brw: only consider cross lane access on non scalar VGRFs
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 1bff4f93ca ("brw: Basic infrastructure to store convergent values as scalars")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit 70aa028f27)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:50 -07:00
Lionel Landwerlin
f656d062e3 brw: constant fold u2u16 conversion on MCS messages
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: bddfbe7fb1 ("brw/blorp: lower MCS fetching in NIR")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit f48c9c3a37)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:50 -07:00
Mel Henning
847ad886d6 nvk: Really fix maxVariableDescriptorCount w/ iub
I didn't test "nvk: Fix maxVariableDescriptorCount with iub" as
thoroughly as I should have and it regressed
dEQP-VK.api.maintenance3_check.descriptor_set because we were then
violating the requirement that maxPerSetDescriptors describes a limit
that's guaranteed to be supported (and reported as supported in
GetDescriptorSetLayoutSupport).

That commit was also based on a misreading of nvk_nir_lower_descriptors.c
where I thought that the end offset of an inline uniform block needed to
be less than the size of a UBO. That is not the case - on closer
inspection that code gracefully falls back to placing IUBs in globablmem
if necessary. So, we can afford to be less strict about our IUB sizing
and only require that IUBs follow the existing limit imposed by
maxInlineUniformBlockSize.

Fixes: ff7f785f09 ("nvk: Fix maxVariableDescriptorCount with iub")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
(cherry picked from commit 77cd629b34)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:48 -07:00
Emma Anholt
5dcc65643c nir/shrink_stores: Don't shrink stores to an invalid num_components.
Avoids a regression in the CL CTS on the next commit.

Fixes: 2dba7e6056 ("nir: split nir_opt_shrink_stores from nir_opt_shrink_vectors")
(cherry picked from commit 537cc4e0ff)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:47 -07:00
Yiwei Zhang
ab7bda0a1b panvk: fix to advance vs res_table properly
Fix a regression from an unfortunate typo.

Fixes: 48e8d6d207 ("panfrost, panvk: The size of resource tables needs to be a multiple of 4.")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
(cherry picked from commit 387f75f43d)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:29 -07:00
Yiwei Zhang
a02d8d5767 panvk: fix to advance vs driver_set properly
Should only set once outside the multidraw loop so that per draw can
patch its own own desc attribs when needed.

Fixes: a5a0dd3ccc ("panvk: Implement multiDrawIndirect for v10+")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
(cherry picked from commit 800c4d3430)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:29 -07:00
Timur Kristóf
13fa1460dd ac/nir/ngg_mesh: Lower num_subgroups to constant
Mesh shader workgroups always have the same amount of subgroups.

When the API workgroup size is the same as the real workgroup
size, this is a small optimization (using a constant instead of
a shader arg).

When the API workgroup size is smaller than the real workgroup
size (eg. when the number of output vertices or primitves is
greater than the API workgroup size on RDNA 2), this fixes a
potential bug because num_subgroups would return the "real"
workgroup size instead of the API one.

Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
(cherry picked from commit d20049b430)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:29 -07:00
Patrick Lerda
14544ef278 r600: update nplanes support
This change fixes "piglit/bin/ext_image_dma_buf_import-export -auto".

Fixes: 02aaf360ae ("r600: Implement resource_get_param")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
(cherry picked from commit 84dc9af3d4)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:28 -07:00
Patrick Lerda
602b4a2924 r600: fix r600_draw_rectangle refcnt imbalance
The object buf is referenced at the beginning of the
r600_draw_rectangle() function and should be freed
at the end. This issue was introduced with cbb6e0277f.

Fixes: cbb6e0277f ("r600: stop using util_set_vertex_buffers")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
(cherry picked from commit 3b1e3a40a8)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:28 -07:00
Jose Maria Casanova Crespo
717e8a8caf v3d: mark FRAG_RESULT_COLOR as output_written on SAND blits FS
With the introduction of "v3d: Add support for 16bit normalised
formats" https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35820
nir_lower_fragcolor is always called if shaders outputs_written shows
that FRAG_RESULT_COLOR is used.

But on SAND8/30 blit fragment shaders although the FRAG_RESULT_COLOR
is used, it was not marked as output_written so the lowering was not
applied.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14141
Fixes: ee48e81b26 ("v3d: Always lower frag color")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
(cherry picked from commit a131530dd1)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:27 -07:00
Emma Anholt
40ff53c5b8 wsi: Fix the flagging of dma_buf_sync_file for the amdgpu workaround.
In my regression fix, I covered one of the two paths that had stopped
setting the implicit_sync flag and thus triggered the amdgpu behavior we
don't want, but probably the less common one.

Fixes: f7cbc7b1c5 ("radv: Allocate BOs as implicit sync even if the WSI is doing implicit sync.")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13942
(cherry picked from commit aa96444149)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:27 -07:00
Marek Olšák
bf9e1f2e37 winsys/radeon: fix completely broken tessellation for gfx6-7
The info was moved to radeon_info, but it was only set for the amdgpu
kernel driver. It was uninitialized for radeon.

Fixes: d82eda72a1 - ac/gpu_info: move HS info into radeon_info

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
(cherry picked from commit f5b648f6d3)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:26 -07:00
Benjamin Cheng
c3cf272a04 radv/video: Fill maxCodedExtent caps first
Later code (i.e. max qp map extent filling) depends on this.

Fixes: ae6ea69c85 ("radv: Implement VK_KHR_video_encode_quantization_map")
Reviewed-by: David Rosca <david.rosca@amd.com>
(cherry picked from commit b1370e1935)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:25 -07:00
Dylan Baker
30ba8880b4 .pick_status.json: Update to 28fbc6addb
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-21 14:42:24 -07:00
Job Noorman
42ab1c6f3c nir: mark fneg distribution through fadd/ffma as nsz
df1876f615 ("nir: Mark negative re-distribution on fadd as imprecise")
fixed the fadd case by marking it as imprecise. This commit fixes the
ffma case for the same reason.

However, "imprecise" isn't necessary and nowadays we have "nsz" which is
more accurate here. Use that for both fadd and ffma.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 62795475e8 ("nir/algebraic: Distribute source modifiers into instructions")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
(cherry picked from commit ad421cdf2e)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-17 07:41:30 -07:00
Josh Simmons
674e2a702a radv: Fix crash in sqtt due to uninitalized value
Fixes: 772b9ce411 ("radv: Remove qf from radv_spm/sqtt/perfcounter where applicable")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit b10c1a1952)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-17 07:41:29 -07:00
Mike Blumenkrantz
756618ee3b zink: consistently set/unset msrtss in begin_rendering
this has to always be set or unset, never persistent from previous renderpass

Fixes: 5080f2b6f5 ("zink: disable msrtss handling when blitting")
(cherry picked from commit f74cf45078)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-17 07:41:28 -07:00
Marek Olšák
ca7d2daf5f r300: fix DXTC blits
Fixes: 9d359c6d10 - gallium: delete pipe_surface::width and pipe_surface::height
(cherry picked from commit 733ba77bfe)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-17 07:41:27 -07:00
Xaver Hugl
45aafef631 vulkan/wsi: require extended target volume support for scRGB
It's hardly going to be useful without that

Signed-off-by: Xaver Hugl <xaver.hugl@kde.org>
Fixes: 4b663d56 ("vulkan/wsi: implement support for VK_EXT_hdr_metadata on Wayland")
(cherry picked from commit 892cf427a0)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-17 07:41:25 -07:00
Dylan Baker
8711394383 .pick_status.json: Mark c20e2733bf as denominated
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-17 07:41:23 -07:00
Dylan Baker
289c768e88 .pick_status.json: Update to ad421cdf2e
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-17 07:40:05 -07:00
Lionel Landwerlin
84655b4b5d anv: fix image-to-image copies of TileW images
The intermediate buffer between the 2 images is linear, its stride
should be a function of the tile's logical width.

Normally this should map to the values reported by ISL except for
TileW where for some reason it was decided to report 128 for TileW
instead of the actual 64 size (see isl_tiling_get_info() ISL_TILING_W
case)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
(cherry picked from commit 77fb8fb062)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-16 11:37:38 -07:00
Valentine Burley
fd6b9c70b6 docs: Update LAVA caching setup
After a recent change, `piglit-traces.sh` automatically sets the caching
proxy, so update the docs to reflect this.

Also update the name of the variable from `FDO_HTTP_CACHE_URI` to
`LAVA_HTTP_CACHE_URI`.

Fixes: fa74e939bf ("ci/piglit: automatically use LAVA proxy")

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
(cherry picked from commit 28e73a6239)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-16 11:37:37 -07:00
Lionel Landwerlin
9bb7bf9c66 Revert "wsi: Implements scaling controls for DRI3 presentation."
This reverts commit a219308867.

It's failing most of the tests on Anv :

$ ./deqp-vk -n dEQP-VK.wsi.xlib.maintenance1.scaling.*

Test run totals:
  Passed:        88/2422 (3.6%)
  Failed:        576/2422 (23.8%)
  Not supported: 1758/2422 (72.6%)
  Warnings:      0/2422 (0.0%)
  Waived:        0/2422 (0.0%)

The only passing tests seem to be with this pattern :

 dEQP-VK.wsi.xlib.maintenance1.scaling.*.same_size_and_aspect

(cherry picked from commit 2baa3b8c06)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-16 11:37:36 -07:00
Dylan Baker
f510e6a1bd .pick_status.json: Update to 3b2f7ed918
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38010>
2025-10-16 11:37:29 -07:00
Dylan Baker
40f7bef16c VERSION: bump for 25.3.0-rc1
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
2025-10-15 20:56:25 -07:00
215 changed files with 20116 additions and 2391 deletions

View file

@ -34,7 +34,6 @@
# anholt | (decommissioned) | @anholt
# austriancoder | ci-tron | @austriancoder
# collabora | lava | @daniels, @sergi
# google-freedreno | none (moving to LAVA) | @daniels, @sergi
# igalia | baremetal/poe-powered, ci-tron | @jasuarez, @chema
# lima | lava | @enunes
# microsoft | custom | @jenatali, @alatiera
@ -293,15 +292,6 @@
- !reference [.pengutronix-farm-rules, rules]
# Temporary placeholder as the devices move across to LAVA.
.google-freedreno-farm-rules:
rules:
- when: never
.google-freedreno-farm-manual-rules:
rules:
- when: never
# Skip container & build jobs when disabling any farm, and run them if any
# farm gets re-enabled.
# Only apply these rules in MR context, because otherwise we get a false

View file

@ -118,7 +118,6 @@ def main():
# before we make it to 9-digit jobs (we're at 7 so far).
nick = args.runner
nick = nick.replace('mesa-', '')
nick = nick.replace('google-freedreno-', '')
nick += f'-{args.job}'
irc.send_line(f"NICK {nick}")
irc.send_line(f"USER {nick} unused unused: Gitlab CI Notifier")

View file

@ -60,6 +60,8 @@
- subprojects/**/*
- .gitattributes
- src/*
- src/android_stub/**/*
- src/c11/**/*
- src/compiler/**/*
- src/drm-shim/**/*
- src/gtest/**/*

11872
.pick_status.json Normal file

File diff suppressed because it is too large Load diff

View file

@ -1 +1 @@
25.3.0-devel
25.3.0

View file

@ -122,9 +122,8 @@ Enable the site and restart nginx:
# Second download should be cached.
wget http://localhost/cache/?uri=https://s3.freedesktop.org/mesa-tracie-public/itoral-gl-terrain-demo/demo-v2.trace
Now, set ``download-url`` in your ``traces-*.yml`` entry to something like
``http://caching-proxy/cache/?uri=https://s3.freedesktop.org/mesa-tracie-public``
and you should have cached downloads for traces. Add it to
``FDO_HTTP_CACHE_URI=`` in your ``config.toml`` runner environment lines and you
can use it for cached artifact downloads instead of going all the way to
freedesktop.org on each job.
The trace runner script automatically sets the caching proxy, so there's no
need to modify anything in the Mesa CI YAML files.
Add ``LAVA_HTTP_CACHE_URI=http://localhost/cache/?uri=`` to your ``config.toml``
runner environment lines and you can use it for cached artifact downloads
instead of going all the way to freedesktop.org on each job.

View file

@ -3,6 +3,7 @@ Release Notes
The release notes summarize what's new or changed in each Mesa release.
- :doc:`25.3.0 release notes <relnotes/25.3.0>`
- :doc:`25.2.5 release notes <relnotes/25.2.5>`
- :doc:`25.2.4 release notes <relnotes/25.2.4>`
- :doc:`25.2.3 release notes <relnotes/25.2.3>`
@ -466,6 +467,7 @@ The release notes summarize what's new or changed in each Mesa release.
:maxdepth: 1
:hidden:
25.3.0 <relnotes/25.3.0>
25.2.5 <relnotes/25.2.5>
25.2.4 <relnotes/25.2.4>
25.2.3 <relnotes/25.2.3>

6071
docs/relnotes/25.3.0.rst Normal file

File diff suppressed because it is too large Load diff

View file

@ -1,89 +0,0 @@
EGL_EXT_create_context_robustness support on Panfrost V10+
GL_ARB_robust_buffer_access_behavior, GL_KHR_robust_buffer_access_behavior and GL_KHR_robustness support on Panfrost
VK_EXT_mutable_descriptor_type on panvk/v9+
GL_KHR_robustness on v3d
VK_ARM_shader_core_builtins on panvk
VK_KHR_shader_untyped_pointers on anv
cl_ext_immutable_memory_objects
VK_KHR_video_encode_intra_refresh on radv
VK_KHR_video_encode_quantization_map on radv
GL_ATI_meminfo and GL_NVX_gpu_memory_info on r300
VK_KHR_shader_untyped_pointers on anv and RADV
VK_KHR_maintenance8 on NVK
VK_KHR_maintenance9 on NVK
cl_khr_semaphore on radeonsi and zink
cl_khr_external_semaphore on radeonsi and zink
cl_khr_external_semaphore_sync_fd on radeonsi and zink
GL_NV_shader_atomic_int64 on radeonsi and Panfrost V9+
VK_KHR_maintenance7 on panvk/v10+
VK_KHR_maintenance8 on panvk/v10+
VK_KHR_maintenance9 on panvk
VK_AMD_buffer_marker on NVK
VK_EXT_ycbcr_2plane_444_formats on radv
Removed VDPAU frontend
GL_NV_representative_fragment_test on zink
VK_KHR_maintenance9 on HoneyKrisp
sparseBinding on panvk/v10+
sparseResidencyBuffer on panvk/v10+
Vulkan 1.2 on pvr
VK_KHR_create_renderpass2 on pvr
VK_KHR_dedicated_allocation on pvr
VK_KHR_depth_stencil_resolve on pvr
VK_KHR_descriptor_update_template on pvr
VK_KHR_imageless_framebuffer on pvr
VK_KHR_line_rasterization on pvr
VK_KHR_maintenance1 on pvr
VK_KHR_maintenance2 on pvr
VK_KHR_maintenance3 on pvr
VK_KHR_multiview on pvr
VK_KHR_robustness2 on pvr
VK_KHR_separate_depth_stencil_layouts on pvr
VK_KHR_shader_draw_parameters on pvr
VK_KHR_shader_float_controls on pvr
VK_KHR_shader_subgroup_extended_types on pvr
VK_KHR_spirv_1_4 on pvr
VK_KHR_shader_terminate_invocation on pvr
VK_KHR_swapchain_mutable_format on pvr
VK_KHR_vertex_attribute_divisor on pvr
VK_EXT_border_color_swizzle on pvr
VK_EXT_color_write_enable on pvr
VK_EXT_custom_border_color on pvr
VK_EXT_depth_clamp_zero_one on pvr
VK_EXT_depth_clip_enable on pvr
VK_EXT_extended_dynamic_state on pvr
VK_EXT_extended_dynamic_state2 on pvr
VK_EXT_extended_dynamic_state3 on pvr
VK_EXT_image_2d_view_of_3d on pvr
VK_EXT_line_rasterization on pvr
VK_EXT_physical_device_drm on pvr
VK_EXT_provoking_vertex on pvr
VK_EXT_robustness2 on pvr
VK_EXT_queue_family_foreign on pvr
VK_EXT_separate_stencil_usage on pvr
VK_EXT_shader_demote_to_helper_invocation on pvr
VK_EXT_vertex_attribute_divisor on pvr
imageCubeArray on pvr
independentBlend on pvr
sampleRateShading on pvr
logicOp on pvr
drawIndirectFirstInstance on pvr
alphaToOne on pvr
samplerAnisotropy on pvr
shaderStorageImageExtendedFormats on pvr
shaderStorageImageReadWithoutFormat on pvr
shaderStorageImageWriteWithoutFormat on pvr
shaderClipDistance on pvr
shaderCullDistance on pvr
VK_EXT_zero_initialize_device_memory on pvr
VK_KHR_sampler_mirror_clamp_to_edge on pvr
VK_KHR_shader_non_semantic_info on pvr
VK_KHR_shader_relaxed_extended_instruction on pvr
VK_EXT_shader_replicated_composites on pvr
VK_KHR_device_group_creation on pvr
VK_KHR_map_memory2 on pvr
VK_EXT_map_memory_placed on pvr
VK_KHR_device_group on pvr
VK_KHR_buffer_device_address on pvr
GL_EXT_mesh_shader on zink
VK_KHR_wayland_surface on pvr
VK_NVX_image_view_handle on NVK

View file

@ -1489,8 +1489,6 @@ struct drm_amdgpu_info_hw_ip {
__u32 available_rings;
/** version info: bits 23:16 major, 15:8 minor, 7:0 revision */
__u32 ip_discovery_version;
/* Userq available slots */
__u32 userq_num_slots;
};
/* GFX metadata BO sizes and alignment info (in bytes) */

View file

@ -979,14 +979,20 @@ extern "C" {
* 2 = Gob Height 8, Turing+ Page Kind mapping
* 3 = Reserved for future use.
*
* 22:22 s Sector layout. On Tegra GPUs prior to Xavier, there is a further
* bit remapping step that occurs at an even lower level than the
* page kind and block linear swizzles. This causes the layout of
* surfaces mapped in those SOC's GPUs to be incompatible with the
* equivalent mapping on other GPUs in the same system.
* 22:22 s Sector layout. There is a further bit remapping step that occurs
* 26:27 at an even lower level than the page kind and block linear
* swizzles. This causes the bit arrangement of surfaces in memory
* to differ subtly, and prevents direct sharing of surfaces between
* GPUs with different layouts.
*
* 0 = Tegra K1 - Tegra Parker/TX2 Layout.
* 1 = Desktop GPU and Tegra Xavier+ Layout
* 0 = Tegra K1 - Tegra Parker/TX2 Layout
* 1 = Pre-GB20x, GB20x 32+ bpp, GB10, Tegra Xavier-Orin Layout
* 2 = GB20x(Blackwell 2)+ 8 bpp surface layout
* 3 = GB20x(Blackwell 2)+ 16 bpp surface layout
* 4 = Reserved for future use.
* 5 = Reserved for future use.
* 6 = Reserved for future use.
* 7 = Reserved for future use.
*
* 25:23 c Lossless Framebuffer Compression type.
*
@ -1001,7 +1007,7 @@ extern "C" {
* 6 = Reserved for future use
* 7 = Reserved for future use
*
* 55:25 - Reserved for future use. Must be zero.
* 55:28 - Reserved for future use. Must be zero.
*/
#define DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(c, s, g, k, h) \
fourcc_mod_code(NVIDIA, (0x10 | \
@ -1009,6 +1015,7 @@ extern "C" {
(((k) & 0xff) << 12) | \
(((g) & 0x3) << 20) | \
(((s) & 0x1) << 22) | \
(((s) & 0x6) << 25) | \
(((c) & 0x7) << 23)))
/* To grandfather in prior block linear format modifiers to the above layout,
@ -1017,7 +1024,7 @@ extern "C" {
* which corresponds to the "generic" kind used for simple single-sample
* uncompressed color formats on Fermi - Volta GPUs.
*/
static __inline__ __u64
static inline __u64
drm_fourcc_canonicalize_nvidia_format_mod(__u64 modifier)
{
if (!(modifier & 0x10) || (modifier & (0xff << 12)))

View file

@ -2191,7 +2191,7 @@ endif
with_sysprof = get_option('sysprof')
if with_sysprof
dep_sysprof = dependency('sysprof-capture-4', version: '>= 3.38.0')
dep_sysprof = dependency('sysprof-capture-4', version: '>= 4.49.0')
pre_args += '-DHAVE_SYSPROF'
endif

View file

@ -315,8 +315,6 @@ ac_query_gpu_info(int fd, void *dev_p, struct radeon_info *info,
info->ip[ip_type].num_queues = 1;
} else if (ip_info.available_rings) {
info->ip[ip_type].num_queues = util_bitcount(ip_info.available_rings);
} else if (ip_info.userq_num_slots) {
info->ip[ip_type].num_queue_slots = ip_info.userq_num_slots;
} else {
continue;
}
@ -1696,11 +1694,11 @@ void ac_print_gpu_info(const struct radeon_info *info, FILE *f)
fprintf(f, " clock_crystal_freq = %i KHz\n", info->clock_crystal_freq);
for (unsigned i = 0; i < AMD_NUM_IP_TYPES; i++) {
if (info->ip[i].num_queues || info->ip[i].num_queue_slots) {
fprintf(f, " IP %-7s %2u.%u \tqueues:%u \tqueue_slots:%u \talign:%u \tpad_dw:0x%x\n",
if (info->ip[i].num_queues) {
fprintf(f, " IP %-7s %2u.%u \tqueues:%u \talign:%u \tpad_dw:0x%x\n",
ac_get_ip_type_string(info, i),
info->ip[i].ver_major, info->ip[i].ver_minor, info->ip[i].num_queues,
info->ip[i].num_queue_slots,info->ip[i].ib_alignment, info->ip[i].ib_pad_dw_mask);
info->ip[i].ib_alignment, info->ip[i].ib_pad_dw_mask);
}
}

View file

@ -26,7 +26,6 @@ struct amd_ip_info {
uint8_t ver_minor;
uint8_t ver_rev;
uint8_t num_queues;
uint8_t num_queue_slots;
uint8_t num_instances;
uint32_t ib_alignment;
uint32_t ib_pad_dw_mask;

View file

@ -194,7 +194,6 @@ struct drm_amdgpu_info_hw_ip {
uint32_t ib_size_alignment;
uint32_t available_rings;
uint32_t ip_discovery_version;
uint32_t userq_num_slots;
};
struct drm_amdgpu_info_uq_fw_areas_gfx {

View file

@ -498,6 +498,7 @@ typedef struct rvcn_enc_hevc_encode_params_s {
typedef struct rvcn_enc_av1_encode_params_s {
uint32_t ref_frames[RENCODE_AV1_REFS_PER_FRAME];
uint32_t lsm_reference_frame_index[2];
uint32_t cur_order_hint;
} rvcn_enc_av1_encode_params_t;
typedef struct rvcn_enc_h264_deblocking_filter_s {

View file

@ -109,23 +109,37 @@ lower_mem_access_cb(nir_intrinsic_op intrin, uint8_t bytes, uint8_t bit_size, ui
nir_mem_access_size_align res;
if (intrin == nir_intrinsic_load_shared || intrin == nir_intrinsic_store_shared) {
/* Split unsupported shared access. */
res.bit_size = MIN2(bit_size, combined_align * 8ull);
res.align = res.bit_size / 8;
/* Don't use >64-bit LDS loads for performance reasons. */
unsigned max_bytes = intrin == nir_intrinsic_store_shared && cb_data->gfx_level >= GFX7 ? 16 : 8;
bytes = MIN3(bytes, combined_align, max_bytes);
bytes = bytes == 12 ? bytes : round_down_to_power_of_2(bytes);
/* Split unsupported shared access. */
res.bit_size = MIN2(bit_size, bytes * 8ull);
res.align = res.bit_size / 8;
res.num_components = bytes / res.align;
res.shift = nir_mem_access_shift_method_bytealign_amd;
return res;
}
const bool is_buffer_load = intrin == nir_intrinsic_load_ubo ||
intrin == nir_intrinsic_load_ssbo ||
intrin == nir_intrinsic_load_constant;
if (is_smem) {
const bool supported_subdword = cb_data->gfx_level >= GFX12 &&
intrin != nir_intrinsic_load_push_constant &&
(!cb_data->use_llvm || intrin != nir_intrinsic_load_ubo);
/* Round up subdword loads if unsupported. */
const bool supported_subdword = cb_data->gfx_level >= GFX12 && intrin != nir_intrinsic_load_push_constant;
if (bit_size < 32 && (bytes >= 3 || !supported_subdword))
if (bytes <= 2 && combined_align % bytes == 0 && supported_subdword) {
bit_size = bytes * 8;
} else if (bytes % 4 || combined_align % 4) {
if (is_buffer_load)
bytes += 4 - MIN2(combined_align, 4);
bytes = align(bytes, 4);
bit_size = 32;
}
/* Generally, require an alignment of 4. */
res.align = MIN2(4, bytes);
@ -138,9 +152,6 @@ lower_mem_access_cb(nir_intrinsic_op intrin, uint8_t bytes, uint8_t bit_size, ui
if (!util_is_power_of_two_nonzero(bytes) && (cb_data->gfx_level < GFX12 || bytes != 12)) {
const uint8_t larger = util_next_power_of_two(bytes);
const uint8_t smaller = larger / 2;
const bool is_buffer_load = intrin == nir_intrinsic_load_ubo ||
intrin == nir_intrinsic_load_ssbo ||
intrin == nir_intrinsic_load_constant;
const bool is_aligned = align_mul % smaller == 0;
/* Overfetch up to 1 dword if this is a bounds-checked buffer load or the access is aligned. */
@ -185,8 +196,8 @@ lower_mem_access_cb(nir_intrinsic_op intrin, uint8_t bytes, uint8_t bit_size, ui
const uint32_t max_pad = 4 - MIN2(combined_align, 4);
/* Global loads don't have bounds checking, so increasing the size might not be safe. */
if (intrin == nir_intrinsic_load_global || intrin == nir_intrinsic_load_global_constant) {
/* Global/scratch loads don't have bounds checking, so increasing the size might not be safe. */
if (!is_buffer_load) {
if (align_mul < 4) {
/* If we split the load, only lower it to 32-bit if this is a SMEM load. */
const unsigned chunk_bytes = align(bytes, 4) - max_pad;

View file

@ -1817,10 +1817,25 @@ ac_ngg_get_scratch_lds_size(mesa_shader_stage stage,
} else {
assert(stage == MESA_SHADER_GEOMETRY);
/* Repacking output vertices at the end in ngg_gs_finale() uses 1 dword per 4 waves */
scratch_lds_size = ALIGN(max_num_waves, 4u);
/* streamout take 8 dwords for buffer offset and emit vertex per stream */
if (streamout_enabled)
scratch_lds_size = MAX2(scratch_lds_size, 32);
/* For streamout:
* - Repacking streamout vertices takes 1 dword per 4 waves per stream
* (max 16 bytes for Wave64, 32 bytes for Wave32)
* - 1 dword per stream for buffer info
* (16 bytes)
* - 1 dword per buffer for buffer info
* (16 bytes)
*/
if (streamout_enabled) {
const unsigned num_streams = 4;
const unsigned num_so_buffers = 4;
const unsigned streamout_scratch_size =
num_streams * ALIGN(max_num_waves, 4u) + num_streams * 4 + num_so_buffers * 4;
scratch_lds_size += streamout_scratch_size;
}
}
return scratch_lds_size;

View file

@ -660,6 +660,10 @@ ngg_gs_build_streamout(nir_builder *b, lower_ngg_gs_state *s)
nir_def *export_seq[4] = {0};
nir_def *out_vtx_primflag[4] = {0};
const unsigned scratch_stride = ALIGN(s->max_num_waves, 4);
const unsigned scratch_base_off = scratch_stride;
const unsigned num_streams = util_bitcount(info->streams_written);
u_foreach_bit(stream, info->streams_written) {
out_vtx_primflag[stream] =
ngg_gs_load_out_vtx_primflag(b, stream, tid_in_tg, out_vtx_lds_addr, max_vtxcnt, s);
@ -669,9 +673,8 @@ ngg_gs_build_streamout(nir_builder *b, lower_ngg_gs_state *s)
*/
prim_live[stream] = nir_i2b(b, nir_iand_imm(b, out_vtx_primflag[stream], 1));
unsigned scratch_stride = ALIGN(s->max_num_waves, 4);
nir_def *scratch_base =
nir_iadd_imm(b, s->lds_addr_gs_out_vtx, stream * scratch_stride);
nir_iadd_imm(b, s->lds_addr_gs_out_vtx, stream * scratch_stride + scratch_base_off);
/* We want to export primitives to streamout buffer in sequence,
* but not all vertices are alive or mark end of a primitive, so
@ -697,18 +700,14 @@ ngg_gs_build_streamout(nir_builder *b, lower_ngg_gs_state *s)
export_seq[stream] = rep.repacked_invocation_index;
}
/* Workgroup barrier: wait for LDS scratch reads finish. */
nir_barrier(b, .execution_scope = SCOPE_WORKGROUP,
.memory_scope = SCOPE_WORKGROUP,
.memory_semantics = NIR_MEMORY_ACQ_REL,
.memory_modes = nir_var_mem_shared);
/* Get global buffer offset where this workgroup will stream out data to. */
nir_def *emit_prim[4] = {0};
nir_def *buffer_offsets[4] = {0};
nir_def *so_buffer[4] = {0};
nir_def *buffer_info_scratch_base =
nir_iadd_imm_nuw(b, s->lds_addr_gs_out_vtx, num_streams * scratch_stride + scratch_base_off);
ac_nir_ngg_build_streamout_buffer_info(b, info, s->options->hw_info->gfx_level, s->options->has_xfb_prim_query,
s->options->use_gfx12_xfb_intrinsic, s->lds_addr_gs_out_vtx, tid_in_tg,
s->options->use_gfx12_xfb_intrinsic, buffer_info_scratch_base, tid_in_tg,
gen_prim, so_buffer, buffer_offsets, emit_prim);
u_foreach_bit(stream, info->streams_written) {

View file

@ -508,6 +508,8 @@ lower_ms_intrinsic(nir_builder *b, nir_instr *instr, void *state)
return update_ms_barrier(b, intrin, s);
case nir_intrinsic_load_workgroup_index:
return lower_ms_load_workgroup_index(b, intrin, s);
case nir_intrinsic_load_num_subgroups:
return nir_imm_int(b, DIV_ROUND_UP(s->api_workgroup_size, s->wave_size));
case nir_intrinsic_set_vertex_and_primitive_count:
return lower_ms_set_vertex_and_primitive_count(b, intrin, s);
default:
@ -529,6 +531,7 @@ filter_ms_intrinsic(const nir_instr *instr,
intrin->intrinsic == nir_intrinsic_store_per_primitive_output ||
intrin->intrinsic == nir_intrinsic_barrier ||
intrin->intrinsic == nir_intrinsic_load_workgroup_index ||
intrin->intrinsic == nir_intrinsic_load_num_subgroups ||
intrin->intrinsic == nir_intrinsic_set_vertex_and_primitive_count;
}

View file

@ -338,6 +338,17 @@ Only `s_waitcnt_vscnt null, 0`. Needed even if the first instruction is a load.
NSA MIMG instructions should be limited to 3 dwords before GFX10.3 to avoid
stability issues: https://reviews.llvm.org/D103348
## RDNA2 / GFX10.3 hazards
### SALU EXEC write followed by NSA MIMG instruction
Triggered-by:
Potential stability issues can occur if an SALU instruction changes exec from 0
to non-zero immediately before an NSA MIMG instruction with 4+ dwords.
Mitigated-by: Any instruction, including `s_nop`.
## RDNA3 / GFX11 hazards
### VcmpxPermlaneHazard

View file

@ -129,6 +129,7 @@ struct NOP_ctx_gfx10 {
bool has_branch_after_DS = false;
bool has_NSA_MIMG = false;
bool has_writelane = false;
bool has_salu_exec_write = false;
std::bitset<128> sgprs_read_by_VMEM;
std::bitset<128> sgprs_read_by_VMEM_store;
std::bitset<128> sgprs_read_by_DS;
@ -145,6 +146,7 @@ struct NOP_ctx_gfx10 {
has_branch_after_DS |= other.has_branch_after_DS;
has_NSA_MIMG |= other.has_NSA_MIMG;
has_writelane |= other.has_writelane;
has_salu_exec_write |= other.has_salu_exec_write;
sgprs_read_by_VMEM |= other.sgprs_read_by_VMEM;
sgprs_read_by_DS |= other.sgprs_read_by_DS;
sgprs_read_by_VMEM_store |= other.sgprs_read_by_VMEM_store;
@ -159,6 +161,7 @@ struct NOP_ctx_gfx10 {
has_branch_after_VMEM == other.has_branch_after_VMEM && has_DS == other.has_DS &&
has_branch_after_DS == other.has_branch_after_DS &&
has_NSA_MIMG == other.has_NSA_MIMG && has_writelane == other.has_writelane &&
has_salu_exec_write == other.has_salu_exec_write &&
sgprs_read_by_VMEM == other.sgprs_read_by_VMEM &&
sgprs_read_by_DS == other.sgprs_read_by_DS &&
sgprs_read_by_VMEM_store == other.sgprs_read_by_VMEM_store &&
@ -907,6 +910,15 @@ handle_instruction_gfx10(State& state, NOP_ctx_gfx10& ctx, aco_ptr<Instruction>&
ctx.waits_since_fp_atomic = std::min(ctx.waits_since_fp_atomic, 3);
}
/* 4+ dword NSA can hang if exec becomes non-zero again directly before the instruction. */
if (instr->isSALU() && instr->writes_exec()) {
ctx.has_salu_exec_write = true;
} else if (ctx.has_salu_exec_write) {
ctx.has_salu_exec_write = false;
if (instr->isMIMG() && get_mimg_nsa_dwords(instr.get()) > 1)
bld.sopp(aco_opcode::s_nop, 0);
}
if (state.program->gfx_level != GFX10)
return; /* no other hazards/bugs to mitigate */
@ -2019,13 +2031,15 @@ required_export_priority(Program* program)
void
insert_NOPs(Program* program)
{
if (program->gfx_level >= GFX11) {
NOP_ctx_gfx11 initial_ctx;
bool has_previous_part =
program->is_epilog || program->info.vs.has_prolog || program->info.ps.has_prolog ||
(program->info.merged_shader_compiled_separately && program->stage.sw != SWStage::VS &&
program->stage.sw != SWStage::TES) || program->stage == raytracing_cs;
program->stage.sw != SWStage::TES) ||
program->stage == raytracing_cs;
if (program->gfx_level >= GFX11) {
NOP_ctx_gfx11 initial_ctx;
if (program->gfx_level >= GFX12 && has_previous_part) {
/* resolve_all_gfx11 can't resolve VALUReadSGPRHazard entirely. We have to assume that any
* SGPR might have been read by VALU if there was a previous shader part.
@ -2036,7 +2050,10 @@ insert_NOPs(Program* program)
mitigate_hazards<NOP_ctx_gfx11, handle_instruction_gfx11, resolve_all_gfx11>(program,
initial_ctx);
} else if (program->gfx_level >= GFX10) {
mitigate_hazards<NOP_ctx_gfx10, handle_instruction_gfx10, resolve_all_gfx10>(program);
NOP_ctx_gfx10 initial_ctx;
initial_ctx.has_salu_exec_write = has_previous_part;
mitigate_hazards<NOP_ctx_gfx10, handle_instruction_gfx10, resolve_all_gfx10>(program,
initial_ctx);
} else {
mitigate_hazards<NOP_ctx_gfx6, handle_instruction_gfx6, resolve_all_gfx6>(program);
}

View file

@ -214,6 +214,8 @@ select_rt_prolog(Program* program, ac_shader_config* config,
bld.sop2(Builder::s_cselect, Definition(vcc, bld.lm),
Operand::c32_or_c64(-1u, program->wave_size == 64),
Operand::c32_or_c64(0, program->wave_size == 64), Operand(scc, s1));
bld.sop2(aco_opcode::s_cselect_b32, Definition(out_launch_size_y, s1),
Operand(out_launch_size_y, s1), Operand::c32(1), Operand(scc, s1));
bld.vop2(aco_opcode::v_cndmask_b32, Definition(out_launch_ids[0], v1),
Operand(tmp_invocation_idx, v1), Operand(out_launch_ids[0], v1), Operand(vcc, bld.lm));
bld.vop2(aco_opcode::v_cndmask_b32, Definition(out_launch_ids[1], v1), Operand::zero(),

View file

@ -338,6 +338,22 @@ load_unaligned_vs_attrib(Builder& bld, PhysReg dst, Operand desc, Operand index,
state->current_loads.push_back(load);
}
bool
is_last_attribute_large(const struct aco_vs_prolog_info* pinfo)
{
const struct ac_vtx_format_info* vtx_info_table =
ac_get_vtx_format_info_table(GFX8, CHIP_POLARIS10);
unsigned last_attribute = pinfo->num_attributes - 1;
if ((pinfo->misaligned_mask & (1u << last_attribute))) {
const struct ac_vtx_format_info* vtx_info = &vtx_info_table[pinfo->formats[last_attribute]];
if (vtx_info->chan_byte_size == 8 && vtx_info->num_channels > 2)
return true;
}
return false;
}
} // namespace
void
@ -393,9 +409,11 @@ select_vs_prolog(Program* program, const struct aco_vs_prolog_info* pinfo, ac_sh
has_nontrivial_divisors && (program->gfx_level <= GFX8 || program->gfx_level >= GFX11);
int vgpr_offset = pinfo->misaligned_mask & (1u << (pinfo->num_attributes - 1)) ? 0 : -4;
const bool is_last_attr_large = is_last_attribute_large(pinfo);
unsigned num_vgprs = args->num_vgprs_used;
PhysReg attributes_start = get_next_vgpr(pinfo->num_attributes * 4, &num_vgprs);
PhysReg attributes_start =
get_next_vgpr(pinfo->num_attributes * 4 + (is_last_attr_large ? 4 : 0), &num_vgprs);
PhysReg vertex_index, instance_index, start_instance_vgpr, nontrivial_tmp_vgpr0,
nontrivial_tmp_vgpr1;
if (needs_vertex_index)
@ -625,6 +643,14 @@ select_vs_prolog(Program* program, const struct aco_vs_prolog_info* pinfo, ac_sh
continue_pc = Operand(prolog_input, s2);
}
/* Wait for all pending VMEM loads when the prolog loads large 64-bit
* attributes because the vertex shader isn't required to consume all of
* them and they might be overwritten. This isn't the most optimal solution
* but 64-bit vertex attributes are rarely used.
*/
if (is_last_attr_large)
wait_for_vmem_loads(bld);
bld.sop1(aco_opcode::s_setpc_b64, continue_pc);
program->config->float_mode = program->blocks[0].fp_mode.val;

View file

@ -1329,7 +1329,6 @@ const struct amdgpu_device amdgpu_devices[] = {
.ib_size_alignment = 32,
.available_rings = 0x1,
.ip_discovery_version = 0xb0000,
.userq_num_slots = 2,
},
.hw_ip_compute = {
.hw_ip_version_major = 11,
@ -1339,7 +1338,6 @@ const struct amdgpu_device amdgpu_devices[] = {
.ib_size_alignment = 32,
.available_rings = 0xf,
.ip_discovery_version = 0xb0000,
.userq_num_slots = 16,
},
.fw_gfx_me = {
.ver = 1486,
@ -1460,7 +1458,6 @@ const struct amdgpu_device amdgpu_devices[] = {
.ib_size_alignment = 32,
.available_rings = 0x1,
.ip_discovery_version = 0xb0002,
.userq_num_slots = 0x0,
},
.hw_ip_compute = {
.hw_ip_version_major = 11,
@ -1470,7 +1467,6 @@ const struct amdgpu_device amdgpu_devices[] = {
.ib_size_alignment = 32,
.available_rings = 0xf,
.ip_discovery_version = 0xb0002,
.userq_num_slots = 0x0,
},
.fw_gfx_me = {
.ver = 2390,
@ -2070,7 +2066,6 @@ const struct amdgpu_device amdgpu_devices[] = {
.ib_size_alignment = 32,
.available_rings = 0x1,
.ip_discovery_version = 0xb0500,
.userq_num_slots = 2,
},
.hw_ip_compute = {
.hw_ip_version_major = 11,
@ -2080,7 +2075,6 @@ const struct amdgpu_device amdgpu_devices[] = {
.ib_size_alignment = 32,
.available_rings = 0xf,
.ip_discovery_version = 0xb0500,
.userq_num_slots = 16,
},
.fw_gfx_me = {
.ver = 29,
@ -2201,7 +2195,6 @@ const struct amdgpu_device amdgpu_devices[] = {
.ib_size_alignment = 32,
.available_rings = 0x1,
.ip_discovery_version = 0xc0001,
.userq_num_slots = 8,
},
.hw_ip_compute = {
.hw_ip_version_major = 12,
@ -2211,7 +2204,6 @@ const struct amdgpu_device amdgpu_devices[] = {
.ib_size_alignment = 32,
.available_rings = 0xf,
.ip_discovery_version = 0xc0001,
.userq_num_slots = 8,
},
.fw_gfx_me = {
.ver = 2590,

View file

@ -379,7 +379,6 @@ amdgpu_dump_hw_ips(int fd)
printf(" .ib_size_alignment = %u,\n", info.ib_size_alignment);
printf(" .available_rings = 0x%x,\n", info.available_rings);
printf(" .ip_discovery_version = 0x%04x,\n", info.ip_discovery_version);
printf(" .userq_num_slots = 0x%x,\n", info.userq_num_slots);
printf("},\n");
}
}

View file

@ -0,0 +1,35 @@
/*
* Copyright © 2025 Valve Corporation
*
* SPDX-License-Identifier: MIT
*/
#include "radv_device.h"
#include "radv_entrypoints.h"
#include "radv_image_view.h"
VKAPI_ATTR VkResult VKAPI_CALL
no_mans_sky_CreateImageView(VkDevice _device, const VkImageViewCreateInfo *pCreateInfo,
const VkAllocationCallbacks *pAllocator, VkImageView *pView)
{
VK_FROM_HANDLE(radv_device, device, _device);
VkResult result;
result = device->layer_dispatch.app.CreateImageView(_device, pCreateInfo, pAllocator, pView);
if (result != VK_SUCCESS)
return result;
VK_FROM_HANDLE(radv_image_view, iview, *pView);
if ((iview->vk.aspects == (VK_IMAGE_ASPECT_DEPTH_BIT | VK_IMAGE_ASPECT_STENCIL_BIT)) &&
(iview->vk.usage &
(VK_IMAGE_USAGE_SAMPLED_BIT | VK_IMAGE_USAGE_STORAGE_BIT | VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT))) {
/* No Man's Sky creates descriptors with depth/stencil aspects (only when Intel XESS is
* enabled apparently). and this is illegal in Vulkan. Ignore them by using NULL descriptors
* to workaroud GPU hangs.
*/
memset(&iview->descriptor, 0, sizeof(iview->descriptor));
}
return result;
}

View file

@ -21,6 +21,7 @@ radv_entrypoints_gen_command += [
'--device-prefix', 'metro_exodus',
'--device-prefix', 'rage2',
'--device-prefix', 'quantic_dream',
'--device-prefix', 'no_mans_sky',
# Command buffer annotation layer entrypoints
'--device-prefix', 'annotate',
@ -40,6 +41,7 @@ libradv_files = files(
'layers/radv_metro_exodus.c',
'layers/radv_rage2.c',
'layers/radv_quantic_dream.c',
'layers/radv_no_mans_sky.c',
'layers/radv_rmv_layer.c',
'layers/radv_rra_layer.c',
'layers/radv_sqtt_layer.c',

View file

@ -6111,6 +6111,13 @@ radv_emit_tess_domain_origin_state(struct radv_cmd_buffer *cmd_buffer)
radeon_end();
}
static bool
radv_is_dual_src_enabled(const struct radv_dynamic_state *dynamic_state)
{
/* Dual-source blending must be ignored if blending isn't enabled for MRT0. */
return dynamic_state->blend_eq.mrt0_is_dual_src && !!(dynamic_state->color_blend_enable & 1u);
}
static struct radv_shader_part *
lookup_ps_epilog(struct radv_cmd_buffer *cmd_buffer)
{
@ -6144,7 +6151,7 @@ lookup_ps_epilog(struct radv_cmd_buffer *cmd_buffer)
state.color_write_mask = d->color_write_mask;
state.color_blend_enable = d->color_blend_enable;
state.mrt0_is_dual_src = d->blend_eq.mrt0_is_dual_src;
state.mrt0_is_dual_src = radv_is_dual_src_enabled(&cmd_buffer->state.dynamic);
if (d->vk.ms.alpha_to_coverage_enable) {
/* Select a color export format with alpha when alpha to coverage is enabled. */
@ -8114,6 +8121,8 @@ radv_mark_descriptors_dirty(struct radv_cmd_buffer *cmd_buffer, VkPipelineBindPo
struct radv_descriptor_state *descriptors_state = radv_get_descriptors_state(cmd_buffer, bind_point);
descriptors_state->dirty |= descriptors_state->valid;
if (descriptors_state->dynamic_offset_count)
descriptors_state->dirty_dynamic = true;
}
static void
@ -8642,7 +8651,6 @@ radv_CmdBindPipeline(VkCommandBuffer commandBuffer, VkPipelineBindPoint pipeline
if (cmd_buffer->state.compute_pipeline == compute_pipeline)
return;
radv_mark_descriptors_dirty(cmd_buffer, pipelineBindPoint);
radv_bind_shader(cmd_buffer, compute_pipeline->base.shaders[MESA_SHADER_COMPUTE], MESA_SHADER_COMPUTE);
@ -8656,7 +8664,6 @@ radv_CmdBindPipeline(VkCommandBuffer commandBuffer, VkPipelineBindPoint pipeline
if (cmd_buffer->state.rt_pipeline == rt_pipeline)
return;
radv_mark_descriptors_dirty(cmd_buffer, pipelineBindPoint);
radv_bind_shader(cmd_buffer, rt_pipeline->base.base.shaders[MESA_SHADER_INTERSECTION], MESA_SHADER_INTERSECTION);
radv_bind_rt_prolog(cmd_buffer, rt_pipeline->prolog);
@ -8690,7 +8697,6 @@ radv_CmdBindPipeline(VkCommandBuffer commandBuffer, VkPipelineBindPoint pipeline
if (cmd_buffer->state.graphics_pipeline == graphics_pipeline)
return;
radv_mark_descriptors_dirty(cmd_buffer, pipelineBindPoint);
radv_foreach_stage (
stage, (cmd_buffer->state.active_stages | graphics_pipeline->active_stages) & RADV_GRAPHICS_STAGE_BITS) {
@ -8744,6 +8750,8 @@ radv_CmdBindPipeline(VkCommandBuffer commandBuffer, VkPipelineBindPoint pipeline
cmd_buffer->descriptors[vk_to_bind_point(pipelineBindPoint)].dynamic_offset_count = pipeline->dynamic_offset_count;
cmd_buffer->descriptors[vk_to_bind_point(pipelineBindPoint)].need_indirect_descriptors =
pipeline->need_indirect_descriptors;
radv_mark_descriptors_dirty(cmd_buffer, pipelineBindPoint);
}
VKAPI_ATTR void VKAPI_CALL
@ -11688,7 +11696,7 @@ radv_emit_cb_render_state(struct radv_cmd_buffer *cmd_buffer)
const struct radv_rendering_state *render = &cmd_buffer->state.render;
const struct radv_dynamic_state *d = &cmd_buffer->state.dynamic;
unsigned cb_blend_control[MAX_RTS], sx_mrt_blend_opt[MAX_RTS];
const bool mrt0_is_dual_src = d->blend_eq.mrt0_is_dual_src;
const bool mrt0_is_dual_src = radv_is_dual_src_enabled(&cmd_buffer->state.dynamic);
uint32_t cb_color_control = 0;
const uint32_t cb_target_mask = d->color_write_enable & d->color_write_mask;

View file

@ -792,6 +792,8 @@ init_dispatch_tables(struct radv_device *device, struct radv_physical_device *pd
add_entrypoints(&b, &rage2_device_entrypoints, RADV_APP_DISPATCH_TABLE);
} else if (!strcmp(instance->drirc.debug.app_layer, "quanticdream")) {
add_entrypoints(&b, &quantic_dream_device_entrypoints, RADV_APP_DISPATCH_TABLE);
} else if (!strcmp(instance->drirc.debug.app_layer, "no_mans_sky")) {
add_entrypoints(&b, &no_mans_sky_device_entrypoints, RADV_APP_DISPATCH_TABLE);
}
if (instance->vk.trace_mode & RADV_TRACE_MODE_RGP)

View file

@ -173,6 +173,7 @@ static const driOptionDescription radv_dri_options[] = {
DRI_CONF_VK_LOWER_TERMINATE_TO_DISCARD(false)
DRI_CONF_VK_WSI_FORCE_BGRA8_UNORM_FIRST(false)
DRI_CONF_VK_WSI_FORCE_SWAPCHAIN_TO_CURRENT_EXTENT(false)
DRI_CONF_VK_WSI_DISABLE_UNORDERED_SUBMITS(false)
DRI_CONF_VK_X11_IGNORE_SUBOPTIMAL(false)
DRI_CONF_VK_REQUIRE_ETC2(false)
DRI_CONF_VK_REQUIRE_ASTC(false)
@ -200,6 +201,7 @@ static const driOptionDescription radv_dri_options[] = {
DRI_CONF_RADV_EMULATE_RT(false)
DRI_CONF_RADV_ENABLE_FLOAT16_GFX8(false)
DRI_CONF_RADV_COOPERATIVE_MATRIX2_NV(false)
DRI_CONF_RADV_NO_IMPLICIT_VARYING_SUBGROUP_SIZE(false)
DRI_CONF_SECTION_END
};
// clang-format on
@ -236,6 +238,8 @@ radv_init_dri_debug_options(struct radv_instance *instance)
drirc->debug.ssbo_non_uniform = driQueryOptionb(&drirc->options, "radv_ssbo_non_uniform");
drirc->debug.tex_non_uniform = driQueryOptionb(&drirc->options, "radv_tex_non_uniform");
drirc->debug.zero_vram = driQueryOptionb(&drirc->options, "radv_zero_vram");
drirc->debug.no_implicit_varying_subgroup_size =
driQueryOptionb(&drirc->options, "radv_no_implicit_varying_subgroup_size");
drirc->debug.app_layer = driQueryOptionstr(&drirc->options, "radv_app_layer");
drirc->debug.override_uniform_offset_alignment =

View file

@ -57,6 +57,7 @@ struct radv_drirc {
bool ssbo_non_uniform;
bool tex_non_uniform;
bool zero_vram;
bool no_implicit_varying_subgroup_size;
char *app_layer;
int override_uniform_offset_alignment;
} debug;

View file

@ -252,6 +252,7 @@ radv_physical_device_init_cache_key(struct radv_physical_device *pdev)
key->use_llvm = pdev->use_llvm;
key->use_ngg = pdev->use_ngg;
key->use_ngg_culling = pdev->use_ngg_culling;
key->no_implicit_varying_subgroup_size = instance->drirc.debug.no_implicit_varying_subgroup_size;
}
static int

View file

@ -64,8 +64,9 @@ struct radv_physical_device_cache_key {
uint32_t use_llvm : 1;
uint32_t use_ngg : 1;
uint32_t use_ngg_culling : 1;
uint32_t no_implicit_varying_subgroup_size : 1;
uint32_t reserved : 10;
uint32_t reserved : 9;
};
enum radv_video_enc_hw_ver {

View file

@ -1247,9 +1247,13 @@ radv_pipeline_report_pso_history(const struct radv_device *device, struct radv_p
case RADV_PIPELINE_RAY_TRACING: {
struct radv_ray_tracing_pipeline *rt_pipeline = radv_pipeline_to_ray_tracing(pipeline);
if (rt_pipeline->prolog)
radv_print_pso_history(pipeline, rt_pipeline->prolog, output);
for (uint32_t i = 0; i < rt_pipeline->stage_count; i++) {
if (pipeline->shaders[MESA_SHADER_INTERSECTION])
radv_print_pso_history(pipeline, pipeline->shaders[MESA_SHADER_INTERSECTION], output);
for (uint32_t i = 0; i < rt_pipeline->non_imported_stage_count; i++) {
const struct radv_shader *shader = rt_pipeline->stages[i].shader;
if (shader)

View file

@ -2383,8 +2383,9 @@ radv_GetQueryPoolResults(VkDevice _device, VkQueryPool queryPool, uint32_t first
break;
}
case VK_QUERY_TYPE_VIDEO_ENCODE_FEEDBACK_KHR: {
const bool write_memory = radv_video_write_memory_supported(pdev) == RADV_VIDEO_WRITE_MEMORY_SUPPORT_FULL;
uint32_t *src32 = (uint32_t *)src;
uint32_t ready_idx = radv_video_write_memory_supported(pdev) ? RADV_ENC_FEEDBACK_STATUS_IDX : 1;
uint32_t ready_idx = write_memory ? RADV_ENC_FEEDBACK_STATUS_IDX : 1;
uint32_t value;
do {
value = p_atomic_read(&src32[ready_idx]);

View file

@ -367,6 +367,10 @@ radv_shader_choose_subgroup_size(struct radv_device *device, nir_shader *nir,
.requiredSubgroupSize = stage_key->subgroup_required_size * 32,
};
/* Do not allow for the SPIR-V 1.6 varying subgroup size rules. */
if (pdev->cache_key.no_implicit_varying_subgroup_size)
spirv_version = 0x10000;
vk_set_subgroup_size(&device->vk, nir, spirv_version, rss_info.requiredSubgroupSize ? &rss_info : NULL,
stage_key->subgroup_allow_varying, stage_key->subgroup_require_full);

View file

@ -191,6 +191,11 @@ declare_vs_input_vgprs(enum amd_gfx_level gfx_level, const struct radv_shader_in
unsigned num_attributes = util_last_bit(info->vs.input_slot_usage_mask);
for (unsigned i = 0; i < num_attributes; i++) {
ac_add_arg(&args->ac, AC_ARG_VGPR, 4, AC_ARG_VALUE, &args->vs_inputs[i]);
/* The vertex shader isn't required to consume all components that are loaded by the prolog
* and it's possible that more VGPRs are written. This specific case is handled at the end
* of the prolog which waits for all pending VMEM loads if needed.
*/
args->ac.args[args->vs_inputs[i].arg_index].pending_vmem = true;
}
}

View file

@ -508,7 +508,9 @@ radv_begin_sqtt(struct radv_queue *queue)
device->sqtt.start_cs[family] = NULL;
}
cs.b = ws->cs_create(ws, radv_queue_ring(queue), false);
radv_init_cmd_stream(&cs, radv_queue_ring(queue));
cs.b = ws->cs_create(ws, cs.hw_ip, false);
if (!cs.b)
return false;
@ -585,7 +587,9 @@ radv_end_sqtt(struct radv_queue *queue)
device->sqtt.stop_cs[family] = NULL;
}
cs.b = ws->cs_create(ws, radv_queue_ring(queue), false);
radv_init_cmd_stream(&cs, radv_queue_ring(queue));
cs.b = ws->cs_create(ws, cs.hw_ip, false);
if (!cs.b)
return false;

View file

@ -149,10 +149,16 @@ radv_vcn_write_memory(struct radv_cmd_buffer *cmd_buffer, uint64_t va, unsigned
struct radv_physical_device *pdev = radv_device_physical(device);
struct rvcn_sq_var sq;
struct radv_cmd_stream *cs = cmd_buffer->cs;
enum radv_video_write_memory_support support = radv_video_write_memory_supported(pdev);
if (!radv_video_write_memory_supported(pdev))
if (support == RADV_VIDEO_WRITE_MEMORY_SUPPORT_NONE)
return;
if (support == RADV_VIDEO_WRITE_MEMORY_SUPPORT_PCIE_ATOMICS) {
fprintf(stderr, "radv: VCN WRITE_MEMORY requires PCIe atomics support. Expect issues "
"if PCIe atomics are not enabled on current device.\n");
}
bool separate_queue = pdev->vid_decode_ip != AMD_IP_VCN_UNIFIED;
if (cmd_buffer->qf == RADV_QUEUE_VIDEO_DEC && separate_queue && pdev->vid_dec_reg.data2) {
radeon_check_space(device->ws, cs->b, 8);
@ -819,6 +825,32 @@ radv_GetPhysicalDeviceVideoCapabilitiesKHR(VkPhysicalDevice physicalDevice, cons
if (cap && !cap->valid)
cap = NULL;
if (cap) {
pCapabilities->maxCodedExtent.width = cap->max_width;
pCapabilities->maxCodedExtent.height = cap->max_height;
} else {
switch (pVideoProfile->videoCodecOperation) {
case VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR:
pCapabilities->maxCodedExtent.width = (pdev->info.family < CHIP_TONGA) ? 2048 : 4096;
pCapabilities->maxCodedExtent.height = (pdev->info.family < CHIP_TONGA) ? 1152 : 4096;
break;
case VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR:
pCapabilities->maxCodedExtent.width =
(pdev->info.family < CHIP_RENOIR) ? ((pdev->info.family < CHIP_TONGA) ? 2048 : 4096) : 8192;
pCapabilities->maxCodedExtent.height =
(pdev->info.family < CHIP_RENOIR) ? ((pdev->info.family < CHIP_TONGA) ? 1152 : 4096) : 4352;
break;
case VK_VIDEO_CODEC_OPERATION_DECODE_VP9_BIT_KHR:
pCapabilities->maxCodedExtent.width =
(pdev->info.family < CHIP_RENOIR) ? ((pdev->info.family < CHIP_TONGA) ? 2048 : 4096) : 8192;
pCapabilities->maxCodedExtent.height =
(pdev->info.family < CHIP_RENOIR) ? ((pdev->info.family < CHIP_TONGA) ? 1152 : 4096) : 4352;
break;
default:
break;
}
}
pCapabilities->flags = 0;
pCapabilities->pictureAccessGranularity.width = VK_VIDEO_H264_MACROBLOCK_WIDTH;
pCapabilities->pictureAccessGranularity.height = VK_VIDEO_H264_MACROBLOCK_HEIGHT;
@ -1126,32 +1158,6 @@ radv_GetPhysicalDeviceVideoCapabilitiesKHR(VkPhysicalDevice physicalDevice, cons
break;
}
if (cap) {
pCapabilities->maxCodedExtent.width = cap->max_width;
pCapabilities->maxCodedExtent.height = cap->max_height;
} else {
switch (pVideoProfile->videoCodecOperation) {
case VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR:
pCapabilities->maxCodedExtent.width = (pdev->info.family < CHIP_TONGA) ? 2048 : 4096;
pCapabilities->maxCodedExtent.height = (pdev->info.family < CHIP_TONGA) ? 1152 : 4096;
break;
case VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR:
pCapabilities->maxCodedExtent.width =
(pdev->info.family < CHIP_RENOIR) ? ((pdev->info.family < CHIP_TONGA) ? 2048 : 4096) : 8192;
pCapabilities->maxCodedExtent.height =
(pdev->info.family < CHIP_RENOIR) ? ((pdev->info.family < CHIP_TONGA) ? 1152 : 4096) : 4352;
break;
case VK_VIDEO_CODEC_OPERATION_DECODE_VP9_BIT_KHR:
pCapabilities->maxCodedExtent.width =
(pdev->info.family < CHIP_RENOIR) ? ((pdev->info.family < CHIP_TONGA) ? 2048 : 4096) : 8192;
pCapabilities->maxCodedExtent.height =
(pdev->info.family < CHIP_RENOIR) ? ((pdev->info.family < CHIP_TONGA) ? 1152 : 4096) : 4352;
break;
default:
break;
}
}
return VK_SUCCESS;
}
@ -1746,8 +1752,10 @@ get_h265_msg(struct radv_device *device, struct radv_video_session *vid, struct
result.bit_depth_luma_minus8 = sps->bit_depth_luma_minus8;
result.bit_depth_chroma_minus8 = sps->bit_depth_chroma_minus8;
result.log2_max_pic_order_cnt_lsb_minus4 = sps->log2_max_pic_order_cnt_lsb_minus4;
if (sps->pDecPicBufMgr) {
result.sps_max_dec_pic_buffering_minus1 =
sps->pDecPicBufMgr->max_dec_pic_buffering_minus1[sps->sps_max_sub_layers_minus1];
}
result.log2_min_luma_coding_block_size_minus3 = sps->log2_min_luma_coding_block_size_minus3;
result.log2_diff_max_min_luma_coding_block_size = sps->log2_diff_max_min_luma_coding_block_size;
result.log2_min_transform_block_size_minus2 = sps->log2_min_luma_transform_block_size_minus2;
@ -1870,8 +1878,7 @@ get_vp9_msg(struct radv_device *device, struct radv_video_session *vid, struct v
memset(&result, 0, sizeof(result));
rvcn_dec_vp9_probs_segment_t *prbs = (rvcn_dec_vp9_probs_segment_t *)(probs_ptr);
if (std_pic_info->flags.segmentation_enabled) {
if (std_pic_info->flags.segmentation_enabled && std_pic_info->pSegmentation) {
for (unsigned i = 0; i < 8; ++i) {
prbs->seg.feature_data[i] = (uint16_t)std_pic_info->pSegmentation->FeatureData[i][0] |
((uint32_t)(std_pic_info->pSegmentation->FeatureData[i][1] & 0xff) << 16) |
@ -1912,12 +1919,12 @@ get_vp9_msg(struct radv_device *device, struct radv_video_session *vid, struct v
result.frame_header_flags |=
(std_pic_info->flags.refresh_frame_context << RDECODE_FRAME_HDR_INFO_VP9_REFRESH_FRAME_CONTEXT_SHIFT) &
RDECODE_FRAME_HDR_INFO_VP9_REFRESH_FRAME_CONTEXT_MASK;
if (std_pic_info->flags.segmentation_enabled) {
assert(std_pic_info->pSegmentation);
result.frame_header_flags |=
(std_pic_info->flags.segmentation_enabled << RDECODE_FRAME_HDR_INFO_VP9_SEGMENTATION_ENABLED_SHIFT) &
RDECODE_FRAME_HDR_INFO_VP9_SEGMENTATION_ENABLED_MASK;
if (std_pic_info->flags.segmentation_enabled && std_pic_info->pSegmentation) {
result.frame_header_flags |= (std_pic_info->pSegmentation->flags.segmentation_update_map
<< RDECODE_FRAME_HDR_INFO_VP9_SEGMENTATION_UPDATE_MAP_SHIFT) &
RDECODE_FRAME_HDR_INFO_VP9_SEGMENTATION_UPDATE_MAP_MASK;
@ -1930,6 +1937,8 @@ get_vp9_msg(struct radv_device *device, struct radv_video_session *vid, struct v
<< RDECODE_FRAME_HDR_INFO_VP9_SEGMENTATION_UPDATE_DATA_SHIFT) &
RDECODE_FRAME_HDR_INFO_VP9_SEGMENTATION_UPDATE_DATA_MASK;
}
if (std_pic_info->pLoopFilter) {
result.frame_header_flags |= (std_pic_info->pLoopFilter->flags.loop_filter_delta_enabled
<< RDECODE_FRAME_HDR_INFO_VP9_MODE_REF_DELTA_ENABLED_SHIFT) &
RDECODE_FRAME_HDR_INFO_VP9_MODE_REF_DELTA_ENABLED_MASK;
@ -1937,6 +1946,7 @@ get_vp9_msg(struct radv_device *device, struct radv_video_session *vid, struct v
result.frame_header_flags |= (std_pic_info->pLoopFilter->flags.loop_filter_delta_update
<< RDECODE_FRAME_HDR_INFO_VP9_MODE_REF_DELTA_UPDATE_SHIFT) &
RDECODE_FRAME_HDR_INFO_VP9_MODE_REF_DELTA_UPDATE_MASK;
}
result.frame_header_flags |=
(std_pic_info->flags.UsePrevFrameMvs << RDECODE_FRAME_HDR_INFO_VP9_USE_PREV_IN_FIND_MV_REFS_SHIFT) &
@ -1949,26 +1959,31 @@ get_vp9_msg(struct radv_device *device, struct radv_video_session *vid, struct v
result.frame_context_idx = std_pic_info->frame_context_idx;
result.reset_frame_context = std_pic_info->reset_frame_context;
uint8_t loop_filter_level = 0;
if (std_pic_info->pLoopFilter) {
loop_filter_level = std_pic_info->pLoopFilter->loop_filter_level;
result.filter_level = std_pic_info->pLoopFilter->loop_filter_level;
result.sharpness_level = std_pic_info->pLoopFilter->loop_filter_sharpness;
}
int shifted = std_pic_info->pLoopFilter->loop_filter_level >= 32;
int shifted = loop_filter_level >= 32;
for (int i = 0; i < (std_pic_info->flags.segmentation_enabled ? 8 : 1); i++) {
const uint8_t seg_lvl_alt_l = 1;
uint8_t lvl;
if (std_pic_info->flags.segmentation_enabled &&
if (std_pic_info->flags.segmentation_enabled && std_pic_info->pSegmentation &&
std_pic_info->pSegmentation->FeatureEnabled[i] & (1 << seg_lvl_alt_l)) {
lvl = std_pic_info->pSegmentation->FeatureData[i][seg_lvl_alt_l];
if (!std_pic_info->pSegmentation->flags.segmentation_abs_or_delta_update)
lvl += std_pic_info->pLoopFilter->loop_filter_level;
lvl += loop_filter_level;
lvl = CLAMP(lvl, 0, 63);
} else {
lvl = std_pic_info->pLoopFilter->loop_filter_level;
lvl = loop_filter_level;
}
if (std_pic_info->pLoopFilter->flags.loop_filter_delta_enabled) {
if (std_pic_info->pLoopFilter && std_pic_info->pLoopFilter->flags.loop_filter_delta_enabled) {
result.lf_adj_level[i][0][0] = result.lf_adj_level[i][0][1] =
CLAMP(lvl + (std_pic_info->pLoopFilter->loop_filter_ref_deltas[0] * (1 << shifted)), 0, 63);
for (int j = 1; j < 4; j++) {
@ -1995,6 +2010,7 @@ get_vp9_msg(struct radv_device *device, struct radv_video_session *vid, struct v
result.log2_tile_rows = std_pic_info->tile_rows_log2;
result.chroma_format = 1;
if (std_pic_info->pColorConfig)
result.bit_depth_luma_minus8 = result.bit_depth_chroma_minus8 = (std_pic_info->pColorConfig->BitDepth - 8);
result.vp9_frame_size = vp9_pic_info->uncompressedHeaderOffset;
@ -2082,16 +2098,20 @@ get_av1_msg(struct radv_device *device, struct radv_video_session *vid, struct v
(pi->flags.allow_high_precision_mv << RDECODE_FRAME_HDR_INFO_AV1_ALLOW_HIGH_PRECISION_MV_SHIFT) &
RDECODE_FRAME_HDR_INFO_AV1_ALLOW_HIGH_PRECISION_MV_MASK;
if (seq_hdr->pColorConfig) {
result.frame_header_flags |=
(seq_hdr->pColorConfig->flags.mono_chrome << RDECODE_FRAME_HDR_INFO_AV1_MONOCHROME_SHIFT) &
RDECODE_FRAME_HDR_INFO_AV1_MONOCHROME_MASK;
}
result.frame_header_flags |= (pi->flags.skip_mode_present << RDECODE_FRAME_HDR_INFO_AV1_SKIP_MODE_FLAG_SHIFT) &
RDECODE_FRAME_HDR_INFO_AV1_SKIP_MODE_FLAG_MASK;
if (pi->pQuantization) {
result.frame_header_flags |=
(pi->pQuantization->flags.using_qmatrix << RDECODE_FRAME_HDR_INFO_AV1_USING_QMATRIX_SHIFT) &
RDECODE_FRAME_HDR_INFO_AV1_USING_QMATRIX_MASK;
}
result.frame_header_flags |=
(seq_hdr->flags.enable_filter_intra << RDECODE_FRAME_HDR_INFO_AV1_ENABLE_FILTER_INTRA_SHIFT) &
@ -2135,6 +2155,7 @@ get_av1_msg(struct radv_device *device, struct radv_video_session *vid, struct v
(pi->flags.force_integer_mv << RDECODE_FRAME_HDR_INFO_AV1_CUR_FRAME_FORCE_INTEGER_MV_SHIFT) &
RDECODE_FRAME_HDR_INFO_AV1_CUR_FRAME_FORCE_INTEGER_MV_MASK;
if (pi->pLoopFilter) {
result.frame_header_flags |=
(pi->pLoopFilter->flags.loop_filter_delta_enabled << RDECODE_FRAME_HDR_INFO_AV1_MODE_REF_DELTA_ENABLED_SHIFT) &
RDECODE_FRAME_HDR_INFO_AV1_MODE_REF_DELTA_ENABLED_MASK;
@ -2142,6 +2163,7 @@ get_av1_msg(struct radv_device *device, struct radv_video_session *vid, struct v
result.frame_header_flags |=
(pi->pLoopFilter->flags.loop_filter_delta_update << RDECODE_FRAME_HDR_INFO_AV1_MODE_REF_DELTA_UPDATE_SHIFT) &
RDECODE_FRAME_HDR_INFO_AV1_MODE_REF_DELTA_UPDATE_MASK;
}
result.frame_header_flags |= (pi->flags.delta_q_present << RDECODE_FRAME_HDR_INFO_AV1_DELTA_Q_PRESENT_FLAG_SHIFT) &
RDECODE_FRAME_HDR_INFO_AV1_DELTA_Q_PRESENT_FLAG_MASK;
@ -2201,6 +2223,8 @@ get_av1_msg(struct radv_device *device, struct radv_video_session *vid, struct v
result.sb_size = seq_hdr->flags.use_128x128_superblock;
result.interp_filter = pi->interpolation_filter;
if (pi->pLoopFilter) {
for (i = 0; i < 2; ++i)
result.filter_level[i] = pi->pLoopFilter->loop_filter_level[i];
result.filter_level_u = pi->pLoopFilter->loop_filter_level[2];
@ -2210,6 +2234,13 @@ get_av1_msg(struct radv_device *device, struct radv_video_session *vid, struct v
result.ref_deltas[i] = pi->pLoopFilter->loop_filter_ref_deltas[i];
for (i = 0; i < 2; ++i)
result.mode_deltas[i] = pi->pLoopFilter->loop_filter_mode_deltas[i];
}
result.qm_y = 0xff;
result.qm_u = 0xff;
result.qm_v = 0xff;
if (pi->pQuantization) {
result.base_qindex = pi->pQuantization->base_q_idx;
result.y_dc_delta_q = pi->pQuantization->DeltaQYDc;
result.u_dc_delta_q = pi->pQuantization->DeltaQUDc;
@ -2221,19 +2252,18 @@ get_av1_msg(struct radv_device *device, struct radv_video_session *vid, struct v
result.qm_y = pi->pQuantization->qm_y | 0xf0;
result.qm_u = pi->pQuantization->qm_u | 0xf0;
result.qm_v = pi->pQuantization->qm_v | 0xf0;
} else {
result.qm_y = 0xff;
result.qm_u = 0xff;
result.qm_v = 0xff;
}
}
result.delta_q_res = (1 << pi->delta_q_res);
result.delta_lf_res = (1 << pi->delta_lf_res);
result.tile_cols = pi->pTileInfo->TileCols;
result.tile_rows = pi->pTileInfo->TileRows;
result.tx_mode = pi->TxMode;
result.reference_mode = (pi->flags.reference_select == 1) ? 2 : 0;
result.chroma_format = seq_hdr->pColorConfig->flags.mono_chrome ? 0 : 1;
if (pi->pTileInfo) {
result.tile_cols = pi->pTileInfo->TileCols;
result.tile_rows = pi->pTileInfo->TileRows;
result.tile_size_bytes = pi->pTileInfo->tile_size_bytes_minus_1;
result.context_update_tile_id = pi->pTileInfo->context_update_tile_id;
@ -2245,6 +2275,7 @@ get_av1_msg(struct radv_device *device, struct radv_video_session *vid, struct v
result.tile_row_start_sb[i] = pi->pTileInfo->pMiRowStarts[i];
result.tile_row_start_sb[result.tile_rows] =
result.tile_row_start_sb[result.tile_rows - 1] + pi->pTileInfo->pHeightInSbsMinus1[result.tile_rows - 1] + 1;
}
result.max_width = seq_hdr->max_frame_width_minus_1 + 1;
result.max_height = seq_hdr->max_frame_height_minus_1 + 1;
@ -2294,8 +2325,7 @@ get_av1_msg(struct radv_device *device, struct radv_video_session *vid, struct v
av1_pic_info->referenceNameSlotIndices[i] == -1 ? 0x7f : av1_pic_info->referenceNameSlotIndices[i];
}
result.bit_depth_luma_minus8 = result.bit_depth_chroma_minus8 = seq_hdr->pColorConfig->BitDepth - 8;
if (pi->pSegmentation) {
int16_t *feature_data = (int16_t *)probs_ptr;
int fd_idx = 0;
for (i = 0; i < 8; ++i) {
@ -2305,14 +2335,17 @@ get_av1_msg(struct radv_device *device, struct radv_video_session *vid, struct v
feature_data[fd_idx++] = result.feature_data[i][j];
}
}
memcpy(((char *)probs_ptr + 128), result.feature_mask, 8);
}
if (pi->pCDEF) {
result.cdef_damping = pi->pCDEF->cdef_damping_minus_3 + 3;
result.cdef_bits = pi->pCDEF->cdef_bits;
for (i = 0; i < 8; ++i) {
result.cdef_strengths[i] = (pi->pCDEF->cdef_y_pri_strength[i] << 2) + pi->pCDEF->cdef_y_sec_strength[i];
result.cdef_uv_strengths[i] = (pi->pCDEF->cdef_uv_pri_strength[i] << 2) + pi->pCDEF->cdef_uv_sec_strength[i];
}
}
if (pi->flags.UsesLr) {
for (int plane = 0; plane < STD_VIDEO_AV1_MAX_NUM_PLANES; plane++) {
@ -2321,10 +2354,14 @@ get_av1_msg(struct radv_device *device, struct radv_video_session *vid, struct v
}
}
if (seq_hdr->pColorConfig) {
result.chroma_format = seq_hdr->pColorConfig->flags.mono_chrome ? 0 : 1;
result.bit_depth_luma_minus8 = result.bit_depth_chroma_minus8 = seq_hdr->pColorConfig->BitDepth - 8;
if (seq_hdr->pColorConfig->BitDepth > 8) {
result.p010_mode = 1;
result.msb_mode = 1;
}
}
result.preskip_segid = 0;
result.last_active_segid = 0;
@ -2355,7 +2392,7 @@ get_av1_msg(struct radv_device *device, struct radv_video_session *vid, struct v
rvcn_dec_film_grain_params_t *fg_params = &result.film_grain;
fg_params->apply_grain = pi->flags.apply_grain;
if (fg_params->apply_grain) {
if (fg_params->apply_grain && pi->pFilmGrain) {
rvcn_dec_av1_fg_init_buf_t *fg_buf = (rvcn_dec_av1_fg_init_buf_t *)((char *)probs_ptr + 256);
fg_params->random_seed = pi->pFilmGrain->grain_seed;
fg_params->grain_scale_shift = pi->pFilmGrain->grain_scale_shift;
@ -2401,11 +2438,13 @@ get_av1_msg(struct radv_device *device, struct radv_video_session *vid, struct v
}
result.uncompressed_header_size = 0;
if (pi->pGlobalMotion) {
for (i = 0; i < STD_VIDEO_AV1_NUM_REF_FRAMES; ++i) {
result.global_motion[i].wmtype = pi->pGlobalMotion->GmType[i];
for (j = 0; j < STD_VIDEO_AV1_GLOBAL_MOTION_PARAMS; ++j)
result.global_motion[i].wmmat[j] = pi->pGlobalMotion->gm_params[i][j];
}
}
for (i = 0; i < av1_pic_info->tileCount && i < 256; ++i) {
result.tile_info[i].offset = av1_pic_info->pTileOffsets[i];
result.tile_info[i].size = av1_pic_info->pTileSizes[i];
@ -2671,8 +2710,8 @@ rvcn_dec_message_decode(struct radv_cmd_buffer *cmd_buffer, struct radv_video_se
* It will not perform any actual writes to these dummy slots.
*/
for (int i = 0; i < STD_VIDEO_AV1_NUM_REF_FRAMES; i++) {
dynamic_dpb_t2->dpbAddrHi[i] = addr;
dynamic_dpb_t2->dpbAddrLo[i] = addr >> 32;
dynamic_dpb_t2->dpbAddrLo[i] = addr;
dynamic_dpb_t2->dpbAddrHi[i] = addr >> 32;
}
}
@ -2918,8 +2957,10 @@ get_uvd_h265_msg(struct radv_device *device, struct radv_video_session *vid, str
result.bit_depth_luma_minus8 = sps->bit_depth_luma_minus8;
result.bit_depth_chroma_minus8 = sps->bit_depth_chroma_minus8;
result.log2_max_pic_order_cnt_lsb_minus4 = sps->log2_max_pic_order_cnt_lsb_minus4;
if (sps->pDecPicBufMgr) {
result.sps_max_dec_pic_buffering_minus1 =
sps->pDecPicBufMgr->max_dec_pic_buffering_minus1[sps->sps_max_sub_layers_minus1];
}
result.log2_min_luma_coding_block_size_minus3 = sps->log2_min_luma_coding_block_size_minus3;
result.log2_diff_max_min_luma_coding_block_size = sps->log2_diff_max_min_luma_coding_block_size;
result.log2_min_transform_block_size_minus2 = sps->log2_min_luma_transform_block_size_minus2;

View file

@ -73,6 +73,19 @@ struct radv_video_session {
bool session_initialized;
};
/**
* WRITE_MEMORY support in FW.
*
* none: Not supported at all. Old VCN FW and all UVD.
* pcie_atomics: Supported, relies on PCIe atomics.
* full: Supported, works also without PCIe atomics.
*/
enum radv_video_write_memory_support {
RADV_VIDEO_WRITE_MEMORY_SUPPORT_NONE = 0,
RADV_VIDEO_WRITE_MEMORY_SUPPORT_PCIE_ATOMICS,
RADV_VIDEO_WRITE_MEMORY_SUPPORT_FULL,
};
VK_DEFINE_NONDISP_HANDLE_CASTS(radv_video_session, vk.base, VkVideoSessionKHR, VK_OBJECT_TYPE_VIDEO_SESSION_KHR)
void radv_init_physical_device_decoder(struct radv_physical_device *pdev);
@ -98,7 +111,7 @@ void radv_video_get_enc_dpb_image(struct radv_device *device, const struct VkVid
bool radv_video_decode_vp9_supported(const struct radv_physical_device *pdev);
bool radv_video_encode_av1_supported(const struct radv_physical_device *pdev);
bool radv_video_encode_qp_map_supported(const struct radv_physical_device *pdev);
bool radv_video_write_memory_supported(const struct radv_physical_device *pdev);
enum radv_video_write_memory_support radv_video_write_memory_supported(const struct radv_physical_device *pdev);
uint32_t radv_video_get_qp_map_texel_size(VkVideoCodecOperationFlagBitsKHR codec);
bool radv_check_vcn_fw_version(const struct radv_physical_device *pdev, uint32_t dec, uint32_t enc, uint32_t rev);

View file

@ -41,7 +41,7 @@
#define ENC_ALIGNMENT 256
#define RENCODE_V5_FW_INTERFACE_MAJOR_VERSION 1
#define RENCODE_V5_FW_INTERFACE_MINOR_VERSION 3
#define RENCODE_V5_FW_INTERFACE_MINOR_VERSION 10
#define RENCODE_V4_FW_INTERFACE_MAJOR_VERSION 1
#define RENCODE_V4_FW_INTERFACE_MINOR_VERSION 11
@ -67,31 +67,6 @@ radv_probe_video_encode(struct radv_physical_device *pdev)
if (instance->debug_flags & RADV_DEBUG_NO_VIDEO)
return;
if (pdev->info.vcn_ip_version >= VCN_5_0_0) {
pdev->video_encode_enabled = true;
return;
} else if (pdev->info.vcn_ip_version >= VCN_4_0_0) {
if (pdev->info.vcn_enc_major_version != RENCODE_V4_FW_INTERFACE_MAJOR_VERSION)
return;
if (pdev->info.vcn_enc_minor_version < RENCODE_V4_FW_INTERFACE_MINOR_VERSION)
return;
} else if (pdev->info.vcn_ip_version >= VCN_3_0_0) {
if (pdev->info.vcn_enc_major_version != RENCODE_V3_FW_INTERFACE_MAJOR_VERSION)
return;
if (pdev->info.vcn_enc_minor_version < RENCODE_V3_FW_INTERFACE_MINOR_VERSION)
return;
} else if (pdev->info.vcn_ip_version >= VCN_2_0_0) {
if (pdev->info.vcn_enc_major_version != RENCODE_V2_FW_INTERFACE_MAJOR_VERSION)
return;
if (pdev->info.vcn_enc_minor_version < RENCODE_V2_FW_INTERFACE_MINOR_VERSION)
return;
} else {
if (pdev->info.vcn_enc_major_version != RENCODE_FW_INTERFACE_MAJOR_VERSION)
return;
if (pdev->info.vcn_enc_minor_version < RENCODE_FW_INTERFACE_MINOR_VERSION)
return;
}
/* WRITE_MEMORY is needed for SetEvent and is required to pass CTS */
if (radv_video_write_memory_supported(pdev)) {
pdev->video_encode_enabled = true;
@ -495,10 +470,10 @@ radv_enc_session_init(struct radv_cmd_buffer *cmd_buffer, const struct VkVideoEn
if (pdev->enc_hw_ver >= RADV_VIDEO_ENC_HW_3)
RADEON_ENC_CS(vid->enc_session.slice_output_enabled);
RADEON_ENC_CS(vid->enc_session.display_remote);
if (pdev->enc_hw_ver == RADV_VIDEO_ENC_HW_4) {
if (pdev->enc_hw_ver == RADV_VIDEO_ENC_HW_4)
RADEON_ENC_CS(vid->enc_session.WA_flags);
if (pdev->enc_hw_ver >= RADV_VIDEO_ENC_HW_4)
RADEON_ENC_CS(0);
}
RADEON_ENC_END();
}
@ -890,7 +865,6 @@ radv_enc_slice_header(struct radv_cmd_buffer *cmd_buffer, const VkVideoEncodeInf
uint32_t num_bits[RENCODE_SLICE_HEADER_TEMPLATE_MAX_NUM_INSTRUCTIONS] = {0};
const struct VkVideoEncodeH264PictureInfoKHR *h264_picture_info =
vk_find_struct_const(enc_info->pNext, VIDEO_ENCODE_H264_PICTURE_INFO_KHR);
int slice_count = h264_picture_info->naluSliceEntryCount;
const StdVideoEncodeH264PictureInfo *pic = h264_picture_info->pStdPictureInfo;
const StdVideoH264SequenceParameterSet *sps =
vk_video_find_h264_enc_std_sps(cmd_buffer->video.params, pic->seq_parameter_set_id);
@ -903,8 +877,6 @@ radv_enc_slice_header(struct radv_cmd_buffer *cmd_buffer, const VkVideoEncodeInf
unsigned int cdw_filled = 0;
unsigned int bits_copied = 0;
assert(slice_count <= 1);
struct radv_device *device = radv_cmd_buffer_device(cmd_buffer);
const struct radv_physical_device *pdev = radv_device_physical(device);
struct radv_cmd_stream *cs = cmd_buffer->cs;
@ -1080,6 +1052,9 @@ radv_enc_hevc_st_ref_pic_set(struct radv_cmd_buffer *cmd_buffer, const StdVideoH
unsigned int num_short_term_ref_pic_sets = sps->num_short_term_ref_pic_sets;
unsigned int index = num_short_term_ref_pic_sets;
if (!rps)
return 0;
if (index != 0)
radv_enc_code_fixed_bits(cmd_buffer, rps->flags.inter_ref_pic_set_prediction_flag, 0x1);
@ -2275,6 +2250,7 @@ radv_enc_params_av1(struct radv_cmd_buffer *cmd_buffer, const VkVideoEncodeInfoK
RADEON_ENC_CS(av1_picture_info->referenceNameSlotIndices[i]);
RADEON_ENC_CS(slot_idx_0);
RADEON_ENC_CS(slot_idx_1);
RADEON_ENC_CS(av1_picture_info->pStdPictureInfo->order_hint);
RADEON_ENC_END();
}
@ -2792,7 +2768,7 @@ radv_vcn_encode_video(struct radv_cmd_buffer *cmd_buffer, const VkVideoEncodeInf
cmd_buffer->video.enc.total_task_size = 0;
// task info
radv_enc_task_info(cmd_buffer, true);
radv_enc_task_info(cmd_buffer, feedback_query_va);
if (vid->enc_need_begin) {
begin(cmd_buffer, enc_info);
@ -2861,6 +2837,7 @@ radv_vcn_encode_video(struct radv_cmd_buffer *cmd_buffer, const VkVideoEncodeInf
if (pdev->enc_hw_ver >= RADV_VIDEO_ENC_HW_2) {
radv_vcn_sq_tail(cs, &cmd_buffer->video.sq);
if (feedback_query_va && radv_video_write_memory_supported(pdev) == RADV_VIDEO_WRITE_MEMORY_SUPPORT_FULL)
radv_vcn_write_memory(cmd_buffer, feedback_query_va + RADV_ENC_FEEDBACK_STATUS_IDX * sizeof(uint32_t), 1);
}
}
@ -3166,6 +3143,36 @@ radv_video_patch_encode_session_parameters(struct radv_device *device, struct vk
}
break;
case VK_VIDEO_CODEC_OPERATION_ENCODE_H265_BIT_KHR: {
for (unsigned i = 0; i < params->h265_enc.h265_sps_count; i++) {
uint32_t pic_width_in_luma_samples =
params->h265_enc.h265_sps[i].base.pic_width_in_luma_samples;
uint32_t pic_height_in_luma_samples =
params->h265_enc.h265_sps[i].base.pic_height_in_luma_samples;
uint32_t aligned_pic_width = align(pic_width_in_luma_samples, 64);
uint32_t aligned_pic_height = align(pic_height_in_luma_samples, 16);
/* Override the unaligned pic_{width,height} and make up for it with conformance window
* cropping */
params->h265_enc.h265_sps[i].base.pic_width_in_luma_samples = aligned_pic_width;
params->h265_enc.h265_sps[i].base.pic_height_in_luma_samples = aligned_pic_height;
if (aligned_pic_width != pic_width_in_luma_samples ||
aligned_pic_height != pic_height_in_luma_samples) {
params->h265_enc.h265_sps[i].base.flags.conformance_window_flag = 1;
params->h265_enc.h265_sps[i].base.conf_win_right_offset +=
(aligned_pic_width - pic_width_in_luma_samples) / 2;
params->h265_enc.h265_sps[i].base.conf_win_bottom_offset +=
(aligned_pic_height - pic_height_in_luma_samples) / 2;
}
/* VCN supports only the following block sizes (resulting in 64x64 CTBs with any coding
* block size) */
params->h265_enc.h265_sps[i].base.log2_min_luma_coding_block_size_minus3 = 0;
params->h265_enc.h265_sps[i].base.log2_diff_max_min_luma_coding_block_size = 3;
params->h265_enc.h265_sps[i].base.log2_min_luma_transform_block_size_minus2 = 0;
params->h265_enc.h265_sps[i].base.log2_diff_max_min_luma_transform_block_size = 3;
}
for (unsigned i = 0; i < params->h265_enc.h265_pps_count; i++) {
/* cu_qp_delta needs to be enabled if rate control is enabled. VCN2 and newer can also enable
* it with rate control disabled. Since we don't know what rate control will be used, we
@ -3268,6 +3275,14 @@ radv_GetEncodedVideoSessionParametersKHR(VkDevice device,
assert(sps);
char *data_ptr = pData ? (char *)pData + vps_size : NULL;
vk_video_encode_h265_sps(sps, size_limit, &sps_size, data_ptr);
if (pFeedbackInfo) {
struct VkVideoEncodeH265SessionParametersFeedbackInfoKHR *h265_feedback_info =
vk_find_struct(pFeedbackInfo->pNext, VIDEO_ENCODE_H265_SESSION_PARAMETERS_FEEDBACK_INFO_KHR);
pFeedbackInfo->hasOverrides = VK_TRUE;
if (h265_feedback_info)
h265_feedback_info->hasStdSPSOverrides = VK_TRUE;
}
}
if (h265_get_info->writeStdPPS) {
const StdVideoH265PictureParameterSet *pps = vk_video_find_h265_enc_std_pps(templ, h265_get_info->stdPPSId);
@ -3421,17 +3436,20 @@ radv_video_encode_qp_map_supported(const struct radv_physical_device *pdev)
return true;
}
bool
enum radv_video_write_memory_support
radv_video_write_memory_supported(const struct radv_physical_device *pdev)
{
if (pdev->info.vcn_ip_version >= VCN_5_0_0)
return true;
else if (pdev->info.vcn_ip_version >= VCN_4_0_0)
return pdev->info.vcn_enc_minor_version >= 22;
else if (pdev->info.vcn_ip_version >= VCN_3_0_0)
return pdev->info.vcn_enc_minor_version >= 33;
else if (pdev->info.vcn_ip_version >= VCN_2_0_0)
return pdev->info.vcn_enc_minor_version >= 24;
else /* VCN 1 and UVD */
return false;
if (pdev->info.vcn_ip_version >= VCN_5_0_0) {
return RADV_VIDEO_WRITE_MEMORY_SUPPORT_PCIE_ATOMICS;
} else if (pdev->info.vcn_ip_version >= VCN_4_0_0) {
if (pdev->info.vcn_enc_minor_version >= 22)
return RADV_VIDEO_WRITE_MEMORY_SUPPORT_PCIE_ATOMICS;
} else if (pdev->info.vcn_ip_version >= VCN_3_0_0) {
if (pdev->info.vcn_enc_minor_version >= 33)
return RADV_VIDEO_WRITE_MEMORY_SUPPORT_PCIE_ATOMICS;
} else if (pdev->info.vcn_ip_version >= VCN_2_0_0) {
if (pdev->info.vcn_enc_minor_version >= 24)
return RADV_VIDEO_WRITE_MEMORY_SUPPORT_PCIE_ATOMICS;
}
return RADV_VIDEO_WRITE_MEMORY_SUPPORT_NONE;
}

View file

@ -158,6 +158,11 @@ radv_null_winsys_query_info(struct radeon_winsys *rws, struct radeon_info *gpu_i
gpu_info->family == CHIP_RAVEN2 || gpu_info->family == CHIP_RENOIR || gpu_info->gfx_level >= GFX10_3);
gpu_info->has_gang_submit = true;
gpu_info->mesh_fast_launch_2 = gpu_info->gfx_level >= GFX11;
gpu_info->hs_offchip_workgroup_dw_size = gpu_info->family == CHIP_HAWAII ? 4096 : 8192;
gpu_info->has_ls_vgpr_init_bug = gpu_info->family == CHIP_VEGA10 || gpu_info->family == CHIP_RAVEN;
gpu_info->has_graphics = true;
gpu_info->ip[AMD_IP_GFX].num_queues = 1;
gpu_info->gart_page_size = 4096;
}

File diff suppressed because it is too large Load diff

View file

@ -24,7 +24,8 @@ ail_initialize_linear(struct ail_layout *layout)
layout->layer_stride_B = align64(
(uint64_t)layout->linear_stride_B * layout->height_px, AIL_CACHELINE);
layout->size_B = layout->layer_stride_B * layout->depth_px;
layout->size_B =
layout->level_offsets_B[0] + (layout->layer_stride_B * layout->depth_px);
}
/*
@ -341,6 +342,7 @@ ail_make_miptree(struct ail_layout *layout)
assert(layout->linear_stride_B == 0 && "Invalid nonlinear layout");
assert(layout->levels >= 1 && "Invalid dimensions");
assert(layout->sample_count_sa >= 1 && "Invalid sample count");
assert(layout->level_offsets_B[0] == 0 && "Invalid offset");
}
assert(!(layout->writeable_image && layout->compressed) &&

View file

@ -133,6 +133,7 @@ agx_virtio_bo_bind(struct agx_device *dev, struct drm_asahi_gem_bind_op *ops,
memcpy(req->payload, ops, payload_size);
int ret = vdrm_send_req(dev->vdrm, &req->hdr, false);
free(req);
if (ret) {
fprintf(stderr, "ASAHI_CCMD_GEM_BIND failed: %d\n", ret);
}

View file

@ -992,12 +992,11 @@ hk_CmdEndRendering(VkCommandBuffer commandBuffer)
}
}
static uint64_t
hk_heap(struct hk_cmd_buffer *cmd)
{
static void
hk_init_heap(const void *data) {
struct hk_cmd_buffer *cmd = (struct hk_cmd_buffer *) data;
struct hk_device *dev = hk_cmd_buffer_device(cmd);
if (unlikely(!dev->heap)) {
perf_debug(cmd, "Allocating heap");
size_t size = 128 * 1024 * 1024;
@ -1013,7 +1012,14 @@ hk_heap(struct hk_cmd_buffer *cmd)
.base = dev->heap->va->addr,
.size = size,
};
}
}
static uint64_t
hk_heap(struct hk_cmd_buffer *cmd)
{
struct hk_device *dev = hk_cmd_buffer_device(cmd);
util_call_once_data(&dev->heap_init_once, hk_init_heap, cmd);
/* We need to free all allocations after each command buffer execution */
if (!cmd->uses_heap) {

View file

@ -330,6 +330,7 @@ hk_GetDescriptorSetLayoutSupport(
uint64_t non_variable_size = 0;
uint32_t variable_stride = 0;
uint32_t variable_count = 0;
bool variable_is_inline_uniform_block = false;
uint8_t dynamic_buffer_count = 0;
for (uint32_t i = 0; i < pCreateInfo->bindingCount; i++) {
@ -362,6 +363,10 @@ hk_GetDescriptorSetLayoutSupport(
*/
variable_count = MAX2(1, binding->descriptorCount);
variable_stride = stride;
variable_is_inline_uniform_block =
binding->descriptorType ==
VK_DESCRIPTOR_TYPE_INLINE_UNIFORM_BLOCK;
} else {
/* Since we're aligning to the maximum and since this is just a
* check for whether or not the max buffer size is big enough, we
@ -393,12 +398,21 @@ hk_GetDescriptorSetLayoutSupport(
switch (ext->sType) {
case VK_STRUCTURE_TYPE_DESCRIPTOR_SET_VARIABLE_DESCRIPTOR_COUNT_LAYOUT_SUPPORT: {
VkDescriptorSetVariableDescriptorCountLayoutSupport *vs = (void *)ext;
uint32_t max_var_count;
if (variable_stride > 0) {
vs->maxVariableDescriptorCount =
max_var_count =
(max_buffer_size - non_variable_size) / variable_stride;
} else {
vs->maxVariableDescriptorCount = 0;
max_var_count = 0;
}
if (variable_is_inline_uniform_block) {
max_var_count =
MIN2(max_var_count, HK_MAX_INLINE_UNIFORM_BLOCK_SIZE);
}
vs->maxVariableDescriptorCount = max_var_count;
break;
}

View file

@ -92,6 +92,7 @@ struct hk_device {
* expected to be a legitimate problem. If it is, we can rework later.
*/
struct agx_bo *heap;
util_once_flag heap_init_once;
struct {
struct agx_scratch vs, fs, cs;

View file

@ -67,7 +67,7 @@ get_drm_format_modifier_properties_list(
{
*out_props = (VkDrmFormatModifierPropertiesEXT){
.drmFormatModifier = mod,
.drmFormatModifierPlaneCount = 1 /* no planar mods */,
.drmFormatModifierPlaneCount = vk_format_get_plane_count(vk_format),
.drmFormatModifierTilingFeatures = flags,
};
};
@ -96,7 +96,7 @@ get_drm_format_modifier_properties_list_2(
{
*out_props = (VkDrmFormatModifierProperties2EXT){
.drmFormatModifier = mod,
.drmFormatModifierPlaneCount = 1, /* no planar mods */
.drmFormatModifierPlaneCount = vk_format_get_plane_count(vk_format),
.drmFormatModifierTilingFeatures = flags,
};
};

View file

@ -1424,6 +1424,13 @@ hk_copy_memory_to_image(struct hk_device *device, struct hk_image *dst_image,
uint32_t src_height = info->memoryImageHeight ?: extent.height;
uint32_t blocksize_B = util_format_get_blocksize(layout->format);
/* Align width and height to block */
src_width =
DIV_ROUND_UP(src_width, util_format_get_blockwidth(layout->format));
src_height =
DIV_ROUND_UP(src_height, util_format_get_blockheight(layout->format));
uint32_t src_pitch = src_width * blocksize_B;
unsigned start_layer = (dst_image->vk.image_type == VK_IMAGE_TYPE_3D)
@ -1496,6 +1503,13 @@ hk_copy_image_to_memory(struct hk_device *device, struct hk_image *src_image,
#endif
uint32_t blocksize_B = util_format_get_blocksize(layout->format);
/* Align width and height to block */
dst_width =
DIV_ROUND_UP(dst_width, util_format_get_blockwidth(layout->format));
dst_height =
DIV_ROUND_UP(dst_height, util_format_get_blockheight(layout->format));
uint32_t dst_pitch = dst_width * blocksize_B;
unsigned start_layer = (src_image->vk.image_type == VK_IMAGE_TYPE_3D)
@ -1649,11 +1663,6 @@ hk_copy_image_to_image_cpu(struct hk_device *device, struct hk_image *src_image,
&device->physical_device->ubwc_config);
#endif
} else {
/* Work tile-by-tile, holding the unswizzled tile in a temporary
* buffer.
*/
char temp_tile[16384];
unsigned src_level = info->srcSubresource.mipLevel;
unsigned dst_level = info->dstSubresource.mipLevel;
uint32_t block_width = src_layout->tilesize_el[src_level].width_el;
@ -1667,6 +1676,12 @@ hk_copy_image_to_image_cpu(struct hk_device *device, struct hk_image *src_image,
}
uint32_t temp_pitch = block_width * src_block_B;
size_t temp_tile_size = temp_pitch * (src_offset.y + extent.height);
/* Work tile-by-tile, holding the unswizzled tile in a temporary
* buffer.
*/
char *temp_tile = malloc(temp_tile_size);
for (unsigned by = src_offset.y / block_height;
by * block_height < src_offset.y + extent.height; by++) {
@ -1683,14 +1698,14 @@ hk_copy_image_to_image_cpu(struct hk_device *device, struct hk_image *src_image,
MIN2((bx + 1) * block_width, src_offset.x + extent.width) -
src_x_start;
assert(height * temp_pitch <= ARRAY_SIZE(temp_tile));
ail_detile((void *)src, temp_tile, src_layout, src_level,
temp_pitch, src_x_start, src_y_start, width, height);
ail_tile(dst, temp_tile, dst_layout, dst_level, temp_pitch,
dst_x_start, dst_y_start, width, height);
}
}
free(temp_tile);
}
}
}

View file

@ -859,7 +859,7 @@ hk_get_device_properties(const struct agx_device *dev,
.maxSubgroupSize = 32,
.maxComputeWorkgroupSubgroups = 1024 / 32,
.requiredSubgroupSizeStages = 0,
.maxInlineUniformBlockSize = 1 << 16,
.maxInlineUniformBlockSize = HK_MAX_INLINE_UNIFORM_BLOCK_SIZE,
.maxPerStageDescriptorInlineUniformBlocks = 32,
.maxPerStageDescriptorUpdateAfterBindInlineUniformBlocks = 32,
.maxDescriptorSetInlineUniformBlocks = 6 * 32,
@ -953,7 +953,7 @@ hk_get_device_properties(const struct agx_device *dev,
.robustUniformBufferAccessSizeAlignment = HK_MIN_UBO_ALIGNMENT,
/* VK_EXT_sample_locations */
.sampleLocationSampleCounts = sample_counts,
.sampleLocationSampleCounts = sample_counts & ~VK_SAMPLE_COUNT_1_BIT,
.maxSampleLocationGridSize = (VkExtent2D){1, 1},
.sampleLocationCoordinateRange[0] = 0.0f,
.sampleLocationCoordinateRange[1] = 0.9375f,

View file

@ -23,6 +23,7 @@
#define HK_MAX_DESCRIPTOR_SIZE 64
#define HK_MAX_PUSH_DESCRIPTORS 32
#define HK_MAX_DESCRIPTOR_SET_SIZE (1u << 30)
#define HK_MAX_INLINE_UNIFORM_BLOCK_SIZE (1u << 16)
#define HK_MAX_DESCRIPTORS (1 << 20)
#define HK_PUSH_DESCRIPTOR_SET_SIZE \
(HK_MAX_PUSH_DESCRIPTORS * HK_MAX_DESCRIPTOR_SIZE)

View file

@ -812,11 +812,6 @@ queue_submit(struct hk_device *dev, struct hk_queue *queue,
/* Now setup the command structs */
struct util_dynarray payload;
util_dynarray_init(&payload, NULL);
union drm_asahi_cmd *cmds = malloc(sizeof(*cmds) * command_count);
if (cmds == NULL) {
free(cmds);
return vk_error(dev, VK_ERROR_OUT_OF_HOST_MEMORY);
}
unsigned nr_vdm = 0, nr_cdm = 0;

View file

@ -1582,9 +1582,7 @@ enumerate_devices(struct vk_instance *vk_instance)
break;
}
assert(primary_fd >= 0);
if (render_fd < 0)
if (render_fd < 0 || primary_fd < 0)
result = VK_ERROR_INCOMPATIBLE_DRIVER;
else
result = create_physical_device(instance, primary_fd, render_fd, display_fd);

View file

@ -46,12 +46,13 @@ impl_thrd_routine(void *p)
/*--------------- 7.25.2 Initialization functions ---------------*/
// 7.25.2.1
#ifndef __once_flag_defined
void
call_once(once_flag *flag, void (*func)(void))
{
pthread_once(flag, func);
}
#endif
/*------------- 7.25.3 Condition variable functions -------------*/
// 7.25.3.1

View file

@ -118,8 +118,10 @@ typedef pthread_cond_t cnd_t;
typedef pthread_t thrd_t;
typedef pthread_key_t tss_t;
typedef pthread_mutex_t mtx_t;
#ifndef __once_flag_defined
typedef pthread_once_t once_flag;
# define ONCE_FLAG_INIT PTHREAD_ONCE_INIT
#endif
# ifdef PTHREAD_DESTRUCTOR_ITERATIONS
# define TSS_DTOR_ITERATIONS PTHREAD_DESTRUCTOR_ITERATIONS
# else

View file

@ -139,6 +139,7 @@ struct link_uniform_block_active {
bool has_instance_name;
bool has_binding;
bool is_shader_storage;
bool block_index_assigned;
};
/*
@ -1197,14 +1198,32 @@ link_linked_shader_uniform_blocks(void *mem_ctx,
if (!prog->data->spirv) {
hash_table_foreach(block_hash, entry) {
/* Assign block indices in the order they appear in the shader. We could
* just loop over the hash table and this would be spec compiliant
* however some games seem to incorrectly assume they know the correct
* index without checking. So to avoid debugging strange issues anytime
* the hash table is modified and the order changes we use this
* predictable index allocation instead.
*/
nir_foreach_variable_in_shader(var, shader->Program->nir) {
if (block_type == BLOCK_UBO && !nir_variable_is_in_ubo(var))
continue;
if (block_type == BLOCK_SSBO && !nir_variable_is_in_ssbo(var))
continue;
const struct hash_entry *entry =
_mesa_hash_table_search(block_hash,
glsl_get_type_name(var->interface_type));
struct link_uniform_block_active *const b =
(struct link_uniform_block_active *) entry->data;
if (b->block_index_assigned)
continue;
const struct glsl_type *blk_type =
glsl_without_array(b->var->type) == b->var->interface_type ?
b->var->type : b->var->interface_type;
if (glsl_type_is_array(blk_type)) {
char *name =
ralloc_strdup(NULL,
@ -1221,6 +1240,7 @@ link_linked_shader_uniform_blocks(void *mem_ctx,
variables, &variable_index, 0, 0, prog, shader->Stage,
block_type);
}
b->block_index_assigned = true;
}
} else {
nir_foreach_variable_in_shader(var, shader->Program->nir) {

View file

@ -28,10 +28,16 @@ glcpp_lex = custom_target(
command : [prog_flex, '-o', '@OUTPUT@', '@INPUT@'],
)
glcpp_header_gen_deps = declare_dependency(
sources : [
glcpp_parse[1],
],
)
libglcpp = static_library(
'glcpp',
[glcpp_lex, glcpp_parse, files('glcpp.h', 'pp.c')],
dependencies : idep_mesautil,
dependencies : [idep_mesautil, glcpp_header_gen_deps],
include_directories : [inc_include, inc_src, inc_mesa, inc_gallium, inc_gallium_aux],
c_args : [no_override_init_args, c_msvc_compat_args],
cpp_args : [cpp_msvc_compat_args],

View file

@ -224,6 +224,7 @@ visit_intrinsic(nir_intrinsic_instr *instr, struct divergence_state *state)
case nir_intrinsic_load_subgroup_id_shift_ir3:
case nir_intrinsic_load_base_instance:
case nir_intrinsic_load_base_vertex:
case nir_intrinsic_load_raw_vertex_offset_pan:
case nir_intrinsic_load_first_vertex:
case nir_intrinsic_load_draw_id:
case nir_intrinsic_load_is_indexed_draw:
@ -319,14 +320,10 @@ visit_intrinsic(nir_intrinsic_instr *instr, struct divergence_state *state)
case nir_intrinsic_load_base_global_invocation_id:
case nir_intrinsic_load_base_workgroup_id:
case nir_intrinsic_load_alpha_reference_amd:
case nir_intrinsic_load_ubo_uniform_block_intel:
case nir_intrinsic_load_ssbo_uniform_block_intel:
case nir_intrinsic_load_shared_uniform_block_intel:
case nir_intrinsic_load_barycentric_optimize_amd:
case nir_intrinsic_load_poly_line_smooth_enabled:
case nir_intrinsic_load_rasterization_primitive_amd:
case nir_intrinsic_unit_test_uniform_amd:
case nir_intrinsic_load_global_constant_uniform_block_intel:
case nir_intrinsic_load_debug_log_desc_amd:
case nir_intrinsic_load_xfb_state_address_gfx12_amd:
case nir_intrinsic_cmat_length:
@ -364,6 +361,24 @@ visit_intrinsic(nir_intrinsic_instr *instr, struct divergence_state *state)
is_divergent = false;
break;
case nir_intrinsic_load_ubo_uniform_block_intel:
case nir_intrinsic_load_ssbo_uniform_block_intel:
case nir_intrinsic_load_shared_uniform_block_intel:
case nir_intrinsic_load_global_constant_uniform_block_intel:
if (options & (nir_divergence_across_subgroups |
nir_divergence_multiple_workgroup_per_compute_subgroup)) {
unsigned num_srcs = nir_intrinsic_infos[instr->intrinsic].num_srcs;
for (unsigned i = 0; i < num_srcs; i++) {
if (src_divergent(instr->src[i], state)) {
is_divergent = true;
break;
}
}
} else {
is_divergent = false;
}
break;
/* This is divergent because it specifically loads sequential values into
* successive SIMD lanes.
*/
@ -825,6 +840,7 @@ visit_intrinsic(nir_intrinsic_instr *instr, struct divergence_state *state)
case nir_intrinsic_load_sample_pos_or_center:
case nir_intrinsic_load_vertex_id_zero_base:
case nir_intrinsic_load_vertex_id:
case nir_intrinsic_load_raw_vertex_id_pan:
case nir_intrinsic_load_invocation_id:
case nir_intrinsic_load_local_invocation_id:
case nir_intrinsic_load_local_invocation_index:

View file

@ -1069,6 +1069,7 @@ nir_get_io_index_src_number(const nir_intrinsic_instr *instr)
IMG_CASE(atomic):
IMG_CASE(atomic_swap):
IMG_CASE(size):
IMG_CASE(levels):
IMG_CASE(samples):
IMG_CASE(texel_address):
IMG_CASE(samples_identical):

View file

@ -1228,8 +1228,16 @@ wrap_instr(nir_builder *b, nir_instr *instr, void *data)
static bool
wrap_instrs(nir_shader *shader, wrap_instr_callback callback)
{
return nir_shader_instructions_pass(shader, wrap_instr,
bool progress = nir_shader_instructions_pass(shader, wrap_instr,
nir_metadata_none, callback);
/* Wrapping jump instructions that are located inside ifs can break SSA
* invariants because the else block no longer dominates the merge block.
* Repair the SSA to make the validator happy again.
*/
if (progress)
nir_repair_ssa(shader);
return progress;
}
static bool

View file

@ -4096,9 +4096,9 @@ distribute_src_mods = [
(('fneg', ('fmul(is_used_once)', a, b)), ('fmul', ('fneg', a), b)),
(('fabs', ('fmul(is_used_once)', a, b)), ('fmul', ('fabs', a), ('fabs', b))),
(('fneg', ('ffma(is_used_once)', a, b, c)), ('ffma', ('fneg', a), b, ('fneg', c))),
(('fneg', ('ffma(is_used_once,nsz)', a, b, c)), ('ffma', ('fneg', a), b, ('fneg', c))),
(('fneg', ('flrp(is_used_once)', a, b, c)), ('flrp', ('fneg', a), ('fneg', b), c)),
(('fneg', ('~fadd(is_used_once)', a, b)), ('fadd', ('fneg', a), ('fneg', b))),
(('fneg', ('fadd(is_used_once,nsz)', a, b)), ('fadd', ('fneg', a), ('fneg', b))),
# Note that fmin <-> fmax. I don't think there is a way to distribute
# fabs() into fmin or fmax.

View file

@ -82,7 +82,9 @@ opt_shrink_store_instr(nir_builder *b, nir_intrinsic_instr *instr, bool shrink_i
/* Trim the num_components stored according to the write mask. */
unsigned write_mask = nir_intrinsic_write_mask(instr);
unsigned last_bit = util_last_bit(write_mask);
/* Don't trim down to an invalid number of components, though. */
unsigned last_bit = nir_round_up_components(util_last_bit(write_mask));
if (last_bit < instr->num_components) {
nir_def *def = nir_trim_vector(b, instr->src[0].ssa, last_bit);
nir_src_rewrite(&instr->src[0], def);

View file

@ -652,6 +652,7 @@ nir_precompiled_build_variant(const nir_function *libfunc,
assert(libfunc->workgroup_size[0] != 0 && "must set workgroup size");
b.shader->info.workgroup_size_variable = false;
b.shader->info.workgroup_size[0] = libfunc->workgroup_size[0];
b.shader->info.workgroup_size[1] = libfunc->workgroup_size[1];
b.shader->info.workgroup_size[2] = libfunc->workgroup_size[2];

View file

@ -4791,22 +4791,30 @@ vtn_vector_construct(struct vtn_builder *b, unsigned num_components,
return &vec->def;
}
/*
* Creates a copy of `src`, reinterpreting it as `dest_type`.
*/
static struct vtn_ssa_value *
vtn_composite_copy(struct vtn_builder *b, struct vtn_ssa_value *src)
vtn_composite_copy_logical(struct vtn_builder *b, struct vtn_ssa_value *src, struct vtn_type* dest_type)
{
assert(!src->is_variable);
struct vtn_ssa_value *dest = vtn_zalloc(b, struct vtn_ssa_value);
dest->type = src->type;
dest->type = glsl_get_bare_type(dest_type->type);
if (glsl_type_is_vector_or_scalar(src->type)) {
if (glsl_type_is_vector_or_scalar(dest_type->type)) {
dest->def = src->def;
} else {
unsigned elems = glsl_get_length(src->type);
unsigned elems = glsl_get_length(dest_type->type);
dest->elems = vtn_alloc_array(b, struct vtn_ssa_value *, elems);
if (glsl_type_is_struct(dest_type->type) || glsl_type_is_interface(dest_type->type)) {
for (unsigned i = 0; i < elems; i++)
dest->elems[i] = vtn_composite_copy(b, src->elems[i]);
dest->elems[i] = vtn_composite_copy_logical(b, src->elems[i], dest_type->members[i]);
} else {
for (unsigned i = 0; i < elems; i++)
dest->elems[i] = vtn_composite_copy_logical(b, src->elems[i], dest_type->array_element);
}
}
return dest;
@ -4814,13 +4822,14 @@ vtn_composite_copy(struct vtn_builder *b, struct vtn_ssa_value *src)
static struct vtn_ssa_value *
vtn_composite_insert(struct vtn_builder *b, struct vtn_ssa_value *src,
struct vtn_ssa_value *insert, const uint32_t *indices,
unsigned num_indices)
struct vtn_type *src_type, struct vtn_ssa_value *insert,
const uint32_t *indices, unsigned num_indices)
{
if (glsl_type_is_cmat(src->type))
return vtn_cooperative_matrix_insert(b, src, insert, indices, num_indices);
struct vtn_ssa_value *dest = vtn_composite_copy(b, src);
/* Straight copy, use the source type as the destination type. */
struct vtn_ssa_value *dest = vtn_composite_copy_logical(b, src, src_type);
struct vtn_ssa_value *cur = dest;
unsigned i;
@ -4963,15 +4972,15 @@ vtn_handle_composite(struct vtn_builder *b, SpvOp opcode,
case SpvOpCompositeInsert:
ssa = vtn_composite_insert(b, vtn_ssa_value(b, w[4]),
vtn_get_value_type(b, w[4]),
vtn_ssa_value(b, w[3]),
w + 5, count - 5);
break;
case SpvOpCopyLogical: {
ssa = vtn_composite_copy(b, vtn_ssa_value(b, w[3]));
struct vtn_type *dst_type = vtn_get_value_type(b, w[2]);
vtn_assert(vtn_types_compatible(b, type, dst_type));
ssa->type = glsl_get_bare_type(dst_type->type);
struct vtn_type *dest_type = vtn_get_value_type(b, w[2]);
vtn_assert(vtn_types_compatible(b, vtn_get_value_type(b, w[3]), dest_type));
ssa = vtn_composite_copy_logical(b, vtn_ssa_value(b, w[3]), dest_type);
break;
}
case SpvOpCopyObject:

View file

@ -506,8 +506,8 @@ vtn_pointer_dereference(struct vtn_builder *b,
type = type->array_element;
}
tail = nir_build_deref_array(&b->nb, tail, arr_index);
}
tail->arr.in_bounds = deref_chain->in_bounds;
}
access |= type->access;
}

View file

@ -564,7 +564,7 @@ tiled_to_linear_2cpp(char *_tiled, char *_linear, uint32_t linear_pitch)
"v8", "v9", "v10", "v11", "v12", "v13", "v14", "v15");
}
#else
memcpy_small<2, LINEAR_TO_TILED, FDL_MACROTILE_4_CHANNEL>(
memcpy_small<2, TILED_TO_LINEAR, FDL_MACROTILE_4_CHANNEL>(
0, 0, 32, 4, _tiled, _linear, linear_pitch, 0, 0, 0);
#endif
}

View file

@ -2300,6 +2300,17 @@ insert_live_out_moves(struct ra_ctx *ctx)
insert_file_live_out_moves(ctx, &ctx->shared);
}
static bool
has_merge_set_preferred_reg(struct ir3_register *reg)
{
assert(reg->merge_set);
assert(reg->num != INVALID_REG);
return reg->merge_set->preferred_reg != (physreg_t)~0 &&
ra_reg_get_physreg(reg) ==
reg->merge_set->preferred_reg + reg->merge_set_offset;
}
static void
handle_block(struct ra_ctx *ctx, struct ir3_block *block)
{
@ -2338,17 +2349,15 @@ handle_block(struct ra_ctx *ctx, struct ir3_block *block)
struct ir3_register *dst = input->dsts[0];
assert(dst->num != INVALID_REG);
physreg_t dst_start = ra_reg_get_physreg(dst);
physreg_t dst_end;
if (dst->merge_set) {
if (dst->merge_set && has_merge_set_preferred_reg(dst)) {
/* Take the whole merge set into account to prevent its range being
* allocated for defs not part of the merge set.
*/
assert(dst_start >= dst->merge_set_offset);
dst_end = dst_start - dst->merge_set_offset + dst->merge_set->size;
dst_end = dst->merge_set->preferred_reg + dst->merge_set->size;
} else {
dst_end = dst_start + reg_size(dst);
dst_end = ra_reg_get_physreg(dst) + reg_size(dst);
}
struct ra_file *file = ra_get_file(ctx, dst);

View file

@ -1461,6 +1461,15 @@ r3d_dst_gmem(struct tu_cmd_buffer *cmd, struct tu_cs *cs,
gmem_offset = tu_attachment_gmem_offset(cmd, att, layer);
}
/* On a7xx we must always use FMT6_Z24_UNORM_S8_UINT_AS_R8G8B8A8. See
* blit_base_format().
*/
if (CHIP >= A7XX && att->format == VK_FORMAT_D24_UNORM_S8_UINT) {
RB_MRT_BUF_INFO = pkt_field_set(A6XX_RB_MRT_BUF_INFO_COLOR_FORMAT,
RB_MRT_BUF_INFO,
FMT6_Z24_UNORM_S8_UINT_AS_R8G8B8A8);
}
tu_cs_emit_regs(cs,
RB_MRT_BUF_INFO(CHIP, 0, .dword = RB_MRT_BUF_INFO),
A6XX_RB_MRT_PITCH(0, 0),
@ -1533,7 +1542,8 @@ r3d_setup(struct tu_cmd_buffer *cmd,
tu_cs_emit_call(cs, cmd->device->dbg_renderpass_stomp_cs);
}
enum a6xx_format fmt = blit_base_format<CHIP>(dst_format, ubwc, false);
enum a6xx_format fmt = blit_base_format<CHIP>(dst_format, ubwc,
blit_param & R3D_DST_GMEM);
fixup_dst_format(src_format, &dst_format, &fmt);
if (!cmd->state.pass) {
@ -4638,7 +4648,7 @@ clear_sysmem_attachment(struct tu_cmd_buffer *cmd,
enum pipe_format format = vk_format_to_pipe_format(vk_format);
const struct tu_framebuffer *fb = cmd->state.framebuffer;
const struct tu_image_view *iview = cmd->state.attachments[a];
const uint32_t clear_views = cmd->state.pass->attachments[a].clear_views;
const uint32_t clear_views = cmd->state.pass->attachments[a].used_views;
const struct blit_ops *ops = &r2d_ops<CHIP>;
const VkClearValue *value = &cmd->state.clear_values[a];
if (cmd->state.pass->attachments[a].samples > 1)
@ -4734,7 +4744,7 @@ tu_clear_gmem_attachment(struct tu_cmd_buffer *cmd,
tu_emit_clear_gmem_attachment<CHIP>(cmd, cs, resolve_group, a, 0,
cmd->state.framebuffer->layers,
attachment->clear_views,
attachment->used_views,
attachment->clear_mask,
&cmd->state.clear_values[a], NULL);
}
@ -4755,7 +4765,7 @@ tu7_generic_clear_attachment(struct tu_cmd_buffer *cmd,
iview->view.ubwc_enabled, att->samples);
enum pipe_format format = vk_format_to_pipe_format(att->format);
for_each_layer(i, att->clear_views, cmd->state.framebuffer->layers) {
for_each_layer(i, att->used_views, cmd->state.framebuffer->layers) {
uint32_t layer = i + 0;
uint32_t mask =
aspect_write_mask_generic_clear(format, att->clear_mask);
@ -4836,7 +4846,7 @@ tu_emit_blit(struct tu_cmd_buffer *cmd,
uint32_t buffer_id = tu_resolve_group_include_buffer<CHIP>(resolve_group, format);
event_blit_setup(cs, buffer_id, attachment, blit_event_type, clear_mask);
for_each_layer(i, attachment->clear_views, cmd->state.framebuffer->layers) {
for_each_layer(i, attachment->used_views, cmd->state.framebuffer->layers) {
event_blit_dst_view blt_view = blt_view_from_tu_view(iview, i);
event_blit_run<CHIP>(cmd, cs, attachment, &blt_view, separate_stencil);
}
@ -4951,7 +4961,7 @@ load_3d_blit(struct tu_cmd_buffer *cmd,
/* Wait for CACHE_INVALIDATE to land */
tu_cs_emit_wfi(cs);
for_each_layer(i, att->clear_views, cmd->state.framebuffer->layers) {
for_each_layer(i, att->used_views, cmd->state.framebuffer->layers) {
if (cmd->state.pass->has_fdm) {
struct apply_load_coords_state state = {
.view = i,

View file

@ -1616,7 +1616,7 @@ tu6_emit_gmem_stores(struct tu_cmd_buffer *cmd,
scissor_emitted = true;
}
tu_store_gmem_attachment<CHIP>(cmd, cs, resolve_group, a, a,
fb->layers, subpass->multiview_mask,
fb->layers, att->used_views,
cond_exec_allowed);
}
}
@ -6868,7 +6868,7 @@ tu6_draw_common(struct tu_cmd_buffer *cmd,
struct tu_render_pass_state *rp = &cmd->state.rp;
trace_start_draw(
&cmd->trace, &cmd->draw_cs, cmd, draw_count,
&cmd->rp_trace, &cmd->draw_cs, cmd, draw_count,
cmd->state.program.stage_sha1[MESA_SHADER_VERTEX],
cmd->state.program.stage_sha1[MESA_SHADER_TESS_CTRL],
cmd->state.program.stage_sha1[MESA_SHADER_TESS_EVAL],
@ -7316,7 +7316,7 @@ tu_CmdDraw(VkCommandBuffer commandBuffer,
tu_cs_emit(cs, instanceCount);
tu_cs_emit(cs, vertexCount);
trace_end_draw(&cmd->trace, cs);
trace_end_draw(&cmd->rp_trace, cs);
}
TU_GENX(tu_CmdDraw);
@ -7365,7 +7365,7 @@ tu_CmdDrawMultiEXT(VkCommandBuffer commandBuffer,
}
if (i != 0)
trace_end_draw(&cmd->trace, cs);
trace_end_draw(&cmd->rp_trace, cs);
}
TU_GENX(tu_CmdDrawMultiEXT);
@ -7393,7 +7393,7 @@ tu_CmdDrawIndexed(VkCommandBuffer commandBuffer,
tu_cs_emit_qw(cs, cmd->state.index_va);
tu_cs_emit(cs, cmd->state.max_index_count);
trace_end_draw(&cmd->trace, cs);
trace_end_draw(&cmd->rp_trace, cs);
}
TU_GENX(tu_CmdDrawIndexed);
@ -7447,7 +7447,7 @@ tu_CmdDrawMultiIndexedEXT(VkCommandBuffer commandBuffer,
}
if (i != 0)
trace_end_draw(&cmd->trace, cs);
trace_end_draw(&cmd->rp_trace, cs);
}
TU_GENX(tu_CmdDrawMultiIndexedEXT);
@ -7492,7 +7492,7 @@ tu_CmdDrawIndirect(VkCommandBuffer commandBuffer,
tu_cs_emit_qw(cs, vk_buffer_address(&buf->vk, offset));
tu_cs_emit(cs, stride);
trace_end_draw(&cmd->trace, cs);
trace_end_draw(&cmd->rp_trace, cs);
}
TU_GENX(tu_CmdDrawIndirect);
@ -7525,7 +7525,7 @@ tu_CmdDrawIndexedIndirect(VkCommandBuffer commandBuffer,
tu_cs_emit_qw(cs, vk_buffer_address(&buf->vk, offset));
tu_cs_emit(cs, stride);
trace_end_draw(&cmd->trace, cs);
trace_end_draw(&cmd->rp_trace, cs);
}
TU_GENX(tu_CmdDrawIndexedIndirect);
@ -7564,7 +7564,7 @@ tu_CmdDrawIndirectCount(VkCommandBuffer commandBuffer,
tu_cs_emit_qw(cs, vk_buffer_address(&count_buf->vk, countBufferOffset));
tu_cs_emit(cs, stride);
trace_end_draw(&cmd->trace, cs);
trace_end_draw(&cmd->rp_trace, cs);
}
TU_GENX(tu_CmdDrawIndirectCount);
@ -7600,7 +7600,7 @@ tu_CmdDrawIndexedIndirectCount(VkCommandBuffer commandBuffer,
tu_cs_emit_qw(cs, vk_buffer_address(&count_buf->vk, countBufferOffset));
tu_cs_emit(cs, stride);
trace_end_draw(&cmd->trace, cs);
trace_end_draw(&cmd->rp_trace, cs);
}
TU_GENX(tu_CmdDrawIndexedIndirectCount);
@ -7644,7 +7644,7 @@ tu_CmdDrawIndirectByteCountEXT(VkCommandBuffer commandBuffer,
tu_cs_emit(cs, counterOffset);
tu_cs_emit(cs, vertexStride);
trace_end_draw(&cmd->trace, cs);
trace_end_draw(&cmd->rp_trace, cs);
}
TU_GENX(tu_CmdDrawIndirectByteCountEXT);

View file

@ -208,8 +208,8 @@ tu_CreateDescriptorSetLayout(
if (binding->descriptorType == VK_DESCRIPTOR_TYPE_INLINE_UNIFORM_BLOCK)
set_layout->has_inline_uniforms = true;
if (variable_flags && binding->binding < variable_flags->bindingCount &&
(variable_flags->pBindingFlags[binding->binding] &
if (variable_flags && j < variable_flags->bindingCount &&
(variable_flags->pBindingFlags[j] &
VK_DESCRIPTOR_BINDING_VARIABLE_DESCRIPTOR_COUNT_BIT)) {
assert(!binding->pImmutableSamplers); /* Terribly ill defined how
many samplers are valid */
@ -377,7 +377,7 @@ tu_GetDescriptorSetLayoutSupport(
uint64_t max_count = MAX_SET_SIZE;
unsigned descriptor_count = binding->descriptorCount;
if (binding->descriptorType == VK_DESCRIPTOR_TYPE_INLINE_UNIFORM_BLOCK) {
max_count = MAX_SET_SIZE - size;
max_count = MAX_INLINE_UBO_RANGE - size;
descriptor_count = descriptor_sz;
descriptor_sz = 1;
} else if (descriptor_sz) {
@ -388,9 +388,9 @@ tu_GetDescriptorSetLayoutSupport(
supported = false;
}
if (variable_flags && binding->binding < variable_flags->bindingCount &&
if (variable_flags && i < variable_flags->bindingCount &&
variable_count &&
(variable_flags->pBindingFlags[binding->binding] &
(variable_flags->pBindingFlags[i] &
VK_DESCRIPTOR_BINDING_VARIABLE_DESCRIPTOR_COUNT_BIT)) {
variable_count->maxVariableDescriptorCount =
MIN2(UINT32_MAX, max_count);

View file

@ -417,7 +417,8 @@ tu_render_pass_patch_input_gmem(struct tu_render_pass *pass)
uint32_t a = subpass->input_attachments[j].attachment;
if (a == VK_ATTACHMENT_UNUSED)
continue;
subpass->input_attachments[j].patch_input_gmem = written[a];
subpass->input_attachments[j].patch_input_gmem =
written[a] && pass->attachments[a].gmem;
}
for (unsigned j = 0; j < subpass->color_count; j++) {
@ -884,7 +885,7 @@ tu_subpass_use_attachment(struct tu_render_pass *pass, int i, uint32_t a, const
att->gmem = true;
update_samples(subpass, pCreateInfo->pAttachments[a].samples);
att->clear_views |= subpass->multiview_mask;
att->used_views |= subpass->multiview_mask;
/* Loads and clears are emitted at the start of the subpass that needs them. */
att->first_subpass_idx = MIN2(i, att->first_subpass_idx);
@ -1126,6 +1127,7 @@ tu_CreateRenderPass2(VkDevice _device,
if (!att->gmem) {
att->clear_mask = 0;
att->load = false;
att->load_stencil = false;
}
}
@ -1235,7 +1237,7 @@ tu_setup_dynamic_render_pass(struct tu_cmd_buffer *cmd_buffer,
VK_FROM_HANDLE(tu_image_view, view, att_info->imageView);
tu_setup_dynamic_attachment(att, view);
att->gmem = true;
att->clear_views = info->viewMask;
att->used_views = info->viewMask;
attachment_set_ops(device, att, att_info->loadOp,
VK_ATTACHMENT_LOAD_OP_DONT_CARE, att_info->storeOp,
VK_ATTACHMENT_STORE_OP_DONT_CARE);
@ -1279,7 +1281,7 @@ tu_setup_dynamic_render_pass(struct tu_cmd_buffer *cmd_buffer,
struct tu_render_pass_attachment *att = &pass->attachments[a];
tu_setup_dynamic_attachment(att, view);
att->gmem = true;
att->clear_views = info->viewMask;
att->used_views = info->viewMask;
subpass->depth_stencil_attachment.attachment = a++;
subpass->input_attachments[0].attachment =
subpass->depth_stencil_attachment.attachment;

View file

@ -94,7 +94,19 @@ struct tu_render_pass_attachment
VkSampleCountFlagBits samples;
uint32_t cpp;
VkImageAspectFlags clear_mask;
uint32_t clear_views;
/* All views that are used with the attachment in all subpasses. Used to
* determine which views to apply loadOp/storeOp to.
*/
uint32_t used_views;
/* The internal MSRTSS attachment to clear when the user says to clear
* this attachment. Clear values must be remapped to this attachment.
*/
uint32_t remapped_clear_att;
/* For internal attachments created for MSRTSS, the original user attachment
* which it is resolved/unresolved to.
*/
uint32_t user_att;
bool load;
bool store;
bool gmem;

View file

@ -3157,8 +3157,6 @@ tu6_emit_blend(struct tu_cs *cs,
bool dual_src_blend = tu_blend_state_is_dual_src(cb);
tu_cs_emit_regs(cs, A6XX_SP_PS_MRT_CNTL(.mrt = num_rts));
tu_cs_emit_regs(cs, A6XX_RB_PS_MRT_CNTL(.mrt = num_rts));
tu_cs_emit_regs(cs, A6XX_SP_BLEND_CNTL(.enable_blend = blend_enable_mask,
.unk8 = true,
.dual_color_in_enable =
@ -3180,10 +3178,12 @@ tu6_emit_blend(struct tu_cs *cs,
.alpha_to_one = alpha_to_one_enable,
.sample_mask = sample_mask));
unsigned num_remapped_rts = 0;
for (unsigned i = 0; i < num_rts; i++) {
if (cal->color_map[i] == MESA_VK_ATTACHMENT_UNUSED)
continue;
unsigned remapped_idx = cal->color_map[i];
num_remapped_rts = MAX2(num_remapped_rts, remapped_idx + 1);
const struct vk_color_blend_attachment_state *att = &cb->attachments[i];
if ((cb->color_write_enables & (1u << i)) && i < cb->attachment_count) {
const enum a3xx_rb_blend_opcode color_op = tu6_blend_op(att->color_blend_op);
@ -3227,6 +3227,8 @@ tu6_emit_blend(struct tu_cs *cs,
A6XX_RB_MRT_BLEND_CONTROL(remapped_idx,));
}
}
tu_cs_emit_regs(cs, A6XX_SP_PS_MRT_CNTL(.mrt = num_remapped_rts));
tu_cs_emit_regs(cs, A6XX_RB_PS_MRT_CNTL(.mrt = num_remapped_rts));
}
static const enum mesa_vk_dynamic_graphics_state tu_blend_constants_state[] = {

View file

@ -185,12 +185,6 @@ static void noop_set_vertex_buffers(struct pipe_context *ctx,
unsigned count,
const struct pipe_vertex_buffer *buffers)
{
for (unsigned i = 0; i < count; i++) {
if (!buffers[i].is_user_buffer) {
struct pipe_resource *buf = buffers[i].buffer.resource;
pipe_resource_reference(&buf, NULL);
}
}
}
static void *noop_create_vertex_elements(struct pipe_context *ctx,

View file

@ -88,10 +88,9 @@
#define LLVMCreateBuilder ILLEGAL_LLVM_FUNCTION
typedef struct lp_context_ref {
#if GALLIVM_USE_ORCJIT
LLVMOrcThreadSafeContextRef ref;
#else
LLVMContextRef ref;
#if GALLIVM_USE_ORCJIT
LLVMOrcThreadSafeContextRef tsref;
#endif
bool owned;
} lp_context_ref;
@ -101,18 +100,21 @@ lp_context_create(lp_context_ref *context)
{
assert(context != NULL);
#if GALLIVM_USE_ORCJIT
context->ref = LLVMOrcCreateNewThreadSafeContext();
#if LLVM_VERSION_MAJOR >= 21
context->ref = LLVMContextCreate();
/* Ownership of ref is then transferred to tsref */
context->tsref = LLVMOrcCreateNewThreadSafeContextFromLLVMContext(context->ref);
#else
context->tsref = LLVMOrcCreateNewThreadSafeContext();
context->ref = LLVMOrcThreadSafeContextGetContext(context->tsref);
#endif
#else
context->ref = LLVMContextCreate();
#endif
context->owned = true;
#if LLVM_VERSION_MAJOR == 15
if (context->ref) {
#if GALLIVM_USE_ORCJIT
LLVMContextSetOpaquePointers(LLVMOrcThreadSafeContextGetContext(context->ref), false);
#else
LLVMContextSetOpaquePointers(context->ref, false);
#endif
}
#endif
}
@ -123,7 +125,7 @@ lp_context_destroy(lp_context_ref *context)
assert(context != NULL);
if (context->owned) {
#if GALLIVM_USE_ORCJIT
LLVMOrcDisposeThreadSafeContext(context->ref);
LLVMOrcDisposeThreadSafeContext(context->tsref);
#else
LLVMContextDispose(context->ref);
#endif

View file

@ -555,8 +555,8 @@ init_gallivm_state(struct gallivm_state *gallivm, const char *name,
gallivm->cache = cache;
gallivm->_ts_context = context->ref;
gallivm->context = LLVMContextCreate();
gallivm->_ts_context = context->tsref;
gallivm->context = context->ref;
gallivm->module_name = LPJit::get_unique_name(name);
gallivm->module = LLVMModuleCreateWithNameInContext(gallivm->module_name,

View file

@ -3163,7 +3163,7 @@ do_int_divide(struct lp_build_nir_soa_context *bld,
static LLVMValueRef
do_int_mod(struct lp_build_nir_soa_context *bld,
bool is_unsigned, unsigned src_bit_size,
bool is_unsigned, bool use_src2_sign, unsigned src_bit_size,
LLVMValueRef src, LLVMValueRef src2)
{
struct gallivm_state *gallivm = bld->base.gallivm;
@ -3180,8 +3180,18 @@ do_int_mod(struct lp_build_nir_soa_context *bld,
divisor = get_signed_divisor(gallivm, int_bld, mask_bld,
src_bit_size, src, divisor);
}
LLVMValueRef result = lp_build_mod(int_bld, src, divisor);
return LLVMBuildOr(builder, div_mask, result, "");
LLVMValueRef rem = lp_build_mod(int_bld, src, divisor);
rem = LLVMBuildOr(builder, div_mask, rem, "");
if (use_src2_sign) {
LLVMValueRef add_src2 = LLVMBuildICmp(builder, LLVMIntNE, rem, int_bld->zero, "");
LLVMValueRef signs_different = LLVMBuildXor(builder, LLVMBuildICmp(builder, LLVMIntSLT, src, int_bld->zero, ""),
LLVMBuildICmp(builder, LLVMIntSLT, src2, int_bld->zero, ""), "");
add_src2 = LLVMBuildAnd(builder, add_src2, signs_different, "");
rem = LLVMBuildSelect(builder, add_src2, LLVMBuildAdd(builder, rem, src2, ""), rem, "");
}
return rem;
}
static LLVMValueRef
@ -3493,7 +3503,7 @@ do_alu_action(struct lp_build_nir_soa_context *bld,
break;
case nir_op_imod:
case nir_op_irem:
result = do_int_mod(bld, false, src_bit_size[0], src[0], src[1]);
result = do_int_mod(bld, false, instr->op == nir_op_imod, src_bit_size[0], src[0], src[1]);
break;
case nir_op_ishl: {
if (src_bit_size[0] == 64)
@ -3592,7 +3602,7 @@ do_alu_action(struct lp_build_nir_soa_context *bld,
result = lp_build_min(uint_bld, src[0], src[1]);
break;
case nir_op_umod:
result = do_int_mod(bld, true, src_bit_size[0], src[0], src[1]);
result = do_int_mod(bld, true, false, src_bit_size[0], src[0], src[1]);
break;
case nir_op_umul_high: {
LLVMValueRef hi_bits;

View file

@ -64,6 +64,7 @@ DRI_CONF_SECTION_END
DRI_CONF_SECTION_MISCELLANEOUS
DRI_CONF_ALWAYS_HAVE_DEPTH_BUFFER(false)
DRI_CONF_GLSL_ZERO_INIT(false)
DRI_CONF_VERTEX_PROGRAM_DEFAULT_OUT(false)
DRI_CONF_VS_POSITION_ALWAYS_INVARIANT(false)
DRI_CONF_VS_POSITION_ALWAYS_PRECISE(false)
DRI_CONF_ALLOW_RGB10_CONFIGS(true)

View file

@ -76,6 +76,7 @@ u_driconf_fill_st_options(struct st_config_options *options,
query_string_option(force_gl_renderer);
query_string_option(mesa_extension_override);
query_bool_option(allow_multisampled_copyteximage);
query_bool_option(vertex_program_default_out);
driComputeOptionsSha1(optionCache, options->config_options_sha1);
}

View file

@ -634,8 +634,8 @@ asahi_add_attachment(struct attachments *att, struct agx_resource *rsrc)
assert(att->count < MAX_ATTACHMENTS);
att->list[att->count++] = (struct drm_asahi_attachment){
.size = rsrc->layout.size_B,
.pointer = rsrc->bo->va->addr,
.size = rsrc->layout.size_B - rsrc->layout.level_offsets_B[0],
.pointer = agx_map_gpu(rsrc),
};
}

View file

@ -210,13 +210,13 @@ agx_resource_from_handle(struct pipe_screen *pscreen,
if (rsc->layout.tiling == AIL_TILING_LINEAR) {
rsc->layout.linear_stride_B = whandle->stride;
} else if (whandle->stride != ail_get_wsi_stride_B(&rsc->layout, 0)) {
rsc->layout.level_offsets_B[0] = whandle->offset;
} else if (whandle->stride != ail_get_wsi_stride_B(&rsc->layout, 0) ||
whandle->offset != 0) {
FREE(rsc);
return NULL;
}
assert(whandle->offset == 0);
ail_make_miptree(&rsc->layout);
if (prsc->target == PIPE_BUFFER) {
@ -301,7 +301,8 @@ agx_resource_get_param(struct pipe_screen *pscreen, struct pipe_context *pctx,
enum pipe_resource_param param, unsigned usage,
uint64_t *value)
{
struct agx_resource *rsrc = (struct agx_resource *)prsc;
struct agx_resource *rsrc =
(struct agx_resource *)util_resource_at_index(prsc, plane);
switch (param) {
case PIPE_RESOURCE_PARAM_STRIDE:
@ -1292,7 +1293,7 @@ agx_cmdbuf(struct agx_device *dev, struct drm_asahi_cmd_render *c,
if (zres->layout.compressed) {
c->depth.comp_base =
agx_map_texture_gpu(zres, 0) + zres->layout.metadata_offset_B +
agx_map_gpu(zres) + zres->layout.metadata_offset_B +
(first_layer * zres->layout.compression_layer_stride_B) +
zres->layout.level_offsets_compressed_B[level];
@ -1329,7 +1330,7 @@ agx_cmdbuf(struct agx_device *dev, struct drm_asahi_cmd_render *c,
if (sres->layout.compressed) {
c->stencil.comp_base =
agx_map_texture_gpu(sres, 0) + sres->layout.metadata_offset_B +
agx_map_gpu(sres) + sres->layout.metadata_offset_B +
(first_layer * sres->layout.compression_layer_stride_B) +
sres->layout.level_offsets_compressed_B[level];

View file

@ -503,7 +503,7 @@ agx_get_query_result_resource_gpu(struct agx_context *ctx,
: 0;
libagx_copy_query_gl(batch, agx_1d(1), AGX_BARRIER_ALL, query->ptr.gpu,
rsrc->bo->va->addr + offset, result_type, bool_size);
agx_map_gpu(rsrc) + offset, result_type, bool_size);
return true;
}

View file

@ -726,7 +726,7 @@ agx_pack_texture(void *out, struct agx_resource *rsrc,
if (rsrc->layout.compressed) {
cfg.acceleration_buffer =
agx_map_texture_gpu(rsrc, 0) + rsrc->layout.metadata_offset_B +
agx_map_gpu(rsrc) + rsrc->layout.metadata_offset_B +
(first_layer * rsrc->layout.compression_layer_stride_B);
}
@ -1262,7 +1262,7 @@ agx_batch_upload_pbe(struct agx_batch *batch, struct agx_pbe_packed *out,
cfg.extended = true;
cfg.acceleration_buffer =
agx_map_texture_gpu(tex, 0) + tex->layout.metadata_offset_B +
agx_map_gpu(tex) + tex->layout.metadata_offset_B +
(layer * tex->layout.compression_layer_stride_B);
}
@ -3756,8 +3756,9 @@ agx_index_buffer_rsrc_ptr(struct agx_batch *batch,
struct agx_resource *rsrc = agx_resource(info->index.resource);
agx_batch_reads(batch, rsrc);
*extent = ALIGN_POT(rsrc->layout.size_B, 4);
return rsrc->bo->va->addr;
*extent =
ALIGN_POT(rsrc->layout.size_B - rsrc->layout.level_offsets_B[0], 4);
return agx_map_gpu(rsrc);
}
static uint64_t
@ -3948,7 +3949,7 @@ agx_batch_geometry_params(struct agx_batch *batch, uint64_t input_index_buffer,
params.xfb_size[i] = size;
if (rsrc) {
params.xfb_offs_ptrs[i] = rsrc->bo->va->addr;
params.xfb_offs_ptrs[i] = agx_map_gpu(rsrc);
agx_batch_writes(batch, rsrc, 0);
batch->incoherent_writes = true;
}
@ -4054,7 +4055,7 @@ agx_indirect_buffer_ptr(struct agx_batch *batch,
struct agx_resource *rsrc = agx_resource(indirect->buffer);
agx_batch_reads(batch, rsrc);
return rsrc->bo->va->addr + indirect->offset;
return agx_map_gpu(rsrc) + indirect->offset;
}
static void
@ -5388,7 +5389,7 @@ agx_launch_grid(struct pipe_context *pipe, const struct pipe_grid_info *info)
if (info->indirect) {
struct agx_resource *rsrc = agx_resource(info->indirect);
agx_batch_reads(batch, rsrc);
indirect = rsrc->bo->va->addr + info->indirect_offset;
indirect = agx_map_gpu(rsrc) + info->indirect_offset;
}
/* Increment the pipeline stats query.
@ -5493,7 +5494,7 @@ agx_set_global_binding(struct pipe_context *pipe, unsigned first,
struct agx_resource *rsrc = agx_resource(resources[i]);
memcpy(&addr, handles[i], sizeof(addr));
addr += rsrc->bo->va->addr;
addr += agx_map_gpu(rsrc);
memcpy(handles[i], &addr, sizeof(addr));
} else {
pipe_resource_reference(res, NULL);
@ -5534,7 +5535,7 @@ agx_decompress_inplace(struct agx_batch *batch, struct pipe_surface *surf,
surf->last_layer - surf->first_layer + 1);
libagx_decompress(batch, grid, AGX_BARRIER_ALL, layout, surf->first_layer,
level, agx_map_texture_gpu(rsrc, 0), images.gpu);
level, agx_map_gpu(rsrc), images.gpu);
}
void

View file

@ -970,10 +970,16 @@ agx_map_texture_cpu(struct agx_resource *rsrc, unsigned level, unsigned z)
ail_get_layer_level_B(&rsrc->layout, z, level);
}
static inline uint64_t
agx_map_gpu(struct agx_resource *rsrc)
{
return rsrc->bo->va->addr + rsrc->layout.level_offsets_B[0];
}
static inline uint64_t
agx_map_texture_gpu(struct agx_resource *rsrc, unsigned z)
{
return rsrc->bo->va->addr +
return agx_map_gpu(rsrc) +
(uint64_t)ail_get_layer_offset_B(&rsrc->layout, z);
}

View file

@ -116,7 +116,7 @@ agx_batch_get_so_address(struct agx_batch *batch, unsigned buffer,
target->buffer_size);
*size = target->buffer_size;
return rsrc->bo->va->addr + target->buffer_offset;
return agx_map_gpu(rsrc) + target->buffer_offset;
}
void

View file

@ -3,12 +3,9 @@
* SPDX-License-Identifier: MIT
*/
#include <stdio.h>
#include "asahi/genxml/agx_pack.h"
#include "pipe/p_state.h"
#include "util/format/u_format.h"
#include "util/half_float.h"
#include "util/macros.h"
#include "agx_abi.h"
#include "agx_device.h"
#include "agx_state.h"
#include "pool.h"
@ -19,8 +16,7 @@ agx_const_buffer_ptr(struct agx_batch *batch, struct pipe_constant_buffer *cb)
if (cb->buffer) {
struct agx_resource *rsrc = agx_resource(cb->buffer);
agx_batch_reads(batch, rsrc);
return rsrc->bo->va->addr + cb->buffer_offset;
return agx_map_gpu(rsrc) + cb->buffer_offset;
} else {
return 0;
}
@ -42,8 +38,9 @@ agx_upload_vbos(struct agx_batch *batch)
struct agx_resource *rsrc = agx_resource(vb.buffer.resource);
agx_batch_reads(batch, rsrc);
buffers[vbo] = rsrc->bo->va->addr + vb.buffer_offset;
buf_sizes[vbo] = rsrc->layout.size_B - vb.buffer_offset;
buffers[vbo] = agx_map_gpu(rsrc) + vb.buffer_offset;
buf_sizes[vbo] = rsrc->layout.size_B - vb.buffer_offset -
rsrc->layout.level_offsets_B[0];
}
}
@ -144,7 +141,7 @@ agx_set_ssbo_uniforms(struct agx_batch *batch, mesa_shader_stage stage)
agx_batch_reads(batch, rsrc);
}
unif->ssbo_base[cb] = rsrc->bo->va->addr + sb->buffer_offset;
unif->ssbo_base[cb] = agx_map_gpu(rsrc) + sb->buffer_offset;
unif->ssbo_size[cb] = st->ssbo[cb].buffer_size;
} else {
/* Invalid, so use the sink */

View file

@ -464,6 +464,15 @@ iris_blorp_exec_blitter(struct blorp_batch *blorp_batch,
iris_bo_bump_seqno(params->dst.addr.buffer, batch->next_seqno,
IRIS_DOMAIN_OTHER_WRITE);
/*
* TDOD: Add INTEL_NEEDS_WA_14025112257 check once HSD is propogated for all
* other impacted platforms.
*/
if (batch->screen->devinfo->ver >= 20 && batch->name == IRIS_BATCH_COMPUTE) {
iris_emit_pipe_control_flush(batch, "WA_14025112257",
PIPE_CONTROL_STATE_CACHE_INVALIDATE);
}
}
static void

View file

@ -223,9 +223,9 @@ iris_apply_brw_tes_prog_data(struct iris_compiled_shader *shader,
iris_apply_brw_vue_prog_data(&brw->base, &iris->base);
iris->partitioning = brw->partitioning;
iris->output_topology = brw->output_topology;
iris->domain = brw->domain;
iris->partitioning = brw_tess_info_partitioning(brw->tess_info);
iris->output_topology = brw_tess_info_output_topology(brw->tess_info);
iris->domain = brw_tess_info_domain(brw->tess_info);
iris->include_primitive_id = brw->include_primitive_id;
}

View file

@ -9901,6 +9901,16 @@ iris_emit_raw_pipe_control(struct iris_batch *batch,
}
#endif
#if GFX_VER >= 12
/* BSpec 47112 (xe), 56551 (xe2): Instruction_PIPE_CONTROL (ComputeCS):
* SW must follow below programming restrictions when programming
* PIPE_CONTROL command:
* "Command Streamer Stall Enable" must be always set.
*/
if (batch->name == IRIS_BATCH_COMPUTE)
flags |= PIPE_CONTROL_CS_STALL;
#endif
/* The "L3 Read Only Cache Invalidation Bit" docs say it "controls the
* invalidation of the Geometry streams cached in L3 cache at the top
* of the pipe". In other words, index & vertex data that gets cached

View file

@ -338,7 +338,6 @@ attribs_update_simple(struct lp_build_interp_soa_context *bld,
LLVMBuilderRef builder = gallivm->builder;
struct lp_build_context *coeff_bld = &bld->coeff_bld;
struct lp_build_context *setup_bld = &bld->setup_bld;
LLVMValueRef oow = NULL;
LLVMValueRef pixoffx;
LLVMValueRef pixoffy;
LLVMValueRef ptr;
@ -425,7 +424,6 @@ attribs_update_simple(struct lp_build_interp_soa_context *bld,
}
if (interp == LP_INTERP_PERSPECTIVE) {
if (oow == NULL) {
LLVMValueRef w;
assert(attrib != 0);
assert(bld->mask[0] & TGSI_WRITEMASK_W);
@ -442,8 +440,7 @@ attribs_update_simple(struct lp_build_interp_soa_context *bld,
else {
w = bld->attribs[0][3];
}
oow = lp_build_rcp(coeff_bld, w);
}
LLVMValueRef oow = lp_build_rcp(coeff_bld, w);
a = lp_build_mul(coeff_bld, a, oow);
}

View file

@ -1357,7 +1357,7 @@ llvmpipe_free_memory(struct pipe_screen *pscreen,
#if DETECT_OS_LINUX
struct llvmpipe_screen *screen = llvmpipe_screen(pscreen);
if (mem->fd) {
if (mem->fd >= 0) {
mtx_lock(&screen->mem_mutex);
util_vma_heap_free(&screen->mem_heap, mem->offset, mem->size);
mtx_unlock(&screen->mem_mutex);
@ -1415,8 +1415,7 @@ llvmpipe_resource_alloc_udmabuf(struct llvmpipe_screen *screen,
struct pipe_memory_allocation *data =
mmap(NULL, size, PROT_WRITE | PROT_READ, MAP_SHARED, mem_fd, 0);
if (!data)
if (data == MAP_FAILED)
goto fail;
alloc->mem_fd = mem_fd;
@ -1486,6 +1485,9 @@ llvmpipe_import_memory_fd(struct pipe_screen *screen,
bool dmabuf)
{
struct llvmpipe_memory_allocation *alloc = CALLOC_STRUCT(llvmpipe_memory_allocation);
if (!alloc)
return false;
alloc->mem_fd = -1;
alloc->dmabuf_fd = -1;
#if defined(HAVE_LIBDRM) && defined(HAVE_LINUX_UDMABUF_H)
@ -1596,9 +1598,13 @@ llvmpipe_resource_bind_backing(struct pipe_screen *pscreen,
if (!lpr->backable)
return false;
if ((lpr->base.flags & PIPE_RESOURCE_FLAG_SPARSE) && offset < lpr->size_required) {
if (lpr->base.flags & PIPE_RESOURCE_FLAG_SPARSE) {
#if DETECT_OS_LINUX
struct llvmpipe_memory_allocation *mem = (struct llvmpipe_memory_allocation *)pmem;
if (offset >= lpr->size_required)
return false;
if (mem) {
if (llvmpipe_resource_is_texture(&lpr->base)) {
mmap((char *)lpr->tex_data + offset, size, PROT_READ|PROT_WRITE,
@ -1618,9 +1624,11 @@ llvmpipe_resource_bind_backing(struct pipe_screen *pscreen,
MAP_SHARED|MAP_FIXED|MAP_ANONYMOUS, -1, 0);
}
}
#endif
return true;
#else
return false;
#endif
}
addr = llvmpipe_map_memory(pscreen, pmem);

View file

@ -1236,44 +1236,6 @@ spec@nv_texture_env_combine4@nv_texture_env_combine4-combine,Fail
spec@oes_texture_float@oes_texture_float half,Fail
# Remaining fallout from 9d359c6d10adb1cd2978a0e13714a3f98544aae8
spec@arb_texture_compression@fbo-generatemipmap-formats,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_RGB,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_RGB NPOT,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA NPOT,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_rgb_s3tc_dxt1_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_rgba_s3tc_dxt1_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_rgba_s3tc_dxt3_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_rgba_s3tc_dxt5_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_srgb_alpha_s3tc_dxt1_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_srgb_alpha_s3tc_dxt3_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_srgb_alpha_s3tc_dxt5_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_srgb_s3tc_dxt1_ext,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT1_EXT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT1_EXT NPOT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT3_EXT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT3_EXT NPOT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT5_EXT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT5_EXT NPOT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGB_S3TC_DXT1_EXT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGB_S3TC_DXT1_EXT NPOT,Fail
spec@ext_texture_compression_s3tc@gen-compressed-teximage,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT1_EXT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT1_EXT NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT3_EXT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT3_EXT NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT5_EXT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT5_EXT NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_S3TC_DXT1_EXT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_S3TC_DXT1_EXT NPOT,Fail
# uprev Piglit in Mesa
spec@!opengl 1.1@teximage-scale-bias,Fail
spec@glsl-1.10@execution@glsl-fs-texture2d-mipmap-const-bias-01,Fail

View file

@ -1280,44 +1280,6 @@ spec@nv_texture_env_combine4@nv_texture_env_combine4-combine,Fail
spec@oes_texture_float@oes_texture_float half,Fail
# Remaining fallout from 9d359c6d10adb1cd2978a0e13714a3f98544aae8
spec@arb_texture_compression@fbo-generatemipmap-formats,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_RGB,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_RGB NPOT,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA NPOT,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_rgb_s3tc_dxt1_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_rgba_s3tc_dxt1_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_rgba_s3tc_dxt3_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_rgba_s3tc_dxt5_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_srgb_alpha_s3tc_dxt1_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_srgb_alpha_s3tc_dxt3_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_srgb_alpha_s3tc_dxt5_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_srgb_s3tc_dxt1_ext,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT1_EXT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT1_EXT NPOT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT3_EXT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT3_EXT NPOT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT5_EXT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT5_EXT NPOT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGB_S3TC_DXT1_EXT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGB_S3TC_DXT1_EXT NPOT,Fail
spec@ext_texture_compression_s3tc@gen-compressed-teximage,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT1_EXT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT1_EXT NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT3_EXT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT3_EXT NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT5_EXT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT5_EXT NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_S3TC_DXT1_EXT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_S3TC_DXT1_EXT NPOT,Fail
# uprev Piglit in Mesa
spec@!opengl 1.1@teximage-scale-bias,Fail
spec@glsl-1.10@execution@glsl-fs-texture2d-mipmap-const-bias-01,Fail

View file

@ -778,78 +778,6 @@ dEQP-GLES2.functional.texture.mipmap.2d.projected.linear_nearest_repeat,Fail
dEQP-GLES2.functional.texture.mipmap.2d.projected.linear_nearest_mirror,Fail
dEQP-GLES2.functional.texture.mipmap.2d.projected.linear_nearest_clamp,Fail
# Remaining fallout from 9d359c6d10adb1cd2978a0e13714a3f98544aae8
spec@arb_texture_compression@fbo-generatemipmap-formats,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_LUMINANCE,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_LUMINANCE NPOT,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_LUMINANCE_ALPHA,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_LUMINANCE_ALPHA NPOT,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_RGB,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_RGB NPOT,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA,Fail
spec@arb_texture_compression@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA NPOT,Fail
spec@ati_texture_compression_3dc@fbo-generatemipmap-formats,Fail
spec@ati_texture_compression_3dc@fbo-generatemipmap-formats@GL_COMPRESSED_LUMINANCE_ALPHA_3DC_ATI,Fail
spec@ati_texture_compression_3dc@fbo-generatemipmap-formats@GL_COMPRESSED_LUMINANCE_ALPHA_3DC_ATI NPOT,Fail
spec@ext_texture_compression_latc@fbo-generatemipmap-formats,Fail
spec@ext_texture_compression_latc@fbo-generatemipmap-formats-signed,Fail
spec@ext_texture_compression_latc@fbo-generatemipmap-formats-signed@GL_COMPRESSED_SIGNED_LUMINANCE_ALPHA_LATC2_EXT,Fail
spec@ext_texture_compression_latc@fbo-generatemipmap-formats-signed@GL_COMPRESSED_SIGNED_LUMINANCE_ALPHA_LATC2_EXT NPOT,Fail
spec@ext_texture_compression_latc@fbo-generatemipmap-formats-signed@GL_COMPRESSED_SIGNED_LUMINANCE_LATC1_EXT,Fail
spec@ext_texture_compression_latc@fbo-generatemipmap-formats-signed@GL_COMPRESSED_SIGNED_LUMINANCE_LATC1_EXT NPOT,Fail
spec@ext_texture_compression_latc@fbo-generatemipmap-formats@GL_COMPRESSED_LUMINANCE_ALPHA_LATC2_EXT,Fail
spec@ext_texture_compression_latc@fbo-generatemipmap-formats@GL_COMPRESSED_LUMINANCE_ALPHA_LATC2_EXT NPOT,Fail
spec@ext_texture_compression_latc@fbo-generatemipmap-formats@GL_COMPRESSED_LUMINANCE_LATC1_EXT,Fail
spec@ext_texture_compression_latc@fbo-generatemipmap-formats@GL_COMPRESSED_LUMINANCE_LATC1_EXT NPOT,Fail
spec@ext_texture_compression_rgtc@compressedteximage gl_compressed_red_green_rgtc2_ext,Fail
spec@ext_texture_compression_rgtc@compressedteximage gl_compressed_signed_red_green_rgtc2_ext,Fail
spec@ext_texture_compression_rgtc@compressedteximage gl_compressed_signed_red_rgtc1_ext,Fail
spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats,Fail
spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats-signed,Fail
spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats-signed@GL_COMPRESSED_SIGNED_RED_RGTC1,Fail
spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats-signed@GL_COMPRESSED_SIGNED_RED_RGTC1 NPOT,Fail
spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats-signed@GL_COMPRESSED_SIGNED_RG_RGTC2,Fail
spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats-signed@GL_COMPRESSED_SIGNED_RG_RGTC2 NPOT,Fail
spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats@GL_COMPRESSED_RED,Fail
spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats@GL_COMPRESSED_RED NPOT,Fail
spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats@GL_COMPRESSED_RED_RGTC1,Fail
spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats@GL_COMPRESSED_RED_RGTC1 NPOT,Fail
spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats@GL_COMPRESSED_RG,Fail
spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats@GL_COMPRESSED_RG_RGTC2,Fail
spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats@GL_COMPRESSED_RG_RGTC2 NPOT,Fail
spec@ext_texture_compression_rgtc@fbo-generatemipmap-formats@GL_COMPRESSED_RG NPOT,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_rgb_s3tc_dxt1_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_rgba_s3tc_dxt1_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_rgba_s3tc_dxt3_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_rgba_s3tc_dxt5_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_srgb_alpha_s3tc_dxt1_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_srgb_alpha_s3tc_dxt3_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_srgb_alpha_s3tc_dxt5_ext,Fail
spec@ext_texture_compression_s3tc@compressedteximage gl_compressed_srgb_s3tc_dxt1_ext,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT1_EXT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT1_EXT NPOT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT3_EXT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT3_EXT NPOT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT5_EXT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGBA_S3TC_DXT5_EXT NPOT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGB_S3TC_DXT1_EXT,Fail
spec@ext_texture_compression_s3tc@fbo-generatemipmap-formats@GL_COMPRESSED_RGB_S3TC_DXT1_EXT NPOT,Fail
spec@ext_texture_compression_s3tc@gen-compressed-teximage,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT1_EXT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT1_EXT NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT3_EXT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT3_EXT NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT5_EXT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT5_EXT NPOT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_S3TC_DXT1_EXT,Fail
spec@ext_texture_srgb@fbo-generatemipmap-formats-s3tc@GL_COMPRESSED_SRGB_S3TC_DXT1_EXT NPOT,Fail
# uprev Piglit in Mesa
spec@!opengl 1.1@teximage-scale-bias,Fail
spec@ext_framebuffer_multisample@accuracy all_samples color depthstencil linear,Fail

View file

@ -947,6 +947,32 @@ r300_set_framebuffer_state(struct pipe_context* pipe,
util_framebuffer_init(pipe, state, r300->fb_cbufs, &r300->fb_zsbuf);
util_copy_framebuffer_state(r300->fb_state.state, state);
/* DXTC blits require that blocks are 2x1 or 4x1 pixels, but
* pipe_surface_width sets the framebuffer width as if blocks were 1x1
* pixels. Override the width to correct that.
*/
if (state->nr_cbufs == 1 && state->cbufs[0].texture &&
state->cbufs[0].format == PIPE_FORMAT_R8G8B8A8_UNORM &&
util_format_is_compressed(state->cbufs[0].texture->format)) {
struct pipe_framebuffer_state *fb =
(struct pipe_framebuffer_state*)r300->fb_state.state;
const struct util_format_description *desc =
util_format_description(state->cbufs[0].texture->format);
unsigned width = u_minify(state->cbufs[0].texture->width0,
state->cbufs[0].level);
assert(desc->block.width == 4 && desc->block.height == 4);
/* Each 64-bit DXT block is 2x1 pixels, and each 128-bit DXT
* block is 4x1 pixels when blitting.
*/
width = align(width, 4); /* align to the DXT block width. */
if (desc->block.bits == 64)
width = DIV_ROUND_UP(width, 2);
fb->width = width;
}
/* Remove trailing NULL colorbuffers. */
while (current_state->nr_cbufs && !current_state->cbufs[current_state->nr_cbufs-1].texture)
current_state->nr_cbufs--;

View file

@ -201,6 +201,7 @@ void r600_draw_rectangle(struct blitter_context *blitter,
rctx->b.set_vertex_buffers(&rctx->b, 1, &vbuffer);
util_draw_arrays_instanced(&rctx->b, R600_PRIM_RECTANGLE_LIST, 0, 3,
0, num_instances);
pipe_resource_reference(&buf, NULL);
}
static void r600_dma_emit_wait_idle(struct r600_common_context *rctx)

View file

@ -14,6 +14,7 @@
#include "util/u_memory.h"
#include "util/u_pack_color.h"
#include "util/u_surface.h"
#include "util/u_resource.h"
#include "util/os_time.h"
#include "frontend/winsys_handle.h"
#include <errno.h>
@ -442,7 +443,7 @@ static bool r600_texture_get_param(struct pipe_screen *screen,
switch (param) {
case PIPE_RESOURCE_PARAM_NPLANES:
*value = 1;
*value = util_resource_num(resource);
return true;
case PIPE_RESOURCE_PARAM_STRIDE:

View file

@ -20,6 +20,16 @@ AluGroup::AluGroup()
m_free_slots = has_t() ? 0x1f : 0xf;
}
void
AluGroup::apply_add_instr(AluInstr *instr)
{
instr->set_parent_group(this);
instr->pin_dest_to_chan();
m_has_kill_op |= instr->is_kill();
m_has_pred_update |= instr->has_alu_flag(alu_update_exec);
assert(!(m_has_kill_op && m_has_pred_update));
}
bool
AluGroup::add_instruction(AluInstr *instr)
{
@ -32,17 +42,13 @@ AluGroup::add_instruction(AluInstr *instr)
ASSERTED auto opinfo = alu_ops.find(instr->opcode());
assert(opinfo->second.can_channel(AluOp::t, s_chip_class));
if (add_trans_instructions(instr)) {
instr->set_parent_group(this);
instr->pin_dest_to_chan();
m_has_kill_op |= instr->is_kill();
apply_add_instr(instr);
return true;
}
}
if (add_vec_instructions(instr) && !instr->has_alu_flag(alu_is_trans)) {
instr->set_parent_group(this);
instr->pin_dest_to_chan();
m_has_kill_op |= instr->is_kill();
apply_add_instr(instr);
return true;
}
@ -51,9 +57,7 @@ AluGroup::add_instruction(AluInstr *instr)
if (s_max_slots > 4 && opinfo->second.can_channel(AluOp::t, s_chip_class) &&
add_trans_instructions(instr)) {
instr->set_parent_group(this);
instr->pin_dest_to_chan();
m_has_kill_op |= instr->is_kill();
apply_add_instr(instr);
return true;
}
@ -128,6 +132,8 @@ AluGroup::add_trans_instructions(AluInstr *instr)
* make sure the corresponding vector channel is used */
assert(instr->has_alu_flag(alu_is_trans) || m_slots[instr->dest_chan()]);
m_has_kill_op |= instr->is_kill();
m_has_pred_update |= instr->has_alu_flag(alu_update_exec);
m_slot_assignemnt_order[m_next_slot_assignemnt++] = 4;
return true;
}
@ -170,19 +176,14 @@ AluGroup::add_vec_instructions(AluInstr *instr)
if (!m_slots[preferred_chan]) {
if (instr->bank_swizzle() != alu_vec_unknown) {
if (try_readport(instr, instr->bank_swizzle())) {
m_has_kill_op |= instr->is_kill();
m_slot_assignemnt_order[m_next_slot_assignemnt++] = preferred_chan;
return true;
}
} else {
for (AluBankSwizzle i = alu_vec_012; i != alu_vec_unknown; ++i) {
if (try_readport(instr, i)) {
m_has_kill_op |= instr->is_kill();
m_slot_assignemnt_order[m_next_slot_assignemnt++] = preferred_chan;
if (try_readport(instr, i))
return true;
}
}
}
} else {
auto dest = instr->dest();
@ -209,23 +210,17 @@ AluGroup::add_vec_instructions(AluInstr *instr)
sfn_log << SfnLog::schedule << "V: Try force channel " << free_chan << "\n";
dest->set_chan(free_chan);
if (instr->bank_swizzle() != alu_vec_unknown) {
if (try_readport(instr, instr->bank_swizzle())) {
m_has_kill_op |= instr->is_kill();
m_slot_assignemnt_order[m_next_slot_assignemnt++] = free_chan;
if (try_readport(instr, instr->bank_swizzle()))
return true;
}
} else {
for (AluBankSwizzle i = alu_vec_012; i != alu_vec_unknown; ++i) {
if (try_readport(instr, i)) {
m_has_kill_op |= instr->is_kill();
m_slot_assignemnt_order[m_next_slot_assignemnt++] = free_chan;
if (try_readport(instr, i))
return true;
}
}
}
}
}
}
return false;
}
@ -318,6 +313,9 @@ AluGroup::try_readport(AluInstr *instr, AluBankSwizzle cycle)
else if (dest->pin() == pin_group)
dest->set_pin(pin_chgr);
}
m_has_kill_op |= instr->is_kill();
m_has_pred_update |= instr->has_alu_flag(alu_update_exec);
m_slot_assignemnt_order[m_next_slot_assignemnt++] = preferred_chan;
return true;
}
return false;

Some files were not shown because too many files have changed in this diff Show more