The previous fix was incomplete because if the same graphics pipeline
and the same PS epilog are rebind after vkCmdExecuteCommands(), the PS
epilog state wouldn't be re-emitted, and it will use a wrong VA (in case
both fragment shader user SGPRs aren't similar either).
Resetting the PS epilog to NULL in the primary should prevent any
issues, but this tracking still need to be improved because it caused
two issues recently.
Fixes: 1a00587c44 ("radv: fix a GPU hang with PS epilogs and secondary command buffers")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15176
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit a73fc90bcd)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
Remove the TU_DEBUG_FLUSHALL option that was force-enabled for a8xx chips.
The problematic CTS cases that required it were failing due to indirect
draw commands sourcing draw data from buffers whose content was prepared
by compute tasks.
Up until a8xx, firmware was managing an implicit wait before any indirect
draw parameters were read, with a delayed CP_WAIT_FOR_ME emitted only when
necessary or on devices enabling indirect_draw_wfm_quirk due to bugged
firmware. That implicit wait is gone on a8xx, so CP_WAIT_FOR_ME should be
emitted immediately, which also matches behavior of the proprietary driver.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
(cherry picked from commit 9931034dca)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
The post-processing infrastructure is showing it's age; it's written
using TGSI, which has been on the way out for a long time. There's also
few actually useful filters in there, and there are better tools out
there to inject shaders into applications.
Let's mark this as deprecated, so we can delete it in the future. Having
a deprecation period makes it easier for any potential users to find
alternatives in a timely matter.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
StackSizePerRay is the RTDispatchGlobals::AsyncStackSize and
DisableRTGlobalsKnownValues is to interpret how many Max BVH levels we
need to use. It's not relevant to Vulkan, since we have just 2 fixed BVH
levels.
Fixes: cb423ee6 ("anv: Fix Wa_14021821874, Wa_14018813551, Wa_14026600921")
Fixes: c1a44e8d ("anv: force StackIDControl value for Wa_14021821874")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 7a627fa8f3)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
I ran into this case where genX(cmd_buffer_emit_bt_pool_base_address)
was returning immediately because it considered an RCS engine
emulating a compute queue as neither a render nor a compute queue.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit d581b7282b)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
Probably worked because we could always reach to things through the
binding table and the index was the same.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 3256fab5a3)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
For some reason the android builders started noticing...
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit f6306198d0)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
The Vulkan spec states that if VK_KHR_shader_clock is supported,
shaderSubgroupClock is a required feature. So let's not enable that
extension unless we can...
Fixes: e9c2c32409 ("panvk: enable VK_KHR_shader_clock")
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Ashley Smith <ashley.smith@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
(cherry picked from commit c8ae72f51d)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
The Vulkan spec states that if VK_ARM_shader_core_builtins is supported,
shaderCoreBuiltins is a required feature. So let's not enable that
extension unless we can...
Fixes: dff1d91c64 ("panvk: Enable VK_ARM_shader_core_builtins")
Reviewed-by: Daniel Stone <daniels@collabora.com>
(cherry picked from commit 8cb89853b8)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
This hasn't been supported since the TGSI envvar was ripped out. When
converted to NIR, we don't see these instructions at all.
Fixes: c3cbe610df ("nouveau: Delete the NV50_PROG_USE_TGSI env var.")
Reviewed-by: Marek Olšák <maraeo@gmail.com>
(cherry picked from commit b062062430)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
We can't expose large texel-buffers if we don't emit the high bits.
Whoopsie!
Fixes: 4db7958edc ("pan/bi: Change texel buffer limits")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
(cherry picked from commit 57a80ff78c)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
A few cases of UINT32_MAX were missed, whoops.
Fixes: c2c91e78fd ("pan/layout: Allow bigger size/surface stride on v12+")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
(cherry picked from commit 69b8372fbf)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
This change was tested on plam and cayman. Here are the tests fixed:
spec/arb_gl_spirv/execution/uniform/atomic-uint-aoa-cs: fail pass
spec/arb_gl_spirv/execution/uniform/atomic-uint-aoa-fs: fail pass
spec/arb_gl_spirv/execution/uniform/atomic-uint-array-cs: fail pass
spec/arb_gl_spirv/execution/uniform/atomic-uint-array-fs: fail pass
spec/arb_gl_spirv/execution/uniform/atomic-uint-cs: fail pass
spec/arb_gl_spirv/execution/uniform/atomic-uint-fs: fail pass
Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
(cherry picked from commit 0deac18581)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
This change adds a minimal support for gl_PointSize to
be used alongside gl_ClipDistance/gl_CullDistance.
This change was tested on palm and cayman. Here is the test fixed:
khr-gl4[5-6]/gl_spirv/spirv_validation_builtin_variable_decorations_test: fail pass
Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
(cherry picked from commit 032a2bdc1e)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
The atomic offset implementation was incomplete.
This change was tested on cayman, it fixes all the
variants of this test:
khr-gl4[2-6]/shader_atomic_counters/advanced-usage-multi-stage: fail pass
khr-gles31/core/shader_atomic_counters/advanced-usage-multi-stage: fail pass
Fixes: 06993e4ee3 ("r600: add support for hw atomic counters. (v3)")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
(cherry picked from commit 48902771ad)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
This change is inspired by b56f47611a ("radeonsi: fix
alpha-to-coverage + alpha-to-one used together for
gfx6-10.3") and implements the same algorithm.
This change was tested on rv770, palm and cayman. Here are the tests fixed:
spec/arb_framebuffer_object/execution/msaa-alpha-to-coverage_alpha-to-one: fail pass
spec/arb_framebuffer_object/execution/msaa-alpha-to-coverage_alpha-to-one_write-z: fail pass
Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
(cherry picked from commit 7513f48edf)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
Previously the output length of pvr_get_driver_build_sha() is changed to
BUILD_ID_EXPECTED_HASH_LENGTH, but the array defined to receive its
output, the driver_build_sha array inside struct pvr_instance, is
declared with BLAKE3_KEY_LEN, which is longer than
BUILD_ID_EXPECTED_HASH_LENGTH.
This leads to uninitialized memory being accessed when creating
pipelineCacheUUID value, and the pipelineCacheUUID value would become
random in each run, defecting the purpose of it.
Refactor the code copying the build ID to follow other drivers: changing
the parameter from the buffer to the instance pointer, inserting a
static assert thanks to being able to retrieve the destination buffer
length (as an array inside the instance structure) and using
copy_build_id_to_sha1() to do the final copy.
Fixes: 6a42493c94 ("pvr: Use BUILD_ID_EXPECTED_HASH_LENGTH")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Ella Stanforth <ella@igalia.com>
(cherry picked from commit 9870c8d8c4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
Add the missing layout which do not need implemented anything in
mali gpu.
Fixed: dEQP-VK.image.host_image_copy.properties.properties
unifiedImageLayouts feature is supported, but layout
VK_IMAGE_LAYOUT_DEPTH_READ_ONLY_OPTIMAL was not included in
VkPhysicalDeviceHostImageCopyProperties::pCopySrcLayouts.
Fixes: 1cd61ee ("panvk: implement VK_EXT_host_image_copy for linear color images")
Signed-off-by: Ryan Zhang <ryan.zhang@nxp.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
(cherry picked from commit 62e7120384)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
Fixes the following building errors:
../src/amd/vulkan/radv_rra.c:1369:43: error: use of GNU empty initializer extension [-Werror,-Wgnu-empty-initializer]
struct radv_bvh_stats_gfx12 stats = {};
^
../src/amd/vulkan/radv_rra.c:1376:45: error: use of GNU empty initializer extension [-Werror,-Wgnu-empty-initializer]
struct radv_bvh_stats_gfx10_3 stats = {};
^
2 errors generated.
Fixes: 8c10eab1 ("radv: Add an option for dumping BVH stats")
(cherry picked from commit dfca417db8)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
Fixes the following building errors:
../src/amd/vulkan/radv_shader.c:3460:42: error: use of GNU empty initializer extension [-Werror,-Wgnu-empty-initializer]
struct radv_shader_debug_info debug = {};
^
1 error generated.
../src/amd/vulkan/radv_shader_args.c:975:43: error: use of GNU empty initializer extension [-Werror,-Wgnu-empty-initializer]
struct user_sgpr_info user_sgpr_info = {};
^
1 error generated.
Fixes: 480a94fb ("radv: Gather debug info about shader args")
(cherry picked from commit 46d396d9d8)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
With autotune allocating counters low-to-high, the conflict with
PERFORMANCE_QUERY_KHR will happen if any CP-based counters are
used. This is a temporary workaround which just drops the first
two CP counters from being usable for performance queries.
Cc: mesa-stable
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Assisted-by: OpenAI Codex (GPT-5.4)
(cherry picked from commit 78e2bbc70f)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
This is more consistent with the newly established pattern of the
UMD allocating all locally used performance counters low-to-high
instead of the prior high-to-low order.
Cc: mesa-stable
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Assisted-by: OpenAI Codex (GPT-5.4)
(cherry picked from commit f78541b765)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
The UMD will be switching to allocating counters from low-to-high,
so to avoid the chances of conflict with this new policy the PPS
driver now allocates the other way around. Additionally, this will
future proof it for the MSM-DRM uAPI for performance counters which
will similarly allocate from high-to-low.
Cc: mesa-stable
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Assisted-by: OpenAI Codex (GPT-5.4)
(cherry picked from commit 24849eef9f)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
When preemption optimization is supported then the necessary CP
counters being missing causes a device initialization error which
is unnecessary as support can simply be disabled instead to allow
for a more graceful fail. This also fixes A8XX which doesn't have
performance counters hooked up yet.
Cc: mesa-stable
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Assisted-by: OpenAI Codex (GPT-5.4)
(cherry picked from commit a5ec9b7892)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
Future kernel API for perfcounter management will likely be required for
a8xx and onwards. For a7xx and earlier, cmdstream-based selector and
counter register management is still supported.
Cc: mesa-stable
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
(cherry picked from commit c2708afbc7)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
If ac_drm_device_initialize returns -EACCES for the fd passed in.
A render node file description can't have DRM master status, which means
AMDGPU_CTX_PRIORITY_HIGH can't work without CAP_SYS_NICE (which
generally only the root user has).
Fixes: 8f30e90fc1 ("winsys/amdgpu: Prefer render node FD for ac_drm_device_initialize")
(cherry picked from commit 5cc3264b53)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
Applications are required to set NonUniform if the resource is arrayed,
but with VK_DESCRIPTOR_MAPPING_SOURCE_HEAP_WITH_SHADER_RECORD_INDEX_EXT,
the resource is non-arrayed in the shader. So, it's technically not
required to set it. Although, the offset can vary per-lane and
NonUniform is implicit.
Backport-to: 26.1
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 8e2869fa41)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
"size" is the allocated size of the array, not the number of immediates
actually used. We could wind up returning a too-large constlen, larger
than 512, and since the binning variant uses the non-binning variant's
constlen as it's max_const we could make binning variants use c512.x and
crash when encoding.
Fixes: 86f3c0c4c2 ("ir3: simplify constlen calculation")
(cherry picked from commit 49d29d4f10)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
We need to know the immediate count even after lowering, to compute the
overall const size. Previously we were using the capacity field, but
that's unreliable and won't be available once we switch to a real
dynamic array container instead of (poorly) reinventing one.
Fixes: 86f3c0c4c2 ("ir3: simplify constlen calculation")
(cherry picked from commit 280c64d720)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
Some Max Payne 3 shaders are impacted by this and probably will fix some
issue there. The VK CTS isn't testing this, but it was verified to fix a
real problem by inserting 0 offsets into the instruction and having CTS
tests fail with the old ordering.
Totals from 3 (0.00% of 1163204) affected shaders:
CodeSize: 2496 -> 2736 (+9.62%)
Static cycle count: 732 -> 741 (+1.23%)
Fixes: ad01fbdda0 ("nak: Add a NIR texture lowering pass")
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
(cherry picked from commit e09045e26c)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41104>
Jay is under heavy development and is not considered released. It is
available in upstream Mesa for developers to hack on but is not part of
the 26.1 release. Add a comment acting like a chicken bit to fuse off the
compiler while minimizing conflicts with backports (which is why we don't remove
Jay wholesale from the release).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
With the previous commit ("ac/surface: Filter swizzle modes for VCN"),
only video-compatible swizzle modes will be picked, so we can enable
tiling for VCN2+.
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40948>
This will allow compatible swizzle modes to be picked for RADV (radeonsi
filters modifiers when creating video surfaces).
This mirrors the logic from ac_modifier_supports_video, and in
addition ensures that XOR swizzle modes are disabled for image arrays
because VCN does not support slice indices.
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40948>
Map X6R10X6G10X6B10X6A10_UNORM to the native R10X6G10X6B10X6A10X6_UNORM
HW format on PAN_ARCH >= 11 where it is supported.
Enable the extension with formatRgba10x6WithoutYCbCrSampler in the
physical device, allowing VK_FORMAT_R10X6G10X6B10X6A10X6_UNORM_4PACK16
to be used as a regular color format without YCbCr sampler conversion.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40653>