libdrm splits the HIGH address space in two equal parts for GPUs that
are affected by the SMEM loads with NULL PRT page.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38698>
In amdvgpu_bo_free(), when the reference count drops to 0, vdrm_flush()
is called before removing the bo from the handle_to_vbo hash table.
Since vdrm_flush() is a time-consuming operation and is executed outside
of the handle_to_vbo_mutex lock, another thread calling amdvgpu_bo_import()
can concurrently find this bo in the hash table, increment its refcount,
and attempt to use it. Once vdrm_flush() finishes, amdvgpu_bo_free()
proceeds to remove the bo and call free(), leaving the importing thread
with a dangling pointer, which leads to a use-after-free or double free
crash.
To fix this race condition, we must remove the bo from the hash table
under the lock first. After the bo is safely unlinked and the lock is
released, we can then perform the time-consuming vdrm_flush() and the
actual memory release.
Signed-off-by: zhaqian <zhaqian@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41146>
Currently the code handling deferred RTA clears cannot handle them for
secondary command buffers within render passes, because the code
immediately configures the transfer command for the deferred clear
operation, but the specific attachment image view isn't known when
recording secondary command buffers to be executed inside render passes.
Add code to record parameters for deferred RTA clears in secondary
command buffers when the attachment is unknown, and bind the recorded
clears to the attachment's image view when executing the secondary
command buffer inside a render pass.
Fixes many dynamic rendering random tests.
Backport-to: 26.0
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40838>
The code that adds deferred RTA clear transfer commands checks whether
the newly allocated transfer command is NULL. However the list_addtail
call is before the check, which means that the check does not prevent
NULL dereference.
Reorder the code to ensure no NULL transfer commands would ever be added
to the deferred clear list.
In addition, pvr_transfer_cmd_alloc() has already set the command
buffer's error status when it returns NULL, so it's not needed to set it
again.
Fixes: 2eabbbe57d ("pvr: use linked list to back deferred clears")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40838>
For 2D array views of 3D images, the layer of the view corresponds to
the depth (instead of the layer, which should be always 0) of the image.
Fix the code emitting deferred RTA clears to set the depth instead of
the layer of the image to clear.
Fixes the flakiness of `dEQP-VK.renderpasses.renderpass*.
remaining_array_layers.multi_layer_fb.*`.
Fixes: 95820584d0 ("pvr: Add deferred RTA clears for cores without gs_rta_support.")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40838>
Deferred RTA clear will happen after the current graphics subcommand is
executed, which may override rendered image in the graphics subcommand.
In addition, the active render targets do not need "emulated" clear --
they can be really cleared by drawing rectangles.
Skip set up deferred RTA clear for active render target layers, and
continue to do immediate clear for these layers.
Fixes a few dynamic rendering random CTS tests, but the issue should
also exist in legacy renderpasses RTA clears.
Fixes: 95820584d0 ("pvr: Add deferred RTA clears for cores without gs_rta_support.")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40838>
While testing HW without gs_rta_support it was raised that this
change had been made in error. After retesting with the change
reverted the listed tests still pass.
This reverts commit d68344bffe.
Backport-to: 26.0
Reported-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40838>
When maxPerStageResources is less than 128, it must be at least the sum
of maxPerStageDescriptorUniformBuffers,
maxPerStageDescriptorStorageBuffers, maxPerStageDescriptorSampledImages,
maxPerStageDescriptorStorageImages,
maxPerStageDescriptorInputAttachments and maxColorAttachments.
As maxPerStageDescriptorStorageBuffers is previously increased, the
value of maxPerStageResources should be increased too.
This fixes regression on two limit validation tests in the Vulkan CTS --
dEQP-VK.info.device_properties and dEQP-VK.api.info.
vulkan1p2_limits_validation.general .
Fixes: 35f57a2739 ("pvr: increase value of maxPerStageDescriptorStorageBuffers")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41270>
According to ubsan shifting an int32_t by 31 bits to the left is undefined
behavior. So just declare it as uint32_t.
Backport-to: 26.1
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41252>
Otherwise we'd get tracepoints out of logical order, which doesn't
matter for perfetto at the moment, but would matter with future
perf warnings.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41102>
The last graphics job, which might write to the occlusion query result,
could still be running when vkCmdCopyQueryPoolResults is called.
Additionally wait for graphics jobs before copying the results.
Fixes: 24b1e3946c ("pvr: Add support to submit occlusion query sub cmds.")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40884>
Apparently CP_CCHE_INVALIDATE is just a plain register write underneath,
so it needs WFI before it, in order to invalidate at the right point.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41266>
Apparently CP_CCHE_INVALIDATE is just a plain register write underneath,
so it needs WFI before it, in order to invalidate at the right point.
```
CP_CCHE_INVALIDATE:
mov $addr, 0x9881
mov $data, 0x1
waitin
mov $01, $data
```
Fixes misrendering in Doom Eternal on A750.
Fixes: fb1c3f7f5d ("tu: Implement CCHE invalidation")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41266>
The intention of the original commit was to make all the shaders report
the same max_dispatch_width. When CS has multiple variants, this was
not happening as expected.
Fixes: 2acc2f18ea ("intel/compiler: report max dispatch width statistic")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41209>
When reloading live-out values along loop back-edges, we make sure to
reuse the original register. However, we failed to detect cases where
the spilled value got reloaded earlier for a src in a different
register. Fix this by reloading the value again in the original
register.
Fixes a RA validation failure in Windrose.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: fa22b0901a ("ir3/ra: Add specialized shared register RA/spilling")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41262>
Expose VK_KHR_shader_integer_dot_product 4x8-bit packed dot
products using native HW instructions v8dot and setnnmode.
QPU instruction count for sdot_4x8_iadd compute shader:
Before (scalar decomposition): 18 ALU cycles
After (setnnmode + v8dot): 3 ALU cycles (6x)
We advertise integerDotProduct4x8BitPacked*Accelerated for V3D 7.1+
Assisted-by: Claude Opus 4.6
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41255>
This new VIR optimization pass tracks the current NN signedness
mode per block and removes duplicate setnnmode instructions.
When consecutive dot products use the same signedness mode, the backend
emits one setnnmode per dot product. This pass removes the redundant
ones, keeping only the first.
Assisted-by: Claude Opus 4.6
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41255>