v_swap_b16 is not offically supported as VOP3, so it can't be used with v128-255.
Tests show that VOP3 appears to work correctly, but according to AMD that should
not be relied on.
https://github.com/llvm/llvm-project/pull/100442#discussion_r1703929676
Foz-DB Navi31:
Totals from 6 (0.01% of 79395) affected shaders:
Instrs: 64799 -> 65932 (+1.75%)
CodeSize: 360180 -> 368440 (+2.29%)
Latency: 1364648 -> 1365922 (+0.09%)
InvThroughput: 635843 -> 636475 (+0.10%)
Copies: 14766 -> 15698 (+6.31%)
VALU: 38743 -> 39675 (+2.41%)
Fixes: 80b8bbf0c5 ("aco/gfx11: use v_swap_b16")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30515>
(cherry picked from commit e0818cb87b)
Replicating what we do in genX_cmd_compute.c
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 7ca5c84804 ("anv: add support for simple internal compute shaders")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30539>
(cherry picked from commit 4f093b2e2b)
Xe KMD will now report the different topology mask types based on the
type of the EU of running platform.
With this we don't need to divide the EU count by 2 in intel_perf.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
(cherry picked from commit 132c5cdeb9)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30522>
Sync xe_drm.h with f2881dfdaaa9 ("drm/xe/oa/uapi: Make bit masks unsigned").
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
(cherry picked from commit 3da911b092)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30522>
If TraceRay() is called with the TerminateOnFirstHit flag, we need to
terminate the ray on the first confirmed intersection. This is handled
by the lowering of accept_ray_intersection and it's working fine for the
case of multiple instances of the intersection shader being called.
But if the shader calls reportIntersection() more than once, we were
handling them all and accepting the closest one regardless of the flag.
Check for the flag on every confirmed intersection and, if set, accept
it right there. The subsequent lowering will take care of terminating
handling the ray termination if necessary.
Fixes new test dEQP-VK.ray_tracing_pipeline.amber.flags-accept-first
Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30418>
(cherry picked from commit f8553f56ac)
The current code is not handling the potential NULL pointer in
VkDebugUtilsObjectNameInfoEXT::pObjectName
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e1b9a6e4f3 ("anv: initial RMV support")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30516>
(cherry picked from commit c6bf1f02c4)
this had a number of issues:
* pipe_loader_get_driinfo_xml() frees driver_driconf immediately,
except the driOptionCache struct string pointers are all just copied
in merge_driconf instead of having the strings copied, which means any
subsequent access of driver_driconf strings is invalid access
* pipe_loader_drm_get_driconf_by_name() is a disaster that only happened
to work because the dlopen here is the same lib that gets opened elsewhere
by mesa and not closed. if the lib here is actually closed, then all
the statically allocated strings become invalid, which means they need to
be manually copied
cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30494>
(cherry picked from commit 0c220741e6)
We increased VS_EXPORT_COUNT to 8 for streamout in gfx10_shader_ngg,
but we forgot to increase the attribute ring stride, causing all waves
except the first one to get corrupted VS outputs.
Fixes: f703dfd1bb - radeonsi: add gfx12
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30503>
(cherry picked from commit 0e27df4521)
This fixes random GPU hangs on gfx12 due to incoherent indirect buffer data,
causing random indirect vertex and instance counts, which timeouts if
the random numbers are large.
Fixes: a8abbbb172 - radeonsi: remove r600_pipe_common.h
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30503>
(cherry picked from commit 83b88c54ba)
There was a case where if we don't sync, we wouldn't set TC_L2_dirty either,
which could cause problems later.
Fixes: f703dfd1bb - radeonsi: add gfx12
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30503>
(cherry picked from commit ebc5116e70)
I guess incorrect packet interrupts have been enabled, so this started hanging.
radeon_set_context_reg_seq shouldn't be used with gfx12_set_context_reg.
Fixes: f703dfd1bb - radeonsi: add gfx12
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30503>
(cherry picked from commit e4b3848fde)
Dota 2 and Counter-Strike 2 really want to be able to allocate memory
for both VkImages and VkBuffers from the same memory type. Xe2's
special compression-only memory type does not support buffers, which
makes these games crash. Disable CCS on these games as a workaround.
This is a temporary workaround as we're still working towards a
long-term solution (either by fixing the engine or finding a way
better expose our memory types).
Backport-to: 24.2
Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11520
Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11521
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30481>
(cherry picked from commit 644dcc0337)
These memory types are useless when CCS is disabled, don't leave them
there so they don't confuse applications.
Backport-to: 24.2
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30481>
(cherry picked from commit b4f5a04223)
Consolidate the various uses of CP_EVENT_WRITE into helpers, and use use
fd_gpu_event to manage the differences between a6xx and a7xx. This is a
bit churny as it spreading a fair bit of the CHIP template param around.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30304>
(cherry picked from commit 1f41d59059)
Our validation code doesn't need to know which bytes are accessed. It
only needs to know which grfs were accessed by an element. This also
helps to easily handle the Xe2 register size change.
Backport-to: 24.2
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28479>
(cherry picked from commit 58469620d3)
Previously this check would create a mask of the bytes used in the
grf, and then shift the mask. This worked well when there was 32 bytes
in the register because a 64-bit uint64_t could easily detect that
bytes were used in the next regiter. (The next register was the high
32-bits of the `access_mask` variable.)
With Xe2, the register size becomes 64 bytes, meaning this strategy
doesn't work. Instead of a mask, we can just check to see if more than
1 grfs are used during each loop iteration. (Suggested by Ken.) This
will make it easier to extend for Xe2 in a follow on commit.
Verified this with
dEQP-VK.subgroups.arithmetic.compute.subgroupexclusivemul_u64vec4_requiredsubgroupsize
on Xe2, which otherwise would cause the program to fail to validate
because it assumed a grf was 32 bytes.
Backport-to: 24.2
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28479>
(cherry picked from commit f2800deacb)
previously the size was checked at the top of the function, but this
ignored cases where the loader might end up resizing the drawable,
resulting in an attempted 0x0 swapchain creation based on stale
geometry
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30487>
(cherry picked from commit a6d97b0afe)
Transfer ops also use CCHE since they use the same path as
texture access.
This addresses the flakiness seen in
KHR-GL46.shader_storage_buffer_object.advanced-usage-sync-cs
CCHE wasn't being invalidated between the compute op and transfer
op which would sometimes lead to old/invalid data to be copied
in the transfer op.
Fixes: fb1c3f7f5d ("tu: Implement CCHE invalidation")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11458
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30490>
(cherry picked from commit cf9588bae6)