This prints the swizzle pattern for all non-XOR tiling modes.
It can be used to determine which GPUs have the same tiling.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41405>
This can only happen with RADV_DEBUG=fullsync which literally flushes
all caches, but INV_ICACHE is invalid with RELEASE_MEM apparently.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41396>
To verify that some GPUs are compatible and that shader binaries can be
shared to avoid precompiling twice for SteamOS.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41346>
sed -i "s/nir_src_parent_instr/nir_src_use_instr/" `find ./ -type f`
sed -i "s/nir_src_parent_if/nir_src_use_if/" `find ./ -type f`
sed -i "s/nir_src_set_parent/nir_src_set_use/" `find ./ -type f`
There are two kinds of "parent" in relation to a src/def:
- the instruction where the def or src's def is defined
- the instruction which the src is a part of and where the def is used
Clarify that the parent here is where the src's def is used, not where
it's defined.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41344>
addrlib has an extra optimization for memcpy with HIC, there are two
modes:
- blockMemcpy: chip-specific layout but better performance overall
- hybridMemcpy: chip-agnostic
Because matching UUIDs doesn't matter on desktop, use the block memcpy
by default.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41019>
The vertex input state can be NULL if rasterization is disabled with
dynamic vertex inputs.
The input assembly state can be NULL if rasterization is disabled
and both states are dynamic (primive topology and primitive restart
enable).
This fixes a segfault with gpu-ratemeter vk_dyn.prim
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41335>
This has a bit of sorting overhead, but can significantly increase BVH
quality especially in big BVHs. gfx12 is faster at intersecting, so only
enable for gfx11 and earlier right now.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41300>
This property is unrelated to the CTS conformance process from Khronos,
it just means that the driver passes that CTS version, even if not
"officially" conformant.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41258>
Even if they are in the same block, we might still need to move the
source instructions if they are otherwise after our insert location.
This can happen in the case where we insert strict_wqm_coord before
terminate_if.
Fixes: ac33f82d54 ("ac/nir/lower_tex_coords: move input loads instead of cloning them")
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41336>
nir_build_frag_coord generates the correct sysval loads based on NIR
options. nir_load_frag_coord shouldn't be used directly because drivers
don't have to support it.
v2: RADV can't use it because nir->options isn't set, so use load_pixel_coord.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41227>
We used this for two different purposes with different caching
requirements:
- it's always needed for LLVM, and needs to be part of the cache key
- it's needed for disassembly with ACO, and shouldn't be part of the cache
key
Eventually, we'll want the family to only be part of the cache key if LLVM
is used, but still accessable for when ACO needs the disassembler.
If we put it in radv_compiler_info::debug, we'll need to treat that
specially to hash it into the key when LLVM is used.
If we put it in radv_compiler_info::key, that will hash it into the key
unnecessarily if ACO is used and disassembly might be needed.
So just put the family in both, and use debug::family for disassembly and
key::family for LLVM.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41261>
This just separates tex coord lowering into a new pass.
The gfx_level parameter is now unused in ac_nir_lower_image_tex, but I'm
keeping it because it will be used in the future.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41173>
The expand was considering only the first sample, very old bug.
This fixes test_{copy,compute}_queue_depth_stencil_msaa from
vkd3d-proton on GFX11-GFX11.7 GPUs. Older GPUs don't support image
stores with depth/stencil MSAA images.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41267>
This contains new tests for DGC+multiview which are valid in DX12
but invalid in Vulkan, unless RADV allows support for it. Important
to have coverage for us because it's used for Crimson Desert.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41193>
libdrm splits the HIGH address space in two equal parts for GPUs that
are affected by the SMEM loads with NULL PRT page.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38698>
In amdvgpu_bo_free(), when the reference count drops to 0, vdrm_flush()
is called before removing the bo from the handle_to_vbo hash table.
Since vdrm_flush() is a time-consuming operation and is executed outside
of the handle_to_vbo_mutex lock, another thread calling amdvgpu_bo_import()
can concurrently find this bo in the hash table, increment its refcount,
and attempt to use it. Once vdrm_flush() finishes, amdvgpu_bo_free()
proceeds to remove the bo and call free(), leaving the importing thread
with a dangling pointer, which leads to a use-after-free or double free
crash.
To fix this race condition, we must remove the bo from the hash table
under the lock first. After the bo is safely unlinked and the lock is
released, we can then perform the time-consuming vdrm_flush() and the
actual memory release.
Signed-off-by: zhaqian <zhaqian@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41146>