Only create nir_load_rt_arg_scratch_offset_amd if needed.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35069>
radv_cmd_buffer_upload_alloc_aligned is used with alignment=0, which
guarantees that the alignment is at least 4.
Fixes: 9e16ed7a13 - ac/nir: switch nir_load_smem_amd uses to ac_nir_load_smem wrapper
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37345>
Usually wave64 performs better for fragment shaders,
because LDS sharing for interpolation is better.
But the rt traversal loop divergence is likely high enough to make
wave32 better on GFX10.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37360>
If the application really thinks it needs pswave32, let it use it.
Fragment shaders also have no concept of full subgroups, so the existing
code that chooses the subgroup size will work already.
For pre raster stages, we cannot allow this because of potential mismatches
in merged stages.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37360>
Having a list of all enabled/used extensions in meson allows us to get
rid of a lot of boilerplate in every bvh build shader.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35326>
These two structs are allowed to be in pNext and they should match
the primary command buffer info.
Found while implementing a new extension.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37278>
This means we can actually implement varying subgroup size correctly.
It also means that we implement the implicit SPIR-V 1.6 full subgroups
requirement in compute shaders with cswave32/rtwave32.
In the future it will also allow more optimizations that use the subgroup size.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
The only somewhat complex case here is GFX10 geometry shaders, if gewave32 is
used. We then only know the subgroup size when is_ngg is decided, as legacy
GS doesn't support wave32.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37294>
In case the source or destination were previously written
through L2, we need to writeback L2 to avoid the CP DMA accessing
stale data.
However, as the CP DMA doesn't write L2 either, an invalidation
is also needed to make sure other clients don't access stale data
when they read it through L2 after the CP DMA is complete.
Doing an invalidation before the CP DMA operation should take
care of both.
Additionally, radv_src_access_flush also invalidates L2 before
the copied data can be read.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36820>
VirtIO devices can be configured as platform devices instead of PCI,
which is especially common in microVM projects like libkrun.
Let's allow RADV to probe MMIO virtgpu devices.
Signed-off-by: Val Packett <val@invisiblethingslab.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37281>
The border color index must be captured and returned to the app,
otherwise it's broken if samplers aren't replayed in-order.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37291>
The full HiZ workaround is the only one that fixes the issue reliably.
Sadly, the performance results are mixed and sometimes it hurts. To
maintain performance, RADV will opt-in by selecting which workarounds
to apply from drirc.
As we can't know the full list of games that will be affected by a
potential performance regression, users are encouraged to try with
RADV_GFX12_HIZ_WA (see Mesa documentation for more explanations).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36174>
This is actually wrong because if radv_handle_fbfetch_output() triggers
a decompression pass and graphics shaders (ESO) are saved/restored
they won't be updated because radv_bind_graphics_shaders() was called
before.
This fixes a very recent regression that I noticed while implementing
a new extension.
This reverts commit 9b912f00c7.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37194>
RADV_DYNAMIC_VERTEX_INPUT_BINDING_STRIDE doesn't emit any context
registers, so it can be excluded for the late scissor workaround to
avoid re-emitting scissors all the time it's dirty.
This fixes a performance regression noticed with Cyberpunk on Vega10,
but other games are likely affected too. The late scissor workaround is
only applied on Raven/Vega10.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13828
Fixes: d7f401c2bb ("radv: bind the vertex binding strides like a normal dynamic state")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37252>