buffer_size is uint32_t so we must be careful to not overflow it.
radeonsi had code for this but radv doesn't, which means it will
hang if RADV_THREAD_TRACE_BUFFER_SIZE is too big or if buffer_size
is being doubled up to the point it overflows.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41383>
This advertizes both ffma and fmad on a couple of chipsets to support
VK_KHR_shader_fma and to improve OpenCL fma performance, where fma is not
optional and the emulation is more expensive than slow fma.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41165>
We'll get three new opcodes to properly model float multiply-add.
ffma_old is temporary and will be deleted at the end of this series.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41165>
This is now dead code. This code is not used by RADV.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41461>
FYI, ac_nir_lower_ps_early is only used by radeonsi.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41461>
Otherwise, if a cmdbuf is recycled it would assume that a gang CS is
always is present even if it's not used. That means, it would emit
useless synchronization and use gang submit with a mostly empty gang
CS for nothing.
It seems better to create the gang CS on-demand only when it's strictly
required (for compute fallback with SDMA and task shaders). Even for
heavy uses of task shaders, that shouldn't hurt.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41543>
This splits the nir_move_to_top_input_loads option into 2 options. The latter
option is mainly for at_offset/at_sample loads. Then it updates most places to
use only the first option.
The rationale is that moving at_sample loads makes Control (game) shaders
worse, as per the code comment.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41167>
Since tied definitions no longer use precoloring, I don't think this is
needed anymore.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41018>
If div_ceil(start.byte() + num_bytes, 4) != div_ceil(num_bytes, 4), like
v3b at byte=2.
Also, the non-const operator[] was unused and unsafe.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41018>
Some issues fixed:
- check that byte=0 for p_as_uniform/etc
- validate 5+ byte definitions
- fix v3b pseudo definition validation
- fix validation when the index of the definition is not 0
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41018>
This can happen with v_cndmask_b32, if we were required to take the
sub-dword path in get_reg_simple().
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41018>
Because SDMA doesn't support MSAA, it's possible to get there because
RADV fallback to compute queue in this case.
Some tests only pass because RDNA2 and older don't support image
stores with depth/stencil and MSAA.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41492>
We need to remove ddx/ddy before doing the cube lowering,
otherwise we insert instructions that break dominance.
Affects Sable.
Fixes: 7d552d71e9 ("ac/nir: optimize txd(coord, ddx/ddy(coord))")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41489>
When a VRS view is used with a depth/stencil view, the driver is
expected to copy the VRS rates to the HTILE buffer of the depth/stencil
view. Though if the image uses mipmaps and the base level can't support
HTILE there is no way to copy the rates. The workaround is to force VRS
to be 1x1 which is valid in Vulkan.
This fixes old VKCTS failures on RAPHAEL just because it supports
fragmentShadingRateWithShaderDepthStencilWrites compared to other GPUs
in CI (NAVI21/VANGOGH).
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41427>
It's required with VK_KHR_maintenance11. This allows way more transfer
queue related CTS tests to run and all issues I found should already
be fixed.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41316>
Small PS have their VGPR usage equal to the number of input VGPRs,
and this reduces it.
4 input VGPRs removed in most cases.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41226>