From the Vulkan spec:
`If pColorAttachmentLocations is NULL, it is
equivalent to setting each element to its index
within the array.`
Use similar logic to what we do in
CmdSetRenderingInputAttachmentIndices to handle
this behaviour properly.
Signed-off-by: Autumn Ashton <misyl@froggi.es>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35948>
This is a cleanup.
Old gs LDS layout: [es outputs][gs outputs][scratch]
Old nogs LDS layout: [xfb/cull][scratch]
New gs LDS layout: [es outputs][scratch|gs outputs]
New nogs LDS layout: [scratch|xfb/cull]
The LDS scratch is moved to the beginning of the preceding buffer in LDS,
while the addresses in that LDS buffer are offset by the scratch size.
It effectively merges the LDS scratch with the preceding buffer in LDS.
Thanks to that, we no longer need the ngg_scratch ABI and the offset
in a user SGPR.
The lowering passes now return the LDS scratch size, which is used
by the drivers to determine the final LDS size.
The ngg_lds_layout SGPR is now unused without GS in RADV.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35352>
This looks like a hw bug on GFX10.3-GFX11.5 because RB+ seems to only
work as expected when all channels (RGBA) are written. With that format,
RGB channels must be all set or unset but setting the A channel is
legal so far.
This will reduce rendering performance with that format but it's the
less intrusive solution for now. This might be revisited in the near
future, also with more VKCTS coverage.
This has been tested and verified on GFX10.3 (NAVI21) and GFX11
(NAVI31) and GFX12 (NAVI48), unfortunately I don't have GFX11.5 but
let's assume it's broken there too.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13371
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35631>
This is a prerequisite for NGG lowering passes to return LDS vertex and
scratch sizes, which will lead to further simplifications. That will
require calling gfx10_get_ngg_info after radv_postprocess_nir, which means
LDS offsets are unknown when the passes are called.
This makes the 2 values no longer compile-time constants.
A later commit will remove NGG_LDS_LAYOUT_SCRATCH_BASE (the passes will
determine it), so only NGG_LDS_LAYOUT_GS_OUT_VERTEX_BASE will come from
an SGPR, though that could be removed too (non-trivially) or handled as
a relocation.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35351>
Only GFX10+ can support 1x user sample locations, but MSAA_ENABLE
needs to be enabled.
Fixes new VKCTS coverage dEQP-VK.pipeline.*samples_1*.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35492>
DGC doesn't support multiview. The Vulkan spec says:
"VUID-vkCmdExecuteGeneratedCommandsEXT-None-11062
If a rendering pass is currently active, the view mask must be 0."
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35342>
This improves write throughput for TCS outputs. It follows the same idea
as attribute stores in hw GS. The improvement is easily measurable with
a microbenchmark.
It also has the advantage that multiple output stores to the same address
don't result in multiple memory stores. Each output components gets only
one memory store at the end of the shader.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>
It's a stride of 1 output, which isn't 16. It's 16 * num_threads,
aligned to 256.
tcs_offchip_layout has 5 unused bits, so let's use them.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>
One is only used by TCS, the other is only used by TES.
Use the same field for both, call it PATCH_VERTICES_IN.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>
Loosely based on RadeonSI.
This is supposed to be faster because parsing the packet header seems
to be the main bottleneck on GFX12.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35282>
On GFX11+, DISABLE_FOR_AUTO_INDEX=1 automatically disables primitive
restart enable for non-indexed draws.
On GFX10-GFX10.3 the hw considers primitive restart enable for
non-indexed draws and the driver must disable it explicitly.
GFX9 and older gens aren't affected but applying the change for them
simplifies the implementation.
To fix that, move emitting primitive restart enable at draw time
because it needs to know if the draw is indexed or not.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13037
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34996>
Similar to indirect draw barrier, need similar fixups for conditional
rendering access.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Cc: mesa-stable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34956>
When the hardware doesn't natively support 32-bit predication, the
driver has a fallback which allocates a 64-bit predicate to the upload
BO in order to copy the original value.
But when conditional rendering is enabled in the stateCommandBuffer
which is used by preprocess() and the execute() is recorded also in the
stateCommandBuffer. If the preprocess() is recorded in a different
cmdbuf which is submitted before the cmdbuf that contains execute(),
the fallback (ie. alloc + COPY_DATA) will be performed after. This would
cause the predicate value to be always 0.
To fix that, keep track of the user predication VA which is the only
VA that needs to be used by DGC because it reads 32-bit from the shader.
This fixes a very weird corner case with vkd3d-proton.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13143
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34953>
In a scenario where the viewports/scissors are a dynamic state but the
count is static (ie. updated when a graphics pipeline is bound), the
driver wasn't considering that and it was re-emitting the previous
number of viewports/scissors.
This fixes rendering issue with Blender.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13127
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34921>
CmdTraceRays is neither a dispatch or a draw command which means it
shouldn't be affected by conditional rendering.
Fixes recent VKCTS coverage.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34868>
The src and dst queue family indices in an image memory barrier may
contain arbitrary values that can be ignored unless both are different.
This fixes a crash in upcoming CTS tests.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34691>
Recently, I renamed most of the helpers for future work but I forgot
few things like meta keys, etc.
This is for consistency.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34558>
CP is very slow on GFX12 and parsing the packet header is the main
bottleneck. Using paired context regs reduce the number of packet
headers and it should be more optimal.
It doesn't seem worth when only one context reg is emitted (one packet
header and same number of DWORDS) or when consecutive context regs are
emitted (would increase the number of DWORDS).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34421>