Add a separate flush_bits field for tracking cache
flushes in the ACE internal cmdbuf.
In barriers and image transitions we add these flush bits to ACE.
Create a semaphore in the upload BO which makes it possible
for ACE to wait for GFX for the purpose of synchronization.
This is necessary when a barrier needs to block task shaders.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16531>
This implements NV_mesh_shader draw calls with task shaders.
- On the GFX side:
DISPATCH_TASKMESH_GFX for all draws
- On the ACE side:
DISPATCH_TASKMESH_DIRECT_ACE for direct draws
DISPATCH_TASKMESH_INDIRECT_MULTI_ACE for indirect draws
Additionally, the NV_mesh_shader indirect BO layout is
incompatible with AMD HW, so we add a function that copies
that into a suitable layout.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16531>
This is mainly going to be used by task shaders, because
the HW implementation mismatches the API:
- In the API, task shaders are considered graphics shaders which
are part of a graphics pipeline and the draws are submitted to
a graphics queue.
- The HW requires the driver to dispatch task shaders on
an async compute queue.
When a pipeline is bound that has a task shader, create a
driver-internal ACE (async compute engine) cmdbuf which
we are going to submit to an ACE queue.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16531>
This cleans up radv_flush_constants and also
the new function will be reused later.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16531>
Initialize the inverted predication VA only when it is used
for the first time.
This is needed to get conditional rendering work correctly with
task shaders because the internal compute cmdbuf may not exist
yet when conditional rendering starts.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16531>
This can't do it in the loop because it doesn't easily know what
attributes use a binding.
We could do it in a separate loop, but there's no point, especially since
zink does CmdSetVertexInputEXT() after CmdBindVertexBuffers2().
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Fixes: c335a4d70e ("radv: dynamically calculate misaligned_mask for dynamic vertex input")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17521>
When the application uses depth values outside of the [0.0,1.0] range
with VK_EXT_depth_range_unrestricted, or when explicitly disabled.
Otherwise, the driver can clamp to [0.0,1.0] internally for optimal
performance.
From the Vulkan spec "in 28.10.1. Depth Clamping and Range Adjustment":
"If depth clamping is not enabled and zf is not in the range [0, 1]
and either VK_EXT_depth_range_unrestricted is not enabled, or the
depth attachment has a fixed-point format, then zf is undefined
following this step."
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16349>
When using a static callable stack, the required scratch has already
been allocated.
Dynamic stacks are located at the end of scratch memory
and are allocated on demand using radv_set_rt_stack_size.
Static stacks live at the start of scratch memory and are allocated in
create_rt_shader by setting scratch_size.
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-By: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17579>
In the direct path we already skipped draws, but in DGC I noticed
that just emitting these packets can cause issues ...
Cc: mesa-stable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17269>
Required for indirect(2) ray tracing to work.
Fixes the following tests:
dEQP-VK.ray_tracing_pipeline.trace_rays_indirect2.indirect_*
dEQP-VK.ray_tracing_pipeline.trace_rays_cmds_maintenance_1.indirect2_*
Fixes: 16585664 ("radv: vkCmdTraceRaysIndirect2KHR")
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17316>
Required for indirect(2) ray tracing to work.
Fixes: b30f96dd ("radv,aco: Use ray_launch_size_addr")
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17316>
With dynamic vertex bindings the vertex format lookups are a lot
more frequent (vs being baked in the pipeline). Add a simple lookup
cache using a dynamic array to keep track of the hw values, and
avoid repeated translation.
This also reduces the memset to just the bitfields since all
the others will be overwritten.
Seen in perf traces gputest gimark with zink on radv.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15846>
For unknown reasons, COPY_DATA can hang on GFX10+ while it doesn't
hang on GFX9. Adding PFP_SYNC_ME before/after the COPY_DATA doesn't
fix the hang either.
Using a LOAD_CONTEXT_REG_INDEX packet shouldn't be needed unless the
driver supports preemption (shadow memory) which RADV doesn't support.
I don't have a real explanation but PFP_SYNC_ME+LOAD_CONTEXT_REG_INDEX
fixes a GPU hang with Space Engineers (game uses a bunch of consecutive
calls to vkCmdDrawIndirectByteCountEXT without anything in-between).
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5838
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17290>
The driver can't assume sample positions at (0.5, 0.5) when user
sample locations are used.
This doesn't fix anything in practice because NGGC is only enabled by
default on GFX10.3 and that extension is currently disabled on GFX10+,
but I would like to expose it at some point.
This fixes dEQP-VK.pipeline.*.sample_locations_ext.verify_location.*
(when the extension is enabled locally).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17228>
Fix defect reported by Coverity Scan.
Dereference before null check (REVERSE_INULL)
check_after_deref: Null-checking subpass suggests that it may be null,
but it has already been dereferenced on all paths leading to the check.
Fixes: 779e09639b ("radv: configure DB_Z_INFO.NUM_SAMPLES correctly on GFX11")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17100>
Streamout must be enabled for the PRIMITIVES_GENERATED query to work.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15639>
When primitives generated query is used, the driver also needs to
emulate counting for NGG VS/TES.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15639>
This changes the trace rays logic to always use
VkTraceRaysIndirectCommand2KHR and implements
vkCmdTraceRaysIndirect2KHR. I renamed the
load_sbt_amd to sbt_base_amd and moved the SBT
load lowering from ACO to NIR.
Note that we can not just upload one pointer to
all the trace parameters because that would
be incompatible with traceRaysIndirect.
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16430>
Adds a small helper for handling cp dma sync. This
Also adds the missing handling for some stage
flags in write_event.
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16782>
Use vk_buffer as a base for radv_buffer and
replace manual handling of VK_WHOLE_SIZE with
vk_buffer_range.
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16764>