counterOffset was just ignored and nobody noticed (missing VKCTS
coverage).
VGT_STRMOUT_DRAW_OPAQUE_OFFSET will do the computation in hw for us.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33407>
Rename the drirc and call it radv_disable_dedicated_sparse_queue instead,
since normal queues support sparse now anyway.
Keep the workaround for existing known games, since they might not
expect a separate SPARSE queue to pop up.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33166>
Forcing a dedicated sparse queue is problematic in real-world scenarios.
In the current implicit sync world for sparse updates, we can rely on
submission order.
For use cases where an application can take advantage of the separate
sparse queue to do "async" updates, the existing implementation works
well, but problems arise when trying to implement D3D-style submission
ordering. E.g., when a game does sparse binding on a graphics or compute
queue, we need to guarantee that previous submissions, the sparse update,
and future submissions are properly ordered.
The Vulkan way of implementing this is to:
- Signal timeline N on the graphics queue (i.e. last submission made)
- Wait on timeline N on the sparse queue
- Do sparse updates
- Signal timeline N + 1 on sparse queue
- Wait for timeline N + 1 on graphics queue (can be deferred until next
graphics submit)
This causes an unavoidable bubble in GPU execution, since the
existing sparse queue ends up doing:
- Wait pending signal. The implication here is that all previous GPU
work must have been submitted.
- Do VM operations on CPU timeline
- Wait for semaphores to signal (this is required for signal ordering)
- ... GPU is meanwhile stalling in a bubble due to GPU -> CPU -> GPU roundtrip.
- Signal semaphore on CPU (unblocks GPU work)
Letting the GPU go idle here is not great, and we can be screwed over by bad thread scheduling.
Another knock-on effect is that the graphics queue is now forced into
using a thread for submissions. This is because when the graphics queue
wants to wait for timeline N + 1, the sparse queue may not have
signalled the timeline yet on CPU, so effectively, we have created a
wait-before-signal situation internally in RADV. Throwing another thread
under the bus is not great either.
Just letting the queue in question support sparse binding solves all
these issues and I don't see a path forward where the D3D use case can
be solved in a separate queue world.
It is also friendlier to the ecosystem at large. RADV is the only driver
I know of that insists on separate sparse queues, and multiple games
assume that the graphics queue can support sparse.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33166>
This switches to disk_cache instead of our own mechanism, which only
stored meta shaders when the logical device was destroyed.
Meta shaders are still stored separately from the application shaders
because they are common to all applications on a given GPU/Mesa version.
The default cache size is 32MiB, which should be large enough.
This fixes massive stuttering in FF7 Rebirth but all apps are
technically affected.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33370>
I think this was broken as there might be a store_output with
less than 4 components to a location that shouldn't be smoothed
anyway (i.e. not the first one).
nir_lower_poly_line_smooth now handles the case where the first location
doesn't have 4 components.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33340>
Annotating ssa defs without affecting compilation is impossible with
debug info instructions since referencing a nir_def from the debug info
instr will add uses.
The old approach also stops working if passes reorder instructions.
This patch proposes a solution which, like the old approach, should not
regress performance. The difference is that this one allocates a bit
more space for debug info instead of adding a new instruction for it.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33141>
Same idea as the VS/TES and GS lowering:
Make shader compilation decisions based on the features of the
current GPU instead of ad-hoc deciding according to GFX level.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33218>
While theoretically all GFX11+ GPUs have an attribute ring, it is
nicer to have this property instead of deciding ad-hoc based on
the GFX level.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33218>
The intention is to have all the HW features affecting
shader compilation in one place, instead of ad-hoc decisions
in the code based on the GFX level and chip class.
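A minimal sketch of the idea, not the actual Mesa code: every name below
(example_gfx_level, example_hw_features, has_attr_ring and both helpers)
is hypothetical. The GFX level is consulted once when the feature table
is filled in, and compilation code only asks about the feature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; these are not the real Mesa types. */
enum example_gfx_level {
   EXAMPLE_GFX10,
   EXAMPLE_GFX11,
   EXAMPLE_GFX12,
};

/* All HW properties that affect shader compilation live in one place. */
struct example_hw_features {
   bool has_attr_ring;
};

/* The only spot that looks at the GFX level: filled in once per device. */
static struct example_hw_features
example_query_features(enum example_gfx_level level)
{
   struct example_hw_features f = {0};
   f.has_attr_ring = level >= EXAMPLE_GFX11;
   return f;
}

/* Compilation code asks about the feature, not about the generation. */
static bool
example_use_attr_ring_exports(const struct example_hw_features *f)
{
   assert(f != NULL);
   return f->has_attr_ring;
}
```

If a future chip deviates from the "all GFX11+" assumption, only
example_query_features would need to change in this sketch.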
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33218>
This fixes depth-only rendering with mesh shaders,
as well as array derefs in unlinked shaders in general.
Lowering array derefs of vectors is necessary for correctness.
Without this, nir_lower_io will incorrectly add the array index
to the IO intrinsic base instead of to the component offset.
This was previously only done during shader linking, which leaves
some problems with unlinked shaders and depth-only rendering.
Whether these calls can be safely removed from shader linking
will be investigated in a future commit.
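To illustrate the failure mode described above, here is a simplified
model, assuming an IO address is just a (base, component) pair. The
struct and both helpers are hypothetical and do not correspond to real
NIR data structures.

```c
#include <assert.h>

/* Hypothetical, simplified IO address: a slot plus a channel in it. */
struct example_io_addr {
   unsigned base;      /* which IO slot */
   unsigned component; /* which channel within the slot (0..3) */
};

/* Intended result: indexing into a vector moves within the slot. */
static struct example_io_addr
example_index_vector(struct example_io_addr addr, unsigned index)
{
   assert(addr.component + index < 4);
   addr.component += index;
   return addr;
}

/* The bug: the index is treated as a slot offset and added to base. */
static struct example_io_addr
example_index_as_slot(struct example_io_addr addr, unsigned index)
{
   addr.base += index;
   return addr;
}
```

In this model, indexing element 2 of a vector in slot 5 should stay in
slot 5 and select component 2, whereas the buggy variant ends up in
slot 7.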
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12516
Cc: mesa-stable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33264>
Otherwise if the function name is stripped during NIR serialization,
importing libraries would break because entrypoint is NULL.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33273>
Using the used component count is not enough. We need to consider
the component mask because any component can be disabled. This might
fix tests.
This removes the component counting from ac_get_fs_input_vgpr_cnt
and determines the component mask where it's needed.
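A small sketch of why a count is not enough, assuming a 4-bit mask where
bit i enables component i. The helper names are hypothetical and are not
part of ac_get_fs_input_vgpr_cnt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of enabled components in a 4-bit mask (bit i = component i). */
static unsigned
example_count_components(uint8_t mask)
{
   unsigned count = 0;
   for (unsigned i = 0; i < 4; i++)
      count += (mask >> i) & 1;
   return count;
}

/* Count-based guess: assumes the enabled components are the first N. */
static bool
example_enabled_by_count(unsigned count, unsigned comp)
{
   assert(comp < 4);
   return comp < count;
}

/* Mask-based check: correct for any combination of components. */
static bool
example_enabled_by_mask(uint8_t mask, unsigned comp)
{
   assert(comp < 4);
   return (mask >> comp) & 1;
}
```

With mask 0xA (only components 1 and 3 enabled) the count is 2, so the
count-based guess wrongly reports components 0 and 1 as enabled, while
the mask-based check reports 1 and 3.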
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32910>
The Vulkan spec says:
"If logicOpEnable is VK_TRUE, then a logical operation selected by
logicOp is applied between each color attachment and the
fragment’s corresponding output value, and blending of all
attachments is treated as if it were disabled. Any attachments
using color formats for which logical operations are not supported
simply pass through the color values unmodified."
When logic op and blending are both enabled, logic op takes precedence,
and for formats that don't support logic ops the values should be passed
through unmodified. Also RB+ shouldn't have any effect when blending is
disabled.
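The precedence rule from the spec quote can be sketched as a small
decision function. This is only an illustration: the enum, the struct
and the function name are hypothetical and are not RADV code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-attachment outcome. */
enum example_cb_mode {
   EXAMPLE_CB_PASS_THROUGH, /* color values written unmodified */
   EXAMPLE_CB_BLEND,
   EXAMPLE_CB_LOGIC_OP,
};

/* Hypothetical per-attachment inputs. */
struct example_cb_state {
   bool logic_op_enable;
   bool blend_enable;
   bool format_supports_logic_op;
};

static enum example_cb_mode
example_resolve_cb_mode(const struct example_cb_state *s)
{
   assert(s != NULL);

   /* logicOpEnable wins: blending is treated as if it were disabled. */
   if (s->logic_op_enable) {
      return s->format_supports_logic_op ? EXAMPLE_CB_LOGIC_OP
                                         : EXAMPLE_CB_PASS_THROUGH;
   }

   return s->blend_enable ? EXAMPLE_CB_BLEND : EXAMPLE_CB_PASS_THROUGH;
}
```

The case this sketch is meant to highlight is logic op and blending
both enabled on a format without logic op support, which resolves to
pass-through rather than blending.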
Fixes new VKCTS coverage dEQP-VK.pipeline.*.logic_op_na_formats.*.
Fixes: 03b037a0e3 ("radv: disable logic op for float/srgb formats")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33235>