When cs allocation fails in radv_create_cmd_buffer,
radv_destroy_cmd_buffer is called before returning
VK_ERROR_OUT_OF_HOST_MEMORY. At that point, the upload list is not
initalized yet, so SIGSEGV will occur when trying to iterate through the
upload bo list. Initialize the upload list earlier to avoid this.
Signed-off-by: Benjamin Cheng <ben@bcheng.me>
Reviewed-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22016>
With GPL it's not possible to know the primitive topology when
compiling the pre-rasterization stages. For NGG, we use the maximum
number of vertices per prim and rely on the hardware to ignore the
extra bits for points/lines.
Though, this can't work for NGG streamout because the number of
vertices per prim is used to compute a streamout offset. The only
way to solve this is to pass the number of vertices per prim through
a new user SGPR.
This fixes a bunch of streamout tests with Zink/RADV on GFX11.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21833>
This introduces tracking of the required semaphore values in pipelines,
which is then propagated to cmd_buffers on bind. Each queue also keeps
track the maximum count it has waited for, so that we can avoid the waiting
overhead once all the shaders are loaded and referenced.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16271>
The correct workaround is to bind an internal index buffer to handle
robustness2 correctly.
Fixes dEQP-VK.robustness.index_access.* in CTS 1.3.5.0 on NAVI10.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21471>
Fixes
dEQP-VK.draw.dynamic_rendering.complete_secondary_cmd_buff.multi_draw.mosaic.*
on VEGA10 (related to the use of HTILE).
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21549>
Cannot be used for SSBO, so ignore SCACHE invalidation.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Fixes: 8df17163c7 ("radv: implement vkCmdWaitEvents2KHR")
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21271>
For sync2 bits, overflow can happen.
Use BITFIELD64_BIT to align with ANV.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Fixes: 8df17163c7 ("radv: implement vkCmdWaitEvents2KHR")
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21271>
RGP requires shaders to be uploaded consecutively inside the same
buffer object. Otherwise, either it makes the driver generating
huge traces (ie. in GiB) or it fails to load traces at all. Hopefully,
this will be improved soon when AMDGPU drivers will have GPL support.
To workaround this, the driver relocates graphics shaders in the same
buffer object when a pipeline is created. Then at draw time, it
overwrites SPI_SHADER_PGM_xxx registers to make sure SQTT can match
between emitted and exported shaders. It's a bit suboptimal because
graphics shaders are uploaded twice but it's the best solution I found.
This will allow to implement GPL caching without breaking capturing
shaders with RGP.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21078>
The shaders were uploaded consecutively to fit a RGP constraint but
this was more like a workaround. This upload path doesn't work well for
graphics pipeline library and it was the main blocker for GPL caching.
This commit breaks capturing shaders with RGP if the offset between
shaders is too big. Next commit should fix it by using shaders reloc.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21078>
This significantly lowers the CPU overhead of this function.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20980>
This code used to runtime-disable NGG culling for small draw calls.
However, this had too much CPU overhead, let's remove it.
It will be solved by the shaders instead.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20980>
There was no measurable perf benefit from this optimization,
and it made the code messy and difficult to refactor.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20980>
The video buffers need to have objects aligned at certain ranges,
this enhances the uploader to allow an alignment to be specified.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20388>
Found by inspection. Something probably hangs because of this, but I don't
know what.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: c199a5160a ("radv: bind the VS input state for prologs created with GPL")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20913>
Even when small primitive culling is disabled, the face culling algorithm
in ac_nir_cull can delete tiny triangles when their area is almost zero.
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20987>
When all shader stages have already been imported it's possible to
skip radv_graphics_pipeline_compile() entirely. This makes GPL
fast-linking VERY fast.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21068>
This fixes a copy-paste error found by manual inspection.
TES may be merged into GS with certain HW stage mappings, which lead to
duplicate set-register commands to be emitted with the old code.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20935>
Check the need for emitting prefetch before calling si_emit_cache_flush
to mask a possible cache miss delay and always inline radv_emit_prefetch_L2.
Either change alone is not significant but together they increase
drawcall throughput by 8% on i5-2500.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20877>
Approximately 10% improvement in CPU overhead score on 3900X.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20655>
INDEX_TYPE and NUM_INSTANCES PKT3 should be always written
if shadowing is enabled since they are not shadowed.
Signed-off-by: Yogesh Mohan Marimuthu <yogesh.mohanmarimuthu@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18301>