With the per-submit mode, the driver was overwriting the existing file.
To fix that properly, add a RGP capture mode to add _frameXXX or
_submitXXX to the filenames.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40535>
Resizing the buffer isn't possible for per-submit captures and it
doesn't make sense either. This introduces a new envvar called
RADV_CACHE_COUNTERS_BUFFER_SIZE to increase the SPM BO size if needed.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40535>
Nvidia implements both the same way as AMD does, so it makes sense to
allow for code sharing here.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40541>
Abusing RADV_PERFTEST for experimental features doesn't make real
sense, and I think we should stop doing that.
The existing RADV_PERFTEST options like RADV_PERFTEST=transfer_queue
still exists but they are marked as deprecated, they will be removed
in future Mesa releases.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40646>
The most egregious case was AS updates, in which case radv_copy_memory
would decide to use compute, which overwrites the bound pipeline with
a copy shader. Subsequent dispatches assumed the update pipeline to be
bound, but dispatched another copy shader instead.
There is also a chance of this happening for geometry info copying for
RRA, so add another pass for that.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39985>
Because there is no way to know where the address has been allocated
(GTT or VRAM), the existing entrypoints aren't dropped and the sparse
bit is derived from VK_ADDRESS_COMMAND_FULLY_BOUND_BIT_KHR.
It would be nice to figure out if the CP DMA vs compute heuristic for
GTT BOs on dGPUs could be removed to simplify this implementation.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40386>
This doubles vkoverhead's draw_16vattrib_change_dynamic performance.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40603>
Only use LDS for VGPR spilling if we can use addtid access, to avoid having a VGPR addr.
Limit to single wave workgroups, to avoid needing the wave_id for the offset.
If we have a scratch stack pointer, don't use LDS at all.
Limit LDS spilling to not reduce occupancy further.
Note that in theory, this can still limit occupancy of other shaders running
on the CU at the same time, but that's unlikely and impossible to know at this point.
Removes all scratch usage in emulated FSR4 and parallel_rdp.
Besides that, only a single GoW shader is affected.
Foz-DB Navi31:
Totals from 9 (0.01% of 114641) affected shaders:
Instrs: 68863 -> 68830 (-0.05%); split: -0.07%, +0.02%
CodeSize: 416108 -> 416000 (-0.03%); split: -0.05%, +0.02%
LDS: 2048 -> 45056 (+2100.00%)
Scratch: 261888 -> 220672 (-15.74%)
Latency: 727951 -> 657155 (-9.73%); split: -9.73%, +0.00%
InvThroughput: 418644 -> 383269 (-8.45%)
VClause: 1506 -> 1200 (-20.32%)
Copies: 10651 -> 10624 (-0.25%)
VALU: 48700 -> 48684 (-0.03%)
SALU: 6200 -> 6199 (-0.02%); split: -0.05%, +0.03%
VMEM: 4139 -> 3589 (-13.29%)
VOPD: 580 -> 574 (-1.03%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36367>
They are only nightly jobs that run full VKCTS. The main advantage is
that we have mesh shaders coverage on NAVI31/GFX1201. It's still not
possible to enable that on pre-merge because of random GPU hangs.
Expect random GPU hangs on NAVI31/GFX1201 nightly jobs but I think
it's better than no coverage at all.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40626>
These jobs only skip the tests that are known to hang. The timeout is
also increased to 120s.
Also rename them to -full for less confusion.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40626>
This is a bit unusual, as we otherwise only use the VOP2 codesize
optimization opcodes in the register allocator.
But unless we change the scheduler to not split v_mov_b32_dpp and
v_dot, we have no other choice.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40510>
RADV supports ASTC emulation. Though it seems broken to some extent but
it's better to run the tests and mark them as expected failures anyways.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40580>