With descriptor heap the driver will also have to emit indirect
descriptor heaps in some cases.
Rename a couple of things to make them more generic.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37786>
Emitting compute dispatches on SDMA would just hang.
This fixes pending depth/stencil copy tests on transfer queue with
RADV_PERFTEST=transfer_queue.
Fixes: e6c485afb0 ("radv: initialize HiZ metadata during image layout transitions")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37795>
A 64-bit atomic load/store should be considered entirely out-of-bounds if
any part of it is out-of-bounds. Since we implemented these as 32-bit vec2
load/store, it would have been possible for the first half to be in-bounds
while the second half is out-of-bounds.
From section 9.6.1, Robust Buffer Access, of the Vulkan 1.4.324 specification:
> Any non-atomic access to a uniform, storage, uniform texel, or storage
> texel buffer wider than 32-bits may be treated as multiple 32-bit
> accesses that are separately bounds checked.
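The difference between the two checks can be sketched in plain C. This is only an illustration of the rule described above, not the driver's code: the helper names (range_in_bounds, split_check_64, atomic_check_64) and the byte-offset model of a buffer are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if [offset, offset + size) lies entirely
 * inside a buffer of num_bytes bytes. Written to avoid overflow in
 * offset + size. */
static bool range_in_bounds(uint64_t offset, uint64_t size, uint64_t num_bytes)
{
   return offset <= num_bytes && size <= num_bytes - offset;
}

/* Split check, as allowed for non-atomic accesses: each 32-bit half is
 * bounds checked on its own. Returns a mask with bit 0 set if the low
 * half is in bounds and bit 1 set if the high half is in bounds. */
static unsigned split_check_64(uint64_t offset, uint64_t num_bytes)
{
   unsigned mask = 0;
   if (range_in_bounds(offset, 4, num_bytes))
      mask |= 1u;
   if (range_in_bounds(offset + 4, 4, num_bytes))
      mask |= 2u;
   return mask;
}

/* Atomic check: the whole 64-bit access is in bounds only if all 8 bytes
 * are, so it can never be half applied. */
static bool atomic_check_64(uint64_t offset, uint64_t num_bytes)
{
   return range_in_bounds(offset, 8, num_bytes);
}
```

With a 12-byte buffer and an access at offset 8, the split check accepts only the low half, while the atomic check rejects the whole access.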
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>
This has existed since ccfe9813fb because NIR had no atomic
loads/stores. This is no longer the case.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>
For non-atomic loads, this situation would require a data race.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>
The primary CS doesn't need to use chaining in order to use IB2.
Allow using IB2 packets when chaining is disabled.
Rationale for this patch:
When chaining is enabled (the default), this patch removes a
useless check.
When chaining is disabled (by noibchaining), this patch allows us
to use IB2 without chaining.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37280>
All CS always use IBs, so the naming was confusing.
Rename these fields to chain_ib to better reflect
what it actually means, which is enabling chaining:
radv_amdgpu_winsys::use_ib_bos
radv_amdgpu_cs::chain_ib
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37280>
We form LDS clauses because heavily interleaving LDS and VALU leads to false
dependencies. But LDS is completely uncached, so splitting the clause with
waitcnts shouldn't hurt; it might even be beneficial because the first
LDS store can start earlier.
Foz-DB Navi48:
Totals from 170 (0.21% of 80287) affected shaders:
Instrs: 239633 -> 240148 (+0.21%)
CodeSize: 1276584 -> 1278532 (+0.15%)
Latency: 3788507 -> 3789876 (+0.04%); split: -0.01%, +0.04%
InvThroughput: 841637 -> 841694 (+0.01%); split: -0.01%, +0.02%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37701>
Lowering them earlier right after VTN would allow us to implement
embedded samplers for descriptor heap properly for merged shaders.
Non-immediate samplers are still lowered in
radv_nir_apply_pipeline_layout because they require shader arguments.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37688>
Use vk_video_is_profile_supported first, and add AMD specific
restrictions later.
vulkaninfo reports on Navi31:
H.264 Decode (4:2:0 8-bit) Baseline progressive
H.264 Decode (4:2:0 8-bit) Main progressive
H.264 Decode (4:2:0 8-bit) High progressive
H.264 Decode (4:2:0 8-bit) Baseline interlaced (interleaved lines)
H.264 Decode (4:2:0 8-bit) Main interlaced (interleaved lines)
H.264 Decode (4:2:0 8-bit) High interlaced (interleaved lines)
H.264 Decode (monochrome 8-bit) High progressive
H.264 Decode (monochrome 8-bit) High interlaced (interleaved lines)
H.265 Decode (4:2:0 8-bit) Main
H.265 Decode (4:2:0 8-bit) Main 10
H.265 Decode (4:2:0 8-bit) Main Still Picture
H.265 Decode (4:2:0 10-bit) Main 10
VP9 Decode (4:2:0 8-bit) Profile 0
VP9 Decode (4:2:0 10-bit) Profile 2
AV1 Decode (4:2:0 8-bit) Main with film grain support
AV1 Decode (4:2:0 8-bit) Main without film grain support
AV1 Decode (4:2:0 10-bit) Main with film grain support
AV1 Decode (4:2:0 10-bit) Main without film grain support
AV1 Decode (4:2:0 12-bit) Professional with film grain support
AV1 Decode (4:2:0 12-bit) Professional without film grain support
AV1 Decode (monochrome 8-bit) Main with film grain support
AV1 Decode (monochrome 8-bit) Main without film grain support
AV1 Decode (monochrome 10-bit) Main with film grain support
AV1 Decode (monochrome 10-bit) Main without film grain support
AV1 Decode (monochrome 12-bit) Professional with film grain support
AV1 Decode (monochrome 12-bit) Professional without film grain support
H.264 Encode (4:2:0 8-bit) Baseline
H.264 Encode (4:2:0 8-bit) Main
H.264 Encode (4:2:0 8-bit) High
H.265 Encode (4:2:0 8-bit) Main
H.265 Encode (4:2:0 8-bit) Main 10
H.265 Encode (4:2:0 8-bit) Main Still Picture
H.265 Encode (4:2:0 10-bit) Main 10
AV1 Encode (4:2:0 8-bit) Main
AV1 Encode (4:2:0 10-bit) Main
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37656>
Currently we wait until the second dword in the feedback buffer changes
from 0 to 1, and then the rest of the feedback is read. There is no
guarantee that the rest of the feedback will be available at that point,
which can cause the bitstream size to be incorrectly returned as 0.
Add a write memory command after encode, marking the query as available
to ensure the entire feedback buffer is ready.
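The ordering problem can be illustrated with a CPU-side analogy in C11. This is a sketch under stated assumptions, not the driver's code: the feedback_slot layout and the producer_finish/consumer_read names are hypothetical, and the release store stands in for the write memory command that is emitted after the encode.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: feedback payload plus a separate availability
 * word that is written only after the whole payload. */
struct feedback_slot {
   uint32_t status;
   uint32_t bitstream_size;
   atomic_uint available;
};

/* Producer side: write all feedback first, then publish availability.
 * The release store plays the role of the trailing write memory command. */
static void producer_finish(struct feedback_slot *slot, uint32_t size)
{
   slot->status = 1;
   slot->bitstream_size = size;
   atomic_store_explicit(&slot->available, 1, memory_order_release);
}

/* Consumer side: read the payload only once availability is observed,
 * instead of polling a word in the middle of the feedback itself. */
static bool consumer_read(struct feedback_slot *slot, uint32_t *size_out)
{
   if (!atomic_load_explicit(&slot->available, memory_order_acquire))
      return false;
   *size_out = slot->bitstream_size;
   return true;
}
```

Because the consumer keys off a word that is written last, it cannot observe a ready flag while the bitstream size is still 0.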
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13601
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36772>
This can trigger an assert otherwise. The space reserved before
executing DGC IBs is an arbitrary number which should be large enough
in all cases.
Found this while implementing descriptor heap.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37681>
Based on a patch by llyyr <llyyr.public@gmail.com>:
!36827 added the copy_sync_payloads function, but didn't enable use of
it in radv. This commit mirrors similar MRs for anv/panvk/nvk and uses
the common vk_drm_syncobj_copy_payloads function for copy_sync_payloads.
I'm not too familiar with radv internals, so there's potentially a good
reason why this isn't a good change. However, I've personally been using
this patch locally for around a month and have experienced no
regressions and around 8% uplift on vkmark test scores with a 6600 XT.
[vertex] device-local=true: 45110 -> 48489 (+7.5%)
[vertex] device-local=false: 17529 -> 17488 (-0.2%)
[texture] anisotropy=0: 44768 -> 48679 (+8.7%)
[texture] anisotropy=16: 44920 -> 48572 (+8.1%)
[shading] shading=gouraud: 44931 -> 48467 (+7.9%)
[shading] shading=blinn-phong-inf: 44849 -> 48740 (+8.7%)
[shading] shading=phong: 44695 -> 48645 (+8.8%)
[shading] shading=cel: 44809 -> 47938 (+7.0%)
[effect2d] kernel=edge: 45185 -> 47837 (+5.9%)
[effect2d] kernel=blur: 26919 -> 26762 (-0.6%)
[desktop] <default>: 40974 -> 44034 (+7.5%)
[cube] <default>: 45090 -> 49270 (+9.3%)
[clear] <default>: 41102 -> 44375 (+8.0%)
(https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37606)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37640>