For reads, we use the LD_PKA (AKA LD_BUFFER) so we can directly
pass the buffer index. For writes, we still convert the SSBO index
into a global address before doing a global load/store/atomic
operation, but we do that with an LEA_PKA instruction that takes
care of bounds checking.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31164>
pan_nir_lower_sysvals.c is not a per-gen file. Pass the architecture
to panfrost_nir_lower_sysvals() to replace the existing #if PAN_ARCH <= 9
section.
Fixes: 9d981a4c5b ("panfrost: properly lower DrawID sysval on v9")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31164>
On Mali(Valhall), the bounds checking can be done when in hardware, but
for this to work properly, we need to pass the offset to the
nir_load_ssbo_address() intrinsic.
Add an offset source to the intrinsic, and adjust the lowering pass
to conditionally lower the offset addition.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31164>
On Mali(Valhall), we have a way to load SSBO data without going through
an SSBO index -> global address translation, so let's provide a way
to tell nir_lower_ssbo() when it shouldn't lower loads.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31164>
If we want to be able to replace the SW-based <SSBO,offset> -> global
address logic by something that uses LEA_PKA to do the bounds check,
we need to emit the SSBO table and lower SSBO indices like we do
for other resources.
This should stay unused until we toggle the native SSBO switch.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31164>
They use the same condition.
This also skips CS_PARTIAL_FLUSH when CB/DB is flushed because that also
waits for compute shaders.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31193>
If we use CACHE_FLUSH_AND_INV_TS_EVENT, then DB_META and CB_META events
are redundant.
So determine the event first, and then determine whether to flush
DB/CB_META.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31193>
Some of this code was duplicated and prepare_cb_db_flushes was called
in the wrong place in gfx6_emit_barrier where it didn't do anything.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31193>
The new function si_fb_barrier_after_rendering will flag cache flushes and
waits in future commits. This is the beginning of unifying all framebuffer
barrier code.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31193>
We always pass SI_OP_SYNC_BEFORE to barriers, which makes it redundant.
If we don't want to sync "before", we just won't call
si_barrier_before_internal_op.
This makes the flags parameter unused in si_barrier_before_internal_op.
It might be used for something else in the future. All places now pass 0
to it.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31193>
The shader busy-waits until the query results are written, but that only
synchronizes for src. The destination buffer might also be used by previous
shaders, so we should wait until all shaders are idle. This might fix some
issues.
The missing si_mark_atom_dirty fix should have no effect, but all flags
changes should call it to be consistent.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31193>
All places that call si_barrier_after_internal_op also set SI_OP_SYNC_AFTER,
so we can do the sync unconditionally.
If we want to skip the "after" sync in the future, we just won't call
si_barrier_after_internal_op.
CP DMA is the only one that will sync even without
si_barrier_after_internal_op, but CP DMA ops are usually small and almost
never used on GFX10+.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31193>
The only remaining use had no effect because it doesn't call
si_barrier_before_internal_op at all and instead implements its own
barrier.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31193>
It has only one use. The barriers didn't do anything because the caller
doesn't set any flags and implements its own barrier.
This is part of trying to push the barrier logic outside the functions
that implement internal ops.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31193>
Test suites even with AMD_DEBUG=mono didn't catch this.
Fixes: b7136d0890 - radeonsi: pass TCS inputs_read mask to LS output lowering on GFX9 + monolithic
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31193>