Commit graph

11281 commits

Author SHA1 Message Date
Benjamin Cheng
499d9e2e98 radv/video: Allow aliasing of video images
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39109>
2026-01-09 13:52:56 +00:00
Samuel Pitoiset
edb730f647 radv: fix flushing gang semaphore with SDMA/ACE
If the main CS is SDMA and the gang CS is ACE, this would emit a
SDMA_FENCE packet on ACE which just hangs.

Fixes: b1938901d0 ("radv: Use SDMA fence packet when flushing gang semaphores")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39211>
2026-01-09 09:07:45 +00:00
Natalie Vock
1f6ac3fa93 radv/rt,aco: Always dispatch 1D workgroups for RT
We will swizzle the workgroups ourselves in the next commit.
Removes the need for 1D dispatch workarounds.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39142>
2026-01-08 19:49:54 +01:00
Natalie Vock
8baa95e4aa radv/rt: Use subgroup invocation for stack index
Workgroup == subgroup anyway, and we don't have the workgroup thread IDs
in RT shaders.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39142>
2026-01-08 19:49:45 +01:00
Georg Lehmann
a706769a0b nir: move exact bit to nir_fp_math_control
Unifies nir per instruction float control.

In the future this can be split into contract/reassoc/transform
like SPIR-V.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (except SPIR-V)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39103>
2026-01-07 09:40:57 +00:00
Marek Olšák
1912a00a91 ALL: use SHA1_DIGEST_LENGTH etc. instead of hardcoding the numbers
only build_id is switched to use literal 20 instead of SHA1_DIGEST_LENGTH
because we will increase SHA1_DIGEST_LENGTH to BLAKE3_KEY_LEN

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39110>
2026-01-07 08:32:33 +00:00
Samuel Pitoiset
9f5dd888b6 radv/sqtt: add a comment about the allocation strategy of the SQTT BO
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39172>
2026-01-07 06:57:29 +00:00
Samuel Pitoiset
ffa343ed05 Revert "radv: allocate the SQTT BO in GTT for faster readback"
This reverts commit da07f1ef3f.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14591
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39172>
2026-01-07 06:57:29 +00:00
Samuel Pitoiset
59dc20262c ac/perfcounter: rename ac_pc_block::num_instances to num_scoped_instances
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39155>
2026-01-06 11:43:21 +00:00
Konstantin Seurer
405c93c665 radv: Optimize BVH4 acceleration structure updates
It is more efficient to compute the child index of the current node
inside the parent node and write the bounds when available. The previous
code could load up to 16 AABBs to compute the new ones. The new code
also only needs 1/7 of the previously used scratch memory. The new code
seems to be around 30% faster (0.5ms) in GOTG on a 6700XT.

Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39139>
2026-01-05 15:24:54 +00:00
Timur Kristóf
c05d276473 radv: Mitigate GFX6-7 SMEM bug for robust OOB access
Implement a mitigation for VM faults caused by SMEM reading
out of bounds when using robust buffer access.

- Pad uniform and storage buffer allocations with a readonly VM page
- Clamp SMEM offsets that can potentially read past the next page

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38769>
2026-01-02 23:42:16 +00:00
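The offset clamping described in c05d276473 can be pictured with a small sketch. It assumes a 4096-byte VM page and one read-only padding page mapped after the page-aligned buffer; `sketch_clamp_smem_offset` and `SKETCH_PAGE_SIZE` are hypothetical names for illustration, not RADV or ACO identifiers, and the real mitigation is emitted in shader code rather than on the CPU.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed VM page size, not taken from the commit */

/* Round a size up to the next multiple of the assumed page size. */
static uint64_t sketch_align_to_page(uint64_t size)
{
   return (size + SKETCH_PAGE_SIZE - 1) & ~(uint64_t)(SKETCH_PAGE_SIZE - 1);
}

/* Clamp a byte offset so that a load of load_size bytes ends no later than
 * the end of the single padding page that follows the buffer. */
static uint64_t sketch_clamp_smem_offset(uint64_t offset, uint32_t load_size,
                                         uint64_t buffer_size)
{
   assert(load_size > 0 && load_size <= SKETCH_PAGE_SIZE);

   uint64_t mapped_end = sketch_align_to_page(buffer_size) + SKETCH_PAGE_SIZE;
   uint64_t max_offset = mapped_end - load_size;

   return offset < max_offset ? offset : max_offset;
}
```

With a 4096-byte buffer and a 16-byte load, any offset above 8176 is pulled back to 8176, so the read never leaves the mapped range.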
Timur Kristóf
f866bed0db radv: Mitigate GFX6-7 SMEM bug for NULL and mutable descriptors
Implement a mitigation for VM faults caused by SMEM reading
from NULL descriptors.

In order to satisfy VKD3D-Proton's expectations on mutable
descriptors, we must do this in shader code; it is not
sufficient to use the address of a mapped BO when writing
null descriptors. It is not feasible to mitigate this
in VKD3D-Proton.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38769>
2026-01-02 23:42:16 +00:00
Timur Kristóf
10a5e5e4f3 radv/amdgpu: Add ability to pad BOs with a read-only VM page
Map the first page of the same BO as read-only after the BO itself
in order to pad each BO with an extra page. This doesn't require
us to allocate any memory.

This is going to be used for a HW bug mitigation.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38769>
2026-01-02 23:42:16 +00:00
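The padding trick from 10a5e5e4f3 amounts to two virtual-address mappings of the same BO. The sketch below only models the bookkeeping; `struct sketch_va_mapping`, `sketch_map_bo_with_padding` and the 4096-byte page size are assumptions made for illustration, not the amdgpu winsys API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed VM page size */

/* One virtual-address range backed by a range of a buffer object (BO). */
struct sketch_va_mapping {
   uint64_t va;        /* start of the virtual address range */
   uint64_t bo_offset; /* offset into the BO that backs this range */
   uint64_t size;      /* length of the range in bytes */
   bool read_only;
};

/* Describe a BO mapped at va, followed by one extra read-only page that
 * aliases the first page of the same BO. No additional memory is needed. */
static void sketch_map_bo_with_padding(uint64_t va, uint64_t bo_size,
                                       struct sketch_va_mapping out[2])
{
   assert(bo_size >= SKETCH_PAGE_SIZE && bo_size % SKETCH_PAGE_SIZE == 0);

   out[0] = (struct sketch_va_mapping){
      .va = va, .bo_offset = 0, .size = bo_size, .read_only = false};
   out[1] = (struct sketch_va_mapping){
      .va = va + bo_size, .bo_offset = 0, .size = SKETCH_PAGE_SIZE, .read_only = true};
}
```

Because the padding page points back at offset 0 of the BO, an out-of-bounds read lands on valid memory instead of faulting, and the read-only flag keeps stray writes from corrupting the buffer.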
Marek Olšák
bd9206192d radv: use ac_set_sx_downconvert_state_for_mrt
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39093>
2026-01-02 16:46:20 +00:00
Marek Olšák
e64e41f69e radv: fix halved pixel throughput for a few non-blended 16bpp/32bpp formats
Fixed formats:
* R16_SFLOAT
* R16G16_SFLOAT
* R5G5B5A1_UNORM
* A2B10G10R10_UINT

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39093>
2026-01-02 16:46:20 +00:00
Samuel Pitoiset
489550d380 radv: add new drirc radv_prefer_2d_swizzle_for_3d_storage
Because some games perform much better with 2D swizzle for 3D storage.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38084>
2026-01-02 09:54:29 +00:00
Samuel Pitoiset
ae99082f96 radv: use 2D swizzle modes for 3D CB render targets when optimal
Much faster because CB is optimal with 2D swizzle modes. This isn't
applied for storage images because it depends on the access pattern,
and benchmark results are very different.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38084>
2026-01-02 09:54:29 +00:00
Samuel Pitoiset
7ee931b98f radv: increase the reserved CS space size for SPM
Because there are many more instances per SPM counter.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39099>
2026-01-02 09:27:20 +01:00
Timur Kristóf
3d803d7a2e radv: Use compute copy for emulated formats
These aren't supported by the hardware, so better to use the
compute copy implementation with these formats.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25594>
2026-01-02 04:32:06 +00:00
Timur Kristóf
0638fa5156 radv: Use compute for transfer operations unsupported by SDMA
For transfer queue operations that aren't supported by SDMA,
implement them with ACE (Async Compute Engine) using the pre-
existing compute copy functions.

Add a helper radv_get_pm4_cs that returns the ACE gang CS for
transfer command buffers and the main CS for graphics/compute
command buffers. Use radv_get_pm4_cs to make sure to emit the
compute commands to the correct command stream.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25594>
2026-01-02 04:32:06 +00:00
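The command-stream selection that 0638fa5156 describes for radv_get_pm4_cs could look roughly like the following. All types and the `sketch_` names are simplified stand-ins invented for this example; the real helper operates on RADV's own command buffer structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical queue families, standing in for RADV's own enum. */
enum sketch_queue_family {
   SKETCH_QUEUE_GENERAL,
   SKETCH_QUEUE_COMPUTE,
   SKETCH_QUEUE_TRANSFER,
};

struct sketch_cs {
   const char *name;
};

struct sketch_cmd_buffer {
   enum sketch_queue_family qf;
   struct sketch_cs *main_cs; /* SDMA CS for transfer, GFX/compute CS otherwise */
   struct sketch_cs *gang_cs; /* ACE gang CS, only needed for transfer here */
};

/* Return the command stream that accepts PM4 compute packets:
 * the ACE gang CS for transfer command buffers, the main CS otherwise. */
static struct sketch_cs *sketch_get_pm4_cs(const struct sketch_cmd_buffer *cmd)
{
   if (cmd->qf == SKETCH_QUEUE_TRANSFER) {
      assert(cmd->gang_cs != NULL);
      return cmd->gang_cs;
   }
   return cmd->main_cs;
}
```

Routing every compute emission through one helper keeps the compute copy functions unaware of which queue family they were called from.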
Timur Kristóf
aa6c8b8953 radv: Add layout argument to transfer_copy_buffer_image.
This argument will be used with gang submit.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25594>
2026-01-02 04:32:06 +00:00
Timur Kristóf
72ac874ba6 radv: Remove radv_remove_varyings.
Not needed anymore, since we are now doing this on lowered I/O.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33928>
2026-01-01 21:26:26 -06:00
Timur Kristóf
473ef0b6fb radv: Use nir_remove_outputs with the noop FS.
As opposed to radv_remove_varyings, this one works fine with
mesh shaders as well.

This commit helps depth-only rendering with mesh shaders.

No Fossil DB changes.
(Possibly there are no applicable fossils in our DB.)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33928>
2026-01-01 21:26:04 -06:00
Timur Kristóf
24e0e8980f radv: Don't call nir_link_opt_varyings anymore
The old nir_link_opt_varyings pass is superseded by the
new nir_opt_varyings pass.

Fossil DB stats on Strix Halo (GFX11.5):

Totals from 1291 (1.62% of 79825) affected shaders:
MaxWaves: 37070 -> 37078 (+0.02%)
Instrs: 985094 -> 985326 (+0.02%); split: -0.07%, +0.09%
CodeSize: 5144668 -> 5145384 (+0.01%); split: -0.06%, +0.07%
VGPRs: 68040 -> 68160 (+0.18%); split: -0.12%, +0.30%
Latency: 7923260 -> 7921208 (-0.03%); split: -0.04%, +0.02%
InvThroughput: 1291120 -> 1291008 (-0.01%); split: -0.05%, +0.05%
VClause: 16590 -> 16580 (-0.06%)
SClause: 27360 -> 27376 (+0.06%); split: -0.08%, +0.14%
Copies: 68767 -> 69041 (+0.40%); split: -0.51%, +0.91%
Branches: 19431 -> 19449 (+0.09%)
PreSGPRs: 55679 -> 55704 (+0.04%); split: -0.01%, +0.05%
PreVGPRs: 47787 -> 47926 (+0.29%); split: -0.00%, +0.30%
VALU: 572252 -> 572489 (+0.04%); split: -0.10%, +0.14%
SALU: 139916 -> 139845 (-0.05%); split: -0.10%, +0.05%

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33979>
2026-01-01 18:10:05 -06:00
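The percentage in parentheses after each Fossil DB total in 24e0e8980f appears to be the relative change between the two totals. A minimal way to recompute it, using a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Relative change from before to after, in percent. */
static double sketch_percent_change(uint64_t before, uint64_t after)
{
   assert(before != 0);
   return ((double)after - (double)before) * 100.0 / (double)before;
}
```

For the Instrs line, `sketch_percent_change(985094, 985326)` comes out at roughly 0.024, which rounds to the reported +0.02%; the Copies line (68767 -> 69041) gives roughly 0.398, matching +0.40%.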
Timur Kristóf
7ee52b7066 radv: Don't call nir_remove_unused_varyings anymore
The nir_remove_unused_varyings pass is not necessary anymore,
because nir_opt_varyings already does the same.

Fossil DB stats on Strix Halo (GFX11.5):

Totals from 3085 (3.86% of 79825) affected shaders:
MaxWaves: 91286 -> 91290 (+0.00%)
Instrs: 1337749 -> 1335687 (-0.15%); split: -0.39%, +0.24%
CodeSize: 6625244 -> 6618148 (-0.11%); split: -0.38%, +0.27%
VGPRs: 140424 -> 140352 (-0.05%); split: -0.07%, +0.02%
Latency: 5028592 -> 5021465 (-0.14%); split: -0.26%, +0.12%
InvThroughput: 669773 -> 671718 (+0.29%); split: -0.24%, +0.53%
VClause: 24431 -> 24407 (-0.10%); split: -0.17%, +0.07%
SClause: 30114 -> 29435 (-2.25%); split: -2.28%, +0.03%
Copies: 99243 -> 101319 (+2.09%); split: -1.32%, +3.41%
Branches: 27445 -> 27599 (+0.56%)
PreSGPRs: 119444 -> 119472 (+0.02%); split: -0.67%, +0.69%
PreVGPRs: 96667 -> 96688 (+0.02%); split: -0.00%, +0.02%
VALU: 741846 -> 744017 (+0.29%); split: -0.14%, +0.44%
SALU: 197068 -> 195256 (-0.92%); split: -0.96%, +0.05%
VMEM: 54067 -> 54053 (-0.03%); split: -0.03%, +0.00%
SMEM: 56565 -> 55131 (-2.54%); split: -2.59%, +0.05%

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33979>
2026-01-01 18:03:22 -06:00
Timur Kristóf
43496a6bf9 radv: Don't call nir_compact_varyings anymore
nir_compact_varyings is not necessary anymore, because everything
that it does is also done by nir_opt_varyings.

The resulting shader stats are slightly negative because
without nir_compact_varyings, the I/O variables in TCS
are sorted less "fortunately".

After discussing this with the RADV team, we decided that
this is an acceptable loss.

Fossil DB stats on Strix Halo (GFX11.5):

Totals from 4577 (5.73% of 79825) affected shaders:
MaxWaves: 130456 -> 130532 (+0.06%); split: +0.06%, -0.00%
Instrs: 3012724 -> 3014809 (+0.07%); split: -0.06%, +0.13%
CodeSize: 15476368 -> 15484724 (+0.05%); split: -0.05%, +0.10%
VGPRs: 227976 -> 227832 (-0.06%); split: -0.14%, +0.07%
Latency: 13230769 -> 13237431 (+0.05%); split: -0.03%, +0.08%
InvThroughput: 1862029 -> 1864167 (+0.11%); split: -0.07%, +0.19%
VClause: 43128 -> 43123 (-0.01%); split: -0.08%, +0.07%
SClause: 61636 -> 61647 (+0.02%); split: -0.01%, +0.02%
Copies: 178023 -> 180309 (+1.28%); split: -0.80%, +2.09%
PreSGPRs: 195628 -> 195683 (+0.03%)
PreVGPRs: 161817 -> 161749 (-0.04%)
VALU: 1828727 -> 1831037 (+0.13%); split: -0.08%, +0.20%
SALU: 336688 -> 336668 (-0.01%); split: -0.01%, +0.00%
VMEM: 99441 -> 99545 (+0.10%); split: -0.00%, +0.11%

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33979>
2026-01-01 18:01:34 -06:00
Timur Kristóf
e2fabb4e4a radv: Don't call nir_opt_combine_stores anymore
Also no need for nir_lower_tess_level_array_vars_to_vec.
These should now be handled by nir_opt_vectorize_io.

Fossil DB stats on Strix Halo (GFX11.5):

Totals from 373 (0.47% of 79825) affected shaders:
Instrs: 381930 -> 380786 (-0.30%); split: -0.30%, +0.00%
CodeSize: 1888160 -> 1883644 (-0.24%); split: -0.24%, +0.01%
Latency: 1008755 -> 1008053 (-0.07%); split: -0.08%, +0.01%
InvThroughput: 156523 -> 155275 (-0.80%); split: -0.81%, +0.01%
Copies: 22357 -> 20812 (-6.91%); split: -6.93%, +0.02%
VALU: 240904 -> 239359 (-0.64%); split: -0.64%, +0.00%

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33979>
2026-01-01 17:59:48 -06:00
Timur Kristóf
1106b0a1e2 radv: Only run some optimizations when scalarization made progress
These passes are called to clean up after scalarization, so
only call them when scalarization actually made progress.

No Fossil DB changes on Strix Halo (GFX11.5)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33979>
2026-01-01 17:54:55 -06:00
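The pattern in 1106b0a1e2 is the usual "gate cleanup on progress" idiom. The sketch below uses a toy shader struct and made-up pass functions, not NIR; it only shows the control flow.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a shader: counts remaining vector ops and cleanup runs. */
struct sketch_shader {
   int vector_ops;
   int cleanup_runs;
};

/* Pretend scalarization pass: reports whether it changed anything. */
static bool sketch_scalarize(struct sketch_shader *s)
{
   assert(s->vector_ops >= 0);
   bool progress = s->vector_ops > 0;
   s->vector_ops = 0;
   return progress;
}

/* Pretend cleanup pass (copy propagation, DCE and the like). */
static void sketch_cleanup(struct sketch_shader *s)
{
   s->cleanup_runs++;
}

/* Run the cleanup only when scalarization actually made progress. */
static void sketch_optimize(struct sketch_shader *s)
{
   if (sketch_scalarize(s))
      sketch_cleanup(s);
}
```

Calling `sketch_optimize` twice on the same shader runs the cleanup once: the second call finds nothing to scalarize and skips it.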
Timur Kristóf
58020fdc01 radv: Scalarize and re-vectorize unlinked shader I/O
Reasons to do this:
- Optimize VS inputs (always unlinked)
- Allow some optimization on unlinked shaders for GPL/ESO
- Prepare for retiring the old linking passes

Fossil DB stats on Strix Halo (GFX11.5):

Totals from 1814 (2.27% of 79825) affected shaders:
MaxWaves: 51232 -> 51434 (+0.39%)
Instrs: 1213430 -> 1212744 (-0.06%); split: -0.20%, +0.14%
CodeSize: 6124996 -> 6122472 (-0.04%); split: -0.17%, +0.13%
VGPRs: 93336 -> 92988 (-0.37%); split: -0.45%, +0.08%
Latency: 5360820 -> 5357501 (-0.06%); split: -0.29%, +0.23%
InvThroughput: 763087 -> 762937 (-0.02%); split: -0.11%, +0.09%
VClause: 22037 -> 22059 (+0.10%); split: -0.19%, +0.29%
SClause: 30971 -> 30884 (-0.28%); split: -0.46%, +0.17%
Copies: 73139 -> 73294 (+0.21%); split: -0.82%, +1.03%
Branches: 20370 -> 20346 (-0.12%)
PreSGPRs: 77373 -> 77404 (+0.04%)
PreVGPRs: 68218 -> 67093 (-1.65%); split: -1.78%, +0.13%
VALU: 662849 -> 663059 (+0.03%); split: -0.09%, +0.12%
SALU: 206745 -> 206781 (+0.02%); split: -0.06%, +0.08%
VMEM: 34230 -> 34250 (+0.06%)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33979>
2026-01-01 17:54:31 -06:00
Timur Kristóf
8e6bff4caa radv: Lower 64-bit VS inputs to 32-bit
In RADV, we already lower all 64-bit I/O to 32-bit,
except VS inputs. Most of the newer NIR passes that
deal with I/O do not support 64-bit I/O, so now it's
time for us to also lower 64-bit VS inputs to 32-bit.

No Fossil DB changes on Strix Halo (GFX11.5).

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33979>
2026-01-01 17:44:40 -06:00
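At the data level, lowering a 64-bit input to 32-bit as in 8e6bff4caa means carrying each value as two 32-bit halves and recombining them where the full value is needed. The sketch shows only that arithmetic; the names are hypothetical and the real change is a NIR lowering pass, not CPU code.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value carried as two 32-bit components. */
struct sketch_u32x2 {
   uint32_t lo;
   uint32_t hi;
};

/* Split a 64-bit value into its low and high 32-bit halves. */
static struct sketch_u32x2 sketch_split_u64(uint64_t v)
{
   struct sketch_u32x2 p = {(uint32_t)(v & 0xffffffffu), (uint32_t)(v >> 32)};
   return p;
}

/* Recombine the two halves into the original 64-bit value. */
static uint64_t sketch_pack_u64(struct sketch_u32x2 p)
{
   return ((uint64_t)p.hi << 32) | p.lo;
}
```

Splitting `0x1122334455667788` yields `lo = 0x55667788` and `hi = 0x11223344`, and packing them returns the original value.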
Samuel Pitoiset
b3c983b8dd amd,radv,radeonsi: add a new function to update windowed perf counters
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39065>
2025-12-24 07:20:01 +00:00
Samuel Pitoiset
47366527ce radv: fix capturing performance counters with SPM
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14333
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39065>
2025-12-24 07:20:01 +00:00
Samuel Pitoiset
e03461f3bd radv: change the default value of RADV_TRACE_CACHE_COUNTERS on < GFX10
To not print a warning about missing SPM by default on < GFX10.
Also move the function to radv_physical_device.c and make it non-static.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39065>
2025-12-24 07:20:01 +00:00
Timur Kristóf
450a6189de radv: Initialize transfer queue gang when needed
Initialize gang CS on unsupported transfer operations.

Add a wait when:
- SDMA needs to wait for previous transfer operations on ACE
- ACE needs to wait for previous transfer operations on SDMA

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39057>
2025-12-23 12:14:59 +00:00
Timur Kristóf
cc5190829f radv: Declare some gang submit functions in radv private header.
They will be called from the transfer copy functions.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39057>
2025-12-23 12:14:59 +00:00
Timur Kristóf
b1938901d0 radv: Use SDMA fence packet when flushing gang semaphores
Add back the SDMA fence packet to radv_flush_gang_semaphore.
This was regressed by 9666bd1245.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39057>
2025-12-23 12:14:59 +00:00
Timur Kristóf
d71a05dffa radv: Implement gang semaphores for transfer queues.
We need to use gang semaphores in the following two scenarios:

1. Leader to follower semaphore:
Increment the leader to follower semaphore when the leader wants
to block the follower: a transfer operation on ACE needs to wait
for a previous operation on SDMA.

2. Follower to leader semaphore:
Increment the follower to leader semaphore when the follower wants
to block the leader: a transfer operation on SDMA needs to wait
for a previous operation on ACE.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39057>
2025-12-23 12:14:58 +00:00
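The two scenarios in d71a05dffa can be modeled as a pair of counters that are bumped whenever consecutive transfer operations switch engines. This is a simplified sketch that assumes SDMA is the gang leader and ACE the follower; the struct, enum and function names are invented here and do not match RADV's implementation.

```c
#include <assert.h>
#include <stdint.h>

enum sketch_engine {
   SKETCH_ENGINE_NONE, /* no transfer operation recorded yet */
   SKETCH_ENGINE_SDMA, /* assumed gang leader */
   SKETCH_ENGINE_ACE,  /* assumed gang follower */
};

struct sketch_gang_sem {
   uint32_t leader_to_follower; /* bumped when ACE must wait for SDMA */
   uint32_t follower_to_leader; /* bumped when SDMA must wait for ACE */
   enum sketch_engine last_engine;
};

/* Record a transfer operation on the given engine and bump the semaphore
 * that makes it wait for the previous operation on the other engine. */
static void sketch_begin_transfer_op(struct sketch_gang_sem *sem,
                                     enum sketch_engine engine)
{
   assert(engine != SKETCH_ENGINE_NONE);

   if (engine == SKETCH_ENGINE_ACE && sem->last_engine == SKETCH_ENGINE_SDMA)
      sem->leader_to_follower++;
   else if (engine == SKETCH_ENGINE_SDMA && sem->last_engine == SKETCH_ENGINE_ACE)
      sem->follower_to_leader++;

   sem->last_engine = engine;
}
```

For the sequence SDMA, ACE, ACE, SDMA, each counter is incremented exactly once: back-to-back operations on the same engine need no cross-engine wait.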
Timur Kristóf
4d0975dc83 radv: Update comments for gang semaphores
Change the explanation to use "leader" and "follower" terminology.
Explain better how it is used with GFX/ACE and SDMA/ACE.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39057>
2025-12-23 12:14:58 +00:00
Timur Kristóf
65bf4e7dcd radv: Require gang submit and compute for transfer queues
RADV's transfer queue implementation will use compute for
the transfer operations that aren't supported by the SDMA,
so we'll need gang submissions for that.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39057>
2025-12-23 12:14:58 +00:00
Timur Kristóf
f481a5f887 radv: Add function to determine if SDMA supports an image.
The following are not supported by SDMA:
- Sparse images (aka. PRT) on older GPUs
- Multisampled images

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39057>
2025-12-23 12:14:58 +00:00
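The two exclusions listed in f481a5f887 translate into a short predicate. The sketch uses a pared-down image struct and a boolean `sdma_supports_sparse` flag as a stand-in for "older GPUs"; both are assumptions for illustration, since the commit message does not say which generations are affected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pared-down image description; real images carry far more state. */
struct sketch_image {
   uint32_t samples; /* 1 for single-sampled images */
   bool is_sparse;   /* sparse (PRT) image */
};

/* Return true when SDMA can copy to or from the image, following the two
 * exclusions from the commit message. sdma_supports_sparse is a
 * hypothetical per-GPU capability flag. */
static bool sketch_sdma_supports_image(const struct sketch_image *img,
                                       bool sdma_supports_sparse)
{
   assert(img->samples >= 1);

   if (img->samples > 1)
      return false; /* multisampled images are never supported */
   if (img->is_sparse && !sdma_supports_sparse)
      return false; /* sparse images only on GPUs whose SDMA handles them */
   return true;
}
```

A caller would fall back to the compute copy path whenever this returns false.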
Timur Kristóf
f55771a17d radv: Bypass L2 for gang semaphore BO with SDMA/ACE
When the "gang leader" is SDMA, we need to ensure that the
gang semaphores BO is coherent between SDMA and CP.
To achieve this, we need to bypass the L2 cache when either SDMA
or CP are connected to L2.

Suggested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39057>
2025-12-23 12:14:58 +00:00
Timur Kristóf
fc57fa4589 radv, radeonsi: Don't pass task ring info to mesh/task payload lowering
The pass now uses the ring descriptors to figure these out.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39032>
2025-12-22 15:17:59 +00:00
Samuel Pitoiset
044e7f6017 radv/nir: fix front_face opts for points/lines and unknown prim
Fixes new VKCTS coverage dEQP-VK.glsl.builtin_var.frontfacing.*.

Fixes: af375c6756 ("radv: Optimize fs builtins using static gfx state")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39041>
2025-12-22 07:59:30 +00:00
Daniel Schürmann
1e8d367537 amd: add and use ac_cu_info::has_vtx_format_alpha_adjust_bug
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701>
2025-12-22 07:34:48 +00:00
Daniel Schürmann
f7c4aa48a0 ac/gpu_info: add some more flags to ac_cu_info
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701>
2025-12-22 07:34:46 +00:00
Daniel Schürmann
f791e46c47 aco: add ac_cu_info to aco_compiler_options
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701>
2025-12-22 07:34:46 +00:00
Daniel Schürmann
553b431aca ac/gpu_info: move some CU information into separate struct ac_cu_info
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701>
2025-12-22 07:34:44 +00:00
Samuel Pitoiset
045b778ed6 radv: add the SQTT relocated shaders BO to the cmdbuf list
Found this while debugging another thing with amdgpu.debug_mask=0x1 (VM).

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39002>
2025-12-22 07:13:06 +00:00
Benjamin Cheng
fa8b0b6bbb radv/video: Enable write combine for decode
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39025>
2025-12-18 15:25:57 -05:00
Marek Olšák
3c5c96fedb radv: double pixel throughput in certain cases of PS without interpolated inputs
This reduces the number of initialized VGPRs by 1 when no barycentric
coordinates are used.

I have verified with zink that this indeed increases performance for
cases where sysvals like frag_coord and front_face are used without
interpolated PS inputs.

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38936>
2025-12-18 03:37:58 +00:00