Rhys Perry
69bc4efa37
aco/sched_ilp: improve scheduling with VMEM/DS->VALU WaW
...
This improves scheduling when one side of a divergent branch writes to a
VGPR using VMEM/DS and the other side writes it using VALU. At the merge
block, it will properly consider that the VGPR was written by a VMEM/DS.
fossil-db (navi31):
Totals from 1224 (1.53% of 79825) affected shaders:
Instrs: 5264815 -> 5267604 (+0.05%); split: -0.00%, +0.06%
CodeSize: 27406404 -> 27422132 (+0.06%); split: -0.00%, +0.06%
Latency: 48325204 -> 48293975 (-0.06%); split: -0.09%, +0.03%
InvThroughput: 8923880 -> 8919191 (-0.05%); split: -0.07%, +0.02%
fossil-db (navi21):
Totals from 1267 (1.59% of 79825) affected shaders:
Instrs: 4628583 -> 4629190 (+0.01%); split: -0.00%, +0.01%
CodeSize: 24974672 -> 24977188 (+0.01%); split: -0.00%, +0.01%
Latency: 45080476 -> 44998120 (-0.18%); split: -0.20%, +0.02%
InvThroughput: 12288202 -> 12269634 (-0.15%); split: -0.16%, +0.01%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38262 >
2026-02-16 19:39:43 +00:00
Rhys Perry
88b6b6db17
aco: only consider cost of memory loads at waitcnt
...
We don't run this code before waitcnt insertion, so this isn't necessary.
This change improves accuracy in these two situations, because the waitcnt
insertion pass is more aware of divergent control flow:
    v0 = valu
    if (divergent) {
        v0 = vmem
    } else {
        use(v0)
    }

    v0 = vmem
    if (divergent) {
        wait vmcnt(0)
    } else {
        wait vmcnt(0)
    }
    use(v0)
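A toy model of the first situation above, as one reading of why charging the
load cost only at the waitcnt is more accurate. This is not ACO code:
cost_at_use, cost_at_wait, the tuple encoding and VMEM_LATENCY are all
hypothetical, and the instruction list is a linearized view that ignores
control flow on purpose.

```python
VMEM_LATENCY = 100  # arbitrary illustrative latency, not a hardware figure


def cost_at_use(instrs):
    """Charge the load latency at the first read of a register that a
    linear scan believes is still being loaded."""
    pending = set()
    cost = 0
    for op, reg in instrs:
        if op == "vmem":
            pending.add(reg)
        elif op == "valu":
            pending.discard(reg)
        elif op == "use" and reg in pending:
            cost += VMEM_LATENCY
            pending.discard(reg)
        elif op == "wait":
            pending.clear()
    return cost


def cost_at_wait(instrs):
    """Charge the load latency only where a wait was actually inserted."""
    return sum(VMEM_LATENCY for op, _ in instrs if op == "wait")


# Linearized form of the first situation. No wait appears, because on the
# else path v0 still holds the VALU result.
situation_1 = [
    ("valu", "v0"),
    ("vmem", "v0"),  # then-side of the divergent branch
    ("use", "v0"),   # else-side of the divergent branch
]

print(cost_at_use(situation_1), cost_at_wait(situation_1))
```

In this model the use-based estimate charges a load latency that the else
path never pays, while the wait-based estimate charges nothing.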
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38262 >
2026-02-16 19:39:43 +00:00
Rhys Perry
6963c8dd80
radv,aco/gfx11: preserve s2 when NGG_WAVE_ID_EN=1
...
According to the ISA doc, this is needed for hang recovery.
This works by just avoiding putting temporaries in s0-3 unless they're
precolored there.
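A minimal sketch of the idea, assuming a trivial first-fit assignment with no
liveness tracking. assign_sgprs, RESERVED_SGPRS and the (name, fixed register)
encoding are hypothetical and only illustrate "skip s0-3 unless precolored";
they are not the ACO register allocator.

```python
RESERVED_SGPRS = frozenset(range(4))  # s0-s3


def assign_sgprs(temps, num_sgprs=16, preserve_low=True):
    """temps is a list of (name, fixed_reg); fixed_reg is None for ordinary
    temporaries and a register index for precolored ones."""
    # Precolored temporaries keep their register, even inside s0-s3.
    assignment = {name: reg for name, reg in temps if reg is not None}
    used = set(assignment.values())
    for name, reg in temps:
        if reg is not None:
            continue
        for candidate in range(num_sgprs):
            if candidate in used:
                continue
            if preserve_low and candidate in RESERVED_SGPRS:
                continue  # never place an ordinary temporary in s0-s3
            assignment[name] = candidate
            used.add(candidate)
            break
        else:
            raise RuntimeError(f"no free SGPR for {name}")
    return assignment


temps = [("wave_id", 2), ("a", None), ("b", None)]
print(assign_sgprs(temps))
print(assign_sgprs(temps, preserve_low=False))
```

With preserve_low set, the ordinary temporaries start at s4 and the precolored
value stays in s2; without it they would be placed in s0 and s1.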
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (radv)
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39720 >
2026-02-16 14:33:58 +00:00
Rhys Perry
f9c11a8e15
radv: add ngg_wave_id_en to radv_shader_info
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39720 >
2026-02-16 14:33:57 +00:00
Marek Olšák
61a96be494
nir/lower_non_uniform_access: add an option not to lower tex & image queries
...
AMD can do non-uniform queries. The RADV change will be in a separate commit.
NFC for drivers.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39743 >
2026-02-16 12:59:36 +00:00
Marek Olšák
a9df891bc6
nir: allow get_ssbo_size to return a 64-bit result
...
to match get_ubo_size, and to support HW where SSBOs can have a 64-bit size.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39743 >
2026-02-16 12:59:36 +00:00
Samuel Pitoiset
47841c1142
radv/meta: remove useless DCC decompressions for image<->buffer
...
DCC doesn't need to be decompressed when the formats are compatible with
each other; this removes basically all decompressions on GFX11-GFX11.5.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39888 >
2026-02-16 07:40:13 +00:00
Emma Anholt
db532eaf00
ci/radv: Enable WSI testing.
...
This gets us coverage of present_timing for KHR_display, which we don't
have on the older CTS used by the other drivers.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39701 >
2026-02-13 23:57:14 +00:00
Emma Anholt
c332ee5dd6
ci/radv: Add some flakes I hit while testing WSI.
...
I upgraded some clearly flaky groups of tests in zink to regexes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39701 >
2026-02-13 23:57:14 +00:00
Rhys Perry
b60bff0429
aco: consider 64-bit transcendental normal valu for s_delay_alu
...
https://github.com/llvm/llvm-project/pull/180940
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39851 >
2026-02-13 17:03:34 +00:00
Marek Olšák
9237ca7e46
ac/llvm: remove unused functions
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39638 >
2026-02-13 15:33:19 +00:00
Marek Olšák
d1e6a5c1c8
ac: lower load_num_workgroups in NIR
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39638 >
2026-02-13 15:33:19 +00:00
Marek Olšák
1e11e83d1c
ac/nir: add ac_nir_lower_intrinsics_to_args_options structure
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39638 >
2026-02-13 15:33:19 +00:00
Marek Olšák
a9e47751d2
ac: lower load_subgroup_id for ACO in NIR
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39638 >
2026-02-13 15:33:19 +00:00
Marek Olšák
0a9bdcac79
ac: lower load_workgroup_ids for ACO in NIR
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39638 >
2026-02-13 15:33:19 +00:00
Daniel Schürmann
97f095f6e0
aco/lower_branches: Add try_rotate_latch_block() optimization
...
This optimization looks for unconditional back-edges and aims
to rotate the loop in a way that the final block is emitted
before the loop header, essentially turning
    BB1:
      if (cond)
        goto BB3;
    BB2:
      <loop body>
      goto BB1;
    BB3:
      ...

into

      goto BB1;
    BB2:
      <loop body>
    BB1:
      if (!cond)
        goto BB2;
    BB3:
      ...
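To illustrate the effect of the rotation above, here is a toy interpreter
that counts executed branch instructions for both layouts. It is only a
sketch under stated assumptions: run(), the op names and the trip-count
model are hypothetical and unrelated to the ACO implementation.

```python
def run(program, trip_count):
    """Execute a branch-only program and count executed branch instructions.

    program is a list of (label, ops) in emission order. Supported ops:
      ("body",)           one execution of the loop body
      ("goto", target)    unconditional branch
      ("br_exit", target) branch taken once the loop is finished
      ("br_loop", target) branch taken while the loop continues
    """
    index = {label: i for i, (label, _) in enumerate(program)}
    block = 0
    iterations = 0
    branches = 0
    while block < len(program):
        next_block = block + 1  # fall through by default
        for op in program[block][1]:
            if op[0] == "body":
                iterations += 1
                continue
            branches += 1
            taken = (
                op[0] == "goto"
                or (op[0] == "br_exit" and iterations >= trip_count)
                or (op[0] == "br_loop" and iterations < trip_count)
            )
            if taken:
                next_block = index[op[1]]
                break
        block = next_block
    return iterations, branches


original = [
    ("BB1", [("br_exit", "BB3")]),
    ("BB2", [("body",), ("goto", "BB1")]),
    ("BB3", []),
]
rotated = [
    ("entry", [("goto", "BB1")]),
    ("BB2", [("body",)]),
    ("BB1", [("br_loop", "BB2")]),
    ("BB3", []),
]

print(run(original, 4), run(rotated, 4))
```

In this model the original layout executes 2N+1 branch instructions for N
iterations and the rotated layout executes N+2, at the price of one extra
branch when the loop body never runs.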
Totals from 4969 (5.89% of 84383) affected shaders: (Navi48)
Instrs: 15253038 -> 15254019 (+0.01%); split: -0.00%, +0.01%
CodeSize: 81225300 -> 81227696 (+0.00%); split: -0.02%, +0.02%
Latency: 320796283 -> 320693480 (-0.03%); split: -0.03%, +0.00%
InvThroughput: 51395922 -> 51376156 (-0.04%); split: -0.04%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519 >
2026-02-13 14:49:44 +00:00
Daniel Schürmann
ade5e300ab
aco/insert_delay_alu: handle loop latch block before loop body
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519 >
2026-02-13 14:49:44 +00:00
Daniel Schürmann
102aca9843
aco/assembler: emit block_kind_loop_latch before the loop header
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519 >
2026-02-13 14:49:44 +00:00
Daniel Schürmann
da1594f8bb
aco: introduce notion of block_kind_loop_latch
...
A block annotated with block_kind_loop_latch denotes the re-entry
point for a loop back-edge. It is emitted after the loop preheader
and (potentially) before the loop header.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519 >
2026-02-13 14:49:44 +00:00
Daniel Schürmann
9887ce6709
aco/print_asm: Sort block markers by block offset
...
We are going to emit blocks in a different order.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519 >
2026-02-13 14:49:44 +00:00
Daniel Schürmann
800a4957bb
aco/lower_branches: Consider branch target of nested conditional branches
...
Totals from 1470 (1.74% of 84383) affected shaders: (Navi48)
Instrs: 5128451 -> 5126842 (-0.03%)
CodeSize: 29359832 -> 29353656 (-0.02%); split: -0.02%, +0.00%
Latency: 41047203 -> 41040786 (-0.02%)
InvThroughput: 6040459 -> 6039619 (-0.01%); split: -0.01%, +0.00%
Branches: 146219 -> 144648 (-1.07%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519 >
2026-02-13 14:49:44 +00:00
Daniel Schürmann
fbf2083b8f
aco/isel: Don't emit ELSE side of divergent branches which jump
...
Totals from 50 (0.06% of 84383) affected shaders: (Navi48)
Instrs: 402490 -> 402444 (-0.01%); split: -0.01%, +0.00%
CodeSize: 2239024 -> 2238864 (-0.01%); split: -0.01%, +0.00%
SpillSGPRs: 1493 -> 1496 (+0.20%)
Latency: 5836785 -> 5836747 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 1120893 -> 1120909 (+0.00%); split: -0.00%, +0.00%
Copies: 46128 -> 46082 (-0.10%)
VALU: 222708 -> 222715 (+0.00%); split: -0.00%, +0.00%
SALU: 53039 -> 52993 (-0.09%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519 >
2026-02-13 14:49:44 +00:00
Daniel Schürmann
ba32219cf8
aco/isel: Don't emit ELSE side of uniform branches which jump
...
Totals from 4 (0.00% of 84383) affected shaders: (Navi48)
Instrs: 16473 -> 16468 (-0.03%)
CodeSize: 85276 -> 85300 (+0.03%)
SpillSGPRs: 175 -> 176 (+0.57%)
Latency: 267907 -> 267885 (-0.01%)
InvThroughput: 36302 -> 36298 (-0.01%)
Copies: 1353 -> 1345 (-0.59%)
VALU: 9025 -> 9029 (+0.04%)
SALU: 2635 -> 2627 (-0.30%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519 >
2026-02-13 14:49:44 +00:00
Daniel Schürmann
96a639918c
aco: don't emit p_logical_start / p_logical_end after divergent branches
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519 >
2026-02-13 14:49:44 +00:00
Daniel Schürmann
3743230252
aco/isel: Do IF-simplification if that didn't happen during NIR optimizations
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519 >
2026-02-13 14:49:43 +00:00
Daniel Schürmann
50b093ec90
aco/builder: Fix v_add_co_u32 carry-out to VCC if post_ra
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519 >
2026-02-13 14:49:43 +00:00
Eric Engestrom
fb1cb00a96
radv/ci: add vulkan fluster job on navi48
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39861 >
2026-02-13 13:48:03 +00:00
Samuel Pitoiset
1be4ffdff9
ac,radv,radeonsi: use correct swizzle/pitch for depth-only images with SDMA
...
This fixes new VKCTS coverage
dEQP-VK.api.copy_and_blit.core.use_after_copy.*.
is_stencil isn't set for RadeonSI because it doesn't do SDMA copies
with Z/S.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39800 >
2026-02-13 07:52:29 +01:00
Samuel Pitoiset
4ec3840184
radv/meta: move the barrier for depth/stencil compute resolves outside
...
This barrier is only needed for rendering resolves (i.e. not for
vkCmdResolveImage()). It's similar to color compute resolves now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39805 >
2026-02-12 20:17:20 +00:00
Samuel Pitoiset
3e41b04de9
radv/meta: optimize a barrier with depth/stencil compute resolves
...
The compute resolve doesn't use HTILE of the destination image, so the
potential HTILE clear can run in parallel.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39805 >
2026-02-12 20:17:20 +00:00
Samuel Pitoiset
85a3f7816d
radv/meta: add HTILE support to radv_fixup_resolve_dst_metadata()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39805 >
2026-02-12 20:17:20 +00:00
Samuel Pitoiset
6a454dabda
radv/meta: stop fixing up HTILE after a partial resolve using compute
...
The decompression pass already resets HTILE to its uncompressed state,
so this is just redundant.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39805 >
2026-02-12 20:17:19 +00:00
Samuel Pitoiset
c3cc6fd051
radv: cleanup barriers after a depth/stencil expand
...
Synchronizing in radv_expand_depth_stencil() is more robust.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39805 >
2026-02-12 20:17:19 +00:00
Samuel Pitoiset
7dd7731ac7
radv/meta: fix partial depth/stencil resolves with compute
...
HTILE must be decompressed for partial resolves when the hw doesn't
write the decompressed DWORD to HTILE. The driver must also
synchronize the depth/stencil expand if using graphics (the compute
path is already correctly synchronized in the helper).
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39805 >
2026-02-12 20:17:18 +00:00
David Rosca
5d4f977573
radv/video: Support UVD decode on hawaii and older
...
H264 requires extra allocation in DPB. Use helper function
to get the required size, same as we do for encode.
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39627 >
2026-02-12 15:38:27 +00:00
David Rosca
24c74f522c
ac/vcn_dec: Make the helper functions static
...
They are only used in ac_vcn_dec.c now.
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39627 >
2026-02-12 15:38:26 +00:00
David Rosca
7ad4f501fa
radv: Drop videoarraypath debug option
...
It's not really useful and only works for H264/5.
On AV1/VP9 it would cause a hang.
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39627 >
2026-02-12 15:38:26 +00:00
David Rosca
19a8b7121e
radv/video: Remove old VCN and UVD decode implementation
...
Only ac_video_dec is now used.
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39627 >
2026-02-12 15:38:26 +00:00
Benjamin Cheng
6aed906410
radv/video: Use ac_video_dec for decode
...
Supports VCN and UVD.
Co-authored-by: David Rosca <david.rosca@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39627 >
2026-02-12 15:38:26 +00:00
David Rosca
26979becec
radeonsi/video: Add video decoder using ac_video_dec
...
Supports VCN, VCN JPEG and UVD.
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39627 >
2026-02-12 15:38:26 +00:00
David Rosca
4d06fb9acd
ac: Add UVD ac_video_dec implementation
...
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39627 >
2026-02-12 15:38:26 +00:00
David Rosca
9608abb26b
ac: Add VCN JPEG ac_video_dec implementation
...
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39627 >
2026-02-12 15:38:26 +00:00
David Rosca
79af03556c
ac: Add VCN ac_video_dec implementation
...
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39627 >
2026-02-12 15:38:26 +00:00
David Rosca
b5028e84c8
ac: Add video decode interface
...
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39627 >
2026-02-12 15:38:25 +00:00
Samuel Pitoiset
02a2451e1f
radv: rename radv_image_use_dcc_image_stores()
...
To radv_image_compress_dcc_on_image_stores() because it seems more
informative.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39803 >
2026-02-12 15:18:26 +00:00
Samuel Pitoiset
d58080f787
radv/meta: add a function to fixup DCC metadata for compute resolves
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39803 >
2026-02-12 15:18:25 +00:00
Samuel Pitoiset
ed166804f6
radv/meta: remove an useless barrier when fixing up DCC for compute resolves
...
The resolve operation doesn't use DCC of the destination image, so the
clear can run in parallel.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39803 >
2026-02-12 15:18:25 +00:00
Samuel Pitoiset
a673c9e414
radv/meta: stop fixing up DCC after a partial resolve using compute
...
The decompression pass already resets DCC to its uncompressed state,
so this is just redundant.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39803 >
2026-02-12 15:18:25 +00:00
Konstantin Seurer
f574de2249
radv: Fix setting the viewport for depth stencil FS resolves
...
Fixes: 704fbbb ("radv/meta: rework depth/stencil resolves using graphics")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39836 >
2026-02-12 14:25:31 +00:00
Konstantin Seurer
bc86c5adae
radv: Stop saving descriptors before acceleration structure OPs
...
They only use compute+constants.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39836 >
2026-02-12 14:25:31 +00:00