Commit graph

19947 commits

Author SHA1 Message Date
Rhys Perry
69bc4efa37 aco/sched_ilp: improve scheduling with VMEM/DS->VALU WaW
This improves scheduling with one side of a divergent branch writing to a
VGPR using VMEM/DS, and the other writing using VALU. At the merge block,
it will properly consider that the VGPR was written by a VMEM/DS.

fossil-db (navi31):
Totals from 1224 (1.53% of 79825) affected shaders:
Instrs: 5264815 -> 5267604 (+0.05%); split: -0.00%, +0.06%
CodeSize: 27406404 -> 27422132 (+0.06%); split: -0.00%, +0.06%
Latency: 48325204 -> 48293975 (-0.06%); split: -0.09%, +0.03%
InvThroughput: 8923880 -> 8919191 (-0.05%); split: -0.07%, +0.02%

fossil-db (navi21):
Totals from 1267 (1.59% of 79825) affected shaders:
Instrs: 4628583 -> 4629190 (+0.01%); split: -0.00%, +0.01%
CodeSize: 24974672 -> 24977188 (+0.01%); split: -0.00%, +0.01%
Latency: 45080476 -> 44998120 (-0.18%); split: -0.20%, +0.02%
InvThroughput: 12288202 -> 12269634 (-0.15%); split: -0.16%, +0.01%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38262>
2026-02-16 19:39:43 +00:00
Rhys Perry
88b6b6db17 aco: only consider cost of memory loads at waitcnt
We don't run this code before waitcnt insertion, so this isn't necessary.

This change improves accuracy in these two situations, because the waitcnt
insertion pass is more aware of divergent control flow:

v0 = valu
if (divergent) {
    v0 = vmem
} else {
    use(v0)
}

v0 = vmem
if (divergent) {
    wait vmcnt(0)
} else {
    wait vmcnt(0)
}
use(v0)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38262>
2026-02-16 19:39:43 +00:00
Rhys Perry
6963c8dd80 radv,aco/gfx11: preserve s2 when NGG_WAVE_ID_EN=1
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
According to the ISA doc, this is needed for hang recovery.

This works by just avoiding putting temporaries in s0-3 unless they're
precolored there.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (radv)
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39720>
2026-02-16 14:33:58 +00:00
Rhys Perry
f9c11a8e15 radv: add ngg_wave_id_en to radv_shader_info
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39720>
2026-02-16 14:33:57 +00:00
Marek Olšák
61a96be494 nir/lower_non_uniform_access: add an option not to lower tex & image queries
AMD can do non-uniform queries. The RADV change will be in a separate commit.

NFC for drivers.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39743>
2026-02-16 12:59:36 +00:00
Marek Olšák
a9df891bc6 nir: allow get_ssbo_size to return a 64-bit result
to match get_ubo_size, and to support HW where SSBOs can have a 64-bit size.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39743>
2026-02-16 12:59:36 +00:00
Samuel Pitoiset
47841c1142 radv/meta: remove useless DCC decompressions for image<->buffer
It's not needed to decompress DCC when formats are compatible each
other, this basically removes all decompressions on GFX11-GFX11.5.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39888>
2026-02-16 07:40:13 +00:00
Emma Anholt
db532eaf00 ci/radv: Enable WSI testing.
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This gets us coverage of present_timing for KHR_display, which we don't
have on the older CTS used by the other drivers.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39701>
2026-02-13 23:57:14 +00:00
Emma Anholt
c332ee5dd6 ci/radv: Add some flakes I hit while testing WSI.
I upgraded some clearly flaky groups of tests in zink to regexes.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39701>
2026-02-13 23:57:14 +00:00
Rhys Perry
b60bff0429 aco: consider 64-bit transcendental normal valu for s_delay_alu
https://github.com/llvm/llvm-project/pull/180940

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39851>
2026-02-13 17:03:34 +00:00
Marek Olšák
9237ca7e46 ac/llvm: remove unused functions
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39638>
2026-02-13 15:33:19 +00:00
Marek Olšák
d1e6a5c1c8 ac: lower load_num_workgroups in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39638>
2026-02-13 15:33:19 +00:00
Marek Olšák
1e11e83d1c ac/nir: add ac_nir_lower_intrinsics_to_args_options structure
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39638>
2026-02-13 15:33:19 +00:00
Marek Olšák
a9e47751d2 ac: lower load_subgroup_id for ACO in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39638>
2026-02-13 15:33:19 +00:00
Marek Olšák
0a9bdcac79 ac: lower load_workgroup_ids for ACO in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39638>
2026-02-13 15:33:19 +00:00
Daniel Schürmann
97f095f6e0 aco/lower_branches: Add try_rotate_latch_block() optimization
This optimization looks for unconditional back-edges and aims
to rotate the loop in a way that the final block is emitted
before the loop header, essentially turning

BB1:
  if ()
    goto BB3;
BB2:
  <loop body>
  goto BB1;
BB3:
  ...

into

  goto BB1;
BB2:
  <loop body>
BB1:
  if(!cond)
    goto BB2;
BB3:
  ...

Totals from 4969 (5.89% of 84383) affected shaders: (Navi48)

Instrs: 15253038 -> 15254019 (+0.01%); split: -0.00%, +0.01%
CodeSize: 81225300 -> 81227696 (+0.00%); split: -0.02%, +0.02%
Latency: 320796283 -> 320693480 (-0.03%); split: -0.03%, +0.00%
InvThroughput: 51395922 -> 51376156 (-0.04%); split: -0.04%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519>
2026-02-13 14:49:44 +00:00
Daniel Schürmann
ade5e300ab aco/insert_delay_alu: handle loop latch block before loop body
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519>
2026-02-13 14:49:44 +00:00
Daniel Schürmann
102aca9843 aco/assembler: emit block_kind_loop_latch before the loop header
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519>
2026-02-13 14:49:44 +00:00
Daniel Schürmann
da1594f8bb aco: introduce notion of block_kind_loop_latch
A block annotated with block_kind_loop_latch denotes a block
the re-entry point for a loop back-edge. It is emitted after
the loop preheader and (potentially) before the loop header.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519>
2026-02-13 14:49:44 +00:00
Daniel Schürmann
9887ce6709 aco/print_asm: Sort block markers by block offset
We are going to emit blocks in a different order.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519>
2026-02-13 14:49:44 +00:00
Daniel Schürmann
800a4957bb aco/lower_branches: Consider branch target of nested conditional branches
Totals from 1470 (1.74% of 84383) affected shaders: (Navi48)

Instrs: 5128451 -> 5126842 (-0.03%)
CodeSize: 29359832 -> 29353656 (-0.02%); split: -0.02%, +0.00%
Latency: 41047203 -> 41040786 (-0.02%)
InvThroughput: 6040459 -> 6039619 (-0.01%); split: -0.01%, +0.00%
Branches: 146219 -> 144648 (-1.07%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519>
2026-02-13 14:49:44 +00:00
Daniel Schürmann
fbf2083b8f aco/isel: Don't emit ELSE side of divergent branches which jump
Totals from 50 (0.06% of 84383) affected shaders: (Navi48)

Instrs: 402490 -> 402444 (-0.01%); split: -0.01%, +0.00%
CodeSize: 2239024 -> 2238864 (-0.01%); split: -0.01%, +0.00%
SpillSGPRs: 1493 -> 1496 (+0.20%)
Latency: 5836785 -> 5836747 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 1120893 -> 1120909 (+0.00%); split: -0.00%, +0.00%
Copies: 46128 -> 46082 (-0.10%)
VALU: 222708 -> 222715 (+0.00%); split: -0.00%, +0.00%
SALU: 53039 -> 52993 (-0.09%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519>
2026-02-13 14:49:44 +00:00
Daniel Schürmann
ba32219cf8 aco/isel: Don't emit ELSE side of uniform branches which jump
Totals from 4 (0.00% of 84383) affected shaders: (Navi48)

Instrs: 16473 -> 16468 (-0.03%)
CodeSize: 85276 -> 85300 (+0.03%)
SpillSGPRs: 175 -> 176 (+0.57%)
Latency: 267907 -> 267885 (-0.01%)
InvThroughput: 36302 -> 36298 (-0.01%)
Copies: 1353 -> 1345 (-0.59%)
VALU: 9025 -> 9029 (+0.04%)
SALU: 2635 -> 2627 (-0.30%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519>
2026-02-13 14:49:44 +00:00
Daniel Schürmann
96a639918c aco: don't emit p_logical_start / p_logical_end after divergent branches
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519>
2026-02-13 14:49:44 +00:00
Daniel Schürmann
3743230252 aco/isel: Do IF-simplification if that didn't happen during NIR optimizations
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519>
2026-02-13 14:49:43 +00:00
Daniel Schürmann
50b093ec90 aco/builder: Fix v_add_co_u32 carry-out to VCC if post_ra
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519>
2026-02-13 14:49:43 +00:00
Eric Engestrom
fb1cb00a96 radv/ci: add vulkan fluster job on navi48
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39861>
2026-02-13 13:48:03 +00:00
Samuel Pitoiset
1be4ffdff9 ac,radv,radeonsi: use correct swizzle/pitch for depth-only images with SDMA
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This fixes new VKCTS coverage
dEQP-VK.api.copy_and_blit.core.use_after_copy.*.

is_stencil isn't set for RadeonSI because it doesn't do SDMA copies
with Z/S.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39800>
2026-02-13 07:52:29 +01:00
Samuel Pitoiset
4ec3840184 radv/meta: move the barrier for depth/stencil compute resolves outside
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This barrier is only needed for rendering resolves (ie. not for
vkCmdResolveImage()). It's similar to color compute resolves now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39805>
2026-02-12 20:17:20 +00:00
Samuel Pitoiset
3e41b04de9 radv/meta: optimize a barrier with depth/stencil compute resolves
The compute resolve doesn't use HTILE of the destination image, so the
potential HTILE clear can run in parallel.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39805>
2026-02-12 20:17:20 +00:00
Samuel Pitoiset
85a3f7816d radv/meta: add HTILE support to radv_fixup_resolve_dst_metadata()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39805>
2026-02-12 20:17:20 +00:00
Samuel Pitoiset
6a454dabda radv/meta: stop fixing up HTILE after a partial resolve using compute
The decompression pass already resets HTILE to its uncompressed state,
so this is just redundant.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39805>
2026-02-12 20:17:19 +00:00
Samuel Pitoiset
c3cc6fd051 radv: cleanup barriers after a depth/stencil expand
Synchronize in radv_expand_depth_stencil() is more robust.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39805>
2026-02-12 20:17:19 +00:00
Samuel Pitoiset
7dd7731ac7 radv/meta: fix partial depth/stencil resolves with compute
HTILE must be decompressed for partial resolves when the hw doesn't
write the decompressed DWORD to HTILE. The driver must also
synchronize the depth/stencil expand if using graphics (the compute
path is already correctly synchronized in the helper).

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39805>
2026-02-12 20:17:18 +00:00
David Rosca
5d4f977573 radv/video: Support UVD decode on hawaii and older
H264 requires extra allocation in DPB. Use helper function
to get the required size, same as we do for encode.

Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39627>
2026-02-12 15:38:27 +00:00
David Rosca
24c74f522c ac/vcn_dec: Make the helper functions static
They are only used in ac_vcn_dec.c now.

Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39627>
2026-02-12 15:38:26 +00:00
David Rosca
7ad4f501fa radv: Drop videoarraypath debug option
It's not really usefull and only works for H264/5.
On AV1/VP9 it would cause hang.

Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39627>
2026-02-12 15:38:26 +00:00
David Rosca
19a8b7121e radv/video: Remove old VCN and UVD decode implementation
Only ac_video_dec is now used.

Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39627>
2026-02-12 15:38:26 +00:00
Benjamin Cheng
6aed906410 radv/video: Use ac_video_dec for decode
Supports VCN and UVD.

Co-authored-by: David Rosca <david.rosca@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39627>
2026-02-12 15:38:26 +00:00
David Rosca
26979becec radeonsi/video: Add video decoder using ac_video_dec
Supports VCN, VCN JPEG and UVD.

Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39627>
2026-02-12 15:38:26 +00:00
David Rosca
4d06fb9acd ac: Add UVD ac_video_dec implementation
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39627>
2026-02-12 15:38:26 +00:00
David Rosca
9608abb26b ac: Add VCN JPEG ac_video_dec implementation
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39627>
2026-02-12 15:38:26 +00:00
David Rosca
79af03556c ac: Add VCN ac_video_dec implementation
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39627>
2026-02-12 15:38:26 +00:00
David Rosca
b5028e84c8 ac: Add video decode interface
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39627>
2026-02-12 15:38:25 +00:00
Samuel Pitoiset
02a2451e1f radv: rename radv_image_use_dcc_image_stores()
To radv_image_compress_dcc_on_image_stores() because it seems more
informative.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39803>
2026-02-12 15:18:26 +00:00
Samuel Pitoiset
d58080f787 radv/meta: add a function to fixup DCC metadata for compute resolves
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39803>
2026-02-12 15:18:25 +00:00
Samuel Pitoiset
ed166804f6 radv/meta: remove an useless barrier when fixing up DCC for compute resolves
The resolve operation doesn't use DCC of the destination image, so the
clear can run in parallel.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39803>
2026-02-12 15:18:25 +00:00
Samuel Pitoiset
a673c9e414 radv/meta: stop fixing up DCC after a partial resolve using compute
The decompression pass already resets DCC to its uncompressed state,
so this is just redundant.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39803>
2026-02-12 15:18:25 +00:00
Konstantin Seurer
f574de2249 radv: Fix setting the viewport for depth stencil FS resolves
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Fixes: 704fbbb ("radv/meta: rework depth/stencil resolves using graphics")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39836>
2026-02-12 14:25:31 +00:00
Konstantin Seurer
bc86c5adae radv: Stop saving descriptors before acceleration structure OPs
They only use compute+constants.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39836>
2026-02-12 14:25:31 +00:00