Rhys Perry
0b0e124a73
aco: use lv1.resize() pattern
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39537 >
2026-01-28 16:46:30 +00:00
Rhys Perry
5f5032bb6a
aco: use lv1/lv2 instead of v1/v2.as_linear()
...
This is just a search+replace then clang-format.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39537 >
2026-01-28 16:46:30 +00:00
Rhys Perry
c98204c963
aco: add lv1/lv2 as alias for v1/v2.as_linear()
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39537 >
2026-01-28 16:46:29 +00:00
Samuel Pitoiset
50a3699552
radv: advertise VK_KHR_internally_synchronized_queues
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39489 >
2026-01-28 15:32:58 +00:00
David Rosca
2f0d18f6af
radv/video: Use coded size from session params instead of codedExtent
...
cef8eff74d ("radv/video: Override H265 SPS unaligned resolutions")
fixes the case where app specifies resolution with lower than required
alignment. But in case of higher alignment, the stream is still not
going to be correctly decodable.
Use size from session params to set the coded size, instead of using
codedExtent of input image.
Only use codedExtent to calculate padding.
Fixes dEQP-VK.video.encode.h265.quantization_map_delta*
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39529 >
2026-01-28 12:46:29 +00:00
Samuel Pitoiset
83fabf7d41
radv: rework app workarounds implemented using internal layers
...
Just override the needed entrypoints.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39549 >
2026-01-28 11:46:25 +00:00
Samuel Pitoiset
875b6ab951
radv/sqtt: reduce the number of timed cmdbufs
...
Use the same timed cmdbuf for post/pre GPU timestamps when possible.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39174 >
2026-01-28 11:11:24 +00:00
Samuel Pitoiset
4508518f8e
radv/sqtt: rework acquiring timed cmdbufs
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39174 >
2026-01-28 11:11:24 +00:00
Samuel Pitoiset
553179ab73
radv/sqtt: rework acquiring GPU timestamps
...
To acquire all GPU timestamp objects at the same time.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39174 >
2026-01-28 11:11:24 +00:00
Georg Lehmann
2d38da94d4
aco: allow v_cmpx with DPP
...
The wording in the RDNA3 ISA doc has since been clarified; v_cmpx with DPP
behaves exactly like one would expect:
FI controls whether the source value can be read from inactive lanes,
but inactive lanes always write a 0 bit. The same applies to v_cmp with DPP.
Foz-DB Navi48:
Totals from 987 (1.20% of 82405) affected shaders:
Instrs: 517003 -> 516445 (-0.11%); split: -0.11%, +0.00%
CodeSize: 2782688 -> 2780508 (-0.08%); split: -0.08%, +0.00%
Latency: 2059169 -> 2056327 (-0.14%); split: -0.14%, +0.00%
InvThroughput: 365374 -> 365328 (-0.01%); split: -0.03%, +0.01%
Copies: 64669 -> 65616 (+1.46%)
SALU: 70693 -> 70652 (-0.06%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39516 >
2026-01-27 20:42:51 +00:00
Georg Lehmann
1c1bd9d090
aco: only apply DPP with 3 or less uses
...
Creating many new DPP instructions increases code size and decreases throughput.
Foz-DB Navi48:
Totals from 2196 (2.67% of 82179) affected shaders:
MaxWaves: 59930 -> 59960 (+0.05%); split: +0.08%, -0.03%
Instrs: 3718514 -> 3718298 (-0.01%); split: -0.08%, +0.07%
CodeSize: 20593544 -> 20507660 (-0.42%); split: -0.43%, +0.02%
VGPRs: 135924 -> 135744 (-0.13%); split: -0.17%, +0.04%
Latency: 33174704 -> 33163001 (-0.04%); split: -0.07%, +0.04%
InvThroughput: 6500723 -> 6491382 (-0.14%); split: -0.15%, +0.01%
VClause: 72348 -> 72343 (-0.01%); split: -0.06%, +0.05%
SClause: 83160 -> 83165 (+0.01%); split: -0.03%, +0.04%
Copies: 286592 -> 285575 (-0.35%); split: -0.45%, +0.09%
Branches: 99970 -> 99971 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 103280 -> 103279 (-0.00%)
PreVGPRs: 95590 -> 95440 (-0.16%); split: -0.30%, +0.14%
VALU: 1931369 -> 1931725 (+0.02%); split: -0.08%, +0.09%
SALU: 637663 -> 636780 (-0.14%); split: -0.15%, +0.01%
VOPD: 65236 -> 65589 (+0.54%); split: +0.91%, -0.37%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39516 >
2026-01-27 20:42:51 +00:00
Georg Lehmann
bb6a3e2891
aco/optimizer: rework how dpp is applied
...
Using the common helpers means we can use VINTERP instead of DPP,
which has higher throughput and smaller CodeSize.
Foz-DB Navi48:
Totals from 986 (1.20% of 82405) affected shaders:
Instrs: 1985282 -> 1985545 (+0.01%); split: -0.01%, +0.02%
CodeSize: 11179700 -> 11151780 (-0.25%); split: -0.26%, +0.01%
Latency: 19899190 -> 19897694 (-0.01%); split: -0.01%, +0.01%
InvThroughput: 4110650 -> 4104911 (-0.14%)
VClause: 44143 -> 44139 (-0.01%); split: -0.03%, +0.02%
Copies: 164340 -> 164344 (+0.00%); split: -0.02%, +0.02%
VALU: 1061904 -> 1061908 (+0.00%); split: -0.00%, +0.00%
SALU: 305980 -> 305974 (-0.00%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39516 >
2026-01-27 20:42:51 +00:00
Georg Lehmann
228cb29dae
aco/optimizer: allow DPP with scalar src1 in alu_opt_info_is_valid
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39516 >
2026-01-27 20:42:51 +00:00
Georg Lehmann
d4c0318f48
aco: apply DPP with scalar src1 on gfx11.5+
...
Foz-DB Navi48:
Totals from 6261 (7.62% of 82179) affected shaders:
MaxWaves: 176284 -> 176236 (-0.03%); split: +0.01%, -0.03%
Instrs: 5850185 -> 5828451 (-0.37%); split: -0.41%, +0.04%
CodeSize: 31363324 -> 31419904 (+0.18%); split: -0.08%, +0.26%
VGPRs: 328284 -> 328200 (-0.03%); split: -0.07%, +0.05%
SpillSGPRs: 2268 -> 2256 (-0.53%)
Latency: 50235516 -> 50218816 (-0.03%); split: -0.06%, +0.03%
InvThroughput: 8256243 -> 8242036 (-0.17%); split: -0.22%, +0.05%
VClause: 81000 -> 80975 (-0.03%); split: -0.11%, +0.08%
SClause: 136376 -> 136387 (+0.01%); split: -0.11%, +0.11%
Copies: 414021 -> 417894 (+0.94%); split: -0.13%, +1.07%
Branches: 105301 -> 105298 (-0.00%); split: -0.00%, +0.00%
PreSGPRs: 291360 -> 291432 (+0.02%)
PreVGPRs: 238593 -> 238729 (+0.06%); split: -0.02%, +0.08%
VALU: 3425446 -> 3403463 (-0.64%); split: -0.65%, +0.01%
SALU: 815505 -> 819372 (+0.47%); split: -0.02%, +0.50%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39516 >
2026-01-27 20:42:51 +00:00
Georg Lehmann
3fe329b3d0
aco/ra: don't move sgpr into v_fmac_f32_dpp src0
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39516 >
2026-01-27 20:42:50 +00:00
Georg Lehmann
903d940fa9
aco: don't convert VOP3P to VOP3 when applying DPP
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39516 >
2026-01-27 20:42:50 +00:00
Georg Lehmann
8ac7b9fc37
aco: undo operand swap if applying DPP fails
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39516 >
2026-01-27 20:42:50 +00:00
Georg Lehmann
531228159f
aco/validate: allow dpp with scalar src1 on gfx11.5+
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39516 >
2026-01-27 20:42:50 +00:00
Georg Lehmann
140ca3bb50
aco: disable DPP for rev integer subs and shifts
...
It is not documented anywhere, but at least on gfx12 and gfx10.3
DPP is applied to src1 instead of src0.
This might be useful for shifts, but to be safe just disable DPP
completely for now.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14739
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39516 >
2026-01-27 20:42:49 +00:00
Georg Lehmann
510dbbae7f
aco/optimizer: use opcode_supports_dpp
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39516 >
2026-01-27 20:42:49 +00:00
Georg Lehmann
8e99bf5380
aco: add a helper function for non supported DPP opcodes
...
Cc: mesa-stable
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39516 >
2026-01-27 20:42:49 +00:00
Georg Lehmann
4b1996b1c7
aco: fix demote in header of single iteration loop
...
The control is not divergent before a divergent break in a single iteration loop,
but we already pushed the loop mask on the stack.
Fixes: 90faadae72 ("aco/insert_exec_mask: don't disable dead quads on demote in divergent CF")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14733
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39528 >
2026-01-27 17:39:05 +00:00
Samuel Pitoiset
5709644f2c
radv: optimize barriers when clearing HiZ on GFX12
...
HiZ must only be cleared when the full HiZ workaround is enabled. This
means that the previous slow clear draw would disable HiZ because it
hits the conditions (i.e. depth/stencil enabled and depth writes enabled).
So, the draw and the dispatch can run in parallel by moving the barrier
earlier.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39433 >
2026-01-27 14:37:01 +00:00
Samuel Pitoiset
96829d6c5e
radv/meta: return the flush bits from radv_clear_hiz()
...
Similar to other functions.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39433 >
2026-01-27 14:37:01 +00:00
Samuel Pitoiset
5911ba5ff5
radv/meta: fix 3D color resolves with compute when base slice isn't zero
...
Needs to consider the base offset, otherwise it's resolving to the
first 3D slice.
Fixes very recent VKCTS coverage dEQP-VK.pipeline.*.multisample.m10_resolve.*.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39393 >
2026-01-27 14:14:19 +00:00
Hans-Kristian Arntzen
42f021fc29
radv: Enable EXT_present_timing.
...
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38770 >
2026-01-27 11:09:51 +00:00
Samuel Pitoiset
14d3fb5f1b
radv: add a workaround for a synchronization bug in Strange Brigade Vulkan
...
This game has broken synchronization reported by VVL and it indeed
doesn't wait for idle right before present. Work around this by
injecting a full barrier (easier than rewriting the dep struct).
This only applies to the Vulkan backend.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14705
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39480 >
2026-01-27 09:18:25 +00:00
Wang Ruitang
e11c04c0cc
amd/common/virtio: use device fd to init sync provider
...
Use the fd after dup instead of the one before dup, to avoid
drm_syncobj_find failing in the guest kernel when dev is found in
dev_list.
When dev is not found in dev_list, the duplicated device fd is used
to init the sync provider. When it is found, the same device fd
should be used. Otherwise, it causes inconsistency and failures such
as in the Android domU CTS test where the guest kernel attempts to
locate a syncobj. This occurs because vdrm_device_connect and the
VIRTGPU_EXECBUFFER ioctl use the fd after dup while
util_sync_provider_drm uses the one before dup.
The fix has been validated with the CtsSdkSandboxWebkitTestCases in
Android domU, and the previously failing test cases no longer fail.
Signed-off-by: Ruitang.Wang@amd.com
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39520 >
2026-01-27 08:24:35 +00:00
David Rosca
62f07b8c63
radeonsi/vcn: Add low latency decode debug option
...
Similar to the low latency option for encode, this reduces latency
of decoding at the cost of increased power usage.
Can be enabled with AMD_DEBUG=lowlatencydec
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39450 >
2026-01-26 15:00:06 +00:00
Benjamin Cheng
c10ebb0fda
radv/video: Use a more reliable way of computing tile sizes
...
Some apps (old FFmpeg, contemporary CTS) send down pMi{Col,Row}Starts in
SB units, not MI units. Instead of depending on those values, which
could be unreliable, derive the tile sizes in SB using other parameters.
Cc: mesa-stable
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39492 >
2026-01-26 14:41:20 +00:00
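
To illustrate the idea of deriving tile layout in superblock (SB) units without reading the MI start arrays, here is a minimal sketch. It assumes the "other parameters" are per-tile sizes stored as "size in SBs minus 1" (similar in spirit to the `pWidthInSbsMinus1` / `pHeightInSbsMinus1` arrays of the Vulkan AV1 tile info); the commit does not say which parameters RADV actually uses, and `tile_starts_in_sb` is a hypothetical name, not a RADV function.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not RADV code): compute tile start positions in SB
// units from per-tile sizes given as "size in SBs minus 1". The returned
// vector has one extra trailing entry marking the end of the last tile.
std::vector<uint32_t> tile_starts_in_sb(const std::vector<uint16_t>& size_in_sbs_minus_1)
{
   std::vector<uint32_t> starts;
   uint32_t pos = 0;
   for (uint16_t s : size_in_sbs_minus_1) {
      starts.push_back(pos);   // this tile begins where the previous one ended
      pos += uint32_t(s) + 1;  // undo the "minus 1" encoding
   }
   starts.push_back(pos);      // end of the last tile
   return starts;
}
```

Because the result is built only from sizes that are already in SB units, it does not matter whether the application filled the MI start arrays in MI or SB units.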
Georg Lehmann
809fb0fba3
ac/nir/lower_ps_late: emit scalar f2f16_rtz for when one half of a packed export is undef
...
Foz-DB Navi48:
Totals from 7200 (8.74% of 82405) affected shaders:
Instrs: 9056391 -> 9048177 (-0.09%); split: -0.09%, +0.00%
CodeSize: 48681288 -> 48640684 (-0.08%); split: -0.09%, +0.00%
VGPRs: 413088 -> 413784 (+0.17%)
Latency: 76340711 -> 76320080 (-0.03%); split: -0.03%, +0.00%
InvThroughput: 12692959 -> 12684618 (-0.07%); split: -0.07%, +0.00%
VClause: 148823 -> 148821 (-0.00%)
Copies: 601739 -> 601874 (+0.02%); split: -0.01%, +0.03%
VALU: 5213356 -> 5207253 (-0.12%); split: -0.12%, +0.00%
SALU: 1160815 -> 1160817 (+0.00%); split: -0.00%, +0.00%
VOPD: 79520 -> 79444 (-0.10%); split: +0.09%, -0.18%
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39412 >
2026-01-26 10:54:23 +00:00
Georg Lehmann
8c895c5c61
ac/nir/lower_ps_late: CSE partial packed exports
...
Foz-DB Navi48:
Totals from 425 (0.52% of 82405) affected shaders:
Instrs: 1110029 -> 1109658 (-0.03%); split: -0.03%, +0.00%
CodeSize: 6135272 -> 6133848 (-0.02%); split: -0.02%, +0.00%
VGPRs: 29856 -> 29844 (-0.04%)
Latency: 10258411 -> 10258043 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 1898177 -> 1897661 (-0.03%)
Copies: 88221 -> 88173 (-0.05%)
VALU: 575276 -> 574894 (-0.07%)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39412 >
2026-01-26 10:54:22 +00:00
Georg Lehmann
e74323577f
aco/optimizer: optimize pack(undef, f2f16_rtz(a)) for salu
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39412 >
2026-01-26 10:54:22 +00:00
Georg Lehmann
6cbd16daae
aco/optimizer: optimize pack(undef, f2f16_rtz(a)) for gfx8+
...
Do this late because the v_cvt_pkrtz_f16_f32 can be applied to
its operand.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39412 >
2026-01-26 10:54:22 +00:00
Georg Lehmann
57ca974d1d
aco/optimizer: optimize pack(undef, f2f16_rtz(a)) for gfx6/7
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39412 >
2026-01-26 10:54:21 +00:00
Georg Lehmann
ba73792de0
aco/optimizer: fix parsing salu p_insert as shift
...
Fixes: 88f7e3fff3 ("aco/optimizer: parse pseudo alu instructions")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39412 >
2026-01-26 10:54:21 +00:00
Georg Lehmann
830d6de9ff
aco/isel: optimize pack_32_2x16_split(undef, const)
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39412 >
2026-01-26 10:54:20 +00:00
Rhys Perry
928ecfc6c0
radv: fix RADV_DEBUG=shaderstats with RT pipelines
...
radv_dump_shader_stats() printed stats for every shader with a certain
stage, and we called this function each time an RT shader is compiled.
This means we could repeat the stats for a shader.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39484 >
2026-01-26 09:26:14 +00:00
Rhys Perry
e59a0df302
aco/insert_fp_mode: remove incorrect assertion
...
This can happen if a loop has no continues, and the later code should work
fine in this situation.
This fixes war_thunder/0013a69e097b2471 on navi21.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Fixes: 6b9d28ab9b ("aco/insert_fp_mode: insert fp mode in reverse")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39481 >
2026-01-26 08:57:33 +00:00
Samuel Pitoiset
c91ed27582
radv: use the SQTT enable bit for PKT3_DISPATCH_TASKMESH_INDIRECT_MULTI_ACE
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39425 >
2026-01-26 08:10:53 +00:00
Samuel Pitoiset
e272c8062d
radv: use the SQTT enable bit for PKT3_DISPATCH_MESH_INDIRECT_MULTI
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39425 >
2026-01-26 08:10:53 +00:00
Samuel Pitoiset
c7da19e2bf
radv: use the SQTT enable bit for PKT3_DRAW_{INDEX}_INDIRECT_MULTI
...
This reports more info in RGP.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39425 >
2026-01-26 08:10:52 +00:00
Samuel Pitoiset
e5982496f6
radv: move emitting SQTT markers closer to the draw/dispatch packets
...
Some packets already include a SQTT enable bit.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39425 >
2026-01-26 08:10:52 +00:00
Georg Lehmann
5827de9cd6
aco/gfx12: use 64bit add/sub to swap sgprs
...
Not writing SCC requires fewer instructions and gives the scheduler more
freedom.
Foz-DB GFX1201:
Totals from 114 (0.14% of 82179) affected shaders:
Instrs: 276265 -> 275791 (-0.17%)
CodeSize: 1460504 -> 1458504 (-0.14%)
Latency: 902933 -> 902548 (-0.04%); split: -0.04%, +0.00%
InvThroughput: 166517 -> 166512 (-0.00%)
SClause: 6703 -> 6698 (-0.07%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39329 >
2026-01-23 10:13:19 +00:00
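
The arithmetic behind an add/sub swap can be sketched as follows. This only shows the identity on plain 64-bit integers with wrapping arithmetic; how ACO selects and emits the actual instructions is not shown in the commit message, and `swap_add_sub` is a hypothetical name used for illustration.

```cpp
#include <cstdint>

// Swap two 64-bit values using only wrapping add/sub and no temporary.
// Unsigned overflow wraps modulo 2^64, so the identity holds for all inputs.
// Precondition: a and b must refer to distinct objects.
inline void swap_add_sub(uint64_t& a, uint64_t& b)
{
   a = a + b; // a now holds the (wrapped) sum
   b = a - b; // sum - original b = original a
   a = a - b; // sum - original a = original b
}
```

The sequence needs no scratch register, and none of its steps depends on a carry or condition flag, which is consistent with the commit's point about not writing SCC.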
Georg Lehmann
763b4f1f0a
radv/gfx11: add a RADV_PERFTEST flag to expose bfloat16 cmat
...
This doesn't pass CTS because of precision issues, but might still be useful.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14699
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39456 >
2026-01-23 09:41:20 +00:00
Marek Olšák
ebeb904c95
ac,radeonsi: set optimal COMPUTE_DISPATCH_INTERLEAVE for buffer clears/copies
...
Small buffer clears are a bit faster now.
The numbers were tuned specifically for this compute shader.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39290 >
2026-01-22 22:28:39 +00:00
Marek Olšák
a5e1d31dad
ac/nir/meta: tune 12B clear buffer performance for gfx12
...
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39290 >
2026-01-22 22:28:39 +00:00
Marek Olšák
9257cf04a1
ac/nir/meta: tune image clear & copy performance for gfx12
...
Compute shaders are the fastest for all copies and some clears.
Note that this is a very different compute shader than the one in RADV.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39290 >
2026-01-22 22:28:38 +00:00
Natalie Vock
15328a5ef3
aco: Fix parameter stack size calculation
...
This only accounted for 1/32 (or 1/64) of the actual parameter size. In
some cases this meant that some threads were smashing other threads'
stacks.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39455 >
2026-01-22 22:02:31 +00:00
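
A rough sketch of the size relationship described above, assuming each lane of a wave stores its own copy of the parameters so the total is the per-lane size times the wave size. `param_stack_bytes` is a hypothetical helper, not the function changed by this commit.

```cpp
#include <cstdint>

// Hypothetical helper (not ACO code): total stack bytes needed for call
// parameters when every lane in the wave keeps its own copy.
// Assumption: wave_size is 32 or 64, matching the 1/32 and 1/64 factors
// mentioned in the commit message.
constexpr uint32_t param_stack_bytes(uint32_t bytes_per_lane, uint32_t wave_size)
{
   return bytes_per_lane * wave_size;
}
```

Reserving only `bytes_per_lane` instead of this product would under-allocate by exactly the wave size, which matches the overlap between threads' stacks that the commit describes.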
jaap aarts
8f7941f92d
radv/sqtt: Prevent concurrent submit when sqtt is enabled
...
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39090 >
2026-01-21 18:55:56 +00:00