Commit graph

19643 commits

Author SHA1 Message Date
Samuel Pitoiset
fb43d7bff2 ac/perfcounter: re-order GPU perf blocks on GFX12
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39199>
2026-01-12 08:10:31 +00:00
Samuel Pitoiset
3b6ff80d48 ac/perfcounter: define more GPU blocks on GFX12
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39199>
2026-01-12 08:10:31 +00:00
Samuel Pitoiset
eb37d6ceb7 ac/perfcounter: fix computing number of 16-bit/32-bit SPM counters
Determine them only when both are explicitly 0.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39199>
2026-01-12 08:10:30 +00:00
Samuel Pitoiset
d1efdc7e76 ac/perfcounter: fix number of 32-bit SPM counters
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39199>
2026-01-12 08:10:29 +00:00
Samuel Pitoiset
9fe57d3882 ac/spm: define new per-shader engine blocks
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39199>
2026-01-12 08:10:29 +00:00
Samuel Pitoiset
60fac38491 ac/spm: fix typo in one GPU perf block name
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39199>
2026-01-12 08:10:29 +00:00
Samuel Pitoiset
db02077c8a radv: remove extra instructions after UNREACHABLE
Minor cleanups.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39237>
2026-01-12 07:41:08 +00:00
Samuel Pitoiset
e1e2517664 radv: use UNREACHABLE for illegal texture filter
Found this with a broken CTS test, way easier to crash for isolating
the test case.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39237>
2026-01-12 07:41:08 +00:00
Samuel Pitoiset
91e0f8f1e5 radv/rt: fix a compilation warning about uninitialized fields
Just zero-initialize the layout struct to fix the following warning
because radv_use_bvh8() might return FALSE.

../src/amd/vulkan/radv_acceleration_structure.c: In function ‘radv_update_as_gfx12’:
../src/amd/vulkan/radv_acceleration_structure.c:873:70: warning: ‘layout.bounds_offsets’ may be used uninitialized [-Wmaybe-uninitialized]
  873 |       .bounds = state->build_info->scratchData.deviceAddress + layout.bounds_offsets,
      |                                                                ~~~~~~^~~~~~~~~~~~~~~
../src/amd/vulkan/radv_acceleration_structure.c:866:33: note: ‘layout.bounds_offsets’ was declared here
  866 |    struct update_scratch_layout layout;
      |                                 ^~~~~~

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39228>
2026-01-12 07:18:50 +00:00
Konstantin Seurer
077292f65b radv/bvh: Use box16 nodes when bvh8 is not used
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Using box16 nodes trades bvh quality for memory bandwidth which seems to
be roughly equal in performance.

Stats assuming box16 nodes are as expensive as box32 nodes:
Totals from 7668 (79.68% of 9624) affected BVHs:
compacted_size: 951666944 -> 742347648 (-22.00%)
max_depth: 57606 -> 57615 (+0.02%)
sah: 129114796242 -> 129998517775 (+0.68%); split: -0.00%, +0.68%
scene_sah: 188564162 -> 192063633 (+1.86%); split: -0.02%, +1.88%
box16_node_count: 0 -> 3270600 (+inf%)
box32_node_count: 3365707 -> 95100 (-97.17%)

Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37883>
2026-01-10 11:36:28 +01:00
Konstantin Seurer
543a88af99 radv/bvh: Add radv_aabb16 and use it for box16 nodes
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37883>
2026-01-10 11:36:19 +01:00
Konstantin Seurer
fefdad9249 radv/rra: Count box16 nodes properly
Otherwise rra won't allocate memory when loading the capture.

Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37883>
2026-01-10 11:34:18 +01:00
Konstantin Seurer
39d58a55a7 aco: Add support to f2f16 with rtpi/rtni
Those rounding modes are needed when computing 16-bit bounding boxes
since the bounding box must not get smaller.

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37883>
2026-01-10 11:34:12 +01:00
Alyssa Rosenzweig
235e868ef7 ac/nir: use nir_is_shared_access
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39219>
2026-01-09 20:51:13 +00:00
Benjamin Cheng
499d9e2e98 radv/video: Allow aliasing of video images
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39109>
2026-01-09 13:52:56 +00:00
Georg Lehmann
6d07a56c6a ac/nir/lower_ps_late: preserve signed zero, inf, nan for exports
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39187>
2026-01-09 11:58:52 +00:00
Georg Lehmann
84ecac58a6 ac/nir/opt_pack_half: preserve fp_math_ctrl
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39187>
2026-01-09 11:58:52 +00:00
Georg Lehmann
5241343ccb ac/nir/lower_sin_cos: preserve fp_math_ctrl
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39187>
2026-01-09 11:58:52 +00:00
Georg Lehmann
9331726157 ac/nir/lower_sin_cos: use nir_shader_alu_pass
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39187>
2026-01-09 11:58:52 +00:00
Samuel Pitoiset
4fa20bacac radv/ci: document a regression with transfer queue on RENOIR
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Weird that only RENOIR fails given that ASTC/ETC2 aren't natively
supported too.

Needs to be investigated but SDMA supports these formats to some
extent it seems.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39230>
2026-01-09 10:47:31 +00:00
Samuel Pitoiset
edb730f647 radv: fix flushing gang semaphore with SDMA/ACE
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
If the main CS is SDMA and the gang CS is ACE, this would emit a
SDMA_FENCE packet on ACE which just hangs.

Fixes: b1938901d0 ("radv: Use SDMA fence packet when flushing gang semaphores")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39211>
2026-01-09 09:07:45 +00:00
Natalie Vock
60dd9d797e aco: Swizzle ray launch IDs in the RT prolog
This converts from 1D workgroups to 2D ray launch IDs entirely via
shader ALU, including handling partial/cut-off workgroups optimally.

Doing this entirely in-shader means it Just Works(TM) with indirect
dispatches as well. Previous approaches manipulating various things on
CPU depending on the dispatch size couldn't handle indirect dispatches.

The swizzle implemented here also swizzles with a recursive Z-order
pattern, which should be a little more optimal than arranging
invocations linearly within the wave.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39142>
2026-01-08 19:49:55 +01:00
Natalie Vock
1f6ac3fa93 radv/rt,aco: Always dispatch 1D workgroups for RT
We will swizzle the workgroups ourselves in the next commit.
Removes the need for 1D dispatch workarounds.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39142>
2026-01-08 19:49:54 +01:00
Natalie Vock
8baa95e4aa radv/rt: Use subgroup invocation for stack index
Workgroup == subgroup anyway, and we don't have the workgroup thread IDs
in RT shaders.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39142>
2026-01-08 19:49:45 +01:00
Georg Lehmann
330e88abb8 amd/drm-shim: add vega20
Vega20 ISA is different enough from Vega10 that having it
in drm-shim is useful for testing compiler changes.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39188>
2026-01-08 09:30:54 +00:00
Pierre-Eric Pelloux-Prayer
bfa8dcf3b3 ac/sdma: fix src/dst pitch for sdma < 4
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Fixes DRM_PRIME with AMD_DEBUG=notiling.

Fixes: 5f8fa6ae03 ("ac,radv,radeonsi: add ac_emit_sdma_copy_linear_sub_window()")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39019>
2026-01-08 09:45:58 +01:00
Pierre-Eric Pelloux-Prayer
2f347b5725 ac/sdma: fix ac_sdma_get_tiled_header_dword for older gen
The header should be 0 for older sdma as well. This fixes
DRI_PRIME support for radeonsi.

Fixes: f5ecc5ffd5 ("ac,radv,radeonsi: add ac_emit_sdma_copy_tiled_sub_window()")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39019>
2026-01-08 09:45:48 +01:00
Georg Lehmann
a706769a0b nir: move exact bit to nir_fp_math_control
Unifies nir per instruction float control.

In the future this can be split into contract/reassoc/transform
like SPIR-V.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (except SPIR-V)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39103>
2026-01-07 09:40:57 +00:00
Georg Lehmann
eb4737a1dd nir: add nir_alu_instr_is_exact helper
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39103>
2026-01-07 09:40:57 +00:00
Marek Olšák
492a176cbb util: increase SHA1_DIGEST_LENGTH to 32 (BLAKE3_KEY_LEN)
The last 12 bytes are always 0 for now. With this, all SHA1 functions
can be internally implemented as BLAKE3, so that we can switch everything
to BLAKE3 by only changing the implementation of the sha1 utility.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39110>
2026-01-07 08:32:33 +00:00
Marek Olšák
1912a00a91 ALL: use SHA1_DIGEST_LENGTH etc. instead of hardcoding the numbers
only build_id is switched to use literal 20 instead of SHA1_DIGEST_LENGTH
because we will increase SHA1_DIGEST_LENGTH to BLAKE3_KEY_LEN

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39110>
2026-01-07 08:32:33 +00:00
Samuel Pitoiset
9f5dd888b6 radv/sqtt: add a comment about the allocation strategy of the SQTT BO
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39172>
2026-01-07 06:57:29 +00:00
Samuel Pitoiset
ffa343ed05 Revert "radv: allocate the SQTT BO in GTT for faster readback"
This reverts commit da07f1ef3f.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14591
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39172>
2026-01-07 06:57:29 +00:00
Marek Olšák
13cfd0176c ac/gpu_info: add #define AMD_MEMCHANNEL_INTERLEAVE_BYTES
radeon_info::pipe_interleave_bytes is renamed to r600_pipe_interleave_bytes
where it can be 512 on some chips.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39120>
2026-01-06 20:32:10 +00:00
Marek Olšák
92133bb0ab amd: demystify various optimizations we already have for memory channels
Explain why we do what we do, and use the radeon_info field properly.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39120>
2026-01-06 20:32:10 +00:00
Samuel Pitoiset
4fce09268a ac/perfcounter: rename ac_pc_block::num_global_instances to num_instances
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39155>
2026-01-06 11:43:21 +00:00
Samuel Pitoiset
59dc20262c ac/perfcounter: rename ac_pc_block::num_instances to num_scoped_instances
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39155>
2026-01-06 11:43:21 +00:00
Samuel Pitoiset
3658d9588f ac/perfcounter: add num_{16,32}bit_spm_counters to GPU blocks
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39155>
2026-01-06 11:43:21 +00:00
Samuel Pitoiset
9a1b925400 ac/spm: use GPU block distribution mode to determine instances
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39155>
2026-01-06 11:43:20 +00:00
Samuel Pitoiset
ac0423eb3f ac/spm: use GPU block distribution mode to determine broadcasting
SPM is only implemented for GFX10+.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39155>
2026-01-06 11:43:19 +00:00
Rhys Perry
614437ead5 ac/nir: don't vectorize 16-bit shared loads to 8-bit
Fixes an issue where a 2x16 load was vectorized with a 1x16 load into a
8x8 load. This became possible after 49d923078f increased
aligned_new_size from 6 bytes to 8 bytes.

fossil-db (navi31):
Totals from 5 (0.01% of 79825) affected shaders:
Instrs: 6994 -> 6257 (-10.54%)
CodeSize: 44000 -> 39464 (-10.31%)
Latency: 90482 -> 89795 (-0.76%)
InvThroughput: 202955 -> 201926 (-0.51%)
VClause: 560 -> 565 (+0.89%)
Copies: 1135 -> 1108 (-2.38%); split: -2.82%, +0.44%
PreVGPRs: 201 -> 199 (-1.00%)
VALU: 3882 -> 3201 (-17.54%)
SALU: 493 -> 479 (-2.84%)
VOPD: 262 -> 258 (-1.53%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 49d923078f ("ac/nir: fix calculation of aligned_new_size")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14500
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39156>
2026-01-06 10:28:02 +00:00
Marek Olšák
3552028e87 ac/lower_ngg_mesh: fix a segfault accessing out_variables out of bounds
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
component_addr_off (in bytes) was used to offset a component index (in dwords).

Cc: mesa-stable

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39144>
2026-01-05 22:23:42 +00:00
Marek Olšák
b340588119 ac/gpu_info: don't read uninitialized dev_filename
This fixes radeonsi-run-tests.py not being able to read AMD_DEBUG=info.

Fixes: 8777894d3e - amd: remove radeon_info::dev_filename

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39145>
2026-01-05 21:56:36 +00:00
Konstantin Seurer
405c93c665 radv: Optimize BVH4 acceleration structure updates
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
It is more efficient to compute the child index of the current node
inside the parent node and write the bounds when available. The previous
code could load up to 16 AABBs to compute the new ones. The new code
also only needs 1/7 of the previously used scratch memory. The new code
seems to be around 30% faster (0.5ms) in GOTG on a 6700XT.

Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39139>
2026-01-05 15:24:54 +00:00
Daniel Schürmann
2d0d5fc104 aco/validate: validate constant bus limit after register allocation based on PhysReg
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39107>
2026-01-05 14:54:00 +00:00
Daniel Schürmann
eb16f701a6 aco/tests: Add new test to pack 2x16 SGPRs into VGPR
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39107>
2026-01-05 14:54:00 +00:00
Daniel Schürmann
61c1ec541d aco/tests: Add test for subdword extraction from SGPR
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39107>
2026-01-05 14:54:00 +00:00
Daniel Schürmann
0674c9d30e aco/validate: Validate correct RegisterClasses after lowering to HW instructions
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39107>
2026-01-05 14:53:59 +00:00
Daniel Schürmann
b087bf2fbf aco/lower_to_hw: Fix SGPR Operand RegClasses for pack_2x16
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39107>
2026-01-05 14:53:59 +00:00
Daniel Schürmann
9f5996ae8a aco/lower_to_hw: Don't use 2 SGPR operands before GFX10 in a single VOP3 instruction in do_pack_2x16()
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39107>
2026-01-05 14:53:58 +00:00