Rhys Perry
b18421ae3d
amd/lower_mem_access_bit_sizes: fix shared access when bytes<bit_size/8
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This can happen with (for example) 32x2 loads with
align_mul=4,align_offset=2.
This patch does bit_size=min(bit_size,bytes) to prevent num_components
from being 0.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 52cd5f7e69 ("ac/nir_lower_mem_access_bit_sizes: Split unsupported shared memory instructions")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37953 >
2025-10-21 22:10:34 +00:00
Rhys Perry
e89b22280f
amd/lower_mem_access_bit_sizes: be more careful with 8/16-bit scratch load
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.3
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37953 >
2025-10-21 22:10:34 +00:00
Rhys Perry
8829fc3bd6
amd/lower_mem_access_bit_sizes: improve subdword/unaligned SMEM lowering
...
Summary of changes:
- handle unaligned 16-bit scalar loads when supported_dword=true
- increases the size of 8/16/32/64-bit buffer loads which are not dword
aligned, which can create less SMEM loads.
- handles when "bytes" is less than "bit_size / 8"
fossil-db (gfx1201):
Totals from 26 (0.03% of 79839) affected shaders:
Instrs: 12676 -> 12710 (+0.27%); split: -0.30%, +0.57%
CodeSize: 67272 -> 67384 (+0.17%); split: -0.24%, +0.40%
Latency: 44399 -> 44375 (-0.05%); split: -0.09%, +0.04%
SClause: 352 -> 344 (-2.27%)
SALU: 3972 -> 3992 (+0.50%)
SMEM: 554 -> 528 (-4.69%)
fossil-db (navi21):
Totals from 6 (0.01% of 79825) affected shaders:
Instrs: 2192 -> 2186 (-0.27%)
CodeSize: 12188 -> 12140 (-0.39%)
Latency: 10037 -> 10033 (-0.04%); split: -0.12%, +0.08%
SMEM: 124 -> 118 (-4.84%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: fbf0399517 ("amd/lower_mem_access_bit_sizes: lower all SMEM instructions to supported sizes")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37953 >
2025-10-21 22:10:34 +00:00
Rhys Perry
79b2fa785d
amd/lower_mem_access_bit_sizes: don't create subdword UBO loads with LLVM
...
These are unsupported.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14127
Fixes: fbf0399517 ("amd/lower_mem_access_bit_sizes: lower all SMEM instructions to supported sizes")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37953 >
2025-10-21 22:10:33 +00:00
Samuel Pitoiset
7cd12e5c6a
amd: move CP emit helpers to ac_cmdbuf_cp.c/h
...
Seems more organized this way.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37881 >
2025-10-21 13:31:20 +02:00
Samuel Pitoiset
e0ffc41d9a
amd,radv: move SDMA utility helpers to common code
...
Only simple ones for now. Other functions need more rework.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37881 >
2025-10-21 13:31:20 +02:00
Samuel Pitoiset
4989b6e6b9
amd,radv,radeonsi: add ac_emit_cp_write_data_{head}()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37881 >
2025-10-21 13:31:20 +02:00
Samuel Pitoiset
ed7f9df864
amd: add a predicate parameter to ac_emit_cp_copy_data()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37881 >
2025-10-21 13:31:20 +02:00
Samuel Pitoiset
29c2d02d64
amd,radv,radeonsi: add ac_emit_cp_load_context_reg_index()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37881 >
2025-10-21 13:31:20 +02:00
Samuel Pitoiset
c7c237dd27
amd,radv,radeonsi: add ac_emit_cp_nop()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37881 >
2025-10-21 13:31:13 +02:00
Samuel Pitoiset
5801986f53
amd: add missing _cp_ to some emit helpers
...
Just for consistency with other helpers.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37881 >
2025-10-21 13:30:34 +02:00
Samuel Pitoiset
a0117b5e74
amd,radv: add ac_emit_cp_atomic_mem()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37881 >
2025-10-21 13:30:34 +02:00
Georg Lehmann
9e41a7c139
treewide: use nir_load_global alias of nir_build_load_global
...
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37959 >
2025-10-21 12:37:58 +02:00
Timur Kristóf
d20049b430
ac/nir/ngg_mesh: Lower num_subgroups to constant
...
Mesh shader workgroups always have the same amount of subgroups.
When the API workgroup size is the same as the real workgroup
size, this is a small optimization (using a constant instead of
a shader arg).
When the API workgroup size is smaller than the real workgroup
size (eg. when the number of output vertices or primitves is
greater than the API workgroup size on RDNA 2), this fixes a
potential bug because num_subgroups would return the "real"
workgroup size instead of the API one.
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37947 >
2025-10-20 14:05:40 +00:00
Samuel Pitoiset
abcaa46f6c
amd,radv,radeonsi: add ac_cmdbuf_flush_vgt_streamout()
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37870 >
2025-10-16 06:31:41 +00:00
Samuel Pitoiset
679332f9a9
amd,radv,radeonsi: add ac_emit_cp_acquire_mem()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37870 >
2025-10-16 06:31:40 +00:00
Samuel Pitoiset
9ad7fb8569
amd,radv,radeonsi: add ac_emit_cp_gfx_scratch()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37870 >
2025-10-16 06:31:40 +00:00
Samuel Pitoiset
9ff8e71b4e
amd,radv,radeonsi: add ac_emit_cp_tess_rings()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37870 >
2025-10-16 06:31:39 +00:00
Samuel Pitoiset
47a64f5b6f
amd,radv,radeonsi: add ac_emit_cp_gfx11_ge_rings()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37870 >
2025-10-16 06:31:38 +00:00
Samuel Pitoiset
044bafb6ac
amd: add a predicate parameter to ac_emit_cp_pfp_sync_me()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37870 >
2025-10-16 06:31:36 +00:00
Samuel Pitoiset
48b4a43e8f
amd,radv,radeonsi: add ac_emit_cp_set_predication()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37870 >
2025-10-16 06:31:36 +00:00
Rhys Perry
3e9921f52e
radv: only call radv_should_use_wgp_mode() once
...
This will let the compiler choose between CU and WGP mode.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37791 >
2025-10-15 13:37:48 +01:00
Daniel Schürmann
b2c44e3a65
amd: change radeon_info::lds_size_per_workgroup for GFX10+ to 64KB
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Even though in WGP-mode, there is a total of 128KB LDS, a single workgroup
can access at most 64KB.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37577 >
2025-10-15 11:20:09 +00:00
Daniel Schürmann
eecd1c020d
amd: keep ac_shader_config::lds_size unaligned
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37577 >
2025-10-15 11:20:09 +00:00
Daniel Schürmann
f99eba729d
amd/common: remove radeon_info::lds_alloc_granularity and radeon_info::lds_encode_granularity
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37577 >
2025-10-15 11:20:08 +00:00
Daniel Schürmann
6fd5766620
amd: add and use utility functions for LDS size encoding
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37577 >
2025-10-15 11:20:08 +00:00
Daniel Schürmann
b651234414
amd: change ac_shader_config::lds_size to bytes
...
We still keep it aligned to allocation granularity.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37577 >
2025-10-15 11:20:07 +00:00
David Rosca
282e8285f1
ac/surface: Limit video modifiers to 64K_S also for VCN 2.2
...
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14032
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37819 >
2025-10-15 10:24:29 +00:00
Samuel Pitoiset
88f53906d8
amd,radv,radeonsi: add ac_emit_cp_pfp_sync_me()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37813 >
2025-10-15 07:35:25 +00:00
Samuel Pitoiset
7ead034a06
amd,radv,radeonsi: add ac_emit_cp_copy_data()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37813 >
2025-10-15 07:35:25 +00:00
Samuel Pitoiset
0e358cec52
amd,radv,radeonsi: add ac_emit_cp_release_mem_pws()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37813 >
2025-10-15 07:35:24 +00:00
Samuel Pitoiset
c45035ceb4
amd,radv,radeonsi: add ac_emit_cp_acquire_mem_pws()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37813 >
2025-10-15 07:35:23 +00:00
Samuel Pitoiset
6329e282b8
amd,radv,radeonsi: add ac_emit_cp_wait_mem()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37813 >
2025-10-15 07:35:23 +00:00
Samuel Pitoiset
82cbfc964a
amd,radv: add ac_emit_write_data_imm()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37813 >
2025-10-15 07:35:20 +00:00
Samuel Pitoiset
ac262c351f
amd,radv: add ac_emit_cond_exec()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37813 >
2025-10-15 07:35:20 +00:00
David Rosca
3e458d06b8
radeonsi/vcn: Support BT2020 matrix with EFC
...
Acked-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37058 >
2025-10-15 06:06:44 +00:00
Daniel Schürmann
d0b87a0d5f
ac/nir_flag_smem_for_loads: call divergence analysis internally
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Also don't flag more SMEM instructions (in ACO) after the last
call to ac_nir_lower_mem_access_bit_sizes().
Totals from 75 (0.09% of 79839) affected shaders: (Navi48)
Instrs: 191246 -> 189960 (-0.67%)
CodeSize: 996840 -> 985976 (-1.09%)
Latency: 3066184 -> 2945500 (-3.94%)
InvThroughput: 355373 -> 353106 (-0.64%); split: -0.66%, +0.02%
SClause: 4848 -> 4699 (-3.07%)
Copies: 13827 -> 13925 (+0.71%); split: -0.07%, +0.78%
Branches: 5176 -> 5003 (-3.34%)
PreSGPRs: 6222 -> 6272 (+0.80%)
VALU: 108934 -> 108993 (+0.05%); split: -0.00%, +0.06%
SALU: 31679 -> 31210 (-1.48%); split: -1.51%, +0.03%
SMEM: 7158 -> 6739 (-5.85%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37843 >
2025-10-14 16:33:12 +00:00
Daniel Schürmann
9b1a635bb3
amd/common: merge radv_nir_opt_access_speculate() into ac_nir_flag_smem_for_loads()
...
One shader is negatively affected, but we save 2 entire iterations over every shader.
This effect is also mitigated with the next commits.
Totals from 1 (0.00% of 79839) affected shaders: (Navi48)
Instrs: 947 -> 958 (+1.16%)
CodeSize: 4728 -> 4732 (+0.08%)
Latency: 20678 -> 20723 (+0.22%)
InvThroughput: 2697 -> 2698 (+0.04%)
SClause: 26 -> 27 (+3.85%)
Copies: 139 -> 145 (+4.32%)
Branches: 46 -> 47 (+2.17%)
VALU: 460 -> 463 (+0.65%)
SALU: 201 -> 204 (+1.49%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37843 >
2025-10-14 16:33:12 +00:00
Daniel Schürmann
8ff44f17ef
amd/lower_mem_access_bit_sizes: also use SMEM for subdword loads
...
We can simply extract from the loaded dwords as per
nir_lower_mem_access_bit_sizes() lowering.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37843 >
2025-10-14 16:33:11 +00:00
Daniel Schürmann
fbf0399517
amd/lower_mem_access_bit_sizes: lower all SMEM instructions to supported sizes
...
This creates more SMEM instruction, mostly because vec3 64bit are being split
instead of overfetched.
Totals from 442 (0.55% of 79839) affected shaders: (Navi48)
Instrs: 288998 -> 289469 (+0.16%); split: -0.04%, +0.21%
CodeSize: 1538212 -> 1541460 (+0.21%); split: -0.03%, +0.24%
Latency: 3010072 -> 3009373 (-0.02%); split: -0.04%, +0.01%
InvThroughput: 885572 -> 885564 (-0.00%); split: -0.00%, +0.00%
VClause: 6900 -> 6885 (-0.22%); split: -0.28%, +0.06%
SClause: 4457 -> 4469 (+0.27%); split: -0.18%, +0.45%
VALU: 162473 -> 162469 (-0.00%)
SALU: 42871 -> 42855 (-0.04%); split: -0.05%, +0.01%
SMEM: 6893 -> 7239 (+5.02%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37843 >
2025-10-14 16:33:11 +00:00
Samuel Pitoiset
bc32286e5b
radv: declare a new user SGPR for dynamic descriptors
...
To move them out of push constants.
fossils-db (GFX1201):
Totals from 20700 (25.99% of 79646) affected shaders:
Instrs: 14375624 -> 14370051 (-0.04%); split: -0.07%, +0.03%
CodeSize: 76746128 -> 76723772 (-0.03%); split: -0.05%, +0.02%
Latency: 74103586 -> 74113651 (+0.01%); split: -0.01%, +0.02%
InvThroughput: 11908817 -> 11908798 (-0.00%); split: -0.00%, +0.00%
VClause: 249605 -> 249607 (+0.00%); split: -0.00%, +0.00%
SClause: 337914 -> 337772 (-0.04%); split: -0.08%, +0.04%
Copies: 843585 -> 839233 (-0.52%); split: -0.62%, +0.10%
PreSGPRs: 836283 -> 837260 (+0.12%)
SALU: 1790713 -> 1786374 (-0.24%); split: -0.29%, +0.05%
Co-authored-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37768 >
2025-10-14 15:34:43 +00:00
David Rosca
c941e57d74
ac/gfx10_format_table: Use new names for 422 subsampled formats
...
Fixes: f20ee2806e ("util/format: Add subsampling info to our YUV-as-RGB format names")
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37838 >
2025-10-14 09:33:28 +00:00
Georg Lehmann
e26a8be7af
ac/nir: enable nir atomic load/store opts
...
Foz-DB GFX1201:
Totals from 4 (0.00% of 80287) affected shaders:
Instrs: 2928 -> 2920 (-0.27%); split: -0.31%, +0.03%
CodeSize: 15424 -> 15392 (-0.21%); split: -0.23%, +0.03%
Latency: 835578 -> 823220 (-1.48%)
InvThroughput: 3307941 -> 3258515 (-1.49%)
Copies: 459 -> 447 (-2.61%)
VALU: 1297 -> 1291 (-0.46%)
SALU: 595 -> 589 (-1.01%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37822 >
2025-10-14 06:24:17 +00:00
Samuel Pitoiset
ef900e93fc
ac/surface: fix host image copies with stencil-only
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37748 >
2025-10-10 13:46:51 +00:00
Samuel Pitoiset
9a7f1401d8
ac/surface: fix host image copies with 96-bits formats
...
Fixes dEQP-VK.image.host_image_copy.simple.r32g32b32_* with
RADV_PERFTEST=hic on RADV.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37748 >
2025-10-10 13:46:51 +00:00
Samuel Pitoiset
aeec53f020
radv,radeonsi: use new ac_cmdbuf macros
...
But keep them behind existing macros for consistency until all macros
are moved to common code.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36292 >
2025-10-08 18:00:15 +00:00
Samuel Pitoiset
902f5a8618
radv: replace radeon_cmdbuf by ac_cmdbuf completely
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36292 >
2025-10-08 18:00:15 +00:00
Samuel Pitoiset
9ff4750eaf
ac/cmdbuf: introduce ac_cmdbuf
...
This will be shared by both drivers.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36292 >
2025-10-08 18:00:14 +00:00
Samuel Pitoiset
a7ae26c96c
ac/sqtt: use void pointers for start/stop CS
...
Similar to BOs which are different structs between drivers.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36292 >
2025-10-08 18:00:14 +00:00
Samuel Pitoiset
12cccb2f75
radv: remove useless radeon_cmdbuf forwarded declaration
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36292 >
2025-10-08 18:00:13 +00:00