mesa/src/amd/common
Rhys Perry c3d27906d8 radv: vectorize lowered shader IO
fossil-db (navi31):
Totals from 2329 (2.93% of 79377) affected shaders:
MaxWaves: 72152 -> 72102 (-0.07%)
Instrs: 1048791 -> 1041920 (-0.66%); split: -0.72%, +0.07%
CodeSize: 5331832 -> 5285572 (-0.87%); split: -0.90%, +0.03%
VGPRs: 113844 -> 113820 (-0.02%); split: -0.14%, +0.12%
Latency: 4349524 -> 4346374 (-0.07%); split: -0.35%, +0.28%
InvThroughput: 609449 -> 609235 (-0.04%); split: -0.27%, +0.24%
VClause: 22613 -> 22451 (-0.72%); split: -1.03%, +0.31%
SClause: 21197 -> 21177 (-0.09%); split: -0.45%, +0.35%
Copies: 81900 -> 82446 (+0.67%); split: -1.51%, +2.18%
PreSGPRs: 94697 -> 93596 (-1.16%); split: -1.23%, +0.07%
PreVGPRs: 69962 -> 70080 (+0.17%); split: -0.01%, +0.18%
VALU: 625247 -> 625390 (+0.02%); split: -0.23%, +0.25%
SALU: 101692 -> 101555 (-0.13%); split: -0.24%, +0.11%
VMEM: 46459 -> 44845 (-3.47%)

fossil-db (navi21):
Totals from 17522 (22.07% of 79377) affected shaders:
MaxWaves: 425698 -> 425460 (-0.06%); split: +0.00%, -0.06%
Instrs: 11444215 -> 11428321 (-0.14%); split: -0.14%, +0.00%
CodeSize: 59227492 -> 59019376 (-0.35%); split: -0.35%, +0.00%
VGPRs: 780920 -> 781208 (+0.04%); split: -0.00%, +0.04%
Latency: 44965072 -> 44926529 (-0.09%); split: -0.12%, +0.03%
InvThroughput: 9718148 -> 9728793 (+0.11%); split: -0.01%, +0.12%
VClause: 225732 -> 225605 (-0.06%); split: -0.10%, +0.04%
SClause: 217196 -> 217160 (-0.02%); split: -0.03%, +0.01%
Copies: 1050351 -> 1065263 (+1.42%); split: -0.03%, +1.45%
PreSGPRs: 747538 -> 747223 (-0.04%); split: -0.05%, +0.01%
PreVGPRs: 626702 -> 626748 (+0.01%); split: -0.00%, +0.01%
VALU: 6629403 -> 6643822 (+0.22%); split: -0.01%, +0.23%
SALU: 1898492 -> 1898452 (-0.00%); split: -0.00%, +0.00%
VMEM: 529942 -> 528361 (-0.30%)

fossil-db (vega10):
Totals from 1791 (2.84% of 62962) affected shaders:
MaxWaves: 12270 -> 12253 (-0.14%); split: +0.01%, -0.15%
Instrs: 602026 -> 597473 (-0.76%); split: -0.83%, +0.08%
CodeSize: 3109872 -> 3071664 (-1.23%); split: -1.26%, +0.03%
SGPRs: 137826 -> 137938 (+0.08%); split: -0.10%, +0.19%
VGPRs: 70364 -> 70520 (+0.22%); split: -0.03%, +0.26%
Latency: 4757850 -> 4781905 (+0.51%); split: -0.35%, +0.86%
InvThroughput: 2296941 -> 2310685 (+0.60%); split: -0.14%, +0.74%
VClause: 14161 -> 14050 (-0.78%); split: -1.23%, +0.44%
SClause: 14058 -> 14077 (+0.14%); split: -0.57%, +0.70%
Copies: 40954 -> 42191 (+3.02%); split: -1.69%, +4.71%
PreSGPRs: 64314 -> 63214 (-1.71%); split: -1.81%, +0.10%
PreVGPRs: 53558 -> 53894 (+0.63%); split: -0.01%, +0.64%
VALU: 449920 -> 450830 (+0.20%); split: -0.19%, +0.39%
SALU: 32973 -> 32839 (-0.41%); split: -0.76%, +0.35%
VMEM: 28796 -> 25151 (-12.66%)

fossil-db (polaris10):
Totals from 1769 (2.86% of 61794) affected shaders:
MaxWaves: 12024 -> 12021 (-0.02%)
Instrs: 474761 -> 470760 (-0.84%); split: -0.94%, +0.10%
CodeSize: 2447964 -> 2420712 (-1.11%); split: -1.15%, +0.04%
SGPRs: 129664 -> 129728 (+0.05%); split: -0.14%, +0.19%
VGPRs: 65216 -> 65560 (+0.53%); split: -0.05%, +0.58%
Latency: 4304734 -> 4318319 (+0.32%); split: -0.41%, +0.72%
InvThroughput: 2114950 -> 2122580 (+0.36%); split: -0.18%, +0.54%
VClause: 10933 -> 10808 (-1.14%); split: -1.42%, +0.27%
SClause: 11430 -> 11446 (+0.14%); split: -0.70%, +0.84%
Copies: 32290 -> 31891 (-1.24%); split: -2.80%, +1.56%
PreSGPRs: 58184 -> 57096 (-1.87%); split: -1.98%, +0.11%
PreVGPRs: 48757 -> 48874 (+0.24%); split: -0.02%, +0.26%
VALU: 359097 -> 358582 (-0.14%); split: -0.25%, +0.11%
SALU: 26279 -> 25934 (-1.31%); split: -1.75%, +0.43%
VMEM: 18825 -> 17247 (-8.38%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29242>
2025-02-07 13:52:57 +00:00
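The percentages in the fossil-db totals above are relative changes, (after - before) / before. A minimal Python sketch for re-checking one line of such a report follows. The names `percent_change` and `parse_stat_line` are hypothetical helpers written for this illustration, not part of Mesa or of the fossil-db tooling, and the regular expression assumes the `Name: before -> after (+x.xx%)` layout seen in the commit message.

```python
import re

# Assumed layout of one totals line, e.g.
#   "Instrs: 1048791 -> 1041920 (-0.66%); split: -0.72%, +0.07%"
# Only the leading "Name: before -> after (pct%)" part is parsed here;
# the optional "split:" suffix is ignored.
STAT_RE = re.compile(r"^(\w+):\s+(\d+)\s+->\s+(\d+)\s+\(([+-]\d+\.\d+)%\)")


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


def parse_stat_line(line):
    """Return (name, before, after, reported_percent) for one totals line."""
    m = STAT_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized stat line: {line!r}")
    name, before, after, reported = m.groups()
    return name, int(before), int(after), float(reported)


if __name__ == "__main__":
    line = "Instrs: 1048791 -> 1041920 (-0.66%); split: -0.72%, +0.07%"
    name, before, after, reported = parse_stat_line(line)
    computed = percent_change(before, after)
    print(f"{name}: reported {reported:+.2f}%, recomputed {computed:+.2f}%")
```

Run against the navi31 `Instrs` line, the recomputed value rounds to the reported -0.66%. The `split:` figures cannot be recomputed this way, since they depend on per-shader data that the totals do not include.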
nir radv: vectorize lowered shader IO 2025-02-07 13:52:57 +00:00
virtio ac/virtio: add virtio-only AMDGPU_GEM_CREATE flag 2025-01-16 12:24:33 +00:00
.clang-format amd: import libdrm_amdgpu ioctl wrappers 2024-11-25 21:03:41 -05:00
ac_binary.c amd: add initial common code for gfx12 2024-05-11 22:14:05 -04:00
ac_binary.h ac,radeonsi,winsyses: switch to SPDX-License-Identifier: MIT 2023-05-24 21:48:19 +00:00
ac_cmdbuf.c radv: fix fetching draw vertex data from counter buffers with transform feedback 2025-02-07 07:59:39 +00:00
ac_cmdbuf.h ac,radeonsi,radv: add common GFX preambles 2024-08-27 14:14:57 +00:00
ac_debug.c amd: add initial common code for gfx12 2024-05-11 22:14:05 -04:00
ac_debug.h ac: Add VCN IB parser 2024-09-23 19:25:08 +00:00
ac_descriptors.c ac/descriptors: allow to configure DCC for buffer descriptors 2025-01-30 08:18:22 +00:00
ac_descriptors.h ac/descriptors: allow to configure DCC for buffer descriptors 2025-01-30 08:18:22 +00:00
ac_drm_fourcc.h ac/surf: add more modifiers to gfx12 supported list 2024-12-16 07:35:06 +00:00
ac_fake_hw_db.h ac/fake_hw_db: deobfuscate GPU name strings 2025-01-29 07:20:02 +00:00
ac_formats.c ac,radeonsi: add ac_is_reduction_mode_supported() 2024-07-10 07:57:42 +00:00
ac_formats.h ac,radeonsi: add ac_is_reduction_mode_supported() 2024-07-10 07:57:42 +00:00
ac_gather_context_rolls.c ac: Improve context roll readability 2024-03-19 16:08:14 +00:00
ac_gpu_info.c ac/nir/ngg: Add and use a has_ngg_passthru_no_msg field to ac_gpu_info. 2025-01-30 15:26:45 +00:00
ac_gpu_info.h ac/nir/ngg: Add and use a has_ngg_passthru_no_msg field to ac_gpu_info. 2025-01-30 15:26:45 +00:00
ac_hw_stage.h amd: Move ac_hw_stage to its own file 2023-07-03 21:12:45 +00:00
ac_ib_parser.c ac/parse_ib: Replace the parameter list with ac_ib_parser 2024-03-19 16:08:13 +00:00
ac_linux_drm.c amd: add ac_drm_device_get_cookie 2025-01-22 14:55:56 +00:00
ac_linux_drm.h amd: add ac_drm_device_get_cookie 2025-01-22 14:55:56 +00:00
ac_msgpack.c ac: remove unused code 2024-12-26 10:12:43 +00:00
ac_msgpack.h ac: remove unused code 2024-12-26 10:12:43 +00:00
ac_parse_ib.c ac/parse_ib: Parse VCN IB_COMMON_OP_WRITEMEMORY 2024-12-27 08:17:16 +00:00
ac_perfcounter.c ac/perfcounter: fix buffer overflow 2024-11-08 13:31:02 +00:00
ac_perfcounter.h ac/perfcounter: compute the number of global instances of TCP,SQ,GL1C and GL2C 2023-09-14 14:17:19 +00:00
ac_pm4.c amd: use a valid size for ac_pm4_state allocation 2024-07-22 10:09:34 +00:00
ac_pm4.h ac,radeonsi: import PM4 state from RadeonSI 2024-06-06 20:26:47 +00:00
ac_rgp.c ac/rgp: assume GFX11_5 use the same SQTT/RGP versions as GFX11 2024-07-17 16:25:19 +00:00
ac_rgp.h ac/rgp: update dumping queue event records to the capture 2023-11-13 08:53:09 +00:00
ac_rgp_elf_object_pack.c amd: Use align64 instead of ALIGN for 64 bit value parameter 2024-01-03 22:02:17 +00:00
ac_rtld.c ac/llvm: implement WA in nir to llvm 2024-06-20 13:14:33 +00:00
ac_rtld.h ac/llvm: implement WA in nir to llvm 2024-06-20 13:14:33 +00:00
ac_shader_args.c nir: change "user_data_amd" sysval from 4 to 8 components 2024-04-13 16:45:08 +00:00
ac_shader_args.h ac/nir: split local_invocation_ids to 3 separate VGPR inputs 2025-01-02 17:36:55 +00:00
ac_shader_debug_info.h amd: Add ac_shader_debug_info 2024-11-11 08:39:13 +00:00
ac_shader_util.c radeonsi: fix PS prolog not counting used fragcoord VGPRs correctly 2025-01-29 07:19:40 +00:00
ac_shader_util.h radeonsi: fix PS prolog not counting used fragcoord VGPRs correctly 2025-01-29 07:19:40 +00:00
ac_shadowed_regs.c amd: add initial common code for gfx12 2024-05-11 22:14:05 -04:00
ac_shadowed_regs.h amd: add a new helper that prints all non-shadowed regs 2023-06-17 23:42:21 +00:00
ac_spm.c ac/spm: do not abort when the SPM BO is too small 2024-10-29 18:33:17 +00:00
ac_spm.h ac/spm: do not abort when the SPM BO is too small 2024-10-29 18:33:17 +00:00
ac_sqtt.c ac/sqtt: update programming SQTT on GFX12 2025-01-20 23:50:10 +00:00
ac_sqtt.h ac/sqtt: update programming SQTT on GFX12 2025-01-20 23:50:10 +00:00
ac_surface.c ac/surface: always allow LINEAR modifier for color formats 2025-02-06 01:48:25 +00:00
ac_surface.h ac,radv,radeonsi: add new GFX12_DCC_WRITE_COMPRESS_DISABLE tiling flag 2025-02-03 21:12:07 +00:00
ac_surface_meta_address_test.c ac: add 'polaris12' gpu to ac_fake_hw_db 2024-11-08 13:31:02 +00:00
ac_surface_modifier_test.c amd: update addrlib 2024-12-26 21:02:21 +00:00
ac_uvd_dec.h radeonsi/uvd: Set decode target swizzle mode on GFX9 2025-01-17 08:53:05 +00:00
ac_vcn.h ac/vcn: allow sq signature package to be skipped 2024-11-29 10:01:49 +10:00
ac_vcn_av1_default.h ac/radeonsi: add av1 defaults header file from radeonsi 2023-06-16 05:53:44 +00:00
ac_vcn_dec.c ac/vcn_dec: Fix AV1 film grain on VCN5 2025-02-07 13:13:45 +00:00
ac_vcn_dec.h ac/vcn_dec: Fix AV1 film grain on VCN5 2025-02-07 13:13:45 +00:00
ac_vcn_enc.c ac: Add ac_vcn_init_enc_cmds 2024-09-20 06:58:29 +00:00
ac_vcn_enc.h radeonsi/vcn: Enable VCN4 AV1 encode WA 2024-11-01 14:05:04 +00:00
ac_vcn_enc_av1_default_cdf.h ac,radeonsi: move vcn enc av1 default cdf file to common 2023-09-14 07:51:24 +00:00
amd_family.c amd: drop support for LLVM 15, 16, 17 2025-02-01 04:22:30 +00:00
amd_family.h radeonsi/vcn: Add vcn_5_0_1 support 2025-01-16 23:41:28 +00:00
amd_kernel_code_t.h amd/common: add AMD_CODE_PROPERTY_ENABLE_WAVEFRONT_SIZE32 property 2023-08-31 20:30:03 +00:00
gfx10_format_table.h amd/common: only pass gfx_level to ac_get_gfx10_format_table() 2024-05-22 08:31:39 +00:00
gfx10_format_table.py ac,radeonsi,winsyses: switch to SPDX-License-Identifier: MIT 2023-05-24 21:48:19 +00:00
meson.build ac/nir/ngg: Move GS lowering to separate file. 2025-01-30 15:26:46 +00:00
sid.h ac,radeonsi: add SDMA DCC tiling for GFX12+ 2025-01-30 08:18:22 +00:00
sid_tables.py ac,radeonsi,winsyses: switch to SPDX-License-Identifier: MIT 2023-05-24 21:48:19 +00:00