fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-20 15:38:19 +02:00

Author	SHA1	Message	Date
Samuel Pitoiset	603541f1a2	ac/gpu_info: add cp_dma_use_L2 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32971>	2025-01-13 08:07:58 +00:00
Marek Olšák	e640d5a9c3	amd: vectorize SMEM loads aggressively, allow overfetching for ACO If there is a 4-byte hole between 2 loads, they are vectorized. Example: load 4 + hole 4 + load 8 -> load 16 This helps GLSL uniform loads, which are often sparse. See the code for more info. RADV could get better code by vectorizing later. radeonsi+ACO - TOTALS FROM AFFECTED SHADERS (45482/58355) Spilled SGPRs: 841 -> 747 (-11.18 %) Code Size: 67552396 -> 65291092 (-3.35 %) bytes Max Waves: 714439 -> 714520 (0.01 %) This should have no effect on LLVM because ac_build_buffer_load scalarizes SMEM, but it's improved for some reason: radeonsi+LLVM - TOTALS FROM AFFECTED SHADERS (4673/58355) Spilled SGPRs: 1450 -> 1282 (-11.59 %) Spilled VGPRs: 106 -> 107 (0.94 %) Scratch size: 101 -> 102 (0.99 %) dwords per thread Code Size: 14994624 -> 14956316 (-0.26 %) bytes Max Waves: 66679 -> 66735 (0.08 %) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29399>	2025-01-09 22:01:54 +00:00
Marek Olšák	abd5216ae8	ac,radeonsi: scalarize overfetching loads There is nothing preventing ACO from generating loads with unused components. This happens often with GLSL uniforms. Some of those loads are partially re-vectorized after this. radeonsi+ACO: TOTALS FROM AFFECTED SHADERS (19564/58918) VGPRs: 732900 -> 728448 (-0.61 %) Spilled SGPRs: 429 -> 433 (0.93 %) Code Size: 38446004 -> 38485612 (0.10 %) bytes Max Waves: 305440 -> 305549 (0.04 %) Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29399>	2025-01-09 22:01:54 +00:00
Marek Olšák	58a88bbdb9	ac/nir/ngg: export positions after streamout to improve performance Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32686>	2025-01-09 20:47:16 +00:00
Marek Olšák	fc73749d6c	ac/nir/ngg: fold so_vertex_index * so_stride into immediate offset Instead of using a different voffset VGPR per streamout vertex, point voffset to the first vertex for all 3 vertices because the stride and vertex index are constant and can be in the immediate offset. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32686>	2025-01-09 20:47:16 +00:00
Marek Olšák	97e82af162	ac/nir/ngg: vectorize streamout stores for NGG optimally Walk the whole vertex stride thanks to XFB info sorted by offset, gather individual components from same or different outputs, and once we have gathered 4, store them as vec4. It also removes the memory_modes field from VMEM stores because I don't think it's needed. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32686>	2025-01-09 20:47:16 +00:00
Marek Olšák	4f2e2e10bc	ac/nir: vectorize streamout stores for legacy pipeline optimally Walk the whole vertex stride thanks to XFB info sorted by offset, gather individual components from same or different outputs, and once we have gathered 4, store them as vec4. It also removes the COHERENT flag from VMEM stores because NGG streamout doesn't use it either and I don't think it's needed. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32686>	2025-01-09 20:47:16 +00:00
Marek Olšák	e399f3bed9	ac/nir: sort xfb info to facilitate vectorization of xfb stores xfb stores are not vectorized properly, leading to generating random soup of b32, b64, b96, and b128 stores. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32686>	2025-01-09 20:47:16 +00:00
Samuel Pitoiset	f09f31d093	ac/nir: fix a comment typo in load_subgroup_id_lowered() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32940>	2025-01-09 08:02:19 +00:00
Samuel Pitoiset	44ba856089	ac/nir: fix lowering subgroup ID for compute shaders on GFX12 This is lowered in backend compilers (LLVM or ACO) because it needs to access ttmp registers which aren't exposed to NIR. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32940>	2025-01-09 08:02:19 +00:00
Marek Olšák	c20c46cf7b	ac: update ATOMIC_MEM definitions Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32877>	2025-01-07 20:24:19 +00:00
Samuel Pitoiset	c5fe9dcf16	ac/descriptors: fix configuring NBC views on GFX12 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32892>	2025-01-07 09:15:12 +00:00
David Rosca	e33452a6d3	ac/surface: Don't force linear for VIDEO_REFERENCE with emulated image opcodes This caused regression by using higher pitch than needed on compute-only devices, resulting in video decode errors. Fixes: `308bae950f` ("ac/surface: Add RADEON_SURF_VIDEO_REFERENCE") Tested-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32863>	2025-01-04 09:13:44 +00:00
Marek Olšák	7fbca998b1	amd: optimize atomics before lowering intrinsics ac_nir_lower_intrinsics_to_args will lower most system values. I have to keep the divergence analysis in ACO, otherwise it goes haywire. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:56 +00:00
Marek Olšák	5dd9171765	ac/nir: set upper ranges for range analysis while lowering system values Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	0d5b03f2b9	ac/nir: split local_invocation_ids to 3 separate VGPR inputs so that we can set the upper range per VGPR. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	65d241c947	ac/nir: set arg_upper_bound_u32 for vs_rel_patch_id Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	1d9fbe5387	ac/nir: add helper ac_nir_load_arg_upper_bound Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	cfeaa45dc6	ac/nir: clean up ac_nir_lower_indirect_derefs IO variables can't occur here anymore. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	ae22da2ff8	ac/nir: lower more loads in ac_nir_lower_intrinsics_to_args instead of drivers Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	ceb6f8fc32	amd: lower load_tess_rel_patch_id/primitive_id/tess_coord and overwrite.. in NIR The overwrite instruction complicates it a little, which is why these intrinsics are lowered together. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	61bfb4fa06	amd: lower load_subgroup_invocation in NIR Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	e69f47faee	amd: lower load_local_invocation_index in NIR This is the last intrinsic that needed the LS VGPR bug workaround in ACO and ac_nir_to_llvm. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	433ca6ba38	ac/nir: extract a load_subgroup_id lowered helper. This will be used in the next commit. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	342dcbdc8b	amd: lower load_vertex_id/instance_id and overwrite_vs_arguments in NIR 2 things complicate this: - overwrite_vs_arguments_amd - the LS VGPR bug workaround Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	66dd70adc5	amd: lower load_gs_wave_id_amd in NIR Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	923f59c971	amd: lower load_barycentric_at_offset in NIR Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	16ab05fad1	amd: lower load_barycentric_pixel/centroid/sample in NIR radeonsi needs to preserve interp_mode in the arg load. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	a15e733a81	ac,radeonsi: move load_vector_arg flags to common code This will be needed by lowering of barycentrics. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	7e83f6ca8b	amd: lower load_front_face in NIR radeonsi must do this after si_lower_nir_abi, which optimizes front_face, but doesn't lower it. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	6ad5225b2a	amd: lower load_frag_shading_rate in NIR Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	6d2e29ff6e	amd: lower load_sample_pos in NIR Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	110e474b4f	amd: lower load_sample_id in NIR Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	684c8da553	amd: lower load_invocation_id in NIR ACO can't look for it because it's lowered there. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	d281240c57	amd: lower load_first_vertex/base_instance/draw_id/view_index in NIR Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	0d372b043b	amd: lower load_local_invocation_id in NIR This is based on ACO. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	13cb5c7b72	amd: lower load_frag_coord in NIR Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	58cb155068	amd: lower load_pixel_coord in NIR Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	85c3b5159a	ac/nir: handle disabled PS VGPRs in ac_nir_load_arg_at_offset Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Timur Kristóf	652a0b48bc	amd: Set lower_layer_fs_input_to_sysval in common code, not in drivers. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32641>	2025-01-02 14:07:51 +00:00
Timur Kristóf	ed88616a12	ac/nir/ngg: Don't mark multiview layer output as varying. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32641>	2025-01-02 14:07:51 +00:00
Marek Olšák	c21bc65ba7	nir/opt_load_store_vectorize: make hole_size signed to indicate overlapping loads A negative hole size means the loads overlap. This will be used by drivers to handle overlapping loads in the callback easily. Reviewed-by: Mel Henning <drawoc@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32699>	2025-01-01 00:03:55 +00:00
Georg Lehmann	e112e2b047	nir,amd: optimize front_face ? a : -a Foz-DB Navi31: Totals from 3345 (4.21% of 79395) affected shaders: MaxWaves: 96182 -> 96174 (-0.01%) Instrs: 3135439 -> 3129508 (-0.19%); split: -0.24%, +0.05% CodeSize: 16776088 -> 16718048 (-0.35%); split: -0.38%, +0.03% VGPRs: 190884 -> 190848 (-0.02%); split: -0.03%, +0.01% Latency: 32624132 -> 32621734 (-0.01%); split: -0.16%, +0.16% InvThroughput: 5759987 -> 5749957 (-0.17%); split: -0.23%, +0.05% VClause: 51044 -> 51086 (+0.08%); split: -0.12%, +0.20% SClause: 103415 -> 103223 (-0.19%); split: -0.64%, +0.45% Copies: 170398 -> 170555 (+0.09%); split: -0.64%, +0.74% PreSGPRs: 135567 -> 133887 (-1.24%) PreVGPRs: 140569 -> 141317 (+0.53%) VALU: 1959144 -> 1953839 (-0.27%); split: -0.30%, +0.03% SALU: 217956 -> 217676 (-0.13%); split: -0.20%, +0.07% Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32791>	2024-12-30 22:31:35 +00:00
Timur Kristóf	de2cb4a7d3	ac/nir: Only store params to attribute ring that are varying. On GFX11+, varying outputs from the last pre-rasterization stage are implemented by storing the outputs to the so-called attribute ring. Make sure to only store them when necessary. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32640>	2024-12-28 10:31:41 -06:00
Timur Kristóf	13234a8a8a	ac/nir: Only export parameters when they are actually varying. In AMD terminology, varying outputs are implemented by parameter export instructions on GFX6-10.3 GPUs. Only emit those when actually necessary. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32640>	2024-12-28 10:31:38 -06:00
Timur Kristóf	4d6c00944b	ac/nir: Only export positions when they are really system values. In AMD terminology, a system value is implemented by position export instructions. Make sure to only emit those when they are needed. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32640>	2024-12-28 10:31:36 -06:00
Timur Kristóf	f5981e8c0b	ac/nir: Split GS output usage masks to varying and sysval masks. To keep track which output is used for what purpose. Note that this commit just adds the capability to track this separately in ac/nir. The drivers will need to be updated in the future to take advantage of this. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32640>	2024-12-28 10:31:33 -06:00
Timur Kristóf	92464109e3	ac/nir: Mark when pre-rast output is used as varying or sysval. In this commit, just collect the info. It will be taken into use by subsequent commits. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32640>	2024-12-28 10:31:29 -06:00
Timur Kristóf	cb0671aede	ac/nir/ngg: Refactor storing per-primitive primitive ID to attribute ring. Simplify the code using the helpers introduced in previous commits. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32640>	2024-12-28 10:31:26 -06:00
Timur Kristóf	edde762b56	ac/nir/ngg: Move emitting GS vertex param exports to if. On GFX10-10.3 (when no attribute ring is present), only emit the GS vertex parameter exports on the vertex export threads. Other threads don't have anything to export. Move this code around to make it a bit easier to follow. Also add some comments to better explain what's what. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32640>	2024-12-28 10:31:23 -06:00
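
Two of the commits listed above reason about the gap between neighbouring loads: e640d5a9c3 vectorizes loads separated by a 4-byte hole ("load 4 + hole 4 + load 8 -> load 16"), and c21bc65ba7 makes hole_size signed so that a negative value indicates overlapping loads. The arithmetic can be sketched as below. This is only an illustration, not the Mesa implementation: `struct load_range`, `hole_size`, `can_merge`, and `merged_size` are hypothetical names invented for this sketch, and the real vectorizer callback takes more inputs than shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one load: byte offset and byte size. */
struct load_range {
   uint32_t offset;
   uint32_t size;
};

/* Signed gap in bytes between the end of `lo` and the start of `hi`.
 * Positive: unused bytes sit between the loads.
 * Zero:     the loads are adjacent.
 * Negative: the loads overlap by that many bytes. */
static int64_t hole_size(struct load_range lo, struct load_range hi)
{
   assert(lo.offset <= hi.offset); /* caller passes the lower load first */
   return (int64_t)hi.offset - ((int64_t)lo.offset + (int64_t)lo.size);
}

/* Illustrative policy only: accept a merge when the hole is at most
 * max_hole bytes. Overlapping loads (negative hole) are always accepted. */
static bool can_merge(struct load_range lo, struct load_range hi,
                      int64_t max_hole)
{
   return hole_size(lo, hi) <= max_hole;
}

/* Size in bytes of a single load covering both ranges, hole included. */
static uint32_t merged_size(struct load_range lo, struct load_range hi)
{
   uint32_t lo_end = lo.offset + lo.size;
   uint32_t hi_end = hi.offset + hi.size;
   return (hi_end > lo_end ? hi_end : lo_end) - lo.offset;
}
```

With a 4-byte load at offset 0 and an 8-byte load at offset 8, `hole_size` yields 4 and `merged_size` yields 16, matching the example in the commit message. An 8-byte load at offset 0 followed by an 8-byte load at offset 4 gives a hole of -4, which is the overlapping case the signed value is meant to expose.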

2956 commits