fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-22 21:48:09 +02:00

Author	SHA1	Message	Date
Daniel Schürmann	69dcd5be3a	aco: don't assume that demote doesn't cause an empty exec mask Totals from 188 (0.24% of 79377) affected shaders: (Navi31) Instrs: 209239 -> 209473 (+0.11%); split: -0.01%, +0.12% CodeSize: 1101124 -> 1101744 (+0.06%); split: -0.02%, +0.07% Latency: 1672182 -> 1672748 (+0.03%); split: -0.11%, +0.14% InvThroughput: 237276 -> 237546 (+0.11%); split: -0.00%, +0.12% SClause: 5694 -> 5690 (-0.07%); split: -0.28%, +0.21% Copies: 21685 -> 21682 (-0.01%); split: -0.12%, +0.10% Branches: 5740 -> 5863 (+2.14%) PreSGPRs: 7004 -> 7034 (+0.43%) VALU: 123595 -> 123641 (+0.04%); split: -0.00%, +0.04% SALU: 28418 -> 28411 (-0.02%); split: -0.09%, +0.06% Fixes: `f35e229fae` ('aco: skip code if exec is empty') Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33619>	2025-03-26 08:45:12 +00:00
Georg Lehmann	3b5e537b09	aco/gfx11.5: remove vinterp ddx/ddy path While the idea to take advantage of the higher throughput wasn't bad, the hardware wasn't designed with this in mind and doesn't behave as expected with constant sources. Fixes: `bee487df48` ("aco/gfx11.5+: use vinterp for fddx/fddy") Acked-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33969>	2025-03-12 11:31:54 +00:00
Georg Lehmann	20dd6dfa12	aco/isel: use s_mul_i32 instead of s_cselect_b32 for a ? b : 0 It doesn't require SCC and this is more consistent with b2f. Foz-DB Navi21: Totals from 2107 (2.64% of 79789) affected shaders: Instrs: 6619774 -> 6619280 (-0.01%); split: -0.01%, +0.00% CodeSize: 36754448 -> 36752396 (-0.01%); split: -0.01%, +0.00% Latency: 62207779 -> 62206422 (-0.00%); split: -0.00%, +0.00% InvThroughput: 13090494 -> 13090204 (-0.00%); split: -0.00%, +0.00% VClause: 171572 -> 171573 (+0.00%) SClause: 257528 -> 257530 (+0.00%) Copies: 607680 -> 607204 (-0.08%); split: -0.10%, +0.02% VALU: 4189422 -> 4189418 (-0.00%) SALU: 1001750 -> 1001264 (-0.05%); split: -0.07%, +0.02% Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33734>	2025-03-04 21:36:17 +00:00
Ivan Avdeev	ff6504d4c0	radv: add experimental support for AMD BC-250 board AMD BC-250 is a mining board based on an AMD APU with an integrated GPU that kernel recognizes as Cyan Skillfish. It is basically RDNA1/GFX10, but with added hardware ray tracing support. LLVM calls it GFX1013, see https://llvm.org/docs/AMDGPU/AMDGPUAsmGFX1013.html Support for this GPU hasn't been extensively tested. Some games are known to work, some non-trivial ray query compute and ray tracing pipeline rendering works too. Q2RTX works. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33116>	2025-03-04 08:07:31 +00:00
Daniel Schürmann	52253da783	aco: unify get_addr_sgpr_from_waves() and get_addr_vgpr_from_waves() into one function which returns the limit as RegisterDemand. Also remove the unused get_extra_sgprs() from aco_ir.h. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33644>	2025-02-21 13:49:41 +00:00
Rhys Perry	539f9b4ba6	nir,aco,radv: add align_mul/offset to buffer_amd intrinsics Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29242>	2025-02-07 13:52:57 +00:00
Daniel Schürmann	1a8a643bbd	aco/isel: track control flow divergence in loops more accurately We introduce two new variables, cf_context::in_divergent_cf and cf_context::parent_loop.has_divergent_break, in order to determine whether there are any other invocations on a different CF path. Totals from 1305 (1.64% of 79395) affected shaders: (Navi31) Instrs: 659211 -> 657815 (-0.21%); split: -0.22%, +0.01% CodeSize: 3483228 -> 3477960 (-0.15%); split: -0.16%, +0.01% VGPRs: 68820 -> 48048 (-30.18%) Latency: 14197750 -> 14170767 (-0.19%); split: -0.26%, +0.07% InvThroughput: 1619103 -> 1619826 (+0.04%); split: -0.02%, +0.07% VClause: 12384 -> 12350 (-0.27%) SClause: 26693 -> 26844 (+0.57%); split: -0.01%, +0.57% Copies: 44994 -> 43535 (-3.24%); split: -3.26%, +0.02% PreSGPRs: 49007 -> 48907 (-0.20%) PreVGPRs: 32171 -> 32121 (-0.16%) VALU: 349984 -> 349857 (-0.04%); split: -0.04%, +0.00% SALU: 84252 -> 83988 (-0.31%); split: -0.32%, +0.00% Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>	2025-02-05 10:54:21 +00:00
Daniel Schürmann	583c3586fe	aco/isel: remove loop nest information from exec_info Since we never enter loops with an empty exec mask, and the control flow is structured, we don't need to consider the loop nest depth. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>	2025-02-05 10:54:21 +00:00
Daniel Schürmann	a77258346c	aco/isel: fix assumptions about potential empty exec mask in nested control flow Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>	2025-02-05 10:54:21 +00:00
Daniel Schürmann	44216e035f	aco/isel: add and use exec_info::empty() helper Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>	2025-02-05 10:54:21 +00:00
Daniel Schürmann	8e8398832c	aco/isel: use cf_context in loop_context to restore cf information Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>	2025-02-05 10:54:21 +00:00
Daniel Schürmann	8b9c9fb904	aco/isel: use cf_context in if_context to restore cf information Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>	2025-02-05 10:54:21 +00:00
Daniel Schürmann	c2bfc05d71	aco/isel: rename cf_context::has_divergent_branch Make it more consistent with cf_context::has_branch. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>	2025-02-05 10:54:21 +00:00
Daniel Schürmann	0c5a91b9f2	aco/isel: move cf_info into separate struct cf_context Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>	2025-02-05 10:54:21 +00:00
Daniel Schürmann	61fa007e48	aco/isel: fix empty exec tracking for uniform branches Totals from 5 (0.01% of 79395) affected shaders: (Navi31) Instrs: 54730 -> 54715 (-0.03%) CodeSize: 276928 -> 276852 (-0.03%) Latency: 215212 -> 214874 (-0.16%) InvThroughput: 40154 -> 40150 (-0.01%) Copies: 6824 -> 6821 (-0.04%); split: -0.06%, +0.01% Branches: 1625 -> 1615 (-0.62%) SALU: 5682 -> 5678 (-0.07%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>	2025-02-05 10:54:21 +00:00
Konstantin Seurer	60a20bcf3d	nir: Stop using instructions for debug info Annotating ssa defs without affecting compilation is impossible with debug info instructions since referencing a nir_def from the debug info instr will add uses. The old approach also stops working if passes reorder instructions. This patch proposes a solution which should not regress performance just like the old approach. The difference is that this one allocates a bit more space for debug info instead of adding a new instruction for it. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33141>	2025-01-30 20:14:01 +00:00
Marek Olšák	f98613d47c	aco: implement replacement of sample_mask_in with helper_invocation in PS prolog Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33024>	2025-01-25 12:20:26 -05:00
Marek Olšák	a842f198d7	aco: simplify how broadcast_last_cbuf is implemented in PS epilog So PS epilogs only need a single bool flag that determines whether all enabled color buffers should be written. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33024>	2025-01-25 12:20:26 -05:00
Marek Olšák	5c4f737b84	aco: implement replacing frag_coord with pixel_coord in PS prolog This adds an option to replace frag_coord.xy with pixel_coord when sample shading is disabled, which is most of the time. This reduces the number of input VGPRs. It's already implemented in ac_nir_lower_ps_early for monolithic shaders. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33024>	2025-01-25 12:20:26 -05:00
Marek Olšák	d7d4d56f5b	ac,aco,radeonsi: replace SampleMaskIn with 1 << SampleID if full sample shading Since the sample mask is always 1 << sample_id with full sample shading, just use that instead of loading sample_mask_in. Set it to 0 if it's a helper invocation. This removes the sample mask input VGPR. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33024>	2025-01-25 12:20:25 -05:00
Marek Olšák	d160252270	ac: use Z_EXPORT_FORMAT=32_AR for Z + Alpha mrtz exports This should be faster than 32_ABGR. Also, stencil exports are changed from UINT16_ABGR to 32_GR, which should have no effect on performance. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33046>	2025-01-16 02:58:03 +00:00
Timur Kristóf	50035f0316	ac/nir: Move all ac_nir_* files to a new folder. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32966>	2025-01-14 13:46:30 +01:00
Samuel Pitoiset	10e424f586	aco: always use ds_bpermute for shuffle/rotate on GFX12 ds_bpermute supports both 32 and 64 lanes now. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32974>	2025-01-13 08:33:38 +00:00
Samuel Pitoiset	f94bd67b82	aco: fix VS prologs on GFX12 MTBUF/MUBUF instructions must use zero for SOFFSET, use const_offset instead. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32904>	2025-01-07 13:44:32 +00:00
Marek Olšák	0d5b03f2b9	ac/nir: split local_invocation_ids to 3 separate VGPR inputs so that we can set the upper range per VGPR. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	ceb6f8fc32	amd: lower load_tess_rel_patch_id/primitive_id/tess_coord and overwrite.. in NIR The overwrite instruction complicates it a little, which is why these intrinsics are lowered together. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	61bfb4fa06	amd: lower load_subgroup_invocation in NIR Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	e69f47faee	amd: lower load_local_invocation_index in NIR This is the last intrinsic that needed the LS VGPR bug workaround in ACO and ac_nir_to_llvm. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	342dcbdc8b	amd: lower load_vertex_id/instance_id and overwrite_vs_arguments in NIR 2 things complicate this: - overwrite_vs_arguments_amd - the LS VGPR bug workaround Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	66dd70adc5	amd: lower load_gs_wave_id_amd in NIR Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	923f59c971	amd: lower load_barycentric_at_offset in NIR Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	16ab05fad1	amd: lower load_barycentric_pixel/centroid/sample in NIR radeonsi needs to preserve interp_mode in the arg load. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	7e83f6ca8b	amd: lower load_front_face in NIR radeonsi must do this after si_lower_nir_abi, which optimizes front_face, but doesn't lower it. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	6ad5225b2a	amd: lower load_frag_shading_rate in NIR Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	6d2e29ff6e	amd: lower load_sample_pos in NIR Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	110e474b4f	amd: lower load_sample_id in NIR Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	684c8da553	amd: lower load_invocation_id in NIR ACO can't look for it because it's lowered there. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	d281240c57	amd: lower load_first_vertex/base_instance/draw_id/view_index in NIR Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	0d372b043b	amd: lower load_local_invocation_id in NIR This is based on ACO. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	13cb5c7b72	amd: lower load_frag_coord in NIR Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Marek Olšák	58cb155068	amd: lower load_pixel_coord in NIR Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>	2025-01-02 17:36:55 +00:00
Georg Lehmann	43fca7fffe	amd: support load_front_face_fsign Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32791>	2024-12-30 22:31:35 +00:00
Georg Lehmann	aee0c7274c	amd: switch to FRONT_FACE_ALL_BITS(0) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32791>	2024-12-30 22:31:34 +00:00
Georg Lehmann	33a73203b0	aco/isel: skip and(exec) for top level demote_if/terminate_if In nested control flow this is necessary to not demote/terminate invocations that are part of the global but not part of the local mask. At the top level, the masks are the same and no additional invocations can be accidentally disabled. Foz-DB Navi21: Totals from 2095 (2.64% of 79395) affected shaders: Instrs: 1058326 -> 1056839 (-0.14%) CodeSize: 5632480 -> 5626616 (-0.10%) Latency: 12082761 -> 12080520 (-0.02%); split: -0.02%, +0.00% InvThroughput: 2246677 -> 2246636 (-0.00%); split: -0.00%, +0.00% Copies: 114446 -> 114433 (-0.01%) SALU: 230585 -> 229098 (-0.64%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32755>	2024-12-26 18:34:38 +00:00
Marek Olšák	de996ac481	radeonsi: kill Z and stencil PS outputs if depth or stencil is disabled This adds kill_z and kill_stencil flags to the shader PS epilog key, which removes those outputs if depth or stencil are disabled. It must be implemented in: * ACO PS epilog * LLVM PS epilog * ac_nir_lower_ps for monolithic shaders Some of the samplemask code wasn't completely correct, but probably harmless. Reviewed-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32713>	2024-12-24 12:02:20 +00:00
Qiang Yu	dff14d102d	aco: fix voffset missing when buffer store base >=4096 Regression on test: dEQP-GLES31.functional.geometry_shading.basic.output_256 voffset is missing if buffer store base >=4096, we need to re-calculate offen after resolve_excess_vmem_const_offset(). Fixes: `cdaf269924` ("aco: inline store_vmem_mubuf/emit_single_mubuf_store") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32767>	2024-12-24 01:42:45 +00:00
Marek Olšák	85c20def94	ac,radv,radeonsi: enable TCS input reads from VGPRs for all compatible loads Cross-invocation TCS input access doesn't prevent same-invocation access. This improves shaders that use both for the same inputs. Also, if some components of a vec4 slot only use same-invocation access and other components only use cross-invocation access (it's possible after compaction), this takes the VGPR path for the components with same-invocation access, which didn't happen previously because all masks only describe whole vec4s. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31673>	2024-12-18 11:07:59 +00:00
Samuel Pitoiset	553eb1a3fd	radv: fix alpha-to-coverage with alpha-to-one when MRTZ is also exported On AMD hardware, it's possible to export a separate alpha channel for applying alpha-to-one after alpha-to-coverage and not before. On GFX11+, it's already mostly supported but alpha needs to be exported to MRTZ.a and one to MRT0.a. The hw always uses alpha for alpha-to-coverage from MRTZ.a. On older generations, the driver needs the same separate alpha export but it also needs to configure the hardware with COVERAGE_TO_MASK_ENABLE which selects alpha from MRTZ.a. This should fix alpha-to-coverage with alpha-to-one when either depth, stencil or samplemask are exported but it still needs a slightly different solution without MRTZ. I will fix that later. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32523>	2024-12-11 10:50:31 +00:00
Samuel Pitoiset	70047e6bd6	aco: export alpha to MRTZ.a and one to MRT0.a for alpha-to-one on GFX11 For FS epilogs. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32523>	2024-12-11 10:50:31 +00:00
Georg Lehmann	4a977ea24f	aco/gfx11+: use v_and_b32 to extract local id 0 Foz-DB Navi31: Totals from 2561 (3.23% of 79206) affected shaders: CodeSize: 10399004 -> 10389120 (-0.10%) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32532>	2024-12-10 11:58:21 +00:00

1313 commits