fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-24 13:10:10 +01:00

Author	SHA1	Message	Date
Rhys Perry	4e459df0fc	aco/ra: initialize temp_in_scc earlier We need to know if there's a temporary in SCC before the instruction, not after. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Fixes: `93c8ebfa78` ("aco: Initial commit of independent AMD compiler") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10459>	2021-05-17 13:31:07 +00:00
Daniel Schürmann	b960169257	aco/ra: also prevent overflow register for p_create_vector operands Fixes: `d659ce0d6c` ('aco/ra: prevent underflow register for p_create_vector operands') Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10832>	2021-05-17 11:18:25 +00:00
Connor Abbott	a40714abf7	nir/lower_phis_to_scalar: Add "lower_all" option We don't want to have to deal with vector phis in freedreno, because vectors are always split/unsplit around vectorized instructions anyways, and the stated reason for not scalarising them (it hurting coalescing) won't apply to us because we won't be using nir_from_ssa. Add this option so that we don't have to do the equivalent thing while translating from NIR. Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10809>	2021-05-17 09:59:45 +00:00
Daniel Schürmann	d659ce0d6c	aco/ra: prevent underflow register for p_create_vector operands It could happen that we tested negative out-of-range registers for p_create_vector operands resulting in a crash. Fixes: `8962510e38` ('aco/ra: Conservatively refactor get_reg_specified to use PhysRegInterval') Closes: #4697 Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10799>	2021-05-14 17:26:41 +00:00
Tony Wasserka	80ee9d3947	aco/scheduler: Verify register demand invariants in debug mode Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10644>	2021-05-13 15:27:57 +00:00
Tony Wasserka	50ba919d37	aco/scheduler: Fix register demand computation for upwards moves The initial value needs to be taken from the instruction that is being moved over, not the one to be moved. Additionally the parameter of this function was removed because it was misleading. Setting it to any value other than source_idx would cause register_demand to be initialized incorrectly. (Instead, the maximum demand among the covered instructions would need to be determined.) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10644>	2021-05-13 15:27:57 +00:00
Tony Wasserka	c528af1076	aco/scheduler: Fix register demand computation for downwards moves Previously, changes in total_demand_clause were not always propagated to total_demand. For instance, clause moves do not change the local register demand at the end of a clause, yet they may still affect the total maximum. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Fixes: `8235bc6411` ("aco: try to group together VMEM loads of the same resource") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4533 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10644>	2021-05-13 15:27:57 +00:00
Daniel Schürmann	c7d679f0f7	aco: relax validation rules for p_reduce dst RegType By exposing a subgroupSize of 64, reductions with cluster_size 32 in wave32 might be considered divergent, and thus, result in a VGPR. Fixes: dEQP-VK.subgroups.clustered.graphics.subgroupclustered* with wave32 Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10769>	2021-05-13 15:10:24 +00:00
Daniel Schürmann	989e9867a6	aco: fix additional register requirements for spilling It could happen that VGPR spilling without SGPR spilling calculated a negative spills_to_vgpr number and then increasing the VGPR target demand above the limit. Cc: mesa-stable Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10756>	2021-05-12 14:13:24 +00:00
Timur Kristóf	bb127c2130	radv: Use new NIR lowering of NGG GS when ACO is used. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10740>	2021-05-12 13:47:04 +00:00
Timur Kristóf	9732881729	radv: Use new NGG NIR lowering for VS/TES when ACO is used. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10740>	2021-05-12 13:47:04 +00:00
Timur Kristóf	89a76ff786	aco: Implement new NGG specific NIR intrinsics. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10740>	2021-05-12 13:47:04 +00:00
Timur Kristóf	75a002f809	aco: Split ngg_emit_sendmsg_gs_alloc_req from the wave0 check. This allows us to emit the gs_alloc_req independently of the wave ID check, which is what the NIR lowering will need. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10740>	2021-05-12 13:47:04 +00:00
Timur Kristóf	ad8dd39bd3	aco: Fixup the NIR metadata after sanitize_cf_list. sanitize_cf_list can in fact invalidate the dominance metadata, which we need to use eg. nir_unsigned_upper_bound. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10740>	2021-05-12 13:47:04 +00:00
Timur Kristóf	00fd087f0a	aco: Allow workgroup barrier and shared scope for NGG shaders. NGG already needs to use workgroup barriers, but this commit allows them to come from NIR as opposed to just emitting it in ACO instruction selection. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10740>	2021-05-12 13:47:04 +00:00
Rhys Perry	a54f111831	radv,aco: compact vertex buffer descriptors It seems common for there to be holes. fossil-db (GFX10.3, robustBufferAccess enabled): Totals from 33791 (23.10% of 146267) affected shaders: (no statistics changed) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7871>	2021-05-10 12:09:14 +00:00
Rhys Perry	20a0744e22	Revert "radv,aco: don't use MUBUF for multi-channel loads on GFX8 with robustness2" This reverts commit `a8a6b9fb2f`. This is no longer necessary now that we fixup the size when creating the descriptors. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7871>	2021-05-10 12:09:14 +00:00
Rhys Perry	157c6b0f33	radv,aco: use per-attribute vertex descriptors for robustness We have to use a different num_records for each attribute to correctly implement robust buffer access. fossil-db (GFX10.3, robustBufferAccess enabled): Totals from 60059 (41.06% of 146267) affected shaders: VGPRs: 2169040 -> 2169024 (-0.00%); split: -0.02%, +0.02% CodeSize: 79473128 -> 81156016 (+2.12%); split: -0.00%, +2.12% MaxWaves: 1635360 -> 1635258 (-0.01%); split: +0.00%, -0.01% Instrs: 15559040 -> 15793205 (+1.51%); split: -0.01%, +1.52% Latency: 90954792 -> 91308768 (+0.39%); split: -0.30%, +0.69% InvThroughput: 14937873 -> 14958761 (+0.14%); split: -0.04%, +0.18% VClause: 444280 -> 412074 (-7.25%); split: -9.22%, +1.97% SClause: 588545 -> 644141 (+9.45%); split: -0.54%, +9.99% Copies: 1010395 -> 1011232 (+0.08%); split: -0.44%, +0.53% Branches: 274279 -> 274282 (+0.00%); split: -0.00%, +0.00% PreSGPRs: 1431171 -> 1405056 (-1.82%); split: -2.89%, +1.07% PreVGPRs: 1575253 -> 1575259 (+0.00%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7871>	2021-05-10 12:09:14 +00:00
Rhys Perry	dfa38fa0c7	aco: group loads from the same vertex binding into the same clause In the future, we might have vertex attribute loads from the same binding but with different descriptors. Since they will be loading from the same buffer, we should continue grouping them into clauses. No fossil-db changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7871>	2021-05-10 12:09:14 +00:00
Tony Wasserka	741e84f554	aco/spill: Fix improper handling of exec phis The "continue" was placed in the wrong loop, leading to exec being counted as a spilled register when it wasn't. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Fixes: `a56ddca4e8` ('aco: make all exec accesses non-temporaries') Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4533 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10486>	2021-05-03 10:31:07 +00:00
Rhys Perry	ee9b744cb5	radv,aco: use nir_address_format_vec2_index_32bit_offset The vec2 index helps the compiler make use of SMEM's SOFFSET field when loading descriptors. fossil-db (GFX10.3): Totals from 126326 (86.37% of 146267) affected shaders: VGPRs: 4898704 -> 4899088 (+0.01%); split: -0.02%, +0.03% SpillSGPRs: 13490 -> 14404 (+6.78%); split: -1.10%, +7.87% CodeSize: 306442996 -> 302277700 (-1.36%); split: -1.36%, +0.01% MaxWaves: 3277108 -> 3276624 (-0.01%); split: +0.01%, -0.02% Instrs: 58301101 -> 57469370 (-1.43%); split: -1.43%, +0.01% VClause: 1208270 -> 1199264 (-0.75%); split: -1.02%, +0.28% SClause: 2517691 -> 2432744 (-3.37%); split: -3.75%, +0.38% Copies: 3518643 -> 3161097 (-10.16%); split: -10.45%, +0.29% Branches: 1228383 -> 1228254 (-0.01%); split: -0.12%, +0.11% PreSGPRs: 3973880 -> 4031099 (+1.44%); split: -0.19%, +1.63% PreVGPRs: 3831599 -> 3831707 (+0.00%) Cycles: 1785250712 -> 1778222316 (-0.39%); split: -0.42%, +0.03% VMEM: 52873776 -> 50663317 (-4.18%); split: +0.18%, -4.36% SMEM: 8534270 -> 8361666 (-2.02%); split: +1.79%, -3.82% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9523>	2021-04-27 15:56:07 +00:00
Samuel Pitoiset	4c2add8cba	aco: adjust NGG if provoking vertex mode is last Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Tested-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10449>	2021-04-27 07:31:03 +00:00
James Park	1351fcf3c3	amd: Fix warnings around variable sizes Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6162>	2021-04-23 10:37:22 +00:00
Timur Kristóf	74c467d988	aco: Mark VCC clobbered for iadd8 and iadd16 reductions on GFX6-7. On GFX6-7, the 8 and 16-bit integer add reductions use the 32-bit v_add instruction, which clobbers the VCC register. Cc: mesa-stable Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10346>	2021-04-22 11:29:49 +00:00
Rhys Perry	776ba40115	aco: add and use Program::progress This is used when printing the program and to avoid updating register demand during post-RA liveness analysis. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10315>	2021-04-21 11:09:33 +00:00
Rhys Perry	2d36232e62	aco: allow SDWA sels smaller than the operand size p_extract_vector copy-propagation can create byte sels for v2b operands. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10315>	2021-04-21 11:09:33 +00:00
Rhys Perry	655ba1e3a9	aco: don't update register demand during RA validation It isn't intended to be accurate after RA, so num_waves can become zero, breaking the sgpr_limit calculation. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10315>	2021-04-21 11:09:33 +00:00
Rhys Perry	0eaa5dfac0	aco: remove image parameter from get_sampler_desc() We can just check whether tex_instr is NULL instead. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10036>	2021-04-20 17:42:21 +00:00
Rhys Perry	3cbe9894f7	aco: set TRUNC_COORD=0 for nir_texop_tg4 Fixes black squares in Assassin's Creed: Valhalla and rendering of FidelityFX-CACAO demo. fossil-db (sienna cichlid): Totals from 3052 (2.09% of 146267) affected shaders: SpillSGPRs: 8437 -> 8646 (+2.48%) CodeSize: 30993832 -> 31116916 (+0.40%); split: -0.00%, +0.40% Instrs: 5869934 -> 5886783 (+0.29%); split: -0.00%, +0.29% Latency: 250330521 -> 250463770 (+0.05%); split: -0.00%, +0.05% InvThroughput: 59797617 -> 59814584 (+0.03%); split: -0.00%, +0.03% VClause: 92114 -> 92132 (+0.02%) SClause: 197373 -> 197338 (-0.02%); split: -0.02%, +0.01% Copies: 479482 -> 482394 (+0.61%); split: -0.01%, +0.61% Branches: 219629 -> 219635 (+0.00%) PreSGPRs: 248970 -> 249366 (+0.16%) fossil-db (polaris10): Totals from 3050 (2.06% of 147787) affected shaders: SGPRs: 282864 -> 282912 (+0.02%); split: -0.01%, +0.02% VGPRs: 242572 -> 242612 (+0.02%) SpillSGPRs: 10387 -> 10675 (+2.77%) CodeSize: 31872460 -> 31996128 (+0.39%) MaxWaves: 10924 -> 10925 (+0.01%) Instrs: 6222217 -> 6239072 (+0.27%) Latency: 317482545 -> 317773685 (+0.09%); split: -0.00%, +0.09% InvThroughput: 156149624 -> 156242072 (+0.06%); split: -0.00%, +0.06% VClause: 92295 -> 92254 (-0.04%); split: -0.05%, +0.01% SClause: 243342 -> 243321 (-0.01%); split: -0.01%, +0.00% Copies: 678902 -> 681700 (+0.41%); split: -0.00%, +0.41% Branches: 219698 -> 219703 (+0.00%) PreSGPRs: 244251 -> 244644 (+0.16%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Fixes: `58f25098a0` ("radv: Use TRUNC_COORD on samplers") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3110 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10036>	2021-04-20 17:42:21 +00:00
Samuel Pitoiset	9434675d60	aco: fix opquantize2f16 on GFX6-7 Make sure to preserve signed zeroes. Fixes dEQP-VK.spirv_assembly.instruction.compute.opquantize.flush_to_zero on GFX6 (Pitcairn). Untested on GFX7. Fixes: `54a09545ec` ("aco: optimize a*0.0") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10319>	2021-04-19 16:33:37 +00:00
Marek Olšák	ec1ddb976a	amd/registers: rename IMG_FORMAT to GFX10_FORMAT to disambiguate the meaning Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10261>	2021-04-17 02:37:49 +00:00
Marek Olšák	b878444c3a	amd: drop support for LLVM 10 It doesn't support RDNA 2. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10199>	2021-04-16 09:25:19 +00:00
Samuel Pitoiset	936b58378c	amd: drop support for LLVM 8 It doesn't support Navi1x and the removal enables this nice code cleanup. v2: rebase - mareko Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v1) Acked-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10199>	2021-04-16 09:25:19 +00:00
Michel Dänzer	d200f45875	Use explicit break instead of fall-through to break-only case clang generates a warning if there's no explicit break or fall-through annotation. The latter would be kind of silly in this case, and not robust against any future changes turning the fall-through invalid. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10220>	2021-04-15 16:01:22 +00:00
Michel Dänzer	2928c21eb7	Convert most remaining free-form fall-through comments to FALLTHROUGH One exception is src/amd/addrlib/, for which -Wimplicit-fallthrough is explicitly disabled. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Reviewed-by: Gert Wollny <gert.wollny@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10220>	2021-04-15 16:01:22 +00:00
Rhys Perry	5b8a4516e6	aco/ra: remove live-in temporary from live_out_per_block when moving it Otherwise, handle_loop_phis() might pass it to handle_live_in() and then we could have two phis for this variable. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Fixes: `7c64623e94` ("aco/ra: refactor SSA repairing during register allocation") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10236>	2021-04-14 19:04:08 +00:00
Rhys Perry	11fde1247c	aco/ra: use original names when renaming loop carried phi operands Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Fixes: `7c64623e94` ("aco/ra: refactor SSA repairing during register allocation") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10236>	2021-04-14 19:04:08 +00:00
Timur Kristóf	f3e004cb56	aco: Add a simple heuristic to decide early or late primitive export. Late export is theoretically better if used with LATE_ALLOC, but in practice, the early export has an advantage of lower register usage, therefore more concurrent waves. The idea of this commit is that "small" shaders benefit from early primitive export more, due to being able to launch much more waves. Let's consider a NIR shader "small" when it has only 1 block. This yields both better performance, and better stats, than always using late export. Fossil DB on Sienna: Totals from 12807 (8.76% of 146265) affected shaders: VGPRs: 609128 -> 620216 (+1.82%); split: -0.01%, +1.83% SpillSGPRs: 1458 -> 1538 (+5.49%) CodeSize: 37028204 -> 37019320 (-0.02%); split: -0.17%, +0.14% MaxWaves: 282902 -> 278516 (-1.55%) Instrs: 7163142 -> 7162925 (-0.00%); split: -0.18%, +0.18% VClause: 169285 -> 169547 (+0.15%); split: -1.15%, +1.30% SClause: 267373 -> 267151 (-0.08%); split: -0.24%, +0.16% Copies: 446442 -> 444567 (-0.42%); split: -2.68%, +2.26% Branches: 156245 -> 156195 (-0.03%); split: -0.30%, +0.26% PreSGPRs: 434701 -> 447396 (+2.92%) PreVGPRs: 527783 -> 540527 (+2.41%) Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10106>	2021-04-14 14:25:10 +00:00
Timur Kristóf	5dbab03a80	aco: Emit fewer branches for NGG VS/TES with late primitive export. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10106>	2021-04-14 14:25:10 +00:00
Timur Kristóf	af7d5f5b86	aco: Set block_kind_export_end in create_vs/fs_exports. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10106>	2021-04-14 14:25:10 +00:00
Timur Kristóf	2b312a4fd7	aco: Extract ngg_nogs_export_prim_id to a separate function. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10106>	2021-04-14 14:25:10 +00:00
Timur Kristóf	231ef14b3d	aco: Use s_setprio 3 at the beginning of every VS and TES. The user-set priority of shaders matters very little, but we hope this might still help speed up VS input loads especially. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10106>	2021-04-14 14:25:10 +00:00
Timur Kristóf	4c86c7aa15	aco: Remove useless s_setprio near gs_alloc_req. We learned that the gs_alloc_req is not actually when the export space allocation happens. So it makes no sense to prioritize it. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10106>	2021-04-14 14:25:10 +00:00
Timur Kristóf	75cd43741a	aco: Align NGG scratch size to 16 so a single ds_read can always read it. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10155>	2021-04-14 14:05:24 +00:00
Timur Kristóf	c1346e5c22	aco: Optimize workgroup exclusive scan to better avoid bank conflicts. Previously, every wave had multiple active lanes read the LDS, and the data was processed by VALU DPP instructions. Now, only the first lane reads the LDS in order to avoid bank conflicts, and the results are processed by SALU. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10155>	2021-04-14 14:05:24 +00:00
Daniel Schürmann	b6a28aaa8b	aco/cssa: don't create parallelcopies for constants and exec if we are able to spill these directly. Totals from 4913 (3.60% of 136546) affected shaders (Raven): SpillSGPRs: 16021 -> 15451 (-3.56%); split: -3.87%, +0.31% CodeSize: 58102020 -> 57371464 (-1.26%); split: -1.26%, +0.00% Instrs: 11411454 -> 11230105 (-1.59%); split: -1.59%, +0.00% Latency: 555706331 -> 550058635 (-1.02%); split: -1.07%, +0.05% InvThroughput: 273023354 -> 271854469 (-0.43%); split: -0.44%, +0.01% SClause: 385168 -> 385371 (+0.05%); split: -0.01%, +0.06% Copies: 1342084 -> 1175762 (-12.39%); split: -12.40%, +0.01% Branches: 392619 -> 378662 (-3.55%); split: -3.56%, +0.00% Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9196>	2021-04-13 18:40:57 +00:00
Daniel Schürmann	18ba93e673	aco/cssa: rewrite lower_to_cssa pass The previous pass was based on misconceptions and rounded up with bug fixes. The new pass is entirely rewritten and basically just one-to-one from the paper: "Revisiting Out-of-SSA Translation for Correctness, CodeQuality, and Efficiency" by B. Boissinot et al. It also incorporates the value-equality testing. The regressions are mainly due to creating parallelcopies for exec phis at loop headers (mitigated in the next commit). Totals from 4933 (3.61% of 136546) affected shaders (Raven): SpillSGPRs: 16249 -> 16527 (+1.71%); split: -0.28%, +1.99% SpillVGPRs: 1771 -> 1595 (-9.94%) CodeSize: 57544436 -> 58280304 (+1.28%); split: -0.00%, +1.28% Scratch: 176128 -> 179200 (+1.74%) Instrs: 11265783 -> 11445884 (+1.60%); split: -0.00%, +1.60% Latency: 552596156 -> 555880540 (+0.59%); split: -0.53%, +1.13% InvThroughput: 271431862 -> 273097423 (+0.61%); split: -0.18%, +0.79% VClause: 160240 -> 160241 (+0.00%); split: -0.02%, +0.02% SClause: 386863 -> 386685 (-0.05%); split: -0.07%, +0.02% Copies: 1180801 -> 1345633 (+13.96%); split: -0.02%, +13.98% Branches: 379129 -> 393052 (+3.67%); split: -0.01%, +3.69% Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9196>	2021-04-13 18:40:57 +00:00
Daniel Schürmann	9d73a4a412	aco: add new reindex_ssa() pass Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9196>	2021-04-13 18:40:57 +00:00
Daniel Schürmann	d75c73e6a6	aco: fix kill flags on phi operands Fossil-db changes are likely due to how the CSSA pass works. Totals from 1782 (1.31% of 136546) affected shaders (Raven): CodeSize: 25333292 -> 25294020 (-0.16%); split: -0.16%, +0.00% Instrs: 4916059 -> 4908218 (-0.16%); split: -0.16%, +0.00% Latency: 282860167 -> 282707176 (-0.05%); split: -0.08%, +0.03% InvThroughput: 136487564 -> 136394958 (-0.07%); split: -0.12%, +0.05% VClause: 74791 -> 74795 (+0.01%) Copies: 542115 -> 534280 (-1.45%); split: -1.48%, +0.04% Branches: 168977 -> 168966 (-0.01%); split: -0.01%, +0.01% Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9196>	2021-04-13 18:40:57 +00:00
Daniel Schürmann	13e4fed01f	aco: lower p_spill with constants correctly Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9196>	2021-04-13 18:40:57 +00:00

1462 commits