fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-18 22:28:06 +02:00

Author	SHA1	Message	Date
Rhys Perry	c3dd1931d9	aco: allow Builder::Result to be dereferenced Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20251>	2023-01-10 16:01:38 +00:00
Rhys Perry	e386523380	aco/gfx11: fix discard early exit removal optimization This optimization never happened because the NULL target was removed in GFX11. fossil-db (gfx1100): Totals from 5439 (4.04% of 134574) affected shaders: Instrs: 407865 -> 387123 (-5.09%) CodeSize: 2163340 -> 2060644 (-4.75%) Latency: 3432378 -> 3327802 (-3.05%) InvThroughput: 270133 -> 262980 (-2.65%) Branches: 8524 -> 3085 (-63.81%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20513>	2023-01-10 14:01:29 +00:00
Rhys Perry	810ced93f3	aco: align scratch size during assembly This lets us use less scratch if both VGPR spilling and scratch intrinsics are used. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20534>	2023-01-09 21:46:13 +00:00
Rhys Perry	c9846158cd	aco/gfx11: reduce scratch allocation alignment fossil-db (gfx1100): Totals from 112 (0.08% of 134574) affected shaders: Scratch: 1513472 -> 1455360 (-3.84%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20534>	2023-01-09 21:46:13 +00:00
Georg Lehmann	c241980751	aco: Mark more instructions as 16bit on GFX10. p_cvt_f16_f32_rtne will be lowered to v_cvt_f16_f32 and we already know that preserves the high bits. I tested the others on GFX1036. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20574>	2023-01-09 18:54:35 +00:00
Rhys Perry	b64afc1d37	aco: use s_delay_alu skip field fossil-db (gfx1100): Totals from 130066 (96.65% of 134574) affected shaders: Instrs: 80208817 -> 71420648 (-10.96%) CodeSize: 403523036 -> 368370360 (-8.71%) Latency: 658064779 -> 657935384 (-0.02%); split: -0.02%, +0.00% InvThroughput: 87698268 -> 87693326 (-0.01%); split: -0.01%, +0.00% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20512>	2023-01-09 18:22:59 +00:00
Rhys Perry	e2f083c0a7	aco: add more dependency instructions under waitcnt class This makes these instructions free when considering pipeline statistics and s_delay_alu insertion. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20512>	2023-01-09 18:22:59 +00:00
Rhys Perry	c8357136d4	aco: improve parse_delay_alu Use gpr_map to determine how many cycles each dependency of the s_delay_alu needs. This information helps the pass avoid further s_delay_alu instructions. fossil-db (gfx1100): Totals from 13097 (9.73% of 134574) affected shaders: Instrs: 30711894 -> 30702692 (-0.03%) CodeSize: 153462500 -> 153425692 (-0.02%) Latency: 372758612 -> 372741922 (-0.00%) InvThroughput: 50164111 -> 50160717 (-0.01%); split: -0.01%, +0.00% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20512>	2023-01-09 18:22:59 +00:00
Rhys Perry	9e55b3b790	aco/gfx11: update s_code_end padding Match ac_rtld_open(). Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Cc: 22.3 <mesa-stable> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20536>	2023-01-06 16:09:51 +00:00
Georg Lehmann	39b7502f04	aco: Use v_mov_b16 on GFX11. Foz-DB GFX1100: Totals from 4684 (3.47% of 134913) affected shaders: CodeSize: 41086444 -> 41043476 (-0.10%) Instrs: 8176019 -> 8175995 (-0.00%) Latency: 83792071 -> 83792023 (-0.00%) InvThroughput: 10311371 -> 10311369 (-0.00%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20369>	2023-01-03 22:49:46 +00:00
Daniel Schürmann	83b31b11a5	aco: Reassign dead definitions of p_split_vector to associated register Any unused split_vector definition can always use the same register as the operand. This avoids creating unnecessary copies. Fossil DB stats on Rembrandt (RDNA2): Totals from 3904 (2.89% of 134906) affected shaders: CodeSize: 18326692 -> 18271688 (-0.30%) Instrs: 3386632 -> 3372888 (-0.41%) Latency: 42337481 -> 42330085 (-0.02%); split: -0.02%, +0.00% InvThroughput: 6566731 -> 6566424 (-0.00%); split: -0.01%, +0.00% Copies: 224301 -> 210559 (-6.13%) Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>	2023-01-01 15:04:07 +01:00
Timur Kristóf	75b1027722	aco: Try to reassign split vector registers post-RA. Eliminate unnecessary copies when the operand registers of a p_split_vector instruction are not clobbered between the p_split_vector and the user of its definitions. This happens when p_split_vector doesn't kill its operand and its definitions have a shorter lifespan than the operand. It affects every NGG culling shader among other things. This optimization exists because it's too difficult to solve it in RA, and should be removed after we solved this in RA. v2 by Daniel Schürmann: - Rearrange and simplify conditions for the new optimization - Fix a few bugs v3 by Daniel Schürmann: - Check number of encoded ALU operands Fossil DB stats on Rembrandt (RDNA2): Totals from 64896 (48.10% of 134906) affected shaders: CodeSize: 175693348 -> 175434944 (-0.15%) Instrs: 33333912 -> 33269388 (-0.19%) Latency: 183766084 -> 183763432 (-0.00%); split: -0.00%, +0.00% InvThroughput: 28589651 -> 28589340 (-0.00%); split: -0.00%, +0.00% Copies: 2806550 -> 2742038 (-2.30%) Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>	2023-01-01 15:04:07 +01:00
Timur Kristóf	3d29779a25	aco/optimizer_postRA: Distinguish overwritten untrackable and subdword. This allows is_overwritten_since to return false when the last writer instruction of a register can't be tracked but we know it wasn't written in the current block. Fossil DB stats on Rembrandt (RDNA2): Totals from 1163 (0.86% of 134906) affected shaders: CodeSize: 9815920 -> 9805016 (-0.11%) Instrs: 1843688 -> 1840962 (-0.15%) Latency: 19219153 -> 19209171 (-0.05%) InvThroughput: 3354375 -> 3353852 (-0.02%) Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>	2023-01-01 15:04:07 +01:00
Daniel Schürmann	d3b0f78110	aco/optimizer_postRA: Initialize loop header with preheader information This works because of SSA and should be safer than just setting 'not_written_yet'. No Fossil DB changes on Rembrandt (RDNA2). Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>	2023-01-01 15:03:57 +01:00
Daniel Schürmann	8f4eccb138	aco: fix reset_block_regs() in postRA-optimizer Accidentally, we picked the index of the predecessors instead of the predecessors. Totals from 8496 (6.30% of 134913) affected shaders: (GFX10.3) CodeSize: 64070724 -> 64022516 (-0.08%); split: -0.08%, +0.00% Instrs: 11932750 -> 11920698 (-0.10%); split: -0.10%, +0.00% Latency: 144040266 -> 144017062 (-0.02%); split: -0.02%, +0.00% InvThroughput: 29327735 -> 29326421 (-0.00%); split: -0.00%, +0.00% Fossil DB stats on Rembrandt (RDNA2): Totals from 4488 (3.33% of 134906) affected shaders: CodeSize: 42759736 -> 42735392 (-0.06%); split: -0.06%, +0.00% Instrs: 7960522 -> 7954436 (-0.08%); split: -0.08%, +0.00% Latency: 96192647 -> 96172571 (-0.02%); split: -0.02%, +0.00% InvThroughput: 19313576 -> 19312575 (-0.01%); split: -0.01%, +0.00% Fixes: `75967a4814` ('aco/optimizer_postRA: Speed up reset_block() with predecessors.') Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>	2023-01-01 15:03:51 +01:00
Rhys Perry	98e83f19f9	aco/gfx11: implement load_input_vertex Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20341>	2022-12-16 17:45:34 +00:00
Rhys Perry	192486b7aa	aco/gfx11: export mrtz in discard early exit for non-color shaders If a shader doesn't export any color targets and instead only exports mrtz, the discard early exit block should match. Fixes artifacts on Lara in Rise of the Tomb Raider benchmark and hair in The Witcher 3 (classic). https://reviews.llvm.org/D128185 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Fixes: `bc8da20dda` ("aco: export MRT0 instead of NULL on GFX11") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20345>	2022-12-16 15:35:28 +00:00
Timur Kristóf	db5c3f170f	aco: Emulate Wave64 bpermute on GFX11. Similar to emit_gfx10_wave64_bpermute, but uses the new v_permlane64_b32 instruction to swap data between wave halves. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>	2022-12-14 13:54:04 +00:00
Timur Kristóf	853e76f007	aco: Stylistic changes to emit_gfx10_wave64_bpermute. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>	2022-12-14 13:54:04 +00:00
Timur Kristóf	640e801651	aco: Split opcodes for GFX6 and GFX10 emulated bpermute. Different sequences are emitted for these, so it makes sense to have different opcodes too. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>	2022-12-14 13:54:04 +00:00
Timur Kristóf	614348f28b	aco: Don't accept constants on p_bpermute. The sequence emitted for this pseudo instruction is not ready to handle constants or literals at all. Cc: mesa-stable Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>	2022-12-14 13:54:04 +00:00
Ian Romanick	eb76cee9f8	nir: Eliminate nir_op_i2b There are a lot of optimizations in opt_algebraic that match ('ine', a, 0), but there are almost none that match i2b. Instead of adding a huge pile of additional patterns (including variations that include both ine and i2b), always lower i2b to a != 0. At this point in the series, it should be impossible for anything to generate i2b, so there /should not/ be any changes. The failing test on d3d12 is a pre-existing bug that is triggered by this change. I talked to Jesse about it, and, after some analysis, he suggested just adding it to the list of known failures. v2: Don't rematerialize i2b instructions in dxil_nir_lower_x2b. v3: Don't rematerialize i2b instructions in zink_nir_algebraic.py. v4: Fix zink-on-TGL CI failures by calling nir_opt_algebraic after nir_lower_doubles makes progress. The latter can generate b2i instructions, but nir_lower_int64 can't handle them (anymore). v5: Add back most of the hunk at line 2125 of nir_opt_algebraic.py. I had accidentally removed the f2b(b2f(x)) optimization. v6: Just eliminate the i2b instruction. v7: Remove missed i2b32 in midgard_compile.c. Remove (now unused) emit_alu_i2orf2_b1 function from sfn_instr_alu.cpp. Previously this function was still used. 🤷 No shader-db changes on any Intel platform. All Intel platforms had similar results. (Ice Lake shown) Instructions in all programs: 141165875 -> 141165873 (-0.0%) Instructions helped: 2 Cycles in all programs: 9098956382 -> 9098956350 (-0.0%) Cycles helped: 2 The two Vulkan shaders are helped because of the "new" (('b2i32', ('ine', ('ubfe', a, b, 1), 0)), ('ubfe', a, b, 1)) algebraic pattern. Acked-by: Jesse Natalie <jenatali@microsoft.com> [earlier version] Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Tested-by: Daniel Schürmann <daniel@schuermann.dev> [earlier version] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>	2022-12-14 06:23:21 +00:00
Marek Olšák	716ac4a55d	nir: replace IS_SWIZZLED flag with ACCESS_IS_SWIZZLED_AMD Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19422>	2022-12-13 20:33:05 +00:00
Marek Olšák	7998c3bdd3	nir: remove redundant SLC_AMD in favor of ACCESS_STREAM_CACHE_POLICY ACCESS_STREAM_CACHE_POLICY was added to map to SLC for AMD. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19422>	2022-12-13 20:33:05 +00:00
Rhys Perry	20e670d060	aco/ra: don't swap create_vector operand with definition blocker for SGPRs There is no SGPR swap instruction, we always need 3 XORs. fossil-db (navi21): Totals from 76 (0.06% of 135636) affected shaders: Instrs: 58400 -> 58347 (-0.09%); split: -0.10%, +0.01% CodeSize: 312580 -> 312368 (-0.07%); split: -0.08%, +0.01% Latency: 843333 -> 843180 (-0.02%); split: -0.02%, +0.00% InvThroughput: 126431 -> 126412 (-0.02%) Copies: 4008 -> 3955 (-1.32%); split: -1.47%, +0.15% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20240>	2022-12-09 15:58:43 +00:00
Rhys Perry	a05dd58309	aco/ra: don't swap p_create_vector operand with definition blocker for scc SCC is 1-bit, and we can't copy a 32-bit value into it. Fixes dEQP-VK.spirv_assembly.type.scalar.i32.iequal_tesse with ACO_DEBUG=noopt. No fossil-db changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Fixes: `9476986e6f` ("aco/ra: special-case get_reg_for_create_vector_copy()") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20240>	2022-12-09 15:58:43 +00:00
Samuel Pitoiset	011a0b97b2	radv,aco: move radv_ps_epilog_key to the graphics pipeline key To avoid redundant structs. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20199>	2022-12-08 13:28:00 +00:00
Samuel Pitoiset	b7f49de625	radv,aco: use 8-bit for color_is_int{8,10} everywhere Do not need 32-bits because there is only up to 8 MRTs. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20199>	2022-12-08 13:28:00 +00:00
Samuel Pitoiset	9079bd821c	radv,aco: rename color output related fields for consistency Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20199>	2022-12-08 13:28:00 +00:00
Tatsuyuki Ishi	327c906424	aco: Migrate RA to use std::optional The use of std::optional simplifies expressions and would be useful for some upcoming RA tweaks. C++17 has been available since the merge of rusticl and should be safe to use as far as packaging is concerned. A few style choices are: - Testing for emptiness uses implicit bool conversion. - Constructing an empty value uses {}. - Constructing a filled value uses the implicit conversion constructor. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20125>	2022-12-08 12:08:01 +00:00
Bas Nieuwenhuizen	513442dc32	aco: Add s_delay_alu support for GFX11+ Roughly copied from LLVM. This facilitates better ALU usage by switching between waves when there is an ALU stall, which isn't automatic anymore on GFX11. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19743>	2022-12-07 22:05:25 +00:00
Bas Nieuwenhuizen	cd3bf56ace	aco: Add helper to get cycle info for an instruction. For use in s_delay_alu tracking Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19743>	2022-12-07 22:05:25 +00:00
Bas Nieuwenhuizen	352e492c7b	aco: Add isTrans helper. For the s_delay_alu tracking. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19743>	2022-12-07 22:05:25 +00:00
Rhys Perry	bd30adf89d	aco: apply NUW to additions for scratch access fossil-db (navi21): Totals from 52 (0.04% of 135636) affected shaders: Instrs: 79036 -> 78567 (-0.59%) CodeSize: 431188 -> 427984 (-0.74%) Latency: 1318142 -> 1313821 (-0.33%) InvThroughput: 293842 -> 292836 (-0.34%) VClause: 2555 -> 2361 (-7.59%); split: -8.06%, +0.47% Copies: 8746 -> 8767 (+0.24%); split: -0.11%, +0.35% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20117>	2022-12-06 15:23:38 +00:00
Rhys Perry	381de3c809	aco: more carefully apply constant offsets into scratch accesses Death stranding does scratch_arr[80-idx]. This doesn't seem to work if we try to combine the subtraction into the access. fossil-db (navi21): Totals from 52 (0.04% of 135636) affected shaders: Instrs: 78560 -> 79036 (+0.61%) CodeSize: 427940 -> 431188 (+0.76%) Latency: 1313809 -> 1318142 (+0.33%) InvThroughput: 292833 -> 293842 (+0.34%) VClause: 2361 -> 2555 (+8.22%); split: -0.51%, +8.73% Copies: 8767 -> 8746 (-0.24%); split: -0.35%, +0.11% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Fixes: `0e783d687a` ("aco: use scratch_* for scratch load/store on GFX9+") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7735 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20117>	2022-12-06 15:23:38 +00:00
Samuel Pitoiset	da32cbb5c6	aco: fix missing uses of MRT output flags Fixes regressions on GFX6 and the RAGE2 workaround. Fixes: `a297ac10a4` ("radv,aco: stop lowering FS outputs in NIR") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20154>	2022-12-05 15:01:19 +00:00
Samuel Pitoiset	a297ac10a4	radv,aco: stop lowering FS outputs in NIR This was a bad idea because: - it diverges too much with the fragment shader epilog - it doesn't allow to implement alpha-to-coverage via MRTZ correctly - it was supposed to be used by LLVM but this never happened Reverting this back allows us to fix alpha-to-coverage via MRTZ on GFX11 easily, including for fragment shader epilogs. fossils-db (NAVI21): Totals from 20411 (15.13% of 134913) affected shaders: VGPRs: 972056 -> 971400 (-0.07%); split: -0.08%, +0.01% CodeSize: 92284804 -> 92295392 (+0.01%); split: -0.05%, +0.06% MaxWaves: 465010 -> 465166 (+0.03%); split: +0.03%, -0.00% Instrs: 17034162 -> 17034963 (+0.00%); split: -0.00%, +0.01% Latency: 252013190 -> 251971764 (-0.02%); split: -0.03%, +0.02% InvThroughput: 45859625 -> 45842556 (-0.04%); split: -0.04%, +0.01% VClause: 324627 -> 324629 (+0.00%); split: -0.03%, +0.03% SClause: 672918 -> 672826 (-0.01%); split: -0.05%, +0.04% Copies: 1172126 -> 1158152 (-1.19%); split: -1.20%, +0.01% Branches: 420602 -> 420604 (+0.00%); split: -0.00%, +0.00% PreSGPRs: 1025441 -> 1025481 (+0.00%) PreVGPRs: 861787 -> 860650 (-0.13%); split: -0.17%, +0.03% Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20126>	2022-12-05 08:22:28 +00:00
Samuel Pitoiset	3be728f1d0	aco: fix indexing MRT0 alpha channel for alpha-to-coverage via MRTZ on GFX11 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20126>	2022-12-05 08:22:28 +00:00
Samuel Pitoiset	20856bfe0f	aco: always use 32-bit for exporting alpha-to-coverage via MRTZ on GFX11 16-bit isn't possible. Note that this is currently still broken for compressed formats because the w channel is never written to. Ported from RadeonSI ('radeonsi/gfx11: fix alpha-to-coverage with stencil or samplemask export') Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20126>	2022-12-05 08:22:28 +00:00
Bas Nieuwenhuizen	89663828ea	aco: Don't use v_lshrrev_b64 for moves on GFX11. Looking at VOPD things, shifts are not very likely to get dual issued but plain moves are. Looking at RDNA2 v_lshrrev_b64 are half the perf of v_mov_b32 (but you need twice as many moves), so on GFX11 this likely reaches the threshold where moves are faster. Totals from 68400 (50.70% of 134906) affected shaders: CodeSize: 275489516 -> 275459536 (-0.01%); split: -0.01%, +0.00% Instrs: 51775474 -> 51991286 (+0.42%) Latency: 589884847 -> 589066439 (-0.14%); split: -0.15%, +0.01% InvThroughput: 127154986 -> 126037619 (-0.88%); split: -0.88%, +0.00% Copies: 3756157 -> 3976193 (+5.86%) Branches: 1259604 -> 1260072 (+0.04%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19633>	2022-12-02 13:25:57 +00:00
Bas Nieuwenhuizen	91fe2a2361	aco: Use more detailed wave64 timing for GFX10+. Also nabbed some dual issue stuff for GFX11 from LLVM. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19633>	2022-12-02 13:25:57 +00:00
Rhys Perry	9b6ab40b3b	aco: improve do_pack_2x16() with zero constants We can skip the v_or_b32 or use an instruction smaller than v_alignbyte_b32. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19933>	2022-12-01 21:43:28 +00:00
Rhys Perry	917cfd587c	aco: use v_minmax/v_maxmin opcodes fossil-db (gfx1100): Totals from 29868 (22.12% of 135032) affected shaders: MaxWaves: 741336 -> 741344 (+0.00%) Instrs: 34624902 -> 34539766 (-0.25%); split: -0.25%, +0.00% CodeSize: 187196804 -> 187192100 (-0.00%); split: -0.01%, +0.01% VGPRs: 1816860 -> 1816788 (-0.00%); split: -0.01%, +0.01% Latency: 502597202 -> 502245627 (-0.07%); split: -0.08%, +0.01% InvThroughput: 84813176 -> 84586122 (-0.27%); split: -0.28%, +0.01% VClause: 633826 -> 633749 (-0.01%); split: -0.02%, +0.01% SClause: 1317738 -> 1317047 (-0.05%); split: -0.06%, +0.01% Copies: 2130610 -> 2130954 (+0.02%); split: -0.03%, +0.05% Branches: 766093 -> 765969 (-0.02%); split: -0.02%, +0.00% PreSGPRs: 1630250 -> 1630034 (-0.01%); split: -0.02%, +0.00% PreVGPRs: 1590777 -> 1590664 (-0.01%); split: -0.01%, +0.00% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19933>	2022-12-01 21:43:28 +00:00
Rhys Perry	dfbc8e0192	aco: change order in combine_minmax() Prepare for future optimizations. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19933>	2022-12-01 21:43:28 +00:00
Rhys Perry	ce5838599d	aco/gfx11: use v_cvt_i32_i16/v_cvt_u32_u16 fossil-db (gfx1100): Totals from 52753 (39.07% of 135032) affected shaders: CodeSize: 153603860 -> 153163384 (-0.29%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19933>	2022-12-01 21:43:28 +00:00
Georg Lehmann	a3beb82cf6	aco: Use wave size specific opcode for s_or in cube map coord code. Cc: mesa-stable Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20041>	2022-12-01 01:39:27 +00:00
Georg Lehmann	22be0d09a0	aco: Don't prematurely emit s_andn2. Split s_not + s_and allows more inverse comparison and s_cbranch_vccz optimizations. Foz-DB Navi21: Totals from 516 (0.38% of 134913) affected shaders: CodeSize: 7273724 -> 7273720 (-0.00%) Instrs: 1364408 -> 1364407 (-0.00%) Latency: 14604862 -> 14604858 (-0.00%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19143>	2022-11-30 18:25:15 +00:00
Rhys Perry	0cb48ec3b7	radv,aco: remove old streamout code Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18898>	2022-11-29 14:28:11 +00:00
Rhys Perry	3a96977542	radv,aco: remove old GS copy shader code Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18898>	2022-11-29 14:28:11 +00:00
Rhys Perry	17bd2721e6	radv,aco: implement GS copy shaders using NIR Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18898>	2022-11-29 14:28:11 +00:00


2571 commits