fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-19 15:48:19 +02:00

Author	SHA1	Message	Date
Rhys Perry	44fdd2ebcb	aco: end reduce tmp after control flow, when used within control flow In the case of: v0 = start_linear_vgpr if (...) { } else { use_linear_vgpr(v0) } v0 = phi We need a p_end_linear_vgpr to ensure that the phi does not use the same VGPR as the linear VGPR. fossil-db (gfx1100): Totals from 3763 (2.80% of 134574) affected shaders: MaxWaves: 90296 -> 90164 (-0.15%) Instrs: 6857726 -> 6856608 (-0.02%); split: -0.03%, +0.01% CodeSize: 35382188 -> 35377688 (-0.01%); split: -0.02%, +0.01% VGPRs: 234864 -> 235692 (+0.35%); split: -0.01%, +0.36% Latency: 47471923 -> 47474965 (+0.01%); split: -0.03%, +0.04% InvThroughput: 5640320 -> 5639736 (-0.01%); split: -0.04%, +0.03% VClause: 93098 -> 93107 (+0.01%); split: -0.01%, +0.02% SClause: 214137 -> 214130 (-0.00%); split: -0.00%, +0.00% Copies: 369895 -> 369305 (-0.16%); split: -0.31%, +0.15% Branches: 164996 -> 164504 (-0.30%); split: -0.30%, +0.00% PreVGPRs: 210655 -> 211438 (+0.37%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20621>	2023-02-01 15:45:22 +00:00
Rhys Perry	695cf75266	aco: set has_color_exports with GPL Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Fixes: `192486b7aa` ("aco/gfx11: export mrtz in discard early exit for non-color shaders") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20937>	2023-01-27 16:51:56 +00:00
Erik Faye-Lund	d54c8a47c6	meson: avoid using deprecated build_root() method The meson.build_root() method has been deprecated, so let's switch to meson.project_build_root(), which usually means the same thing. The case where it doesn't do the same thing is if Mesa is a subproject to some other project, but in that case I believe we want the build root of Mesa, not of the parent project anyway. Reviewed-by: Eric Engestrom <eric@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20907>	2023-01-27 11:35:50 +00:00
Timur Kristóf	c644461b71	radv, aco, ac: Implement pack_half_2x16_rtz_split. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15838>	2023-01-26 12:24:24 +00:00
Timur Kristóf	9fc5d8d211	aco: Remove dynamic VS input loads. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20733>	2023-01-26 02:43:11 +00:00
Timur Kristóf	81620fc7b0	aco: Enable constant exec mask based optimization on compute shaders. We know for sure exec is initially -1 when the shader always has full subgroups. Fossil DB stats on GFX11: Totals from 3884 (2.88% of 134913) affected shaders: SpillSGPRs: 1673 -> 1697 (+1.43%); split: -1.67%, +3.11% SpillVGPRs: 2316 -> 2310 (-0.26%); split: -0.65%, +0.39% CodeSize: 19584436 -> 19567156 (-0.09%); split: -0.13%, +0.04% Scratch: 217088 -> 216832 (-0.12%) Instrs: 3784596 -> 3780303 (-0.11%); split: -0.15%, +0.03% Latency: 39971204 -> 39794967 (-0.44%); split: -0.47%, +0.03% InvThroughput: 7885552 -> 7801247 (-1.07%); split: -1.14%, +0.07% VClause: 74654 -> 74611 (-0.06%); split: -0.07%, +0.01% SClause: 103139 -> 103043 (-0.09%); split: -0.13%, +0.04% Copies: 279864 -> 281995 (+0.76%); split: -0.72%, +1.48% Branches: 92082 -> 92084 (+0.00%); split: -0.03%, +0.03% PreSGPRs: 155637 -> 149491 (-3.95%) Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20670>	2023-01-26 01:59:26 +00:00
Timur Kristóf	39448c8e9c	radv, aco: Add uses_full_subgroups to compute shader info. Allow the compiler to assume that the shader always has full subgroups, meaning that the initial EXEC mask is -1 in all waves (all lanes enabled). This assumption is incorrect for ray tracing and internal (meta) shaders because they can use unaligned dispatch. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20670>	2023-01-26 01:59:26 +00:00
Georg Lehmann	e527f686ca	Revert "aco: Combine v_cvt_u32_f32 with insert to v_cvt_pk_u8_f32." This reverts commit `6d02054047`. v_cvt_pk_u8_f32 returns 0xff instead of v_cvt_u32_f32 & 0xff if the input is larger than 255. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8128 Cc: mesa-stable Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20829>	2023-01-23 16:22:55 +00:00
Rhys Perry	26e4621fa2	aco/tests: update assembler tests for latest LLVM 16 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20747>	2023-01-23 12:30:28 +00:00
Rhys Perry	b0fa106dc6	aco/tests: fix assembler.gfx11.vop12c_v128 with LLVM 15 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8089 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20747>	2023-01-23 12:30:28 +00:00
Dylan Baker	3c5e969144	meson: replace uses of ExternalProgram.path with .full_path The former is deprecated Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20409>	2023-01-19 16:29:03 +00:00
Rhys Perry	068c84f275	aco: add support for fp32 addition atomics Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19810>	2023-01-17 17:39:15 +00:00
Timur Kristóf	faba30a8f3	aco/optimizer: Optimize p_extract + v_mul_u32_u24 to v_mad_u32_u16. This should perform the same but removes SDWA from the address calculations in NGG culling shaders for example. This is done because SDWA is no longer available on GFX11. Fossil DB stats on GFX1100: Totals from 36 (0.03% of 134913) affected shaders: CodeSize: 300968 -> 300884 (-0.03%); split: -0.04%, +0.01% Instrs: 60955 -> 60863 (-0.15%); split: -0.15%, +0.00% Latency: 426809 -> 426819 (+0.00%); split: -0.06%, +0.06% InvThroughput: 39076 -> 39025 (-0.13%); split: -0.14%, +0.01% VClause: 1440 -> 1443 (+0.21%) Copies: 5714 -> 5725 (+0.19%) Fossil DB stats on GFX1100 with NGG culling enabled: Totals from 60953 (45.18% of 134913) affected shaders: VGPRs: 2273172 -> 2273160 (-0.00%) CodeSize: 186401864 -> 186403036 (+0.00%); split: -0.00%, +0.00% Instrs: 37038048 -> 36977353 (-0.16%); split: -0.16%, +0.00% Latency: 146466770 -> 146350172 (-0.08%); split: -0.08%, +0.00% InvThroughput: 15342790 -> 15228585 (-0.74%); split: -0.74%, +0.00% VClause: 669662 -> 669665 (+0.00%) Copies: 2972380 -> 2972482 (+0.00%); split: -0.01%, +0.01% Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17924>	2023-01-16 19:27:39 +00:00
Timur Kristóf	171d76ded1	aco/optimizer: Add missing v_lshlrev condition to can_apply_extract. This was already handled by apply_extract but missing from can_apply_extract, therefore may not be properly applied everywhere. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17924>	2023-01-16 19:27:39 +00:00
Rhys Perry	39c214769b	aco: restore semantic_can_reorder for GS output stores Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20296>	2023-01-16 17:25:51 +00:00
Rhys Perry	18d3e4fecd	radv,aco: use ac_nir_lower_legacy_gs Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20296>	2023-01-16 17:25:51 +00:00
Bas Nieuwenhuizen	edca10e9c9	aco: Pass correct number of coords to Vega 1D LOD instruction. If we pass a physical 2D texture descriptor we should also pass 2 coords. Otherwise it just uses the random content in the second register which ends up funny sometimes. Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20696>	2023-01-13 16:55:06 +01:00
Samuel Pitoiset	e11e68b56b	radv,aco: fix enable_mrt_output_nan_fixup for RAGE2 again Driver workarounds for game bugs can be easily broken. This one shouldn't be applied to meta shaders and this restores previous logic. Fixes: `da32cbb5c6` ("aco: fix missing uses of MRT output flags") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20637>	2023-01-11 15:55:32 +00:00
Georg Lehmann	2b28983c5d	aco: Use NSA on GFX11 with more than 5 vaddr registers. On GFX11 the first 4 vaddr are single registers and the last contains the remaining vector. image_bvh64_intersect_ray has a special NSA layout. Foz-DB GFX1100: Totals from 2763 (2.05% of 134913) affected shaders: VGPRs: 145884 -> 145056 (-0.57%); split: -1.03%, +0.46% CodeSize: 18406864 -> 18326136 (-0.44%); split: -0.47%, +0.04% MaxWaves: 76030 -> 76146 (+0.15%) Instrs: 3559785 -> 3525287 (-0.97%); split: -0.97%, +0.00% Latency: 44278460 -> 43303419 (-2.20%); split: -2.33%, +0.13% InvThroughput: 4966295 -> 4914927 (-1.03%); split: -1.04%, +0.01% VClause: 51755 -> 51991 (+0.46%); split: -0.05%, +0.50% SClause: 105241 -> 105267 (+0.02%); split: -0.08%, +0.10% Copies: 214141 -> 182419 (-14.81%); split: -14.82%, +0.01% Branches: 69525 -> 69521 (-0.01%) PreVGPRs: 120910 -> 120256 (-0.54%); split: -0.56%, +0.02% No changes on Navi21. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20370>	2023-01-11 00:00:38 +00:00
Georg Lehmann	9538d523b6	aco: Validate GFX11 NSA correctly. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20370>	2023-01-11 00:00:38 +00:00
Georg Lehmann	9abe4850ba	aco: Handle NSA with vectors in get_mimg_nsa_dwords. No Foz-DB changes. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20370>	2023-01-11 00:00:38 +00:00
Rhys Perry	b1e59646de	aco/gfx11: increase vgpr_limit to 256 fossil-db (gfx1100): Totals from 280 (0.21% of 134574) affected shaders: MaxWaves: 3124 -> 2846 (-8.90%); split: +3.46%, -12.36% Instrs: 1139038 -> 1091407 (-4.18%); split: -4.18%, +0.00% CodeSize: 5809332 -> 5486812 (-5.55%); split: -5.55%, +0.00% VGPRs: 35004 -> 42864 (+22.45%); split: -1.85%, +24.31% SpillSGPRs: 1896 -> 1865 (-1.64%); split: -2.37%, +0.74% SpillVGPRs: 17807 -> 2382 (-86.62%) Scratch: 2573312 -> 736256 (-71.39%) Latency: 27470485 -> 17981296 (-34.54%); split: -34.54%, +0.00% InvThroughput: 5606102 -> 6527051 (+16.43%); split: -4.19%, +20.61% VClause: 32319 -> 19927 (-38.34%); split: -39.13%, +0.78% SClause: 15014 -> 14897 (-0.78%); split: -0.95%, +0.17% Copies: 102977 -> 93511 (-9.19%); split: -9.93%, +0.74% Branches: 15164 -> 14969 (-1.29%) PreSGPRs: 19132 -> 19014 (-0.62%) PreVGPRs: 30494 -> 37460 (+22.84%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20251>	2023-01-10 16:01:38 +00:00
Rhys Perry	6872f8d861	aco/gfx11: allow true 16-bit instructions to access v128+ It looks like the LLVM assembler promotes true 16-bit instructions to VOP3 in this case. No fossil-db changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20251>	2023-01-10 16:01:38 +00:00
Rhys Perry	254b178d5b	aco: disallow SGPRS/constants with interpolation instructions https://reviews.llvm.org/D137575 The VINTRP format cannot encode anything except VGPRs. Reading VINTERPInstructions.td, looks like it's the same for GFX11. No fossil-db changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20251>	2023-01-10 16:01:38 +00:00
Rhys Perry	5af891a747	aco: add more opcodes to can_use_DPP() Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20251>	2023-01-10 16:01:38 +00:00
Rhys Perry	c3dd1931d9	aco: allow Builder::Result to be dereferenced Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20251>	2023-01-10 16:01:38 +00:00
Rhys Perry	e386523380	aco/gfx11: fix discard early exit removal optimization This optimization never happened because the NULL target was removed in GFX11. fossil-db (gfx1100): Totals from 5439 (4.04% of 134574) affected shaders: Instrs: 407865 -> 387123 (-5.09%) CodeSize: 2163340 -> 2060644 (-4.75%) Latency: 3432378 -> 3327802 (-3.05%) InvThroughput: 270133 -> 262980 (-2.65%) Branches: 8524 -> 3085 (-63.81%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20513>	2023-01-10 14:01:29 +00:00
Rhys Perry	810ced93f3	aco: align scratch size during assembly This lets us use less scratch if both VGPR spilling and scratch intrinsics are used. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20534>	2023-01-09 21:46:13 +00:00
Rhys Perry	c9846158cd	aco/gfx11: reduce scratch allocation alignment fossil-db (gfx1100): Totals from 112 (0.08% of 134574) affected shaders: Scratch: 1513472 -> 1455360 (-3.84%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20534>	2023-01-09 21:46:13 +00:00
Georg Lehmann	c241980751	aco: Mark more instructions as 16bit on GFX10. p_cvt_f16_f32_rtne will be lowered to v_cvt_f16_f32 and we already know that preserves the high bits. I tested the others on GFX1036. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20574>	2023-01-09 18:54:35 +00:00
Rhys Perry	b64afc1d37	aco: use s_delay_alu skip field fossil-db (gfx1100): Totals from 130066 (96.65% of 134574) affected shaders: Instrs: 80208817 -> 71420648 (-10.96%) CodeSize: 403523036 -> 368370360 (-8.71%) Latency: 658064779 -> 657935384 (-0.02%); split: -0.02%, +0.00% InvThroughput: 87698268 -> 87693326 (-0.01%); split: -0.01%, +0.00% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20512>	2023-01-09 18:22:59 +00:00
Rhys Perry	e2f083c0a7	aco: add more dependency instructions under waitcnt class This makes these instructions free when considering pipeline statistics and s_delay_alu insertion. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20512>	2023-01-09 18:22:59 +00:00
Rhys Perry	c8357136d4	aco: improve parse_delay_alu Use gpr_map to determine how many cycles each dependency of the s_delay_alu needs. This information helps the pass avoid further s_delay_alu instructions. fossil-db (gfx1100): Totals from 13097 (9.73% of 134574) affected shaders: Instrs: 30711894 -> 30702692 (-0.03%) CodeSize: 153462500 -> 153425692 (-0.02%) Latency: 372758612 -> 372741922 (-0.00%) InvThroughput: 50164111 -> 50160717 (-0.01%); split: -0.01%, +0.00% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20512>	2023-01-09 18:22:59 +00:00
Rhys Perry	9e55b3b790	aco/gfx11: update s_code_end padding Match ac_rtld_open(). Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Cc: 22.3 <mesa-stable> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20536>	2023-01-06 16:09:51 +00:00
Georg Lehmann	39b7502f04	aco: Use v_mov_b16 on GFX11. Foz-DB GFX1100: Totals from 4684 (3.47% of 134913) affected shaders: CodeSize: 41086444 -> 41043476 (-0.10%) Instrs: 8176019 -> 8175995 (-0.00%) Latency: 83792071 -> 83792023 (-0.00%) InvThroughput: 10311371 -> 10311369 (-0.00%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20369>	2023-01-03 22:49:46 +00:00
Daniel Schürmann	83b31b11a5	aco: Reassign dead definitions of p_split_vector to associated register Any unused split_vector definition can always use the same register as the operand. This avoids creating unnecessary copies. Fossil DB stats on Rembrandt (RDNA2): Totals from 3904 (2.89% of 134906) affected shaders: CodeSize: 18326692 -> 18271688 (-0.30%) Instrs: 3386632 -> 3372888 (-0.41%) Latency: 42337481 -> 42330085 (-0.02%); split: -0.02%, +0.00% InvThroughput: 6566731 -> 6566424 (-0.00%); split: -0.01%, +0.00% Copies: 224301 -> 210559 (-6.13%) Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>	2023-01-01 15:04:07 +01:00
Timur Kristóf	75b1027722	aco: Try to reassign split vector registers post-RA. Eliminate unnecessary copies when the operand registers of a p_split_vector instruction are not clobbered between the p_split_vector and the user of its definitions. This happens when p_split_vector doesn't kill its operand and its definitions have a shorter lifespan than the operand. It affects every NGG culling shader among other things. This optimization exists because it's too difficult to solve it in RA, and should be removed after we solved this in RA. v2 by Daniel Schürmann: - Rearrange and simplify conditions for the new optimization - Fix a few bugs v3 by Daniel Schürmann: - Check number of encoded ALU operands Fossil DB stats on Rembrandt (RDNA2): Totals from 64896 (48.10% of 134906) affected shaders: CodeSize: 175693348 -> 175434944 (-0.15%) Instrs: 33333912 -> 33269388 (-0.19%) Latency: 183766084 -> 183763432 (-0.00%); split: -0.00%, +0.00% InvThroughput: 28589651 -> 28589340 (-0.00%); split: -0.00%, +0.00% Copies: 2806550 -> 2742038 (-2.30%) Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>	2023-01-01 15:04:07 +01:00
Timur Kristóf	3d29779a25	aco/optimizer_postRA: Distinguish overwritten untrackable and subdword. This allows is_overwritten_since to return false when the last writer instruction of a register can't be tracked but we know it wasn't written in the current block. Fossil DB stats on Rembrandt (RDNA2): Totals from 1163 (0.86% of 134906) affected shaders: CodeSize: 9815920 -> 9805016 (-0.11%) Instrs: 1843688 -> 1840962 (-0.15%) Latency: 19219153 -> 19209171 (-0.05%) InvThroughput: 3354375 -> 3353852 (-0.02%) Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>	2023-01-01 15:04:07 +01:00
Daniel Schürmann	d3b0f78110	aco/optimizer_postRA: Initialize loop header with preheader information This works because of SSA and should be safer than just setting 'not_written_yet'. No Fossil DB changes on Rembrandt (RDNA2). Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>	2023-01-01 15:03:57 +01:00
Daniel Schürmann	8f4eccb138	aco: fix reset_block_regs() in postRA-optimizer Accidentally, we picked the index of the predecessors instead of the predecessors. Totals from 8496 (6.30% of 134913) affected shaders: (GFX10.3) CodeSize: 64070724 -> 64022516 (-0.08%); split: -0.08%, +0.00% Instrs: 11932750 -> 11920698 (-0.10%); split: -0.10%, +0.00% Latency: 144040266 -> 144017062 (-0.02%); split: -0.02%, +0.00% InvThroughput: 29327735 -> 29326421 (-0.00%); split: -0.00%, +0.00% Fossil DB stats on Rembrandt (RDNA2): Totals from 4488 (3.33% of 134906) affected shaders: CodeSize: 42759736 -> 42735392 (-0.06%); split: -0.06%, +0.00% Instrs: 7960522 -> 7954436 (-0.08%); split: -0.08%, +0.00% Latency: 96192647 -> 96172571 (-0.02%); split: -0.02%, +0.00% InvThroughput: 19313576 -> 19312575 (-0.01%); split: -0.01%, +0.00% Fixes: `75967a4814` ('aco/optimizer_postRA: Speed up reset_block() with predecessors.') Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>	2023-01-01 15:03:51 +01:00
Rhys Perry	98e83f19f9	aco/gfx11: implement load_input_vertex Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20341>	2022-12-16 17:45:34 +00:00
Rhys Perry	192486b7aa	aco/gfx11: export mrtz in discard early exit for non-color shaders If a shader doesn't export any color targets and instead only exports mrtz, the discard early exit block should match. Fixes artifacts on Lara in Rise of the Tomb Raider benchmark and hair in The Witcher 3 (classic). https://reviews.llvm.org/D128185 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Fixes: `bc8da20dda` ("aco: export MRT0 instead of NULL on GFX11") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20345>	2022-12-16 15:35:28 +00:00
Timur Kristóf	db5c3f170f	aco: Emulate Wave64 bpermute on GFX11. Similar to emit_gfx10_wave64_bpermute, but uses the new v_permlane64_b32 instruction to swap data between wave halves. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>	2022-12-14 13:54:04 +00:00
Timur Kristóf	853e76f007	aco: Stylistic changes to emit_gfx10_wave64_bpermute. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>	2022-12-14 13:54:04 +00:00
Timur Kristóf	640e801651	aco: Split opcodes for GFX6 and GFX10 emulated bpermute. Different sequences are emitted for these, so it makes sense to have different opcodes too. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>	2022-12-14 13:54:04 +00:00
Timur Kristóf	614348f28b	aco: Don't accept constants on p_bpermute. The sequence emitted for this pseudo instruction is not ready to handle constants or literals at all. Cc: mesa-stable Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>	2022-12-14 13:54:04 +00:00
Ian Romanick	eb76cee9f8	nir: Eliminate nir_op_i2b There are a lot of optimizations in opt_algebraic that match ('ine', a, 0), but there are almost none that match i2b. Instead of adding a huge pile of additional patterns (including variations that include both ine and i2b), always lower i2b to a != 0. At this point in the series, it should be impossible for anything to generate i2b, so there /should not/ be any changes. The failing test on d3d12 is a pre-existing bug that is triggered by this change. I talked to Jesse about it, and, after some analysis, he suggested just adding it to the list of known failures. v2: Don't rematerialize i2b instructions in dxil_nir_lower_x2b. v3: Don't rematerialize i2b instructions in zink_nir_algebraic.py. v4: Fix zink-on-TGL CI failures by calling nir_opt_algebraic after nir_lower_doubles makes progress. The latter can generate b2i instructions, but nir_lower_int64 can't handle them (anymore). v5: Add back most of the hunk at line 2125 of nir_opt_algebraic.py. I had accidentally removed the f2b(bf2(x)) optimization. v6: Just eliminate the i2b instruction. v7: Remove missed i2b32 in midgard_compile.c. Remove (now unused) emit_alu_i2orf2_b1 function from sfn_instr_alu.cpp. Previously this function was still used. 🤷 No shader-db changes on any Intel platform. All Intel platforms had similar results. (Ice Lake shown) Instructions in all programs: 141165875 -> 141165873 (-0.0%) Instructions helped: 2 Cycles in all programs: 9098956382 -> 9098956350 (-0.0%) Cycles helped: 2 The two Vulkan shaders are helped because of the "new" (('b2i32', ('ine', ('ubfe', a, b, 1), 0)), ('ubfe', a, b, 1)) algebraic pattern. Acked-by: Jesse Natalie <jenatali@microsoft.com> [earlier version] Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Tested-by: Daniel Schürmann <daniel@schuermann.dev> [earlier version] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>	2022-12-14 06:23:21 +00:00
Marek Olšák	716ac4a55d	nir: replace IS_SWIZZLED flag with ACCESS_IS_SWIZZLED_AMD Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19422>	2022-12-13 20:33:05 +00:00
Marek Olšák	7998c3bdd3	nir: remove redundant SLC_AMD in favor of ACCESS_STREAM_CACHE_POLICY ACCESS_STREAM_CACHE_POLICY was added to map to SLC for AMD. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19422>	2022-12-13 20:33:05 +00:00
Rhys Perry	20e670d060	aco/ra: don't swap create_vector operand with definition blocker for SGPRs There is no SGPR swap instruction, we always need 3 XORs. fossil-db (navi21): Totals from 76 (0.06% of 135636) affected shaders: Instrs: 58400 -> 58347 (-0.09%); split: -0.10%, +0.01% CodeSize: 312580 -> 312368 (-0.07%); split: -0.08%, +0.01% Latency: 843333 -> 843180 (-0.02%); split: -0.02%, +0.00% InvThroughput: 126431 -> 126412 (-0.02%) Copies: 4008 -> 3955 (-1.32%); split: -1.47%, +0.15% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20240>	2022-12-09 15:58:43 +00:00

2296 commits