From 63659fc15c65c604a39bfd400d3fe6260af65a4b Mon Sep 17 00:00:00 2001
From: Rhys Perry
Date: Wed, 28 Oct 2020 13:32:55 +0000
Subject: [PATCH] radv: use byte/word extract/insert instructions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

ACO doesn't yet combine extract/insert into instructions, but it seems to
already generate fewer instructions because NIR optimizes shift+and to
these instructions. Code size is worse in some cases though because we
have to always use a literal when masking.

fossil-db (Sienna Cichlid):
Totals from 14361 (9.58% of 149839) affected shaders:
VGPRs: 850152 -> 850304 (+0.02%); split: -0.02%, +0.04%
SpillSGPRs: 7979 -> 7989 (+0.13%); split: -0.03%, +0.15%
CodeSize: 88031216 -> 88162520 (+0.15%); split: -0.01%, +0.16%
MaxWaves: 269414 -> 269426 (+0.00%)
Instrs: 16695182 -> 16662852 (-0.19%); split: -0.21%, +0.01%
Latency: 375592693 -> 375544364 (-0.01%); split: -0.04%, +0.03%
InvThroughput: 75627700 -> 75607720 (-0.03%); split: -0.07%, +0.04%

fossil-db (Polaris):
Totals from 13816 (9.13% of 151365) affected shaders:
SGPRs: 984896 -> 982512 (-0.24%); split: -0.29%, +0.05%
VGPRs: 809220 -> 809112 (-0.01%); split: -0.02%, +0.01%
SpillSGPRs: 9181 -> 9185 (+0.04%); split: -0.04%, +0.09%
CodeSize: 82017952 -> 82123484 (+0.13%); split: -0.01%, +0.14%
MaxWaves: 65721 -> 65723 (+0.00%)
Instrs: 16008744 -> 15988007 (-0.13%); split: -0.18%, +0.05%
Latency: 439911623 -> 439869622 (-0.01%); split: -0.04%, +0.03%
InvThroughput: 185898770 -> 185841742 (-0.03%); split: -0.08%, +0.05%

Signed-off-by: Rhys Perry
Reviewed-by: Timur Kristóf
Part-of:
---
 src/amd/vulkan/radv_shader.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
index 083e0ca5620..9c3940e3ffb 100644
--- a/src/amd/vulkan/radv_shader.c
+++ b/src/amd/vulkan/radv_shader.c
@@ -72,10 +72,6 @@ static const struct nir_shader_compiler_options nir_options = {
    .lower_unpack_unorm_2x16 = true,
    .lower_unpack_unorm_4x8 = true,
    .lower_unpack_half_2x16 = true,
-   .lower_extract_byte = true,
-   .lower_extract_word = true,
-   .lower_insert_byte = true,
-   .lower_insert_word = true,
    .lower_ffma16 = true,
    .lower_ffma32 = true,
    .lower_ffma64 = true,
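
Note for readers unfamiliar with the opcodes: with the four lowering flags removed, NIR is free to keep (or form) byte/word extract and insert operations instead of expanding them to shift+and sequences. The sketch below models what those operations are assumed to compute on 32-bit values. The helper names mirror the NIR opcode names but are hypothetical stand-alone C functions written for illustration; they are not code from Mesa or ACO, and the signed variants are omitted.

```c
#include <stdint.h>

/* Illustrative models only (assumption: these match the unsigned NIR
 * opcodes of the same name for 32-bit sources). */

/* Byte `idx` (0..3) of `x`, zero-extended: the shift+and pattern. */
static inline uint32_t extract_u8(uint32_t x, unsigned idx)
{
   return (x >> (idx * 8)) & 0xffu;
}

/* 16-bit word `idx` (0..1) of `x`, zero-extended. */
static inline uint32_t extract_u16(uint32_t x, unsigned idx)
{
   return (x >> (idx * 16)) & 0xffffu;
}

/* Low byte of `x` moved into byte position `idx`; other bits are zero. */
static inline uint32_t insert_u8(uint32_t x, unsigned idx)
{
   return (x & 0xffu) << (idx * 8);
}

/* Low 16-bit word of `x` moved into word position `idx`; other bits are zero. */
static inline uint32_t insert_u16(uint32_t x, unsigned idx)
{
   return (x & 0xffffu) << (idx * 16);
}
```

For example, under these assumptions extract_u8(0xAABBCCDD, 1) yields 0xCC and insert_u16(0x12345678, 1) yields 0x56780000. The mask constants (0xff, 0xffff) are the kind of operand the commit message refers to when it says a literal is always needed when masking.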