From 92e59d71cd21fdccc05d6fcf93036bd6a343b4c2 Mon Sep 17 00:00:00 2001 From: Kenneth Graunke Date: Tue, 9 Jul 2024 03:44:26 -0700 Subject: [PATCH] intel/nir: Don't needlessly split u2f16 for nir_type_uint32 Commit f695a9fed2b0 moved the 64-bit float <-> 16-bit float conversion splitting into a core NIR pass, so the code remaining here is only needed for 64-bit integer types. Presumably in an attempt to remove the float handling, it replaced simple bit_size == 64 checks with this expression: (full_type & (nir_type_int64 | nir_type_uint64)) I believe that the intended expression was: (full_type == nir_type_int64 || full_type == nir_type_uint64) Unfortunately, the former is incorrect. Any integer or unsigned NIR type would trigger the former expression. For example: nir_type_uint32 & (nir_type_int64 | nir_type_uint64) => nir_type_uint This meant that we were splitting e.g. u2f16 on 32-bit unsigned types into u2f32 and f2f16, when we can easily natively handle that case. To fix this, we go back to simple bit_size == 64 checks. This pass is already run after nir_lower_fp16_casts which will split the float case, so we will never see it here. fossil-db on Alchemist shows a -1.14% reduction in affected shaders for google-meet-clvk shaders. In another ChromeOS workload, it improves performance by around 8% on Meteorlake. Thanks to Sushma Venkatesh Reddy for finding this performance issue! Fixes: f695a9fed2b0 ("intel/compiler: use nir_lower_fp16_casts") Reviewed-by: Ian Romanick Reviewed-by: Ivan Briano Part-of: (cherry picked from commit 837c441acb8a16cdfde9725eed35825cc1e56b1e) --- .pick_status.json | 2 +- src/intel/compiler/intel_nir_lower_conversions.c | 5 ++--- 2 files changed, 3 insertions(+), 4 deletions(-) diff --git a/.pick_status.json b/.pick_status.json index 406eb063dff..a267826a5f3 100644 --- a/.pick_status.json +++ b/.pick_status.json @@ -784,7 +784,7 @@ "description": "intel/nir: Don't needlessly split u2f16 for nir_type_uint32", "nominated": true, "nomination_type": 1, - "resolution": 0, + "resolution": 1, "main_sha": null, "because_sha": "f695a9fed2b0eac39acdaca132f5fc8c43c9f7d7", "notes": null diff --git a/src/intel/compiler/intel_nir_lower_conversions.c b/src/intel/compiler/intel_nir_lower_conversions.c index e0dde853349..ba8af15babb 100644 --- a/src/intel/compiler/intel_nir_lower_conversions.c +++ b/src/intel/compiler/intel_nir_lower_conversions.c @@ -60,9 +60,8 @@ lower_alu_instr(nir_builder *b, nir_alu_instr *alu) * 32-bit float type so we don't lose range when we convert from * a 64-bit integer. */ - unsigned int64_types = nir_type_int64 | nir_type_uint64; - if ((src_full_type == nir_type_float16 && (dst_full_type & int64_types)) || - ((src_full_type & int64_types) && dst_full_type == nir_type_float16)) { + if ((src_full_type == nir_type_float16 && dst_bit_size == 64) || + (src_bit_size == 64 && dst_full_type == nir_type_float16)) { split_conversion(b, alu, src_type, nir_type_float | 32, dst_type | dst_bit_size); return true;