From 92e59d71cd21fdccc05d6fcf93036bd6a343b4c2 Mon Sep 17 00:00:00 2001
From: Kenneth Graunke <kenneth@whitecape.org>
Date: Tue, 9 Jul 2024 03:44:26 -0700
Subject: [PATCH] intel/nir: Don't needlessly split u2f16 for nir_type_uint32

Commit f695a9fed2b0 moved the 64-bit float <-> 16-bit float conversion
splitting into a core NIR pass, so the code remaining here is only
needed for 64-bit integer types.

Presumably in an attempt to remove the float handling, it replaced
simple bit_size == 64 checks with this expression:

   (full_type & (nir_type_int64 | nir_type_uint64))

I believe that the intended expression was:

   (full_type == nir_type_int64 || full_type == nir_type_uint64)

Unfortunately, the former is incorrect.  Any integer or unsigned
NIR type would trigger the former expression.  For example:

   nir_type_uint32 & (nir_type_int64 | nir_type_uint64) => nir_type_uint

This meant that we were splitting e.g. u2f16 on 32-bit unsigned types
into u2f32 and f2f16, when we can easily natively handle that case.

To fix this, we go back to simple bit_size == 64 checks.  This pass is
already run after nir_lower_fp16_casts which will split the float case,
so we will never see it here.

fossil-db on Alchemist shows a -1.14% reduction in affected shaders for
google-meet-clvk shaders.  In another ChromeOS workload, it improves
performance by around 8% on Meteorlake.

Thanks to Sushma Venkatesh Reddy for finding this performance issue!

Fixes: f695a9fed2b0 ("intel/compiler: use nir_lower_fp16_casts")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30091>
(cherry picked from commit 837c441acb8a16cdfde9725eed35825cc1e56b1e)
---
 .pick_status.json                                | 2 +-
 src/intel/compiler/intel_nir_lower_conversions.c | 5 ++---
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/.pick_status.json b/.pick_status.json
index 406eb063dff..a267826a5f3 100644
--- a/.pick_status.json
+++ b/.pick_status.json
@@ -784,7 +784,7 @@
         "description": "intel/nir: Don't needlessly split u2f16 for nir_type_uint32",
         "nominated": true,
         "nomination_type": 1,
-        "resolution": 0,
+        "resolution": 1,
         "main_sha": null,
         "because_sha": "f695a9fed2b0eac39acdaca132f5fc8c43c9f7d7",
         "notes": null
diff --git a/src/intel/compiler/intel_nir_lower_conversions.c b/src/intel/compiler/intel_nir_lower_conversions.c
index e0dde853349..ba8af15babb 100644
--- a/src/intel/compiler/intel_nir_lower_conversions.c
+++ b/src/intel/compiler/intel_nir_lower_conversions.c
@@ -60,9 +60,8 @@ lower_alu_instr(nir_builder *b, nir_alu_instr *alu)
     * 32-bit float type so we don't lose range when we convert from
     * a 64-bit integer.
     */
-   unsigned int64_types = nir_type_int64 | nir_type_uint64;
-   if ((src_full_type == nir_type_float16 && (dst_full_type & int64_types)) ||
-       ((src_full_type & int64_types) && dst_full_type == nir_type_float16)) {
+   if ((src_full_type == nir_type_float16 && dst_bit_size == 64) ||
+       (src_bit_size == 64 && dst_full_type == nir_type_float16)) {
       split_conversion(b, alu, src_type, nir_type_float | 32,
                        dst_type | dst_bit_size);
       return true;