nir/opt_algebraic: optimize patterns hit with OpenCL

These patterns were all found in the AGX quads tessellator, a medium-sized OpenCL kernel. LLVM generates a lot of garbage around booleans, which we need to chew through. That said, there's nothing AGX- or even OpenCL-specific here, so some of this could help graphics shaders too.

Together, their effect on that kernel's instruction count and occupancy is significant:

before: 2966 inst, 2310 alu, 2310 fscib, 1216 ic, 23148 bytes, 239 regs, 384 threads
after:  2848 inst, 2246 alu, 2246 fscib, 1000 ic, 22260 bytes, 231 regs, 448 threads

No significant changes on GL shaderdb (a single Godot shader regressed by 1 instruction, 1344 -> 1345).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31892>
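To put "significant" into numbers, the relative change of each metric can be computed straight from the before/after figures in the message. This is a small illustrative Python sketch, not part of the commit; the metric names are copied verbatim from the message and the code does not interpret them.

```python
# Before/after statistics for the AGX quads tessellator kernel,
# copied verbatim from the commit message.
BEFORE = {"inst": 2966, "alu": 2310, "fscib": 2310, "ic": 1216,
          "bytes": 23148, "regs": 239, "threads": 384}
AFTER = {"inst": 2848, "alu": 2246, "fscib": 2246, "ic": 1000,
         "bytes": 22260, "regs": 231, "threads": 448}


def relative_change(before: int, after: int) -> float:
    """Percentage change from before to after (negative means a decrease)."""
    return 100.0 * (after - before) / before


def summarize() -> dict[str, float]:
    """Relative change per metric, rounded to two decimal places."""
    return {k: round(relative_change(BEFORE[k], AFTER[k]), 2) for k in BEFORE}


if __name__ == "__main__":
    for name, pct in summarize().items():
        print(f"{name:>8}: {BEFORE[name]:>6} -> {AFTER[name]:>6} ({pct:+.2f}%)")
```

Instruction count drops by roughly 4%, while the thread count (occupancy) rises by about 16.7%, from 384 to 448.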
2026-05-05 20:28:04 +02:00 · 2024-10-27 14:49:14 -04:00 · 2024-10-27 14:49:14 -04:00 · 33299354e0
commit 33299354e0
parent fc0545e6a7
1 changed file with 10 additions and 0 deletions
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -1956,6 +1956,16 @@ optimizations.extend([
   (('u2u32', ('iadd(is_used_once)', 'a@64', b)),
    ('iadd', ('u2u32', a), ('u2u32', b))),

+   # Redundant trip through 8-bit
+   (('i2i16', ('u2u8', ('iand', 'a@16', 1))), ('iand', 'a@16', 1)),
+   (('u2u16', ('u2u8', ('iand', 'a@16', 1))), ('iand', 'a@16', 1)),
+
+   # Reduce 16-bit integers to 1-bit booleans, hit with OpenCL. In turn, this
+   # lets iand(b2i1(...), 1) get simplified. Backends can usually fuse iand/inot
+   # so this should be no worse when it isn't strictly better.
+   (('bcsel', a, 0, ('b2i16', 'b@1')), ('b2i16', ('iand', ('inot', a), b))),
+   (('bcsel', a, ('b2i16', 'b@1'), ('b2i16', 'c@1')), ('b2i16', ('bcsel', a, b, c))),
+
   # Lowered pack followed by lowered unpack, for the high bits
   (('u2u32', ('ushr', ('ior', ('ishl', a, 32), ('u2u64', b)), 32)), ('u2u32', a)),
   (('u2u16', ('ushr', ('ior', ('ishl', a, 16), ('u2u32', b)), 16)), ('u2u16', a)),
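The four new rules can be sanity-checked by brute force. The sketch below is not part of Mesa and does not use NIR; it models the opcodes on plain Python integers under these assumptions: 16-bit values are bit patterns in 0..0xFFFF, 1-bit booleans are Python `bool`s, `u2u8` truncates, `i2i16` sign-extends from 8 bits and `u2u16` zero-extends from 8 bits. All helper names are hypothetical.

```python
from itertools import product

MASK8 = 0xFF


def u2u8(x: int) -> int:
    # Truncate to the low 8 bits.
    return x & MASK8


def u2u16_from8(x: int) -> int:
    # Zero-extend an 8-bit value to 16 bits (bit pattern is unchanged).
    return x & MASK8


def i2i16_from8(x: int) -> int:
    # Sign-extend an 8-bit value to 16 bits, returning the 16-bit bit pattern.
    x &= MASK8
    return (x | 0xFF00) if x & 0x80 else x


def b2i16(b: bool) -> int:
    # 1-bit boolean to 16-bit integer 0 or 1.
    return 1 if b else 0


def bcsel(cond: bool, x, y):
    # Select x when cond is true, else y.
    return x if cond else y


def check_8bit_round_trip() -> None:
    # ('iand', 'a@16', 1) is 0 or 1, so the 8-bit round trip is lossless.
    for a in range(1 << 16):
        reference = a & 1
        assert i2i16_from8(u2u8(a & 1)) == reference
        assert u2u16_from8(u2u8(a & 1)) == reference
    # Without the iand the rule would be wrong: 0x80 sign-extends to 0xFF80.
    assert i2i16_from8(u2u8(0x80)) == 0xFF80


def check_bcsel_rules() -> None:
    for a, b, c in product((False, True), repeat=3):
        # bcsel(a, 0, b2i16(b)) -> b2i16(iand(inot(a), b))
        assert bcsel(a, 0, b2i16(b)) == b2i16((not a) and b)
        # bcsel(a, b2i16(b), b2i16(c)) -> b2i16(bcsel(a, b, c))
        assert bcsel(a, b2i16(b), b2i16(c)) == b2i16(bcsel(a, b, c))


if __name__ == "__main__":
    check_8bit_round_trip()
    check_bcsel_rules()
    print("all four rewrites agree on every input")
```

The extra `0x80` check illustrates why the 8-bit rules only match when the source is masked with `iand(..., 1)`: for arbitrary 16-bit inputs, truncating and sign-extending does not return the original value.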