nir/opt_algebraic: optimize patterns hit with OpenCL

These patterns were all found in the AGX quads tessellator, a medium-sized OpenCL
kernel. LLVM generates a lot of garbage around booleans which we need to chew
through. There's nothing AGX-specific or really OpenCL-specific here, though, so
some of this could help graphics shaders too.

Together, their effect on that kernel's instruction count and occupancy is
significant:

before: 2966 inst, 2310 alu, 2310 fscib, 1216 ic, 23148 bytes, 239 regs, 384 threads
after:  2848 inst, 2246 alu, 2246 fscib, 1000 ic, 22260 bytes, 231 regs, 448 threads

No significant changes on GL shaderdb (a single godot shader regressed 1
instruction, 1344->1345).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31892>
Alyssa Rosenzweig 2024-10-27 14:49:14 -04:00 committed by Marge Bot
parent fc0545e6a7
commit 33299354e0

@@ -1956,6 +1956,16 @@ optimizations.extend([
(('u2u32', ('iadd(is_used_once)', 'a@64', b)),
('iadd', ('u2u32', a), ('u2u32', b))),
# Redundant trip through 8-bit
(('i2i16', ('u2u8', ('iand', 'a@16', 1))), ('iand', 'a@16', 1)),
(('u2u16', ('u2u8', ('iand', 'a@16', 1))), ('iand', 'a@16', 1)),
# Reduce 16-bit integers to 1-bit booleans, hit with OpenCL. In turn, this
# lets iand(b2i1(...), 1) get simplified. Backends can usually fuse iand/inot
# so this should be no worse when it isn't strictly better.
(('bcsel', a, 0, ('b2i16', 'b@1')), ('b2i16', ('iand', ('inot', a), b))),
(('bcsel', a, ('b2i16', 'b@1'), ('b2i16', 'c@1')), ('b2i16', ('bcsel', a, b, c))),
# Lowered pack followed by lowered unpack, for the high bits
(('u2u32', ('ushr', ('ior', ('ishl', a, 32), ('u2u64', b)), 32)), ('u2u32', a)),
(('u2u16', ('ushr', ('ior', ('ishl', a, 16), ('u2u32', b)), 16)), ('u2u16', a)),
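
The rewrites in this hunk can be sanity-checked outside of NIR with a small brute-force model. The sketch below is not part of Mesa: the helper names (`u2u`, `i2i`, `ishl`, `ushr`, `bcsel`, `b2i`) are hypothetical stand-ins that approximate the corresponding NIR opcodes with plain Python integers. It also assumes that `b` in the pack/unpack patterns is no wider than the shift amount (32 bits for the 64-bit pattern, 16 bits for the 32-bit pattern), which the patterns themselves do not spell out.

```python
import itertools
import random


def mask(bits):
    """All-ones mask for an unsigned integer of the given width."""
    return (1 << bits) - 1


def u2u(x, dst_bits):
    """Unsigned conversion: zero-extend or truncate to dst_bits."""
    return x & mask(dst_bits)


def i2i(x, src_bits, dst_bits):
    """Signed conversion: sign-extend from src_bits, then wrap to dst_bits."""
    sign = 1 << (src_bits - 1)
    signed = ((x & mask(src_bits)) ^ sign) - sign
    return signed & mask(dst_bits)


def ishl(x, n, bits):
    """Left shift with the result truncated to the operand width."""
    return (x << (n % bits)) & mask(bits)


def ushr(x, n, bits):
    """Logical right shift on an unsigned value of the given width."""
    return (x & mask(bits)) >> (n % bits)


def bcsel(cond, x, y):
    """Select x when cond is true, otherwise y."""
    return x if cond else y


def b2i(b):
    """Boolean to integer: 1 for true, 0 for false."""
    return 1 if b else 0


def check_8bit_round_trip():
    """(a & 1) survives a detour through 8 bits, signed or unsigned."""
    for a in range(1 << 16):
        low = a & 1
        if i2i(u2u(low, 8), 8, 16) != low:
            return False
        if u2u(u2u(low, 8), 16) != low:
            return False
    return True


def check_bool_selects():
    """Selects between b2i results equal b2i of a boolean expression."""
    for a, b, c in itertools.product([False, True], repeat=3):
        if bcsel(a, 0, b2i(b)) != b2i((not a) and b):
            return False
        if bcsel(a, b2i(b), b2i(c)) != b2i(bcsel(a, b, c)):
            return False
    return True


def check_pack_unpack(wide, narrow, samples=10000, seed=0):
    """Reading the high half of a lowered pack yields the low bits of a.

    Assumption: b fits in `narrow` bits, so it cannot touch the high half.
    """
    rng = random.Random(seed)
    for _ in range(samples):
        a = rng.getrandbits(wide)
        b = rng.getrandbits(narrow)
        packed = ishl(a, narrow, wide) | u2u(b, wide)
        if u2u(ushr(packed, narrow, wide), narrow) != u2u(a, narrow):
            return False
    return True


if __name__ == "__main__":
    print("8-bit round trip:", check_8bit_round_trip())
    print("boolean selects: ", check_bool_selects())
    print("pack/unpack 64:  ", check_pack_unpack(64, 32))
    print("pack/unpack 32:  ", check_pack_unpack(32, 16))
```

The boolean patterns are checked exhaustively, since they have only eight input combinations. The pack/unpack patterns are only sampled at random, so a passing run is evidence rather than proof.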