From 4dff3ff005b47befd3e4a903b08d5b4bdbef6ae3 Mon Sep 17 00:00:00 2001
From: Georg Lehmann <dadschoorse@gmail.com>
Date: Thu, 29 Sep 2022 17:36:55 +0200
Subject: [PATCH] nir/opt_algebraic: Optimize open coded bfm.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Foz-DB Navi21:
Totals from 1553 (1.15% of 134913) affected shaders:
SpillVGPRs: 2246 -> 2223 (-1.02%); split: -1.42%, +0.40%
CodeSize: 10409156 -> 10410720 (+0.02%); split: -0.03%, +0.04%
Instrs: 1899725 -> 1898773 (-0.05%); split: -0.07%, +0.02%
Latency: 71225814 -> 71118314 (-0.15%); split: -0.21%, +0.06%
InvThroughput: 13384926 -> 13330369 (-0.41%); split: -0.47%, +0.06%
VClause: 38309 -> 38284 (-0.07%); split: -0.17%, +0.11%
SClause: 70743 -> 70706 (-0.05%)
Copies: 167296 -> 167230 (-0.04%); split: -0.28%, +0.24%
Branches: 42446 -> 42444 (-0.00%); split: -0.01%, +0.00%
PreVGPRs: 95191 -> 95188 (-0.00%)

Some minor instructions count regressions in parallel-rdp
because v_bfm_b32 can't use SDWA, but overall an improvement.

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18887>
---
 src/compiler/nir/nir_opt_algebraic.py | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py
index 86c01f49ee8..1f9b0915078 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -1924,6 +1924,10 @@ optimizations.extend([
    (('bfm', ('umin', 'width', ('iadd', 32, ('ineg', ('iand', 31, 'offset')))), 'offset'),
     ('bfm', 'width', 'offset')),
 
+   # open-coded BFM
+   (('iadd@32', ('ishl', 1, a), -1), ('bfm', a, 0), 'options->lower_bitfield_insert_to_bitfield_select || options->lower_bitfield_insert'),
+   (('ishl', ('bfm', a, 0), b), ('bfm', a, b)),
+
    # Section 8.8 (Integer Functions) of the GLSL 4.60 spec says:
    #
    #    If bits is zero, the result will be zero.