Karol Herbst
a9b18f8607
nir: rename ffma to ffma_old
...
We'll get three new opcodes to properly model float multiply-add.
ffma_old is temporary and will be deleted at the end of this series.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41165 >
2026-05-19 18:13:27 +00:00
Daniel Schürmann
0832f3251c
nir/opt_algebraic: extend some extract_u8 pattern to extract_i8
...
and remove some duplicate extract patterns.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41385 >
2026-05-09 21:23:40 +00:00
Daniel Schürmann
9895b5e5da
nir/opt_algebraic: optimize downcast followed by upcast to extract
...
Totals from 217 (0.10% of 208640) affected shaders: (Navi48)
Instrs: 283561 -> 282870 (-0.24%)
CodeSize: 1604864 -> 1601136 (-0.23%); split: -0.24%, +0.01%
Latency: 2992301 -> 2990107 (-0.07%); split: -0.09%, +0.02%
InvThroughput: 602722 -> 601316 (-0.23%); split: -0.23%, +0.00%
Copies: 26490 -> 26471 (-0.07%); split: -0.10%, +0.03%
VALU: 147735 -> 147176 (-0.38%)
SALU: 51545 -> 51541 (-0.01%)
VOPD: 11140 -> 11204 (+0.57%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41385 >
2026-05-09 21:23:40 +00:00
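As a rough illustration of the downcast-then-upcast commit above (9895b5e5da): truncating a 32-bit value to 8 bits and widening it again gives the same bits as extracting byte 0. The sketch below models values as unsigned Python integers; the helper names mirror NIR opcode names but are local stand-ins written for this example, not the actual patterns from nir_opt_algebraic.py.

```python
def mask(bits: int) -> int:
    """All-ones mask for a value of the given bit width."""
    return (1 << bits) - 1


def u2u(x: int, bits: int) -> int:
    """Unsigned conversion: truncate or zero-extend to `bits`."""
    return x & mask(bits)


def i2i(x: int, from_bits: int, to_bits: int) -> int:
    """Signed conversion of a `from_bits` bit pattern to `to_bits`."""
    x &= mask(from_bits)
    if to_bits > from_bits and x >> (from_bits - 1):
        # Sign bit set: fill the new upper bits with ones.
        x |= mask(to_bits) & ~mask(from_bits)
    return x & mask(to_bits)


def extract_u8(x: int, byte: int) -> int:
    """Zero-extended byte `byte` of a 32-bit value."""
    return (x >> (8 * byte)) & 0xFF


def extract_i8(x: int, byte: int) -> int:
    """Sign-extended byte `byte` of a 32-bit value, as a 32-bit pattern."""
    v = (x >> (8 * byte)) & 0xFF
    return (v - 0x100) & 0xFFFFFFFF if v & 0x80 else v


x = 0x12345680
print(hex(u2u(u2u(x, 8), 32)), hex(extract_u8(x, 0)))
print(hex(i2i(i2i(x, 32, 8), 8, 32)), hex(extract_i8(x, 0)))
```

Both lines print a matching pair, which is why a single extract can replace the two conversions in this model.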
Georg Lehmann
1716cbff37
nir,amd: reassociate fadd to create more fma/mad
...
ACO's backend fusing is quite competent, but it cannot reorder adds.
This adds a simple algebraic pass to do that for us.
Foz-DB Navi10:
Totals from 13568 (18.76% of 72319) affected shaders:
MaxWaves: 304722 -> 304004 (-0.24%); split: +0.10%, -0.33%
Instrs: 15084252 -> 14993010 (-0.60%); split: -0.61%, +0.00%
CodeSize: 81480188 -> 81372600 (-0.13%); split: -0.17%, +0.04%
VGPRs: 741580 -> 743680 (+0.28%); split: -0.10%, +0.38%
SpillSGPRs: 9418 -> 9434 (+0.17%)
Latency: 154602014 -> 154312940 (-0.19%); split: -0.29%, +0.10%
InvThroughput: 44628554 -> 44442595 (-0.42%); split: -0.47%, +0.05%
VClause: 300035 -> 300054 (+0.01%); split: -0.31%, +0.31%
SClause: 370992 -> 370640 (-0.09%); split: -0.15%, +0.06%
Copies: 1162401 -> 1162800 (+0.03%); split: -0.30%, +0.33%
Branches: 300646 -> 300654 (+0.00%); split: -0.00%, +0.01%
PreSGPRs: 673675 -> 675057 (+0.21%); split: -0.00%, +0.21%
PreVGPRs: 633017 -> 634768 (+0.28%); split: -0.29%, +0.57%
VALU: 10800351 -> 10712041 (-0.82%); split: -0.82%, +0.00%
SALU: 1752917 -> 1753203 (+0.02%); split: -0.04%, +0.06%
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41348 >
2026-05-08 11:49:43 +00:00
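To make the motivation of the fadd reassociation commit above (1716cbff37) concrete, here is a minimal sketch on tuple-encoded expressions. The single rewrite rule and the `fusable_adds` metric are hypothetical illustrations, not the pass added by the commit; they only show how moving an add next to a multiply gives a backend more fma candidates. Reassociating float adds can change rounding, so a real pass would presumably only apply this to inexact operations.

```python
def is_op(expr, op):
    """True if `expr` is a tuple expression with opcode `op`."""
    return isinstance(expr, tuple) and expr[0] == op


def reassociate(expr):
    """Hypothetical rule: (x*y + z*w) + e  ->  x*y + (z*w + e)."""
    if is_op(expr, "fadd") and is_op(expr[1], "fadd"):
        inner, e = expr[1], expr[2]
        if (is_op(inner[1], "fmul") and is_op(inner[2], "fmul")
                and not is_op(e, "fmul")):
            return ("fadd", inner[1], ("fadd", inner[2], e))
    return expr


def fusable_adds(expr):
    """Count fadd nodes that have an fmul as a direct operand."""
    if not isinstance(expr, tuple):
        return 0
    here = int(expr[0] == "fadd" and any(is_op(s, "fmul") for s in expr[1:]))
    return here + sum(fusable_adds(s) for s in expr[1:])


before = ("fadd", ("fadd", ("fmul", "a", "b"), ("fmul", "c", "d")), "e")
after = reassociate(before)
print(fusable_adds(before), fusable_adds(after))
```

In this toy model the original tree has one add that can absorb a multiply, while the reassociated tree has two.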
Georg Lehmann
52b195b4e8
nir/opt_algebraic: add more fmulz pattern
...
Totals from 3 (0.00% of 202440) affected shaders: (Navi48)
Instrs: 5684 -> 5641 (-0.76%); split: -0.77%, +0.02%
CodeSize: 30952 -> 30708 (-0.79%); split: -0.80%, +0.01%
Latency: 9236 -> 9199 (-0.40%); split: -0.42%, +0.02%
InvThroughput: 2287 -> 2273 (-0.61%)
VALU: 3900 -> 3884 (-0.41%)
SALU: 305 -> 289 (-5.25%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40848 >
2026-05-04 09:42:59 +00:00
Daniel Schürmann
012d72f2b0
nir/opt_algebraic: add some imul24_relaxed pattern
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41178 >
2026-05-01 10:07:26 +00:00
Daniel Schürmann
708093d830
nir/opt_algebraic: use imul24_relaxed for lowered dot4x8_add
...
Totals from 28 (0.04% of 72819) affected shaders: (Navi10)
MaxWaves: 181 -> 186 (+2.76%)
Instrs: 406735 -> 338360 (-16.81%)
CodeSize: 2913588 -> 2469712 (-15.23%)
VGPRs: 5520 -> 5468 (-0.94%)
SpillVGPRs: 32 -> 0 (-inf%)
LDS: 64512 -> 62464 (-3.17%)
Scratch: 10240 -> 0 (-inf%)
Latency: 11028252 -> 4357120 (-60.49%)
InvThroughput: 11004126 -> 4079018 (-62.93%)
VClause: 1686 -> 2055 (+21.89%); split: -0.89%, +22.78%
SClause: 890 -> 852 (-4.27%)
Copies: 4516 -> 2644 (-41.45%); split: -41.59%, +0.13%
PreSGPRs: 982 -> 974 (-0.81%)
PreVGPRs: 5356 -> 4284 (-20.01%)
VALU: 370529 -> 330201 (-10.88%)
SALU: 28850 -> 1170 (-95.94%)
VMEM: 2616 -> 2560 (-2.14%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41178 >
2026-05-01 10:07:25 +00:00
Lorenzo Rossi
2a7d817591
nir/opt_algebraic: optimize fadd/fmul with 16-bit source and constant
...
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41096 >
2026-04-30 17:33:09 +00:00
Karol Herbst
4e67582ddf
nir: add fmul_rtz optimizations
...
NVK is only going to use it for `fmul_rtz(frcp(ipa), ipa)` patterns, so
don't try too hard to optimize this.
Totals from 10 (0.00% of 1212873) affected shaders:
CodeSize: 34480 -> 34288 (-0.56%); split: -0.60%, +0.05%
Static cycle count: 6225 -> 6132 (-1.49%); split: -1.57%, +0.08%
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41179 >
2026-04-30 15:42:40 +00:00
Alyssa Rosenzweig
6a43e6c9e0
nir/opt_algebraic: add redundant u2u32/unpack_64_2x32_split_x patterns
...
reduces hello world kernel 57 -> 44 inst on jay. why do we have two opcodes that
do literally the same thing? :/
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41085 >
2026-04-23 19:54:21 +00:00
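The redundancy mentioned in the commit above (6a43e6c9e0) can be sketched as follows: truncating a 64-bit value to 32 bits and taking the low half of a 64-bit value split into two 32-bit words select the same bits. The functions are local models named after the opcodes, written for this example on the assumption that `split_x` is the low word.

```python
import struct


def u2u32(x64: int) -> int:
    """Unsigned truncation of a 64-bit value to 32 bits."""
    return x64 & 0xFFFFFFFF


def unpack_64_2x32_split_x(x64: int) -> int:
    """Low 32-bit word of a 64-bit value (assumed meaning of split_x)."""
    low, _high = struct.unpack("<II", struct.pack("<Q", x64))
    return low


x = 0x1122334455667788
print(hex(u2u32(x)), hex(unpack_64_2x32_split_x(x)))
```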
Brandon Jones
d1dd65d425
nir/opt_algebraic: fix fabs optimization
...
This fixes a regression found in blender's unit testing, which called
fabs(-0.0) and invoked an NIR optimization that was not valid for
the parameter -0.0. IEEE 754 requires that abs clear the sign bit
for the value -0.0.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41060 >
2026-04-21 04:10:29 +00:00
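The signed-zero issue described in the fabs fix above (d1dd65d425) can be reproduced in a few lines. `fabs_bits` clears the sign bit of a 32-bit float, which is what IEEE 754 abs requires. `naive_fabs` is a hypothetical shortcut invented for this example (it is not the pattern that was fixed); it shows how any rewrite that treats -0.0 as already non-negative leaves the sign bit set.

```python
import math
import struct


def fabs_bits(x: float) -> float:
    """abs on a 32-bit float: clear the sign bit, keep everything else."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits & 0x7FFFFFFF))[0]


def naive_fabs(x: float) -> float:
    """Hypothetical shortcut: wrong for -0.0 because -0.0 >= 0.0 is true."""
    return x if x >= 0.0 else -x


def sign_bit(x: float) -> bool:
    """True if the sign bit of x is set (distinguishes -0.0 from 0.0)."""
    return math.copysign(1.0, x) < 0.0


print(sign_bit(fabs_bits(-0.0)), sign_bit(naive_fabs(-0.0)))
```

The two results compare equal as numbers, so the bug only shows up when the sign bit is inspected directly or propagated, for example through a later division.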
Georg Lehmann
0066328cf1
nir/opt_algebraic: create more 64bit bit test
...
Foz-DB GFX1201:
Totals from 2 (0.00% of 205032) affected shaders:
Instrs: 3429 -> 3425 (-0.12%)
CodeSize: 19580 -> 19568 (-0.06%)
Latency: 13629 -> 13628 (-0.01%); split: -0.02%, +0.01%
InvThroughput: 1853 -> 1847 (-0.32%)
Copies: 235 -> 237 (+0.85%)
VALU: 1901 -> 1898 (-0.16%)
SALU: 381 -> 380 (-0.26%)
VOPD: 307 -> 309 (+0.65%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40705 >
2026-04-08 08:44:20 +00:00
Rhys Perry
b30c0d8264
nir/algebraic: optimize exact f2u32(fmul(unpack_norm))
...
fossil-db (navi21):
Totals from 16 (0.01% of 202427) affected shaders:
Instrs: 17730 -> 17226 (-2.84%)
CodeSize: 97500 -> 95708 (-1.84%)
InvThroughput: 44437 -> 44419 (-0.04%)
Copies: 1502 -> 1446 (-3.73%)
VALU: 9973 -> 9525 (-4.49%)
SALU: 3509 -> 3453 (-1.60%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40740 >
2026-04-08 07:10:26 +00:00
Rhys Perry
f52dace6e8
nir/tests: fix NaN/inf checks in skip_test()
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40740 >
2026-04-08 07:10:26 +00:00
Georg Lehmann
8730c039bf
nir/opt_algebraic: move some lower_lerp patterns
...
If we rely on the pattern order here, we don't need to duplicate per bit size.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40730 >
2026-04-04 10:44:09 +00:00
Georg Lehmann
f192fe99eb
nir/opt_algebraic: update open coded flerp(..., b2f(c)) to bcsel patterns
...
We remove 1.0 - b2f(a) since 6a662a59b7.
Foz-DB Navi48:
Totals from 200 (0.10% of 205032) affected shaders:
Instrs: 410309 -> 409750 (-0.14%); split: -0.18%, +0.05%
CodeSize: 2140424 -> 2136956 (-0.16%); split: -0.21%, +0.05%
Latency: 5834394 -> 5834042 (-0.01%); split: -0.02%, +0.01%
InvThroughput: 906879 -> 906374 (-0.06%); split: -0.06%, +0.01%
VClause: 8247 -> 8244 (-0.04%)
SClause: 7721 -> 7723 (+0.03%); split: -0.03%, +0.05%
Copies: 20515 -> 20487 (-0.14%); split: -0.29%, +0.16%
PreVGPRs: 14510 -> 14481 (-0.20%)
VALU: 228703 -> 228235 (-0.20%); split: -0.28%, +0.07%
SALU: 62832 -> 62914 (+0.13%); split: -0.18%, +0.31%
VOPD: 929 -> 927 (-0.22%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40730 >
2026-04-04 10:44:09 +00:00
Georg Lehmann
eff9f00533
nir/search: remove matching variable type
...
Now unused, and if you really need it use a search helper.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40713 >
2026-04-01 09:52:45 +00:00
Georg Lehmann
5b1405dcbf
nir/opt_algebraic: remove a few non 1bit bool patterns
...
We almost exclusively optimize 1bit booleans nowadays,
so I think these shouldn't be needed.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40713 >
2026-04-01 09:52:45 +00:00
Karol Herbst
5bb3c9f69c
nir: rename fsin_amd and fcos_amd to a more generic name
...
Nvidia implements both the same way as AMD does, so it makes sense to
allow for code sharing here.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40541 >
2026-03-31 01:47:29 +02:00
Georg Lehmann
643dd510d4
nir/opt_algebraic: optimize b2f(a) * b
...
When the multiplication is only used by fadd, it's not a clear win
because of potential fma fusion.
Totals from 8015 (6.99% of 114655) affected shaders:
MaxWaves: 199394 -> 199466 (+0.04%); split: +0.04%, -0.01%
Instrs: 17461518 -> 17451076 (-0.06%); split: -0.10%, +0.04%
CodeSize: 94779552 -> 94769828 (-0.01%); split: -0.07%, +0.06%
VGPRs: 526012 -> 525532 (-0.09%); split: -0.10%, +0.01%
SpillSGPRs: 12466 -> 12517 (+0.41%); split: -0.09%, +0.50%
Latency: 191274766 -> 191297394 (+0.01%); split: -0.03%, +0.04%
InvThroughput: 31465968 -> 31456785 (-0.03%); split: -0.07%, +0.04%
VClause: 312081 -> 312073 (-0.00%); split: -0.10%, +0.09%
SClause: 366914 -> 366906 (-0.00%); split: -0.02%, +0.01%
Copies: 1222482 -> 1221933 (-0.04%); split: -0.20%, +0.15%
Branches: 376651 -> 376577 (-0.02%); split: -0.03%, +0.01%
PreSGPRs: 442974 -> 443240 (+0.06%); split: -0.01%, +0.07%
PreVGPRs: 415964 -> 415668 (-0.07%); split: -0.09%, +0.02%
VALU: 9403517 -> 9393916 (-0.10%); split: -0.12%, +0.02%
SALU: 2799420 -> 2800430 (+0.04%); split: -0.13%, +0.16%
VOPD: 472826 -> 472347 (-0.10%); split: +0.09%, -0.19%
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40399 >
2026-03-20 08:50:41 +00:00
Georg Lehmann
d2b37b667e
nir/opt_algebraic: optimize more fmulz(1.0, a) remains
...
If dxvk's opencoded fmulz gets partially constant folded,
it leaves this mess behind.
It's important to do this before the more general fmul+b2f patterns added
in the next commit, because they change the signed zero behavior in a way
that can't be optimized back.
Foz-DB Navi48:
Totals from 36 (0.03% of 114655) affected shaders:
Instrs: 16513 -> 15706 (-4.89%)
CodeSize: 99756 -> 95760 (-4.01%)
Latency: 45165 -> 44151 (-2.25%)
InvThroughput: 8344 -> 7886 (-5.49%)
VClause: 395 -> 401 (+1.52%)
Copies: 639 -> 634 (-0.78%)
PreSGPRs: 1158 -> 1154 (-0.35%)
PreVGPRs: 1227 -> 1225 (-0.16%)
VALU: 11310 -> 10769 (-4.78%)
SALU: 813 -> 809 (-0.49%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40399 >
2026-03-20 08:50:41 +00:00
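The signed-zero remark in the commit above (d2b37b667e) can be illustrated with a small model. It assumes, for the sake of the example, that the general pattern turns `b2f(c) * x` into a select between `x` and `0.0`; the commit text does not spell out the replacement, so treat `select_form` as a guess at its shape.

```python
import math


def b2f(c: bool) -> float:
    """Boolean to float: 1.0 for true, 0.0 for false."""
    return 1.0 if c else 0.0


def bcsel(c: bool, a: float, b: float) -> float:
    """Select a if c is true, else b."""
    return a if c else b


def mul_form(c: bool, x: float) -> float:
    return b2f(c) * x


def select_form(c: bool, x: float) -> float:
    # Assumed replacement shape, for illustration only.
    return bcsel(c, x, 0.0)


for x in (-2.0, math.inf):
    print(x, mul_form(False, x), select_form(False, x))
```

With a false condition the multiply yields -0.0 for negative `x` and NaN for infinite `x`, while the select yields +0.0 in both cases. Once the multiply is gone, that difference cannot be recovered, which matches the ordering concern in the commit message.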
Georg Lehmann
b96c42c916
nir/opt_algebraic: optimize more near useless bcsel
...
Foz-DB Navi48:
Totals from 327 (0.29% of 114655) affected shaders:
Instrs: 732971 -> 731642 (-0.18%); split: -0.19%, +0.01%
CodeSize: 3696020 -> 3689824 (-0.17%); split: -0.17%, +0.00%
Latency: 4405319 -> 4403413 (-0.04%); split: -0.06%, +0.01%
InvThroughput: 650209 -> 649659 (-0.08%); split: -0.10%, +0.01%
Copies: 53872 -> 53736 (-0.25%); split: -0.27%, +0.02%
Branches: 15598 -> 15571 (-0.17%)
VALU: 262391 -> 261969 (-0.16%)
SALU: 268112 -> 267699 (-0.15%)
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40399 >
2026-03-20 08:50:41 +00:00
Georg Lehmann
6cfe6eaa79
nir/opt_algebraic: create ldexp from exp2
...
ldexp uses the full width VALU path, exp2 the transcendental SIMD8.
Foz-DB Navi21:
Totals from 729 (0.64% of 114627) affected shaders:
MaxWaves: 20071 -> 20103 (+0.16%); split: +0.18%, -0.02%
Instrs: 869129 -> 867654 (-0.17%); split: -0.17%, +0.00%
CodeSize: 4709000 -> 4708460 (-0.01%); split: -0.02%, +0.00%
VGPRs: 31184 -> 31128 (-0.18%); split: -0.23%, +0.05%
Latency: 7610726 -> 7597238 (-0.18%); split: -0.18%, +0.00%
InvThroughput: 1822323 -> 1819815 (-0.14%); split: -0.14%, +0.00%
VClause: 22494 -> 22493 (-0.00%); split: -0.03%, +0.02%
SClause: 20520 -> 20509 (-0.05%)
Copies: 72025 -> 72024 (-0.00%); split: -0.01%, +0.01%
Branches: 22028 -> 22029 (+0.00%)
PreVGPRs: 21601 -> 21602 (+0.00%)
VALU: 604821 -> 603339 (-0.25%); split: -0.25%, +0.00%
SALU: 114258 -> 114262 (+0.00%); split: -0.00%, +0.01%
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33900 >
2026-03-20 08:15:08 +00:00
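For the ldexp commit above (6cfe6eaa79), the underlying identity is that multiplying by 2 raised to an integer power is the same as scaling the exponent. The sketch assumes the matched shape is a multiply by exp2 of an integer; the exact NIR pattern is not given in the commit message. It stays away from overflow and denormal ranges, where the two forms may round differently.

```python
import math


def mul_exp2(a: float, n: int) -> float:
    """a * 2**n computed as a multiply by an exact power of two."""
    return a * (2.0 ** n)


def via_ldexp(a: float, n: int) -> float:
    """The same value computed by adjusting the exponent directly."""
    return math.ldexp(a, n)


print(mul_exp2(1.5, 4), via_ldexp(1.5, 4))
```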
Georg Lehmann
ec331cc48a
nir: replace lower_ldexp with has_ldexp
...
I can't be bothered to fix all the backends that don't set lower_ldexp,
and only two backends have ldexp anyway.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33900 >
2026-03-20 08:15:08 +00:00
Georg Lehmann
98ff0a394a
nir/opt_algebraic: move some fsat patterns next to the other fsat patterns
...
I almost missed that they already exist multiple times.
No Foz-DB changes.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40389 >
2026-03-16 13:03:50 +00:00
Georg Lehmann
607f26814f
nir/opt_algebraic: remove manual patterns that optimizes flt([0.0, 1.0], 0.0)
...
Range analysis can figure this out.
No Foz-DB changes.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40389 >
2026-03-16 13:03:50 +00:00
Georg Lehmann
530bb4278c
nir/opt_algebraic: remove manual pattern that removes fmax(..., 0.0)
...
Range analysis will figure this out.
No Foz-DB changes.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40389 >
2026-03-16 13:03:50 +00:00
Georg Lehmann
4d176c8ea5
nir/opt_algebraic: turn fabs(a) into fneg(a) if a is not positive
...
fneg is usually more optimizable.
Foz-DB Navi48:
Totals from 214 (0.19% of 114655) affected shaders:
Instrs: 694279 -> 694155 (-0.02%); split: -0.02%, +0.00%
CodeSize: 3749268 -> 3748024 (-0.03%); split: -0.03%, +0.00%
VGPRs: 18252 -> 18264 (+0.07%)
Latency: 5453691 -> 5453503 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 1024436 -> 1024314 (-0.01%); split: -0.01%, +0.00%
VALU: 453136 -> 453041 (-0.02%); split: -0.02%, +0.00%
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40389 >
2026-03-16 13:03:50 +00:00
Georg Lehmann
d77c2a1ece
nir/opt_algebraic: take advantage of range helpers including nnan
...
No Foz-DB changes.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40389 >
2026-03-16 13:03:49 +00:00
Georg Lehmann
aad2b9bfc7
nir/opt_algebraic: be more strict when optimizing fcmp(a + #b, #c)
...
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40291 >
2026-03-13 07:13:10 +00:00
Georg Lehmann
624313d35d
nir/opt_algebraic: lower ninf fisfinite correctly
...
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40291 >
2026-03-13 07:13:09 +00:00
Georg Lehmann
aa831b6690
nir/opt_algebraic: skip more redundant alignment iand
...
Useful for smaller/larger loads. Also there is no reason to be bitsize
specific here if we use a signed constant.
Foz-DB Navi48:
Totals from 8 (0.01% of 114655) affected shaders:
Instrs: 7629 -> 7612 (-0.22%)
CodeSize: 40772 -> 40692 (-0.20%)
Latency: 54880 -> 54944 (+0.12%)
InvThroughput: 8879 -> 8880 (+0.01%); split: -0.08%, +0.09%
VALU: 4029 -> 4027 (-0.05%); split: -0.15%, +0.10%
SALU: 1260 -> 1249 (-0.87%)
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40292 >
2026-03-10 06:57:50 +00:00
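The point about signed constants in the commit above (aa831b6690) can be sketched like this: a negative immediate such as -4 sign-extends to the right alignment mask at any bit width, so one pattern can cover all sizes. The helper names are made up for this example.

```python
def imm(value: int, bits: int) -> int:
    """Bit pattern of a signed immediate at the given width."""
    return value & ((1 << bits) - 1)


def align_down(x: int, bits: int, alignment: int = 4) -> int:
    """iand with the signed constant -alignment at `bits` width."""
    return x & imm(-alignment, bits)


print(hex(imm(-4, 16)), hex(imm(-4, 32)), hex(imm(-4, 64)))

# If the value is already a multiple of 4, the iand changes nothing.
already_aligned = (0x1234 << 2) & 0xFFFFFFFF
print(align_down(already_aligned, 32) == already_aligned)
```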
Georg Lehmann
6936282bd3
nir/opt_algebraic: remove min(a, >= 1.0) before fsat
...
Foz-DB Navi48:
Totals from 86 (0.08% of 114655) affected shaders:
Instrs: 217553 -> 217408 (-0.07%); split: -0.07%, +0.01%
CodeSize: 1159992 -> 1159380 (-0.05%); split: -0.06%, +0.01%
Latency: 1657600 -> 1657533 (-0.00%); split: -0.01%, +0.00%
InvThroughput: 203205 -> 203178 (-0.01%); split: -0.02%, +0.00%
SClause: 5245 -> 5244 (-0.02%)
Copies: 13726 -> 13716 (-0.07%); split: -0.14%, +0.07%
VALU: 130151 -> 130039 (-0.09%); split: -0.09%, +0.00%
SALU: 26476 -> 26474 (-0.01%); split: -0.02%, +0.01%
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40281 >
2026-03-09 21:11:25 +00:00
Georg Lehmann
108a4d4341
nir: create more fsat using range analysis
...
Foz-DB Navi48:
Totals from 5922 (5.17% of 114655) affected shaders:
Instrs: 5188307 -> 5184193 (-0.08%); split: -0.09%, +0.01%
CodeSize: 27852544 -> 27843252 (-0.03%); split: -0.05%, +0.01%
Latency: 28723967 -> 28714268 (-0.03%); split: -0.04%, +0.01%
InvThroughput: 4745002 -> 4742298 (-0.06%); split: -0.07%, +0.01%
VClause: 68649 -> 68650 (+0.00%)
SClause: 103932 -> 103917 (-0.01%); split: -0.02%, +0.00%
Copies: 244683 -> 244706 (+0.01%); split: -0.01%, +0.02%
PreSGPRs: 272361 -> 272362 (+0.00%); split: -0.00%, +0.00%
VALU: 3248960 -> 3245520 (-0.11%); split: -0.11%, +0.00%
SALU: 516784 -> 516796 (+0.00%); split: -0.01%, +0.01%
VOPD: 8910 -> 8895 (-0.17%)
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40281 >
2026-03-09 21:11:25 +00:00
Georg Lehmann
4885e5cf3a
nir: remove more fsat using range analysis
...
Foz-DB Navi48:
Totals from 3018 (3.65% of 82636) affected shaders:
MaxWaves: 69274 -> 69280 (+0.01%)
Instrs: 7165414 -> 7157581 (-0.11%); split: -0.12%, +0.01%
CodeSize: 38890212 -> 38823132 (-0.17%); split: -0.18%, +0.00%
VGPRs: 228672 -> 228624 (-0.02%)
Latency: 64789026 -> 64784877 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 11805156 -> 11802642 (-0.02%); split: -0.02%, +0.00%
VClause: 136900 -> 136886 (-0.01%); split: -0.03%, +0.02%
SClause: 150135 -> 150130 (-0.00%); split: -0.01%, +0.01%
Copies: 574690 -> 574894 (+0.04%); split: -0.03%, +0.06%
Branches: 187169 -> 187086 (-0.04%); split: -0.04%, +0.00%
PreSGPRs: 190074 -> 190067 (-0.00%); split: -0.00%, +0.00%
PreVGPRs: 189564 -> 189538 (-0.01%); split: -0.02%, +0.00%
VALU: 3955188 -> 3949411 (-0.15%); split: -0.15%, +0.00%
SALU: 1114659 -> 1114729 (+0.01%); split: -0.02%, +0.03%
SMEM: 231080 -> 231077 (-0.00%); split: -0.00%, +0.00%
VOPD: 116150 -> 116180 (+0.03%); split: +0.04%, -0.02%
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39987 >
2026-03-07 05:01:45 +00:00
Georg Lehmann
506bb5a609
nir/search_helpers: use fp class analysis more
...
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39987 >
2026-03-07 05:01:45 +00:00
Georg Lehmann
32b5719a9f
nir/opt_algebraic: add is_not_uint_zero for b2i16(uge) pattern
...
More fallout from f2a59fdea6.
is_not_zero now always returns whether the result is a floating point zero.
When combined with the fp denorm handling that will be added to
floating point range analysis, this is false for many sensible integer values.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39987 >
2026-03-07 05:01:44 +00:00
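Why a floating-point zero check is a poor fit for integers, as the commit above (32b5719a9f) argues, can be seen by reinterpreting small integers as 32-bit floats: every integer from 1 to 0x007FFFFF has the bit pattern of a denormal. The `flushes_to_zero` helper below is a simplified stand-in for denorm-aware analysis, not Mesa code.

```python
import struct

FLT_MIN_NORMAL = 2.0 ** -126  # smallest normal 32-bit float


def bits_as_f32(u: int) -> float:
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 float."""
    return struct.unpack("<f", struct.pack("<I", u))[0]


def flushes_to_zero(u: int) -> bool:
    """True if the bit pattern is zero or a denormal (flushed to zero)."""
    return abs(bits_as_f32(u)) < FLT_MIN_NORMAL


for u in (1, 1000, 0x007FFFFF, 0x00800000):
    print(u, u != 0, flushes_to_zero(u))
```

The integer 1 is clearly non-zero, yet as a float it is a denormal that a flush-to-zero analysis must treat as possibly zero, hence the separate is_not_uint_zero helper.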
Georg Lehmann
ab773fc5d4
nir/opt_algebraic: fix frsq clamp pattern
...
This is not NaN correct.
And also make the pattern 32bit only because the constant is hard coded
FLT_MAX.
Fixes: 780b5c1037 ("nir/algebraic: Simplify some Inf and NaN avoidance code")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39987 >
2026-03-07 05:01:42 +00:00
Georg Lehmann
ba30de1f97
nir/opt_algebraic: remove pattern that skips iabs with range analysis
...
Fixes: f2a59fdea6 ("nir: remove non float nir_analyse_range support")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39987 >
2026-03-07 05:01:41 +00:00
Caio Oliveira
da57fbfb07
nir: Fix constant folding for iadd_sat
...
Use INT_MIN instead of INT_MAX for underflow.
Fixes: cc4b50b023 ("nir/opcodes: use u_overflow to fix incorrect checks")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pelloux@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40252 >
2026-03-06 22:26:07 +00:00
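The iadd_sat fix above (da57fbfb07) comes down to which bound is used on each side. A minimal reference model of signed saturating addition, written for this example rather than taken from nir_opcodes.py:

```python
def iadd_sat(a: int, b: int, bits: int = 32) -> int:
    """Signed saturating add on `bits`-wide two's complement integers."""
    int_min = -(1 << (bits - 1))
    int_max = (1 << (bits - 1)) - 1
    total = a + b
    if total > int_max:
        return int_max  # positive overflow clamps to INT_MAX
    if total < int_min:
        return int_min  # negative overflow clamps to INT_MIN
    return total


print(iadd_sat(-2**31, -1), iadd_sat(2**31 - 1, 1), iadd_sat(1, 2))
```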
Georg Lehmann
6a218e346d
nir: remove lower_vector_cmp
...
Use nir_lower_alu_width or nir_lower_alu_to_scalar instead.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40197 >
2026-03-04 19:50:28 +00:00
Georg Lehmann
3e6e1e213c
nir: remove fall_equal/fany_nequal opcodes
...
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40197 >
2026-03-04 19:50:27 +00:00
Georg Lehmann
7194dfcc2c
nir/opt_algebraic: optimize b2i(a) * b to bcsel
...
Foz-DB Navi48:
Totals from 3180 (2.77% of 114655) affected shaders:
MaxWaves: 85526 -> 85446 (-0.09%)
Instrs: 2681446 -> 2678641 (-0.10%); split: -0.17%, +0.07%
CodeSize: 14295536 -> 14284628 (-0.08%); split: -0.13%, +0.05%
VGPRs: 174792 -> 174636 (-0.09%); split: -0.16%, +0.07%
SpillSGPRs: 306 -> 308 (+0.65%)
Latency: 14078973 -> 14070122 (-0.06%); split: -0.07%, +0.01%
InvThroughput: 2774242 -> 2764051 (-0.37%); split: -0.37%, +0.00%
VClause: 41744 -> 41734 (-0.02%); split: -0.10%, +0.07%
SClause: 58176 -> 58154 (-0.04%); split: -0.05%, +0.01%
Copies: 222967 -> 223108 (+0.06%); split: -0.14%, +0.20%
Branches: 57317 -> 57322 (+0.01%)
PreSGPRs: 140454 -> 140451 (-0.00%); split: -0.01%, +0.00%
PreVGPRs: 131649 -> 131540 (-0.08%); split: -0.09%, +0.01%
VALU: 1509318 -> 1505443 (-0.26%); split: -0.26%, +0.00%
SALU: 384419 -> 385838 (+0.37%); split: -0.01%, +0.38%
VOPD: 13272 -> 13286 (+0.11%); split: +0.14%, -0.03%
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40160 >
2026-03-02 15:58:30 +00:00
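For integers, the rewrite named in the commit title above (7194dfcc2c) is exact: multiplying by a boolean converted to 0 or 1 is the same as selecting between the value and zero. A short check with local stand-ins for the opcodes:

```python
def b2i(c: bool) -> int:
    """Boolean to integer: 1 for true, 0 for false."""
    return 1 if c else 0


def bcsel(c: bool, a: int, b: int) -> int:
    """Select a if c is true, else b."""
    return a if c else b


def mul_form(c: bool, x: int) -> int:
    return b2i(c) * x


def select_form(c: bool, x: int) -> int:
    return bcsel(c, x, 0)


print(mul_form(True, 7), select_form(True, 7))
print(mul_form(False, 7), select_form(False, 7))
```

Unlike the floating-point case, integers have no signed zero, NaN, or infinity, so the two forms agree for every input.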
Georg Lehmann
3d304d5647
nir/opt_algebraic: remove is_used_once on outer instruction
...
This just prevents useful optimizations.
is_used_once only makes sense on inner instructions, to prevent
creating more new instructions than will be removed.
Foz-DB Navi48:
Totals from 16989 (14.82% of 114655) affected shaders:
MaxWaves: 434379 -> 434353 (-0.01%); split: +0.01%, -0.01%
Instrs: 29030794 -> 29022514 (-0.03%); split: -0.07%, +0.04%
CodeSize: 155293092 -> 155262816 (-0.02%); split: -0.05%, +0.03%
VGPRs: 1093980 -> 1094088 (+0.01%); split: -0.01%, +0.02%
SpillSGPRs: 9801 -> 9803 (+0.02%); split: -0.03%, +0.05%
Latency: 356327270 -> 356283384 (-0.01%); split: -0.03%, +0.02%
InvThroughput: 58239439 -> 58229374 (-0.02%); split: -0.03%, +0.01%
VClause: 451716 -> 451815 (+0.02%); split: -0.07%, +0.09%
SClause: 654614 -> 654556 (-0.01%); split: -0.03%, +0.03%
Copies: 1809805 -> 1809297 (-0.03%); split: -0.20%, +0.17%
Branches: 552382 -> 552384 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 947188 -> 947224 (+0.00%); split: -0.01%, +0.02%
PreVGPRs: 879583 -> 880173 (+0.07%); split: -0.01%, +0.08%
VALU: 16317859 -> 16309975 (-0.05%); split: -0.07%, +0.02%
SALU: 4256121 -> 4259315 (+0.08%); split: -0.05%, +0.12%
SMEM: 1067069 -> 1067070 (+0.00%)
VOPD: 440855 -> 440792 (-0.01%); split: +0.05%, -0.07%
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40138 >
2026-03-02 15:24:36 +00:00
Georg Lehmann
41878e5714
nir_opt_algebraic: remove unneeded is_not_const
...
These were needed when we didn't constant fold inside nir_search,
to prevent infinite loops.
But now all they do is slow down pattern matching.
Foz-DB Navi48:
Totals from 107 (0.09% of 114655) affected shaders:
Instrs: 162439 -> 162481 (+0.03%); split: -0.01%, +0.03%
CodeSize: 943056 -> 942988 (-0.01%); split: -0.03%, +0.02%
Latency: 971667 -> 970865 (-0.08%); split: -0.09%, +0.00%
InvThroughput: 164452 -> 164521 (+0.04%); split: -0.02%, +0.07%
Copies: 7980 -> 7982 (+0.03%)
VALU: 103572 -> 103566 (-0.01%); split: -0.05%, +0.04%
SALU: 12825 -> 12878 (+0.41%)
VOPD: 5235 -> 5190 (-0.86%)
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40138 >
2026-03-02 15:24:36 +00:00
Georg Lehmann
374cbc17a4
nir_opt_algebraic: reassociate fadd into ffma where one factor is a constant
...
This restriction doesn't really make sense, probably an accident.
Foz-DB Navi48:
Totals from 2290 (2.00% of 114655) affected shaders:
MaxWaves: 57496 -> 57510 (+0.02%); split: +0.06%, -0.03%
Instrs: 2817419 -> 2816209 (-0.04%); split: -0.12%, +0.08%
CodeSize: 15218816 -> 15220576 (+0.01%); split: -0.09%, +0.10%
VGPRs: 147456 -> 147384 (-0.05%); split: -0.07%, +0.02%
Latency: 13757114 -> 13751833 (-0.04%); split: -0.13%, +0.09%
InvThroughput: 2463343 -> 2462482 (-0.03%); split: -0.07%, +0.04%
VClause: 40137 -> 40153 (+0.04%); split: -0.07%, +0.11%
SClause: 57351 -> 57385 (+0.06%); split: -0.12%, +0.18%
Copies: 135482 -> 136258 (+0.57%); split: -0.22%, +0.79%
Branches: 30886 -> 30894 (+0.03%)
PreSGPRs: 113470 -> 113462 (-0.01%); split: -0.03%, +0.02%
PreVGPRs: 117554 -> 117591 (+0.03%); split: -0.01%, +0.04%
VALU: 1682734 -> 1681557 (-0.07%); split: -0.10%, +0.03%
SALU: 390685 -> 391301 (+0.16%); split: -0.07%, +0.22%
VOPD: 6159 -> 6254 (+1.54%); split: +1.72%, -0.18%
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40138 >
2026-03-02 15:24:36 +00:00
Georg Lehmann
b949122908
nir/opt_algebraic: remove loops for b2f/b2i equality handling
...
The feq/fneu patterns already existed, and there is no reason to use bit size based
loops here.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40138 >
2026-03-02 15:24:36 +00:00
Georg Lehmann
83091276f8
nir_opt_algebraic: remove more specific cmp+bcsel opts
...
Only some minimal difference from pattern ordering:
Foz-DB Navi48:
Totals from 3 (0.00% of 114655) affected shaders:
Instrs: 4556 -> 4533 (-0.50%)
CodeSize: 23716 -> 23608 (-0.46%)
Latency: 27424 -> 26336 (-3.97%)
InvThroughput: 4674 -> 4672 (-0.04%)
SClause: 107 -> 105 (-1.87%)
Copies: 351 -> 346 (-1.42%)
Branches: 130 -> 126 (-3.08%)
VALU: 2598 -> 2595 (-0.12%)
SALU: 561 -> 555 (-1.07%)
SMEM: 169 -> 167 (-1.18%)
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40138 >
2026-03-02 15:24:36 +00:00
Georg Lehmann
4190241795
nir/opt_algebraic: optimize all comparisons of b2f/b2i with constants
...
Foz-DB Navi48:
Totals from 857 (0.75% of 114655) affected shaders:
Instrs: 1136993 -> 1132422 (-0.40%); split: -0.48%, +0.08%
CodeSize: 6096636 -> 6070832 (-0.42%); split: -0.48%, +0.06%
VGPRs: 49668 -> 49620 (-0.10%)
Latency: 24014661 -> 24044601 (+0.12%); split: -0.04%, +0.16%
InvThroughput: 4182482 -> 4183708 (+0.03%); split: -0.12%, +0.15%
VClause: 17698 -> 17695 (-0.02%)
SClause: 25214 -> 25213 (-0.00%)
Copies: 81474 -> 81396 (-0.10%); split: -0.79%, +0.69%
Branches: 24722 -> 24650 (-0.29%); split: -0.36%, +0.07%
PreSGPRs: 43338 -> 43291 (-0.11%); split: -0.22%, +0.11%
VALU: 652975 -> 649760 (-0.49%); split: -0.50%, +0.00%
SALU: 153961 -> 153797 (-0.11%); split: -0.72%, +0.61%
VOPD: 10650 -> 10684 (+0.32%); split: +0.38%, -0.07%
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40138 >
2026-03-02 15:24:36 +00:00
Georg Lehmann
ef6f5377da
nir/opt_algebraic: remove fcmp+fneg patterns that are cleaned up earlier
...
No Foz-DB changes, as expected.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40138 >
2026-03-02 15:24:36 +00:00