Commit graph

639 commits

Author SHA1 Message Date
Georg Lehmann
a0665e79e9 nir/opt_algebraic: push fsat into bcsel with constant
bcsel doesn't have a free clamp modifier on AMD hardware,
but what's inside might have free clamp.

Foz-DB Navi31:
Totals from 873 (1.09% of 80273) affected shaders:
MaxWaves: 22008 -> 21968 (-0.18%)
Instrs: 4624956 -> 4623950 (-0.02%); split: -0.04%, +0.02%
CodeSize: 24152780 -> 24142884 (-0.04%); split: -0.05%, +0.01%
VGPRs: 57900 -> 57960 (+0.10%)
Latency: 28762622 -> 28749889 (-0.04%); split: -0.06%, +0.02%
InvThroughput: 5320810 -> 5320145 (-0.01%); split: -0.02%, +0.00%
VClause: 115879 -> 115929 (+0.04%); split: -0.10%, +0.14%
SClause: 93058 -> 93059 (+0.00%); split: -0.01%, +0.02%
Copies: 335674 -> 335845 (+0.05%); split: -0.05%, +0.10%
PreSGPRs: 53819 -> 53843 (+0.04%); split: -0.01%, +0.05%
PreVGPRs: 50908 -> 50939 (+0.06%); split: -0.02%, +0.08%
VALU: 2816395 -> 2815514 (-0.03%); split: -0.04%, +0.01%
SALU: 509988 -> 509987 (-0.00%); split: -0.02%, +0.02%

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>
2025-08-01 20:29:27 +00:00
Georg Lehmann
e9e5146848 nir/opt_algebraic: optimize fsat(fmax(a, b)) where b is not positive
Foz-DB Navi31:
Totals from 946 (1.18% of 80273) affected shaders:
Instrs: 4986082 -> 4983988 (-0.04%); split: -0.04%, +0.00%
CodeSize: 25998700 -> 25989796 (-0.03%); split: -0.04%, +0.00%
Latency: 45514742 -> 45510330 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 8163529 -> 8162325 (-0.01%); split: -0.02%, +0.00%
VClause: 112105 -> 112104 (-0.00%); split: -0.00%, +0.00%
SClause: 109694 -> 109688 (-0.01%)
Copies: 372356 -> 372284 (-0.02%); split: -0.03%, +0.01%
Branches: 132636 -> 132633 (-0.00%)
PreVGPRs: 58997 -> 58979 (-0.03%); split: -0.03%, +0.00%
VALU: 3025662 -> 3024191 (-0.05%); split: -0.05%, +0.00%
SALU: 551712 -> 551714 (+0.00%); split: -0.00%, +0.00%

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>
2025-08-01 20:29:27 +00:00
Georg Lehmann
e43ef6533b nir/opt_algebraic: remove 8bit roundtrip when vectorizing i2i16(unpack_4x8(a).zw)
Explicit 16bit instructions are nicer to vectorize.

Helps FSR4 on GFX11 marginally.

Foz-DB Navi31:
Totals from 10 out of 14 FSR4 shaders:
Instrs: 59781 -> 58518 (-2.11%)
CodeSize: 413428 -> 404156 (-2.24%)
Latency: 193770 -> 190768 (-1.55%)
InvThroughput: 226274 -> 221628 (-2.05%)
VClause: 796 -> 793 (-0.38%); split: -1.01%, +0.63%
Copies: 3342 -> 3008 (-9.99%); split: -11.01%, +1.02%
PreSGPRs: 312 -> 305 (-2.24%)
VALU: 51448 -> 50213 (-2.40%)
SALU: 1074 -> 1048 (-2.42%)
VOPD: 1783 -> 1718 (-3.65%); split: +0.95%, -4.60%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36117>
2025-07-30 07:25:51 +00:00
Georg Lehmann
037c2532ab nir/opt_algebraic: create non 32bit bitfield_select
Foz-DB Navi21:
Totals from 68 (0.08% of 80255) affected shaders:
Instrs: 197878 -> 197709 (-0.09%); split: -0.09%, +0.00%
CodeSize: 1060700 -> 1060472 (-0.02%); split: -0.02%, +0.00%
Latency: 659865 -> 659673 (-0.03%); split: -0.03%, +0.00%
InvThroughput: 117010 -> 116985 (-0.02%); split: -0.03%, +0.00%
VClause: 3781 -> 3779 (-0.05%)
Copies: 15317 -> 15265 (-0.34%); split: -0.35%, +0.01%
PreVGPRs: 3251 -> 3250 (-0.03%)
VALU: 96800 -> 96799 (-0.00%); split: -0.00%, +0.00%
SALU: 57006 -> 56836 (-0.30%); split: -0.30%, +0.00%

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36141>
2025-07-21 20:42:32 +00:00
Alyssa Rosenzweig
421d0e0953 nir: mark exact fmul in ldexp lowering
this chain of fmul is deliberately chosen for floating point precision
reasons, it needs to be exact, or else we might try to reassociate it
and break subnormal handling.

avoids regressing dEQP-VK.glsl.builtin.precision.ldexp_subnormals.*

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36257>
2025-07-21 11:42:18 +00:00
Alyssa Rosenzweig
042adf3cc5 nir/opt_algebraic: optimize signed pow in Control
used in a post-processing shader which goes 896 instrs -> 749 instrs.

In my Control fossil:

   Totals from 2 (0.63% of 319) affected shaders:
   Instrs: 2078 -> 1841 (-11.41%)
   CodeSize: 14540 -> 12800 (-11.97%)
   ALU: 1779 -> 1626 (-8.60%)
   FSCIB: 1779 -> 1626 (-8.60%)
   Uniforms: 370 -> 372 (+0.54%)

In radv_fossils, there are affected shaders in Dredge.

   Totals from 4 (0.01% of 54019) affected shaders:
   Instrs: 2306 -> 2294 (-0.52%)
   CodeSize: 16594 -> 16534 (-0.36%)
   ALU: 2010 -> 2004 (-0.30%)
   FSCIB: 2010 -> 2004 (-0.30%)
   Uniforms: 1138 -> 1146 (+0.70%)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35989>
2025-07-08 17:09:16 +00:00
Alyssa Rosenzweig
2765017553 nir: fuse ffma even with float controls
The fmul+fadd -> fma rules in nir_opt_algebraic are marked imprecise,
because they are a contraction. However, they respect signed zero/Inf/NaN rules.
As such, it is legal to do this fusion with shader float controls as long as the
exact bit is not set (mapping to SPIR-V NoContract).

Unfortunately, NIR's imprecise rules do not distinguish between contraction
issues versus float special case issues, forcing nir_search to skip all
imprecise rules when any shader float control modes are used. This notably
affects DXVK, which sets shader float controls to get D3D11 float behaviour and
hence loses FMA fusing.

Therefore, we plumb in the exact bit to express NoContract independent of the
float controls, and weaken the requirement for fma fusion to allowable
contraction. For fma splitting, it's a similar issue, as inexact GLSL fma in
SPIR-V is just a multiply add that we're allowed to contract rather than the
real deal.

Drivers that use their own FMA fusing passes (notably, Intel and AMD) are
unaffected, but DXVK-capable drivers using fuse_ffma should like this. Results
on hk shown:

Totals from 2194 (4.06% of 54019) affected shaders:
MaxWaves: 2174272 -> 2175936 (+0.08%); split: +0.08%, -0.01%
Instrs: 1173283 -> 1131494 (-3.56%); split: -3.57%, +0.01%
CodeSize: 8568168 -> 8381724 (-2.18%); split: -2.18%, +0.01%
Spills: 1094 -> 747 (-31.72%)
Fills: 988 -> 681 (-31.07%)
Scratch: 4444 -> 3820 (-14.04%)
ALU: 953032 -> 913149 (-4.18%); split: -4.19%, +0.01%
FSCIB: 953032 -> 913149 (-4.18%); split: -4.19%, +0.01%
IC: 215398 -> 215274 (-0.06%)
GPRs: 139865 -> 139032 (-0.60%); split: -1.56%, +0.96%
Uniforms: 414886 -> 414466 (-0.10%); split: -0.14%, +0.04%
Preamble instrs: 646398 -> 644017 (-0.37%); split: -0.43%, +0.07%

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35989>
2025-07-08 17:09:16 +00:00
Georg Lehmann
045ddb992a nir/opt_algebraic: optimize 16bit vec2 comparison followed by b2i16 using usub_sat
Helps vectorized emulated fp16 -> fp8 conversions

No Foz-DB changes.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35876>
2025-07-03 20:08:39 +00:00
Alyssa Rosenzweig
4f7cae5e61 nir/opt_algebraic: add trichotomy identity
In https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35802 we will
significantly rework geometry shaders & transform feedback. In the new approach,
transform feedback is executed as part of the hardware vertex shader, meaning
the vertex shader needs to write out all the "copies" of the same value into
different parts of the XFB buffer. In the general case of a GS writing triangle
strips, we get 0-3 copies. This is good and lets us parallelize XFB better with
GS.

In the case of a VS alone with XFB, we insert a passthrough GS. In that case
special case, we can only get at most 1 copy, so if we can prove the length of
the output strip is 3 we can delete 2/3 of the shader.

Anyway, the only thing preventing NIR from doing that optimization is failing to
see through some conditionals, fixed by optimizing with the law of trichotomy.

We could add other variants of this pattern (signed vs unsigned, iand vs
ior/ixor) if we expect anything else to hit this other than my boutique use
case.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35802>
2025-06-30 16:24:04 +00:00
Emma Anholt
bc8994cb48 nir: Add a pass to reassociate multiplication of mat*mat*vec.
The typical case of mat4*mat4*vec4 is 80 scalar multiplications, but
mat4*(mat4*vec4) is only 32.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35622>
2025-06-23 17:49:51 +00:00
Georg Lehmann
f047a67fba nir,aco: optimize FP16_OFVL pattern created by vkd3d-proton
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35434>
2025-06-23 07:59:27 +00:00
Georg Lehmann
ad80b554f4 spirv: use feq for OpIsInf
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This effectively reverts fcca6a83cd because feq was clarified to be ordered
when used with exact and without fast math flags.

It's common for HW to only have free abs for floating point instructions.

Foz-DB Navi21:
Totals from 63 (0.08% of 80065) affected shaders:
Instrs: 337027 -> 336667 (-0.11%); split: -0.12%, +0.02%
CodeSize: 1846752 -> 1845000 (-0.09%); split: -0.13%, +0.03%
Latency: 3401087 -> 3400633 (-0.01%); split: -0.04%, +0.03%
InvThroughput: 847299 -> 845939 (-0.16%); split: -0.19%, +0.03%
VClause: 7693 -> 7694 (+0.01%)
Copies: 45175 -> 45240 (+0.14%); split: -0.12%, +0.27%
PreSGPRs: 3555 -> 3553 (-0.06%)
PreVGPRs: 4565 -> 4564 (-0.02%)
VALU: 225473 -> 225245 (-0.10%); split: -0.13%, +0.03%
SALU: 44735 -> 44625 (-0.25%)

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35437>
2025-06-11 18:34:21 +00:00
Lionel Landwerlin
978933c015 nir/opt_algebraic: extend lowering for (i|u)bitfield_extract
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35334>
2025-06-04 16:28:39 +00:00
Samuel Pitoiset
226b0e28db nir: generalize bitfield insert/extract sizes
Original patch from Alyssa Rosenzweig

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35209>
2025-06-04 09:37:53 +00:00
Alyssa Rosenzweig
759dc70bde nir: generalize bitfield_reverse bit size
No reason we can't reverse other bit sizes, we just need to generalize the
constant folding & bit size lowering.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35198>
2025-05-28 16:29:30 +00:00
Ian Romanick
37ee91679a nir/algebraic: Generalize an existing bfi(a, 0, ...) pattern
No shader-db changes on any Intel platform.

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 210561118 -> 210560921 (-0.00%)
Send messages: 10979615 -> 10979613 (-0.00%)
Cycle count: 31576352808 -> 31576347218 (-0.00%); split: -0.00%, +0.00%
Max live registers: 66068161 -> 66068157 (-0.00%)
Non SSA regs after NIR: 60230775 -> 60230949 (+0.00%)

Totals from 180 (0.03% of 707082) affected shaders:
Instrs: 68035 -> 67838 (-0.29%)
Send messages: 3190 -> 3188 (-0.06%)
Cycle count: 3979496 -> 3973906 (-0.14%); split: -0.14%, +0.00%
Max live registers: 11812 -> 11808 (-0.03%)
Non SSA regs after NIR: 18878 -> 19052 (+0.92%)

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34905>
2025-05-16 14:49:25 -07:00
Ian Romanick
464955bbdd nir/algebraic: Optimize some open-coded extract_i8
These were initially observed in Hogwarts Legacy while working on
something else entirely. Two compute shaders in that app are helped
for spills and fills. On Skylake, one of the shaders benefits from
this change, and the other is hurt pretty significantly.

About 40 vertex shaders in Shadow of the Tomb Raider were helped for
instructions.

v2: Use ~0xff instead of 0xffffff00 to ensure the patterns will work
properly with all bit sizes. Noticed by Georg.

v3: No, really, fix the various errors to ensure the patterns will work
properly with all bit sizes. Noticed by Georg.

No shader-db changes on any Intel platform.

fossil-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake)
Totals:
Instrs: 210566294 -> 210561118 (-0.00%)
Cycle count: 31582309052 -> 31576352808 (-0.02%); split: -0.02%, +0.00%
Spill count: 519300 -> 519280 (-0.00%)
Fill count: 625181 -> 625161 (-0.00%)
Scratch Memory Size: 36289536 -> 36281344 (-0.02%)
Max live registers: 66068413 -> 66068161 (-0.00%)
Non SSA regs after NIR: 60230773 -> 60230775 (+0.00%)

Totals from 1662 (0.24% of 707082) affected shaders:
Instrs: 635064 -> 629888 (-0.82%)
Cycle count: 36549632 -> 30593388 (-16.30%); split: -16.43%, +0.14%
Spill count: 246 -> 226 (-8.13%)
Fill count: 280 -> 260 (-7.14%)
Scratch Memory Size: 16384 -> 8192 (-50.00%)
Max live registers: 178491 -> 178239 (-0.14%)
Non SSA regs after NIR: 169552 -> 169554 (+0.00%)

Tiger Lake
Totals:
Instrs: 238544730 -> 238539407 (-0.00%)
Cycle count: 23679446097 -> 23673238578 (-0.03%); split: -0.03%, +0.00%
Max live registers: 42494925 -> 42494799 (-0.00%)
Non SSA regs after NIR: 63639071 -> 63639074 (+0.00%)

Totals from 1662 (0.21% of 802704) affected shaders:
Instrs: 626604 -> 621281 (-0.85%)
Cycle count: 26444363 -> 20236844 (-23.47%); split: -23.50%, +0.02%
Max live registers: 95405 -> 95279 (-0.13%)
Non SSA regs after NIR: 181150 -> 181153 (+0.00%)

Ice Lake
Totals:
Instrs: 238855310 -> 238826534 (-0.01%)
Cycle count: 24952257277 -> 24944589398 (-0.03%); split: -0.03%, +0.00%
Spill count: 575510 -> 575117 (-0.07%)
Fill count: 713007 -> 708632 (-0.61%)
Max live registers: 42499556 -> 42499432 (-0.00%)
Non SSA regs after NIR: 64388747 -> 64388750 (+0.00%)

Totals from 1662 (0.21% of 805149) affected shaders:
Instrs: 926887 -> 898111 (-3.10%)
Cycle count: 67025583 -> 59357704 (-11.44%); split: -11.45%, +0.01%
Spill count: 5168 -> 4775 (-7.60%)
Fill count: 32883 -> 28508 (-13.30%)
Max live registers: 95614 -> 95490 (-0.13%)
Non SSA regs after NIR: 181150 -> 181153 (+0.00%)

Skylake
Totals:
Instrs: 161904416 -> 161895239 (-0.01%); split: -0.01%, +0.00%
Cycle count: 20098067714 -> 20090767583 (-0.04%); split: -0.04%, +0.00%
Spill count: 525546 -> 525789 (+0.05%); split: -0.04%, +0.09%
Fill count: 603369 -> 602276 (-0.18%); split: -0.28%, +0.10%
Max live registers: 33895714 -> 33895590 (-0.00%)
Non SSA regs after NIR: 57348729 -> 57348730 (+0.00%)

Totals from 1655 (0.25% of 653734) affected shaders:
Instrs: 769979 -> 760802 (-1.19%); split: -1.83%, +0.64%
Cycle count: 51365416 -> 44065285 (-14.21%); split: -14.22%, +0.01%
Spill count: 4186 -> 4429 (+5.81%); split: -4.90%, +10.70%
Fill count: 16356 -> 15263 (-6.68%); split: -10.50%, +3.82%
Max live registers: 95115 -> 94991 (-0.13%)
Non SSA regs after NIR: 180797 -> 180798 (+0.00%)

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34905>
2025-05-16 14:49:05 -07:00
Georg Lehmann
0a30611c10 nir/opt_algebraic: some bitfield_select optimizations
Foz-DB Navi21:
Totals from 47 (0.06% of 79789) affected shaders:
Instrs: 69536 -> 69363 (-0.25%)
CodeSize: 370624 -> 369388 (-0.33%)
Latency: 383505 -> 383298 (-0.05%)
InvThroughput: 72924 -> 72727 (-0.27%)
PreSGPRs: 2618 -> 2610 (-0.31%)
VALU: 43261 -> 43091 (-0.39%)
SALU: 13065 -> 13063 (-0.02%)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34739>
2025-05-13 10:59:09 +00:00
Karol Herbst
f0fa2209a8 nir: add nir_opt_algebraic_integer_promotion
This handles basic operations where clang promotes integers to 32 bits
according to the C99 spec in OpenCL C source code.

This is its own opt_algerbraic pass, because we don't wanna fight with
nir_lower_bit_size.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34641>
2025-05-12 09:29:20 +00:00
Georg Lehmann
02e743c99e nir: add an option to lower bf2f and f2bf
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34768>
2025-05-09 11:20:25 +00:00
Rhys Perry
f538cae743 nir/algebraic: optimize ior(unpack_4x8, unpack_4x8<<8) to unpack_32_2x16
No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34162>
2025-05-08 13:30:50 +00:00
Christian Gmeiner
f17d350001 lima: Move fdot lowering from NIR to lima
This change relocates the fdot lowering from the generic NIR to the lima,
since lima is the only consumer of this particular lowering. This avoids
potential conflicts with the similar fdot lowering already present in
nir_lower_alu_width.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34757>
2025-04-30 17:33:38 +00:00
Georg Lehmann
3e26fc4498 nir/opt_algebraic: disable fsat(a + 1.0) opt if a can be NaN
Foz-DB Navi21:
Totals from 9 (0.01% of 79789) affected shaders:
Instrs: 6782 -> 6796 (+0.21%); split: -0.03%, +0.24%
CodeSize: 40020 -> 40108 (+0.22%); split: -0.04%, +0.26%
Latency: 23764 -> 23758 (-0.03%)
InvThroughput: 6424 -> 6431 (+0.11%); split: -0.08%, +0.19%
SClause: 273 -> 275 (+0.73%)
Copies: 338 -> 339 (+0.30%)
VALU: 5138 -> 5147 (+0.18%); split: -0.06%, +0.23%
SALU: 349 -> 350 (+0.29%)
SMEM: 498 -> 500 (+0.40%)

Fixes: a4a3487aae ("nir/opt_algebraic: optimize patterns from Skia")
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34125>
2025-04-22 14:23:05 +00:00
Georg Lehmann
8ad695195e nir/opt_algebraic: turn exact fmin(1.0, a) into fsat if a is not NaN and not negative
Foz-DB Navi21:
Totals from 2456 (3.08% of 79789) affected shaders:
Instrs: 3415398 -> 3413352 (-0.06%); split: -0.06%, +0.00%
CodeSize: 18781096 -> 18776092 (-0.03%); split: -0.03%, +0.00%
VGPRs: 158512 -> 158528 (+0.01%)
Latency: 39528900 -> 39526687 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 10612237 -> 10609296 (-0.03%); split: -0.03%, +0.00%
VClause: 71028 -> 71034 (+0.01%)
SClause: 93971 -> 93975 (+0.00%); split: -0.00%, +0.01%
Copies: 257525 -> 257521 (-0.00%); split: -0.01%, +0.01%
VALU: 2483374 -> 2481325 (-0.08%); split: -0.09%, +0.00%
SALU: 348207 -> 348211 (+0.00%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34125>
2025-04-22 14:23:04 +00:00
Georg Lehmann
18a0de1834 nir/opt_algebraic: optimize fmax(ffma(a, b, c), 0.0) to fsat
Foz-DB Navi21:
Totals from 2621 (3.28% of 79789) affected shaders:
MaxWaves: 55744 -> 55736 (-0.01%)
Instrs: 2840180 -> 2832647 (-0.27%); split: -0.27%, +0.00%
CodeSize: 15497364 -> 15464692 (-0.21%); split: -0.21%, +0.00%
VGPRs: 138448 -> 138456 (+0.01%)
Latency: 22319512 -> 22307018 (-0.06%); split: -0.06%, +0.01%
InvThroughput: 5745108 -> 5729197 (-0.28%); split: -0.28%, +0.00%
Copies: 110279 -> 110268 (-0.01%); split: -0.04%, +0.03%
VALU: 2210578 -> 2203211 (-0.33%); split: -0.33%, +0.00%
SALU: 169014 -> 168841 (-0.10%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34125>
2025-04-22 14:23:04 +00:00
Georg Lehmann
f71fc26393 nir/opt_algebraic: generalize fmax(fadd(a, b), 0.0) to fsat by not requiring fneg
Not a large effect, but it's positive and makes the pattern simpler.

Foz-DB Navi21:
Totals from 1 (0.00% of 79789) affected shaders:
Instrs: 145 -> 138 (-4.83%)
CodeSize: 784 -> 756 (-3.57%)
Latency: 1495 -> 1487 (-0.54%)
InvThroughput: 210 -> 196 (-6.67%)
VALU: 103 -> 96 (-6.80%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34125>
2025-04-22 14:23:04 +00:00
Ian Romanick
1d2ebeca17 nir/algebraic: Allow fmin(a,a) optimization when flush denorm to zero is not set
I was surprised this had any affect on Intel GPUs because we have been
unconditionally performing this optimization in the backend since June
2014.

Once that error is fixed (later in this MR), this change prevents a
couple dozen regressions in shader-db and around 90 regressions in
fossil-db. Many of the regressions in fossil-db were loss of SIMD32, and
that can be a big deal.

v2: Add 64-bit too. Suggested by Alyssa.

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 16970141 -> 16970139 (<.01%)
instructions in affected programs: 40 -> 38 (-5.00%)
helped: 2 / HURT: 0

total cycles in shared programs: 914617580 -> 914617548 (<.01%)
cycles in affected programs: 3428 -> 3396 (-0.93%)
helped: 2 / HURT: 0

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Cycle count: 30546028462 -> 30546025224 (-0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 237017827 -> 237017731 (-0.00%)

Totals from 83 (0.01% of 706657) affected shaders:
Cycle count: 3042978 -> 3039740 (-0.11%); split: -0.13%, +0.02%
Non SSA regs after NIR: 78997 -> 78901 (-0.12%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34192>
2025-04-15 23:59:31 +00:00
Georg Lehmann
d046ecf95a nir/opt_algebraic: optimize open coded ffract
Foz-DB Navi21:
Totals from 274 (0.34% of 79789) affected shaders:
Instrs: 522630 -> 522181 (-0.09%); split: -0.09%, +0.01%
CodeSize: 2880668 -> 2878940 (-0.06%); split: -0.07%, +0.01%
VGPRs: 14488 -> 14464 (-0.17%)
Latency: 4092358 -> 4091243 (-0.03%); split: -0.04%, +0.01%
InvThroughput: 1014148 -> 1013471 (-0.07%); split: -0.07%, +0.00%
VClause: 11646 -> 11639 (-0.06%)
SClause: 18614 -> 18611 (-0.02%)
Copies: 56248 -> 56309 (+0.11%); split: -0.05%, +0.16%
PreVGPRs: 13649 -> 13647 (-0.01%)
VALU: 359733 -> 359285 (-0.12%); split: -0.13%, +0.01%
SALU: 59719 -> 59720 (+0.00%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33369>
2025-04-11 12:36:02 +00:00
Marek Olšák
1d5c42528b nir/opt_algebraic: lower 16-bit imul_high & umul_high
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34016>
2025-04-07 19:44:22 +00:00
Georg Lehmann
2b1fc1a7fe nir: add option to keep mul24_relaxed
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33871>
2025-03-27 06:24:15 +00:00
Georg Lehmann
b386659588 nir/opt_algebraic: create ubfe from (a & mask) >> c
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Foz-DB Navi21:
Totals from 917 (1.16% of 79188) affected shaders:
Instrs: 2549482 -> 2544997 (-0.18%); split: -0.18%, +0.00%
CodeSize: 13781648 -> 13763616 (-0.13%); split: -0.13%, +0.00%
Latency: 24832087 -> 24825199 (-0.03%); split: -0.04%, +0.01%
InvThroughput: 5921339 -> 5914799 (-0.11%); split: -0.12%, +0.01%
VClause: 59910 -> 59898 (-0.02%); split: -0.02%, +0.00%
SClause: 62294 -> 62293 (-0.00%)
Copies: 221015 -> 220988 (-0.01%); split: -0.02%, +0.01%
VALU: 1717280 -> 1713332 (-0.23%); split: -0.23%, +0.00%
SALU: 359390 -> 358910 (-0.13%)
VMEM: 101966 -> 101924 (-0.04%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33455>
2025-03-14 11:15:04 +00:00
Georg Lehmann
d272a6e261 nir/opt_algebraic: optimize d3d a ? b : 0
Foz-DB Navi21:
Totals from 3466 (4.34% of 79789) affected shaders:
MaxWaves: 73163 -> 73161 (-0.00%); split: +0.02%, -0.02%
Instrs: 3993862 -> 3987633 (-0.16%); split: -0.19%, +0.04%
CodeSize: 21747420 -> 21725620 (-0.10%); split: -0.15%, +0.05%
VGPRs: 190736 -> 190728 (-0.00%); split: -0.04%, +0.03%
SpillSGPRs: 489 -> 478 (-2.25%); split: -2.86%, +0.61%
Latency: 48169718 -> 48159068 (-0.02%); split: -0.05%, +0.02%
InvThroughput: 12132999 -> 12128721 (-0.04%); split: -0.05%, +0.01%
VClause: 78063 -> 78052 (-0.01%); split: -0.09%, +0.08%
SClause: 109095 -> 108996 (-0.09%); split: -0.13%, +0.04%
Copies: 265784 -> 264530 (-0.47%); split: -0.72%, +0.25%
Branches: 84533 -> 84553 (+0.02%)
PreSGPRs: 172577 -> 172531 (-0.03%); split: -0.19%, +0.16%
PreVGPRs: 165776 -> 165825 (+0.03%); split: -0.06%, +0.09%
VALU: 2851544 -> 2850426 (-0.04%); split: -0.08%, +0.04%
SALU: 413543 -> 408408 (-1.24%); split: -1.45%, +0.21%
VMEM: 139890 -> 139887 (-0.00%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>
2025-03-01 07:49:28 +00:00
Georg Lehmann
2e7f34af6b nir/opt_algebraic: optimize more ine/ieq(umin(b2i, ), 0)
Foz-DB Navi21:
Totals from 76 (0.10% of 79789) affected shaders:
MaxWaves: 1050 -> 1062 (+1.14%)
Instrs: 113754 -> 113691 (-0.06%); split: -0.11%, +0.06%
CodeSize: 605096 -> 605216 (+0.02%); split: -0.03%, +0.05%
VGPRs: 6024 -> 5976 (-0.80%)
Latency: 1776501 -> 1777519 (+0.06%); split: -0.06%, +0.12%
InvThroughput: 379644 -> 376751 (-0.76%)
SClause: 2132 -> 2134 (+0.09%)
Copies: 4131 -> 4128 (-0.07%); split: -1.77%, +1.69%
PreSGPRs: 4275 -> 4270 (-0.12%)
PreVGPRs: 5568 -> 5526 (-0.75%)
VALU: 86732 -> 86581 (-0.17%); split: -0.24%, +0.07%
SALU: 7112 -> 7198 (+1.21%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>
2025-03-01 07:49:28 +00:00
Georg Lehmann
7bc3062a3b nir/opt_algebraic: push comparisons with constants into bcsel with constant
Foz-DB Navi21:
Totals from 1657 (2.08% of 79789) affected shaders:
MaxWaves: 30275 -> 30261 (-0.05%); split: +0.01%, -0.05%
Instrs: 3316251 -> 3315701 (-0.02%); split: -0.04%, +0.02%
CodeSize: 17831924 -> 17832020 (+0.00%); split: -0.06%, +0.06%
SpillSGPRs: 815 -> 859 (+5.40%)
SpillVGPRs: 3335 -> 3293 (-1.26%)
Scratch: 231424 -> 230400 (-0.44%)
Latency: 33413310 -> 33402751 (-0.03%); split: -0.04%, +0.01%
InvThroughput: 9116062 -> 9112904 (-0.03%); split: -0.04%, +0.00%
VClause: 65587 -> 65560 (-0.04%); split: -0.05%, +0.01%
SClause: 86208 -> 86261 (+0.06%); split: -0.02%, +0.08%
Copies: 356158 -> 356439 (+0.08%); split: -0.07%, +0.15%
PreSGPRs: 101710 -> 101806 (+0.09%); split: -0.01%, +0.11%
PreVGPRs: 89293 -> 89286 (-0.01%); split: -0.04%, +0.04%
VALU: 2220900 -> 2218839 (-0.09%); split: -0.11%, +0.01%
SALU: 472988 -> 474567 (+0.33%); split: -0.08%, +0.42%
VMEM: 118401 -> 118347 (-0.05%)
SMEM: 123597 -> 123592 (-0.00%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>
2025-03-01 07:49:27 +00:00
Georg Lehmann
3837bc6d16 nir/opt_algebraic: optimize ~a == ~b and ~a == #b
Foz-DB Navi21:
Totals from 2 (0.00% of 79789) affected shaders:
Instrs: 8343 -> 8323 (-0.24%)
CodeSize: 43884 -> 43764 (-0.27%)
Latency: 19390 -> 19363 (-0.14%)
InvThroughput: 3380 -> 3356 (-0.71%)
VALU: 5413 -> 5393 (-0.37%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>
2025-03-01 07:49:27 +00:00
Georg Lehmann
8759223498 nir/opt_algebraic: optimize b2i/b2f comparision with non 0/1 constants
Foz-DB Navi21:
Totals from 28 (0.04% of 79789) affected shaders:
MaxWaves: 732 -> 728 (-0.55%)
Instrs: 23425 -> 22559 (-3.70%)
CodeSize: 137740 -> 132292 (-3.96%)
VGPRs: 1128 -> 1144 (+1.42%)
Latency: 94604 -> 92423 (-2.31%)
InvThroughput: 19166 -> 18814 (-1.84%); split: -2.38%, +0.54%
VClause: 429 -> 423 (-1.40%)
SClause: 937 -> 926 (-1.17%)
Copies: 1199 -> 914 (-23.77%); split: -24.52%, +0.75%
Branches: 451 -> 421 (-6.65%)
PreSGPRs: 1043 -> 996 (-4.51%)
PreVGPRs: 992 -> 973 (-1.92%); split: -3.53%, +1.61%
VALU: 17566 -> 16865 (-3.99%)
SALU: 1254 -> 1157 (-7.74%)
VMEM: 619 -> 609 (-1.62%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>
2025-03-01 07:49:27 +00:00
Georg Lehmann
2bfcfef5da nir/opt_algebraic: optimize bcsel of b2f and constants
Foz-DB Navi21:
Totals from 212 (0.27% of 79789) affected shaders:
MaxWaves: 4024 -> 4030 (+0.15%)
Instrs: 1314134 -> 1313894 (-0.02%); split: -0.03%, +0.02%
CodeSize: 7033216 -> 7026888 (-0.09%); split: -0.10%, +0.01%
VGPRs: 14224 -> 14176 (-0.34%)
Latency: 7402062 -> 7399180 (-0.04%); split: -0.06%, +0.02%
InvThroughput: 1724879 -> 1723773 (-0.06%); split: -0.07%, +0.00%
VClause: 37741 -> 37711 (-0.08%); split: -0.11%, +0.03%
SClause: 29266 -> 29268 (+0.01%); split: -0.01%, +0.01%
Copies: 123810 -> 123786 (-0.02%); split: -0.19%, +0.17%
Branches: 42370 -> 42407 (+0.09%); split: -0.03%, +0.11%
PreSGPRs: 13149 -> 13196 (+0.36%); split: -0.05%, +0.40%
PreVGPRs: 12407 -> 12395 (-0.10%)
VALU: 884471 -> 883475 (-0.11%); split: -0.12%, +0.01%
SALU: 177671 -> 178408 (+0.41%); split: -0.03%, +0.45%

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>
2025-03-01 07:49:27 +00:00
Georg Lehmann
b90826736d nir/opt_algebraic: optimize bit_count(a) != 0
vkd3d-proton will emit
b = ballot(!gl_HelperInvocation);
(subgroupBallotBitCount(b) != 0u) ? subgroupShuffle(a, subgroupBallotFindLSB(b)) : 0u;

for WaveReadFirstLane(a) in fragment shaders

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33808>
2025-02-28 18:03:04 +00:00
Georg Lehmann
a237a3def8 nir/opt_algebraic: optimize b2i(a) != -b2i(b)
Foz-DB Navi21:
Totals from 4 (0.01% of 79377) affected shaders:
Instrs: 881 -> 861 (-2.27%)
CodeSize: 4968 -> 4836 (-2.66%)
Latency: 6127 -> 6006 (-1.97%)
InvThroughput: 1128 -> 1068 (-5.32%)
VALU: 564 -> 534 (-5.32%)
SALU: 111 -> 121 (+9.01%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33498>
2025-02-25 20:38:09 +00:00
Georg Lehmann
4141043295 nir/opt_algebraic: optimize constant shift of DXBC booleans
Can be combined with further iand.

Foz-DB Navi21:
Totals from 190 (0.24% of 79377) affected shaders:
Instrs: 100628 -> 100225 (-0.40%); split: -0.41%, +0.01%
CodeSize: 567828 -> 565884 (-0.34%); split: -0.35%, +0.00%
Latency: 968415 -> 968052 (-0.04%); split: -0.09%, +0.06%
InvThroughput: 285804 -> 285210 (-0.21%); split: -0.25%, +0.04%
VClause: 1959 -> 1958 (-0.05%)
Copies: 5696 -> 5711 (+0.26%)
PreSGPRs: 7567 -> 7569 (+0.03%)
VALU: 77161 -> 76751 (-0.53%); split: -0.54%, +0.01%
SALU: 7831 -> 7840 (+0.11%); split: -0.09%, +0.20%

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33498>
2025-02-25 20:38:09 +00:00
Georg Lehmann
1e522e7d75 nir/opt_algebraic: optimize dxbc boolean not
Foz-DB Navi21:
Totals from 237 (0.30% of 79377) affected shaders:
Instrs: 486690 -> 486146 (-0.11%); split: -0.11%, +0.00%
CodeSize: 2629516 -> 2626052 (-0.13%); split: -0.13%, +0.00%
VGPRs: 18744 -> 18736 (-0.04%)
Latency: 7404763 -> 7399806 (-0.07%); split: -0.07%, +0.01%
InvThroughput: 1800282 -> 1798388 (-0.11%); split: -0.11%, +0.00%
VClause: 12101 -> 12106 (+0.04%); split: -0.01%, +0.05%
Copies: 34225 -> 34170 (-0.16%); split: -0.21%, +0.05%
PreSGPRs: 14634 -> 14639 (+0.03%)
PreVGPRs: 16713 -> 16706 (-0.04%)
VALU: 317523 -> 316693 (-0.26%); split: -0.26%, +0.00%
SALU: 53814 -> 54097 (+0.53%); split: -0.38%, +0.90%

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33498>
2025-02-25 20:38:09 +00:00
Georg Lehmann
f9722e35be nir/opt_algebraic: optimize more boolean bcsel with constants
Foz-DB Navi21:
Totals from 667 (0.84% of 79377) affected shaders:
Instrs: 3890980 -> 3886878 (-0.11%); split: -0.11%, +0.00%
CodeSize: 21088576 -> 21065848 (-0.11%); split: -0.11%, +0.00%
SpillSGPRs: 458 -> 446 (-2.62%); split: -3.49%, +0.87%
Latency: 26160728 -> 26162856 (+0.01%); split: -0.02%, +0.02%
InvThroughput: 6999254 -> 7000593 (+0.02%); split: -0.01%, +0.03%
VClause: 103745 -> 103743 (-0.00%)
SClause: 93113 -> 93109 (-0.00%)
Copies: 344097 -> 344794 (+0.20%); split: -0.05%, +0.25%
Branches: 134546 -> 134764 (+0.16%); split: -0.01%, +0.17%
PreSGPRs: 40677 -> 40298 (-0.93%); split: -0.93%, +0.00%
PreVGPRs: 40185 -> 40190 (+0.01%)
VALU: 2584477 -> 2584468 (-0.00%); split: -0.00%, +0.00%
SALU: 573587 -> 569353 (-0.74%); split: -0.75%, +0.01%
SMEM: 124794 -> 124790 (-0.00%)

v2 (idr): Remove a pattern that is made redundant by this commit
combined with the previous commit.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33498>
2025-02-25 20:38:09 +00:00
Georg Lehmann
9785fa460c nir/opt_algebraic: optimize DXBC boolean bcsel
Foz-DB Navi21:
Totals from 1749 (2.20% of 79377) affected shaders:
Instrs: 1695408 -> 1685149 (-0.61%); split: -0.68%, +0.07%
CodeSize: 9241312 -> 9174180 (-0.73%); split: -0.79%, +0.06%
VGPRs: 90688 -> 90664 (-0.03%); split: -0.04%, +0.01%
SpillSGPRs: 278 -> 298 (+7.19%)
Latency: 9560167 -> 9540386 (-0.21%); split: -0.29%, +0.08%
InvThroughput: 2236022 -> 2220411 (-0.70%); split: -0.72%, +0.02%
VClause: 29910 -> 29917 (+0.02%)
Copies: 146365 -> 145230 (-0.78%); split: -1.03%, +0.25%
Branches: 59545 -> 59560 (+0.03%)
PreSGPRs: 78858 -> 79242 (+0.49%); split: -0.10%, +0.59%
PreVGPRs: 78643 -> 78560 (-0.11%); split: -0.11%, +0.00%
VALU: 1127861 -> 1113990 (-1.23%); split: -1.24%, +0.01%
SALU: 249535 -> 253237 (+1.48%); split: -0.15%, +1.63%

v2 (idr): Remove a pattern that is now redundant.

v3 (idr): Don't undistribute ineg from bcsel. On platforms where ineg
is a free source modifier, this can be harmful.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33498>
2025-02-25 20:38:09 +00:00
Georg Lehmann
674d970861 nir/opt_algebraic: 0 >= a -> 0 == a
Foz-DB Navi21:
Totals from 2179 (2.75% of 79377) affected shaders:
MaxWaves: 40987 -> 40917 (-0.17%); split: +0.00%, -0.18%
Instrs: 5950981 -> 5949310 (-0.03%); split: -0.04%, +0.01%
CodeSize: 32120808 -> 32110328 (-0.03%); split: -0.04%, +0.00%
VGPRs: 141704 -> 141768 (+0.05%); split: -0.01%, +0.05%
SpillSGPRs: 1750 -> 1746 (-0.23%)
Latency: 56667295 -> 56562916 (-0.18%); split: -0.19%, +0.00%
InvThroughput: 13292128 -> 13288691 (-0.03%); split: -0.03%, +0.00%
VClause: 151845 -> 151755 (-0.06%); split: -0.06%, +0.00%
SClause: 172316 -> 172443 (+0.07%); split: -0.02%, +0.09%
Copies: 458724 -> 458951 (+0.05%); split: -0.08%, +0.13%
Branches: 195239 -> 195351 (+0.06%); split: -0.00%, +0.06%
PreSGPRs: 135304 -> 135317 (+0.01%); split: -0.01%, +0.02%
PreVGPRs: 122430 -> 122428 (-0.00%); split: -0.01%, +0.01%
VALU: 3924585 -> 3924062 (-0.01%); split: -0.02%, +0.01%
SALU: 820666 -> 819414 (-0.15%); split: -0.17%, +0.02%
SMEM: 247036 -> 247142 (+0.04%); split: -0.00%, +0.04%

v2 (idr): Remove a pattern that is now redundant. This was originaly
removed in a commit later in the MR.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33498>
2025-02-25 20:38:09 +00:00
Georg Lehmann
000f14f7fd nir/opt_algebraic: optimize ineg(a) == #b
No Foz-DB changes.

v2 (idr): Remove some patterns that are now redundant. These were
originally removed in a commit later in the MR.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33498>
2025-02-25 20:38:08 +00:00
Georg Lehmann
3e4ac92298 nir/opt_algebraic: optimize ineg(a) == ineg(b)
DXBC boolean cleanup.

Foz-DB Navi21:
Totals from 19 (0.02% of 79188) affected shaders:
Instrs: 9720 -> 9652 (-0.70%)
CodeSize: 54056 -> 53640 (-0.77%)
Latency: 95357 -> 94377 (-1.03%); split: -1.03%, +0.00%
InvThroughput: 17331 -> 16939 (-2.26%)
Copies: 604 -> 605 (+0.17%)
PreSGPRs: 832 -> 838 (+0.72%)
PreVGPRs: 701 -> 699 (-0.29%)
VALU: 6551 -> 6485 (-1.01%)
SALU: 893 -> 891 (-0.22%); split: -1.68%, +1.46%

v2 (idr): Remove a pattern that is now redundant. The version without
ineg already exists much earlier in the file. Search for b2iN.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33498>
2025-02-25 20:38:08 +00:00
Georg Lehmann
e0cebac14f nir/opt_algebraic: optimize b2f(a != 0) * a
Just D3D9 things.

Foz-DB Navi21:
Totals from 137 (0.17% of 79377) affected shaders:
MaxWaves: 3366 -> 3370 (+0.12%); split: +0.24%, -0.12%
Instrs: 76462 -> 72091 (-5.72%)
CodeSize: 411584 -> 380792 (-7.48%)
Latency: 279472 -> 275505 (-1.42%); split: -2.01%, +0.59%
InvThroughput: 71311 -> 65369 (-8.33%)
VClause: 1587 -> 1612 (+1.58%); split: -1.01%, +2.58%
SClause: 1111 -> 1105 (-0.54%); split: -1.08%, +0.54%
Copies: 5621 -> 5602 (-0.34%); split: -1.39%, +1.05%
PreSGPRs: 5266 -> 5241 (-0.47%); split: -0.51%, +0.04%
PreVGPRs: 4249 -> 4236 (-0.31%); split: -0.35%, +0.05%
VALU: 50049 -> 45901 (-8.29%)
SALU: 8948 -> 8818 (-1.45%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33674>
2025-02-24 16:34:53 +00:00
Ian Romanick
15544ed858 nir/algebraic: Undistribute b2i from logic-ops
shader-db:
All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 16973309 -> 16973173 (<.01%)
instructions in affected programs: 13780 -> 13644 (-0.99%)
helped: 31 / HURT: 0

total cycles in shared programs: 915620550 -> 915618604 (<.01%)
cycles in affected programs: 185962 -> 184016 (-1.05%)
helped: 30 / HURT: 1

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 209748003 -> 209745278 (-0.00%)
Cycle count: 30514920400 -> 30514716506 (-0.00%); split: -0.00%, +0.00%
Max live registers: 65477183 -> 65477584 (+0.00%)
Non SSA regs after NIR: 237334710 -> 237333632 (-0.00%)

Totals from 1257 (0.18% of 706651) affected shaders:
Instrs: 693039 -> 690314 (-0.39%)
Cycle count: 39792504 -> 39588610 (-0.51%); split: -0.97%, +0.46%
Max live registers: 194170 -> 194571 (+0.21%)
Non SSA regs after NIR: 821978 -> 820900 (-0.13%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33648>
2025-02-21 00:01:11 +00:00
Ian Romanick
a48a044cf6 nir/algebraic: Simplify equality comparisons of b2T with 1 or 0
Adding the b2i(a) == 1 and b2i(a) != 1 patterns also helps prevent
regressions when spurious negations are removed from integer equality
comparisons, as is done in !33498.

v2: Make all variables part of the iteration instead of calculating some
of them. Suggested by Alyssa.

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 16973331 -> 16973309 (<.01%)
instructions in affected programs: 266 -> 244 (-8.27%)
helped: 2 / HURT: 0

total cycles in shared programs: 915620774 -> 915620550 (<.01%)
cycles in affected programs: 4360 -> 4136 (-5.14%)
helped: 2 / HURT: 0

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 209748011 -> 209748003 (-0.00%)
Cycle count: 30514920286 -> 30514920400 (+0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 237334726 -> 237334710 (-0.00%)

Totals from 8 (0.00% of 706651) affected shaders:
Instrs: 16956 -> 16948 (-0.05%)
Cycle count: 261052 -> 261166 (+0.04%); split: -0.92%, +0.96%
Non SSA regs after NIR: 20000 -> 19984 (-0.08%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33648>
2025-02-21 00:01:11 +00:00
Ian Romanick
3f39d8f4ff nir/algebraic: Optimize zero comparisons of umax or umin
I observered some of the existing patterns stopped being applied after
some of the ult-to-ieq optimizations in !33498. It turns out that these
patterns occur even without those changes.

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 16973339 -> 16973331 (<.01%)
instructions in affected programs: 7977 -> 7969 (-0.10%)
helped: 2 / HURT: 0

total cycles in shared programs: 915620938 -> 915620774 (<.01%)
cycles in affected programs: 136022 -> 135858 (-0.12%)
helped: 2 / HURT: 0

fossil-db:

Lunar Lake
Totals:
Instrs: 209748173 -> 209748011 (-0.00%); split: -0.00%, +0.00%
Cycle count: 30514361348 -> 30514920286 (+0.00%); split: -0.00%, +0.00%
Spill count: 511813 -> 511808 (-0.00%)
Fill count: 622537 -> 622533 (-0.00%)
Max live registers: 65477033 -> 65477183 (+0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 237334728 -> 237334726 (-0.00%); split: -0.00%, +0.00%

Totals from 26 (0.00% of 706651) affected shaders:
Instrs: 332073 -> 331911 (-0.05%); split: -0.05%, +0.00%
Cycle count: 959758560 -> 960317498 (+0.06%); split: -0.03%, +0.09%
Spill count: 10293 -> 10288 (-0.05%)
Fill count: 23784 -> 23780 (-0.02%)
Max live registers: 9682 -> 9832 (+1.55%); split: -0.08%, +1.63%
Non SSA regs after NIR: 232135 -> 232133 (-0.00%); split: -0.03%, +0.03%

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 233538532 -> 233536113 (-0.00%); split: -0.00%, +0.00%
Cycle count: 24428142259 -> 24426705655 (-0.01%); split: -0.01%, +0.00%
Spill count: 513128 -> 512923 (-0.04%)
Fill count: 557329 -> 557108 (-0.04%)
Max live registers: 42129806 -> 42129881 (+0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 256711720 -> 256711718 (-0.00%); split: -0.00%, +0.00%

Totals from 26 (0.00% of 805759) affected shaders:
Instrs: 325629 -> 323210 (-0.74%); split: -0.74%, +0.00%
Cycle count: 893896782 -> 892460178 (-0.16%); split: -0.21%, +0.05%
Spill count: 10467 -> 10262 (-1.96%)
Fill count: 24291 -> 24070 (-0.91%)
Max live registers: 4946 -> 5021 (+1.52%); split: -0.08%, +1.60%
Non SSA regs after NIR: 232980 -> 232978 (-0.00%); split: -0.03%, +0.03%

Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 237289818 -> 237289714 (-0.00%); split: -0.00%, +0.00%
Cycle count: 22959586058 -> 22960049302 (+0.00%); split: -0.00%, +0.00%
Max live registers: 42182257 -> 42182337 (+0.00%)
Non SSA regs after NIR: 255579974 -> 255579970 (-0.00%); split: -0.00%, +0.00%

Totals from 23 (0.00% of 802019) affected shaders:
Instrs: 27051 -> 26947 (-0.38%); split: -0.39%, +0.01%
Cycle count: 10545917 -> 11009161 (+4.39%); split: -0.09%, +4.49%
Max live registers: 2198 -> 2278 (+3.64%)
Non SSA regs after NIR: 31741 -> 31737 (-0.01%); split: -0.20%, +0.19%

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33648>
2025-02-21 00:01:11 +00:00