Commit graph

624 commits

Author SHA1 Message Date
Ian Romanick
37ee91679a nir/algebraic: Generalize an existing bfi(a, 0, ...) pattern
No shader-db changes on any Intel platform.

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 210561118 -> 210560921 (-0.00%)
Send messages: 10979615 -> 10979613 (-0.00%)
Cycle count: 31576352808 -> 31576347218 (-0.00%); split: -0.00%, +0.00%
Max live registers: 66068161 -> 66068157 (-0.00%)
Non SSA regs after NIR: 60230775 -> 60230949 (+0.00%)

Totals from 180 (0.03% of 707082) affected shaders:
Instrs: 68035 -> 67838 (-0.29%)
Send messages: 3190 -> 3188 (-0.06%)
Cycle count: 3979496 -> 3973906 (-0.14%); split: -0.14%, +0.00%
Max live registers: 11812 -> 11808 (-0.03%)
Non SSA regs after NIR: 18878 -> 19052 (+0.92%)

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34905>
2025-05-16 14:49:25 -07:00
Ian Romanick
464955bbdd nir/algebraic: Optimize some open-coded extract_i8
These were initially observed in Hogwarts Legacy while working on
something else entirely. Two compute shaders in that app are helped
for spills and fills. On Skylake, one of the shaders benefits from
this change, and the other is hurt pretty significantly.

About 40 vertex shaders in Shadow of the Tomb Raider were helped for
instructions.

v2: Use ~0xff instead of 0xffffff00 to ensure the patterns will work
properly with all bit sizes. Noticed by Georg.

v3: No, really, fix the various errors to ensure the patterns will work
properly with all bit sizes. Noticed by Georg.

No shader-db changes on any Intel platform.

fossil-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Instrs: 210566294 -> 210561118 (-0.00%)
Cycle count: 31582309052 -> 31576352808 (-0.02%); split: -0.02%, +0.00%
Spill count: 519300 -> 519280 (-0.00%)
Fill count: 625181 -> 625161 (-0.00%)
Scratch Memory Size: 36289536 -> 36281344 (-0.02%)
Max live registers: 66068413 -> 66068161 (-0.00%)
Non SSA regs after NIR: 60230773 -> 60230775 (+0.00%)

Totals from 1662 (0.24% of 707082) affected shaders:
Instrs: 635064 -> 629888 (-0.82%)
Cycle count: 36549632 -> 30593388 (-16.30%); split: -16.43%, +0.14%
Spill count: 246 -> 226 (-8.13%)
Fill count: 280 -> 260 (-7.14%)
Scratch Memory Size: 16384 -> 8192 (-50.00%)
Max live registers: 178491 -> 178239 (-0.14%)
Non SSA regs after NIR: 169552 -> 169554 (+0.00%)

Tiger Lake
Totals:
Instrs: 238544730 -> 238539407 (-0.00%)
Cycle count: 23679446097 -> 23673238578 (-0.03%); split: -0.03%, +0.00%
Max live registers: 42494925 -> 42494799 (-0.00%)
Non SSA regs after NIR: 63639071 -> 63639074 (+0.00%)

Totals from 1662 (0.21% of 802704) affected shaders:
Instrs: 626604 -> 621281 (-0.85%)
Cycle count: 26444363 -> 20236844 (-23.47%); split: -23.50%, +0.02%
Max live registers: 95405 -> 95279 (-0.13%)
Non SSA regs after NIR: 181150 -> 181153 (+0.00%)

Ice Lake
Totals:
Instrs: 238855310 -> 238826534 (-0.01%)
Cycle count: 24952257277 -> 24944589398 (-0.03%); split: -0.03%, +0.00%
Spill count: 575510 -> 575117 (-0.07%)
Fill count: 713007 -> 708632 (-0.61%)
Max live registers: 42499556 -> 42499432 (-0.00%)
Non SSA regs after NIR: 64388747 -> 64388750 (+0.00%)

Totals from 1662 (0.21% of 805149) affected shaders:
Instrs: 926887 -> 898111 (-3.10%)
Cycle count: 67025583 -> 59357704 (-11.44%); split: -11.45%, +0.01%
Spill count: 5168 -> 4775 (-7.60%)
Fill count: 32883 -> 28508 (-13.30%)
Max live registers: 95614 -> 95490 (-0.13%)
Non SSA regs after NIR: 181150 -> 181153 (+0.00%)

Skylake
Totals:
Instrs: 161904416 -> 161895239 (-0.01%); split: -0.01%, +0.00%
Cycle count: 20098067714 -> 20090767583 (-0.04%); split: -0.04%, +0.00%
Spill count: 525546 -> 525789 (+0.05%); split: -0.04%, +0.09%
Fill count: 603369 -> 602276 (-0.18%); split: -0.28%, +0.10%
Max live registers: 33895714 -> 33895590 (-0.00%)
Non SSA regs after NIR: 57348729 -> 57348730 (+0.00%)

Totals from 1655 (0.25% of 653734) affected shaders:
Instrs: 769979 -> 760802 (-1.19%); split: -1.83%, +0.64%
Cycle count: 51365416 -> 44065285 (-14.21%); split: -14.22%, +0.01%
Spill count: 4186 -> 4429 (+5.81%); split: -4.90%, +10.70%
Fill count: 16356 -> 15263 (-6.68%); split: -10.50%, +3.82%
Max live registers: 95115 -> 94991 (-0.13%)
Non SSA regs after NIR: 180797 -> 180798 (+0.00%)

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34905>
2025-05-16 14:49:05 -07:00
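
The v2 note on 464955bbdd (prefer ~0xff over 0xffffff00) can be illustrated with a small sketch. It assumes, for illustration only, that a pattern constant is truncated to the bit size of the value it is matched against. The helper names are hypothetical and not part of Mesa.

```python
def truncate(value: int, bit_size: int) -> int:
    """Keep only the low bit_size bits, as a fixed-width register would."""
    return value & ((1 << bit_size) - 1)


def mask_from_not(bit_size: int) -> int:
    # ~0xff: every bit above the low byte is set, whatever the width.
    return truncate(~0xFF, bit_size)


def mask_from_literal(bit_size: int) -> int:
    # 0xffffff00: only has the intended shape at 32 bits or fewer.
    return truncate(0xFFFFFF00, bit_size)


for bits in (16, 32, 64):
    print(bits, hex(mask_from_not(bits)), hex(mask_from_literal(bits)))
```

Under that assumption the two spellings agree at 16 and 32 bits, but at 64 bits the literal leaves the upper 32 bits clear, so it no longer means "everything except the low byte".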
Georg Lehmann
0a30611c10 nir/opt_algebraic: some bitfield_select optimizations
Foz-DB Navi21:
Totals from 47 (0.06% of 79789) affected shaders:
Instrs: 69536 -> 69363 (-0.25%)
CodeSize: 370624 -> 369388 (-0.33%)
Latency: 383505 -> 383298 (-0.05%)
InvThroughput: 72924 -> 72727 (-0.27%)
PreSGPRs: 2618 -> 2610 (-0.31%)
VALU: 43261 -> 43091 (-0.39%)
SALU: 13065 -> 13063 (-0.02%)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34739>
2025-05-13 10:59:09 +00:00
Karol Herbst
f0fa2209a8 nir: add nir_opt_algebraic_integer_promotion
This handles basic operations where clang promotes integers to 32 bits
according to the C99 spec in OpenCL C source code.

This is its own opt_algebraic pass, because we don't want to fight with
nir_lower_bit_size.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34641>
2025-05-12 09:29:20 +00:00
Georg Lehmann
02e743c99e nir: add an option to lower bf2f and f2bf
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34768>
2025-05-09 11:20:25 +00:00
Rhys Perry
f538cae743 nir/algebraic: optimize ior(unpack_4x8, unpack_4x8<<8) to unpack_32_2x16
No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34162>
2025-05-08 13:30:50 +00:00
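
A rough sketch of why the rewrite in f538cae743 holds. It assumes component 0 of an unpack is the lowest-order bits and that each byte is zero-extended before the shift; the exact pattern in the commit may differ. The Python functions only model the opcodes named in the title.

```python
def unpack_32_4x8(value: int) -> list[int]:
    """Split a 32-bit value into four bytes, lowest byte first (assumed order)."""
    return [(value >> (8 * i)) & 0xFF for i in range(4)]


def unpack_32_2x16(value: int) -> list[int]:
    """Split a 32-bit value into two 16-bit halves, low half first."""
    return [(value >> (16 * i)) & 0xFFFF for i in range(2)]


def recombine_bytes(value: int) -> list[int]:
    # The open-coded form: OR each even byte with the next byte shifted left by 8.
    b = unpack_32_4x8(value)
    return [b[0] | (b[1] << 8), b[2] | (b[3] << 8)]


for v in (0x00000000, 0x12345678, 0x80FF0001, 0xFFFFFFFF):
    assert recombine_bytes(v) == unpack_32_2x16(v)
```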
Christian Gmeiner
f17d350001 lima: Move fdot lowering from NIR to lima
This change relocates the fdot lowering from generic NIR to lima,
since lima is the only consumer of this particular lowering. This avoids
potential conflicts with the similar fdot lowering already present in
nir_lower_alu_width.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34757>
2025-04-30 17:33:38 +00:00
Georg Lehmann
3e26fc4498 nir/opt_algebraic: disable fsat(a + 1.0) opt if a can be NaN
Foz-DB Navi21:
Totals from 9 (0.01% of 79789) affected shaders:
Instrs: 6782 -> 6796 (+0.21%); split: -0.03%, +0.24%
CodeSize: 40020 -> 40108 (+0.22%); split: -0.04%, +0.26%
Latency: 23764 -> 23758 (-0.03%)
InvThroughput: 6424 -> 6431 (+0.11%); split: -0.08%, +0.19%
SClause: 273 -> 275 (+0.73%)
Copies: 338 -> 339 (+0.30%)
VALU: 5138 -> 5147 (+0.18%); split: -0.06%, +0.23%
SALU: 349 -> 350 (+0.29%)
SMEM: 498 -> 500 (+0.40%)

Fixes: a4a3487aae ("nir/opt_algebraic: optimize patterns from Skia")
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34125>
2025-04-22 14:23:05 +00:00
Georg Lehmann
8ad695195e nir/opt_algebraic: turn exact fmin(1.0, a) into fsat if a is not NaN and not negative
Foz-DB Navi21:
Totals from 2456 (3.08% of 79789) affected shaders:
Instrs: 3415398 -> 3413352 (-0.06%); split: -0.06%, +0.00%
CodeSize: 18781096 -> 18776092 (-0.03%); split: -0.03%, +0.00%
VGPRs: 158512 -> 158528 (+0.01%)
Latency: 39528900 -> 39526687 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 10612237 -> 10609296 (-0.03%); split: -0.03%, +0.00%
VClause: 71028 -> 71034 (+0.01%)
SClause: 93971 -> 93975 (+0.00%); split: -0.00%, +0.01%
Copies: 257525 -> 257521 (-0.00%); split: -0.01%, +0.01%
VALU: 2483374 -> 2481325 (-0.08%); split: -0.09%, +0.00%
SALU: 348207 -> 348211 (+0.00%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34125>
2025-04-22 14:23:04 +00:00
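
The preconditions in the title of 8ad695195e can be checked with a small model. The sketch assumes fsat(x) clamps x to [0.0, 1.0]. The function names are illustrative only, and NaN is deliberately left out because the commit excludes it.

```python
def fsat(x: float) -> float:
    """Model of fsat: clamp to the range [0.0, 1.0]."""
    return min(max(x, 0.0), 1.0)


def fmin_one(a: float) -> float:
    """Model of fmin(1.0, a) for non-NaN inputs."""
    return min(1.0, a)


# For non-negative inputs the lower clamp never fires, so both forms agree.
non_negative = [0.0, 0.25, 1.0, 1.5, 1e30, float("inf")]
assert all(fmin_one(a) == fsat(a) for a in non_negative)

# A negative input shows why the "not negative" condition is needed.
assert fmin_one(-0.5) == -0.5 and fsat(-0.5) == 0.0
```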
Georg Lehmann
18a0de1834 nir/opt_algebraic: optimize fmax(ffma(a, b, c), 0.0) to fsat
Foz-DB Navi21:
Totals from 2621 (3.28% of 79789) affected shaders:
MaxWaves: 55744 -> 55736 (-0.01%)
Instrs: 2840180 -> 2832647 (-0.27%); split: -0.27%, +0.00%
CodeSize: 15497364 -> 15464692 (-0.21%); split: -0.21%, +0.00%
VGPRs: 138448 -> 138456 (+0.01%)
Latency: 22319512 -> 22307018 (-0.06%); split: -0.06%, +0.01%
InvThroughput: 5745108 -> 5729197 (-0.28%); split: -0.28%, +0.00%
Copies: 110279 -> 110268 (-0.01%); split: -0.04%, +0.03%
VALU: 2210578 -> 2203211 (-0.33%); split: -0.33%, +0.00%
SALU: 169014 -> 168841 (-0.10%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34125>
2025-04-22 14:23:04 +00:00
Georg Lehmann
f71fc26393 nir/opt_algebraic: generalize fmax(fadd(a, b), 0.0) to fsat by not requiring fneg
Not a large effect, but it's positive and makes the pattern simpler.

Foz-DB Navi21:
Totals from 1 (0.00% of 79789) affected shaders:
Instrs: 145 -> 138 (-4.83%)
CodeSize: 784 -> 756 (-3.57%)
Latency: 1495 -> 1487 (-0.54%)
InvThroughput: 210 -> 196 (-6.67%)
VALU: 103 -> 96 (-6.80%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34125>
2025-04-22 14:23:04 +00:00
Ian Romanick
1d2ebeca17 nir/algebraic: Allow fmin(a,a) optimization when flush denorm to zero is not set
I was surprised this had any effect on Intel GPUs because we have been
unconditionally performing this optimization in the backend since June
2014.

Once that error is fixed (later in this MR), this change prevents a
couple dozen regressions in shader-db and around 90 regressions in
fossil-db. Many of the regressions in fossil-db were loss of SIMD32, and
that can be a big deal.

v2: Add 64-bit too. Suggested by Alyssa.

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 16970141 -> 16970139 (<.01%)
instructions in affected programs: 40 -> 38 (-5.00%)
helped: 2 / HURT: 0

total cycles in shared programs: 914617580 -> 914617548 (<.01%)
cycles in affected programs: 3428 -> 3396 (-0.93%)
helped: 2 / HURT: 0

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Cycle count: 30546028462 -> 30546025224 (-0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 237017827 -> 237017731 (-0.00%)

Totals from 83 (0.01% of 706657) affected shaders:
Cycle count: 3042978 -> 3039740 (-0.11%); split: -0.13%, +0.02%
Non SSA regs after NIR: 78997 -> 78901 (-0.12%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34192>
2025-04-15 23:59:31 +00:00
Georg Lehmann
d046ecf95a nir/opt_algebraic: optimize open coded ffract
Foz-DB Navi21:
Totals from 274 (0.34% of 79789) affected shaders:
Instrs: 522630 -> 522181 (-0.09%); split: -0.09%, +0.01%
CodeSize: 2880668 -> 2878940 (-0.06%); split: -0.07%, +0.01%
VGPRs: 14488 -> 14464 (-0.17%)
Latency: 4092358 -> 4091243 (-0.03%); split: -0.04%, +0.01%
InvThroughput: 1014148 -> 1013471 (-0.07%); split: -0.07%, +0.00%
VClause: 11646 -> 11639 (-0.06%)
SClause: 18614 -> 18611 (-0.02%)
Copies: 56248 -> 56309 (+0.11%); split: -0.05%, +0.16%
PreVGPRs: 13649 -> 13647 (-0.01%)
VALU: 359733 -> 359285 (-0.12%); split: -0.13%, +0.01%
SALU: 59719 -> 59720 (+0.00%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33369>
2025-04-11 12:36:02 +00:00
Marek Olšák
1d5c42528b nir/opt_algebraic: lower 16-bit imul_high & umul_high
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34016>
2025-04-07 19:44:22 +00:00
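
One plausible way to lower the 16-bit high multiplies named in 1d5c42528b is to widen, multiply, and keep the upper half. This is a sketch under that assumption, not the lowering the commit actually emits, and the helper names are hypothetical.

```python
def to_signed16(x: int) -> int:
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def umul_high16(a: int, b: int) -> int:
    # Widen to 32 bits (implicit in Python), multiply, keep the top 16 bits.
    return ((a & 0xFFFF) * (b & 0xFFFF)) >> 16


def imul_high16(a: int, b: int) -> int:
    # Python's >> on negative ints is an arithmetic shift, matching ishr.
    product = to_signed16(a) * to_signed16(b)
    return (product >> 16) & 0xFFFF


print(hex(umul_high16(0xFFFF, 0xFFFF)), hex(imul_high16(0xFFFF, 0xFFFF)))
```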
Georg Lehmann
2b1fc1a7fe nir: add option to keep mul24_relaxed
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33871>
2025-03-27 06:24:15 +00:00
Georg Lehmann
b386659588 nir/opt_algebraic: create ubfe from (a & mask) >> c
Foz-DB Navi21:
Totals from 917 (1.16% of 79188) affected shaders:
Instrs: 2549482 -> 2544997 (-0.18%); split: -0.18%, +0.00%
CodeSize: 13781648 -> 13763616 (-0.13%); split: -0.13%, +0.00%
Latency: 24832087 -> 24825199 (-0.03%); split: -0.04%, +0.01%
InvThroughput: 5921339 -> 5914799 (-0.11%); split: -0.12%, +0.01%
VClause: 59910 -> 59898 (-0.02%); split: -0.02%, +0.00%
SClause: 62294 -> 62293 (-0.00%)
Copies: 221015 -> 220988 (-0.01%); split: -0.02%, +0.01%
VALU: 1717280 -> 1713332 (-0.23%); split: -0.23%, +0.00%
SALU: 359390 -> 358910 (-0.13%)
VMEM: 101966 -> 101924 (-0.04%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33455>
2025-03-14 11:15:04 +00:00
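
The identity behind b386659588 can be sketched as follows, assuming the mask is a contiguous run of `bits` ones starting at bit `c` and that ubfe(value, offset, bits) extracts an unsigned bitfield. Any extra conditions the real pattern places on the mask are not shown here.

```python
MASK32 = 0xFFFFFFFF


def ubfe(value: int, offset: int, bits: int) -> int:
    """Model of an unsigned bitfield extract on a 32-bit value."""
    return ((value & MASK32) >> offset) & ((1 << bits) - 1)


def mask_then_shift(value: int, offset: int, bits: int) -> int:
    # The open-coded form: (a & mask) >> c with a contiguous mask.
    mask = (((1 << bits) - 1) << offset) & MASK32
    return ((value & MASK32) & mask) >> offset


for value in (0x12345678, 0xDEADBEEF, 0xFFFFFFFF):
    for offset, bits in ((0, 8), (8, 8), (4, 12), (16, 16), (31, 1)):
        assert mask_then_shift(value, offset, bits) == ubfe(value, offset, bits)
```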
Georg Lehmann
d272a6e261 nir/opt_algebraic: optimize d3d a ? b : 0
Foz-DB Navi21:
Totals from 3466 (4.34% of 79789) affected shaders:
MaxWaves: 73163 -> 73161 (-0.00%); split: +0.02%, -0.02%
Instrs: 3993862 -> 3987633 (-0.16%); split: -0.19%, +0.04%
CodeSize: 21747420 -> 21725620 (-0.10%); split: -0.15%, +0.05%
VGPRs: 190736 -> 190728 (-0.00%); split: -0.04%, +0.03%
SpillSGPRs: 489 -> 478 (-2.25%); split: -2.86%, +0.61%
Latency: 48169718 -> 48159068 (-0.02%); split: -0.05%, +0.02%
InvThroughput: 12132999 -> 12128721 (-0.04%); split: -0.05%, +0.01%
VClause: 78063 -> 78052 (-0.01%); split: -0.09%, +0.08%
SClause: 109095 -> 108996 (-0.09%); split: -0.13%, +0.04%
Copies: 265784 -> 264530 (-0.47%); split: -0.72%, +0.25%
Branches: 84533 -> 84553 (+0.02%)
PreSGPRs: 172577 -> 172531 (-0.03%); split: -0.19%, +0.16%
PreVGPRs: 165776 -> 165825 (+0.03%); split: -0.06%, +0.09%
VALU: 2851544 -> 2850426 (-0.04%); split: -0.08%, +0.04%
SALU: 413543 -> 408408 (-1.24%); split: -1.45%, +0.21%
VMEM: 139890 -> 139887 (-0.00%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>
2025-03-01 07:49:28 +00:00
Georg Lehmann
2e7f34af6b nir/opt_algebraic: optimize more ine/ieq(umin(b2i, ), 0)
Foz-DB Navi21:
Totals from 76 (0.10% of 79789) affected shaders:
MaxWaves: 1050 -> 1062 (+1.14%)
Instrs: 113754 -> 113691 (-0.06%); split: -0.11%, +0.06%
CodeSize: 605096 -> 605216 (+0.02%); split: -0.03%, +0.05%
VGPRs: 6024 -> 5976 (-0.80%)
Latency: 1776501 -> 1777519 (+0.06%); split: -0.06%, +0.12%
InvThroughput: 379644 -> 376751 (-0.76%)
SClause: 2132 -> 2134 (+0.09%)
Copies: 4131 -> 4128 (-0.07%); split: -1.77%, +1.69%
PreSGPRs: 4275 -> 4270 (-0.12%)
PreVGPRs: 5568 -> 5526 (-0.75%)
VALU: 86732 -> 86581 (-0.17%); split: -0.24%, +0.07%
SALU: 7112 -> 7198 (+1.21%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>
2025-03-01 07:49:28 +00:00
Georg Lehmann
7bc3062a3b nir/opt_algebraic: push comparisons with constants into bcsel with constant
Foz-DB Navi21:
Totals from 1657 (2.08% of 79789) affected shaders:
MaxWaves: 30275 -> 30261 (-0.05%); split: +0.01%, -0.05%
Instrs: 3316251 -> 3315701 (-0.02%); split: -0.04%, +0.02%
CodeSize: 17831924 -> 17832020 (+0.00%); split: -0.06%, +0.06%
SpillSGPRs: 815 -> 859 (+5.40%)
SpillVGPRs: 3335 -> 3293 (-1.26%)
Scratch: 231424 -> 230400 (-0.44%)
Latency: 33413310 -> 33402751 (-0.03%); split: -0.04%, +0.01%
InvThroughput: 9116062 -> 9112904 (-0.03%); split: -0.04%, +0.00%
VClause: 65587 -> 65560 (-0.04%); split: -0.05%, +0.01%
SClause: 86208 -> 86261 (+0.06%); split: -0.02%, +0.08%
Copies: 356158 -> 356439 (+0.08%); split: -0.07%, +0.15%
PreSGPRs: 101710 -> 101806 (+0.09%); split: -0.01%, +0.11%
PreVGPRs: 89293 -> 89286 (-0.01%); split: -0.04%, +0.04%
VALU: 2220900 -> 2218839 (-0.09%); split: -0.11%, +0.01%
SALU: 472988 -> 474567 (+0.33%); split: -0.08%, +0.42%
VMEM: 118401 -> 118347 (-0.05%)
SMEM: 123597 -> 123592 (-0.00%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>
2025-03-01 07:49:27 +00:00
Georg Lehmann
3837bc6d16 nir/opt_algebraic: optimize ~a == ~b and ~a == #b
Foz-DB Navi21:
Totals from 2 (0.00% of 79789) affected shaders:
Instrs: 8343 -> 8323 (-0.24%)
CodeSize: 43884 -> 43764 (-0.27%)
Latency: 19390 -> 19363 (-0.14%)
InvThroughput: 3380 -> 3356 (-0.71%)
VALU: 5413 -> 5393 (-0.37%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>
2025-03-01 07:49:27 +00:00
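
Both rewrites in 3837bc6d16 rest on bitwise NOT being its own inverse. A minimal exhaustive check at 8 bits, with hypothetical helper names (`#b` in the title denotes a constant):

```python
MASK = 0xFF  # model 8-bit values so the check can be exhaustive


def inot(x: int) -> int:
    """Bitwise NOT truncated to 8 bits."""
    return ~x & MASK


def not_not_equivalent() -> bool:
    # ~a == ~b  <=>  a == b
    return all((inot(a) == inot(b)) == (a == b)
               for a in range(256) for b in range(256))


def not_const_equivalent() -> bool:
    # ~a == #b  <=>  a == ~#b, where ~#b can be folded at compile time
    return all((inot(a) == c) == (a == inot(c))
               for a in range(256) for c in range(256))


print(not_not_equivalent(), not_const_equivalent())
```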
Georg Lehmann
8759223498 nir/opt_algebraic: optimize b2i/b2f comparison with non 0/1 constants
Foz-DB Navi21:
Totals from 28 (0.04% of 79789) affected shaders:
MaxWaves: 732 -> 728 (-0.55%)
Instrs: 23425 -> 22559 (-3.70%)
CodeSize: 137740 -> 132292 (-3.96%)
VGPRs: 1128 -> 1144 (+1.42%)
Latency: 94604 -> 92423 (-2.31%)
InvThroughput: 19166 -> 18814 (-1.84%); split: -2.38%, +0.54%
VClause: 429 -> 423 (-1.40%)
SClause: 937 -> 926 (-1.17%)
Copies: 1199 -> 914 (-23.77%); split: -24.52%, +0.75%
Branches: 451 -> 421 (-6.65%)
PreSGPRs: 1043 -> 996 (-4.51%)
PreVGPRs: 992 -> 973 (-1.92%); split: -3.53%, +1.61%
VALU: 17566 -> 16865 (-3.99%)
SALU: 1254 -> 1157 (-7.74%)
VMEM: 619 -> 609 (-1.62%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>
2025-03-01 07:49:27 +00:00
Georg Lehmann
2bfcfef5da nir/opt_algebraic: optimize bcsel of b2f and constants
Foz-DB Navi21:
Totals from 212 (0.27% of 79789) affected shaders:
MaxWaves: 4024 -> 4030 (+0.15%)
Instrs: 1314134 -> 1313894 (-0.02%); split: -0.03%, +0.02%
CodeSize: 7033216 -> 7026888 (-0.09%); split: -0.10%, +0.01%
VGPRs: 14224 -> 14176 (-0.34%)
Latency: 7402062 -> 7399180 (-0.04%); split: -0.06%, +0.02%
InvThroughput: 1724879 -> 1723773 (-0.06%); split: -0.07%, +0.00%
VClause: 37741 -> 37711 (-0.08%); split: -0.11%, +0.03%
SClause: 29266 -> 29268 (+0.01%); split: -0.01%, +0.01%
Copies: 123810 -> 123786 (-0.02%); split: -0.19%, +0.17%
Branches: 42370 -> 42407 (+0.09%); split: -0.03%, +0.11%
PreSGPRs: 13149 -> 13196 (+0.36%); split: -0.05%, +0.40%
PreVGPRs: 12407 -> 12395 (-0.10%)
VALU: 884471 -> 883475 (-0.11%); split: -0.12%, +0.01%
SALU: 177671 -> 178408 (+0.41%); split: -0.03%, +0.45%

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>
2025-03-01 07:49:27 +00:00
Georg Lehmann
b90826736d nir/opt_algebraic: optimize bit_count(a) != 0
vkd3d-proton will emit
b = ballot(!gl_HelperInvocation);
(subgroupBallotBitCount(b) != 0u) ? subgroupShuffle(a, subgroupBallotFindLSB(b)) : 0u;

for WaveReadFirstLane(a) in fragment shaders

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33808>
2025-02-28 18:03:04 +00:00
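
The simplification in b90826736d presumably relies on a value having a non-zero population count exactly when the value itself is non-zero. A quick sketch of that equivalence on 32-bit values (helper name is illustrative):

```python
def bit_count(x: int) -> int:
    """Number of set bits in the low 32 bits of x."""
    return bin(x & 0xFFFFFFFF).count("1")


samples = [0, 1, 2, 0x80000000, 0xFFFFFFFF, 0x00010000]

# bit_count(a) != 0  <=>  a != 0, so the count never has to be computed.
for a in samples:
    assert (bit_count(a) != 0) == (a != 0)
```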
Georg Lehmann
a237a3def8 nir/opt_algebraic: optimize b2i(a) != -b2i(b)
Foz-DB Navi21:
Totals from 4 (0.01% of 79377) affected shaders:
Instrs: 881 -> 861 (-2.27%)
CodeSize: 4968 -> 4836 (-2.66%)
Latency: 6127 -> 6006 (-1.97%)
InvThroughput: 1128 -> 1068 (-5.32%)
VALU: 564 -> 534 (-5.32%)
SALU: 111 -> 121 (+9.01%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33498>
2025-02-25 20:38:09 +00:00
Georg Lehmann
4141043295 nir/opt_algebraic: optimize constant shift of DXBC booleans
Can be combined with further iand.

Foz-DB Navi21:
Totals from 190 (0.24% of 79377) affected shaders:
Instrs: 100628 -> 100225 (-0.40%); split: -0.41%, +0.01%
CodeSize: 567828 -> 565884 (-0.34%); split: -0.35%, +0.00%
Latency: 968415 -> 968052 (-0.04%); split: -0.09%, +0.06%
InvThroughput: 285804 -> 285210 (-0.21%); split: -0.25%, +0.04%
VClause: 1959 -> 1958 (-0.05%)
Copies: 5696 -> 5711 (+0.26%)
PreSGPRs: 7567 -> 7569 (+0.03%)
VALU: 77161 -> 76751 (-0.53%); split: -0.54%, +0.01%
SALU: 7831 -> 7840 (+0.11%); split: -0.09%, +0.20%

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33498>
2025-02-25 20:38:09 +00:00
Georg Lehmann
1e522e7d75 nir/opt_algebraic: optimize dxbc boolean not
Foz-DB Navi21:
Totals from 237 (0.30% of 79377) affected shaders:
Instrs: 486690 -> 486146 (-0.11%); split: -0.11%, +0.00%
CodeSize: 2629516 -> 2626052 (-0.13%); split: -0.13%, +0.00%
VGPRs: 18744 -> 18736 (-0.04%)
Latency: 7404763 -> 7399806 (-0.07%); split: -0.07%, +0.01%
InvThroughput: 1800282 -> 1798388 (-0.11%); split: -0.11%, +0.00%
VClause: 12101 -> 12106 (+0.04%); split: -0.01%, +0.05%
Copies: 34225 -> 34170 (-0.16%); split: -0.21%, +0.05%
PreSGPRs: 14634 -> 14639 (+0.03%)
PreVGPRs: 16713 -> 16706 (-0.04%)
VALU: 317523 -> 316693 (-0.26%); split: -0.26%, +0.00%
SALU: 53814 -> 54097 (+0.53%); split: -0.38%, +0.90%

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33498>
2025-02-25 20:38:09 +00:00
Georg Lehmann
f9722e35be nir/opt_algebraic: optimize more boolean bcsel with constants
Foz-DB Navi21:
Totals from 667 (0.84% of 79377) affected shaders:
Instrs: 3890980 -> 3886878 (-0.11%); split: -0.11%, +0.00%
CodeSize: 21088576 -> 21065848 (-0.11%); split: -0.11%, +0.00%
SpillSGPRs: 458 -> 446 (-2.62%); split: -3.49%, +0.87%
Latency: 26160728 -> 26162856 (+0.01%); split: -0.02%, +0.02%
InvThroughput: 6999254 -> 7000593 (+0.02%); split: -0.01%, +0.03%
VClause: 103745 -> 103743 (-0.00%)
SClause: 93113 -> 93109 (-0.00%)
Copies: 344097 -> 344794 (+0.20%); split: -0.05%, +0.25%
Branches: 134546 -> 134764 (+0.16%); split: -0.01%, +0.17%
PreSGPRs: 40677 -> 40298 (-0.93%); split: -0.93%, +0.00%
PreVGPRs: 40185 -> 40190 (+0.01%)
VALU: 2584477 -> 2584468 (-0.00%); split: -0.00%, +0.00%
SALU: 573587 -> 569353 (-0.74%); split: -0.75%, +0.01%
SMEM: 124794 -> 124790 (-0.00%)

v2 (idr): Remove a pattern that is made redundant by this commit
combined with the previous commit.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33498>
2025-02-25 20:38:09 +00:00
Georg Lehmann
9785fa460c nir/opt_algebraic: optimize DXBC boolean bcsel
Foz-DB Navi21:
Totals from 1749 (2.20% of 79377) affected shaders:
Instrs: 1695408 -> 1685149 (-0.61%); split: -0.68%, +0.07%
CodeSize: 9241312 -> 9174180 (-0.73%); split: -0.79%, +0.06%
VGPRs: 90688 -> 90664 (-0.03%); split: -0.04%, +0.01%
SpillSGPRs: 278 -> 298 (+7.19%)
Latency: 9560167 -> 9540386 (-0.21%); split: -0.29%, +0.08%
InvThroughput: 2236022 -> 2220411 (-0.70%); split: -0.72%, +0.02%
VClause: 29910 -> 29917 (+0.02%)
Copies: 146365 -> 145230 (-0.78%); split: -1.03%, +0.25%
Branches: 59545 -> 59560 (+0.03%)
PreSGPRs: 78858 -> 79242 (+0.49%); split: -0.10%, +0.59%
PreVGPRs: 78643 -> 78560 (-0.11%); split: -0.11%, +0.00%
VALU: 1127861 -> 1113990 (-1.23%); split: -1.24%, +0.01%
SALU: 249535 -> 253237 (+1.48%); split: -0.15%, +1.63%

v2 (idr): Remove a pattern that is now redundant.

v3 (idr): Don't undistribute ineg from bcsel. On platforms where ineg
is a free source modifier, this can be harmful.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33498>
2025-02-25 20:38:09 +00:00
Georg Lehmann
674d970861 nir/opt_algebraic: 0 >= a -> 0 == a
Foz-DB Navi21:
Totals from 2179 (2.75% of 79377) affected shaders:
MaxWaves: 40987 -> 40917 (-0.17%); split: +0.00%, -0.18%
Instrs: 5950981 -> 5949310 (-0.03%); split: -0.04%, +0.01%
CodeSize: 32120808 -> 32110328 (-0.03%); split: -0.04%, +0.00%
VGPRs: 141704 -> 141768 (+0.05%); split: -0.01%, +0.05%
SpillSGPRs: 1750 -> 1746 (-0.23%)
Latency: 56667295 -> 56562916 (-0.18%); split: -0.19%, +0.00%
InvThroughput: 13292128 -> 13288691 (-0.03%); split: -0.03%, +0.00%
VClause: 151845 -> 151755 (-0.06%); split: -0.06%, +0.00%
SClause: 172316 -> 172443 (+0.07%); split: -0.02%, +0.09%
Copies: 458724 -> 458951 (+0.05%); split: -0.08%, +0.13%
Branches: 195239 -> 195351 (+0.06%); split: -0.00%, +0.06%
PreSGPRs: 135304 -> 135317 (+0.01%); split: -0.01%, +0.02%
PreVGPRs: 122430 -> 122428 (-0.00%); split: -0.01%, +0.01%
VALU: 3924585 -> 3924062 (-0.01%); split: -0.02%, +0.01%
SALU: 820666 -> 819414 (-0.15%); split: -0.17%, +0.02%
SMEM: 247036 -> 247142 (+0.04%); split: -0.00%, +0.04%

v2 (idr): Remove a pattern that is now redundant. This was originally
removed in a commit later in the MR.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33498>
2025-02-25 20:38:09 +00:00
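
The title of 674d970861 does not say so, but the rewrite only makes sense for an unsigned comparison, which is assumed here. The sketch checks the unsigned case exhaustively at 8 bits and shows a signed counterexample.

```python
def zero_uge(a: int) -> bool:
    """0 >= a with a treated as an unsigned 8-bit value."""
    return 0 >= (a & 0xFF)


def zero_eq(a: int) -> bool:
    """0 == a with a treated as an unsigned 8-bit value."""
    return 0 == (a & 0xFF)


# Unsigned: nothing is below zero, so ">=" can only hold when a is zero.
assert all(zero_uge(a) == zero_eq(a) for a in range(256))

# Signed integers would break the equivalence: 0 >= -1 is true, 0 == -1 is not.
signed_a = -1
assert (0 >= signed_a) and not (0 == signed_a)
```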
Georg Lehmann
000f14f7fd nir/opt_algebraic: optimize ineg(a) == #b
No Foz-DB changes.

v2 (idr): Remove some patterns that are now redundant. These were
originally removed in a commit later in the MR.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33498>
2025-02-25 20:38:08 +00:00
Georg Lehmann
3e4ac92298 nir/opt_algebraic: optimize ineg(a) == ineg(b)
DXBC boolean cleanup.

Foz-DB Navi21:
Totals from 19 (0.02% of 79188) affected shaders:
Instrs: 9720 -> 9652 (-0.70%)
CodeSize: 54056 -> 53640 (-0.77%)
Latency: 95357 -> 94377 (-1.03%); split: -1.03%, +0.00%
InvThroughput: 17331 -> 16939 (-2.26%)
Copies: 604 -> 605 (+0.17%)
PreSGPRs: 832 -> 838 (+0.72%)
PreVGPRs: 701 -> 699 (-0.29%)
VALU: 6551 -> 6485 (-1.01%)
SALU: 893 -> 891 (-0.22%); split: -1.68%, +1.46%

v2 (idr): Remove a pattern that is now redundant. The version without
ineg already exists much earlier in the file. Search for b2iN.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33498>
2025-02-25 20:38:08 +00:00
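
The two ineg commits above (000f14f7fd and 3e4ac92298) both follow from two's-complement negation being a bijection that undoes itself. A minimal exhaustive check at 8 bits, with hypothetical helper names:

```python
MASK = 0xFF  # model 8-bit integers so every pair can be tested


def ineg(x: int) -> int:
    """Two's-complement negation truncated to 8 bits."""
    return (-x) & MASK


def neg_neg_equivalent() -> bool:
    # ineg(a) == ineg(b)  <=>  a == b
    return all((ineg(a) == ineg(b)) == (a == b)
               for a in range(256) for b in range(256))


def neg_const_equivalent() -> bool:
    # ineg(a) == #b  <=>  a == ineg(#b), with the constant negated up front
    return all((ineg(a) == c) == (a == ineg(c))
               for a in range(256) for c in range(256))


print(neg_neg_equivalent(), neg_const_equivalent())
```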
Georg Lehmann
e0cebac14f nir/opt_algebraic: optimize b2f(a != 0) * a
Just D3D9 things.

Foz-DB Navi21:
Totals from 137 (0.17% of 79377) affected shaders:
MaxWaves: 3366 -> 3370 (+0.12%); split: +0.24%, -0.12%
Instrs: 76462 -> 72091 (-5.72%)
CodeSize: 411584 -> 380792 (-7.48%)
Latency: 279472 -> 275505 (-1.42%); split: -2.01%, +0.59%
InvThroughput: 71311 -> 65369 (-8.33%)
VClause: 1587 -> 1612 (+1.58%); split: -1.01%, +2.58%
SClause: 1111 -> 1105 (-0.54%); split: -1.08%, +0.54%
Copies: 5621 -> 5602 (-0.34%); split: -1.39%, +1.05%
PreSGPRs: 5266 -> 5241 (-0.47%); split: -0.51%, +0.04%
PreVGPRs: 4249 -> 4236 (-0.31%); split: -0.35%, +0.05%
VALU: 50049 -> 45901 (-8.29%)
SALU: 8948 -> 8818 (-1.45%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33674>
2025-02-24 16:34:53 +00:00
Ian Romanick
15544ed858 nir/algebraic: Undistribute b2i from logic-ops
shader-db:
All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 16973309 -> 16973173 (<.01%)
instructions in affected programs: 13780 -> 13644 (-0.99%)
helped: 31 / HURT: 0

total cycles in shared programs: 915620550 -> 915618604 (<.01%)
cycles in affected programs: 185962 -> 184016 (-1.05%)
helped: 30 / HURT: 1

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 209748003 -> 209745278 (-0.00%)
Cycle count: 30514920400 -> 30514716506 (-0.00%); split: -0.00%, +0.00%
Max live registers: 65477183 -> 65477584 (+0.00%)
Non SSA regs after NIR: 237334710 -> 237333632 (-0.00%)

Totals from 1257 (0.18% of 706651) affected shaders:
Instrs: 693039 -> 690314 (-0.39%)
Cycle count: 39792504 -> 39588610 (-0.51%); split: -0.97%, +0.46%
Max live registers: 194170 -> 194571 (+0.21%)
Non SSA regs after NIR: 821978 -> 820900 (-0.13%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33648>
2025-02-21 00:01:11 +00:00
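
A sketch of what "undistributing" b2i in 15544ed858 could look like: one conversion applied after a boolean logic op instead of two conversions before an integer logic op. The exact set of patterns in the commit is not listed in the message, so the three cases below are assumptions.

```python
from itertools import product


def b2i(value: bool) -> int:
    """Model of b2i: true becomes 1, false becomes 0."""
    return 1 if value else 0


def undistribute_holds() -> bool:
    for a, b in product((False, True), repeat=2):
        if (b2i(a) & b2i(b)) != b2i(a and b):   # iand
            return False
        if (b2i(a) | b2i(b)) != b2i(a or b):    # ior
            return False
        if (b2i(a) ^ b2i(b)) != b2i(a != b):    # ixor
            return False
    return True


print(undistribute_holds())
```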
Ian Romanick
a48a044cf6 nir/algebraic: Simplify equality comparisons of b2T with 1 or 0
Adding the b2i(a) == 1 and b2i(a) != 1 patterns also helps prevent
regressions when spurious negations are removed from integer equality
comparisons, as is done in !33498.

v2: Make all variables part of the iteration instead of calculating some
of them. Suggested by Alyssa.

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 16973331 -> 16973309 (<.01%)
instructions in affected programs: 266 -> 244 (-8.27%)
helped: 2 / HURT: 0

total cycles in shared programs: 915620774 -> 915620550 (<.01%)
cycles in affected programs: 4360 -> 4136 (-5.14%)
helped: 2 / HURT: 0

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 209748011 -> 209748003 (-0.00%)
Cycle count: 30514920286 -> 30514920400 (+0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 237334726 -> 237334710 (-0.00%)

Totals from 8 (0.00% of 706651) affected shaders:
Instrs: 16956 -> 16948 (-0.05%)
Cycle count: 261052 -> 261166 (+0.04%); split: -0.92%, +0.96%
Non SSA regs after NIR: 20000 -> 19984 (-0.08%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33648>
2025-02-21 00:01:11 +00:00
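
The simplifications described in a48a044cf6 can be modelled as a truth table. This sketch covers only the integer case (b2i) and assumes b2i yields exactly 0 or 1; the helper name is illustrative.

```python
def b2i(value: bool) -> int:
    """Model of b2i: true becomes 1, false becomes 0."""
    return 1 if value else 0


for a in (False, True):
    assert (b2i(a) == 1) == a          # b2i(a) == 1  ->  a
    assert (b2i(a) != 1) == (not a)    # b2i(a) != 1  ->  !a
    assert (b2i(a) == 0) == (not a)    # b2i(a) == 0  ->  !a
    assert (b2i(a) != 0) == a          # b2i(a) != 0  ->  a
```

In each case the comparison collapses to the boolean itself or its negation, so the conversion and the compare both disappear.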
Ian Romanick
3f39d8f4ff nir/algebraic: Optimize zero comparisons of umax or umin
I observed some of the existing patterns stopped being applied after
some of the ult-to-ieq optimizations in !33498. It turns out that these
patterns occur even without those changes.

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 16973339 -> 16973331 (<.01%)
instructions in affected programs: 7977 -> 7969 (-0.10%)
helped: 2 / HURT: 0

total cycles in shared programs: 915620938 -> 915620774 (<.01%)
cycles in affected programs: 136022 -> 135858 (-0.12%)
helped: 2 / HURT: 0

fossil-db:

Lunar Lake
Totals:
Instrs: 209748173 -> 209748011 (-0.00%); split: -0.00%, +0.00%
Cycle count: 30514361348 -> 30514920286 (+0.00%); split: -0.00%, +0.00%
Spill count: 511813 -> 511808 (-0.00%)
Fill count: 622537 -> 622533 (-0.00%)
Max live registers: 65477033 -> 65477183 (+0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 237334728 -> 237334726 (-0.00%); split: -0.00%, +0.00%

Totals from 26 (0.00% of 706651) affected shaders:
Instrs: 332073 -> 331911 (-0.05%); split: -0.05%, +0.00%
Cycle count: 959758560 -> 960317498 (+0.06%); split: -0.03%, +0.09%
Spill count: 10293 -> 10288 (-0.05%)
Fill count: 23784 -> 23780 (-0.02%)
Max live registers: 9682 -> 9832 (+1.55%); split: -0.08%, +1.63%
Non SSA regs after NIR: 232135 -> 232133 (-0.00%); split: -0.03%, +0.03%

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 233538532 -> 233536113 (-0.00%); split: -0.00%, +0.00%
Cycle count: 24428142259 -> 24426705655 (-0.01%); split: -0.01%, +0.00%
Spill count: 513128 -> 512923 (-0.04%)
Fill count: 557329 -> 557108 (-0.04%)
Max live registers: 42129806 -> 42129881 (+0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 256711720 -> 256711718 (-0.00%); split: -0.00%, +0.00%

Totals from 26 (0.00% of 805759) affected shaders:
Instrs: 325629 -> 323210 (-0.74%); split: -0.74%, +0.00%
Cycle count: 893896782 -> 892460178 (-0.16%); split: -0.21%, +0.05%
Spill count: 10467 -> 10262 (-1.96%)
Fill count: 24291 -> 24070 (-0.91%)
Max live registers: 4946 -> 5021 (+1.52%); split: -0.08%, +1.60%
Non SSA regs after NIR: 232980 -> 232978 (-0.00%); split: -0.03%, +0.03%

Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 237289818 -> 237289714 (-0.00%); split: -0.00%, +0.00%
Cycle count: 22959586058 -> 22960049302 (+0.00%); split: -0.00%, +0.00%
Max live registers: 42182257 -> 42182337 (+0.00%)
Non SSA regs after NIR: 255579974 -> 255579970 (-0.00%); split: -0.00%, +0.00%

Totals from 23 (0.00% of 802019) affected shaders:
Instrs: 27051 -> 26947 (-0.38%); split: -0.39%, +0.01%
Cycle count: 10545917 -> 11009161 (+4.39%); split: -0.09%, +4.49%
Max live registers: 2198 -> 2278 (+3.64%)
Non SSA regs after NIR: 31741 -> 31737 (-0.01%); split: -0.20%, +0.19%

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33648>
2025-02-21 00:01:11 +00:00
Ian Romanick
4311121e73 nir/algebraic: More (a == 0 || a == 1 || ...) patterns
At least some Total War: Warhammer3 vertex shaders associate the
comparisons differently, so the existing patterns were not triggered.
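
A small sketch of why association matters, assuming 32-bit unsigned
values. The grouping shown and the function names are hypothetical
illustrations, not the patterns from the commit. Both groupings compute
the same value, but a matcher that only looks for two sibling comparisons
under one OR will miss the first form.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit unsigned operands


def grouped_apart(a, other):
    """((a == 0 || other) || a == 1): the two tests of `a` are not siblings."""
    return ((a == 0) or other) or (a == 1)


def grouped_together(a, other):
    """((a == 0 || a == 1) || other): the two tests of `a` share one OR."""
    return ((a == 0) or (a == 1)) or other


def folded(a, other):
    """a == 0 || a == 1 folded into a single unsigned compare, a u< 2."""
    return ((a & MASK32) < 2) or other


def all_forms_agree():
    """Compare the three forms on a few values of `a` and both values of `other`."""
    samples = [0, 1, 2, 3, 0x7FFFFFFF, 0x80000000, MASK32]
    return all(
        grouped_apart(a, o) == grouped_together(a, o) == folded(a, o)
        for a in samples
        for o in (False, True)
    )
```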

No shader-db changes on any Intel platform.

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 209748654 -> 209748173 (-0.00%)
Cycle count: 30514333964 -> 30514361348 (+0.00%); split: -0.00%, +0.00%
Fill count: 622688 -> 622537 (-0.02%)
Max live registers: 65477039 -> 65477033 (-0.00%)
Non SSA regs after NIR: 237334768 -> 237334728 (-0.00%)

Totals from 512 (0.07% of 706651) affected shaders:
Instrs: 1000693 -> 1000212 (-0.05%)
Cycle count: 42174312 -> 42201696 (+0.06%); split: -0.15%, +0.21%
Fill count: 11456 -> 11305 (-1.32%)
Max live registers: 121599 -> 121593 (-0.00%)
Non SSA regs after NIR: 1253445 -> 1253405 (-0.00%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33648>
2025-02-21 00:01:11 +00:00
Georg Lehmann
56aac9fdec nir/opt_algebraic: optimize ffract(ffract(a))
Foz-DB Navi21:
Totals from 163 (0.21% of 79377) affected shaders:
Instrs: 233933 -> 233685 (-0.11%)
CodeSize: 1252492 -> 1251500 (-0.08%); split: -0.08%, +0.00%
Latency: 1227625 -> 1227405 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 266954 -> 266668 (-0.11%)
VClause: 4193 -> 4191 (-0.05%)
Copies: 20935 -> 20932 (-0.01%); split: -0.02%, +0.01%
PreSGPRs: 10395 -> 10391 (-0.04%)
VALU: 163725 -> 163475 (-0.15%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33557>
2025-02-18 20:38:57 +00:00
Rhys Perry
3c5bcc5f7f nir/algebraic: optimize ishl(iadd(iadd(a, #b), c), #d)
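A hedged sketch of the arithmetic behind this kind of reassociation,
assuming 32-bit wrapping integers and a shift count masked to 5 bits. The
replacement expression below is one plausible form, not necessarily the
one the commit emits. Since b and d are constants, ishl(b, d) can be
folded at compile time.

```python
import random

MASK32 = 0xFFFFFFFF  # assumption: 32-bit wrapping arithmetic


def iadd(x, y):
    """32-bit wrapping integer add."""
    return (x + y) & MASK32


def ishl(x, s):
    """32-bit left shift; the shift count is masked to the bit size."""
    return (x << (s & 31)) & MASK32


def original(a, b, c, d):
    """ishl(iadd(iadd(a, #b), c), #d)"""
    return ishl(iadd(iadd(a, b), c), d)


def reassociated(a, b, c, d):
    """iadd(ishl(iadd(a, c), d), ishl(b, d)); the second shift is constant."""
    return iadd(ishl(iadd(a, c), d), ishl(b, d))


def forms_agree(trials=1000, seed=0):
    """Compare both forms on pseudo-random 32-bit inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = (rng.getrandbits(32) for _ in range(3))
        d = rng.randrange(32)
        if original(a, b, c, d) != reassociated(a, b, c, d):
            return False
    return True
```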
fossil-db (navi31):
Totals from 671 (0.85% of 79377) affected shaders:
MaxWaves: 17048 -> 17052 (+0.02%); split: +0.04%, -0.01%
Instrs: 786643 -> 785459 (-0.15%); split: -0.20%, +0.05%
CodeSize: 4074988 -> 4069304 (-0.14%); split: -0.18%, +0.04%
VGPRs: 43896 -> 43860 (-0.08%); split: -0.11%, +0.03%
SpillSGPRs: 753 -> 748 (-0.66%)
Latency: 8187731 -> 8186707 (-0.01%); split: -0.11%, +0.10%
InvThroughput: 1274564 -> 1274582 (+0.00%); split: -0.11%, +0.11%
VClause: 14292 -> 14183 (-0.76%); split: -0.98%, +0.22%
SClause: 21527 -> 21426 (-0.47%); split: -0.53%, +0.06%
Copies: 59381 -> 59299 (-0.14%); split: -0.67%, +0.53%
PreSGPRs: 29358 -> 29349 (-0.03%)
PreVGPRs: 36595 -> 36368 (-0.62%); split: -0.70%, +0.08%
VALU: 482669 -> 481927 (-0.15%); split: -0.21%, +0.06%
SALU: 70019 -> 70009 (-0.01%); split: -0.06%, +0.05%
VOPD: 142 -> 139 (-2.11%)

fossil-db (navi21):
Totals from 671 (0.85% of 79377) affected shaders:
MaxWaves: 11536 -> 11516 (-0.17%); split: +0.03%, -0.21%
Instrs: 773615 -> 772476 (-0.15%); split: -0.18%, +0.03%
CodeSize: 4092564 -> 4086688 (-0.14%); split: -0.17%, +0.03%
VGPRs: 43424 -> 43448 (+0.06%); split: -0.04%, +0.09%
SpillSGPRs: 565 -> 560 (-0.88%)
Latency: 8650893 -> 8633993 (-0.20%); split: -0.31%, +0.11%
InvThroughput: 1920741 -> 1920368 (-0.02%); split: -0.10%, +0.08%
VClause: 15830 -> 15774 (-0.35%); split: -0.76%, +0.40%
SClause: 21025 -> 21009 (-0.08%); split: -0.11%, +0.03%
Copies: 65425 -> 65460 (+0.05%); split: -0.37%, +0.43%
Branches: 21845 -> 21848 (+0.01%)
PreSGPRs: 29457 -> 29448 (-0.03%)
PreVGPRs: 37296 -> 37066 (-0.62%); split: -0.69%, +0.08%
VALU: 516908 -> 516056 (-0.16%); split: -0.20%, +0.04%
SALU: 91545 -> 91531 (-0.02%); split: -0.05%, +0.03%

fossil-db (vega10):
Totals from 497 (0.79% of 62962) affected shaders:
MaxWaves: 2325 -> 2328 (+0.13%); split: +0.17%, -0.04%
Instrs: 298230 -> 297284 (-0.32%); split: -0.35%, +0.03%
CodeSize: 1535212 -> 1530636 (-0.30%); split: -0.34%, +0.04%
SGPRs: 36464 -> 36480 (+0.04%)
VGPRs: 29412 -> 29396 (-0.05%); split: -0.07%, +0.01%
SpillSGPRs: 164 -> 159 (-3.05%)
Latency: 3957230 -> 3948919 (-0.21%); split: -0.51%, +0.30%
InvThroughput: 1680680 -> 1679105 (-0.09%); split: -0.17%, +0.08%
VClause: 6175 -> 6102 (-1.18%); split: -1.55%, +0.37%
SClause: 9503 -> 9510 (+0.07%); split: -0.15%, +0.22%
Copies: 20992 -> 20892 (-0.48%); split: -0.97%, +0.50%
PreSGPRs: 17803 -> 17795 (-0.04%)
PreVGPRs: 23072 -> 22823 (-1.08%); split: -1.11%, +0.03%
VALU: 225322 -> 224587 (-0.33%); split: -0.36%, +0.04%
SALU: 21029 -> 21011 (-0.09%); split: -0.22%, +0.13%

fossil-db (polaris10):
Totals from 489 (0.79% of 61794) affected shaders:
Instrs: 299330 -> 298308 (-0.34%); split: -0.40%, +0.06%
CodeSize: 1529316 -> 1525440 (-0.25%); split: -0.32%, +0.07%
SpillSGPRs: 159 -> 149 (-6.29%)
Latency: 3924819 -> 3898471 (-0.67%); split: -0.93%, +0.25%
InvThroughput: 1687167 -> 1684956 (-0.13%); split: -0.22%, +0.09%
VClause: 6248 -> 6067 (-2.90%); split: -3.28%, +0.38%
SClause: 9519 -> 9492 (-0.28%); split: -0.72%, +0.44%
Copies: 21673 -> 21637 (-0.17%); split: -0.90%, +0.73%
PreSGPRs: 17611 -> 17603 (-0.05%)
PreVGPRs: 22873 -> 22625 (-1.08%)
VALU: 226805 -> 225928 (-0.39%); split: -0.45%, +0.06%
SALU: 21419 -> 21413 (-0.03%); split: -0.28%, +0.25%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29242>
2025-02-07 13:52:57 +00:00
Georg Lehmann
ee5017a0fa nir/opt_algebraic: convert fadd(a, a) to a * 2.0
On AMD, this is a clear win. 2.0 is a free constant,
the multiplication can be fused into fma, or it can
be done as a free output modifier. Otherwise, fmul
and fadd have the same throughput/latency, so the only
possible downside is power consumption.

For other hardware this might not be as clear,
but we should at least choose one option for NIR because
it allows more CSE.
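
A quick sketch checking that the two forms give the same result. It uses
Python floats, which are 64-bit IEEE doubles, as a stand-in for shader
floats; that substitution and the helper names are assumptions. Doubling
is exact unless it overflows, and both forms overflow to the same
infinity.

```python
import math
import struct


def bits64(x):
    """Raw IEEE-754 bit pattern of a double, so that -0.0 and 0.0 differ."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def same_result(a):
    """True if a + a and a * 2.0 agree, treating any two NaNs as equal."""
    added = a + a
    doubled = a * 2.0
    if math.isnan(added) or math.isnan(doubled):
        return math.isnan(added) and math.isnan(doubled)
    return bits64(added) == bits64(doubled)


SAMPLES = [
    0.0, -0.0, 1.0, -1.5, 0.1,
    5e-324,                   # smallest subnormal double
    1.7976931348623157e308,   # largest finite double, doubles to +inf
    math.inf, -math.inf, math.nan,
]


def all_samples_agree():
    """Check the identity on every sample value."""
    return all(same_result(a) for a in SAMPLES)
```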

Foz-DB Navi21:
Totals from 12231 (15.41% of 79395) affected shaders:
MaxWaves: 309068 -> 309032 (-0.01%)
Instrs: 11826395 -> 11790132 (-0.31%); split: -0.31%, +0.00%
CodeSize: 63531496 -> 63512868 (-0.03%); split: -0.03%, +0.00%
VGPRs: 551256 -> 551328 (+0.01%); split: -0.00%, +0.02%
SpillSGPRs: 984 -> 979 (-0.51%)
Latency: 88486492 -> 88394296 (-0.10%); split: -0.11%, +0.01%
InvThroughput: 22360595 -> 22300790 (-0.27%); split: -0.27%, +0.00%
VClause: 226267 -> 226273 (+0.00%); split: -0.01%, +0.01%
SClause: 293820 -> 293783 (-0.01%); split: -0.02%, +0.00%
Copies: 727187 -> 727106 (-0.01%); split: -0.03%, +0.02%
PreSGPRs: 539623 -> 539625 (+0.00%)
PreVGPRs: 440843 -> 440946 (+0.02%); split: -0.00%, +0.03%
VALU: 8324962 -> 8288809 (-0.43%); split: -0.43%, +0.00%
SALU: 1278550 -> 1278538 (-0.00%); split: -0.00%, +0.00%
VMEM: 440600 -> 440599 (-0.00%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32989>
2025-01-21 20:28:04 +00:00
Marek Olšák
3800f0af41 nir/algebraic: optimize pack_split(unpack(a).x, unpack(a).y) -> a
This is required to optimize FP64 and Int64 shaders generated by
virglrenderer. It generates pack/unpack around every 64-bit op,
which NIR currently can't eliminate. This fixes that.

There is a new constraint ".y", which means that the use of an instruction
should have swizzle.y. This allows us to add patterns that have Y swizzle
on results of instructions.
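
A minimal model of the round trip being eliminated, assuming a 64-bit
value split into a low 32-bit half (.x) and a high 32-bit half (.y). The
function names are modelled loosely on NIR opcode names and are only
illustrative.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def unpack_64_2x32(a):
    """Split a 64-bit value into (x, y) = (low 32 bits, high 32 bits)."""
    a &= MASK64
    return (a & MASK32, (a >> 32) & MASK32)


def pack_64_2x32_split(x, y):
    """Rebuild a 64-bit value from its low half x and high half y."""
    return (x & MASK32) | ((y & MASK32) << 32)


def round_trip(a):
    """pack_split(unpack(a).x, unpack(a).y), which should simply be a."""
    x, y = unpack_64_2x32(a)
    return pack_64_2x32_split(x, y)
```

Swapping the two halves would not reproduce a, which is why a pattern
like this needs to distinguish the .x and .y uses of the unpack.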

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32172>
2025-01-07 05:47:52 +00:00
Marek Olšák
b1bc691b0f nir/algebraic: add and improve pack/unpack patterns
Some duplicated patterns are removed.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32172>
2025-01-07 05:47:52 +00:00
Marek Olšák
ebec182b04 nir/algebraic: use is_used_once for comparison patterns
otherwise we are just creating new instructions while not removing any

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32172>
2025-01-07 05:47:52 +00:00
Georg Lehmann
c695043e81 nir/opt_algebraic: optimize min(max(a, b), a)
Foz-DB Navi21:
Totals from 105 (0.13% of 79395) affected shaders:
MaxWaves: 2638 -> 2646 (+0.30%)
Instrs: 76531 -> 75077 (-1.90%)
CodeSize: 413668 -> 406484 (-1.74%)
VGPRs: 4856 -> 4848 (-0.16%)
Latency: 333684 -> 328438 (-1.57%); split: -1.57%, +0.00%
InvThroughput: 80417 -> 78579 (-2.29%)
VClause: 1818 -> 1768 (-2.75%)
SClause: 3028 -> 2964 (-2.11%)
Copies: 4708 -> 4513 (-4.14%); split: -4.50%, +0.36%
PreVGPRs: 3792 -> 3715 (-2.03%); split: -2.08%, +0.05%
VALU: 54734 -> 53528 (-2.20%)
SALU: 6195 -> 6137 (-0.94%)
VMEM: 2363 -> 2313 (-2.12%)
SMEM: 5219 -> 5119 (-1.92%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32634>
2024-12-16 22:29:21 +00:00
Qiang Yu
129e37bab6 nir: do not generate b2i64 when driver wants to lower it
This is found on GFX12 by:
  KHR-GL43.shader_ballot_tests.ShaderBallotBitmasks

ACO does not support it.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32570>
2024-12-16 07:35:07 +00:00
Alyssa Rosenzweig
bd89279dd4 nir: add lower_scratch_to_var pass
to ease opencl pain.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32529>
2024-12-12 21:16:13 +00:00
Karmjit Mahil
b79994e92d nir,ir3: Add icsel_eqz
In IR3, `sel.b32` selects based on whether its condition is 0, so add
`icsel_eqz` to fuse the cmp and sel that we'd otherwise need.
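
A sketch of the fusion, assuming `icsel_eqz(a, b, c)` means "b if a is
zero, else c". That semantics is an assumption made for illustration and
is not quoted from the commit; the helper functions are hypothetical
models.

```python
def ieq(a, b):
    """Integer equality compare producing a boolean."""
    return a == b


def bcsel(cond, b, c):
    """Boolean select: b when cond is true, otherwise c."""
    return b if cond else c


def icsel_eqz(a, b, c):
    """Assumed semantics: select b when a == 0, otherwise c."""
    return b if a == 0 else c


def unfused(a, b, c):
    """Separate compare and select: bcsel(ieq(a, 0), b, c)."""
    return bcsel(ieq(a, 0), b, c)


def forms_agree():
    """The fused opcode matches the compare-plus-select sequence."""
    return all(
        unfused(a, 10, 20) == icsel_eqz(a, 10, 20)
        for a in (0, 1, 2, 0x80000000, 0xFFFFFFFF)
    )
```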

total Instruction Count in shared programs: 1112814 -> 1110473 (-0.21%)
Instruction Count in affected programs: 162701 -> 160360 (-1.44%)
helped: 81
HURT: 29
Instruction count are helped.

total MOV Count in shared programs: 86777 -> 88671 (2.18%)
MOV Count in affected programs: 28119 -> 30013 (6.74%)
helped: 1
HURT: 292
Mov count are HURT.

total COV Count in shared programs: 15070 -> 14962 (-0.72%)
COV Count in affected programs: 5770 -> 5662 (-1.87%)
helped: 76
HURT: 2
Cov count are helped.

total Last helper instruction in shared programs: 592729 -> 590638 (-0.35%)
Last helper instruction in affected programs: 91331 -> 89240 (-2.29%)
helped: 30
HURT: 1
Last helper instruction are helped.

total Instructions with SS sync bit in shared programs: 29336 -> 29546 (0.72%)
Instructions with SS sync bit in affected programs: 4702 -> 4912 (4.47%)
helped: 8
HURT: 43
Instructions with ss sync bit are HURT.

total Estimated cycles stalled on SS in shared programs: 111590 -> 112401 (0.73%)
Estimated cycles stalled on SS in affected programs: 27708 -> 28519 (2.93%)
helped: 21
HURT: 61
Estimated cycles stalled on ss are HURT.

total cat1 instructions in shared programs: 101933 -> 103695 (1.73%)
cat1 instructions in affected programs: 35804 -> 37566 (4.92%)
helped: 18
HURT: 290
Cat1 instructions are HURT.

total cat2 instructions in shared programs: 380299 -> 377499 (-0.74%)
cat2 instructions in affected programs: 128609 -> 125809 (-2.18%)
helped: 322
HURT: 0
Cat2 instructions are helped.

Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32189>
2024-12-06 08:42:36 +00:00
Karmjit Mahil
aad0aa0a9c nir/algebraic: turn u{ge,lt} a, 1 to i{ne,eq} a, 0
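A short check of the two equivalences in the title, assuming 32-bit
unsigned operands; the function names are illustrative. An unsigned value
is at least 1 exactly when it is nonzero, and below 1 exactly when it is
zero.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit unsigned operands


def uge_one(a):
    """uge(a, 1): unsigned a >= 1."""
    return (a & MASK32) >= 1


def ine_zero(a):
    """ine(a, 0): a != 0."""
    return (a & MASK32) != 0


def ult_one(a):
    """ult(a, 1): unsigned a < 1."""
    return (a & MASK32) < 1


def ieq_zero(a):
    """ieq(a, 0): a == 0."""
    return (a & MASK32) == 0


def forms_agree():
    """Both rewrites hold on boundary values of the unsigned range."""
    samples = [0, 1, 2, 0x7FFFFFFF, 0x80000000, MASK32]
    return all(
        uge_one(a) == ine_zero(a) and ult_one(a) == ieq_zero(a)
        for a in samples
    )
```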
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32189>
2024-12-06 08:42:36 +00:00
Ian Romanick
e1bb53bb3c nir/algebraic: Optimize some trivial bfi
In fossil-db, one big compute shader on Hogwarts Legacy is helped for
spills and fills. It has a lot of instances of bfi(0x3f, a, a).
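
A sketch of why bfi(0x3f, a, a) is trivial, using a simplified model of
bitfield insert on 32-bit values: shift the insert value to the lowest set
bit of the mask, then merge it into the base under the mask. The model is
an assumption for illustration, not the NIR source.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit operands


def bfi(mask, insert, base):
    """Insert `insert` into `base` at the bit positions selected by `mask`."""
    mask &= MASK32
    insert &= MASK32
    base &= MASK32
    if mask == 0:
        return base
    # Shift the insert value left by the number of trailing zeros in mask.
    tmp = mask
    while not (tmp & 1):
        tmp >>= 1
        insert = (insert << 1) & MASK32
    return (base & ~mask & MASK32) | (insert & mask)


def trivial_case_holds():
    """With mask 0x3f the shift is zero, so bfi(0x3f, a, a) gives back a."""
    samples = [0, 1, 0x3F, 0x40, 0x12345678, 0x80000000, MASK32]
    return all(bfi(0x3F, a, a) == a for a in samples)
```

With mask 0x3f the result is (a & ~0x3f) | (a & 0x3f), which is a.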

On Tiger Lake and Skylake, a compute shader in Unicom that has a
single instance of this pattern is hurt for spills and fills. I think
this is just due to non-determinism in the register allocation
algorithm.

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 16992643 -> 16992548 (<.01%)
instructions in affected programs: 17533 -> 17438 (-0.54%)
helped: 33 / HURT: 0

total cycles in shared programs: 914313986 -> 914316238 (<.01%)
cycles in affected programs: 3734544 -> 3736796 (0.06%)
helped: 26 / HURT: 6

fossil-db:

Lunar Lake, Meteor Lake, DG2, and Ice Lake had similar results. (Lunar Lake shown)
Totals:
Instrs: 208952780 -> 208952537 (-0.00%)
Send messages: 10934879 -> 10934875 (-0.00%)
Cycle count: 30988230904 -> 30988228660 (-0.00%); split: -0.00%, +0.00%
Spill count: 534864 -> 534843 (-0.00%)
Fill count: 667081 -> 667068 (-0.00%)
Max live registers: 65686656 -> 65686624 (-0.00%)
Non SSA regs after NIR: 244185358 -> 244185335 (-0.00%)

Totals from 3 (0.00% of 704834) affected shaders:
Instrs: 4708 -> 4465 (-5.16%)
Send messages: 234 -> 230 (-1.71%)
Cycle count: 264382 -> 262138 (-0.85%); split: -0.88%, +0.03%
Spill count: 91 -> 70 (-23.08%)
Fill count: 73 -> 60 (-17.81%)
Max live registers: 647 -> 615 (-4.95%)
Non SSA regs after NIR: 3957 -> 3934 (-0.58%)

Tiger Lake
Totals:
Instrs: 230516919 -> 230515185 (-0.00%); split: -0.00%, +0.00%
Send messages: 12657684 -> 12657680 (-0.00%)
Cycle count: 23060318600 -> 23060279758 (-0.00%); split: -0.00%, +0.00%
Spill count: 548462 -> 548446 (-0.00%); split: -0.00%, +0.00%
Fill count: 582304 -> 582294 (-0.00%); split: -0.00%, +0.00%
Scratch Memory Size: 19538944 -> 19539968 (+0.01%)
Max live registers: 41713622 -> 41713593 (-0.00%)
Non SSA regs after NIR: 260667939 -> 260667712 (-0.00%); split: -0.00%, +0.00%

Totals from 174 (0.02% of 794323) affected shaders:
Instrs: 158346 -> 156612 (-1.10%); split: -1.13%, +0.04%
Send messages: 14330 -> 14326 (-0.03%)
Cycle count: 24859875 -> 24821033 (-0.16%); split: -0.32%, +0.16%
Spill count: 183 -> 167 (-8.74%); split: -9.29%, +0.55%
Fill count: 284 -> 274 (-3.52%); split: -7.39%, +3.87%
Scratch Memory Size: 9216 -> 10240 (+11.11%)
Max live registers: 12587 -> 12558 (-0.23%)
Non SSA regs after NIR: 164466 -> 164239 (-0.14%); split: -0.16%, +0.02%

Skylake
Totals:
Instrs: 158904982 -> 158903764 (-0.00%)
Send messages: 8490500 -> 8490496 (-0.00%)
Cycle count: 19732284279 -> 19732345496 (+0.00%); split: -0.00%, +0.00%
Spill count: 519127 -> 519115 (-0.00%)
Fill count: 594283 -> 594290 (+0.00%); split: -0.00%, +0.00%
Max live registers: 33708764 -> 33708739 (-0.00%)
Non SSA regs after NIR: 169377234 -> 169377007 (-0.00%); split: -0.00%, +0.00%

Totals from 174 (0.03% of 648725) affected shaders:
Instrs: 160391 -> 159173 (-0.76%)
Send messages: 14354 -> 14350 (-0.03%)
Cycle count: 24776486 -> 24837703 (+0.25%); split: -0.07%, +0.32%
Spill count: 332 -> 320 (-3.61%)
Fill count: 587 -> 594 (+1.19%); split: -0.17%, +1.36%
Max live registers: 12709 -> 12684 (-0.20%)
Non SSA regs after NIR: 166557 -> 166330 (-0.14%); split: -0.16%, +0.02%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32493>
2024-12-05 21:39:07 +00:00
Alyssa Rosenzweig
77d4ed0a01 nir/opt_algebraic: optimize sign bit manipulation
libclc loves to generate the iand(0x7fffffff) pattern. ior/ixor patterns are
added for completeness.
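
A sketch of the three bit tricks on 32-bit floats, assuming the usual
IEEE-754 layout with the sign in bit 31. The helper names are
hypothetical. Masking the sign bit off gives the absolute value, setting
it gives the negated absolute value, and flipping it negates.

```python
import math
import struct

SIGN_BIT = 0x80000000
ABS_MASK = 0x7FFFFFFF


def f2u(x):
    """Bit pattern of x as a 32-bit float."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def matches_float_ops(x):
    """Compare the integer bit operations against abs and negation."""
    bits = f2u(x)
    return (
        (bits & ABS_MASK) == f2u(abs(x))        # iand -> fabs
        and (bits | SIGN_BIT) == f2u(-abs(x))   # ior  -> fneg(fabs)
        and (bits ^ SIGN_BIT) == f2u(-x)        # ixor -> fneg
    )


def all_samples_match():
    """Values chosen to be exactly representable as 32-bit floats."""
    samples = [0.0, -0.0, 1.5, -2.25, 1024.0, math.inf, -math.inf]
    return all(matches_float_ops(x) for x in samples)
```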

Shaves 4 instructions off libclc vec4 normalize.

v2: Loop over the bit sizes (Georg).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> [v1]
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32398>
2024-12-05 10:58:51 +00:00
Georg Lehmann
34a47e4b14 nir/opt_algebraic: mark a - ffract(a) as nan incorrect.
Inf - fract(Inf) -> Inf - NaN -> NaN
floor(Inf) -> Inf
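
A sketch reproducing the mismatch, assuming ffract(a) is computed as
a - floor(a). Python's math.floor raises on infinities, so a small
hypothetical wrapper passes non-finite values through the way IEEE floor
does.

```python
import math


def ffloor(x):
    """Floor that returns Inf and NaN unchanged instead of raising."""
    if math.isinf(x) or math.isnan(x):
        return x
    return float(math.floor(x))


def ffract(x):
    """Assumed definition: x - floor(x). For Inf this is Inf - Inf = NaN."""
    return x - ffloor(x)


def a_minus_ffract(a):
    """The expression that would otherwise be rewritten to floor(a)."""
    return a - ffract(a)
```

For finite inputs such as 2.75 the two agree (both give 2.0), but for Inf
the subtraction form yields NaN while ffloor yields Inf.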

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32393>
2024-12-03 14:42:18 +00:00