Rhys Perry
3c5bcc5f7f
nir/algebraic: optimize ishl(iadd(iadd(a, #b), c), #d)
...
fossil-db (navi31):
Totals from 671 (0.85% of 79377) affected shaders:
MaxWaves: 17048 -> 17052 (+0.02%); split: +0.04%, -0.01%
Instrs: 786643 -> 785459 (-0.15%); split: -0.20%, +0.05%
CodeSize: 4074988 -> 4069304 (-0.14%); split: -0.18%, +0.04%
VGPRs: 43896 -> 43860 (-0.08%); split: -0.11%, +0.03%
SpillSGPRs: 753 -> 748 (-0.66%)
Latency: 8187731 -> 8186707 (-0.01%); split: -0.11%, +0.10%
InvThroughput: 1274564 -> 1274582 (+0.00%); split: -0.11%, +0.11%
VClause: 14292 -> 14183 (-0.76%); split: -0.98%, +0.22%
SClause: 21527 -> 21426 (-0.47%); split: -0.53%, +0.06%
Copies: 59381 -> 59299 (-0.14%); split: -0.67%, +0.53%
PreSGPRs: 29358 -> 29349 (-0.03%)
PreVGPRs: 36595 -> 36368 (-0.62%); split: -0.70%, +0.08%
VALU: 482669 -> 481927 (-0.15%); split: -0.21%, +0.06%
SALU: 70019 -> 70009 (-0.01%); split: -0.06%, +0.05%
VOPD: 142 -> 139 (-2.11%)
fossil-db (navi21):
Totals from 671 (0.85% of 79377) affected shaders:
MaxWaves: 11536 -> 11516 (-0.17%); split: +0.03%, -0.21%
Instrs: 773615 -> 772476 (-0.15%); split: -0.18%, +0.03%
CodeSize: 4092564 -> 4086688 (-0.14%); split: -0.17%, +0.03%
VGPRs: 43424 -> 43448 (+0.06%); split: -0.04%, +0.09%
SpillSGPRs: 565 -> 560 (-0.88%)
Latency: 8650893 -> 8633993 (-0.20%); split: -0.31%, +0.11%
InvThroughput: 1920741 -> 1920368 (-0.02%); split: -0.10%, +0.08%
VClause: 15830 -> 15774 (-0.35%); split: -0.76%, +0.40%
SClause: 21025 -> 21009 (-0.08%); split: -0.11%, +0.03%
Copies: 65425 -> 65460 (+0.05%); split: -0.37%, +0.43%
Branches: 21845 -> 21848 (+0.01%)
PreSGPRs: 29457 -> 29448 (-0.03%)
PreVGPRs: 37296 -> 37066 (-0.62%); split: -0.69%, +0.08%
VALU: 516908 -> 516056 (-0.16%); split: -0.20%, +0.04%
SALU: 91545 -> 91531 (-0.02%); split: -0.05%, +0.03%
fossil-db (vega10):
Totals from 497 (0.79% of 62962) affected shaders:
MaxWaves: 2325 -> 2328 (+0.13%); split: +0.17%, -0.04%
Instrs: 298230 -> 297284 (-0.32%); split: -0.35%, +0.03%
CodeSize: 1535212 -> 1530636 (-0.30%); split: -0.34%, +0.04%
SGPRs: 36464 -> 36480 (+0.04%)
VGPRs: 29412 -> 29396 (-0.05%); split: -0.07%, +0.01%
SpillSGPRs: 164 -> 159 (-3.05%)
Latency: 3957230 -> 3948919 (-0.21%); split: -0.51%, +0.30%
InvThroughput: 1680680 -> 1679105 (-0.09%); split: -0.17%, +0.08%
VClause: 6175 -> 6102 (-1.18%); split: -1.55%, +0.37%
SClause: 9503 -> 9510 (+0.07%); split: -0.15%, +0.22%
Copies: 20992 -> 20892 (-0.48%); split: -0.97%, +0.50%
PreSGPRs: 17803 -> 17795 (-0.04%)
PreVGPRs: 23072 -> 22823 (-1.08%); split: -1.11%, +0.03%
VALU: 225322 -> 224587 (-0.33%); split: -0.36%, +0.04%
SALU: 21029 -> 21011 (-0.09%); split: -0.22%, +0.13%
fossil-db (polaris10):
Totals from 489 (0.79% of 61794) affected shaders:
Instrs: 299330 -> 298308 (-0.34%); split: -0.40%, +0.06%
CodeSize: 1529316 -> 1525440 (-0.25%); split: -0.32%, +0.07%
SpillSGPRs: 159 -> 149 (-6.29%)
Latency: 3924819 -> 3898471 (-0.67%); split: -0.93%, +0.25%
InvThroughput: 1687167 -> 1684956 (-0.13%); split: -0.22%, +0.09%
VClause: 6248 -> 6067 (-2.90%); split: -3.28%, +0.38%
SClause: 9519 -> 9492 (-0.28%); split: -0.72%, +0.44%
Copies: 21673 -> 21637 (-0.17%); split: -0.90%, +0.73%
PreSGPRs: 17611 -> 17603 (-0.05%)
PreVGPRs: 22873 -> 22625 (-1.08%)
VALU: 226805 -> 225928 (-0.39%); split: -0.45%, +0.06%
SALU: 21419 -> 21413 (-0.03%); split: -0.28%, +0.25%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29242 >
2025-02-07 13:52:57 +00:00
Georg Lehmann
ee5017a0fa
nir/opt_algebaric: convert fadd(a, a) to a * 2.0
...
On AMD, this is a clear win. 2.0 is a free constant,
the multiplication can be fused into fma, or it can
be done as a free output modifier. Otherwise, fmul
and fadd have the same throughput/latency, with the only
possible downside being potentially power consumption.
For other hardware this might not be as clear,
but we should at least choose one option for NIR because
it allows more CSE.
Foz-DB Navi21:
Totals from 12231 (15.41% of 79395) affected shaders:
MaxWaves: 309068 -> 309032 (-0.01%)
Instrs: 11826395 -> 11790132 (-0.31%); split: -0.31%, +0.00%
CodeSize: 63531496 -> 63512868 (-0.03%); split: -0.03%, +0.00%
VGPRs: 551256 -> 551328 (+0.01%); split: -0.00%, +0.02%
SpillSGPRs: 984 -> 979 (-0.51%)
Latency: 88486492 -> 88394296 (-0.10%); split: -0.11%, +0.01%
InvThroughput: 22360595 -> 22300790 (-0.27%); split: -0.27%, +0.00%
VClause: 226267 -> 226273 (+0.00%); split: -0.01%, +0.01%
SClause: 293820 -> 293783 (-0.01%); split: -0.02%, +0.00%
Copies: 727187 -> 727106 (-0.01%); split: -0.03%, +0.02%
PreSGPRs: 539623 -> 539625 (+0.00%)
PreVGPRs: 440843 -> 440946 (+0.02%); split: -0.00%, +0.03%
VALU: 8324962 -> 8288809 (-0.43%); split: -0.43%, +0.00%
SALU: 1278550 -> 1278538 (-0.00%); split: -0.00%, +0.00%
VMEM: 440600 -> 440599 (-0.00%)
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32989 >
2025-01-21 20:28:04 +00:00
Marek Olšák
3800f0af41
nir/algebraic: optimize pack_split(unpack(a).x, unpack(a).y) -> a
...
This is required to optimize FP64 and Int64 shaders generated by
virglrenderer. It generates pack/unpack around every 64-bit op,
which NIR currently can't eliminate. This fixes that.
There is a new constraint ".y", which means that the use of an instruction
should have swizzle.y. This allows us to add patterns that have Y swizzle
on results of instructions.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32172 >
2025-01-07 05:47:52 +00:00
Marek Olšák
b1bc691b0f
nir/algebraic: add and improve pack/unpack patterns
...
Some duplicated patterns are removed.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32172 >
2025-01-07 05:47:52 +00:00
Marek Olšák
ebec182b04
nir/algebraic: use is_used_once for comparison patterns
...
otherwise we are just creating new instructions while not removing any
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32172 >
2025-01-07 05:47:52 +00:00
Georg Lehmann
c695043e81
nir/opt_algebraic: optimize min(max(a, b), a)
...
Foz-DB Navi21:
Totals from 105 (0.13% of 79395) affected shaders:
MaxWaves: 2638 -> 2646 (+0.30%)
Instrs: 76531 -> 75077 (-1.90%)
CodeSize: 413668 -> 406484 (-1.74%)
VGPRs: 4856 -> 4848 (-0.16%)
Latency: 333684 -> 328438 (-1.57%); split: -1.57%, +0.00%
InvThroughput: 80417 -> 78579 (-2.29%)
VClause: 1818 -> 1768 (-2.75%)
SClause: 3028 -> 2964 (-2.11%)
Copies: 4708 -> 4513 (-4.14%); split: -4.50%, +0.36%
PreVGPRs: 3792 -> 3715 (-2.03%); split: -2.08%, +0.05%
VALU: 54734 -> 53528 (-2.20%)
SALU: 6195 -> 6137 (-0.94%)
VMEM: 2363 -> 2313 (-2.12%)
SMEM: 5219 -> 5119 (-1.92%)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32634 >
2024-12-16 22:29:21 +00:00
Qiang Yu
129e37bab6
nir: do not generate b2i64 when driver want to lower it
...
This is found on GFX12 by:
KHR-GL43.shader_ballot_tests.ShaderBallotBitmasks
ACO does not support it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32570 >
2024-12-16 07:35:07 +00:00
Alyssa Rosenzweig
bd89279dd4
nir: add lower_scratch_to_var pass
...
to ease opencl pain.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32529 >
2024-12-12 21:16:13 +00:00
Karmjit Mahil
b79994e92d
nir,ir3: Add icsel_eqz
...
In IR3 `sel.b32` works based on the 0 so add `icsel_eqz` to fuse the
cmp and sel that we'd otherwise need.
total Instruction Count in shared programs: 1112814 -> 1110473 (-0.21%)
Instruction Count in affected programs: 162701 -> 160360 (-1.44%)
helped: 81
HURT: 29
Instruction count are helped.
total MOV Count in shared programs: 86777 -> 88671 (2.18%)
MOV Count in affected programs: 28119 -> 30013 (6.74%)
helped: 1
HURT: 292
Mov count are HURT.
total COV Count in shared programs: 15070 -> 14962 (-0.72%)
COV Count in affected programs: 5770 -> 5662 (-1.87%)
helped: 76
HURT: 2
Cov count are helped.
total Last helper instruction in shared programs: 592729 -> 590638 (-0.35%)
Last helper instruction in affected programs: 91331 -> 89240 (-2.29%)
helped: 30
HURT: 1
Last helper instruction are helped.
total Instructions with SS sync bit in shared programs: 29336 -> 29546 (0.72%)
Instructions with SS sync bit in affected programs: 4702 -> 4912 (4.47%)
helped: 8
HURT: 43
Instructions with ss sync bit are HURT.
total Estimated cycles stalled on SS in shared programs: 111590 -> 112401 (0.73%)
Estimated cycles stalled on SS in affected programs: 27708 -> 28519 (2.93%)
helped: 21
HURT: 61
Estimated cycles stalled on ss are HURT.
total cat1 instructions in shared programs: 101933 -> 103695 (1.73%)
cat1 instructions in affected programs: 35804 -> 37566 (4.92%)
helped: 18
HURT: 290
Cat1 instructions are HURT.
total cat2 instructions in shared programs: 380299 -> 377499 (-0.74%)
cat2 instructions in affected programs: 128609 -> 125809 (-2.18%)
helped: 322
HURT: 0
Cat2 instructions are helped.
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32189 >
2024-12-06 08:42:36 +00:00
Karmjit Mahil
aad0aa0a9c
nir/algebraic: turn u{ge,lt} a, 1 to i{ne,eq} a, 0
...
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32189 >
2024-12-06 08:42:36 +00:00
Ian Romanick
e1bb53bb3c
nir/algebraic: Optimize some trivial bfi
...
In fossil-db, one big compute shader on Hogwarts Legacy is helped for
spills and fills. It has a lot of instances of bfi(0x3f, a, a).
On Tiger Lake and Skylake, a compute shader in Unicom that has a
single instance of this pattern is hurt for spills and fills. I think
this is just due to non-determinism in the register allocation
algorithm.
shader-db:
All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 16992643 -> 16992548 (<.01%)
instructions in affected programs: 17533 -> 17438 (-0.54%)
helped: 33 / HURT: 0
total cycles in shared programs: 914313986 -> 914316238 (<.01%)
cycles in affected programs: 3734544 -> 3736796 (0.06%)
helped: 26 / HURT: 6
fossil-db:
Lunar Lake, Meteor Lake, DG2, and Ice Lake had similar results. (Lunar Lake shown)
Totals:
Instrs: 208952780 -> 208952537 (-0.00%)
Send messages: 10934879 -> 10934875 (-0.00%)
Cycle count: 30988230904 -> 30988228660 (-0.00%); split: -0.00%, +0.00%
Spill count: 534864 -> 534843 (-0.00%)
Fill count: 667081 -> 667068 (-0.00%)
Max live registers: 65686656 -> 65686624 (-0.00%)
Non SSA regs after NIR: 244185358 -> 244185335 (-0.00%)
Totals from 3 (0.00% of 704834) affected shaders:
Instrs: 4708 -> 4465 (-5.16%)
Send messages: 234 -> 230 (-1.71%)
Cycle count: 264382 -> 262138 (-0.85%); split: -0.88%, +0.03%
Spill count: 91 -> 70 (-23.08%)
Fill count: 73 -> 60 (-17.81%)
Max live registers: 647 -> 615 (-4.95%)
Non SSA regs after NIR: 3957 -> 3934 (-0.58%)
Tiger Lake
Totals:
Instrs: 230516919 -> 230515185 (-0.00%); split: -0.00%, +0.00%
Send messages: 12657684 -> 12657680 (-0.00%)
Cycle count: 23060318600 -> 23060279758 (-0.00%); split: -0.00%, +0.00%
Spill count: 548462 -> 548446 (-0.00%); split: -0.00%, +0.00%
Fill count: 582304 -> 582294 (-0.00%); split: -0.00%, +0.00%
Scratch Memory Size: 19538944 -> 19539968 (+0.01%)
Max live registers: 41713622 -> 41713593 (-0.00%)
Non SSA regs after NIR: 260667939 -> 260667712 (-0.00%); split: -0.00%, +0.00%
Totals from 174 (0.02% of 794323) affected shaders:
Instrs: 158346 -> 156612 (-1.10%); split: -1.13%, +0.04%
Send messages: 14330 -> 14326 (-0.03%)
Cycle count: 24859875 -> 24821033 (-0.16%); split: -0.32%, +0.16%
Spill count: 183 -> 167 (-8.74%); split: -9.29%, +0.55%
Fill count: 284 -> 274 (-3.52%); split: -7.39%, +3.87%
Scratch Memory Size: 9216 -> 10240 (+11.11%)
Max live registers: 12587 -> 12558 (-0.23%)
Non SSA regs after NIR: 164466 -> 164239 (-0.14%); split: -0.16%, +0.02%
Skylake
Totals:
Instrs: 158904982 -> 158903764 (-0.00%)
Send messages: 8490500 -> 8490496 (-0.00%)
Cycle count: 19732284279 -> 19732345496 (+0.00%); split: -0.00%, +0.00%
Spill count: 519127 -> 519115 (-0.00%)
Fill count: 594283 -> 594290 (+0.00%); split: -0.00%, +0.00%
Max live registers: 33708764 -> 33708739 (-0.00%)
Non SSA regs after NIR: 169377234 -> 169377007 (-0.00%); split: -0.00%, +0.00%
Totals from 174 (0.03% of 648725) affected shaders:
Instrs: 160391 -> 159173 (-0.76%)
Send messages: 14354 -> 14350 (-0.03%)
Cycle count: 24776486 -> 24837703 (+0.25%); split: -0.07%, +0.32%
Spill count: 332 -> 320 (-3.61%)
Fill count: 587 -> 594 (+1.19%); split: -0.17%, +1.36%
Max live registers: 12709 -> 12684 (-0.20%)
Non SSA regs after NIR: 166557 -> 166330 (-0.14%); split: -0.16%, +0.02%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32493 >
2024-12-05 21:39:07 +00:00
Alyssa Rosenzweig
77d4ed0a01
nir/opt_algebraic: optimize sign bit manipulation
...
libclc loves to generate the iand(0x7fffffff) pattern. ior/ixor patterns are
added for completeness.
Shaves 4 instructions off libclc vec4 normalize.
v2: Loop over the bit sizes (Georg).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> [v1]
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32398 >
2024-12-05 10:58:51 +00:00
Georg Lehmann
34a47e4b14
nir/opt_algebraic: mark a - ffract(a) as nan incorrect.
...
Inf + fract(Inf) -> Inf + NaN -> NaN
floor(Inf) -> Inf
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32393 >
2024-12-03 14:42:18 +00:00
Georg Lehmann
2ee96cf514
nir/opt_algebraic: optimize d3d9 ceil
...
No Foz-DB changes.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32393 >
2024-12-03 14:42:18 +00:00
Georg Lehmann
34caed8adb
nir/opt_algebraic: optimize d3d9 ftrunc
...
Foz-DB Navi21:
Totals from 85 (0.11% of 79395) affected shaders:
MaxWaves: 1972 -> 1968 (-0.20%)
Instrs: 48682 -> 47067 (-3.32%)
CodeSize: 255664 -> 247172 (-3.32%)
VGPRs: 3752 -> 3768 (+0.43%)
Latency: 154414 -> 150360 (-2.63%)
InvThroughput: 37186 -> 35081 (-5.66%)
VClause: 847 -> 865 (+2.13%); split: -0.24%, +2.36%
SClause: 768 -> 796 (+3.65%)
Copies: 2763 -> 2869 (+3.84%); split: -0.14%, +3.98%
VALU: 28133 -> 26781 (-4.81%)
SALU: 7182 -> 6939 (-3.38%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32393 >
2024-12-03 14:42:18 +00:00
Georg Lehmann
ea4aa8e5a6
nir/opt_algebraic: optimize ffma(b2f, b2f, c)
...
Foz-DB Navi21:
Totals from 134 (0.17% of 79395) affected shaders:
Instrs: 153297 -> 153326 (+0.02%); split: -0.03%, +0.05%
CodeSize: 829520 -> 828444 (-0.13%); split: -0.13%, +0.00%
Latency: 900489 -> 899964 (-0.06%); split: -0.07%, +0.01%
InvThroughput: 267838 -> 267478 (-0.13%); split: -0.14%, +0.00%
VClause: 2452 -> 2454 (+0.08%)
Copies: 8331 -> 8353 (+0.26%); split: -0.25%, +0.52%
PreSGPRs: 4974 -> 4964 (-0.20%)
PreVGPRs: 6209 -> 6218 (+0.14%)
VALU: 112317 -> 112092 (-0.20%); split: -0.21%, +0.01%
SALU: 12451 -> 12694 (+1.95%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32393 >
2024-12-03 14:42:18 +00:00
Marek Olšák
8752401e03
nir/algebraic: optimize (a & b) | (a | c) => a | c, (a & b) & (a | c) => a & b
...
No change in shader-db with ACO, but it doesn't seem to be optimized by
any other patterns.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32449 >
2024-12-03 01:24:27 +00:00
Marek Olšák
3670d42c74
nir/algebraic: optimize (a | b) | (a | c) ==> (a | b) | c
...
shader-db with ACO:
3 shaders have -0.11% average decrease in the code size
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32449 >
2024-12-03 01:24:27 +00:00
Marek Olšák
978ad93375
nir/algebraic: optimize (a & b) & (a & c) ==> (a & b) & c
...
shader-db with ACO:
3 shaders have -0.57% average decrease in the code size
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32449 >
2024-12-03 01:24:27 +00:00
Marek Olšák
83b093f95e
nir/algebraic: use is_used_once in a few iand/ior patterns
...
shader-db with ACO:
1 shader has -4 decrease in the code size
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32449 >
2024-12-03 01:24:27 +00:00
Kenneth Graunke
92797c6878
nir/algebraic: Reassociate fadd into fmul in DP4-like pattern
...
This extends the optimization from commit 09705747d7 ("nir/algebraic:
Reassociate fadd into fmul in DPH-like pattern") to a chain of 4 ffmas
for a DP4-style pattern.
Moving the add to the other end of the sequence allows it to be fused
into an FMA.
fossil-db results from Alchemist:
Totals:
Instrs: 158544142 -> 158490516 (-0.03%); split: -0.04%, +0.00%
Subgroup size: 7808912 -> 7808920 (+0.00%); split: +0.00%, -0.00%
Cycle count: 17859550672 -> 17859491966 (-0.00%); split: -0.01%, +0.01%
Spill count: 84652 -> 84494 (-0.19%); split: -0.37%, +0.18%
Fill count: 160728 -> 160623 (-0.07%); split: -0.29%, +0.23%
Scratch Memory Size: 4278272 -> 4272128 (-0.14%); split: -0.29%, +0.14%
Max live registers: 32411695 -> 32409789 (-0.01%); split: -0.01%, +0.00%
Max dispatch width: 5627856 -> 5627920 (+0.00%); split: +0.00%, -0.00%
Non SSA regs after NIR: 185359099 -> 185307703 (-0.03%); split: -0.03%, +0.00%
Totals from 16378 (2.56% of 640872) affected shaders:
Instrs: 9818723 -> 9765097 (-0.55%); split: -0.58%, +0.04%
Subgroup size: 194056 -> 194064 (+0.00%); split: +0.01%, -0.01%
Cycle count: 294967108 -> 294908402 (-0.02%); split: -0.58%, +0.56%
Spill count: 10088 -> 9930 (-1.57%); split: -3.09%, +1.53%
Fill count: 24738 -> 24633 (-0.42%); split: -1.90%, +1.48%
Scratch Memory Size: 439296 -> 433152 (-1.40%); split: -2.80%, +1.40%
Max live registers: 1297204 -> 1295298 (-0.15%); split: -0.22%, +0.07%
Max dispatch width: 133232 -> 133296 (+0.05%); split: +0.14%, -0.10%
Non SSA regs after NIR: 11999084 -> 11947688 (-0.43%); split: -0.43%, +0.00%
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32197 >
2024-12-02 13:15:16 +00:00
Rhys Perry
4c7d6e9437
nir/algebraic: optimize more bcsel(, bcsel())
...
This inot should be pretty optimizable.
fossil-db (navi21);
Totals from 2361 (2.97% of 79395) affected shaders:
MaxWaves: 50808 -> 50890 (+0.16%)
Instrs: 4168195 -> 4167332 (-0.02%); split: -0.05%, +0.03%
CodeSize: 22727496 -> 22708088 (-0.09%); split: -0.12%, +0.03%
VGPRs: 135160 -> 134824 (-0.25%)
SpillSGPRs: 723 -> 725 (+0.28%)
Latency: 37498671 -> 37479794 (-0.05%); split: -0.07%, +0.02%
InvThroughput: 10468406 -> 10453028 (-0.15%); split: -0.16%, +0.01%
VClause: 98258 -> 98283 (+0.03%); split: -0.04%, +0.07%
SClause: 111281 -> 111323 (+0.04%); split: -0.06%, +0.09%
Copies: 299281 -> 300155 (+0.29%); split: -0.17%, +0.46%
Branches: 115951 -> 116111 (+0.14%); split: -0.00%, +0.14%
PreSGPRs: 109404 -> 109462 (+0.05%); split: -0.14%, +0.19%
PreVGPRs: 114558 -> 114421 (-0.12%)
VALU: 2876823 -> 2869990 (-0.24%); split: -0.24%, +0.00%
SALU: 500286 -> 506124 (+1.17%); split: -0.03%, +1.20%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32145 >
2024-11-21 14:50:45 +00:00
Rhys Perry
7ef1585fd6
nir/algebraic: add is_used_once to bcsel(, bcsel()) opts
...
fossil-db (navi21):
Totals from 888 (1.12% of 79395) affected shaders:
MaxWaves: 18034 -> 18046 (+0.07%)
Instrs: 3422053 -> 3418446 (-0.11%); split: -0.11%, +0.01%
CodeSize: 18520912 -> 18500604 (-0.11%); split: -0.12%, +0.01%
VGPRs: 53200 -> 53176 (-0.05%)
Latency: 27739575 -> 27735200 (-0.02%); split: -0.06%, +0.04%
InvThroughput: 6784257 -> 6782188 (-0.03%); split: -0.06%, +0.03%
VClause: 83188 -> 83199 (+0.01%); split: -0.00%, +0.02%
SClause: 91350 -> 91362 (+0.01%); split: -0.00%, +0.02%
Copies: 263277 -> 262638 (-0.24%); split: -0.29%, +0.05%
PreSGPRs: 52478 -> 51940 (-1.03%); split: -1.03%, +0.01%
PreVGPRs: 47418 -> 47397 (-0.04%); split: -0.06%, +0.02%
VALU: 2235368 -> 2234513 (-0.04%); split: -0.05%, +0.01%
SALU: 547587 -> 544839 (-0.50%); split: -0.51%, +0.00%
VMEM: 142861 -> 142871 (+0.01%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32145 >
2024-11-21 14:50:45 +00:00
Alyssa Rosenzweig
61862b209e
nir/opt_algebraic: optimize convert_uint_sat(ulong)
...
I wrote this in my query copy shader, it didn't get the codegen I expected, so I
investigated.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32208 >
2024-11-20 16:53:50 +00:00
Rhys Perry
327e5465fc
nir/algebraic: check bit sizes in lowered unpack(pack()) optimization
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes: 894f7f4387 ("nir_opt_algebraic: Add a couple optimizations for lowered unpack(pack())")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32157 >
2024-11-19 18:17:18 +00:00
Rhys Perry
ecd6ae12fb
nir/algebraic: fix iabs(ishr(iabs(a), b)) optimization
...
iabs(a) is not positive if "a" is the minimum signed value, so this is
incorrect in that case for some values of "b".
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes: 2b76de9b5d ("nir/algebraic: Add a couple optimizations for iabs and ishr")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32157 >
2024-11-19 18:17:17 +00:00
Rhys Perry
0c7830eb85
nir/algebraic: optimize ushr(a, ishl(iand(b, 3), 3))
...
nir_lower_mem_access_bit_sizes creates this.
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31904 >
2024-11-13 12:59:26 +00:00
Rhys Perry
e95a3364b8
nir/algebraic: optimize bcsel(ieq(b, 0), a, shift(a, b))
...
nir_lower_mem_access_bit_sizes can create this.
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31904 >
2024-11-13 12:59:26 +00:00
Alyssa Rosenzweig
fc460e7f20
nir/opt_algebraic: don't lower amul if requested
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31964 >
2024-11-08 21:15:42 -04:00
Alyssa Rosenzweig
227026b7ad
nir/opt_algebraic: add another 64-bit pattern
...
clpeak
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31964 >
2024-11-08 21:15:42 -04:00
Alyssa Rosenzweig
2a3f133fd0
nir/opt_algebraic: add more 64-bit patterns
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31964 >
2024-11-08 21:15:41 -04:00
Alyssa Rosenzweig
a4a3487aae
nir/opt_algebraic: optimize patterns from Skia
...
shaders/skia/1567.shader_test relies on algebraic + constant folding, subtle
changes in the input compiling flow can cause it to baloon. these patterns fix
that. annoying!
shader-db results aren't amazing, but they avert a major stats regression for
that one Skia shader.
total instructions in shared programs: 2751399 -> 2751295 (<.01%)
instructions in affected programs: 6509 -> 6405 (-1.60%)
helped: 21
HURT: 1
helped stats (abs) min: 1 max: 14 x̄: 5.62 x̃: 6
helped stats (rel) min: 0.53% max: 13.73% x̄: 3.57% x̃: 1.62%
HURT stats (abs) min: 14 max: 14 x̄: 14.00 x̃: 14
HURT stats (rel) min: 2.45% max: 2.45% x̄: 2.45% x̃: 2.45%
95% mean confidence interval for instructions value: -7.09 -2.36
95% mean confidence interval for instructions %-change: -5.14% -1.45%
Instructions are helped.
total alu in shared programs: 2274577 -> 2274468 (<.01%)
alu in affected programs: 6178 -> 6069 (-1.76%)
helped: 21
HURT: 1
helped stats (abs) min: 1 max: 14 x̄: 5.86 x̃: 7
helped stats (rel) min: 0.55% max: 16.47% x̄: 3.93% x̃: 1.72%
HURT stats (abs) min: 14 max: 14 x̄: 14.00 x̃: 14
HURT stats (rel) min: 2.83% max: 2.83% x̄: 2.83% x̃: 2.83%
95% mean confidence interval for alu value: -7.35 -2.56
95% mean confidence interval for alu %-change: -5.67% -1.57%
Alu are helped.
total fscib in shared programs: 2272894 -> 2272785 (<.01%)
fscib in affected programs: 6178 -> 6069 (-1.76%)
helped: 21
HURT: 1
helped stats (abs) min: 1 max: 14 x̄: 5.86 x̃: 7
helped stats (rel) min: 0.55% max: 16.47% x̄: 3.93% x̃: 1.72%
HURT stats (abs) min: 14 max: 14 x̄: 14.00 x̃: 14
HURT stats (rel) min: 2.83% max: 2.83% x̄: 2.83% x̃: 2.83%
95% mean confidence interval for fscib value: -7.35 -2.56
95% mean confidence interval for fscib %-change: -5.67% -1.57%
Fscib are helped.
total bytes in shared programs: 21489352 -> 21488668 (<.01%)
bytes in affected programs: 53362 -> 52678 (-1.28%)
helped: 21
HURT: 2
helped stats (abs) min: 6 max: 98 x̄: 35.52 x̃: 40
helped stats (rel) min: 0.39% max: 10.63% x̄: 2.27% x̃: 1.27%
HURT stats (abs) min: 2 max: 60 x̄: 31.00 x̃: 31
HURT stats (rel) min: 0.08% max: 1.40% x̄: 0.74% x̃: 0.74%
95% mean confidence interval for bytes value: -42.73 -16.74
95% mean confidence interval for bytes %-change: -3.13% -0.89%
Bytes are helped.
total regs in shared programs: 865162 -> 865148 (<.01%)
regs in affected programs: 509 -> 495 (-2.75%)
helped: 4
HURT: 5
helped stats (abs) min: 2 max: 14 x̄: 6.00 x̃: 4
helped stats (rel) min: 3.17% max: 35.90% x̄: 14.01% x̃: 8.48%
HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel) min: 3.17% max: 3.17% x̄: 3.17% x̃: 3.17%
95% mean confidence interval for regs value: -5.75 2.64
95% mean confidence interval for regs %-change: -14.31% 5.39%
Inconclusive result (value mean confidence interval includes 0).
total uniforms in shared programs: 2120731 -> 2120735 (<.01%)
uniforms in affected programs: 358 -> 362 (1.12%)
helped: 1
HURT: 2
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 2.94% max: 2.94% x̄: 2.94% x̃: 2.94%
HURT stats (abs) min: 2 max: 4 x̄: 3.00 x̃: 3
HURT stats (rel) min: 1.05% max: 4.00% x̄: 2.53% x̃: 2.53%
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31964 >
2024-11-08 21:15:41 -04:00
Rhys Perry
da5c5a3edd
nir/algebraic: add bit-size check to extract_u8 pattern
...
This only worked when "a" was 16-bit because a pattern above replaced the
shift.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31762 >
2024-11-06 19:31:20 +00:00
Alyssa Rosenzweig
33299354e0
nir/opt_algebraic: optimize patterns hit with OpenCL
...
This patterns were all found in the AGX quads tessellator, a medium-sized OpenCL
kernel. LLVM generates a lot of garbage around booleans which we need to chew
through. Though there's nothing AGX or really OpenCL specific here, so some of
this could help graphics shaders too.
Together, their effect is significant for that kernel instr count & occupancy:
before: 2966 inst, 2310 alu, 2310 fscib, 1216 ic, 23148 bytes, 239 regs, 384 threads
after: 2848 inst, 2246 alu, 2246 fscib, 1000 ic, 22260 bytes, 231 regs, 448 threads
No significant changes on GL shaderdb (a single godot shader regressed 1
instruction, 1344->1345).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31892 >
2024-10-30 12:59:10 +00:00
Georg Lehmann
d6535f2602
nir/opt_algebraic: create ubfe with non constant mask
...
Foz-DB Navi21:
Totals from 278 (0.35% of 79395) affected shaders:
MaxWaves: 7444 -> 7448 (+0.05%)
Instrs: 316069 -> 314584 (-0.47%); split: -0.47%, +0.00%
CodeSize: 1608064 -> 1593204 (-0.92%)
VGPRs: 11128 -> 11120 (-0.07%)
Latency: 796599 -> 797786 (+0.15%); split: -0.19%, +0.34%
InvThroughput: 141195 -> 139472 (-1.22%); split: -1.22%, +0.00%
Copies: 28565 -> 29796 (+4.31%); split: -0.15%, +4.46%
PreSGPRs: 14335 -> 14336 (+0.01%)
VALU: 161342 -> 159426 (-1.19%)
SALU: 87794 -> 88305 (+0.58%); split: -0.03%, +0.61%
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31852 >
2024-10-29 10:51:10 +00:00
Timur Kristóf
be68aeafdc
nir/opt_algebraic: Add various bitfield extract patterns.
...
v2 (Georg Lehmann):
- fixed incorrect imin in ubfe_ubfe
- simplied outer_bits of ushr((ubfe, ...), ...) opt
- added is_used_once to iand(ushr(), ...) opt to improve stats
For-DB Navi21:
Totals from 3309 (4.18% of 79206) affected shaders:
Instrs: 5295291 -> 5282128 (-0.25%); split: -0.28%, +0.03%
CodeSize: 28299320 -> 28298456 (-0.00%); split: -0.07%, +0.06%
Latency: 51566173 -> 51521923 (-0.09%); split: -0.09%, +0.01%
InvThroughput: 13222050 -> 13204557 (-0.13%); split: -0.14%, +0.01%
VClause: 116451 -> 116458 (+0.01%); split: -0.02%, +0.02%
SClause: 160356 -> 160324 (-0.02%); split: -0.03%, +0.01%
Copies: 424152 -> 423670 (-0.11%); split: -0.20%, +0.09%
Branches: 156701 -> 156192 (-0.32%); split: -0.33%, +0.01%
PreSGPRs: 168507 -> 168500 (-0.00%); split: -0.02%, +0.01%
PreVGPRs: 151477 -> 151474 (-0.00%)
VALU: 3486077 -> 3476675 (-0.27%); split: -0.31%, +0.04%
SALU: 786467 -> 783109 (-0.43%); split: -0.45%, +0.03%
VMEM: 188035 -> 188060 (+0.01%)
SMEM: 259632 -> 259630 (-0.00%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31852 >
2024-10-29 10:51:09 +00:00
Rhys Perry
8efc765a3d
nir/algebraic: fix shfr optimization with zero src2
...
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Fixes: 08903bbe89 ("nir: add mqsad_4x8, shfr and nir_opt_mqsad")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31808 >
2024-10-25 09:59:40 +00:00
Georg Lehmann
1f9b82bb2a
nir/opt_algebraic: optimize -0.0 + a
...
Foz-DB Navi21:
Totals from 428 (0.54% of 79395) affected shaders:
MaxWaves: 8510 -> 8512 (+0.02%)
Instrs: 731062 -> 729665 (-0.19%); split: -0.19%, +0.00%
CodeSize: 3735788 -> 3728324 (-0.20%); split: -0.20%, +0.00%
VGPRs: 27328 -> 27336 (+0.03%); split: -0.03%, +0.06%
SpillSGPRs: 315 -> 314 (-0.32%)
Latency: 3872986 -> 3873236 (+0.01%); split: -0.08%, +0.09%
InvThroughput: 971001 -> 970056 (-0.10%); split: -0.17%, +0.08%
VClause: 11954 -> 11956 (+0.02%); split: -0.02%, +0.03%
SClause: 17361 -> 17358 (-0.02%)
Copies: 59038 -> 59045 (+0.01%); split: -0.22%, +0.24%
Branches: 17685 -> 17656 (-0.16%)
PreSGPRs: 26103 -> 26102 (-0.00%)
PreVGPRs: 23220 -> 23206 (-0.06%)
VALU: 515293 -> 513963 (-0.26%); split: -0.26%, +0.00%
SALU: 91591 -> 91544 (-0.05%)
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31770 >
2024-10-23 08:58:34 +00:00
Georg Lehmann
f9d2aad7a3
nir: remove alu ddx/ddy
...
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31014 >
2024-10-17 09:50:19 +00:00
Gert Wollny
f19f1ec17b
nir/opt_algebraic: Allow two-step lowering of ftrunc@64 to use ffract@64
...
If ftrunc@64 is lowered by nir_lower_doubles it is turned into a
comparable long series of 32 bit operations. If the hardware
supports ffract@64 then nir_opt_algebraic can first lower ftrunc@64
to use some combinations with ffloor@64. They can then be turned
into a combination of fsub@64 and ffract@64 resulting in less
all-over instructions.
Fixes: 5218cff34b
nir/algebraic: avoid double lowering of some fp64 operations
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29281 >
2024-09-30 23:51:02 +00:00
Ian Romanick
057c7c9f53
nir/algebraic: Recognize open-coded bitfield_reverse in XCOM 2
...
The XCOM 2 shaders in my shader-db use iadd instead of ior.
No fossil-db changes on any Intel platform.
shader-db:
All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19787210 -> 19787034 (<.01%)
instructions in affected programs: 1187 -> 1011 (-14.83%)
helped: 6 / HURT: 0
total cycles in shared programs: 906024436 -> 906012612 (<.01%)
cycles in affected programs: 72978 -> 61154 (-16.20%)
helped: 6 / HURT: 0
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31006 >
2024-09-13 00:21:00 +00:00
Ian Romanick
a780305818
nir/algebraic: Optimize more comparisons with b2f
...
shader-db:
All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19781108 -> 19772614 (-0.04%)
instructions in affected programs: 372638 -> 364144 (-2.28%)
helped: 2915 / HURT: 0
total cycles in shared programs: 905907644 -> 905822682 (<.01%)
cycles in affected programs: 5573453 -> 5488491 (-1.52%)
helped: 2363 / HURT: 234
LOST: 42
GAINED: 16
fossil-db:
All Intel platforms had similar results. (Meteor Lake shown)
Totals:
Instrs: 152519634 -> 152519610 (-0.00%)
Cycle count: 17122707642 -> 17122710974 (+0.00%); split: -0.00%, +0.00%
Totals from 5 (0.00% of 633222) affected shaders:
Instrs: 2827 -> 2803 (-0.85%)
Cycle count: 83089 -> 86421 (+4.01%); split: -0.12%, +4.13%
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31068 >
2024-09-10 04:15:58 +00:00
Georg Lehmann
6378bbaa82
nir/opt_algebraic: reassociate constants in ior(iand) chains
...
Mostly affects one F1_23 shader that packs bitfields bit by bit.
Totals from 3 (0.00% of 79395) affected shaders:
Instrs: 5004 -> 4202 (-16.03%)
CodeSize: 30992 -> 23952 (-22.72%)
Latency: 28894 -> 28464 (-1.49%)
InvThroughput: 4095 -> 3934 (-3.93%)
Copies: 363 -> 376 (+3.58%)
PreVGPRs: 110 -> 109 (-0.91%)
VALU: 3035 -> 2504 (-17.50%)
SALU: 463 -> 459 (-0.86%)
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31009 >
2024-09-05 22:04:05 +00:00
Caio Oliveira
74be809237
compiler: Allow derivative_group to be used for all stages in shader_info
...
These will now also be used by stages that have workgroups.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30950 >
2024-09-03 20:03:18 +00:00
Ian Romanick
f11a414645
nir/algebraic: Remove incorrect bfi of iand pattern
...
The comment says, "This expands to (b & 3) & ~0xc which is (b & 3) &
3." This is not correct. ~0xc is actually 0xfffffff3.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Closes : #11695
Fixes: 1c7e35d4e0 ("nir/algebraic: Optimize some bit operation nonsense observed in some shaders")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30913 >
2024-08-29 22:21:55 +00:00
Georg Lehmann
ef970c5a9d
nir: optimize pack_uint_2x16 of pack_half(a, 0)
...
Foz-DB Navi31:
Totals from 31 (0.04% of 79395) affected shaders:
Instrs: 6157 -> 6065 (-1.49%)
CodeSize: 35676 -> 34936 (-2.07%)
Latency: 23979 -> 23805 (-0.73%); split: -0.79%, +0.07%
InvThroughput: 5248 -> 5124 (-2.36%)
VALU: 3224 -> 3162 (-1.92%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30855 >
2024-08-28 07:16:55 +00:00
Ian Romanick
198d8d9c03
nir/algebraic: Improve some find_lsb and ifind_msb patterns
...
These patterns were observed in shaders from parallel-rdp.
No shader-db changes on any Intel platform.
fossil-db:
Meteor Lake, DG2, Ice Lake had Skylake similar results. (Meteor Lake shown)
Totals:
Instrs: 152535883 -> 152535673 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17112406110 -> 17122827810 (+0.06%); split: -0.01%, +0.07%
Spill count: 78525 -> 78523 (-0.00%)
Fill count: 148132 -> 148127 (-0.00%); split: -0.01%, +0.00%
Max live registers: 31855320 -> 31855314 (-0.00%)
Totals from 206 (0.03% of 633223) affected shaders:
Instrs: 797124 -> 796914 (-0.03%); split: -0.03%, +0.00%
Cycle count: 4716743323 -> 4727165023 (+0.22%); split: -0.05%, +0.27%
Spill count: 18781 -> 18779 (-0.01%)
Fill count: 31381 -> 31376 (-0.02%); split: -0.03%, +0.01%
Max live registers: 31872 -> 31866 (-0.02%)
Tiger Lake
Totals:
Instrs: 150560465 -> 150560343 (-0.00%); split: -0.00%, +0.00%
Cycle count: 15482372893 -> 15479328542 (-0.02%); split: -0.02%, +0.00%
Fill count: 103509 -> 103512 (+0.00%)
Max live registers: 31760378 -> 31760374 (-0.00%)
Totals from 199 (0.03% of 632445) affected shaders:
Instrs: 679513 -> 679391 (-0.02%); split: -0.02%, +0.00%
Cycle count: 4258406125 -> 4255361774 (-0.07%); split: -0.09%, +0.02%
Fill count: 30609 -> 30612 (+0.01%)
Max live registers: 30502 -> 30498 (-0.01%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30650 >
2024-08-16 14:52:04 +00:00
Marek Olšák
ecfefe823e
nir/opt_algebraic: use fmulz for fpow lowering to fix incorrect rendering
...
The original implementation in all radeon drivers had this behavior.
Fixes: 9bc1fb4c07 - ac/llvm,radeonsi: lower nir_fpow for aco and llvm
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11464
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30069 >
2024-07-23 15:23:27 +00:00
Ian Romanick
faee9426ab
nir/algebraic: Optimize some masking of extract_u8 operations
...
I observed this pattern in several Red Dead Redemption 2 shaders.
No shader-db changes on any Intel platform.
v2: Remove duplicated patterns. Noticed by Georg.
fossil-db:
All Intel platforms had similar results. (Meteor Lake shown)
Totals:
Instrs: 151519393 -> 151507192 (-0.01%); split: -0.01%, +0.00%
Cycle count: 17208246858 -> 17177437340 (-0.18%); split: -0.25%, +0.07%
Spill count: 80830 -> 80759 (-0.09%); split: -0.09%, +0.00%
Fill count: 152754 -> 152179 (-0.38%); split: -0.40%, +0.02%
Totals from 7531 (1.20% of 630198) affected shaders:
Instrs: 12606141 -> 12593940 (-0.10%); split: -0.10%, +0.00%
Cycle count: 5466605514 -> 5435795996 (-0.56%); split: -0.79%, +0.22%
Spill count: 25251 -> 25180 (-0.28%); split: -0.29%, +0.01%
Fill count: 45143 -> 44568 (-1.27%); split: -1.36%, +0.08%
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30158 >
2024-07-20 00:19:05 +00:00
Ian Romanick
1c7e35d4e0
nir/algebraic: Optimize some bit operation nonsense observed in some shaders
...
In updates (not post at the time of this writing) to !29884 , a change
caused many spill and fill regressions shader for OpenGL Tomb
Raider. While looking at that shader, I noticed some odd patterns. I
initially added these patterns to counteract the regressions caused by
the other change, but I had no luck. On Ice Lake... this cuts 99
instructions from the shader.
shader-db:
All Intel platforms had simliar results. (Meteor Lake shown)
total instructions in shared programs: 19732341 -> 19732295 (<.01%)
instructions in affected programs: 1744 -> 1698 (-2.64%)
helped: 1 / HURT: 0
total cycles in shared programs: 916273716 -> 916273068 (<.01%)
cycles in affected programs: 14266 -> 13618 (-4.54%)
helped: 1 / HURT: 0
fossil-db:
All Intel platforms had similar results. (Meteor Lake shown)
Totals:
Instrs: 151519575 -> 151519393 (-0.00%)
Cycle count: 17208402120 -> 17208246858 (-0.00%); split: -0.00%, +0.00%
Totals from 159 (0.03% of 630198) affected shaders:
Instrs: 51970 -> 51788 (-0.35%)
Cycle count: 11474176 -> 11318914 (-1.35%); split: -1.36%, +0.01%
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30158 >
2024-07-20 00:19:05 +00:00