Commit graph

689 commits

Author SHA1 Message Date
Emma Anholt
e5a9eae2b5 nir/opt_algebraic_tests: Fix fuzzing levels for multi-component inputs.
We were enumerating enough for a single component, but not all the
combinations.  This helps show that our fdots fail pretty consistently.
And triggers more skipping from the fany_equal16s thanks to varied inputs.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39369>
2026-01-26 05:39:41 +00:00
Emma Anholt
7fd0287a89 nir/opt_algebraic_tests: Test !nir_fp_preserve_signed_zero behavior.
Iterate over a set of sign-flips for 0.0s to see if we can find a set that
makes the search and replace sides match.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39369>
2026-01-26 05:39:41 +00:00
Emma Anholt
68f5bc4f12 nir/opt_algebraic_tests: Rename and use the enum result type more.
As I introduced another layer of iteration for signed zero testing, the
former logic got unwieldy.  In fact, it was already unwieldy enough that I
forgot to clear all_skipped when the assert failed, allowing a failing
test to be marked UNSUPPORTED instead of XFAIL.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39369>
2026-01-26 05:39:40 +00:00
Emma Anholt
a90163a15a nir/opt_algebraic_tests: Add support for expression swizzles.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39369>
2026-01-26 05:39:40 +00:00
Emma Anholt
c30c383d4d nir/opt_algebraic_tests: Allow testing of fdot*_replicated opcodes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39369>
2026-01-26 05:39:40 +00:00
Emma Anholt
173295adf4 nir/opt_algebraic_tests: Allow testing udiv_aligned_4.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39369>
2026-01-26 05:39:40 +00:00
Emma Anholt
94237c3ea3 nir/opt_algebraic_tests: Allow testing mul/mad_relaxed opcodes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39369>
2026-01-26 05:39:40 +00:00
Emma Anholt
fd7754fba1 nir/opt_algebraic_tests: Allow testing imad24_ir3.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39369>
2026-01-26 05:39:39 +00:00
Georg Lehmann
442daeb54a nir/opt_algebraic: use fcanonicalize
Mostly optimizations, some minor fixes but I don't
think they are worth backporting.

Foz-DB Navi21:
Totals from 7570 (9.21% of 82151) affected shaders:
MaxWaves: 204288 -> 204476 (+0.09%); split: +0.09%, -0.00%
Instrs: 4511439 -> 4500261 (-0.25%); split: -0.25%, +0.00%
CodeSize: 23727088 -> 23644388 (-0.35%); split: -0.35%, +0.00%
VGPRs: 290944 -> 290616 (-0.11%); split: -0.12%, +0.01%
SpillSGPRs: 1256 -> 1251 (-0.40%)
Latency: 16738072 -> 16726717 (-0.07%); split: -0.10%, +0.04%
InvThroughput: 3736856 -> 3716631 (-0.54%); split: -0.55%, +0.01%
VClause: 66150 -> 66156 (+0.01%); split: -0.05%, +0.06%
SClause: 93644 -> 93631 (-0.01%); split: -0.02%, +0.01%
Copies: 448816 -> 458584 (+2.18%); split: -0.05%, +2.22%
Branches: 139817 -> 139775 (-0.03%); split: -0.03%, +0.00%
PreSGPRs: 321922 -> 321900 (-0.01%); split: -0.01%, +0.00%
PreVGPRs: 239709 -> 238856 (-0.36%); split: -0.39%, +0.03%
VALU: 2595164 -> 2584250 (-0.42%); split: -0.43%, +0.01%
SALU: 839038 -> 838965 (-0.01%); split: -0.02%, +0.01%
VMEM: 137584 -> 137583 (-0.00%)

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39180>
2026-01-19 16:11:29 +00:00
Rhys Perry
625afb0d29 nir: add fcanonicalize
v2(Georg Lehmann):
Always remove fcanonicalize if denorms must be neither flushed nor preserved.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39180>
2026-01-19 16:11:29 +00:00
Eric Engestrom
30c2e6dbf2 nir/meson: drop redundant --build-tests in favour of just checking if --out-tests is set
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39350>
2026-01-16 16:55:21 +00:00
Eric Engestrom
246095da49 nir/meson: only try to generate the nir_opt_algebraic tests when requested
Anything listed in a meson target's `output` is expected to exist once
the command has run. If it's missing, meson/ninja will run the command
again to try to generate it, resulting in a ton of files getting
re-generated/re-compiled for no reason.

Fixes: 4c30c44b75 ("nir: Generate unit tests for nir_opt_algebraic")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14667
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39350>
2026-01-16 16:55:21 +00:00
Konstantin Seurer
4c30c44b75 nir: Generate unit tests for nir_opt_algebraic
This catches a number of bugs in the current NIR algebraic optimizations
or opcodes implementations (as fixed in this series, or documented in the
XFAIL tests), and should prevent many future bugs from landing.

This required bumping the test timeout, because s390x is very slow to
emulate in CI.

Closes: #3338
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39076>
2026-01-15 19:09:43 +00:00
Emma Anholt
df215cc3cd nir/opt_algebraic_tests: Mark patterns as unsupported or xfails.
This way as a pattern author/editor you can immediately see whether it's
getting test coverage and if there are known issues with the pattern.
This will also give us clear outcomes from testing as we fix failing
patterns.

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39076>
2026-01-15 19:09:43 +00:00
Georg Lehmann
93d05cdfd8 nir/opt_algebraic: move fsat last for fsqrt(fsat(a))
This should be exact, even for all special values:

fsqrt(NaN) -> NaN
fsqrt(-0.0) -> 0.0
fsqrt(-Inf) -> NaN
fsqrt(negative finite) -> NaN

So all of these get saturated to +0.0

All numbers >= 1.0 will have a square root >= 1.0,
which will be saturate to 1.0

Moving the fsat guarantees that it can use an output modifier
for hardware that has those, and shouldn't harm other hardware either.

Foz-DB Navi21:
Totals from 255 (0.31% of 82151) affected shaders:
Instrs: 664906 -> 664194 (-0.11%)
CodeSize: 3623500 -> 3619188 (-0.12%)
Latency: 11336397 -> 11335688 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 2716430 -> 2715726 (-0.03%); split: -0.03%, +0.00%
VALU: 442603 -> 441891 (-0.16%)

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39202>
2026-01-09 07:34:46 +00:00
Ian Romanick
d4a87e85b3 nir/algebraic: Add missing f on F-strings
Without this, nir_algebraic.py was treating "f2i{int_sz}_sat" as the
literal opcode name when it should have been "f2i8_sat" or similar.

Fixes: c49d6e0480 ("nir/algebraic: Elide range clamping of f2u sources")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39031>
2026-01-08 13:19:35 -08:00
Konstantin Seurer
a8224e3e00 nir/opt_algebraic: Do not emit patterns for 64bit booleans
Avoids assertion failures trying to constant-evaluate the pattern with the
new nir_opt_algebraic_pattern_tests.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39184>
2026-01-06 21:27:48 +00:00
Konstantin Seurer
211c7db8e3 nir/opt_algebraic: Remove a pattern for 8bit floats
Avoids assertion failures trying to constant-evaluate the pattern with the
new nir_opt_algebraic_pattern_tests.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39184>
2026-01-06 21:27:48 +00:00
Emma Anholt
afece95101 nir/opt_algebraic: Fix return type of fdot(vec(a, 0.0, ...), b).
The replace pattern was generating a vector when it should have been
scalar.  Fixes validation failures with the new algebraic unit tests.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39184>
2026-01-06 21:27:47 +00:00
Georg Lehmann
c8ce0df2d2 nir/opt_algebraic: replace is_negative_zero with constant -0.0
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Now that nir_search respects the sign of zero, we don't need
a manual helper for this.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39123>
2026-01-03 12:42:23 +00:00
Georg Lehmann
7d2a946730 nir/opt_algebraic: canonicalize scmp with -0.0
We already do this for non fused comparisons.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39123>
2026-01-03 12:42:23 +00:00
Georg Lehmann
2824c12252 nir/opt_algebraic: explicitly add some -0.0 variants of patterns
Foz-DB Navi21:
Totals from 5 (0.00% of 125360) affected shaders:
CodeSize: 28812 -> 28744 (-0.24%)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39123>
2026-01-03 12:42:23 +00:00
Pavel Ondračka
0b39b5ea63 nir/opt_algebraic: improve dot product narrowing
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The issue is that the current narrowing patterns are not working in a lot
of cases, for example
(('fdot3', ('vec3', a, 0.0, 0.0), b), ('fmul', a, b)),
is missing patterns like this:

32x3   %1 = load_const (0x3f800000, 0x00000000, 0x00000000) = (1.000000, 0.000000, 0.000000)
32x4   %7 = vec4 %6, %2 (0x0), %2 (0x0), %2 (0x0)
32    %19 = fdot3 %1 (1.000000, 0.000000, 0.000000), %7.xyz

or after some later transforms:

32x2   %0 = load_const (0x3f800000, 0x00000000) = (1.000000, 0.000000)
32x2   %6 = vec2 %5, %1 (0x0)
32    %18 = fdot3 %0 (1.000000, 0.000000).xyy, %6.xyy

This patch is heavily based on old branch from Ian Romanick from 2019.

r300 RV530 shader-db:
total instructions in shared programs: 128900 -> 128882 (-0.01%)
instructions in affected programs: 621 -> 603 (-2.90%)
helped: 10
HURT: 1
total cycles in shared programs: 191837 -> 191828 (<.01%)
cycles in affected programs: 799 -> 790 (-1.13%)
helped: 7
HURT: 1

Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39068>
2026-01-02 16:07:10 +01:00
Ian Romanick
66fd4d72fd nir/algebraic: Mask with shifted constant instead of shift-then-mask
shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17088766 -> 17088765 (<.01%)
instructions in affected programs: 1375 -> 1374 (-0.07%)
helped: 1 / HURT: 1

total cycles in shared programs: 887873068 -> 887871748 (<.01%)
cycles in affected programs: 136402 -> 135082 (-0.97%)
helped: 2 / HURT: 0

fossil-db:

Lunar Lake
Totals:
Instrs: 924954240 -> 924939317 (-0.00%); split: -0.00%, +0.00%
Subgroup size: 40937696 -> 40937728 (+0.00%)
Cycle count: 106116946509 -> 106116637903 (-0.00%); split: -0.00%, +0.00%
Spill count: 3423930 -> 3423250 (-0.02%); split: -0.02%, +0.00%
Fill count: 4876960 -> 4876045 (-0.02%); split: -0.03%, +0.01%
Max live registers: 193882457 -> 193881816 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 49078640 -> 49078656 (+0.00%)
Non SSA regs after NIR: 231314214 -> 231314219 (+0.00%); split: -0.00%, +0.00%

Totals from 13809 (0.68% of 2019450) affected shaders:
Instrs: 25433084 -> 25418161 (-0.06%); split: -0.08%, +0.02%
Subgroup size: 32 -> 64 (+100.00%)
Cycle count: 1483550606 -> 1483242000 (-0.02%); split: -0.27%, +0.25%
Spill count: 41466 -> 40786 (-1.64%); split: -1.88%, +0.24%
Fill count: 74195 -> 73280 (-1.23%); split: -2.12%, +0.88%
Max live registers: 2326365 -> 2325724 (-0.03%); split: -0.05%, +0.02%
Max dispatch width: 234848 -> 234864 (+0.01%)
Non SSA regs after NIR: 3394104 -> 3394109 (+0.00%); split: -0.00%, +0.00%

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 997527742 -> 997524495 (-0.00%); split: -0.00%, +0.00%
Subgroup size: 27452928 -> 27452944 (+0.00%)
Cycle count: 93646717070 -> 93649738060 (+0.00%); split: -0.00%, +0.01%
Spill count: 3710125 -> 3709784 (-0.01%); split: -0.03%, +0.02%
Fill count: 5032819 -> 5033191 (+0.01%); split: -0.04%, +0.05%
Max live registers: 121648838 -> 121648528 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 37811544 -> 37811584 (+0.00%)
Non SSA regs after NIR: 255562054 -> 255565914 (+0.00%); split: -0.00%, +0.00%

Totals from 14438 (0.63% of 2281134) affected shaders:
Instrs: 25974222 -> 25970975 (-0.01%); split: -0.08%, +0.06%
Subgroup size: 16 -> 32 (+100.00%)
Cycle count: 1149710820 -> 1152731810 (+0.26%); split: -0.29%, +0.55%
Spill count: 44445 -> 44104 (-0.77%); split: -2.23%, +1.46%
Fill count: 76172 -> 76544 (+0.49%); split: -2.89%, +3.37%
Max live registers: 1237997 -> 1237687 (-0.03%); split: -0.04%, +0.02%
Max dispatch width: 123528 -> 123568 (+0.03%)
Non SSA regs after NIR: 3490757 -> 3494617 (+0.11%); split: -0.03%, +0.14%

Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 1013364485 -> 1013342384 (-0.00%); split: -0.00%, +0.00%
Cycle count: 85509342602 -> 85500105656 (-0.01%); split: -0.02%, +0.01%
Spill count: 3903944 -> 3903350 (-0.02%); split: -0.02%, +0.01%
Fill count: 6801948 -> 6799368 (-0.04%); split: -0.05%, +0.01%
Max live registers: 122212165 -> 122211859 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 37805336 -> 37805472 (+0.00%)
Non SSA regs after NIR: 244624956 -> 244628603 (+0.00%); split: -0.00%, +0.00%

Totals from 14835 (0.65% of 2278397) affected shaders:
Instrs: 27522570 -> 27500469 (-0.08%); split: -0.10%, +0.02%
Cycle count: 1128820972 -> 1119584026 (-0.82%); split: -1.53%, +0.71%
Spill count: 46408 -> 45814 (-1.28%); split: -2.04%, +0.76%
Fill count: 99071 -> 96491 (-2.60%); split: -3.14%, +0.54%
Max live registers: 1287967 -> 1287661 (-0.02%); split: -0.04%, +0.02%
Max dispatch width: 126600 -> 126736 (+0.11%)
Non SSA regs after NIR: 3438628 -> 3442275 (+0.11%); split: -0.03%, +0.14%

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38979>
2025-12-17 18:38:55 +00:00
Georg Lehmann
653716b745 nir/opt_algebraic: create more bit test
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Helps hackends with has_bit_test more (i.e. ACO), but it
shouldn't hurt others either.

Foz-DB Navi21:
Totals from 1138 (1.17% of 97591) affected shaders:
Instrs: 5478747 -> 5476055 (-0.05%); split: -0.05%, +0.00%
CodeSize: 29850188 -> 29853140 (+0.01%); split: -0.04%, +0.05%
SpillSGPRs: 1406 -> 1401 (-0.36%)
Latency: 42324245 -> 42325921 (+0.00%); split: -0.01%, +0.01%
InvThroughput: 11396940 -> 11394048 (-0.03%); split: -0.04%, +0.01%
VClause: 142294 -> 142309 (+0.01%); split: -0.00%, +0.01%
SClause: 124412 -> 124411 (-0.00%); split: -0.00%, +0.00%
Copies: 572696 -> 572749 (+0.01%); split: -0.02%, +0.03%
Branches: 199932 -> 199929 (-0.00%)
PreSGPRs: 73372 -> 74970 (+2.18%)
PreVGPRs: 79514 -> 79511 (-0.00%)
VALU: 3628764 -> 3625744 (-0.08%); split: -0.08%, +0.00%
SALU: 818258 -> 818475 (+0.03%); split: -0.03%, +0.06%

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38700>
2025-11-28 13:25:24 +00:00
Georg Lehmann
9ef0c96f26 nir/opt_algebraic: optimize open coded pack_32_2x16
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Foz-DB Navi48:
Totals from 4 (0.00% of 80287) affected shaders:
Instrs: 6231 -> 6101 (-2.09%)
CodeSize: 35916 -> 35156 (-2.12%)
Latency: 72190 -> 71317 (-1.21%)
InvThroughput: 20817 -> 19962 (-4.11%)
VALU: 3145 -> 3029 (-3.69%)
VOPD: 310 -> 312 (+0.65%)

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37937>
2025-11-10 19:04:32 +00:00
Ian Romanick
f1bbc3d4e4 nir/algebraic: Don't generate integer min or max that will need to be lowered
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
In !35844, there was some discussion about allowing 64-bit bcsel that
would be lowered in the driver. One challenge there would be if a 64-bit
bcsel was transformed into integer min or max by an algebraic
optimization. I believe these were the only algebraic patterns that
could create new integer min or max that would not be immediately
constant folded.

There were no shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38033>
2025-10-23 22:35:27 +00:00
Job Noorman
ad421cdf2e nir: mark fneg distribution through fadd/ffma as nsz
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
df1876f615 ("nir: Mark negative re-distribution on fadd as imprecise")
fixed the fadd case by marking it as imprecise. This commit fixes the
ffma case for the same reason.

However, "imprecise" isn't necessary and nowadays we have "nsz" which is
more accurate here. Use that for both fadd and ffma.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 62795475e8 ("nir/algebraic: Distribute source modifiers into instructions")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37930>
2025-10-17 08:58:59 +00:00
Job Noorman
0b82b803d9 nir,ir3: rename umul_low to umul_16x16
This is more in line with similar opcodes like umul_32x16.

Also change its const expr: the masking based on bit size was
unnecessary as it is only defined for 32 bits. Use simple casts instead.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37863>
2025-10-14 12:54:54 +00:00
Ian Romanick
1e691e68e2 nir/algebraic: Optimize bfi with odd-valued mask to bitfield_select
shader-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
total instructions in shared programs: 17181254 -> 17181046 (<.01%)
instructions in affected programs: 35834 -> 35626 (-0.58%)
helped: 130 / HURT: 2

total cycles in shared programs: 888543370 -> 888554248 (<.01%)
cycles in affected programs: 7443984 -> 7454862 (0.15%)
helped: 95 / HURT: 87

fossil-db:

Lunar Lake
Totals:
Instrs: 233260196 -> 233259474 (-0.00%); split: -0.00%, +0.00%
Cycle count: 32754567116 -> 32754515890 (-0.00%); split: -0.00%, +0.00%
Max live registers: 71738442 -> 71738398 (-0.00%); split: -0.00%, +0.00%

Totals from 6842 (0.87% of 790721) affected shaders:
Instrs: 5566926 -> 5566204 (-0.01%); split: -0.01%, +0.00%
Cycle count: 512487046 -> 512435820 (-0.01%); split: -0.20%, +0.19%
Max live registers: 1100656 -> 1100612 (-0.00%); split: -0.00%, +0.00%

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 264071212 -> 264066944 (-0.00%); split: -0.00%, +0.00%
Cycle count: 26552458051 -> 26553286277 (+0.00%); split: -0.00%, +0.01%
Spill count: 530380 -> 530084 (-0.06%)
Fill count: 613416 -> 612900 (-0.08%)
Scratch Memory Size: 20089856 -> 20075520 (-0.07%)
Max live registers: 46558852 -> 46558811 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 8034616 -> 8034584 (-0.00%)

Totals from 6653 (0.73% of 905545) affected shaders:
Instrs: 5750844 -> 5746576 (-0.07%); split: -0.08%, +0.00%
Cycle count: 416414845 -> 417243071 (+0.20%); split: -0.20%, +0.40%
Spill count: 1953 -> 1657 (-15.16%)
Fill count: 3556 -> 3040 (-14.51%)
Scratch Memory Size: 92160 -> 77824 (-15.56%)
Max live registers: 566003 -> 565962 (-0.01%); split: -0.01%, +0.00%
Max dispatch width: 55768 -> 55736 (-0.06%)

No shader-db or fossil-db changes on any previous Intel platforms.

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:11 +00:00
Ian Romanick
aa53735b66 nir/algebraic: Prefer bfi over bitfield_select for bitfield_insert
Intel platforms will soon implement both bfi and bitfield_select. bfi is
more efficient for bitfield_insert.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:08 +00:00
Ian Romanick
08ec408061 nir/algebraic: Optimize f2u of negative value to zero
The eliminated SENDs are from a single app that has a bunch of
fragment shaders with a sequence like:

    con 32    %495 = fmul! %203.i, %1 (0.000000)
    con 32    %496 = ffma! %203.j, %1 (0.000000), %495
    con 32    %497 = ffma! %203.k, %1 (0.000000), %496
    con 32    %498 = ffma! %203.l, %1 (0.000000), %497
    con 32    %499 = @load_reloc_const_intel (param_idx=1, base=0)
    con 32    %500 = @load_reloc_const_intel (param_idx=0, base=0)
    con 32    %501 = f2u32 %498
    con 32    %502 = umin %501, %172 (0x4)
    con 32    %503 = ishl %502, %172 (0x4)
    con 32    %504 = load_const (0x00000040 = 64)
    con 32    %505 = umin %503, %504 (0x40)
    con 32    %506 = iadd %500, %505

The `f2u` is replaced with 0, and that makes the `ffma` dot-product
sequence be unused. Since it is unused, most of the preceeding block
gets eliminated. A lot of instructions after the `f2u` are also
eliminated by other algebraic optimizations. Most importantly, %203 is
the result of a `load_ubo_uniform_block_intel` that is eliminated.

No shader-db changes on any Intel platform.

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 919895603 -> 919804051 (-0.01%); split: -0.01%, +0.00%
Send messages: 40892036 -> 40887569 (-0.01%)
Cycle count: 99176770712 -> 99174971806 (-0.00%); split: -0.00%, +0.00%
Max live registers: 190030365 -> 190030367 (+0.00%)
Max dispatch width: 47415040 -> 47415024 (-0.00%)
Non SSA regs after NIR: 228872538 -> 228863608 (-0.00%); split: -0.00%, +0.00%

Totals from 2234 (0.11% of 1955134) affected shaders:
Instrs: 1989743 -> 1898191 (-4.60%); split: -4.60%, +0.00%
Send messages: 44179 -> 39712 (-10.11%)
Cycle count: 25416114 -> 23617208 (-7.08%); split: -7.08%, +0.00%
Max live registers: 367357 -> 367359 (+0.00%)
Max dispatch width: 39184 -> 39168 (-0.04%)
Non SSA regs after NIR: 471173 -> 462243 (-1.90%); split: -1.90%, +0.00%

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:08 +00:00
Ian Romanick
5667459ff1 nir/algebraic: Don't introduce undefined behavior in f2u conversion
If the source -1.0 < x < 0.0, simply removing the ftrun will introduce
undefined behavior. By chance of how at least Intel and NVIDIA GPUs
implement f2u, this has Just Worked.

No shader-db changes on any Intel platform.

fossil-db:

Lunar Lake
Totals:
Instrs: 913264354 -> 913264366 (+0.00%)
Cycle count: 104953995530 -> 104953996854 (+0.00%)
Max live registers: 189266026 -> 189266058 (+0.00%)
Non SSA regs after NIR: 227779417 -> 227779369 (-0.00%)

Totals from 24 (0.00% of 1984794) affected shaders:
Instrs: 4669 -> 4681 (+0.26%)
Cycle count: 50610 -> 51934 (+2.62%)
Max live registers: 1222 -> 1254 (+2.62%)
Non SSA regs after NIR: 1174 -> 1126 (-4.09%)

Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 1001288026 -> 1001288038 (+0.00%)
Cycle count: 92813392671 -> 92813392791 (+0.00%)
Max live registers: 121935383 -> 121935399 (+0.00%)
Max dispatch width: 19949928 -> 19949912 (-0.00%)

Totals from 2 (0.00% of 2284670) affected shaders:
Instrs: 1380 -> 1392 (+0.87%)
Cycle count: 18940 -> 19060 (+0.63%)
Max live registers: 136 -> 152 (+11.76%)
Max dispatch width: 32 -> 16 (-50.00%)

No fossil-db changes on Skylake.

Suggested-by: Georg Lehmann
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:07 +00:00
Ian Romanick
4338f7d033 nir/algebraic: Remove useless ftrunc inside f2i/f2u
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:07 +00:00
Ian Romanick
c49d6e0480 nir/algebraic: Elide range clamping of f2u sources
There are no shader-db changes on ELK platforms because those platforms
don't support 8- or 16-bit integer types.

v2: Restrict patterns generated such that the integer limits are exactly
representable in the specified floating point format. With the exception
of the value 0, this requires that float_sz > int_sz. This had no impact
on shader-db or fossil-db on any Intel platform. Noticed by Georg.

v3: Add a missing is_a_number.

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total cycles in shared programs: 889936056 -> 889934082 (<.01%)
cycles in affected programs: 65806 -> 63832 (-3.00%)
helped: 2 / HURT: 0

fossil-db:

Lunar Lake
Totals:
Instrs: 233284796 -> 233282917 (-0.00%); split: -0.00%, +0.00%
Cycle count: 32756399804 -> 32754972188 (-0.00%); split: -0.01%, +0.00%
Spill count: 519861 -> 519813 (-0.01%)
Fill count: 663650 -> 663626 (-0.00%); split: -0.01%, +0.01%
Max live registers: 71738626 -> 71738696 (+0.00%)
Non SSA regs after NIR: 67837902 -> 67837648 (-0.00%)

Totals from 1236 (0.16% of 790723) affected shaders:
Instrs: 2134504 -> 2132625 (-0.09%); split: -0.09%, +0.01%
Cycle count: 604922278 -> 603494662 (-0.24%); split: -0.48%, +0.25%
Spill count: 16509 -> 16461 (-0.29%)
Fill count: 32760 -> 32736 (-0.07%); split: -0.22%, +0.15%
Max live registers: 250112 -> 250182 (+0.03%)
Non SSA regs after NIR: 302368 -> 302114 (-0.08%)

Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 264095370 -> 264094056 (-0.00%); split: -0.00%, +0.00%
Cycle count: 26554146277 -> 26553027268 (-0.00%); split: -0.01%, +0.01%
Spill count: 530603 -> 530615 (+0.00%)
Fill count: 613231 -> 613273 (+0.01%)
Max live registers: 46559041 -> 46559087 (+0.00%)

Totals from 1237 (0.14% of 905547) affected shaders:
Instrs: 2262517 -> 2261203 (-0.06%); split: -0.07%, +0.01%
Cycle count: 518219799 -> 517100790 (-0.22%); split: -0.59%, +0.37%
Spill count: 17518 -> 17530 (+0.07%)
Fill count: 32273 -> 32315 (+0.13%)
Max live registers: 128360 -> 128406 (+0.04%)

Ice Lake and Skylake had similar results. (Ice Lake shown)
Totals:
Instrs: 269849640 -> 269848198 (-0.00%); split: -0.00%, +0.00%
Cycle count: 26718329643 -> 26718289020 (-0.00%); split: -0.00%, +0.00%
Max live registers: 46878430 -> 46878462 (+0.00%)

Totals from 1233 (0.14% of 905427) affected shaders:
Instrs: 2324225 -> 2322783 (-0.06%); split: -0.06%, +0.00%
Cycle count: 531467501 -> 531426878 (-0.01%); split: -0.11%, +0.10%
Max live registers: 130782 -> 130814 (+0.02%)

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
2025-10-10 17:25:07 +00:00
Gert Wollny
3b3c3ccf56 nir+r600: add option to avoid contracting fabs into ffma
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
On r600 ternary operations can't use the fabs source modifier, so
converting "fadd(fabs(fmul(a, b), c)" to  "ffma(fabs(a), fabs(b), c)"
adds one more instruction in the backend, hence avoid this.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37440>
2025-09-17 21:03:58 +00:00
Christian Gmeiner
a7d2570296 nir/opt_algebraic: optimize f2i32(fround_even(x)) to f2i32_rtne(x)
Add late optimization to fuse f2i32 and fround_even operations into a
single f2i32_rtne instruction when the intermediate fround_even result
is only used once. This eliminates redundant rounding since f2i32_rtne
performs round-to-nearest-even conversion directly.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Tested-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37426>
2025-09-17 20:31:59 +00:00
Karmjit Mahil
9c6183604f nir, ir3: Add lower_fmulz_with_abs_min backend option
This commits adds the `lower_fmulz_with_abs_min` which lowers
`fmulz` -> `min(abs(a), abs(b)) == 0.0 ? 0.0 : a * b`
`ffmaz` -> `min(abs(a), abs(b)) == 0.0 ? c : ffma(a, b, c)

This is useful for ISAs which have `abs` for free on `min` such as
ir3.

Adreno A750 Benchmark of 10 runs of 5 DX9 single frame trimmed
captures looped 2048 times using u_trace measuring
`start_render_pass` to `end_render_pass` results:

sysmem:
-1.91156%, -2.21791%, -2.02533%, -2.21666%, -2.33272%,
-2.67349%, -1.75278%, -2.05923%, -2.26892%, -2.10506%
Avg:  ~ -2.16%
ST.S: ~  0.25%

gmem:
-3.61496%, -3.66682%, -3.80901%, -3.51198%, -3.72950%,
-3.71413%, -3.64467%, -3.67092%, -3.90640%, -3.83888%
Avg:  ~ -3.71%
ST.S: ~  0.12%

Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31479>
2025-09-17 15:02:50 +00:00
Karmjit Mahil
8d19ffef0a nir: Add more matches for fmulz
In some cases after other passes, `(a == 0.0 ? 0 : b)` can be
turned into `(a != 0.0 ? b : 0)`, so let's match those cases too.

Also matching `min(abs(a), abs(b)) == 0.0 ? 0.0 : a * b`.

Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31479>
2025-09-17 15:02:50 +00:00
Daniel Schürmann
c78f1d516c nir/algebraic: add pattern for (a << #b) * #c => a * (#c << #b)
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Totals from 2545 (3.19% of 79839) affected shaders: (Navi48)

Instrs: 6371003 -> 6364130 (-0.11%); split: -0.12%, +0.01%
CodeSize: 33827548 -> 33812244 (-0.05%); split: -0.06%, +0.01%
Latency: 47451755 -> 47430108 (-0.05%); split: -0.05%, +0.00%
InvThroughput: 10442450 -> 10437159 (-0.05%); split: -0.05%, +0.00%
SClause: 159829 -> 159874 (+0.03%); split: -0.01%, +0.04%
Copies: 500725 -> 500721 (-0.00%); split: -0.01%, +0.01%
PreSGPRs: 110482 -> 110478 (-0.00%); split: -0.00%, +0.00%
PreVGPRs: 147289 -> 147287 (-0.00%); split: -0.00%, +0.00%
VALU: 3456135 -> 3454241 (-0.05%); split: -0.06%, +0.01%
SALU: 925982 -> 923616 (-0.26%)
VOPD: 1243 -> 1212 (-2.49%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37173>
2025-09-06 10:18:42 +00:00
Christoph Pillmayer
f81f3c85e2 nir/opt_algebraic: Convert a + b + a to b + 2a
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This allows fusing into one FMA later.

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37113>
2025-09-05 11:39:51 +00:00
Georg Lehmann
3b06824e4c nir/opt_algebraic: optimize some post peephole select patterns
Foz-DB GFX1201:
Totals from 208 (0.26% of 80287) affected shaders:
Instrs: 427684 -> 426834 (-0.20%); split: -0.22%, +0.02%
CodeSize: 2232616 -> 2228816 (-0.17%); split: -0.20%, +0.03%
Latency: 3993934 -> 3992726 (-0.03%); split: -0.04%, +0.01%
InvThroughput: 569055 -> 568622 (-0.08%); split: -0.09%, +0.01%
SClause: 12932 -> 12927 (-0.04%)
Copies: 22567 -> 22604 (+0.16%); split: -0.47%, +0.63%
Branches: 7671 -> 7658 (-0.17%)
VALU: 222047 -> 221625 (-0.19%)
SALU: 83954 -> 83815 (-0.17%); split: -0.29%, +0.13%

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36938>
2025-08-27 09:45:19 +00:00
Rhys Perry
46da666205 nir/algebraic: allow non-const for iand(iadd()) -> iadd(iand())
fossil-db (gfx1201):
Totals from 596 (0.75% of 79839) affected shaders:
Instrs: 691926 -> 691819 (-0.02%); split: -0.11%, +0.09%
CodeSize: 3675216 -> 3675180 (-0.00%); split: -0.08%, +0.08%
VGPRs: 37464 -> 37452 (-0.03%)
Latency: 8566849 -> 8563162 (-0.04%); split: -0.09%, +0.05%
InvThroughput: 1068038 -> 1063279 (-0.45%); split: -0.46%, +0.01%
VClause: 17859 -> 17897 (+0.21%); split: -0.01%, +0.22%
SClause: 16704 -> 16735 (+0.19%); split: -0.07%, +0.26%
Copies: 45422 -> 45395 (-0.06%); split: -0.15%, +0.09%
PreSGPRs: 24345 -> 24351 (+0.02%)
PreVGPRs: 29121 -> 29128 (+0.02%)
VALU: 349959 -> 348117 (-0.53%); split: -0.54%, +0.01%
SALU: 105926 -> 107576 (+1.56%); split: -0.02%, +1.58%
VOPD: 252 -> 234 (-7.14%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36760>
2025-08-22 15:45:55 +00:00
Rhys Perry
4f83059ac5 nir/algebraic: improve is_unsigned_multiple_of_4 and use it more
fossil-db (gfx1201):
Totals from 160 (0.20% of 79839) affected shaders:
MaxWaves: 4008 -> 3952 (-1.40%)
Instrs: 390073 -> 379834 (-2.62%); split: -2.63%, +0.00%
CodeSize: 2126020 -> 2053740 (-3.40%); split: -3.40%, +0.00%
VGPRs: 9492 -> 9612 (+1.26%)
Latency: 6746019 -> 6723893 (-0.33%); split: -0.33%, +0.00%
InvThroughput: 849571 -> 848942 (-0.07%); split: -0.42%, +0.35%
VClause: 11977 -> 11983 (+0.05%); split: -0.20%, +0.25%
SClause: 11828 -> 11824 (-0.03%); split: -0.14%, +0.11%
Copies: 30003 -> 30938 (+3.12%); split: -0.09%, +3.20%
PreSGPRs: 8914 -> 8938 (+0.27%)
PreVGPRs: 7352 -> 7514 (+2.20%); split: -0.04%, +2.24%
VALU: 171829 -> 168829 (-1.75%); split: -1.76%, +0.01%
SALU: 66503 -> 66543 (+0.06%); split: -0.01%, +0.07%
VMEM: 29365 -> 25327 (-13.75%)
VOPD: 864 -> 1013 (+17.25%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36760>
2025-08-22 15:45:55 +00:00
Georg Lehmann
1d885fab9c nir/opt_algebraic: optimize pack_half_rtz of b2f
Foz-DB Navi21:
Totals from 13 (0.02% of 80255) affected shaders:
Instrs: 2313 -> 2306 (-0.30%); split: -0.35%, +0.04%
CodeSize: 13452 -> 13480 (+0.21%)
Latency: 12066 -> 12013 (-0.44%); split: -0.45%, +0.01%
InvThroughput: 2172 -> 2163 (-0.41%)
Copies: 112 -> 114 (+1.79%)
VALU: 1480 -> 1472 (-0.54%)
SALU: 154 -> 155 (+0.65%)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36535>
2025-08-04 19:42:22 +00:00
Georg Lehmann
bc3b09c5dd nir/opt_algebraic: optimize pack_half_rtz of bcsel with constant
Foz-DB Navi21:
Totals from 448 (0.56% of 80255) affected shaders:
Instrs: 345474 -> 344791 (-0.20%); split: -0.20%, +0.00%
CodeSize: 1917784 -> 1913324 (-0.23%); split: -0.25%, +0.02%
VGPRs: 22344 -> 22416 (+0.32%)
Latency: 2320847 -> 2318161 (-0.12%); split: -0.13%, +0.01%
InvThroughput: 543008 -> 541722 (-0.24%)
SClause: 11450 -> 11459 (+0.08%)
Copies: 19991 -> 19949 (-0.21%); split: -0.23%, +0.02%
PreSGPRs: 19129 -> 19114 (-0.08%)
PreVGPRs: 19695 -> 19696 (+0.01%); split: -0.01%, +0.01%
VALU: 257627 -> 256948 (-0.26%)
SALU: 30432 -> 30422 (-0.03%)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36535>
2025-08-04 19:42:22 +00:00
Georg Lehmann
8512479097 nir/opt_algebraic: create 16bit fmin/fmax if only used by pack_half_2x16_rtz_split
Foz-DB Navi21:
Totals from 1842 (2.30% of 80066) affected shaders:
Instrs: 869152 -> 866751 (-0.28%)
CodeSize: 4687316 -> 4682496 (-0.10%); split: -0.14%, +0.03%
VGPRs: 75216 -> 75312 (+0.13%)
Latency: 7297749 -> 7297929 (+0.00%); split: -0.01%, +0.02%
InvThroughput: 1864933 -> 1860706 (-0.23%); split: -0.23%, +0.00%
Copies: 52679 -> 52463 (-0.41%)
VALU: 665076 -> 662890 (-0.33%)
SALU: 56226 -> 56010 (-0.38%)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36535>
2025-08-04 19:42:22 +00:00
Georg Lehmann
22afe83473 nir/opt_algebraic: remove fneg around fmin/fmax
Foz-DB Navi21:
Totals from 282 (0.35% of 80255) affected shaders:
Instrs: 310515 -> 309755 (-0.24%)
CodeSize: 1721236 -> 1714540 (-0.39%)
Latency: 1366446 -> 1365141 (-0.10%); split: -0.10%, +0.00%
InvThroughput: 352528 -> 351097 (-0.41%); split: -0.41%, +0.00%
Copies: 24623 -> 24630 (+0.03%)
VALU: 231716 -> 230951 (-0.33%)
SALU: 28774 -> 28779 (+0.02%)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36535>
2025-08-04 19:42:22 +00:00
Georg Lehmann
cfd5fbfde1 nir/opt_algebraic: make fmin/fmax(a, #b) 16bit if only used by f2f16
Foz-DB Navi31:
Totals from 11 out of 14 FSR4 shaders:
Instrs: 58298 -> 58374 (+0.13%); split: -0.08%, +0.21%
CodeSize: 397836 -> 398108 (+0.07%); split: -0.08%, +0.15%
Latency: 209634 -> 211438 (+0.86%); split: -0.14%, +1.00%
InvThroughput: 229152 -> 229314 (+0.07%); split: -0.03%, +0.10%
VClause: 826 -> 847 (+2.54%); split: -0.36%, +2.91%
Copies: 2954 -> 3040 (+2.91%); split: -1.56%, +4.47%
VALU: 49637 -> 49711 (+0.15%); split: -0.06%, +0.21%
VOPD: 1916 -> 1400 (-26.93%)

These stats looks bad, but it's actually just unlucky RA.
Replacing 1 VOPD (two v_dual_max_f32) with 1 VOP3P (v_pk_max_f16)
should still be a win from a register bandwidth perspective.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>
2025-08-01 20:29:30 +00:00
Georg Lehmann
261239a492 nir/opt_algebraic: use range analysis to detect no-op fmin/fmax
Foz-DB Navi31:
Totals from 418 (0.52% of 80273) affected shaders:
Instrs: 564550 -> 564387 (-0.03%); split: -0.04%, +0.01%
CodeSize: 2983860 -> 2982684 (-0.04%); split: -0.05%, +0.01%
Latency: 4387264 -> 4386397 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 717464 -> 716874 (-0.08%); split: -0.08%, +0.00%
Copies: 40126 -> 40125 (-0.00%)
VALU: 352128 -> 352003 (-0.04%); split: -0.04%, +0.01%
SALU: 50290 -> 50283 (-0.01%)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>
2025-08-01 20:29:28 +00:00