Commit graph

654 commits

Author SHA1 Message Date
Gert Wollny
3b3c3ccf56 nir+r600: add option to avoid contracting fabs into ffma
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
On r600 ternary operations can't use the fabs source modifier, so
converting "fadd(fabs(fmul(a, b), c)" to  "ffma(fabs(a), fabs(b), c)"
adds one more instruction in the backend, hence avoid this.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37440>
2025-09-17 21:03:58 +00:00
Christian Gmeiner
a7d2570296 nir/opt_algebraic: optimize f2i32(fround_even(x)) to f2i32_rtne(x)
Add late optimization to fuse f2i32 and fround_even operations into a
single f2i32_rtne instruction when the intermediate fround_even result
is only used once. This eliminates redundant rounding since f2i32_rtne
performs round-to-nearest-even conversion directly.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Tested-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37426>
2025-09-17 20:31:59 +00:00
Karmjit Mahil
9c6183604f nir, ir3: Add lower_fmulz_with_abs_min backend option
This commits adds the `lower_fmulz_with_abs_min` which lowers
`fmulz` -> `min(abs(a), abs(b)) == 0.0 ? 0.0 : a * b`
`ffmaz` -> `min(abs(a), abs(b)) == 0.0 ? c : ffma(a, b, c)

This is useful for ISAs which have `abs` for free on `min` such as
ir3.

Adreno A750 Benchmark of 10 runs of 5 DX9 single frame trimmed
captures looped 2048 times using u_trace measuring
`start_render_pass` to `end_render_pass` results:

sysmem:
-1.91156%, -2.21791%, -2.02533%, -2.21666%, -2.33272%,
-2.67349%, -1.75278%, -2.05923%, -2.26892%, -2.10506%
Avg:  ~ -2.16%
ST.S: ~  0.25%

gmem:
-3.61496%, -3.66682%, -3.80901%, -3.51198%, -3.72950%,
-3.71413%, -3.64467%, -3.67092%, -3.90640%, -3.83888%
Avg:  ~ -3.71%
ST.S: ~  0.12%

Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31479>
2025-09-17 15:02:50 +00:00
Karmjit Mahil
8d19ffef0a nir: Add more matches for fmulz
In some cases after other passes, `(a == 0.0 ? 0 : b)` can be
turned into `(a != 0.0 ? b : 0)`, so let's match those cases too.

Also matching `min(abs(a), abs(b)) == 0.0 ? 0.0 : a * b`.

Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31479>
2025-09-17 15:02:50 +00:00
Daniel Schürmann
c78f1d516c nir/algebraic: add pattern for (a << #b) * #c => a * (#c << #b)
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Totals from 2545 (3.19% of 79839) affected shaders: (Navi48)

Instrs: 6371003 -> 6364130 (-0.11%); split: -0.12%, +0.01%
CodeSize: 33827548 -> 33812244 (-0.05%); split: -0.06%, +0.01%
Latency: 47451755 -> 47430108 (-0.05%); split: -0.05%, +0.00%
InvThroughput: 10442450 -> 10437159 (-0.05%); split: -0.05%, +0.00%
SClause: 159829 -> 159874 (+0.03%); split: -0.01%, +0.04%
Copies: 500725 -> 500721 (-0.00%); split: -0.01%, +0.01%
PreSGPRs: 110482 -> 110478 (-0.00%); split: -0.00%, +0.00%
PreVGPRs: 147289 -> 147287 (-0.00%); split: -0.00%, +0.00%
VALU: 3456135 -> 3454241 (-0.05%); split: -0.06%, +0.01%
SALU: 925982 -> 923616 (-0.26%)
VOPD: 1243 -> 1212 (-2.49%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37173>
2025-09-06 10:18:42 +00:00
Christoph Pillmayer
f81f3c85e2 nir/opt_algebraic: Convert a + b + a to b + 2a
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This allows fusing into one FMA later.

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37113>
2025-09-05 11:39:51 +00:00
Georg Lehmann
3b06824e4c nir/opt_algebraic: optimize some post peephole select patterns
Foz-DB GFX1201:
Totals from 208 (0.26% of 80287) affected shaders:
Instrs: 427684 -> 426834 (-0.20%); split: -0.22%, +0.02%
CodeSize: 2232616 -> 2228816 (-0.17%); split: -0.20%, +0.03%
Latency: 3993934 -> 3992726 (-0.03%); split: -0.04%, +0.01%
InvThroughput: 569055 -> 568622 (-0.08%); split: -0.09%, +0.01%
SClause: 12932 -> 12927 (-0.04%)
Copies: 22567 -> 22604 (+0.16%); split: -0.47%, +0.63%
Branches: 7671 -> 7658 (-0.17%)
VALU: 222047 -> 221625 (-0.19%)
SALU: 83954 -> 83815 (-0.17%); split: -0.29%, +0.13%

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36938>
2025-08-27 09:45:19 +00:00
Rhys Perry
46da666205 nir/algebraic: allow non-const for iand(iadd()) -> iadd(iand())
fossil-db (gfx1201):
Totals from 596 (0.75% of 79839) affected shaders:
Instrs: 691926 -> 691819 (-0.02%); split: -0.11%, +0.09%
CodeSize: 3675216 -> 3675180 (-0.00%); split: -0.08%, +0.08%
VGPRs: 37464 -> 37452 (-0.03%)
Latency: 8566849 -> 8563162 (-0.04%); split: -0.09%, +0.05%
InvThroughput: 1068038 -> 1063279 (-0.45%); split: -0.46%, +0.01%
VClause: 17859 -> 17897 (+0.21%); split: -0.01%, +0.22%
SClause: 16704 -> 16735 (+0.19%); split: -0.07%, +0.26%
Copies: 45422 -> 45395 (-0.06%); split: -0.15%, +0.09%
PreSGPRs: 24345 -> 24351 (+0.02%)
PreVGPRs: 29121 -> 29128 (+0.02%)
VALU: 349959 -> 348117 (-0.53%); split: -0.54%, +0.01%
SALU: 105926 -> 107576 (+1.56%); split: -0.02%, +1.58%
VOPD: 252 -> 234 (-7.14%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36760>
2025-08-22 15:45:55 +00:00
Rhys Perry
4f83059ac5 nir/algebraic: improve is_unsigned_multiple_of_4 and use it more
fossil-db (gfx1201):
Totals from 160 (0.20% of 79839) affected shaders:
MaxWaves: 4008 -> 3952 (-1.40%)
Instrs: 390073 -> 379834 (-2.62%); split: -2.63%, +0.00%
CodeSize: 2126020 -> 2053740 (-3.40%); split: -3.40%, +0.00%
VGPRs: 9492 -> 9612 (+1.26%)
Latency: 6746019 -> 6723893 (-0.33%); split: -0.33%, +0.00%
InvThroughput: 849571 -> 848942 (-0.07%); split: -0.42%, +0.35%
VClause: 11977 -> 11983 (+0.05%); split: -0.20%, +0.25%
SClause: 11828 -> 11824 (-0.03%); split: -0.14%, +0.11%
Copies: 30003 -> 30938 (+3.12%); split: -0.09%, +3.20%
PreSGPRs: 8914 -> 8938 (+0.27%)
PreVGPRs: 7352 -> 7514 (+2.20%); split: -0.04%, +2.24%
VALU: 171829 -> 168829 (-1.75%); split: -1.76%, +0.01%
SALU: 66503 -> 66543 (+0.06%); split: -0.01%, +0.07%
VMEM: 29365 -> 25327 (-13.75%)
VOPD: 864 -> 1013 (+17.25%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36760>
2025-08-22 15:45:55 +00:00
Georg Lehmann
1d885fab9c nir/opt_algebraic: optimize pack_half_rtz of b2f
Foz-DB Navi21:
Totals from 13 (0.02% of 80255) affected shaders:
Instrs: 2313 -> 2306 (-0.30%); split: -0.35%, +0.04%
CodeSize: 13452 -> 13480 (+0.21%)
Latency: 12066 -> 12013 (-0.44%); split: -0.45%, +0.01%
InvThroughput: 2172 -> 2163 (-0.41%)
Copies: 112 -> 114 (+1.79%)
VALU: 1480 -> 1472 (-0.54%)
SALU: 154 -> 155 (+0.65%)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36535>
2025-08-04 19:42:22 +00:00
Georg Lehmann
bc3b09c5dd nir/opt_algebraic: optimize pack_half_rtz of bcsel with constant
Foz-DB Navi21:
Totals from 448 (0.56% of 80255) affected shaders:
Instrs: 345474 -> 344791 (-0.20%); split: -0.20%, +0.00%
CodeSize: 1917784 -> 1913324 (-0.23%); split: -0.25%, +0.02%
VGPRs: 22344 -> 22416 (+0.32%)
Latency: 2320847 -> 2318161 (-0.12%); split: -0.13%, +0.01%
InvThroughput: 543008 -> 541722 (-0.24%)
SClause: 11450 -> 11459 (+0.08%)
Copies: 19991 -> 19949 (-0.21%); split: -0.23%, +0.02%
PreSGPRs: 19129 -> 19114 (-0.08%)
PreVGPRs: 19695 -> 19696 (+0.01%); split: -0.01%, +0.01%
VALU: 257627 -> 256948 (-0.26%)
SALU: 30432 -> 30422 (-0.03%)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36535>
2025-08-04 19:42:22 +00:00
Georg Lehmann
8512479097 nir/opt_algebraic: create 16bit fmin/fmax if only used by pack_half_2x16_rtz_split
Foz-DB Navi21:
Totals from 1842 (2.30% of 80066) affected shaders:
Instrs: 869152 -> 866751 (-0.28%)
CodeSize: 4687316 -> 4682496 (-0.10%); split: -0.14%, +0.03%
VGPRs: 75216 -> 75312 (+0.13%)
Latency: 7297749 -> 7297929 (+0.00%); split: -0.01%, +0.02%
InvThroughput: 1864933 -> 1860706 (-0.23%); split: -0.23%, +0.00%
Copies: 52679 -> 52463 (-0.41%)
VALU: 665076 -> 662890 (-0.33%)
SALU: 56226 -> 56010 (-0.38%)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36535>
2025-08-04 19:42:22 +00:00
Georg Lehmann
22afe83473 nir/opt_algebraic: remove fneg around fmin/fmax
Foz-DB Navi21:
Totals from 282 (0.35% of 80255) affected shaders:
Instrs: 310515 -> 309755 (-0.24%)
CodeSize: 1721236 -> 1714540 (-0.39%)
Latency: 1366446 -> 1365141 (-0.10%); split: -0.10%, +0.00%
InvThroughput: 352528 -> 351097 (-0.41%); split: -0.41%, +0.00%
Copies: 24623 -> 24630 (+0.03%)
VALU: 231716 -> 230951 (-0.33%)
SALU: 28774 -> 28779 (+0.02%)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36535>
2025-08-04 19:42:22 +00:00
Georg Lehmann
cfd5fbfde1 nir/opt_algebraic: make fmin/fmax(a, #b) 16bit if only used by f2f16
Foz-DB Navi31:
Totals from 11 out of 14 FSR4 shaders:
Instrs: 58298 -> 58374 (+0.13%); split: -0.08%, +0.21%
CodeSize: 397836 -> 398108 (+0.07%); split: -0.08%, +0.15%
Latency: 209634 -> 211438 (+0.86%); split: -0.14%, +1.00%
InvThroughput: 229152 -> 229314 (+0.07%); split: -0.03%, +0.10%
VClause: 826 -> 847 (+2.54%); split: -0.36%, +2.91%
Copies: 2954 -> 3040 (+2.91%); split: -1.56%, +4.47%
VALU: 49637 -> 49711 (+0.15%); split: -0.06%, +0.21%
VOPD: 1916 -> 1400 (-26.93%)

These stats looks bad, but it's actually just unlucky RA.
Replacing 1 VOPD (two v_dual_max_f32) with 1 VOP3P (v_pk_max_f16)
should still be a win from a register bandwidth perspective.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>
2025-08-01 20:29:30 +00:00
Georg Lehmann
261239a492 nir/opt_algebraic: use range analysis to detect no-op fmin/fmax
Foz-DB Navi31:
Totals from 418 (0.52% of 80273) affected shaders:
Instrs: 564550 -> 564387 (-0.03%); split: -0.04%, +0.01%
CodeSize: 2983860 -> 2982684 (-0.04%); split: -0.05%, +0.01%
Latency: 4387264 -> 4386397 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 717464 -> 716874 (-0.08%); split: -0.08%, +0.00%
Copies: 40126 -> 40125 (-0.00%)
VALU: 352128 -> 352003 (-0.04%); split: -0.04%, +0.01%
SALU: 50290 -> 50283 (-0.01%)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>
2025-08-01 20:29:28 +00:00
Georg Lehmann
a0665e79e9 nir/opt_algebraic: push fsat into bcsel with constant
bcsel doesn't have a free clamp modifier on AMD hardware,
but what's inside might have free clamp.

Foz-DB Navi31:
Totals from 873 (1.09% of 80273) affected shaders:
MaxWaves: 22008 -> 21968 (-0.18%)
Instrs: 4624956 -> 4623950 (-0.02%); split: -0.04%, +0.02%
CodeSize: 24152780 -> 24142884 (-0.04%); split: -0.05%, +0.01%
VGPRs: 57900 -> 57960 (+0.10%)
Latency: 28762622 -> 28749889 (-0.04%); split: -0.06%, +0.02%
InvThroughput: 5320810 -> 5320145 (-0.01%); split: -0.02%, +0.00%
VClause: 115879 -> 115929 (+0.04%); split: -0.10%, +0.14%
SClause: 93058 -> 93059 (+0.00%); split: -0.01%, +0.02%
Copies: 335674 -> 335845 (+0.05%); split: -0.05%, +0.10%
PreSGPRs: 53819 -> 53843 (+0.04%); split: -0.01%, +0.05%
PreVGPRs: 50908 -> 50939 (+0.06%); split: -0.02%, +0.08%
VALU: 2816395 -> 2815514 (-0.03%); split: -0.04%, +0.01%
SALU: 509988 -> 509987 (-0.00%); split: -0.02%, +0.02%

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>
2025-08-01 20:29:27 +00:00
Georg Lehmann
e9e5146848 nir/opt_algebraic: optimize fsat(fmax(a, b)) where b is not positive
Foz-DB Navi31:
Totals from 946 (1.18% of 80273) affected shaders:
Instrs: 4986082 -> 4983988 (-0.04%); split: -0.04%, +0.00%
CodeSize: 25998700 -> 25989796 (-0.03%); split: -0.04%, +0.00%
Latency: 45514742 -> 45510330 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 8163529 -> 8162325 (-0.01%); split: -0.02%, +0.00%
VClause: 112105 -> 112104 (-0.00%); split: -0.00%, +0.00%
SClause: 109694 -> 109688 (-0.01%)
Copies: 372356 -> 372284 (-0.02%); split: -0.03%, +0.01%
Branches: 132636 -> 132633 (-0.00%)
PreVGPRs: 58997 -> 58979 (-0.03%); split: -0.03%, +0.00%
VALU: 3025662 -> 3024191 (-0.05%); split: -0.05%, +0.00%
SALU: 551712 -> 551714 (+0.00%); split: -0.00%, +0.00%

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>
2025-08-01 20:29:27 +00:00
Georg Lehmann
e43ef6533b nir/opt_algebraic: remove 8bit roundtrip when vectorizing i2i16(unpack_4x8(a).zw)
Explicit 16bit instructions are nicer to vectorize.

Helps FSR4 on GFX11 marginally.

Foz-DB Navi31:
Totals from 10 out of 14 FSR4 shaders:
Instrs: 59781 -> 58518 (-2.11%)
CodeSize: 413428 -> 404156 (-2.24%)
Latency: 193770 -> 190768 (-1.55%)
InvThroughput: 226274 -> 221628 (-2.05%)
VClause: 796 -> 793 (-0.38%); split: -1.01%, +0.63%
Copies: 3342 -> 3008 (-9.99%); split: -11.01%, +1.02%
PreSGPRs: 312 -> 305 (-2.24%)
VALU: 51448 -> 50213 (-2.40%)
SALU: 1074 -> 1048 (-2.42%)
VOPD: 1783 -> 1718 (-3.65%); split: +0.95%, -4.60%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36117>
2025-07-30 07:25:51 +00:00
Georg Lehmann
037c2532ab nir/opt_algebraic: create non 32bit bitfield_select
Foz-DB Navi21:
Totals from 68 (0.08% of 80255) affected shaders:
Instrs: 197878 -> 197709 (-0.09%); split: -0.09%, +0.00%
CodeSize: 1060700 -> 1060472 (-0.02%); split: -0.02%, +0.00%
Latency: 659865 -> 659673 (-0.03%); split: -0.03%, +0.00%
InvThroughput: 117010 -> 116985 (-0.02%); split: -0.03%, +0.00%
VClause: 3781 -> 3779 (-0.05%)
Copies: 15317 -> 15265 (-0.34%); split: -0.35%, +0.01%
PreVGPRs: 3251 -> 3250 (-0.03%)
VALU: 96800 -> 96799 (-0.00%); split: -0.00%, +0.00%
SALU: 57006 -> 56836 (-0.30%); split: -0.30%, +0.00%

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36141>
2025-07-21 20:42:32 +00:00
Alyssa Rosenzweig
421d0e0953 nir: mark exact fmul in ldexp lowering
this chain of fmul is deliberately chosen for floating point precision
reasons, it needs to be exact, or else we might try to reassociate it
and break subnormal handling.

avoids regressing dEQP-VK.glsl.builtin.precision.ldexp_subnormals.*

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36257>
2025-07-21 11:42:18 +00:00
Alyssa Rosenzweig
042adf3cc5 nir/opt_algebraic: optimize signed pow in Control
used in a post-processing shader which goes 896 instrs -> 749 instrs.

In my Control fossil:

   Totals from 2 (0.63% of 319) affected shaders:
   Instrs: 2078 -> 1841 (-11.41%)
   CodeSize: 14540 -> 12800 (-11.97%)
   ALU: 1779 -> 1626 (-8.60%)
   FSCIB: 1779 -> 1626 (-8.60%)
   Uniforms: 370 -> 372 (+0.54%)

In radv_fossils, there are affected shaders in Dredge.

   Totals from 4 (0.01% of 54019) affected shaders:
   Instrs: 2306 -> 2294 (-0.52%)
   CodeSize: 16594 -> 16534 (-0.36%)
   ALU: 2010 -> 2004 (-0.30%)
   FSCIB: 2010 -> 2004 (-0.30%)
   Uniforms: 1138 -> 1146 (+0.70%)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35989>
2025-07-08 17:09:16 +00:00
Alyssa Rosenzweig
2765017553 nir: fuse ffma even with float controls
The fmul+fadd -> fma rules in nir_opt_algebraic are marked imprecise,
because they are a contraction. However, they respect signed zero/Inf/NaN rules.
As such, it is legal to do this fusion with shader float controls as long as the
exact bit is not set (mapping to SPIR-V NoContract).

Unfortunately, NIR's imprecise rules do not distinguish between contraction
issues versus float special case issues, forcing nir_search to skip all
imprecise rules when any shader float control modes are used. This notably
affects DXVK, which sets shader float controls to get D3D11 float behaviour and
hence loses FMA fusing.

Therefore, we plumb in the exact bit to express NoContract independent of the
float controls, and weaken the requirement for fma fusion to allowable
contraction. For fma splitting, it's a similar issue, as inexact GLSL fma in
SPIR-V is just a multiply add that we're allowed to contract rather than the
real deal.

Drivers that use their own FMA fusing passes (notably, Intel and AMD) are
unaffected, but DXVK-capable drivers using fuse_ffma should like this. Results
on hk shown:

Totals from 2194 (4.06% of 54019) affected shaders:
MaxWaves: 2174272 -> 2175936 (+0.08%); split: +0.08%, -0.01%
Instrs: 1173283 -> 1131494 (-3.56%); split: -3.57%, +0.01%
CodeSize: 8568168 -> 8381724 (-2.18%); split: -2.18%, +0.01%
Spills: 1094 -> 747 (-31.72%)
Fills: 988 -> 681 (-31.07%)
Scratch: 4444 -> 3820 (-14.04%)
ALU: 953032 -> 913149 (-4.18%); split: -4.19%, +0.01%
FSCIB: 953032 -> 913149 (-4.18%); split: -4.19%, +0.01%
IC: 215398 -> 215274 (-0.06%)
GPRs: 139865 -> 139032 (-0.60%); split: -1.56%, +0.96%
Uniforms: 414886 -> 414466 (-0.10%); split: -0.14%, +0.04%
Preamble instrs: 646398 -> 644017 (-0.37%); split: -0.43%, +0.07%

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35989>
2025-07-08 17:09:16 +00:00
Georg Lehmann
045ddb992a nir/opt_algebraic: optimize 16bit vec2 comparison followed by b2i16 using usub_sat
Helps vectorized emulated fp16 -> fp8 conversions

No Foz-DB changes.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35876>
2025-07-03 20:08:39 +00:00
Alyssa Rosenzweig
4f7cae5e61 nir/opt_algebraic: add trichotomy identity
In https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35802 we will
significantly rework geometry shaders & transform feedback. In the new approach,
transform feedback is executed as part of the hardware vertex shader, meaning
the vertex shader needs to write out all the "copies" of the same value into
different parts of the XFB buffer. In the general case of a GS writing triangle
strips, we get 0-3 copies. This is good and lets us parallelize XFB better with
GS.

In the case of a VS alone with XFB, we insert a passthrough GS. In that case
special case, we can only get at most 1 copy, so if we can prove the length of
the output strip is 3 we can delete 2/3 of the shader.

Anyway, the only thing preventing NIR from doing that optimization is failing to
see through some conditionals, fixed by optimizing with the law of trichotomy.

We could add other variants of this pattern (signed vs unsigned, iand vs
ior/ixor) if we expect anything else to hit this other than my boutique use
case.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35802>
2025-06-30 16:24:04 +00:00
Emma Anholt
bc8994cb48 nir: Add a pass to reassociate multiplication of mat*mat*vec.
The typical case of mat4*mat4*vec4 is 80 scalar multiplications, but
mat4*(mat4*vec4) is only 32.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35622>
2025-06-23 17:49:51 +00:00
Georg Lehmann
f047a67fba nir,aco: optimize FP16_OFVL pattern created by vkd3d-proton
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35434>
2025-06-23 07:59:27 +00:00
Georg Lehmann
ad80b554f4 spirv: use feq for OpIsInf
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This effectively reverts fcca6a83cd because feq was clarified to be ordered
when used with exact and without fast math flags.

It's common for HW to only have free abs for floating point instructions.

Foz-DB Navi21:
Totals from 63 (0.08% of 80065) affected shaders:
Instrs: 337027 -> 336667 (-0.11%); split: -0.12%, +0.02%
CodeSize: 1846752 -> 1845000 (-0.09%); split: -0.13%, +0.03%
Latency: 3401087 -> 3400633 (-0.01%); split: -0.04%, +0.03%
InvThroughput: 847299 -> 845939 (-0.16%); split: -0.19%, +0.03%
VClause: 7693 -> 7694 (+0.01%)
Copies: 45175 -> 45240 (+0.14%); split: -0.12%, +0.27%
PreSGPRs: 3555 -> 3553 (-0.06%)
PreVGPRs: 4565 -> 4564 (-0.02%)
VALU: 225473 -> 225245 (-0.10%); split: -0.13%, +0.03%
SALU: 44735 -> 44625 (-0.25%)

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35437>
2025-06-11 18:34:21 +00:00
Lionel Landwerlin
978933c015 nir/opt_algebraic: extend lowering for (i|u)bitfield_extract
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35334>
2025-06-04 16:28:39 +00:00
Samuel Pitoiset
226b0e28db nir: generalize bitfield insert/extract sizes
Original patch from Alyssa Rosenzweig

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35209>
2025-06-04 09:37:53 +00:00
Alyssa Rosenzweig
759dc70bde nir: generalize bitfield_reverse bit size
No reason we can't reverse other bit sizes, we just need to generalize the
constant folding & bit size lowering.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35198>
2025-05-28 16:29:30 +00:00
Ian Romanick
37ee91679a nir/algebraic: Generalize an existing bfi(a, 0, ...) pattern
No shader-db changes on any Intel platform.

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 210561118 -> 210560921 (-0.00%)
Send messages: 10979615 -> 10979613 (-0.00%)
Cycle count: 31576352808 -> 31576347218 (-0.00%); split: -0.00%, +0.00%
Max live registers: 66068161 -> 66068157 (-0.00%)
Non SSA regs after NIR: 60230775 -> 60230949 (+0.00%)

Totals from 180 (0.03% of 707082) affected shaders:
Instrs: 68035 -> 67838 (-0.29%)
Send messages: 3190 -> 3188 (-0.06%)
Cycle count: 3979496 -> 3973906 (-0.14%); split: -0.14%, +0.00%
Max live registers: 11812 -> 11808 (-0.03%)
Non SSA regs after NIR: 18878 -> 19052 (+0.92%)

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34905>
2025-05-16 14:49:25 -07:00
Ian Romanick
464955bbdd nir/algebraic: Optimize some open-coded extract_i8
These were initially observed in Hogwarts Legacy while working on
something else entirely. Two compute shaders in that app are helped
for spills and fills. On Skylake, one of the shaders benefits from
this change, and the other is hurt pretty significantly.

About 40 vertex shaders in Shadow of the Tomb Raider were helped for
instructions.

v2: Use ~0xff instead of 0xffffff00 to ensure the patterns will work
properly with all bit sizes. Noticed by Georg.

v3: No, really, fix the various errors to ensure the patterns will work
properly with all bit sizes. Noticed by Georg.

No shader-db changes on any Intel platform.

fossil-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake)
Totals:
Instrs: 210566294 -> 210561118 (-0.00%)
Cycle count: 31582309052 -> 31576352808 (-0.02%); split: -0.02%, +0.00%
Spill count: 519300 -> 519280 (-0.00%)
Fill count: 625181 -> 625161 (-0.00%)
Scratch Memory Size: 36289536 -> 36281344 (-0.02%)
Max live registers: 66068413 -> 66068161 (-0.00%)
Non SSA regs after NIR: 60230773 -> 60230775 (+0.00%)

Totals from 1662 (0.24% of 707082) affected shaders:
Instrs: 635064 -> 629888 (-0.82%)
Cycle count: 36549632 -> 30593388 (-16.30%); split: -16.43%, +0.14%
Spill count: 246 -> 226 (-8.13%)
Fill count: 280 -> 260 (-7.14%)
Scratch Memory Size: 16384 -> 8192 (-50.00%)
Max live registers: 178491 -> 178239 (-0.14%)
Non SSA regs after NIR: 169552 -> 169554 (+0.00%)

Tiger Lake
Totals:
Instrs: 238544730 -> 238539407 (-0.00%)
Cycle count: 23679446097 -> 23673238578 (-0.03%); split: -0.03%, +0.00%
Max live registers: 42494925 -> 42494799 (-0.00%)
Non SSA regs after NIR: 63639071 -> 63639074 (+0.00%)

Totals from 1662 (0.21% of 802704) affected shaders:
Instrs: 626604 -> 621281 (-0.85%)
Cycle count: 26444363 -> 20236844 (-23.47%); split: -23.50%, +0.02%
Max live registers: 95405 -> 95279 (-0.13%)
Non SSA regs after NIR: 181150 -> 181153 (+0.00%)

Ice Lake
Totals:
Instrs: 238855310 -> 238826534 (-0.01%)
Cycle count: 24952257277 -> 24944589398 (-0.03%); split: -0.03%, +0.00%
Spill count: 575510 -> 575117 (-0.07%)
Fill count: 713007 -> 708632 (-0.61%)
Max live registers: 42499556 -> 42499432 (-0.00%)
Non SSA regs after NIR: 64388747 -> 64388750 (+0.00%)

Totals from 1662 (0.21% of 805149) affected shaders:
Instrs: 926887 -> 898111 (-3.10%)
Cycle count: 67025583 -> 59357704 (-11.44%); split: -11.45%, +0.01%
Spill count: 5168 -> 4775 (-7.60%)
Fill count: 32883 -> 28508 (-13.30%)
Max live registers: 95614 -> 95490 (-0.13%)
Non SSA regs after NIR: 181150 -> 181153 (+0.00%)

Skylake
Totals:
Instrs: 161904416 -> 161895239 (-0.01%); split: -0.01%, +0.00%
Cycle count: 20098067714 -> 20090767583 (-0.04%); split: -0.04%, +0.00%
Spill count: 525546 -> 525789 (+0.05%); split: -0.04%, +0.09%
Fill count: 603369 -> 602276 (-0.18%); split: -0.28%, +0.10%
Max live registers: 33895714 -> 33895590 (-0.00%)
Non SSA regs after NIR: 57348729 -> 57348730 (+0.00%)

Totals from 1655 (0.25% of 653734) affected shaders:
Instrs: 769979 -> 760802 (-1.19%); split: -1.83%, +0.64%
Cycle count: 51365416 -> 44065285 (-14.21%); split: -14.22%, +0.01%
Spill count: 4186 -> 4429 (+5.81%); split: -4.90%, +10.70%
Fill count: 16356 -> 15263 (-6.68%); split: -10.50%, +3.82%
Max live registers: 95115 -> 94991 (-0.13%)
Non SSA regs after NIR: 180797 -> 180798 (+0.00%)

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34905>
2025-05-16 14:49:05 -07:00
Georg Lehmann
0a30611c10 nir/opt_algebraic: some bitfield_select optimizations
Foz-DB Navi21:
Totals from 47 (0.06% of 79789) affected shaders:
Instrs: 69536 -> 69363 (-0.25%)
CodeSize: 370624 -> 369388 (-0.33%)
Latency: 383505 -> 383298 (-0.05%)
InvThroughput: 72924 -> 72727 (-0.27%)
PreSGPRs: 2618 -> 2610 (-0.31%)
VALU: 43261 -> 43091 (-0.39%)
SALU: 13065 -> 13063 (-0.02%)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34739>
2025-05-13 10:59:09 +00:00
Karol Herbst
f0fa2209a8 nir: add nir_opt_algebraic_integer_promotion
This handles basic operations where clang promotes integers to 32 bits
according to the C99 spec in OpenCL C source code.

This is its own opt_algerbraic pass, because we don't wanna fight with
nir_lower_bit_size.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34641>
2025-05-12 09:29:20 +00:00
Georg Lehmann
02e743c99e nir: add an option to lower bf2f and f2bf
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34768>
2025-05-09 11:20:25 +00:00
Rhys Perry
f538cae743 nir/algebraic: optimize ior(unpack_4x8, unpack_4x8<<8) to unpack_32_2x16
No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34162>
2025-05-08 13:30:50 +00:00
Christian Gmeiner
f17d350001 lima: Move fdot lowering from NIR to lima
This change relocates the fdot lowering from the generic NIR to the lima,
since lima is the only consumer of this particular lowering. This avoids
potential conflicts with the similar fdot lowering already present in
nir_lower_alu_width.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34757>
2025-04-30 17:33:38 +00:00
Georg Lehmann
3e26fc4498 nir/opt_algebraic: disable fsat(a + 1.0) opt if a can be NaN
Foz-DB Navi21:
Totals from 9 (0.01% of 79789) affected shaders:
Instrs: 6782 -> 6796 (+0.21%); split: -0.03%, +0.24%
CodeSize: 40020 -> 40108 (+0.22%); split: -0.04%, +0.26%
Latency: 23764 -> 23758 (-0.03%)
InvThroughput: 6424 -> 6431 (+0.11%); split: -0.08%, +0.19%
SClause: 273 -> 275 (+0.73%)
Copies: 338 -> 339 (+0.30%)
VALU: 5138 -> 5147 (+0.18%); split: -0.06%, +0.23%
SALU: 349 -> 350 (+0.29%)
SMEM: 498 -> 500 (+0.40%)

Fixes: a4a3487aae ("nir/opt_algebraic: optimize patterns from Skia")
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34125>
2025-04-22 14:23:05 +00:00
Georg Lehmann
8ad695195e nir/opt_algebraic: turn exact fmin(1.0, a) into fsat if a is not NaN and not negative
Foz-DB Navi21:
Totals from 2456 (3.08% of 79789) affected shaders:
Instrs: 3415398 -> 3413352 (-0.06%); split: -0.06%, +0.00%
CodeSize: 18781096 -> 18776092 (-0.03%); split: -0.03%, +0.00%
VGPRs: 158512 -> 158528 (+0.01%)
Latency: 39528900 -> 39526687 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 10612237 -> 10609296 (-0.03%); split: -0.03%, +0.00%
VClause: 71028 -> 71034 (+0.01%)
SClause: 93971 -> 93975 (+0.00%); split: -0.00%, +0.01%
Copies: 257525 -> 257521 (-0.00%); split: -0.01%, +0.01%
VALU: 2483374 -> 2481325 (-0.08%); split: -0.09%, +0.00%
SALU: 348207 -> 348211 (+0.00%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34125>
2025-04-22 14:23:04 +00:00
Georg Lehmann
18a0de1834 nir/opt_algebraic: optimize fmax(ffma(a, b, c), 0.0) to fsat
Foz-DB Navi21:
Totals from 2621 (3.28% of 79789) affected shaders:
MaxWaves: 55744 -> 55736 (-0.01%)
Instrs: 2840180 -> 2832647 (-0.27%); split: -0.27%, +0.00%
CodeSize: 15497364 -> 15464692 (-0.21%); split: -0.21%, +0.00%
VGPRs: 138448 -> 138456 (+0.01%)
Latency: 22319512 -> 22307018 (-0.06%); split: -0.06%, +0.01%
InvThroughput: 5745108 -> 5729197 (-0.28%); split: -0.28%, +0.00%
Copies: 110279 -> 110268 (-0.01%); split: -0.04%, +0.03%
VALU: 2210578 -> 2203211 (-0.33%); split: -0.33%, +0.00%
SALU: 169014 -> 168841 (-0.10%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34125>
2025-04-22 14:23:04 +00:00
Georg Lehmann
f71fc26393 nir/opt_algebraic: generalize fmax(fadd(a, b), 0.0) to fsat by not requiring fneg
Not a large effect, but it's positive and makes the pattern simpler.

Foz-DB Navi21:
Totals from 1 (0.00% of 79789) affected shaders:
Instrs: 145 -> 138 (-4.83%)
CodeSize: 784 -> 756 (-3.57%)
Latency: 1495 -> 1487 (-0.54%)
InvThroughput: 210 -> 196 (-6.67%)
VALU: 103 -> 96 (-6.80%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34125>
2025-04-22 14:23:04 +00:00
Ian Romanick
1d2ebeca17 nir/algebraic: Allow fmin(a,a) optimization when flush denorm to zero is not set
I was surprised this had any affect on Intel GPUs because we have been
unconditionally performing this optimization in the backend since June
2014.

Once that error is fixed (later in this MR), this change prevents a
couple dozen regressions in shader-db and around 90 regressions in
fossil-db. Many of the regressions in fossil-db were loss of SIMD32, and
that can be a big deal.

v2: Add 64-bit too. Suggested by Alyssa.

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 16970141 -> 16970139 (<.01%)
instructions in affected programs: 40 -> 38 (-5.00%)
helped: 2 / HURT: 0

total cycles in shared programs: 914617580 -> 914617548 (<.01%)
cycles in affected programs: 3428 -> 3396 (-0.93%)
helped: 2 / HURT: 0

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Cycle count: 30546028462 -> 30546025224 (-0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 237017827 -> 237017731 (-0.00%)

Totals from 83 (0.01% of 706657) affected shaders:
Cycle count: 3042978 -> 3039740 (-0.11%); split: -0.13%, +0.02%
Non SSA regs after NIR: 78997 -> 78901 (-0.12%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34192>
2025-04-15 23:59:31 +00:00
Georg Lehmann
d046ecf95a nir/opt_algebraic: optimize open coded ffract
Foz-DB Navi21:
Totals from 274 (0.34% of 79789) affected shaders:
Instrs: 522630 -> 522181 (-0.09%); split: -0.09%, +0.01%
CodeSize: 2880668 -> 2878940 (-0.06%); split: -0.07%, +0.01%
VGPRs: 14488 -> 14464 (-0.17%)
Latency: 4092358 -> 4091243 (-0.03%); split: -0.04%, +0.01%
InvThroughput: 1014148 -> 1013471 (-0.07%); split: -0.07%, +0.00%
VClause: 11646 -> 11639 (-0.06%)
SClause: 18614 -> 18611 (-0.02%)
Copies: 56248 -> 56309 (+0.11%); split: -0.05%, +0.16%
PreVGPRs: 13649 -> 13647 (-0.01%)
VALU: 359733 -> 359285 (-0.12%); split: -0.13%, +0.01%
SALU: 59719 -> 59720 (+0.00%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33369>
2025-04-11 12:36:02 +00:00
Marek Olšák
1d5c42528b nir/opt_algebraic: lower 16-bit imul_high & umul_high
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34016>
2025-04-07 19:44:22 +00:00
Georg Lehmann
2b1fc1a7fe nir: add option to keep mul24_relaxed
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33871>
2025-03-27 06:24:15 +00:00
Georg Lehmann
b386659588 nir/opt_algebraic: create ubfe from (a & mask) >> c
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Foz-DB Navi21:
Totals from 917 (1.16% of 79188) affected shaders:
Instrs: 2549482 -> 2544997 (-0.18%); split: -0.18%, +0.00%
CodeSize: 13781648 -> 13763616 (-0.13%); split: -0.13%, +0.00%
Latency: 24832087 -> 24825199 (-0.03%); split: -0.04%, +0.01%
InvThroughput: 5921339 -> 5914799 (-0.11%); split: -0.12%, +0.01%
VClause: 59910 -> 59898 (-0.02%); split: -0.02%, +0.00%
SClause: 62294 -> 62293 (-0.00%)
Copies: 221015 -> 220988 (-0.01%); split: -0.02%, +0.01%
VALU: 1717280 -> 1713332 (-0.23%); split: -0.23%, +0.00%
SALU: 359390 -> 358910 (-0.13%)
VMEM: 101966 -> 101924 (-0.04%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33455>
2025-03-14 11:15:04 +00:00
Georg Lehmann
d272a6e261 nir/opt_algebraic: optimize d3d a ? b : 0
Foz-DB Navi21:
Totals from 3466 (4.34% of 79789) affected shaders:
MaxWaves: 73163 -> 73161 (-0.00%); split: +0.02%, -0.02%
Instrs: 3993862 -> 3987633 (-0.16%); split: -0.19%, +0.04%
CodeSize: 21747420 -> 21725620 (-0.10%); split: -0.15%, +0.05%
VGPRs: 190736 -> 190728 (-0.00%); split: -0.04%, +0.03%
SpillSGPRs: 489 -> 478 (-2.25%); split: -2.86%, +0.61%
Latency: 48169718 -> 48159068 (-0.02%); split: -0.05%, +0.02%
InvThroughput: 12132999 -> 12128721 (-0.04%); split: -0.05%, +0.01%
VClause: 78063 -> 78052 (-0.01%); split: -0.09%, +0.08%
SClause: 109095 -> 108996 (-0.09%); split: -0.13%, +0.04%
Copies: 265784 -> 264530 (-0.47%); split: -0.72%, +0.25%
Branches: 84533 -> 84553 (+0.02%)
PreSGPRs: 172577 -> 172531 (-0.03%); split: -0.19%, +0.16%
PreVGPRs: 165776 -> 165825 (+0.03%); split: -0.06%, +0.09%
VALU: 2851544 -> 2850426 (-0.04%); split: -0.08%, +0.04%
SALU: 413543 -> 408408 (-1.24%); split: -1.45%, +0.21%
VMEM: 139890 -> 139887 (-0.00%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>
2025-03-01 07:49:28 +00:00
Georg Lehmann
2e7f34af6b nir/opt_algebraic: optimize more ine/ieq(umin(b2i, ), 0)
Foz-DB Navi21:
Totals from 76 (0.10% of 79789) affected shaders:
MaxWaves: 1050 -> 1062 (+1.14%)
Instrs: 113754 -> 113691 (-0.06%); split: -0.11%, +0.06%
CodeSize: 605096 -> 605216 (+0.02%); split: -0.03%, +0.05%
VGPRs: 6024 -> 5976 (-0.80%)
Latency: 1776501 -> 1777519 (+0.06%); split: -0.06%, +0.12%
InvThroughput: 379644 -> 376751 (-0.76%)
SClause: 2132 -> 2134 (+0.09%)
Copies: 4131 -> 4128 (-0.07%); split: -1.77%, +1.69%
PreSGPRs: 4275 -> 4270 (-0.12%)
PreVGPRs: 5568 -> 5526 (-0.75%)
VALU: 86732 -> 86581 (-0.17%); split: -0.24%, +0.07%
SALU: 7112 -> 7198 (+1.21%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>
2025-03-01 07:49:28 +00:00
Georg Lehmann
7bc3062a3b nir/opt_algebraic: push comparisons with constants into bcsel with constant
Foz-DB Navi21:
Totals from 1657 (2.08% of 79789) affected shaders:
MaxWaves: 30275 -> 30261 (-0.05%); split: +0.01%, -0.05%
Instrs: 3316251 -> 3315701 (-0.02%); split: -0.04%, +0.02%
CodeSize: 17831924 -> 17832020 (+0.00%); split: -0.06%, +0.06%
SpillSGPRs: 815 -> 859 (+5.40%)
SpillVGPRs: 3335 -> 3293 (-1.26%)
Scratch: 231424 -> 230400 (-0.44%)
Latency: 33413310 -> 33402751 (-0.03%); split: -0.04%, +0.01%
InvThroughput: 9116062 -> 9112904 (-0.03%); split: -0.04%, +0.00%
VClause: 65587 -> 65560 (-0.04%); split: -0.05%, +0.01%
SClause: 86208 -> 86261 (+0.06%); split: -0.02%, +0.08%
Copies: 356158 -> 356439 (+0.08%); split: -0.07%, +0.15%
PreSGPRs: 101710 -> 101806 (+0.09%); split: -0.01%, +0.11%
PreVGPRs: 89293 -> 89286 (-0.01%); split: -0.04%, +0.04%
VALU: 2220900 -> 2218839 (-0.09%); split: -0.11%, +0.01%
SALU: 472988 -> 474567 (+0.33%); split: -0.08%, +0.42%
VMEM: 118401 -> 118347 (-0.05%)
SMEM: 123597 -> 123592 (-0.00%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>
2025-03-01 07:49:27 +00:00
Georg Lehmann
3837bc6d16 nir/opt_algebraic: optimize ~a == ~b and ~a == #b
Foz-DB Navi21:
Totals from 2 (0.00% of 79789) affected shaders:
Instrs: 8343 -> 8323 (-0.24%)
CodeSize: 43884 -> 43764 (-0.27%)
Latency: 19390 -> 19363 (-0.14%)
InvThroughput: 3380 -> 3356 (-0.71%)
VALU: 5413 -> 5393 (-0.37%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>
2025-03-01 07:49:27 +00:00