Commit graph

95 commits

Author SHA1 Message Date
Rhys Perry
ec59b59b97 nir: rename nir_src_parent_instr to nir_src_use_instr
sed -i "s/nir_src_parent_instr/nir_src_use_instr/" `find ./ -type f`
sed -i "s/nir_src_parent_if/nir_src_use_if/" `find ./ -type f`
sed -i "s/nir_src_set_parent/nir_src_set_use/" `find ./ -type f`

There are two kinds of "parent" in relation to a src/def:
- the instruction where the def or src's def is defined
- the instruction which the src is a part of and where the def is used

Clarify that the parent here is where the src's def is used, not where
it's defined.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41344>
2026-05-06 17:09:22 +00:00
Lorenzo Rossi
2a7d817591 nir/opt_algebraic: optimize fadd/fmul with 16-bit source and constant
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41096>
2026-04-30 17:33:09 +00:00
Karol Herbst
2e09b4ac68 nir: handle fmul_rtz in a couple of places
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41179>
2026-04-30 15:42:40 +00:00
Brandon Jones
d1dd65d425 nir/opt_algebraic: fix fabs optimization
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This fixes a regression found in blender's unit testing, which called
fabs(-0.0) and invoked an NIR optimization that is was not valid for
the parameter -0.0. IEEE 754 requires that abs clear the sign bit
for the value -0.0.

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41060>
2026-04-21 04:10:29 +00:00
Georg Lehmann
643dd510d4 nir/opt_algebraic: optimize b2f(a) * b
When the multiplication is only used by fadd, it's not a clear win
because of potential fma fusion.

Totals from 8015 (6.99% of 114655) affected shaders:
MaxWaves: 199394 -> 199466 (+0.04%); split: +0.04%, -0.01%
Instrs: 17461518 -> 17451076 (-0.06%); split: -0.10%, +0.04%
CodeSize: 94779552 -> 94769828 (-0.01%); split: -0.07%, +0.06%
VGPRs: 526012 -> 525532 (-0.09%); split: -0.10%, +0.01%
SpillSGPRs: 12466 -> 12517 (+0.41%); split: -0.09%, +0.50%
Latency: 191274766 -> 191297394 (+0.01%); split: -0.03%, +0.04%
InvThroughput: 31465968 -> 31456785 (-0.03%); split: -0.07%, +0.04%
VClause: 312081 -> 312073 (-0.00%); split: -0.10%, +0.09%
SClause: 366914 -> 366906 (-0.00%); split: -0.02%, +0.01%
Copies: 1222482 -> 1221933 (-0.04%); split: -0.20%, +0.15%
Branches: 376651 -> 376577 (-0.02%); split: -0.03%, +0.01%
PreSGPRs: 442974 -> 443240 (+0.06%); split: -0.01%, +0.07%
PreVGPRs: 415964 -> 415668 (-0.07%); split: -0.09%, +0.02%
VALU: 9403517 -> 9393916 (-0.10%); split: -0.12%, +0.02%
SALU: 2799420 -> 2800430 (+0.04%); split: -0.13%, +0.16%
VOPD: 472826 -> 472347 (-0.10%); split: +0.09%, -0.19%

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40399>
2026-03-20 08:50:41 +00:00
Georg Lehmann
cadc74b5e2 nir/search_helpers: assume float sources without preserve flag can't be inf/nan
For example, this should let us avoid needing one pattern with is_a_number
and one with nnan.

Foz-DB Navi48:
Totals from 3564 (3.11% of 114655) affected shaders:
Instrs: 8256755 -> 8255042 (-0.02%); split: -0.02%, +0.00%
CodeSize: 43143184 -> 43123192 (-0.05%); split: -0.05%, +0.00%
VGPRs: 268252 -> 268240 (-0.00%)
Latency: 218890225 -> 218881157 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 31044516 -> 31042297 (-0.01%); split: -0.01%, +0.00%
VClause: 96074 -> 96067 (-0.01%); split: -0.01%, +0.00%
SClause: 218042 -> 218037 (-0.00%); split: -0.00%, +0.00%
Copies: 508677 -> 508661 (-0.00%); split: -0.01%, +0.01%
Branches: 148570 -> 148569 (-0.00%)
PreSGPRs: 228110 -> 228082 (-0.01%); split: -0.01%, +0.00%
PreVGPRs: 231996 -> 231982 (-0.01%)
VALU: 4516327 -> 4515321 (-0.02%); split: -0.02%, +0.00%
SALU: 1353696 -> 1353590 (-0.01%); split: -0.01%, +0.00%
VMEM: 182189 -> 182179 (-0.01%)
SMEM: 344771 -> 344756 (-0.00%)
VOPD: 29463 -> 29438 (-0.08%)

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40291>
2026-03-13 07:13:10 +00:00
Georg Lehmann
4885e5cf3a nir: remove more fsat using range analysis
Foz-DB Navi48:
Totals from 3018 (3.65% of 82636) affected shaders:
MaxWaves: 69274 -> 69280 (+0.01%)
Instrs: 7165414 -> 7157581 (-0.11%); split: -0.12%, +0.01%
CodeSize: 38890212 -> 38823132 (-0.17%); split: -0.18%, +0.00%
VGPRs: 228672 -> 228624 (-0.02%)
Latency: 64789026 -> 64784877 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 11805156 -> 11802642 (-0.02%); split: -0.02%, +0.00%
VClause: 136900 -> 136886 (-0.01%); split: -0.03%, +0.02%
SClause: 150135 -> 150130 (-0.00%); split: -0.01%, +0.01%
Copies: 574690 -> 574894 (+0.04%); split: -0.03%, +0.06%
Branches: 187169 -> 187086 (-0.04%); split: -0.04%, +0.00%
PreSGPRs: 190074 -> 190067 (-0.00%); split: -0.00%, +0.00%
PreVGPRs: 189564 -> 189538 (-0.01%); split: -0.02%, +0.00%
VALU: 3955188 -> 3949411 (-0.15%); split: -0.15%, +0.00%
SALU: 1114659 -> 1114729 (+0.01%); split: -0.02%, +0.03%
SMEM: 231080 -> 231077 (-0.00%); split: -0.00%, +0.00%
VOPD: 116150 -> 116180 (+0.03%); split: +0.04%, -0.02%

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39987>
2026-03-07 05:01:45 +00:00
Georg Lehmann
506bb5a609 nir/search_helpers: use fp class analysis more
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39987>
2026-03-07 05:01:45 +00:00
Georg Lehmann
eb431efc19 nir/search_helpers: switch to fp class analysis
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39987>
2026-03-07 05:01:44 +00:00
Georg Lehmann
32b5719a9f nir/opt_algebraic: add is_not_uint_zero for b2i16(uge) pattern
More fallout from f2a59fdea6.
is_not_zero now always returns whether the result is a floating point zero.

When combined with the fp denorm handling that will be added to
floating point range analysis, this is false for many sensible integer values.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39987>
2026-03-07 05:01:44 +00:00
Georg Lehmann
bca5aab2be nir: let nir_analyze_fp_range take a nir_def
This is midly worse for vector constants, but so much simpler.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39756>
2026-02-16 18:08:53 +00:00
Georg Lehmann
474af815ff nir: rename nir_analyze_range because it's float only
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39756>
2026-02-16 18:08:53 +00:00
Georg Lehmann
486ea54184 nir/opt_algebraic: make bcsel(fcmp(b, a), b, a) -> fmin/fmax patterns exact
These patterns need is_only_used_as_float because fmin/fmax might change NaN
patterns, while bcsel is bit exact. For the same reason, the replacement
must not add undefined results, so make the replacement NaN/inf preserving.

It's impossible to make them signed zero correct (-0.0 == +0.0),
so it's also important that the user alu doesn't care.

Otherwise, the only thing that matters is is whether a is NaN.

Foz-DB Navi48:
Totals from 453 (0.55% of 82405) affected shaders:
MaxWaves: 8242 -> 8270 (+0.34%)
Instrs: 2382059 -> 2380094 (-0.08%); split: -0.09%, +0.00%
CodeSize: 13197208 -> 13179488 (-0.13%); split: -0.14%, +0.00%
VGPRs: 44688 -> 44604 (-0.19%)
Latency: 22839894 -> 22838985 (-0.00%); split: -0.01%, +0.00%
InvThroughput: 4873352 -> 4872924 (-0.01%)
VClause: 50862 -> 50883 (+0.04%); split: -0.02%, +0.06%
SClause: 54000 -> 53993 (-0.01%)
Copies: 250215 -> 250233 (+0.01%); split: -0.00%, +0.01%
PreVGPRs: 39694 -> 39620 (-0.19%)
VALU: 1116881 -> 1116073 (-0.07%); split: -0.07%, +0.00%
SALU: 492799 -> 492139 (-0.13%); split: -0.14%, +0.00%
VOPD: 85457 -> 85461 (+0.00%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641>
2026-02-10 18:42:03 +00:00
Georg Lehmann
b2d9615000 nir/opt_algebraic: optimize bcsel to hi 16bits with undef lo
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39412>
2026-01-26 10:54:20 +00:00
Rhys Perry
625afb0d29 nir: add fcanonicalize
v2(Georg Lehmann):
Always remove fcanonicalize if denorms must be neither flushed nor preserved.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39180>
2026-01-19 16:11:29 +00:00
Emma Anholt
8ebe630a13 nir/search_helpers: Avoid UB in is_2x_16_bits()/is_neg2x_16_bits().
Same trick we do for nir_imul evaluation -- do the multiply in unsigned to
get defined behavior from C.  Fixes UBSan failures with
nir_opt_algebraic_pattern_tests.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39076>
2026-01-15 19:09:37 +00:00
Georg Lehmann
93d05cdfd8 nir/opt_algebraic: move fsat last for fsqrt(fsat(a))
This should be exact, even for all special values:

fsqrt(NaN) -> NaN
fsqrt(-0.0) -> 0.0
fsqrt(-Inf) -> NaN
fsqrt(negative finite) -> NaN

So all of these get saturated to +0.0

All numbers >= 1.0 will have a square root >= 1.0,
which will be saturate to 1.0

Moving the fsat guarantees that it can use an output modifier
for hardware that has those, and shouldn't harm other hardware either.

Foz-DB Navi21:
Totals from 255 (0.31% of 82151) affected shaders:
Instrs: 664906 -> 664194 (-0.11%)
CodeSize: 3623500 -> 3619188 (-0.12%)
Latency: 11336397 -> 11335688 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 2716430 -> 2715726 (-0.03%); split: -0.03%, +0.00%
VALU: 442603 -> 441891 (-0.16%)

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39202>
2026-01-09 07:34:46 +00:00
Georg Lehmann
c8ce0df2d2 nir/opt_algebraic: replace is_negative_zero with constant -0.0
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Now that nir_search respects the sign of zero, we don't need
a manual helper for this.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39123>
2026-01-03 12:42:23 +00:00
Pavel Ondračka
0b39b5ea63 nir/opt_algebraic: improve dot product narrowing
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The issue is that the current narrowing patterns are not working in a lot
of cases, for example
(('fdot3', ('vec3', a, 0.0, 0.0), b), ('fmul', a, b)),
is missing patterns like this:

32x3   %1 = load_const (0x3f800000, 0x00000000, 0x00000000) = (1.000000, 0.000000, 0.000000)
32x4   %7 = vec4 %6, %2 (0x0), %2 (0x0), %2 (0x0)
32    %19 = fdot3 %1 (1.000000, 0.000000, 0.000000), %7.xyz

or after some later transforms:

32x2   %0 = load_const (0x3f800000, 0x00000000) = (1.000000, 0.000000)
32x2   %6 = vec2 %5, %1 (0x0)
32    %18 = fdot3 %0 (1.000000, 0.000000).xyy, %6.xyy

This patch is heavily based on old branch from Ian Romanick from 2019.

r300 RV530 shader-db:
total instructions in shared programs: 128900 -> 128882 (-0.01%)
instructions in affected programs: 621 -> 603 (-2.90%)
helped: 10
HURT: 1
total cycles in shared programs: 191837 -> 191828 (<.01%)
cycles in affected programs: 799 -> 790 (-1.13%)
helped: 7
HURT: 1

Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39068>
2026-01-02 16:07:10 +01:00
Konstantin Seurer
de32f9275f treewide: add & use parent instr helpers
We add a bunch of new helpers to avoid the need to touch >parent_instr,
including the full set of:

* nir_def_is_*
* nir_def_as_*_or_null
* nir_def_as_* [assumes the right instr type]
* nir_src_is_*
* nir_src_as_*
* nir_scalar_is_*
* nir_scalar_as_*

Plus nir_def_instr() where there's no more suitable helper.

Also an existing helper is renamed to unify all the names, while we're
churning the tree:

* nir_src_as_alu_instr -> nir_src_as_alu

..and then we port the tree to use the helpers as much as possible, using
nir_def_instr() where that does not work.

Acked-by: Marek Olšák <maraeo@gmail.com>

---

To eliminate nir_def::parent_instr we need to churn the tree anyway, so I'm
taking this opportunity to clean up a lot of NIR patterns.

Co-authored-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38313>
2025-11-12 21:22:13 +00:00
Rhys Perry
4f83059ac5 nir/algebraic: improve is_unsigned_multiple_of_4 and use it more
fossil-db (gfx1201):
Totals from 160 (0.20% of 79839) affected shaders:
MaxWaves: 4008 -> 3952 (-1.40%)
Instrs: 390073 -> 379834 (-2.62%); split: -2.63%, +0.00%
CodeSize: 2126020 -> 2053740 (-3.40%); split: -3.40%, +0.00%
VGPRs: 9492 -> 9612 (+1.26%)
Latency: 6746019 -> 6723893 (-0.33%); split: -0.33%, +0.00%
InvThroughput: 849571 -> 848942 (-0.07%); split: -0.42%, +0.35%
VClause: 11977 -> 11983 (+0.05%); split: -0.20%, +0.25%
SClause: 11828 -> 11824 (-0.03%); split: -0.14%, +0.11%
Copies: 30003 -> 30938 (+3.12%); split: -0.09%, +3.20%
PreSGPRs: 8914 -> 8938 (+0.27%)
PreVGPRs: 7352 -> 7514 (+2.20%); split: -0.04%, +2.24%
VALU: 171829 -> 168829 (-1.75%); split: -1.76%, +0.01%
SALU: 66503 -> 66543 (+0.06%); split: -0.01%, +0.07%
VMEM: 29365 -> 25327 (-13.75%)
VOPD: 864 -> 1013 (+17.25%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36760>
2025-08-22 15:45:55 +00:00
Rhys Perry
2a12624532 nir/search: add nir_search_state
A future commit will add another hash table.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36760>
2025-08-22 15:45:55 +00:00
Georg Lehmann
e43ef6533b nir/opt_algebraic: remove 8bit roundtrip when vectorizing i2i16(unpack_4x8(a).zw)
Explicit 16bit instructions are nicer to vectorize.

Helps FSR4 on GFX11 marginally.

Foz-DB Navi31:
Totals from 10 out of 14 FSR4 shaders:
Instrs: 59781 -> 58518 (-2.11%)
CodeSize: 413428 -> 404156 (-2.24%)
Latency: 193770 -> 190768 (-1.55%)
InvThroughput: 226274 -> 221628 (-2.05%)
VClause: 796 -> 793 (-0.38%); split: -1.01%, +0.63%
Copies: 3342 -> 3008 (-9.99%); split: -11.01%, +1.02%
PreSGPRs: 312 -> 305 (-2.24%)
VALU: 51448 -> 50213 (-2.40%)
SALU: 1074 -> 1048 (-2.42%)
VOPD: 1783 -> 1718 (-3.65%); split: +0.95%, -4.60%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36117>
2025-07-30 07:25:51 +00:00
Georg Lehmann
045ddb992a nir/opt_algebraic: optimize 16bit vec2 comparison followed by b2i16 using usub_sat
Helps vectorized emulated fp16 -> fp8 conversions

No Foz-DB changes.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35876>
2025-07-03 20:08:39 +00:00
Alyssa Rosenzweig
6efe557718 nir/search_helpers: add has_multiple_uses helper
heuristic for the next patch.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35720>
2025-06-26 16:41:55 +00:00
Marek Olšák
c3034fa82c amd: replace most u_bit_consecutive* with BITFIELD_MASK/RANGE
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35346>
2025-06-04 17:46:38 +00:00
Georg Lehmann
b386659588 nir/opt_algebraic: create ubfe from (a & mask) >> c
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Foz-DB Navi21:
Totals from 917 (1.16% of 79188) affected shaders:
Instrs: 2549482 -> 2544997 (-0.18%); split: -0.18%, +0.00%
CodeSize: 13781648 -> 13763616 (-0.13%); split: -0.13%, +0.00%
Latency: 24832087 -> 24825199 (-0.03%); split: -0.04%, +0.01%
InvThroughput: 5921339 -> 5914799 (-0.11%); split: -0.12%, +0.01%
VClause: 59910 -> 59898 (-0.02%); split: -0.02%, +0.00%
SClause: 62294 -> 62293 (-0.00%)
Copies: 221015 -> 220988 (-0.01%); split: -0.02%, +0.01%
VALU: 1717280 -> 1713332 (-0.23%); split: -0.23%, +0.00%
SALU: 359390 -> 358910 (-0.13%)
VMEM: 101966 -> 101924 (-0.04%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33455>
2025-03-14 11:15:04 +00:00
Georg Lehmann
5da76df4cd nir/search_helpers: check tex source type in is_only_used_as_float
Foz-DB Navi21:
Totals from 164 (0.21% of 79377) affected shaders:
Instrs: 197477 -> 197035 (-0.22%); split: -0.23%, +0.01%
CodeSize: 1052944 -> 1051140 (-0.17%); split: -0.18%, +0.01%
VGPRs: 8104 -> 8080 (-0.30%)
Latency: 1115663 -> 1115567 (-0.01%); split: -0.06%, +0.05%
InvThroughput: 265822 -> 265158 (-0.25%); split: -0.26%, +0.01%
VClause: 3792 -> 3789 (-0.08%); split: -0.11%, +0.03%
SClause: 5738 -> 5744 (+0.10%); split: -0.02%, +0.12%
Copies: 12223 -> 12200 (-0.19%); split: -0.53%, +0.34%
PreVGPRs: 6807 -> 6801 (-0.09%); split: -0.15%, +0.06%
VALU: 139206 -> 138785 (-0.30%); split: -0.31%, +0.01%
SALU: 27852 -> 27853 (+0.00%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33674>
2025-02-24 16:34:53 +00:00
Georg Lehmann
3d8585e4fc nir/search_helpers: look through vecs in is_only_used_as_float
Will be useful with the next commit, or for backends that don't lower
alu to scalar.

No changes on Navi21.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33674>
2025-02-24 16:34:53 +00:00
Mel Henning
0470643047 nak,nir: Add 32-bit nir_op_lea_nv and use it
Changes code size by -0.80% on shaderdb.

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32517>
2025-02-13 17:36:41 +00:00
Alyssa Rosenzweig
be049e1c14 nir/search_helpers: handle bcsel in is_only_used_as_float
this lets algebraic see through chains of instructions.

v2: Limit recursion depth (Georg).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> [v1]
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32398>
2024-12-05 10:58:51 +00:00
Job Noorman
1333af5d77 nir/search: add is_only_used_by_{iand,ior} helpers
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32181>
2024-11-28 06:19:59 +00:00
Job Noorman
a8c947df9a nir/search: make is_only_used_by_iadd reusable
The algorithm is exactly the same for other opcodes so we don't have to
have to copy paste it.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32181>
2024-11-28 06:19:59 +00:00
Rhys Perry
b8c8482dbb nir/algebraic: add ddxy to is_only_used_as_float
The sources for these intrinsics are floating point.

fossil-db (navi21):
Totals from 67 (0.08% of 79395) affected shaders:
MaxWaves: 1128 -> 1116 (-1.06%)
Instrs: 113552 -> 113319 (-0.21%); split: -0.21%, +0.01%
CodeSize: 595248 -> 593360 (-0.32%)
VGPRs: 4344 -> 4392 (+1.10%)
Latency: 578158 -> 577526 (-0.11%); split: -0.18%, +0.07%
InvThroughput: 170150 -> 169908 (-0.14%); split: -0.23%, +0.09%
SClause: 3787 -> 3780 (-0.18%)
Copies: 4305 -> 4294 (-0.26%); split: -0.51%, +0.26%
PreVGPRs: 3883 -> 3925 (+1.08%)
VALU: 90007 -> 89774 (-0.26%); split: -0.27%, +0.01%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32145>
2024-11-21 14:50:45 +00:00
Timur Kristóf
be68aeafdc nir/opt_algebraic: Add various bitfield extract patterns.
v2 (Georg Lehmann):
- fixed incorrect imin in ubfe_ubfe
- simplied outer_bits of ushr((ubfe, ...), ...) opt
- added is_used_once to iand(ushr(), ...) opt to improve stats

For-DB Navi21:
Totals from 3309 (4.18% of 79206) affected shaders:
Instrs: 5295291 -> 5282128 (-0.25%); split: -0.28%, +0.03%
CodeSize: 28299320 -> 28298456 (-0.00%); split: -0.07%, +0.06%
Latency: 51566173 -> 51521923 (-0.09%); split: -0.09%, +0.01%
InvThroughput: 13222050 -> 13204557 (-0.13%); split: -0.14%, +0.01%
VClause: 116451 -> 116458 (+0.01%); split: -0.02%, +0.02%
SClause: 160356 -> 160324 (-0.02%); split: -0.03%, +0.01%
Copies: 424152 -> 423670 (-0.11%); split: -0.20%, +0.09%
Branches: 156701 -> 156192 (-0.32%); split: -0.33%, +0.01%
PreSGPRs: 168507 -> 168500 (-0.00%); split: -0.02%, +0.01%
PreVGPRs: 151477 -> 151474 (-0.00%)
VALU: 3486077 -> 3476675 (-0.27%); split: -0.31%, +0.04%
SALU: 786467 -> 783109 (-0.43%); split: -0.45%, +0.03%
VMEM: 188035 -> 188060 (+0.01%)
SMEM: 259632 -> 259630 (-0.00%)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31852>
2024-10-29 10:51:09 +00:00
Christian Gmeiner
87786a7a7e nak: Move imad late optimization to nir
It is more or less just a code move, but I touched
is_only_used_by_iadd(..) to match the style of the other functions in
that file.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30099>
2024-07-12 05:54:46 +00:00
Georg Lehmann
98cc57bccb nir/optimize cmp(a, -0.0)
+0.0 can use an inline constant for AMD hardware,
-0.0 needs a literal.

Foz-DB Navi21:
Totals from 1014 (1.28% of 79395) affected shaders:
Instrs: 3037490 -> 3036849 (-0.02%); split: -0.02%, +0.00%
CodeSize: 17060228 -> 17051276 (-0.05%); split: -0.05%, +0.00%
Latency: 45916788 -> 45916600 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 12982201 -> 12982187 (-0.00%); split: -0.00%, +0.00%
VClause: 79475 -> 79478 (+0.00%)
SClause: 119935 -> 119934 (-0.00%); split: -0.00%, +0.00%
Copies: 301641 -> 300964 (-0.22%); split: -0.23%, +0.00%
PreSGPRs: 59155 -> 59144 (-0.02%)
VALU: 2032016 -> 2032034 (+0.00%)
SALU: 386424 -> 385729 (-0.18%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29467>
2024-06-27 08:12:30 +00:00
Ian Romanick
4834df82e2 nir/algebraic: More patterns to generate iadd3
I noticed some shaders with patterns similar to these while working on
cooperative matrix lowering.

Meteor Lake and DG2 are the only platforms that support iadd3, so there
were no shader-db or fossil-db changes on any other platforms.

shader-db:

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total instructions in shared programs: 19869445 -> 19868343 (<.01%)
instructions in affected programs: 419426 -> 418324 (-0.26%)
helped: 913 / HURT: 2

total cycles in shared programs: 936010029 -> 935909811 (-0.01%)
cycles in affected programs: 31746523 -> 31646305 (-0.32%)
helped: 495 / HURT: 356

LOST:   10
GAINED: 12

fossil-db:

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 154514596 -> 154505466 (-0.01%); split: -0.01%, +0.00%
Cycle count: 17540226067 -> 17436266198 (-0.59%); split: -0.63%, +0.04%
Spill count: 146887 -> 146886 (-0.00%)
Fill count: 272499 -> 272489 (-0.00%); split: -0.01%, +0.00%
Max live registers: 32634290 -> 32634739 (+0.00%); split: -0.00%, +0.00%
Max dispatch width: 5550128 -> 5550368 (+0.00%)

Totals from 4401 (0.70% of 632560) affected shaders:
Instrs: 3095239 -> 3086109 (-0.29%); split: -0.30%, +0.00%
Cycle count: 7327352564 -> 7223392695 (-1.42%); split: -1.51%, +0.10%
Spill count: 28105 -> 28104 (-0.00%)
Fill count: 45830 -> 45820 (-0.02%); split: -0.04%, +0.02%
Max live registers: 264376 -> 264825 (+0.17%); split: -0.05%, +0.22%
Max dispatch width: 43768 -> 44008 (+0.55%)

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>
2024-05-31 09:13:23 -07:00
Ian Romanick
f1b941aaec nir/search: Refactor is_16_bits
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Suggested-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>
2024-05-31 09:13:23 -07:00
Ian Romanick
6e53be2a0a nir/search: Fix is_16_bits for vectors
Require that all elements of a vector be representable as either
int16_t or uint16_t.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Fixes: 7ef45e661f ("intel/fs: Add constant propagation for ADD3")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>
2024-05-31 09:13:23 -07:00
Job Noorman
96c2fe3e1a nir: add search helper is_only_used_by_if
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27411>
2024-03-01 13:45:11 +00:00
Alyssa Rosenzweig
c39896b17b nir: Use getters for nir_src::parent_*
First, we need to give the parent_instr field a unique name to be able to
replace with a helper.  We have parent_instr fields for both nir_src and
nir_def, so let's rename nir_src::parent_instr in preparation for rework.

This was done with a combination of sed and manual fix-ups.

Then we use semantic patches plus manual fixups:

    @@
    expression s;
    @@

    -s->renamed_parent_instr
    +nir_src_parent_instr(s)

    @@
    expression s;
    @@

    -s.renamed_parent_instr
    +nir_src_parent_instr(&s)

    @@
    expression s;
    @@

    -s->parent_if
    +nir_src_parent_if(s)

    @@
    expression s;
    @@

    -s.renamed_parent_if
    +nir_src_parent_if(&s)

    @@
    expression s;
    @@

    -s->is_if
    +nir_src_is_if(s)

    @@
    expression s;
    @@

    -s.is_if
    +nir_src_is_if(&s)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24671>
2023-10-10 04:58:05 -04:00
Marek Olšák
1ac379c4a0 nir/algebraic: collapse ALU opcodes sourcing NaN
Undef will be replaced by NaN whenever it leads to elimination of FP
instructions. This implements the elimination part.

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24792>
2023-08-19 14:18:52 -04:00
Faith Ekstrand
6c1d32581a nir: Drop nir_alu_dest
Instead, we replace it directly with nir_def.  We could replace it with
nir_dest but the next commit gets rid of that so this avoids unnecessary
churn.  Most of this commit was generated by sed:

   sed -i -e 's/dest.dest.ssa/def/g' src/**/*.h src/**/*.c src/**/*.cpp

There were a few manual fixups required in the nir_legacy.c and
nir_from_ssa.c as nir_legacy_reg and nir_parallel_copy_entry both have a
similar pattern.

Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24674>
2023-08-14 21:22:53 +00:00
Alyssa Rosenzweig
09d31922de nir: Drop "SSA" from NIR language
Everything is SSA now.

   sed -e 's/nir_ssa_def/nir_def/g' \
       -e 's/nir_ssa_undef/nir_undef/g' \
       -e 's/nir_ssa_scalar/nir_scalar/g' \
       -e 's/nir_src_rewrite_ssa/nir_src_rewrite/g' \
       -e 's/nir_gather_ssa_types/nir_gather_types/g' \
       -i $(git grep -l nir | grep -v relnotes)

   git mv src/compiler/nir/nir_gather_ssa_types.c \
          src/compiler/nir/nir_gather_types.c

   ninja -C build/ clang-format
   cd src/compiler/nir && find *.c *.h -type f -exec clang-format -i \{} \;

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24585>
2023-08-12 16:44:41 -04:00
Faith Ekstrand
777d336b1f nir: clang-format src/compiler/nir/*.[ch]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24382>
2023-08-12 19:27:28 +00:00
Ian Romanick
de60b463d7 nir/algebraic: Simplify various trivial bfi
These are mostly just obvious patterns that somebody will eventually
want to add.

DG2, Tiger Lake, Ice Lake, Skylake, Broadwell, and Haswell had similar
results (Ice Lake shown)
total instructions in shared programs: 20570033 -> 20570026 (<.01%)
instructions in affected programs: 7363 -> 7356 (-0.10%)
helped: 6 / HURT: 0

total cycles in shared programs: 902118781 -> 902118854 (<.01%)
cycles in affected programs: 419132 -> 419205 (0.02%)
helped: 4 / HURT: 2

DG2, Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown)
Totals:
Instrs: 152819500 -> 152819380 (-0.00%)
Cycles: 15014627187 -> 15014624437 (-0.00%)

Totals from 115 (0.02% of 662497) affected shaders:
Instrs: 28963 -> 28843 (-0.41%)
Cycles: 404582 -> 401832 (-0.68%)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>
2023-06-14 18:49:53 +00:00
Ian Romanick
7ef45e661f intel/fs: Add constant propagation for ADD3
v2: Require that the constant value be representable as either uint16_t
or int16_t. Suggested by Matt.

v3: Remove redundant patterns. Noticed by Matt.

shader-db:

DG2
total instructions in shared programs: 23103767 -> 23103577 (<.01%)
instructions in affected programs: 51822 -> 51632 (-0.37%)
helped: 98 / HURT: 15

total cycles in shared programs: 842347714 -> 842380017 (<.01%)
cycles in affected programs: 1942595 -> 1974898 (1.66%)
helped: 97 / HURT: 32

Nearly all of the affected shaders (around 9,900) are shaders in
Cyberpunk 2077. It's about an even split between vertex and fragment
shaders. The majority of the remaining affected shaders (3,600) are
from Strange Brigade. This was also a nearly even split between
fragment and vertex.

All but two of the lost shaders are SIMD32 fragment shaders in
Cyberpunk 2077. The other two are SIMD32 fragment shaders in Dota2.

fossil-db:

DG2
Instructions in all programs: 196379107 -> 196248608 (-0.1%)
helped: 13467 / HURT: 1210

Cycles in all programs: 13931355281 -> 13929955971 (-0.0%)
helped: 11801 / HURT: 2922

Lost: 90

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
2023-06-06 06:10:53 +00:00
Alyssa Rosenzweig
7f6491b76d nir: Combine if_uses with instruction uses
Every nir_ssa_def is part of a chain of uses, implemented with doubly linked
lists.  That means each requires 2 * 64-bit = 16 bytes per def, which is
memory intensive. Together they require 32 bytes per def. Not cool.

To cut that memory use in half, we can combine the two linked lists into a
single use list that contains both regular instruction uses and if-uses. To do
this, we augment the nir_src with a boolean "is_if", and reimplement the
abstract if-uses operations on top of that list. That boolean should fit into
the padding already in nir_src so should not actually affect memory use, and in
the future we sneak it into the bottom bit of a pointer.

However, this creates a new inefficiency: now iterating over regular uses
separate from if-uses is (nominally) more expensive. It turns out virtually
every caller of nir_foreach_if_use(_safe) also calls nir_foreach_use(_safe)
immediately before, so we rewrite most of the callers to instead call a new
single `nir_foreach_use_including_if(_safe)` which predicates the logic based on
`src->is_if`. This should mitigate the performance difference.

There's a bit of churn, but this is largely a mechanical set of changes.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22343>
2023-04-07 23:48:03 +00:00
Rhys Perry
368be87255 nir/algebraic: shrink 64-bit bitwise operations with 0/-1 constant half
fossil-db (navi21):
Totals from 457 (0.34% of 135636) affected shaders:
Instrs: 259349 -> 250383 (-3.46%)
CodeSize: 1411976 -> 1369136 (-3.03%)
Latency: 2175961 -> 2148158 (-1.28%)
InvThroughput: 502206 -> 490244 (-2.38%)
Copies: 15238 -> 15232 (-0.04%); split: -0.07%, +0.03%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19748>
2022-11-21 17:34:46 +00:00