Commit graph

72 commits

Author SHA1 Message Date
Georg Lehmann
045ddb992a nir/opt_algebraic: optimize 16bit vec2 comparison followed by b2i16 using usub_sat
Helps vectorized emulated fp16 -> fp8 conversions

No Foz-DB changes.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35876>
2025-07-03 20:08:39 +00:00
Alyssa Rosenzweig
6efe557718 nir/search_helpers: add has_multiple_uses helper
heuristic for the next patch.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35720>
2025-06-26 16:41:55 +00:00
Marek Olšák
c3034fa82c amd: replace most u_bit_consecutive* with BITFIELD_MASK/RANGE
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35346>
2025-06-04 17:46:38 +00:00
Georg Lehmann
b386659588 nir/opt_algebraic: create ubfe from (a & mask) >> c
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Foz-DB Navi21:
Totals from 917 (1.16% of 79188) affected shaders:
Instrs: 2549482 -> 2544997 (-0.18%); split: -0.18%, +0.00%
CodeSize: 13781648 -> 13763616 (-0.13%); split: -0.13%, +0.00%
Latency: 24832087 -> 24825199 (-0.03%); split: -0.04%, +0.01%
InvThroughput: 5921339 -> 5914799 (-0.11%); split: -0.12%, +0.01%
VClause: 59910 -> 59898 (-0.02%); split: -0.02%, +0.00%
SClause: 62294 -> 62293 (-0.00%)
Copies: 221015 -> 220988 (-0.01%); split: -0.02%, +0.01%
VALU: 1717280 -> 1713332 (-0.23%); split: -0.23%, +0.00%
SALU: 359390 -> 358910 (-0.13%)
VMEM: 101966 -> 101924 (-0.04%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33455>
2025-03-14 11:15:04 +00:00
Georg Lehmann
5da76df4cd nir/search_helpers: check tex source type in is_only_used_as_float
Foz-DB Navi21:
Totals from 164 (0.21% of 79377) affected shaders:
Instrs: 197477 -> 197035 (-0.22%); split: -0.23%, +0.01%
CodeSize: 1052944 -> 1051140 (-0.17%); split: -0.18%, +0.01%
VGPRs: 8104 -> 8080 (-0.30%)
Latency: 1115663 -> 1115567 (-0.01%); split: -0.06%, +0.05%
InvThroughput: 265822 -> 265158 (-0.25%); split: -0.26%, +0.01%
VClause: 3792 -> 3789 (-0.08%); split: -0.11%, +0.03%
SClause: 5738 -> 5744 (+0.10%); split: -0.02%, +0.12%
Copies: 12223 -> 12200 (-0.19%); split: -0.53%, +0.34%
PreVGPRs: 6807 -> 6801 (-0.09%); split: -0.15%, +0.06%
VALU: 139206 -> 138785 (-0.30%); split: -0.31%, +0.01%
SALU: 27852 -> 27853 (+0.00%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33674>
2025-02-24 16:34:53 +00:00
Georg Lehmann
3d8585e4fc nir/search_helpers: look through vecs in is_only_used_as_float
Will be useful with the next commit, or for backends that don't lower
alu to scalar.

No changes on Navi21.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33674>
2025-02-24 16:34:53 +00:00
Mel Henning
0470643047 nak,nir: Add 32-bit nir_op_lea_nv and use it
Changes code size by -0.80% on shaderdb.

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32517>
2025-02-13 17:36:41 +00:00
Alyssa Rosenzweig
be049e1c14 nir/search_helpers: handle bcsel in is_only_used_as_float
this lets algebraic see through chains of instructions.

v2: Limit recursion depth (Georg).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> [v1]
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32398>
2024-12-05 10:58:51 +00:00
Job Noorman
1333af5d77 nir/search: add is_only_used_by_{iand,ior} helpers
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32181>
2024-11-28 06:19:59 +00:00
Job Noorman
a8c947df9a nir/search: make is_only_used_by_iadd reusable
The algorithm is exactly the same for other opcodes so we don't have to
have to copy paste it.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32181>
2024-11-28 06:19:59 +00:00
Rhys Perry
b8c8482dbb nir/algebraic: add ddxy to is_only_used_as_float
The sources for these intrinsics are floating point.

fossil-db (navi21):
Totals from 67 (0.08% of 79395) affected shaders:
MaxWaves: 1128 -> 1116 (-1.06%)
Instrs: 113552 -> 113319 (-0.21%); split: -0.21%, +0.01%
CodeSize: 595248 -> 593360 (-0.32%)
VGPRs: 4344 -> 4392 (+1.10%)
Latency: 578158 -> 577526 (-0.11%); split: -0.18%, +0.07%
InvThroughput: 170150 -> 169908 (-0.14%); split: -0.23%, +0.09%
SClause: 3787 -> 3780 (-0.18%)
Copies: 4305 -> 4294 (-0.26%); split: -0.51%, +0.26%
PreVGPRs: 3883 -> 3925 (+1.08%)
VALU: 90007 -> 89774 (-0.26%); split: -0.27%, +0.01%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32145>
2024-11-21 14:50:45 +00:00
Timur Kristóf
be68aeafdc nir/opt_algebraic: Add various bitfield extract patterns.
v2 (Georg Lehmann):
- fixed incorrect imin in ubfe_ubfe
- simplied outer_bits of ushr((ubfe, ...), ...) opt
- added is_used_once to iand(ushr(), ...) opt to improve stats

For-DB Navi21:
Totals from 3309 (4.18% of 79206) affected shaders:
Instrs: 5295291 -> 5282128 (-0.25%); split: -0.28%, +0.03%
CodeSize: 28299320 -> 28298456 (-0.00%); split: -0.07%, +0.06%
Latency: 51566173 -> 51521923 (-0.09%); split: -0.09%, +0.01%
InvThroughput: 13222050 -> 13204557 (-0.13%); split: -0.14%, +0.01%
VClause: 116451 -> 116458 (+0.01%); split: -0.02%, +0.02%
SClause: 160356 -> 160324 (-0.02%); split: -0.03%, +0.01%
Copies: 424152 -> 423670 (-0.11%); split: -0.20%, +0.09%
Branches: 156701 -> 156192 (-0.32%); split: -0.33%, +0.01%
PreSGPRs: 168507 -> 168500 (-0.00%); split: -0.02%, +0.01%
PreVGPRs: 151477 -> 151474 (-0.00%)
VALU: 3486077 -> 3476675 (-0.27%); split: -0.31%, +0.04%
SALU: 786467 -> 783109 (-0.43%); split: -0.45%, +0.03%
VMEM: 188035 -> 188060 (+0.01%)
SMEM: 259632 -> 259630 (-0.00%)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31852>
2024-10-29 10:51:09 +00:00
Christian Gmeiner
87786a7a7e nak: Move imad late optimization to nir
It is more or less just a code move, but I touched
is_only_used_by_iadd(..) to match the style of the other functions in
that file.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30099>
2024-07-12 05:54:46 +00:00
Georg Lehmann
98cc57bccb nir/optimize cmp(a, -0.0)
+0.0 can use an inline constant for AMD hardware,
-0.0 needs a literal.

Foz-DB Navi21:
Totals from 1014 (1.28% of 79395) affected shaders:
Instrs: 3037490 -> 3036849 (-0.02%); split: -0.02%, +0.00%
CodeSize: 17060228 -> 17051276 (-0.05%); split: -0.05%, +0.00%
Latency: 45916788 -> 45916600 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 12982201 -> 12982187 (-0.00%); split: -0.00%, +0.00%
VClause: 79475 -> 79478 (+0.00%)
SClause: 119935 -> 119934 (-0.00%); split: -0.00%, +0.00%
Copies: 301641 -> 300964 (-0.22%); split: -0.23%, +0.00%
PreSGPRs: 59155 -> 59144 (-0.02%)
VALU: 2032016 -> 2032034 (+0.00%)
SALU: 386424 -> 385729 (-0.18%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29467>
2024-06-27 08:12:30 +00:00
Ian Romanick
4834df82e2 nir/algebraic: More patterns to generate iadd3
I noticed some shaders with patterns similar to these while working on
cooperative matrix lowering.

Meteor Lake and DG2 are the only platforms that support iadd3, so there
were no shader-db or fossil-db changes on any other platforms.

shader-db:

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total instructions in shared programs: 19869445 -> 19868343 (<.01%)
instructions in affected programs: 419426 -> 418324 (-0.26%)
helped: 913 / HURT: 2

total cycles in shared programs: 936010029 -> 935909811 (-0.01%)
cycles in affected programs: 31746523 -> 31646305 (-0.32%)
helped: 495 / HURT: 356

LOST:   10
GAINED: 12

fossil-db:

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 154514596 -> 154505466 (-0.01%); split: -0.01%, +0.00%
Cycle count: 17540226067 -> 17436266198 (-0.59%); split: -0.63%, +0.04%
Spill count: 146887 -> 146886 (-0.00%)
Fill count: 272499 -> 272489 (-0.00%); split: -0.01%, +0.00%
Max live registers: 32634290 -> 32634739 (+0.00%); split: -0.00%, +0.00%
Max dispatch width: 5550128 -> 5550368 (+0.00%)

Totals from 4401 (0.70% of 632560) affected shaders:
Instrs: 3095239 -> 3086109 (-0.29%); split: -0.30%, +0.00%
Cycle count: 7327352564 -> 7223392695 (-1.42%); split: -1.51%, +0.10%
Spill count: 28105 -> 28104 (-0.00%)
Fill count: 45830 -> 45820 (-0.02%); split: -0.04%, +0.02%
Max live registers: 264376 -> 264825 (+0.17%); split: -0.05%, +0.22%
Max dispatch width: 43768 -> 44008 (+0.55%)

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>
2024-05-31 09:13:23 -07:00
Ian Romanick
f1b941aaec nir/search: Refactor is_16_bits
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Suggested-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>
2024-05-31 09:13:23 -07:00
Ian Romanick
6e53be2a0a nir/search: Fix is_16_bits for vectors
Require that all elements of a vector be representable as either
int16_t or uint16_t.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Fixes: 7ef45e661f ("intel/fs: Add constant propagation for ADD3")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>
2024-05-31 09:13:23 -07:00
Job Noorman
96c2fe3e1a nir: add search helper is_only_used_by_if
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27411>
2024-03-01 13:45:11 +00:00
Alyssa Rosenzweig
c39896b17b nir: Use getters for nir_src::parent_*
First, we need to give the parent_instr field a unique name to be able to
replace with a helper.  We have parent_instr fields for both nir_src and
nir_def, so let's rename nir_src::parent_instr in preparation for rework.

This was done with a combination of sed and manual fix-ups.

Then we use semantic patches plus manual fixups:

    @@
    expression s;
    @@

    -s->renamed_parent_instr
    +nir_src_parent_instr(s)

    @@
    expression s;
    @@

    -s.renamed_parent_instr
    +nir_src_parent_instr(&s)

    @@
    expression s;
    @@

    -s->parent_if
    +nir_src_parent_if(s)

    @@
    expression s;
    @@

    -s.renamed_parent_if
    +nir_src_parent_if(&s)

    @@
    expression s;
    @@

    -s->is_if
    +nir_src_is_if(s)

    @@
    expression s;
    @@

    -s.is_if
    +nir_src_is_if(&s)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24671>
2023-10-10 04:58:05 -04:00
Marek Olšák
1ac379c4a0 nir/algebraic: collapse ALU opcodes sourcing NaN
Undef will be replaced by NaN whenever it leads to elimination of FP
instructions. This implements the elimination part.

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24792>
2023-08-19 14:18:52 -04:00
Faith Ekstrand
6c1d32581a nir: Drop nir_alu_dest
Instead, we replace it directly with nir_def.  We could replace it with
nir_dest but the next commit gets rid of that so this avoids unnecessary
churn.  Most of this commit was generated by sed:

   sed -i -e 's/dest.dest.ssa/def/g' src/**/*.h src/**/*.c src/**/*.cpp

There were a few manual fixups required in the nir_legacy.c and
nir_from_ssa.c as nir_legacy_reg and nir_parallel_copy_entry both have a
similar pattern.

Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24674>
2023-08-14 21:22:53 +00:00
Alyssa Rosenzweig
09d31922de nir: Drop "SSA" from NIR language
Everything is SSA now.

   sed -e 's/nir_ssa_def/nir_def/g' \
       -e 's/nir_ssa_undef/nir_undef/g' \
       -e 's/nir_ssa_scalar/nir_scalar/g' \
       -e 's/nir_src_rewrite_ssa/nir_src_rewrite/g' \
       -e 's/nir_gather_ssa_types/nir_gather_types/g' \
       -i $(git grep -l nir | grep -v relnotes)

   git mv src/compiler/nir/nir_gather_ssa_types.c \
          src/compiler/nir/nir_gather_types.c

   ninja -C build/ clang-format
   cd src/compiler/nir && find *.c *.h -type f -exec clang-format -i \{} \;

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24585>
2023-08-12 16:44:41 -04:00
Faith Ekstrand
777d336b1f nir: clang-format src/compiler/nir/*.[ch]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24382>
2023-08-12 19:27:28 +00:00
Ian Romanick
de60b463d7 nir/algebraic: Simplify various trivial bfi
These are mostly just obvious patterns that somebody will eventually
want to add.

DG2, Tiger Lake, Ice Lake, Skylake, Broadwell, and Haswell had similar
results (Ice Lake shown)
total instructions in shared programs: 20570033 -> 20570026 (<.01%)
instructions in affected programs: 7363 -> 7356 (-0.10%)
helped: 6 / HURT: 0

total cycles in shared programs: 902118781 -> 902118854 (<.01%)
cycles in affected programs: 419132 -> 419205 (0.02%)
helped: 4 / HURT: 2

DG2, Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown)
Totals:
Instrs: 152819500 -> 152819380 (-0.00%)
Cycles: 15014627187 -> 15014624437 (-0.00%)

Totals from 115 (0.02% of 662497) affected shaders:
Instrs: 28963 -> 28843 (-0.41%)
Cycles: 404582 -> 401832 (-0.68%)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>
2023-06-14 18:49:53 +00:00
Ian Romanick
7ef45e661f intel/fs: Add constant propagation for ADD3
v2: Require that the constant value be representable as either uint16_t
or int16_t. Suggested by Matt.

v3: Remove redundant patterns. Noticed by Matt.

shader-db:

DG2
total instructions in shared programs: 23103767 -> 23103577 (<.01%)
instructions in affected programs: 51822 -> 51632 (-0.37%)
helped: 98 / HURT: 15

total cycles in shared programs: 842347714 -> 842380017 (<.01%)
cycles in affected programs: 1942595 -> 1974898 (1.66%)
helped: 97 / HURT: 32

Nearly all of the affected shaders (around 9,900) are shaders in
Cyberpunk 2077. It's about an even split between vertex and fragment
shaders. The majority of the remaining affected shaders (3,600) are
from Strange Brigade. This was also a nearly even split between
fragment and vertex.

All but two of the lost shaders are SIMD32 fragment shaders in
Cyberpunk 2077. The other two are SIMD32 fragment shaders in Dota2.

fossil-db:

DG2
Instructions in all programs: 196379107 -> 196248608 (-0.1%)
helped: 13467 / HURT: 1210

Cycles in all programs: 13931355281 -> 13929955971 (-0.0%)
helped: 11801 / HURT: 2922

Lost: 90

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
2023-06-06 06:10:53 +00:00
Alyssa Rosenzweig
7f6491b76d nir: Combine if_uses with instruction uses
Every nir_ssa_def is part of a chain of uses, implemented with doubly linked
lists.  That means each requires 2 * 64-bit = 16 bytes per def, which is
memory intensive. Together they require 32 bytes per def. Not cool.

To cut that memory use in half, we can combine the two linked lists into a
single use list that contains both regular instruction uses and if-uses. To do
this, we augment the nir_src with a boolean "is_if", and reimplement the
abstract if-uses operations on top of that list. That boolean should fit into
the padding already in nir_src so should not actually affect memory use, and in
the future we sneak it into the bottom bit of a pointer.

However, this creates a new inefficiency: now iterating over regular uses
separate from if-uses is (nominally) more expensive. It turns out virtually
every caller of nir_foreach_if_use(_safe) also calls nir_foreach_use(_safe)
immediately before, so we rewrite most of the callers to instead call a new
single `nir_foreach_use_including_if(_safe)` which predicates the logic based on
`src->is_if`. This should mitigate the performance difference.

There's a bit of churn, but this is largely a mechanical set of changes.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22343>
2023-04-07 23:48:03 +00:00
Rhys Perry
368be87255 nir/algebraic: shrink 64-bit bitwise operations with 0/-1 constant half
fossil-db (navi21):
Totals from 457 (0.34% of 135636) affected shaders:
Instrs: 259349 -> 250383 (-3.46%)
CodeSize: 1411976 -> 1369136 (-3.03%)
Latency: 2175961 -> 2148158 (-1.28%)
InvThroughput: 502206 -> 490244 (-2.38%)
Copies: 15238 -> 15232 (-0.04%); split: -0.07%, +0.03%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19748>
2022-11-21 17:34:46 +00:00
Alyssa Rosenzweig
45a111c21c nir/opt_algebraic: Fuse c - a * b to FMA
Algebraically it is clear that

   -(a * b) + c = (-a) * b + c = fma(-a, b, c)

But this is not clear from the NIR

   ('fadd', ('fneg', ('fmul', a, b)), c)

Add rules to handle this case specially. Note we don't necessarily want
to  solve this by pushing fneg into fmul, because the rule opt_algebraic
(not the late part where FMA fusing happens) specifically pulls fneg out
of fmul to push fneg up multiplication chains.

Noticed in the big glmark2 "terrain" shader, which has a cycle count
reduced by 22% on Mali-G57 thanks to having this pattern a ton and being
FMA bound.

BEFORE: 1249 inst, 16.015625 cycles, 16.015625 fma, ... 632 quadwords
AFTER: 997 inst, 12.437500 cycles, .... 504 quadwords

Results on the same shader on AGX are also quite dramatic:

BEFORE: 1294 inst, 8600 bytes, 50 halfregs, ...
AFTER: 1154 inst, 8040 bytes, 50 halfregs, ...

Similar rules apply for fabs.

v2: Use a loop over the bit sizes (suggested by Emma).

shader-db on Valhall (open + small subset of closed), results on Bifrost
are similar:

total instructions in shared programs: 167975 -> 164970 (-1.79%)
instructions in affected programs: 92642 -> 89637 (-3.24%)
helped: 492
HURT: 25
helped stats (abs) min: 1.0 max: 252.0 x̄: 6.25 x̃: 3
helped stats (rel) min: 0.30% max: 20.18% x̄: 3.21% x̃: 2.91%
HURT stats (abs)   min: 1.0 max: 5.0 x̄: 2.80 x̃: 3
HURT stats (rel)   min: 0.46% max: 9.09% x̄: 3.89% x̃: 3.37%
95% mean confidence interval for instructions value: -6.95 -4.68
95% mean confidence interval for instructions %-change: -3.08% -2.65%
Instructions are helped.

total cycles in shared programs: 10556.89 -> 10538.98 (-0.17%)
cycles in affected programs: 265.56 -> 247.66 (-6.74%)
helped: 88
HURT: 2
helped stats (abs) min: 0.015625 max: 3.578125 x̄: 0.20 x̃: 0
helped stats (rel) min: 0.65% max: 22.34% x̄: 5.65% x̃: 4.25%
HURT stats (abs)   min: 0.0625 max: 0.0625 x̄: 0.06 x̃: 0
HURT stats (rel)   min: 8.33% max: 12.50% x̄: 10.42% x̃: 10.42%
95% mean confidence interval for cycles value: -0.28 -0.12
95% mean confidence interval for cycles %-change: -6.30% -4.30%
Cycles are helped.

total fma in shared programs: 1582.42 -> 1535.06 (-2.99%)
fma in affected programs: 871.58 -> 824.22 (-5.43%)
helped: 502
HURT: 9
helped stats (abs) min: 0.015625 max: 3.578125 x̄: 0.09 x̃: 0
helped stats (rel) min: 0.60% max: 25.00% x̄: 5.46% x̃: 4.82%
HURT stats (abs)   min: 0.015625 max: 0.0625 x̄: 0.03 x̃: 0
HURT stats (rel)   min: 4.35% max: 12.50% x̄: 6.22% x̃: 4.35%
95% mean confidence interval for fma value: -0.11 -0.08
95% mean confidence interval for fma %-change: -5.58% -4.93%
Fma are helped.

total cvt in shared programs: 665.55 -> 665.95 (0.06%)
cvt in affected programs: 61.72 -> 62.12 (0.66%)
helped: 33
HURT: 43
helped stats (abs) min: 0.015625 max: 0.359375 x̄: 0.04 x̃: 0
helped stats (rel) min: 1.01% max: 25.00% x̄: 6.68% x̃: 4.35%
HURT stats (abs)   min: 0.015625 max: 0.109375 x̄: 0.04 x̃: 0
HURT stats (rel)   min: 0.78% max: 38.46% x̄: 10.85% x̃: 6.90%
95% mean confidence interval for cvt value: -0.01 0.02
95% mean confidence interval for cvt %-change: 0.23% 6.24%
Inconclusive result (value mean confidence interval includes 0).

total quadwords in shared programs: 93376 -> 91736 (-1.76%)
quadwords in affected programs: 25376 -> 23736 (-6.46%)
helped: 169
HURT: 1
helped stats (abs) min: 8.0 max: 128.0 x̄: 9.75 x̃: 8
helped stats (rel) min: 1.52% max: 33.33% x̄: 8.35% x̃: 8.00%
HURT stats (abs)   min: 8.0 max: 8.0 x̄: 8.00 x̃: 8
HURT stats (rel)   min: 25.00% max: 25.00% x̄: 25.00% x̃: 25.00%
95% mean confidence interval for quadwords value: -11.18 -8.11
95% mean confidence interval for quadwords %-change: -8.95% -7.36%
Quadwords are helped.

total threads in shared programs: 4697 -> 4701 (0.09%)
threads in affected programs: 4 -> 8 (100.00%)
helped: 4
HURT: 0
helped stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for threads value: 1.00 1.00
95% mean confidence interval for threads %-change: 100.00% 100.00%
Threads are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Marek Ol<C5><A1><C3><A1>k <marek.olsak@amd.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19312>
2022-11-01 22:39:45 -04:00
Alyssa Rosenzweig
ac2964dfbd nir: Be smarter fusing ffma
If there is a single use of fmul, and that single use is fadd, it makes
sense to fuse ffma, as we already do. However, if there are multiple
uses, fusing may impede code gen. Consider the source fragment:

   a = fmul(x, y)
   b = fadd(a, z)
   c = fmin(a, t)
   d = fmax(b, c)

The fmul has two uses. The current ffma fusing is greedy and will
produce the following "optimized" code.

   a = fmul(x, y)
   b = ffma(x, y, z)
   c = fmin(a, t)
   d = fmax(b, c)

Actually, this code is worse! Instead of 1 fmul + 1 fadd, we now have 1
fmul + 1 ffma. In effect, two multiplies (and a fused add) instead of
one multiply and an add. Depending on the ISA, that could impede
scheduling or increase code size. It can also increase register
pressure, extending the live range.

It's tempting to gate on is_used_once, but that would hurt in cases
where we really do fuse everything, e.g.:

   a = fmul(x, y)
   b = fadd(a, z)
   c = fadd(a, t)

For ISAs that fuse ffma, we expect that 2 ffma is faster than 1 fmul + 2
fadd. So what we really want is to fuse ffma iff the fmul will get
deleted. That occurs iff all uses of the fmul are fadd and will
themselves get fused to ffma, leaving fmul to get dead code eliminated.
That's easy to implement with a new NIR search helper, checking that all
uses are fadd.

shader-db results on Mali-G57 [open shader-db + subset of closed]:

total instructions in shared programs: 179491 -> 178991 (-0.28%)
instructions in affected programs: 36862 -> 36362 (-1.36%)
helped: 190
HURT: 27

total cycles in shared programs: 10573.20 -> 10571.75 (-0.01%)
cycles in affected programs: 72.02 -> 70.56 (-2.02%)
helped: 28
HURT: 1

total fma in shared programs: 1590.47 -> 1582.61 (-0.49%)
fma in affected programs: 319.95 -> 312.09 (-2.46%)
helped: 194
HURT: 1

total cvt in shared programs: 812.98 -> 813.03 (<.01%)
cvt in affected programs: 118.53 -> 118.58 (0.04%)
helped: 65
HURT: 81

total quadwords in shared programs: 98968 -> 98840 (-0.13%)
quadwords in affected programs: 2960 -> 2832 (-4.32%)
helped: 20
HURT: 4

total threads in shared programs: 4693 -> 4697 (0.09%)
threads in affected programs: 4 -> 8 (100.00%)
helped: 4
HURT: 0

v2: Update trace checksums for virgl due to numerical differences.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18814>
2022-10-15 17:47:31 +00:00
Rhys Perry
c23411a970 nir/algebraic: optimize bits=umin(bits, 32-(offset&0x1f))
Optimizes patterns which are created by recent versions of vkd3d-proton,
when constant folding doesn't eliminate it entirely:
- ubitfield_extract(value, offset, umin(bits, 32-(offset&0x1f)))
- ibitfield_extract(value, offset, umin(bits, 32-(offset&0x1f)))
- bitfield_insert(base, insert, offset, umin(bits, 32-(offset&0x1f)))

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13225>
2022-09-13 20:36:06 +00:00
Ian Romanick
97ce3a56bd nir/search: Constify instr parameter to nir_search_expression::cond
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13999>
2022-02-10 18:15:39 +00:00
Rhys Perry
495debebad nir/algebraic: optimize expressions using fmulz/ffmaz
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436>
2022-01-20 22:54:42 +00:00
Rhys Perry
14b8227083 nir: add some missing nir_alu_type_get_base_type
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436>
2022-01-20 22:54:42 +00:00
Rhys Perry
403ae3b48e nir/algebraic: optimize more 64-bit imul with constant source
Two 64-bit shifts and an addition are usually faster than the several
multiplications nir_lower_int64 creates.

No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14227>
2021-12-17 18:51:24 +00:00
Rhys Perry
a2d8c5b26d nir/algebraic: optimize a*#b & -4
fossil-db (Sienna Cichlid):
Totals from 611 (0.47% of 128647) affected shaders:
CodeSize: 3096680 -> 3090976 (-0.18%)
Instrs: 570494 -> 569249 (-0.22%)
Latency: 5765865 -> 5759619 (-0.11%)
InvThroughput: 969840 -> 967608 (-0.23%)
VClause: 9690 -> 9688 (-0.02%)
Copies: 42884 -> 42894 (+0.02%); split: -0.01%, +0.03%
PreVGPRs: 28290 -> 28288 (-0.01%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13752>
2021-12-03 13:41:07 +00:00
Ian Romanick
839495efc6 nir/algebraic: Add lowering for dot_4x8 instructions
v2: Fix copy-and-paste bugs in lowering patterns.

v3: Add has_sudot_4x8 flag.  Requested by Rhys.

v4: Since the names of the opcodes changed from dp4 to dot_4x8, also
change the names of the lowering helpers.  Suggested by Jason.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
2021-08-24 19:58:57 +00:00
Rhys Perry
2bb49e4587 nir/search: don't consider INT_MIN a negative power-of-two
ineg(INT_MIN)/iabs(INT_MIN) won't work as expected.

No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>
2021-08-09 11:00:39 +00:00
Ian Romanick
49177b9e2f nir/algebraic: Tautology replacements require sources be numbers
It seems worth the small amount of damage to give an extra cushion of
not having to debug problems later.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

All Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 21043197 -> 21043359 (<.01%)
instructions in affected programs: 4409 -> 4571 (3.67%)
helped: 0
HURT: 25
HURT stats (abs)   min: 1 max: 16 x̄: 6.48 x̃: 5
HURT stats (rel)   min: 0.39% max: 15.38% x̄: 4.59% x̃: 4.40%
95% mean confidence interval for instructions value: 4.37 8.59
95% mean confidence interval for instructions %-change: 2.93% 6.26%
Instructions are HURT.

total cycles in shared programs: 856175986 -> 856176921 (<.01%)
cycles in affected programs: 58908 -> 59843 (1.59%)
helped: 0
HURT: 25
HURT stats (abs)   min: 7 max: 70 x̄: 37.40 x̃: 38
HURT stats (rel)   min: 0.27% max: 5.63% x̄: 1.87% x̃: 1.39%
95% mean confidence interval for cycles value: 31.11 43.69
95% mean confidence interval for cycles %-change: 1.35% 2.39%
Cycles are HURT.

No fossil-db changes on any Intel platform.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
2021-05-20 01:39:35 +00:00
Ian Romanick
7019cd84c0 nir/search: Use range analysis for is_finite
There are only a couple patterns that use is_finite, so the changes
aren't huge.  Mostly shaders from Batman Arkham City and a few shaders
from Shadow of the Tomb Raider were affected.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

Tiger Lake
Instructions in all programs: 160902591 -> 160902489 (-0.0%)
SENDs in all programs: 6812270 -> 6812270 (+0.0%)
Loops in all programs: 38225 -> 38225 (+0.0%)
Cycles in all programs: 7429003266 -> 7428992369 (-0.0%)
Spills in all programs: 192582 -> 192582 (+0.0%)
Fills in all programs: 304539 -> 304539 (+0.0%)

Ice Lake
Instructions in all programs: 145301634 -> 145301460 (-0.0%)
SENDs in all programs: 6863890 -> 6863890 (+0.0%)
Loops in all programs: 38219 -> 38219 (+0.0%)
Cycles in all programs: 8798589772 -> 8798575869 (-0.0%)
Spills in all programs: 216880 -> 216880 (+0.0%)
Fills in all programs: 334250 -> 334250 (+0.0%)

Skylake
Instructions in all programs: 135892010 -> 135891836 (-0.0%)
SENDs in all programs: 6802916 -> 6802916 (+0.0%)
Loops in all programs: 38216 -> 38216 (+0.0%)
Cycles in all programs: 8442597324 -> 8442583202 (-0.0%)
Spills in all programs: 194839 -> 194839 (+0.0%)
Fills in all programs: 301116 -> 301116 (+0.0%)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9108>
2021-03-11 22:00:30 +00:00
Ian Romanick
aa5d38decd nir/range_analysis: Add "is a number" range analysis tracking
This commit is necessary to support "nir/range_analysis: Fix analysis of
fmin and fmax with NaN".

No shader-db or fossil-db changes on any Intel platform.

v2: Pack and unpack is_a_number.

v3: Don't set is_a_number of integer constants.  The bit pattern might
be NaN.

v4: Update handling of b2i32.  intBitsToFloat(int(true)) is
1.401298464324817e-45.  Return a value consistent with that.

Fixes: 405de7ccb6 ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9108>
2021-03-11 22:00:30 +00:00
Ian Romanick
c393ae9d84 nir/search: Constify instruction parameter to search helpers
The search helps must *never* modify the instruction passed in, so let
the compiler enforce this.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9378>
2021-03-03 18:32:14 +00:00
Jason Ekstrand
f9b3be09e1 nir/algebraic: Clean up up-cast of down-cast when we can
There are a bunch of cases where we can pretty quickly determine that
the high bits don't matter.  In these cases, delete the casts.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8872>
2021-02-16 16:36:31 +00:00
Rhys Perry
2849f0b5aa nir/algebraic: optimize out exact a*1.0 if it's used only as a float
fossil-db (GFX10):
Totals from 10180 (7.30% of 139391) affected shaders:
SGPRs: 549392 -> 549448 (+0.01%); split: -0.00%, +0.01%
VGPRs: 243228 -> 243008 (-0.09%); split: -0.11%, +0.02%
CodeSize: 12939080 -> 12603996 (-2.59%); split: -2.59%, +0.00%
MaxWaves: 186948 -> 186976 (+0.01%)
Instrs: 2497266 -> 2414648 (-3.31%)

fossil-db (GFX10.3):
Totals from 10180 (7.30% of 139391) affected shaders:
SGPRs: 549672 -> 549280 (-0.07%); split: -0.23%, +0.16%
VGPRs: 289296 -> 283672 (-1.94%); split: -2.83%, +0.88%
CodeSize: 13920180 -> 13255560 (-4.77%); split: -4.77%, +0.00%
MaxWaves: 151789 -> 153165 (+0.91%)
Instrs: 2756978 -> 2671517 (-3.10%); split: -3.10%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5523>
2021-01-26 11:36:13 +00:00
Ian Romanick
55621c6d1c nir/algebraic: Add some compare-with-zero optimizations that are exact
This prevents some fossil-db regressions in "spir-v: Mark floating point
comparisons exact".

v2: Note that the patterns and replacements produce the same value when
isnan(b).  Suggested by Caio.

v3: Use C99 isfinite() instead of (obsolete) BSD finite().  Fixes
various Windows builds.

No fossil-db changes on any Inetl platform, Vega, or Polaris10.

All Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 20908670 -> 20908672 (<.01%)
instructions in affected programs: 69 -> 71 (2.90%)
helped: 0
HURT: 1

total cycles in shared programs: 473515288 -> 473513940 (<.01%)
cycles in affected programs: 4942 -> 3594 (-27.28%)
helped: 2
HURT: 0

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>
2021-01-05 02:07:09 +00:00
Rhys Perry
89c4bba8bc nir/algebraic: better propagate constants up fadd chains
Make the optimization create more mad-friendly code if the order of the
fadd's operands is unlucky.

fossil-db (Navi):
Totals from 9259 (8.07% of 114665) affected shaders:
SGPRs: 615991 -> 616191 (+0.03%); split: -0.05%, +0.08%
VGPRs: 442184 -> 443568 (+0.31%); split: -0.10%, +0.41%
CodeSize: 32674876 -> 32625572 (-0.15%); split: -0.17%, +0.02%
MaxWaves: 108560 -> 108152 (-0.38%); split: +0.07%, -0.44%
Instrs: 6126473 -> 6120463 (-0.10%); split: -0.13%, +0.03%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5631>
2020-11-03 14:56:00 +00:00
Rhys Perry
5476d18183 nir/algebraic: add patterns for a >> #b << #b
Fixes compilation of a Battlefront 2 shader with ACO by removing VGPR
spilling. The reassociation makes it worse on LLVM though.

pipeline-db (ACO):
Totals from affected shaders:
SGPRS: 10704 -> 10688 (-0.15 %)
VGPRS: 18736 -> 18528 (-1.11 %)
Spilled SGPRs: 70 -> 70 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 909696 -> 885796 (-2.63 %) bytes
LDS: 225 -> 225 (0.00 %) blocks
Max Waves: 1115 -> 1129 (1.26 %)

pipeline-db (LLVM):
Totals from affected shaders:
SGPRS: 8472 -> 8424 (-0.57 %)
VGPRS: 14284 -> 14368 (0.59 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 442 -> 503 (13.80 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 268 -> 396 (47.76 %) dwords per thread
Code Size: 862568 -> 853028 (-1.11 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 971 -> 964 (-0.72 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2271>
2020-01-29 14:30:33 +00:00
Timothy Arceri
7f106a2b5d util: rename list_empty() to list_is_empty()
This makes it clear that it's a boolean test and not an action
(eg. "empty the list").

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2019-10-28 11:24:38 +00:00
Rob Clark
ad8167c1e0 nir/search: fix the PoT helpers
Otherwise, if the base type is (for example) uint32, we would
incorrectly think that PoT optimizations could not apply.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jason Ekstsrand <jason@jleksrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2019-10-18 15:08:54 -07:00
Ian Romanick
050e4e28bf nir/search: Fix possible NULL dereference in is_fsign
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: 09705747d7 ("nir/algebraic: Reassociate fadd into fmul in DPH-like pattern")
2019-10-17 15:07:01 -07:00
Eric Anholt
c23db0df18 nir: Keep the range analysis HT around intra-pass until we make a change.
This lets us memoize range analysis work across instructions.  Reduces
runtime of shader-db on Intel by -30.0288% +/- 2.1693% (n=3).

Fixes: 405de7ccb6 ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-10-04 19:15:01 +00:00