Commit graph

5849 commits

Author SHA1 Message Date
Georg Lehmann
0e6d32777f nir/opt_remove_phis: rematerialize equal alu
Foz-DB Navi31:
Totals from 943 (1.19% of 79395) affected shaders:
MaxWaves: 24672 -> 24722 (+0.20%)
Instrs: 1541665 -> 1544956 (+0.21%); split: -0.23%, +0.44%
CodeSize: 8085180 -> 8109212 (+0.30%); split: -0.16%, +0.46%
VGPRs: 57768 -> 57624 (-0.25%)
Latency: 18043743 -> 17948245 (-0.53%); split: -1.28%, +0.75%
InvThroughput: 2692605 -> 2677049 (-0.58%); split: -2.07%, +1.49%
VClause: 25321 -> 25343 (+0.09%); split: -0.48%, +0.57%
SClause: 38473 -> 38614 (+0.37%); split: -0.00%, +0.37%
Copies: 86089 -> 86236 (+0.17%); split: -0.46%, +0.63%
Branches: 36719 -> 36777 (+0.16%); split: -0.60%, +0.76%
PreSGPRs: 44138 -> 44303 (+0.37%); split: -0.05%, +0.42%
PreVGPRs: 43319 -> 43009 (-0.72%)
VALU: 893684 -> 894272 (+0.07%); split: -0.42%, +0.48%
SALU: 189561 -> 191358 (+0.95%); split: -0.05%, +1.00%
VMEM: 42294 -> 42313 (+0.04%); split: -0.44%, +0.49%
SMEM: 72916 -> 73144 (+0.31%)

Instruction count regressions are largly caused by additional
loop unrolling.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31028>
2024-12-16 20:38:38 +00:00
Qiang Yu
129e37bab6 nir: do not generate b2i64 when driver want to lower it
This is found on GFX12 by:
  KHR-GL43.shader_ballot_tests.ShaderBallotBitmasks

ACO does not support it.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32570>
2024-12-16 07:35:07 +00:00
Alyssa Rosenzweig
bd89279dd4 nir: add lower_scratch_to_var pass
to ease opencl pain.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32529>
2024-12-12 21:16:13 +00:00
Rhys Perry
26790e90d3 nir: make ballot ALU and mbcnt_amd operations reorderable
These can be lowered to ALU and load_subgroup_invocation, all of which are
reorderable.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32512>
2024-12-11 14:47:12 +00:00
Rhys Perry
650468fbdf nir/move_discards_to_top: don't move across more intrinsics
This missed dpp16_shift_amd, lane_permute_16_amd, last_invocation and
ballot_relaxed.

Instead, list the non-reorderable intrinsics which are allowed to be moved
after discards.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32512>
2024-12-11 14:47:12 +00:00
Rhys Perry
5368569d06 nir: make load_helper_invocation non-reorderable
This can't be moved to after demote, so it's not reorderable.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32512>
2024-12-11 14:47:12 +00:00
Georg Lehmann
e8b29abb25 nir: add unsigned upper bound support for fsat
Foz-DB Navi21:
Totals from 89 (0.11% of 79395) affected shaders:
Instrs: 97018 -> 96995 (-0.02%)
CodeSize: 492996 -> 492488 (-0.10%)
Latency: 504649 -> 504555 (-0.02%)
InvThroughput: 121968 -> 121875 (-0.08%)
VALU: 67427 -> 67404 (-0.03%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32565>
2024-12-10 20:53:53 +00:00
Georg Lehmann
e78e63e3fe nir: add unsigned upper bound support for f2i32
Foz-DB Navi21:
Totals from 649 (0.82% of 79395) affected shaders:
CodeSize: 2330592 -> 2314112 (-0.71%)
Latency: 2068161 -> 2053370 (-0.72%)
InvThroughput: 346583 -> 329425 (-4.95%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32565>
2024-12-10 20:53:53 +00:00
Georg Lehmann
0b366a7ab2 nir/uub: properly limit float support to 32bit
Cc: mesa-stable

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32565>
2024-12-10 20:53:53 +00:00
Alyssa Rosenzweig
83dd4889a7 nir/lower_point_size: skip non-var derefs
these can happen depending on pass order, otherwise we crash on the null
pointer.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32564>
2024-12-10 19:13:07 +00:00
Alyssa Rosenzweig
69a0962c70 nir/lower_printf: use 64-bit math
this lets load_store_vectorize vectorize the stores we produce. it also matches
actual OpenCL kernel code looks, so drivers need to have an optimized path for
these 64+32 patterns regardless.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32564>
2024-12-10 19:13:07 +00:00
Alyssa Rosenzweig
da967416db nir/lower_printf: use unsigned math
negative offsets/sizes don't make sense, and zero-extension is often easier
to optimize/lower than sign-extension.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32564>
2024-12-10 19:13:07 +00:00
Alyssa Rosenzweig
8db0751eb8 nir/lower_printf: lower aborts
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32564>
2024-12-10 19:13:07 +00:00
Alyssa Rosenzweig
0b9072e2e5 nir/lower_printf: allow fixed address
fixed address printf buffers can avoid a lot of complexity, especially with the
general case of (e.g.) DGC-enqueued precompiled kernels. so add a knob for that
and save the driver the need to write a lowering pass.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32564>
2024-12-10 19:13:07 +00:00
Alyssa Rosenzweig
816c14d33d nir: add printf_abort intrinsic
abort() for the gpu, implemented with the printf infrastructure since they go
together.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32564>
2024-12-10 19:13:07 +00:00
Georg Lehmann
c5c22fc3a3 nir: add constant clip/cull distance optimization
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32518>
2024-12-10 16:35:01 +00:00
Benjamin Lee
b01afd06cd nir: update docs for nir_get_io_arrayed_index_src
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31704>
2024-12-09 20:31:49 +00:00
Benjamin Lee
74ccf6cbdc nir: add option to use compact view indices
In panvk we pass absolute view indices to the hardware, so we need to do
the conversion from compacted to absolute at some point. Emitting
absolute indices from nir_lower_multiview initially looks like the
simplest option, but nir_lower_io_to_temporaries will emit a write for
every element of array varyings. This results in unnecessary writes to
disabled views.

Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31704>
2024-12-09 20:31:49 +00:00
Benjamin Lee
becb014d27 nir: treat per-view outputs as arrayed IO
This is needed for implementing multiview in panvk, where the address
calculation for multiview outputs is not well-represented by lowering to
nir_intrinsic_store_output with a single offset.

The case where a variable is both per-view and per-{vertex,primitive} is
now unsupported. This would come up with drivers implementing
NV_mesh_shader or using nir_lower_multiview on geometry, tessellation,
or mesh shaders. No drivers currently do either of these. There was some
code that attempted to handle the nested per-view case by unwrapping
per-view/arrayed types twice, but it's unclear to what extent this
actually worked.

ANV and Turnip both rely on per-view outputs being assigned a unique
driver location for each view, so I've added on option to configure that
behavior rather than removing it.

Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31704>
2024-12-09 20:31:49 +00:00
Benjamin Lee
6d843cde45 nir: document index semantics in nir_lower_multiview
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31704>
2024-12-09 20:31:49 +00:00
Benjamin Lee
975c3ecd1e nir: handle arbitrary per-view outputs in nir_lower_multiview
This is needed for panvk, where multiview is "all or nothing". When
multiview is enabled, all outputs may be written with separate values
for each view.

The edge case mentioned in the previous `nir_can_lower_multiview` is now
handled because we now handle an arbitrary number of per-view output
vars instead of expecting to find exactly one.

Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31704>
2024-12-09 20:31:49 +00:00
Karmjit Mahil
047049dcb5 nir: Fix the spelling of compare
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32189>
2024-12-06 08:42:36 +00:00
Karmjit Mahil
b79994e92d nir,ir3: Add icsel_eqz
In IR3 `sel.b32` works based on the 0 so add `icsel_eqz` to fuse the
cmp and sel that we'd otherwise need.

total Instruction Count in shared programs: 1112814 -> 1110473 (-0.21%)
Instruction Count in affected programs: 162701 -> 160360 (-1.44%)
helped: 81
HURT: 29
Instruction count are helped.

total MOV Count in shared programs: 86777 -> 88671 (2.18%)
MOV Count in affected programs: 28119 -> 30013 (6.74%)
helped: 1
HURT: 292
Mov count are HURT.

total COV Count in shared programs: 15070 -> 14962 (-0.72%)
COV Count in affected programs: 5770 -> 5662 (-1.87%)
helped: 76
HURT: 2
Cov count are helped.

total Last helper instruction in shared programs: 592729 -> 590638 (-0.35%)
Last helper instruction in affected programs: 91331 -> 89240 (-2.29%)
helped: 30
HURT: 1
Last helper instruction are helped.

total Instructions with SS sync bit in shared programs: 29336 -> 29546 (0.72%)
Instructions with SS sync bit in affected programs: 4702 -> 4912 (4.47%)
helped: 8
HURT: 43
Instructions with ss sync bit are HURT.

total Estimated cycles stalled on SS in shared programs: 111590 -> 112401 (0.73%)
Estimated cycles stalled on SS in affected programs: 27708 -> 28519 (2.93%)
helped: 21
HURT: 61
Estimated cycles stalled on ss are HURT.

total cat1 instructions in shared programs: 101933 -> 103695 (1.73%)
cat1 instructions in affected programs: 35804 -> 37566 (4.92%)
helped: 18
HURT: 290
Cat1 instructions are HURT.

total cat2 instructions in shared programs: 380299 -> 377499 (-0.74%)
cat2 instructions in affected programs: 128609 -> 125809 (-2.18%)
helped: 322
HURT: 0
Cat2 instructions are helped.

Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32189>
2024-12-06 08:42:36 +00:00
Karmjit Mahil
aad0aa0a9c nir/algebraic: turn u{ge,lt} a, 1 to i{ne,eq} a, 0
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32189>
2024-12-06 08:42:36 +00:00
Ian Romanick
e1bb53bb3c nir/algebraic: Optimize some trivial bfi
In fossil-db, one big compute shader on Hogwarts Legacy is helped for
spills and fills. It has a lot of instances of bfi(0x3f, a, a).

On Tiger Lake and Skylake, a compute shader in Unicom that has a
single instance of this pattern is hurt for spills and fills. I think
this is just due to non-determinism in the register allocation
algorithm.

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 16992643 -> 16992548 (<.01%)
instructions in affected programs: 17533 -> 17438 (-0.54%)
helped: 33 / HURT: 0

total cycles in shared programs: 914313986 -> 914316238 (<.01%)
cycles in affected programs: 3734544 -> 3736796 (0.06%)
helped: 26 / HURT: 6

fossil-db:

Lunar Lake, Meteor Lake, DG2, and Ice Lake had similar results. (Lunar Lake shown)
Totals:
Instrs: 208952780 -> 208952537 (-0.00%)
Send messages: 10934879 -> 10934875 (-0.00%)
Cycle count: 30988230904 -> 30988228660 (-0.00%); split: -0.00%, +0.00%
Spill count: 534864 -> 534843 (-0.00%)
Fill count: 667081 -> 667068 (-0.00%)
Max live registers: 65686656 -> 65686624 (-0.00%)
Non SSA regs after NIR: 244185358 -> 244185335 (-0.00%)

Totals from 3 (0.00% of 704834) affected shaders:
Instrs: 4708 -> 4465 (-5.16%)
Send messages: 234 -> 230 (-1.71%)
Cycle count: 264382 -> 262138 (-0.85%); split: -0.88%, +0.03%
Spill count: 91 -> 70 (-23.08%)
Fill count: 73 -> 60 (-17.81%)
Max live registers: 647 -> 615 (-4.95%)
Non SSA regs after NIR: 3957 -> 3934 (-0.58%)

Tiger Lake
Totals:
Instrs: 230516919 -> 230515185 (-0.00%); split: -0.00%, +0.00%
Send messages: 12657684 -> 12657680 (-0.00%)
Cycle count: 23060318600 -> 23060279758 (-0.00%); split: -0.00%, +0.00%
Spill count: 548462 -> 548446 (-0.00%); split: -0.00%, +0.00%
Fill count: 582304 -> 582294 (-0.00%); split: -0.00%, +0.00%
Scratch Memory Size: 19538944 -> 19539968 (+0.01%)
Max live registers: 41713622 -> 41713593 (-0.00%)
Non SSA regs after NIR: 260667939 -> 260667712 (-0.00%); split: -0.00%, +0.00%

Totals from 174 (0.02% of 794323) affected shaders:
Instrs: 158346 -> 156612 (-1.10%); split: -1.13%, +0.04%
Send messages: 14330 -> 14326 (-0.03%)
Cycle count: 24859875 -> 24821033 (-0.16%); split: -0.32%, +0.16%
Spill count: 183 -> 167 (-8.74%); split: -9.29%, +0.55%
Fill count: 284 -> 274 (-3.52%); split: -7.39%, +3.87%
Scratch Memory Size: 9216 -> 10240 (+11.11%)
Max live registers: 12587 -> 12558 (-0.23%)
Non SSA regs after NIR: 164466 -> 164239 (-0.14%); split: -0.16%, +0.02%

Skylake
Totals:
Instrs: 158904982 -> 158903764 (-0.00%)
Send messages: 8490500 -> 8490496 (-0.00%)
Cycle count: 19732284279 -> 19732345496 (+0.00%); split: -0.00%, +0.00%
Spill count: 519127 -> 519115 (-0.00%)
Fill count: 594283 -> 594290 (+0.00%); split: -0.00%, +0.00%
Max live registers: 33708764 -> 33708739 (-0.00%)
Non SSA regs after NIR: 169377234 -> 169377007 (-0.00%); split: -0.00%, +0.00%

Totals from 174 (0.03% of 648725) affected shaders:
Instrs: 160391 -> 159173 (-0.76%)
Send messages: 14354 -> 14350 (-0.03%)
Cycle count: 24776486 -> 24837703 (+0.25%); split: -0.07%, +0.32%
Spill count: 332 -> 320 (-3.61%)
Fill count: 587 -> 594 (+1.19%); split: -0.17%, +1.36%
Max live registers: 12709 -> 12684 (-0.20%)
Non SSA regs after NIR: 166557 -> 166330 (-0.14%); split: -0.16%, +0.02%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32493>
2024-12-05 21:39:07 +00:00
Alyssa Rosenzweig
ca9bf43d0b nir,asahi: make argument alignment configurable
this is more flexible. Mali needs 32-bit alignment, for example.

I added an option struct in case we need to make this a callback or something
later.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32398>
2024-12-05 10:58:51 +00:00
Alyssa Rosenzweig
0d77e91ca3 nir/opt_load_store_vectorize: match amul like imul
for AGX, we preserve amul all the way until fusing address modes in order to be
able to fuse effectively. so the load/store vectorizer wouldn't vectorize before
fusing.

however, after fusing we get fused intrinsics which are tricky to teach the
vectorizer about as their semantics are pretty subtle. so we can't vectorize
after, either.

the easiest solution is to teach the vectorize about amul, which can always be
replaced by imul for our pattern matches.

this fixes certain cases of vectorization in OpenCL kernels on asahi.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32398>
2024-12-05 10:58:51 +00:00
Alyssa Rosenzweig
77d4ed0a01 nir/opt_algebraic: optimize sign bit manipulation
libclc loves to generate the iand(0x7fffffff) pattern. ior/ixor patterns are
added for completeness.

Shaves 4 instructions off libclc vec4 normalize.

v2: Loop over the bit sizes (Georg).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> [v1]
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32398>
2024-12-05 10:58:51 +00:00
Alyssa Rosenzweig
be049e1c14 nir/search_helpers: handle bcsel in is_only_used_as_float
this lets algebraic see through chains of instructions.

v2: Limit recursion depth (Georg).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> [v1]
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32398>
2024-12-05 10:58:51 +00:00
Boris Brezillon
98e3c1e6fb nir: Let nir_lower_texcoord_replace_late() report progress
Useful if we want to wrap this pass with a NIR_PASS() to enforce
validation.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32480>
2024-12-05 08:49:45 +00:00
Marek Olšák
3effa3d53b nir/lower_io_passes: lower indirect IO for TCS
nir_lower_io_to_temporaries doesn't do anything and gives up when it gets TCS.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
2024-12-04 13:40:41 +00:00
Marek Olšák
f5a0cde125 nir/opt_varyings: fix compile failures in the disabled PRINT code
linkage is a pointer, but it was used as a structure.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
2024-12-04 13:40:41 +00:00
Marek Olšák
dd788d0a7f nir/opt_varyings: remove rare dead output stores after inter-shader code motion
Backward inter-shader code motion left dead output stores in the producer
in rare cases. Those dead stores would then make their way into drivers
and hw.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
2024-12-04 13:40:41 +00:00
Marek Olšák
f0c4e71d58 nir/opt_varyings: fix getting deref variables for sysvals
This might fix array system values. Noticed by luck.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
2024-12-04 13:40:41 +00:00
Marek Olšák
dcc679ab3a nir/opt_varyings: add inter-shader code motion for uniform/UBO indexing
If input_value, index, index1 or index2 is an input, here are examples of
code that this commit moves from consumers to producers:
* input_value * uniform_array[index]
* uniform_array[index]
* ubo[0].array[index]
* ubo[index].var
* ubo[index1].array[index2]

If the array index is computed from an input, it must be flat or convergent
within a primitive to be moved. If the array index is not an input, it must
be a uniform expression.

dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_fragment
has UBO indexing that is moved to the producer by this.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
2024-12-04 13:40:41 +00:00
Marek Olšák
f52ae35d73 nir/opt_varyings: propagate indirect uniform/UBO loads into the next shader
Uniform and UBO loads with non-constant indices are now propagated.
The majority of this code implements cloning deref chains.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
2024-12-04 13:40:41 +00:00
Marek Olšák
c0de78f120 nir/opt_varyings: change try_move_postdominator param to nir_instr type
We want more instructions to be movable, like
load_deref(var, index = load_input).

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
2024-12-04 13:40:41 +00:00
Marek Olšák
8e39e8ed4d nir/opt_varyings: make top-level compaction code for TES, TCS, GS separate
Add a separate "if" block for each and use a helper for repeated code.
There will be more code added here that keeping TES, TCS, and GS compaction
code unified would be a mess.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
2024-12-04 13:40:41 +00:00
Marek Olšák
d20e07dbad nir/opt_varyings: fix max_slot for color varying compaction
It should be in units of slots. This was unlikely to break anything.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
2024-12-04 13:40:41 +00:00
Marek Olšák
69b1853ecf nir/opt_varyings: count the number of unused components for compaction correctly
Holes due to indirectly-indexed inputs were ignored, making the compaction
worse when such inputs were present alongside convergent inputs.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
2024-12-04 13:40:41 +00:00
Marek Olšák
1aa9fec542 nir/opt_varyings: fix compaction with sparse indirect FS inputs
Without this, compaction can put inputs into vec4 slots already occupied
by indirectly-accessed inputs while ignoring their interpolation qualifier,
which is incorrect.

All input components sharing the same vec4 slot must use interpolation
qualifiers that are compatible with each other.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
2024-12-04 13:40:41 +00:00
Marek Olšák
b01f3cea7a nir/opt_varyings: remove redundant conditions from a while loop
Most of these conditions are repeated below with a continue statement.
This just puts break at the end where all of them are false.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
2024-12-04 13:40:41 +00:00
Marek Olšák
a618a2aa8b nir/linking_helpers: don't promote interpolated varyings to flat
Even the most flexible interpolation that we have in NIR options
(nir_io_has_flexible_input_interpolation_except_flat) doesn't allow
mixing flat and non-flat in the same vec4. This (legacy) optimization
can't promote interpolated inputs to flat if it doesn't consider
the interpolation mode of the whole vec4 slot.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
2024-12-04 13:40:41 +00:00
Georg Lehmann
34a47e4b14 nir/opt_algebraic: mark a - ffract(a) as nan incorrect.
Inf + fract(Inf) -> Inf + NaN -> NaN
floor(Inf) -> Inf

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32393>
2024-12-03 14:42:18 +00:00
Georg Lehmann
2ee96cf514 nir/opt_algebraic: optimize d3d9 ceil
No Foz-DB changes.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32393>
2024-12-03 14:42:18 +00:00
Georg Lehmann
34caed8adb nir/opt_algebraic: optimize d3d9 ftrunc
Foz-DB Navi21:
Totals from 85 (0.11% of 79395) affected shaders:
MaxWaves: 1972 -> 1968 (-0.20%)
Instrs: 48682 -> 47067 (-3.32%)
CodeSize: 255664 -> 247172 (-3.32%)
VGPRs: 3752 -> 3768 (+0.43%)
Latency: 154414 -> 150360 (-2.63%)
InvThroughput: 37186 -> 35081 (-5.66%)
VClause: 847 -> 865 (+2.13%); split: -0.24%, +2.36%
SClause: 768 -> 796 (+3.65%)
Copies: 2763 -> 2869 (+3.84%); split: -0.14%, +3.98%
VALU: 28133 -> 26781 (-4.81%)
SALU: 7182 -> 6939 (-3.38%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32393>
2024-12-03 14:42:18 +00:00
Georg Lehmann
ea4aa8e5a6 nir/opt_algebraic: optimize ffma(b2f, b2f, c)
Foz-DB Navi21:
Totals from 134 (0.17% of 79395) affected shaders:
Instrs: 153297 -> 153326 (+0.02%); split: -0.03%, +0.05%
CodeSize: 829520 -> 828444 (-0.13%); split: -0.13%, +0.00%
Latency: 900489 -> 899964 (-0.06%); split: -0.07%, +0.01%
InvThroughput: 267838 -> 267478 (-0.13%); split: -0.14%, +0.00%
VClause: 2452 -> 2454 (+0.08%)
Copies: 8331 -> 8353 (+0.26%); split: -0.25%, +0.52%
PreSGPRs: 4974 -> 4964 (-0.20%)
PreVGPRs: 6209 -> 6218 (+0.14%)
VALU: 112317 -> 112092 (-0.20%); split: -0.21%, +0.01%
SALU: 12451 -> 12694 (+1.95%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32393>
2024-12-03 14:42:18 +00:00
Kenneth Graunke
5712fc48a9 nir: Allow large overfetching holes in the load store vectorizer
The load_*_uniform_block_intel intrinsics always load either 8x or 16x
32-bit components worth of data (so 32 byte increments).  This leads to
cases where we load a few components from one vec8, followed by a few
components of an adjacent vec8.  We want to combine those into a vec16
load, as that loads a whole cacheline at a time, and requires less hoops
to calculate addresses and request memory loads.

So, we allow 7 * 4 = 28 bytes of holes, which handles vec8+vec8 where
only the .x component is read.

Most drivers and intrinsics will not want such large holes.  I thought
about adding a per-intrinsic max_hole to the core code, but decided that
since we already have driver callbacks, we can just rely on them to
reject what makes sense to them.

No driver callbacks currently allow holes, so this should not currently
affect any drivers.  But any work in progress branches may need to be
updated to reject larger holes.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>
2024-12-03 02:02:33 +00:00
Marek Olšák
8752401e03 nir/algebraic: optimize (a & b) | (a | c) => a | c, (a & b) & (a | c) => a & b
No change in shader-db with ACO, but it doesn't seem to be optimized by
any other patterns.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32449>
2024-12-03 01:24:27 +00:00
Marek Olšák
3670d42c74 nir/algebraic: optimize (a | b) | (a | c) ==> (a | b) | c
shader-db with ACO:
    3 shaders have -0.11% average decrease in the code size

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32449>
2024-12-03 01:24:27 +00:00