Commit graph

5814 commits

Author SHA1 Message Date
Marek Olšák
f52ae35d73 nir/opt_varyings: propagate indirect uniform/UBO loads into the next shader
Uniform and UBO loads with non-constant indices are now propagated.
The majority of this code implements cloning deref chains.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
2024-12-04 13:40:41 +00:00
Marek Olšák
c0de78f120 nir/opt_varyings: change try_move_postdominator param to nir_instr type
We want more instructions to be movable, like
load_deref(var, index = load_input).

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
2024-12-04 13:40:41 +00:00
Marek Olšák
8e39e8ed4d nir/opt_varyings: make top-level compaction code for TES, TCS, GS separate
Add a separate "if" block for each and use a helper for repeated code.
There will be more code added here that keeping TES, TCS, and GS compaction
code unified would be a mess.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
2024-12-04 13:40:41 +00:00
Marek Olšák
d20e07dbad nir/opt_varyings: fix max_slot for color varying compaction
It should be in units of slots. This was unlikely to break anything.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
2024-12-04 13:40:41 +00:00
Marek Olšák
69b1853ecf nir/opt_varyings: count the number of unused components for compaction correctly
Holes due to indirectly-indexed inputs were ignored, making the compaction
worse when such inputs were present alongside convergent inputs.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
2024-12-04 13:40:41 +00:00
Marek Olšák
1aa9fec542 nir/opt_varyings: fix compaction with sparse indirect FS inputs
Without this, compaction can put inputs into vec4 slots already occupied
by indirectly-accessed inputs while ignoring their interpolation qualifier,
which is incorrect.

All input components sharing the same vec4 slot must use interpolation
qualifiers that are compatible with each other.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
2024-12-04 13:40:41 +00:00
Marek Olšák
b01f3cea7a nir/opt_varyings: remove redundant conditions from a while loop
Most of these conditions are repeated below with a continue statement.
This just puts break at the end where all of them are false.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
2024-12-04 13:40:41 +00:00
Marek Olšák
a618a2aa8b nir/linking_helpers: don't promote interpolated varyings to flat
Even the most flexible interpolation that we have in NIR options
(nir_io_has_flexible_input_interpolation_except_flat) doesn't allow
mixing flat and non-flat in the same vec4. This (legacy) optimization
can't promote interpolated inputs to flat if it doesn't consider
the interpolation mode of the whole vec4 slot.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
2024-12-04 13:40:41 +00:00
Georg Lehmann
34a47e4b14 nir/opt_algebraic: mark a - ffract(a) as nan incorrect.
Inf + fract(Inf) -> Inf + NaN -> NaN
floor(Inf) -> Inf

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32393>
2024-12-03 14:42:18 +00:00
Georg Lehmann
2ee96cf514 nir/opt_algebraic: optimize d3d9 ceil
No Foz-DB changes.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32393>
2024-12-03 14:42:18 +00:00
Georg Lehmann
34caed8adb nir/opt_algebraic: optimize d3d9 ftrunc
Foz-DB Navi21:
Totals from 85 (0.11% of 79395) affected shaders:
MaxWaves: 1972 -> 1968 (-0.20%)
Instrs: 48682 -> 47067 (-3.32%)
CodeSize: 255664 -> 247172 (-3.32%)
VGPRs: 3752 -> 3768 (+0.43%)
Latency: 154414 -> 150360 (-2.63%)
InvThroughput: 37186 -> 35081 (-5.66%)
VClause: 847 -> 865 (+2.13%); split: -0.24%, +2.36%
SClause: 768 -> 796 (+3.65%)
Copies: 2763 -> 2869 (+3.84%); split: -0.14%, +3.98%
VALU: 28133 -> 26781 (-4.81%)
SALU: 7182 -> 6939 (-3.38%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32393>
2024-12-03 14:42:18 +00:00
Georg Lehmann
ea4aa8e5a6 nir/opt_algebraic: optimize ffma(b2f, b2f, c)
Foz-DB Navi21:
Totals from 134 (0.17% of 79395) affected shaders:
Instrs: 153297 -> 153326 (+0.02%); split: -0.03%, +0.05%
CodeSize: 829520 -> 828444 (-0.13%); split: -0.13%, +0.00%
Latency: 900489 -> 899964 (-0.06%); split: -0.07%, +0.01%
InvThroughput: 267838 -> 267478 (-0.13%); split: -0.14%, +0.00%
VClause: 2452 -> 2454 (+0.08%)
Copies: 8331 -> 8353 (+0.26%); split: -0.25%, +0.52%
PreSGPRs: 4974 -> 4964 (-0.20%)
PreVGPRs: 6209 -> 6218 (+0.14%)
VALU: 112317 -> 112092 (-0.20%); split: -0.21%, +0.01%
SALU: 12451 -> 12694 (+1.95%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32393>
2024-12-03 14:42:18 +00:00
Kenneth Graunke
5712fc48a9 nir: Allow large overfetching holes in the load store vectorizer
The load_*_uniform_block_intel intrinsics always load either 8x or 16x
32-bit components worth of data (so 32 byte increments).  This leads to
cases where we load a few components from one vec8, followed by a few
components of an adjacent vec8.  We want to combine those into a vec16
load, as that loads a whole cacheline at a time, and requires less hoops
to calculate addresses and request memory loads.

So, we allow 7 * 4 = 28 bytes of holes, which handles vec8+vec8 where
only the .x component is read.

Most drivers and intrinsics will not want such large holes.  I thought
about adding a per-intrinsic max_hole to the core code, but decided that
since we already have driver callbacks, we can just rely on them to
reject what makes sense to them.

No driver callbacks currently allow holes, so this should not currently
affect any drivers.  But any work in progress branches may need to be
updated to reject larger holes.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>
2024-12-03 02:02:33 +00:00
Marek Olšák
8752401e03 nir/algebraic: optimize (a & b) | (a | c) => a | c, (a & b) & (a | c) => a & b
No change in shader-db with ACO, but it doesn't seem to be optimized by
any other patterns.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32449>
2024-12-03 01:24:27 +00:00
Marek Olšák
3670d42c74 nir/algebraic: optimize (a | b) | (a | c) ==> (a | b) | c
shader-db with ACO:
    3 shaders have -0.11% average decrease in the code size

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32449>
2024-12-03 01:24:27 +00:00
Marek Olšák
978ad93375 nir/algebraic: optimize (a & b) & (a & c) ==> (a & b) & c
shader-db with ACO:
    3 shaders have -0.57% average decrease in the code size

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32449>
2024-12-03 01:24:27 +00:00
Marek Olšák
83b093f95e nir/algebraic: use is_used_once in a few iand/ior patterns
shader-db with ACO:
    1 shader has -4 decrease in the code size

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32449>
2024-12-03 01:24:27 +00:00
Antonino Maniscalco
2b9738ce6d nir,zink,asahi: support passing through gl_PrimitiveID
When this pass is used with Zink, gl_PrimitiveID needs to be passed
through, however this is unnecessary for other divers.

Analogous to previous commit

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Fixes: d0342e28b3 ("nir: Add helper to create passthrough GS shader")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32397>
2024-12-03 00:24:04 +00:00
Kenneth Graunke
92797c6878 nir/algebraic: Reassociate fadd into fmul in DP4-like pattern
This extends the optimization from commit 09705747d7 ("nir/algebraic:
Reassociate fadd into fmul in DPH-like pattern") to a chain of 4 ffmas
for a DP4-style pattern.

Moving the add to the other end of the sequence allows it to be fused
into an FMA.

fossil-db results from Alchemist:

   Totals:
   Instrs: 158544142 -> 158490516 (-0.03%); split: -0.04%, +0.00%
   Subgroup size: 7808912 -> 7808920 (+0.00%); split: +0.00%, -0.00%
   Cycle count: 17859550672 -> 17859491966 (-0.00%); split: -0.01%, +0.01%
   Spill count: 84652 -> 84494 (-0.19%); split: -0.37%, +0.18%
   Fill count: 160728 -> 160623 (-0.07%); split: -0.29%, +0.23%
   Scratch Memory Size: 4278272 -> 4272128 (-0.14%); split: -0.29%, +0.14%
   Max live registers: 32411695 -> 32409789 (-0.01%); split: -0.01%, +0.00%
   Max dispatch width: 5627856 -> 5627920 (+0.00%); split: +0.00%, -0.00%
   Non SSA regs after NIR: 185359099 -> 185307703 (-0.03%); split: -0.03%, +0.00%

   Totals from 16378 (2.56% of 640872) affected shaders:
   Instrs: 9818723 -> 9765097 (-0.55%); split: -0.58%, +0.04%
   Subgroup size: 194056 -> 194064 (+0.00%); split: +0.01%, -0.01%
   Cycle count: 294967108 -> 294908402 (-0.02%); split: -0.58%, +0.56%
   Spill count: 10088 -> 9930 (-1.57%); split: -3.09%, +1.53%
   Fill count: 24738 -> 24633 (-0.42%); split: -1.90%, +1.48%
   Scratch Memory Size: 439296 -> 433152 (-1.40%); split: -2.80%, +1.40%
   Max live registers: 1297204 -> 1295298 (-0.15%); split: -0.22%, +0.07%
   Max dispatch width: 133232 -> 133296 (+0.05%); split: +0.14%, -0.10%
   Non SSA regs after NIR: 11999084 -> 11947688 (-0.43%); split: -0.43%, +0.00%

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32197>
2024-12-02 13:15:16 +00:00
Rhys Perry
9f3607de76 nir/tests: fix SSA dominance in opt_if_merge tests
It isn't necessary for these ALU instructions to be used in the next IF.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: c437f2e79c ("nir/tests: Add tests for opt_if_merge")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12211
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32391>
2024-12-02 09:38:22 +00:00
Timothy Arceri
6ca81adffc nir: allow loops with unknown induction var initialiser to unroll
If the condition of the loop terminator is based on an unsigned value we
can in some cases find the max number of possible loop trips. With the
max loop trips know a complex unroll can unroll the loop.

For example:

   uniform uint x;
   uint i = x;
   while (true) {
      if (i >= 4)
         break;

      i += 6;
   }

The above loop can be unrolled even though we don't know the initial
value of the induction variable because it can have at most 1 iteration.

There were no changes with my shader-db collection. Change was inspired
by MR #31312 where builtin shader code failed to unroll.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31701>
2024-12-02 11:44:33 +11:00
Job Noorman
d5d0628728 nir/lower_subgroups: add option to only lower clustered rotates
On ir3, we have native support for full rotates but not for clustered
ones.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31731>
2024-11-29 16:22:48 +00:00
Job Noorman
5dbd2b08f4 nir/lower_subgroups: disable boolean reduce when not supported
lower_boolean_reduce only supports ballot_components == 1. Fall back to
lower_scan_reduce when this is not the case.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31731>
2024-11-29 16:22:48 +00:00
Job Noorman
493f7b8084 nir/lower_subgroups: add extra filter data to options
It might be convenient for filter implementations to have access to
extra information. This will be used, for example, by ir3 to access
compiler features.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31731>
2024-11-29 16:22:48 +00:00
Job Noorman
e6c63a88fb nir: add read_getlast_ir3 intrinsic
Like read_first_invocation but using getlast. Note that I intentionally
used the name of the ir3 instruction in the name as its semantics are
tricky to exactly describe otherwise.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31731>
2024-11-29 16:22:47 +00:00
Job Noorman
60e1615ced nir/lower_subgroups: support unknown subgroup size
Some targets (e.g., ir3) don't always know the exact subgroup size.
Calculate the maximum subgroup size in that case by multiplying
ballot_components and ballot_bit_size.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31731>
2024-11-29 16:22:47 +00:00
Alyssa Rosenzweig
e3001352ad nir: add helpers for precompiled shaders
v2: generalize function signatures.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> [v1]
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com> [v1]
Acked-by: Mary Guillemard <mary.guillemard@collabora.com> [v2]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32339>
2024-11-28 17:34:12 +00:00
Marek Olšák
c26da94b4c nir/opt_varyings: replace options::lower_varying_from_uniform with a cost number
This is a simple way for drivers to enable uniform expression propagation
without having to set any callbacks for it. It replaces the old option.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32390>
2024-11-28 15:39:46 +00:00
Marek Olšák
428613b690 nir/opt_varyings: add a default callback for varying_estimate_instr_cost
used when the driver doesn't set it.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32390>
2024-11-28 15:39:46 +00:00
Marek Olšák
1f238f0a2e nir/opt_varyings: always call remove_dead_varyings in init_linkage
so that we don't have to do it after every init_linkage call.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32390>
2024-11-28 15:39:46 +00:00
Marek Olšák
c50c9e9bf9 nir/lower_clip: implement ClipVertex lowering for GS + lowered IO correctly
This is currently needed to fix d3d12 for st_unlower_io_to_vars.

The idea is to track the current value of ClipVertex in a temporary
variable, and for every emit_vertex, we load the ClipVertex value from
the temporary (which matches the stored value) and insert new CLIP_DIST
stores before emit_vertex.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32363>
2024-11-28 14:14:47 +00:00
Marek Olšák
a648acc287 nir/lower_clip: convert nir_lower_clip_gs to nir_shader_intrinsics_pass
and add struct lower_clip_state to hold the state for both
nir_lower_clip_gs and nir_lower_clip_vs.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32363>
2024-11-28 14:14:47 +00:00
Marek Olšák
3b8e4a71fe nir/lower_clip: set clip_distance_array_size outside of create_clipdist_vars
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32363>
2024-11-28 14:14:47 +00:00
Marek Olšák
b4ef50bca8 nir/lower_clip: separate code for IO variables and intrinsics
The code for IO variables was interleaved with code for IO intrinsics,
which was difficult to follow.

lower_clip_outputs is split and replaced by more accurate names:
lower_clip_vertex_var and lower_clip_vertex_intrin

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32363>
2024-11-28 14:14:47 +00:00
Marek Olšák
3e40c2010e nir/lower_clip: don't set cursor to fix crashes due to removed instructions
The original builder already points at the end of the function impl.
Just use that.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32363>
2024-11-28 14:14:47 +00:00
Caterina Shablia
7ca8c19246 Revert "nir: introduce instance_index system value"
This reverts commit b9be1f1f20.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32332>
2024-11-28 07:53:01 +00:00
Caterina Shablia
9d5ba87ca1 Revert "nir: lower INSTANCE_{ID,INDEX} to an offset load_instance_{index,id} respectively"
This reverts commit a5bcf566a9.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32332>
2024-11-28 07:53:01 +00:00
Job Noorman
1333af5d77 nir/search: add is_only_used_by_{iand,ior} helpers
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32181>
2024-11-28 06:19:59 +00:00
Job Noorman
a8c947df9a nir/search: make is_only_used_by_iadd reusable
The algorithm is exactly the same for other opcodes so we don't have to
have to copy paste it.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32181>
2024-11-28 06:19:59 +00:00
Job Noorman
22fc90a116 nir: add ir3-specific bitwise triop opcodes
ir3 has a number of bitwise triops (e.g., shrm == (src0 >> src1) & src2)
that don't have NIR-equivalents. Doing instruction selection for them is
a lot more convenient using algebraic patterns than to have to manually
match for them. This patch add NIR opcodes for these instructions.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32181>
2024-11-28 06:19:59 +00:00
Alyssa Rosenzweig
c2973765e2 nir: add nir_lower_constant_to_temp helper
this comes up with clc.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32382>
2024-11-27 20:02:05 +00:00
Alyssa Rosenzweig
12cc22af4c nir: add nir_remove_entrypoints helper
opposite of nir_remove_non_entrypoint. this operation comes up with
precompiling.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32382>
2024-11-27 20:02:05 +00:00
Alyssa Rosenzweig
c076900360 nir: add nir_function::pass_flags
convenience, asahi will stash stuff here.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32382>
2024-11-27 20:02:05 +00:00
Alyssa Rosenzweig
5555769102 nir: add workgroup size to functions
for cl kernel libraries with many entrypoints. spirv can represent, nir should
be able to as well.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32382>
2024-11-27 20:02:05 +00:00
Alyssa Rosenzweig
ba30eb9f40 nir: add nir_foreach_entrypoint macros
for compiling libraries full of kernels.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32382>
2024-11-27 20:02:05 +00:00
Alyssa Rosenzweig
d8ece9bf3a nir: add nir_lower_calls_to_builtins pass
nir_builder for the GPU

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32382>
2024-11-27 20:02:04 +00:00
Georg Lehmann
3f26e9ca19 nir/opt_intrinsic: fix sample mask opt with demote
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Fixes: d3ce8a7f6b ("nir: optimize gl_SampleMaskIn to gl_HelperInvocation for radeonsi when possible")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32327>
2024-11-26 18:44:39 +00:00
Georg Lehmann
22557497ec nir/opt_intrinsic: rework sample mask opt with vector alu
Purely theoretical issue, for example gl_SampleMaskIn.xx == 0.xx.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Fixes: d3ce8a7f6b ("nir: optimize gl_SampleMaskIn to gl_HelperInvocation for radeonsi when possible")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32327>
2024-11-26 18:44:38 +00:00
Marek Olšák
8fc640b256 nir/lower_io_to_temporaries: fix interp_deref_at_* lowering
The pass converts:
    ...
    %.. = load_deref(input)
to:
    temp = copy_deref(input) // beginning of the shader
    ...
    %.. = load_deref(temp)

If interp_deref_at_* occurs between copy_deref and load_deref,
the interp_deref_at_* lowering overwrites temp, so all future
load_deref(temp) return the result of interp_deref_at_* instead of
copy_deref, which is incorrect.

The issue manifests when the same input is used by both load_deref
and interp_deref_at_* in the same shader and when interp_deref_at_*
happens to be before load_deref.

This fixes it by using a completely new temporary for each instance
of interp_deref_at_*.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32344>
2024-11-26 06:50:40 +00:00
Marek Olšák
c23abb12e8 nir: allow cloning indirect array derefs in nir_clone_deref_instr
but only if cloning within the same shader. This will be used to fix
nir_lower_io_to_temporaries.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32344>
2024-11-26 06:50:40 +00:00