This allows unit tests to compare against a reference nir shader instead
of implementing checks for interesting instructions/CF nodes.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32644>
A negative hole size means the loads overlap. This will be used by drivers
to handle overlapping loads in the callback easily.
Reviewed-by: Mel Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32699>
The load_*_uniform_block_intel intrinsics always load either 8x or 16x
32-bit components worth of data (so 32 byte increments). This leads to
cases where we load a few components from one vec8, followed by a few
components of an adjacent vec8. We want to combine those into a vec16
load, as that loads a whole cacheline at a time, and requires less hoops
to calculate addresses and request memory loads.
So, we allow 7 * 4 = 28 bytes of holes, which handles vec8+vec8 where
only the .x component is read.
Most drivers and intrinsics will not want such large holes. I thought
about adding a per-intrinsic max_hole to the core code, but decided that
since we already have driver callbacks, we can just rely on them to
reject what makes sense to them.
No driver callbacks currently allow holes, so this should not currently
affect any drivers. But any work in progress branches may need to be
updated to reject larger holes.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>
If the condition of the loop terminator is based on an unsigned value we
can in some cases find the max number of possible loop trips. With the
max loop trips know a complex unroll can unroll the loop.
For example:
uniform uint x;
uint i = x;
while (true) {
if (i >= 4)
break;
i += 6;
}
The above loop can be unrolled even though we don't know the initial
value of the induction variable because it can have at most 1 iteration.
There were no changes with my shader-db collection. Change was inspired
by MR #31312 where builtin shader code failed to unroll.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31701>
If we have NIR such as:
32x4 %48 = @load_vulkan_descriptor (%47) (desc_type=SSBO)
32x4 %76 = deref_cast (tint_symbol_11 *)%48 (ssbo tint_symbol_11) (ptr_stride=0, align_mul=4, align_offset=0)
32x4 %77 = deref_struct &%76->tint_symbol_10 (ssbo int) // &((tint_symbol_11 *)%48)->tint_symbol_10
A single nir_rematerialize_deref_in_use_blocks() will rematerialize the
deref_struct and then it's deref_cast. However,
nir_foreach_instr_reverse_safe is not safe if the next iteration's
instruction is removed. This can result in the instruction loop exiting
and the load_vulkan_descriptor never having an LCSSA phi.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes: 439e8c42cc ("nir/lcssa: Fix rematerializing derefs")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11770
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32225>
If there is a 4-byte hole between 2 loads, drivers can now optionally
vectorize the loads by including the hole between them, e.g.:
4B load + 4B hole + 8B load --> 16B load
All vectorize callbacks already reject all holes, but AMD will want to
allow it.
radeonsi+ACO with the new vectorization callback:
TOTALS FROM AFFECTED SHADERS (25248/58918)
VGPRs: 871116 -> 871872 (0.09 %)
Spilled SGPRs: 397 -> 407 (2.52 %)
Code Size: 43074536 -> 42496352 (-1.34 %) bytes
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>
New load merging transformations (first, second), examples:
(vec4, vec3) ==> vec8(read=0x7f) (because NIR doesn't have vec7)
(vec1, vec8(read=0x7f)) ==> vec8(read=0xff)
- the unused component at the end of vec8 is dropped
Not merged:
vec8(read=0xfe) + vec1
- unused components at the beginning are kept
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>
If you allow an unsupported component count in the callback for loads,
nir_opt_load_store_vectorize will align num_components to the next supported
vector size, essentially overfetching.
This changes all callbacks to reject it. AMD will enable it in a later commit.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>
It will be used to allow merging loads with a hole between them.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>
Backward inter-shader code motion can move any code into the previous
shader if it only uses convergent inputs. The problem is the final input
type can end up being integer or FP64, which is incompatible with
the assumption that convergent inputs can always be interpolated.
If such a case occurs and the type is integer or FP64, either don't
do any code motion, or if the driver exposes the new flag, rewrite
convergent loads to use load_input.
If the new flag is supported, all convergent loads are rewritten to use
load_input, and flat varyings are allowed to be classified as convergent,
which means they are packed into interpolated vec4 slots if there are
unused components.
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29895>
Here we test the rematerialization of the deref produces valid nir
when both the deref and array index value are moved to the else branch of
the first terminator during the merge.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29686>
The layer output is added in ac_nir_lower_ngg which is called
later than this pass; prevent deleting layer input from FS here.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28685>
Backward inter-shader code motion turns ALU results into outputs,
which led to getting IO with unsupported bit sizes. This prevents
that.
There is a new NIR option flag that indicates 16-bit support.
Fixes: c66967b5cb - nir: add nir_opt_varyings, new pass optimizing and compacting varyings
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28431>
When we use source from ALU instruction directly, the default swizzle array
should be populated with the same amount of components as the src has.
Otherwise, if we use nir_ssa_alu_instr_src_components, it can return
the destination components count that is lower than component index
actually used in that source. This can lead to false equality
between 0 (uninitialized) and 0 (.x) in swizzle comparison below.
Fixes: c6ee46a7 ("nir: Add nir_alu_srcs_negative_equal")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8704
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22655>
Testing negative iterations count makes no sense, and can cause issues
when the unsigned type is used.
Testing 0 iterations is already covered with
will_break_on_first_iteration, so it can be skipped too.
Fixes: 6772a17a ("nir: Add a loop analysis pass")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9913
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26173>
In the tagged commit, we stopped actually inverting the condition, and
instead relied on the "invert_cond" flag. But we missed a few places
where this flag should've been handled too.
Also, add a few more tests to make sure this won't regress in the future.
Fixes: 99a7a664 ("nir/loop_analyze: Change invert_cond instead of changing the condition")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10012
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26024>
In the linear allocation only the parent (context) can be used
to allocate new children, so let's use an opaque type to identify
the linear context. This is similar to what's done in GC allocator.
Update the documentation and a couple of function names to
refer to linear context instead of linear parent.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25280>
None of the callsites took advantage of this, so remove
the feature. This will help to a next change that will
add an opaque type to represent a linear parent.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25280>
The nir_op_is_vec() helper I added in 842338e2f0 ("nir: Add a
nir_op_is_vec helper") treats nir_op_mov as a vec even though the
semanitcs of the two are different. In retrospect, this was a mistake
as the previous three fixup commits show. This commit splits the helper
into two: nir_op_is_vec() and nir_op_is_vec_or_mov() and uses the
appropriate helper at each call site. Hopefully, this rename will
encurage any future users of these helpers to think about nir_op_mov as
separate from nir_op_vecN and we can avoid these bugs.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24704>
Instead, we replace every use of it with nir_def. Most of this commit
was generated by sed:
sed -i -e 's/dest.ssa/def/g' src/**/*.h src/**/*.c src/**/*.cpp
A few manual fixups were required in lima and the nir_legacy code.
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24674>
Instead, we replace it directly with nir_def. We could replace it with
nir_dest but the next commit gets rid of that so this avoids unnecessary
churn. Most of this commit was generated by sed:
sed -i -e 's/dest.dest.ssa/def/g' src/**/*.h src/**/*.c src/**/*.cpp
There were a few manual fixups required in the nir_legacy.c and
nir_from_ssa.c as nir_legacy_reg and nir_parallel_copy_entry both have a
similar pattern.
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24674>