Don't use glsl_get_explicit_stride as it may return 0 for vector types,
use nir_deref_instr_array_stride instead.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31460>
Otherwise, we could end up with phis of derefs.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Fixes: 6b4b044739 ("nir/opt_loop: add loop peeling optimization")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31324>
If ftrunc@64 is lowered by nir_lower_doubles it is turned into a
comparable long series of 32 bit operations. If the hardware
supports ffract@64 then nir_opt_algebraic can first lower ftrunc@64
to use some combinations with ffloor@64. They can then be turned
into a combination of fsub@64 and ffract@64 resulting in less
all-over instructions.
Fixes: 5218cff34b
nir/algebraic: avoid double lowering of some fp64 operations
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29281>
If the stride we're adding to our loop counter is larger than the total
amount of shared local memory we're trying to initialize, we know the
loop will run at most one time. So we can skip emitting a loop.
Loop unrolling appears to be unable to detect this currently.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31312>
This is what callers actually want, and it simplifies nir_opt_remove_phis
because we can assume dominance meta data is valid.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31031>
The problem with radeonsi+ACO is that UBO loads from vec4 uniforms using
only 1 component always load all 4 components. This fixes that.
We are only interested in shrinking UBO and SSBO loads, but I added more
intrinsics because why not.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29384>
This will be use by the glsl nir linker when we are combining
different shaders from the same shader stage that might have multiple
declarations of global variables across the different shaders.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31137>
NIR doesn't really support global instructions such as global val
initilisation. So here we add functionality to glsl_to_nir() to
put these instructions into a temporary function that will be
later inlined into main.
We give the function a name starting with gl_mesa_tmp_ as functions
starting with gl_ are reserved and will not have any clashes with
user functions, we finish the name with the blake3 of the shader
source to avoid conflicts with multiple shaders attached to a single
stage.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31137>
On Mali(Valhall), the bounds checking can be done when in hardware, but
for this to work properly, we need to pass the offset to the
nir_load_ssbo_address() intrinsic.
Add an offset source to the intrinsic, and adjust the lowering pass
to conditionally lower the offset addition.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31164>
On Mali(Valhall), we have a way to load SSBO data without going through
an SSBO index -> global address translation, so let's provide a way
to tell nir_lower_ssbo() when it shouldn't lower loads.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31164>
Inspired by a commit message in !30934, I set about optimizing the code
generated for nir_copysign. It would be possible to just implement an
opt_algebraic pattern for the specific values used by nir_copysign, but
this casts a slightly larger net.
As noted in a comment in the code, there may be variations of the
pattern that this pass misses. The opt_algebraic pattern would miss them
too.
v2: Use nir_def_replace. Suggested by Alyssa. Allow more "root"
instruction types. Suggested by Georg.
v3: Treat extract_u16(x, 0) as (x & 0x0000ffff), and treat extract_u8(x,
0) as (x & 0x000000ff).
v4: Use nir_scalar. Suggested by Georg.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31006>
The XCOM 2 shaders in my shader-db use iadd instead of ior.
No fossil-db changes on any Intel platform.
shader-db:
All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19787210 -> 19787034 (<.01%)
instructions in affected programs: 1187 -> 1011 (-14.83%)
helped: 6 / HURT: 0
total cycles in shared programs: 906024436 -> 906012612 (<.01%)
cycles in affected programs: 72978 -> 61154 (-16.20%)
helped: 6 / HURT: 0
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31006>
Doing that optimization wouldn't do anything useful in this case.
nir_block_has_non_copy() is used by opt_loop_peel_initial_break().
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31002>
Any kind of jump prevents us from moving it to the top of the loop, not
just breaks.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 6b4b044739 ("nir/opt_loop: add loop peeling optimization")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31002>
If this nir_if contains continues or other breaks, we can't move it
outside the loop.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 6b4b044739 ("nir/opt_loop: add loop peeling optimization")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31002>
Inverse_ballot result is undefined if the input is not dynamically uniform.
And sinking out of loops might make the input divergent.
Fixes: 18a0ff137f ("nir: sink/move inverse_ballot like moves")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30906>
Same reason as for load_ubo.
Fixes: d199d65c3a ("nir/nir_opt_move,sink: Include load_ubo_vec4 as a load_ubo instr.")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30906>
Prevent the accidental passing of 0 components or bits, as it makes no sense.
Cc: mesa-stable
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Suggested-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: David Heidelberg <david@ixit.cz>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31103>
this existed due to the min/max, per the comment. now that we don't do min/max,
the whole routine is NaN correct so the fixup is pointless.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Suggested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30934>
everybody has abs on fmul, not everyone has abs on bcsel. should help agx and
bifrost.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30934>
this does two things:
* ignores sign of negative numbers which let us play fast and loose later in th
series
* avoids an expensive fsign instruction in favour of a cheap bitwise op
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30934>
worse in terms of NIR instruction count but lets the fabs fold easier. (on agx,
which has fabs on comparisons and fmul but not on bcsel. should be no worse if
ISA has fabs on all 3.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30934>
just implement what the comment says, don't be clever. the clever thing is worse
on all architectures i'm familiar with, because the fdiv will turn into
fmul+frcp.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30934>