Without fetch_inactive, these instructions need to return 0 for inactive lanes
and peephole_select changes which instructions are inactive.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30540>
ir3's lowering of variables to scratch memory has to treat 8-bit values as
16-bit ones when comparing such value's size against the given threshold
since those values are handled through 16-bit half-registers. But those
values can still use natural 8-bit size and alignment for storing inside
scratch memory.
nir_lower_vars_to_scratch now accepts two size-and-alignment functions,
one used for calculating the variable size and the other for calculating
the size and alignment needed for storing inside scratch memory. Non-ir3
uses of this pass can just duplicate the currently-used function. ir3
provides a separate variable-size function that special-cases 8-bit types.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29875>
bunch of vendor intrinsics, plus some standard intrinsics used in weird shader
stages.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30488>
on agx (and mali), we predicate atomics on "if (!helper)", so doing so again in
this pass is redundant. and would cause a problem since we'd then have to lower
the "is helper inv?" flag late. so just skip the extra lowering code.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30488>
There is no need to compute it in the shader as the result is known at
runtime already.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Tested-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30467>
nir.h includes:
- "compiler/glsl_types.h" -> inc_src is needed
- "util/u_atomic.h" -> "no_extern_c.h" -> inc_include needed
This makes it possible to use rust's bindgen with only nir.h
as specified include.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30359>
Backward inter-shader code motion can move any code into the previous
shader if it only uses convergent inputs. The problem is the final input
type can end up being integer or FP64, which is incompatible with
the assumption that convergent inputs can always be interpolated.
If such a case occurs and the type is integer or FP64, either don't
do any code motion, or if the driver exposes the new flag, rewrite
convergent loads to use load_input.
If the new flag is supported, all convergent loads are rewritten to use
load_input, and flat varyings are allowed to be classified as convergent,
which means they are packed into interpolated vec4 slots if there are
unused components.
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29895>
I'm not sure if two otherwise equal texture instructions ever have sources
in different orders, but they should be considered equal.
ministat of nir_opt_cse:
N Min Max Median Avg Stddev
x 9 6.586801 6.718673 6.682875 6.6621411 0.047817119
+ 9 6.519098 6.609235 6.552997 6.5605604 0.028879587
Difference at 95.0% confidence
-0.101581 +/- 0.0394755
-1.52475% +/- 0.585928%
(Student's t, pooled s = 0.0395)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30145>
This is faster.
ministat of nir_opt_cse:
N Min Max Median Avg Stddev
x 9 6.724212 6.84511 6.788336 6.7873378 0.034363882
+ 9 6.586801 6.718673 6.682875 6.6621411 0.047817119
Difference at 95.0% confidence
-0.125197 +/- 0.0416115
-1.84456% +/- 0.609248%
(Student's t, pooled s = 0.0416374)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30145>
ministat of nir_opt_cse:
N Min Max Median Avg Stddev
x 9 7.393408 7.490593 7.434056 7.4338972 0.028150325
+ 9 6.724212 6.84511 6.788336 6.7873378 0.034363882
Difference at 95.0% confidence
-0.646559 +/- 0.0313916
-8.69745% +/- 0.407925%
(Student's t, pooled s = 0.0314111)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30145>
In updates (not post at the time of this writing) to !29884, a change
caused many spill and fill regressions shader for OpenGL Tomb
Raider. While looking at that shader, I noticed some odd patterns. I
initially added these patterns to counteract the regressions caused by
the other change, but I had no luck. On Ice Lake... this cuts 99
instructions from the shader.
shader-db:
All Intel platforms had simliar results. (Meteor Lake shown)
total instructions in shared programs: 19732341 -> 19732295 (<.01%)
instructions in affected programs: 1744 -> 1698 (-2.64%)
helped: 1 / HURT: 0
total cycles in shared programs: 916273716 -> 916273068 (<.01%)
cycles in affected programs: 14266 -> 13618 (-4.54%)
helped: 1 / HURT: 0
fossil-db:
All Intel platforms had similar results. (Meteor Lake shown)
Totals:
Instrs: 151519575 -> 151519393 (-0.00%)
Cycle count: 17208402120 -> 17208246858 (-0.00%); split: -0.00%, +0.00%
Totals from 159 (0.03% of 630198) affected shaders:
Instrs: 51970 -> 51788 (-0.35%)
Cycle count: 11474176 -> 11318914 (-1.35%); split: -1.36%, +0.01%
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30158>
fmin(x, 0.0) must at least be le_zero, and fmax(x, 0.0) be at least be
ge_zero.
shader-db:
All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19733226 -> 19731919 (<.01%)
instructions in affected programs: 196415 -> 195108 (-0.67%)
helped: 615 / HURT: 0
total cycles in shared programs: 916277979 -> 916265288 (<.01%)
cycles in affected programs: 2482535 -> 2469844 (-0.51%)
helped: 346 / HURT: 178
LOST: 2
GAINED: 1
fossil-db:
All Intel platforms had similar results. (Meteor Lake shown)
Totals:
Instrs: 151531355 -> 151519575 (-0.01%); split: -0.01%, +0.00%
Cycle count: 17209372399 -> 17208402120 (-0.01%); split: -0.01%, +0.01%
Max live registers: 32016490 -> 32016514 (+0.00%)
Totals from 4307 (0.68% of 630198) affected shaders:
Instrs: 4179418 -> 4167638 (-0.28%); split: -0.28%, +0.00%
Cycle count: 1063492212 -> 1062521933 (-0.09%); split: -0.24%, +0.15%
Max live registers: 359250 -> 359274 (+0.01%)
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30158>
Instead of having a hardcoded list of endian-independent format aliases
in the header, generate them from the format definitions.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29649>
In practice these are equal but the old code was semantically wrong: that
dimension is "sources" not "components". Use the correct #define.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Suggested-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30214>
This simple LICM pass hoists all loop-invariant instructions
from the loops' top-level control flow, skipping any nested CF.
The hoisted instructions are placed right before the loop.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28783>
This implements IEEE-754-2019 signed zero semantics for fmin/fmax, as now
required by NIR, for hardware that has busted signed zero behaviour for
fmin/fmax. Ian expressed interest in this for Intel.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30075>
Ensure the following identities hold to match IEEE-754-2019 and upcoming NIR:
min(-0, +0) = -0
min(+0, -0) = -0
max(-0, +0) = +0
max(+0, -0) = +0
NVK uses this lowering. In a simple compute shader using fmin64 on an SSBO with
signed zero preserve required, testing the effect of this patch, the instruction
count goes from 47->52. Obviously I'm not thrilled by that but I also couldn't
find any obvious way of mitigating the issue. (Maybe NVIDIA has special hardware
support here. By instruction count, lowering all the way to int64 is a loss,
though I don't know how to count cycles on NVIDIA.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30075>
SPIR-V strengthened the semantics around signed zero, requiring fmin(-0, +0) =
-0. Since nir_op_fmin is commutative, we must also require fmin(+0, -0) = -0 to
match, although it's unclear if SPIR-V requires that. We must strengthen NIR's
definitions accordingly.
This strengthening is additionally motivated by the existing nir_opt_algebraic
rule like:
(('fmin', a, ('fneg', a)), ('fneg', ('fabs', a))),
With the strengthened new definition, this transform is clearly exact. With the
weaker definition, the transform could change the sign of zero based on
implementation-defined behaviours which ... while, not exactly unsound, is
undesireable semantically.
...
This is probably technically a bug fix, but I'm not convinced it's worth it's
weight in backporting.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30075>
This is more idiomatic and already #include'd.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30075>
These are helpful now that float_controls2 exists, these are common
patterns worth factoring out into helpers.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30075>
It is more or less just a code move, but I touched
is_only_used_by_iadd(..) to match the style of the other functions in
that file.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30099>
Just add a && to the condition. This is more readable to me.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25590>
Only VK_NV_mesh_shader allows this kind of access, and no driver
advertises that extension anymore.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25590>
We currently assume that the instruction is already inserted and we are
optimizing it away, but in the use case I have where we are hoisting
instructions into a preamble and deduplicating as we go along, that
isn't the case. Move this responsibility onto the caller, which also
makes it a bit clearer what's going on and turns this into something
more similar to an actual set.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29873>
This allows use cases where we copy over expression trees and
deduplicate as we go along. We can use the matching instruction to build
up the rest of the expression tree.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29873>