Every nir_ssa_def is part of a chain of uses, implemented with doubly linked
lists. That means each requires 2 * 64-bit = 16 bytes per def, which is
memory intensive. Together they require 32 bytes per def. Not cool.
To cut that memory use in half, we can combine the two linked lists into a
single use list that contains both regular instruction uses and if-uses. To do
this, we augment the nir_src with a boolean "is_if", and reimplement the
abstract if-uses operations on top of that list. That boolean should fit into
the padding already in nir_src so should not actually affect memory use, and in
the future we sneak it into the bottom bit of a pointer.
However, this creates a new inefficiency: now iterating over regular uses
separate from if-uses is (nominally) more expensive. It turns out virtually
every caller of nir_foreach_if_use(_safe) also calls nir_foreach_use(_safe)
immediately before, so we rewrite most of the callers to instead call a new
single `nir_foreach_use_including_if(_safe)` which predicates the logic based on
`src->is_if`. This should mitigate the performance difference.
There's a bit of churn, but this is largely a mechanical set of changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22343>
Zink will now handle flat interpolation correctly when line loops
are generated from primitives.
The flat shading information is passed to the emulation gs using constant
uniforms which get inlined.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21238>
`nir_create_passthrough_gs` now allows the user to force the generated GS
to always output a line strip from the primitive
regardless of whether edgeflags are present.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21238>
`nir_create_passthrough_gs` will now take a boolean argument to decide
whether it needs to handle edgeflags.
When true is passed it will output a line strip where edges that
shouldn't be visible are not emitted.
This is usefull because geometry shaders will generally throw away
edgeflags so for a passthrough GS to act transparently it needs to emulate them.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21238>
`nir_create_passthrough_gs` has been changed to take the type of primitive
as opposed to the number of vertices as an argument.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21238>
Fossil-db results:
All Intel platforms had similar results. (Ice Lake shown)
Cycles in all programs: 9098346105 -> 9098333765 (-0.0%)
Cycles helped: 6
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
Unlike ufind_msb, ifind_msb is only defined in NIR for 32-bit values, so
no @32 annotation is required.
No shader-db or fossil-db changes on any Intel platform.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
For GLSL, we want to optimize code like
memoryBarrierBuffer();
controlBarrier();
into a single scoped_barrier intrinsic for the backend to consume. Now that
backends can get scoped_barriers everywhere, what's left is enabling backends to
combine these barriers together. We already have an Intel-specific pass for
combining memory barriers; it just needs a teensy bit of generalization to allow
combining all sorts of barriers together.
This avoids code quality regression on Asahi when switching to purely scoped
barriers. It's probably useful for other backends too.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21661>
Some backends can handle a constant texture index or a dynamic texture index but
not a constant texture index plus a dynamic texture offset. Add a nir_lower_tex
option to lower to one of these options.
v2: Use more straightforward code proposed by Faith.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21546>
It can sometimes be useful to also print the shaders that are marked as
internal, so let's add a flag that lets us do that.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21681>
This NIR pass lowers stores in fragment shaders to:
if (!gl_HelperInvocaton) {
store();
}
This implements the API requirement that helper invocations do not have visible
side effects, and the lowering is required on any hardware that cannot directly
mask helper invocation's side effects. The pass was originally written for
Midgard (which has this issue) but is also needed for Asahi. Let's share the
code, and fix it while we're at it.
Changes from the Midgard pass:
1. Add an option to only lower atomics.
AGX hardware can mask helper invocations for "plain" stores but not for
atomics. Accordingly, the AGX compiler wants this lowering for atomics but
not store_global. By contrast, Midgard cannot mask any stores and needs the
lowering for all store intrinsics. Add an option to the common pass to
accommodate both cases.
This is an optimization for AGX. It is not required for correctness, this
lowering is always legal.
2. Fix dominance issues.
It's invalid to have NIR like
if ... {
ssa_1 = ...
}
foo ssa_1
Instead we need to rewrite as
if ... {
ssa_1 = ...
} else {
ssa_2 = undef
}
ssa_3 = phi ssa_1, ssa_2
foo ssa_3
By default, neither nir_validate nor the backends check this, so this doesn't
currently fix a (known) real bug. But it's still invalid and fails validation
with NIR_DEBUG=validate_ssa_dominance.
Fix this in lower_helper_writes for intrinsics that return data (atomics).
3. Assert that the pass is run only for fragment shaders. This encourages
backends to be judicious about which passes they call instead of just
throwing everything in a giant lower everything spaghetti.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21413>
Add a new texture opcode that returns the LOD bias of the sampler. This will be
used on AGX to lower sampler LOD bias to txb and friends. This needs to be a
texture op (and not a new intrinsic) to handle both bindless and bindful
samplers across GL and Vulkan in a uniform way.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21276>
This pass lowers Loop Continue Constructs to the previous solution
by inserting it at the beginning of the loop:
loop {
if (i != 0) {
continue construct
}
loop body
}
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13962>
The added continue_list corresponds to the SPIR-V
Continue Construct and serves as a converged control-flow
construct and is executed after each continue statement
and before the next iteration of the loop body.
Also adds validation rules for loops with Continue Construct
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13962>
This is based on brw_nir_lower_mem_access_bit_sizes() but ended up being
substantially different. While the core concepts are all the same, the
brw_* version made a lot of Intel-specific assumptions. The new version
takes a callback which takes a number of bytes of data and an alignment
pair and returns a bit size and number of components to load/store.
Reviewed-by: M Henning <drawoc@darkrefraction.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21232>
While making the function public, rename it to
nir_collect_src_uniforms. The old name makes it sound like it's just a
query that doesn't have side effects. That is, however, not the case.
This is step 4 in an attempt to unify a bunch of nir_inline_uniforms.c
and lvp_inline_uniforms.c code.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21179>
Add a second NIR pass for lowering point/texture coordinate replacement (i.e.
point sprites). Why a second one? The current pass works on derefs/variables,
which is good for drivers that don't lower I/O at all (like Zink, where the pass
originates). However, it is problematic for hardware drivers: the inputs to this
pass depend on the shader key, so we want to run the pass as late as possible to
minimize the cost of building/compiling the associated shader variants. In
particular, we need to be able to lower point sprites after lowering I/O if we
would like to lower I/O when preprocessing NIR.
The logic for early lowering and late lowering is considerably different (the
late lowering is a lot simpler), so I've split this out into a second pass
rather than trying to weld them together into one.
This pass will be used on Asahi, which currently uses the early pass. It may be
useful for other drivers as well. (Actually, it's been shipping on Asahi for a
little while now, just hasn't been sent upstream yet.)
Tested with Neverball.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Acked-by: Asahi Lina <lina@asahilina.net>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21065>
This can be used by multiple drivers that do not support ms images
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Reviewer-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Amber Amber <amber@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20813>
Same as pack_half_2x16_rtz_split, but always uses RTZ mode.
Note that pack_half_2x16 rounding mode is unspecified.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15838>
Some drivers may encode constant offsets in the instruction, so
make it possible for the drivers to request lowering the atomic
uniform offset into the range_base variable of the intrinsic.
v2: drop patch to use build-in array offset evaluation, it makes
problems with zink, and update the code accordingly
v3: always initialize range base
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19980>
Add the intrinsic range_base value to the image intrinsics and add
the option to store the image array offset into range_base instead
of adding it to the image array index if the driver requests it.
v2: Always initialize range_base
v3: fix for bindless intrinsics
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19980>
This is heavily copy-pasted from a patch of Ian Romanick, including the
commit message.
Previously, this pass always generated fcsel for bcsel. This was the
only place that generate fcsel, so various drivers assumed (and needed!)
that src0 was a Boolean with 0.0 or 1.0 as the only values.
Specifically, many DX9 / GL_ARB_vertex_program platforms lack a CMP
instruction in vertex shaders. In those cases, they would use LRP to
implement fcsel. The bummer is that many plaforms have a real fcsel
instruction, and those platforms would benefit from other places
generating that opcode.
Instead of leaving assumptions in drivers about the sources of an opcode
that they can't really support, allow them to control the way the
lowering pass translates bcsel. Two flags are used to control this:
- If the driver sets has_fused_comp_and_csel in nir_options, fcsel_gt
will be used. Since the Boolean value is 0.0 or 1.0, this is
equivalent to fcsel.
- If the parameter has_fcsel_ne is set, fcsel will be used. This is the
old path.
- Otherwise, the lowering pass assumes we're on a crufty, old DX9 vertex
program, and it emits flrp.
With this, the assumptions about src0 of fcsel in NTT can be removed.
If a platform can't handle fcsel, it should ensure that the lowering
pass won't generate it.
No change in shader-db.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20162>
Some HW may be able to fold only some of dst types, e.g.
for Adreno folding i32 -> i16 could cause a different result since
folded variant clamps the result instead of masking it.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20396>
nir_lower_is_helper_invocation lowers intrinsic_is_helper_invocation
and uses load_helper_invocation (which is lowered by nir_lower_system_values).
While nir_lower_system_values may lower SYSTEM_VALUE_HELPER_INVOCATION
into intrinsic_is_helper_invocation.
So they depend on each other. Break the dependency by making
nir_lower_is_helper_invocation aware of lower_helper_invocation option
and emitting lowered load_helper_invocation when required.
Happens with SPIR-V 1.6 for which gl_HelperInvocation is translated into
"BuiltIn HelperInvocation" + "Volatile", which nir_lower_system_values
translates into is_helper_invocation.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19677>
Like lower_to_fragment_fetch_amd option in lower tex,
this is for radeonsi to lower MS image ops.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18666>
Like nir_texop_fragment_mask_fetch_amd, this is used to load multi
sample image fmask data for AMD GPU.
We will lower multi sample image load and samples_identical intrinsics
to use it latter for radeonsi. RADV does not need this because it
always expand fmask images before dispatch compute shader.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18666>
We need this, because on Intel task payload starts with private header,
followed by user-accessible data.
Fixes: 37e78803d7 ("intel/compiler: use nir_lower_task_shader pass")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19409>
Based on nir_create_passthrough_tcs and d3d12_make_passthrough_gs, this
creates a passthrough geometry shader that can be used by drivers that
needs to emulate some graphics features in the geometry shader.
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19987>
We cannot fully use the vectorizer outside of this pass because once
stack load/store operations have been lower to global load/store, the
robustness rule applies to those as they would to application
load/store.
But this is all internal and we know it doesn't require out of bound
checking. So doing the vectorizing here is the best solution. We just
have to teach the vectorizer about our intrinsics.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20058>
We can determice scopes/ranges of the use of ray queries and use this information to combine ray queries.
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16593>