The DXIL backend would like to distinguish casts to 16-bit that must be
performed from those that merely may be. If a shader only ever produces
16-bit types from mediump casts and ALU ops on those values, then
the resulting shader can be annotated with DXIL's min-precision
qualifier, basically telling the driver to use 16-bit precision if
it's faster for them. If it uses concrete 16-bit casts, or loads/
stores to externally-visible memory, then it must use the "native"
16-bit flag, which is not supported on all hardware.
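For illustration, NIR already distinguishes the two flavors at the opcode
level; a minimal nir_builder sketch (f2fmp being the "may" cast produced by
mediump lowering, f2f16 the concrete one):

   static void
   emit_both_casts(nir_builder *b, nir_ssa_def *x)
   {
      /* mediump-style cast: the driver may keep 32-bit precision, so
       * this can become a DXIL min-precision annotation */
      nir_ssa_def *may = nir_f2fmp(b, x);

      /* concrete cast: the result must really be 16-bit, which
       * requires the native 16-bit flag */
      nir_ssa_def *must = nir_f2f16(b, x);

      (void)may;
      (void)must;
   }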
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23344>
Intel HW has multiple ways to access resources like UBO/SSBO/images:
- binding tables: a small heap of ~240 surfaces
- bindless surfaces: a 64MB heap of surfaces up to Gfx12, 4GB on Gfx12.5+
- surfaces: a 4GB heap on Gfx12.5+ (mostly unused at the moment,
only available through the LSC)
For samplers, we have 2 options since Gfx11:
- samplers indexed from the Dynamic State Heap (4GB)
- samplers indexed from the Bindless Sampler Heap (4GB)
Additionally our whole push constant promotion mechanism is based
around binding table indices. This is problematic if you want to also
promote to push constants things that would be accessed through the
bindless heap.
To solve this issue, we introduce a new intrinsic that will carry a
block index that is based on neither the binding table index nor the
bindless table offset.
We will also use this intrinsic to identify whether the buffer/surface
index in load_ubo/load_ssbo/store_ssbo/etc... is relative to the
binding table or the bindless heap.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
Some instructions we would like to keep around because they carry
additional information in their indices.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
Hardware that lacks dedicated image atomics can still implement image atomics
with regular atomics on global memory, as long as there is a way to get the
address of a texel in memory. I've open-coded this lowering in my first 2
compilers, so before I add another crappy vendored version in my 3rd, let's add
a common NIR pass to do the lowering.
Thanks to unified atomics, the pass itself is fairly concise.
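The shape of the lowering, as a rough sketch (texel_address() is a
hypothetical helper standing in for whatever intrinsic computes the texel's
address in memory, and the generated builder signatures are assumed; the
real pass differs in the details):

   static nir_ssa_def *
   lower_image_atomic(nir_builder *b, nir_intrinsic_instr *intr)
   {
      /* hypothetical: 64-bit address of the texel's backing storage */
      nir_ssa_def *addr = texel_address(b, intr);

      /* unified atomics: one opcode index instead of O(mn) intrinsics */
      nir_atomic_op op = nir_intrinsic_atomic_op(intr);
      unsigned bit_size = intr->dest.ssa.bit_size;

      if (intr->intrinsic == nir_intrinsic_image_atomic_swap)
         return nir_global_atomic_swap(b, bit_size, addr, intr->src[3].ssa,
                                       intr->src[4].ssa, .atomic_op = op);

      return nir_global_atomic(b, bit_size, addr, intr->src[3].ssa,
                               .atomic_op = op);
   }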
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23120>
We're about to need this in another place, so let's move it to common
nir code, and clean up the name a bit.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22755>
If nobody has added def-use lists for registers in all this time, it's probably
because we don't want them after all ;)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23107>
We'd like to postpone most int64 lowering until pretty late in the
process, because e.g. turning iadd@64 into (unpack + add-low + add-high
+ compare + b2i32 + repack) sequences makes it difficult for many
optimization passes to detect basic arithmetic patterns. In particular,
nir_opt_load_store_vectorizer becomes unable to handle basic offset math
on 64-bit addresses.
We'd like to do double precision lowering earlier in the process,
however. One snag is that nir_lower_int64's lower_2f and lower_f2 can
produce operations that may need lowering by nir_lower_doubles(), so
it's crucial to run those sets of lowering together.
To handle this, we make a new entrypoint that does nir_lower_int64
but skips everything except float conversions. Note that the newly
produced instructions will still be lowered according to the full set
of int64 lowering options; this shouldn't be a huge deal.
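Roughly, the intended ordering becomes something like this (the entrypoint
name and the surrounding softfp64/fp64_options variables are assumptions for
illustration):

   /* Lower int64 <-> float conversions early, together with fp64
    * lowering, since lower_2f/lower_f2 can emit double-precision ops. */
   NIR_PASS_V(nir, nir_lower_int64_float_conversions);
   NIR_PASS_V(nir, nir_lower_doubles, softfp64, fp64_options);

   /* ... optimize while 64-bit address math is still visible ... */

   /* Only now lower the remaining int64 arithmetic. */
   NIR_PASS_V(nir, nir_lower_int64);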
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23064>
Similar to nir_ssa_dest_init, but with fewer call sites to churn through.
This was done with the help of Coccinelle:
@@
expression A, B, C, D;
@@
-nir_ssa_dest_init_for_type(A, B, C, D);
+nir_ssa_dest_init_for_type(A, B, C);
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23078>
Since 624e799cc3 ("nir: Drop nir_ssa_def::name and nir_register::name"), SSA
defs don't have names, making the name argument unused. Drop it from the
signature and fix the call sites. This was done with the help of the following
Coccinelle semantic patch:
@@
expression A, B, C, D, E;
@@
-nir_ssa_dest_init(A, B, C, D, E);
+nir_ssa_dest_init(A, B, C, D);
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23078>
This should make writing some lowering/meta code easier. It also keeps
num_inputs/num_outputs updated, which passes sometimes forgot to do (for
example, nir_lower_input_attachments updated the count for only one of the
two vars it creates). The names of the variables change in many cases, but
it's probably nicer to see "VERT_ATTRIB_POS" than "in_0" or whatever.
I've only converted mesa core (compiler and GL), not all the driver meta
code.
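As a hedged sketch, the kind of one-liner this enables (helper name from
this series; exact signature assumed):

   nir_variable *pos =
      nir_create_variable_with_location(nir, nir_var_shader_in,
                                        VERT_ATTRIB_POS, glsl_vec4_type());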
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22809>
In the future, we'd like to have all drivers only ingest unified atomics, and
all frontends only produce unified atomics, and garbage collect the existing
non-unified atomics. To get to that future, it's a lot nicer to convert drivers
one-by-one. Add a pass to translate old-style atomics to new-style atomics so
drivers can opt-in to the new form one-by-one. Once all drivers are converted,
we can convert producers one-by-one. Finally, we can just drop the calls to the
pass and garbage collect this pass and the old atomics. That's probably a while
out, though, so this will be our bridge to get there.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22914>
Currently, we have an atomic intrinsic for each combination of memory type
(global, shared, image, etc) and atomic operation (add, sub, etc). So for m
types of memory supported by the driver and n atomic opcodes, the driver has to
handle O(mn) intrinsics. This makes a total mess in every single backend I've
looked at, without fail.
It would be a lot nicer to unify the intrinsics. There are two obvious ways:
1. Make the memory type a constant index, keep different intrinsics for
different operations. The problem with this is that different memory types
imply different intrinsic signatures (number of sources, etc). For example,
it doesn't make sense to unify global_atomic_amd with global_atomic_2x32:
the first takes 3 scalar sources, the second takes 1 vector and 1 scalar.
Also, in any single backend, there are a lot more operations than there are
memory types.
2. Make the opcode a constant index, keep different intrinsics for different
memory types. This works well, with one exception: compswap and fcompswap
take an extra argument that other atomics don't, so there's an extra axis of
variation for the intrinsic signatures.
So, the solution is to have 2 intrinsics for each memory type -- for atomics
taking 1 argument and atomics taking 2 respectively. Both of these intrinsics
take a nir_atomic_op enum to describe their operation. We don't use a nir_op for
this purpose, as there are some atomics (cmpxchg, inc_wrap, etc) that don't
cleanly map to any ALU op and it would be weird to force it.
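In builder terms, the two flavors end up looking roughly like this (the
generated helper signatures are assumed):

   nir_ssa_def *prev =
      nir_global_atomic(b, 32, addr, value,
                        .atomic_op = nir_atomic_op_iadd);

   nir_ssa_def *old =
      nir_global_atomic_swap(b, 32, addr, compare, value,
                             .atomic_op = nir_atomic_op_cmpxchg);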
The plan is to transition to these new opcodes gradually. This series adds a
lowering pass producing these opcodes from the existing opcodes, so that
backends can opt-in to the new forms one-by-one. Then we can convert backends
separately without any cross-tree flag day. Once everything is converted, we can
convert the producers and core NIR as a flag day, but we have far fewer
producers than backends so this should be fine. Finally we can drop the old
stuff.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22914>
The pattern shows up all the time open-coded. Use the macro instead.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22967>
Serious preprocessor voodoo here. There are two tricks.
1. Iterating only phis. We know that phis come only at the beginning of a block,
so all over the tree, we open-code iteration like:

   nir_foreach_instr(instr, block) {
      if (instr->type != phi)
         break;

      /* do stuff */
   }
We can express this equivalently as

   nir_foreach_instr(instr, block)
      if (instr->type != phi)
         break;
      else {
         /* do stuff */
      }
So, we can define a macro
#define nir_foreach_phi(instr, block)
if (instr->type != phi)
break;
else
and then

   nir_foreach_phi(..)
      statement;

and

   nir_foreach_phi(..) {
      ...
   }

will expand to the right thing.
2. Automatically getting the phi as a phi. We want the instruction to go to some
hidden variable, and then automatically insert "nir_phi_instr *phi =
nir_instr_as_phi(instr_internal);". We can't do that directly, since we need to
express the assignment implicitly in the control flow for the above trick to
work. But we can do it indirectly with a loop initializer:

   for (nir_phi_instr *phi = nir_instr_as_phi(instr_internal); ...)
That loop needs to break after exactly one iteration. We know that phi
will always be non-null on its first iteration, since the original
instruction is non-null, so we can use phi==NULL as a sentinel and express a
one-iteration loop as for (phi = nonnull; phi != NULL; phi = NULL).
Putting these together gives the macros as implemented.
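Concretely, the combined macro comes out roughly like this (the internal
variable naming differs in the real nir.h):

   #define nir_foreach_phi(phi, block)                        \
      nir_foreach_instr(phi##_internal, block)                \
         if (phi##_internal->type != nir_instr_type_phi)      \
            break;                                            \
         else                                                 \
            for (nir_phi_instr *phi =                         \
                    nir_instr_as_phi(phi##_internal);         \
                 phi != NULL; phi = NULL)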
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22967>
We have a few ALU instructions that take a constant source. Technically, they
have a swizzle so you can't just nir_src_as_uint them, even though a bunch of
backends do. To help backends do the right thing, add a helper that's just as
easy to use that will chase the swizzle properly.
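In terms of existing NIR accessors, the helper amounts to something like this
sketch (the added helper's actual name may differ):

   static uint64_t
   alu_src_as_uint(const nir_alu_instr *alu, unsigned src_idx)
   {
      const nir_alu_src *src = &alu->src[src_idx];
      nir_const_value *cv = nir_src_as_const_value(src->src);
      assert(cv != NULL && "source must be constant");

      /* chase the swizzle instead of blindly reading component 0 */
      unsigned comp = src->swizzle[0];
      return nir_const_value_as_uint(cv[comp], nir_src_bit_size(src->src));
   }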
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22695>
When rendering a scaled tile, we need to use the original, hardware
FragCoord when accessing input attachments that are on-tile (i.e. were
rendered to in a previous subpass) because they are also scaled in the
same way that FragCoord is scaled. For input attachments that aren't
already on-tile, however, we need to use the fixed gl_FragCoord. Add a
new intrinsic and a bitfield of input attachments which should use it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20304>
It is undefined behavior when an ARB assembly or shadow2D GLSL function
uses the SHADOW2D target with a texture that is not in a depth format.
In this case AMD and NVIDIA automatically replace the SHADOW sampler
with a normal sampler, so games like Penumbra Overture which abuse
this UB work fine there but break with mesa.
Replace the shadow sampler with a normal one here by recompiling
the ARB program variant.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8425
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Illia Polishchuk <illia.a.polishchuk@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22147>
Previously the passthrough gs shader loaded some values with uniform
loads using several hardcoded values.
This was not flexible for other drivers and started becoming too
inflexible for zink itself.
Use system values instead and add a lowering pass in zink.
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22667>
If we know the next shader stage, we can tell whether an output such as
POS is a sysval. For example, POS is not a sysval output if the next
stage is not FS.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21861>
Like nir_instr_rewrite_ssa but without the asserted extra argument. Works on ifs
too, now that we have a unified use list.
We do need to assert that the source has actually been inserted and has valid
use/def chains. Previously, asserting on the parent instruction accomplished
that indirectly. For the more general helper, we instead directly assert that
there exists a non-null parent, whatever it is.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22343>
We can now determine whether a nir_src is for an if without a sideband, so
simplify the function signature.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Suggested-by: Faith Ekstrand <faith@gfxstrand.net>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22343>
Every nir_ssa_def carries two chains of uses (regular uses and if-uses),
each implemented as a doubly linked list. Each list requires 2 * 64-bit =
16 bytes per def, which is memory intensive. Together they require 32 bytes
per def. Not cool.
To cut that memory use in half, we can combine the two linked lists into a
single use list that contains both regular instruction uses and if-uses. To do
this, we augment the nir_src with a boolean "is_if", and reimplement the
abstract if-uses operations on top of that list. That boolean should fit into
the padding already in nir_src so it should not actually affect memory use,
and in the future we could sneak it into the bottom bit of a pointer.
However, this creates a new inefficiency: now iterating over regular uses
separate from if-uses is (nominally) more expensive. It turns out virtually
every caller of nir_foreach_if_use(_safe) also calls nir_foreach_use(_safe)
immediately before, so we rewrite most of the callers to instead call a new
single `nir_foreach_use_including_if(_safe)` which predicates the logic based on
`src->is_if`. This should mitigate the performance difference.
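The combined iteration then looks like this sketch:

   nir_foreach_use_including_if(src, def) {
      if (src->is_if) {
         /* use as an if-condition: src->parent_if */
      } else {
         /* regular use: src->parent_instr */
      }
   }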
There's a bit of churn, but this is largely a mechanical set of changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22343>
Zink will now handle flat interpolation correctly when line loops
are generated from primitives.
The flat shading information is passed to the emulation gs using constant
uniforms which get inlined.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21238>
`nir_create_passthrough_gs` now allows the user to force the generated GS
to always output a line strip from the primitive
regardless of whether edgeflags are present.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21238>
`nir_create_passthrough_gs` will now take a boolean argument to decide
whether it needs to handle edgeflags.
When true is passed it will output a line strip where edges that
shouldn't be visible are not emitted.
This is useful because geometry shaders generally throw away edgeflags,
so for a passthrough GS to act transparently it needs to emulate them.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21238>
`nir_create_passthrough_gs` has been changed to take the type of primitive
as opposed to the number of vertices as an argument.
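A hedged sketch of a resulting call (parameter order and enum spelling
assumed, matching the additions in this series):

   nir_shader *gs =
      nir_create_passthrough_gs(options, prev_stage_nir,
                                SHADER_PRIM_TRIANGLES,
                                true /* emulate_edgeflags */,
                                false /* force_line_strip_out */);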
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21238>
Fossil-db results:
All Intel platforms had similar results. (Ice Lake shown)
Cycles in all programs: 9098346105 -> 9098333765 (-0.0%)
Cycles helped: 6
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
Unlike ufind_msb, ifind_msb is only defined in NIR for 32-bit values, so
no @32 annotation is required.
No shader-db or fossil-db changes on any Intel platform.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
For GLSL, we want to optimize code like

   memoryBarrierBuffer();
   controlBarrier();

into a single scoped_barrier intrinsic for the backend to consume. Now that
backends can get scoped_barriers everywhere, what's left is enabling backends to
combine these barriers together. We already have an Intel-specific pass for
combining memory barriers; it just needs a teensy bit of generalization to allow
combining all sorts of barriers together.
This avoids code quality regression on Asahi when switching to purely scoped
barriers. It's probably useful for other backends too.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21661>
Some backends can handle a constant texture index or a dynamic texture index but
not a constant texture index plus a dynamic texture offset. Add a nir_lower_tex
option to lower to one of these options.
v2: Use more straightforward code proposed by Faith.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21546>
It can sometimes be useful to also print the shaders that are marked as
internal, so let's add a flag that lets us do that.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21681>
This NIR pass lowers stores in fragment shaders to:

   if (!gl_HelperInvocation) {
      store();
   }
This implements the API requirement that helper invocations do not have visible
side effects, and the lowering is required on any hardware that cannot directly
mask helper invocations' side effects. The pass was originally written for
Midgard (which has this issue) but is also needed for Asahi. Let's share the
code, and fix it while we're at it.
Changes from the Midgard pass:
1. Add an option to only lower atomics.
AGX hardware can mask helper invocations for "plain" stores but not for
atomics. Accordingly, the AGX compiler wants this lowering for atomics but
not store_global. By contrast, Midgard cannot mask any stores and needs the
lowering for all store intrinsics. Add an option to the common pass to
accommodate both cases.
This is an optimization for AGX; it is not required for correctness, as
the lowering is always legal. (A usage sketch follows this list.)
2. Fix dominance issues.
It's invalid to have NIR like

   if ... {
      ssa_1 = ...
   }
   foo ssa_1
Instead we need to rewrite as

   if ... {
      ssa_1 = ...
   } else {
      ssa_2 = undef
   }
   ssa_3 = phi ssa_1, ssa_2
   foo ssa_3
By default, neither nir_validate nor the backends check this, so this doesn't
currently fix a (known) real bug. But it's still invalid and fails validation
with NIR_DEBUG=validate_ssa_dominance.
Fix this in lower_helper_writes for intrinsics that return data (atomics).
3. Assert that the pass is run only for fragment shaders. This encourages
backends to be judicious about which passes they call instead of just
throwing everything into a giant lower-everything spaghetti.
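For reference, a hedged usage sketch of the option from change 1 (the
boolean parameter is assumed):

   /* Midgard: cannot mask any stores, so lower everything */
   NIR_PASS_V(nir, nir_lower_helper_writes, true);

   /* AGX: hardware masks plain stores, so only lower atomics */
   NIR_PASS_V(nir, nir_lower_helper_writes, false);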
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21413>