These are different (though related) instructions. I've split them in applegpu,
let's mirror that here. This simplifies the IR a bit.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23480>
The driver needs to lower MSAA (because only it knows the sample count). MSAA
lowering depends on discards getting lowered (in order to get sample masks on the
discards for sample shading to work properly). Discard lowering depends on all
discards emitted. But the driver needs to lower clip planes which generates
discards. To break the circular dependency, we have the driver call the
discard lowering pass itself (in between lowering clip planes and lowering
MSAA). Technically, this is probably a layering violation but it's the least
gross solution I see.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23480>
We already lower discard in NIR when depth/stencil writes are used in the
shader. In this patch, we extend that lowering for when depth/stencil writes are
not used, in which case the discard is lowered to a sample_mask instruction.
This is a step towards multisampling, since the old lowering assumed
single-sample and there's no way to express a sample mask with a standard NIR
discard instructions so we need to lower in NIR anyway for sample shading (i.e.
if a discard_if diverges between samples in a pixel).
This changes the lowering for discard_if to be free of control flow (instead
executing a sample mask instruction unconditionally). This seems to be slightly
faster in SuperTuxKart and slightly slower in Dolphin, but I'm not too worried
right now.
To make this work, we do need some extra lowering to ensure we always execute a
sample_mask instruction, in case a discard_if is buried in other control flow
(as occurs with Dolphin's ubershaders). So that's added too. We need that for
MSAA anyway, so pardon the line count.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23480>
Including indirectly via discard/demote.
Fixes graphical artefacts in Chromium when API sample masks are hooked up, which
will result in fragment programs that do not write colour/depth but do a lone
sample mask write. These need tag writes enabled (according to a trace from
Metal for a case constructed to test this scenario).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23480>
We need to control both sources to implement multisampling properly. The
semantic is something like:
foreach sample in the first mask {
if correspond bit in second bit set {
make sample live
} else {
make sample dead
}
}
But I'm reticent to document more formally until the details are really
understood and properly tested.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23480>
If we read gl_SampleID we need the lowering, even though we don't call into
gather_info to set the bit for us. So set the bit manually.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23480>
Via Coccinelle patches
@@
expression a, b, c;
@@
-nir_channels(b, a, (1 << c) - 1)
+nir_trim_vector(b, a, c)
@@
expression a, b, c;
@@
-nir_channels(b, a, BITFIELD_MASK(c))
+nir_trim_vector(b, a, c)
@@
expression a, b;
@@
-nir_channels(b, a, 3)
+nir_trim_vector(b, a, 2)
@@
expression a, b;
@@
-nir_channels(b, a, 7)
+nir_trim_vector(b, a, 3)
Plus a fixup for pointless trimming an immediate in RADV and radeonsi.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23352>
Via Coccinelle patch:
@@
expression a, b, c;
@@
-a.src = nir_src_for_ssa(b);
-a.src_type = c;
+a = nir_tex_src_for_ssa(c, b);
@@
expression a, b, c;
@@
-a.src_type = c;
-a.src = nir_src_for_ssa(b);
+a = nir_tex_src_for_ssa(c, b);
Plus manual fixups, including...
* a few identity swizzles changed to nir_trim_vector in TTN and prog-to-nir to
fix the Coccinelle-botched formatting, and similarly a pointless nir_channels
* collapsing a now-pointless temp in vtn
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23352>
Now, that the foreach macro list is complete (I hope), let's reformat
drivers that enforce correct formatting in CI.
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23275>
This contains a bugfix: execution scopes are now respected when combining
barriers. Otherwise control barriers can disappear during combining, which is
wrong.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23181>
nir_registers are only supposed to be used temporarily. They may be created by a
producer, but then must be immediately lowered prior to optimizing the produced
shader. They may be created internally by an optimization pass that doesn't want
to deal with phis, but that pass needs to lower them back to phis immediately.
Finally they may be created when going out-of-SSA if a backend chooses, but that
has to happen late.
Regardless, there should be no case where a backend sees a shader that comes in
with nir_registers needing to be lowered. The two frontend producers of
registers (tgsi_to_nir and mesa/st) both call nir_lower_regs_to_ssa to clean up
as they should. Some backend (like intel) already depend on this behaviour.
There's no need for other backends to call nir_lower_regs_to_ssa too.
Drop the pointless calls as a baby step towards replacing nir_register.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23181>
Since 624e799cc3 ("nir: Drop nir_ssa_def::name and nir_register::name"), SSA
defs don't have names, making the name argument unused. Drop it from the
signature and fix the call sites. This was done with the help of the following
Coccinelle semantic patch:
@@
expression A, B, C, D, E;
@@
-nir_ssa_dest_init(A, B, C, D, E);
+nir_ssa_dest_init(A, B, C, D);
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23078>
There are no more producers of legacy atomics so these calls are inert.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23036>
This tests the following matrix:
- Format: RGBA8Unorm, RGBA16Unorm, RGBA32Float
- Samples: 2 or 4
- Layers: 1 or 2
- Width: Interesting values 1..4097
- Height: Interesting values 1..4097
Compression is based on the dimensions (that is, everything that can be
compressed is). This test compares both the total texture size and the
compression metadata offset.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
We need to support up to 16 bytes/sample * 4 samples/pixel = 64 bytes/pixel for
multisampling to work with formats like RGBA32F.
Fixes dEQP-GLES3.functional.fbo.msaa.4_samples.rgba32f
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
For multisampled textures, the decision about whether to compress or not
is based on the effective width and height in samples, not pixels.
Introduce ail_can_compress() to encode this logic in ail, so the driver
can use it to decide whether to compress or not before the full layout
is determined.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
BOs can be written from several contexts, so writing to this member is
racy. We only care about this for the purposes of exporting BOs after a
submission (and if the app is racing writers/submissions at that point
all bets are off), so just keeping track of the last written value is
sufficient.
Switch to atomic operations to eliminate the race, and drop the assert
in the batch cleanup path that no longer holds when the BO might have
been written to from another context.
Fixes: asahi/mesa#20
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
Core and opfifo stuff from the compute helper blob, vm_slot because it
was the only one changing when I poked around yesterday and it hit me
what it was ^^
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
We have an imad instruction and our iadd has a small immediate shift on the
second source. Together, these allow expressing lots of integer multiplies more
efficiently. Add some rules to optimize these now that the backend compiler can
ingest the optimized forms.
Half-register changes are from load_const scheduling changing in some vertex
shaders.
total instructions in shared programs: 1539092 -> 1537949 (-0.07%)
instructions in affected programs: 167896 -> 166753 (-0.68%)
total bytes in shared programs: 10543012 -> 10533866 (-0.09%)
bytes in affected programs: 1218068 -> 1208922 (-0.75%)
total halfregs in shared programs: 483180 -> 483448 (0.06%)
halfregs in affected programs: 1942 -> 2210 (13.80%)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22695>
This is just a sanity check, I haven't actually hit this case but if we
ever do something is very broken (e.g. BO refcounting bug).
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
RA asserts this, but by then if you've messed it up, the failure is inscrutable.
Let's check it in the validator for more pleasant debugging.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
We were being sloppy with the sizes before. It mostly worked out, but there were
some corner cases where we would end up with mixed sized collects and that won't
end well for us. Let's rework the logic to make all the sizes explicit in NIR --
32-bit for depth and 16-bit stencil -- and then do the needed promotions to make
it happen in the AGX IR side.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
In case .x isn't read, it'll be null which has the wrong size and will fail
the validation added later in this series. We fix this by padding with sized
undefs (something that exists of defined size but undefined value) rather than
nothingness (of undefined size).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
Sometimes a shader might index with a non-power-of-two stride. For example, if
it's indexing into an array of structures where the structure size is not a
power of two, we'll get a multiply with a constant as opposed to a shift. We
want to handle these cases, too. To do so, we generalize our pattern matching to
look for any kind of multiply (with our new helper), rather than hardcoding
logic for ishl. This eliminates right-shifts in a pile of compute shaders, which
makes me happy from a "I read lots of shader assembly when debugging"
perspective.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
Currently, we hardcode logic in the addressing chasing code to look for ishl
instructions that shift by constants. We can generalize this to looking for
integer multiplies by constants to optimize more addressing patterns. Add a
helper to do so.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
This is totally pointless. This saves some waits at the ends of compute kernels
(waiting for stores to complete before terminating the thread). I don't know
how much this would matter for performance, since the hardware may have to do
these waits internally, but it makes the generated code less silly which is
always nice.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
Comparison with PowerVR's XML shows that this is the actual name... And it needs
to be set a bit more carefully than "no colour output" in order to get correct
behaviour for depth-only passes that use sample mask / discard. Fix the name
first, the extra conditions will come when they're needed for multisampling.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
When possible. Only occassionally possible because the loads are pretty limited
in the addressing arithmetic. This probably doesn't matter for performance but
it saves some noise in dEQP tests which makes for nicer debugging, plenty of
optimizations end up worth it for that alone.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>