Most integer immediates are only 8-bit, but load/store instructions allow their
immediate offsets to be 16-bit instead. Take advantage of this in the optimizer.
This eliminates 36% of the instructions in
dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.36, a fitting
percentage.
Insignificant effect on dEQP-GLES31.functional.ssbo.* performance... Only a
small % of our compile-time pie is actually spent in the backend anyway (as
opposed to NIR passes or GLSL IR).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21430>
This avoids creating silly preambles that don't actually do anything except push
a constant that we could've inlined for cheaper anyway, since nir_opt_preamble's
cost model is sensitive to e.g. constant folding.
This avoids a pointless preamble in split-hell.
As a nice bonus, this also improves compile-time on address-heavy shaders. With
a release build, CPU time in dEQP-GLES31.functional.ssbo.* reduces from 12.87s
to 10.77... a 16% improvement is nothing to sneeze at.
shader-db results are mostly noise.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21430>
Useful both for ruling out issues with shader preambles as well as (in some
cases) making for a nicer reading experience of the compiled assembly.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21430>
It doesn't make sense, they're basically little compute kernel environments.
Noticed while debugging dEQP-GLES31.functional.fbo.no_attachments.multisample.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21710>
Sample index and layer index are both 16-bits, even though they are zero
extended for compiler simplicity in some cases. In particular this means that 2D
MSAA arrays consume 6 half-regs for their coordinates, not 8. This is what the
IR translation (actually agx_nir_lower_texture) produces, we just need to fix
the calculation in agx_read_registers to agree.
Fixes validation failure in tests like
dEQP-GLES31.functional.texture.multisample.samples_4.use_texture_color_2d_array
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21708>
This gives our shifts SM5 behaviour at the cost of a little extra ALU. That way,
we match NIR's shifts.
This fixes unsoundness of GLSL expressions like "a << (b & 31)", where the &
would mistakenly get optimized away.
Closes: #8181
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reported-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21673>
These just don't seem to work. macOS falls back to eMRT here...
dEQP-GLES3.functional.draw_buffers_indexed.random.max_implementation_draw_buffers.13
from Fail -> Crash. Proper solution will come when we implement eMRT later on.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21705>
Some things were missed (like winsys) and there's still some bad include
orders lying around and some other randomness.
We should set up CI checks for this soon... ^^;;
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21687>
I'm not sure if this was always broken downstream or just got dropped at
some point, but it's definitely UAPI-agnostic and missing now that we
have all the non-UAPI bits upstream.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21677>
We expect to forward GPU fault information to userspace. Since Mesa can
get that information, we can look up the fault address to log what was
the containing or nearest BO. Add a helper for that, so it can be called
from the driver.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21662>
With macOS support out of the way, we can start implementing a lot of
the Linux driver interface and bookkeeping without actually adding the
UAPI proper. Let's do that to reduce the size of the UAPI patchset.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21662>
Nothing implemented, but this lets us get the batch tracking bits in,
including explicit sync/DMA-BUF integration which uses generic ioctls.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21662>
This might be useful in the future, but it is best reimplemented in
terms of the upcoming Linux UAPI instead of having parallel codepaths.
Let's drop it.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21662>
G13 does not support sampler descriptor LOD biasing, so this needs to be lowered
to shader code for APIs that require this functionality. Add an option to do
this lowering while doing our other backend texture lowerings. This generates
lod_bias_agx texture instructions which the driver is expected to lower
according to its binding model.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21276>
Hoping that I didn't miss any, this *should* add assertions
to all functions and passes which explicitly handle 'nir_loop'.
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13962>
Blend states can require masking colour. Currently, this is handled by
nir_lower_blend, which lowers masks to a read-modify-write operation as required
on Mali hardware. However, our "tilebuffer store" instruction supports a write
mask, allowing us to write only a subset of channels to the tilebuffer. It's
more efficient to use that than to emit pointless tilebuffer loads.
Note that even without tilebuffer loads, non-opaque masks don't work with opaque
pass types. Here, we handle this with a translucent pass type, which gets HSR
to do the right thing and is consistent with the pass type used previously.
However, it's a bit heavy handed -- Apple manages to use an opaque pass type
with masking but with some unknown HSR fields twiddled. IMO reverse-engineering
those details shouldn't block this because this gets us closer to optimal (just
not all the way there) and is strictly better than what we had before.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21431>
Masked stores may result in undefs after optimization. Rather than call
lower_undef_to_zero late (but get no benefit), we may as well handle ourselves
to prepare for proper undef support down the line.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21431>
A combination of control_barrier + memory_barrier but it's always seen with
those. This would be safer with scoped barriers...
Fixes dEQP-GLES31.functional.synchronization.inter_invocation.ssbo
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21326>
Per the hardware requirement. This simplifies instruction selection (it avoids
the need to constant fold u2u16 in the backend).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21326>
Rather than the backend. This way we can handle non-constant offsets as well as
constants with a single code path (with the constant offset code subsumed as a
special case via NIR's constant folding). This nets us dynamic offset support.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21264>
agx_preprocess_nir runs once per shader, whereas agx_optimize_nir runs once per
variant. That means we want to do as much work as possible in agx_preprocess_nir
to make shader variants as cheap as possible to compiler. So, move our standard
suite of lowering and optimizing to the preprocess loop, leaving just a single
(easy) trip through the optimizer for simple variant processing.
Plus, we can remove variables when preprocessing, since we no longer use
variables anywhere. We remove them to reduce the RAM and disk cache footprint of
shader variants.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21104>
We've been using the clip lowering, but it's been broken upstream because of
this artefact from the (non-lowered implementation) sneaking in from downstream.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21104>
ralloc is not thread-safe, so we can't use dev->memctx for allocating
context-specific things without locking. On top of that, we always
need to explicitly clean up pools anyway since we need to unref the BOs,
so there is no point to using a memctx.
And since pools need to be explicitly cleaned up, the meta cache code
needs explicit cleanup, so add that and drop memctx from there too.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21348>