The main buffer is twiddled as before, but there's now also an auxiliary
compression buffer that we need to reserve space for.
With compression, the main buffer is aligned less. The macOS logic seems to be
to align to the page size only if the texture is both 3D and mipmapped, *and*
the layer stride is greater than the page size.
That's gated on compression being enabled. Page alignment seems to be needed for
uncompressed twiddled cube maps.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19999>
Better occupancy, which is especially important when the background shader
does memory access (for reloads). On my 4K monitor, glmark2 -bdesktop fullscreen
from 95fps to 133fps.
At default settings, glmark2 -bterrain from 63fps to 71fps.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19997>
UBSan complains otherwise:
../src/asahi/compiler/agx_pack.c:701:21: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
../src/asahi/compiler/agx_pack.c:534:18: runtime error: left shift of 8 by 28 places cannot be represented in type 'int'
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19971>
r5 and r6 are always getting lowered. Will prevent a regression with VBO
lowering on a shader which has stride=0 and hence gets the vertex ID read
optimized out with NIR:
dEQP-GLES2.functional.draw.random.50
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19971>
Massive performance gains, some fps before/after numbers from glmark2:
[shading] 1486 -> 2391
[refract] 87 -> 127
[terrain] 32 -> 56
...and it's basically for free with enough copy/paste, so thank you to Boris
Brezillon for an excellent Asahi patch, the LRU cache seems to work great on M1
:-p
There are a few minor changes I made from panfrost, notably adjusting the
constants to account for 16KiB pages and switching from pthread_mutex to
simple_mtx to be less weird in Mesa.
For context on the design, the following commits evolved it in Panfrost and
their commit messages may be useful... The logic in this module is the product
of years of mistakes and correcting course :-)
f06809cdca ("panfrost: Evict the BO cache when allocation fails")
77d0498913 ("panfrost: Fix major flaw in BO cache")
ee82f9f07e ("panfrost: Try to evict unused BOs from the cache")
2225383af8 ("panfrost: Make sure the BO is 'ready' when picked from the cache")
9af4aeaaf7 ("panfrost: Don't return imported/exported BOs to the cache")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19971>
This defeats the point of specifying alignments and of packing allocations
together with the BO cache. We're a real driver now, let's allocate memory like
one.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19971>
We have these native. Passes the relevant piglits. Large reduction in memory
usage on Xonotic on higher settings (8x less memory per texture), which allows
Xonotic to run at high settings without OOMing.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Tested-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19903>
Flag day change to replace the previous hardcoded background/end-of-tile shaders
and the API-style load/store_output in fragment shaders with the generated
shaders and lowered *_agx intrinsics. This gets us working non-UNORM8 render
targets and working MRT. It's also a step in the direction of working MSAA but
that needs a lot more work, since the multisampling programming model on AGX is
quite different from any of the APIs (including Metal).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19871>
With multiple render targets, it's not practical to generate all
variants of the background and end-of-tile programs at start up. Rather
than trying, add a hash table of meta program keys to background
programs, and compile variants as they're needed.
With the new infrastructure, it's sensible to handle clears with the
same code path as reloads. In addition to getting us closer to multiple
render target support, this gets us support for non-RGBA8 render
targets, as the u8norm tilebuffer format was baked into the hardcoded
clear shader and store shaders used before.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19871>
The compiler can't handle load/store_output directly for nontrivial tilebuffer
layouts. Add a NIR pass to lower these intrinsics, applying a given layout.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19871>
This hw instruction writes out an entire block from the tilebuffer to an
attached render target (PBE descriptor). It is used (only?) in end-of-tile
shaders to implement write out. We need to handle it in the compiler as a
prerequisite to compiling end-of-tile shaders ourselves, instead of hardcoding.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19871>
Mutlisampled texture fetch (txf_ms) is encoded like regular txf. However, we now
need to pack the multisample index in the right place, which we do by extending
our existing NIR texture source lowering pass. 2D MS arrays use a new value of
dim which requires tweaking the encoding slightly. Otherwise, everything is
bog standard.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19871>
It appears that multisampled textures on AGX have all samples of the same pixel
contiguous in memory, effectively using the layout of a single-sampled texture
with a larger block size. Handle in ail.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19871>
The track_alloc and track_free symbols are used, we need to link them in.
Depending on build flags / environment / etc, fixes the potential build error
hit by a CI job:
mold: error: undefined symbol: agxdecode_track_alloc
>>> referenced by agx_device.c
>>> src/asahi/lib/libasahi_lib.a(src/asahi/lib/libasahi_lib.a.p/agx_device.c.o):(agx_shmem_alloc)>>> referenced by agx_device.c
>>> src/asahi/lib/libasahi_lib.a(src/asahi/lib/libasahi_lib.a.p/agx_device.c.o):(agx_bo_create)
mold: error: undefined symbol: agxdecode_track_free
>>> referenced by agx_device.c
>>> src/asahi/lib/libasahi_lib.a(src/asahi/lib/libasahi_lib.a.p/agx_device.c.o):(agx_bo_unreference)
...when trying to link with libasahi_lib without libasahi_decode for unit tests.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19871>
This avoids exposing the ISA-internal agx_format to the driver, instead hiding
it behind a real PIPE_FORMAT. This lets us use real pipe formats in formatted
intrinsics in NIR, which is convenient; it will allow us to simplify the
compiler/driver ABI; and it lets us use common format helpers (e.g.
util_format_get_blocksize) for the internal formats in driver lowering.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19871>
Obviously there can't *actually* be memoryless render targets, because
how would partial renders work? The control stream with memoryless looks
like everything would if it went to memory (e.g. full 2D MSAA
attachments for the partial loads/stores even if only a resolved
2D image for the final store). Except the memoryless attachments all
load from the same address 0xeeee0000. Clearly that's not actually what
happens, so what gives? Unclear... but I see the magic bits mentioned
here set, and I assume there are some firmware (or kernel) shenanigans
used to JIT allocate the backing storage for partial renders.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19811>
Stencil texturing is easy: S8_UINT is textured like R8_UINT (with a
little swizzle fixup), and stencil is always S8_UINT thanks to
u_transfer_helper. So we just need to do some fixups to make
u_transfer_helper's seperate_stencil work and everything will work out.
Passes dEQP-GLES31.functional.stencil_texturing.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19811>