The equations for calculating miptree offsets are complicated,
nonobvious, and full of subtle footguns. Worse, the driver doesn't
control the offsets -- it must simply agree with the offsets
implicitly calculated in the hardware. The CTS doesn't adequately
exercise all the corner cases. Make sure we have unit tests that do.
The tests themselves are generated by instrumenting agxdecode to scan
GPU memory after uploading test patterns in a variety of layout with a
Metal application.
Thank you to Asahi Lina and Dougall Johnson for the reverse-engineering
that led to this. The tests selected here are a subset of those used for
the reverse-engineering. The full set may be found in Lina's tilecalc
repo:
https://github.com/asahilina/tilecalc
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18167>
Now that we have layout and tiling code that can handle block-compressed
formats, including the non-square blocks found with some ASTC formats,
we can advertise ASTC formats. Passes dEQP-GLES3.*astc* which
exercises everything here. (These tests passed before by decompressing
the textures to RGBA8 UNORM in the frontend, but it's much more
efficient to use real ASTC textures as done here.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18167>
I'm not sure why we need to set this magic bit, but this fixes the
non-depth_component portion of
dEQP-GLES3.functional.texture.format.sized.*, e.g
dEQP-GLES3.functional.texture.format.sized.cube.srgb_rg8_pot
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18167>
Move tiling.c into ail, using ail data structures and helpers to manage
the tiling. This fixes a staggering number of issues with the tiling
routines:
* NPOT block sizes defeatured. The hardware only supports POT block
sizes. There's no need to handle anything else.
* Use ail to determine tile sizes, instead of the broken
agx_select_tile_shift routine that didn't work for non-square tile
sizes (for instance).
* Handle up to 128x128 tiles, as required by 8bpp textures.
* Handle non-square tiles. If the block size is not a multiple of 4, the
tile size will be of the form 2n x n. This is easy with the ail_tile
data structure, but not possible architecturally with
agx_select_tile_shift. This is required for 16bpp and 64bpp textures.
* Express in terms of elements instead of pixels, using unit
suffixes to make the dimensional analysis obvious. In particular this
handles tiling of block-compressed textures by tiling the blocks
themselves. This is required for block-compressed textures (internally handled
like smaller 64bpp textures).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18167>
Introduce ail, a small library for working with the image (and buffer)
layouts encountered with AGX hardware. Its design is inspired by isl. In
particular, ail strives to use isl unit suffixes and to represent
quantities in a canonical, API-agnostic fashion [1].
ail replaces the old miptree code (based on some ad hoc heuristics that
passed a few dEQP tests). It is based on a thorough reverse-engineering
of AGX's twiddled format, courtesy of Asahi Lina, Dougall Johnson, and
me. This corrects our handling of many common cases that were totally
wrong in the old code, leading to GPU faults.
Unlike the code, ail differentiates between pixels and elements
consistently, allowing block-compressed formats like ETC2 to be
supported correctly. These formats will be enabled later in the series.
This commit fixes Inochi2D, glmark2 -brefract and -bterrain, and who
knows what else.
ail stands for { Asahi, AGX } Image { Layout, Library } at your
convenience. ail is best served warm.
Liberal use of ail is recommended. Yum!
[1] https://docs.mesa3d.org/isl/units.html
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18167>
The ASTC enum only encodes the block width/height. By contrast the
LDR/HDR/sRGB distinction is encoded as UNORM/Float and via the sRGB bit.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18167>
We won't be implementing AGX compression for a while, but this gets some
unknowns out of the way when looking at dumps from Metal that use it.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18167>
AGX and zink both want all of these lowered, but nir_to_tgsi will want
only demote (and terminate if it was possible from GLSL but it's not)
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15932>
Unnecessary rename that breaks forward compatibility... but Apple says
this is just NULL. Do the simpler thing. Note that the argument is a
mach_port_t, which is a natural_t == uint32_t in userspace... even
though it's a pointer in the kernel. Although Apple's docs claim that
kIOMasterPortDefault is NULL, it's really just 0.
../src/asahi/lib/agx_device.c:290:35: warning: 'kIOMasterPortDefault' is deprecated: first deprecated in macOS 12.0 [-Wdeprecated-declarations]
IOServiceGetMatchingService(kIOMasterPortDefault, matching);
^~~~~~~~~~~~~~~~~~~~
kIOMainPortDefault
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18121>
Setting this bit (at the batch level, not the draw level!) switches to
[-1, 1] clipping instead of Metal's preferred [0, 1] clipping. Using
this bit allows us to drop the clip_halfz lowering we had before, saving
2 instructions in every vertex shader.
Fixes dEQP-GLES2.functional.depth_range.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17948>
In case a shader only use gl_FragCoord.xy, this avoids wasting
coefficient registers for gl_FragCoord.zw which should be a small
optimization. It's also less work for DCE but I'm less worried about
that.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
If we want to break down a 64-bit value into its 32-bit halves, we want
to be able to use a split for this:
lo, hi = split long
Extend the RA to handle this case.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
It is, as the name suggests, broken. Instruction count goes from 50->53
on the shader in
dEQP-GLES2.functional.shaders.operator.binary_operator.div.highp_int_fragment.
I'm happy to eat that cost in exchange for correct results!
There are lots more low-hanging opportunities for optimizations to that
shader:
- fuse double icmpsel for the b2i32(cmp) sequences
- promoting big immediates to uniforms
- fusing integer multiply+add
But for now this is acceptable and anyway I'm doing this on "fix broken
NIR lowering" time and not Asahi time.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
We can implement umul_high (for both 16-bit and 32-bit types)
efficiently by multiplying in the next larger type size and extracting
the upper word. We already have such an implementation (for instancing).
Extract it so we can use it for emit_alu too.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
This seems to be an architectural constraint. Ensure that RA satisfies
it, because otherwise we're left with mysterious fails.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
We need to get a matching coefficient register and change the encoding
of the iter instruction slightly, but otherwise this is normal.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
Unlike Mali (where I borrowed the old names from), these are not loads
in the memory sense. They are simply register loads and arithmetic.
Rename accordingly, using PowerVR names and public Apple names as a
guide.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
For perspective-correct interpolation, the W coefficient register is
needed. Instead of hardcoding this to cf0 and special casing, model this
in the IR and let the general handling kick in.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
Instead of using driver_location magic and hoping things work, make the
linkage between vertex and fragment shaders explicit. Thanks to the
coefficient register mechanism reverse-engineered and documented earlier
in this series, this does not require any shader keys to support
separable shaders. It just requires that we regenerate the coefficient
register binding tables at draw time, based on the varying layouts
decided by the compiler independently for the VS and FS. This is more
robust in the face of separate shaders.
This also gets us glProvokingVertex() support without shader keys.
After that, we don't need any of the remapping prepasses. For fragment
shaders, any old mapping will do, so we can assign coefficient registers
as we go (based on what the program actually uses, not nir_variable
information that might be stale by this point). We do want to cache
coefficient registers, particularly for fragcoord.w which is used for
perspective interpolation everywhere.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
Lots of changes from reverse-engineering harder the interactions with
fp16 and noperspective and such, and comparing against the PowerVR
driver code in Mesa that's been released since this XML was
originally written.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
A number of structures encode their size, but we were ignoring it just
for this fragment pipeline bind. Fix that.
This fix might also apply to bind vertex pipeline. Unsure.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
Although these are similar data structures, they are not identical and
trying to cover both in the same struct is causing problems with
aliasing. Split them out to get a more accurate representation.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
The counts for textures/samplers are specified in the bind
texture/sampler packets. What's in the bind pipeline appear to be...
hints? of some kind? It's a direct function of the numbers of textures
and samplers, but much more coarse. Unknown purpose.
This should be correct for up to 48 textures and at least 8 samplers.
For more than 48 textures, Metal switches to a "bindless" mode, where
the textures are instead bound with a bind uniform packet, ts* is no
longer read in the shader, and instead registers and immediates are used
to index the texture with a substantial preshader. Details TBD. We don't
need to worry about that for a long while, though.
Fixes a number of dEQPs.
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.array_in_struct.sampler2D_samplerCube_both,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.array_in_struct.sampler2D_samplerCube_fragment,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.array_in_struct.sampler2D_samplerCube_vertex,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_both,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_fragment,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_vertex,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.array_in_struct.sampler2D_samplerCube_both,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.array_in_struct.sampler2D_samplerCube_fragment,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.array_in_struct.sampler2D_samplerCube_vertex,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.nested_structs_arrays.sampler2D_samplerCube_both,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.nested_structs_arrays.sampler2D_samplerCube_fragment,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.nested_structs_arrays.sampler2D_samplerCube_vertex,Crash
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
This confirms the actual size of the texture descriptor -- 24 bytes.
The last 8 bytes have so far only been zeroed. It also confirms we got
the sampler descriptor size right.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
Typo in the handwritten packing code, oof!
Fixes incorrectly repeated shadows in Neverball (among many other bugs,
I assume). Huge thanks to Lina for the idea that this was the
bug -- fixing it was a breeze from there :-)
Fixes: 9f55538834 ("agx: Pack texture ops")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Suggested-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
These tests predate using GTest in the compiler. Now that we do, we'd like to
have the tests together so they run regularly.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17824>
We need to push loop nesting to handle this correctly -- at the end of
the innermost loop, the correct nesting is 1 (from the if), not 0.
Fixes assertion failure in
dEQP-GLES2.functional.shaders.struct.local.dynamic_loop_nested_struct_array_fragment,UnexpectedPass
dEQP-GLES2.functional.shaders.struct.local.dynamic_loop_nested_struct_array_vertex,UnexpectedPass
dEQP-GLES2.functional.shaders.struct.uniform.dynamic_loop_nested_struct_array_fragment,UnexpectedPass
dEQP-GLES2.functional.shaders.struct.uniform.dynamic_loop_nested_struct_array_vertex,UnexpectedPass
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17128>
Tell the state tracker our point coordinates have a lower left origin
instead of an upper left origin, and remove our point coordinate
flipping code. Saves an instruction in any shader that reads
gl_PointCoord.y
Note: the OpenGL blob also emits an "fadd $y', ^y.neg, 1.0" to flip
point coordinates, so this isn't just a Metal weirdness.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16829>
Add a library that wraps the key IOKit entrypoints used in the macOS
UABI for AGX. Our wrapped routines print information about the kernel
calls made and dump work submitted to the GPU using agxdecode. This code
has two major use cases:
1. Debugging Mesa, particularly around the undocumented macOS
user-kernel interface. Logs from Mesa may compared to Metal to check
that the UABI is being used correcrly.
2. Reverse-engineering the hardware, using this as glue to get at the
"interesting" GPU memory.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16512>