The big discovery is the "number of uniform registers" field. I learned
about this one accidentally when my preamble shaders weren't working
right, because we had inadvertently hardcoded "at most 32 registers" :-)
In the course of identifying that field, I found that the pipeline
address is used as a tagged pointer, with some unknown field in the
bottom bits and alignment demanded. The XML is updated to account for
this.
I later found that there's also a "number of general purpose registers
used by the preamble shader" field. I missed this one first, because the
encoding is slightly different from the usual "number of general purpose
registers in the main shader" field. The specification is slightly
coarser. I don't know why the hardware needs that
information anyway -- occupancy of the preamble shader should be
irrelevant -- but it's not a big deal.
Finally I found that the "more than 4 textures?" bit is... not that. I
do not yet know what it is, but it is... not that.
These all use the new groups() modifier for GenXML
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18813>
The start field in the Uniform USC word is only 8-bits, whereas 9-bits
are required to address the entire uniform register file. This other
word gets used for the high half, with start indexed from u128l in
the natural way.
Apparently spending the evening stuffing too many uniforms into Metal is
paying off.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18813>
Break up the monolithic SET_SHADER_EXTENDED packet into the separate
underlying commands (some only 2-byte sized and aligned), and add a
builder for USC control streams like we did for PPP updates to make that
change manageable.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18623>
For compute kernels, this encodes how much workgroup-local memory is
used ("shared memory" or "threadgroup memory" or "local memory"). This
memory is partitioned by the hardware.
For fragment shaders, this... encodes exactly the same thing. There is
no traditional tilebuffer in AGX, instead local memory is interpreted as
an imageblock, where each workgroup is a tile. This is a nifty design.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18623>
Histogram of sizes of the spill buffer, with logarithmic bucket sizes
(relative to the amount spilled from the perspective of a single thread).
Pretty funny.
Also mark a few unknowns that are nonzero when spilling is used.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18623>
We need the header to be common between gfx and compute, but everything
else seems to be different. Shuffle so we can decode compute without any
terrible hacks.
I don't know the exact layout and don't care: the layout of the fields
here is all software defined in macOS, even though the *values* are
defined by hardware (or firmware in a few cases).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18623>
Same enum as PowerVR CDM, annoyingly different from the VDM block types.
Split out the stream link / terminate structs (both observed with Metal
for copious amounts of compute), in preparation for decoding "properly".
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18623>
Jumps in the command streams, allowing us to chain ("link") command
buffers. Naming is from PowerVR, which contains an identical command.
PowerVR's has conditional jumps and function call support, it's likely
that AGX inherited this too but I haven't tested that. (Those might be
useful for conditional rendering and secondary command buffers
respectively?)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18421>
Piles of unknown bits go away, as we find they're either "field present"
bits or block types. And yep, the block type enum lines up between AGX
and RGX.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18421>
Looking at PowerVR's PPP definitions in tree in Mesa
(src/imagination/csbgen/), we find that AGX's "tagged" data structures
are actually sequences of state items prefixed by a header specifying
which state follows. Rather than hardcoding the sequences in which Apple's
driver chooses to bundle state, we need the XML to be flexible enough to
encode or decode any valid combination of state. That means reworking
the XML. While doing so, we find a number of fields that are identical
between RGX and AGX, and fix the names while at it (for example, the W
Clamp floating point).
Names are from the PowerVR code in Mesa where sensible.
Once we've reworked the XML, we need to rework the decoder. Instead of
reading tags and printing the combined state packets, the decoder now
must unpack the header and print the individual state items specified by
the header, with slightly more complicated bounds checking.
Finally, state emission in the driver becomes much more flexible. To
prove the flexibility actually works, we now emit all PPP state (except for
viewport and scissor state) as a single PPP update. This works. After
this we can move onto more interesting arrangements of state for lower
driver overhead.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18421>
src/imagination/csbgen/rogue_ppp.xml STATE_ISPA bits 28. Looks like that
got split into two structs in AGX (with info duplicated?) but yeah I
have a lot to work with here.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18421>
There are a pair of flags controlling the stencil test. One enables
stencil testing in general, the other enables two-sided stencil. Compare
the identical "twosided" flag in src/imagination/csbgen/rogue_ppp.xml's
STATE_ISPCTL structure, at the samebit offset even. Evidently this word of
the "Rasterizer" is, in fact, a derivative of STATE_ISPCTL.
Fixes
dEQP-GLES2.functional.fragment_ops.depth_stencil.*
dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.*
dEQP-GLES2.functional.fragment_ops.random.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18380>
There are a bunch of bits we need to set right to get depth/stencil
loads/stores working, including with independent settings for each. The
kernel "helps" us here.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18380>
This is still a guess, but a considerably firmer one as it now corrects
handles the clear pipelines emitted by Metal as well as the regular
vertex/fragment shader, and gets rid of the preshader special cases
seen there. Fixes decode of clear pipeline's preshaders.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18380>
As we recently discovered, the layout of level L of a mipmapped 2D image
of size WxH is /not/ the same as the layout of a non-mipmapped 2D image
of size minify(W, L) x minify(H, L). The difference occurs due to
subtleties of the "power of two" miptrees which can force a level to use
a larger tile size than it would have required at root level. To handle
this quirk correctly, the driver must not implement texture views with
address arithmetic -- it must supply instead the base width/height of a
texture and use first/last level fields on the texture descriptor to map
it. Similar issues occur when writing a particular level of a mipmapped
texture, which was handled correctly in the colour case but not the Z/S
case.
Fixes
dEQP-GLES2.functional.texture.mipmap.cube.generate.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18380>
It doesn't exist, but there's a count of mip levels for writeable image
descs. Setting that appropriately matters at high mip levels.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18380>
For cube maps, depth=1 in the hardware (but 6 in Gallium). Likewise for
cube map arrays, depth=n in the hardware (but 6n in Gallium). We need to
divide to compensate. This will be relevant for cube map arrays in the
future -- add the dimension XML for cube map arrays too.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18380>
Add the missing stub in the decoder for it, so we can decode indexed
draws instead of reading back garbage, and fill in some known unknowns
in the structure.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18380>
I'm not sure why we need to set this magic bit, but this fixes the
non-depth_component portion of
dEQP-GLES3.functional.texture.format.sized.*, e.g
dEQP-GLES3.functional.texture.format.sized.cube.srgb_rg8_pot
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18167>
The ASTC enum only encodes the block width/height. By contrast the
LDR/HDR/sRGB distinction is encoded as UNORM/Float and via the sRGB bit.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18167>
We won't be implementing AGX compression for a while, but this gets some
unknowns out of the way when looking at dumps from Metal that use it.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18167>
Setting this bit (at the batch level, not the draw level!) switches to
[-1, 1] clipping instead of Metal's preferred [0, 1] clipping. Using
this bit allows us to drop the clip_halfz lowering we had before, saving
2 instructions in every vertex shader.
Fixes dEQP-GLES2.functional.depth_range.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17948>
Instead of using driver_location magic and hoping things work, make the
linkage between vertex and fragment shaders explicit. Thanks to the
coefficient register mechanism reverse-engineered and documented earlier
in this series, this does not require any shader keys to support
separable shaders. It just requires that we regenerate the coefficient
register binding tables at draw time, based on the varying layouts
decided by the compiler independently for the VS and FS. This is more
robust in the face of separate shaders.
This also gets us glProvokingVertex() support without shader keys.
After that, we don't need any of the remapping prepasses. For fragment
shaders, any old mapping will do, so we can assign coefficient registers
as we go (based on what the program actually uses, not nir_variable
information that might be stale by this point). We do want to cache
coefficient registers, particularly for fragcoord.w which is used for
perspective interpolation everywhere.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
Lots of changes from reverse-engineering harder the interactions with
fp16 and noperspective and such, and comparing against the PowerVR
driver code in Mesa that's been released since this XML was
originally written.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
A number of structures encode their size, but we were ignoring it just
for this fragment pipeline bind. Fix that.
This fix might also apply to bind vertex pipeline. Unsure.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
Although these are similar data structures, they are not identical and
trying to cover both in the same struct is causing problems with
aliasing. Split them out to get a more accurate representation.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
The counts for textures/samplers are specified in the bind
texture/sampler packets. What's in the bind pipeline appear to be...
hints? of some kind? It's a direct function of the numbers of textures
and samplers, but much more coarse. Unknown purpose.
This should be correct for up to 48 textures and at least 8 samplers.
For more than 48 textures, Metal switches to a "bindless" mode, where
the textures are instead bound with a bind uniform packet, ts* is no
longer read in the shader, and instead registers and immediates are used
to index the texture with a substantial preshader. Details TBD. We don't
need to worry about that for a long while, though.
Fixes a number of dEQPs.
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.array_in_struct.sampler2D_samplerCube_both,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.array_in_struct.sampler2D_samplerCube_fragment,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.array_in_struct.sampler2D_samplerCube_vertex,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_both,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_fragment,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_vertex,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.array_in_struct.sampler2D_samplerCube_both,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.array_in_struct.sampler2D_samplerCube_fragment,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.array_in_struct.sampler2D_samplerCube_vertex,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.nested_structs_arrays.sampler2D_samplerCube_both,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.nested_structs_arrays.sampler2D_samplerCube_fragment,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.nested_structs_arrays.sampler2D_samplerCube_vertex,Crash
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>