Instead of smashing unconditionally to 1. Not sure if this fixes anything but it
gets rid of an unknown at least. Possibly slightly faster.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20561>
Fixes
dEQP-GLES2.functional.texture.mipmap.cube.basic.linear_nearest
when run with a GLES2 version.
We wire up seamless cube maps for GLES3+ only, working around an obscure
mesa/st limitation. See 6148e3aae7 ("mesa: Fix
ctx->Texture.CubeMapSeamless") for the full context.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20365>
It seems triangle merging is incompatible with calculating derivatives
along primitive edges correctly. Take the appropriate NIR shader info
flags in the compiler and pass them down as a flag to the driver, so it
can set the disable triangle merging flag (formerly called "lines or
points").
TODO: Is this what macOS does when you set a sample mask there (which
apparently fixes the same bug on the Darwinia Metal backend)? Do we
also need to set this when sample masks are used?
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Fixes Darwinia and dEQP2 projected tests.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20365>
Obviously there can't *actually* be memoryless render targets, because
how would partial renders work? The control stream with memoryless looks
like everything would if it went to memory (e.g. full 2D MSAA
attachments for the partial loads/stores even if only a resolved
2D image for the final store). Except the memoryless attachments all
load from the same address 0xeeee0000. Clearly that's not actually what
happens, so what gives? Unclear... but I see the magic bits mentioned
here set, and I assume there are some firmware (or kernel) shenanigans
used to JIT allocate the backing storage for partial renders.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19811>
The big discovery is the "number of uniform registers" field. I learned
about this one accidentally when my preamble shaders weren't working
right, because we had inadvertently hardcoded "at most 32 registers" :-)
In the course of identifying that field, I found that the pipeline
address is used as a tagged pointer, with some unknown field in the
bottom bits and alignment demanded. The XML is updated to account for
this.
I later found that there's also a "number of general purpose registers
used by the preamble shader" field. I missed this one first, because the
encoding is slightly different from the usual "number of general purpose
registers in the main shader" field. The specification is slightly
coarser. I don't know why the hardware needs that
information anyway -- occupancy of the preamble shader should be
irrelevant -- but it's not a big deal.
Finally I found that the "more than 4 textures?" bit is... not that. I
do not yet know what it is, but it is... not that.
These all use the new groups() modifier for GenXML
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18813>
The start field in the Uniform USC word is only 8-bits, whereas 9-bits
are required to address the entire uniform register file. This other
word gets used for the high half, with start indexed from u128l in
the natural way.
Apparently spending the evening stuffing too many uniforms into Metal is
paying off.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18813>
Break up the monolithic SET_SHADER_EXTENDED packet into the separate
underlying commands (some only 2-byte sized and aligned), and add a
builder for USC control streams like we did for PPP updates to make that
change manageable.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18623>
For compute kernels, this encodes how much workgroup-local memory is
used ("shared memory" or "threadgroup memory" or "local memory"). This
memory is partitioned by the hardware.
For fragment shaders, this... encodes exactly the same thing. There is
no traditional tilebuffer in AGX, instead local memory is interpreted as
an imageblock, where each workgroup is a tile. This is a nifty design.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18623>
Histogram of sizes of the spill buffer, with logarithmic bucket sizes
(relative to the amount spilled from the perspective of a single thread).
Pretty funny.
Also mark a few unknowns that are nonzero when spilling is used.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18623>
We need the header to be common between gfx and compute, but everything
else seems to be different. Shuffle so we can decode compute without any
terrible hacks.
I don't know the exact layout and don't care: the layout of the fields
here is all software defined in macOS, even though the *values* are
defined by hardware (or firmware in a few cases).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18623>
Same enum as PowerVR CDM, annoyingly different from the VDM block types.
Split out the stream link / terminate structs (both observed with Metal
for copious amounts of compute), in preparation for decoding "properly".
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18623>
Jumps in the command streams, allowing us to chain ("link") command
buffers. Naming is from PowerVR, which contains an identical command.
PowerVR's has conditional jumps and function call support, it's likely
that AGX inherited this too but I haven't tested that. (Those might be
useful for conditional rendering and secondary command buffers
respectively?)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18421>
Piles of unknown bits go away, as we find they're either "field present"
bits or block types. And yep, the block type enum lines up between AGX
and RGX.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18421>
Looking at PowerVR's PPP definitions in tree in Mesa
(src/imagination/csbgen/), we find that AGX's "tagged" data structures
are actually sequences of state items prefixed by a header specifying
which state follows. Rather than hardcoding the sequences in which Apple's
driver chooses to bundle state, we need the XML to be flexible enough to
encode or decode any valid combination of state. That means reworking
the XML. While doing so, we find a number of fields that are identical
between RGX and AGX, and fix the names while at it (for example, the W
Clamp floating point).
Names are from the PowerVR code in Mesa where sensible.
Once we've reworked the XML, we need to rework the decoder. Instead of
reading tags and printing the combined state packets, the decoder now
must unpack the header and print the individual state items specified by
the header, with slightly more complicated bounds checking.
Finally, state emission in the driver becomes much more flexible. To
prove the flexibility actually works, we now emit all PPP state (except for
viewport and scissor state) as a single PPP update. This works. After
this we can move onto more interesting arrangements of state for lower
driver overhead.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18421>
src/imagination/csbgen/rogue_ppp.xml STATE_ISPA bits 28. Looks like that
got split into two structs in AGX (with info duplicated?) but yeah I
have a lot to work with here.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18421>
There are a pair of flags controlling the stencil test. One enables
stencil testing in general, the other enables two-sided stencil. Compare
the identical "twosided" flag in src/imagination/csbgen/rogue_ppp.xml's
STATE_ISPCTL structure, at the samebit offset even. Evidently this word of
the "Rasterizer" is, in fact, a derivative of STATE_ISPCTL.
Fixes
dEQP-GLES2.functional.fragment_ops.depth_stencil.*
dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.*
dEQP-GLES2.functional.fragment_ops.random.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18380>