Expecting it to keep around unused definitions around is wishful. Add an
"anchoring" unit_test instruction to consume the results so they don't
have to be precoloured registers.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18687>
This is slightly more accurate in the IR, and means we instruction
select the current 16-bit size floating point instructions when all
non-immediate operands are 16-bit.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18687>
A bunch of the emitted combines were unnecessary, or unnecessarily
large. Fix the accounting now that combines are variable size.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18687>
..Rather than at backend IR translation time. This is considerably
simpler because we can use the txs lowering instead of special casing
array sizes. Unfortunately it generates worse code, but that gap should
close once nir_opt_preamble is wired in.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18652>
Break up the monolithic SET_SHADER_EXTENDED packet into the separate
underlying commands (some only 2-byte sized and aligned), and add a
builder for USC control streams like we did for PPP updates to make that
change manageable.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18623>
For compute kernels, this encodes how much workgroup-local memory is
used ("shared memory" or "threadgroup memory" or "local memory"). This
memory is partitioned by the hardware.
For fragment shaders, this... encodes exactly the same thing. There is
no traditional tilebuffer in AGX, instead local memory is interpreted as
an imageblock, where each workgroup is a tile. This is a nifty design.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18623>
Histogram of sizes of the spill buffer, with logarithmic bucket sizes
(relative to the amount spilled from the perspective of a single thread).
Pretty funny.
Also mark a few unknowns that are nonzero when spilling is used.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18623>
We need the header to be common between gfx and compute, but everything
else seems to be different. Shuffle so we can decode compute without any
terrible hacks.
I don't know the exact layout and don't care: the layout of the fields
here is all software defined in macOS, even though the *values* are
defined by hardware (or firmware in a few cases).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18623>
Same enum as PowerVR CDM, annoyingly different from the VDM block types.
Split out the stream link / terminate structs (both observed with Metal
for copious amounts of compute), in preparation for decoding "properly".
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18623>
There's no native txs instruction... but we can emulate one :-) This is
heavy on shader ALU, but in the production driver, it'll all be hoisted
up to the preamble shader and so it shouldn't matter much. This
keeps the driver itself simple and low overhead, with a completely
obvious generalization to bindless.
Passes dEQP-GLES3.functional.shaders.texture_functions.texturesize.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18525>
Texture offsets and shadow comparison values get grouped into a vector
passed by register. Comparison values are provided as-is (fp32). Texture
offsets are packed into nibbles, but we can do this on the CPU, as
nonconstant offsets are forbidden in GLSL at least. They're also
forbidden in Vulkan/SPIR-V without ImageGatherExtended/
shaderImageGatherExtended. I'm happy kicking the NIR lowering can down
the line, this commit is complicated enough already.
Passes dEQP-GLES3.functional.shaders.texture_functions.texture.* and
dEQP-GLES3.functional.shaders.texture_functions.textureoffset.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18525>
Jumps in the command streams, allowing us to chain ("link") command
buffers. Naming is from PowerVR, which contains an identical command.
PowerVR's has conditional jumps and function call support, it's likely
that AGX inherited this too but I haven't tested that. (Those might be
useful for conditional rendering and secondary command buffers
respectively?)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18421>
Piles of unknown bits go away, as we find they're either "field present"
bits or block types. And yep, the block type enum lines up between AGX
and RGX.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18421>
Now that we have fine grained state emit code, let's use it to reduce
driver overhead. Dirty tracking is delicate: while this seems to work,
I've also added an ASAHI_MESA_DEBUG=dirty option in debug builds
to disable the optimizations here for future debug.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18421>
Looking at PowerVR's PPP definitions in tree in Mesa
(src/imagination/csbgen/), we find that AGX's "tagged" data structures
are actually sequences of state items prefixed by a header specifying
which state follows. Rather than hardcoding the sequences in which Apple's
driver chooses to bundle state, we need the XML to be flexible enough to
encode or decode any valid combination of state. That means reworking
the XML. While doing so, we find a number of fields that are identical
between RGX and AGX, and fix the names while at it (for example, the W
Clamp floating point).
Names are from the PowerVR code in Mesa where sensible.
Once we've reworked the XML, we need to rework the decoder. Instead of
reading tags and printing the combined state packets, the decoder now
must unpack the header and print the individual state items specified by
the header, with slightly more complicated bounds checking.
Finally, state emission in the driver becomes much more flexible. To
prove the flexibility actually works, we now emit all PPP state (except for
viewport and scissor state) as a single PPP update. This works. After
this we can move onto more interesting arrangements of state for lower
driver overhead.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18421>
Instead we can flip point coords with the object type. That means fewer
instructions without shader variants. Thanks, PowerVR ^_^
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18421>
src/imagination/csbgen/rogue_ppp.xml STATE_ISPA bits 28. Looks like that
got split into two structs in AGX (with info duplicated?) but yeah I
have a lot to work with here.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18421>
There are a pair of flags controlling the stencil test. One enables
stencil testing in general, the other enables two-sided stencil. Compare
the identical "twosided" flag in src/imagination/csbgen/rogue_ppp.xml's
STATE_ISPCTL structure, at the samebit offset even. Evidently this word of
the "Rasterizer" is, in fact, a derivative of STATE_ISPCTL.
Fixes
dEQP-GLES2.functional.fragment_ops.depth_stencil.*
dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.*
dEQP-GLES2.functional.fragment_ops.random.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18380>
There are a bunch of bits we need to set right to get depth/stencil
loads/stores working, including with independent settings for each. The
kernel "helps" us here.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18380>
This is still a guess, but a considerably firmer one as it now corrects
handles the clear pipelines emitted by Metal as well as the regular
vertex/fragment shader, and gets rid of the preshader special cases
seen there. Fixes decode of clear pipeline's preshaders.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18380>
As we recently discovered, the layout of level L of a mipmapped 2D image
of size WxH is /not/ the same as the layout of a non-mipmapped 2D image
of size minify(W, L) x minify(H, L). The difference occurs due to
subtleties of the "power of two" miptrees which can force a level to use
a larger tile size than it would have required at root level. To handle
this quirk correctly, the driver must not implement texture views with
address arithmetic -- it must supply instead the base width/height of a
texture and use first/last level fields on the texture descriptor to map
it. Similar issues occur when writing a particular level of a mipmapped
texture, which was handled correctly in the colour case but not the Z/S
case.
Fixes
dEQP-GLES2.functional.texture.mipmap.cube.generate.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18380>
It doesn't exist, but there's a count of mip levels for writeable image
descs. Setting that appropriately matters at high mip levels.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18380>
pot_level can be greater than the number of levels actually included --
don't overallocate. Fix the issue and add a representative unit test.
Fixes:
dEQP-GLES2.functional.texture.size.cube.512x512_rgb888
Fixes: 6ff75da8aa ("ail: Introduce image layout module")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18380>
For cube maps, depth=1 in the hardware (but 6 in Gallium). Likewise for
cube map arrays, depth=n in the hardware (but 6n in Gallium). We need to
divide to compensate. This will be relevant for cube map arrays in the
future -- add the dimension XML for cube map arrays too.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18380>