ralloc is not thread-safe, so we can't use dev->memctx for allocating
context-specific things without locking. On top of that, we always
need to explicitly clean up pools anyway since we need to unref the BOs,
so there is no point to using a memctx.
And since pools need to be explicitly cleaned up, the meta cache code
needs explicit cleanup, so add that and drop memctx from there too.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21348>
This splits up the CDM commands into their subparts, after which
indirect dispatch is straightforward.
Also fix the pipeline bits.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21272>
Implement custom border colours, as required by OpenGL's CLAMP_TO_BORDER and
Vulkan with customBorderColor. This uses an extended sampler descriptor, which
has space for the custom border values. The trouble is that the border must be
packed into an internal interchange format that depends on the original format
in a complex way. That said, we're not solving NP-complete problems here, and it
passes the tests (dEQP-GLES31.functional.texture.border_clamp.* and piglit
texwrap).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20570>
The hardware doesn't extend in this case, we need to extend for it. This
fixes 32-bit render target formats with lower_mediump_io.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21082>
We only need 4 byte alignment, not 8 bytes. This isn't a big difference in
practice, but it probably reduces padding in some cases. More importantly, it
corrects our XML to match what the hardware actually does, which is great.
(There is exactly enough room for a 40-bit address with 4 byte alignment.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21118>
Check the size explicitly, instead of just implicitly in the GenXML pack: it is
the responsibility of the caller to split up larger uploads. While this is
nominally more complicated, agx_usc_uniform is called in the draw hot path
whereas the actual splitting decision can usually be done at compile-time.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21118>
All the ifdef __APPLE__ is getting really silly. Let's split off the
macOS UAPI abstraction into its own file, so we can have parallel
implementations.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21058>
In preparation for splitting off the macOS backend implementation into
its own file, pull out the shared BO code from agx_device.c into
agx_bo.c.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21058>
So we're not tied to the macOS or Linux UAPIs and are not translating awkwardly
from one to the other when creating BOs. They're not quite equivalent -- macOS
doesn't include writeback information in this flag field, and Linux doesn't have
a executable flag. (Maybe we should add one, though? Then we can enforce W^X.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21058>
Instead of smashing unconditionally to 1. Not sure if this fixes anything but it
gets rid of an unknown at least. Possibly slightly faster.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20561>
If a render target isn't written to, we don't use the sample mask. Avoid
generating the intermediate instructions, common with gl_FragColor. It will get
DCE'd, but this means less work for DCE, which should help for shader jank since
this pass gets called per-variant.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20446>
See 0afd691f29 ("panfrost: clang-format the tree") for why I'm doing this.
Asahi already mostly follows Mesa style so this doesn't do much. But this means
we can all stop thinking about formatting and trust the robot poets to do that
for us.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20434>
Fixes
dEQP-GLES2.functional.texture.mipmap.cube.basic.linear_nearest
when run with a GLES2 version.
We wire up seamless cube maps for GLES3+ only, working around an obscure
mesa/st limitation. See 6148e3aae7 ("mesa: Fix
ctx->Texture.CubeMapSeamless") for the full context.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20365>
The hash table needs a key pointer with at least the lifetime of the
hash entry, which the key pointer we get does not have (since it is
stack-allocated by agx_build_meta). Copy it into the shader struct
itself and use that for the hash table.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20365>
It seems triangle merging is incompatible with calculating derivatives
along primitive edges correctly. Take the appropriate NIR shader info
flags in the compiler and pass them down as a flag to the driver, so it
can set the disable triangle merging flag (formerly called "lines or
points").
TODO: Is this what macOS does when you set a sample mask there (which
apparently fixes the same bug on the Darwinia Metal backend)? Do we
also need to set this when sample masks are used?
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Fixes Darwinia and dEQP2 projected tests.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20365>
Now we support all the vertex formats! This means we don't hit u_vbuf for format
translation, which helps performance in lots of applications. By doing the
lowering in NIR, the vertex fetch code itself can be optimized by NIR (e.g.
nir_opt_algebraic) which can improve generated code quality.
In my first implementation of this, I had a big switch statement mapping format
enums to interchange formats and post-processing code. This ends up being really
unwieldly, the combinatorics of bit packing + conversion + swizzles is
enormous and for performance we want to support everything (no u_vbuf
fallbacks). To keep the combinatorics in check, we rely on parsing the
util_format_description to separate out the issues of bit packing, conversion,
and swizzling, allowing us to handle bizarro formats like B10G10R10A2_SNORM with
no special casing.
In an effort to support everything in one shot, this handles all the formats
needed for the extensions EXT_vertex_array_bgra, ARB_vertex_type_2_10_10_10_rev,
and ARB_vertex_type_10f_11f_11f_rev.
Passes dEQP-GLES3.functional.vertex_arrays.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19996>
Better occupancy, which is especially important when the background shader
does memory access (for reloads). On my 4K monitor, glmark2 -bdesktop fullscreen
from 95fps to 133fps.
At default settings, glmark2 -bterrain from 63fps to 71fps.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19997>