ralloc is not thread-safe, so we can't use dev->memctx for allocating
context-specific things without locking. On top of that, we always
need to explicitly clean up pools anyway since we need to unref the BOs,
so there is no point to using a memctx.
And since pools need to be explicitly cleaned up, the meta cache code
needs explicit cleanup, so add that and drop memctx from there too.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21348>
This splits up the CDM commands into their subparts, after which
indirect dispatch is straightforward.
Also fix the pipeline bits.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21272>
Needed for discard to work properly, which has visible side effects with
occlusion queries. Fixes no_attachment framebuffers together with the next
commit.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21267>
Scalarizing preambles in NIR isn't really necessary, we can do it more
efficiently in the backend. This makes the final NIR a lot less annoying to
read; the backend IR was already nice to read thanks to all the scalarized moves
being copypropped. Plus, this is a lot simpler.
No shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21122>
Move the decision of "can I copyprop this uniform?" from copyprop to a
standalone lowering pass. This is more straightforward and will enable the next
patch. This has the side effect of sinking load_preamble instructions, for a
nice reduction in register pressure. Instruction count increase is from
rematerializing some moves, which should be more than balanced out by the
reduced register pressure.
total instructions in shared programs: 1523285 -> 1523317 (<.01%)
instructions in affected programs: 1148 -> 1180 (2.79%)
helped: 0
HURT: 13
HURT stats (abs) min: 1.0 max: 4.0 x̄: 2.46 x̃: 2
HURT stats (rel) min: 0.69% max: 7.69% x̄: 3.65% x̃: 2.61%
95% mean confidence interval for instructions value: 1.78 3.14
95% mean confidence interval for instructions %-change: 2.16% 5.15%
Instructions are HURT.
total bytes in shared programs: 10444532 -> 10444724 (<.01%)
bytes in affected programs: 7386 -> 7578 (2.60%)
helped: 0
HURT: 13
HURT stats (abs) min: 6.0 max: 24.0 x̄: 14.77 x̃: 12
HURT stats (rel) min: 0.63% max: 7.14% x̄: 3.40% x̃: 2.48%
95% mean confidence interval for bytes value: 10.68 18.85
95% mean confidence interval for bytes %-change: 2.02% 4.78%
Bytes are HURT.
total halfregs in shared programs: 419444 -> 416434 (-0.72%)
halfregs in affected programs: 27080 -> 24070 (-11.12%)
helped: 634
HURT: 0
helped stats (abs) min: 1.0 max: 30.0 x̄: 4.75 x̃: 2
helped stats (rel) min: 2.90% max: 54.55% x̄: 13.13% x̃: 8.51%
95% mean confidence interval for halfregs value: -5.08 -4.41
95% mean confidence interval for halfregs %-change: -14.03% -12.23%
Halfregs are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21122>
We can avoid reading both width and height when the texture is a cube map, and
we do so more simply by relying on CSE+DCE (Alyssa).
Closes: #7541
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20628>
Implement custom border colours, as required by OpenGL's CLAMP_TO_BORDER and
Vulkan with customBorderColor. This uses an extended sampler descriptor, which
has space for the custom border values. The trouble is that the border must be
packed into an internal interchange format that depends on the original format
in a complex way. That said, we're not solving NP-complete problems here, and it
passes the tests (dEQP-GLES31.functional.texture.border_clamp.* and piglit
texwrap).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20570>
Match iadd(x, #y). The format shift will get constant-folded away and, if y
is sufficiently small, the constant will be inlined by the AGX backend
optimizer. This gets rid of piles of 64-bit arithmetic from lowering UBOs. It
probably doesn't matter for perf since that's happening in preamble shaders but
it *is* noisy.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21108>
The offset is in vec4s, not words (unlike the component). This doesn't matter
right now since we get everything lowered (offset -> 0) but it will come up if
we implement clip distances natively (instead of lowering in FS).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21097>
The hardware doesn't extend in this case, we need to extend for it. This
fixes 32-bit render target formats with lower_mediump_io.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21082>
This works around bugs in a LOT of applications, since fp16 texture coordinates
are almost never appropriate even though it's a valid implementation of the GLES
spec. It also doesn't seem to matter for perf.
Code from the Bifrost compiler which implements the same workaround for slightly
different reasons.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21082>
We only need 4 byte alignment, not 8 bytes. This isn't a big difference in
practice, but it probably reduces padding in some cases. More importantly, it
corrects our XML to match what the hardware actually does, which is great.
(There is exactly enough room for a 40-bit address with 4 byte alignment.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21118>
Check the size explicitly, instead of just implicitly in the GenXML pack: it is
the responsibility of the caller to split up larger uploads. While this is
nominally more complicated, agx_usc_uniform is called in the draw hot path
whereas the actual splitting decision can usually be done at compile-time.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21118>
This has a subtle interaction with page-aligned layers. Written while debugging
dEQP-GLES3.functional.texture.filtering.cube.combinations.nearest_nearest_repeat_clamp
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21114>
The mipmapped_z = true case is checked against Metal, the false case is smoke
testing the old behaviour (which is still used for 2D arrays).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21114>
For 3D images, the full miptree depends on the depth of the image, in contrast
to 2D arrays. We need to account for this to calculate the correct layer
strides.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21114>
Builds on the work of !15121. This gets to delete even more code
because many drivers shared a lot of code for i2b and f2b.
No shader-db or fossil-db changes on any Intel platform.
v2: Rebase on 1a35acd8d9.
v3: Update a comment in nir_opcodes_c.py. Suggested by Konstantin.
v4: Another rebase. Remove f2b stuff from Midgard.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20509>
This pass needs to run early (because it depends on early I/O), but it doesn't
actually need the shader key. Why not? If we overestimate the number of render
targets, extra store_output intrinsics will be generated, but they will be
deleted by AGX tilebuffer lowering later.
Note we'll probably want something smarter than this for fragment epilogues in
the future to avoid piles of unnecessary moves.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21065>
All the ifdef __APPLE__ is getting really silly. Let's split off the
macOS UAPI abstraction into its own file, so we can have parallel
implementations.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21058>
In preparation for splitting off the macOS backend implementation into
its own file, pull out the shared BO code from agx_device.c into
agx_bo.c.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21058>
So we're not tied to the macOS or Linux UAPIs and are not translating awkwardly
from one to the other when creating BOs. They're not quite equivalent -- macOS
doesn't include writeback information in this flag field, and Linux doesn't have
a executable flag. (Maybe we should add one, though? Then we can enforce W^X.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21058>
Lowering buffer textures will interact with multiple of our existing lowerings,
and it's convenient to have it all in one place. This also keeps the pass
ordering dependencies centralized.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21060>
nir_opt_preamble is now aware of the internal uniforms we insert, so it can use
the whole uniform file available to it. This lets us push more (all?) uniform
loads in Dolphin ubershaders to the preamble.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20562>
To comply with The Ekstrand Rule.
AGX has a large number of "uniform registers" available. These may be loaded
with arbitrary ranges of GPU memory by the driver, or they can be written by the
preamble shader. Currently, the compiler runs nir_opt_preamble on the first half
of the uniform file, and then translates NIR sysvals to moves from the second
half of the uniform file, passing back a uniform->sysval map for the GL driver
to respect. This has (at least) two issues:
* Since nir_opt_preamble runs before gathering sysvals, it has to assume the
maximum number of sysvals are pushed, which can prevent it from moving some
computation to the preamble due to running out of partitioned uniform registers.
This is a problem for Dolphin's ubershaders, though it's unclear how much it
matters for Dolphin perf.
* This violates The Ekstrand Rule and apparently will be a problem for our
Vulkan driver. I'm just a compiler+GL girl, so I wouldn't know.
To fix this, we invert the order of operations. At the end of this series, we
instead lower NIR system values to NIR load_preamble instructions in the GL
driver. The compiler just translates directly to uniform registers reads. The
Vulkan driver will need its own version of this code, but maybe it can do
something clever and descriptor set aware.
This means that there will already be some load_preamble instructions when
nir_opt_preamble runs, so I've made minor changes to nir_opt_preamble to handle
that gracefully. This is a bit lazy... The alternative is to introduce a
`load_uniform_agx` intrinsic which `load_preamble` gets lowered to trivially.
But that's another pass over the IR (and due to AGX's shader variant hell I'm
sensitive to backend compile time) and it would be more complicated than what's
implemented here.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Ella Stanforth <ella@iglunix.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20562>
Instead of smashing unconditionally to 1. Not sure if this fixes anything but it
gets rid of an unknown at least. Possibly slightly faster.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20561>