Midgard has the ability to calculate addresses as part of the load/store
pipeline. We'd like to make use of this to avoid doing this work on the
ALU pipes. To do so, when emitting globals/SSBOs/shareds, we walk the
tree looking for address arithmetic to try to parse out something the
hardware can work with, letting the original instructions be DCE'd
ideally. This analysis is done at the NIR level to properly account for
some messy details of vectorization which we'd rather not poke at the
backend level. (Originally I wrote this as a MIR pass but I'm fairly
sure it was wrong.)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3978>
The swizzles are as-if they were 32-bit regardless of the bitness of the
operation, but the source sizes can and do change depending on the
flags. Account for this in the analysis.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3978>
Many load/store ops (used for globals, SSBOs, shared memory, etc) have
the ability to compute addresses directly. Mark off which ones behave
like this.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3978>
When mixing 32/64-bit, we need to align the 32-bit registers to get the
required alignment. This isn't quite enough yet, though, since user
swizzles could bypass and will need to be lowered to 32-bit moves
(outstanding todo).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3978>
It works just fine, we just forgot to expose the CAP.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3950>
I'm not sure why I never saw smaller values, but here you go.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3950>
I'm not sure what this is for, but the blob does it and I'd rather not
poke farther than needed into hardware-internal details.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3950>
I'm not sure where I got the impression 1024 was the right number. From
kbase:
#define THREAD_MT_DEFAULT 256
(where MT = "max threads" and the threads to allocate for TLS is <= max
threads). Let's cut out memory footprint for spilling by 75% :)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3950>
This field controls the size of per-thread temporaries (somehow this is
separate from the regular stack for register spilling..), though I'm not
certain on the details. Regardless this value of 0x1e despite being used
in places by the blob seems wrong and is interfering with correct sizing
of the stack.
We don't use non-spilling scratchpad yet, so this is just here to fix
some details of spilling.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3950>
All of this should apply equally with compute shaders, as far as I know.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3950>
These two cases were flipped from the notes, leading to underestimates
of the padded vertex count, manifesting as visual corruption (random
geometry messed up). This issue was raised when noticing the corruption
went away when dramaticlaly oversizing max_index on an instanced indexed
draw, and then checking that padded_count >= vertex_count -- which
turned out *not* to be the case on certain inputs, a clear issue. Hence
looking into this routine...
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3950>
This will help us narrow the size required for thread local storage.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3950>
For index bufer resources (not user index buffers), we're able to cache
results. In practice, the cache works pretty dang well. It's still
important that the min/max computation is efficient (since when the
cache misses it'll run at draw-time and we don't want jank), but this
can eliminate a lot of computations entirely.
We use a custom data structure for caching. Search is O(N) to the size
but sizes are capped so it's effectively O(1). Insertion is O(1) with
automatic oldest eviction, on the assumption that the oldest results are
the least likely to still be useful. We might also experiment with other
heuristics based on actual usage later.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3880>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3880>
These operations are intertwined since there are optimizations that will
want to "double dip". In particular for user index buffers we'd want to
upload simultaneous with index computation. For resources we'd like to
keep resource related code together.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3880>
gfx9.surf_pitch is supposed to be in blocks (or elements) but addrlib
returns a pitch in pixels.
This cause a mismatch between surface->bpe and surface.u.gfx9.surf_pitch.
For subsampled formats like uyvy (bpe is 2) this breaks in various places:
- sdma copy
- video rendering (see issue https://gitlab.freedesktop.org/mesa/mesa/issues/2363)
when the vl_compositor_gfx_render method is used
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3738>
chroma_format depends on buffer_format so use the format_to_chroma_format
helper instead of storing it next to buffer_format.
This avoids bugs where one value is changed without updating the other.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3738>
16-bit med3 is only supported on GFX9+.
Fixes dEQP-VK.spirv_assembly.instruction.amd_trinary_minmax.mid3.f16.*.
Fixes: d6a07732c9 ("ac: use llvm.amdgcn.fmed3 intrinsic for nir_op_fmed3")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3962>
Lower 64-bit fmed3 because LLVM doesn't expose an intrinsic.
Fixes dEQP-VK.spirv_assembly.instruction.amd_trinary_minmax.mid3.f64.*.
Fixes: d6a07732c9 ("ac: use llvm.amdgcn.fmed3 intrinsic for nir_op_fmed3")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3962>