Both textures and images share a unified indexing scheme in AGX. When binding
tables are used, they can be mapped to texture state registers. Otherwise, there
is bindless access available.
It would be nice to map OpenGL's binding table based textures and images to AGX
texture state registers 1:1. The problem is that OpenGL allows more combined
textures and images than we necessarily have texture state registers. So, we use
as many texture state registers as we can, and then we fallback on an internal
bindless scheme mapping an extended binding table.
Add and use a lowering pass to map all of the API-level texture/image indices to
either texture state registers or bindless handles as required.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
We need to set a particular bit on atomics for them to be coherent across
clusters. Fixes atomics on G13X.
Setting this bit on the single-cluster G13G, on the other hand, wedges the GPU.
So best be careful ;-)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
Fixes KHR-GLES31.core.draw_indirect.advanced-twoPass-transformFeedback-arrays
and KHR-GLES31.core.draw_indirect.advanced-twoPass-transformFeedback-elements
on M1 Ultra (G13D). Let's assume that same bits are required on M1 Pro
and Max.
Signed-off-by: Janne Grunau <j@jannau.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
Deserializing isn't expected to be much more expensive than cloning, and the
serialized NIR is *significantly* smaller. So store the serialized instead of
the deserialized, and deserialize on the fly.
This reduces a lot of noise in valgrind due to random crap alloc'd against the
NIR shader by lowering passes that now get properly freed.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
This lets us force small tiles when they otherwise would not be
necessary, which is useful for decoupling tile size and the logic that
depends on it from things like MSAA and MRT which can trigger small
tiles.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
This requests synchronous TVB growth (instead of split renders). Mostly
for testing at this point.
Only works with newer kernels and the kernel will complain on dmesg for
now.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
In general, PBE descriptors map pipe_image_views for the hardware. That we use a
writeable shader image internally for render targets is an implementation-detail
of the end-of-tile program. So, refactor the PBE upload routine to take a
pipe_image_view (not a pipe_surface), and translate the pipe_surface into an
internal pipe_image_view for end-of-tile programs.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
Save lowest dword of shader source sha1 in pipeline object for use
later as hash for uniquely identifying shader in debug outputs.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23942>
Use a struct for various common parameters rather than per stage
structure or arguments to stage specific entrypoints.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23942>
We skip ntt regalloc for vertex shaders and we have 1024 instruction
limit for R500 vs, so in theory we could run some shaders with more that
1024 ssa registers (if we can optimize the number of instruction in the
backend). So add one more bit.
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24154>
We might need more push uniforms (FAU) than the currently bound program. Update
that too for correct results on v9.
Fixes: c282f80c98 ("panfrost: Fix transform feedback on v9")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24198>
Without SP_FS_CTRL_REG0.LODPIXMASK quad ops don't get values from
helper invocations, but from the current one.
Fixes:
dEQP-VK.glsl.derivate.dfdxsubgroup.*
dEQP-VK.glsl.derivate.dfdysubgroup.*
Cc: mesa-stable
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24211>
That's the "real" name of the field.
It enables ALL helper invocations in a quad, which is necessary for
fine derivatives and quad subgroup ops.
While PIXLODENABLE by itself enables only 3 out 4 fragments in a quad.
Cc: mesa-stable
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24211>
As per spec color_range and chroma_sample_position parameters
are always set, not conditional on color_description_present_flag.
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24179>
Same as the blob, always initialize this state when feature
BUG_FIXES18 is present.
Fixes spec@!opengl 2.0@occlusion-query-discard on GC3000.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24166>