useful when validation blows up in a file containing many intrinsic
passes, to figure out which one is borked.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33099>
OpenCL C defines work-item functions to return a scalar for a particular
dimension. This is a really annoying papercut, and is not what you want for
either 1D or 3D dispatches. In both cases, it's nicer to get vectors. For
syntax, we opt to define uint3 "magic globals" for each work-item vector. This
matches the GLSL convention, although retaining OpenCL names. For example,
`gl_GlobalInvocationID.xy` is expressed here as `cl_global_id.xy`. That is much
nicer than standard OpenCL C's syntax `(uint2)(get_global_id(0),
get_global_id(1))`.
We define the obvious mappings for each relevant function in "Work-Item
Functions" in the OpenCL C specification.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33099>
This is a rewrite of vtn_bindgen. For now the two tools live in parallel, to
give Intel time to migrate off v1.
For a refresher, the classic vtn_bindgen reads a SPIR-V and generates a .h
containing nir_builder stubs for each exported function. The stub inserts an
unimplemented nir_function with the proper signature into the shader, and adds a
"call" to that function. The driver is responsible for linking with the library
later, which is annoying.
vtn_bindgen2 instead generates a .c/.h pair. The header are just prototypes with
identical signatures to what we have now. The .c implementations, however, are
very different. Instead of generating unimplemented nir_function, the
implementations contain the actual code (as serialized NIR, deserialized
on-the-fly). There is no linking step, nor a library nir_shader that the driver
has to keep around.
The programming model here is that this is "just" nir_builder ... just a
massively more competent way of using nir_builder.
Additionally, the whole SPIR-V -> optimized lowered serialized NIR step is now
all common code. There's no longer anything target-specific, and it's
disentangled from the nir_precomp infrastructure.
That means drivers can use CL with zero integration, except a few meson.build
rules. This gives a very gentle on-ramp to CL for drivers. (Note: that applies
only for library-style CL. For precompiled kernel-style CL, that still requires
significant driver integration. I do have plans there, though. Also,
printf/abort support requires a minimal amount of driver code.)
Furthermore, this unblocks the use of CL library functions in common code. That
makes this an important step towards common code geom/tess or maybe saner
raytracing.
For drivers already using classic vtn_bindgen, porting to vtn_bindgen2 should
just be deleting all your linking/deserializing code. The .cl's are unchanged,
as are the function prototypes exposed.
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33099>
this will be used internally in vtn_bindgen2.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33099>
with vtn_bindgen2, we'll want return values without derefs. this needs some
special handholding.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33099>
with vtn_bindgen2 we only care about a single function at a time, not a whole
nir_shader, and it would be quite wasteful to serialize all the shader info
every time. add a specialized serialize just for 1 function.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33099>
this will be used for more concise prints in vtn_bindgen2.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33099>
It's not at all clear how this pass should work with real function calls (if at
all), but at least this is enough to handle collections of self-contained
functions which vtn_bindgen2 wants.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33099>
adding a new bindgen-using driver should not require touching 4 different meson
files! factor out the expression, since it's a pain otherwise.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33099>
With newly added features, especially since KHR_shader_subgroup the base
runtime has crept above the 15 minute timeout. Introduce a fraction to
keep the runtime in check and add a nightly full run to cover the gaps.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12548
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Reviewed-by: Martin Roukala <None>
Reviewed-by: Karol Herbst <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33318>
When the core supports unified instruction memory, don't clamp individual
shader sizes to 256 instructions. Allow shaders to make full use of the
state instruction memory, as long as both VS and FS fit into the memory
region.
Allows to run the shaders from glmark2 terrain from state instruction
memory, so we don't need to use icache mode on GC3000 and makes the app
work on GC2000, which doesn't have icache but unified instruction memory.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33229>
Currently we statically partition the unified intsruction memory range
into a VS and a FS range, with each of the shader types getting half
of the memory. Change this to place the FS instructions right behind
the VS instructions, which potentially allows larger shaders to be
executed from the instruction memory states when VS are FS are
unbalanced in size, which is quite common.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33229>
Use the generated macros from the HW headers to do the shifting,
which makes it more clear what is being done to those states.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33229>
According to _InitializeContextBuffer() in the downstream kernel driver
all GPUs claiming to support more than 256 shader instructions have unified
instruction memory and the shader range registers to partition usage of
this unified memory region.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33229>
There is no reason to emit those states on framebuffer changes. While the
framebuffer format change might change the fragment shader due to R/B
swapping in the shader, this triggers compilation of a new shader variant
which will dirty the shader state as needed.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33229>
A single LOAD_STATE command can only load a maximum of 1023 32bit states,
limited by the range of the count parameter in the header.
Split the state update into multiple LOAD_STATE commands if necessary.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33229>
The VIV_FE_LOAD_STATE_HEADER_COUNT marco already includes the masking
operation, so there is no need to apply the same mask again.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33229>
Some cores with the the instruction cache feature, such as the GC3000 found
on the i.MX6QP, have a wrong instruction limit encoded in hardware. The HWDB
entry for this core has the correct number (512). Fixup all cores with the
instruction cache feature to report at least 512 instructions, which was
already assumed when configuring the VS/FS instruction state memory split in
other parts of the driver.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33229>
There's been a high number of GPU resets on Raven that amdgpu couldn't
recover from, leading to jobs timing out.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33317>
The shader_modifies_coverage-flag is currently not set for PanVK. This
might lead to issues down the line, so ensure it's set correctly.
Fixes: 5544d39f44 ("panvk: Add a CSF backend for panvk_queue/cmd_buffer")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Benjamin Lee <benjamin.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33300>
Enabling evaluate_per_sample in non-MSAA cases might cause issues and
hangs for subsequent ZS cases.
Therefore, only enable the flag when MSAA is active.
Fixes: 26d339ef8a ("panfrost: Generate Valhall Malloc IDVS jobs")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Benjamin Lee <benjamin.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33300>
The existing code relies on assert to identify when a cs overlow
occurs. On builds without asserts, a cs overflow won't be detected
and it will likely lead to a hang.
Reporting a preemptively a PIPE_UNKNOWN_CONTEXT_RESET error seems
ok as the context is lost anyway.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33288>