Handle bfloat16 by converting sources to float, performing the
operation, and converting result back to bfloat16 if needed. This is
done because not all ALU ops have a `bf` version in NIR.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34105>
Take the opportunity to add a comment about why the bit_size
comes from the NIR def and not the original type.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34105>
Allocate space for the aliased region first, then allocate the
non-Aliased blocks in sequence after that.
SPV_KHR_workgroup_memory_explicit_layout extension added support for
having Blocks of workgroup (shared) memory, which include layout
decoration. For that extension all such blocks must be decorated with
Aliased.
SPV_KHR_untyped_pointers extension lifts that requirement, allowing
blocks that don't alias in workgroup memory. They are still explicitly
laid out.
The motivation is that untyped pointers provide a different
mechanism to obtain the same effect as the Aliased blocks. Instead of
having two Aliased variables with different types, have a single
variable and use an untyped pointer with a different type to access it.
This patch is a preparation for supporting untyped pointers.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34139>
Move the calculation to nir_lower_vars_to_explicit_types(). This
consolidates the check of shader_info::shared_memory_explicit_layout
in a single place instead of in all drivers.
This is motivated by SPV_KHR_untyped_pointers. Before that extension
we had essentially two modes for shared memory variables
- No layout decorations in the SPIR-V, and both internal layout and
driver location was _given by the driver_.
- Explicitly laid out, i.e. they are blocks, and decorated with Aliased.
Because they all alias, we could assign them driver location directly
to the start of the shared memory.
With the untyped pointers extension, there's a third option, to be added
by a later commit
- Explicitly laid out, i.e. they are blocks, and NOT decorated
with Aliased. Driver location is _given by the driver_. Blocks
with and without Aliased can be mixed.
The driver location of multiple blocks that don't alias depend on
alignment that is driver-specific, which we can more easily do from
the nir_lower_vars_to_explicit_types() that already has access to
a function to obtain such value.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> (hk)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v3dv)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (anv/hasvk)
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> (panvk)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (radv)
Reviewed-by: Rob Clark <robdclark@gmail.com> (tu)
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34139>
A cooperative matrix conversion operation was represented in NIR by the
cmat_unary_op intrinsic with an nir_alu_op as extra parameter,
that was already lowered to a specific conversion operation
based on the matrix types.
Instead of that, add a new intrinsic `cmat_convert` that is specific
for that conversion. In addition to the src/dst matrix descriptions
already available, also include the signedness information in the
intrinsic (reuse nir_cmat_signed for that). This is needed because
different Convert operations define different interpretations for
integers, regardless their original type.
In this patch, both radv and intel were changed to use the same logic
that was previously used to pick the lowered ALU op.
This change will help represent cmat conversions involving BFloat16,
because it avoids having to create new NIR ALU ops for all the
combinations involving BFloat16.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34511>
The vtn_ssa_value for a cmat is not backed by a nir_def, but by a nir_variable, so
can't be used directly when calling a function. In most cases the cmat is used by
reference so code will take the value of deref for it (which is a `nir_def`).
When passing a cooperative matrix to a function by value, let the caller pass the deref
value, and the callee copy to a new local variable from that deref.
Fixes: b98f87612b ("spirv: Implement SPV_KHR_cooperative_matrix")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34364>
After 8fa70cfcfd ("spirv: Use the right bit-size for spec constant
ops") the bit-size will already be adjusted based on the sources, and
this will take care of Convert operations too.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34119>
This is needed for SPIR-V 1.6 support in OpenCL. This capability enables
the Uniform and UniformId decorations which prior were Shader only.
The CTS ends up using those decorations on function arguments, but we can
just ignore handling them there for now.
Fixes the spirv16_uniformdecoration_uniform and
spirv16_uniformdecoration_uniformid CL CTS test inside test_spirv_new.
Fixes: bb6d371c0e ("rusticl: support SPIR-V 1.5 and 1.6")
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34004>
This is a rewrite of vtn_bindgen. For now the two tools live in parallel, to
give Intel time to migrate off v1.
For a refresher, the classic vtn_bindgen reads a SPIR-V and generates a .h
containing nir_builder stubs for each exported function. The stub inserts an
unimplemented nir_function with the proper signature into the shader, and adds a
"call" to that function. The driver is responsible for linking with the library
later, which is annoying.
vtn_bindgen2 instead generates a .c/.h pair. The header are just prototypes with
identical signatures to what we have now. The .c implementations, however, are
very different. Instead of generating unimplemented nir_function, the
implementations contain the actual code (as serialized NIR, deserialized
on-the-fly). There is no linking step, nor a library nir_shader that the driver
has to keep around.
The programming model here is that this is "just" nir_builder ... just a
massively more competent way of using nir_builder.
Additionally, the whole SPIR-V -> optimized lowered serialized NIR step is now
all common code. There's no longer anything target-specific, and it's
disentangled from the nir_precomp infrastructure.
That means drivers can use CL with zero integration, except a few meson.build
rules. This gives a very gentle on-ramp to CL for drivers. (Note: that applies
only for library-style CL. For precompiled kernel-style CL, that still requires
significant driver integration. I do have plans there, though. Also,
printf/abort support requires a minimal amount of driver code.)
Furthermore, this unblocks the use of CL library functions in common code. That
makes this an important step towards common code geom/tess or maybe saner
raytracing.
For drivers already using classic vtn_bindgen, porting to vtn_bindgen2 should
just be deleting all your linking/deserializing code. The .cl's are unchanged,
as are the function prototypes exposed.
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33099>
adding a new bindgen-using driver should not require touching 4 different meson
files! factor out the expression, since it's a pain otherwise.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33099>
Annotating ssa defs without affecting compilation is impossible with
debug info instructions since referencing a nir_def from the debug info
instr will add uses.
The old approach also stops worrking if passes reorder instructions.
This patch proposes a solution which should not regress performance just
like the old approach. The difference is that this one allocates a bit
more space for debug info instead of adding a new instruction for it.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33141>
In theory you can build a driver using OpenCL kernels with a
-Dmesa-clc=system. That shouldn't require any LLVM/Clang/etc...
But the checks to find the pre-compiled mesa_clc & vtn_bindgen
binaries are in meson files or conditions only triggered if you build
with LLVM (:
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Dylan Baker <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33014>
This is similar to what link_intrastage_shaders is doing and it
fixes the following test:
KHR-Single-GL46.subgroups.builtin_var.compute.subgroupsize_compute
Which was failing with SPIRV but passing with GLSL, the diff being:
- SPIRV: "subgroup_size: 1"
- GLSL: "subgroup_size: 2"
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32698>
Due to the cross build issues in current meson, we adds new options to
allow mesa_clc and vtn_bindgen to be installed or searched on the
system.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32719>
switch libagx to the precompilation pipeline. see the big comment in the
previous commit for why we're doing this.
while doing so, we move some dispatch stuff. there was so much churn from
precompile that this avoids doing the churn twice. that new header will be used
for DGC down the road.
there's also a small vtn/bindgen patch in here to skip bindgen'ing entrypoints,
as that conflicts with the new dispatch macros. this is the sane behaviour, we
just need to do the full precomp switch across the tree at once.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32339>