Commit graph

27 commits

Author SHA1 Message Date
Antonio Ospite
ddf2aa3a4d build: avoid redefining unreachable() which is standard in C23
In the C23 standard unreachable() is now a predefined function-like
macro in <stddef.h>

See https://android.googlesource.com/platform/bionic/+/HEAD/docs/c23.md#is-now-a-predefined-function_like-macro-in

And this causes build errors when building for C23:

-----------------------------------------------------------------------
In file included from ../src/util/log.h:30,
                 from ../src/util/log.c:30:
../src/util/macros.h:123:9: warning: "unreachable" redefined
  123 | #define unreachable(str)    \
      |         ^~~~~~~~~~~
In file included from ../src/util/macros.h:31:
/usr/lib/gcc/x86_64-linux-gnu/14/include/stddef.h:456:9: note: this is the location of the previous definition
  456 | #define unreachable() (__builtin_unreachable ())
      |         ^~~~~~~~~~~
-----------------------------------------------------------------------

So don't redefine it with the same name, but use the name UNREACHABLE()
to also signify it's a macro.

Using a different name also makes sense because the behavior of the
macro was extending the one of __builtin_unreachable() anyway, and it
also had a different signature, accepting one argument, compared to the
standard unreachable() with no arguments.

This change improves the chances of building mesa with the C23 standard,
which for instance is the default in recent AOSP versions.
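
For illustration only, a minimal sketch of how an upper-case,
one-argument macro can coexist with the zero-argument C23 unreachable().
The macro body below (print and abort) and the channel_name() helper are
assumptions made up for this example, not the actual definition in
src/util/macros.h:

```c
#include <assert.h>
#include <stddef.h> /* in C23 this header provides unreachable() */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the renamed macro. Unlike the standard
 * unreachable(), it takes a message argument. */
#define UNREACHABLE(str)                                   \
   do {                                                    \
      fprintf(stderr, "unreachable: %s (%s:%d)\n",         \
              (str), __FILE__, __LINE__);                  \
      abort();                                             \
   } while (0)

enum channel { CHANNEL_R, CHANNEL_G, CHANNEL_B };

/* Hypothetical helper: every valid enum value returns from the switch,
 * so the trailing UNREACHABLE() only fires on a corrupted value. */
static const char *
channel_name(enum channel c)
{
   switch (c) {
   case CHANNEL_R: return "r";
   case CHANNEL_G: return "g";
   case CHANNEL_B: return "b";
   }
   UNREACHABLE("invalid channel");
}
```

Because the name differs from the standard one, including <stddef.h> in
C23 mode no longer triggers the redefinition warning shown above.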

All the instances of the macro, including the definition, were updated
with the following command line:

  git grep -l '[^_]unreachable(' -- "src/**" | sort | uniq | \
  while read file; \
  do \
    sed -e 's/\([^_]\)unreachable(/\1UNREACHABLE(/g' -i "$file"; \
  done && \
  sed -e 's/#undef unreachable/#undef UNREACHABLE/g' -i src/intel/isl/isl_aux_info.c

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36437>
2025-07-31 17:49:42 +00:00
Caio Oliveira
2dfd4dcbc5 brw: Fix cmat conversion between bfloat16 and non-float32
The HW only supports converting BRW_TYPE_BF values to/from BRW_TYPE_F,
so intermediate conversion is needed.  Move the intermediate conversion
to the implementation of `@convert_cmat_intel` and simplify the
brw_nir_lower_cooperative_matrix pass.  This has two positive effects:

- Fixes conversion between BF and integer type cooperative matrices,
  that was still using the old emit_alu1 approach instead of the new
  code for `@convert_cmat_intel`.

- Guarantee the intermediate conversion will result in a valid layout
  for conversions involving USE_B matrices.  If we instead used the
  intrinsic twice in brw_nir_lower_cooperative_matrix.c, a matrix with
  invalid layout would be visible at NIR level and we wouldn't be able
  to keep the current assertion for USE_B case.

Due to the configurations we have exposed, we still don't need to
write a more complex USE_B conversion -- they are all between same
size types (and, consequently, packing factors), so no shuffling of
data is needed to respect the USE_B layout.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36185>
2025-07-18 21:55:43 +00:00
Matt Turner
6d786a0e4b brw: Use convert_cmat_intel intrinsic
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35616>
2025-06-27 01:26:22 +00:00
Caio Oliveira
07fa3b3785 intel: Add support for BFloat16 as cooperative matrix source
Re-organize the configuration lists to make it easier to include BFloat16
only for the Gfx125+ that support it, while keeping MTL supporting the
"lowered" configurations from pre-Gfx125.

Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34105>
2025-04-29 16:29:37 +00:00
Caio Oliveira
d4381c0908 brw/cmat: Implement conversion from/to BFloat16
When converting BFloat16 from/to non-Float32 type, use
the Float32 conversion as an intermediate step.  Take the
opportunity to separate the unary_op/convert code-paths.
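
The intermediate step can be pictured on the CPU with a small sketch.
It assumes bfloat16 is stored as the upper 16 bits of a float32 and uses
truncation when narrowing; the helper names are hypothetical and the
rounding behaviour of the real hardware conversion may differ:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* bfloat16 -> float32: place the 16 bits in the upper half of a dword. */
static float
bf16_to_f32(uint16_t bf)
{
   const uint32_t bits = (uint32_t)bf << 16;
   float f;
   memcpy(&f, &bits, sizeof(f));
   return f;
}

/* float32 -> bfloat16 by truncation (an assumption of this sketch). */
static uint16_t
f32_to_bf16_trunc(float f)
{
   uint32_t bits;
   memcpy(&bits, &f, sizeof(bits));
   return (uint16_t)(bits >> 16);
}

/* bfloat16 <-> int32 has no direct path here, so go through float32.
 * Only meaningful for values that fit in the destination type. */
static int32_t
bf16_to_i32(uint16_t bf)
{
   return (int32_t)bf16_to_f32(bf);
}

static uint16_t
i32_to_bf16(int32_t i)
{
   return f32_to_bf16_trunc((float)i);
}
```

For example, 3.0 is 0x40400000 as a float32, so its bfloat16 encoding in
this sketch is 0x4040.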

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34105>
2025-04-29 16:29:37 +00:00
Caio Oliveira
de88184ab6 brw/cmat: Support different src/dst packing factors in emit_packed_alu1
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34105>
2025-04-29 16:29:37 +00:00
Caio Oliveira
7fa7be970d brw/cmat: Extract emit_packed_alu1() function
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34105>
2025-04-29 16:29:37 +00:00
Caio Oliveira
4b4500ad35 brw/cmat: Store more information about cmat slices
Store the cmat_description and packing_factor so that various
functions don't need to extract and recalculate them.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34105>
2025-04-29 16:29:37 +00:00
Caio Oliveira
a38960e8f3 brw, nir: Use glsl_base_type instead of nir_alu_type for @dpas_intel
This will allow including types that don't have a nir_alu_type
equivalent, like bfloat16.

Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34105>
2025-04-29 16:29:37 +00:00
Caio Oliveira
d5ad798140 spirv, radv, intel: Add NIR intrinsic for cmat conversion
A cooperative matrix conversion operation was represented in NIR by the
cmat_unary_op intrinsic with an nir_alu_op as extra parameter,
that was already lowered to a specific conversion operation
based on the matrix types.

Instead of that, add a new intrinsic `cmat_convert` that is specific
for that conversion.  In addition to the src/dst matrix descriptions
already available, also include the signedness information in the
intrinsic (reuse nir_cmat_signed for that).  This is needed because
different Convert operations define different interpretations for
integers, regardless of their original type.

In this patch, both radv and intel were changed to use the same logic
that was previously used to pick the lowered ALU op.

This change will help represent cmat conversions involving BFloat16,
because it avoids having to create new NIR ALU ops for all the
combinations involving BFloat16.

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34511>
2025-04-16 23:13:36 +00:00
Ian Romanick
556e78f737 intel/brw/xe2+: Allow vec16 for cooperative matrix
Xe2 will allow a B matrix large enough that it will be stored in a
vec16.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28834>
2024-06-25 14:17:47 -07:00
Ian Romanick
7a773ac53e intel/brw: Major rework of lower_cmat_load_store
The original goal was to get rid of a bunch of the magic constants
sprinkled through the function. Once I did that, I realized that there
was a lot more symmetry possible between the row-major and column-major
paths.

It's +6 lines of code, but about 15 of those lines are comments
explaining things that were not obvious in the original code.
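
The kind of symmetry meant here can be sketched as follows. This is an
illustrative element-offset calculation with hypothetical names, not the
code in lower_cmat_load_store; it only shows that the two layouts differ
in which index gets scaled by the stride:

```c
#include <assert.h>
#include <stdbool.h>

/* Offset, in elements, of matrix element (row, col) in memory. */
static unsigned
element_offset(unsigned row, unsigned col, unsigned stride, bool row_major)
{
   /* Pick the "major" index once; the rest of the math is shared. */
   const unsigned major = row_major ? row : col;
   const unsigned minor = row_major ? col : row;

   return major * stride + minor;
}
```

With a stride of 16, element (2, 3) lands at offset 35 in row-major
order and at offset 50 in column-major order.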

v2: Save duplicated condition in a variable with a meaningful
name. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28834>
2024-06-25 14:16:48 -07:00
Ian Romanick
ea6e10c0b2 intel/brw: Temporarily disable result=float16 matrix configs
Even though the hardware does not natively support these configurations,
there are many potential benefits to advertising them. These
configurations can theoretically use half the memory bandwidth for loads
and stores. For large matrices, that can be the limiting factor in
performance.

The current implementation, however, has a number of significant
problems.

The conversion from float16 to float32 is performed in the driver during
conversion from NIR. As a result, many common usage patterns end up
doing back-to-back conversions to and from float16 between matrix
multiplications (when the result of one multiplication is used as the
accumulator for the next).

The float16 version of the matrix wastes half the possible register
space. Each float16 value sits alone in a dword. This is done so that
the per-invocation slice of an 8x8 float16 result matrix and an 8x8
float32 result matrix will have the same number of elements. This makes
it possible to do straightforward implementations of all the unary_op
type conversions in NIR.
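
A back-of-the-envelope sketch of the register cost described above. The
helper is hypothetical and counts dwords for a whole matrix rather than
per invocation:

```c
#include <assert.h>
#include <stdbool.h>

/* Number of 32-bit dwords needed to hold num_elements values of
 * elem_bits each, either one value per dword or tightly packed. */
static unsigned
dwords_needed(unsigned num_elements, unsigned elem_bits, bool one_per_dword)
{
   assert(elem_bits > 0 && elem_bits <= 32 && 32 % elem_bits == 0);

   const unsigned per_dword = one_per_dword ? 1 : 32 / elem_bits;
   return (num_elements + per_dword - 1) / per_dword;
}
```

For an 8x8 matrix (64 elements), float16 stored one per dword needs 64
dwords, the same as float32, while a packed float16 layout would need 32.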

It would be possible to perform N:M element type conversions in the
backend using specialized NIR intrinsics. However, per #10961, this
would be very, very painful. My hope is that, once a suitable resolution
for that issue can be found, support for these configs can be restored.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28834>
2024-06-25 13:52:12 -07:00
Ian Romanick
a5adbae6f6 nir: intel/brw: Remove cmat_signed_mask from dpas_intel intrinsic
It is not used. The signedness is inferred from src_type and dest_type.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28822>
2024-04-19 09:53:29 -07:00
Ian Romanick
2ce558d928 intel/brw: Fix handling of cmat_signed_mask
For integer types, the signedness is determined by flags on the muladd
instruction. The types of the sources play no role. Previously we were
using the signedness of the type and ignoring the mask.

Adjust the types passed to the dpas_intel intrinsic to match.
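
As a rough CPU-side model of the rule above: the same 8-bit patterns
give different muladd results depending on the signedness flags, not on
how the storage is declared. The flag names and helpers are hypothetical
and only loosely modeled on a signedness mask:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch. */
enum {
   SKETCH_A_SIGNED = 1 << 0,
   SKETCH_B_SIGNED = 1 << 1,
};

/* Widen raw 8-bit data according to the flag, ignoring the C type. */
static int32_t
widen(uint8_t bits, unsigned is_signed)
{
   return is_signed ? (int32_t)(int8_t)bits : (int32_t)bits;
}

/* acc + sum(a[i] * b[i]) with per-operand signedness taken from a mask. */
static int32_t
muladd_i8(const uint8_t *a, const uint8_t *b, unsigned k,
          int32_t acc, unsigned signed_mask)
{
   for (unsigned i = 0; i < k; i++) {
      acc += widen(a[i], signed_mask & SKETCH_A_SIGNED) *
             widen(b[i], signed_mask & SKETCH_B_SIGNED);
   }
   return acc;
}
```

With a = {0xFF, 2}, b = {3, 4} and an accumulator of 10, the result is
783 when A is treated as unsigned and 15 when A is treated as signed.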

Fixes various
dEQP-VK.compute.*.cooperative_matrix.khr_*.matrixmuladd_cross.* tests on
different Intel platforms. Some platforms had failing tests, and some
platforms failed EU validation before the tests could fail.

Fixes: 6b14da33ad ("intel/fs: nir: Add nir_intrinsic_dpas_intel")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28822>
2024-04-19 09:53:27 -07:00
Ian Romanick
a8115221e5 nir: intel/brw: Change the order of sources for nir_dpas_intel
It was by pure luck that all sources (and the result) of nir_dpas_intel
had the same number of components. It is possible to support matrix
sizes where the accumulator matrix and the result matrix are larger
(e.g., 16x8 * 8x16 = 16x16).

This breaks all of the assumptions of NIR's infrastructure for code
generating intrinsics. Fix this by making the accumulator matrix be the
first source. The accumulator and the result will always have the same
dimensions (due to rules of matrix multiplication) and the same type
(due to restrictions of the cooperative matrix extension). This forces
them to have the same number of components.
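
The dimension rule can be checked with a small sketch. The struct and
function names are hypothetical; the point is only that the result of
(M x K) * (K x N) + (M x N) always has the accumulator's shape:

```c
#include <assert.h>
#include <stdbool.h>

struct mat_dims {
   unsigned rows;
   unsigned cols;
};

/* Validate A * B + ACC and report the result shape.  Returns false if
 * the shapes are incompatible. */
static bool
muladd_result_dims(struct mat_dims a, struct mat_dims b,
                   struct mat_dims acc, struct mat_dims *result)
{
   if (a.cols != b.rows)
      return false;
   if (acc.rows != a.rows || acc.cols != b.cols)
      return false;

   /* The result has exactly the accumulator's dimensions. */
   *result = acc;
   return true;
}
```

For the 16x8 * 8x16 case mentioned above, the accumulator and result are
both 16x16, even though A and B each hold fewer elements.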

This doesn't fix all the potential problems. NIR expects that all
0-sized sources will have the same number of components. This just
ensures that the result has the correct number of components.

Fixes: 6b14da33ad ("intel/fs: nir: Add nir_intrinsic_dpas_intel")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28404>
2024-03-29 21:12:32 +00:00
Iván Briano
446f652cde intel/cmat: fix stride calculation in cmat load/store
The stride given in the shader is in number of elements of the type
pointed to by the given pointer, which may not match the matrix's own
element type.
Since we cast the pointer to match the element type, the stride needs to
be adjusted accordingly.
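
A minimal sketch of that adjustment, assuming the stride is simply
rescaled by the ratio of the two bit sizes. The helper name and
parameters are hypothetical and do not mirror the actual lowering code:

```c
#include <assert.h>

/* Convert a stride counted in pointer-type elements into a stride
 * counted in matrix-element-type elements. */
static unsigned
stride_in_matrix_elements(unsigned stride, unsigned ptr_bits,
                          unsigned elem_bits)
{
   /* The byte distance must be a whole number of matrix elements. */
   assert(elem_bits > 0 && (stride * ptr_bits) % elem_bits == 0);

   return stride * ptr_bits / elem_bits;
}
```

For example, a stride of 4 through a 32-bit pointer type covers 16
elements of an 8-bit matrix.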

v2:
 - Fix mismatching bit-width in matrix element type and pointer type (Caio)
 - Do the stride calculation in one place

Fixes dEQP-VK.compute.pipeline.cooperative_matrix.khr_*.multicomponent.*

Fixes: 3a35f8b29b ("intel/cmat: Lower cmat_load and cmat_store")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10820

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27903>
2024-03-15 20:34:43 +00:00
Ian Romanick
2e75d71c1f intel/cmat: Generate better code for nir_intrinsic_cmat_insert
When the source destination index is a constant, we can avoid generating
a lot of the intermediate code. At the very least, this makes initial
NIR dumps much easier to read.

v2: Simplify tracking of dst_index. Suggested by Caio.

Suggested-by: Caio
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:28:54 -08:00
Ian Romanick
6b14da33ad intel/fs: nir: Add nir_intrinsic_dpas_intel
v2: Fix parameter order in nir_intrinsic_dpas_intel to DPAS conversion.

v3: Fix float16 destination DPAS on DG2.

v4: Use nir_component_mask(...) instead of 0xffff. Suggested by Caio.

v5: Rebase on !26323.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:28:43 -08:00
Ian Romanick
3a35f8b29b intel/cmat: Lower cmat_load and cmat_store
v2: Add support for non-constant stride.

v3: Explain B matrices (a little bit) in
get_slice_type_from_desc. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:24:16 -08:00
Ian Romanick
502be565da intel/cmat: Add lowering for cmat_bitcast
v2: Use nir_component_mask(...) instead of 0xffff. Assert that source
and destination are same size. Both suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:24:15 -08:00
Ian Romanick
7303315a8b intel/cmat: Enable packed formats for scalar ops
v2: Use nir_pack_bits and nir_unpack_bits to simplify coop_scalar
handling. This saved 13 lines of code.

v3: Allow packing factor 2 and packing factor 1 elements to be stored in
16-bit integers.
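
The unpack, operate, repack pattern behind these packed scalar ops can
be modeled in plain C. This is only an analogy for what nir_pack_bits
and nir_unpack_bits make convenient; the function name is hypothetical
and a packing factor of 2 (two 16-bit elements per dword) is assumed:

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two 16-bit elements packed in one dword by a scalar. */
static uint32_t
packed_scale(uint32_t packed, uint16_t scalar)
{
   /* Unpack the two elements. */
   const uint16_t lo = (uint16_t)(packed & 0xffff);
   const uint16_t hi = (uint16_t)(packed >> 16);

   /* Operate on each element; widen first to avoid signed overflow
    * from integer promotion, then wrap back to 16 bits. */
   const uint16_t lo_r = (uint16_t)((uint32_t)lo * scalar);
   const uint16_t hi_r = (uint16_t)((uint32_t)hi * scalar);

   /* Repack into a single dword. */
   return ((uint32_t)hi_r << 16) | lo_r;
}
```

Scaling the packed pair (3, 2), stored as 0x00030002, by 5 yields
0x000F000A.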

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:24:15 -08:00
Ian Romanick
26c4acd8ee intel/cmat: Enable packed formats for binary ops
v2: Use nir_pack_bits and nir_unpack_bits to simplify coop_binary
handling. This saved 13 lines of code.

v3: Allow packing factor 2 and packing factor 1 elements to be stored in
16-bit integers.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:24:15 -08:00
Ian Romanick
0d314eb3cc intel/cmat: Enable packed formats for unary, length, and construct
With this, a minimum test case passes:

    void main()
    {
        coopmat<float16_t, gl_ScopeSubgroup, M, N, gl_MatrixUseA> matA;
        coopmat<float, gl_ScopeSubgroup, M, N, gl_MatrixUseA> matR;

        matA = coopmat<float16_t, gl_ScopeSubgroup, M, N, gl_MatrixUseA>(2.0);
        matR = coopmat<float, gl_ScopeSubgroup, M, N, gl_MatrixUseA>(matA);

        coopMatStore(matR, result, 0, N, gl_CooperativeMatrixLayoutRowMajor);
    }

v2: Use nir_vec instead of explicit nir_vec{2,4}. Also fixes a typo in
one of the 4x8 cases.

v3: Use nir_pack_bits and nir_unpack_bits to dramatically simplify
coop_unary handling. This saved 67 lines of code.

v4: Allow packing factor 2 and packing factor 1 elements to be stored in
16-bit integers.

v5: Massive update to the comment in lower_cooperative_matrix_unary_op
with some suggestions from Caio. Add a comment and assertion around
`nir_def *v[4]`. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:24:15 -08:00
Ian Romanick
75388a71c9 intel/cmat: Add lowering for cmat_insert and cmat_extract
v2: Use nir_component_mask(...) instead of 0xffff. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:24:15 -08:00
Ian Romanick
a2ded5b26c intel/cmat: Update get_slice_type for packed slices
Also splits off another function get_slice_type_from_desc that will be
used in future commits.

v2: Allow packing factor 2 and packing factor 1 elements to be stored in
16-bit integers.

v3: Use glsl_base_type_get_bit_size.

v4: Adjust packing so that a single row fills an entire GRF.

v5: Add comment for get_packing_factor and some other cleanups
there. s/cooperative_matrix/cmat/. Tighten the validation of len in
gt_slice_from_desc. All suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:24:15 -08:00
Caio Oliveira
dba6451ce8 intel/cmat: Add pass to lower cooperative matrix to subgroup operations
This is just the skeleton of the implementation. Future commits will
fill it all in.

v2: Move to src/intel/compiler

v3 (idr): Use vecN instead of array[N] for slice type.

v4 (idr): Refactor lower_cooperative_matrix_load and
lower_cooperative_matrix_store into a single function.

v5 (idr): Remove old, verbose debug logging. Assert that entry is not
NULL in get_coop_type_for_slice. Use nir_component_mask(...) instead of
0xffff. s/cooperative_matrix/cmat/. All suggested by Caio.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>

I put both R-b on this because, at this point, we've each done equal
parts authoring and reviewing.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:24:15 -08:00