fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-23 02:30:12 +01:00

Author	SHA1	Message	Date
Antonio Ospite	ddf2aa3a4d	build: avoid redefining unreachable() which is standard in C23 In the C23 standard unreachable() is now a predefined function-like macro in <stddef.h> See https://android.googlesource.com/platform/bionic/+/HEAD/docs/c23.md#is-now-a-predefined-function_like-macro-in And this causes build errors when building for C23: ----------------------------------------------------------------------- In file included from ../src/util/log.h:30, from ../src/util/log.c:30: ../src/util/macros.h:123:9: warning: "unreachable" redefined 123 \| #define unreachable(str) \ \| ^~~~~~~~~~~ In file included from ../src/util/macros.h:31: /usr/lib/gcc/x86_64-linux-gnu/14/include/stddef.h:456:9: note: this is the location of the previous definition 456 \| #define unreachable() (__builtin_unreachable ()) \| ^~~~~~~~~~~ ----------------------------------------------------------------------- So don't redefine it with the same name, but use the name UNREACHABLE() to also signify it's a macro. Using a different name also makes sense because the behavior of the macro was extending the one of __builtin_unreachable() anyway, and it also had a different signature, accepting one argument, compared to the standard unreachable() with no arguments. This change improves the chances of building mesa with the C23 standard, which for instance is the default in recent AOSP versions. All the instances of the macro, including the definition, were updated with the following command line: git grep -l '[^_]unreachable(' -- "src/**" \| sort \| uniq \| \ while read file; \ do \ sed -e 's/$[^_]$unreachable(/\1UNREACHABLE(/g' -i "$file"; \ done && \ sed -e 's/#undef unreachable/#undef UNREACHABLE/g' -i src/intel/isl/isl_aux_info.c Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36437>	2025-07-31 17:49:42 +00:00
Caio Oliveira	2dfd4dcbc5	brw: Fix cmat conversion between bfloat16 and non-float32 The HW only supports converting BRW_TYPE_BF values to/from BRW_TYPE_F, so intermediate conversion is needed. Move the intermediate conversion to the implementation of `@convert_cmat_intel` and simplify the brw_nir_lower_cooperative_matrix pass. This has two positive effects - Fixes conversion between BF and integer type cooperative matrices, that was still using the old emit_alu1 approach instead of the new code for `@convert_cmat_intel`. - Guarantee the intermediate conversion will result in a valid layout for conversions involved USE_B matrices. If we instead used the intrinsic twice in brw_nir_lower_cooperative_matrix.c, a matrix with invalid layout would be visible at NIR level and we wouldn't be able to keep the current assertion for USE_B case. Due to the configurations we have exposed, we still don't need to write a more complex USE_B conversion -- they are all between same size types (and, consequently, packing factors), so no shuffling of data is needed to respect the USE_B layout. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36185>	2025-07-18 21:55:43 +00:00
Matt Turner	6d786a0e4b	brw: Use convert_cmat_intel intrinsic Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35616>	2025-06-27 01:26:22 +00:00
Caio Oliveira	07fa3b3785	intel: Add support for BFloat16 as cooperative matrix source Re-organize the configuration lists to make easier to include BFloat16 only for the Gfx125+ that support it, while keeping MTL supporting the "lowered" configurations from pre-Gfx125. Reviewed-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34105>	2025-04-29 16:29:37 +00:00
Caio Oliveira	d4381c0908	brw/cmat: Implement conversion from/to BFloat16 When converting BFloat16 from/to non-Float32 type, use the Float32 conversion as an intermediate step. Take the opportunity to separate the unary_op/convert code-paths. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34105>	2025-04-29 16:29:37 +00:00
Caio Oliveira	de88184ab6	brw/cmat: Support different src/dst packing factors in emit_packed_alu1 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34105>	2025-04-29 16:29:37 +00:00
Caio Oliveira	7fa7be970d	brw/cmat: Extract emit_packed_alu1() function Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34105>	2025-04-29 16:29:37 +00:00
Caio Oliveira	4b4500ad35	brw/cmat: Store more information about cmat slices Store the cmat_description and packing_factor so that various functions don't need to extract and recalculate them. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34105>	2025-04-29 16:29:37 +00:00
Caio Oliveira	a38960e8f3	brw, nir: Use glsl_base_type instead of nir_alu_type for @dpas_intel This will allow including types that don't have a nir_alu_type equivalent, like bfloat16. Reviewed-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34105>	2025-04-29 16:29:37 +00:00
Caio Oliveira	d5ad798140	spirv, radv, intel: Add NIR intrinsic for cmat conversion A cooperative matrix conversion operation was represented in NIR by the cmat_unary_op intrinsic with an nir_alu_op as extra parameter, that was already lowered to a specific conversion operation based on the matrix types. Instead of that, add a new intrinsic `cmat_convert` that is specific for that conversion. In addition to the src/dst matrix descriptions already available, also include the signedness information in the intrinsic (reuse nir_cmat_signed for that). This is needed because different Convert operations define different interpretations for integers, regardless their original type. In this patch, both radv and intel were changed to use the same logic that was previously used to pick the lowered ALU op. This change will help represent cmat conversions involving BFloat16, because it avoids having to create new NIR ALU ops for all the combinations involving BFloat16. Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34511>	2025-04-16 23:13:36 +00:00
Ian Romanick	556e78f737	intel/brw/xe2+: Allow vec16 for cooperative matrix Xe2 will allow a B matrix large enough that it will be stored in a vec16. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28834>	2024-06-25 14:17:47 -07:00
Ian Romanick	7a773ac53e	intel/brw: Major rework of lower_cmat_load_store The original goal was to get rid of a bunch of the magic constants sprinkled through the function. Once I did that, I realized that there was a lot my symmertry between the row-major and column-major paths possible. It's +6 lines of code, but about 15 of those lines are comments explaining things that were not obvious in the original code. v2: Save duplicated condition in a variable with a meaningful name. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28834>	2024-06-25 14:16:48 -07:00
Ian Romanick	ea6e10c0b2	intel/brw: Temporarily disable result=float16 matrix configs Even though the hardware does not naively support these configurations, there are many potential benefits to advertising them. These configurations can theoretically use half the memory bandwidth for loads and stores. For large matrices, that can be the limiting in performance. The current implementation, however, has a number of significant problems. The conversion from float16 to float32 is performed in the driver during conversion from NIR. As a result, many common usage patterns end up doing back-to-back conversions to and from float16 between matrix multiplications (when the result of one multiplication is used as the accumulator for the next). The float16 version of the matrix waste half the possible register space. Each float16 value sits alone in a dword. This is done so that the per-invocation slice of an 8x8 float16 result matrix and an 8x8 float32 result matrix will have the same number of elements. This makes it possible to do straightforward implementations of all the unary_op type conversions in NIR. It would be possible to perform N:M element type conversions in the backend using specialized NIR intrinsics. However, per #10961, this would be very, very painful. My hope is that, once a suitable resolution for that issue can be found, support for these configs can be restored. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28834>	2024-06-25 13:52:12 -07:00
Ian Romanick	a5adbae6f6	nir: intel/brw: Remove cmat_signed_mask from dpas_intel intrinsic It is not used. The signedness is inferred from src_type and dest_type. Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28822>	2024-04-19 09:53:29 -07:00
Ian Romanick	2ce558d928	intel/brw: Fix handling of cmat_signed_mask For integer types, the signedness is determined by flags on the muladd instruction. The types of the sources play no role. Previously we were using the signedness of the type and ignoring the mask. Adjust the types passed to the dpas_intel intrinsic to match. Fixes various dEQP-VK.compute..cooperative_matrix.khr_.matrixmuladd_cross.* tests on different Intel platforms. Some platforms had failing tests, and some platforms failed EU validation before the tests could fail. Fixes: `6b14da33ad` ("intel/fs: nir: Add nir_intrinsic_dpas_intel") Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28822>	2024-04-19 09:53:27 -07:00
Ian Romanick	a8115221e5	nir: intel/brw: Change the order of sources for nir_dpas_intel It was by pure luck that all sources (and the result) of nir_dpas_intel had the same number of components. It is possible to support matrix sizes where the accumlator matrix and the result matrix are larger (e.g., 16x8 * 8x16 = 16x16). This breaks all of the assumptions of NIR's infrastructure for code generating intrinsics. Fix the by making the accumulator matrix be the first source. The accumulator and the result will always have the same dimensions (due to rules of matrix multiplication) and the same type (due to restructions of the cooperative matrix extension). This forces them to have the same number of components. This doesn't fix all the potential problems. NIR expects that all 0-sized sources will have the same number of components. This just ensures that the result has the correct number of components. Fixes: `6b14da33ad` ("intel/fs: nir: Add nir_intrinsic_dpas_intel") Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28404>	2024-03-29 21:12:32 +00:00
Iván Briano	446f652cde	intel/cmat: fix stride calculation in cmat load/store The stride given in the shader is in number of elements of the of the type pointed by the given pointer, which may not match the matrix own element type. Since we cast the pointer to match the element type, the stride needs to be ajusted accordingly. v2: - Fix mismatching bit-width in matrix element type and pointer type (Caio) - Do the stride calculation in one place Fixes dEQP-VK.compute.pipeline.cooperative_matrix.khr_.multicomponent. Fixes: `3a35f8b29b` ("intel/cmat: Lower cmat_load and cmat_store") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10820 Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27903>	2024-03-15 20:34:43 +00:00
Ian Romanick	2e75d71c1f	intel/cmat: Generate better code for nir_intrinsic_cmat_insert When the source destination index is a constant, we can avoid generating a lot of the intermediate code. At the very least, this makes initial NIR dumps much easier to read. v2: Simplify tracking of dst_index. Suggested by Caio. Suggested-by: Caio Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:28:54 -08:00
Ian Romanick	6b14da33ad	intel/fs: nir: Add nir_intrinsic_dpas_intel v2: Fix parameter order in nir_intrinsic_dpas_intel to DPAS conversion. v3: Fix float16 destination DPAS on DG2. v4: Use nir_component_mask(...) instead of 0xffff. Suggested by Caio. v5: Rebase on !26323. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:28:43 -08:00
Ian Romanick	3a35f8b29b	intel/cmat: Lower cmat_load and cmat_store v2: Add support for non-constant stride. v3: Explain B matrices (a little bit) in get_slice_type_from_desc. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:16 -08:00
Ian Romanick	502be565da	intel/cmat: Add lowering for cmat_bitcast v2: Use nir_component_mask(...) instead of 0xffff. Assert that source and destination are same size. Both suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:15 -08:00
Ian Romanick	7303315a8b	intel/cmat: Enable packed formats for scalar ops v2: Use nir_pack_bits and nir_unpack_bits to simplify coop_scalar handling. This saved 13 lines of code. v3: Allow packing factor 2 and packing factor 1 elements be stored in 16-bit integers. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:15 -08:00
Ian Romanick	26c4acd8ee	intel/cmat: Enable packed formats for binary ops v2: Use nir_pack_bits and nir_unpack_bits to simplify coop_binary handling. This saved 13 lines of code. v3: Allow packing factor 2 and packing factor 1 elements be stored in 16-bit integers. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:15 -08:00
Ian Romanick	0d314eb3cc	intel/cmat: Enable packed formats for unary, length, and construct With this, a minimum test case passes: void main() { coopmat<float16_t, gl_ScopeSubgroup, M, N, gl_MatrixUseA> matA; coopmat<float, gl_ScopeSubgroup, M, N, gl_MatrixUseA> matR; matA = coopmat<float16_t, gl_ScopeSubgroup, M, N, gl_MatrixUseA>(2.0); matR = coopmat<float, gl_ScopeSubgroup, M, N, gl_MatrixUseA>(matA); coopMatStore(matR, result, 0, N, gl_CooperativeMatrixLayoutRowMajor); } v2: Use nir_vec instead of explicit nir_vec{2,4}. Also fixes a typo in one of the 4x8 cases. v3: Use nir_pack_bits and nir_unpack_bits to dramatically simplify coop_unary handling. This saved 67 lines of code. v4: Allow packing factor 2 and packing factor 1 elements be stored in 16-bit integers. v5: Massive update to the comment in lower_cooperative_matrix_unary_op with some suggestions from Caio. Add a comment and assertion around `nir_def *v[4]`. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:15 -08:00
Ian Romanick	75388a71c9	intel/cmat: Add lowering for cmat_insert and cmat_extract v2: Use nir_component_mask(...) instead of 0xffff. Suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:15 -08:00
Ian Romanick	a2ded5b26c	intel/cmat: Update get_slice_type for packed slices Also splits off another funciton get_slice_type_from_desc that will be used in future commits. v2: Allow packing factor 2 and packing factor 1 elements be stored in 16-bit integers. v3: Use glsl_base_type_get_bit_size. v4: Adjust packing so that a single row fills an entire GRF. v5: Add comment for get_packing_factor and some other cleanups there. s/cooperative_matrix/cmat/. Tighten the validation of len in gt_slice_from_desc. All suggested by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:15 -08:00
Caio Oliveira	dba6451ce8	intel/cmat: Add pass to lower cooperative matrix to subgroup operations This is just the skeleton of the implementation. Future commits will fill it all in. v2: Move to src/intel/compiler v3 (idr): Use vecN instead of array[N] for slice type. v4 (idr): Refactor lower_cooperative_matrix_load and lower_cooperative_matrix_store into a single function. v5 (idr): Remove old, verbose debug logging. Assert that entry is not NULL in get_coop_type_for_slice. Use nir_component_mask(...) instead of 0xffff. s/cooperative_matrix/cmat/. All suggested by Caio. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> I put both R-b on this because, at this point, we've each done equal parts authoring and reviewing. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>	2023-12-29 20:24:15 -08:00

27 commits