Commit graph

6168 commits

Author SHA1 Message Date
Daniel Schürmann
3dab7b0a45 nir/tests: add tests for nir_move_terminate_out_of_loops
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33479>
2025-05-09 17:20:29 +00:00
Daniel Schürmann
c59356e6a5 nir: add option to move terminate{_if} out of loops
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33479>
2025-05-09 17:20:29 +00:00
Sil Vilerino
150fa795fe nir: Only build nir headers for mediafoundation/d3d12-no-graphics paired build
Reviewed-by: <pohhsu@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34845>
2025-05-09 16:34:00 +00:00
Georg Lehmann
ba63263f32 nir: add bfdot2_bfadd and use it for lowering bfdot if supported
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34768>
2025-05-09 11:20:26 +00:00
Georg Lehmann
02e743c99e nir: add an option to lower bf2f and f2bf
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34768>
2025-05-09 11:20:25 +00:00
Georg Lehmann
e8f5c335ff radv,aco,nir: keep the A and B base type for cmat_muladd_amd
With bfloat16, and the two fp8 formats in the future, using just the bit size
to identify the types is no longer possible.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34768>
2025-05-09 11:20:25 +00:00
Rhys Perry
ddef4bddf8 ac/nir: round components when lowering 8/16-bit loads to 32-bit
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34162>
2025-05-08 13:30:50 +00:00
Rhys Perry
f538cae743 nir/algebraic: optimize ior(unpack_4x8, unpack_4x8<<8) to unpack_32_2x16
No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34162>
2025-05-08 13:30:50 +00:00
Rhys Perry
10f4264936 nir/search: extend swizzle_y
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34162>
2025-05-08 13:30:50 +00:00
Job Noorman
6a57bfb004 nir/lower_io_to_vector: remove can_read_output assert
Since we're not creating new output reads, just vectorizing existing
ones, this isn't the place to assert whether we can actually read
outputs.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Emma Anholt <anholt@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34784>
2025-05-08 08:18:24 +00:00
Lionel Landwerlin
9d342081e7 brw/nir: add intrinsics to read attribute payload register indirectly
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
2025-05-08 06:48:35 +00:00
Lionel Landwerlin
c467444670 brw/nir: use a new intrinsic for fs_msaa_flag
Avoid NIR code doing offset computations.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
2025-05-08 06:48:34 +00:00
Marek Olšák
f58c0cbb6a nir: split *_accessed_indirectly* bitmasks into *_read/written_indirectly*
for AMD

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34863>
2025-05-08 02:54:12 +00:00
Marek Olšák
afd8fefb79 nir: add shader_info::tess::tcs_cross_invocation_outputs_written
for AMD

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34863>
2025-05-08 02:54:12 +00:00
Alyssa Rosenzweig
5788770d91 nir: add nir_lower_default_point_size pass
this is useful across drivers for maint5 semantics on mobile hw.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34762>
2025-05-06 17:07:00 +00:00
Rhys Perry
75880655f8 nir/lower_gs_intrinsics: silence warning
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
../../../../../../../mesa/src/compiler/nir/nir_lower_gs_intrinsics.c: In function ‘nir_lower_gs_intrinsics’:
../../../../../../../mesa/src/compiler/nir/nir_lower_gs_intrinsics.c:523:93: warning: ‘state’ may be used uninitialized [-Wmaybe-uninitialized]
  523 |             state.decomposed_primitive_count_vars[i] = state.decomposed_primitive_count_vars[0];
      |                                                        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~
../../../../../../../mesa/src/compiler/nir/nir_lower_gs_intrinsics.c:464:17: note: ‘state’ declared here
  464 |    struct state state;
      |                 ^~~~~

It's always initialized by the first iteration of the loop, but GCC
doesn't seem to know that.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34785>
2025-05-05 11:45:42 +00:00
Rhys Perry
bc49045294 nir/opt_shrink_vectors: add assume to silence warning
../../../../../../../mesa/src/compiler/nir/nir_opt_shrink_vectors.c: In function ‘shrink_dest_to_read_mask’:
../../../../../../../mesa/src/compiler/nir/nir_opt_shrink_vectors.c:140:36: warning: writing 16 bytes into a region of size 15 [-Wstringop-overflow=]
  140 |             swizzle[first_bit + i] = i;
      |             ~~~~~~~~~~~~~~~~~~~~~~~^~~
../../../../../../../mesa/src/compiler/nir/nir_opt_shrink_vectors.c:138:18: note: at offset [1, 15] into destination object ‘swizzle’ of size 16
  138 |          uint8_t swizzle[NIR_MAX_VEC_COMPONENTS] = { 0 };
      |                  ^~~~~~~

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34785>
2025-05-05 11:45:42 +00:00
Ella Stanforth
32d9afdf73 nir/printf: add new helper to printf at a specific pixel.
Debugging with nir_printf_fmt can result in overwhelming information. This
allows us to filter for a pixel we care about.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34737>
2025-05-05 06:20:18 +00:00
Ella Stanforth
43f22110e7 nir/printf: break out va_list handling
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34737>
2025-05-05 06:20:17 +00:00
Christian Gmeiner
f17d350001 lima: Move fdot lowering from NIR to lima
This change relocates the fdot lowering from the generic NIR to the lima,
since lima is the only consumer of this particular lowering. This avoids
potential conflicts with the similar fdot lowering already present in
nir_lower_alu_width.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34757>
2025-04-30 17:33:38 +00:00
Caio Oliveira
a38960e8f3 brw, nir: Use glsl_base_type instead of nir_alu_type for @dpas_intel
This will allow including types that don't have a nir_alu_type
equivalent, like bfloat16.

Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34105>
2025-04-29 16:29:37 +00:00
Caio Oliveira
cf4021f93c nir: Add opcodes for BFloat16
SPV_KHR_bfloat16 requires a small set of operations,
since it doesn't support all the arithmetic ops.

This patch adds conversions to/from Float32 and also
the necessary ops (bfdot, bffma, bfmul) to implement
SpvOpDot using the same lowering approach than the
Float32 counterpart.

Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34105>
2025-04-29 16:29:36 +00:00
Rohan Garg
9e5d7eb88d compiler/types: add a bfloat16 type
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34105>
2025-04-29 16:29:36 +00:00
Marek Olšák
55db7fc18c nir/opt_varyings: group TES inputs based on whether they are used by POS or VAR
If the optional flag is set, compaction groups TES inputs based on which
outputs they are used for:
- inputs generating only POS/CLIP outputs are first
- inputs generating both POS/CLIP and VAR outputs are next
- inputs generating only VAR outputs are last

shader-db with ACO:
    143 shaders have -1.44% average decrease in code size.
    There are fewer input loads and more of them are vec4 instead of vec1-3.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32262>
2025-04-23 17:47:37 +00:00
Marek Olšák
f15399af0f nir: add gathering passes that gather which inputs affect specific outputs
The first pass computes which shader instructions contribute to each
output. It can be used to query how data flows within shaders towards
outputs.

The second pass computes which shader input components and which types of
memory loads are used to compute shader outputs.

The third pass uses the second pass to gather which input components are
used to compute pos and clip dist outputs, which input components are used
to compute all other outputs, and which input components are used to
compute both. This will be used by compaction in nir_opt_varyings for
drivers that split TES into a separate position cull shader and varying
shader to make it less likely that the same vec4 inputs are needed in both.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32262>
2025-04-23 17:47:37 +00:00
Karol Herbst
33965bb21b nir_lower_mem_access_bit_sizes: fix negative chunk offsets
With a 64 bit pointer model, instead of doing -1 the pass ended up doing
+4294967295. The reason here was some implicit integer conversion going
horribly wrong, so just do the offset math in 64 bit to get a nice result.

Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13023
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34669>
2025-04-23 16:59:56 +00:00
Ella Stanforth
b38c4e8982 nir/alpha_to_coverage: Add an intrinsic for better dithering
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33942>
2025-04-23 09:03:41 +00:00
Ella Stanforth
d3aedbfe9d asahi/lib: Move alpha_to_one and alpha_to_coverage lowering to common code.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33942>
2025-04-23 09:03:41 +00:00
Georg Lehmann
6d7e67d986 nir,amd: add neg_lo/hi modifiers to cmat_matmul_amd
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34396>
2025-04-22 16:08:55 +00:00
Georg Lehmann
3e26fc4498 nir/opt_algebraic: disable fsat(a + 1.0) opt if a can be NaN
Foz-DB Navi21:
Totals from 9 (0.01% of 79789) affected shaders:
Instrs: 6782 -> 6796 (+0.21%); split: -0.03%, +0.24%
CodeSize: 40020 -> 40108 (+0.22%); split: -0.04%, +0.26%
Latency: 23764 -> 23758 (-0.03%)
InvThroughput: 6424 -> 6431 (+0.11%); split: -0.08%, +0.19%
SClause: 273 -> 275 (+0.73%)
Copies: 338 -> 339 (+0.30%)
VALU: 5138 -> 5147 (+0.18%); split: -0.06%, +0.23%
SALU: 349 -> 350 (+0.29%)
SMEM: 498 -> 500 (+0.40%)

Fixes: a4a3487aae ("nir/opt_algebraic: optimize patterns from Skia")
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34125>
2025-04-22 14:23:05 +00:00
Georg Lehmann
a60d61cce8 nir: improve fadd is_a_number analysis by using the range
Foz-DB Navi21:
Totals from 145 (0.18% of 79789) affected shaders:
Instrs: 168553 -> 168391 (-0.10%); split: -0.10%, +0.00%
CodeSize: 926708 -> 926684 (-0.00%)
Latency: 2210456 -> 2210329 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 545992 -> 545768 (-0.04%)
SClause: 3084 -> 3085 (+0.03%)
VALU: 129521 -> 129360 (-0.12%)
SALU: 13085 -> 13084 (-0.01%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34125>
2025-04-22 14:23:05 +00:00
Georg Lehmann
a6fd9f488a nir: add is_a_number analysis for ffma
Foz-DB Navi21:
Totals from 508 (0.64% of 79789) affected shaders:
Instrs: 796183 -> 795838 (-0.04%)
CodeSize: 4303420 -> 4303384 (-0.00%); split: -0.00%, +0.00%
Latency: 7806095 -> 7805458 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 1377028 -> 1376824 (-0.01%); split: -0.01%, +0.00%
Copies: 63297 -> 63299 (+0.00%); split: -0.00%, +0.00%
PreVGPRs: 29818 -> 29819 (+0.00%)
VALU: 562067 -> 561885 (-0.03%); split: -0.03%, +0.00%
SALU: 89896 -> 89733 (-0.18%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34125>
2025-04-22 14:23:05 +00:00
Georg Lehmann
cb6d035925 nir: add range analysis for ffmaz
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34125>
2025-04-22 14:23:05 +00:00
Georg Lehmann
8ad695195e nir/opt_algebraic: turn exact fmin(1.0, a) into fsat if a is not NaN and not negative
Foz-DB Navi21:
Totals from 2456 (3.08% of 79789) affected shaders:
Instrs: 3415398 -> 3413352 (-0.06%); split: -0.06%, +0.00%
CodeSize: 18781096 -> 18776092 (-0.03%); split: -0.03%, +0.00%
VGPRs: 158512 -> 158528 (+0.01%)
Latency: 39528900 -> 39526687 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 10612237 -> 10609296 (-0.03%); split: -0.03%, +0.00%
VClause: 71028 -> 71034 (+0.01%)
SClause: 93971 -> 93975 (+0.00%); split: -0.00%, +0.01%
Copies: 257525 -> 257521 (-0.00%); split: -0.01%, +0.01%
VALU: 2483374 -> 2481325 (-0.08%); split: -0.09%, +0.00%
SALU: 348207 -> 348211 (+0.00%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34125>
2025-04-22 14:23:04 +00:00
Georg Lehmann
18a0de1834 nir/opt_algebraic: optimize fmax(ffma(a, b, c), 0.0) to fsat
Foz-DB Navi21:
Totals from 2621 (3.28% of 79789) affected shaders:
MaxWaves: 55744 -> 55736 (-0.01%)
Instrs: 2840180 -> 2832647 (-0.27%); split: -0.27%, +0.00%
CodeSize: 15497364 -> 15464692 (-0.21%); split: -0.21%, +0.00%
VGPRs: 138448 -> 138456 (+0.01%)
Latency: 22319512 -> 22307018 (-0.06%); split: -0.06%, +0.01%
InvThroughput: 5745108 -> 5729197 (-0.28%); split: -0.28%, +0.00%
Copies: 110279 -> 110268 (-0.01%); split: -0.04%, +0.03%
VALU: 2210578 -> 2203211 (-0.33%); split: -0.33%, +0.00%
SALU: 169014 -> 168841 (-0.10%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34125>
2025-04-22 14:23:04 +00:00
Georg Lehmann
f71fc26393 nir/opt_algebraic: generalize fmax(fadd(a, b), 0.0) to fsat by not requiring fneg
Not a large effect, but it's positive and makes the pattern simpler.

Foz-DB Navi21:
Totals from 1 (0.00% of 79789) affected shaders:
Instrs: 145 -> 138 (-4.83%)
CodeSize: 784 -> 756 (-3.57%)
Latency: 1495 -> 1487 (-0.54%)
InvThroughput: 210 -> 196 (-6.67%)
VALU: 103 -> 96 (-6.80%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34125>
2025-04-22 14:23:04 +00:00
Alyssa Rosenzweig
f1aeb46a34 nir: factor out nir_verts_in_output_prim helper
very useful for geometry shader lowering code.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34638>
2025-04-22 12:47:54 +00:00
Job Noorman
f269c7b3b5 nir/opt_shrink_vectors: enable for load_ubo_vec4
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34600>
2025-04-18 15:56:02 +00:00
Konstantin Seurer
978e9b670e aco,nir: Add support for new GFX12 ray tracing instructions
Adds image_bvh_dual_intersect_ray and image_bvh8_intersect_ray which can
handle the new BVH format. Both instructions write up to 10 VGPRs so
they need to use a vec16 definition in nir.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34273>
2025-04-17 20:20:40 +00:00
Caio Oliveira
33295b2249 spirv, nir: Allow non-Aliased workgroup memory blocks
Allocate space for the aliased region first, then allocate the
non-Aliased blocks in sequence after that.

SPV_KHR_workgroup_memory_explicit_layout extension added support for
having Blocks of workgroup (shared) memory, which include layout
decoration.  For that extension all such blocks must be decorated with
Aliased.

SPV_KHR_untyped_pointers extension lifts that requirement, allowing
blocks that don't alias in workgroup memory.  They are still explicitly
laid out.

The motivation is that untyped pointers provide a different
mechanism to obtain the same effect as the Aliased blocks.  Instead of
having two Aliased variables with different types, have a single
variable and use an untyped pointer with a different type to access it.

This patch is a preparation for supporting untyped pointers.

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34139>
2025-04-17 19:13:18 +00:00
Caio Oliveira
fd0a7efb5a spirv, nir: Delay calculation of shared_size when using explicit layout
Move the calculation to nir_lower_vars_to_explicit_types().  This
consolidates the check of shader_info::shared_memory_explicit_layout
in a single place instead of in all drivers.

This is motivated by SPV_KHR_untyped_pointers.  Before that extension
we had essentially two modes for shared memory variables

- No layout decorations in the SPIR-V, and both internal layout and
  driver location was _given by the driver_.

- Explicitly laid out, i.e. they are blocks, and decorated with Aliased.
  Because they all alias, we could assign them driver location directly
  to the start of the shared memory.

With the untyped pointers extension, there's a third option, to be added
by a later commit

- Explicitly laid out, i.e. they are blocks, and NOT decorated
  with Aliased.  Driver location is _given by the driver_.  Blocks
  with and without Aliased can be mixed.

The driver location of multiple blocks that don't alias depend on
alignment that is driver-specific, which we can more easily do from
the nir_lower_vars_to_explicit_types() that already has access to
a function to obtain such value.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> (hk)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v3dv)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (anv/hasvk)
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> (panvk)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (radv)
Reviewed-by: Rob Clark <robdclark@gmail.com> (tu)
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34139>
2025-04-17 19:13:17 +00:00
Caio Oliveira
d5ad798140 spirv, radv, intel: Add NIR intrinsic for cmat conversion
A cooperative matrix conversion operation was represented in NIR by the
cmat_unary_op intrinsic with an nir_alu_op as extra parameter,
that was already lowered to a specific conversion operation
based on the matrix types.

Instead of that, add a new intrinsic `cmat_convert` that is specific
for that conversion.  In addition to the src/dst matrix descriptions
already available, also include the signedness information in the
intrinsic (reuse nir_cmat_signed for that).  This is needed because
different Convert operations define different interpretations for
integers, regardless their original type.

In this patch, both radv and intel were changed to use the same logic
that was previously used to pick the lowered ALU op.

This change will help represent cmat conversions involving BFloat16,
because it avoids having to create new NIR ALU ops for all the
combinations involving BFloat16.

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34511>
2025-04-16 23:13:36 +00:00
Ian Romanick
1d2ebeca17 nir/algebraic: Allow fmin(a,a) optimization when flush denorm to zero is not set
I was surprised this had any affect on Intel GPUs because we have been
unconditionally performing this optimization in the backend since June
2014.

Once that error is fixed (later in this MR), this change prevents a
couple dozen regressions in shader-db and around 90 regressions in
fossil-db. Many of the regressions in fossil-db were loss of SIMD32, and
that can be a big deal.

v2: Add 64-bit too. Suggested by Alyssa.

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 16970141 -> 16970139 (<.01%)
instructions in affected programs: 40 -> 38 (-5.00%)
helped: 2 / HURT: 0

total cycles in shared programs: 914617580 -> 914617548 (<.01%)
cycles in affected programs: 3428 -> 3396 (-0.93%)
helped: 2 / HURT: 0

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Cycle count: 30546028462 -> 30546025224 (-0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 237017827 -> 237017731 (-0.00%)

Totals from 83 (0.01% of 706657) affected shaders:
Cycle count: 3042978 -> 3039740 (-0.11%); split: -0.13%, +0.02%
Non SSA regs after NIR: 78997 -> 78901 (-0.12%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34192>
2025-04-15 23:59:31 +00:00
Alyssa Rosenzweig
63eb27d166 nir: add sampler LOD bias lowering
this is a cleaned up version of the lowering originally written for asahi, moved
to common code so it can be shared with an upcoming Vulkan implementation (not
honeykrisp).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34507>
2025-04-15 14:10:50 +00:00
Alyssa Rosenzweig
9de7ea875d nir: handle mismatched bias/lod bitsizes
the sampler lod bias lowering uses fp16 for perf on AGX.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34507>
2025-04-15 14:10:49 +00:00
Alyssa Rosenzweig
2e15b42eec nir: unvendor lod_bias(_agx)
this will be useful for other backends.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34507>
2025-04-15 14:10:49 +00:00
Connor Abbott
2f93137308 nir/opt_preamble: Handle load_global_ir3
fossil-db results with turnip:

Totals from 994 (0.60% of 165023) affected shaders:
MaxWaves: 10720 -> 11528 (+7.54%); split: +7.57%, -0.04%
Instrs: 1032004 -> 972314 (-5.78%); split: -5.99%, +0.21%
CodeSize: 1847536 -> 1942472 (+5.14%); split: -0.11%, +5.25%
NOPs: 261089 -> 233279 (-10.65%); split: -10.89%, +0.23%
MOVs: 57217 -> 51434 (-10.11%); split: -14.11%, +4.00%
Full: 16412 -> 14647 (-10.75%); split: -10.96%, +0.21%
(ss): 23330 -> 25594 (+9.70%); split: -5.51%, +15.21%
(sy): 17803 -> 15711 (-11.75%); split: -11.93%, +0.18%
(ss)-stall: 96387 -> 107976 (+12.02%); split: -5.14%, +17.17%
(sy)-stall: 952952 -> 765754 (-19.64%); split: -19.84%, +0.19%

STPs: 494 -> 327 (-33.81%)
LDPs: 1447 -> 1163 (-19.63%)
Early-preamble: 668 -> 22 (-96.71%)
Cat0: 280935 -> 251779 (-10.38%); split: -10.60%, +0.22%
Cat1: 93400 -> 84766 (-9.24%); split: -11.79%, +2.55%
Cat2: 343880 -> 337270 (-1.92%); split: -3.20%, +1.28%
Cat3: 189311 -> 180918 (-4.43%)
Cat4: 21008 -> 19920 (-5.18%)
Cat5: 17788 -> 17783 (-0.03%)
Cat6: 45786 -> 39531 (-13.66%)
Cat7: 39896 -> 40347 (+1.13%); split: -0.43%, +1.56%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34483>
2025-04-14 16:53:34 +00:00
Erik Faye-Lund
1d5da22dfd nir/lower_tex: avoid undefined-behavior
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
When texture_index and sampler_index are over 32, we can't really check
for them in a single 32-bit word. This happens among other things when
Panfrost uses preload shaders on v9 and later. Otherwise, we trigger
undefined behavior.

We're already doing this for textures in one case, let's be consistent.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34365>
2025-04-14 11:22:43 +00:00
Erik Faye-Lund
41b136f674 nir/lower_tex: use texture_mask instead of shifting on use
In commit 292ac71a4a ("nir/lower_tex: handle deref casts"), we avoided
using texture_index when a texture instruction contained a variable
deref. There's no good reason why this should be done to some of the
lowering, but not all.

So let's fix up code-paths that were added after this change to do the
same.

The first two patches here crossed paths with the commit that introduced
texture_mask, so it's not strange that the change was missed. The last
one seems to have just copied what was done around it, propagating the
issue.

Fixes: 880b00dc59 ("nir/lower_tex: Add support for lowering YUYV formats")
Fixes: 1358d93650 ("nir/lower_tex: Add support for lowering Y41x formats")
Fixes: 65d6f5aed2 ("nir: add options to lower y_vu, yv_yu, yx_xvxu and xy_vxux")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34365>
2025-04-14 11:22:43 +00:00
Konstantin Seurer
cb31b5a958 clc,libcl: Clean up CL includes
This patch does a couple of things to make CL integration with drivers
as seamless as possible:
- We pull in opencl-c.h and opencl-c-base.h to stop relying on system
  headers.
- Parts of libcl.h are moved to new headers that are incomplete CL-safe
  variants of libc headers.
- A couple of util headers are changed to remove now unnecessary
  __OPENCL_VERSION__ guards and make more headers CL safe.
- Drivers now include src/compiler/libcl and use headers like
  macros.h,u_math.h instead of libcl.h.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33576>
2025-04-11 21:27:37 +00:00