Commit graph

4410 commits

Author SHA1 Message Date
Lionel Landwerlin
ba119c73c6 intel: replace RANGE_BASE by BASE for uniform block loads
We're not currently using RANGE_BASE and we'll use BASE for offset
optimizations on Xe2+.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35252>
2025-06-22 10:55:23 +00:00
Lionel Landwerlin
098249ba66 brw: print descriptor & extended descriptors
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35252>
2025-06-22 10:55:22 +00:00
Emma Anholt
cd981e27f7 intel/elk: Move wpos_w setup right into nir_intrinsic_load_frag_w.
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Given that the intrinsic will be CSEed at the NIR level, we don't need to
preemptively set it up at the top of the shader.  No change in HSW shader-db.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25190>
2025-06-18 23:11:43 +00:00
Emma Anholt
269fbcb144 intel/elk: Use pixel_z for gl_FragCoord.z on pre-gen6.
Unless I've seriously missed something, we have the Z in the payload
(which we can always request if we need access to it and it's not already
passed to us due other WM IZ settings).

total instructions in shared programs: 4408303 -> 4408186 (<.01%)
instructions in affected programs: 1164 -> 1047 (-10.05%)
total cycles in shared programs: 142485036 -> 142484566 (<.01%)
cycles in affected programs: 26820 -> 26350 (-1.75%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25190>
2025-06-18 23:11:43 +00:00
Emma Anholt
dc55b47a58 intel/elk: Move pre-gen6 smooth interpolation 1/w multiply to NIR.
NIR catches that if you're just doing something like adding two smooth
inputs, we can do the multiply once on the result instead of on each
input.  BRW shader-db results:

total instructions in shared programs: 4409146 -> 4408303 (-0.02%)
instructions in affected programs: 800761 -> 799918 (-0.11%)
total cycles in shared programs: 143203198 -> 142485036 (-0.50%)
cycles in affected programs: 79081682 -> 78363520 (-0.91%)
total sends in shared programs: 363044 -> 363042 (<.01%)
sends in affected programs: 33 -> 31 (-6.06%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25190>
2025-06-18 23:11:42 +00:00
Emma Anholt
fb9b2261a1 intel/elk: Move pre-gen6 gl_FragCoord.w -> interpolation lowering to NIR.
BRW shader-db:
total instructions in shared programs: 4409143 -> 4409146 (<.01%)
instructions in affected programs: 330 -> 333 (0.91%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25190>
2025-06-18 23:11:41 +00:00
Emma Anholt
17ab39fbf8 intel/elk: Fix some tabs in gen4 URB setup.
This formatted terribly in my editor, just use spaces.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25190>
2025-06-18 23:11:40 +00:00
Emma Anholt
9d7a016ed1 intel/elk: Retire the global float pixel_x/y values.
Nothing used them any more.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25190>
2025-06-18 23:11:40 +00:00
Emma Anholt
e1bf014b6e intel/elk: Reduce this->pixel_x/y usage in gfx4 interp setup.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25190>
2025-06-18 23:11:40 +00:00
Emma Anholt
241bc5da70 intel/elk: Use the pixel_coord UW x/y values for noncoherent FB reads.
No need to force generating the float cast just to turn it back to an int.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25190>
2025-06-18 23:11:39 +00:00
Emma Anholt
1134cdc198 intel/elk: Lower load_frag_coord to load_{pixel_coord,frag_coord_z/w} in NIR.
This moves some conversions to NIR that may get eliminated, and also
distinguishes gl_FragCoord.z/w loads at the shader info level so we don't
need to flag uses_src_depth/uses_src_w when only gl_FragCoord.xy get used
(as is typical).  This reduces thread payload setup on many shaders.
Also, interestingly, blorp shaders stop reserving space for z/w despite
not putting them in the payload (since PS_EXTRA isn't filled out for z/w).

HSW shader-db is noise:

total instructions in shared programs: 9942649 -> 9942997 (<.01%)
instructions in affected programs: 143167 -> 143515 (0.24%)

total cycles in shared programs: 314768862 -> 314299112 (-0.15%)
cycles in affected programs: 62951452 -> 62481702 (-0.75%)

LOST:   44
GAINED: 26

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25190>
2025-06-18 23:11:39 +00:00
Emma Anholt
88f1656133 intel/elk: Save the UW pixel x/y as a temp.
This will be used for representing gl_FragCoord in NIR and reducing
payload registers pushed.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25190>
2025-06-18 23:11:38 +00:00
Emma Anholt
5222c35924 intel/elk: Save the UW pixel x/y as a temp on gfx6+.
This will be used for representing gl_FragCoord in NIR and reducing
payload registers pushed.

HSW results:

total instructions in shared programs: 9940636 -> 9948574 (0.08%)
instructions in affected programs: 852560 -> 860498 (0.93%)

total cycles in shared programs: 314804525 -> 314900080 (0.03%)
cycles in affected programs: 39786599 -> 39882154 (0.24%)

LOST:   5
GAINED: 11
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25190>
2025-06-18 23:11:38 +00:00
Emma Anholt
af74abd68c intel/fs: Don't bother checking if load_frag_coord uses interpolation.
This was leftover dead code from 4bb6e6817e ("intel: Use a system value
for gl_FragCoord") -- the sysval doesn't do any interpolation and doesn't
have sources that could use a barycentric.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25190>
2025-06-18 23:11:37 +00:00
Emma Anholt
0bf114736a intel: Use the common NIR lowering for fquantize2f16.
This generates one extra instruction to set the rounding mode to RTE due
to f2f16_rtne in the lowering.  This changes the result for
fquantize2f16(65505.0) from 65536 to 65504, which fixes SPIR-V
conformance for this value:

    If Value is positive with a magnitude too large to represent as a
    16-bit floating-point value, the result is positive infinity. If Value
    is negative with a magnitude too large to represent as a 16-bit
    floating-point value, the result is negative infinity.

SPIR-V doesn't specify whether this overflow check is before or after
rounding, but IEEE specifies rounding first, which is what produces our
65504.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25552>
2025-06-18 22:45:08 +00:00
Lionel Landwerlin
1d8382b88e brw: enable more lowering for bitfield manipulation at non 32bit sizes
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35381>
2025-06-11 14:09:56 +00:00
Paulo Zanoni
12192f6489 brw: properly decode TGL_PIPE_SCALAR
Source: BSpec "Instruction Fields" page (56701), SWSB field.

Credits to Caio Oliveira here, since he was helping me while we found
this issue together.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35395>
2025-06-09 22:21:13 +00:00
Dave Airlie
870b8717b2 Revert "hasvk/elk: stop turning load_push_constants into load_uniform"
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This reverts commit b036d2ded2.

This seems to break gtk4 and other stuff.

Cc: mesa-stable
(taking ack from Lionel saying we should revert)

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35407>
2025-06-09 09:20:19 +10:00
llyyr
c8bd9ac789 brw: don't unconditionally print message on instance creation
This would cause Mesa to print this message even if an Intel GPU is just
being enumerated by a Vulkan application. For example, `vulkaninfo
--summary`.

Fixes: 52f73db5b7 ("brw: implement read without format lowering")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35396>
2025-06-07 13:59:22 +00:00
Caio Oliveira
80fb555718 brw: Fix MAD instruction usage in spilling logic
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The intention here is to build a SIMD8 value, that will be expanded
as needed -- just like the SHL/ADD case, but with a single instruction.

Found when the was triggering invalid MAD with SIMD32 (that gets compressed)
*and* with overlapping destination and source *and* which would cause
conflict when divided into two SIMD16.

Fixes: 338273dedd ("brw/reg_allocate: Optimize spill offset calculation using integer MAD")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35302>
2025-06-06 15:31:50 +00:00
Lionel Landwerlin
52f73db5b7 brw: implement read without format lowering
Load the format enum and then just go through a series of :

   if format == R16G16B16A16_UNORM
      color = lower_r32g32_uint_tor_r16g16b16a16_unorm(color)
   else if format == R16G16B16A16_SNORM
      ...

For Gfx12.5, there is no in-shader conversion.

For Gfx12/11, the in-shader conversion covers the following formats :
    - ISL_FORMAT_R10G10B10A2_UNORM
    - ISL_FORMAT_R10G10B10A2_UINT
    - ISL_FORMAT_R11G11B10_FLOAT

For Gfx9, the following formats :
    - ISL_FORMAT_R16G16B16A16_UNORM
    - ISL_FORMAT_R16G16B16A16_SNORM
    - ISL_FORMAT_R10G10B10A2_UNORM
    - ISL_FORMAT_R10G10B10A2_UINT
    - ISL_FORMAT_R8G8B8A8_UNORM
    - ISL_FORMAT_R8G8B8A8_SNORM
    - ISL_FORMAT_R16G16_UNORM
    - ISL_FORMAT_R16G16_SNORM
    - ISL_FORMAT_R11G11B10_FLOAT
    - ISL_FORMAT_R8G8_UNORM
    - ISL_FORMAT_R8G8_SNORM
    - ISL_FORMAT_R16_UNORM
    - ISL_FORMAT_R16_SNORM
    - ISL_FORMAT_R8_UNORM
    - ISL_FORMAT_R8_SNORM

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22524>
2025-06-06 12:28:42 +00:00
Lionel Landwerlin
79498a0849 brw: fix brw_nir_fs_needs_null_rt helper
In 9b42215e0d ("iris: ensure null render target for specific cases") I
wrongly assumed that writing gl_SampleMask would only happen in
multisampled cases.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 9b42215e0d ("iris: ensure null render target for specific cases")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13292
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35313>
2025-06-04 10:10:38 +00:00
Lionel Landwerlin
a51d061c00 brw: don't generate invalid instructions
0e3e5146cf ("intel/brw: Use correct instruction for value change check
when coalescing") enabled some new cases that exposed a pre-existing
bug that would turn something like this :

      mul.sat(16) %789:F, %787:F, %788:F
      mov.g.f0.0(16) %790:F, %789:F
      (+f0.0) sel(16) %800:UD, %790:UD, 0u

into this :

      mul.sat(16) %790:F, %787:F, %788:F
      mov.g.f0.0(16) null:F, null<8,8,1>:F
      (+f0.0) sel(16) %800:UD, %790:UD, 0u

The mov[] array can contain the same instruction because it's repeated
for each REG_SIZE writes and a SIMD16 instruction will write 2
REG_SIZE.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 0e3e5146cf ("intel/brw: Use correct instruction for value change check when coalescing")
Cc: mesa-stable
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35276>
2025-06-04 06:08:26 +00:00
Caio Oliveira
2bb9b94c4c brw/disasm: Don't print src1 information for SEND gather
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
There's always only the ARF scalar register source, so don't
bother printing other information that won't be used.  Matches
the assembler code.

Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35297>
2025-06-03 22:52:39 +00:00
Sviatoslav Peleshko
0e3e5146cf intel/brw: Use correct instruction for value change check when coalescing
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
When we have partial VGRF MOVs with offsets, we will reach
`channels_remaining == 0` with `inst` that is not writing the whole VGRF.
Currently, even though we check `can_coalesce_vars()` for each offset
separately, it will always check if the dst value is not changed only
for the offset from the instruction that satisfied the
`channels_remaining == 0` condition.

Instead, we should remember and use the correct instruction for each
written offset separately.

Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10916
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35062>
2025-06-01 17:37:10 +00:00
Lionel Landwerlin
f0e18c475b intel: remove GRL/intel-clc
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35227>
2025-05-29 20:17:13 +00:00
Matt Turner
37016468a5 intel/compiler: Align human-readable send message info
This fprintf() was added in commit cce3bea2a7 ("i965/disasm: Align send
instruction meta-information with dst.")) to align the human-readable
send message info (e.g. "render MsgDesc: RT write ...") with the
destination register on the previous line.

Two months later we disabled printing the instruction offset in commit
662f1ccc24 ("i965: Disable hex offset printing in disassembly."),
thereby unaligning the human-readable send message info for the next 11
years.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35077>
2025-05-28 21:54:40 +00:00
Caleb Callaway
52db0e1480 intel/compiler: fix SHA generation for shader replace
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35140>
2025-05-27 22:57:19 +00:00
Christian Gmeiner
41f2da1a6e treewide: Do not use NIR_PASS_V for nir_divergence_analysis(..)
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35131>
2025-05-23 21:19:25 +00:00
Caleb Callaway
e7454f5318 intel/debug: shader dump filter
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
v2: Fixes filtering for various brw shader dump logic

Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35061>
2025-05-23 19:57:02 +00:00
Sushma Venkatesh Reddy
6d226ceca1 intel/compiler: Call brw_try_override_assembly independent of debug flag
Previously, brw_try_override_assembly was only called when a debug flag was
enabled. However, during investigations involving workloads such as Steam
games, enabling the debug flag results in excessive NIR and ISA output to
stderr, making debugging more difficult.

This change ensures that brw_try_override_assembly is called when the
INTEL_SHADER_ASM_READ_PATH is set, regardless of the debug flag. This
improves usability in scenarios where minimal debug output is desired.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35115>
2025-05-22 21:45:38 +00:00
Lionel Landwerlin
b036d2ded2 hasvk/elk: stop turning load_push_constants into load_uniform
Those intrinsics have different semantics in particular with regards
to divergence. Turning one into the other without invalidating the
divergence information breaks NIR validation. But also the conversion
means we get artificially less convergent values in the shaders.

So just handle load_push_constants in the backend and stop changing
things in Hasvk.

Fixes a bunch of tests in
   dEQP-VK.descriptor_indexing.*
   dEQP-VK.pipeline.*.push_constant.graphics_pipeline.dynamic_index_*

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34546>
2025-05-22 07:49:20 +00:00
Lionel Landwerlin
df15968813 anv/brw: stop turning load_push_constants into load_uniform
Those intrinsics have different semantics in particular with regards
to divergence. Turning one into the other without invalidating the
divergence information breaks NIR validation. But also the conversion
means we get artificially less convergent values in the shaders.

So just handle load_push_constants in the backend and stop changing
things in Anv.

Fixes a bunch of tests in
   dEQP-VK.descriptor_indexing.*
   dEQP-VK.pipeline.*.push_constant.graphics_pipeline.dynamic_index_*

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34546>
2025-05-22 07:49:20 +00:00
Sushma Venkatesh Reddy
524733a990 intel/compiler: Centralize type stomping logic for Gen12.5 restrictions
This patch improves code readability by centralizing the type stomping
logic for Gen12.5 region restrictions in `brw_lower_alu_restrictions`.
It removes redundant comments and ensures type consistency assertions
in `brw_broadcast`, `generate_mov_indirect`, and `generate_shuffle`.

Thank you Ken for guiding me on this.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35006>
2025-05-22 06:46:18 +00:00
Iván Briano
27a2f6d1ff brw: add lowering passes for FS barycentric inputs
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34445>
2025-05-20 20:57:59 +00:00
Iván Briano
8ee14e5291 brw/anv: add provoking vertex to fs_msaa_flags
This will be necessary to select the right value for flat inputs in
fragment shaders when fragment shader barycentrics are in use.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34445>
2025-05-20 20:57:58 +00:00
Iván Briano
acdd30a9da brw: check if the FS needs vertex_attributes_bypass to be set
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34445>
2025-05-20 20:57:58 +00:00
Iván Briano
c327b83706 brw: implement load_input_vertex intrinsic
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34445>
2025-05-20 20:57:58 +00:00
Tapani Pälli
0f591425c9 intel/compiler: provide a helper for null any-hit shader
Xe driver will be disabling the HW functionality for null any-hit
shaders, drivers need to take care of it instead. This commit brings
back parts of older workaround (see b0624e414f) we used to have to
handle the null any-hit case.

Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35044>
2025-05-20 10:58:53 +00:00
Mauro Rossi
04a643d877 intel/compiler: use ffsll instead of ffsl in brw_vue_map.c
18bbcf9a triggered the following building error in Android,
simple fix is to use ffsll() as it was done before 18bbcf9a
to process uint64_t generics argument.

Fixes the following building error:

FAILED: src/intel/compiler/libintel_compiler.a.p/brw_vue_map.c.o
...
../src/intel/compiler/brw_vue_map.c:120:37: error: implicit declaration of function 'ffsl' is invalid in C99 [-Werror,-Wimplicit-function-declaratio
n]
   const int first_generic_output = ffsl(generics) - 1;
                                    ^
../src/intel/compiler/brw_vue_map.c:120:37: note: did you mean 'ffs'?
/home/utente/r-x86_kernel/bionic/libc/include/strings.h:72:5: note: 'ffs' declared here
int ffs(int __i) __INTRODUCED_IN_X86(18);
    ^
1 error generated.

Fixes: 18bbcf9a ("intel: introduce new VUE layout for separate compiled shader with mesh")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34915>
2025-05-11 00:50:21 +02:00
Ian Romanick
338273dedd brw/reg_allocate: Optimize spill offset calculation using integer MAD
Gfx12.5 and later allow the use of two 16-bit immediate values in
integer MAD. Gfx11 and Gfx12 allow a single immediate for integer MAD,
but that is not helpful where.

v2: brw_reg_alloc::build_lane_offsets is only used on Gfx12.5+, so the
check around using integer MAD is unnecessary.

No shader-db or fossil-db changes on any pre-Gfx12.5 platforms.

shader-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
total instructions in shared programs: 17119962 -> 17118441 (<.01%)
instructions in affected programs: 65398 -> 63877 (-2.33%)
helped: 32 / HURT: 0

total cycles in shared programs: 895433316 -> 895425578 (<.01%)
cycles in affected programs: 13437376 -> 13429638 (-0.06%)
helped: 30 / HURT: 2

fossil-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Instrs: 210052706 -> 209550074 (-0.24%)
Cycle count: 31486266412 -> 31436238696 (-0.16%); split: -0.16%, +0.00%

Totals from 7081 (1.00% of 707082) affected shaders:
Instrs: 16864614 -> 16361982 (-2.98%)
Cycle count: 6323185782 -> 6273158066 (-0.79%); split: -0.79%, +0.00%

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34886>
2025-05-09 21:31:09 +00:00
Ian Romanick
3db8dbfdc3 brw/reg_allocate: Optimize spill offset calculation using more SIMD8
Re-associate the calculation. The current calcuation is

    ((lane + zero_or_8) << 2) + offset

The first addition is SIMD8, and the shift and second addition are
SIMD16. By switching to

    ((lane << 2) + offset) + zero_or_32

All operations are SIMD8.

The SHL operates directly on the UW 0x76543210UV value, and that
eliminates the MOV to expand the UW to UD.

v2: Switch to alternate method. Update for SIMD32 on Xe2.

No shader-db or fossil-db changes on any pre-Gfx12.5 platforms.

shader-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
total instructions in shared programs: 17121519 -> 17119962 (<.01%)
instructions in affected programs: 73208 -> 71651 (-2.13%)
helped: 36
HURT: 0
helped stats (abs) min: 1 max: 129 x̄: 43.25 x̃: 56
helped stats (rel) min: 0.05% max: 4.92% x̄: 2.50% x̃: 2.79%
95% mean confidence interval for instructions value: -56.02 -30.48
95% mean confidence interval for instructions %-change: -3.24% -1.75%
Instructions are helped.

total cycles in shared programs: 895450146 -> 895433316 (<.01%)
cycles in affected programs: 13709400 -> 13692570 (-0.12%)
helped: 31
HURT: 2
helped stats (abs) min: 26 max: 1654 x̄: 543.10 x̃: 672
helped stats (rel) min: <.01% max: 3.43% x̄: 0.43% x̃: 0.51%
HURT stats (abs)   min: 2 max: 4 x̄: 3.00 x̃: 3
HURT stats (rel)   min: <.01% max: <.01% x̄: <.01% x̃: <.01%
95% mean confidence interval for cycles value: -652.42 -367.58
95% mean confidence interval for cycles %-change: -0.61% -0.19%
Cycles are helped.

fossil-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Instrs: 210566294 -> 210052706 (-0.24%)
Cycle count: 31582309052 -> 31486266412 (-0.30%); split: -0.30%, +0.00%

Totals from 7091 (1.00% of 707082) affected shaders:
Instrs: 17408115 -> 16894527 (-2.95%)
Cycle count: 6443785290 -> 6347742650 (-1.49%); split: -1.49%, +0.00%

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34886>
2025-05-09 21:31:09 +00:00
Lionel Landwerlin
5c7c1eceb5 anv/brw: handle pipeline libraries with mesh
I always thought there was a massive issue with pipeline libraries &
mesh shaders. Indeed recent CTS tests have exposed a number of issues.

Some values delivered to the fragment shader are coming from different
places depending on whether the preceding shader is Mesh or not. For
example PrimitiveID is delivered in the per-primitive block in Mesh
pipelines whereas for other pipelines it's coming as a VUE slot (which
is per-vertex). Those are 2 different locations in the payload.

We have to find a layout for fragment shaders that is compatible with
everything. Leaving gaps here and there in the thread payload.

Fixes the following test pattern :

  dEQP-VK.mesh_shader.ext.smoke.fast_lib.shared_*

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
2025-05-08 06:48:35 +00:00
Lionel Landwerlin
18bbcf9a63 intel: introduce new VUE layout for separate compiled shader with mesh
Mesh shaders have per vertex block in URB pretty much identical to the
VUE format. Let's just reuse that concept to do all of our layout in
the payload attribute registers. This will ensure that we have
consistent VUE layout between Mesh & non-Mesh pipelines.

We need a new way of laying out the VUE though as we have to
accomodate a HW constraint of maximum (per-primitive + per-vertex) of
32 varying. This means we cannot have 2 locations in the payload for
things like PrimitiveID which can come from either the per-primitive
or the per-vertex block. The new layout places the PrimitiveID at the
end of the per-vertex attributes and shrinks the delivery dynamically
if the mesh stage is active. The shader is compiled with a
MOV_INDIRECT to read the PrimitiveID from the right location in the
attributes.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
2025-05-08 06:48:35 +00:00
Lionel Landwerlin
2d396f6085 intel: prepare VUE layout for more than 2 layouts
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
2025-05-08 06:48:35 +00:00
Lionel Landwerlin
95efdca00b brw: add documentation pointers to FS attribute layout
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
2025-05-08 06:48:35 +00:00
Lionel Landwerlin
9d342081e7 brw/nir: add intrinsics to read attribute payload register indirectly
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
2025-05-08 06:48:35 +00:00
Lionel Landwerlin
ef17fbf8e5 anv/brw: use separate_shader to deduced MUE compaction
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
2025-05-08 06:48:35 +00:00
Lionel Landwerlin
6230f3029f brw: fix brw_nir_move_interpolation_to_top
In a case like this :
   block_0:
      %5 = ...
      %6 = ...
      block_1:
         %7 = load_interpolated_input %5, %6

The current logic would move load_interpolated_input to block_0 before
%5 but not move %5 & %6 which are sources of that instruction.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
2025-05-08 06:48:34 +00:00
Lionel Landwerlin
5ff1b31c3f brw: document some brw_wm_prog_data fields
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
2025-05-08 06:48:34 +00:00