Commit graph

14047 commits

Author SHA1 Message Date
Lionel Landwerlin
2418525b2e anv: avoid 64bit atomics emulation on Xe2+
Xe2+ still requires lowering 64bit image load/store to 2x32bit for the
message format. But atomics work without lowering.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34876>
2025-05-23 10:04:29 +00:00
Paulo Zanoni
d77b49eb0a anv/trtt: don't avoid the TR-TT submission when there is stuff to signal
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
When an application issues a sparse binding operation, it may be the
case that the state the app is setting is the state that is already
there. In that case, both n_l3l2_binds and n_l1_binds are zero, so the
batch doesn't contain anything and, since 0802bbd486, we just skip
the batch submission and return.

The problem is that skipping the batch submission and returning
ignores the synchronization: there may be syncobjs that we have to
wait and, more importantly, there may be syncobjs that we have to
signal.

This case is exercised by vkd3d-proton's test suite, but I'm not aware
of any other workload that triggers it. This commit only affects
Meteor Lake and older, as TR-TT is only the default behavior for the
platforms running i915.ko.

Testcase: vkd3d-proton/d3d12/test_sparse_buffer_memory_lifetime
Fixes: 0802bbd486 ("anv/trtt: don't submit empty batches when there are no binds to do")
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35078>
2025-05-23 00:17:18 +00:00
Sushma Venkatesh Reddy
6d226ceca1 intel/compiler: Call brw_try_override_assembly independent of debug flag
Previously, brw_try_override_assembly was only called when a debug flag was
enabled. However, during investigations involving workloads such as Steam
games, enabling the debug flag results in excessive NIR and ISA output to
stderr, making debugging more difficult.

This change ensures that brw_try_override_assembly is called when the
INTEL_SHADER_ASM_READ_PATH is set, regardless of the debug flag. This
improves usability in scenarios where minimal debug output is desired.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35115>
2025-05-22 21:45:38 +00:00
Dylan Baker
51b51eb676 anv: Add comment why we overmap and then unmap a region
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35114>
2025-05-22 21:14:26 +00:00
Dylan Baker
25dd3923dc anv: attempt to make coverity happy
Coverity is upset that we're using `ptr` after we've `munmap`ed up to
the offset of the region, even though we're just moving past the
unmapped region to the still mapped region. Attempt to make it happy by
doing that calculation before unmapping. If it's still mad there's
nothing left we can do.

CID: 1646981
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
CID: 1646956
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35114>
2025-05-22 21:14:26 +00:00
Dylan Baker
ff5cb90880 anv: avoid potential integer overflow
Coverity points out that we're using a 32bit type on the left side here,
so the entire operation is done as 32 bit instead of 64

CID: 1646960
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35114>
2025-05-22 21:14:26 +00:00
Dylan Baker
2a3cf70db8 blorp: cast uint32_t -> int64_t to avoid potential overflow
In practice, I don't think it's actually going to overflow, but it could
in theory, which coverity is pointing out.

CID: 1647010
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35114>
2025-05-22 21:14:26 +00:00
Lionel Landwerlin
b036d2ded2 hasvk/elk: stop turning load_push_constants into load_uniform
Those intrinsics have different semantics in particular with regards
to divergence. Turning one into the other without invalidating the
divergence information breaks NIR validation. But also the conversion
means we get artificially less convergent values in the shaders.

So just handle load_push_constants in the backend and stop changing
things in Hasvk.

Fixes a bunch of tests in
   dEQP-VK.descriptor_indexing.*
   dEQP-VK.pipeline.*.push_constant.graphics_pipeline.dynamic_index_*

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34546>
2025-05-22 07:49:20 +00:00
Lionel Landwerlin
df15968813 anv/brw: stop turning load_push_constants into load_uniform
Those intrinsics have different semantics in particular with regards
to divergence. Turning one into the other without invalidating the
divergence information breaks NIR validation. But also the conversion
means we get artificially less convergent values in the shaders.

So just handle load_push_constants in the backend and stop changing
things in Anv.

Fixes a bunch of tests in
   dEQP-VK.descriptor_indexing.*
   dEQP-VK.pipeline.*.push_constant.graphics_pipeline.dynamic_index_*

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34546>
2025-05-22 07:49:20 +00:00
Lionel Landwerlin
b204153106 anv: add a comment about Wa_14016820455
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35101>
2025-05-22 07:31:49 +00:00
Sushma Venkatesh Reddy
524733a990 intel/compiler: Centralize type stomping logic for Gen12.5 restrictions
This patch improves code readability by centralizing the type stomping
logic for Gen12.5 region restrictions in `brw_lower_alu_restrictions`.
It removes redundant comments and ensures type consistency assertions
in `brw_broadcast`, `generate_mov_indirect`, and `generate_shuffle`.

Thank you Ken for guiding me on this.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35006>
2025-05-22 06:46:18 +00:00
Valentine Burley
dc483ea924 ci: Remove firmware from test-base
Firmware packages continue to grow in size, so stop installing them in
the test-base image.

The necessary firmware is now collected and uploaded per vendor in an
external repository.

LAVA devices can opt into optional firmware by specifying the name of the
archive via LAVA_FIRMWARE.

For bare-metal, Qualcomm firmware required for DUTs in the Google lab is
included in the baremetal image.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13051

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34861>
2025-05-21 08:48:15 +00:00
Iván Briano
815bcda06d anv: enable VK_KHR_fragment_shader_barycentric
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34445>
2025-05-20 20:57:59 +00:00
Iván Briano
6268792a29 anv: set HW state for fragment shader barycentric
When the FS requires it, set the VertexAttributeBypass and
LegacyBaryAssignmentDisable bits accordingly.

Also program the provoking vertex to give us the per-vertex attributes
in the order the Vulkan specification dictates, and track its dynamic
value for the FS to pick up constant interpolated inputs correctly.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34445>
2025-05-20 20:57:59 +00:00
Iván Briano
27a2f6d1ff brw: add lowering passes for FS barycentric inputs
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34445>
2025-05-20 20:57:59 +00:00
Iván Briano
8ee14e5291 brw/anv: add provoking vertex to fs_msaa_flags
This will be necessary to select the right value for flat inputs in
fragment shaders when fragment shader barycentrics are in use.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34445>
2025-05-20 20:57:58 +00:00
Iván Briano
acdd30a9da brw: check if the FS needs vertex_attributes_bypass to be set
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34445>
2025-05-20 20:57:58 +00:00
Iván Briano
c327b83706 brw: implement load_input_vertex intrinsic
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34445>
2025-05-20 20:57:58 +00:00
Iván Briano
4c1f9554f5 intel/genxml: update some instructions for Xe2+
3DSTATE_CLIP and 3DSTATE_SF add:
 - Triangle Strip Odd Provoking Vertex Select
3DSTATE_RASTER:
 - Legacy Bary Assignment Disable
3DSTATE_SBE:
 - Vertex Attributes Bypass

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34445>
2025-05-20 20:57:58 +00:00
Aditya Swarup
6528d267ef anv: Disable fast clear when surface width is 16k
HSD 16023071695 description mentions we need to extend
WA_16021232440 to cover the case when surface width is 16k.

BSpec: 57340

Signed-off-by: Aditya Swarup <aditya.swarup@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34838>
2025-05-20 10:44:52 -07:00
Tapani Pälli
5828612da2 anv: use internal rt-null-ahs when any_hit is null
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Tested on BMG and PTL using both settings for RT_CTRL.

Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35044>
2025-05-20 10:58:53 +00:00
Tapani Pälli
0f591425c9 intel/compiler: provide a helper for null any-hit shader
Xe driver will be disabling the HW functionality for null any-hit
shaders, drivers need to take care of it instead. This commit brings
back parts of older workaround (see b0624e414f) we used to have to
handle the null any-hit case.

Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35044>
2025-05-20 10:58:53 +00:00
Felix DeGrood
e75c0160df intel/tools: add intel_measure.py
We are moving away from the INTEL_MEASURE tool, replacing
it with utrace. Utrace is better maintained and provides similar
debug data. The eventual plan is to EOL INTEL_MEASURE from the driver.

This python script reinterprets the dumped utrace data into the
traditional INTEL_MEASURE csv file format.

Usage:
MESA_GPU_TRACES=print_csv MESA_GPU_TRACEFILE=/tmp/ut.csv INTEL_DEBUG=stall <cmd>
intel_measure.py /tmp/ut.csv > im.csv

Reviewed-by: Casey Bowman <casey.g.bowman@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34662>
2025-05-19 17:27:30 +00:00
Lionel Landwerlin
c570740272 anv: enable preemption setting on command/batch correctly
The 2 helpers we're using for doing internal operations (copies,
command generation, etc...) can work on command buffers or lower level
batches.

When working with command buffers, the helpers should set the
preemption using genX(cmd_buffer_set_preemption) so that whatever
operation comes after toggles the state back to what it needs and we
minimize the toggles.

When working with batchs, the helpers should disable preemption using
genX(batch_set_preemption) and turn it back on when done.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35030>
2025-05-19 06:56:04 +00:00
Jianxun Zhang
2212865ce0 anv: Use different PAT entries for compressed resources
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Displayable compressed resournces have a different PAT
entry from the non-displayable compressed.

Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29928>
2025-05-16 16:03:54 -07:00
Jianxun Zhang
ca092db7ce intel/dev: Differentiate displayable PAT entry of compression (xe2)
We need two PAT entries with compression for displayable and
non-displayable compressed images. The current 'compressed' entry
is renamed to 'scanout_compressed' for the displayable.

Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29928>
2025-05-16 16:03:54 -07:00
Kenneth Graunke
20222cd956 anv: Use the new nir_opt_acquire_release_barriers pass
Improves performance of Phasmophobia with the "Eye Adaptation" video
setting enabled on Arc B570 by about 9.5%.

fossil-db results on Battlemage:

   Totals:
   Instrs: 148797922 -> 148797865 (-0.00%)
   Send messages: 7066341 -> 7066317 (-0.00%)
   Cycle count: 21459978352 -> 21459975048 (-0.00%)

   Totals from 8 (0.00% of 574410) affected shaders:
   Instrs: 4633 -> 4576 (-1.23%)
   Send messages: 479 -> 455 (-5.01%)
   Cycle count: 611886 -> 608582 (-0.54%)

Observed to cut 15% of sends in a Phasmophobia shader, 8.3% in a Far Cry
New Dawn shader, 7% in a Borderlands 3 DX11 shader, and 3.4-3.7% of
sends in a few Witcher 3 and Dark Souls 3 shaders.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33504>
2025-05-16 00:29:13 +00:00
José Roberto de Souza
cb6f96a1e8 anv: Remove a '#if GFX_VER >= 30' block inside of a else of '#if GFX_VERx10 >= 125'
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Removing deadcode.

Reviewed-by: Lucas Fryzek <lfryzek@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34988>
2025-05-15 15:25:12 +00:00
José Roberto de Souza
37b42ef648 anv: Drop '#if GFX_VERx10 >= 125' inside of '#if GFX_VERx10 >= 125'
This is just redundant.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34988>
2025-05-15 15:25:12 +00:00
José Roberto de Souza
3cd972a2d3 anv: Enable preemption due 3DPRIMITIVE in GFX 12
The issues preventing it to be enabled were fixed so now we can enable
it but we need also to enable workaround 16013994831 back again.

Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34988>
2025-05-15 15:25:12 +00:00
José Roberto de Souza
2432d6677e anv: Implement missing part of Wa_1604061319
Description of this workaround are not clear but looking at Iris
implementation we need to emit all 3DSTATE_PUSH_CONSTANT_ALLOC_XS if
any 3DSTATE_PUSH_CONSTANT_ALLOC_XS is emitted.

Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34988>
2025-05-15 15:25:12 +00:00
Hyunjun Ko
7ddf51dc99 anv: Fix to set CDEF filter flag correctly.
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This fixes to play av1_intel_broken2.ivf.

Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34866>
2025-05-15 01:02:05 +00:00
Hyunjun Ko
2e256a3cee anv: Allocate MV buffers enough for AV1 decoding.
As other video memories for AV1 are already allocated for the maximum
sizes, now it does the same for MV buffers too.

This fixes a bunch of artifacts of AV1 playing.

Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34866>
2025-05-15 01:02:05 +00:00
Hyunjun Ko
f4d480f808 anv: Always allocate cdf tables when independent profiles provided
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34866>
2025-05-15 01:02:05 +00:00
Nanley Chery
4502254cd2 anv: Drop the slow clear heuristic
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This no longer provides a performance improvement.

Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33776>
2025-05-13 15:13:05 +00:00
Nanley Chery
67d60f4325 intel/blorp: Simplify get_fast_clear_rect() for gfx12.5
Refactor the scale factors to highlight the 16-tile width requirement on
Tile4. The fast-clear simulator code associated with HSD 1407682962
also contains a 16-tile requirement for Tile4 surfaces (for the pitch).

Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33776>
2025-05-13 15:13:05 +00:00
Nanley Chery
312952048b intel/blorp: Redescribe gfx12.5 surfaces for CCS fast clears
According to HSD 1407682962 and the associated simulator code,
fast-clear performance can be affected by: image alignment, tiling,
dimensionality, and row pitch. Redescribe surfaces in order avoid
fast-clearing at a slower rate.

Also, benchmarking the main patch in the performance CI (hw=A750)
shows that some traces are helped significantly:
* TotalWarWarhammer3 +5.58% (n=2)
* Factorio +3.75% (n=1)
* TerminatorResistance +3.3% (n=2)
* Borderlands3 +3.23% (n=2)

We could additionally increase the alignment requirements of surfaces in
order to deterministically increase fast-clear performance. That's left
out of this patch in order to avoid any functional pitfalls that can
arise with increased memory consumption. As a result, performance will
continue to be affected by how ISL/drivers/apps configure main surface
memory alignments (directly or indirectly).

Thanks to Lionel Landwerlin for pointing me to the relevant simulator
code.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11168
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11418
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33776>
2025-05-13 15:13:05 +00:00
Nanley Chery
169e22f962 intel/blorp: Drop clear color assignment prior to Xe2
This hasn't been used since the responsibility of clear color updates
moved to the drivers.

Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33776>
2025-05-13 15:13:05 +00:00
Nanley Chery
e353244553 intel/blorp: Disable repclear for gfx12 fast-clear
Docs indicate that this shouldn't be used.

Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33776>
2025-05-13 15:13:05 +00:00
Nanley Chery
8dad01903a intel: Add and use isl_surf_image_has_unique_tiles()
Returns whether or not a subresource range maps to a tile-aligned memory
range which doesn't overlap other subresources.

Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33776>
2025-05-13 15:13:04 +00:00
Nanley Chery
fcdae4d4c0 intel: Add and use isl_surf_from_mem()
Unify code which creates surfaces from buffers. The behavior is slightly
changed to use array layers to enable arrayed buffer clears (as needed).

Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33776>
2025-05-13 15:13:04 +00:00
Mauro Rossi
04a643d877 intel/compiler: use ffsll instead of ffsl in brw_vue_map.c
18bbcf9a triggered the following building error in Android,
simple fix is to use ffsll() as it was done before 18bbcf9a
to process uint64_t generics argument.

Fixes the following building error:

FAILED: src/intel/compiler/libintel_compiler.a.p/brw_vue_map.c.o
...
../src/intel/compiler/brw_vue_map.c:120:37: error: implicit declaration of function 'ffsl' is invalid in C99 [-Werror,-Wimplicit-function-declaratio
n]
   const int first_generic_output = ffsl(generics) - 1;
                                    ^
../src/intel/compiler/brw_vue_map.c:120:37: note: did you mean 'ffs'?
/home/utente/r-x86_kernel/bionic/libc/include/strings.h:72:5: note: 'ffs' declared here
int ffs(int __i) __INTRODUCED_IN_X86(18);
    ^
1 error generated.

Fixes: 18bbcf9a ("intel: introduce new VUE layout for separate compiled shader with mesh")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34915>
2025-05-11 00:50:21 +02:00
Ian Romanick
338273dedd brw/reg_allocate: Optimize spill offset calculation using integer MAD
Gfx12.5 and later allow the use of two 16-bit immediate values in
integer MAD. Gfx11 and Gfx12 allow a single immediate for integer MAD,
but that is not helpful where.

v2: brw_reg_alloc::build_lane_offsets is only used on Gfx12.5+, so the
check around using integer MAD is unnecessary.

No shader-db or fossil-db changes on any pre-Gfx12.5 platforms.

shader-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
total instructions in shared programs: 17119962 -> 17118441 (<.01%)
instructions in affected programs: 65398 -> 63877 (-2.33%)
helped: 32 / HURT: 0

total cycles in shared programs: 895433316 -> 895425578 (<.01%)
cycles in affected programs: 13437376 -> 13429638 (-0.06%)
helped: 30 / HURT: 2

fossil-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Instrs: 210052706 -> 209550074 (-0.24%)
Cycle count: 31486266412 -> 31436238696 (-0.16%); split: -0.16%, +0.00%

Totals from 7081 (1.00% of 707082) affected shaders:
Instrs: 16864614 -> 16361982 (-2.98%)
Cycle count: 6323185782 -> 6273158066 (-0.79%); split: -0.79%, +0.00%

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34886>
2025-05-09 21:31:09 +00:00
Ian Romanick
3db8dbfdc3 brw/reg_allocate: Optimize spill offset calculation using more SIMD8
Re-associate the calculation. The current calcuation is

    ((lane + zero_or_8) << 2) + offset

The first addition is SIMD8, and the shift and second addition are
SIMD16. By switching to

    ((lane << 2) + offset) + zero_or_32

All operations are SIMD8.

The SHL operates directly on the UW 0x76543210UV value, and that
eliminates the MOV to expand the UW to UD.

v2: Switch to alternate method. Update for SIMD32 on Xe2.

No shader-db or fossil-db changes on any pre-Gfx12.5 platforms.

shader-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
total instructions in shared programs: 17121519 -> 17119962 (<.01%)
instructions in affected programs: 73208 -> 71651 (-2.13%)
helped: 36
HURT: 0
helped stats (abs) min: 1 max: 129 x̄: 43.25 x̃: 56
helped stats (rel) min: 0.05% max: 4.92% x̄: 2.50% x̃: 2.79%
95% mean confidence interval for instructions value: -56.02 -30.48
95% mean confidence interval for instructions %-change: -3.24% -1.75%
Instructions are helped.

total cycles in shared programs: 895450146 -> 895433316 (<.01%)
cycles in affected programs: 13709400 -> 13692570 (-0.12%)
helped: 31
HURT: 2
helped stats (abs) min: 26 max: 1654 x̄: 543.10 x̃: 672
helped stats (rel) min: <.01% max: 3.43% x̄: 0.43% x̃: 0.51%
HURT stats (abs)   min: 2 max: 4 x̄: 3.00 x̃: 3
HURT stats (rel)   min: <.01% max: <.01% x̄: <.01% x̃: <.01%
95% mean confidence interval for cycles value: -652.42 -367.58
95% mean confidence interval for cycles %-change: -0.61% -0.19%
Cycles are helped.

fossil-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Instrs: 210566294 -> 210052706 (-0.24%)
Cycle count: 31582309052 -> 31486266412 (-0.30%); split: -0.30%, +0.00%

Totals from 7091 (1.00% of 707082) affected shaders:
Instrs: 17408115 -> 16894527 (-2.95%)
Cycle count: 6443785290 -> 6347742650 (-1.49%); split: -1.49%, +0.00%

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34886>
2025-05-09 21:31:09 +00:00
Iván Briano
99405647a4 anv: vkCmdTraceRays* are not covered by conditional rendering
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The spec says:
  Certain rendering commands can be executed conditionally based on a
  value in buffer memory. These rendering commands are limited to
  drawing commands, dispatching commands, and clearing attachments with
  vkCmdClearAttachments within a conditional rendering block which is
  defined by commands vkCmdBeginConditionalRenderingEXT and
  vkCmdEndConditionalRenderingEXT. Other rendering commands remain
  unaffected by conditional rendering.

It would seem that vkCmdTraceRays* are not covered by that.

Fixes new tests dEQP-VK.conditional_rendering.conditional_ignore.trace_rays*

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34864>
2025-05-08 21:08:06 +00:00
Lionel Landwerlin
5c7c1eceb5 anv/brw: handle pipeline libraries with mesh
I always thought there was a massive issue with pipeline libraries &
mesh shaders. Indeed recent CTS tests have exposed a number of issues.

Some values delivered to the fragment shader are coming from different
places depending on whether the preceding shader is Mesh or not. For
example PrimitiveID is delivered in the per-primitive block in Mesh
pipelines whereas for other pipelines it's coming as a VUE slot (which
is per-vertex). Those are 2 different locations in the payload.

We have to find a layout for fragment shaders that is compatible with
everything. Leaving gaps here and there in the thread payload.

Fixes the following test pattern :

  dEQP-VK.mesh_shader.ext.smoke.fast_lib.shared_*

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
2025-05-08 06:48:35 +00:00
Lionel Landwerlin
18bbcf9a63 intel: introduce new VUE layout for separate compiled shader with mesh
Mesh shaders have per vertex block in URB pretty much identical to the
VUE format. Let's just reuse that concept to do all of our layout in
the payload attribute registers. This will ensure that we have
consistent VUE layout between Mesh & non-Mesh pipelines.

We need a new way of laying out the VUE though as we have to
accomodate a HW constraint of maximum (per-primitive + per-vertex) of
32 varying. This means we cannot have 2 locations in the payload for
things like PrimitiveID which can come from either the per-primitive
or the per-vertex block. The new layout places the PrimitiveID at the
end of the per-vertex attributes and shrinks the delivery dynamically
if the mesh stage is active. The shader is compiled with a
MOV_INDIRECT to read the PrimitiveID from the right location in the
attributes.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
2025-05-08 06:48:35 +00:00
Lionel Landwerlin
2d396f6085 intel: prepare VUE layout for more than 2 layouts
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
2025-05-08 06:48:35 +00:00
Lionel Landwerlin
95efdca00b brw: add documentation pointers to FS attribute layout
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
2025-05-08 06:48:35 +00:00
Lionel Landwerlin
9d342081e7 brw/nir: add intrinsics to read attribute payload register indirectly
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
2025-05-08 06:48:35 +00:00