Commit graph

14747 commits

Author SHA1 Message Date
Paulo Zanoni
b75e0462ec intel: unify parameters for the exec ioctl retries
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Add the common magic numbers in a header file and make the functions
use them.

Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37559>
2025-10-07 19:48:36 +00:00
Paulo Zanoni
6d6b22b734 intel/xe: unify behavior with i915.ko regarding ENOMEM on DRM_IOCTL_XE_EXEC
When the system is under memory pressure (which can happen, for
example, during CI runs), don't immediately give up the exec ioctl
(which, for Vulkan, will result in the device being declared lost).
Instead, retry a little bit just like we do for i915.ko.

This is a trade-off.

One of the reasons to *not* have unified behavior regarding ENOMEM
between i915.ko and xe.ko is the fact that xe.ko uses vm_bind, so if
the user tried to bind more memory than it is able to, we'll just keep
getting ENOMEM as long as we retry the ioctl. We now have a retry
limit, so we'll eventually return the error.

On the other hand, if the problem is other applications consuming all
the memory, having the retry loop may really help avoid unnecessarily
marking the device as lost, since one of our retries may eventually
succeed.

I believe the tradeoff of "we'll now eventually succeed in some cases
where it's possible to succeed, at the expense of retrying for a few
seconds until giving up in cases where we would never be able to
succeed" is an improvement.

If xe.ko ever gives us a way to differentiate between the two
different reasons for ENOMEM, we'll be able to make things much
better. We can also tune our timeouts if needed.

Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37559>
2025-10-07 19:48:36 +00:00
Paulo Zanoni
680daeea63 intel/i915: warn the user about repeated execbuf ENOMEM after ~2s
By this point, the user may have noticed their game is not drawing any
new frames, so perhaps an error message might help.

This would also, of course, help identifying ENOMEM problems in our CI
runs.

Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37559>
2025-10-07 19:48:36 +00:00
Paulo Zanoni
dc1877a0a1 intel/i915: give up the execbuf ioctl after ~16s of ENOMEMs
If nothing has freed memory until that point, return the error, which
may make the upper layers report the device as lost. It could be that
the system is under very very heavy swapping and that waiting a little
more would make it work, but let's try 16s for now.

v2: Bring down the timeout from ~60s to ~16s (José).

Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37559>
2025-10-07 19:48:36 +00:00
Paulo Zanoni
7b1e9af900 intel/i915: sleep a little bit between retries of the execbuf ioctl
If the ioctl is returning ENOMEM, incessantly retrying does not seem
to be the best way to proceed. After the second retry, sleep 0.1ms,
then more each time, giving the CPU some time to run the other threads
and processes, in the hope that whatever is eating all the memory
might eventually return it.

If the problem is the current thread, then busy looping won't help
either, so here we at least save some power before the user kills the
app.

v2: Adjust the control flow and the sleep time.

Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37559>
2025-10-07 19:48:36 +00:00
Paulo Zanoni
d19a051714 intel/i915: add i915_gem_execbuf_ioctl()
Unify the common code for i915.ko execbuf submission between Iris and
Anv. I plan to add more code to this function in the next patches.

Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37559>
2025-10-07 19:48:36 +00:00
Paulo Zanoni
0c638d08e5 anv: we never set I915_EXEC_FENCE_OUT
Don't check something that will never be there, just assert() it.

Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37559>
2025-10-07 19:48:36 +00:00
Paulo Zanoni
5e67acbedc anv/xe: set the queue as lost instead of the device on execbuf failure
The i915.ko backend calls vk_queue_set_lost(), let's unify behavior.

v2: Adjust patch due to series reordering.

Reviewed-by: Rohan Garg <rohan.garg@intel.com> (v1)
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37559>
2025-10-07 19:48:36 +00:00
Paulo Zanoni
0176c877d5 anv/i915: rework set_lost handling in anv_gem_execbuffer()
Move the handling to inside anv_gem_execbuffer(), and also rework
things in a way where we can tell from which caller the error is
coming from.

v2: properly return VkResult (José).

Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37559>
2025-10-07 19:48:36 +00:00
Paulo Zanoni
0932d0c7e0 anv/xe: rework set_lost handling in xe_exec_ioctl()
Move the handling to inside xe_exec_ioctl() and also rework things in
a way where we can tell from which caller the error is coming from.

v2:
 - properly return VkResult (José).
 - adjust the patch due to reordering the series.

Reviewed-by: Rohan Garg <rohan.garg@intel.com> (v1)
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37559>
2025-10-07 19:48:36 +00:00
Paulo Zanoni
e694c7e84e anv/xe: extract xe_exec_ioctl()
For now, this just simplifies checking for devinfo->no_hw, but we're
planning to add more code that everybody calling the XE_EXEC ioctl
will run.

Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37559>
2025-10-07 19:48:36 +00:00
Paulo Zanoni
bca75a8484 anv/i915: bring info->no_hw handling to anv_gem_execbuffer()
Every single caller does the same check.

Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37559>
2025-10-07 19:48:36 +00:00
Calder Young
2bfc62e825 isl: Fix noncoherent framebuffer fetch when base_level != 0
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37736>
2025-10-07 13:35:40 +00:00
Lionel Landwerlin
9cefd2ddf8 brw: avoid looking at variables to get image formats
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36773>
2025-10-07 08:54:26 +00:00
Lionel Landwerlin
63d3c6379e anv: run image/intrinsic update pass
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36773>
2025-10-07 08:54:26 +00:00
Caio Oliveira
7b75bf0759 intel/executor: Expose extra command line arguments to script
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37647>
2025-10-06 23:39:37 +00:00
Juan A. Suarez Romero
d775f3b608 ci: uprev VKCTS to 1.4.3.3
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37620>
2025-10-06 21:53:39 +00:00
Ian Romanick
911f033058 elk: Set lower_txd_data to devinfo
Otherwise data will be NULL, and there will be an instant segfault.

Closes: #14035
Fixes: a49cf90e14 ("elk: use the new lower_txd_cb")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37698>
2025-10-06 19:24:34 +00:00
Nanley Chery
ef4f4d3f84 intel/isl: Update the aux-state of zeroed HiZ
By dumping the contents of a HiZ buffer before and after fast-clearing,
I've observed that a zeroed HiZ block corresponds to the CLEAR state
until gfx12. The fast-clearing application was piglit's bin/hiz. I ran
this test on a couple bare metal platforms (ICL and BDW) and many
simulated ones (SKL, TGL, DG2, and LNL).

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36383>
2025-10-06 13:50:39 +00:00
Nanley Chery
a13aab1859 intel/isl: Update the initial HiZ state for Xe2+
Avoids ambiguating in iris and anv.

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36383>
2025-10-06 13:50:39 +00:00
Nanley Chery
b709d7dd39 intel: Delete the has_illegal_ccs_values bool
This was only used in one location.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36383>
2025-10-06 13:50:39 +00:00
Nanley Chery
d41bff3836 anv: Query ISL for the aux-state of undefined layouts
For CCS_E on gfx12+, this will cause us to perform full resolves when
transitioning from undefined to a layout which does not support
compression. We don't currently perform such transitions because
compression is always enabled.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36383>
2025-10-06 13:50:38 +00:00
Nanley Chery
7d284fe399 intel/isl: Define initial state of non-zeroed CCS on gfx9-11
isl_aux_get_initial_state() will soon be used for non-zeroed CCS on
gfx9+. Update the function to avoid hitting an unreachable().

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36383>
2025-10-06 13:50:37 +00:00
Lionel Landwerlin
69771e4bfe brw: fix render target indexing in FS output reads
I forgot that the base indice is actually a more complex value that
encodes the render target index and other things.

Also fix the 1d-layered accesses by checking the size of the
framebuffer.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14008
Fixes: d4ab2087cf ("brw: lower non coherent FS load_output in NIR")
Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37653>
2025-10-06 13:24:16 +00:00
Collabora's Gfx CI Team
7ef5653b11 Uprev ANGLE to 538129c6b3c17dc864101c7a4af4b74b00706f82
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
1df3b59f87...538129c6b3

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37356>
2025-10-04 07:14:50 +00:00
Lionel Landwerlin
a49cf90e14 elk: use the new lower_txd_cb
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37692>
2025-10-03 20:19:03 +00:00
Lionel Landwerlin
a14fee571b elk: remove txd bindless sampler lowering
The bindless sampler heap was introduced in Gfx11 which ELK doesn't
support.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37692>
2025-10-03 20:19:03 +00:00
Lionel Landwerlin
bc8251673d brw: use the new lower_txd_cb
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37692>
2025-10-03 20:19:03 +00:00
Iván Briano
ac182d6045 brw/mesh: drop brw_tue_map::per_task_data_start_dw
It's always set to a fixed value and not used in many places. Use the
value directly where it's needed.

Suggested-by: Lucas Fryzek <lfryzek@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37648>
2025-10-03 17:36:43 +00:00
Iván Briano
e624174134 anv: handle compiling of mesh shader separately from task shader
With EXT_shader_object, it became possible to compile shaders
independently and then use them together later, so we cannot rely on the
lack of task shader data to decide that no task shader will be used. The
flag VK_SHADER_CREATE_NO_TASK_SHADER_BIT_EXT exists for that purpose,
but it doesn't really make any difference for us. Always assume that if
the mesh shader is reading the task payload, it's going to be used with
one, as otherwise the application is doing it wrong.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13983
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37648>
2025-10-03 17:36:43 +00:00
Kenneth Graunke
29d30c6f3d brw: Only skip SIMD widths based on pressure if an smaller one compiled
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Sometimes the compute shader workgroup size requires a larger SIMD width
than the minimum in order to fit in the available threads.  In that case
we'll skip the SIMD8 shader, and need to try SIMD16 regardless of how
the register pressure estimate looks.

Fixes: 3af4e63061 ("brw: Skip compilation of larger SIMDs when pressure is too high")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37649>
2025-10-02 16:17:26 -07:00
Alyssa Rosenzweig
c2ae207e80 brw,anv: use XML-based stats
I didn't bother switching either iris or elk/hasvk but one could.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37517>
2025-10-02 20:22:00 +00:00
José Roberto de Souza
c008d21947 intel/brw: Move brw_s0() to brw_reg.h
It remove a duplication and also it will be used in a future patch
from other file.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37670>
2025-10-02 10:46:10 -07:00
Jianxun Zhang
42c3585ea1 isl: Reuse Xe2 modifers on newer platforms
We will reuse LNL and BMG modifiers on newer platforms until
new modifiers show up.

Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35776>
2025-10-01 14:51:53 -07:00
Caio Oliveira
d16d7ac470 intel/executor: Destroy syncobjs after using them
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37645>
2025-09-30 20:17:01 +00:00
Kenneth Graunke
937fa18bb9 iris/ci: Update trace checksums
The difference here was 1-2 pixels.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36750>
2025-09-30 19:44:03 +00:00
Kenneth Graunke
3af4e63061 brw: Skip compilation of larger SIMDs when pressure is too high
This allows us to skip the entire backend compilation process for
large SIMD widths when register pressure is high enough that we'd
likely decide to prefer a smaller one in the end anyway.  The hope
is to make the same decisions as before, but with less CPU overhead.

We are making mostly the same decisions as before:

   | API / Platform | Total Shaders | Changed | % Identical
   --------------------------------------------------
   | VK / Arc A770 |       905,525 |   1,157 |   99.872% |
   | VK / Arc B580 |       788,127 |      53 |   99.993% |
   | VK / Panther  |       786,333 |      13 |   99.998% |
   | GL / Arc A770 |       308,618 |     269 |   99.913% |
   | GL / Arc B580 |       264,066 |      13 |   99.995% |
   | GL / Panther  |       273,212 |       0 |  100.000% |

Improves compile times on my i7-12700K:

   | Game                      | Arc B580 | Arc A770 |
   ---------------------------------------------------
   | Assassins Creed: Odyssey  |  -13.47% |  -10.98% |
   | Borderlands 3 (DX12)      |  -10.05% |  -11.31% |
   | Dark Souls 3              |  -21.06% |  -21.08% |
   | Oblivion Remastered       |  -11.10% |   -9.82% |
   | Phasmophobia              |  -32.73% |  -31.00% |
   | Red Dead Redemption 2     |  -20.10% |  -14.38% |
   | Total War: Warhammer III  |  -10.11% |  -14.44% |
   | Wolfenstein Youngblood    |  -15.91% |  -13.47% |
   | Shadow of the Tomb Raider |  -30.23% |  -25.86% |

It seems to have nearly no effect on compile times on Xe3 unfortunately,
as only 1,014 shaders in fossil-db even fail SIMD32 compilation in the
first place, and we want to let most of the "might succeed" cases
through to the backend for throughput analysis.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36750>
2025-09-30 19:44:03 +00:00
Kenneth Graunke
248050b6d0 brw: Add a quick NIR-based register pressure estimate pass
This tries to calculate an underestimate (lower bound) for the register
pressure at various SIMD widths, by counting live values in the NIR
shader.  This fundamentally won't be accurate, but it can give us an
idea of whether it's even worth trying a certain SIMD-width compile.

Doing this at the NIR level means we:
- Can use SSA structure rather than fuzzy liveness intervals
- Can avoid the backend scheduler aggressively trying to hide latency,
  presenting an overinflated view of the register pressure
- Have divergence information on-hand, making it easier to "scale up"
- Can skip cloning and optimizing NIR for compute shader SIMD widths

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36750>
2025-09-30 19:44:03 +00:00
Kenneth Graunke
5ebd766156 brw: Do most of NIR postprocessing before cloning for SIMD variants
We were doing a lot of NIR work repeatedly for each SIMD variant of
compute and mesh shaders.  Instead, do it once before cloning, and
just do one final optimization loop and out-of-SSA for each.

fossil-db results on Arc B580:

   Totals:
   Instrs: 233771096 -> 233794024 (+0.01%); split: -0.01%, +0.02%
   Subgroup size: 15922768 -> 15922736 (-0.00%); split: +0.00%, -0.00%
   Send messages: 12095619 -> 12098234 (+0.02%); split: -0.00%, +0.02%
   Loop count: 137562 -> 137523 (-0.03%)
   Cycle count: 32600323744 -> 32667411252 (+0.21%); split: -0.06%, +0.27%
   Spill count: 540908 -> 542027 (+0.21%); split: -0.07%, +0.28%
   Fill count: 700938 -> 698983 (-0.28%); split: -0.73%, +0.45%
   Scratch Memory Size: 37266432 -> 37304320 (+0.10%); split: -0.10%, +0.20%
   Max live registers: 72691728 -> 72692987 (+0.00%); split: -0.00%, +0.00%
   Non SSA regs after NIR: 67690309 -> 67688352 (-0.00%); split: -0.01%, +0.00%

   Totals from 3576 (0.45% of 789301) affected shaders:
   Instrs: 6932956 -> 6955884 (+0.33%); split: -0.41%, +0.74%
   Subgroup size: 88816 -> 88784 (-0.04%); split: +0.09%, -0.13%
   Send messages: 329168 -> 331783 (+0.79%); split: -0.02%, +0.81%
   Loop count: 8753 -> 8714 (-0.45%)
   Cycle count: 15153678820 -> 15220766328 (+0.44%); split: -0.14%, +0.58%
   Spill count: 213751 -> 214870 (+0.52%); split: -0.18%, +0.71%
   Fill count: 282616 -> 280661 (-0.69%); split: -1.82%, +1.13%
   Scratch Memory Size: 13056000 -> 13093888 (+0.29%); split: -0.27%, +0.56%
   Max live registers: 834757 -> 836016 (+0.15%); split: -0.11%, +0.26%
   Non SSA regs after NIR: 995033 -> 993076 (-0.20%); split: -0.48%, +0.28%

Looking at a few of the shaders with substantial instruction count
increases, it appears that it is largely due to more loops being
unrolled, which is probably actually a good thing.

The compile time impact of this patch appears to be negligable.
However, doing postprocessing before SIMD cloning allows us to
examine the postprocessed SSA-form NIR for improvements in an
upcoming patch.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36750>
2025-09-30 19:44:02 +00:00
Kenneth Graunke
0712c220ab brw: Split brw_postprocess_nir() into two pieces
brw_postprocess_nir contains a lot of stuff these days.  The first part
does a bunch of lowering and cleanup optimizations in SSA form.  The
second part does some post-optimization lowering and the out-of-SSA
conversion.

We may want to do additional work before the post-optimization/post-SSA
phase.  Splitting this allows us to insert such tasks in the "middle".

For convenience, brw_postprocess_nir() becomes a wrapper which invokes
both parts, so callers can continue working as they did until they have
a reason to do otherwise.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36750>
2025-09-30 19:44:02 +00:00
Kenneth Graunke
71b513a1e9 brw: Lower certain subgroup size modes in brw_preprocess_nir
This allows us to lower known subgroup size cases earlier, giving us
some earlier optimization opportunities.  We would need to know the
actual SIMD width to handle certain cases, but we can just pass 0 here,
which will lead to get_subgroup_size returning 0 - the same as leaving
this unset.  We can come back to that later during the per-SIMD-width
postprocessing.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36750>
2025-09-30 19:44:02 +00:00
Kenneth Graunke
3e493e03cc brw: Move "SSA form" printing to after divergence analysis is run
We were printing the SSA form, then immediately running divergence
analysis.  This patch flips those, so we can see con/div in INTEL_DEBUG
output for SSA form, which is really useful.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36750>
2025-09-30 19:44:02 +00:00
Kenneth Graunke
1b0808adf3 intel/nir: Make ffma peephole optimization preserve fp_fast_math flags
float_controls2 may have marked these as needing to preserve NaN or
other values.  If so, our newly contracted ffma needs to as well.

Fixes dEQP-VK.spirv_assembly.instruction.compute.float_controls2.*.input_args.mat_det_testedWithout_NotNan*
when nir_opt_algebraic is run after this pass.

Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36750>
2025-09-30 19:44:02 +00:00
Ian Romanick
23bd356b42 brw/nir: nir_intrinsic_load_reloc_const_intel may not be scalar [v3]
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
If the (NIR) destination is a register (i.e., not an SSA value), the
destination of the BRW instruction will not be is_scalar. This occurs in
some shaders in Final Fantasy XVI (and
finalfantasytype0_1.rdc.2826e29da3722a83.1.foz).

If the destination is not is_scalar, revert most of this code to the
state previous to f3593df877. This means

- Allocate a SIMD1 register and UNDEF it.
- Emit a SIMD1 MOV_RELOC_IMM to that register.
- Emit an additional MOV to expand the SIMD1 result.

Closes: #12520
Fixes: f3593df877 ("brw/nir: Treat load_reloc_const_intel as convergent")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37384>
2025-09-29 16:48:07 +00:00
Jordan Justen
be61c12f3e anv: Use image view base-layer in can_fast_clear_color_att()
We currently only support fast clearing the first layer of an image.
Attachments use VkImageView which can specify a base-layer of the view
for an image attachment.

Fixes: 44351d67f8 ("anv: Change params of anv_can_fast_clear_color_view")
Ref: https://projects.blender.org/blender/blender/issues/141181
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37562>
2025-09-26 19:15:22 +00:00
José Roberto de Souza
141a225ca1 intel/brw: Use ASR over SHR for SHADER_OPCODE_ISUB_SAT
src[1]/src0 is signed and Xe2+ SHR don't support operations over signed
data types so lets switch this over ASR that supports signed data
types.

Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37557>
2025-09-26 16:44:24 +00:00
José Roberto de Souza
c45f442d5c intel/decode: Add support to new version of Xe KMD devcoredump with canonical addresses
Customers suggested that Xe KMD should change all possible interfaces
visible to users to canonical address, with that we need some changes
to keep the decode of devcoredump working.

A old version of the tool will not be able to decode secondary batch
buffers when parsing a new version of the file but the new version of
this tool will be able to parse both versions of devcoredump file.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37570>
2025-09-26 16:15:53 +00:00
Hyunjun Ko
b7129a2085 anv/video: fix to set slice block size correctly for h265 decoding.
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Fixes dEQP-VK.video.encode.h265.resolution_change_dpb_layered_src_video_layout

Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37412>
2025-09-26 12:27:59 +00:00
Simon McVittie
9d36bf891b vulkan: Compute path to write into JSON manifests once, use it everywhere
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This reduces duplication: we only need to distinguish between Windows
and Unix in one place.

The previous code was inconsistent about using either the `platforms`
option, or the `host_machine`. Following the logic described in
commit 94379377 "lavapipe: build "Windows" check should use the host machine, not the `platforms` option.",
I've assumed that checking the host machine is the more-correct version
and used that.

Signed-off-by: Simon McVittie <smcv@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37576>
2025-09-26 10:47:31 +00:00
Simon McVittie
be8cac52d3 vulkan: Consistently form driver library names as prefix + name + suffix
This consistently uses `NAME.dll` on Windows, `libNAME.dylib` on Darwin
derivatives such as macOS, and `libNAME.so` on Linux, *BSD and so on.
It's also consistent about using the local variable name `icd_file_name`
for this name in every Vulkan driver, which was already the case in many
but not all drivers.

Some of these drivers probably don't make sense (or don't work) on
Windows and/or macOS, but if this is kept consistent for all drivers,
it should avoid the need for driver-specific commits like
commit 611e9f29e "lavapipe: fix icd generation for windows",
commit 951f3287 "lavapipe: set empty dll prefix",
commit 13e7a39f "lavapipe: fixes for macOS support",
commit 7008e655 "radv: Update JSON generator if Windows" and so on,
each time a driver is found to be relevant on more platforms than
previously believed.

Signed-off-by: Simon McVittie <smcv@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37576>
2025-09-26 10:47:31 +00:00