Commit graph

218143 commits

Author SHA1 Message Date
Kenneth Graunke
24c66d3871 brw: Vectorize URB intrinsics using nir_opt_load_store_vectorize
This helps cut down URB messages on tessellation and mesh shaders
significantly.  fossil-db results on Battlemage:

   Instrs: 505172392 -> 505207187 (+0.01%); split: -0.00%, +0.01%
   Send messages: 23678197 -> 23656126 (-0.09%); split: -0.09%, +0.00%
   Cycle count: 63150470088 -> 63147482640 (-0.00%); split: -0.01%, +0.00%
   Spill count: 576554 -> 576616 (+0.01%)
   Fill count: 545304 -> 545413 (+0.02%)
   Max live registers: 141099192 -> 141150675 (+0.04%); split: -0.00%, +0.04%
   Max dispatch width: 39856192 -> 39856208 (+0.00%)

   Totals from 4231 (0.27% of 1583648) affected shaders:
   Instrs: 1620161 -> 1654956 (+2.15%); split: -0.25%, +2.40%
   Send messages: 128652 -> 106581 (-17.16%); split: -17.18%, +0.03%
   Cycle count: 24650700 -> 21663252 (-12.12%); split: -12.82%, +0.70%
   Spill count: 378 -> 440 (+16.40%)
   Fill count: 1308 -> 1417 (+8.33%)
   Max live registers: 364676 -> 416159 (+14.12%); split: -0.24%, +14.36%
   Max dispatch width: 67952 -> 67968 (+0.02%)

There are several reasons we didn't go with nir_opt_vectorize_io:

1. nir_opt_vectorize_io appears to work on the slot location level.
   We want to be able to vectorize based on the URB offsets, especially
   for cases like point size, layer, and viewport which have different
   VARYING_SLOT_* values but live in the same vec4 in a URB entry.

2. We want vec8 stores, and nir_opt_vectorize_io only seems to vectorize
   within a single 32-bit vec4.  It does handle 8 components, but that's
   only for packing 16-bit values into a 32-bit vec4.

Improves performance of Sascha Willems' tessellation demo by around 4%
on Meteorlake.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>
2026-01-27 16:08:36 +00:00
Kenneth Graunke
aafe8967fd brw: Avoid using URB global offset with per-slot offsets on <= Icelake
Both the URB Global Offset and Per-Slot Offsets are specified to be
unsigned numbers.  The URB Global Offset is only 11 bits, and so is
limited to be between [0, 2047].  While the per-slot offsets are
given as U32 values, it would appear that adding the two offsets
does not handle 32-bit overflow/unsigned wrap correctly.

This pops up in Piglit's TCS variable-indexing tests, which ends up
performing loads from offset (x - 16) and a base of 18, and at an offset
(x) with a base of 2.  These should be equivalent, but when x <= 15, the
per-slot offset calculated in the shader is negative (0xfffffff[0-f])
and adding the base of 18 is not wrapping around correctly to [2, 17].

To work around this, avoid using the global offset when the per-slot
offset is present, and just add the two in the shader where unsigned
wrap works correctly.

Tigerlake and later don't seem to have this issue.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>
2026-01-27 16:08:36 +00:00
Kenneth Graunke
07ac0e3463 brw: Skip vec8 store_urb_vec4_intel noop writemasks as well
We were checking for 0xf which is fine for vec4, but vec8 gets 0xff.
Either way, nothing is writemasked, so we can skip sending the mask.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>
2026-01-27 16:08:36 +00:00
Kenneth Graunke
dbb24ff56b brw: Assert that urb_vec4_intel stores only have 4/8 components
vec1-3, 5-7, and 9+ are not supported.  Only vec4 and vec8.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>
2026-01-27 16:08:36 +00:00
Kenneth Graunke
b844082017 nir: Add a round_up_components callback to load/store vectorization
By default, load/store vectorization uses nir_round_up_components()
to round up loads and possibly writemasked stores to the next valid
NIR vector width.  However, some backends may not support load/stores
at all sizes.  For example, older Intel supports only power-of-two
vector widths.  Newer Intel also supports vec2 and vec3, but not
vec5/6/7.  By providing a callback, backends can request promotion
to their next supported memory load/store vector width.

The existing "should we vectorize?" callback should continue to return
false for unsupported vector widths (i.e. beyond the maximum supported).
With this new callback, they do not need to say "no" to vectorization
that would normally produce an unsupported count (e.g. vec5/6/7) but
instead request that the component count be rounded up appropriately.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>
2026-01-27 16:08:36 +00:00
Kenneth Graunke
e23a83b786 nir: Add load/store vectorizer option for rounding up masked stores
This adds a new option, round_up_store_components, which rounds up the
number of components for stores that support writemasking to the next
valid vector size.  For example, vec4+vec2 stores would round up from
6 components (which wouldn't be supported) to a full supportable vec8
store, relying on writemasking to ensure the correct pieces are written.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>
2026-01-27 16:08:36 +00:00
Kenneth Graunke
37f3c59b2c nir: Teach opt_load_store_vectorize how to handle Intel URB intrinsics
URB intrinsics are simply memory load/stores to a special memory region,
so it's pretty reasonable to handle these in the memory vectorizer.  We
treat emit_vertex_* intrinsics as a barrier for shader outputs.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>
2026-01-27 16:08:36 +00:00
Kenneth Graunke
c2f03ba12f nir: Add memory modes to URB load intrinsics
This makes it easier for NIR passes to distinguish between inputs and
outputs without having to reason about which URB handle source was
passed to the intrinsic.  It probably also makes it a bit easier for
humans to read the NIR too.

v2: Don't add memory mode to store intrinsics.  It's always output.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>
2026-01-27 16:08:36 +00:00
Tapani Pälli
bb84773c81 blorp: fix asserts hit with msaa blorp blits on xe3
Tested on PTL, fixes various copy_and_blit tests that utilize compute
after ab9d3528dc that exposed this to them.

Fixes: ab9d3528dc ("anv: fix queue check in anv_blorp_execute_on_companion on xe3")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39548>
2026-01-27 15:28:55 +00:00
Caleb Callaway
1038ab7b57 anv/driconf: Disable shader LTO for MHW
MHW has a long-running shader compile step on first
launch that is significantly sped up by disabling
Link Time Optimization in the ANV driver.

Shader compile times with LTO disabled are 50% of
baseline measurements and the benchmark shows no
stastically significant change to performance
(tested on LNL-M OOB)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39544>
2026-01-27 14:57:21 +00:00
Caleb Callaway
a91a636faf driconf: LTO disable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39544>
2026-01-27 14:57:20 +00:00
Samuel Pitoiset
5709644f2c radv: optimize barriers when clearing HiZ on GFX12
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
HiZ must only be cleared when the full HiZ workaround is enabled. This
means that the previous slow clear draw would disable HiZ because it
hits the conditions (ie. depth/stencil enable and depth writes enabled).

So, the draw and the dispatch can run in parallel by moving the barrier
earlier.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39433>
2026-01-27 14:37:01 +00:00
Samuel Pitoiset
96829d6c5e radv/meta: return the flush bits from radv_clear_hiz()
Similar to other functions.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39433>
2026-01-27 14:37:01 +00:00
Samuel Pitoiset
5911ba5ff5 radv/meta: fix 3D color resolves with compute when base slice isn't zero
Needs to consider the base offset, otherwise it's resolving to the
first 3D slice.

Fixes very recent VKCTS coverage dEQP-VK.pipeline.*.multisample.m10_resolve.*.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39393>
2026-01-27 14:14:19 +00:00
Caterina Shablia
a3ec5ece8b panvk: fix sparse image non-opaque binds
I have no idea how this passed CTS.

Fixes: 5326c451 ("panvk/csf: implement sparse image non-opaque binds")
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39546>
2026-01-27 12:35:09 +00:00
Hans-Kristian Arntzen
27c61f3c0c docs: Add VK_EXT_present_timing to new features.
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38770>
2026-01-27 11:09:53 +00:00
Hans-Kristian Arntzen
a9e261fa14 wsi/common: Allow timestampValidBits < 64 for present timing.
In this case, do the wrapping logic on our end and normalize right away
to host time domain.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38770>
2026-01-27 11:09:53 +00:00
Hans-Kristian Arntzen
5e2814c8a4 wsi/display: Implement present timing on KHR_display.
Deal with VRR vs FRR as well.

Loosely based on earlier work by Keith Packard and Emma Anholt
(MR 38472 for reference).

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38770>
2026-01-27 11:09:53 +00:00
Emma Anholt
96bde78fc7 vulkan/wsi: Delete ancient libdrm support for the page flip handler.
Looks like the "new" page flip handler was in 2.4.78 in 2017.  Mesa
requires at least 2.4.109, so we can retire this.

Reviewed-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38770>
2026-01-27 11:09:53 +00:00
Emma Anholt
e61975c112 wsi/display: Delete dead vblank_handler path.
We use sequence_handler for our vblank events.

Reviewed-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38770>
2026-01-27 11:09:52 +00:00
Emma Anholt
bdaf2b0b2c vulkan/wsi: Add some comments about how the vblank/flip sequencing happens.
Reviewed-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38770>
2026-01-27 11:09:52 +00:00
Hans-Kristian Arntzen
612d8dde81 wsi/wayland: Fix some locking quirks around present ID update.
Don't drop the lock only to retake it right after.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Derek Foreman <derek.foreman@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38770>
2026-01-27 11:09:51 +00:00
Hans-Kristian Arntzen
bf9cd1546f vulkan/wsi: Implement QUEUE_OPERATIONS_END present timing query.
This is mostly provided for convenience, but it's not implementable by
applications when we're using blit queues for PRIME, so it's quite useful
to have.

This is reworked from previous GOOGLE_display_timing
MRs by Keith Packard and Emma Anholt.
See MR 38472 for reference.

Rather than exposing PRESENT_STAGE_LOCAL, we expose all timestamps in
one unified domain to simplify the implementation.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38770>
2026-01-27 11:09:51 +00:00
Hans-Kristian Arntzen
cb22d413ba panvk: Enable EXT_present_timing.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38770>
2026-01-27 11:09:51 +00:00
Eric Engestrom
aa2f474f3c hk: enable VK_EXT_present_timing
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38770>
2026-01-27 11:09:51 +00:00
Hans-Kristian Arntzen
9ec097c837 nvk: Enable EXT_present_timing.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Emma Anholt <emma@anholt.net>
Acked-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38770>
2026-01-27 11:09:51 +00:00
Hans-Kristian Arntzen
22bd72aa58 anv: Enable VK_EXT_present_timing.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38770>
2026-01-27 11:09:51 +00:00
Hans-Kristian Arntzen
846390f259 turnip: Enable EXT_present_timing.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38770>
2026-01-27 11:09:51 +00:00
Hans-Kristian Arntzen
42f021fc29 radv: Enable EXT_present_timing.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38770>
2026-01-27 11:09:51 +00:00
Hans-Kristian Arntzen
3fdc14b089 wsi/wayland: Implement EXT_present_timing on Wayland.
Only weakness right now is that we cannot implement VRR vs FRR query reliably.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Derek Foreman <derek.foreman@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38770>
2026-01-27 11:09:51 +00:00
Hans-Kristian Arntzen
9f32fc6d3c vulkan/wsi: Add no-op present timing support to most backends.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38770>
2026-01-27 11:09:50 +00:00
Hans-Kristian Arntzen
c18b14aea2 anv: Add PRESENT_STAGE_LOCAL_EXT path for calibration.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38770>
2026-01-27 11:09:50 +00:00
Hans-Kristian Arntzen
c7b70b2e2e vulkan/runtime: Expose PRESENT_STAGE_LOCAL as calibrateable domain.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38770>
2026-01-27 11:09:50 +00:00
Hans-Kristian Arntzen
47d69664d8 vulkan/wsi: Add common infrastructure for EXT_present_timing.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Derek Foreman <derek.foreman@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38770>
2026-01-27 11:09:50 +00:00
Aitor Camacho
f08a542756 kk: Remove primitive type from pipeline and rely on dynamic one
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39543>
2026-01-27 10:52:09 +00:00
Samuel Pitoiset
14d3fb5f1b radv: add a workaround for a synchronization bug in Strange Brigade Vulkan
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This game has broken synchronization reported by VVL and it indeed
doesn't wait for idle right before present. Workaround this by
injecting a full barrier (easier than rewriting the dep struct).

This only applies to the Vulkan backend.

Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14705
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39480>
2026-01-27 09:18:25 +00:00
Francisco Jerez
ffb4d3a032 iris: Rework iris_sample_with_depth_aux() into helper that returns aux usage.
Since this does most of the work to determine the right aux usage for
a depth texture, turn it into a helper that returns that aux usage in
order to avoid duplication of logic between it and its callers.

Suggested-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31139>
2026-01-27 08:52:18 +00:00
Francisco Jerez
c0cf14f0e2 intel/isl: Add unit tests for ISL_AUX_STATE_COMPRESSED_HIER_DEPTH.
v2: Add additional AUX state transition test-cases for HIZ_CCS (Nanley).

v3: Assume partial resolve is equivalent to full resolve on legacy HiZ
    surfaces during isl_aux_state_transition_aux_op() instead of
    asserting (Nanley).

v4: Move some tests into different group, add more MCS tests (Nanley).

Acked-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31139>
2026-01-27 08:52:18 +00:00
Francisco Jerez
35512cebb1 iris/gfx12.5: Apply HIZ-CCS resolve DC flush after full resolves for all gfx12.5.
This appears to be needed to guarantee that a resolved depth surface
has no remaining fast-cleared blocks on DG2 as well as MTL.  After
this series this should no longer be hit in practice since we'll be
doing partial resolves in most cases, but it seems sensible to keep
and correct the workaround for our peace of mind to make sure that
full resolves are truly resolving the main surface.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31139>
2026-01-27 08:52:18 +00:00
Francisco Jerez
349b09f8a2 anv/gfx12.5: Apply HIZ-CCS resolve TC flush on full resolves for all gfx12.5.
This appears to be needed to guarantee that a resolved depth surface
has no remaining fast-cleared blocks on DG2 as well as MTL.  After
this series this should no longer be hit in practice since we'll be
doing partial resolves in most cases, but it seems sensible to keep
and correct the workaround for our peace of mind to make sure that
full resolves are truly resolving the main surface.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31139>
2026-01-27 08:52:17 +00:00
Francisco Jerez
8e1b4b62ce anv/gfx12.5: Take advantage of partial resolves in depth layout transitions.
Issue a partial resolve instead of a full resolve from
transition_depth_buffer() when the final usage requires the
CCS-compressed surface to provide a complete representation of the
image.

This significantly improves performance of applications that
frequently interleave depth rendering and sampling on non-WT surfaces
(e.g. MSAA surfaces).  Nba2K23-trace-dx11-2160p-ultra improves
performance by about 260% with this on MTL, DG2 shows a similar
benefit.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31139>
2026-01-27 08:52:17 +00:00
Francisco Jerez
ef95d5243f intel/isl: Teach ISL about HIZ CCS partial resolves.
This updates the isl_aux_state transition helpers to consider partial
resolves for HiZ-CCS surfaces, and as a side effect of the update to
isl_aux_prepare_access() partial resolves should be implicitly enabled
in iris now for platforms that support it.

v2: HiZ partial resolves aren't enough to remove cleared blocks unlike
    color partial resolves (Nanley).

v3: Treat ISL_AUX_STATE_CLEAR similar to
    ISL_AUX_STATE_COMPRESSED_HIER_DEPTH so we can continue using it
    after depth buffer fast clears.  Drop flagging partial_resolve ==
    true for HiZ usages so we don't do the wrong thing while preparing
    access of a surface in ISL_AUX_STATE_CLEAR state.

v4: Assume partial resolve is equivalent to full resolve on legacy HiZ
    surfaces during isl_aux_state_transition_aux_op() instead of
    asserting (Nanley).

Acked-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31139>
2026-01-27 08:52:17 +00:00
Francisco Jerez
cc66f5ff1d intel/blorp: Add support for partial resolves of HiZ-CCS surfaces.
v2: Define additional enum BLORP_OP_HIZ_PARTIAL_RESOLVE to track
    partial resolves (Nanley).
v3: Add comment regarding fall back to full resolve on Gfx12.0 (Nanley).

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31139>
2026-01-27 08:52:17 +00:00
Francisco Jerez
79ab5db71b intel/measure: Define snapshot type for HiZ partial resolves.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31139>
2026-01-27 08:52:16 +00:00
Francisco Jerez
f9ce1b9c40 anv/gfx12.5+: Keep HIZ_CCS aux usage while sampling from depth surfaces.
As long as the surface is in a state with valid AUX state with
identity contents of the HiZ surface (E.g. in
ISL_AUX_STATE_COMPRESSED_CLEAR, ISL_AUX_STATE_COMPRESSED_NO_CLEAR,
ISL_AUX_STATE_RESOLVED or ISL_AUX_STATE_PASS_THROUGH states) we can
keep compression enabled, which works around hardware bugs on MTL and
DG2, and will be helpful to switch to partial resolves in a future
commit.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31139>
2026-01-27 08:52:16 +00:00
Francisco Jerez
baf39d4322 anv/gfx12.5: Infer ISL_AUX_STATE_COMPRESSED_HIER_DEPTH from anv_layout_to_aux_state().
Update anv_layout_to_aux_state() to return the
ISL_AUX_STATE_COMPRESSED_HIER_DEPTH state in cases where we may be
rendering into a HiZ surface in non-WT aux mode, instead of
ISL_AUX_STATE_COMPRESSED_CLEAR.

v2: No need to handle ISL_AUX_STATE_COMPRESSED_HIER_DEPTH in
    anv_layout_to_fast_clear_type() since it should never be reached
    (Nanley).

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31139>
2026-01-27 08:52:16 +00:00
Francisco Jerez
157a4cc6d0 anv/gfx12.5: Resolve depth during layout transitions from ISL_AUX_STATE_COMPRESSED_HIER_DEPTH.
For transitions to a state that requires the image to be fully defined
by the primary+CCS surface without necessarily requiring a valid
primary we have to perform a resolve if the initial state was
ISL_AUX_STATE_COMPRESSED_HIER_DEPTH, which isn't fully defined by its
primary+CCS surface.  This full resolve will be replaced with a more
efficient partial resolve in a future commit, but we have to do this
up front in order to avoid breaking bisectability.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31139>
2026-01-27 08:52:16 +00:00
Francisco Jerez
7f1ed1e411 anv/gfx12.5: Can't fast clear multisampled Z/S with HIZ CCS WT aux usage.
We can end up in this situation in cases where the application uses a
layout that allows both rendering and sampling from a depth surface,
since in such cases we will attempt to render with HIZ CCS WT usage as
a side effect of using ISL_AUX_USAGE_HIZ_CCS_WT for all layouts that
allow the image to be sampled from.

Disabling fast clears for that case isn't expected to cause
performance regression since before this series for HiZ CCS non-WT
images transitioning to such a layout we would have issued a full
resolve and used ISL_AUX_USAGE_NONE, which also doesn't support fast
clears.

Multisample depth images should still get fast clears after this
commit in cases where the rendering and sampling is split into
separate render pasess with a layout transition between them that
transitions the image from a W/O layout into a R/W one -- Such
transitions will be handled with a relatively cheap partial resolve in
a subsequent commit.

v2: Add details of additional findings about these hardware issues in
    comment.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>

v3: Pass aspect bit consistent with layout to
    anv_layout_to_aux_usage() instead of defaulting to
    VK_IMAGE_ASPECT_DEPTH_BIT.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31139>
2026-01-27 08:52:15 +00:00
Francisco Jerez
02030b4b8f anv: Use actual layout in anv_fast_clear_depth_stencil() instead of ANV_IMAGE_LAYOUT_EXPLICIT_AUX.
Currently anv_fast_clear_depth_stencil() doesn't know the correct
layout of the depth and stencil images, instead it uses
ANV_IMAGE_LAYOUT_EXPLICIT_AUX to force the base AUX usage of each
plane, which can be inconsistent with the VkImageLayout currently in
use.  Plumb the correct depth and stencil layouts.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31139>
2026-01-27 08:52:15 +00:00
Francisco Jerez
d283e44634 anv/gfx12.5: Allocate indirect color state for depth surfaces.
The clear color state has to be allocated since we will be sampling
from non-WT HiZ CCS depth surfaces without disabling compression.

v2: Use isl_aux_usage_has_ccs() instead of open coding (Nanley).
v3: Use stricter condition on Gfx12.0 to avoid allocating buffer
    unnecessarily (Nanley).

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31139>
2026-01-27 08:52:14 +00:00