Commit graph

1266 commits

Author SHA1 Message Date
Sagar Ghuge
af2d51eafa anv: enable BTP+BTI RCC keying for some workloads
We can drop RT flush and PS Scoreboard stall if state cache perf fix
disabled is set to 1. If bit is set RCC uses the sum of Binding Table
Pointer and Binding Table Index as tag in state cache instead of just
Binding Table Index.
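
The keying difference can be sketched as follows. This is only an illustrative model: the function name rcc_tag() and the 32-bit tag width are assumptions, and the real hardware tag layout is not described in this commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the RCC state-cache tag (illustration only). */
static inline uint32_t
rcc_tag(uint32_t btp, uint32_t bti, bool btp_bti_keying)
{
   /* With the bit set, the tag is the sum of the Binding Table Pointer
    * and the Binding Table Index; otherwise only the index is used. */
   return btp_bti_keying ? btp + bti : bti;
}
```

In this model, two binding tables that use the same index but live at different pointers get different tags once BTP+BTI keying is on, which is why the RT flush and PS scoreboard stall can be dropped.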

On DX12 this is a performance win on all workloads we've tested.

On DX11 there are a number of performance regressions. We think this
is because, to avoid thrashing the RCC, we need to remove
all but render targets from the binding table, meaning all shader
resource accesses have to go through the bindless HW heap. This leads
to additional register usage due to the need to push the base offset
of descriptor sets. Improvements in the compiler would likely mitigate
this.

This change introduces a DRIRC key that we only turn on for DX12.

Also, platforms prior to DG2/LSC have a really small bindless heap that
leads to additional register usage, so this optimization is completely
disabled there.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10872
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10873
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14075
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39982>
2026-03-24 18:17:42 +00:00
Lionel Landwerlin
adf18761f8 anv: rework color_aux operation tracking
The current tracking seems to have issues related to MCS ambiguate
that are currently hidden by the fact that we're inserting
pb-stall+RT-flush on BTI changes, which we're going to remove in the
next commits.

The issues appear to be related to a missing pb-stall+RT-flush between
MCS ambiguate and fast-clear causing failures on the following tests
once BTP+BTI RCC caching is enabled :

  dEQP-VK.pipeline.*.multisample.misc.*multi*
  dEQP-VK.pipeline.*.framebuffer_attachment.diff_attachments_2d_32x32_39x41_ms
  dEQP-VK.pipeline.*.framebuffer_attachment.diff_attachments_2d_32x32_48x48_ms

Here we rework the tracking with a new enum to track 3 classes of
operations.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39982>
2026-03-24 18:17:42 +00:00
Lionel Landwerlin
dc79d6b13a anv: merge null surface state packing with previous attachments
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39982>
2026-03-24 18:17:42 +00:00
Lionel Landwerlin
d1eed2239d anv: batch rendering initialization commands
Instead of :

  foreach color attachment
    transition layout
    fast clear
    slow clear

do this :

  foreach color attachment
    transition layout
  foreach color attachment
    fast clear
  foreach color attachment
    slow clear
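
As a rough sketch of the batched ordering above (the op_log type and the function names are hypothetical stand-ins for the driver's real transition and clear helpers, and the sketch follows the pseudocode literally):

```c
#include <assert.h>
#include <stddef.h>

enum init_op { OP_TRANSITION, OP_FAST_CLEAR, OP_SLOW_CLEAR };

/* Hypothetical log recording the order in which operations are issued. */
struct op_log {
   enum init_op ops[64];
   unsigned att[64];
   size_t count;
};

static void
log_op(struct op_log *log, enum init_op op, unsigned attachment)
{
   assert(log->count < 64);
   log->ops[log->count] = op;
   log->att[log->count] = attachment;
   log->count++;
}

/* One loop per operation class instead of one loop doing all three. */
static void
begin_rendering_batched(struct op_log *log, unsigned color_att_count)
{
   for (unsigned i = 0; i < color_att_count; i++)
      log_op(log, OP_TRANSITION, i);
   for (unsigned i = 0; i < color_att_count; i++)
      log_op(log, OP_FAST_CLEAR, i);
   for (unsigned i = 0; i < color_att_count; i++)
      log_op(log, OP_SLOW_CLEAR, i);
}
```

Grouping operations of the same class back to back means any synchronization needed between classes is paid once per class rather than once per attachment.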

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39982>
2026-03-24 18:17:42 +00:00
Lionel Landwerlin
268c7f2a44 anv: rename variables in CmdBeginRendering
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39982>
2026-03-24 18:17:42 +00:00
Lionel Landwerlin
bbcb7c7838 anv: move depth/stencil BeginRendering handling prior to color
When rendering only has depth/stencil, we need to look at the
depth/stencil view size to generate a dummy null color attachment. So
do that first, so we don't have to iterate color attachments once more
with the final size.

This change also has the nice impact of removing a BTI change flush
due to the sequence moving from :

  - before blorp BTI-flush
  - color fast-clear
  - after blorp BTI-flush
  - depth fast-clear
  - change RT due to shader outputs (BTI-flush)
  - draw call

to :

  - depth fast-clear
  - before blorp BTI-flush
  - color fast-clear
  - combined after blorp BTI-flush (pending)
  - change RT due to shader outputs (BTI-flush, combined with above)
  - draw call
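
The saving can be illustrated with a simplified model in which a requested BTI flush stays pending until the next piece of GPU work consumes it. All names below are hypothetical, and the model assumes a pending flush is emitted at most once per piece of work:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical command-buffer state: one pending flag, one counter. */
struct flush_model {
   bool bti_flush_pending;
   unsigned flushes_emitted;
};

static void
request_bti_flush(struct flush_model *s)
{
   s->bti_flush_pending = true; /* repeated requests merge into one */
}

/* Any clear or draw emits the pending flush (if any) before running. */
static void
gpu_work(struct flush_model *s)
{
   if (s->bti_flush_pending) {
      s->flushes_emitted++;
      s->bti_flush_pending = false;
   }
}

static unsigned
old_sequence(void)
{
   struct flush_model s = {0};
   request_bti_flush(&s); gpu_work(&s); /* before blorp, color fast-clear */
   request_bti_flush(&s); gpu_work(&s); /* after blorp, depth fast-clear */
   request_bti_flush(&s); gpu_work(&s); /* RT change, draw call */
   return s.flushes_emitted;
}

static unsigned
new_sequence(void)
{
   struct flush_model s = {0};
   gpu_work(&s);                        /* depth fast-clear, nothing pending */
   request_bti_flush(&s); gpu_work(&s); /* before blorp, color fast-clear */
   request_bti_flush(&s);               /* after blorp (stays pending) */
   request_bti_flush(&s); gpu_work(&s); /* RT change merges, draw call */
   return s.flushes_emitted;
}
```

Under these assumptions the old ordering emits three flushes and the new ordering emits two.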

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39982>
2026-03-24 18:17:42 +00:00
Tapani Pälli
735ad7cefb anv: add required barrier for Wa_14026570320
Ensure RT is not processing rays while requesting state cache
invalidate by making sure compute is done first.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13830
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40388>
2026-03-24 09:34:29 +00:00
Tapani Pälli
1cce7c79f0 anv: remove barrier special handling for RT_BTI_CHANGE
This has been dead code since commit 4b2b824112.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40388>
2026-03-24 09:34:29 +00:00
José Roberto de Souza
c0f1689e11 anv: Fix invalid resource barrier signal stage
The simulator crashes when it receives GPGPU + Pixel as the resource barrier
signal stage, which is invalid according to the spec.
So replace the pixel stage with color here, over-synchronizing a bit but
keeping it functional.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14641
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40516>
2026-03-23 16:30:39 +00:00
José Roberto de Souza
347e82c718 anv: Always have a valid Resource barrier::Wait stage set
The simulator hangs if a resource barrier has wait stage = None.
The HW seems to ignore it, but something bad could be happening internally.
So here I'm making sure the wait stage is set to TOP when it is None.
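
A minimal sketch of that fallback, assuming a hypothetical enum for the hardware wait stages (the enumerant names and values are illustrative, not the driver's actual definitions):

```c
/* Hypothetical wait-stage values; only NONE == 0 mirrors the commit text. */
enum barrier_wait_stage {
   WAIT_STAGE_NONE = 0,
   WAIT_STAGE_TOP,
   WAIT_STAGE_COLOR,
   WAIT_STAGE_GPGPU,
};

/* Never hand the hardware (or simulator) a barrier with no wait stage. */
static enum barrier_wait_stage
sanitize_wait_stage(enum barrier_wait_stage stage)
{
   return stage == WAIT_STAGE_NONE ? WAIT_STAGE_TOP : stage;
}
```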

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40516>
2026-03-23 16:30:39 +00:00
Lionel Landwerlin
5d7cf5e762 anv: don't queue pipe control reasons without a trace
When there is no trace pointer, there is usually another tracepoint
being emitted (see STATE_BASE_ADDRESS,
3DSTATE_BINDING_TABLE_POOL_ALLOC emission).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40503>
2026-03-19 18:13:46 +00:00
José Roberto de Souza
2b91888e54 anv: Remove asserts() added in resource_barrier_wait_stage()
In commit 10b5b279a4 ("anv: Fix CmdResetEvent2() with RESOURCE_BARRIER::Wait stage == none")
I added an assert to catch invalid cases, but it looks like several tests are
affected by that problem, causing crashes in debug builds.

So here I'm removing those asserts(); I will work on all the fixes and then
bring them back.

Acked-by: Ivan Briano <ivan.briano@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40476>
2026-03-18 05:36:38 +00:00
Sagar Ghuge
37f26e346a anv: Write IR header using shader instead of CS
On integrated platforms, the L3 cache is not coherent with CS, which
forces us to push data out of L3.

To avoid the data cache flush, let's write the IR header with a BLORP
shader. There is a small shader launch latency, but that should not
matter in the end because writing data with CS (MI_STORE) commands is
slower than shader execution when a large number of BVH trees are
being built.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39971>
2026-03-18 03:49:17 +00:00
José Roberto de Souza
10b5b279a4 anv: Fix CmdResetEvent2() with RESOURCE_BARRIER::Wait stage == none
CmdResetEvent2() was calling anv_add_pending_pipe_bits() with no dst_stages,
causing RESOURCE_BARRIER::Wait stage == none, which causes a GPU hang in the
NVL-P simulator.

So here we set dst_stages to VK_PIPELINE_STAGE_2_TOP_OF_PIPE_BIT and add
an assert in resource_barrier_wait_stage() to catch hw_stage == 0.

This fixes crucible func.event.cmd_buffer.q0 in simulator.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40445>
2026-03-17 16:42:55 +00:00
Tapani Pälli
a9ea5825b6 anv: update btp address after CmdExecuteCommands
We need to update state.btp address with the last executed secondary
command buffer btp address so that optimization will work correctly.

Fixes: 8a5ac96a67 ("anv: predicate BTP emissions")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/15041
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40361>
2026-03-12 11:17:45 +00:00
Lionel Landwerlin
e20f5a0a7a anv: use companion RCS for hiz ops on compute queue
Fixes new CTS tests.

Similar to a previous change : 5bf3546cc6 ("anv: Use companion cmd
buffer for CCS and MCS image barriers")

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40332>
2026-03-11 21:34:42 +00:00
Nanley Chery
465c186fc5 anv: Prepare for format width changes in blorp_copy()
blorp_copy() will soon gain the ability to increase the format bpb.
Prepare anv by replicating the clear color pixel on gfx12.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39974>
2026-03-11 00:36:18 +00:00
Michael Cheng
6e92be2747 anv: Rename instruction_state_pool to shader_heap
Shaders are allocated from anv_shader_heap, which is backed by the
util_vma_heap. Rename the VA range field to shader_heap to match current
usage and avoid confusion.

Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40131>
2026-02-27 17:36:41 +00:00
Caio Oliveira
df4042371f anv: Set PIPELINE_SELECT systolic mode based on shader usage
For Gfx125 workloads that use systolic mode, this might mean
an extra PIPELINE_SELECT when flipping between a compute shader
that uses the mode and another that doesn't use the mode
(or vice-versa).

Reviewed-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40014>
2026-02-26 19:05:56 +00:00
Lionel Landwerlin
095c470d25 anv: add missing handling for attachment locations in secondaries
Fixes:
  dEQP-VK.renderpasses.dynamic_rendering.partial_secondary_cmd_buff.local_read.interaction_with_shader_object
  dEQP-VK.renderpasses.dynamic_rendering.partial_secondary_cmd_buff.local_read.remap_single_attachment_shader_object

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: d2f7b6d5 ("anv: implement VK_KHR_dynamic_rendering_local_read")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40036>
2026-02-26 20:26:58 +02:00
Lionel Landwerlin
1cd9a4e4a1 anv: avoid filling PC reason for timestamp u_trace captures
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39405>
2026-02-25 10:44:06 +00:00
Lionel Landwerlin
79a56ef448 anv: add a debug printout for dirty descriptors
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39405>
2026-02-25 10:44:04 +00:00
Lionel Landwerlin
413e169f45 anv: remove snprintf for aux op transition
With perfetto that string is processed later leading to
use-after-free.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39405>
2026-02-25 10:44:03 +00:00
Lionel Landwerlin
8a5ac96a67 anv: predicate BTP emissions
The previous commit enabled different command buffers to program the
same 3DSTATE_BINDING_TABLE_POOL_ALLOC instruction even though they
allocated different chunks of binding tables.

Now we can just predicate this programming and skip the stalling,
flushing & invalidation.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39527>
2026-02-25 00:17:03 +00:00
Lionel Landwerlin
725c2a39d5 anv: enable sharing binding table pool programming
We currently allocate 64KiB chunks of binding table pools for each
command buffer and program the 3DSTATE_BINDING_TABLE_POOL_ALLOC
instruction accordingly.

But 3DSTATE_BINDING_TABLE_POINTERS_* instructions can address 2^20
bytes. So it's possible to have 2 command buffers share the same
programming if they just add some offsets to their
3DSTATE_BINDING_TABLE_POINTERS_* programming and round down
3DSTATE_BINDING_TABLE_POOL_ALLOC addresses to 2^20.
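
The address arithmetic can be sketched like this. The struct and function names are hypothetical; only the 2^20 rounding and the added offset come from the description above:

```c
#include <assert.h>
#include <stdint.h>

/* 2^20 bytes are addressable from the pool base. */
#define BT_POOL_ALIGN (UINT64_C(1) << 20)

/* Hypothetical split of a binding table address into shared base + offset. */
struct bt_programming {
   uint64_t pool_base; /* value for 3DSTATE_BINDING_TABLE_POOL_ALLOC */
   uint32_t offset;    /* added to 3DSTATE_BINDING_TABLE_POINTERS_* */
};

static struct bt_programming
bt_split_address(uint64_t bt_addr)
{
   struct bt_programming p;

   /* Round the pool address down to a 2^20 boundary... */
   p.pool_base = bt_addr & ~(BT_POOL_ALIGN - 1);
   /* ...and carry the remainder in the per-table pointer. */
   p.offset = (uint32_t)(bt_addr - p.pool_base);
   assert(p.offset < BT_POOL_ALIGN);
   return p;
}
```

For example, two 64KiB chunks at 0x100010000 and 0x100020000 both round down to the base 0x100000000 and differ only in their offsets (0x10000 and 0x20000), so both command buffers can share the same pool programming.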

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39527>
2026-02-25 00:17:02 +00:00
Kenneth Graunke
4bdef9824a anv, brw: Consolidate ex_bso bits to a static devinfo inline
If we have extended bindless surface offset (ExBSO) support, we want to
use it.  Consolidate the anv_physical_device and brw_compiler bits into
a single static inline that takes devinfo.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39839>
2026-02-16 21:33:47 +00:00
Kenneth Graunke
9531c6b89e brw: Make indirect_ubos_use_sampler a static inline bool taking devinfo
Having the named field allowed us to indicate that our code conditions
are referring to the specific decision about how we handle indirect
UBOs, rather than some other arbitrary hardware change.

Still, there's no need to store this in a singleton struct - we can
easily have a static inline bool that does the devinfo check for us.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39839>
2026-02-16 21:33:42 +00:00
Lionel Landwerlin
e94cb92cb0 anv: use internal surface state on Gfx12.5+ to access descriptor buffers
As a result on Gfx12.5+ we're not holding any binding table entry to
access descriptor buffers.

This should reduce the amount of binding table allocations.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10711
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35160>
2026-02-12 16:45:26 +00:00
Lionel Landwerlin
812b62a315 anv: remove set index for descriptor buffers
We can check the shader's layout_type.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35160>
2026-02-12 16:45:25 +00:00
Lionel Landwerlin
42b70cf05a anv: add missing constant cache invalidation for descriptor buffers
A descriptor buffer promoted to push constants requires a constant
cache invalidation if it is modified on the device.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35160>
2026-02-12 16:45:21 +00:00
Lionel Landwerlin
888ac904a3 anv: flush render caches on first pipeline select
Given a situation like this :
  - CB_A: begin, renderDepthA, end
  - CB_B: begin, computeA, barrier (depth), computeB, end

The depth cache is not being flushed between renderDepthA & computeB
because :
  - it's not flushed at the end of CB_A (it's not required)
  - when CB_B starts, we're still on GFX pipeline mode but do not
    flush render caches because pipeline mode is unknown
  - when the barrier in CB_B is executed, we're already in compute
    pipeline mode and HW cannot flush depth.

The fix is to flush RT/depth caches when switching from unknown
pipeline mode to any pipeline mode.
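
A simplified model of the fix, with hypothetical type and function names. It assumes render caches are flushed when leaving GFX mode and, with this change, also when leaving the unknown mode at the start of a command buffer:

```c
#include <assert.h>

enum pipeline_mode { PIPELINE_UNKNOWN, PIPELINE_GFX, PIPELINE_COMPUTE };

/* Hypothetical per-command-buffer state. */
struct select_state {
   enum pipeline_mode mode;
   unsigned render_cache_flushes;
};

static void
select_pipeline(struct select_state *s, enum pipeline_mode next)
{
   assert(next != PIPELINE_UNKNOWN);

   if (s->mode == next)
      return;

   /* A previous command buffer may have left dirty RT/depth caches, so
    * treat the unknown mode like GFX and flush before switching. */
   if (s->mode == PIPELINE_UNKNOWN || s->mode == PIPELINE_GFX)
      s->render_cache_flushes++;

   s->mode = next;
}
```

In the CB_A/CB_B scenario above, CB_B starts in the unknown mode, so its first switch to compute now flushes the depth cache before computeA runs.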

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e6dae6ef5f ("vulkan: Optimize implicit end_subpass barrier")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14816
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Tested-by: David Gow <david@davidgow.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39824>
2026-02-12 10:10:23 +02:00
Juston Li
f84ed620c2 anv: set missing protected bit for protected depth/stencil surfaces
This bit is set in mocs for other protected attachment types by
anv_image_fill_surface_state() but was omitted for depth/stencil
attachments here.

Without the protected bit set, it causes heavy black artifacting when
attaching a protected depth attachment image to a framebuffer.

Fixes: 794b0496e9 ("anv: enable protected memory")
Signed-off-by: Juston Li <justonli@google.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39818>
2026-02-11 21:45:17 +00:00
Nanley Chery
e42b2a5d70 anv: Don't partial resolve LOD1+ for non-FCV CCS
We don't allow fast-clears in this case.

Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
2026-01-27 18:46:54 +00:00
Nanley Chery
21d187b7f5 anv: Support fast clears on more layers
On Xe2+, support multi-layer and non-zero-layer CCS fast-clears. To do
this in a simple manner, drop the code which splits multi-layer clears
into fast clears and slow clears. The performance CI reports no
regressions nor improvements on BMG.

For MCS on all platforms and for CCS on prior platforms, use a new
heuristic. Instead of only allowing fast clears on the first
slice/layer, do the following:

For 3D images, only fast-clear if all slices are cleared. Enables
fast-clearing every slice of 3D textures in:

   * Terminator Resistance - 480x270x128.
   * Ghostrunner 2 - 320x180x128.

For 2D arrays, match the Xe2+ behavior and allow clearing to any layer.
This is possible because we only allow fast-clearing if the clear color
matches the default value. Enables fast-clearing every layer of 2D array
textures in:

  * Assassin's Creed - 128x128, 6-layers.
  * Blackops 3 - 1024x1024, 6-layers.
  * Borderlands 3 - 128x128, 6-layers.
  * Cyberpunk - 1024x1024, 10-layers.
  * Unigine Superposition - 4K, 2-layers.
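
The heuristic for MCS and pre-Xe2 CCS can be sketched roughly as below. The function and parameter names are hypothetical, and the sketch ignores every other condition the driver checks before allowing a fast clear:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative version of the layer/slice heuristic described above. */
static bool
layers_allow_fast_clear(bool is_3d,
                        unsigned base_slice, unsigned slice_count,
                        unsigned image_depth,
                        bool clear_color_is_default)
{
   assert(slice_count > 0);

   /* 3D images: only when every slice is cleared. */
   if (is_3d)
      return base_slice == 0 && slice_count == image_depth;

   /* 2D arrays: any layer, relying on the clear color matching the
    * default value so all layers agree on the clear color. */
   return clear_color_is_default;
}
```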

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11893
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
2026-01-27 18:46:54 +00:00
Nanley Chery
b8f6ad9060 anv: Use variable default value for some images using CLEAR
A future commit will enable clearing to more than the first layer of 2D
array images. To ensure consistency for the clear color, require the
ANV_FAST_CLEAR_DEFAULT_VALUE for such images if they make use of
ISL_AUX_STATE_CLEAR. Also, use a non-zero default value for some image
formats.

I tested the majority of workloads in the performance CI. This will
cause those which clear to 2D array layers to gain clears on more than
just the first layer. At the moment, we still only support clearing the
first layer, so there should be no change in performance. Affected games
are documented in the code.

Acked-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
2026-01-27 18:46:53 +00:00
Nanley Chery
390c9e3fda anv: Inline the CCS/MCS predicated resolve functions
Now we can see the MI writes performed before and after the resolves in
transition_color_buffer().

Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
2026-01-27 18:46:52 +00:00
Nanley Chery
4d8c71ab1f anv: Delete conversion of CCS_D partial resolve
Now that hasvk is the driver for supporting HSW and BDW, we no longer
need to convert CCS_D partial resolves to full resolves to avoid an
assert-failure in BLORP.

Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
2026-01-27 18:46:51 +00:00
Nanley Chery
b1db1179c2 anv: Set compressed bit separately from fast-clear type
This will make handling fast-clears on multiple layers simpler by saving
us from having to pass more parameters into fast-clear state setting
functions.

It also allows us to set more complex fast-clear state for FCV_CCS_E
without marking the image as compressed.

Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
2026-01-27 18:46:50 +00:00
Nanley Chery
c054d4fe2f anv: Support partial resolves on any level/layer
Enables more support for FCV_CCS_E partial resolves if we ever need it.
Also enables support for multiple layers being fast cleared and needing
resolves. Support for that will arrive in several commits.

Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
2026-01-27 18:46:50 +00:00
Nanley Chery
0a8ab13b9d anv: Reset fast-clear type in transition_color_buffer()
Moving the code here will simplify the task of supporting fast-clears on
multiple array layers and depth slices.

Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
2026-01-27 18:46:49 +00:00
Nanley Chery
ce196c9de5 anv: Fix the fast clear type for FCV writes
We started allowing non-default clear colors with FCV in commit
cd8e120b97. When rendering to an image with FCV, set the fast-clear
type to ANV_FAST_CLEAR_ANY if the image properties allow such
fast-clears.

Fixes: cd8e120b97 ("anv: Allow more single subresource fast-clears with FCV")
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
2026-01-27 18:46:49 +00:00
Francisco Jerez
349b09f8a2 anv/gfx12.5: Apply HIZ-CCS resolve TC flush on full resolves for all gfx12.5.
This appears to be needed to guarantee that a resolved depth surface
has no remaining fast-cleared blocks on DG2 as well as MTL.  After
this series this should no longer be hit in practice since we'll be
doing partial resolves in most cases, but it seems sensible to keep
and correct the workaround for our peace of mind to make sure that
full resolves are truly resolving the main surface.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31139>
2026-01-27 08:52:17 +00:00
Francisco Jerez
8e1b4b62ce anv/gfx12.5: Take advantage of partial resolves in depth layout transitions.
Issue a partial resolve instead of a full resolve from
transition_depth_buffer() when the final usage requires the
CCS-compressed surface to provide a complete representation of the
image.

This significantly improves performance of applications that
frequently interleave depth rendering and sampling on non-WT surfaces
(e.g. MSAA surfaces).  Nba2K23-trace-dx11-2160p-ultra improves
performance by about 260% with this on MTL, DG2 shows a similar
benefit.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31139>
2026-01-27 08:52:17 +00:00
Francisco Jerez
157a4cc6d0 anv/gfx12.5: Resolve depth during layout transitions from ISL_AUX_STATE_COMPRESSED_HIER_DEPTH.
For transitions to a state that requires the image to be fully defined
by the primary+CCS surface without necessarily requiring a valid
primary we have to perform a resolve if the initial state was
ISL_AUX_STATE_COMPRESSED_HIER_DEPTH, which isn't fully defined by its
primary+CCS surface.  This full resolve will be replaced with a more
efficient partial resolve in a future commit, but we have to do this
up front in order to avoid breaking bisectability.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31139>
2026-01-27 08:52:16 +00:00
Francisco Jerez
7f1ed1e411 anv/gfx12.5: Can't fast clear multisampled Z/S with HIZ CCS WT aux usage.
We can end up in this situation in cases where the application uses a
layout that allows both rendering and sampling from a depth surface,
since in such cases we will attempt to render with HIZ CCS WT usage as
a side effect of using ISL_AUX_USAGE_HIZ_CCS_WT for all layouts that
allow the image to be sampled from.

Disabling fast clears for that case isn't expected to cause
performance regression since before this series for HiZ CCS non-WT
images transitioning to such a layout we would have issued a full
resolve and used ISL_AUX_USAGE_NONE, which also doesn't support fast
clears.

Multisample depth images should still get fast clears after this
commit in cases where the rendering and sampling is split into
separate render passes with a layout transition between them that
transitions the image from a W/O layout into a R/W one -- Such
transitions will be handled with a relatively cheap partial resolve in
a subsequent commit.

v2: Add details of additional findings about these hardware issues in
    comment.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>

v3: Pass aspect bit consistent with layout to
    anv_layout_to_aux_usage() instead of defaulting to
    VK_IMAGE_ASPECT_DEPTH_BIT.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31139>
2026-01-27 08:52:15 +00:00
Francisco Jerez
02030b4b8f anv: Use actual layout in anv_fast_clear_depth_stencil() instead of ANV_IMAGE_LAYOUT_EXPLICIT_AUX.
Currently anv_fast_clear_depth_stencil() doesn't know the correct
layout of the depth and stencil images, instead it uses
ANV_IMAGE_LAYOUT_EXPLICIT_AUX to force the base AUX usage of each
plane, which can be inconsistent with the VkImageLayout currently in
use.  Plumb the correct depth and stencil layouts.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31139>
2026-01-27 08:52:15 +00:00
Tapani Pälli
f66ff97d58 drirc/anv: implement steps to disable RHWO for Wa_14024015672
Disable RHWO by default for singlesample draws and for MSAA
draws if a drirc key is set (avoid perf hit if not needed).

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39404>
2026-01-23 11:10:07 +00:00
Tapani Pälli
fcbe987e10 anv: fix setting emitted_flush_bits
Fixes a crash with:
   dEQP-VK.api.external.semaphore.opaque_fd.signal_export_import_wait_temporary

when driver calls genX(CmdSetEvent2) -> emit_apply_pipe_flushes with
having NULL in emitted_flush_bits.

Fixes: 8834ef8bcd ("anv: use flushing PIPE_CONTROL for event signaling")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39343>
2026-01-16 13:19:06 +00:00
Tapani Pälli
4b2b824112 anv: hand over ANV_PIPE_RT_BTI_CHANGE to pipe control
There are issues when using resource barrier for this.

Fixes: 24e9afb0b7 ("anv: implement resource barrier emissions")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14533
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39132>
2026-01-04 13:35:24 +00:00
Lionel Landwerlin
d99a3d9b58 anv: remove CS-L3 coherency on Xe2
I'll try to write some crucible tests for this.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: be5f5f659f ("anv: consider CS coherent with L3 on Xe2+")
Fixes: 503355c7f8 ("anv: update pipeline barriers for Xe2+")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38966>
2025-12-16 21:35:27 +00:00