We can drop the RT flush and PS Scoreboard stall if the "state cache perf
fix disabled" bit is set to 1. When the bit is set, the RCC uses the sum
of the Binding Table Pointer and the Binding Table Index as the state
cache tag instead of just the Binding Table Index.
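A minimal sketch of why the tag change matters, assuming a simplified model where the tag is a plain integer. The function `rcc_state_cache_tag` and its parameters are hypothetical and do not describe the real hardware logic:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the RCC state cache tag. It only illustrates why,
 * without the BTP in the tag, two binding tables that use the same index
 * alias to the same cache entry, so software must flush on BTI changes. */
static uint32_t
rcc_state_cache_tag(uint32_t btp, uint32_t bti, bool perf_fix_disabled)
{
   if (perf_fix_disabled)
      return btp + bti;   /* tag = Binding Table Pointer + Binding Table Index */
   return bti;            /* tag = Binding Table Index only */
}
```

With the bit clear, tables at 0x1000 and 0x2000 both produce tag 3 for index 3. With the bit set they produce distinct tags, which is what lets the flush and stall go away.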
On DX12 this is a performance win on all workloads we've tested.
On DX11 there are a number of performance regressions. We think this
is because, to avoid thrashing the RCC, we need to remove everything
but render targets from the binding table, meaning all shader
resource accesses have to go through the bindless HW heap. This leads
to additional register usage due to the need to push the base offset
of descriptor sets. Improvements in the compiler would likely mitigate
this.
This change introduces a DRIRC key that we only turn on for DX12.
Also, platforms prior to DG2/LSC have a really small bindless heap that
leads to additional register usage, so this optimization is completely
disabled there.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10872
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10873
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14075
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39982>
The current tracking seems to have issues related to MCS ambiguates
that are hidden by the fact that we insert a pb-stall+RT-flush on BTI
changes, which we're going to remove in the next commits.
The issues appear to be related to a missing pb-stall+RT-flush between
the MCS ambiguate and the fast-clear, causing failures in the following
tests once BTP+BTI RCC caching is enabled:
dEQP-VK.pipeline.*.multisample.misc.*multi*
dEQP-VK.pipeline.*.framebuffer_attachment.diff_attachments_2d_32x32_39x41_ms
dEQP-VK.pipeline.*.framebuffer_attachment.diff_attachments_2d_32x32_48x48_ms
Here we rework the tracking with a new enum to track 3 classes of
operations.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39982>
Instead of:
  foreach color attachment
    transition layout
    fast clear
    slow clear
do this:
  foreach color attachment
    transition layout
  foreach color attachment
    fast clear
  foreach color attachment
    slow clear
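The reordered loops can be sketched in C as below. This assumes each attachment is either fast-cleared or slow-cleared (never both); the `struct attachment` type, the helpers and the `op_log` recorder are hypothetical stand-ins that only record the order of operations:

```c
#include <stdbool.h>

/* Hypothetical per-attachment description. */
struct attachment {
   int id;
   bool can_fast_clear;
};

/* Records opcodes so the resulting order can be inspected:
 * 1xx = transition, 2xx = fast clear, 3xx = slow clear. */
static int op_log[64];
static int op_count;

static void transition_layout(const struct attachment *a) { op_log[op_count++] = 100 + a->id; }
static void fast_clear(const struct attachment *a)        { op_log[op_count++] = 200 + a->id; }
static void slow_clear(const struct attachment *a)        { op_log[op_count++] = 300 + a->id; }

static void
clear_color_attachments(const struct attachment *atts, unsigned count)
{
   /* Pass 1: every layout transition (e.g. MCS ambiguates) up front. */
   for (unsigned i = 0; i < count; i++)
      transition_layout(&atts[i]);
   /* Pass 2: all fast clears back to back. */
   for (unsigned i = 0; i < count; i++)
      if (atts[i].can_fast_clear)
         fast_clear(&atts[i]);
   /* Pass 3: slow clears for whatever could not be fast-cleared. */
   for (unsigned i = 0; i < count; i++)
      if (!atts[i].can_fast_clear)
         slow_clear(&atts[i]);
}
```

Grouping operations of the same class means any barrier needed between transitions and clears is crossed once, rather than once per attachment.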
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39982>
When rendering only has depth/stencil, we need to look at the
depth/stencil view size to generate a dummy null color attachment. Do
that first so we don't have to iterate over the color attachments again
once the final size is known.
This change also has the nice side effect of removing a BTI-change
flush, because the sequence moves from:
- before blorp BTI-flush
- color fast-clear
- after blorp BTI-flush
- depth fast-clear
- change RT due to shader outputs (BTI-flush)
- draw call
to:
- depth fast-clear
- before blorp BTI-flush
- color fast-clear
- combined after blorp BTI-flush (pending)
- change RT due to shader outputs (BTI-flush, combined with above)
- draw call
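The saving comes from deferring flushes and emitting them once. A minimal sketch, assuming a simplified "pending bits" model; `struct cmd_state`, `add_pending()`, `emit_pending()` and `FLUSH_BTI_CHANGE` are hypothetical (the driver's own mechanism goes through anv_add_pending_pipe_bits(), which is not reproduced here):

```c
#include <stdint.h>

/* Hypothetical flush bit. */
#define FLUSH_BTI_CHANGE (1u << 0)

struct cmd_state {
   uint32_t pending_bits;     /* flushes requested but not yet emitted */
   unsigned emitted_flushes;  /* number of flush packets actually emitted */
};

/* Requesting a flush only records it. */
static void
add_pending(struct cmd_state *s, uint32_t bits)
{
   s->pending_bits |= bits;
}

/* One emitted flush covers every bit accumulated since the last emit. */
static void
emit_pending(struct cmd_state *s)
{
   if (s->pending_bits) {
      s->emitted_flushes++;
      s->pending_bits = 0;
   }
}
```

In the old order the after-blorp flush is emitted before the depth fast-clear, so the RT change needs its own flush (3 total). In the new order the after-blorp and RT-change requests are both still pending at the draw call and collapse into one (2 total).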
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39982>
This is going to bite us a lot more when RCC BTP+BTI is enabled.
In particular, this test will hang pretty reliably on LNL:
dEQP-VK.renderpasses.dynamic_rendering.primary_cmd_buff.suballocation.multisample_resolve.layers_3.r32g32_sfloat.samples_4_baseLayer1
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: f66ff97d58 ("drirc/anv: implement steps to disable RHWO for Wa_14024015672")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39982>
vk_clock_gettime hasn't been used by other implementations ever since
venus and kk migrated over to the common implementation. It'd be better
to drop that helper (or move it into anv) because it's not OS-agnostic,
compared to the more comprehensive vk_device_get_timestamp.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40582>
The simulator crashes when it receives GPGPU + Pixel as the resource
barrier signal stage, which is invalid according to the spec.
So here we replace the Pixel stage with Color, over-synchronizing a bit
but keeping things functional.
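A minimal sketch of the substitution, assuming a hypothetical bitmask encoding of the signal stages; the `RB_STAGE_*` values and `sanitize_signal_stages()` are illustrative and not the real RESOURCE_BARRIER field layout:

```c
#include <stdint.h>

/* Hypothetical stage bits, not the hardware encoding. */
enum rb_stage {
   RB_STAGE_TOP   = 1u << 0,
   RB_STAGE_GPGPU = 1u << 1,
   RB_STAGE_PIXEL = 1u << 2,
   RB_STAGE_COLOR = 1u << 3,
};

static uint32_t
sanitize_signal_stages(uint32_t stages)
{
   /* GPGPU + Pixel is an invalid signal combination: swap Pixel for Color.
    * Color output comes after pixel shading, so this over-synchronizes
    * slightly but stays correct. */
   if ((stages & RB_STAGE_GPGPU) && (stages & RB_STAGE_PIXEL))
      stages = (stages & ~(uint32_t)RB_STAGE_PIXEL) | RB_STAGE_COLOR;
   return stages;
}
```

Combinations that are already valid, such as Pixel on its own, pass through unchanged.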
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14641
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40516>
Simulator hangs if a resource barrier has wait stage = None.
The HW seems to ignore it, but something bad could be happening internally.
So here I'm making sure the wait stage is set to TOP when it is None.
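A minimal sketch of the fixup, assuming a hypothetical encoding where 0 means "no stage" and bit 0 is TOP. The name `sanitize_wait_stage()` is illustrative; it is not the driver's resource_barrier_wait_stage():

```c
#include <stdint.h>

/* Hypothetical encoding: 0 means "no stage", bit 0 is TOP of pipe. */
#define RB_WAIT_NONE 0u
#define RB_WAIT_TOP  (1u << 0)

static uint32_t
sanitize_wait_stage(uint32_t wait_stages)
{
   /* Never emit a RESOURCE_BARRIER with an empty wait stage; substitute
    * TOP instead, as described above. Non-empty values are kept as-is. */
   return wait_stages == RB_WAIT_NONE ? RB_WAIT_TOP : wait_stages;
}
```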
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40516>
_mesa_sha1_format has few remaining uses, so it's moved to build_id.c,
which is its last user.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40383>
When there is no trace pointer, there is usually another tracepoint
being emitted (see STATE_BASE_ADDRESS,
3DSTATE_BINDING_TABLE_POOL_ALLOC emission).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40503>
In commit 10b5b279a4 ("anv: Fix CmdResetEvent2() with RESOURCE_BARRIER::Wait stage == none")
I added an assert to catch invalid cases, but it looks like several tests
are affected by that problem, causing crashes in debug builds.
So here I'm removing those asserts; I will then work on all the fixes and
bring them back.
Acked-by: Ivan Briano <ivan.briano@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40476>
On integrated platforms, we have an issue where the L3 cache is not
coherent with the CS, which forces us to push data out of L3.
To avoid a data cache flush, write the IR header with a BLORP shader.
There is a small shader launch latency, but that should not matter in
the end, because writing data with CS (MI_STORE) commands is slower than
shader execution when a large number of BVH trees are being built.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39971>
Add dedicated BLORP op enums so clear paths can be represented
precisely.
This is enum-only groundwork; behavior and trace output are wired in
follow-up commits.
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40414>
The runtime builds a final pipeline state with pointers to structures
coming from the associated pipeline libraries.
So far it has assumed that the viewMask was part of a structure
together with the rest of the renderpass information. This information
can be specified in the pre-raster, fragment & color-output state groups
and was assumed to be consistent across all 3. The runtime
currently takes the pointer to the structure from the last pipeline
library (color output).
Upcoming spec/CTS changes will clarify that the viewMask only needs to be
specified for the pre-raster & fragment groups, making the value in the
color-output group untrustworthy.
This change creates a new state structure to hold the viewMask on its
own so it is only gathered from the pre-raster & fragment groups.
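A minimal sketch of the idea, assuming a heavily simplified model of library merging. `struct view_mask_state`, `struct final_state`, the `GROUP_*` flags and `merge_library()` are hypothetical and do not match the runtime's actual vk_graphics_pipeline_state code:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state-group flags for a pipeline library. */
enum lib_groups {
   GROUP_PRE_RASTER   = 1u << 0,
   GROUP_FRAGMENT     = 1u << 1,
   GROUP_COLOR_OUTPUT = 1u << 2,
};

/* viewMask lives in its own structure instead of the renderpass one. */
struct view_mask_state {
   uint32_t view_mask;
};

struct final_state {
   const struct view_mask_state *vm;
};

static void
merge_library(struct final_state *fs, uint32_t lib_groups,
              const struct view_mask_state *lib_vm)
{
   /* Only pre-raster and fragment libraries are trusted to carry the
    * viewMask; a color-output-only library never overrides it. */
   if ((lib_groups & (GROUP_PRE_RASTER | GROUP_FRAGMENT)) && fs->vm == NULL)
      fs->vm = lib_vm;
}
```

Linking the color-output library last no longer replaces the viewMask taken from the pre-raster library.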
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (radv)
Reviewed-by: Aitor Camacho <aitor@lunarg.com> (kosmickrisp)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (turnip)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v3dv)
Reviewed-by: Frank Binns <frank.binns@imgtec.com> (powervr)
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com> (panvk)
Royaled-yes-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> (lavapipe)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39940>
CmdResetEvent2() was calling anv_add_pending_pipe_bits() with no
dst_stages, resulting in RESOURCE_BARRIER::Wait stage == none, which
causes a GPU hang in the NVL-P simulator.
So here we set dst_stages to VK_PIPELINE_STAGE_2_TOP_OF_PIPE_BIT and add
an assert in resource_barrier_wait_stage() to catch hw_stage == 0.
This fixes crucible func.event.cmd_buffer.q0 in the simulator.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40445>
These values date back to 2015, before the Vulkan 1.0 release. I have no
idea why they were set this way, except maybe the HALIGN_128 of
RENDER_SURFACE_STATE.
Anyway, after discussing this with Nanley, we don't think 128 bytes is
more optimal than 64 bytes. Nanley suggested the lowest value could be
16 bytes for the fixed functions inside the GPU (sampler, dataport),
but a cacheline probably makes more sense for the memory interface.
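To make the effect of the smaller alignment concrete, here is a minimal power-of-two rounding sketch. `align_u64()` is a hypothetical self-contained helper rather than Mesa's own util code, and the 130-byte size is only an example:

```c
#include <stdint.h>

/* Round v up to a multiple of a; a must be a power of two. */
static uint64_t
align_u64(uint64_t v, uint64_t a)
{
   return (v + a - 1) & ~(a - 1);
}
```

A 130-byte object rounds up to 256 bytes with 128-byte alignment but only to 192 bytes with 64-byte (cacheline) alignment.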
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40363>
When running "./deqp-vk -n dEQP-VK.memory.binding.maintenance6*", we
get tons of:
MESA-INTEL: debug: anv_bind_image_memory: ignored VkStructureType
VK_STRUCTURE_TYPE_BIND_MEMORY_STATUS(1000545002)
The function does not ignore VK_STRUCTURE_TYPE_BIND_MEMORY_STATUS: it
looks for it before the main pNext loop. The pNext loop we have there
calls vk_debug_ignored_stype(), which complains about the fact that
we, allegedly, ignore VK_STRUCTURE_TYPE_BIND_MEMORY_STATUS. Move the
code where we find bind_status into the loop so it doesn't complain
anymore.
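A minimal sketch of handling the status struct inside the single pNext walk. To stay self-contained it uses hypothetical stand-in types that mirror the VkBaseInStructure layout (sType then pNext), illustrative sType values, and a counter standing in for vk_debug_ignored_stype():

```c
#include <stddef.h>

/* Illustrative sType values, not the real Vulkan enums. */
enum stype {
   STYPE_BIND_IMAGE_MEMORY_INFO = 1,
   STYPE_BIND_MEMORY_STATUS     = 2,
};

/* Stand-in mirroring VkBaseInStructure: sType then pNext. */
struct base_in {
   enum stype sType;
   const struct base_in *pNext;
};

/* Stand-in for VkBindMemoryStatus with the same leading layout. */
struct bind_memory_status {
   enum stype sType;
   const void *pNext;
   int *pResult;
};

static unsigned ignored_count; /* stands in for vk_debug_ignored_stype() */

static const struct bind_memory_status *
scan_bind_image_pnext(const struct base_in *info)
{
   const struct bind_memory_status *bind_status = NULL;
   for (const struct base_in *s = info->pNext; s; s = s->pNext) {
      switch (s->sType) {
      case STYPE_BIND_MEMORY_STATUS:
         /* Handled inside the loop, so it is never reported as ignored. */
         bind_status = (const struct bind_memory_status *)s;
         break;
      default:
         ignored_count++;
         break;
      }
   }
   return bind_status;
}
```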
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40351>
Fixes new CTS tests.
Similar to a previous change: 5bf3546cc6 ("anv: Use companion cmd
buffer for CCS and MCS image barriers")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40332>
With upcoming blorp_copy() changes, this avoids the following failures
with zink on gfx9:
* dEQP-GLES3.functional.texture.specification.basic_teximage3d.r8_2d_array
* dEQP-GLES3.functional.texture.specification.basic_teximage3d.r8_snorm_2d_array
* dEQP-GLES3.functional.texture.specification.basic_teximage3d.r8i_2d_array
* dEQP-GLES3.functional.texture.specification.basic_teximage3d.r8ui_2d_array
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39974>
blorp_copy() will soon gain the ability to increase the format bpb.
Prepare anv by replicating the clear color pixel on gfx12.
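The replication idea can be sketched as below, assuming the clear color is stored as a packed pixel of the original format and the copy format is an integer multiple of its size. `replicate_clear_pixel()` is hypothetical and says nothing about the actual gfx12 clear color layout:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fill dst_bytes by repeating the src_bytes-wide packed pixel, so an
 * 8-bit clear value stays correct when the surface is treated as 32-bit. */
static void
replicate_clear_pixel(uint8_t *dst, unsigned dst_bytes,
                      const uint8_t *src, unsigned src_bytes)
{
   assert(src_bytes > 0 && dst_bytes % src_bytes == 0);
   for (unsigned off = 0; off < dst_bytes; off += src_bytes)
      memcpy(dst + off, src, src_bytes);
}
```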
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39974>
We're going to be changing the surface format of images but need to
maintain a consistent render compression format to properly
encode/decode. Generalize and use the field that was previously specific
to ISL_AUX_USAGE_MC.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39974>
The WA states that we need to set the maximum number of stackIDs per DSS
in RT_DISPATCH_GLOBALS to 2048.
We can still throttle/control CFE_STATE::StackID to be within the range
specified by the field.
Having CFE_STATE::stackIDs capped to 2K by default does impact
performance: the more outstanding ray queries, the larger the working
set and the bigger the impact on the cache hit rate.
This affects performance on Xe2 onwards:
* Boundary Benchmark: 36.2%
* Solar Bay extreme: 9.8%
* Hitman world of assassination: 3.9%
Fixes: c1a44e8d43 ("anv: force StackIDControl value for Wa_14021821874")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40310>
If multiview is enabled on the render pass, baseLayer and layerCount
will be 0 and 1 respectively and throw us off.
We can still fast clear if view_mask == 1, but anything else hits the
BLORP_BATCH_NO_EMIT_DEPTH_STENCIL restriction.
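The condition reduces to a simple check on the view mask. A minimal sketch follows; `can_fast_clear_ds_views()` is hypothetical and covers only the multiview part of the decision, not the other fast-clear requirements:

```c
#include <stdbool.h>
#include <stdint.h>

static bool
can_fast_clear_ds_views(uint32_t view_mask)
{
   /* view_mask == 0: no multiview, baseLayer/layerCount are meaningful.
    * view_mask == 1: only view 0, i.e. layer 0, matching base 0 / count 1.
    * Anything else would hit the BLORP_BATCH_NO_EMIT_DEPTH_STENCIL
    * restriction, so fall back to the slow path. */
   return view_mask == 0 || view_mask == 1;
}
```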
Fixes: e488773b29 ("anv: Fast clear depth/stencil surface in vkCmdClearAttachments")
Signed-off-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40229>
We can't guarantee that skipping the BVH build would leave the BVH memory
all zero. So explicitly set it to zero when running with the
BVH_NO_BUILD option.
This will help us narrow down whether an issue is in the BVH encoding or
in the application shader. Leaving an uninitialized blob of memory can
cause intermittent hangs and leads us nowhere.
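A minimal sketch of the behavior, assuming a CPU-visible mapping. `zero_bvh_if_no_build()` is hypothetical, and the memset stands in for whatever fill the driver actually uses:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true when the build was skipped and the BVH memory zeroed. */
static bool
zero_bvh_if_no_build(void *bvh_map, size_t bvh_size, bool no_build)
{
   if (!no_build)
      return false;            /* caller proceeds with the normal BVH build */
   memset(bvh_map, 0, bvh_size);
   return true;                /* deterministic contents instead of garbage */
}
```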
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40276>
With the slab, anv_device_lookup_bo() will have anv_bo::map = NULL
while the seen_bbos will not and we want a host pointer for decoding.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40294>