We can remove a bunch of TRTT specific code from the backends as well
as manual submission tracking.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28975>
This AUX-TT is only updated on the CPU since ee6e2bc4a3 ("anv: Place
images into the aux-map when safe to do so"). So the only really
important invalidation that needs to happens is on the beginning of a
primary command buffer.
We are required to idle the pipes prior invalidation the AUX-TT. This
might not be happening when the invalidation is put at the beginning
of the secondary command buffers.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29671>
We increased the size of the timestamps but only copied 64bit values
from the secondaries.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 521c216efc ("anv: use COMPUTE_WALKER post sync field to track compute work")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29438>
Most of what we do in this function is conditional to not have
VK_RENDERING_SUSPENDING_BIT, so check for it once.
Suggested-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27306>
Consider the following program:
- Uses a multi-sampled image as the color attachment.
- Draws simple geometry at the color attachment.
- Uses the (non-multi-sampled) swapchain image as the resolve image.
- Presents the result.
If the color attachment image (the multi-sampled one) is a sparse
image and it's fully bound, everything works and this patch is not
required.
If the image is partially bound (or just completely unbound), without
this patch the unbound area of the image that ends up being displayed
on the screen is not completely black, and it should be completely
black due to the fact that we claim to support
residencyNonResidentStrict (which is required by vkd3d for DX12).
On DG2, what ends up being displayed in the swapchain image is
actually the whole image as if it was completely bound. On TGL the
unbound area partially displays the geometry that was supposed to be
drawn, but the background is a different color: it's a weird corrupted
image. On both platforms the unbound areas should all be fully black.
This patch applies the proper flushing so that we get the results we
should have.
The bug fixed by this patch is not caught by dEQP or anything our CI
runs (dEQP does have some checks for residencynonResidentStrict
correctness, but none that catch this issue in particular). I was able
to catch this with my own sample program. Using INTEL_DEBUG=stall also
makes the problem go away.
If we had a way to track which images are fully bound we would be able
to avoid this flush. I had code for that in the earliest versions of
sparse before xe.ko had support for gpuva, but it requires maintaining
a bunch of lists, so I'm not sure that's actually worth it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27306>
Initial work by Rafael Antognolli <rafael.antognolli@intel.com>
Reworks
- Rebase to main
- Emit the right hiz op for higher mip levels when transitioning the
depth buffer
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28629>
The device->using_sparse variable is only used at cmd_buffer_barrier()
to decide if we need to apply the heavier-weight flushes that are only
applicable to sparse resources. The big problem here is that we need
to apply the flushes to the non-image and non-buffer memory barriers,
so we were trying to limit those only to applications that ever submit
a sparse resource to the sparse queue.
The reason why we were applying this only to devices that ever
submitted sparse resources is that dxvk games have this thing where
during startup they create and then delete tiny sparse resources, so
switching device->using_sparse to true at resource creation would make
basically every dxvk game start applying the heavier-weight
workaround.
The problem with all that is that even if an application creates a
sparse resource but doesn't ever bind them, the resource should still
behave as an unbound resource (because they are bound with a NULL
bind), so the flushes affecting them should happen. This case is
exercised by vkd3d-proton/test_buffer_feedback_instructions_sm51.
In order to satisfy all the above cases and only really apply the
heavier-weight flushes to applications actually using sparse
resources, let's just count the number of sparse resources that
currently exist and then apply the workaround only if it's not zero.
That covers the dxvk case since dxvk deletes the resources as soon as
they create, so num_sparse_resources goes back to 0.
Testcase: vkd3d-proton/test_buffer_feedback_instructions_sm51
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10960
Fixes: 6368c1445f ("anv/sparse: add the initial code for Sparse Resources")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28724>
The current helper anv_physical_device_bindless_heap_size()
artificially limited the surface heap size on DG2+ to 128MB. The HW is
actually 4GB capable, but we have workaround requiring to overlap the
dynamic state heap with the bindless surface state heap.
The actual limit comes from our virtual address space setup. It is
different between descriptor buffers and regular descriptors.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: fe037dec6e ("anv: expose VK_EXT_descriptor_buffer")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27873>
The BSpec page "Flush Types" (46213) says the following about the Tex
Invalidate bit:
"Requires stall bit ([20] of DW) set for all GPGPU Workloads."
For newer platforms, this is documented in the description of the
texture invalidation bit in the PIPE_CONTROL page (56551):
"CS Stall bit in PIPE_CONTROL command must be always set for GPGPU
workloads when Texture Cache Invalidation Enable bit is set"
Iris had it only for GFX_VER 9 and 11, while Anv had it missing for
everything.
Please notice that this patch includes a revert of 397e728ef4.
Fixes: 397e728ef4 ("iris: Drop GPGPU Tex Invalidate restriction for TGL+")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28608>
We're changing the memory address translation tables, we should
invalidate their cache.
It seems i915.ko is already doing this for us in between batches. The
xe.ko driver only adds invalidates to the ring before submissions if
scratch page is enabled in the VM (which it is today, but may change
in the future), and after some vm_bind and all vm_unbind ioctls, but
we don't use vm_bind for TR-TT. Still, it won't hurt to have it here
righ tnow.
v2: Use PIPE_CONTROL_length (José).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1)
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27928>
This should drop it from MTL as there it should apply only for a0
stepping.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28047>
Blitter and video engines don't support PIPE_CONTROL and
3DSTATE_BINDING_TABLE_POOL_ALLOC.
I'm not 100% sure if something else should be called instead but this
is doing the same as cmd_buffer_emit_state_base_address() and this
fixes the test that was crashing in
unreachable("Trying to emit unsupported PIPE_CONTROL command.");
Fixes: dEQP-VK.pipeline.monolithic.timestamp.misc_tests.two_cmd_buffers_secondary_transfer_queue_with_availability_bit
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28053>
Add up to four reasons per flush to perfetto flushes. PC reasons
will help debuggers understand why flushes were required, and
perhaps provide hints as to how they can be avoided.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27400>
To avoid ping-ponging between 3D & GPGPU in the following sequence :
vkCmdDispatch(...)
vkCmdCopyBuffer(...)
vkCmdDispatch(...)
We can try to keep the pipeline in GPGPU mode when doing blorp buffer
operations (we have blorp support for the CCS and can use the same
shaders on RCS).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27956>
Currently the command buffer is completely empty, which is not good.
There are a few of things that should be programmed, but we've
probably been okay due to the default engine initialization.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: edcde0679c ("anv: Add helper to create companion RCS command buffer")
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27979>
Let's say you have an image in R32_UINT format, a view is created in
R32_SFLOAT and used as color attachment.
When resolving the attachment, our current code uses the image format
(R32_UINT in this case). But resolve mode might apply only to SFLOAT,
so we currently run into an assert in blorp.
We should instead use the view format. There is an exception for
depth/stencil view because the format we want to resolve is actually
the depth/stencil format, not just the depth or stencil aspect.
This fixes vkd3d-proton's test_multisample_resolve_formats.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27875>
A number of allocations during command buffer building are sourced
from the dynamic state heap. They're not actually access using an
offset in the dynamic state heap, it just happens to be a conveninent
place.
Use different helpers for thoses so we dynamically change the dynamic
state heap location in the next commits.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22151>
Xe2+ gets rid of PIPELINE_SELECT, so we need to make sure we add a stall
when switching pipelines
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27823>
Take the opportunity to fix the flush of the descriptor buffer surface
when needed. Previously we would only flush it if the shader used one
of the push descriptor.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27504>
For some yet unknown reason the CS L3 coherency setting is different
on MTL than DG2.
Fixes issues in tests from the subgroup :
dEQP-VK.api.buffer_marker.*
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c8e122a738 ("anv: Implement rudimentary VK_AMD_buffer_marker support")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27613>
When doing query result copies in 3D mode, we're flushing the render
target cache, but the shader writes go through the dataport.
Fixes flakes/fails in piglit with shader query copies forced with Zink :
$ query_copy_with_shader_threshold=0 ./bin/arb_query_buffer_object-coherency -auto -fbo
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b3b12c2c27 ("anv: enable CmdCopyQueryPoolResults to use shader for copies")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26797>
When URB state for DS changes, we need to emit URB setup for VS with
256 handles and 0 for rest, commit this using a HDC flush before
setting real values.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26920>
Transfer operation are implemented differently on the compute engine
and require a different kind of cache flush.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27233>