This giant patch implements a huge chunk of the Vulkan Sparse
Resources API. I previously had this as a nice series of many smaller
patches that evolved as the xe.ko added more features, but once I was
asked to squash some of the major reworks I realized I wouldn't be
able to easily rewrite history, so I just squashed basically the whole
series into a giant patch. I may end up splitting this again later if
I find a way to properly do it.
If we want to support the DX12 API through vkd3d we need to support
part of the Sparse Resources API. If we don't, a bunch of Steam
games won't work.
For now we only support the xe.ko backend, but the vast majority of
the code is KMD-independent and so an i915.ko implementation would use
most of what's here, just extending the part that binds and unbinds
memory.
v2+: There's no way to sanely track the version history of this patch
in this commit message. Please refer to Gitlab.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23045>
While earlier changes to pipe control emission allowed a debug dump of
each pipe control, they also changed the debug output to almost always
print the same reason/function for each pc. These changes fix the output
so that we print the name of the original function where the pc is
emitted.
As an example:
pc: emit PC=( +depth_flush +rt_flush +pb_stall +depth_stall ) reason: gfx11_batch_emit_pipe_control_write
pc: emit PC=( ) reason: gfx11_batch_emit_pipe_control_write
changes back to:
pc: emit PC=( +depth_flush +rt_flush +pb_stall +depth_stall ) reason: gfx11_emit_apply_pipe_flushes
pc: emit PC=( ) reason: cmd_buffer_emit_depth_stencil
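One way to keep the original caller in the debug output is to capture
__func__ at the call site with a macro and pass it down as the reason
string. This is only a minimal sketch: emit_pc(), emit_pc_with_reason(),
last_pc_reason and the two callers are hypothetical names, not the
driver's actual functions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: remembers the last reason so it can be inspected. */
static char last_pc_reason[64];

/* Low-level emitter: takes the reason explicitly instead of printing its
 * own __func__, which would be the same for every pipe control. */
static void emit_pc_with_reason(unsigned bits, const char *reason)
{
   snprintf(last_pc_reason, sizeof(last_pc_reason), "%s", reason);
   fprintf(stderr, "pc: emit PC=( 0x%x ) reason: %s\n", bits, reason);
}

/* The macro expands at the call site, so __func__ names the caller. */
#define emit_pc(bits) emit_pc_with_reason((bits), __func__)

static void apply_pipe_flushes(void)
{
   emit_pc(0x1u);
}

static void emit_depth_stencil(void)
{
   emit_pc(0x0u);
}
```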
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25282>
When applying barriers for image transitions, we're currently
considering all possible usages of an image. But when running on a
compute-only queue, for example, the usage of an image will never be
one of these:
- VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT
- VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT
- VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT
- VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT
- VK_IMAGE_USAGE_FRAGMENT_SHADING_RATE_ATTACHMENT_BIT_KHR
Removing unused usages for the compute queue allows us to reduce the
scope of VK_IMAGE_LAYOUT_GENERAL, for example. This avoids a bunch of
transition operations that are completely useless when dealing with
barriers on the compute queue.
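The masking described above can be sketched as follows. The enum
mirrors the numeric values of the Vulkan usage bits so the example
builds without vulkan.h; the USAGE_* names and usage_for_queue() are
hypothetical and not taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the VK_IMAGE_USAGE_* bits listed above. */
enum {
   USAGE_SAMPLED                  = 0x004,
   USAGE_STORAGE                  = 0x008,
   USAGE_COLOR_ATTACHMENT         = 0x010,
   USAGE_DEPTH_STENCIL_ATTACHMENT = 0x020,
   USAGE_TRANSIENT_ATTACHMENT     = 0x040,
   USAGE_INPUT_ATTACHMENT         = 0x080,
   USAGE_FSR_ATTACHMENT           = 0x100,
};

/* Drop the usages that can never occur on a compute-only queue, so that
 * later barrier logic only considers what is actually reachable. */
static uint32_t usage_for_queue(uint32_t image_usage, bool compute_only)
{
   const uint32_t attachment_bits =
      USAGE_COLOR_ATTACHMENT |
      USAGE_DEPTH_STENCIL_ATTACHMENT |
      USAGE_TRANSIENT_ATTACHMENT |
      USAGE_INPUT_ATTACHMENT |
      USAGE_FSR_ATTACHMENT;

   return compute_only ? (image_usage & ~attachment_bits) : image_usage;
}
```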
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25092>
Zink is running into those asserts on CI. The problem is that with non
non-auxiliary modifiers like I915_FORMAT_MOD_Y_TILED, we might still
allocate larger buffers with IMPLICIT_CCS.
This isn't a complete fix; the real fix will come with
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25003 where
we stop overallocating and those asserts will match the private binding
allocation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 569f80f2df ("anv: Reduce accesses of isl_mod_info->aux_usage")
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25099>
When we have an MSAA copy/clear operation on the compute queue, use the
companion RCS command buffer to carry out the copy/clear operation.
v2: (Sagar)
- Flush cache according to command buffer
- Invalidate AUX when we create a new companion RCS command buffer if
the platform supports AUX TT.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23661>
v2: (Nanley)
- Make sure we skip layout transition during queue ownership transfer
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23661>
If we have a valid companion RCS command buffer, we should
end/destroy/reset it in the same fashion as the main command buffer.
v2:
- Add lock around anv_cmd_buffer_destroy (Sagar)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23661>
Since we are going to have a companion RCS command buffer, we need to
end/destroy/reset it in the same way as the main (CCS/BCS) command
buffer.
It's better to split out the common code into a helper function so that
we can use it later in this series.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23661>
A single Vulkan state can map to multiple fields in different GPU
instructions. This change introduces the bottom half of a simplified
emission mechanism where we do the following:
   Vulkan runtime state
            |
            V
   Intermediate driver state
            |
            V
   Instruction programming
This way we can detect that the intermediate state didn't change and
avoid HW instruction emission.
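A minimal sketch of that change detection, assuming a hypothetical
intermediate state struct with one dirty bit per group of fields. The
struct, the SET() macro and flush_state() are illustrative names only.

```c
#include <stdbool.h>
#include <stdint.h>

enum { DIRTY_RASTER = 1u << 0, DIRTY_DEPTH = 1u << 1 };

/* Hypothetical intermediate driver state. */
struct hw_state {
   uint32_t cull_mode;
   bool     depth_test;
   uint32_t dirty;
};

/* Store the value and flag the group only if the value really changed. */
#define SET(hw, bit, field, value)       \
   do {                                  \
      if ((hw)->field != (value)) {      \
         (hw)->field = (value);          \
         (hw)->dirty |= (bit);           \
      }                                  \
   } while (0)

/* Translate runtime state into intermediate state and return the set of
 * instruction groups that need to be re-emitted. */
static uint32_t flush_state(struct hw_state *hw, uint32_t cull, bool depth_en)
{
   SET(hw, DIRTY_RASTER, cull_mode, cull);
   SET(hw, DIRTY_DEPTH, depth_test, depth_en);

   uint32_t to_emit = hw->dirty;
   hw->dirty = 0;
   return to_emit;
}
```

Calling flush_state() twice with the same arguments returns 0 the
second time, which is the case where no instruction needs to be emitted.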
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24536>
The goal of this change is to move away from a single batch buffer
containing all kinds of pipeline instructions to a list of instructions
we can emit separately.
We will later implement pipeline diffing and finer state tracking that
will allow fewer instructions to be emitted.
This changes the following things:
* instead of having a batch & partially packed instructions, move
everything into the batch
* add a set of pointers in the batch that allow us to point to each
instruction (almost... we group some, like the URB instructions,
etc.).
At pipeline emission time, we just go through all of those pointers
and emit the instruction into the batch. No additional packing is
involved.
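The scheme above can be sketched roughly like this, assuming each
pointer is an (offset, length) pair in dwords into the pipeline's own
batch storage. The struct layout and the function names are
hypothetical, not the driver's.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical pointer to one prepacked instruction, in dwords. */
struct cmd_ptr {
   uint16_t offset;
   uint16_t len;
};

struct pipeline {
   uint32_t batch_data[64];
   uint32_t batch_len;
   struct cmd_ptr vs, ps;   /* one pointer per instruction or group */
};

/* At pipeline creation: append packed dwords and remember where they are. */
static struct cmd_ptr pipeline_pack(struct pipeline *p,
                                    const uint32_t *dw, uint16_t len)
{
   assert(p->batch_len + len <= 64);
   struct cmd_ptr ptr = { (uint16_t)p->batch_len, len };
   memcpy(&p->batch_data[p->batch_len], dw, len * sizeof(uint32_t));
   p->batch_len += len;
   return ptr;
}

/* At emission time: plain copy into the command stream, no packing. */
static uint32_t *emit_ptr(uint32_t *out, const struct pipeline *p,
                          struct cmd_ptr ptr)
{
   memcpy(out, &p->batch_data[ptr.offset], ptr.len * sizeof(uint32_t));
   return out + ptr.len;
}
```

Because each instruction is addressed separately, the caller is free to
emit only a subset of them, or to emit them in a different order.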
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24536>
We'll use this later to know when to reemit
3DSTATE_STREAMOUT::ForceRendering
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24536>
Instead of having that function do only merging of 2 sets of dwords,
it can also do the packing of the new dynamic values. This saves us
from declaring a bunch of local structures and calling the packing
functions ourselves.
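As an illustration only, a helper of that shape could pack the dynamic
values and merge them in one step. The sketch assumes the merge is a
bitwise OR of two packings whose fields do not overlap; the struct, the
bit positions and the function names are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical dynamic state feeding a two-dword instruction. */
struct dyn_raster {
   uint32_t cull_mode;
   bool     front_ccw;
};

/* Hypothetical packing: only the dynamic fields are set, the rest is 0. */
static void pack_raster(uint32_t dw[2], const struct dyn_raster *r)
{
   dw[0] = 0;
   dw[1] = ((r->cull_mode & 0x3u) << 4) | (r->front_ccw ? 1u : 0u);
}

/* Pack the dynamic values and OR them with the prepacked pipeline
 * dwords, so callers need neither a local struct nor a separate pack. */
static void emit_merge_raster(uint32_t out[2], const uint32_t prepacked[2],
                              const struct dyn_raster *r)
{
   uint32_t dyn[2];
   pack_raster(dyn, r);
   for (unsigned i = 0; i < 2; i++)
      out[i] = prepacked[i] | dyn[i];
}
```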
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24536>
Instead of only initializing the clear color when the first subresource
is accessed, initialize it for every FCV-enabled subresource. This is
needed because writes to any subresource may be converted to fast
clears.
Now that init_fast_clear_color is called for every subresource, we take
care not to stomp on the fast-clear-tracking state of the first
subresource by moving the code which updates it outside of
init_fast_clear_color.
Now init_fast_clear_color does just what it says: initializes the fast
clear color.
This fixes the regression introduced with commit 57445adc89
("anv: Re-enable CCS_E on TGL+").
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8461
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24857>
Emit a depth flush after any state that sends an implicit depth flush.
These states are:
3DSTATE_HIER_DEPTH_BUFFER
3DSTATE_STENCIL_BUFFER
3DSTATE_DEPTH_BUFFER
3DSTATE_CPSIZE_CONTROL_BUFFER
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24689>
Following 71ebd9b9d7, 3DSTATE_GS can be emitted as part of the
pipeline batch and as a dynamic state. Just do the latter.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 71ebd9b9d7 ("anv,hasvk: respect provoking vertex setting on geometry shaders")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24632>
According to the WA description, we need to track the DS write state
and emit a PSS_STALL_SYNC whenever that state changes.
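The tracking itself can be as small as remembering the last value. In
this sketch the struct and function are hypothetical names, and the
caller is assumed to emit the stall when the function returns true.

```c
#include <stdbool.h>

/* Hypothetical tracker for the last programmed DS write state. */
struct ds_tracker {
   bool ds_write_enabled;
};

/* Returns true when the state changed and a stall should be emitted. */
static bool ds_write_state_changed(struct ds_tracker *t, bool ds_write_enabled)
{
   if (t->ds_write_enabled == ds_write_enabled)
      return false;

   t->ds_write_enabled = ds_write_enabled;
   return true;
}
```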
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18411>
And split them into UBO and SSBO
v2 (Lionel):
- Get rid of robustness fields in anv_shader_bin
v3 (Lionel):
- Do not pass unused parameters around
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17545>
This change allows us to insert an MI_SEMAPHORE_WAIT before/after a
specific draw call. With the GTX tool, we can always update the memory
address to unblock the spinning wait.
v2:
- Make sure draw_call_count is thread-safe (Lionel)
- Add static inline helper (Lionel)
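For the thread-safety point, a minimal sketch using C11 atomics. It
assumes a single global counter and a hypothetical helper that reports
whether the current draw is the one to stop at; the real code may count
and compare differently.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Shared across threads, so it is incremented atomically. */
static atomic_uint draw_call_count;

/* Hypothetical helper: counts this draw and returns true if it is the
 * target_draw-th one, i.e. where the wait would be inserted. */
static inline bool should_insert_wait(unsigned target_draw)
{
   /* atomic_fetch_add returns the value before the increment. */
   unsigned n = atomic_fetch_add(&draw_call_count, 1u) + 1u;
   return n == target_draw;
}
```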
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24308>
Mismatched allocators could cause bad things, so it is better to set
the allocator in anv_reloc_list_init() and use it in every reloc
function.
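The idea can be sketched with a list that stores its allocator once at
init time, so that growing and freeing can never use a different one.
All types and names below are hypothetical and much simpler than the
driver's reloc list.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator callbacks. */
struct alloc_cb {
   void *(*grow)(void *user, void *ptr, size_t size);
   void (*release)(void *user, void *ptr);
   void *user;
};

struct reloc_list {
   const struct alloc_cb *alloc;   /* fixed at init, used everywhere */
   size_t count, capacity;
   unsigned *entries;
};

static void reloc_list_init(struct reloc_list *list,
                            const struct alloc_cb *alloc)
{
   list->alloc = alloc;
   list->count = list->capacity = 0;
   list->entries = NULL;
}

/* No allocator parameter: the one given at init is always used. */
static int reloc_list_add(struct reloc_list *list, unsigned value)
{
   if (list->count == list->capacity) {
      size_t cap = list->capacity ? list->capacity * 2 : 4;
      unsigned *e = list->alloc->grow(list->alloc->user, list->entries,
                                      cap * sizeof(*e));
      if (e == NULL)
         return -1;
      list->entries = e;
      list->capacity = cap;
   }
   list->entries[list->count++] = value;
   return 0;
}

static void reloc_list_finish(struct reloc_list *list)
{
   list->alloc->release(list->alloc->user, list->entries);
   reloc_list_init(list, list->alloc);
}

/* Example allocator that counts how often it is asked to grow. */
static void *counting_grow(void *user, void *ptr, size_t size)
{
   ++*(int *)user;
   return realloc(ptr, size);
}

static void counting_release(void *user, void *ptr)
{
   (void)user;
   free(ptr);
}
```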
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24411>
MOCS = 0 is an invalid MOCS index, so it is necessary to get a valid
value and set it in MI_MATH instructions.
So here the MOCS index is set with mi_builder_set_mocs(). It can always
be set, but it is required when mi_builder will emit MI_MATH
instructions.
The MOCS index will only be stored and used on gfx12.5+ platforms,
so no changes are required in crocus or hasvk.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22508>
set_image_compressed_bit checks for the image aux usage whereas
cmd_buffer_mark_image_written checks for the subresource's aux usage.
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Fixes: 2e8b1f6d ('anv: drop duplicate checks when setting the compressed bit')
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24363>
We need to set the right value on ReorderMode based on the provoking
vertex mode, or the order in which the vertices for tristrip[_adj] are
delivered to the geometry shader doesn't match what Vulkan expects.
Fixes
dEQP-VK.transform_feedback.primitives_generated_query.concurrent.*triangle_strip_with_adjacency*
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23243>
anv no longer needs to track if the CFE state is valid since we ensure
that the state is valid at pipeline creation time.
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23934>
Wa_14016118574 is not the lineage number for this workaround, so
it was updated to Wa_22014412737.
Wa_22014412737 is not applicable to MTL B0 and newer steppings,
so using the workaround framework eliminates this pipe_control
instruction for unaffected revisions.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24221>
According to Bspec, COMPCS0_CCS_AUX_INV register offset
is 042C8h and COMPCS0_AUX_TABLE_BASE_ADDR is defined to 042C0h.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23958>
There is no mention in the spec of subtracting one from the number of
threads, and the Iris and blorp code don't subtract either.
Alchemist PRMs: Volume 2a: Command Reference: Instructions: CFE_STATE: Maximum Number of Threads:
Normally set to the maximum number of threads: (# EUs) * (# threads/EU)
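Read literally, the quoted formula gives the following computation. The
function name is hypothetical, and the EU and thread counts in any call
are placeholders rather than values for a real platform.

```c
#include <stdint.h>

/* Maximum Number of Threads as quoted above: (# EUs) * (# threads/EU),
 * programmed as-is with no "- 1" adjustment. */
static uint32_t cfe_max_threads(uint32_t eu_count, uint32_t threads_per_eu)
{
   return eu_count * threads_per_eu;
}
```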
Cc: mesa-stable
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23973>
Although the following is based on observations for OpenGL, we
probably need this for Vulkan as well.
KHR-GL46.texture_buffer.texture_buffer_operations_ssbo_writes writes
to an SSBO in a compute program, then issues a memory-barrier, which
causes us to add a DC-flush. Then a second compute program samples
from the SSBO written by the first compute program.
Although we expected the DC-flush to make the writes available to the
second compute program, on MTL this wasn't the case. Adding the
"Untyped Data-Port Cache Flush" fixes this.
The PRM indicates that compute programs must set "Untyped Data-Port
Cache Flush" to flush some LSC writes when flushing HDC. Although we
are setting DC-flush, and not HDC-flush, it does appear that the
following reference might also apply to DC-flush.
In the Intel(R) Arc(tm) A-Series Graphics and Intel Data Center GPU
Flex Series Open-Source Programmer's Reference Manual, Vol 2a: Command
Reference: Instructions, PIPE_CONTROL, HDC Pipeline Flush (DWord 0,
Bit 9), there is a programming note:
> When the "Pipeline Select" mode is set to "GPGPU", the LSC Untyped
> L1 cache flush is controlled by "Untyped Data-Port Cache Flush" bit
> in the PIPE_CONTROL command.
Ref: a8108f1d44 ("anv: Add missing untyped data port flush on PIPELINE_SELECT")
Ref: bd8e8d204d ("iris: Add missing untyped data port flush on PIPELINE_SELECT")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23176>
In the Intel(R) Arc(tm) A-Series Graphics and Intel Data Center GPU
Flex Series Open-Source Programmer's Reference Manual, Vol 2a: Command
Reference: Instructions, PIPE_CONTROL, HDC Pipeline Flush (DWord 0,
Bit 9), there is a programming note:
> When the "Pipeline Select" mode is set to "GPGPU", the LSC Untyped
> L1 cache flush is controlled by "Untyped Data-Port Cache Flush" bit
> in the PIPE_CONTROL command.
Ref: a8108f1d44 ("anv: Add missing untyped data port flush on PIPELINE_SELECT")
Ref: bd8e8d204d ("iris: Add missing untyped data port flush on PIPELINE_SELECT")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23176>
This should be equivalent, but refactoring the code will allow the
next two patches to use an else block for this check.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23176>