Next patch will need to frequently get the count of supported engine
for compute and copy engines, so to reduce the overhead of doing
KMD queries at every call here caching this information into
intel_device_info struct.
With that ANV and Iris would need to set this information as intel/dev
can't depend on intel/common, so here adding a single function
to update intel_device_info with all fields filled by intel/common
functions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29899>
When I wrote sparse resources support for Anv we didn't have TileYs
support so I made non-opaque binds work even for non-standard block
shapes, which meant the block size could be either 64k or 4k. Since
then we merged TileYs support and changed our sparse resources
implementation to treat all the non-standard block shape cases as
"everything is the miptail", which means non-opaque binds are not
possible. So here we adjust the code to more explicitly represent
that.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29337>
There are 3 different places in our code where we calculate the tile
size and until recently the 3 implementations were different and with
slight bugs. Unify everything and also change the calculation to use
tile_info->phys_extent_B.
While doing this we move the isl_surf_get_tile_info() calls from
anv_sparse_calc_block_shape() to its callers so we total amount of
times we call it doesn't change.
v2: Adjust the patch now that tile_info is not part of isl_surf
anymore.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1)
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29337>
The code that tries to create a "pretend block shape" for linear
tiling surfaces was necessary back when we were going to support
sparse residency (non-opaque binds) for non-standard block shapes
(since there was uncertainty about TileYs support). That hasn't been
the case since before we merged sparse resources upstream, so remove
the code and leave an assertion instead, just in case.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29337>
Since commit 18d8c3ca33 we were allocating a little more than what
we were actually using (2621440 bytes instead of 2097152, aka 0x280000
instead of 0x200000), and we were not properly marking the BO as
internal. No applications should be misbehaving because of this.
Fixes: 18d8c3ca33 ("anv: Add missing ANV_BO_ALLOC_INTERNAL")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29337>
I've found myself adding this piece of code to our codebase when
debugging some Zink sparse failures recently, so let's upstream it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29337>
This calculation was wrong for both compressed formats and
multi-sampled images. As a result, we misreported the image as having
a single miptail.
No Vulkan or GL CTS tests were tripping on this bug. I found this
while looking for tile size calculations after fixing a similar bug
elsewhere in the code.
The calculation should now match what we have in
anv_sparse_bind_image_memory(), which is widely tested.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29337>
We have to take the number of samples into account when calculating
the tile size. If we don't do this, multi-sampled images may end up
falling in the "goto out_everything_is_miptail" case, while in reality
multi-sampled images don't even have miptails.
Also assert that the value is one of the only two values we expect
this to be. This assert would have been useful to catch this issue,
since with multi-sampled images we were getting values like 16k or 32k
depending on the number of samples.
This helps move forward progress in some Zink tests, but does not
make them fully pass yet, as those tests are full of sub-cases and
this only helps some of them:
KHR-GL46.sparse_texture2_tests.UncommittedRegionsAccess
KHR-GL46.sparse_texture2_tests.SparseTexture2Commitment
KHR-GL46.sparse_texture2_tests.SparseTexture2Lookup
Fixes: 7ef3d652b2 ("anv/sparse: enable MSAA for Sparse when applicable")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29337>
The Vulkan spec splits sparse resources in two different features:
sparse binding and sparse residency.
Sparse binding is much simpler. It requires the resources to be fully
bound before being used and it treats them as a black box. We're
required to support sparse binding for all the formats that are
supported by non-sparse, but that's easy beacause this feature is
simpler.
Now sparse residency is the one where we're allowed to partially bind
resources, and the one that comes with more complicated features such
as block shapes and non-opaque binding of images. This feature is
subdivided into:
- sparseResidencyBuffer
- sparseResidencyImage2D
- sparseResidencyImage3D
- sparseResidency{2,4,8,16}Samples (which refers to 2D images)
Notice that there's no sparseResidencyImage1D. And if you read the
specs it's clear that sparse residency is meant for non-1D images.
Still, supporting it didn't require any extra effort in Anv so we just
did it.
That's until we started running GL CTS tests on Zink. There's a CTS
test that checks for the standard block shapes. It creates 1D images
and expects the block shapes for them to be the standard 2D block
shapes. While we could very well just patch
anv_sparse_calc_image_format_properties() to return the standard 2D
block shapes for 1D images, that's just wrong (block shapes for 1D
images are just line segments, not rectangles!) so let's just reject
this all until maybe one day Vulkan defines sparseResidencyImage1D and
we get GL_ARB_sparse_texture3 to match it, or somebody decides to
change the GL CTS test.
Testcase: KHR-GL46.sparse_texture2_tests.StandardPageSizesTestCase
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29337>
Some applications may benefit from this while some can get a performance
hit. Default to false and make it possible to toggle only for selected
workloads.
See workaround 14022483228 for some measurements.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29760>
This is a follow up to 059e82a4 ("anv: remove descriptor array bounds
checking"), that kind of bound checking is not required by the spec.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29663>
Makes Cyberpunk, Hitman and Total War Warhammer 3 run on LNL.
Fixes: c9e41f25a1 ("anv: Add heaps for Xe KMD in platforms without LLC")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29775>
In this assert we want to enforce that if a cached buffer is created
it is a cached+coherent as Xe KMD don't support cached+incoherent.
Did not caught this issue because it only reproduces in platforms with
GPU outside of LLC.
Fixes: 9d8d5cf8c9 ("anv: Remove block promoting non CPU mapped bos to coherent")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29826>
This field encodes bits [27:6] of the scratch surface state offset
according to the hardware spec, already on XeHP platforms. However,
on previous platforms we were passing bits [25:4] instead, which was
apparently okay for two reasons:
1/ We never used more than 8 MB of scratch surface states apparently.
2/ A shift right by 2 was implicitly happening while copying the
value of r0.5 into the address register holding the extended
descriptor, which with the ExBSO addressing mode disabled
considered bits [31:12] as the surface state index within the
pool.
However on Xe2 ExBSO addressing mode is always enabled for the UGM
shared function, so we have to add an extra SHR instruction to format
the extended descriptor regardless, and there is no point in
disobeying the hardware spec passing a left-shifted offset.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29543>
Compressed memory types are not CPU visible and Vulkan specification
don't have any requirement about that but some applications like
vkcube fails to run without a host visible option, so here appending
default_buffer_mem_types and compressed_mem_types.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28833>
Xe2 replaces auxiliary surface mapping by software to compress buffers,
instead it reserves part of the memory for the compression purpose.
To enable compression in Xe2 it is necessary bind memory with one of
the PAT indexes that has compression enabled.
It is still always returning false in anv_image_is_pat_compressible()
as it still needs more work before compression can be enabled but the
foundation for the compressed allocation is here.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28833>
We'll reduce pitch alignment in a following patch. However, CCS
fast-clears don't seem to work unless the pitch is 512B aligned.
Disable fast clears for unaligned pitches.
Prevents the next patch from failing the following piglit tests:
* fbo-attachments-blit-scaled-linear
* hiz-stencil-test-fbo-d24s8
* hiz
* polygon-mode-facing
* clearbuffer-mixed-format
* glsl-lod-bias (transient failure)
No failures have been observed in anv, but there are more restrictions
for fast-clears in that driver compared to iris.
Note:
* The -fbo flag is necessary to make these fail. Otherwise, they end up
with aligned render targets.
* Each of these tests allocate an image that has a pitch greater than
512B and they collectively cover all the misalignment options - 128B,
256B and 384B.
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29659>
Intel modifiers supporting compression are specified to be compatible
with the display engine, even if they won't actually be used for
scanout.
Attempting to capture a wider scope of modifiers resulted in test
errors. I chose to narrow the scope instead of digging into them.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29659>
Now that we're using macros to handle aux-map CCS layout, we have no
need for the ISL surface representation.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29659>
ISL surfaces for CCS are not needed to describe flat CCS and aux-map
CCS.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29659>
As each anv_device has its own address space it was necessary create
one dummy_aux_bo per anv_device.
Also this workaround requires us to disable the
buffer_length_in_aux_addr optimization, that is done in the physical
device creating because isl_dev of physical device is copied
to isl_dev in anv_device.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29619>
7da5b1caef ("anv: move trtt submissions over to the anv_async_submit")
added a hard dependency on timeline semaphore which is still optional.
And since it gates the sparseBinding feature, we should not use it if
sparseBinding is not enabled.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 7da5b1caef ("anv: move trtt submissions over to the anv_async_submit")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29779>
The intention of this block was to set one of the flags that is used
to select a PAT index but this was doing more than that.
It was promoting WB+0 way coherency BOs to WC+1 way coherency possibly
causing regression in platforms without LLC.
anv_device_get_pat_entry() return WC/writecombining if no flags is
set so we don't need this block after all.
Reported-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Fixes: a65e982b44 ("anv: Split ANV_BO_ALLOC_HOST_CACHED_COHERENT into two actual flags")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29769>
Completely skip the stall & programming if the bindless address has
not changed. Only on Gfx12.5+ since previous generations also program
the binding table pool base address through STATE_BASE_ADDRESS.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29595>
In a following change the predicate registers might be used when
flushing the state.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29595>
The mi-builder already takes care of mi write/read fences, but we have
a few cases in Anv where we also need to fence mi-write ->
shader-read.
We also have one case where a command buffer jump address is modified
by a previous mi write command.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29595>
See also HSDES#14015504893 regarding the region-based tessellation
redistribution feature which allows fine-tuning the number of regions
per patch. This sets it to the recommended value, since region-based
redistribution is enabled by default.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29562>
Since we don't need to share that data with other fixed functions.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29571>
With this change, the engine initialization batches are build and
submitted at vkCreateDevice() but the function doesn't wait for them
to complete. Instead we wait at vkDestroyDevice() or whenever another
submission happens on the queue, we check whether the initialization
batch has completed (without waiting) and free it if completed.
Seems to be about 25% reduction time of vkCreateDevice()
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28975>
The array pool does a single allocation and then splits it out. The
downside is that the pool is not lockless, but for border colors it
likely doesn't matter much as there is a max border colors for 4k.
Seems to be a 30% time reduction for vkCreateDevice()
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28975>
We can remove a bunch of TRTT specific code from the backends as well
as manual submission tracking.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28975>
We want to make this more generic so that it can be reused for device
initialization as well as TRTT submissions.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28975>