In order to rewrite timestamps for indirect dispatch's, instroduce a
ANV_TIMESTAMP_REWRITE_INDIRECT_DISPATCH that repacks the PostSync field
for a EXECUTE_INDIRECT_DISPATCH.
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26421>
On newer platforms (Arrowlake and above) we can issue a
EXECUTE_INDIRECT_DISPATCH that allows us to:
* Skip issuing mi load/store instructions for indirect parameters
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26421>
On newer platforms (Arrowlake and above) we can issue a
EXECUTE_INDIRECT_DRAW that allows us to:
* Skip issuing mi load/store instructions for indirect parameters
* Skip doing the indirect draw unroll on the CPU side when the
appropriate stride is passed
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26421>
Refactor the function to use the new common functions introduced for
indirect dispatch previously.
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26421>
Refactor out loading the indirect parameters and filling the interface
descriptor data.
Reworks:
* Jordan: Change anv to use get_interface_descriptor_data which
returns the IDD struct rather than filling it.
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26421>
Xe2 follows MTL and has different prefetch sizes for different
types of engines.
BSpec: 60223
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26396>
As Xe KMD don't support WB + 0 way coherency, so this are the only two
memory types possible for integrated GPUs without LLC in Xe KMD.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25462>
Unlike i915, Xe KMD needs the cache parameter in gem_create
then during vm bind it request the PAT index that matches previous
parameter.
The PAT index selected could have more memory caracteristics that KMD
don't need to know.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25462>
Xe KMD requires that all platforms supported by it set PAT information.
This will be implemented in Iris and ANV in the next patches.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25462>
Xe KMD requires PAT for all platforms so here adding PAT entries to
all platforms supported by Xe KMD.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25462>
This changes allow us to support HOST_COHERENT, HOST_CACHED and
HOST_COHERENT + HOST_CACHED memory types for platforms that has
the PAT uAPI.
Be aware that Xe KMD will not be able to support cached only memory
types, anv_xe_physical_device_init_memory_types() will reflect that
but internal usage should not allocate
VK_MEMORY_PROPERTY_HOST_CACHED_BIT only memory, hence the assert
added.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25462>
Insert a dummy blit prior to MI_ARB_CHECK, MI_SEMAPHORE_WAIT,
MI_FLUSH_DW submitted on the copy engine.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26209>
This provides a basic implementation of VK_AMD_buffer_marker: we can
write the 32-bit markers from within a command buffer. Unfortunately,
our hardware has several limitations that make this difficult to
implement well:
1. We don't have insight into when specific stages finish (i.e.
all geometry shaders are done, but pixel rasterization may
still be occurring).
2. We cannot perform pipelined writes of 32-bit values to arbitrary
memory locations. PIPE_CONTROL::Write Immediate Value would be
the obvious way to implement this, but it only supports 64-bit
values, and the extension doesn't allow us to do that. We instead
use MI_STORE_DATA_IMM to write 32-bit values, but this requires
hard stalls.
Despite those limitations, the extension may still be useful for tools
to debug GPU hangs. We hope to offer another extension in the future
which offers similar functionality but is more efficient on our GPUs.
v2: Updated by Lionel Landwerlin to fix a number of flushing and
cache coherency issues with these writes.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14924>
Per Google Test FAQ recommendation, prefer consutrctors and destructors
unless there's a need to use SetUp/TearDown.
We will take advantage of this later to initialize an fs_builder.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26301>
A "dance" is required with this uAPI, first we need to ask KMD what is
the size of the giving query id, then memory needs to be allocated to
match that size and then query again with the memory address set and
at this time Xe KMD will copy the query data to memory.
This dance was being duplicated in xe_engine_get_info() and
anv_xe_physical_device_get_parameters() and the next patch will also
use it in Iris, so here adding it common/xe and re-using as much
as possible.
There is one more implementation of this function in intel/dev but
due to how libs are linked intel/dev can't depend on to intel/common.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26325>
This heuristic doesn't show much difference when you have a beafy
processor but on lower end skus, it increase the number of buffers in
the execbuffer ioctl, adding significant overhead in i915.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4cdd3178fb ("anv: Meet CCS alignment reqs with dedicated allocs")
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26335>
This solves a problem when you have a big memory chunk of which some
regions are bound to images. If the image is destroyed, currently the
aux-tt mapping stays and prevent any new image aux-tt mapping within
that region, until the memory is freed.
This maps & unmaps the aux-tt region at respectively bind & destroy
time, so that the memory chunks can be map through aux-tt.
If there is aliasing of memory to 2 different images, then the first
one "wins" the aux mapping and gets compression support. The second
one doesn´t.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: ee6e2bc4a3 ("anv: Place images into the aux-map when safe to do so")
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26335>
The BO address is not really a good criteria since we can bind an
image at an offset inside a BO.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: ee6e2bc4a3 ("anv: Place images into the aux-map when safe to do so")
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26335>
To implement this feature, we need to do CPU side tracking of all
L3/L2/L1 entries. This does add a little bit of CPU allocations, but
the advantage is that the traversal of the page table tree is faster.
No more need for the linear seach of find_buffer().
With this feature, we can have multiple VkImage bind to the same main
memory address, as long as they share exact same mapping parameters.
The AUX mapping will be removed when the last VkImage is destroyed.
As previously, if the L1 mapping entry parameters don't match, the
mapping fails. Anv handles this nicely by just disabling AUX on the
image.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26335>
Once NIR code is lowered and a few optimization passes have run, there
might be flag register interactions between instructions quite far
away from one another.
In the following case :
f0 = and r0, r1
...
fs_interpolate r2, r3
...
if f0
...
endif
If we lower fs_inteporlate while using the f0 register, we completely
garble the value meant for the if block.
To fix this, emit the predication for fs_interpolate in brw_fs_nir.cpp
when doing the NIR translation to the backend IR. This will guarantee
that the flag register interactions are visible to the optimization
passes, avoiding the problem above.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 68027bd38e ("intel/fs: implement dynamic interpolation mode for dynamic persample shaders")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9757
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26306>
`\$` is interpreted before being passed to `re.search()`, but luckily
for us the escape is also invalid and because of that, python 3.12+
warns us about it.
Use a raw string instead, so that the `\` is passed untouched to
`re.search()`.
Fixes: aa04b47c6e ("intel/perf: add support for GtSlice/GtSliceXDualsubsliceY variables")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26355>
Here renaming the PAT entries to a name that better express each
entry.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25447>
Anybody will be tempted to factor out the if-else block code since it
looks like duplication but else block actually handles the ycbcr images
where the aspect masks are compatible but don't need to be the same.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26294>
Set some state and implement dummy draws whenever viewport pointer
is being reprogrammed.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25987>
Add a helper that only emits hw_state, this makes it easier to modify
dirty state and call helper to emit only wanted state.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25987>
Those are not reused, so this will be the first and only allocation, so
no need to use the "realloc" variants.
For the fs_reg arrays, there's currently no particular reason to keep
them uninitialized, so zero-initialize them too -- not ideal but better
than random values.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26302>
This reworks the intel_compute_pixel_hash_table_nway() pixel pipe
hashing table computation helper to handle cases where some pixel
pipes have processing power different from the others, this is helpful
for Gfx12.7+ platforms where there are pixel pipes with 1 DSS as well
as pixel pipes with 2 DSSes, which currently can lead to a serious
performance bottleneck in the pixel pipes with lower processing power.
In order to avoid such a load imbalance the
intel_compute_pixel_hash_table_nway() function will now take two pixel
pipe bitsets instead of one: Pixel pipes enabled on both bitsets will
appear with twice the frequency on the table as pixel pipes which only
appear on one bitset. See the comments below for more details on the
algorithm used to construct a pixel hashing table with the desired
properties.
With this change rendering performance improves by about 25% on a
fused MTL platform -- The list of specific configs this is expected to
show an improvement on is not included here since the list is rather
long and some of the configs may still be embargoed or may never be
productized, but in order to find out whether your Gfx12.7+ device
could be affected by this you can check the output of the
intel_dev_info tool from the Mesa tree and see if there are multiple
"pixel pipe" entries with different DSS count. That isn't expected to
occur on any DG2 configuration, only on MTL+ platforms, so this change
should have no effect at all on DG2 (it's easy to convince oneself
that it won't since for DG2 mask1 should equal mask2 so mask2 will be
set to zero at the beginning of intel_compute_pixel_hash_table_nway()
and the new swzx[] permutation will be set to the identity).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26266>
Sync xe_drm.h with commit 3b8183b7efad ("drm/xe/uapi: Be more specific
about the vm_bind prefetch region").
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26238>
Using the unsized data field is incorrect. The data is located behind
the entire drm_i915_query_perf_config structure.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26285>
DRM_XE_EXEC_QUEUE_SET_PROPERTY is the offset,
while DRM_IOCTL_XE_EXEC_QUEUE_SET_PROPERTY is the real number.
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26253>
Regarding supporting these formats, the spec says:
"A sparse image created using VK_IMAGE_CREATE_SPARSE_RESIDENCY_BIT
supports all non-compressed color formats with power-of-two element
size that non-sparse usage supports. Additional formats may also be
supported and can be queried via
vkGetPhysicalDeviceSparseImageFormatProperties.
VK_IMAGE_TILING_LINEAR tiling is not supported."
Regarding the formats themselves, the spec says:
"VK_FORMAT_B8G8R8G8_422_UNORM specifies a four-component, 32-bit
format containing a pair of G components, an R component, and a B
component, collectively encoding a 2×1 rectangle of unsigned
normalized RGB texel data. One G value is present at each i
coordinate, with the B and R values shared across both G values and
thus recorded at half the horizontal resolution of the image. This
format has an 8-bit B component in byte 0, an 8-bit G component for
the even i coordinate in byte 1, an 8-bit R component in byte 2,
and an 8-bit G component for the odd i coordinate in byte 3. This
format only supports images with a width that is a multiple of two.
For the purposes of the constraints on copy extents, this format is
treated as a compressed format with a 2×1 compressed texel block."
Since these formats are to be considered compressed 2x1 blocks and we
don't necessarily have to support non-compressed formats that
non-sparse support, we can claim them as not supported with sparse.
In addition to all of that, if you look at isl_gfx125_filter_tiling()
you'll see that we don't even support Tile64 for these formats, so
sparse residency (i.e., non-opaque image binds) doesn't really make
sense for them yet.
The Vulkan spec defines 4 other YCBCR "2x1 compressed" formats like
the ones we have in this commit, but we don't support them even
without sparse, so there's no reason to check them here.
A recent change in VK-GL-CTS made tests that use these formats go from
unsupported to failures:
7ecc7716a983 ("Do not use and check for STORAGE image support, when
it is not used in the test")
This commit "fixes" the following VK-GL-CTS failures (by making them
return NotSupported):
dEQP-VK.sparse_resources.image_block_shapes.2d.b8g8r8g8_422_unorm.samples_1
dEQP-VK.sparse_resources.image_block_shapes.2d.g8b8g8r8_422_unorm.samples_1
dEQP-VK.sparse_resources.image_block_shapes.2d_array.b8g8r8g8_422_unorm.samples_1
dEQP-VK.sparse_resources.image_block_shapes.2d_array.g8b8g8r8_422_unorm.samples_1
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>