This has the best fossil-db results across in a sweep from 0..15.
fossil-db results on Alderlake:
Instructions in all programs: 152849904 -> 152824116 (-0.0%)
SENDs in all programs: 7677830 -> 7677830 (+0.0%)
Loops in all programs: 48470 -> 48470 (+0.0%)
Cycles in all programs: 11988670382 -> 11987530942 (-0.0%)
Spills in all programs: 42863 -> 41777 (-2.5%)
Fills in all programs: 77114 -> 73044 (-5.3%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31990>
Now actually making use of new Xe KMD OA syncronization uAPI.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31283>
Xe KMD added a uAPI to syncronze metrics id changes, so we can make
it wait for all previous workloads in exec_queue and all previous
metrics id changes to finish before start change it again.
This should make Vulkan queries more robust.
So this makes use of intel_bind_timeline to syncronize the metrics id
changes and xe_queue_get_syncobj_for_idle() to syncronize with
exec_queue.
As i915 and some versions of Xe KMD will not support it, this feature
will only be used then intel_bind_timeline parameter is not NULL and
timeline has a valid syncobj id.
At this patch level all callers will set it to NULL, next patch will
add and initialize timeline in ANV when supported by Xe KMD.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31283>
This bit is not needed for barriers and appears to trigger a
performance regression. So leave it for just for AUX-TT
flushing/invalidation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e3814dee1a ("anv: add plumbing/support for L3 fabric flush")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12090
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31915>
Stuff COMPUTE_WALKER_BODY in COMPUTER_WALKER in both iris and anv.
This also fixes the tracepoint for ray dispatches. Stuffing
COMPUTE_WALKER_BODY allow us to set the
cmd_buffer->state.last_compute_walker.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31822>
If a render area covers an area that is smaller than an attachment's
extent and is not aligned to the CCS block size, we must load the clear
color so that the pixels outside of that area are decompressed with the
right clear color.
Prevents the next patch from causing the following test failure on gfx9:
dEQP-VK.renderpass.suballocation.load_store_op_none.color_load_op_none_store_op_none
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31743>
Store an array of clear values, one for each view format of the image.
Load the clear value based on the view format.
anv_image_msaa_resolve() may override the source or destination with
ISL_FORMAT_UNSUPPORTED, so make anv_image_get_clear_color_addr() handle
that format.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31743>
In later commits, we'll rely on the number of view formats used by an
image to determine the size allocated for an array of clear colors in
the aux-state tracking buffer. Having a single view format for dmabufs
with clear color support allows anv to transparently handle this case.
Restrict the number of view formats by explicitly setting the image
format list to incomplete. Secondly, loosen the non-zero clear color
restriction on clear color supporting dmabufs. Those images can support
any clear color even with an incomplete list because we restrict
problematic accesses for the clear color during the negotiation phase.
Lastly, update add_all_surfaces_explicit_layout() to assert that the
sizing of the imported clear color struct meets expectations.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31743>
HSD 14021541470 lists a HW bug on blitter engine where the compression pairing bit is
not programmed correctly for stencil resources.
Use RCS Engine to perform copy instead.
Signed-off-by: Aditya Swarup <aditya.swarup@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31792>
Follow up of the work performed in decode to
support inline query.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Signed-off-by: Stéphane Cerveau <scerveau@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31765>
Should only impact Xe2+
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 02294961ee ("anv: stop using a binding table entry for gl_NumWorkgroups")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31799>
We're not using a binding table entry anymore for num_workgroups.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 02294961ee ("anv: stop using a binding table entry for gl_NumWorkgroups")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31799>
Fixes: 4aa3b2d ('anv: LNL+ doesn't need the special flush for sparse')
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31737>
Emit dummy VF_STATISTICS state before each VF state.
Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31759>
This recommended values should improve the performance of async
compute in gfx20, we may want to tweek this for Linux but at least
this values should give us a better baseline than default values.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30796>
Setting the missing registers to specification recommended values that
is also the default value, so it is not expected any changes in
behavior or performance here.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30796>
We jump out of the loops whenever result is not VK_SUCCESS, there is
no need to check for it there. I guess I missed this detail in the
most recent rework for this function.
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31698>
When the VkBuffer is of size 2^32 (which matches maxBufferSize), we
have vm_bind->size set to 2^32, which is fine because it fits in an
uint64_t. What is not fine is the 'i' variable being size_t, because
on 32bit systems it will loop forever since it will always be smaller
than 2^32.
Credits to Iván for not only reporting it, but also coming up with the
solution at the same time as I did, then testing it.
Cc: mesa-stable
Reported-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31698>
This will make things easier in situations where we don't want to use
the binding table at all (indirect draws/dispatches).
The mechanism is simple, upload a vec3 either through push constants
(<= Gfx12.0) or through the inline parameter register (>= Gfx12.5).
In the shader, do this :
if vec.x == 0xffffffff:
addr = pack64_2x32 vec.y, vec.z
vec = load_global addr
This works because we limit the maximum number of workgroup size to
0xffff in all dimension :
maxComputeWorkGroupCount = { 65535, 65535, 65535 },
So we can use the large values to signal the need for indirect
loading.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>
When flushing the render target cache for future operations, we need a
stall at pixel scoreboard. We likely didn't see any issue until now
because a change in render target added the pb-stall.
When using a 2 compute shaders with the following pattern :
vkCmdDispatch()
vkCmdPipelineBarrier() ImageBarrier with (src|dst)AccessMask=0 & identical layout
vkCmdDispatch()
we should ensure that the first dispatch is completed before executing
the second one, otherwise they can race to on resource accesses. This
fixes failures in some new CTS tests.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31676>
We have no cases where we intentionally pass a NULL layout when dynamic
offsets, and doing so would cause a null dereference. Le't asd an assert
for that.
CID: 1620447
Fixes: f39cd30f4f ("anv: Track all the descriptor sets")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31638>
Coverity notices that we've insured that index index is < MAX_RTS in one
case, but that it must be greater in one case. Since `color_att_count`
is a uint32_t, it can easily exceed MAX_RTS (8), and would thus create
an out-of-bounds read situation. While the type system would allow this,
the actually implementation shouldn't, so an assert should make Coverity
happy and help us check our assumption.
CID: 1620440
Fixes: d2f7b6d5a7 ("anv: implement VK_KHR_dynamic_rendering_local_read")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31640>
The code that initializes each queue got big enough that the
repetitive error handling is getting ugly and it could benefit from
being on its own function.
v2: Rebase, try to improve the comments.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30953>
Since the L2 bug fix we've been overestimating l3l2_binds by a lot in
most of the cases: almost every single call to anv_sparse_bind_trtt
ends up using either 0 or 1 elements for l3l2_binds, with occasionally
something using 512 or more. By switching to util_dynarray we can
guarantee the best of every case:
- l1_binds will remain a stack array for the vast majority of the
calls
- even more than before, since STACK_ARRAY was limited to 8
elements and now we do 32
- l1 will be properly dimensioned without the need for reallocs
- l3l2_binds will be completely empty most of the times and only
trigger allocations when necessary
Here's the top 10 most common results of anv_sparse_bind_trtt() for a
trace of Assassin's Creed: Valhalla. The first column is how many
times we had that case while running the trace. After this patch, all
these cases will proceed without any memory allocations.
168 trtt_binds: num_vm_binds:04 l3l2:0000 l1:0004
344 trtt_binds: num_vm_binds:01 l3l2:0000 l1:0004
420 trtt_binds: num_vm_binds:01 l3l2:0000 l1:0012
422 trtt_binds: num_vm_binds:04 l3l2:0000 l1:0008
479 trtt_binds: num_vm_binds:01 l3l2:0000 l1:0024
560 trtt_binds: num_vm_binds:03 l3l2:0000 l1:0003
1005 trtt_binds: num_vm_binds:01 l3l2:0000 l1:0002
1024 trtt_binds: num_vm_binds:02 l3l2:0000 l1:0004
2145 trtt_binds: num_vm_binds:02 l3l2:0000 l1:0002
3735 trtt_binds: num_vm_binds:01 l3l2:0000 l1:0001
Only 70 out of total 11340 calls to anv_sparse_bind_trtt() contained
l3l2 elements.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30953>
We use 2MB page table BOs, as defined by ANV_TRTT_PAGE_TABLE_BO_SIZE.
Each BO is enough to hold 512 pages, since each one has 4096 bytes.
Each L1 page can fit 1024 entries of 64kb size, which means our 512
pages should be able to fit a little less than 32gb of sparse resource
memory, since we also need some L2 pages and an L3 page. I don't see
any real world application using more than a single BO.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30953>