ANB is only used by Android WSI which uses explicit sync so these
flags can be dropped.
Signed-off-by: Juston Li <justonli@google.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29883>
The condition of flat ccs and vram_only checker causes different
aux usage at binding stage. The current design is reusing CCS_E
on Xe2, so we want both Xe2 integrated and discreted GPUs behave
the same way.
Xe2 shouldn't need any special setup of CCS in the loop.
Backport-to: 24.2
Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30111>
On pre-Xe2 platforms, the compression on these modifiers that
don't support compression are enabled. The compressed will be
resolved when needed. On Xe2+ we haven't support explicit
resolve, so all the paths to resolves are prohibited now. But
the code is still doing it, causing an assertion failure:
Fixes: vkcube
src/intel/vulkan/anv_private.h:5467:
anv_image_get_fast_clear_type_addr: Assertion
`device->info->ver < 20' failed.
Backport-to: 24.2
Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30111>
v2: Add comment and assertion to explain why the shift is
safe. Suggested by Caio.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30333>
When -fsanitize=shift is used, many instances of the following are
produced:
src/intel/compiler/brw_fs_nir.cpp:114:30: runtime error: left shift of negative value -1
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30333>
When -fsanitize=shift is used, 'ninja test' would fail in several
Intel assembly tests (mul.asm and and.asm) with:
src/intel/compiler/brw_reg.h:703:22: runtime error: left shift of 65532 by 16 places cannot be represented in type 'int'
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30333>
When -fsanitize=shift is used, many instances of the following are
produced:
src/intel/compiler/brw_eu_compact.c:2244:50: runtime error: left shift of negative value -306
v2: Add comment and assertion to explain why the shift is
safe. Suggested by Caio.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30333>
When -fsanitize=shift is used, many instances of the following are
produced:
src/intel/compiler/brw_compiler.h:1661:44: runtime error: shift exponent 64 is too large for 64-bit type 'long long unsigned int'
I think this is an actual bug. It should check the sentinel value, but
the sentinel value is 64. The shift by 64 is treated as a shift by
0. The varying 0 is explicitly filtered by the rest of the
if-test. How does this work?
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30333>
Due to recent regression, adding INTEL_DEBUG=optimizer is dumping
shader optimization pass details to console rather than to respective
files.
Thank you, Kenneth W Graunke for helping me figure this out.
Fixes: 17b7e49089 ("intel/brw: Move out of fs_visitor and rename print instructions")
Signed-off-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30389>
Due the lack of SIMD8 in Xe2 platforms we are not able to compile
a shader for dEQP-VK.protected_memory.stack.stacksize_1024 that fits
into scratch space.
So before this patch when such failure happened it would return
VK_ERROR_OUT_OF_HOST_MEMORY error.
So here when available include the compiler error string to better
inform what the actual failure.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30380>
Add a restriction on SIMD mode for fast-clear pixel
shader according to the Bspec.
Backport-to: 24.2
Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29907>
BSpec 71045 and 57023 still points that protected/encrypted bit is still
bit 0, bit 1 should not be set or undesired MOCS index could be set.
Fixes: 7be8bc2c97 ("isl: Add mocs for xe2")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30369>
Only Cc stable because it's needed for the next patches.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29982>
From all the many possible errors returned by the vm_bind ioctl, some
can actually happen in the wild when the system is under memory
pressure. Thomas Hellström pointed to us that, due to its asynchronous
nature, the vm_bind ioctl itself has to pin some memory, so if the
number of bind operations passed is too big, there is a probability
that it may run out of memory.
Previously the Kernel would return ENOMEM when this condition
happened. Since commit e8babb280b5e ("drm/xe: Convert multiple bind
ops into single job") the Kernel has started returning ENOBUFS when it
doesn't have enough memory to do what it wants but thinks we'd succeed
if we tried to do one bind operation at a time (instead of doing
multiple operations in the same ioctl), and ENOMEM in some other
situations. Still-uncommitted commit "drm/xe: Return -ENOBUFS if a
kmalloc fails which is tied to an array of binds" proposes converting
a few more ENOMEM cases no ENOBUFS.
Still, even ENOMEM situations could in theory be possible to recover
from, because if we wait some amount of time, resources that may have
been consuming memory could end up being freed by other threads or
processes, allowing the operations to succeed. So our main idea in
this patch is that we treat both ENOMEM and ENOBUFS in the same way,
so our implementation can work with any xe.ko driver regardless of
having or not having the commits mentioned above.
So in this patch, when we detect the system is under memory pressure
(i.e., the vm_bind() function returns VK_ERROR_OUT_OF_HOST_MEMORY), we
throw away our performance expectations and try to go slowly and
steady. First we wait everything we're supposed to wait (hoping that
this alone could also help to alleviate the memory pressure), and then
we synchronously bind one piece at a time (as this will ensure ENOBUFS
can't be returned), hoping that this won't cause the Kernel to try to
reserve too much memory. All this while also hoping that whatever
thing that may be eating all the memory goes away in the meantime. If
even this fails, we give up and hope the upper layer will be able to
figure out what to do.
This fixes a bunch of LNL failures and flaky tests (as LNL is our
first officially supported xe.ko platform). This can be seen in dEQP
but only if multiple tests are being run parallel. Happens in multiple
tests, some of which may include:
- dEQP-VK.sparse_resources.image_sparse_binding.2d_array.rgba8_snorm.1024_128_8
- dEQP-VK.sparse_resources.image_sparse_binding.3d.rgba16_snorm.1024_128_8
- dEQP-VK.sparse_resources.image_sparse_binding.3d.rgba16ui.512_256_6
I don't ever see these errors when running Alchemist/DG2 with xe.ko.
Fixes: e9f63df2f2 ("intel/dev: Enable LNL PCI IDs without INTEL_FORCE_PROBE")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30276>
Paulo reported that Vulkan is also affected by the drop of permanent
exec queues in Xe KMD, Iris already have this handling.
So here using the special DRM_IOCTL_XE_EXEC with num_batch_buffer == 0
to get a syncobj signaled when the last DRM_IOCTL_XE_EXEC is completed.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30156>
The idea here was that pixel shader framebuffer writes used the g0 and
g1 thread payload register values to construct the message header.
However, most messages are headerless and don't use either. There's a
2012-era comment that the simulator at one point had a bug where certain
headerless messages would incorrectly take the values from the g0/g1
register contents rather than using sideband. But, that was likely
fixed eons ago. So we really don't need to do this.
Furthermore, there are many more shader stages these days:
- VS: r1 contains output URB handles
- TCS: r1 contains ICP handles
- TES: r1 contains gl_TessCoord.x (r4 contains output URB handles)
- GS: r1 contains output URB handles
- CS: r1 contains LocalID.X on DG2+ but nothing on older hardware
- Task/Mesh: r1 contains LocalID.X
- BS: r1 contains bindless stack handles
Vertex and geometry aren't likely to benefit here because r1 is needed
for their output messages, which are also what terminate the shader.
TES will definitely benefit because we were making a value pointlessly
live for the whole program. Same for TCS, to a lesser extent. Compute
prior to DG2 was the worst, as g1 literally has no meaningful content,
so there is no point to keeping it live.
fossil-db on Alchemist shows substantial spill/fill improvements:
Totals:
Instrs: 148782351 -> 148741996 (-0.03%); split: -0.03%, +0.01%
Cycles: 12602907531 -> 12605795191 (+0.02%); split: -0.70%, +0.72%
Subgroup size: 7518608 -> 7518632 (+0.00%)
Send messages: 7341727 -> 7341762 (+0.00%)
Spill count: 54633 -> 52575 (-3.77%)
Fill count: 104694 -> 100680 (-3.83%)
Scratch Memory Size: 3375104 -> 3287040 (-2.61%)
Totals from 301172 (48.21% of 624670) affected shaders:
Instrs: 95531927 -> 95491572 (-0.04%); split: -0.05%, +0.01%
Cycles: 9643531593 -> 9646419253 (+0.03%); split: -0.91%, +0.94%
Subgroup size: 4492512 -> 4492536 (+0.00%)
Send messages: 4399737 -> 4399772 (+0.00%)
Spill count: 20034 -> 17976 (-10.27%)
Fill count: 41530 -> 37516 (-9.67%)
Scratch Memory Size: 1522688 -> 1434624 (-5.78%)
Assassin's Creed Odyssey in particular has 20% fewer fills.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30146>
Remove extra reporting of hdc_flush when viewing a Perfetto trace for
anv.
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30312>
Doxygen documentation says
> If the file name is omitted (i.e. the line after \file is left
> blank) then the documentation block that contains the \file command will
> belong to the file it is located in.
so we can omit the filename itself when using the annotation.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30168>