This introduces a new level cache key for physical device. The main
motivation is for shader object because the Vulkan spec says:
"Guaranteed compatibility of shader binaries is expressed through a
combination of the shaderBinaryUUID and shaderBinaryVersion members of
the VkPhysicalDeviceShaderObjectPropertiesEXT structure queried from a
physical device. Binary shaders retrieved from a physical device with
a certain shaderBinaryUUID are guaranteed to be compatible with all
other physical devices reporting the same shaderBinaryUUID and the
same or higher shaderBinaryVersion."
Meaning that with ESO, the driver needs to compile shaders for the
worst case with every possible logical device features enabled.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27632>
Rather than doing this lowering potentially multiple times when a
shader is relinked we can instead do it once in the compiler.
This change also gets us closer to converting to NIR at compile
time.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27690>
The base_ir variable used by this pass is set via visit_list_elements()
however this pass was skipping visit_list_elements() for the initial
list of instructions i.e. it was skipping it for globals so if we
ended up trying to flatten an expression on a global we would segfault.
To quote the code comment on the base_ir variable:
"This is implemented by visit_list_elements -- if the visitor is
not called by it, nothing good will happen"
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27743>
The resource size doesn't need to match the binding granularity. For
example, if the user wants to create a 32kb buffer, Anv will require
its memory to have 64kb, but the buffer size will still be the
original 32kb. And the spec says:
VUID-VkSparseMemoryBind-size-01100:
"size must be less than or equal to the size of the resource minus
resourceOffset"
VUID-VkSparseMemoryBind-size-01102:
"size must be less than or equal to the size of memory minus
memoryOffset"
So when binding such buffer, size should actually be the lesser of the
two values: 32kb, and we have to accept that. Since our binding
granularity is 64kb, we're safe to simply extend the requested size to
match our binding granularity, since we already require the memory to
be appropriately sized.
None of this is exercised by dEQP. This was caught by
piglit/arb_sparse_buffer-basic using Zink.
Testcase: piglit/arb_sparse_buffer-basic
Issue: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10220
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26410>
I need to add some sparse-related checks that require having the
anv_buffer and anv_image, and putting them directly inside
anv_queue_submit_sparse_bind_locked() doesn't feel like the right
thing to do. Here we change the interface so now we have
anv_sparse_bind_buffer() and anv_sparse_bind_image_opaque() as the
main interface into anv_sparse.c, so they both can call the lower
level anv_sparse_bind_resource_memory() function.
In the next patch we'll be adding changing the code of the functions
we just created, justifying their addition.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26410>
What we're checking in the assertion we're changing seems to be what
the OpenGL spec describes as:
"<offset> must be an integer multiple of the implementation
dependent constant SPARSE_BUFFER_PAGE_SIZE_ARB, and <size> must
either be a multiple of SPARSE_BUFFER_PAGE_SIZE_ARB, or extend to
the end of the buffer's data store"
There are two sizes in question here: the size of the VkBuffer and the
size of its corresponding VkDeviceMemory. It looks like
bo->base.base.size corresponds to VkDeviceMemory, while res->obj->size
corresponds to VkBuffer. Here we're really talking about the VkBuffer
size, so fix the assertion.
On Anv, we're hitting this issue because piglit's
arb_sparse_buffer-basic creates a buffer of size 32k and tries to
issue a bind operation with size 32k. The catch here is that Anv
requires the memory to be 64kb, so Zink gets confused and hits the
assertion.
Issue: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10220
Testcase: piglit/arb_sparse_buffer-basic
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26410>
This also allows tools build on clc to drop their workaround to include
it themselves. Rusticl might need it once it supports extensions which
need this file pulled in.
Later if the need to include it changes based on llvm version, we can
easily handle this in clc.
The main reason to include it only conditionally is the massively
reduction in compilation time. It also removes the mental burden from
users of clc to deal with any of this themselves.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10633
Fixes: 37a1346347 ("meson: remove opencl-external-clang-headers option and rely on shared-llvm")
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27663>
This adds the following template options:
- add an option to fill TC set_vertex_buffers from st_update_array directly
(always true without u_vbuf, so always used with radeonsi)
- add an option saying that there are no zero-stride attribs
- add an option saying that there are no user buffers
(always true with glthread, so always used with radeonsi)
- add an option saying that there is an identity mapping between vertex
buffers and vertex attribs
I have specifically chosen those options because they improve performance.
I also had other options that didn't, like unrolling the setup_arrays loop.
This adds a total of 42 variants of st_update_array_templ for various cases.
Usually only a few of them are used in practice.
Overhead of st_prepare_draw in VP2020/Catia:
Before: 8.5% of CPU used
After: 6.13% of CPU used
That's 2.37% improvement. Since there are 4 threads using the CPU and
the percentage includes all threads in the system, the improvement for
the GL thread is about 8% (roughly 2.17% * 4; each thread at 25% of global
utilization means 100% utilization in 4 cores).
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27731>
This way we execute 1 half of setup_arrays with the fast path enabled,
and the other half with the fast path disabled, so it's not that much
of code duplication, and it will facilitate further optimizations.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27731>
The fast path is the only focus of optimizations, so let's stop using
the slow one if the fast path is allowed. Only display lists with drivers
lacking draw_vertex_state use it, and drivers not exposing
PIPE_CAP_ALLOW_DYNAMIC_VAO_FASTPATH use it.
This changes gl_constants::AllowDynamicVAOFastPath to UseVAOFastPath
because it's no longer turned on/off dynamically, but only one of them
is always used per VAO. It also removes the IsDynamic and NumUpdates
fields of VAOs.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27731>
In the code path without arguments it tries 3 different paths and error
messages are overwritten one by other, in this case any of those
error messages are irrelevant.
For the code path with arguments is similar, as it already have a
fprintf(stderr) in the caller of open_error_state_file().
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27728>
This file defines i915 context priorities, all users in Iris and ANV
have moved to i915 specific files, so the only remaining for this file
is move it to i915 folder so it do not gets included in common code
by mistake.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27728>
Iris now has just one i915_drm.h include in the common code.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27728>
This field was being set with i915 specific flags, replacing it
by a capture boolean we can have the same behavior with less
i915_drm.h usage in the common code.
This also allow us to implement VM capture in Xe KMD.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27728>
(EXEC_OBJECT_SUPPORTS_48B_ADDRESS | EXEC_OBJECT_PINNED) is set in every
place that setups a iris_bo, so here moving it to a single and i915
specific place.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27728>
Meson complains:
../src/intel/decoder/meson.build:67: DEPRECATION: Project uses feature that was always broken, and is now deprecated since '1.3.0': str.format: Value other than strings, integers, bools, options, dictionaries and lists thereof..
So instead of trying to format a file, change gentest_xml to store just
the string. Need to adapt genxml_path to consider the current source
dir, but everything else works like before.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27706>
Both are deprecated and the alternatives are already being used in
the project, so start using those here too:
```
../src/intel/shaders/meson.build:64: WARNING: Project targets '>= 1.1.0' but uses feature deprecated since '0.56.0': meson.source_root. use meson.project_source_root() or meson.global_source_root() instead.
../src/intel/shaders/meson.build:65: WARNING: Project targets '>= 1.1.0' but uses feature deprecated since '0.56.0': meson.build_root. use meson.project_build_root() or meson.global_build_root() instead.
```
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27706>
This reduces the overhead of _mesa_HashLookupLocked by 19% according to
sysprof, which could be inaccurate.
While this commit inlines _mesa_HashLookupLocked for a better gain,
the testing was done without inlining to make it fair.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27494>
Make it behave like it's always true.
There is no disadvantage in keeping it always true, but when it's
incorrectly false, things break.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27494>
This removes the pointer indirection every time we access the hash table.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27494>
split from "mesa: enable GL names reuse for _mesa_HashTable, remove the alternative"
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27494>
061b8bfd29 moved handling of fixed operands earlier, but it should have
moved the fixing of writelane operands earlier too.
This fixes Crucible's func.uniform-subgroup.exclusive.imin64 on GFX8.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 061b8bfd29 ("aco/ra: rework fixed operands")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27583>
According to Valgrind, vcc/m0 are uninitialized and this fixes it.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27583>
We've implemented another workaround completely disabling high
priority preemption.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e6e320fc79 ("anv: make Wa_16013994831 to use intel_needs_workaround")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27716>
With TES, the primitive ID is an input variable but it's considered a
sysval by SPIRV->NIR. Though, its value is greater than
VARYING_SLOT_VAR0 which means its location was adjusted by mistake.
This fixes compiling a tessellation evaluation shader in debug build
with Enshrouded.
Fixes: dfbc03fa88 ("spirv: Fix locations for per-patch varyings")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27413>
In the past, we didn't have a good solution for combining scalar loads
with a variable index plus a constant offset. To handle that, we took
our load offset and rounded it down to the nearest vec4, loaded an
entire vec4, and trusted in the backend CSE pass to detect loads from
the same address and remove redundant ones.
These days, nir_opt_load_store_vectorize() does a good job of taking
those scalar loads and combining them into vector loads for us, so we
no longer need to do this trick. In fact, it can be better not to:
our offset need only be 4 byte (scalar) aligned, but we were making it
16 byte (vec4) aligned. So if you wanted to load an unaligned vec2,
we might actually load two vec4's (___X | Y___) instead of doing a
single load at the starting offset.
This should also reduce the work the backend CSE pass has to do,
since we just emit a single VARYING_PULL_CONSTANT_LOAD instead of 4.
shader-db results on Alchemist:
- No changes in SEND count or spills/fills
- Instructions: helped 95, hurt 100, +/- 1-3 instructions
- Cycles: helped 3411 hurt 1868, -0.01% (-0.28% in affected)
- SIMD32: gained 5, lost 3
fossil-db results on Alchemist:
- Instrs: 161381427 -> 161384130 (+0.00%); split: -0.00%, +0.00%
- Cycles: 14258305873 -> 14145884365 (-0.79%); split: -0.95%, +0.16%
- SIMD32: Gained 42, lost 26
- Totals from 56285 (8.63% of 652236) affected shaders:
- Instrs: 13318308 -> 13321011 (+0.02%); split: -0.01%, +0.03%
- Cycles: 7464985282 -> 7352563774 (-1.51%); split: -1.82%, +0.31%
From this we can see that we aren't doing more loads than before
and the change is pretty inconsequential, but it requires less
optimizing to produce similar results.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27568>
We need to allocate "shared size" bytes for each workgroup but
we were incorrectly multiplying by the number of workgroups in
each supergroup instead, which would typically cause us to allocate
less memory than actually required.
The reason this issue was not visible until now is that the kernel
driver is using a large page alignment on all BO allocations and
this causes us to "waste" a lot of memory after each allocation.
Incidentally, this wasted memory ensured that out of bounds
accesses would not cause issues since they would typically land
in unused memory regions in between aligned allocations, however,
experimenting with reduced memory aligments raised the issue,
which manifested with the UE4 Shooter demo as a GPU hang caused
by corrupted state from out of bounds memory writes to CS
shared memory.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27675>