There are few EXT_ extensions that were promoted to KHR_, but we didn't
enabled them as supported.
This makes some CTS tests to be run as unsupported when they should be
supported instead.
For example, we were passing 16/108 line rasterization tests instead of
40/108 because we did not enabled KHR_line rasterization.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28090>
Besides disabling early-z when a frame is an odd width or height, we
need to disable it if the buffer is 16-bit and multisampled.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28009>
The new simulator provides a new API, so we need to adapt the code.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28009>
Summary:
- ensure headers used outside runtime are included in dependency source
- drop redundant idep_vulkan_common_entrypoints_h
- drop redundant icd side tricks for the order of header gen
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28066>
We need to allocate "shared size" bytes for each workgroup but
we were incorrectly multiplying by the number of workgroups in
each supergroup instead, which would typically cause us to allocate
less memory than actually required.
The reason this issue was not visible until now is that the kernel
driver is using a large page alignment on all BO allocations and
this causes us to "waste" a lot of memory after each allocation.
Incidentally, this wasted memory ensured that out of bounds
accesses would not cause issues since they would typically land
in unused memory regions in between aligned allocations, however,
experimenting with reduced memory aligments raised the issue,
which manifested with the UE4 Shooter demo as a GPU hang caused
by corrupted state from out of bounds memory writes to CS
shared memory.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27675>
We already had the logic implemented, but it was never really tested
(there was a comment about that)
So the advantage of this is that we now test that code (in fact, there
were a small typo on that code).
There aren't too much CTS tests for this feature, but we gets tests
like this working:
dEQP-VK.clipping.clip_volume.depth_clip.*
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10527
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27386>
To handle coverity warning:
4. thread2_modifies_field: Thread2 sets cache_size to a new value. Note that this write can be reordered at runtime to occur before instructions that do not access this field within this locked region. After Thread2 leaves the critical section, control is switched back to Thread1.
CID 1559509 (#1 of 1): Check of thread-shared field evades lock acquisition (LOCK_EVASION)6. thread1_overwrites_value_in_field: Thread1 sets cache_size to a new value. Now the two threads have an inconsistent view of cache_size and updates to fields correlated with cache_size may be lost.
521 cache->cache_size += bo->size;
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26951>
No driver supports urol/uror on all bit sizes. Intel gen11+ only for 16
and 32 bit, Nvidia GV100+ only for 32 bit. Etnaviv can support it on 8,
16 and 32 bit.
Also turn the `lower` into a `has` option as only two drivers actually
support `uror` and `urol` at this momemt.
Fixes crashes with CL integer_rotate on iris and nouveau since we emit
urol for `rotate`.
v2: always lower 64 bit
Fixes: fe0965afa6 ("spirv: Don't use libclc for rotate")
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by (Intel and nir): Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: David Heidelberg <david.heidelberg@collabora.com>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27090>
Demoting means that we don't execute any writes to memory but
otherwise the invocation continues to execute. Particularly,
subgroup operations and derivatives must work.
Our implementation of discard does exactly this by using
setmsf to prevent writes for the affected invocations, the
only difference for us is that with discard/terminate we
want to be more careful with emitting quad loads for tmu
operations, since the invocations are not supposed to be
running any more and load offsets may not be valid, but with
demote the invocations are not terminated and thus we should
emit memory reads for them to ensure quad operations and
derivatives from invocations that have not been demoted still
work.
Since we use the sample mask to implement demotes we can't tell
whether a particular helper invocation was originally such
(gl_HelperInvocation in GLSL) or was later demoted
(OpIsHelperInvocationEXT added with SPV_EXT_demote_to_helper_invocation),
so we use nir_lower_is_helper_invocation to take care of this.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26949>
Both OpenGL and Vulkan drivers share the same V3D_CSD definitions.
Therefore, move it to a common place instead of duplicating.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26448>
Currently, the function handle_reset_query_cpu_job() starts to iterate
between the performance queries in the zero-index. This is not correct,
as we should start iterating the performance queries at first, which
is a index indicated by info->first.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26448>
The copy timestamp query user extension allows the creation of a CPU job
that copies the results of a timestamp query to a BO with the possibility
to indicate the timestamp availability with a availability bit.
By using the copy timestamp query user extension, it will be possible to
use the multisync user extension to synchronize this type of job, which
currently possible with the user space implementation without stalling.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26448>
The reset timestamp user extension allows the creation of a CPU job that
resets a timestamp query by updating its value in the timestamp BO and
resetting the availability syncobj.
Using the reset timestamp user extension, it will be possible to use the
multisync user extension to synchronize this type of job, which is not
currently possible with the user space implementation without stalling.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26448>
The timestamp query user extension allows the creation of a CPU job that
calculates the timestamp by updating its values in a BO with the appropriate
offsets and signalling the availability syncobjs.
The CPU job should be serialized so it only executes after all previously
submitted work has completed. This is accomplished by setting job->serialize
to V3DV_BARRIER_ALL before setting the multi-sync extension.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26448>
A CPU job of type V3DV_JOB_TYPE_CPU_RESET_QUERIES is only created for
performance and timestamp queries. Occlusion queries are handled with a
compute job. Therefore, there is no need to handle occlusion queries in
the CPU job.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26448>
The indirect CSD user extension allows the creation of a CSD
job linked to a CPU job. When we submit the CPU job, the CPU job
will run when the indirect CSD dependency is completed, map the
indirect buffer to read the CSD dispatch parameters and reconfigure
the CSD job accordingly.
Using the indirect CSD user extension, allows us to use the multisync
user extension to synchronize this type of job, which is not currently
possible from user-space without stalling.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26448>
We will be introducing a new type of queue, a CPU queue. This queue will
be responsible for handling the CPU jobs, such as timestamp queries and
indirect CSD dispatch.
Therefore, add a CPU queue to the enum v3dv_queue_type and its
respective barrier.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26448>
Include a check to assure that the kernel driver supports the CPU queue.
Also, indicate that, currently, the simulator doesn't have support for
the CPU queue.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26448>
Currently, set_multisync() doesn't allow using external syncobjs as
in_syncs objects in the multisync extension. Add the possibility to use
external syncobjs as in_syncs. This will ease the synchronization of CPU
jobs, as they sometimes depend on external syncobjs, such as
query availability syncobjs.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26448>
With the support of CPU jobs by the kernelspace, now the CPU job
functions will also use the multisync extension. Therefore, move the
multisync functions to the beginning of the file to allow the CPU job
functions to call them.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26448>
For Wayland wsi allocations, v3dv used the wl_drm protocol, which is now
being phased out in favor of dmabuf feedback.
wl_drm is used to figure out the display device (in v3dv assumed to be
vc4) and then to authenticate with the Wayland compositor in order to
allocate scanout-able buffers (in this case, dumb buffers) directly at
the display device.
Recent commit 88c03ddd34 changed the behavior of the wsi code, and
wl_drm is now passing the render device instead, which broke Wayland
wsi.
It turns out that the authentication code is not really needed and since
we would like to remove wl_drm usage and the master device is assumed to
be vc4 anyway, we can just remove some unneeded device-specific wsi code
and get Vulkan Wayland wsi back to work.
Fixes: 88c03ddd34 ("egl/drm: get compatible render-only device fd for kms-only device")
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26200>
So far we were packing by hand unnormalized coordinates at the
V3D41_TMU_CONFIG_PARAMETER_1 pack structure. To get this working we
hardcoded V3D_VERSION to 41 at v3dv_uniforms, that works for v71
because the structure are the same. But that is somewhat ugly, and
will not work if a new hw generation have a different structure.
Additionally, we found that for v3d this will be also needed.
So this commit adds a helper on the compiler. For now, and to simplify
it also use just one method for both generations. This solves the
problem of the same code needed on both v3d and v3dv.
But the idea is that in the future we need a similar need, but the
structure different on each generation, it would have used a similar
approach to other generation dependent function calls (like
v3d40_vir_emit_tex), having the implementation on a source file that
can safely include the hw generation headers.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25544>
ldunifa works exactly the same as ldunif: the hw will prefetch the
next 4 bytes after a read, so if a buffer is exactly a multiple of
a page size and a shader uses ldunifa to read exactly the last 4 bytes
the prefetch will read out of bounds and spam the error on the kernel
log. Avoid that by allocating extra bytes in this scenario.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25752>
We have been using drm_render_device to refer to the render device and
drm_primary_device to refer to the display device, but that is confusing
because the render device also has a primary node (for legacy reasons),
so don't use primary to refer to the display device.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Acked-by: Erico Nunes <nunes.erico@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25748>