Meson devenv is a feature added in meson 0.58 (thus the features is
version guarded) that allows creating a shell environment with
environment variables automatically setup for running the project inside
the build dir. Some variables (such as LD_LIBRARY_PATH and PATH) are set
automatically, others must be added by the project.
For vulkan is is relativley simple, we create a new, uninstalled, icd
file for each driver and set the VK_ICD_FILENAMES variable
appropriately. This can be used with:
```sh
meson devenv -C $builddir
```
then, vulkan applications will automatically use the uninstall vulkan
driver, no need to install.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14826>
This brings in some interesting new vulkan tests and fixes for the
spurious KHR-GL TF failures. Also, reduces the runtime of
dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.36 so that it
should stop timing out.
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13779>
After starting to use a new version of the simulator, it got
outdated.
We made some initial effort to update it, but it was not
working. Taking into account that no one is using it, it is better to
just remove it.
We keep the noop drm drivers, as they could have some value for
developers that doesn't have access to the v3dv3 simulator.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14682>
Semaphores info was stored as an info of event_wait cpu jobs and this
leads to mem leak when the same event_wait job in the same cmd buffer
batch was submitted more than once. As a result,
`dEQP-VK.api.command_buffers.record_simul_use_primary` fails due to a
double-free of sems_info.
In this patch, we no longer use v3dv_event_wait_cpu_job_info to store
semaphores from a submit info, since semaphores is related to a queue
submission and not to the event_wait job type. If we spawn a wait_thread,
we copy semaphores to an auxiliary struct (v3dv_wait_thread_info) that
will be used in wait_thread to get job and semaphores information. When
the spawned thread finishes, it releases the related
v3dv_wait_thread_info and the semaphores copy as well.
Fixes: d5bd18fb ("v3dv: store wait semaphores in event_wait_cpu_job_info")
Suggested-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14736>
Just as with all other TLB operations, we can only use the TLB if the render
area is aligned to tile boundaries. If it is not, then the operation would
overwrite pixels outside the render area, which is not allowed.
In this case, we can't even emit a previous TLB load to fix this because the
TLB has the multisampled attachment, not the resolve attachment, which is
just a destination buffer for the tile store.
Because the condition for tile alignment has to be determined for each
subpass, we handle this by storing this information in the attachment
state of the command buffer with the start of each subpass. We store
whether the attachment is to be resolved and whether it can use the
TLB (considering tile alignment restrictions).
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14752>
This is a left over from when we added multi-version support in the
driver, where we turned this helper into a versioned scheme.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14704>
The new runner reduces the runtime by about 1/3 thanks to using rust
instead of python, and includes automatic flake handling so you don't just
have to skip flaky tests. The wrapper script also includes IRC flake
reporting (so one can track and update the flakes list to improve CI
reliability), always uploading results to CI for review (so you can
diagnose flakes and look at timings), has a prettier regressions report
and a helpful timing report, and is the same as what's used by all the HW
runners as well.
The downside is that by dropping the massive list of skips, you no longer
get flagged if Mesa refactors end up accidentally disabling extensions and
thus making tests skip. For that, I've started on
https://gitlab.freedesktop.org/anholt/deqp-runner/-/merge_requests/33 so
that hardware drivers get extension checking coverage too.
Thanks to the perf improvement, we get to drop one of the jobs for
llvmpipe.
xfail lists were mostly sed-jobs from the prior expectations lists. The
exceptions to that you'll find in the form of whitespace around the
affected test group (usually changes of capitalization or
special-characters), or an explanation for the more interesting changes
(which thankfully we can now record in the xfails lists!).
Reviewed-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14604>
Allow to vectorize operations from a smaller bit-size into
scalar operations of a larger bit-size. This allows us to
turn 2x8-bit into a equivalent scalar 16-bit load/store.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
This generalizes the support we added for 16-bit to also handle
8-bit loads via ldunifa. The story is the same: we align the address
to 32-bit downwards and we skip any bytes that are not of interest.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
Just like with 16-bit, this mode only supports scalar access, but
we are already lowering all non 32-bit accesses to scalar.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
Since ldunif is a 32-bit instruction we need to demote these to
UBO loads, like we do for indirect indexing, with the exception
of scalar 16bit uniforms with an offset that is 32-bit aligned.
For the exception where we can use lfdunif we read a 32-bit slot
from memory where the uniform data is in the lower 16-bit and we
will read garbage in the upper 16-bit which we won't use anyway.
It should be noted that by using ldunif, we are consuming
32-bit from the uniform stream, but this is fine because
if there is valid uniform data in the upper 16-bit (i.e.
we had a ivec2 uniform aligned to a 32-bit address), since
we scalarize 16-bit loads, we would see another load uniform
with an unaligned offset for the second component, which we
will demote to UBO.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
These are required by VK_KHR_16bit_storage. Our hardware, however,
doesn't provide any mechanism to decide on the rounding mode of
the conversion and it seems to be using RTE, so we implement
RTZ in software.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
If we know we have a load with a constant offset, then even if it
is not aligned to 32-bit we can still produce an aligned offset
and then skip over the bytes we don't need.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
Even though ldunifa is strictly 32-bit we may be able to use it
to load 16-bit values that sit at 32-bit aligned addresses.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
The vectorization pass can inject 32_2x16 (un)packing opcodes
upon successful vectorization of 16-bit operations into 32-bit
counterparts, so make sure we lower these to something our
backend can handle.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
This allows us to implement 16-bit access on uniform and
storage buffers.
Notice that V3D hardware can only do general access on scalar
16-bit elements, which we currently enforce by running a lowering
pass during shader compile.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
V3D hardware doesn't support vector access for general TMU load/store
operations like the ones we use for UBO and SSBO, so we need to split
these to scalar operations.
It should be noted that we also have a vectorization pass (which runs
later, during optimization), that may reconstruct some of these into
32-bit operations when possible (i.e. when the resulting operation
is 32-bit aligned).
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
Without this the simulator wrapper will abort upon seeing this
query, rendering the driver unusable in that context.
Also, it seems the simulator environment doesn't quite work with
multisync at present, so do not enable it until we figure out what
the issue is.
Reviewed-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14678>
When we create a image view with D24S8 format we made a reformatting
to RGBA8UI if the aspect selected is just STENCIL. But when
configuring the stores we select the aspects based on the attachment
format. Quoting from cmd_buffer_render_pass_emit_stores:
/* From the Vulkan spec, VkImageSubresourceRange:
*
* "When an image view of a depth/stencil image is used as a
* depth/stencil framebuffer attachment, the aspectMask is ignored
* and both depth and stencil image subresources are used."
*
* So we ignore the aspects from the subresource range of the image
* view for the depth/stencil attachment, but we still need to restrict
* the to aspects compatible with the render pass and the image.
*/
const VkImageAspectFlags aspects =
vk_format_aspects(ds_attachment->desc.format);
So we could ending trying to store on a Z+Stencil buffer, using a
RGBA8UI format.
So far this only affected some tests when using the simulator
(assert). Those were working on the real hw, but probably would fail
on other situations, so lets use the original image format on that
case.
v2 (Iago)
* Improve comment grammar
* Do the same on load too (not just store)
v3 (Iago)
* Re-word comments.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14635>
We track last submitted jobs by queue type. After all cmd buffer
batches have been submitted, we emit a noop job that waits all jobs
submitted to each GPU queue complete and signals the fence.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
With multiple semaphores support, we can use a GPU job to handle
multiple signal semaphores in the end of a cmd buffer batch. It
means, the last job in the last cmd buffer will be in change of
signalling semaphores as long as it meets some conditions:
1 - A GPU-job signals semaphores whenever we only have submitted
jobs for the same queue (there is no syncobj created for any
other type). Otherwise, we emit a noop job that waits on the
completion of all jobs submitted and then signals semaphores.
2 - A CPU-job is never in charge of signalling semaphores. We
process it first and emit a noop job that depends on all jobs
previously submitted to signal semaphores.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
With multiple semaphore support, we can improve the way we handle
wait semaphores considering different job types and cmd buffer
batch scenarios, that means:
- A GPU job depends on wait semaphores whenever it is the first job
submitted to a queue in this command buffer batch (the `first` flag
for the job's queue type is set).
- For the first CPU job, if there are wait semaphores, we should
wait for the CPU and GPU being idle to process the job.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
The order in which a GPU job is scheduled is guaranteed within the
same queue type (CL, TFU, CSD), but the order of completion of jobs
from different queues cannot be guaranteed. Since we have multiple
semaphores support now, we can track the completion of the last job
submitted to each queue and therefore better determine when gpu is
idle. We do it using an array of syncobj (last_job_syncs) for each
GPU queue (CL, TFU, CSD). With this, job serialization also become
more accurate. We also keep tracking the very last job submitted
(last_job_sync became an element of the last_job_syncs array as
V3DV_QUEUE_ANY) for the case we don't have multisync support.
To help in handling wait semaphores, we set a flag per queue to
indicate we are starting a new cmd buffer batch and a job submitted
to this queue will be the first.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
In addition to keep a copy of wait semaphores, extend
v3dv_submit_info_semaphores to hold a copy of signal semaphores too.
With a copy of wait and signal semaphores, we can enable GPU jobs to
handle more than one wait and signal semaphores.
By now, we don't change the way as we signalling semaphores when all
jobs complete, i.e., we still use the master thread to signal
semaphores. In this context, no GPU job is actually in charge of
signalling, but the support for multiple signal semaphores is done
here.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
Whenever v3d kernel-driver supports multisync extension, use it to
enable more than one semaphores in a tfu job.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
Whenever v3d kernel-driver supports multisync extension, use it to
enable more than one semaphores in cl submission. In CL, we have two
kind of job (bin and render), therefore, we need also to determine
the stage to sync, that means to add job dependencies/wait
semaphores.
Also, as we currently process all signal semaphores of a cmd buffer
batch together in the submit master thread (when the last wait
thread completes), there isn't now a situation in which GPU jobs
need to handle signal VkSemaphores.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
Instead of a boolean (sem_wait) in v3dv_event_wait_cpu_job_info,
that is used to determine wait condition for jobs put in a wait
thread, copy the wait semaphores data and store it as struct
v3dv_submit_info_semaphores. In the following patches we enable
multiple semaphores in GPU jobs, and therefore we need this data
to add wait semaphores as job dependencies for pending jobs
submitted from a wait thread.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
Instead of pass pSubmit to queue_submit_cmd_buffer, create a struct
v3dv_submit_info_semaphores to wrap semaphores data from VkSubmitInfo.
In the next commit, this struct will help to handle wait condition for
jobs submitted in a wait event context, since we need to hold this
data when handle wait events and pass it to queue_submit_job() called
from wait threads. The main goal is to allow multiple wait semaphores
in a job submission. Later, this struct will be extended to include a
copy of signal semaphores too.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
is_wait_thread is passed, but not actually used; and cpu_queue_handle_idle
is in charge to handle wait threads spawned before this one.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>