From:
defined[\s]*\([\s]*PIPE_(OS|ARCH|CC)_([0-9A-Z_]+)[\s]*\)
To:
DETECT_$1_$2
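For illustration (assuming a typical use site; the DETECT_OS_* macros
are defined in util/detect_os.h), one match of the regex is rewritten
like this:
#if defined(PIPE_OS_LINUX)   /* before */
#if DETECT_OS_LINUX          /* after  */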
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19674>
After 'v3dv: fix debug dump on BO free' we changed the order, and this
led to the following test
dEQP-VK.api.object_management.multithreaded_per_thread_resources.device_memory_small
raising this assertion:
deqp-vk: ../src/broadcom/vulkan/v3dv_bo.c:281: v3dv_bo_alloc: Assertion `bo && bo->handle == 0' failed.
v2: Expanded comment just before the reset, explaining that we need to
do the reset before we free the BO from the kernel (Iago)
Fixes: 2c44597181 ('v3dv: fix debug dump on BO free')
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19693>
This reverts commit cb02cf464c.
There are 3 reported flakes over a period of a month, and we have been
unable to reproduce them even once. The issue clearly doesn't happen
often enough to warrant disabling our Vulkan CI, so let's restore it
while we continue to try to reproduce it on our side.
Signed-off-by: Eric Engestrom <eric@igalia.com>
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19720>
Imported BOs are not allocated by the device so we don't
update BO stats when they are imported. Therefore, we should
not be updating them when they are freed either.
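A minimal sketch of the idea, assuming the BO carries an is_import
flag and the device keeps illustrative bo_count/bo_size stats:
static void
bo_free_update_stats(struct v3dv_device *device, struct v3dv_bo *bo)
{
   /* Imported BOs never entered the stats, so don't remove them. */
   if (bo->is_import)
      return;
   device->bo_count--;
   device->bo_size -= bo->size;
}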
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19675>
This can cause us to stomp the contents of r5 before we have a chance to read
it, like this:
0x3d103186bb800000 nop ; nop ; ldvary.r0
0x3d105686bbf40000 nop ; mov rf26, r5 ; ldvary.r1
0x020000ef0000d000 bu.allna 232, r:unif (0x0000001c / 0.000000)
0x3d1096c6bbf40000 nop ; mov rf27, r5 ; ldvary.r2
Here, the MOV in the last instruction is supposed to read the r5 value
produced by ldvary.r0, but because we have inserted the bu instruction in
between, that read now happens at the same time that ldvary.r1 updates r5,
stomping the value we were supposed to read.
Fix this by disallowing injection of a branch instruction in between an ldvary
instruction and its write to the r5 register 2 instructions later.
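A minimal sketch of the constraint, with hypothetical names (the real
scheduler tracks this state differently):
/* An ldvary writes r5 two instructions after it is emitted, so refuse
 * to inject a branch while that delayed write is still in flight,
 * otherwise a later ldvary can stomp r5 before it is read. */
static bool
can_inject_branch(int instructions_since_last_ldvary)
{
   return instructions_since_last_ldvary > 2;
}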
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7062
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19616>
All defined in the baremetal-test-arm*
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19548>
We don't need to log in anymore, but we can't use plain minio commands
now. `ci-fairy` gained an `s3cp` helper that keeps an almost identical
API.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19076>
We had been incorrectly assuming there was just one for all the
events; apparently the CTS never uses more than one event.
Fixes: e6884df088 ('v3dv: fix event synchronization')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19518>
Since we now implement events on the GPU, we need to be more careful
and insert barriers to honor the dependencies provided by the API,
as well as to ensure we synchronize these with the compute
queue, since that is how we implement GPU event functionality.
Fixes: ecb01d53fd ("v3dv: refactor events")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19458>
These leaks on device creation failure have been there before, but
were only exposed as CTS failures after the recent event refactoring.
Partially fixes:
dEQP-VK.api.device_init.create_instance_device_intentional_alloc_fail.basic
dEQP-VK.api.object_management.alloc_callback_fail.device
dEQP-VK.api.object_management.alloc_callback_fail.device_group
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19458>
We are initializing the device, so we know this will be NULL.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19458>
Now that we implement the GPU-side event functions on the GPU we
no longer have the issue that prevented us from exposing
sync_fd.
Furthermore, new spec text has also made the problematic
behavior undefined, so the test that caused this issue,
dEQP-VK.api.external.semaphore.sync_fd.import_twice_temporary,
is incorrect and should be fixed.
It should be noted that we still keep sync_fd disabled in the
simulator, at least until the CTS tests are fixed, since the
synchronous execution model of the simulator means that in the
problematic scenario we can block the CPU on the execution
of the command buffer before we ever submit the signaling job,
still causing a deadlock.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19313>
This replaces our current implementation, which is 100% CPU based,
with an implementation that uses compute shaders for the GPU-side
event functions. The benefit of this solution is that we no longer
need to stall on the CPU when we need to handle GPU-side event
commands.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19313>
In Vulkan, we load descriptors via the vulkan resource index intrinsic,
which returns a vec2, of which we want component 0, which holds the
actual index. Typically, this will be cleaned up by the time we get to
emitting VIR so the index is a single scalar component, but there
are some cases where this might not be the case, so make sure we don't
assume it to be a scalar, like we do in other places.
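The general NIR-level idea, as a sketch rather than the driver's exact
code path (assume b is a nir_builder and res_index is the vec2 result
of the vulkan resource index intrinsic):
/* Explicitly take component 0, which holds the actual index, instead
 * of assuming res_index has already been reduced to a scalar. */
nir_ssa_def *index = nir_channel(&b, res_index, 0);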
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19313>
In 7f6ecb8667 we added reference counting for descriptor set layouts;
however, we didn't realize that pools created without the flag
VK_DESCRIPTOR_POOL_CREATE_FREE_DESCRIPTOR_SET_BIT don't allow freeing
individual descriptor sets and can only be reset or destroyed. Since we
only dropped references when individual descriptor sets were destroyed,
we would leak set layouts referenced from descriptor sets allocated
from these pools.
Fix that by keeping a list of all allocated descriptor sets (no matter
whether VK_DESCRIPTOR_POOL_CREATE_FREE_DESCRIPTOR_SET_BIT is present or
not) and traversing that list to drop the references when the pool is
reset or destroyed.
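A rough sketch of the approach, using Mesa's util/list.h and
hypothetical field/helper names:
/* On pool reset/destroy, walk every set allocated from the pool and
 * drop its reference on the set layout. */
list_for_each_entry_safe(struct v3dv_descriptor_set, set,
                         &pool->set_list, pool_link) {
   v3dv_descriptor_set_layout_unref(device, set->layout);
   list_del(&set->pool_link);
}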
Fixes: 7f6ecb8667 ('v3dv: add reference counting for descriptor set layouts')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19337>
nir_opt_gcm gets us worse shader-db stats, but that is expected. What we
want to prevent is getting worse values on spills/fills. Analyzing the
outcome with shader-db, this mostly happens with shaders that are
already complex and are already spilling/filling.
So the best option here is adding a new strategy that falls back if
we get spills/fills when using nir_opt_gcm.
It is not clear at which point in the strategy order we should disable
gcm; for now we disable it before loop unrolling.
We get a slight performance gain (on average) using nir_opt_gcm.
We don't show the shader-db stats, as they are worse, but as mentioned,
this is expected.
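A rough sketch of the fallback mechanism (helper and field names are
hypothetical; the real code iterates the driver's strategy table):
static struct v3d_compile *
compile_with_fallback(nir_shader *shader)
{
   for (unsigned i = 0; i < ARRAY_SIZE(strategies); i++) {
      struct v3d_compile *c =
         compile_with_strategy(shader, &strategies[i]);
      /* Keep the first result that doesn't spill; the last strategy
       * is the fallback and is always accepted. */
      if (c->spills == 0 || i == ARRAY_SIZE(strategies) - 1)
         return c;
      vir_compile_destroy(c);
   }
   return NULL;
}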
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>
That allows us to reduce the number of parameters of the method. And
after all, they were already filled from an existing strategy struct.
This will make it easier to add new fields to a strategy.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>
Instead of using a custom optimize_nir method with the same purpose.
Running the fossils for the well-known v3dv applications (ue4 demos,
Quake3d, etc.) we got a somewhat inconclusive outcome in general,
although slightly worse values:
Totals:
Instrs: 265129 -> 265277 (+0.06%); split: -0.06%, +0.12%
Thread Count: 5504 -> 5506 (+0.04%)
Totals from 153 (10.23% of 1495) affected shaders:
Instrs: 84603 -> 84751 (+0.17%); split: -0.19%, +0.37%
Thread Count: 316 -> 318 (+0.63%)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>
These are optimizations that we are already calling on the Vulkan
driver, as preparation for the Vulkan frontend to use v3d_optimize_nir
too.
We need to add a new parameter to v3d_optimize_nir in order to know if
we can call nir_opt_find_array_copies. As we don't track whether we
have already called nir_lower_var_copies, we explicitly call it when we
create the uncompiled shader. So instead of tracking it, we assume that
each driver (v3d/v3dv) calls it when the shader is created, and when
v3d_optimize_nir is called as part of the compile process in the
compiler, we call it with allow_copies set to false.
We exclude nir_opt_gcm on purpose, as it is a case of an optimization
that could help performance even if it hurts shader-db stats.
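A minimal sketch of how the new allow_copies parameter is meant to be
used, assuming this signature (most of the pass list is elided):
void
v3d_optimize_nir(struct v3d_compile *c, struct nir_shader *s,
                 bool allow_copies)
{
   bool progress;
   do {
      progress = false;
      /* Only look for array copies while var copies haven't been
       * lowered yet, i.e. right after the uncompiled shader was
       * created by the frontend. */
      if (allow_copies)
         NIR_PASS(progress, s, nir_opt_find_array_copies);
      NIR_PASS(progress, s, nir_lower_var_copies);
      /* ... remaining passes, some of them using c ... */
   } while (progress);
}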
shaderdb stats:
total instructions in shared programs: 11705923 -> 11705034 (<.01%)
instructions in affected programs: 88350 -> 87461 (-1.01%)
helped: 201
HURT: 80
Instructions are helped.
total threads in shared programs: 375552 -> 375558 (<.01%)
threads in affected programs: 6 -> 12 (100.00%)
helped: 3
HURT: 0
total uniforms in shared programs: 3486108 -> 3485789 (<.01%)
uniforms in affected programs: 7473 -> 7154 (-4.27%)
helped: 90
HURT: 1
Uniforms are helped.
total max-temps in shared programs: 2021860 -> 2021802 (<.01%)
max-temps in affected programs: 800 -> 742 (-7.25%)
helped: 21
HURT: 3
Max-temps are helped.
total sfu-stalls in shared programs: 19299 -> 19296 (-0.02%)
sfu-stalls in affected programs: 18 -> 15 (-16.67%)
helped: 10
HURT: 7
Inconclusive result (value mean confidence interval includes 0).
total inst-and-stalls in shared programs: 11725222 -> 11724330 (<.01%)
inst-and-stalls in affected programs: 88402 -> 87510 (-1.01%)
helped: 201
HURT: 80
Inst-and-stalls are helped.
total nops in shared programs: 269674 -> 269386 (-0.11%)
nops in affected programs: 3641 -> 3353 (-7.91%)
helped: 103
HURT: 29
Nops are helped.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>
For the non-SSA case, we were trying to use reg->num_components. But
this is not the same as nir_ssa_def_components_read: it is the
number of components of the destination register. And in the 16-bit
case, even if nir_lower_tex packs the outcome, it doesn't update the
number of components, as nir_tex_instr_dest_size would still return
4, and nir validate checks that those values are the same.
So this change focuses on the last part of this comment in
nir_lower_tex:
* Note that we don't change the destination num_components, because
* nir_tex_instr_dest_size() will still return 4. The driver is just
* expected to not store the other channels, given that nothing at the
* NIR level will read them.
We just limit how many channels we would use for the f16 case.
It is also worth noting, based on the CTS and the different
applications we test, that this is a corner case.
This was detected when we experimented with enabling nir_opt_gcm for
v3d, which led to raising an assertion slightly further below with some
shader-db tests, but technically it could happen without it.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>
For compute shaders, avoiding a crash with that optimization requires
doing some optimizations and lowerings beforehand. Example:
static void
lower_cs_shared(struct nir_shader *nir)
{
   NIR_PASS_V(nir, nir_lower_vars_to_explicit_types,
              nir_var_mem_shared, shared_type_info);
   NIR_PASS_V(nir, nir_lower_explicit_io,
              nir_var_mem_shared, nir_address_format_32bit_offset);
}
In the same way, other drivers (like anv) call
nir_opt_load_store_vectorize as part of their post-process NIR step.
So one option would be to move nir_opt_load_store_vectorize outside
the common v3d_optimize_nir, to a post-process NIR method.
To make things simpler, this change calls that optimization only if we
have a v3d_compiler object, that is, when each frontend has already
done its lowerings and calls the v3d_compiler to get the final
assembly (so we are already in a kind of post-process NIR step).
This avoids dEQP-VK.memory_model.shared.basic_types.3 crashing if we
start calling v3d_optimize_nir from v3dv directly.
Slight shader-db changes, but not significant.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>
Even if there is a slight difference in meaning between FIXME and
TODO, at some point we agreed to use just FIXME for all pending things
to do, just to make it easier to grep for things that can be done.
And after all, one could argue that if there is something pending TO
DO, it is because it needs FIXING.
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19225>
Let the returned error bubble up.
Fixes: dEQP-VK.api.device_init.create_instance_device_intentional_alloc_fail.basic
Fixes: 591103d04d ("v3dv: don't return incompatible driver if GPU is not present")
Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18901>
If the pipeline was created with the creation flags
VK_PIPELINE_CREATE_CAPTURE_STATISTICS_BIT_KHR or
VK_PIPELINE_CREATE_CAPTURE_INTERNAL_REPRESENTATIONS_BIT_KHR, it is
really likely that the methods from VK_KHR_pipeline_executable_properties
that require having the QPU instructions around will be called.
Instead of reading those back from the BO where we upload them, we
just keep them around. This could require more host memory, but it
lets us avoid having to map/unmap the BO on demand (which would need
the host memory anyway), something that can be tricky if those
methods are called from different threads, and so we also avoid
adding a mutex there.
In the same way, if the pipeline was not created with those flags, we
skip collecting the data that requires the QPU instructions. Only
GetPipelineExecutableProperties is allowed to be called without any of
those flags, and it doesn't require that info.
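A minimal sketch of the idea, with assumed field names on the
pipeline stage:
const VkPipelineCreateFlags capture_flags =
   VK_PIPELINE_CREATE_CAPTURE_STATISTICS_BIT_KHR |
   VK_PIPELINE_CREATE_CAPTURE_INTERNAL_REPRESENTATIONS_BIT_KHR;
if (pipeline->flags & capture_flags) {
   /* Keep a host copy of the QPU code so the executable-properties
    * queries never need to map the BO again. */
   p_stage->qpu_insts_size = qpu_size;
   p_stage->qpu_insts = malloc(qpu_size);
   memcpy(p_stage->qpu_insts, qpu_insts, qpu_size);
}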
This fixes a race condition crash in GetPipelineExecutableProperties
when using fossilize-replay with some fossils with several shaders
and several threads, as one thread could be unmapping the BO before
another thread had stopped using it.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18859>