Now that we implement GPU-side event functions in the GPU we
no longer have the issue that didn't allow us to expose
sync_fd.
Further more, new spec text has also made the problematic
behavior undefined, so the test that caused this issue,
dEQP-VK.api.external.semaphore.sync_fd.import_twice_temporary,
is incorrect and should be fixed.
It should be noted that we still keep sync_fd disabled in the
simulator, at least until the CTS tests are fixed, since the
synchronous execution model of the simulator means that in the
problematic scenario we can block the CPU on the execution
of the command buffer before we ever submit the signaling job,
still causing a deadlock.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19313>
This replaces our current implementation, which is 100% CPU based,
with an implementation that uses compute shaders for the GPU-side
event functions. The benefit of this solution is that we no longer
need to stall on the CPU when we need to handle GPU-side event
commands.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19313>
In vulkan, we load descriptors via vulkan resource index, which
returns a vec2, of which we want component 0 which holds the actual
index. Typically, this will be cleaned-up by the time we get to
emitting VIR so the index is a single scalar component, but there
are some cases where this might no be the case, so make sure we don't
assume it to be a scalar, like we do in other places.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19313>
In 7f6ecb8667 we added reference counting for descriptor set layouts,
however, we didn't realize that pools created without the flag
VK_DESCRIPTOR_POOL_CREATE_FREE_DESCRIPTOR_SET_BIT don't free individual
descriptors and can only be reset or destroyed. Since we only drop
references when individual descriptor sets were destroyed, we would
leak set layouts referenced from descriptor sets allocated from these
pools.
Fix that by keeping a list of all allocated descriptor sets (no matter
whether VK_DESCRIPTOR_POOL_CREATE_FREE_DESCRIPTOR_SET_BIT is present or
not) and then traversing the list dropping the references on pool resets
and destroys.
Fixes: 7f6ecb8667 ('v3dv: add reference counting for descriptor set layouts')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19337>
nir_opt_gcm get us worse shader-db stats, but that is expected. But we
want to prevent to get worse values on spill/fills. Analyzing the
outcome with shader-db, this mostly happen with shaders that are
already complex, and are already spilling/filling.
So the best option here is adding a new strategy, that fall backs if
we get spill/fill using nir_opt_gcm.
It is not clear in which order we should disable gcm. For now we
disable it before loop unrolling.
We get a slight performance gain (in average) using nir_opt_gcm.
We don't show the shaderdb stats, as they are worse, but as mentioned,
this is expected.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>
That allows to reduce the number of parameters of the method. And
after all, they were already filled using an existing strategy struct.
This would make easier adding new fields on a strategy.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>
Instead of using a custom optimize_nir method, with the same purpose.
Running the fossils for the v3dv well know applications (ue4 demos,
Quake3d, etc) we got somewhat inconclusive outcome in general,
although slightly worse values:
Instrs: 265129 -> 265277 (+0.06%); split: -0.06%, +0.12%
Thread Count: 5504 -> 5506 (+0.04%)
Totals from 153 (10.23% of 1495) affected shaders:
Instrs: 84603 -> 84751 (+0.17%); split: -0.19%, +0.37%
Thread Count: 316 -> 318 (+0.63%)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>
Optimizations that we are already calling on the Vulkan driver. As
preparation to the Vulkan frontend to use v3d_optimize_nir too.
We need to add a new parameter to v3d_optimize_nir in order to know if
we can call nir_opt_find_array_copies. As we don't track if we are
calling nir_var_lower_copies, we explicitly call it when we create the
uncompiled shader create. So instead of tracking, we assume that each
driver (v3d/v3dv) would call it when the shader is created. So when
v3d_optimize_nir is called as part of the process to compile it at the
compiler, we call it with allow_copies as false.
We exclude on purpose nir_opt_gcm as it is a case of a optimization
that could help performance even if it hurts shader db stats.
shaderdb stats:
total instructions in shared programs: 11705923 -> 11705034 (<.01%)
instructions in affected programs: 88350 -> 87461 (-1.01%)
helped: 201
HURT: 80
Instructions are helped.
total threads in shared programs: 375552 -> 375558 (<.01%)
threads in affected programs: 6 -> 12 (100.00%)
helped: 3
HURT: 0
total uniforms in shared programs: 3486108 -> 3485789 (<.01%)
uniforms in affected programs: 7473 -> 7154 (-4.27%)
helped: 90
HURT: 1
Uniforms are helped.
total max-temps in shared programs: 2021860 -> 2021802 (<.01%)
max-temps in affected programs: 800 -> 742 (-7.25%)
helped: 21
HURT: 3
Max-temps are helped.
total sfu-stalls in shared programs: 19299 -> 19296 (-0.02%)
sfu-stalls in affected programs: 18 -> 15 (-16.67%)
helped: 10
HURT: 7
Inconclusive result (value mean confidence interval includes 0).
total inst-and-stalls in shared programs: 11725222 -> 11724330 (<.01%)
inst-and-stalls in affected programs: 88402 -> 87510 (-1.01%)
helped: 201
HURT: 80
Inst-and-stalls are helped.
total nops in shared programs: 269674 -> 269386 (-0.11%)
nops in affected programs: 3641 -> 3353 (-7.91%)
helped: 103
HURT: 29
Nops are helped.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>
For the non-ssa case, we were trying to use reg->num_components. But
this is not the same that nir_ssa_def_components_read. It is the
number of components of the destination register. And in the 16bit
case, even if nir_lower_tex packs the outcome, it doesn't update the
number of components, as nir_tex_instr_dest_size would still return
4. And nir validate would check that those values are the same.
So this change focuses on the last part of this comment at
nir_lower_tex:
* Note that we don't change the destination num_components, because
* nir_tex_instr_dest_size() will still return 4. The driver is just
* expected to not store the other channels, given that nothing at the
* NIR level will read them.
We just limit how many channels we would use for the f16 case.
It is also worth to note, based on the CTS and different applications
we test, that this is a corner case.
This was detected when we experimented to enable nir_opt_gcm for v3d,
that lead to raise an assertion slightly below with some shaderdb
tests, but technically it could happen without it.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>
For compute shaders, to avoid a crash with that optimization, it requires
doing some optimizations and lowerings before. Example:
static void
lower_cs_shared(struct nir_shader *nir)
{
NIR_PASS_V(nir, nir_lower_vars_to_explicit_types,
nir_var_mem_shared, shared_type_info);
NIR_PASS_V(nir, nir_lower_explicit_io,
nir_var_mem_shared, nir_address_format_32bit_offset);
}
In the same way other drivers (like anv) calls
nir_opt_load_store_vectorize as part of their post-process-nir.
So one option would be to move nir_opt_load_store_vectorize outsize
the common v3d_nir_optimize, to a post-process nir method.
To make things simpler, this change calls that optimization only if we
have a v3d_compiler object, that is when each frontend has already
done their lowerings, and call the v3d_compiler to get the final
assembly (so we are already on a kind of post processing nir step).
This avoids dEQP-VK.memory_model.shared.basic_types.3 crashing if we
start to call v3d_optimize_nir on v3dv directly.
Slight shaderdb changes, but not significant.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>
Even if there is a slight difference of meaning between FIXME and
TODO, at some point we agreed to use just FIXME for all pending things
to do, just to make it easier to grepping for things that can be done.
And after all, one could argue that is there is something pending TO
DO, is that needs FIXING.
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19225>
Let the error returned be bubbled up.
Fixes: dEQP-VK.api.device_init.create_instance_device_intentional_alloc_fail.basic
Fixes: 591103d04d ("v3dv: don't return incompatible driver if GPU is not present")
Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18901>
If the pipeline was created with the creation flags
VK_PIPELINE_CREATE_CAPTURE_STATISTICS_BIT_KHR or
VK_PIPELINE_CREATE_CAPTURE_INTERNAL_REPRESENTATIONS_BIT_KHR it is
really likely that methods from VK_KHR_pipeline_executable_properties
that would require having access to the qpu insts around will be
called.
Instead of getting those back from the BO where we upload them, we
just keep them around. This could require more host memory, but would
allow us to avoid needing to handle map/unmap the BO when needed (so
needing the host memory in any case). This can be tricky if those
methods are being called from different threads (so we can avoid
adding a mutex there).
In the same way, if the pipeline was not created with those flags, we
skip collecting data that requires the QPU. Only
GetPipelineExecutableProperties is allowed to be called without any of
those flags, and doesn't require that info.
This fixes a race condition crash at GetPipelineExecutableProperties
when using fossilize-replay with some fossils with several shaders,
and using several threads, as some thread would be unmapping the bo
before other thread stopped to use it.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18859>
Before this commit we were using individual pointers to each pipeline
stage struct. We did that instead of an array because we needed to had
a pointer for the binner stages too, and at that time we didn't have a
enum to handle those stages.
Since then we introduced broadcom_shader_stage, and started to use in
a lot of places (and per-stage arrays) so we can now use an array.
The main advantage is being able to handle several cases as
loops. This also adds some consistency to the code (because as
mentioned, in a lot of other places we use an array).
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18943>
This feature is only concerned with buffers bound through a descriptor
set. We are still keeping the code for this (disabled by default) since
it may be useful for debugging some scenarios.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18744>
Our implemention was bogus, it was only putting a cap on the offset
based on the aligned buffer size and this doesn't ensure the access
to the buffer happens within its valid range.
I think the only reason we have been passing the tests is that we
align all buffers sizes to 256B and the tests create buffers with a
size that is smaller than that (like 64B). When get the size of the
buffer from the shader, we get the actual bound range (so 64B in this
case) and by capping to that we don't ensure the access will stay
within that range, but we ensure it will stay within the underlying
memory bound to the buffer (256B), and this is fine by the spec,
however, I think if the actual buffer range was the same as the
underlying allocation we would fail the tests.
A valid behavior for robust buffer access on an out-of-bounds access
is to return any valid bytes within the buffer, so we can just
make that offset 0.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18744>
This is the minimum required by KHR_maintenance4 and there is no
reason we can't support this.
The only restriction we have is that the texture state base
address (which comes into play with texel buffers) must be aligned
to 4-bits, but this doesn't restrict the size of the buffer, only
its base address, and we already have requirements for buffer
alignment that ensure this.
Fixes: dEQP-VK.api.info.vulkan1p3_limits_validation.khr_maintenance4
Fixes: 2c388c1d ('v3dv: set maxBufferSize property')
Acked-by: Eric Engestrom <eric@igalia.com>
Tested-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18748>
This extension was promoted to Vulkan 1.3 so we should be setting its
properties directly in the VkPhysicalDeviceVulkan13Properties struct
which the common mesa code will use to populate outgoing properties.
Apparently, only the properties struct was promoted and not the features
struct.
Reviewed-by: Eric Engestrom <eric@igalia.com>
Tested-by: Eric Engestrom <eric@igalia.com>
Fixes: ee62a4c751 ('v3dv: implement VK_EXT_texel_buffer_alignment')
Fixes: dEQP-VK.api.info.get_physical_device_properties2.properties.basic
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18697>
If we emit a ldunif to load the ubo/ssbo base address and
then we are immediately moving it to the unifa register we
can have the ldunif write directly to unifa and avoid the mov
in between, which won't be done by copy propagation because that
only works with temp registers.
Also, since we can't read from unifa we must be careful to disallow
reuse of the ldunif result for a future ldunif of the same base address.
We do that by only reusing ldunif results from temp registers.
total instructions in shared programs: 12468943 -> 12455139 (-0.11%)
instructions in affected programs: 1661233 -> 1647429 (-0.83%)
helped: 8307
HURT: 3994
total uniforms in shared programs: 3704532 -> 3704522 (<.01%)
uniforms in affected programs: 339 -> 329 (-2.95%)
helped: 7
HURT: 0
total max-temps in shared programs: 2148158 -> 2148290 (<.01%)
max-temps in affected programs: 9320 -> 9452 (1.42%)
helped: 175
HURT: 295
total spills in shared programs: 2202 -> 2202 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0
total fills in shared programs: 3059 -> 3057 (-0.07%)
fills in affected programs: 27 -> 25 (-7.41%)
helped: 1
HURT: 0
total sfu-stalls in shared programs: 21167 -> 21056 (-0.52%)
sfu-stalls in affected programs: 497 -> 386 (-22.33%)
helped: 209
HURT: 127
total inst-and-stalls in shared programs: 12490110 -> 12476195 (-0.11%)
inst-and-stalls in affected programs: 1662875 -> 1648960 (-0.84%)
helped: 8312
HURT: 3987
total nops in shared programs: 316563 -> 313553 (-0.95%)
nops in affected programs: 24269 -> 21259 (-12.40%)
helped: 2158
HURT: 1006
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18667>
It is possible for some signals to write to unifa directly. We will
enable this from ldunif shortly so we should check for it here.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18667>
We had a comment stating that we were using different program ids for render
and binning but this isn't true. We were only assigning ids to the render
stages and then we would create the binning stages and not assign a program id
to them at all, so they would remain with a program id of 0.
This change removes the comment and makes sure we assign the same program
id to the binning and render stages of the pipeline, which makes it a lot
easier to match render and binning shaders when debugging.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18630>
Instead, we should just return VK_SUCCESS. The physical device
won't be initialized and vkEnumeratePhysicalDevices will not
list it as available, which is the expected behavior here.
Also, VK_ERROR_INCOMPATIBLE_DRIVER is not a valid return code
from vkEnumeratePhysicalDevices, so never return that, instead
we return VK_ERROR_INITIALIZATION_FAILED if a valid device was
found but we failed to create the physical device for it.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Tested-By: Ryan Houdek <Sonicadvance1@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18591>
This extension adds new NONE attachment load / store operations,
which are identical to the DONT_CARE variants with the difference
that DONT_CARE doesn't ensure that the original contents of the
memory within the render area are preserved and these new versions
do (with some caveats).
Our implementation was not destroying data with DONT_CARE anyway
so we already support the new semantics. Our implementation is
such that we don't need to do anything specific with the new
operations and the current behavior will do what is expected.
We pass all the tests under:
dEQP-VK.renderpass*.load_store_op_none.*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18570>
If the render area is not aligned to tile boundaries it means we have partially
covered tiles in the framebuffer. In this case, we always need to load the tile
buffer from memory in order to preserve the contents outside the render area
on the tile buffer store. However, if in this scenario we know we won't be
storing the tile buffer we can skip the load safely.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18570>
The first argument is the name of the library, and the second argument
is the list of files; those two got a bit mixed up.
Fixes: 1ae8018a6a ("meson: Add support for the vc4 driver.")
Fixes: 4f3e380fa0 ("meson: Add support for the vc5 driver.")
Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18593>
The hw supports restarts of list primmitives and we pass
all the relevant CTS tests.
We don't advertise patch list restarts because we don't support
tessellation shaders yet.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18544>
We don't have any special requirements for this, so we can just expose
the extension.
The tests in CTS have an issue where they only check if a format is
supported for sampling but don't check if an image with that format
can be created for sampling. In our case, since we can't sample
1D depth/stencil images, this causes affected tests to crash in the
simulator (they pass on the device though). There is an issue with
a fix here:
https://gitlab.khronos.org/Tracker/vk-gl-cts/-/issues/3923
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18489>
The VC4 and Lima Piglit failures seems to mostly fall in two camps:
1. The hardware lacks sRGB support, but the drivers decide to expose it
nevertheless, with some varying level of emulation. This leads to some
failures, probably because we're missing sRGB decoding somewhere.
2. The spec@ext_texture_compression_s3tc@compressedteximage fails,
mostly due to the test not setting the mipfilter to nearest. With
that fixed, the test passes on VC4, but still fails on Lima due to an
a bit dodgy miplod bias in the driver.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18180>
If we fail to init the pipeline the callee will then destroy it
and if we had assigned the layout to the pipeline it will try to
unref it, so make sure we ref it right after assigning it.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7206
Fixes: dEQP-VK.api.object_management.alloc_callback_fail_multiple.graphics_pipeline
Fixes: 14dab6b10c ('v3dv: ref/unref pipeline layout objects')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18426>