For the non-SSA case, we were trying to use reg->num_components. But
this is not the same as nir_ssa_def_components_read: it is the
number of components of the destination register. In the 16-bit
case, even if nir_lower_tex packs the result, it doesn't update the
number of components, as nir_tex_instr_dest_size would still return
4, and nir_validate checks that those values are the same.
So this change focuses on the last part of this comment in
nir_lower_tex:
* Note that we don't change the destination num_components, because
* nir_tex_instr_dest_size() will still return 4. The driver is just
* expected to not store the other channels, given that nothing at the
* NIR level will read them.
We just limit how many channels we would use for the f16 case.
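A minimal sketch of that channel limit, assuming that a packed 16-bit result stores two half floats per 32-bit channel. The helper name and its parameters are hypothetical and not the actual v3d code:

```c
#include <stdbool.h>

/* Hypothetical helper: how many 32-bit return channels to consume for a
 * texture instruction whose destination is a non-SSA register.
 * reg_num_components stays at 4 for a 16-bit result, because
 * nir_lower_tex does not shrink the destination. */
static unsigned
tex_dest_channels_to_use(unsigned reg_num_components, bool is_f16)
{
        /* Assumption: two f16 values are packed per 32-bit channel, so
         * round up to whole channels; the rest are never read. */
        if (is_f16)
                return (reg_num_components + 1) / 2;
        return reg_num_components;
}
```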
It is also worth noting that, based on the CTS and the different
applications we test, this is a corner case.
This was detected when we experimented with enabling nir_opt_gcm for v3d,
which led to an assertion being raised slightly further down with some
shaderdb tests, but technically it could happen without it.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>
For compute shaders, avoiding a crash with that optimization requires
doing some optimizations and lowerings first. Example:
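    static void
    lower_cs_shared(struct nir_shader *nir)
    {
            /* Give shared variables explicit types with sizes and
             * alignments (shared_type_info is the driver's callback). */
            NIR_PASS_V(nir, nir_lower_vars_to_explicit_types,
                       nir_var_mem_shared, shared_type_info);
            /* Lower shared memory deref accesses to explicit I/O
             * using 32-bit offsets. */
            NIR_PASS_V(nir, nir_lower_explicit_io,
                       nir_var_mem_shared, nir_address_format_32bit_offset);
    }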
Similarly, other drivers (like anv) call
nir_opt_load_store_vectorize as part of their post-process NIR step.
So one option would be to move nir_opt_load_store_vectorize outside
the common v3d_nir_optimize, into a post-process NIR method.
To keep things simple, this change calls that optimization only if we
have a v3d_compiler object, that is, when each frontend has already
done its lowerings and is calling the v3d_compiler to get the final
assembly (so we are already in a kind of post-processing NIR step).
This avoids dEQP-VK.memory_model.shared.basic_types.3 crashing if we
start to call v3d_optimize_nir on v3dv directly.
Slight shaderdb changes, but not significant.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>
Even if there is a slight difference in meaning between FIXME and
TODO, at some point we agreed to use just FIXME for all pending things
to do, to make it easier to grep for things that can be done.
And after all, one could argue that if there is something pending TO
DO, it is because it needs FIXING.
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19225>
Let the error returned be bubbled up.
Fixes: dEQP-VK.api.device_init.create_instance_device_intentional_alloc_fail.basic
Fixes: 591103d04d ("v3dv: don't return incompatible driver if GPU is not present")
Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18901>
If the pipeline was created with the creation flags
VK_PIPELINE_CREATE_CAPTURE_STATISTICS_BIT_KHR or
VK_PIPELINE_CREATE_CAPTURE_INTERNAL_REPRESENTATIONS_BIT_KHR, it is
very likely that methods from VK_KHR_pipeline_executable_properties
that require access to the QPU instructions will be
called.
Instead of reading those back from the BO where we upload them, we
just keep them around. This could require more host memory, but it
allows us to avoid having to map/unmap the BO when needed (which
would need host memory in any case). Mapping can be tricky if those
methods are being called from different threads, so this way we also
avoid adding a mutex there.
Similarly, if the pipeline was not created with those flags, we
skip collecting data that requires the QPU instructions. Only
GetPipelineExecutableProperties is allowed to be called without any of
those flags, and it doesn't require that info.
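A minimal sketch of the "keep a host copy only when capture is requested" idea. The struct, the helper names and the two flag macros are hypothetical stand-ins (the macro values are placeholders, not the real Vulkan bit values):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder stand-ins for VK_PIPELINE_CREATE_CAPTURE_STATISTICS_BIT_KHR
 * and VK_PIPELINE_CREATE_CAPTURE_INTERNAL_REPRESENTATIONS_BIT_KHR. */
#define SKETCH_CAPTURE_STATISTICS_BIT (1u << 0)
#define SKETCH_CAPTURE_INTERNAL_REPRESENTATIONS_BIT (1u << 1)

/* Hypothetical per-variant data, not the real v3dv struct. */
struct sketch_shader_variant {
        uint64_t *qpu_insts;      /* host copy, NULL if not captured */
        uint32_t qpu_insts_size;  /* in bytes */
};

static bool
sketch_needs_qpu_copy(uint32_t create_flags)
{
        return create_flags & (SKETCH_CAPTURE_STATISTICS_BIT |
                               SKETCH_CAPTURE_INTERNAL_REPRESENTATIONS_BIT);
}

/* Keep a host copy of the compiled QPU code, only when a capture flag is
 * set, so later queries never need to map the BO. Returns false on
 * allocation failure. */
static bool
sketch_keep_qpu_insts(struct sketch_shader_variant *variant,
                      uint32_t create_flags,
                      const uint64_t *qpu_insts, uint32_t size)
{
        variant->qpu_insts = NULL;
        variant->qpu_insts_size = 0;
        if (!sketch_needs_qpu_copy(create_flags))
                return true;

        variant->qpu_insts = malloc(size);
        if (!variant->qpu_insts)
                return false;
        memcpy(variant->qpu_insts, qpu_insts, size);
        variant->qpu_insts_size = size;
        return true;
}
```

Because the copy is made once at pipeline creation and is only read afterwards, concurrent queries can read it without taking a lock.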
This fixes a race condition crash in GetPipelineExecutableProperties
when using fossilize-replay with fossils containing several shaders
and several threads, as one thread could unmap the BO
before another thread had finished using it.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18859>
Before this commit we were using individual pointers to each pipeline
stage struct. We did that instead of using an array because we also
needed a pointer for the binner stages, and at that time we didn't
have an enum to handle those stages.
Since then we introduced broadcom_shader_stage, and started to use it
in a lot of places (including per-stage arrays), so we can now use an
array. The main advantage is being able to handle several cases as
loops. This also adds some consistency to the code (because, as
mentioned, we use an array in a lot of other places).
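A minimal sketch of what the array buys us. The enum below is a simplified, hypothetical stand-in for broadcom_shader_stage (the real enum may have different members and order), and the structs are illustrative only:

```c
#include <stddef.h>

/* Simplified stand-in for broadcom_shader_stage. */
enum sketch_shader_stage {
        SKETCH_STAGE_VERTEX,
        SKETCH_STAGE_VERTEX_BIN,
        SKETCH_STAGE_FRAGMENT,
        SKETCH_STAGE_COMPUTE,
        SKETCH_STAGE_COUNT,
};

struct sketch_pipeline_stage {
        unsigned program_id;
};

struct sketch_pipeline {
        /* One slot per stage instead of separate vs, vs_bin, fs, cs
         * pointer fields. */
        struct sketch_pipeline_stage *stages[SKETCH_STAGE_COUNT];
};

/* What used to be one statement per named pointer becomes a loop. */
static unsigned
sketch_count_active_stages(const struct sketch_pipeline *pipeline)
{
        unsigned count = 0;
        for (int s = 0; s < SKETCH_STAGE_COUNT; s++) {
                if (pipeline->stages[s] != NULL)
                        count++;
        }
        return count;
}
```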
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18943>
This feature is only concerned with buffers bound through a descriptor
set. We are still keeping the code for this (disabled by default) since
it may be useful for debugging some scenarios.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18744>
Our implementation was bogus: it was only putting a cap on the offset
based on the aligned buffer size, and this doesn't ensure that the
access to the buffer happens within its valid range.
I think the only reason we have been passing the tests is that we
align all buffer sizes to 256B and the tests create buffers with a
size that is smaller than that (like 64B). When we get the size of the
buffer from the shader, we get the actual bound range (64B in this
case). Capping to that doesn't ensure the access will stay within
that range, but it does ensure it will stay within the underlying
memory bound to the buffer (256B), which is fine by the spec.
However, I think that if the actual buffer range were the same as the
underlying allocation, we would fail the tests.
A valid behavior for robust buffer access on an out-of-bounds access
is to return any valid bytes within the buffer, so we can just
make that offset 0.
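The arithmetic of that fallback, as a minimal sketch with hypothetical names (the driver does this in the shader, not in a C helper). It assumes access_size does not exceed the bound range, so that offset 0 is itself in bounds:

```c
#include <stdint.h>

/* If an access of access_size bytes at offset would not fit in the bound
 * range, redirect it to offset 0. When access_size > range, even offset
 * 0 is out of bounds; this sketch still returns 0 in that case. */
static uint32_t
sketch_robust_offset(uint32_t offset, uint32_t access_size, uint32_t range)
{
        /* Compare via a subtraction to avoid overflowing
         * offset + access_size. */
        if (access_size > range || offset > range - access_size)
                return 0;
        return offset;
}
```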
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18744>
This is the minimum required by KHR_maintenance4 and there is no
reason we can't support this.
The only restriction we have is that the texture state base
address (which comes into play with texel buffers) must be aligned
to 4-bits, but this doesn't restrict the size of the buffer, only
its base address, and we already have requirements for buffer
alignment that ensure this.
Fixes: dEQP-VK.api.info.vulkan1p3_limits_validation.khr_maintenance4
Fixes: 2c388c1d ('v3dv: set maxBufferSize property')
Acked-by: Eric Engestrom <eric@igalia.com>
Tested-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18748>
This extension was promoted to Vulkan 1.3 so we should be setting its
properties directly in the VkPhysicalDeviceVulkan13Properties struct
which the common mesa code will use to populate outgoing properties.
Apparently, only the properties struct was promoted and not the features
struct.
Reviewed-by: Eric Engestrom <eric@igalia.com>
Tested-by: Eric Engestrom <eric@igalia.com>
Fixes: ee62a4c751 ('v3dv: implement VK_EXT_texel_buffer_alignment')
Fixes: dEQP-VK.api.info.get_physical_device_properties2.properties.basic
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18697>
If we emit a ldunif to load the UBO/SSBO base address and
then immediately move it to the unifa register, we
can have the ldunif write directly to unifa and avoid the mov
in between. Copy propagation won't do this for us because it
only works with temp registers.
Also, since we can't read from unifa, we must be careful to disallow
reuse of the ldunif result for a future ldunif of the same base address.
We do that by only reusing ldunif results from temp registers.
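A minimal sketch of the two rules above on a hypothetical, heavily simplified IR (these types are illustrative only and are not the v3d VIR data structures):

```c
#include <stdbool.h>

enum sketch_file { SKETCH_FILE_TEMP, SKETCH_FILE_UNIFA };
enum sketch_op { SKETCH_OP_LDUNIF, SKETCH_OP_MOV };

struct sketch_reg { enum sketch_file file; unsigned index; };

struct sketch_inst {
        enum sketch_op op;
        struct sketch_reg dst;
        struct sketch_reg src;   /* unused by LDUNIF */
        unsigned uniform;        /* uniform slot, LDUNIF only */
};

/* If prev is "ldunif tN" and next is "mov unifa, tN", retarget the ldunif
 * to write unifa directly. Returns true when next can be dropped.
 * Assumes tN has no other readers. */
static bool
sketch_fold_ldunif_to_unifa(struct sketch_inst *prev,
                            const struct sketch_inst *next)
{
        if (prev->op != SKETCH_OP_LDUNIF || next->op != SKETCH_OP_MOV)
                return false;
        if (next->dst.file != SKETCH_FILE_UNIFA ||
            next->src.file != SKETCH_FILE_TEMP ||
            prev->dst.file != SKETCH_FILE_TEMP ||
            prev->dst.index != next->src.index)
                return false;
        prev->dst = next->dst;
        return true;
}

/* unifa cannot be read back, so only an ldunif that wrote a temp can be
 * reused for a later ldunif of the same uniform. */
static bool
sketch_can_reuse_ldunif(const struct sketch_inst *ldunif, unsigned uniform)
{
        return ldunif->op == SKETCH_OP_LDUNIF &&
               ldunif->uniform == uniform &&
               ldunif->dst.file == SKETCH_FILE_TEMP;
}
```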
total instructions in shared programs: 12468943 -> 12455139 (-0.11%)
instructions in affected programs: 1661233 -> 1647429 (-0.83%)
helped: 8307
HURT: 3994
total uniforms in shared programs: 3704532 -> 3704522 (<.01%)
uniforms in affected programs: 339 -> 329 (-2.95%)
helped: 7
HURT: 0
total max-temps in shared programs: 2148158 -> 2148290 (<.01%)
max-temps in affected programs: 9320 -> 9452 (1.42%)
helped: 175
HURT: 295
total spills in shared programs: 2202 -> 2202 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0
total fills in shared programs: 3059 -> 3057 (-0.07%)
fills in affected programs: 27 -> 25 (-7.41%)
helped: 1
HURT: 0
total sfu-stalls in shared programs: 21167 -> 21056 (-0.52%)
sfu-stalls in affected programs: 497 -> 386 (-22.33%)
helped: 209
HURT: 127
total inst-and-stalls in shared programs: 12490110 -> 12476195 (-0.11%)
inst-and-stalls in affected programs: 1662875 -> 1648960 (-0.84%)
helped: 8312
HURT: 3987
total nops in shared programs: 316563 -> 313553 (-0.95%)
nops in affected programs: 24269 -> 21259 (-12.40%)
helped: 2158
HURT: 1006
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18667>
It is possible for some signals to write to unifa directly. We will
enable this from ldunif shortly so we should check for it here.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18667>
We had a comment stating that we were using different program ids for render
and binning but this isn't true. We were only assigning ids to the render
stages and then we would create the binning stages and not assign a program id
to them at all, so they would remain with a program id of 0.
This change removes the comment and makes sure we assign the same program
id to the binning and render stages of the pipeline, which makes it a lot
easier to match render and binning shaders when debugging.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18630>
Instead, we should just return VK_SUCCESS. The physical device
won't be initialized and vkEnumeratePhysicalDevices will not
list it as available, which is the expected behavior here.
Also, VK_ERROR_INCOMPATIBLE_DRIVER is not a valid return code
from vkEnumeratePhysicalDevices, so we never return it; instead,
we return VK_ERROR_INITIALIZATION_FAILED if a valid device was
found but we failed to create the physical device for it.
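The resulting decision, as a minimal sketch. The result enum stands in for VkResult and the function and its parameters are hypothetical:

```c
#include <stdbool.h>

/* Stand-ins for VK_SUCCESS and VK_ERROR_INITIALIZATION_FAILED. */
enum sketch_result {
        SKETCH_SUCCESS,
        SKETCH_ERROR_INITIALIZATION_FAILED,
};

/* Hypothetical probe of one device during enumeration. */
static enum sketch_result
sketch_try_create_physical_device(bool device_found, bool create_ok,
                                  unsigned *num_devices)
{
        if (!device_found)
                return SKETCH_SUCCESS; /* nothing to list, not an error */
        if (!create_ok)
                return SKETCH_ERROR_INITIALIZATION_FAILED;
        (*num_devices)++;
        return SKETCH_SUCCESS;
}
```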
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Tested-By: Ryan Houdek <Sonicadvance1@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18591>
This extension adds new NONE attachment load/store operations,
which are identical to the DONT_CARE variants, with the difference
that DONT_CARE doesn't ensure that the original contents of the
memory within the render area are preserved, while these new versions
do (with some caveats).
Our implementation was not destroying data with DONT_CARE anyway,
so we already support the new semantics and don't need to do
anything specific for the new operations.
We pass all the tests under:
dEQP-VK.renderpass*.load_store_op_none.*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18570>
If the render area is not aligned to tile boundaries it means we have partially
covered tiles in the framebuffer. In this case, we always need to load the tile
buffer from memory in order to preserve the contents outside the render area
on the tile buffer store. However, if in this scenario we know we won't be
storing the tile buffer we can skip the load safely.
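A minimal sketch of that decision for a single attachment. The struct and function are hypothetical, and the real v3dv logic takes more state into account (clears, depth/stencil aspects, etc.):

```c
#include <stdbool.h>

struct sketch_attachment_state {
        bool load_op_is_load;  /* e.g. VK_ATTACHMENT_LOAD_OP_LOAD */
        bool will_store;       /* tile buffer is stored back to memory */
};

static bool
sketch_needs_tile_load(const struct sketch_attachment_state *att,
                       bool render_area_tile_aligned)
{
        if (att->load_op_is_load)
                return true;
        /* Partially covered tiles: pixels outside the render area must be
         * loaded so the tile store preserves them, but only if a store
         * will actually happen. */
        return !render_area_tile_aligned && att->will_store;
}
```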
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18570>
The first argument is the name of the library, and the second argument
is the list of files; those two got a bit mixed up.
Fixes: 1ae8018a6a ("meson: Add support for the vc4 driver.")
Fixes: 4f3e380fa0 ("meson: Add support for the vc5 driver.")
Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18593>
The hardware supports primitive restart for list primitives and we pass
all the relevant CTS tests.
We don't advertise patch list restarts because we don't support
tessellation shaders yet.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18544>
We don't have any special requirements for this, so we can just expose
the extension.
The tests in CTS have an issue where they only check if a format is
supported for sampling but don't check if an image with that format
can be created for sampling. In our case, since we can't sample
1D depth/stencil images, this causes affected tests to crash in the
simulator (they pass on the device though). There is an issue with
a fix here:
https://gitlab.khronos.org/Tracker/vk-gl-cts/-/issues/3923
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18489>
The VC4 and Lima Piglit failures seem to mostly fall into two camps:
1. The hardware lacks sRGB support, but the drivers decide to expose it
nevertheless, with varying levels of emulation. This leads to some
failures, probably because we're missing sRGB decoding somewhere.
2. The spec@ext_texture_compression_s3tc@compressedteximage test fails,
mostly due to the test not setting the mipfilter to nearest. With
that fixed, the test passes on VC4, but still fails on Lima due to a
somewhat dodgy miplod bias in the driver.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18180>
If we fail to initialize the pipeline, it will then be destroyed,
and if we had already assigned the layout to the pipeline, the destroy
path will try to unref it, so make sure we ref it right after assigning it.
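A minimal sketch of why the reference must be taken at assignment time. The refcounted types and functions below are hypothetical stand-ins for the v3dv objects:

```c
#include <assert.h>
#include <stddef.h>

struct sketch_layout { int refcount; };
struct sketch_gfx_pipeline { struct sketch_layout *layout; };

static void sketch_layout_ref(struct sketch_layout *l) { l->refcount++; }

static void
sketch_layout_unref(struct sketch_layout *l)
{
        assert(l->refcount > 0);
        l->refcount--;
}

/* Destroy path: unrefs whatever layout has been assigned. */
static void
sketch_pipeline_destroy(struct sketch_gfx_pipeline *p)
{
        if (p->layout)
                sketch_layout_unref(p->layout);
}

static int
sketch_pipeline_init(struct sketch_gfx_pipeline *p,
                     struct sketch_layout *layout, int simulate_failure)
{
        p->layout = layout;
        /* Ref immediately, so that a failure below followed by
         * sketch_pipeline_destroy() keeps the refcount balanced. */
        sketch_layout_ref(layout);

        if (simulate_failure)
                return -1;
        return 0;
}
```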
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7206
Fixes: dEQP-VK.api.object_management.alloc_callback_fail_multiple.graphics_pipeline
Fixes: 14dab6b10c ('v3dv: ref/unref pipeline layout objects')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18426>
We were computing these from the final swizzle resulting from
combining the format swizzle and the view swizzle, but here we
want to use the format swizzle alone, which is the one we
use to define these properties in the format table.
Fixes CTS test failures with EXT_border_color_swizzle.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18416>
The best way to tune this value is to test Vulkan
applications. The current, somewhat large value (512) was obtained by
testing only vkQuake2. Additionally, at that time the BO cache was the
first performance-oriented improvement we implemented.
After more improvements were included, and after retesting with more
applications, the conclusion is that we can reduce the value. More
info in the issue this closes.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7090
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18398>
Acked-by: Juan A. Suarez <jasuarez@igalia.com> # for broadcom
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> # for zink
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18318>
This is the standard pattern in the kernel for providing vfunc tables
for C objects. We're using it in the pipeline cache code but we're
about to start adding more stuff and so it really helps if we have it
for command buffers as well.
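A minimal sketch of the vfunc table pattern described above: a const table of function pointers shared by all objects of one kind, and a pointer to it stored in each object. All names here are illustrative, not the actual Mesa structs:

```c
#include <stddef.h>

struct sketch_cmd_buffer;

struct sketch_cmd_buffer_ops {
        void (*reset)(struct sketch_cmd_buffer *cmd_buffer);
        void (*destroy)(struct sketch_cmd_buffer *cmd_buffer);
};

struct sketch_cmd_buffer {
        const struct sketch_cmd_buffer_ops *ops;
        unsigned reset_count;
};

static void
sketch_default_reset(struct sketch_cmd_buffer *cmd_buffer)
{
        cmd_buffer->reset_count++;
}

static const struct sketch_cmd_buffer_ops sketch_default_ops = {
        .reset = sketch_default_reset,
        .destroy = NULL,   /* optional hook left unimplemented */
};

/* Common code dispatches through the table without knowing the driver. */
static void
sketch_cmd_buffer_reset(struct sketch_cmd_buffer *cmd_buffer)
{
        if (cmd_buffer->ops && cmd_buffer->ops->reset)
                cmd_buffer->ops->reset(cmd_buffer);
}
```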
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18324>
Most other init functions follow the Vulkan API convention of putting
the parent object first.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18324>