Venus query records have been properly propagated from nested cmds
already, so no special care is needed here for qfb optimizations.
Test:
- dEQP-VK.api.command_buffers.*nested*
- dEQP-VK.conditional_rendering.*nested*
- dEQP-VK.draw.dynamic_rendering.nested_*
- dEQP-VK.multiview.*nested*
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33992>
Stride can be zero if there are less than two queries to copy.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Fixes: 7755c41b3e ("panvk/csf: Rework the occlusion query logic to avoid draw flushes")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34020>
This matches Intel driver. Chromium always sets RT format to YUV420
which would cause us to not report other formats as supported.
Only check that the RT format is actually supported when creating
config, but don't limit supported surface formats.
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34001>
This allows the exported fds to be mapped for writing. This is needed
for virtgpu native ctx support where the fds are mapped rw when the
mappings are added to the guest by kvm. This aligns with other mesa
drivers, and unblocks the extended testing with venus on top.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34017>
While the idea to take advantage of the higher throughput wasn't bad,
the hardware wasn't design with this in mind and doesn't behave like expected
with constant sources.
Fixes: bee487df48 ("aco/gfx11.5+: use vinterp for fddx/fddy")
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33969>
We also want to run Android CTS in the Android jobs.
Since the Android CTS is quite large, download it and strip it down to
only contain the interesting tests, so to reduce the space taken in the
container image.
Eventually we might want to have android-cts be run via deqp-runner
itself, but for now add a proof-of-concept mechanism which calls the
android-cts directly and uses an ad-hoc handling of expectations.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33499>
This simplifies the propagation of the protection value, we just set
it on buffer->address at creation time and forget about it.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33909>
This was recently broken because pipelineCacheUUID was computed using
the physical device cache key. This caused SteamOS precompilation to
not happen for games that have shaders-based drirc.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33875>
There are a lot of things that can't be tested outside of the driver,
like drirc workarounds, RADV_DEBUG options and debugging stuff.
Writing RADV specific tests would help to avoid introducing regressions.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33875>
A significant CPU performance bottleneck in mesa GL is refcounting atomics:
even with the current pinning attempts, eliminating them can yield huge
performance gains (easily verified by running drawoverhead with return false at the top of pipe_reference_described()).
This is a proof of concept for removing refcounts from gallium objects,
namely sampler views and resources. Sampler views were smaller in scope,
so I started there. This MR alone is not expected to noticeably affect
performance, though if applied to all drivers/frontends,
it would enable a bunch of code deletion for crazy samplerview
refcounting hacks currently used to try circumventing the existing overhead.
Co-authored-by: Karol Herbst <kherbst@redhat.com
Co-authored-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33813>
In 4064b5546b, the idea was to have the minimum value as if all
stages are active, however hwconfig does not follow that for the
tessellation control stage. Ignore min values from hwconfig.
Fixes: 4064b5546b ("intel/dev: reduce warning noise from urb settings");
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33554>
When translating atomic instructions, the base type of the imageView can be
float only for image_atomic_exchange. If a float type image is used with other
atomic instructions the results are undefined.
Enforce this check in the shader translator and don't emit any instruction if
it fails.
Fixes crash in piglit test arb_shader_image_load_store@invalid.
Signed-off-by: Maaz Mombasawala <maaz.mombasawala@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33749>
The nir to tgsi translator flattens all constants in the nir shader into uint32
immediates. In the svga driver, the vgpu10 shader translator then packs all
these immediates into a constant buffer, and also optimizes it to prevent
repetitions by only emitting a 32-bit constant once.
This can cause problems with double sized constants, since either the lower or
higher 32-bits of different 64-bit constant can be identical, and in the constant
buffer that repeating 32-bit value will be emitted only once, so a 64-bit
constant gets split into two non-contiguous 32-bit values.
When this 64-bit constant is then invoked by a double instruction live ddiv or
dmul, the source register can now have invalid swizzles like .xz or .xw since
its 32-bit components are not contiguous.
We have seen this happen in the piglit test -
spec@arb_gpu_shader_fp64@execution@glsl-fs-loop-unroll-mul-fp64
which emits invalid swizzle values for double instructions.
To fix this, introduce a new option in nir to tgsi shader translator that
preserves uint64 constants. When a 64-bit immediate is translated into svga
shader code, its 32-bit components are contiguous and aligned in the constant
buffer, so accessing them only emits valid swizzles .xy and .zw.
Other drivers using the nir to tgsi shader translater should not see any change
in the tgsi shader emitted unless they too explicitly invoke the
keep_double_immediates option like svga.
Signed-off-by: Maaz Mombasawala <maaz.mombasawala@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33749>
During translation of tgsi shaders to vgpu10 shader code that is sent to
svga device, we may get as input a double instruction with incorrect
swizzles such as xzxz. In this case we have a workaround to move the value
in that register to a temporary register with an xyzw swizzle.
However the functions that check if the instruction has double source or
destination did not check for all instructions, such as DDIV, so if
incorrect swizzles are sent in the shader tgsi code then the same
incorrect swizzle is also emitted in the vgpu10 shader code.
Fix this by adding all the double instructions in double checking
functions.
Signed-off-by: Maaz Mombasawala <maaz.mombasawala@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33749>
This workaround is similar to anv_assume_full_subgroups, but it applies
to the shaders that use shared memory. If they rely on the implicit
synchronization, and we choose a smaller group size than the
(broken) shader expects, it will produce incorrect results.
Cc: mesa-stable
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23408>