On Wayland, if the wl_drm interface is not available, for example if the
compositor is using the proprietary NVIDIA driver along with their egl-wayland
library, the device_select layer will fail to initialize. However, the failure
path will unconditionally call wl_drm_destroy even though info.wl_drm would be
NULL in that case. This can cause a segfault in libwayland-client.so.
To fix this, check if info.wl_drm is NULL before calling wl_drm_destroy. This
way, initialization will fail gracefully even if that interface is not present.
Signed-off-by: Erik Kurzinger <ekurzinger@nvidia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10598>
Move it out of the "cs" sub-struct, since the bit can be used for
other shader stages in the future.
This also removes a subtle issue in spirv_to_nir:
info.cs.shared_memory_explicit_layout was used without checking for
the CS shader stage. It ended up being "harmless" since the effects
also depended on presence of shared variables.
Fixes: 5de6c5973a ("spirv: Implement SPV_KHR_workgroup_memory_explicit_layout")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10529>
We were implictly including it in exposing VK 1.2, but we weren't making
use of the supplied struct. Actually enabling it gives us a chance to do
slightly better at Z/S UBWC, and means we won't lose the separate usage
test coverage when switching back to exposing VK 1.1.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10594>
This makes us handle all non-aggregate types before we handle aggregate
types. This is going to matter in the next commit.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10372>
On Bifrost, fragment shaders might output either FP16 or FP32. The blend
shader will access the output as-is within the register, so depending on
the precision of the blend shader's logic, it may need to insert a
f2f16 or f2f32 conversion. This requires expanding the blend shader key.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10393>
We only want it to trigger if MRT is actually in use. This is a cheap
key (only require multiple variants for an obscure edge case) and avoids
the perf regression of using this pass which is needed for conformance.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10393>
Creating Android swapchain image from gralloc buffer requires to use
VkImageDrmFormatModifierExplicitCreateInfoEXT. To fill the struct info,
we need to query extended resource info from gralloc.
With the queried modifier from gralloc, we can ask the driver for the
plane count of the given format and modifier pair.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10553>
We allocate them all align16, so mark the unions (and their container
structs) that way so the compiler can do aligned SSE load/stores.
glmark2 -b loop FPS +0.197265% +/- 0.117633% (n=1906)
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10604>
When a TSY barrier is hit, the entire supergroup will be synchronized.
If the supergoup is large and uses all available QPU threads it would
mean that we would sychronize and stall all running threads until all
of them reach the barrier, which may be inefficient.
This patch makes it so that if the compute shader has any such barriers
we limit the supergroup size so each supergroup only takes half of the
QPU threads available at most, so that if one supergroup hits a
barrier we have at least one other supergroup we can run, reducing
idle QPU time.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10541>
Each supergroup executes a number batches. Each batch has 16 elements
(one per QPU lane), except possibly the last batch which might be
incomplete. Until now, we packed a single workgroup in each supergroup,
which can lead to more incomplete batches and less efficient use
of the QPUs depending on the configuration of workgroups being dispatched.
This patch computes a number of workgroups per supergroup so that
we reduce or completely eliminate incomplete batches if possible.
It should be noted however, that TSY barriers act on supergroups,
so larger supergroups lead to larger syncpoints on barriers too.
A follow-up patch will try to find a good balance for compute shaders
that use such barriers.
This improves performance of the Sascha Willem's computecloth demo
by ~13%.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10541>
The V3D hardware allows us to pack multiple workgroups together to avoid
wasting execution lanes in shader cores.
For example, if we dispatch 16 workgroups with a local size of 1 element, we
can pack all 16 workgroups in a single 16-wide dispatch where each lane
executes a different workgroup, instead of 16 1-wide dispatches.
When we do this, we don't have a uniform workgroup id any more.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10541>
Now that all drivers are using brw_cs_get_dispatch_info() we can
remove one function (which is now unused) and reduce the scope of the
other.
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10504>