Currently the display FD is opened twice because pvr_winsys_create()
is called twice. However, the WSI (which does the modeset on the
display FD in the VK_KHR_display case) is registered against the winsys
created at PhysicalDevice enumeration time, and the display FD opened at
Device creation time is only used for allocating dumb buffers (which
does not require master privilege).
Add a parameter to pvr_winsys_create() to indicate whether the master
privilege is desired on the display FD, and pass true only when creating
the winsys for PhysicalDevice initialization.
Fixes VK_KHR_display operation on the PowerVR driver, which has been
broken since the WSI code started to drop master in commit 870e233ca5
("vulkan/wsi/display: Avoid holding drm master for the device's fd.").
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/15161
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40640>
lower_sx10_external and lower_sx12_external are used for
LSB-aligned formats such as DRM_FORMAT_S010, which are typically
used by software decoders. Unlike the MSB-aligned 10/12-bit formats
used by hardware decoders, such as P010, they need to be manually
"shifted" in order to correctly map to the 0-1 range.
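The difference between the two alignments can be sketched in plain C.
This is an illustration only, not the NIR lowering itself: the helper
names are hypothetical, and it assumes the 16-bit container is sampled
as UNORM16 and rescaled so that the largest code maps to 1.0.

```c
#include <assert.h>
#include <stdint.h>

/* A 16-bit container sampled as UNORM16 yields raw / 65535. */
static double
sample_unorm16(uint16_t raw)
{
   return (double)raw / 65535.0;
}

/* LSB-aligned: the code sits in the low `bits` bits, so the sampled
 * value must be rescaled for the largest code to land on 1.0. */
static double
lsb_aligned_to_unit(uint16_t raw, unsigned bits)
{
   assert(bits >= 1 && bits <= 16);
   const double max_code = (double)((1u << bits) - 1u);
   return sample_unorm16(raw) * (65535.0 / max_code);
}

/* MSB-aligned (P010-style): the code sits in the high bits, so the
 * sampled value is already close to the 0-1 range without rescaling. */
static double
msb_aligned_to_unit(uint16_t code, unsigned bits)
{
   assert(bits >= 1 && bits <= 16);
   return sample_unorm16((uint16_t)(code << (16 - bits)));
}
```

Without the rescale, the largest 10-bit LSB-aligned code would sample
as roughly 0.016 instead of 1.0.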
In the commit mentioned below the corresponding code got removed,
probably because it got confused with similar sounding code in
the common path - and because we don't have tests on the CI for the
affected formats yet.
Note: the formats in question are not yet supported in Vulkan.
Fixes: 5127568b98 ("compiler/nir: use common ycbcr math")
Signed-off-by: Robert Mader <robert.mader@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40561>
These are nightly-only jobs that run full VKCTS. The main advantage is
that we get mesh shader coverage on NAVI31/GFX1201. It's still not
possible to enable that on pre-merge because of random GPU hangs.
Expect random GPU hangs on NAVI31/GFX1201 nightly jobs but I think
it's better than no coverage at all.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40626>
These jobs only skip the tests that are known to hang. The timeout is
also increased to 120s.
Also rename them to -full for less confusion.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40626>
The kernel uses an updated buffer format for EU stall sampling on xe3p
GPUs, so this updates intel_monitor to use the correct format while
leaving room for any future format updates.
This also addresses an issue where the formatted structure was not
packed with the correct macro, which led to incorrect offsets being
used when parsing the buffer.
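Why a missing packing attribute shifts offsets can be shown with a
small sketch. The record layout below is hypothetical (it is not the EU
stall record from the BSpec), and `__attribute__((packed))` is the
GCC/Clang spelling, assumed here in place of whatever macro the tool
actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record, for illustration only. Without packing, the
 * compiler inserts padding after `tag` to align `count`. */
struct record_unpacked {
   uint8_t  tag;
   uint32_t count;
   uint16_t flags;
};

/* Same fields, but laid out back to back like a raw buffer. */
struct __attribute__((packed)) record_packed {
   uint8_t  tag;
   uint32_t count;
   uint16_t flags;
};

/* Read `count` from a raw byte buffer that has no padding. */
static uint32_t
read_count(const uint8_t *buf, size_t size)
{
   assert(size >= sizeof(struct record_packed));
   struct record_packed rec;
   memcpy(&rec, buf, sizeof(rec));
   return rec.count;
}
```

Parsing the same bytes through `record_unpacked` would read `count`
from a later offset and return unrelated data.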
BSpec: 79847
v2: Add BSpec reference number, suggested by Lionel
Signed-off-by: Casey Bowman <casey.g.bowman@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40622>
For depth clears of sysmem attachments to work properly, additional
register state is required in tu_clear_sysmem_attachments().
Fixes various CTS tests on a8xx:
- dEQP-VK.conditional_rendering.draw_clear.clear.depth.*
- dEQP-VK.api.image_clearing.core.clear_depth_stencil_attachment.*
with FD_DEV_FEATURES=has_generic_clear=0, which will result in
tu_clear_sysmem_attachments() fallback being used
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40542>
Overwrite the whole framebuffer cbuf rather than copying it from the
stack; fixes util_framebuffer_get_num_samples getting uninitialized
stack contents during validation.
Suggested-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Alyssa Milburn <amilburn@zall.org>
Fixes: 2eb45daa9c ("gallium: de-pointerize pipe_surface")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14082
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39138>
For compiling models, we don't really need a context for a real device.
To support ML frameworks in which model compilation happens
ahead-of-time (AoT), add an API for compilation that doesn't require a
pipe_context.
Add struct pipe_ml_device with function pointers for:
- ml_operation_supported: query operation support
- ml_subgraph_create: compile a subgraph
- ml_subgraph_serialize: serialize a compiled subgraph
- ml_subgraph_destroy: free subgraph resources
Move ml_operation_supported, ml_subgraph_create, and
ml_subgraph_destroy from pipe_context to pipe_ml_device.
Add pipe_screen::get_ml_device() to obtain the device.
Change pipe_ml_subgraph.context (pipe_context*) to
pipe_ml_subgraph.device (pipe_ml_device*).
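The shape of such a context-free device can be sketched as a small
vtable. Everything below is hypothetical: the `sketch_` and `toy_`
names, the argument lists, and the one-byte "serialized" blob are
assumptions for illustration and do not reproduce the real
pipe_ml_device signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_ml_device;

/* Hypothetical compiled subgraph: it points back at a device, not a context. */
struct sketch_ml_subgraph {
   struct sketch_ml_device *device;
   unsigned op_count;
};

/* Hypothetical vtable with the four entry points named in the list above. */
struct sketch_ml_device {
   bool (*ml_operation_supported)(struct sketch_ml_device *dev, int op_type);
   struct sketch_ml_subgraph *(*ml_subgraph_create)(struct sketch_ml_device *dev,
                                                    const int *op_types,
                                                    unsigned op_count);
   size_t (*ml_subgraph_serialize)(struct sketch_ml_device *dev,
                                   const struct sketch_ml_subgraph *sg,
                                   unsigned char *out, size_t out_size);
   void (*ml_subgraph_destroy)(struct sketch_ml_device *dev,
                               struct sketch_ml_subgraph *sg);
};

static bool
toy_supported(struct sketch_ml_device *dev, int op_type)
{
   (void)dev;
   return op_type >= 0 && op_type < 4; /* toy rule: four known op types */
}

static struct sketch_ml_subgraph *
toy_create(struct sketch_ml_device *dev, const int *op_types, unsigned op_count)
{
   for (unsigned i = 0; i < op_count; i++) {
      if (!dev->ml_operation_supported(dev, op_types[i]))
         return NULL;
   }
   struct sketch_ml_subgraph *sg = calloc(1, sizeof(*sg));
   if (sg) {
      sg->device = dev;
      sg->op_count = op_count;
   }
   return sg;
}

static size_t
toy_serialize(struct sketch_ml_device *dev, const struct sketch_ml_subgraph *sg,
              unsigned char *out, size_t out_size)
{
   (void)dev;
   assert(sg != NULL && out != NULL);
   if (out_size < 1)
      return 0;
   out[0] = (unsigned char)sg->op_count; /* toy blob: just the op count */
   return 1;
}

static void
toy_destroy(struct sketch_ml_device *dev, struct sketch_ml_subgraph *sg)
{
   (void)dev;
   free(sg);
}

/* Stands in for a screen-level getter: no context is involved anywhere. */
static struct sketch_ml_device *
sketch_get_ml_device(void)
{
   static struct sketch_ml_device dev = {
      .ml_operation_supported = toy_supported,
      .ml_subgraph_create = toy_create,
      .ml_subgraph_serialize = toy_serialize,
      .ml_subgraph_destroy = toy_destroy,
   };
   return &dev;
}
```

An AoT tool would obtain the device, create and serialize a subgraph,
and destroy it, all without ever creating a context.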
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40167>
Replace the boolean padding_same field in pipe_ml_operation.conv
and .pooling with explicit per-side padding fields: padding_top,
padding_bottom, padding_left, padding_right.
Frontends always compute these from their own padding representation
(e.g. TFLite same/valid, PyTorch (pad_h, pad_w)). Drivers use
them directly, removing the need for drivers to derive padding.
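As an illustration of what a frontend might compute, here is the usual
TensorFlow-style "same" padding rule for one axis. The helper and struct
names are hypothetical, dilation is ignored, and this is a sketch rather
than the code of any particular frontend.

```c
#include <assert.h>

/* Padding on the two sides of one axis (e.g. top/bottom or left/right). */
struct pad_1d {
   unsigned before;
   unsigned after;
};

/* "same" padding: pad so that out_size == ceil(in_size / stride).
 * When the total is odd, the extra element goes on the `after` side. */
static struct pad_1d
same_padding_1d(unsigned in_size, unsigned kernel, unsigned stride)
{
   assert(in_size > 0 && kernel > 0 && stride > 0);
   unsigned out_size = (in_size + stride - 1) / stride;
   unsigned needed = (out_size - 1) * stride + kernel;
   unsigned total = needed > in_size ? needed - in_size : 0;
   struct pad_1d pad = { total / 2, total - total / 2 };
   return pad;
}
```

The vertical result would fill padding_top/padding_bottom and the
horizontal one padding_left/padding_right; "valid" padding simply sets
all four to zero.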
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40167>
Change the tensor backing storage from pipe_resource* to uint8_t*.
This simplifies tensor data management by using raw memory pointers
instead of pipe_resource objects. Frontends allocate tensor data with
malloc() and drivers access it directly, removing the need for
pipe_buffer_map/unmap for tensor data access.
We initially used resources thinking that the NPU would want to directly
access the data in those tensors. It is clear now that all NPUs will
need the data to be compressed and reformatted in some way, so let's
drop the inconvenient resources and just use allocated memory.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40167>
We used to lower multisampled arrays to 3D images by adjusting the
height and the Y coordinate so that addressing samples became
addressing into the new base image. This worked for gallium, but
was never implemented for vulkan, and also had the disadvantages
that (a) we handled arrays and non-arrays differently, and
(b) the image height was restricted to 4096.
Change this so that we lower samples into the Z coordinate instead,
adding new layers for each sample. This requires that we know the
number of samples (so we have to save a sysval for this in gallium)
but means that we handle arrays and non-arrays the same. More
importantly, we can fit 3 bits to indicate the number of samples
into the attribute descriptor in Vulkan, so this scheme works
there as well as in OpenGL.
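One way to picture the new addressing is the index computation below.
It is a sketch under assumptions: the function names are hypothetical,
and placing the sample index in the low bits of the Z coordinate is a
guess at the ordering, not something this message specifies.

```c
#include <assert.h>

/* Z coordinate in the lowered image for (layer, sample), assuming each
 * original layer expands into 2^log2_samples consecutive layers. */
static unsigned
ms_lowered_z(unsigned layer, unsigned sample, unsigned log2_samples)
{
   assert(sample < (1u << log2_samples));
   return (layer << log2_samples) | sample;
}

/* Total layer count of the lowered image. */
static unsigned
ms_lowered_layer_count(unsigned layers, unsigned log2_samples)
{
   return layers << log2_samples;
}
```

A non-array image is just the layer == 0 case, which is why arrays and
non-arrays no longer need separate handling.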
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40460>
We reduce the number of bits used for pixel stride from 10 to 7. This
gives us space to store the log2 of the number of samples, which
we will need later.
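A minimal sketch of sharing the space between the two values follows.
The 7-bit stride width comes from the text above; the 3-bit width for
log2(samples), the bit positions, and the helper names are assumptions
for illustration, not the real descriptor layout.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_STRIDE_BITS       7u
#define SKETCH_LOG2_SAMPLES_BITS 3u

/* Pack a pixel stride and log2(samples) into one 10-bit value. */
static uint16_t
sketch_pack(unsigned pixel_stride, unsigned log2_samples)
{
   assert(pixel_stride < (1u << SKETCH_STRIDE_BITS));
   assert(log2_samples < (1u << SKETCH_LOG2_SAMPLES_BITS));
   return (uint16_t)(pixel_stride | (log2_samples << SKETCH_STRIDE_BITS));
}

static unsigned
sketch_unpack_stride(uint16_t packed)
{
   return packed & ((1u << SKETCH_STRIDE_BITS) - 1u);
}

static unsigned
sketch_unpack_log2_samples(uint16_t packed)
{
   return (packed >> SKETCH_STRIDE_BITS) &
          ((1u << SKETCH_LOG2_SAMPLES_BITS) - 1u);
}
```

The trade-off is that the stride can no longer exceed 127 in this
layout, which is the cost of freeing the upper bits.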
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40460>
Not really used yet, but we will need it later when we change how we
lower multisampled image arrays.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40460>
The s0abs bit in the encoding of the fred instruction is wrongly set
from the status of the .neg modifier instead of the .abs modifier.
Fix this copy-n-paste error.
Fixes GLCTS tests when running on top of Zink:
dEQP-GLES2.functional.shaders.random.trigonometric.vertex.4
dEQP-GLES2.functional.shaders.random.trigonometric.vertex.45
dEQP-GLES2.functional.shaders.random.trigonometric.fragment.4
dEQP-GLES2.functional.shaders.random.trigonometric.fragment.45
Fixes: 8ec174b3f9 ("pco: add support for various selection, complex, trig ops")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40611>
The gl_nir_add_point_size() function now handles either shaders with
io_lowered set (no output variable will be created and store_output
intrinsic will be used) or with io_lowered cleared (an output variable
will be created for PSIZ).
However, the nir_variable pointer for the PSIZ variable is currently not
initialized at all, and a -Wmaybe-uninitialized warning may appear
complaining about this.
As it shouldn't be used when it's allocated within the io_lowered
cleared situation, initialize it to NULL to satisfy the compiler.
Acked-by: Simon Perretta <simon.perretta@imgtec.com>
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40560>
The referenced commit switched from a passthrough shader
to fs_clear_color[write_all_cbufs=0]. It shouldn't matter since
the shader isn't supposed to be executed - it's only set up to get
the first color output active.
On some chips (gfx8) it seems to cause issues (hangs or page faults)
for some piglit tests, e.g.:
framebuffer-blit-levels draw stencil
To fix this, introduce a 3rd variant, where a constant buffer isn't
required and instead the color is hardcoded in the shader.
Fixes: ca09c173f6 ("gallium/u_blitter: remove UTIL_BLITTER_ATTRIB_COLOR, use a constant buffer")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40486>
Split the monolithic v3dv_private.h (~2600 lines) into self-contained
sub-headers so each .c file only includes what it needs:
v3dv_common.h, v3dv_device.h, v3dv_image.h, v3dv_pass.h,
v3dv_query.h, v3dv_pipeline.h, v3dv_descriptor_set.h,
v3dv_cmd_buffer.h, v3dv_version_dispatch.h
As part of this commit we remove v3dv_private.h.
We keep v3dvx_private.h as it is, because the gain would be really
small (a lot of really small sub-headers).
In addition to keeping things tidier, we made a quick performance
check. We measured how many files are recompiled, and the performance
difference, when touching one of the headers, compared with keeping
just one monolithic header.
Header touch (incremental)  Split       Monolithic  Speedup
--------------------------  ----------  ----------  -------
v3dv_image.h                2369 (24f)  2436 (33f)  1.03x
v3dv_query.h                2357 (20f)  2436 (33f)  1.03x
v3dv_pass.h                 2352 (20f)  2436 (33f)  1.04x
v3dv_cmd_buffer.h           2354 (20f)  2436 (33f)  1.03x
v3dv_descriptor_set.h       2436 (33f)  2436 (33f)  1.00x
v3dv_pipeline.h             2437 (33f)  2436 (33f)  1.00x
v3dv_device.h               2418 (31f)  2436 (33f)  1.01x
v3dv_common.h               2419 (33f)  2436 (33f)  1.01x
v3dv_version_dispatch.h     2371 (26f)  2436 (33f)  1.03x

Header touch (incremental)  Split       Monolithic  Speedup
--------------------------  ----------  ----------  -------
v3dv_image.h                2377 (24f)  2443 (33f)  1.03x
v3dv_query.h                2346 (20f)  2443 (33f)  1.04x
v3dv_pass.h                 2360 (20f)  2443 (33f)  1.04x
v3dv_cmd_buffer.h           2351 (20f)  2443 (33f)  1.04x
v3dv_descriptor_set.h       2438 (33f)  2443 (33f)  1.00x
v3dv_pipeline.h             2429 (33f)  2443 (33f)  1.01x
v3dv_device.h               2418 (31f)  2443 (33f)  1.01x
v3dv_common.h               2432 (33f)  2443 (33f)  1.00x
v3dv_version_dispatch.h     2373 (26f)  2443 (33f)  1.03x
The bigger gain is in the number of files recompiled for some headers
(going from 33 down to 20 in some cases). The performance gain is not
very significant, though.
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40169>
The list in the documentation still doesn’t go higher than v10, and it
isn’t clear from that list of GPU IDs which ones actually correspond to
the newer generations, but at least users can test them.
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39564>
Call process_intel_debug_variable() early in anv_CreateInstance() so the
intel_debug bitset is populated, then set enable_debug_logging when
INTEL_DEBUG=perf is active. This makes anv_perf_warn() messages visible
in non-debug builds.
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40551>
In non-debug builds, __vk_log_impl() silently drops all messages due to
two compile/link-time gates: an early return when no debug callbacks are
registered, and the MESA_VK_LOG=0 guard around the mesa_log*() calls.
Add vk_instance::enable_debug_logging so drivers can opt in to log
output at runtime. When set, both gates are bypassed.
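The two gates and the opt-in can be modelled roughly as follows. This
is a simplified sketch, not the body of __vk_log_impl(): the `sketch_`
names and the `SKETCH_VK_LOG` macro (standing in for the build-time
MESA_VK_LOG switch) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef SKETCH_VK_LOG
#define SKETCH_VK_LOG 0 /* stands in for the build-time logging switch */
#endif

/* Hypothetical, reduced view of an instance. */
struct sketch_instance {
   unsigned debug_callback_count;
   bool enable_debug_logging; /* runtime opt-in set by the driver */
};

/* Returns true if a message would reach the mesa_log*() path. */
static bool
sketch_reaches_mesa_log(const struct sketch_instance *inst)
{
   assert(inst != NULL);

   /* Gate 1: early return when no debug callbacks are registered. */
   if (inst->debug_callback_count == 0 && !inst->enable_debug_logging)
      return false;

   /* Gate 2: build-time guard around the log calls. */
   if (!SKETCH_VK_LOG && !inst->enable_debug_logging)
      return false;

   return true;
}
```

With the flag left unset, both gates behave exactly as before, which is
why there is no functional change until a driver opts in.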
No functional change without a driver setting the flag.
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40551>
The per-bind status was always being set to VK_SUCCESS instead of the
actual result from nvk_bind_image_memory.
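The pattern being fixed can be sketched without any Vulkan headers. All
names below are hypothetical stand-ins (this is not the nvk code), and
returning the first error as the overall result is an assumption made
for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef int sketch_result;
#define SKETCH_SUCCESS 0
#define SKETCH_ERROR   (-1)

/* Hypothetical bind request with an optional per-bind status slot. */
struct sketch_bind {
   int should_fail;       /* test knob: force this bind to fail */
   sketch_result *status; /* may be NULL if the caller doesn't care */
};

static sketch_result
sketch_bind_one(const struct sketch_bind *bind)
{
   return bind->should_fail ? SKETCH_ERROR : SKETCH_SUCCESS;
}

static sketch_result
sketch_bind_all(const struct sketch_bind *binds, unsigned count)
{
   assert(binds != NULL || count == 0);
   sketch_result first_error = SKETCH_SUCCESS;

   for (unsigned i = 0; i < count; i++) {
      sketch_result r = sketch_bind_one(&binds[i]);

      /* Report the actual result, not an unconditional success. */
      if (binds[i].status)
         *binds[i].status = r;

      if (first_error == SKETCH_SUCCESS && r != SKETCH_SUCCESS)
         first_error = r;
   }
   return first_error;
}
```

A caller inspecting the per-bind slots can then tell which individual
bind failed instead of seeing success everywhere.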
Fixes: 93792b5ef2 ("nvk: Add static wrappers for image/buffer binding")
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40579>
When a predt/predf branch could be removed, any sync flags set on the
terminator were removed as well. Fix this by copying these flags to the
prede that replaces the terminator.
Fixes frame instability in "Devil May Cry 5" and "Resident Evil 3".
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 39088571f0 ("ir3: add support for predication")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40608>
We can drop the RT flush and PS Scoreboard stall if "state cache perf
fix disabled" is set to 1. If the bit is set, RCC uses the sum of
Binding Table Pointer and Binding Table Index as the tag in the state
cache instead of just the Binding Table Index.
On DX12 this is a performance win on all workloads we've tested.
On DX11 there are a number of performance regressions. We think this
is due to the fact that, to avoid thrashing the RCC, we need to remove
all but render targets from the binding table, meaning all shader
resource accesses have to go through the bindless HW heap. This leads
to additional register usage due to the need to push the base offset
of descriptor sets. Improvements in the compiler would likely mitigate
this.
This change introduces a DRIRC key that we only turn on for DX12.
Also, platforms prior to DG2/LSC have a really small bindless heap that
leads to additional register usage, so this optimization is completely
disabled there.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10872
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10873
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14075
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39982>
The current tracking seems to have issues related to MCS ambiguate
that are currently hidden by the fact that we're inserting
pb-stall+RT-flush on BTI changes, which we're going to remove in the
next commits.
The issues appear to be related to a missing pb-stall+RT-flush between
MCS ambiguate and fast-clear causing failures on the following tests
once BTP+BTI RCC caching is enabled:
dEQP-VK.pipeline.*.multisample.misc.*multi*
dEQP-VK.pipeline.*.framebuffer_attachment.diff_attachments_2d_32x32_39x41_ms
dEQP-VK.pipeline.*.framebuffer_attachment.diff_attachments_2d_32x32_48x48_ms
Here we rework the tracking with a new enum to track 3 classes of
operations.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39982>