It can be quite confusing to see the tests failing to load models
without knowing why. To avoid making people waste time with strace, link
with the stubs at build time but look for the actual implementation at
run time.
Reviewed-by: Maíra Canal <mcanal@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40269>
While I was investigating some random task+mesh GPU hangs in CI, I
finally found a sequence that caused a test failure:
dEQP-VK.dgc.ext.graphics.mesh.conditional_rendering.general.classic_bind_with_count_buffer_condition_false_with_task_shader
dEQP-VK.dgc.ext.graphics.mesh.token_draw_count.monolithic_with_task_shader
Executing these two tests in a row caused the second one to always fail
(tested on NAVI33).
After investigating I figured out that only the DGC GFX IB was
predicated (with IB2) and the DGC ACE IB was always running, although
without any mesh draws to consume the task output. It seems the
hardware gets confused if another task+mesh draw is dispatched
afterwards, which can cause failures or GPU hangs.
Fix this by resetting the number of DGC sequences to 0 when conditional
rendering is used. This is the only option to emulate conditional
rendering with DGC and ACE.
This also likely fixes DGC+RT on compute queue.
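A minimal sketch of the idea, assuming a simplified model in which the GFX IB and the ACE IB both consume a single shared sequence count. The function and parameter names are hypothetical and do not come from RADV. It follows the VK_EXT_conditional_rendering rule that commands are discarded when the 32-bit predicate value is zero (or non-zero when the inverted flag is set):

```python
from typing import Optional


def effective_sequence_count(max_sequences: int,
                             count_buffer_value: Optional[int],
                             cond_render_active: bool,
                             predicate_value: int,
                             inverted: bool) -> int:
    """Hypothetical: sequence count shared by the DGC GFX and ACE IBs."""
    # With a count buffer, the executed count is clamped to the maximum.
    if count_buffer_value is None:
        count = max_sequences
    else:
        count = min(count_buffer_value, max_sequences)

    if cond_render_active:
        # Commands run only if the predicate is non-zero (zero when inverted).
        passes = (predicate_value != 0) != inverted
        if not passes:
            # Predicating only the GFX IB would leave the ACE IB launching
            # task work that no mesh draw consumes, so zero the shared count.
            count = 0
    return count
```

Because both IBs read the same count, a failed condition turns both into no-ops at once, which predicating only the GFX IB did not achieve.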
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41939>
This should be after we finalize desc.use.
Fixes FSR4 on RDNA3.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: ca0496bc26 ("radv: use load_deref_transpose_amd for transposed cooperative matrix loads")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41922>
We should not report support for subgroup ops or DGC for mesh stages on
pre-Turing.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reported-by: Georg Lehmann <dadschoorse@gmail.com>
Fixes: 145b8540e5 ("nvk: Advertises VK_EXT_mesh_shader")
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41962>
The extension allows for a simplified API where
pPresentationInfo is NULL and the second call to
vkAntiLagUpdateAMD() is omitted, which makes it necessary
to separate frames in vkQueuePresentKHR().
The second main difference is that the wait time is now
based on the previous input stage plus the average frame
time. This greatly smooths frame pacing.
v2:
- measure the GPU frame time directly
- only try to evaluate frames that are likely to complete
  within the waiting time
- calculate the average absolute deviation of the total
  frame time and use it to determine the slack time
v3:
- move frame separation to vkQueuePresentKHR()
- tighten frame pacing, aiming for at most 1ms overlap
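The pacing rule can be sketched under loudly stated assumptions: times are in milliseconds, the slack equals the average absolute deviation of recent total frame times, and capping the slack at 1ms is only one hypothetical reading of the v3 overlap goal. `next_input_deadline` is an illustrative name, not a function in the driver.

```python
from statistics import fmean
from typing import Sequence


def next_input_deadline(prev_input_ms: float,
                        frame_times_ms: Sequence[float],
                        max_overlap_ms: float = 1.0) -> float:
    """Hypothetical: when to sample input for the next frame."""
    avg = fmean(frame_times_ms)
    # Average absolute deviation of the total frame time.
    aad = fmean(abs(t - avg) for t in frame_times_ms)
    # Assumption: start early by the deviation, but by no more than the
    # overlap budget, so consecutive frames overlap by at most ~1ms.
    slack = min(aad, max_overlap_ms)
    return prev_input_ms + avg - slack
```

With perfectly steady frame times the slack is zero and the next input is sampled exactly one average frame after the previous one; jittery frame times pull it earlier.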
Acked-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41727>
util_format_z24_unorm_packed_pack_z_32unorm was accidentally writing
48 bits instead of 24 because it used a 16-bit integer pointer instead
of an 8-bit one. This could cause a segfault if the function were used,
but it is currently unused.
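To see why the element type matters, here is a byte-level sketch. It assumes little-endian storage, that the 24-bit value is taken from the top 24 bits of the 32-bit unorm input, and that the buggy version issued the same three stores through 16-bit elements (3 x 16 = 48 bits). Both function names are hypothetical stand-ins for the C helper; `struct.pack_into` raises when the destination is too small, which plays the role of the out-of-bounds write.

```python
import struct


def pack_z24_from_z32unorm(dst: bytearray, offset: int, z32: int) -> None:
    """Store one Z24_UNORM_PACKED texel: 3 bytes, little-endian."""
    z24 = (z32 & 0xFFFFFFFF) >> 8  # keep the 24 most significant bits
    # Correct: three 8-bit stores, i.e. exactly 24 bits.
    struct.pack_into("<3B", dst, offset,
                     z24 & 0xFF, (z24 >> 8) & 0xFF, (z24 >> 16) & 0xFF)


def pack_z24_buggy(dst: bytearray, offset: int, z32: int) -> None:
    """Same three stores, but through 16-bit elements: 48 bits in total."""
    z24 = (z32 & 0xFFFFFFFF) >> 8
    struct.pack_into("<3H", dst, offset,
                     z24 & 0xFF, (z24 >> 8) & 0xFF, (z24 >> 16) & 0xFF)
```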
Fixes: 18f352090d ("util/format: Add a Z24_UNORM_PACKED format")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41288>
On Mali, the HW does not advertise support for writing D24X8 with AFBC
enabled, but when AFBC is enabled for a D24X8 image, we can lower it to
plain D24.
In Panfrost, we keep the external format as Z24X8 but the internal format
as Z24 packed. The driver already handles setting up a new resource in
the external format when mapping to CPU, since AFBC resources can't be
mapped directly anyway.
For PanVK, we return the Z24 packed format for D24X8 with AFBC, and the
Z24X8 format otherwise.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41288>
The Vulkan meta path used to issue a full draw call with a rectangle
primitive for cmdClear*Image, which is not optimal if the device
supports HW clears. This commit lets the meta path skip the full draw
call when the driver supports it, by setting
VK_ATTACHMENT_LOAD_OP_CLEAR on the attachment and letting the driver
handle setting up a clear pipeline.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41288>
panvk_get_image_layout_transition_handler returns the same zero struct
in both paths so it can simply be removed. This also means that
transition_image_layout_sync_scope and cmd_transition_image_layout can
be removed as they are always NOPs.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41288>
ffloor(a) is lowered as a - ffract(a). dEQP expects, for example, that
ffloor(a) == 1.0 for every a between 1.0 and 2.0. This worked fine,
but the new ffract(a + b(is_integral)) -> ffract(a) rule broke it.
Specifically, dEQP-GLES2.functional.shaders.struct.uniform.equal_fragment
checks that ffloor(a + 1.0) == 1.0 for every a between 0.0 and 1.0.
However, this no longer holds exactly once ffract(a + 1.0) is folded
to ffract(a), because (a + 1.0) - ffract(a) can round to a value just
below 1.0.
Prevent this by marking ffract from ffloor lowering as exact so that
the recently introduced ffract(a + b(is_integral)) -> ffract(a) rule
does not trigger.
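The rounding problem can be reproduced outside NIR by emulating 32-bit floats and rounding every intermediate result through `struct`. `f32` and `fract` are illustrative helpers, not NIR code. With a = 2^-24, a + 1.0 rounds to exactly 1.0 in binary32, so the exact lowering returns 1.0. The folded form instead subtracts ffract(a) = a and lands just below 1.0:

```python
import math
import struct


def f32(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fract(x: float) -> float:
    """ffract on binary32 values: x - floor(x)."""
    return f32(x - math.floor(x))


a = f32(2.0 ** -24)   # a tiny value in [0.0, 1.0)
x = f32(a + 1.0)      # ties-to-even rounding gives exactly 1.0

# ffloor(x) lowered as x - ffract(x): exactly 1.0.
floor_exact = f32(x - fract(x))
# After ffract(a + 1.0) -> ffract(a): x - ffract(a) = 1.0 - 2^-24.
floor_folded = f32(x - fract(a))
```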
Fixes: c6aaafa3 ("nir: add lowering for ffloor")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15562
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41882>
It's a no-op on other stages, so let's not run it there.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27196>
With task/mesh shaders, this lowering must not happen. Instead,
conditionally lower the local invocation index with
nir_lower_compute_system_values_options, only for compute shaders.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27196>
This defines the GS header structure and a way to upload it when present.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27196>
This adds a new lowering pass for shared memory atomics that will be used
for mesh/task stages.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27196>
NVIDIA hardware has an instruction that retrieves the mask of active
threads whose source value matches that of the current invocation.
This is going to be used by shared memory lowering for mesh / task
stages on NVK.
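The described semantics can be modeled per lane as follows. This is a sketch of what such a match instruction computes, comparable in spirit to CUDA's `__match_any_sync`, and not NVK code. It assumes a 32-lane warp where bit i of each mask stands for lane i, and inactive lanes receive 0.

```python
from typing import List, Sequence

WARP_SIZE = 32


def match_any(values: Sequence[int], active_mask: int) -> List[int]:
    """Per active lane: mask of active lanes holding the same value."""
    result = [0] * WARP_SIZE
    for lane in range(WARP_SIZE):
        if not (active_mask >> lane) & 1:
            continue  # inactive lanes produce no mask
        mask = 0
        for other in range(WARP_SIZE):
            # Only active lanes participate in the comparison.
            if (active_mask >> other) & 1 and values[other] == values[lane]:
                mask |= 1 << other
        result[lane] = mask
    return result
```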
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27196>
For example, the FS may write gl_SampleMask while color writes are
masked out and there is no depth attachment.
Note that the proprietary driver still considers more state when
disabling the FS, such as the depth test being disabled, and thus
disables the FS in cases where we do not. However, I think that is
too much of a stretch unless we find some real workload needing it.
This change also allows disabling an FS that has discard.
This requires being careful around occlusion queries, since when one
is enabled, we cannot disable an FS that can discard.
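The decision can be sketched as a predicate over a simplified, hypothetical state description. The field names are illustrative, and a real implementation would likely consider more state. As a conservative assumption, gl_SampleMask writes are treated like discard, since both can kill samples.

```python
from dataclasses import dataclass


@dataclass
class FsState:
    has_side_effects: bool        # stores, atomics, etc.
    color_written: bool           # a color output reaches a non-zero write mask
    writes_depth_stencil: bool    # depth or stencil export
    can_kill: bool                # discard or gl_SampleMask writes
    has_ds_attachment: bool
    occlusion_query_active: bool


def can_disable_fs(s: FsState) -> bool:
    """Hypothetical: True if skipping the FS cannot change any result."""
    if s.has_side_effects or s.color_written:
        return False
    if s.has_ds_attachment and (s.writes_depth_stencil or s.can_kill):
        # Exported or killed samples change the depth/stencil contents.
        return False
    if s.occlusion_query_active and s.can_kill:
        # Killed samples change the number of samples that pass.
        return False
    return True
```

The first case from the message (a sample-mask write, color writes masked out, no depth attachment) returns True here, while the same shader with an occlusion query active does not.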
Found via gpu-ratemeter bench: vk.pix.noaa.output.color+z+samplemask.colormask=0
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41857>
The destriding lowering hard-coded a special case for weight_width == 5
with a fallback "+1" branch that was only correct for 3x3 kernels.
Replace it with formulas derived from TFLite's SAME-padding rule for
stride 2:
The half-resolution expansion applied to the reshuffle output and to
the strided_to_normal() input is:
weight_width / 2
which gives 1 for 3x3, 2 for 5x5, and 3 for 7x7 kernels.
The reshuffle window start offset is:
(weight_width + input_width % 2 - 2) / 2
This folds the previous odd-input fixup into the same expression,
preserves the existing 3x3 and 5x5 behavior, and extends the lowering
to wider odd kernels such as 7x7.
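The two expressions can be checked with a short sketch; the function names are hypothetical. It also compares the start offset against the usual TensorFlow SAME-padding rule for stride 2 (out = ceil(in / 2), pad_total = (out - 1) * 2 + k - in, pad_before = pad_total / 2). In this sketch the two coincide for both even and odd input widths.

```python
def half_res_expansion(weight_width: int) -> int:
    """Expansion of the reshuffle output and the strided_to_normal() input."""
    return weight_width // 2


def reshuffle_start_offset(weight_width: int, input_width: int) -> int:
    """Start offset of the reshuffle window for stride 2."""
    # C-style integer division; identical to // for non-negative operands.
    return (weight_width + input_width % 2 - 2) // 2


def same_pad_before(weight_width: int, input_width: int, stride: int = 2) -> int:
    """SAME padding: amount of padding added before the input."""
    out_width = (input_width + stride - 1) // stride  # ceil(in / stride)
    pad_total = max((out_width - 1) * stride + weight_width - input_width, 0)
    return pad_total // 2
```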
Fixes Models.Op/inception_000, which uses Inception V1's Conv2d_1a_7x7,
in the Teflon test suite.
Signed-off-by: Maíra Canal <mairacanal@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41774>
This creates the BO with AMDGPU_GEM_CREATE_NO_CPU_ACCESS for buffers
that we don't map.
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41850>
Add a buffer create function that takes PIPE_RESOURCE_FLAG_* flags.
Disable suballocation for all buffers on UVD/VCE without VM support.
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41850>
VkBindMemoryStatus contains a pointer to a VkResult, but that value
cannot be correctly encoded and decoded with the current code generator.
Until the issues are fixed, the extension should not be used, as it
causes CTS failures and invalid behavior.
Test: dEQP-VK.memory.binding.maintenance6.*
Reviewed-by: Gurchetan Singh <gurchetan.singh.foss@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41893>
The ordering of the extensions affected the codegen, and some structures
were missing due to errors during codegen. One example is the sampler
custom border color structure, which is referenced by the new
vkRegisterCustomBorderColorEXT function introduced by a different
extension, VK_EXT_descriptor_heap. This CL adds a sorting mechanism
that generates code for supported extensions first, to ensure the
deepcopy and transform functions are created correctly.
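A minimal sketch of the ordering, assuming each extension is described by a name and a `supported` flag. These are hypothetical fields, not the generator's actual data model. Python's `sorted` is stable, so the original relative order is kept within each group.

```python
from typing import List, NamedTuple


class Extension(NamedTuple):
    name: str
    supported: bool


def codegen_order(extensions: List[Extension]) -> List[Extension]:
    """Supported extensions first; input order is kept within each group."""
    # False sorts before True, so `not supported` puts supported ones first.
    return sorted(extensions, key=lambda ext: not ext.supported)
```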
Test: dEQP-VK.pipeline.*
Reviewed-by: Gurchetan Singh <gurchetan.singh.foss@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41893>
Fix
"dEQP-VK.api.copy_and_blit.*.image_to_image.all_formats.color.2d_to_1d.*.e5b9g9r9_ufloat_pack32.*"
on HK.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: 5f5f4474f6 ("nir: Add a format unpack helper and tests")
Reviewed-by: Janne Grunau <j@jannau.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41929>
Unlike BitSet, which is backed by a Vec<u32>, this is backed by a
fixed-length array and is therefore Copy. It's also mostly const, so it
can be constructed and used from const contexts. Because of the const
rules, it's a bit more rigid and can only really accept keys that are
unsigned integer types.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41915>
Checking completion alone disregards submit_count, which is used to
determine the validity of any existing usage pointer. This could lead to
large numbers of BOs with stale usage and unbounded memory ballooning.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41936>