PRE_POST_FRAME_SHADER_MODE_EARLY_ZS_ALWAYS was introduced in
architecture version 7.2, not 7.0 as we assumed. Using it on
G31 (a 7.0 device) caused some CTS failures.
Cc: mesa-stable
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34744>
It's very common needing to extract or overwrite a certain field
in an already packed register value, so add macros to do that
instead of manually doing that each time.
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35088>
In panfrost_clear_depth_stencil and panfrost_clear_render_target, we
start the blit context before binding the clear targets. If we don't
legalize AFBC beforehand, we get a recursive blit crash. panfrost_clear
does not need this because the resource should already be legalized in
panfrost_batch_add_surface.
Fixes the following piglit tests with pan_force_afbc_packing:
- spec@arb_clear_texture@arb_clear_texture-base-formats
- spec@arb_clear_texture@arb_clear_texture-simple
- spec@arb_clear_texture@arb_clear_texture-sized-formats
Fixes: 17a62ff993 ("panfrost: legalize afbc before blitting")
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34992>
In 59a3e12039, we changed the UBO->push optimization in panfrost to
only push UBOs that are available in a CPU buffer. We require
first_ubo_is_default_ubo, to ensure that UBO0 will be a user buffer. We
weren't setting this flag for the image conversion shaders, so got an
assertion failure compiling them. This can be triggered by the
panvk_force_afbc_packing driconf option.
The conversion shader info UBO isn't exactly a "default" UBO in the
sense of being lowered from uniforms, but it is a user buffer, so
setting the flag should be fine.
Fixes: 59a3e12039 ("panfrost: do not push "true" UBOs")
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34992>
Add perf ETW events using TraceLogging API, the following are adding:
- MFT receives fence (FenceCompletion).
- MFT has output MFSample (METransformHaveOutput).
- MFT calls to pipe end_frame (PipeEndFrame) -- bracketed.
- MFT calls to pipe flush (PipeFlush) -- bracketed.
- MFT submits a frame to pipe (PipeSubmitFrame) -- bracketed from begine_frame to encode_bitstream/encode_bitstream_sliced
- MFT processinput (ProcessInput) -- bracketed
- MFT processoutput (ProcessOutput) -- bracketed
The ETW provider(s) are:
- H264Enc: 0000e264-0dc9-401d-b9b8-05e4eca4977e
- H265Enc: 0000e265-0dc9-401d-b9b8-05e4eca4977e
- AV1Enc: 0000eaa1-0dc9-401d-b9b8-05e4eca4977e
Note that the provider is mostly the same as the WPPTrace provider for each codec, with the additional 'e' (e.g. 0000e264 vs 00000264)
Reviewed-by: Yubo Xie <yuboxie@microsoft.com>
Reviewed-by: Sil Vilerino <sivileri@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35219>
Allocate TS together with the tracked resource, which gets rid
of the resource mutation on surface creation and the diversion
between the interal and shared TS handling.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34488>
Untangle the convoluted render compatible check from
etna_render_handle_incompatible to make it easier to read and move it
into a separate function so it can be reused from other callers.
As this is intended to be called also at resource creation time, where
we don't know the exact level of the resource that might be rendered to,
the stride check for linear resources is made a bit more conservative by
checking that the last level (the one with the smallest stride) still
meets the render target stride alignment requirement.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34488>
TS is only allocated for single layer surfaces, so there is no need to
cache a ts_offset taking into account the layer offset in etna_surface.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34488>
etna_screen_resource_alloc_ts is only called for textures that have a
single layer and slice, as we don't want to duplicate the driver side
TS tracking information per layer or depth slice. Stop pretending to
support allocating TS for such resources.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34488>
this (conceptually) flattens out the batches into a wrapping id which
is used to determine whether a resource has in-flight work pending
the code cleanup is nice, but evaluating performance of this is difficult.
in testing a heavy use case of unsynchronized subdata:
* 10% fewer driver flushes (good)
* 20% fewer direct unsynchronized uploads (bad? or possibly hidden race conditions fixed...)
Fixes: 9cc06f817c ("tc: allow unsynchronized texture_subdata calls where possible")
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35026>
We should use the cl_slice code to get proper validation, which also makes
it simpler to read out data and gets rid of some UB there.
This also fixes CL_KERNEL_EXEC_INFO_SVM_PTRS with param_value being null.
Cc: mesa-stable
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32942>
The old interfaces added back in clover's time were modeled after a very
bindful resource model.
However SVM (shared virtual memory) requires us to be way more flexible.
The new interfaces allow frontends to create a cut-out in the GPU's vm and
to assign addresses themselves. This gives us the following benefits:
- The frontend is empowered to synchronize resource addresses between
several devices. cl_mem objects in OpenCL span across a set of multiple
devices and SVM requires them to have the same VMA across all of them.
- Coarse grain SVM can be implemented without bothering drivers too much
as the frontend can be responsible to make sure a host allocation with
a specific VMA matches a GPU allocation with the identical VMA.
- Support for Global variables in the CrossWorkgroup storage class
Initializers. Those can depend on addresses of CrossWorkgroup memory,
if the frontend can just assign a VMA, this address can be passed as a
constant to spirv_to_nir and folded without the need to support
spilling of constant initializers.
Drivers not able to give us a vm-cutout are left with implementing
cl_ext_buffer_device_address instead.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32942>
cl_ext_buffer_device_address requires us to set a fixed address for a
given memory allocation. As this extension is intended to be implemented
on top of vulkan we have to take its limitations into account.
For SVM we'll add proper VM management interfaces, but zink won't be able
to implement those, so here we are.
The old interfaces added back in clover's time were modeled after a very
bindful resource model and the frontend was require to bind all the used
resources ahead of launch_grid.
cl_ext_buffer_device_address and also SVM however will require us to
dynamically attach a list of buffers used in a dispatch with known
addresses, hence set_global_binding isn't really suited for those use
cases.
So PIPE_RESOURCE_FLAG_FIXED_ADDRESS is added to tell a driver that the
address of a resource needs to stay the same over its lifetime, which then
can be queried via pipe_screen::resource_get_address.
All such buffers then can be either bound via set_global_binding or passed
in via pipe_grid_info::globals.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32942>
Since introduction of support for more than 16 varyings, support for
v12+ has been broken on certain apps.
This manifest with a black screen on all GL Core 1.x apps like glxgears
or xonotic in legacy mode.
The issue is a wrong buffer index being used on v12 for the varying
descriptors.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Fixes: cd2ca0ac22 ("panfrost: Enable more than 16 varyings on v9+")
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35201>
instead of using our own flags; also REALTIME_PRIORITY is never used,
so the relevant code is removed
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34983>