Applications using Mesa built with LLVM 20.1.4 fail to start with
strange segmentfaults/bus errors when radeonsi driver is used. The last
piece of stacktrace looks like
- pipe_reference_described
- pipe_reference
- radeon_bo_reference
- radeon_ws_bo_reference
- radeon_lookup_or_add_real_buffer
Coredump shows the pointer dst passed to pipe_reference_described() is
either unaligned or even invalid, which is the reason of crashing. The
crash goes away when Mesa is built without optimization.
Looking through the related functions, it's found that
radeon_ws_bo_reference() contains unsafe type cast from radeon_bo to
pb_buffer_lean: though the former's first field is just the later, this
violates strict aliasing rules as pb_buffer_lean isn't compatible with
radeon_bo. Such violation ultimately results in miscompilation.
Let's take the address of pb_buffer_lean field, avoiding the unsafe
cast. It's still required to cast pb_buffer_lean back to radeon_bo since
radeon_bo_reference may update the pointer, which is safe as radeon_bo
contains a pb_buffer_lean member and C language permits access members
through a pointer in type of the container.
Fixes: 6d913a2bcc ("r300,r600,radeonsi: switch to pb_buffer_lean")
Link: https://www.gnu.org/software/c-intro-and-ref/manual/html_node/Aliasing-Type-Rules.html
Signed-off-by: Yao Zi <ziyao@disroot.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35249>
With older FW this needs to be always enabled, but it can now be
disabled when using the new separate header instructions for
dependent_slice_segment_flag and slice_segment_address.
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35072>
We create hierarchy masks based on the number of levels available,
creating a bitmask with `max_levels` bits set. Originally these bits
all came together. Modify this to spread the bits out, which improves
performance on chips like the G31 with only 2 levels of hierarchy.
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34744>
PRE_POST_FRAME_SHADER_MODE_EARLY_ZS_ALWAYS was introduced in
architecture version 7.2, not 7.0 as we assumed. Using it on
G31 (a 7.0 device) caused some CTS failures.
Cc: mesa-stable
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34744>
Using MemScope::System synchronizes with everything, which is exactly
what we don't want for constant loads. This is currently a no-op
because we aren't using MemScope::Constant pre-Ampere yet.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35217>
Internal shaders can also use slm, so we need to allocate it correctly.
This fixes
dEQP-VK.dgc.ext.compute.misc.max_pc_range_256_full_preprocess_with_execution_set
with NAK_DEBUG=spill
Fixes: 105bdf2e36 ("nvk: Add a helper for dispatching compute shaders")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35143>
It's very common needing to extract or overwrite a certain field
in an already packed register value, so add macros to do that
instead of manually doing that each time.
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35088>
In panfrost_clear_depth_stencil and panfrost_clear_render_target, we
start the blit context before binding the clear targets. If we don't
legalize AFBC beforehand, we get a recursive blit crash. panfrost_clear
does not need this because the resource should already be legalized in
panfrost_batch_add_surface.
Fixes the following piglit tests with pan_force_afbc_packing:
- spec@arb_clear_texture@arb_clear_texture-base-formats
- spec@arb_clear_texture@arb_clear_texture-simple
- spec@arb_clear_texture@arb_clear_texture-sized-formats
Fixes: 17a62ff993 ("panfrost: legalize afbc before blitting")
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34992>
In 59a3e12039, we changed the UBO->push optimization in panfrost to
only push UBOs that are available in a CPU buffer. We require
first_ubo_is_default_ubo, to ensure that UBO0 will be a user buffer. We
weren't setting this flag for the image conversion shaders, so got an
assertion failure compiling them. This can be triggered by the
panvk_force_afbc_packing driconf option.
The conversion shader info UBO isn't exactly a "default" UBO in the
sense of being lowered from uniforms, but it is a user buffer, so
setting the flag should be fine.
Fixes: 59a3e12039 ("panfrost: do not push "true" UBOs")
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34992>
There's no behavior change, but to prepare for the next img2buf blit
improvement, except adding asserts to make clear of the existing blit
code paths.
v2: use switch with unreachable default per @gfxstrand has suggested
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35220>
Add perf ETW events using TraceLogging API, the following are adding:
- MFT receives fence (FenceCompletion).
- MFT has output MFSample (METransformHaveOutput).
- MFT calls to pipe end_frame (PipeEndFrame) -- bracketed.
- MFT calls to pipe flush (PipeFlush) -- bracketed.
- MFT submits a frame to pipe (PipeSubmitFrame) -- bracketed from begine_frame to encode_bitstream/encode_bitstream_sliced
- MFT processinput (ProcessInput) -- bracketed
- MFT processoutput (ProcessOutput) -- bracketed
The ETW provider(s) are:
- H264Enc: 0000e264-0dc9-401d-b9b8-05e4eca4977e
- H265Enc: 0000e265-0dc9-401d-b9b8-05e4eca4977e
- AV1Enc: 0000eaa1-0dc9-401d-b9b8-05e4eca4977e
Note that the provider is mostly the same as the WPPTrace provider for each codec, with the additional 'e' (e.g. 0000e264 vs 00000264)
Reviewed-by: Yubo Xie <yuboxie@microsoft.com>
Reviewed-by: Sil Vilerino <sivileri@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35219>
Allocate TS together with the tracked resource, which gets rid
of the resource mutation on surface creation and the diversion
between the interal and shared TS handling.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34488>
Untangle the convoluted render compatible check from
etna_render_handle_incompatible to make it easier to read and move it
into a separate function so it can be reused from other callers.
As this is intended to be called also at resource creation time, where
we don't know the exact level of the resource that might be rendered to,
the stride check for linear resources is made a bit more conservative by
checking that the last level (the one with the smallest stride) still
meets the render target stride alignment requirement.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34488>
TS is only allocated for single layer surfaces, so there is no need to
cache a ts_offset taking into account the layer offset in etna_surface.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34488>
etna_screen_resource_alloc_ts is only called for textures that have a
single layer and slice, as we don't want to duplicate the driver side
TS tracking information per layer or depth slice. Stop pretending to
support allocating TS for such resources.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34488>
Prevent the function from unnecessarily returning false by:
* Comparing the image tile range with that of every LOD instead of only
LOD0.
* Using the correct comparison check for the exclusive tile end ranges.
Fixes: 8dad01903a ("intel: Add and use isl _surf_image_has_unique_tiles()")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35192>
The BSpec page for Structure_RENDER_SURFACE_STATE says:
"For typed buffer and structured buffer surfaces, the number of
entries in the buffer ranges from 1 to 2^27. For raw buffer
surfaces, the number of entries in the buffer is the number of
bytes which can range from 1 to 2^30. After subtracting one from
the number of entries, software must place the fields of the
resulting 27-bit value into the Height, Width, and Depth fields as
indicated, right-justified in each field. Unused upper bits must be
set to zero."
According to the vkd3d-proton developers, this is what is happening
with the applications:
"There's also the problematic case of games using typed descriptors
but passing non-typed buffer descriptors, which is an extremely
common app bug that works on all D3D12 drivers that we need to work
around by creating typed views."
Previously, we had an assert() to check for "num_elements > (1 <<
27)", but that assert was preventing us from running games such as
Marvel's Spider-Man Remastered and Assassin's Creed: Valhalla in Debug
mode. So not only I removed the assert, but I also made the code clamp
num_elements to the maximum of (1 << 27) based on my incorrect
interpretation of the paragraph quoted above from BSpec.
What I did not realize was that num_elements is being used just to
calculate Structure_RENDER_SURFACE_STATE Height, Width and Depth, and
our register bit fields on SKL and newer are big enough to fit any
number of num_elements up to 2^32, not only 2^27. Clamping
num_elements results in an incorrect value for S.Depth, which
generates visual corruption in some games.
On Marvel's Spider-Man Remastered, without this patch the texture of
the asphalt in some streets (like the very first one you jump to when
the game starts) gets rendered incorrectly.
Testcase: vkd3d-proton/d3d12/test_large_texel_buffer_view
Link: https://github.com/HansKristian-Work/vkd3d-proton/issues/2071
Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12827
Fixes: f3c7e14f09 ("isl: don't assert(num_elements > (1ull << 27))")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35032>