When we have partial VGRF MOVs with offsets, we will reach
`channels_remaining == 0` with `inst` that is not writing the whole VGRF.
Currently, even though we check `can_coalesce_vars()` for each offset
separately, it will always check if the dst value is not changed only
for the offset from the instruction that satisfied the
`channels_remaining == 0` condition.
Instead, we should remember and use the correct instruction for each
written offset separately.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10916
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35062>
(cherry picked from commit 0e3e5146cf)
Applications using Mesa built with LLVM 20.1.4 fail to start with
strange segmentfaults/bus errors when radeonsi driver is used. The last
piece of stacktrace looks like
- pipe_reference_described
- pipe_reference
- radeon_bo_reference
- radeon_ws_bo_reference
- radeon_lookup_or_add_real_buffer
Coredump shows the pointer dst passed to pipe_reference_described() is
either unaligned or even invalid, which is the reason of crashing. The
crash goes away when Mesa is built without optimization.
Looking through the related functions, it's found that
radeon_ws_bo_reference() contains unsafe type cast from radeon_bo to
pb_buffer_lean: though the former's first field is just the later, this
violates strict aliasing rules as pb_buffer_lean isn't compatible with
radeon_bo. Such violation ultimately results in miscompilation.
Let's take the address of pb_buffer_lean field, avoiding the unsafe
cast. It's still required to cast pb_buffer_lean back to radeon_bo since
radeon_bo_reference may update the pointer, which is safe as radeon_bo
contains a pb_buffer_lean member and C language permits access members
through a pointer in type of the container.
Fixes: 6d913a2bcc ("r300,r600,radeonsi: switch to pb_buffer_lean")
Link: https://www.gnu.org/software/c-intro-and-ref/manual/html_node/Aliasing-Type-Rules.html
Signed-off-by: Yao Zi <ziyao@disroot.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35249>
(cherry picked from commit b1d81a7df1)
PRE_POST_FRAME_SHADER_MODE_EARLY_ZS_ALWAYS was introduced in
architecture version 7.2, not 7.0 as we assumed. Using it on
G31 (a 7.0 device) caused some CTS failures.
Cc: mesa-stable
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34744>
(cherry picked from commit 13b35a3c9c)
Internal shaders can also use slm, so we need to allocate it correctly.
This fixes
dEQP-VK.dgc.ext.compute.misc.max_pc_range_256_full_preprocess_with_execution_set
with NAK_DEBUG=spill
Fixes: 105bdf2e36 ("nvk: Add a helper for dispatching compute shaders")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35143>
(cherry picked from commit 0e5880ebe4)
In panfrost_clear_depth_stencil and panfrost_clear_render_target, we
start the blit context before binding the clear targets. If we don't
legalize AFBC beforehand, we get a recursive blit crash. panfrost_clear
does not need this because the resource should already be legalized in
panfrost_batch_add_surface.
Fixes the following piglit tests with pan_force_afbc_packing:
- spec@arb_clear_texture@arb_clear_texture-base-formats
- spec@arb_clear_texture@arb_clear_texture-simple
- spec@arb_clear_texture@arb_clear_texture-sized-formats
Fixes: 17a62ff993 ("panfrost: legalize afbc before blitting")
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34992>
(cherry picked from commit 104ea2e4cf)
In 59a3e12039, we changed the UBO->push optimization in panfrost to
only push UBOs that are available in a CPU buffer. We require
first_ubo_is_default_ubo, to ensure that UBO0 will be a user buffer. We
weren't setting this flag for the image conversion shaders, so got an
assertion failure compiling them. This can be triggered by the
panvk_force_afbc_packing driconf option.
The conversion shader info UBO isn't exactly a "default" UBO in the
sense of being lowered from uniforms, but it is a user buffer, so
setting the flag should be fine.
Fixes: 59a3e12039 ("panfrost: do not push "true" UBOs")
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34992>
(cherry picked from commit bed54fa402)
The BSpec page for Structure_RENDER_SURFACE_STATE says:
"For typed buffer and structured buffer surfaces, the number of
entries in the buffer ranges from 1 to 2^27. For raw buffer
surfaces, the number of entries in the buffer is the number of
bytes which can range from 1 to 2^30. After subtracting one from
the number of entries, software must place the fields of the
resulting 27-bit value into the Height, Width, and Depth fields as
indicated, right-justified in each field. Unused upper bits must be
set to zero."
According to the vkd3d-proton developers, this is what is happening
with the applications:
"There's also the problematic case of games using typed descriptors
but passing non-typed buffer descriptors, which is an extremely
common app bug that works on all D3D12 drivers that we need to work
around by creating typed views."
Previously, we had an assert() to check for "num_elements > (1 <<
27)", but that assert was preventing us from running games such as
Marvel's Spider-Man Remastered and Assassin's Creed: Valhalla in Debug
mode. So not only I removed the assert, but I also made the code clamp
num_elements to the maximum of (1 << 27) based on my incorrect
interpretation of the paragraph quoted above from BSpec.
What I did not realize was that num_elements is being used just to
calculate Structure_RENDER_SURFACE_STATE Height, Width and Depth, and
our register bit fields on SKL and newer are big enough to fit any
number of num_elements up to 2^32, not only 2^27. Clamping
num_elements results in an incorrect value for S.Depth, which
generates visual corruption in some games.
On Marvel's Spider-Man Remastered, without this patch the texture of
the asphalt in some streets (like the very first one you jump to when
the game starts) gets rendered incorrectly.
Testcase: vkd3d-proton/d3d12/test_large_texel_buffer_view
Link: https://github.com/HansKristian-Work/vkd3d-proton/issues/2071
Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12827
Fixes: f3c7e14f09 ("isl: don't assert(num_elements > (1ull << 27))")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35032>
(cherry picked from commit ecc90e1bb3)
We should use the cl_slice code to get proper validation, which also makes
it simpler to read out data and gets rid of some UB there.
This also fixes CL_KERNEL_EXEC_INFO_SVM_PTRS with param_value being null.
Cc: mesa-stable
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32942>
(cherry picked from commit 35a9829391)
Use the param type, not the referenced variable. The referenced variable
can be a structure, which wouldn't be recognized as a sampler or image.
Fixes: 733bee57eb - glsl: lower samplers with highp coordinates correctly
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Dieter Nützel Dieter@nuetzel-hh.de on gfx8 (Polaris 20)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34959>
(cherry picked from commit bd5d623674)
Normal fmin doesn't make any promises about NaN, common additionally
doesn't make any promises about infinities. Would be nice to hook that
up to codegen but lowering them to normal works for now.
Cc: mesa-stable
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34941>
(cherry picked from commit 4b1c824b67)
The sparse image VA needs to be returned to the application for replay.
Reported by Baldur.
VKCTS has coverage but it doesn't verify this yet.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35162>
(cherry picked from commit 63758bc093)
This function was removed about a decade ago, let's get rid of the
prototype as well!
Fixes: a347a0f53f ("mesa: Completely remove QuerySamplesForFormat from driver func table")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35184>
(cherry picked from commit 439b88c619)
For partial secondary cmdbufs, we emit FBDs/TDs in the primary cmdbuf
before calling the secondary. In order to set the provoking vertex mode
correctly here, we need to look at the mode set by pipelines bound in
the secondary cmdbuf.
This leaves one edge case: reemitting FBDs/TDs in a secondary cmdbuf
after a flush. If the secondary cmdbuf only contains vk_meta draws,
without ever binding a pipeline, we won't know which provoking vertex
mode to use here. This is actually okay, because in that case the
provoking vertex mode doesn't matter for any of the draws in the
secondary, and the FBDs/TDs will be reemitted on the primary with the
correct mode.
Fixes: 7a9f14d3c2 ("panvk: advertise VK_EXT_provoking_vertex")
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Tested-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Ryan Mckeever <ryan.mckeever@collabora.com>
(cherry picked from commit 65406cf500)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35194>
In this case, we need to emit the FBDs and TDs for the meta command
before we know what provoking vertex mode the application is going to
use. To handle this, we make a guess for which provoking vertex mode we
need. Then we use cs_maybe to leave space to flip the provoking vertex
bit if the guess was wrong.
This case is still unhandled on JM.
Fixes: 7a9f14d3c2 ("panvk: advertise VK_EXT_provoking_vertex")
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Tested-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Ryan Mckeever <ryan.mckeever@collabora.com>
(cherry picked from commit 885805560f)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35194>
Because we advertise provokingVertexModePerPipeline=false, the provoking
vertex mode must be set the same for all pipelines used in a renderpass.
vk_meta doesn't care about the provoking vertex mode, but the vulkan api
doesn't provide a way to express this, so it always sets
PROVOKING_VERTEX_MODE_FIRST (the vulkan default). This causes an
assertion failure when vk_meta is used in a renderpass where the
application sets PROVOKING_VERTEX_MODE_LAST.
There are a few different cases here, that need different handling. The
simplest is when vk_meta is used after the first application draw, in
which case we can just ignore the state passed by vk_meta and use the
existing state.
Fixes: 7a9f14d3c2 ("panvk: advertise VK_EXT_provoking_vertex")
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Tested-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Ryan Mckeever <ryan.mckeever@collabora.com>
(cherry picked from commit 4d99346477)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35194>
This is needed to handle the provoking vertex mode correctly. vk_meta
doesn't care which provoking vertex mode is used, but there is no way to
express this directly in the vulkan api.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Tested-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Ryan Mckeever <ryan.mckeever@collabora.com>
(cherry picked from commit 32177b99d5)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35194>
The tiler OOM exception handler allocated a region of memory to dump
save/restored registers. For defining more functions in the future, we
allocate a register dump region for each subqueue, that can hold the
largest number of registers needed by any functions executed on that
subqueue.
This does mean that we cannot have function calls more than one deep. If
we ever need nested function calls, we will have to consider a real
stack.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Tested-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Ryan Mckeever <ryan.mckeever@collabora.com>
(cherry picked from commit d60c688317)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35194>
The register save/restore machinery is useful for more general callable
functions, not just exception handlers.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Tested-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Ryan Mckeever <ryan.mckeever@collabora.com>
(cherry picked from commit 61e7d47270)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35194>
We have an edge case with VK_EXT_provoking_vertex where we may need to
emit FBDs and TDs before we know what provoking vertex mode the
application is using for the renderpass. To handle this, we want to
retroactively patch the provoking vertex bit. This commit introduces an
abstraction to do that.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Tested-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Ryan Mckeever <ryan.mckeever@collabora.com>
(cherry picked from commit 83bb97796b)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35194>
tc_sync reuses the same batch, which breaks the current disambiguation
methods by returning !busy for work which is currently executing
on the reused batch
by also tracking the completed generation, this scenario is detected
and disambuguated
Fixes: 9cc06f817c ("tc: allow unsynchronized texture_subdata calls where possible")
(cherry picked from commit b89e0fa226)
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35204>
Apply the direct dispatch WLS instance limit to panfrost as well to keep
compute jobs with large workgroup counts from running out of memory.
Fixes: 1304f4578d ("panfrost: Adapt emit_shared_memory for indirect dispatch")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: John Anthony <john.anthony@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34979>
(cherry picked from commit 64ce37b2d9)
Apply the direct dispatch WLS instance limit to PanVK/JM as well to keep
compute jobs with large workgroup counts from hitting
VK_ERROR_OUT_OF_DEVICE_MEMORY.
Fixes: 005703e5b5 ("panvk: Move TLS preparation logic to cmd_dispatch_prepare_tls"
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: John Anthony <john.anthony@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34979>
(cherry picked from commit e6e406de0e)
During direct dispatch, we calculate the size of the WLS allocation
based on the number of WLS instances which is an unbounded calculation
on number of workgroups.
This leads to extreme allocation sizes and potentially
VK_ERROR_OUT_OF_DEVICE_MEMORY for direct dispatches with a high amount
of workgroups.
This change adds an upper bound to the number of WLS instances, using
the same value we assume for indirect dispatches.
Additionally, this commit fixes the WLS max instance calculation (which
should be per core).
Fixes: 5544d39f44 ("panvk: Add a CSF backend for panvk_queue/cmd_buffer")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: John Anthony <john.anthony@arm.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34979>
(cherry picked from commit 0a47a1cb6d)
The CSF version of dispatch_precomp allocates TLS/WLS prior to calling
cmd_dispatch_prepare_tls, which will do the same.
This commit removes this unnecessary allocation.
Fixes: cc02c5deb4 ("panvk: Implement precomp dispatch")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: John Anthony <john.anthony@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34979>
(cherry picked from commit a6c7a774ab)
This update is aimed at fixing pop-free clipping and follows
the advices by Vitaliy Kuzmin: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12440
This functionality requires calculating the value of the following two
registers: PA_CL_GB_HORZ_DISC_ADJ and PA_CL_GB_VERT_DISC_ADJ. These two
registers are available on all the gpus of the r600 family.
This code is built on the backport of radeonsi updates which are relevant
to this very functionality:
57e658d041 "radeonsi: rework how guardband registers are updated to decrease overhead"
146c2b7c28 "radeonsi: adjust clip discard based on line width / point size"
4d74432dd3 "radeonsi: don't discard points and lines"
63680471f9 "radeonsi: remove si_context::{scissor_enabled,clip_halfz}"
This change was tested on rv770, barts and cayman:
deqp-gles[2-3]/functional/clipping/line/wide_line_clip_viewport_center: fail pass
deqp-gles[2-3]/functional/clipping/line/wide_line_clip_viewport_corner: fail pass
deqp-gles[2-3]/functional/clipping/point/wide_point_clip: fail pass
deqp-gles[2-3]/functional/clipping/point/wide_point_clip_viewport_center: fail pass
deqp-gles[2-3]/functional/clipping/point/wide_point_clip_viewport_corner: fail pass
Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35052>
(cherry picked from commit df2c774a83)
We still allow mesh shader promote constant output to flat, but
mesh shader like geometry shader may store multi vertices'
varying in a single thread. So mesh shader may store different
constant values to different vertices in a single thread, we
should not promote this case to flat.
I'm not using shader_info.mesh.ms_cross_invocation_output_access
because OpenGL does not require IO to have explicit location, so
when nir_shader_gather_info is called in OpenGL GLSL compiler to
compute ms_cross_invocation_output_access, some implicit output
has -1 location which causes ms_cross_invocation_output_access
unset for it.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13134
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35081>
(cherry picked from commit 6f2a1e19da)
Here we allow packed stencils to skip the completeness check also.
Will be used in the following patch for a bug in the game Foundation.
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35107>
(cherry picked from commit 27945bbd8a)
These lines look like they were mistakenly introduced, and cause a
significant perf hit. Eg. this fix improves the Horizon Zero Dawn
in-game benchamark by ~42% on my ampere machine (5992 pts -> 8517 pts).
Fixes: d16e75e55f ("nak: Lower texture inputs for Kepler B")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35100>
(cherry picked from commit 9d620fabd2)