Some GPU hangs witnessed in the wild on RDNA4 in Control and Arc Raiders
seem to point towards closest-hit shaders reading a stale value for the
SGPR pair containing the currently-executing shader's address.
This SGPR pair was read by VALU in the preceding traversal shader,
making it susceptible to VALUReadSGPRHazard. Inserting
VALUReadSGPRHazard mitigations before accessing the s_setpc target seems
to fix the hang. We don't have conclusive proof that this is hazardous,
but given that all signs point towards it and we have a reasonably
simple workaround, let's roll with this for now to mitigate the hangs.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38290>
This basically remaps color attachment formats for the resolve
operation.
Co-Authored-by: Connor Abbott <cwabbott0@gmail.com>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38442>
When no workgroup size is specified we try to run with the most optimal one
possible. However we didn't take into account that we shouldn't run a
workgroup of higher dimensionality than requested by the application.
Fixes: 376d1e6667 ("rusticl: implement cl_khr_suggested_local_work_size")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38375>
There were two issues:
1. The global_work_offset parameter is optional but we errored on NULL
2. We didn't return the reqd_work_group_size when set on the kernel.
Fixes: 376d1e6667 ("rusticl: implement cl_khr_suggested_local_work_size")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38375>
There is no disable_screen_content_tools in AV1 spec, instead this
should be seq_choose_screen_content_tools. But we don't need that either
as we keep the effective value in force_screen_content_tools.
Same for seq_choose_integer_mv and force_integer_mv.
Also stop overriding these values and instead fix frame header coding
to work with all combinations.
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38260>
Before this commit, nested loops aren't counted correctly:
-------------
V |
-> A --> B --> C ->
^ |
-------
A is both predecessor and successor of B but A isn't in B's loop.
Instead a block B is in loop header H's block if H is the successor
of B and H dominates B.
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38393>
Instead of inserting the spill instruction before the instruction that
caused the spill, instead insert it either right after the definition
or at the end of the block that contains the definition.
This helps reduce code size and also moves STOREs outside of loops on
average.
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38238>
_math_matrix_is_dirty() should only be used to decide if we need to
run _math_matrix_analyse(). We already decided that we had a new
texture matrix when we called _mesa_update_texture_matrices() so
we need to set _TexMatEnabled correctly otherwise we might
incorrectly return _NEW_FF_VERT_PROGRAM | _NEW_FF_FRAG_PROGRAM in
the following if-statement.
Fixes: ec978e002f ("mesa: only update fixed-func programs on texture matrix enablement changes")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14286
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38473>
This add support to the lowering the reduction operations.
Thanks to Georg Lehmann for a lot of the ideas and optimising in
this.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38389>
This adds some new call operations to handle various parts of the
reductions.
cmat_reduce: is the initial toplevel operation from SPIR-V
this is used after lowering for row/col operation on single hw
supported matrix sizes. The spir-v operation is lowered into
multiple of these on flex dimensions, but also can be lowered into
others.
cmat_reduce_finish:
after multiple reduction operations on a flexible dimension matrix,
there is often subsequent operations on the output matrices to
finish the operation.
cmat_reduce_2x2:
this takes 4 input matrices, and 1 dst to do a 2x2 reduction op.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38389>
With coopmat2 a bunch of functions need a lot of lowering passes
to happen before they can be lowered, so mark them as to be lowered
later.
Drivers needing these should call the nir_remove_non_cmat_call_entrypoints
where they remove entrypoints now, and call the original nir_remove_non_entrypoints
after lowering coopmat2.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38389>
This adds a new instruction type to handle cooperative matrix calls.
This clones the call instr, drops callee, and adds a single metadata
slot and a call operation (dummy only for now).
(Not NACKed by Alyssa)
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38389>
If the shader has flat varyings but API level flatshade is disabled, we
need to switch to flat shading in PA.CONFIG to ensure proper interpolation.
Passes dEQP-GLES3.functional.rasterization.flatshading.* on GC7000.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Tested-by: Daniel Lang <dalang@gmx.at>
Reviewed-by: Daniel Lang <dalang@gmx.at>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36404>
Build utilities to retrieve the build id are now exposed
through the new HAVE_BUILD_ID instead of HAVE_DL_ITERATE_PHDR
since this will allow adding support for platforms that do
not support HAVE_DL_ITERATE_PHDR
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38420>
MESA_KK_DISABLE_WORKAROUNDS provides a way to disable workarounds
we've had to apply to get Vulkan conformance. In hopes that Metal
bugs get fixed in upcoming macOS releases.
Reviewed-by: Arcady Goldmints-Orlov <arcady@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38426>
This extension seems to be supported on GC3000 (HALTI2) and later hardware.
While no explicit feature bit documents this capability, testing
confirms that the required vertex formats work correctly on these GPUs.
This patch adds the missing B10G10R10A2 vertex format variants
(UNORM, SNORM, USCALED, SSCALED), gates support behind the HALTI2
feature check, and updates features.txt to reflect the new capability.
All relevant piglit tests pass.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38446>
The VMA of VkDeviceMemory has to accomodate all the resources that can
be bound to it. For sparse images it's 64KiB alignment, for other
tiled images it's 4KiB. But we also have a workaround that requires a
64KiB alignment for Tile4 images.
The initial version of the slab allocator missed the 4KiB alignment.
This fix adds the workaround handling too.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: dabb012423 ("anv: Implement anv_slab_bo and enable memory pool")
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38480>
This is useful for running multiple dependent precomp shaders in a row.
We could do this using PANLIB_BARRIER_CSF_WAIT and then immediately wait
on the syncobj, but it's a little silly to do that on the same subqueue.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Acked-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37971>
In some cases, we don't need to signal the syncobj yet because we will
be issuing a second asynchronous instruction afterwards blocking on the
first.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Acked-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37971>
SB_ID(LS) is currently equal to zero, so this is not a behavior change,
but worth setting it explicitly for clarity and in case the sb
assignments change.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Fixes: 885805560f ("panvk/csf: fix case where vk_meta is used before PROVOKING_VERTEX_MODE_LAST")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38458>
We check fn_set_fbds_provoking_vertex_stride == 0 to determine whether a
previous function variant has already been allocated, so this value must
be initialized to zero before we start the loop. We could fix this by
explicitly initializing just that field, but I figure it's simpler and
safer to just zero-initialize the whole struct.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Fixes: 885805560f ("panvk/csf: fix case where vk_meta is used before PROVOKING_VERTEX_MODE_LAST")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38458>
This helps for debug when wanting to check which pipeline mode the
driver has selected for a given section of a frame.
Signed-off-by: Casey Bowman <casey.g.bowman@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38317>