This is really noticeable for games that resolve a bunch of occlusion
queries (in this case 4096) because it seems that emitting 4096
WAIT_REG_MEM packets can stall more than expected. Fixes this by
waiting for queries in the resolve query shader.
This improves performance of an unreleased game by +~10% (71->78 FPS).
RADV should now be really close to Windows performance for that title.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22630>
With the latest changes in opt_algebraic we got f2u32 in the final code
that should be lowered before conversion to assembly.
Fixes: b3685f3ba7
nir/algebraic: insert patterns inside optimizations list
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22640>
(cherry picked from commit 6a78af1dbb)
Previously, continue preambles and postambles were added directly to
the CS array which means all BOs were correctly added to the BO list,
and this has been broken recently. IB BOs need to be added to the list.
When a BO isn't added to the list as part of a submission, it might
randomly VM faults.
This fixes VM faults and random GPU hangs on NAVI21 in Mesa CI.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8849
Fixes: 41a9bced31 ("radv: Fill continue preambles and postambles properly.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22625>
(cherry picked from commit 84d8ea6e2b)
Fixes a pile of
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.*
on a6xx gen2 and later.
Fixes: 87978c3933 ("freedreno/a6xx: Allow z24s8 format casts")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22610>
(cherry picked from commit b83af7e5b8)
For MTL (verx10 == 125), float64 is supported, but int64 is not.
Therefore we need to lower cluster broadcast using 32-bit int ops.
For gfx12.5+ platforms that support int64, the register regions
used by cluster broadcast aren't supported by the 64-bit pipeline.
On MTL, dEQP-VK.subgroups.clustered.*_double* and
dEQP-VK.subgroups.clustered.*_dvec* were failing to validate the
compiled shader in debug mode, and reportedly gpu-hanging in release
mode.
With this change dEQP-VK.subgroups.clustered.*_double* passed all 48
tests and dEQP-VK.subgroups.clustered.*_dvec* passed all 140 tests on
MTL.
Rework:
* Move from generator to brw_fs_lower_regioning.cpp. (Suggested by
Francisco)
* Apply to verx10 >= 125.. (Suggested by Francisco)
Cc: 23.1 <mesa-stable>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> (v1)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22569>
(cherry picked from commit fcb72ffd0c)
So far we were only considering the number of vertices to draw to
compute the offset in a stream output buffer.
But this is not correct, as it depends on the primitive type too. For
instance, with 4 vertices, if we use a triangle strip primitive, then 2
triangles are generated from those 4 vertices, so 6 vertices will be
captured.
This fixes spec@!opengl es
3.0@gles-3.0-transform-feedback-uniform-buffer-object.
CC: 23.1
Reviewed-by: Emma Anholt <emma@anholt.net>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22607>
(cherry picked from commit a86d18a8c4)
It appears to be possible that IDLE is observed before COMPLETE.
In this case, an application may access present_id in subsequent
QueuePresentKHR and race against the fence worker reading present_id.
Solve this by adding a separate signal_present_id that is used when
completing to avoid the race.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Adam Jackson <ajax@redhat.com>
(cherry picked from commit 32f7ff2c20)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22637>
if a winsys is allocated by the frontend, it should be freed by the frontend
rather than the driver to ensure it doesn't leak if it doesn't reach
the driver
cc: mesa-stable
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22396>
(cherry picked from commit 1e6e3427f0)
VK_FORMAT_FEATURE_2_COLOR_ATTACHMENT_BIT is equal to
VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT but we want COLOR_ATTACHMENT_BIT.
Found by inspection.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22540>
(cherry picked from commit 72a522fb36)
For some reasons this seems broken only on GFX6. Note that PAL doesn't
allowed block-compressed with 1D on all GPUs.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22540>
(cherry picked from commit 8a2fab66de)
mantissa needs to be at the lower part for shift left.
This fixes large integer value conversion.
Cc: mesa-stable
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22570>
(cherry picked from commit 6a39d35df0)
- VAR31 was ignored.
- Only a half of the 16-bit slot was passed through, though I'm not sure
if nir_lower_io handles vec8. The slots are only for GLES and I don't
think a passthrough TCS is possible with GLES.
Fixes: a8e84f50bc - nir: Add helper to create passthrough TCS shader
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21861>
(cherry picked from commit ace8a7068e)
The buffer max_index value in translate_generic struct is relevant for
indexed draw only. So do not clamp the element index in generic_run() as it
is called for non-indexed draw only.
This patch passes index_size to the common generic_run_one function
so index clamping is only performed when a non-zero index_size is specified.
This fixes a text selection bug with kitty terminal emulator running on ARM
when it falls back to the generic translate path for unsigned byte vertex
array.
cc: mesa-stable
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22568>
(cherry picked from commit 13e885842a)
if an attachment other than the msrtss blit attachment has clears pending,
unbinding the other attachment will trigger a clear flush, which will then
recurse into the msrtss blit that's being triggered
instead, save/restore these clears around the msrtss blit since they
can be executed during the normal renderpass
(cherry picked from commit 82add9f2e9)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22598>
with the new zsbuf elimination handling, the fb state calculated in
u_blitter's fb restore may be incorrect if the zsbuf has indeed been
eliminated, so ensure the right fb is stored to be reapplied so that
misrenders will be avoided
fixes some crashes/misrenders in webgl
(cherry picked from commit ec0860b401)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22598>
Same/similar issues are seen on MTL platform as DG2 so disable for both.
Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22435>
(cherry picked from commit d561bac6bb)
Since Gfx12+ 3DSTATE_STENCIL_BUFFER gained its own
Width/Depth/Format/etc... fields. So don't set those fields but leave
the address/pitch to 0.
Issue found on simulation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15637>
(cherry picked from commit 3ca1fdc8b5)
Indeed, the current framebuffer hardcoded cleanup
is not sufficient.
For instance, this issue is triggered with:
"piglit/bin/fbo-depthstencil clear default_fb -samples=2 -auto"
while setting GALLIUM_REFCNT_LOG=refcnt.log.
cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22234>
(cherry picked from commit 035b84f308)
We can't map the CCS memory region. So, rely on the kernel's zeroing of
new allocations. This is helpful when creating dmabufs that use
compression.
Cc: mesa-stable
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22487>
(cherry picked from commit b2d7386631)