Some limits got stuck to the old binding table limits. Those don't
apply anymore since EXT_descriptor_indexing was implemented.
Fixes: 6e230d7607 ("anv: Implement VK_EXT_descriptor_indexing")
Fixes: 96c33fb027 ("anv: enable direct descriptors on platforms with extended bindless offset")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31999>
(cherry picked from commit d6acb56f11)
etnaviv_ml.h uses dynarray, but the u_inlines.h header is needed by
some of the files that include it.
Fixes: d6473ce28e ("etnaviv: Use NN cores to accelerate convolutions")
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31842>
(cherry picked from commit 70bff0c971)
Because dyn_start and dyn_end are indices into
nvk_root_descriptor_table->dynamic_buffers, we would need to offset
cbuf->dynamic_idx by
nvk_root_descriptor_table->set_dynamic_buffer_start[cbuf->desc_set]
in order to do those comparisons correctly.
We could do that, but it's simpler and no less precise to sinply
re-use the same comparison that we do in the other cases here.
This fixes a rendering artifact in Baldur's Gate 3 (Vulkan), which
regressed with the commit listed below.
Fixes: 091a945b57 ("nvk: Be much more conservative about rebinding cbufs")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32086>
(cherry picked from commit dc12c78235)
Previously, we were passing the end index which was incorrect.
Also, improve the macros so that they can take an expression for
the count.
Fixes: b2d85ca36f ("nvk: Use helper macros for accessing root descriptors")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32086>
(cherry picked from commit 64f17c1391)
The Early-Z optimization is disabled when there is a discard
instruction in the shader used in the draw call.
But if discard is the only reason to disable Early-Z, and at
draw call time the updates in the draw call are disabled we
can enable Early-Z using a shader variant.
If there are occlussion queries active we also need to disable
Early-z optimization.
So this patch enables Early-Z in this scenario.
The performance improvement is significant when running gfxbench
benchmark showing an average improvement of 11.15%
fps_avg helped: gl_gfxbench_aztec_high.trace: 3.13 -> 3.73 (19.13%)
fps_avg helped: gl_gfxbench_aztec.trace: 4.82 -> 5.68 (17.88%)
fps_avg helped: gl_gfxbench_manhattan31.trace: 5.10 -> 6.00 (17.59%)
fps_avg helped: gl_gfxbench_manhattan.trace: 7.24 -> 8.36 (15.52%)
fps_avg helped: gl_gfxbench_trex.trace: 19.25 -> 20.17 ( 4.81%)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32028>
(cherry picked from commit 5b951bcdd7)
We can end up calling vk_multialloc_alloc with 0 size when
`attachment_count` is 0 and `clearValueCount` is 0.
Addressed:
```
Direct leak of 1 byte(s) in 1 object(s) allocated from:
#0 0x7faf033ee0 in __interceptor_malloc
../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
#1 0x7fada5cc10 in vk_default_alloc ../src/vulkan/util/vk_alloc.c:26
#2 0x7fac50b270 in vk_alloc ../src/vulkan/util/vk_alloc.h:48
#3 0x7fac555040 in vk_multialloc_alloc
../src/vulkan/util/vk_alloc.h:234
#4 0x7fac555040 in void
tu_CmdBeginRenderPass2<(chip)7>(VkCommandBuffer_T*,
VkRenderPassBeginInfo const*, VkSubpassBeginInfo const*)
../src/freedreno/vulkan/tu_cmd_buffer.cc:4634
#5 0x7fac900760 in vk_common_CmdBeginRenderPass
../src/vulkan/runtime/vk_render_pass.c:261
```
seen in:
dEQP-VK.robustness.robustness2.bind.notemplate.r32i.dontunroll.nonvolatile.uniform_texel_buffer.no_fmt_qual.len_252.samples_1.1d.frag
Fixes: 4cfd021e3f ("turnip: Save the renderpass's clear values in the cmdbuf state.")
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32057>
(cherry picked from commit c923eff742)
Non-trivial collects (i.e., ones that will introduce moves because the
sources don't line-up with the destination) may cause source intervals
to get implicitly moved when they are inserted as children of the
destination interval. Since we don't support moving intervals in shared
RA, this may cause illegal register allocations. Prevent this by
creating a new top-level interval for the destination so that the source
intervals will be left alone.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: fa22b0901a ("ir3/ra: Add specialized shared register RA/spilling")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31978>
(cherry picked from commit b36a7ce0f1)
Otherwise anv_descriptor_set is accessed through an unaligned pointer,
which is undefined behavior in C.
```
anv_descriptor_set.c:1620:17: runtime error: member access within misaligned address 0x61900002c2b5
for type 'struct anv_descriptor_set', which requires 8 byte alignment 0x61900002c2b5
```
Fixes: 2570a58bcd ("anv: Implement descriptor pools")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32070>
(cherry picked from commit a2c4a34303)
Copy propagation would incorrectly occur in this code
mov(16) v4+2.0:UW, u0<0>:UW NoMask
...
mov(8) v6+2.0:UD, v4+2.0:UD NoMask group0
to create
mov(16) v4+2.0:UW, u0<0>:UW NoMask
...
mov(8) v6+2.0:UD, u0<0>:UD NoMask group0
This has different behavior. I think I just made a mistake when I
changed this condition in e3f502e007.
It seems like this condition could be relaxed to cover cases like (note
the change of destination stride)
mov(16) v4+2.0<2>:UW, u0<0>:UW NoMask
...
mov(8) v6+2.0:UD, v4+2.0:UD NoMask group0
I'm not sure it's worth it.
No shader-db or fossil-db changes on any Intel platform. Even the code
for the test case mentioned in the original commit did not change.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: e3f502e007 ("intel/fs: Allow copy propagation between MOVs of mixed sizes")
Closes: #12116
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
(cherry picked from commit 80a5d158ae)
Fixes: 212f1ab40e ("nvc0: support PIPE_CAP_RESOURCE_FROM_USER_MEMORY_COMPUTE_ONLY")
Acked-by: David Heidelberg <david@ixit.cz>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27783>
(cherry picked from commit 277925471e)
Having the RGB* formats before the BGR* formats in the table causes
problems where under some circumstances, some applications end up
with the wrong colors.
The repro case for me is: Xvnc + mutter + chromium
There was an existing comment in dri_fill_in_modes() which explained
the problem. This was lost when dril_target.c was created.
Fixes: ec7afd2c24 ("dril: rework config creation")
Fixes: 3de62b2f9a ("gallium/dril: Compatibility stub for the legacy DRI loader interface")
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31950>
(cherry picked from commit e1098310da)
When etna_screen_create(..) is called with gpu != NULL and npu == NULL,
screen->pipe_nn is incorrectly set up. This leads to an unintended
stream configuration for compute-only contexts, as determined by
pipe = (compute_only && screen->pipe_nn) ? screen->pipe_nn : screen->pipe;
To address this, extend the gpu != npu condition by adding a check for
npu != NULL to ensure pipe_nn is only initialized when both gpu and npu
are provided.
Fixes: a4653587cc ("etnaviv: Add a separate NPU pipe")
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32025>
(cherry picked from commit f4e8849d79)
Also change ifdef's from just `HAVE_LIBDRM` to check for both LIBDRM
and for UDMABUF HEADER. preventing unbalanced guards preventing part of
the code from being included if you just have LIBDRM or just have the
udmabuf headers.
Fixes: 4cfaf10c ("llvmpipe: Only use udmabuf with libdrm")
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31877>
(cherry picked from commit 159fb9691d)
Fixes KHR-GL46.multi_bind.dispatch_bind_image_textures which decides
max_image_samples==1 means that MSAA image load/store is supported.
Switch the condition to > 0, which matches what zink does.
Fixes: e277b13182 ("freedreno: Stop exposing MSAA image load/store on desktop GL.")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31857>
(cherry picked from commit f8e7c0e2a2)
This change fixes the calculation of the number of cubemap
images which requires a 6x multiplier.
This commit is inspired from nir_lower_robust_access and fixes
at least the following tests on cayman:
spec/arb_shader_image_load_store/layer/imagecubearray/layered binding test: fail pass
spec/arb_shader_image_load_store/max-size/imagecubearray max size test/8x8x2046x1: fail pass
Fixes: 27f5157777 ("r600/sfn: Add lowering pass to legalize image access")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31626>
(cherry picked from commit 5c63d7a916)
According to PAL, an image with DCC/HTILE and mipmaps isn't coherent
with L2 when the mip level is in the metadata mip-tail region.
This fix isn't super optimal because the driver should rely on the
subresource range to determine if the mip level is in the mip-tail,
but it's easier to backport. Upcoming commits will optimize that.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11939
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31920>
(cherry picked from commit b23cc8c1d3)
This is a workaround that is still in progress, see HSD 22020521218.
If we don't have these NOPs we may see rendering corruption or even
GPU hangs.
While we still don't fully understand the issue from the hardware
point of view, let's have this workaround so we can pass CTS and move
things forward. If we need to change this later, we can. Besides, the
impact is minimal. Shaderdb/fossilize report no changes for this
patch.
On our Blackops trace, the lack of this patch causes corruption in fog
rendering (rectangles where fog was supposed to be shown don't show
the fog).
On dEQP-VK.graphicsfuzz.cov-array-copies-loops-with-limiters, without
this patch we get a GPU hang.
Backport-to: 24.2
Testcase: dEQP-VK.graphicsfuzz.cov-array-copies-loops-with-limiters
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11813
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31331>
(cherry picked from commit 5ca883505e)
This bit is not needed for barriers and appears to trigger a
performance regression. So leave it for just for AUX-TT
flushing/invalidation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e3814dee1a ("anv: add plumbing/support for L3 fabric flush")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12090
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31915>
(cherry picked from commit cb224370b6)
Detect when the temporary index buffer cannot be generated due to too
large primitive count, and simply drop the draw on the floor.
Fixes a webgl reachable asan/crash.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12092
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31914>
(cherry picked from commit 98ff271c5a)
This fixes a CTS hang on Hawaii.
We previously only did a CB/DB flush,
but that doesn't include a L2 cache flush.
Also fix the comment that said this is for GFX9+.
Fixes: 7c62f6fa01
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31906>
(cherry picked from commit 96b95c8427)
We may still need to insert a continue block even if there is only one
backedge, in a situation like:
for (...) {
if (...) continue;
foo();
break;
}
We want foo() to be executed before reconverging. This is important for
the BVH encoding kernel, which launches an invocation for each node in
the tree and does a preorder traversal:
while (true) {
if (!ready[node]) continue;
encode();
for (child node)
ready[child] = true;
break;
}
For the first few nodes, which will be in the same wave, we need
encode() for the root node to be called first, then its children spin
until ready, then the children call encode(), and so on. This can only
work if the children that aren't ready yet are parked while the parent
executes encode(), which requires the continue block.
This is also required because divergence analysis will assume that
uniform values written before the continue are still uniform after it,
which isn't the case now and causes an RA validation failure with Godot.
Fixes: 0fa93fb662 ("ir3: Fix convergence behavior for loops with continues")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31905>
(cherry picked from commit d3533716f9)
Computing 'htile_size/meta_size' is allowed for RADEON_SURF_MODE_1D when
RADEON_SURF_TC_COMPATIBLE_HTILE isn't set.
Lacking of computing causes performance degradation in some scenarios.
Fixes: d4d9ec55c5 ("radeonsi: implement TC-compatible HTILE")
Signed-off-by: Lu Yao <yaolu@kylinos.cn>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31617>
(cherry picked from commit 0442a6c292)