The Early-Z optimization is disabled when there is a discard
instruction in the shader used in the draw call.
But if discard is the only reason to disable Early-Z, and at
draw call time the updates in the draw call are disabled we
can enable Early-Z using a shader variant.
If there are occlussion queries active we also need to disable
Early-z optimization.
So this patch enables Early-Z in this scenario.
The performance improvement is significant when running gfxbench
benchmark showing an average improvement of 11.15%
fps_avg helped: gl_gfxbench_aztec_high.trace: 3.13 -> 3.73 (19.13%)
fps_avg helped: gl_gfxbench_aztec.trace: 4.82 -> 5.68 (17.88%)
fps_avg helped: gl_gfxbench_manhattan31.trace: 5.10 -> 6.00 (17.59%)
fps_avg helped: gl_gfxbench_manhattan.trace: 7.24 -> 8.36 (15.52%)
fps_avg helped: gl_gfxbench_trex.trace: 19.25 -> 20.17 ( 4.81%)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32028>
We can end up calling vk_multialloc_alloc with 0 size when
`attachment_count` is 0 and `clearValueCount` is 0.
Addressed:
```
Direct leak of 1 byte(s) in 1 object(s) allocated from:
#0 0x7faf033ee0 in __interceptor_malloc
../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
#1 0x7fada5cc10 in vk_default_alloc ../src/vulkan/util/vk_alloc.c:26
#2 0x7fac50b270 in vk_alloc ../src/vulkan/util/vk_alloc.h:48
#3 0x7fac555040 in vk_multialloc_alloc
../src/vulkan/util/vk_alloc.h:234
#4 0x7fac555040 in void
tu_CmdBeginRenderPass2<(chip)7>(VkCommandBuffer_T*,
VkRenderPassBeginInfo const*, VkSubpassBeginInfo const*)
../src/freedreno/vulkan/tu_cmd_buffer.cc:4634
#5 0x7fac900760 in vk_common_CmdBeginRenderPass
../src/vulkan/runtime/vk_render_pass.c:261
```
seen in:
dEQP-VK.robustness.robustness2.bind.notemplate.r32i.dontunroll.nonvolatile.uniform_texel_buffer.no_fmt_qual.len_252.samples_1.1d.frag
Fixes: 4cfd021e3f ("turnip: Save the renderpass's clear values in the cmdbuf state.")
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32057>
Wire up with common kmod code.
On JM, this is a no-op implementation only allowing medium priority.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31961>
Panfrost currently doesn't support priorities, assumes default priority as
medium to properly support global priorities on Vulkan.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31961>
Establish groups of resolve and unresolve operations that the a7xx
hardware can then use to improve efficiency. Creating such groups enables
continuation of command stream processing while these (un)resolves are in
progress, as long as those latter operations don't depend on the grouped
(un)resolves.
To enable concurrent resolves and unresolves, corresponding fields on the
RB_CCU_CNTL register have to be set appropriately.
Resolve groups are tracked through a scoped struct that logs any pending
resolve operation. Once the group is complete, the emit helper function
will write out the CCU_END_RESOLVE_GROUP event to the command stream.
The buffer ID field on the RB_BLIT_INFO register can be used to disperse
different resolve operations across all available slots in the resolve
engine. The 0x8 and 0x9 IDs are reserved for depth and stencil buffers,
while the 0x0-0x7 range is used for color buffers. A simple incremented
counter is used to assign IDs for all color buffers inside any resolve
group. While it can occur for two color or depth/stencil buffers inside
the same resolve group to have identical IDs, hardware doesn't seem to
have a problem with handling that.
Two TU_DEBUG options are provided, 'noconcurrentresolves' and
'noconcurrentunresolves` disable respective operations by adjusting the
mode set through RB_CCU_CNTL.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31190>
For RB_BLIT_INFO, documentation of the buffer ID field is updated to
explain its use on a7xx.
RB_CCU_CNTL definition for a7xx is updated with fields for concurrent
resolve/unresolve modes and enhanced with dedicated enum types.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31190>
Non-trivial collects (i.e., ones that will introduce moves because the
sources don't line-up with the destination) may cause source intervals
to get implicitly moved when they are inserted as children of the
destination interval. Since we don't support moving intervals in shared
RA, this may cause illegal register allocations. Prevent this by
creating a new top-level interval for the destination so that the source
intervals will be left alone.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: fa22b0901a ("ir3/ra: Add specialized shared register RA/spilling")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31978>
Otherwise anv_descriptor_set is accessed through an unaligned pointer,
which is undefined behavior in C.
```
anv_descriptor_set.c:1620:17: runtime error: member access within misaligned address 0x61900002c2b5
for type 'struct anv_descriptor_set', which requires 8 byte alignment 0x61900002c2b5
```
Fixes: 2570a58bcd ("anv: Implement descriptor pools")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32070>