This lets us batch up the state changes from multiple
vkCmdBindDescriptorSets, which ANGLE and zink will both do in a single
draw.
Improves ANGLE (sysmem) driver_overhead perf by 5.18806% +/- 1.03444% (n=5).
Improves ANGLE aztec_ruins_high perf by ~.3%. (clear result in the graph,
but the screen went to sleep mid way through and so it was high variance)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20084>
It turns out that this actually is supported. GMEM can hold multiple
layers which are cleared, loaded, and resolved separately. The stride
between layers seems to be implicitly calculated based on the tile size,
and we have to match it when blitting to/from GMEM. One tricky thing is
that now we may realize that we don't have enough space for GMEM only
when computing the tiling config, because we may not know the number of
framebuffer layers until we have the framebuffer and too many
framebuffer layers will exhaust GMEM.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19505>
Emit GRAS_SU_CNTL, GRAS_CL_CNTL, the polygon mode, and the VRS registers
in one draw state. We're running out of draw states, and this saves a
draw state while preparing us for the rest of the rasterization state to
be dynamic.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18912>
Dynamic renderpasses need vsc_prim_strm_pitch, vsc_draw_strm_pitch
values, and a correct BO. The easiest way to solve this is to
lazily init VSC when it is needed, and not at every cmdbuf
initialization.
Fixes CTS tests (when running with TU_DEBUG=gmem,forcebin):
dEQP-VK.draw.dynamic_rendering.complete_secondary_cmd_buff.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18996>
Traces of GLES games that ANGLE has taken frequently have no-op stencil
writes, which ANGLE and Zink both pass straight through. Given that we
support dynamic stencil state updates via tu_CmdSetStencil*(), draw time
really is the time for deciding this state unfortunately.
Reuse the fancier stencil write enables check from "can we do early z?" in
"can we do LRZ?". This gets one set of draws in among_us to have LRZ, but
I don't see a detectable performance difference.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18691>
This wasn't taking into account the dynamic primitive topology, and it
was suboptimal with dynamic rendering, because we don't know when
compiling the pipeline whether variable multisample rate is being used.
It's going to be even more difficult to support the current approach
with graphics pipeline library because the MSAA state is derived from
mulisample state, rasterization state, input assembly state, and
tessellation state, which may be in different pipelines. Just set it
dynamically based on the pipeline and re-emit it when the pipeline's
MSAA or rectangular/bresenham state differs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18554>
A subpass in gfxbench has the depth buffer present, but not written to,
for a render pass using the depth buffer as an input attachment. We can
skip single-prim-mode and the associated "oh no don't use sysmem" in that
case.
Improves gfxbench vk-5-normal perf by 1.56193% +/- 0.0743035% (n=14).
Part of #6327.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18241>
A later CmdBindPipeline would shrink the two draw states' sizes to the
number of VBs the pipeline actually uses, but we can save some CPU
overhead and memory by not emitting all the unused VBs as well.
Improves zink drawoverhead throughput on test 5 (1 VB change) by 38.5178%
+/- 0.48738% (n=18).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17932>
Emit VFD_DECODE and VFD_DEST separately, similarly to what Gallium does.
This means we emit a few more VFD_DECODE for binning shaders and when
there are unused attributes, but hopefully the overhead won't be too
much. In exchange we lose one draw state, and in the future we can
pre-compute the dynamic vertex state independently of the shader, so
there should be lower CPU overhead with dynamic vertex inputs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17554>