RBBM_PRIMCTR_7 is the pre-clipping count, whereas RBBM_PRIMCTR_8 is the
post-clipping count. I believe we want pre-clipping, and this is what tu
does.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19100>
Testing showed that `XXH64_update` is significantly slower than calling
`XXH64` directly on small inputs. This function is called on every RP
end, which made it visible while profiling, but the substantial speedup
(measured at ~4x) means it no longer shows up at all.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18428>
This seems like a debug thing, but the blob also seems to use it for
workarounds where an event is required but no actual work needs to be
done. For example CP_REG_WRITE uses it for various workarounds.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19023>
Otherwise we use the old, invalid value.
Relevant CTS tests:
dEQP-VK.pipeline.monolithic.multisample.misc.dynamic_rendering.multi_renderpass.r8g8b8a8_unorm_r16g16b16a16_sfloat_r16g16b16a16_*
Fixes: ed125e6cca ("tu: Initial support for dynamic rendering")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18999>
Dynamic renderpasses need vsc_prim_strm_pitch, vsc_draw_strm_pitch
values, and a correct BO. The easiest way to solve this is to
lazily init VSC when it is needed, and not at every cmdbuf
initialization.
Fixes CTS tests (when running with TU_DEBUG=gmem,forcebin):
dEQP-VK.draw.dynamic_rendering.complete_secondary_cmd_buff.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18996>
When called with FD_BO_PREP_FLUSH as the only op bit set, the intention
is to only sync with the submit queue. We shouldn't be calling down to
the kernel (where op==0 gets interpreted as MSM_PREP_READ).
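
The early return described above can be sketched roughly as follows.
This is only an illustration, not the actual driver code: the
SKETCH_PREP_* values, struct sketch_bo, and both helper functions are
hypothetical stand-ins for the real flags and the real submit-queue and
kernel paths.

```c
#include <stdint.h>

/* Hypothetical flag values for illustration only; the real definitions
 * live in the freedreno drm headers. */
#define SKETCH_PREP_READ  (1u << 0)
#define SKETCH_PREP_WRITE (1u << 1)
#define SKETCH_PREP_FLUSH (1u << 3)

/* Hypothetical BO that just counts which paths were taken. */
struct sketch_bo {
   int flushes;      /* times we synced with the submit queue */
   int kernel_calls; /* times we called down to the kernel */
};

/* Stand-in for syncing with the submit queue. */
static void
sketch_flush_submit_queue(struct sketch_bo *bo)
{
   bo->flushes++;
}

/* Stand-in for the kernel cpu-prep call. */
static int
sketch_kernel_cpu_prep(struct sketch_bo *bo, uint32_t op)
{
   (void)op;
   bo->kernel_calls++;
   return 0;
}

int
sketch_bo_cpu_prep(struct sketch_bo *bo, uint32_t op)
{
   if (op & SKETCH_PREP_FLUSH)
      sketch_flush_submit_queue(bo);

   /* Strip the flush bit. If nothing is left, there is no kernel-side
    * work to do, and passing op==0 down would be read as a READ prep. */
   op &= ~SKETCH_PREP_FLUSH;
   if (!op)
      return 0;

   return sketch_kernel_cpu_prep(bo, op);
}
```

With only the flush bit set, the sketch syncs once and never reaches the
kernel stand-in; with flush plus read, it does both.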
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18926>
We have the correct merged color write enable state as a local var here,
so use that instead of the zero cmd->state.color_write_enable. Fixes
blending in many traces with ANGLE on turnip. In the process of fixing,
clarify the logic a little bit.
Fixes: 169e03800d ("tu: Implement VK_EXT_color_write_enable")
Fixes: #7328
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18956>
Previously we would use patch control points if there was no GS, but
it wasn't immediately obvious that this driver param is unused if there
is no GS. Make it output 0 instead, making it clear that we can emit it
even if we don't know the patch control points. This change in the
cmdstream is split out from the next commit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18773>
Even though it's tessellation-related, it's set based on the
tessellation variant which is only known after linking. The param stride
may change due to LTO if fast linking is not used.
Fixes: e9f5de11d4 ("tu: Initial implementation of VK_EXT_graphics_pipeline_library")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18773>
It turns out that ir3_setup_const_state() already includes reserved
consts, so we were accidentally counting them twice. This makes us use
fewer consts, and if there are enough reserved consts it can make the
count go negative and wrap around. Fix this while also making sure the
previous bug remains fixed.
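
The wraparound can be sketched with a small example. The function names
and the numbers used with them are hypothetical and only illustrate the
arithmetic; they are not the ir3 code.

```c
#include <stdint.h>

/* Hypothetical helper: consts still available for UBO range upload.
 * `used_consts` is assumed to already include the reserved consts. */
uint32_t
sketch_available_consts(uint32_t max_consts, uint32_t used_consts)
{
   /* Guard against underflow instead of letting the result wrap. */
   return used_consts >= max_consts ? 0 : max_consts - used_consts;
}

/* Buggy variant for comparison: subtracts the reserved consts a second
 * time even though `used_consts` already accounts for them. */
uint32_t
sketch_available_consts_double_counted(uint32_t max_consts,
                                       uint32_t used_consts,
                                       uint32_t reserved_consts)
{
   /* Unsigned arithmetic: a "negative" result wraps to a huge value. */
   return max_consts - used_consts - reserved_consts;
}
```

For example, with a limit of 512, 300 consts in use and 256 of those
reserved, the double-counted variant computes 512 - 300 - 256, which
wraps around to a value far above the limit.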
Fixes: 8cb1deded6 ("ir3/analyze_ubo_ranges: Account for reserved consts")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18840>
This fixes the case where fixup_regfootprint() adds to the reg footprint
but it isn't accounted for when determining whether we should double
threadsize in ir3_collect_info(). This would produce a hang on a650 and
above where we have a reg footprint of 33 and doubled threadsize.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18840>
We weren't considering the number of components, which means that we
would overestimate the output size, which could result in nonsensical
things like a reg footprint larger than r48.x. In addition, in some
cases we can force double regsize which would go badly if this
miscalculated the reg footprint, although currently this only happens
with compute shaders where there are no outputs. It's not actually
necessary anyway, because any output must come from an input or
something in the shader - this is how RA works. Just delete it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18840>
I missed this when enabling pipeline libraries, and we were also setting
this to the wrong thing. Previously we were using rasterization state
when parsing depth/stencil indirectly via builder->depth_clip_disable,
which is not allowed with pipeline libraries. Fixing this is a bit
painful because now RB_DEPTH_CNTL can depend on state from both the
fragment shader library and the pre-rasterization library, in addition
to being disabled via output interface state.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18861>
util_format_luminance_to_red returns PIPE_FORMAT_NONE for LATC formats,
because there's no red-alpha variant of it, only red-green.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18596>
I noticed that glmark2's glFinish()es in its offscreen rendering tests
under zink were spinning. When we passed -1 as the timeout for
drmSyncobjWait(), the kernel would immediately return ETIME.
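
One way to avoid this is to clamp the timeout before handing it to the
wait call. The sketch below assumes that an infinite timeout is
represented as UINT64_MAX on the driver side and that the wait takes a
signed 64-bit absolute timeout, so the unclamped value converts to -1.
The names are hypothetical and this is not necessarily how the actual
fix is written.

```c
#include <stdint.h>

/* Assumption: "wait forever" is expressed as the maximum uint64_t. */
#define SKETCH_TIMEOUT_INFINITE UINT64_MAX

/* Convert an unsigned absolute timeout in nanoseconds to the signed
 * value passed to the wait call, clamping instead of wrapping to a
 * negative number. */
int64_t
sketch_syncobj_timeout(uint64_t abs_timeout_ns)
{
   if (abs_timeout_ns > (uint64_t)INT64_MAX)
      return INT64_MAX;
   return (int64_t)abs_timeout_ns;
}
```

With the clamp, an infinite timeout becomes INT64_MAX rather than -1,
while ordinary timeouts pass through unchanged.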
Fixes: 0a82a26a18 ("turnip: Porting to common implementation for timeline semaphore")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18739>
Traces of GLES games that ANGLE has taken frequently have no-op stencil
writes, which ANGLE and Zink both pass straight through. Given that we
support dynamic stencil state updates via tu_CmdSetStencil*(), draw time
really is the time for deciding this state unfortunately.
Reuse the fancier stencil write enables check from "can we do early z?" in
"can we do LRZ?". This gets one set of draws in among_us to have LRZ, but
I don't see a detectable performance difference.
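
A no-op stencil write check of this kind might look roughly like the
sketch below. The types and names are hypothetical, and the real check
in the driver may consider more state than this (for example compare ops
or culling); the sketch only treats a face as writing if its write mask
is non-zero and at least one of its ops is not KEEP.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced set of stencil ops for illustration. */
enum sketch_stencil_op {
   SKETCH_OP_KEEP,
   SKETCH_OP_ZERO,
   SKETCH_OP_REPLACE,
};

/* Hypothetical per-face stencil state. */
struct sketch_stencil_face {
   enum sketch_stencil_op fail_op;
   enum sketch_stencil_op pass_op;
   enum sketch_stencil_op depth_fail_op;
   uint32_t write_mask;
};

/* A face can only modify the stencil buffer if some bits are writable
 * and at least one op does something other than KEEP. */
static bool
sketch_face_writes(const struct sketch_stencil_face *face)
{
   if (face->write_mask == 0)
      return false;

   return face->fail_op != SKETCH_OP_KEEP ||
          face->pass_op != SKETCH_OP_KEEP ||
          face->depth_fail_op != SKETCH_OP_KEEP;
}

/* Evaluated at draw time, since the inputs may be dynamic state. */
bool
sketch_stencil_writes(bool stencil_test_enable,
                      const struct sketch_stencil_face *front,
                      const struct sketch_stencil_face *back)
{
   if (!stencil_test_enable)
      return false;

   return sketch_face_writes(front) || sketch_face_writes(back);
}
```

Sharing one helper like this between the early-z and LRZ decisions keeps
both paths agreeing on what counts as a stencil write.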
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18691>
Now that the state for each pipeline is split into pieces, we can mostly
implement it by stitching together the pieces. One TODO is that we could
do more to split up the pre-rast and FS commands into separate draw
states so that we have to emit fewer commands when fast linking.
Currently we compile the variants but delay emitting the commands until
link time, but note that even the Gallium driver doesn't currently do
this. Given the strict SSO model (e.g. with separate VPC registers for
each stage) it may even be possible to do most of the linking ahead of
time with only a few fixups for corner cases.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18554>
Right now, we pass around the push constant state in a lot of places,
but we'll want to add other driver-managed constants. Add a struct which
we can add to, and separate out the total driver-reserved constants from
the size of push constants.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18554>
With pipeline libraries, computing this might have to be delayed because
it depends on multiple pieces of state and there's no way to disentangle
them. Therefore we have to store the requisite state in the pipeline and
combine it later.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18554>