Move all the logic into tu_CmdBegin/EndQueryIndexedEXT. CmdBegin/EndQuery in
the common runtime is a wrapper that calls tu_CmdBegin/EndQueryIndexedEXT with
index 0.
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29441>
Match the structure of the common Vulkan runtime and NVK.
Additionally update a comment to reflect the current state.
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29441>
We set LRZ_FEEDBACK_EARLY_LRZ_LATE_Z mask for rendering pass after
HW binning because:
- Draws with EARLY_Z contributed to depth buffer in BINNING stage;
- Draws with LATE_Z is what usually disables LRZ.
- Draws with EARLY_LRZ_LATE_Z are the ones we want because they
represent the common case of FS with "discard".
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25345>
Some draws do write depth but cannot contribute to LRZ during the BINNING pass
e.g. when fragment shader has "discard" in it, however they can contribute to
LRZ during the RENDERING pass via LRZ feedback meachanism. This may allow the
draws that follow to depth test against the updated LRZ, this is especially
important if such "bad" draws were at the start of the renderpass.
LRZ feedback happens during the RENDERING pass when LRZ_FEEDBACK_ZMODE_MASK
is set, if draw has a6xx_ztest_mode that has corresponding flag set in
LRZ_FEEDBACK_ZMODE_MASK - its depth values would be used for feedback.
LRZ feedback alongside with LRZ testing also works during sysmem rendering.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25345>
The ir3_info is reset by ir3_collect_shader_info() on the expectation
that all info is collected inside that function. This meant that we were
accidentally disabling early preamble. Re-enable it.
We keep a copy in ir3_info for shader statistics in the next commit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29903>
Additionally simplify the code by inlining the logic from
tu_get_buffer_memory_requirements directly into
tu_GetDeviceBufferMemoryRequirements.
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29854>
No Turnip or ir3 changes required, this was implemented in NIR by Intel.
Passes dEQP-VK.spirv_assembly.instruction.*.float_controls2.*
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29866>
Drop tu_sampler_ycbcr_conversion in favor of the common vk_ycbcr_conversion.
This allows using CreateSamplerYcbcrConversion and DestroySamplerYcbcrConversion
from the common runtime and will be required for vk_sampler and for using the
common ycbcr lowering later.
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29808>
a7xx renamed events around flushing:
a6xx a7xx
FLUSH CLEAN
INVALIDATE INVALIDATE
FLUSH+INVALIDATE FLUSH
The FLUSH events stayed the same but now they also invalidate. By not
adopting the new CLEAN events, we're inadvertantly invalidating too
much.
This change is just a refactor, that makes generic code consistently use
the a7xx terminology. The next commit will actually make us use CLEAN.
Note that LRZ_FLUSH is deliberately not changed because it actually
also invalidates (and the real name on a6xx was FLUSH_AND_INVALIDATE).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29824>
On a740 TPL1_DBG_ECO_CNTL1.TP_UBWC_FLAG_HINT must be the same between
all drivers in the system, somehow having different values affects
BLIT_OP_SCALE. We cannot automatically match blob's value, so the
best thing we could do is a toggle.
Example:
FD_DEV_FEATURES=enable_tp_ubwc_flag_hint=0
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29754>
cmd->state.attachments was accessed out of bounds, which somehow instead
of crash caused the tracepoint to be skipped.
drawcall_bandwidth_per_sample_sum was divided by 0 when there were no
draw calls in a renderpass.
Fixes: 1aab0fc4f5
("tu: Add attachments' UBWC info to renderpass tracepoint")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29752>
On newer hardware where ZPASS_DONE events are used for sample count writes
the memory polling in occlusion query endings can be wholly avoided. A WFI
is still required, but the performance gain is still in the range of 10% on
the trivial occlusionquery demo.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29403>
On newer devices where ZPASS_DONE events have sample count writing
abilities the firmware expects these events to come in begin-end pairs,
essentially corresponding to a typical occlusion query usage. Since this
event is also used in the autotuner we have to avoid event pairs to be
emitted in an interleaved fashion.
Additional renderpass state now tracks whether a given renderpass contains
an occlusion query. If so, autotuner will emit miscellaneous ZPASS_DONE
events in order to form its own begin-end pairs before and after the
renderpass commands.
Occlusion query behavior inside a renderpass doesn't change. But when used
outside of a renderpass, possible autotuner usage requires to again emit
ZPASS_DONE events that end up forming begin-end pairs of these events both
at the start and the end of the query.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Fixes: 4e6a1f8852 ("tu/autotune: Use `CP_EVENT_WRITE7::ZPASS_DONE` on A7XX")
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29403>
The update_stencil_mask function was removed when moving to the common
Vulkan dynamic state handling.
Fixes: 97da0a7734 ("tu: Rewrite to use common Vulkan dynamic state")
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29658>
For A6XX it's a no-op, but A7XX+ doesn't clamp to [0,1] with disabled
depth clamp, to support VK_EXT_depth_clamp_zero_one we have to always
enable clamp and manually set depth range to [0,1] when rs->depth_clamp_enable
is false.
Passes:
dEQP-VK.depth.*
dEQP-GLES3.functional.fbo.depth.depth_test_clamp.* (zink)
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29387>
Match the structure of NVK and RADV. Pull all event related code from
tu_device.cc/h and tu_cmd_buffer.cc/h into one location.
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29630>
This register stores the depth format of the underlying depth
buffer, it seemingly doesn't change anything about the LRZ buffer
itself and has no behavioral changes over setting it to 0.
However, it's possible that there's some case where it does matter
so matching the proprietary driver's behavior is safer.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29453>
A7XX allows setting the FC depth to an arbitrary F32 value rather
than being limited to 0.0/1.0, we use this to match the depth clear
value.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29453>
The layout of the LRZ FC section has changed substantially between
A6XX and A7XX so the best way to express the layout was determined
to be a templated structure.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29453>
A lot of CHIP template parameters were hardcoded to A6XX rather than
the actual chip which would lead to an incorrect command stream being
generated.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29453>
The functionality of GRAS_LRZ_CNTL on A6XX was split into GRAS_LRZ_CNTL
and GRAS_LRZ_CNTL2 on A7XX. The only new field is for the Z function to
be specified.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29453>
This is an exceptional case where any writes to gl_Depth should be
ignored, it means we can use LRZ in this case and don't need to
disable it.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29453>