BLIT_EVENT_STORE_AND_CLEAR presumably swallows the BLIT_EVENT_CLEAR
at the start of the next bin. Should be faster than separate events.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30270>
We want to reduce the buffer allocations for other type of data than
timestamps.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29944>
Transfer ops also use CCHE since they use the same path as
texture access.
This addresses the flakiness seen in
KHR-GL46.shader_storage_buffer_object.advanced-usage-sync-cs
CCHE wasn't being invalidated between the compute op and transfer
op which would sometimes lead to old/invalid data to be copied
in the transfer op.
Fixes: fb1c3f7f5d ("tu: Implement CCHE invalidation")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11458
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30490>
Add basic KHR_8bit_storage support for Adreno 750 devices, for now enabling
the storageBuffer8BitAccess feature. A separate descriptor is provided for
8-bit storage access. The descriptor index is adjusted appropriately for
8-bit SSBO loads and stores.
The 8-bit SSBO loads cannot go through isam since that instruction isn't
able to handle those. The ldib and stib instruction encodings are a bit
peculiar but they match the blob's image buffer access through VK_FORMAT_R8
and the dedicated descriptor. These loads and stores do not work in
vectorized form, so they have to be scalarized. Additionally stores of
8-bit values have to clear up higher bits of those values.
8-bit truncation can leave higher bits as undefined. Zero-extension of
8-bit values has to use masking since the corresponding cov instruction
doesn't function as intended. 8-bit sign extension through cov from a
non-shared to a shared register also doesn't work, so an exception is
applied to avoid it.
Conversion of 8-bit values to and from floating-point values also doesn't
work with a straightforward cov instruction, instead the conversion has
to go through a 16-bit value.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9979
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28254>
Until now, if the 16-bit storage functionality is supported by the
hardware, two separate descriptors were set up, with isam loads and stores
piping through the descriptor of the corresponding size and other storage
access using the 16-bit descriptor.
These changes keep separate descriptors on a650, but leverage post-a650
isam.v functionality that enables use of 16-bit descriptors for 32-bit
loads, removing the need for the separate 32-bit descriptor.
Storage buffer descriptors are set up according to 16-bit storage support
and the indicated isam.v support, using those descriptors for 32-bit isam
loads as well if the latter is present.
Dynamic offset application in tu_CmdBindDescriptorSets is modified to
determine the offset shift value based on the descriptor's format and not
on the descriptor's position in the layout binding.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28254>
We set LRZ_FEEDBACK_EARLY_LRZ_LATE_Z mask for rendering pass after
HW binning because:
- Draws with EARLY_Z contributed to depth buffer in BINNING stage;
- Draws with LATE_Z is what usually disables LRZ.
- Draws with EARLY_LRZ_LATE_Z are the ones we want because they
represent the common case of FS with "discard".
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25345>
Some draws do write depth but cannot contribute to LRZ during the BINNING pass
e.g. when fragment shader has "discard" in it, however they can contribute to
LRZ during the RENDERING pass via LRZ feedback meachanism. This may allow the
draws that follow to depth test against the updated LRZ, this is especially
important if such "bad" draws were at the start of the renderpass.
LRZ feedback happens during the RENDERING pass when LRZ_FEEDBACK_ZMODE_MASK
is set, if draw has a6xx_ztest_mode that has corresponding flag set in
LRZ_FEEDBACK_ZMODE_MASK - its depth values would be used for feedback.
LRZ feedback alongside with LRZ testing also works during sysmem rendering.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25345>
a7xx renamed events around flushing:
a6xx a7xx
FLUSH CLEAN
INVALIDATE INVALIDATE
FLUSH+INVALIDATE FLUSH
The FLUSH events stayed the same but now they also invalidate. By not
adopting the new CLEAN events, we're inadvertantly invalidating too
much.
This change is just a refactor, that makes generic code consistently use
the a7xx terminology. The next commit will actually make us use CLEAN.
Note that LRZ_FLUSH is deliberately not changed because it actually
also invalidates (and the real name on a6xx was FLUSH_AND_INVALIDATE).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29824>
On a740 TPL1_DBG_ECO_CNTL1.TP_UBWC_FLAG_HINT must be the same between
all drivers in the system, somehow having different values affects
BLIT_OP_SCALE. We cannot automatically match blob's value, so the
best thing we could do is a toggle.
Example:
FD_DEV_FEATURES=enable_tp_ubwc_flag_hint=0
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29754>
cmd->state.attachments was accessed out of bounds, which somehow instead
of crash caused the tracepoint to be skipped.
drawcall_bandwidth_per_sample_sum was divided by 0 when there were no
draw calls in a renderpass.
Fixes: 1aab0fc4f5
("tu: Add attachments' UBWC info to renderpass tracepoint")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29752>
On newer devices where ZPASS_DONE events have sample count writing
abilities the firmware expects these events to come in begin-end pairs,
essentially corresponding to a typical occlusion query usage. Since this
event is also used in the autotuner we have to avoid event pairs to be
emitted in an interleaved fashion.
Additional renderpass state now tracks whether a given renderpass contains
an occlusion query. If so, autotuner will emit miscellaneous ZPASS_DONE
events in order to form its own begin-end pairs before and after the
renderpass commands.
Occlusion query behavior inside a renderpass doesn't change. But when used
outside of a renderpass, possible autotuner usage requires to again emit
ZPASS_DONE events that end up forming begin-end pairs of these events both
at the start and the end of the query.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Fixes: 4e6a1f8852 ("tu/autotune: Use `CP_EVENT_WRITE7::ZPASS_DONE` on A7XX")
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29403>
Match the structure of NVK and RADV. Pull all event related code from
tu_device.cc/h and tu_cmd_buffer.cc/h into one location.
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29630>
A7XX allows setting the FC depth to an arbitrary F32 value rather
than being limited to 0.0/1.0, we use this to match the depth clear
value.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29453>
A lot of CHIP template parameters were hardcoded to A6XX rather than
the actual chip which would lead to an incorrect command stream being
generated.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29453>
The functionality of GRAS_LRZ_CNTL on A6XX was split into GRAS_LRZ_CNTL
and GRAS_LRZ_CNTL2 on A7XX. The only new field is for the Z function to
be specified.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29453>
Blob doesn't use hw binning with GS on all a6xx and a7xx, however
in Turnip it worked without issues until a750. On a750 there are CTS
failures when e.g. dEQP-VK.subgroups.arithmetic.framebuffer.* in
parallel with "forcebin". It is exacerbated by using "syncdraw".
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29074>
We need invalidate CCHE when we optimize out an invalidation of UCHE,
for example a storage image write to texture read. We missed this
earlier because of the blob's tendency to always over-flush, but the
blob does use this when building acceleration structures.
Fixes: 95104707f1 ("tu: Basic a7xx support")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28445>
If this state was emitted at the point of previous RP, which
could happen if pipeline is not set at the start of current RP,
we have to emit non-draw-state state since it would become stale
in the next tile.
Fixes test with stale reg dbg:
dEQP-VK.transform_feedback.primitives_generated_query.get.queue_reset.32bit.tese.xfb.color_write_disable_static.patch_list.pgq_default_xfb_default.two_draws.pqg_first.none_2_queries
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28326>
The pipeline used in RP may have been bound in another RP, so
we have to save relevant state and re-apply it on first draw.
Fixes GPU hang in the following test with forced binning + reg stomping:
dEQP-VK.transform_feedback.primitives_generated_query.get.queue_reset.32bit.tese.xfb.color_write_disable_static.patch_list.pgq_default_xfb_default.two_draws.pqg_first.none_2_queries
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28326>
A750 adds explicit definition of PC_TESS_PARAM_SIZE and
PC_TESS_FACTOR_SIZE, probably in order to to correctly overlap execution
of several draws.
Note that blob adds a bit more space ({0x10, 0x20, 0x30, 0x40} bytes)
to PC_TESS_FACTOR_SIZE than we are, but the purpose of this additional
space is unknown.
Emitting these regs on whole A7XX seem to be fine - A740 doesn't
complain.
Fixes GPU faults in Witcher 3.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28210>
Previously cmd->state.vertex_buffers.size changing would not trigger
a re-emission of the state, leading to dEQP-VK.dynamic_state.*.line_width.*
failing on A7XX.
Signed-off-by: Amber Harmonia <amber@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28208>
We precompile static state and count it as dynamic, so we have to
manually clear bitset that tells which dynamic state is set, in order to
make sure that future dynamic state will be emitted. The issue is that
framework remembers only a past REAL dynamic state and compares a new
dynamic state against it, and not against our static state masquaraded
as dynamic.
Example:
- Set dynamic state S with value A
- Bind pipeline with dynamic state S
- Draw
- Bind pipeline with static state S with value B
- Draw
- Set dynamic state S with value A
- Bind pipeline with dynamic state S
- Draw
Previously, at the last draw the dynamic state S was not dirty and
current dynamic state was equal to the past dynamic state, so
it was not emitted, while GPU used value B from static pipeline.
This fix, at the point of static pipeline binding, clears the
bitset which tells that dynamic state S was previously set.
This forces the next dynamic state to be re-emitted.
Fixes broken rendering in Arma 3, and probably some other
games running through DXVK.
Fixes: 97da0a7734
("tu: Rewrite to use common Vulkan dynamic state")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27961>