No need to track is_push_descriptor in templ. No need to conditionally
decide to use set or NULL handle since we pass NULL handle from the cmd
side. Also fixed the arg type mismatch in the template helper.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28686>
The check doesn't save much overhead, and it adds more work whenever
something needs to be fixed (mostly the case for push descriptors).
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28563>
Summary:
1. skip zero count
2. no need to check last binding count on the restore path
3. flatten the helper to avoid a 2nd pass in free_descriptors
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28563>
This balances the other checks against it, and makes it explicit that a
real descriptor free shouldn't call the free helper.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28563>
The ring cs shmem cache is already there. External fence/semaphore
support will eventually come via adopting mesa common drm syncobj support.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28532>
The two extensions are implemented natively but are allowed to leak
structs to the renderer side to avoid deep copying a huge driver side
pNext chain. It doesn't make things more robust if we hide the two behind
core 1.3 and drop the two from the protocol so that venus-protocol
filters out the leaked structs, e.g. we'd still have to flip some bits in
the core feature structs.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28532>
The default vn_relax mainly targets Vulkan commands expecting a reply,
like object creation and property queries. The defined relax reason
here is VN_RELAX_REASON_RING_SPACE. The polling strategy involves more
busy waits to overcome the sleep penalty affecting cpu utilization, as
well as an edge case for the Android system server, which is forced to
sleep longer even with a trivial hrtimer interval.
However, for the below relax reasons:
- VN_RELAX_REASON_RING_SPACE
- VN_RELAX_REASON_FENCE
- VN_RELAX_REASON_SEMAPHORE
- VN_RELAX_REASON_QUERY
It's a waste of cpu cycles to do more busy waits if the initially
polled signals are not "ready". Having fewer busy waits there allows
jumping to higher orders of sleep sooner, disturbing the scheduler less
until signaled.
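
A rough sketch of the idea, not the actual vn_relax code: the struct,
the helper name and the profile values below are all hypothetical. A
profile with a smaller busy wait order leaves the busy-wait phase earlier
and so reaches the longer sleeps after fewer iterations.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical profile: busy wait for the first 2^busy_wait_order
 * iterations, then sleep, doubling the sleep each time iter doubles. */
struct relax_profile {
   uint32_t busy_wait_order;
   uint32_t base_sleep_us;
};

/* Illustrative values only; not taken from the driver. */
static const struct relax_profile default_profile = { 4, 10 };
static const struct relax_profile waiting_profile = { 2, 10 };

/* Returns the sleep time in microseconds for the given iteration, or 0
 * when the caller should busy wait (e.g. yield) instead of sleeping. */
static uint32_t
relax_sleep_us(const struct relax_profile *profile, uint32_t iter)
{
   assert(profile->busy_wait_order < 32);

   if (iter < (1u << profile->busy_wait_order))
      return 0;

   /* shift = floor(log2(iter)) - busy_wait_order, capped to bound sleeps */
   uint32_t shift = 0;
   for (uint32_t i = iter >> profile->busy_wait_order; i > 1; i >>= 1)
      shift++;
   if (shift > 10)
      shift = 10;

   return profile->base_sleep_us << shift;
}
```

With these made-up values, iteration 16 is the first 10us sleep under
default_profile but already a 40us sleep under waiting_profile.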
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28287>
Up to this commit in this MR, the gfxbench manhattan scores have been
improved by 10~15% with ANGLE-on-Venus on some AMD platforms.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28287>
Better distinguish the different kinds of client waiting and prepare for
applying different waiting profiles for different reasons.
The default case is avoided in the reason string mapping so that the
error below is hit at compile time:
- error: enumeration value ‘XXX’ not handled in switch [-Werror=switch]
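
A minimal sketch of the pattern, using a hypothetical enum and helper
name rather than the driver's own definitions. Leaving out the default
label is what lets -Werror=switch catch a newly added enumerator that has
no string yet.

```c
#include <string.h>

/* Hypothetical enum for illustration only. */
enum relax_reason {
   RELAX_REASON_RING_SPACE,
   RELAX_REASON_FENCE,
   RELAX_REASON_SEMAPHORE,
   RELAX_REASON_QUERY,
};

static const char *
relax_reason_string(enum relax_reason reason)
{
   /* No default label on purpose: when built with -Werror=switch, adding
    * an enumerator without a matching case fails the build. */
   switch (reason) {
   case RELAX_REASON_RING_SPACE:
      return "ring space";
   case RELAX_REASON_FENCE:
      return "fence";
   case RELAX_REASON_SEMAPHORE:
      return "semaphore";
   case RELAX_REASON_QUERY:
      return "query";
   }

   /* Only reached for values outside the enum. */
   return "unknown";
}
```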
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28287>
Similar to the rationale for the 50ms -> 5ms adjustment before. When
there are enough cpu cycles, doing so would only help reduce cpu
utilization. When the cpu is mostly drained, less unnecessary host side
polling is favored by the scheduler. Also in the latter case, it'd be
the non-primary ring, so it doesn't hurt to idle out faster.
Besides the theory, there's no regression in popular benchmarks, only
power wins. Making the idle timeout too small will lead to overhead
buildup, e.g. from the initial notify to the ring being woken up, it's
about 200us. The notify op is more expensive than the ring thread doing a
few more polls. However, we normally would save many more polls by idling
out earlier. From my local testing, reducing down to 500us won't incur
any real perf regressions either.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28287>
The ring notification can be blocked on renderer main thread if a vq cmd
is waiting for a ring cmd (via a different non-idle ring). This change
optimizes to only try waking up the ring on the idle timeout period.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28287>
Extend the vkGetPhysicalDeviceFormatProperties2 cache to include
VkFormatProperties3 from the pNext chain. VkFormatProperties3 was
observed to always be attached by DXVK, which would skip the cache
if not handled.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28194>
For guest vram, there's already a roundtrip to protect device memory
alloc ordering. This change adds the same protection for shmem used in
the scenarios below and optimizes to wait for new shmem only.
- reply shmem
- indirect upload shmem
- cmd stream shmem
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28147>
A roundtrip ensures that a cmd sent via virtqueue happens on the renderer
side before a ring relies on it. Since venus is now multi-ring, the
roundtrip submit and wait should belong to the ring instead of the
instance, and each ring owns its own roundtrip seqno to synchronize with
virtqueue.
No behavior change.
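
A simplified sketch of the ownership change, assuming hypothetical struct
and helper names (this is not the driver's ring layout): each ring hands
out and waits on its own roundtrip seqno, so one ring's roundtrip says
nothing about another's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-ring state; in this sketch the "seen" value stands in
 * for whatever the renderer reports back. */
struct ring {
   uint64_t roundtrip_next; /* next seqno to hand out on submit */
   uint64_t roundtrip_seen; /* highest seqno known to have been processed */
};

static void
ring_init(struct ring *ring)
{
   ring->roundtrip_next = 1; /* 0 is reserved to mean "nothing submitted" */
   ring->roundtrip_seen = 0;
}

/* Submits a roundtrip on this ring and returns the seqno to wait on. */
static uint64_t
ring_submit_roundtrip(struct ring *ring)
{
   assert(ring->roundtrip_next != 0);
   return ring->roundtrip_next++;
}

/* True once the given roundtrip on this ring has been processed. */
static bool
ring_roundtrip_done(const struct ring *ring, uint64_t seqno)
{
   return ring->roundtrip_seen >= seqno;
}
```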
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28147>
Below is the common client pattern (app, angle, zink, etc):
- a few resets for queries to be used in this batch
  - optional, depending on EXT_host_query_reset
- a few queries
  - incremental
  - can cross query pool boundary
HW drivers normally have a faster shader path for when there are too
many individual resets and copies. Without further resolving, this ends
up with linear overhead on the 2d engines. This change has largely
optimized that:
- angle: many copies => 1 copy (or 2)
- zink: many resets and copies => 1 reset and 1 copy (or 2)
And again, some more renaming around.
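
To illustrate the "many copies => 1 copy (or 2)" outcome, here is a
simplified sketch of merging incremental query records. The record struct
and helper are hypothetical and ignore resets; the actual resolving in
the driver is more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: a range of queries in one pool. */
struct query_record {
   int pool_id; /* stand-in for a query pool handle */
   uint32_t first;
   uint32_t count;
};

/* Merges records in place when a record continues the previous range in
 * the same pool. Returns the number of records left. */
static size_t
merge_query_records(struct query_record *recs, size_t n)
{
   assert(recs != NULL || n == 0);
   if (n == 0)
      return 0;

   size_t out = 0;
   for (size_t i = 1; i < n; i++) {
      struct query_record *last = &recs[out];
      if (recs[i].pool_id == last->pool_id &&
          recs[i].first == last->first + last->count)
         last->count += recs[i].count; /* extend the current range */
      else
         recs[++out] = recs[i]; /* start a new range */
   }
   return out + 1;
}
```

Five single-query records that cross one pool boundary collapse into two
ranges, i.e. two copies instead of five.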
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28112>
Drop vn_combine_query_records_and_record_feedback to save the lines of
code for preparing args. Also refactor to use indexing instead of the
cmd stride trick.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28112>
1. move record into alloc to simplify caller handling, which aligns
with ffb and sfb as well
2. simplify locking to reduce lock overhead
3. remove unbalanced free from record helper
4. move reset to alloc
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28112>
Add a new free helper while renaming the alloc one as well. During query
record resolving, use a dropped list to store the records being reset.
This prepares for further query record resolving later.
This change also simplifies a query pool compare.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28112>