When CPU clock is the same with the authoritative trace clock (normally
default to CLOCK_BOOTTIME), perfetto drops the non-monotonic snapshots
to ensure validity of the global source clock in the resolution graph.
When they are different, the clocks are marked invalid and the rest of
the clock syncs will fail during trace processing.
There's no central daemon emitting consistent snapshots for
synchronization between CPU and GPU clocks on behalf of renderstages and
counters producers. The sequence-scoped clock (64 <= ID < 128) is unique
per producer + writer pair within the tracing session.
Turnip is a bit tricky here, since clocks may be synchronized before
`tu_perfetto_end_submit` is called (in case of KGSL), but emission of
perfetto event has to happen on the same thread as other renderstage events.
To solve this I save the clocks in `tu_perfetto_state` and emit them in
`stage_end` when needed.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37465>
In short, perfetto doesn't require the initial clock snapshot to be
earlier than the timestamp to be converted. So we don't have to do
complex handling for it.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37465>
Instructions are allocated using a linear context so cannot themselves
be used as a ralloc context anymore. Use the variant's ir instead.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 114e6a3104 ("ir3: Use a linear allocation context for ir3_instructions.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37523>
- Removed DBG/CHICKEN regs from being stomped, because they randomly
cause issues, and there is no even point of stomping them.
- *ATTR_BUF_GMEM regs are not emitted at every renderpass.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37372>
Use tile_max_w/h which is the HW bound for the tile width/height and is
much smaller than the theoretical maximum width/height with a lopsided
tile with just the depth attachment.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37513>
Unlike the store/resolve that uses A2D, The FDM load path uses the 3d
pipeline and is therefore affected by the hardware FDM offset registers.
The fallback sysmem clear path also uses the 3d pipeline. Subtract off
the HW offset from the destination coordinates, similar to how it is
subtracted from viewport and scissor.
Fixes: b34b089ca1 ("tu: Use GRAS bin offset registers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37496>
In msm backend's has_set_iova codepath, mapping a BO into a lazy VMA will
require moving that VMA into the zombie VMA mechanism once the BO is
destroyed. That means tu_sparse_vma destruction should avoid freeing VMA if
BO was mapped into it and then zombified.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Fixes: 764b3d9161 ("tu: Implement transient attachments and lazily allocated memory")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37413>
For the fallback !has_set_iova codepath, util_vma_heap shouldn't be used
for freeing allocations since it's not initialized or used for allocations.
A helper tu_free_iova() function is added to complement tu_allocate_iova(),
handling the vma lock and freeing the allocation in the util_vma_heap when
appropriate.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Fixes: 93a80f4bb9 ("tu/drm: Split out iova allocation and BO allocation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37413>
After the refactoring, tu_bo_init() is not allocating iova anymore so it
should also not free the util_vma_heap allocation for the has_set_iova
case.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Fixes: 93a80f4bb9 ("tu/drm: Split out iova allocation and BO allocation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37413>
This commit keeps vkcts as a nightly job, but this puts us in shooting
distance to what we've been working for for the past 2.5 years!
We will flip the switch to making this job part of the merge pipeline
after a week of stress testing to make sure reliability issues,
especially around USB, don't come back to haunt my days and nights.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37367>
Again, instrs don't get freed as we go, so the linear gc context saves us
5 pointers per instr.
Fossil replay time for deadspace3 on a debugoptimized build -4.85258% +/-
3.04009% (n=10)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37316>
Since we don't free registers as we go, we can just allocate them in a
linear gc context that gets freed at ralloc destroy. Saves 5 pointers of
memory per register for the ralloc overhead.
Fossil replay time for deadspace3 on a debugoptimized build -4.30353% +/-
1.80078% (n=10).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37316>
Having a list of all enabled/used extensions in meson allows us to get
rid of a lot of boilerplate in every bvh build shader.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35326>
Unlike z/s blits, where we want the fallback to use the re-written blit,
we don't want this in the handle_snorm_copy_blit() path.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37279>
Synced from kernel commit f23e09a60d48 ("drm/msm: Update GMU register
xml").
Update GMU register xml with additional definitions for a7x family.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37216>
Mesa has shifted more things to reg64 instead of seperate 32b HI/LO
reg32's. This works better with the "new-style" c++ builders that
mesa has been migrating to for a6xx+ (to better handle register
shuffling between gens), but it leaves the C builders with missing
_HI/LO builders.
So handle the special case of reg64, automatically generating the
missing _HI/LO builders. (This is for the benefit of the kernel
which cannot use the c++ builders.)
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37216>
Synced from kernel commit bb1953588068 ("drm/msm: remove python 3.9
dependency for compiling msm").
Since commit 5acf49119630 ("drm/msm: import gen_header.py script from Mesa"),
compilation is broken on machines having python versions older than 3.9
due to dependency on argparse.BooleanOptionalAction.
Switch to use simple bool for the validate flag to remove the dependency.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37216>
Since these generated files are no longer checked in, either in mesa or
in the linux kernel, simplify things by dropping the verbose generated
comment.
These were semi-nerf'd on the kernel side, in the name of build
reproducibility, by commit ba64c6737f86 ("drivers: gpu: drm: msm:
registers: improve reproducibility"), but in a way that was semi-
kernel specific. We can just reduce the divergence between kernel
and mesa by just dropping all of this.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37216>
Commit 84e93daa26 ("freedreno/registers: allow skipping the
validation") synced a change that made validation optional for
kernel builds, to avoid a lxml dependency for kernel builds.
But this inadvertantly also disabled schema validation on the
mesa side. CI (and meson "test" target) still validates the
xml against the schema, but it is easier if this is also done
as part of the normal build to avoid suprises from Marge.
Fixes: 84e93daa26 ("freedreno/registers: allow skipping the validation")
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37216>
Transient attachments have been in Vulkan since 1.0, and are a way to
avoid allocating memory for attachments that can be stored entirely in
tile memory. The driver exposes a memory type with LAZILY_ALLOCATED_BIT,
and apps use this type to allocate images with TRANSIENT_ATTACHMENT
usage, which are restricted to color/depth/stencil/input attachment
usage. The driver is supposed to then delay allocating memory until it
knows that one of the images bound to the VkDeviceMemory must have
actual backing memory.
Implement this using the "lazy VMA" mechanism added earlier. We reserve
an iova range for lazy BOs, and only allocate them if we chose sysmem
rendering or there is a LOAD_OP_LOAD/STORE_OP_STORE. Because we never
split render passes and force sysmem instead, we don't have to deal with
the additional complexity of that here and just allocate everything.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37151>
Up until now tu_device_memory (turnip's VkDeviceMemory) was a thin
wrapper around tu_bo (the GEM BO), so when binding an image to a
VkDeviceMemory we could just store the BO. But now we have to skip
allocating the BO unless we need to for lazily-allocated memory, and the
tracking for that needs to happen at the API level instead of the
kernel/GEM level, so store the tu_device_memory instead.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37151>
Add an extremely limited form of sparse where zeroing memory is not
supported and only one BO can be fully bound to the sparse VMA
immediately when it's created. This can be implemented on drm/msm even
without VM_BIND, by just reserving the iova range. However kgsl doesn't
let us control iova offsets, so we have to use "real" sparse support to
implement it. In effect this lets us reserve an iova range and then
"lazily" allocate the BO. This will be used for transient allocations in
Vulkan when we have to fallback to sysmem.
As part of this we add skeleton sparse VMA support to virtio, which is
just enough for lazy VMAs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37151>
Reserve an iova range separately before allocating a BO. This reduces
the size of the critical section under the VMA lock and paves the way
for lazy BOs, where iova initialization is separated out.
While we're here, shrink the area where the VMA mutex is applied when
importing a dma-buf. AFAICT it's not useful to lock the entire function,
only the VMA lookup and zombie BO handling.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37151>
When enabling
dEQP-VK.renderpass2.dedicated_allocation.attachment_allocation.grow.17,
we see a hang on a618 when a draw is immediately followed by a blit
without anything in between. The draw and clear are writing completely
different surfaces.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37151>