Shader model 6.2 was the upper bounds of what *could* be generated
before, but not all devices support it. And other devices support
even more. So, let's pass in the shader model / validator that will
be used by the API caller.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21178>
Currently, we always use a hierarchy mask with all levels enabled. While this is
efficient for geometry-heavy workloads like 3D games, it is wasteful for 2D
applications that draw very few vertices. For drawing just a few textured quads,
the overhead of small bin sizes outweighs any performance advantages, so it's a
bit slower. More problematically, small bin sizes require tremendous amounts of
memory for the polygon lists, leading to significant memory consumption (~10MB)
for the polygon list for even the simplest of 2D blits.
To reduce our memory footprint, we need to choose our hierarchy masks more
carefully. In general, we want to allow small bin sizes for geometry-heavy
workloads but not for geometry-light workloads. We estimate vertex count in the
driver as a proxy for this, and use a simple heuristic to select a bin size
based on the estimated vertex count. None of this is an exact science, and the
heuristic could probably be tuned. Nevertheless, the heuristic used (comparing
framebuffer size to vertex count) works well in practice, significantly reducing
the memory footprint of 2D applications like Firefox without hurting the
performance of 3D applications.
I originally wrote this patch while diagnosing high memory footprints on my
Midgard laptop, which is why only Midgard is in scope here. On Bifrost and
Valhall, we have a similar hiearchy mask selection problem. It seems likely that
the same heuristic would work there too, but it's a different code path that I
have not integrated or tested. I'll leave that for the adventurous reader, to
get the memory footprint win there too.
(It's also possible the win is smaller on newer Malis than on Midgard, since Arm
claims they optimized the tiler data structures on the newer parts. There's
probably still some merit to the idea.)
On Mali-T860, glmark2 -bdesktop frametime decreased by 1.35% +/- 0.91% at 95%
confidence, showing a slight win for 2D workloads No statistically significant
difference for glmark2 -bshading:shading=phong, since 3D workloads continue to
use the same hierarchy masks.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19482>
In the next commit, we will refine our algorithm to select hierarchy masks based
on the vertex count. In preparation, augment the driver to track rough estimates
of the vertex count so we have a "geometry complexity" input for the heuristic.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19482>
As Intel does. These functions are written with the expectation that they will
be inlined away, allowing gcc's copy-prop and constant folding to eliminate the
template struct and any unused fields.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21848>
On Valhall, we were calling emit_const_buf in two places:
1. The main "handle dirty flags" code shared with Bifrost
2. A Valhall-specific shader environment emitter
The latter was not dirty tracked, and the former was not used. That meant we
were calling emit_const_buf way too much. It's not a cheap routine, either.
Instead, use the results from the dirty tracked function in the shader
environment emitter, to avoid the redundant call and get the expected dirty
tracking.
In a Dolphin trace I'm looking at, fps increases 27->33.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21848>
I tried to be too clever and ended up wasting cpu cycles. it's
much, much, much, much faster to just generate this one struct array
every time than it is to do set lookups with thousands of members
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22116>
vstate draws bind their own vertex buffers unrelated to the bound
gallium buffers, so any draw occurring after a vstate draw must
rebind vertex buffers to ensure the correct ones are bound
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22116>
Renames radv_declare_shader_args to declare_shader_args and runs it
twice to first gather the user SGPR count without push constants and
descriptor sets.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22119>
vulkan resolves only provide "extents" instead of src and dst regions like
GL, which means vk resolves can't be used to downscale images, as such
operations will instead just crop the image
fixes#8655
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22195>
Push constants are handled per bind point internally. Using a separate
structure in the cmdbuf state would allow us to update it easily
without relying on bound pipelines.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22209>
Sometimes apps (glances at stk) spin on a syncobj with very short
timeouts. But ensuring the fence is flushed all the way through to
the kernel (including handling TC unflushed fences) only needs to
be done once.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22098>
We've had drm/sched support on the kernel side for more than a year and
a half. This makes submit ioctl async by handling fence waits from the
sched's kthread, which is what threaded submit was originally working
around. For now, threaded submit is only used for virtgpu, which does
not (yet?) have drm/sched support.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22098>
We've had gpu-sched support in the kernel for a while now, so our fence
waits are not synchronous in the ioctl path. The only reason this path
still exists is that virtgpu does not have gpu-sched. So lets disable
it on msm.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22098>
Buffers are added to the deferred freelist at the tail. And frequently
the last reference is dropped immediately after the submit. So almost
always, once we see a still-busy BO, the remaining in the list will also
still be busy.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22098>
The android window system (SurfaceFlinger, et al) already does it's own
throttling. Trying to do this also in mesa's egl is counterproductive.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22197>
this ensures that the buffer is marked active and prevents promotion
in cases where reordering would break rendering
unordered_read prohibits write reordering for buffers, so setting
this flag must be done when the buffer is actually used, ideally as
late as possible
setting it at the time of (re)bind catches all the buffer rebind cases
which might otherwise erroneously permit reordering
fixes#8381
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22205>
Requires some hand rolled assembly:
- DC CVAC / DC CIVAC for aarch64
- DCCMVAC / DCCIMVAC for arm32, unfortunately it seems that it is
illegal to call them from userspace.
- clflush for x86-64
We handle x86-64 case because Turnip may run in x86-64 guest
e.g. in FEX-Emu or Box64.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20550>
vkd3d requires cached memory type.
MSM backend doesn't have a special ioctl for memory
flushing/invalidation, we'd have to use cvac and civac
arm assembly instructions (would be done in following commit).
KGSL has an the ioctl for this, which is used in this commit.
Note, CTS tests doesn't seem good at testing flushing and
invalidating, the ones I found passed on KGSL with both
functions being no-op.
Based on the old patch from Jonathan Marek.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7636
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20550>