Commit graph

169186 commits

Author SHA1 Message Date
Jesse Natalie
31778ac869 microsoft/clc: Add shader model / validator to compiler API
Shader model 6.2 was the upper bounds of what *could* be generated
before, but not all devices support it. And other devices support
even more. So, let's pass in the shader model / validator that will
be used by the API caller.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21178>
2023-03-31 00:37:19 +00:00
Alyssa Rosenzweig
cd03392c7e panfrost: Choose hierarchy masks by vertex count
Currently, we always use a hierarchy mask with all levels enabled. While this is
efficient for geometry-heavy workloads like 3D games, it is wasteful for 2D
applications that draw very few vertices. For drawing just a few textured quads,
the overhead of small bin sizes outweighs any performance advantages, so it's a
bit slower. More problematically, small bin sizes require tremendous amounts of
memory for the polygon lists, leading to significant memory consumption (~10MB)
for the polygon list for even the simplest of 2D blits.

To reduce our memory footprint, we need to choose our hierarchy masks more
carefully. In general, we want to allow small bin sizes for geometry-heavy
workloads but not for geometry-light workloads. We estimate vertex count in the
driver as a proxy for this, and use a simple heuristic to select a bin size
based on the estimated vertex count. None of this is an exact science, and the
heuristic could probably be tuned. Nevertheless, the heuristic used (comparing
framebuffer size to vertex count) works well in practice, significantly reducing
the memory footprint of 2D applications like Firefox without hurting the
performance of 3D applications.
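
As a rough illustration of the kind of heuristic described above, here is a minimal sketch, not the driver's actual code. It assumes a hypothetical mask layout in which bit i enables square bins of (16 << i) pixels, and a made-up tuning constant for how many vertices justify one bin. All names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: bit i of the mask enables bins of (16 << i) pixels. */
#define SKETCH_HIER_LEVELS 8u
#define SKETCH_HIER_MASK_ALL ((1u << SKETCH_HIER_LEVELS) - 1u)

/* Made-up tuning constant: vertices needed to justify one bin at a level. */
#define SKETCH_MIN_VERTICES_PER_BIN 4u

uint32_t
sketch_choose_hierarchy_mask(uint32_t fb_width, uint32_t fb_height,
                             uint32_t vertex_count)
{
   assert(fb_width > 0 && fb_height > 0);

   uint32_t level;

   /* Find the smallest bin size whose bin count is covered by the
    * estimated vertex count. The largest level is always kept. */
   for (level = 0; level < SKETCH_HIER_LEVELS - 1; level++) {
      uint64_t bin = 16u << level;
      uint64_t bins = ((fb_width + bin - 1) / bin) *
                      ((fb_height + bin - 1) / bin);

      if (vertex_count >= bins * SKETCH_MIN_VERTICES_PER_BIN)
         break;
   }

   /* Disable every level with bins smaller than the chosen one. */
   return SKETCH_HIER_MASK_ALL & ~((1u << level) - 1u);
}
```

With these assumed constants, a single quad on a 1920x1080 framebuffer keeps only the coarsest level, while a draw with a million vertices keeps all levels enabled.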

I originally wrote this patch while diagnosing high memory footprints on my
Midgard laptop, which is why only Midgard is in scope here. On Bifrost and
Valhall, we have a similar hierarchy mask selection problem. It seems likely that
the same heuristic would work there too, but it's a different code path that I
have not integrated or tested. I'll leave that for the adventurous reader, to
get the memory footprint win there too.

(It's also possible the win is smaller on newer Malis than on Midgard, since Arm
claims they optimized the tiler data structures on the newer parts. There's
probably still some merit to the idea.)

On Mali-T860, glmark2 -bdesktop frametime decreased by 1.35% +/- 0.91% at 95%
confidence, showing a slight win for 2D workloads. No statistically significant
difference for glmark2 -bshading:shading=phong, since 3D workloads continue to
use the same hierarchy masks.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19482>
2023-03-31 00:19:18 +00:00
Alyssa Rosenzweig
1887b26845 panfrost: Estimate vertex count for hier mask
In the next commit, we will refine our algorithm to select hierarchy masks based
on the vertex count. In preparation, augment the driver to track rough estimates
of the vertex count so we have a "geometry complexity" input for the heuristic.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19482>
2023-03-31 00:19:18 +00:00
Alyssa Rosenzweig
cabed30111 panfrost: Clean up tiler calculations
We're about to do some work on this file. Clean it up first.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19482>
2023-03-31 00:19:18 +00:00
Danylo Piliaiev
9f43bc73da freedreno/computerator: Add support for a7xx
Not everything works correctly, e.g. stib seems flakey while stg
seems alright.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22148>
2023-03-30 23:40:48 +00:00
Danylo Piliaiev
f32eb48095 freedreno/computerator: Templatize a6xx backend
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22148>
2023-03-30 23:40:48 +00:00
Danylo Piliaiev
8558d07014 freedreno: Add dummy a730/a740 definition
Needed for assembly/disassembly.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22148>
2023-03-30 23:40:48 +00:00
Danylo Piliaiev
3389c3b84c freedreno: Move fd6_pack.h to common code accessible by computerator
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22148>
2023-03-30 23:40:48 +00:00
Danylo Piliaiev
48ad485d1c freedreno/computerator: Convert to C++
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22148>
2023-03-30 23:40:48 +00:00
Danylo Piliaiev
1ae595873f freedreno: C++ fixes for computerator to compile
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22148>
2023-03-30 23:40:48 +00:00
Danylo Piliaiev
6826a0ab14 freedreno/computerator: C++ proofing
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22148>
2023-03-30 23:40:48 +00:00
Danylo Piliaiev
5d2ddce99f freedreno/registers: More a7xx regs
Based on 011c54b0 from Jonathan Marek.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22148>
2023-03-30 23:40:48 +00:00
Danylo Piliaiev
899d142336 freedreno/registers: Document new CP_EVENT_WRITE::SEQNO
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22148>
2023-03-30 23:40:48 +00:00
Alyssa Rosenzweig
1e67f71324 panfrost: Add a v9 fast path for no images
The usual case.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21848>
2023-03-30 23:21:59 +00:00
Alyssa Rosenzweig
e6529d6dcc panfrost: Don't update access with a single batch
drawoverhead test 25 from 462->492

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21848>
2023-03-30 23:21:59 +00:00
Alyssa Rosenzweig
c224bc6f70 panfrost: Mark packs as ALWAYS_INLINE
As Intel does. These functions are written with the expectation that they will
be inlined away, allowing gcc's copy-prop and constant folding to eliminate the
template struct and any unused fields.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21848>
2023-03-30 23:21:59 +00:00
Alyssa Rosenzweig
f8b29f47a0 panfrost: Don't redundantly call emit_const_buf
On Valhall, we were calling emit_const_buf in two places:

1. The main "handle dirty flags" code shared with Bifrost
2. A Valhall-specific shader environment emitter

The latter was not dirty tracked, and the former was not used. That meant we
were calling emit_const_buf way too much. It's not a cheap routine, either.

Instead, use the results from the dirty tracked function in the shader
environment emitter, to avoid the redundant call and get the expected dirty
tracking.
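
The pattern can be sketched as follows. This is a simplified illustration with hypothetical names and a fake allocator, not the panfrost code: the expensive upload runs only in the dirty-tracked path, and the shader environment emitter reuses the cached result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical context for the sketch. */
struct sketch_ctx {
   bool const_dirty;           /* set whenever constant buffers change */
   uint64_t cached_const_addr; /* result of the last upload */
   uint64_t next_addr;         /* fake allocator cursor */
   unsigned upload_calls;      /* instrumentation for the example */
};

/* Stand-in for the expensive routine. */
uint64_t
sketch_upload_const_buf(struct sketch_ctx *ctx)
{
   ctx->upload_calls++;
   ctx->next_addr += 256;
   return ctx->next_addr;
}

/* Dirty-tracked path: upload only when something changed. */
void
sketch_handle_dirty_flags(struct sketch_ctx *ctx)
{
   if (ctx->const_dirty) {
      ctx->cached_const_addr = sketch_upload_const_buf(ctx);
      ctx->const_dirty = false;
   }
}

/* Environment emitter: reuse the cached result, never re-upload. */
uint64_t
sketch_emit_shader_env(struct sketch_ctx *ctx)
{
   assert(!ctx->const_dirty);
   return ctx->cached_const_addr;
}
```

Three consecutive draws with unchanged constants then cost one upload instead of three.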

In a Dolphin trace I'm looking at, fps increases 27->33.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21848>
2023-03-30 23:21:59 +00:00
Alyssa Rosenzweig
6ba62be633 panfrost: Print perf debug on seqnum overflow
Another unexpected source of flushes.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21830>
2023-03-30 22:53:16 +00:00
Alyssa Rosenzweig
9d3e01ddef panfrost: Print perf debug when flushing everything
..Even if the only batch is the one that's currently bound.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21830>
2023-03-30 22:53:16 +00:00
Mike Blumenkrantz
70b7c24206 zink: stop caching vertex states
I tried to be too clever and ended up wasting cpu cycles. it's
much, much, much, much faster to just generate this one struct array
every time than it is to do set lookups with thousands of members

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22116>
2023-03-30 22:28:38 +00:00
Mike Blumenkrantz
91ddfe55b5 zink: use fast popcnt for vstate draws
also delete some unused stubs for no dynamic vertex input since I'm never
gonna implement that path

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22116>
2023-03-30 22:28:38 +00:00
Mike Blumenkrantz
11a61ab424 zink: don't swizzle velems state for vstate draws
this isn't ever used, so don't touch it

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22116>
2023-03-30 22:28:38 +00:00
Mike Blumenkrantz
f676704fca zink: explicitly pass null velems when creating pipelines with dynamic vinput
this may or may not be a usable pointer, and it's not being read, so
don't pass it at all

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22116>
2023-03-30 22:28:38 +00:00
Mike Blumenkrantz
1ead8f7375 zink: add another vstate draw template for popcnt presence
matching radeonsi

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22116>
2023-03-30 22:28:38 +00:00
Mike Blumenkrantz
41983630c4 zink: bind vertex state directly from draw hook
this is more streamlined and readable

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22116>
2023-03-30 22:28:38 +00:00
Mike Blumenkrantz
837168db20 zink: use search_or_add for masking vstate
this should be a significant perf boost instead of multiple lookups

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22116>
2023-03-30 22:28:38 +00:00
Mike Blumenkrantz
4be5caba67 zink: flag vertex buffers for rebind after vstate draws
vstate draws bind their own vertex buffers unrelated to the bound
gallium buffers, so any draw occurring after a vstate draw must
rebind vertex buffers to ensure the correct ones are bound

cc: mesa-stable

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22116>
2023-03-30 22:28:38 +00:00
Mike Blumenkrantz
6c3b5921b2 zink: omit VkPipelineVertexInputStateCreateInfo with dynamic vinput
this should never be used/needed

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22116>
2023-03-30 22:28:38 +00:00
Konstantin Seurer
f6147051e2 radv: Stop counting user SGPRS separately
Renames radv_declare_shader_args to declare_shader_args and runs it
twice to first gather the user SGPR count without push constants and
descriptor sets.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22119>
2023-03-30 21:52:03 +00:00
Konstantin Seurer
0c915ba501 radv: Set user SGPR locations when declaring args
Merge shader arg declaration with setting up the user data locations.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22119>
2023-03-30 21:52:03 +00:00
Harri Nieminen
fd767a4517 bin: Fix typos
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22189>
2023-03-30 21:37:00 +00:00
Mike Blumenkrantz
db582e5e7d zink: block resolves where src extents > dst extents
vulkan resolves only provide "extents" instead of src and dst regions like
GL, which means vk resolves can't be used to downscale images, as such
operations will instead just crop the image
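
A minimal sketch of such a check, assuming the caller knows the source and destination region sizes (the struct and function names are hypothetical, not zink's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical region size; a Vulkan resolve takes a single extent. */
struct sketch_extent {
   uint32_t width;
   uint32_t height;
};

/* Returns true when one shared extent can describe the resolve. If the
 * source is larger than the destination in either dimension, the result
 * would be cropped rather than downscaled, so the caller should use a
 * different path. */
bool
sketch_can_use_vk_resolve(struct sketch_extent src, struct sketch_extent dst)
{
   assert(src.width > 0 && src.height > 0);
   assert(dst.width > 0 && dst.height > 0);

   return src.width <= dst.width && src.height <= dst.height;
}
```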

fixes #8655

cc: mesa-stable

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22195>
2023-03-30 21:13:40 +00:00
Samuel Pitoiset
373c6346f5 radv: add push constant state to the cmdbuf state
Push constants are handled per bind point internally. Using a separate
structure in the cmdbuf state would allow us to update it easily
without relying on bound pipelines.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22209>
2023-03-30 20:41:23 +00:00
Samuel Pitoiset
a0baefa033 radv: copy need_indirect_descriptor_sets to radv_cmd_state
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22209>
2023-03-30 20:41:23 +00:00
Samuel Pitoiset
eeefe18f05 radv: add a helper to convert a VkPipelineBindPoint
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22209>
2023-03-30 20:41:23 +00:00
Rob Clark
88f3676019 freedreno: Optimize repeated finishes
Sometimes apps (glances at stk) spin on a syncobj with very short
timeouts.  But ensuring the fence is flushed all the way through to
the kernel (including handling TC unflushed fences) only needs to
be done once.
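
The idea can be sketched like this, with a hypothetical fence structure rather than the real freedreno one: the first finish pays for the flush, and later polls skip straight to the wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fence for the sketch. */
struct sketch_fence {
   bool flushed;         /* already pushed through to the kernel? */
   bool signaled;        /* stand-in for the kernel-side fence state */
   unsigned flush_calls; /* instrumentation for the example */
};

/* Stand-in for the costly flush path. */
void
sketch_flush_to_kernel(struct sketch_fence *fence)
{
   fence->flush_calls++;
}

bool
sketch_fence_finish(struct sketch_fence *fence, uint64_t timeout_ns)
{
   assert(fence != 0);
   (void)timeout_ns; /* a real wait would block up to the timeout */

   /* Only the first finish needs to flush. */
   if (!fence->flushed) {
      sketch_flush_to_kernel(fence);
      fence->flushed = true;
   }

   return fence->signaled;
}
```

An application spinning on the fence with short timeouts then triggers a single flush, however many times it polls.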

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22098>
2023-03-30 19:42:01 +00:00
Rob Clark
8416bc1c60 freedreno/drm: Disable threaded-submit for msm
We've had drm/sched support on the kernel side for more than a year and
a half.  This makes submit ioctl async by handling fence waits from the
sched's kthread, which is what threaded submit was originally working
around.  For now, threaded submit is only used for virtgpu, which does
not (yet?) have drm/sched support.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22098>
2023-03-30 19:42:01 +00:00
Rob Clark
a16533c43e freedreno/drm: Make threaded-submit optional
We've had gpu-sched support in the kernel for a while now, so our fence
waits are not synchronous in the ioctl path.  The only reason this path
still exists is that virtgpu does not have gpu-sched.  So let's disable
it on msm.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22098>
2023-03-30 19:42:01 +00:00
Rob Clark
cacbbfd6a8 mesa: Add a few more function traces
Sprinkle around a few more traces that were useful in locating fence
waits.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22098>
2023-03-30 19:42:01 +00:00
Rob Clark
c2194552e7 freedreno/drm: Stop cleanup at first active BO
Buffers are added to the deferred freelist at the tail.  And frequently
the last reference is dropped immediately after the submit.  So almost
always, once we see a still-busy BO, the remaining in the list will also
still be busy.
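
A simplified sketch of the early exit, using an array in submission order instead of the driver's real list type (all names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer object for the sketch. */
struct sketch_bo {
   bool busy;  /* still in use by the GPU */
   bool freed; /* released by the cleanup pass */
};

/* Walk the deferred freelist from the oldest entry and stop at the first
 * busy BO. Later entries were queued afterwards and are assumed to be busy
 * too, so they are left for the next pass. Returns the number freed. */
size_t
sketch_cleanup_deferred(struct sketch_bo *list, size_t count)
{
   assert(list != NULL || count == 0);

   size_t n = 0;

   while (n < count && !list[n].busy) {
      list[n].freed = true;
      n++;
   }

   return n;
}
```

The trade-off is that an idle BO queued behind a busy one waits until a later pass, in exchange for not checking every entry on each cleanup.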

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22098>
2023-03-30 19:42:01 +00:00
Rob Clark
712c26e2b6 freedreno/drm: Fast path for idle check
If already idle, no need to cleanup_fences() (and take related lock).

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22098>
2023-03-30 19:42:00 +00:00
Mike Blumenkrantz
77c7198d76 zink: fix quads emulation gs with array variables
this was broken for e.g., gl_ClipDistance, which uses explicit array
types and therefore cannot be directly read/written

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22208>
2023-03-30 19:21:52 +00:00
Rob Clark
f9a074dd55 dri2/android: Bypass throttling
The Android window system (SurfaceFlinger, et al) already does its own
throttling.  Trying to do this also in mesa's egl is counterproductive.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22197>
2023-03-30 18:46:04 +00:00
Konstantin Seurer
816f434efc radv/rq: Rematerialize inv_dir before proceed
Helps with register pressure.

Quake II RTX:
Totals from 7 (14.29% of 49) affected shaders:

VGPRs: 688 -> 672 (-2.33%)
CodeSize: 167496 -> 167560 (+0.04%); split: -0.01%, +0.05%
MaxWaves: 70 -> 72 (+2.86%)
Instrs: 31716 -> 31760 (+0.14%); split: -0.02%, +0.16%
Latency: 385343 -> 386040 (+0.18%); split: -0.01%, +0.19%
InvThroughput: 78878 -> 78045 (-1.06%); split: -1.22%, +0.17%
VClause: 596 -> 600 (+0.67%)
Copies: 4774 -> 4747 (-0.57%); split: -0.98%, +0.42%
PreVGPRs: 617 -> 592 (-4.05%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20469>
2023-03-30 18:15:11 +00:00
Mike Blumenkrantz
9c73312248 zink: set src access when rebinding buffers, unset unordered_*
this ensures that the buffer is marked active and prevents promotion
in cases where reordering would break rendering

unordered_read prohibits write reordering for buffers, so setting
this flag must be done when the buffer is actually used, ideally as
late as possible

setting it at the time of (re)bind catches all the buffer rebind cases
which might otherwise erroneously permit reordering

fixes #8381

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22205>
2023-03-30 17:54:11 +00:00
Danylo Piliaiev
2cc9364c20 tu/drm: Support cached non-coherent memory
Requires some hand rolled assembly:
- DC CVAC / DC CIVAC for aarch64
- DCCMVAC / DCCIMVAC for arm32, unfortunately it seems that it is
  illegal to call them from userspace.
- clflush for x86-64

We handle x86-64 case because Turnip may run in x86-64 guest
e.g. in FEX-Emu or Box64.
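
For illustration only, a cache clean over a range might look like the sketch below. It is not the Turnip implementation: the function name is hypothetical, it uses GCC/Clang inline assembly, and it assumes a 64-byte cache line (real code should query the line size). On other architectures it compiles to a no-op.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for the sketch; not queried from the hardware. */
#define SKETCH_CACHE_LINE 64u

/* Write back every cache line overlapping [start, start + size). */
void
sketch_clean_cache_range(void *start, size_t size)
{
   assert(start != NULL);

   uintptr_t p = (uintptr_t)start & ~(uintptr_t)(SKETCH_CACHE_LINE - 1);
   uintptr_t end = (uintptr_t)start + size;

   for (; p < end; p += SKETCH_CACHE_LINE) {
#if defined(__aarch64__)
      /* Clean by virtual address to the point of coherency. */
      __asm__ __volatile__("dc cvac, %0" : : "r"(p) : "memory");
#elif defined(__x86_64__)
      /* clflush writes the line back and also invalidates it. */
      __asm__ __volatile__("clflush (%0)" : : "r"(p) : "memory");
#else
      (void)p;
#endif
   }

   /* Order the maintenance operations before later accesses. */
#if defined(__aarch64__)
   __asm__ __volatile__("dsb sy" : : : "memory");
#elif defined(__x86_64__)
   __asm__ __volatile__("mfence" : : : "memory");
#endif
}
```

An invalidating variant on aarch64 would use DC CIVAC in place of DC CVAC.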

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20550>
2023-03-30 15:50:47 +00:00
Danylo Piliaiev
5a59410962 turnip: add cached and cached-coherent memory types
vkd3d requires cached memory type.

MSM backend doesn't have a special ioctl for memory
flushing/invalidation, we'd have to use cvac and civac
arm assembly instructions (would be done in following commit).

KGSL has an ioctl for this, which is used in this commit.

Note, CTS tests don't seem good at testing flushing and
invalidating, the ones I found passed on KGSL with both
functions being no-op.

Based on the old patch from Jonathan Marek.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7636

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20550>
2023-03-30 15:50:47 +00:00
Erik Faye-Lund
bd816084c6 zink: enable spir-v 1.6 for vulkan 1.3
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18274>
2023-03-30 14:06:54 +00:00
Erik Faye-Lund
99bd1eaf3d zink: use spir-v 1.6 local-size when needed
The WorkgroupSize built-in is deprecated in SPIR-V 1.6, so let's switch
to using LocalSizeId instead, like the spec recommends.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18274>
2023-03-30 14:06:54 +00:00
Erik Faye-Lund
da895596da zink: use demote from spir-v 1.6 when possible
With SPIR-V 1.6, we don't need to enable the extension for demote any
more.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18274>
2023-03-30 14:06:54 +00:00