Commit graph

178442 commits

Author SHA1 Message Date
Christian Gmeiner
d48d8aefdf docs: Move isaspec out of drivers/freedreno
Lets put it under 'Developer Topics'.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Acked-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25452>
2023-10-02 07:20:13 +00:00
Iago Toral Quiroga
4afbf4ad31 v3d: get rid of shader_state pointer in v3d_key
Having this pointer in the key is undesirable since it makes
copying keys difficult and error prone (as seen in previous
patches), also, it is only there for convenience and we don't
strictly need it (in fact the vulkan driver doesn't use it at
all), so let's just get rid of it so our v3d_key is fully
static.

Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25418>
2023-10-02 06:35:07 +00:00
Iago Toral Quiroga
3fb9e27a3d v3d: fix RAM shader cache
The RAM shader cache was using the v3d_key for hashes and
comparisons which is not correct. Particularly, this struct
has a void pointer where we store a reference to an uncompiled
shader with the NIR code, and that is of course not accounted
for when hashing and comparing keys, which can lead to bogus
cache hits.

This patch introduces a v3d_cache_key that has both the v3d key
and a sha1 of the uncompiled NIR. Now key hashing and comparison
is done on the static part of the v3d key (that is, excluding the
uncompiled shader pointer, which may be invalid in the cache if
the original shader was deleted) and taking the sha1 from the
uncompiled shader. This also makes sure the shader key we store
in the cache has a NULL shader_state pointer to make it more
clear that this field may not be used at all for caching purposes.

This fixes GPU hangs with some OpenCL tests (through Rusticl)
caused by incorrect RAM cache hits.

Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25418>
2023-10-02 06:35:07 +00:00
Iago Toral Quiroga
8a4bd328cf v3d: use pre-computed shader sha1 for disk cache
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25418>
2023-10-02 06:35:06 +00:00
Iago Toral Quiroga
0ed36b524c v3d: compute nir sha1 for uncompiled shader state
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25418>
2023-10-02 06:35:06 +00:00
Iago Toral Quiroga
adc63d2503 broadcom/compiler: add a couple of shader key helpers
Our shader key includes a void pointer that we can't just memcmp,
so add helpers that allow us toget the 'static' portion and size
of a key. We will use this to fix up the shader cache in v3d in
a later patch.

Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25418>
2023-10-02 06:35:06 +00:00
Sergi Blanch Torne
ccd3e68146 ci: disable Collabora's LAVA lab for maintance
This is to inform you of some planned downtime in the LAVA lab as follows:
* Start: 2023-10-02 08:00 BST (07:00 UTC)
* End: 2023-10-02 12:00 BST (11:00 UTC)

Signed-off-by: Sergi Blanch Torne <sergi.blanch.torne@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25368>
2023-10-02 07:42:08 +02:00
Martin Roukala (né Peres)
b1156507ed ci/vkcts-navi21: mark more of the RT handles checks as flakes
We keep hitting more and more of them, so let's be more inclusive.

Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25495>
2023-10-02 03:08:11 +00:00
Martin Roukala (né Peres)
a7ed839490 ci/vkcts-vangogh: mark dEQP-VK.dynamic_rendering.primary_cmd_buff.basic.* as flake
This mirrors what we did on navi21, as there are just too many of these tests.

Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25495>
2023-10-02 03:08:11 +00:00
Janne Grunau
dbe2230408 asahi: decode: Fix uint64_t format modifiers in agxdecode_stateful()
Fixes i386 build.

Fixes: acd5ed0451 ("asahi: decode: Implement VDM call/ret")
Signed-off-by: Janne Grunau <j@jannau.net>
2023-10-01 12:37:55 -04:00
Alyssa Rosenzweig
d99ed6d66d asahi: Handle layered background programs
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:37:55 -04:00
Alyssa Rosenzweig
3715586580 asahi: Generate layered EOT programs
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:37:55 -04:00
Alyssa Rosenzweig
c87095e518 asahi: Use a 2D Array texture for array render targets
Fixes KHR-GLES31.core.geometry_shader.layered_framebuffer.blending_support with
eMRT forced.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:37:55 -04:00
Alyssa Rosenzweig
87a7b239e1 asahi: Write to cubes/etc attachments as 2D array
To reduce shader variants, the tilebuffer lowering code does not know the
actual texture targets of the spilled render targets, only whether they are
layered or not. As such, all layered targets (3D, cube map, etc) get written out
uniformly as 2D Arrays. For that to work, the driver needs to do the
corresponding transform.

Regular imageStore() instructions are not affected by any of this.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:37:55 -04:00
Alyssa Rosenzweig
0cbecc1ad1 asahi: Predicate layer ID reads
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:37:55 -04:00
Alyssa Rosenzweig
e2a0d64d52 asahi: Add pass to predicate layer ID reads
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:37:55 -04:00
Alyssa Rosenzweig
e518c92d26 asahi: Assume LAYER is flat-shaded
It can't be anything else, this makes sure the varyings are sorted properly.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:37:55 -04:00
Alyssa Rosenzweig
68437eb0ba asahi: Account for layering for attachment views
Do not force a single-layer view, use an actual array attachment when there are
multiple layers, since this corresponds to a layered framebuffer that will write
to an array with the eMRT path.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:37:55 -04:00
Alyssa Rosenzweig
9dc87a00fd asahi: Expose VS_LAYER_VIEWPORT behind a flag
We can't technically expose the extension without a higher GL version, but the
implemented subset should work and this lets us test with piglit.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:37:55 -04:00
Alyssa Rosenzweig
2396d3fe62 asahi: Use layered layouts
For correct eMRT code.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:37:55 -04:00
Alyssa Rosenzweig
8a48af4f8f agx/lower_tilebuffer: Support spilled layered RTs
If we spill render targets with a layered framebuffer, our spilled targets are
assumed to be 2D Arrays (in general). We need to use arrayed image operations to
load/store from these. The layer is given by the layer as read in the fragemnt
shader. This handles the eMRT portion of layered rendering.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:37:55 -04:00
Alyssa Rosenzweig
041451b655 agx/tilebuffer: Support layered layouts
Just add a flag for it. We don't care about the actual # of layers when
calculating the layout, only the boolean fact of being layered or not. The
reason we need this at all is because the eMRT implementation needs to
account for layering and that is only keyed off the tilebuffer layout.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:37:55 -04:00
Alyssa Rosenzweig
b252630604 agx: Support packed layered rendering writes
With the new pass.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:12 -04:00
Alyssa Rosenzweig
4a954dff07 asahi,agx: Select layered rendering outputs
These 2 are together

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:12 -04:00
Alyssa Rosenzweig
88fd76d378 asahi: Add helper to get layer id in internal program
For background/EOT only.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:12 -04:00
Alyssa Rosenzweig
7d94f2ee49 agx: Add pass to lower layer ID writes
The hardware needs the layer ID and the viewport index packed together. That
consumes an entire varying slot, if we want those available in the frag shader
we need a separate slot. Add a pass to insert the extra packed write.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:12 -04:00
Alyssa Rosenzweig
175819eec6 agx: Handle layered block image stores
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:12 -04:00
Alyssa Rosenzweig
c3a208d6d9 agx: Pack block image store dim correctly
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:12 -04:00
Alyssa Rosenzweig
da0da5d6f8 agx/nir_lower_texture: Allow disabling layer clamping
For background program with layered.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:12 -04:00
Alyssa Rosenzweig
10b9c2fa36 nir: Support arrays in block_image_store_agx
For layered rendering, runs once per layer.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:12 -04:00
Alyssa Rosenzweig
f4042afd57 nir: Add layer_id_written_agx sysval
We'll implement layer ID reads in the frag shader with a varying read, but if
the VS doesn't write the varying we need to return 0 per the spec. Add a sysval
to detect that case so we can handle it at runtime without keys.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:11 -04:00
Alyssa Rosenzweig
d83d24e96a agx: Insert jmp_exec_none instructions
With the exception of the backwards branch for loops, all the control flow we
insert during instruction selection just predicates instructions rather than
actually jumping around. That means, for example, we execute both sides of the
if even for a uniform condition! That's inefficient. The solution is insert
jmp_exec_none instructions after control flow in order to skip unexecuted
regions, which is much faster than predicating them out. However, jmp_exec_none
is costly in itself, so we need to use a heuristic to determine when it's
actually beneficial.

This uses a very simple heuristic for this purpose. However, it is a massive
performance speed-up for Dolphin uber shaders: 39fps -> 67fps at 2x resolution.
Nearly a doubling of performance!

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:11 -04:00
Alyssa Rosenzweig
79c4d4213c agx: Add agx_prev_block helper
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:11 -04:00
Alyssa Rosenzweig
dd6106c8bd agx: Add jumps to block ends
jmp_exec_none variant that jumps to the last instruction of the target block,
rather than the beginning. This is convenient for skipping over elses, while
still executing the block-final pop_exec instruction. Similarly for skipping
over loop bodies while still executing the block-final pop_exec, after break
instructions.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:11 -04:00
Alyssa Rosenzweig
22ab505a3d agx: Augment if/else/while_cmp with a target
Add an optional pointer to a target block for these instructions. This does NOT
act like a logical branch, and does NOT get added to the logical control flow.
It is ignored wholesale until after RA, when physical edges may be inserted by a
pass we add later in this series.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:11 -04:00
Alyssa Rosenzweig
9a894c9a33 agx: Set PIPE_SHADER_CAP_CONT_SUPPORTED
So we get adequate testing of continues, rather than lowering them in GLSL. We
don't really /want/ to see continues but lowering them away will just make them
harder to test... and besides, we should be optimizing them in NIR (not GLSL) so
we can get the win on Vulkan too.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:11 -04:00
Alyssa Rosenzweig
d05657e0d6 agx: Hoist sample_mask/zs_emit
Although this is well-motivated, perf effect seems to be neglible for Dolphin.
It does prevent the scheduler from making things worse by sinking these
instructions though, so as a way to prevent future problems this seems sensible.

The kind of problem this affects (late discard) isn't modelled in shader-db.
Nevertheless, nothing concerning there:

   total instructions in shared programs: 1756699 -> 1756722 (<.01%)
   instructions in affected programs: 10106 -> 10129 (0.23%)
   helped: 21
   HURT: 41
   Inconclusive result (value mean confidence interval includes 0).

   total bytes in shared programs: 11525404 -> 11525452 (<.01%)
   bytes in affected programs: 72900 -> 72948 (0.07%)
   helped: 27
   HURT: 41
   Inconclusive result (value mean confidence interval includes 0).

   total halfregs in shared programs: 483394 -> 483286 (-0.02%)
   halfregs in affected programs: 4945 -> 4837 (-2.18%)
   helped: 88
   HURT: 78
   Inconclusive result (value mean confidence interval includes 0).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:11 -04:00
Alyssa Rosenzweig
0d8362b842 agx: Align the reg file for 256-bit vectors
This fixes live range splitting with 3D textureGrad(), which involves vectors
larger than the natural 128-bit maximum and hence requires special handling.
Fixes this assert with a combination of debug flags and new patches:

   unsigned int find_best_region_to_evict(struct ra_ctx *, unsigned int,
   unsigned int *, unsigned int *):
   Assertion `(rctx->bound % size) == 0 && "register file size must be aligned
   to the maximum vector size"' failed

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:11 -04:00
Alyssa Rosenzweig
cb14cddfa5 asahi: Clamp index buffer extent to what's read
This makes for cleaner agxdecodes, I think this matches what I've seen on the
macOS side but I might be misremembering. Certainly shouldn't hurt.

This only applies for direct draws.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:11 -04:00
Friedrich Vock
2be9b66cdd radv: Fix check in insert_block
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25496>
2023-10-01 13:11:50 +02:00
Friedrich Vock
a0fba17311 radv: Initialize shader freelist on allocation
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25496>
2023-10-01 13:11:43 +02:00
Vitaliy Triang3l Kuzmin
a43ee1ca50 r600: Replace R600_BIG_ENDIAN with UTIL_ARCH_BIG_ENDIAN
In particular, removes the dependency of r600_formats.h on r600_pipe.h so
it can be shared between Gallium and Vulkan.

Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Signed-off-by: Vitaliy Triang3l Kuzmin <triang3l@yandex.ru>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24513>
2023-10-01 09:25:50 +00:00
Marek Olšák
43e7285069 winsys/amdgpu: pad gfx and compute IBs with a single NOP packet
to minimize CP overhead

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25043>
2023-10-01 08:45:22 +00:00
Marek Olšák
4f660f9937 ac/gpu_info: pad IBs according to ib_size_alignment
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25043>
2023-10-01 08:45:22 +00:00
Marek Olšák
b6f435888b ac/gpu_info: replace ib_alignment with per-IP IB base and size alignments
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25043>
2023-10-01 08:45:22 +00:00
Eric Engestrom
276caddbd9 ci/deqp-runner: restore exit-on-error after getting deqp-runner's exit code
Signed-off-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24738>
2023-10-01 02:00:50 +00:00
Eric Engestrom
f8326d0950 ci/deqp-runner: fix indentation
Signed-off-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24738>
2023-10-01 02:00:50 +00:00
Marek Olšák
6b29c16db8 amd: rename GFX110x to NAVI31-33
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25492>
2023-09-30 23:08:47 +00:00
Marek Olšák
c7e08acd12 ac/llvm: fix flat PS input corruption
Fixes: 0a54fbb5b4 - radeonsi/gfx11: interp changes for 32bit

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25492>
2023-09-30 23:08:47 +00:00
Marek Olšák
d50cc2e0cf ac/gpu_info: don't align IBs to the GL2 cache line size
PAL doesn't do it. If drivers want IBs not to share cache lines with other buffers,
they should align the size manually.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25492>
2023-09-30 23:08:46 +00:00