Commit graph

4455 commits

Author SHA1 Message Date
Dave Airlie
1486b54e80 panvk: move to using common command buffer status
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16922>
2022-11-11 05:01:24 +00:00
Jason Ekstrand
84cd81e104 panvk: Use common code for command buffer lifecycle management
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16922>
2022-11-11 05:01:24 +00:00
Jason Ekstrand
2126bb6c92 panvk: Drop panvk_cmd_buffer::queue_family_index
Suggested-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16922>
2022-11-11 05:01:24 +00:00
Alyssa Rosenzweig
811f8a1946 panfrost: Require 64-byte alignment on imports
While Panfrost allocates linear images with strides that are a multiple of 64
bytes, other dma-buf producers on the system may not satisfy this requirement.
However, at least on v7 and newer, any image with a regular format must have a
stride that is a multiple of 64 bytes.

This fixes a real bug in an application that created a linear R8_UNORM image
with stride 480 bytes, imported it as an EGL_image, and then tried to texture
from it with the GPU. Previously, the driver allowed this situation but it
resulted in an imprecise fault from the GPU. This patch corrects the driver to
reject the import as invalid due to the unaligned stride, ensuring we never
attempt to texture from such a resource.

To implement, we add some new layout queries to centralize knowledge about the
stride alignment requirements, and we sprinkle in asserts to show how the
invariant is upheld throughout the lifecycle of image creation to texturing.

Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19620>
2022-11-10 14:37:18 +00:00
Alyssa Rosenzweig
1827b4a2db panfrost: Compile indirect dispatch shader on first use
For 2D UI workloads and even most 3D workloads, the indirect dispatch shader
won't actually be needed, but we currently compile it during eglInitialize() on
every v7 application. That hurts app start-up time, especially given that this
shader doesn't hit the disk cache. We can instead defer compiling this shader
until it's actually needed, when glDispatchComputeIndirect() gets called.

The tradeoff is that the first glDispatchComputeIndirect() call will be (much)
slower than successive calls, since we need to build and compile this internal
shader. I'm unconvinced that's a problem in practice.

An app would need to call glDispatchComputeIndirect for the first time in the
middle of a scene.  2D apps never would call that, OpenCL doesn't have that, and
GL compute will have the same costs just moved around.  So it's down to a 3D
GLES3.1 app that indirectly dispatches compute for the first time time in the
middle of a scene. Which, meh? It's not entirely implausible but we have bigger
fish to fry, and this fixes a real problem (about 5% of eglInitialize time spent
building this shader that won't actually get used).

es2_info starts slightly faster with this change.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19622>
2022-11-10 14:22:56 +00:00
Yonggang Luo
e399dc3544 util: normalize include files under src/util/*.h with util/ prefix in mesa code base
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19546>
2022-11-10 06:27:25 +00:00
Alyssa Rosenzweig
f8553ef44c panfrost: Remove out-of-band CRC support
Without additional signalling of modifiers, CRCs cannot possibly in a correct
way work across process boundaries. Since we don't do that signalling, we should
not be allocating private CRCs for imported resources, and we should not be
using our own private CRCs for internal resources.

The entire out-of-bands CRC infrastructure is a hack to let us do CRCs even for
imported/exported BOs, but that can't possibly work. Remove it, and remove a
pile of special cases across the driver.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19576>
2022-11-09 15:56:20 +00:00
Benjamin Tissoires
67cee534a8 CI: convert to use the new S3 server instead of the legacy minio
We don't need to login anymore, but we can't use plain minio commands
now. `ci-fairy` got a helper as `s3cp` to keep an almost identical
API.

Signed-off-by: Benjamin Tissoires <benjamin.tissoires@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19076>
2022-11-04 11:41:42 +00:00
Alyssa Rosenzweig
5f93feed61 panfrost: Don't merge workgroups with variable shared mem
If nir->info.shared_size = 0 but grid->variable_shared_mem > 0, the shader uses
shared memory but the compiler may not realize that. We need to disable
workgroup merging even in this case. The alternate approach is to statically
check for shared intrinsics in the compiler, but this is a bit easier all things
considered.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18581>
2022-11-02 23:36:56 +00:00
Alyssa Rosenzweig
b35a55bb42 panfrost: Precompile shaders
We have no vertex shader key, and unless legacy GL features are used, the
fragment shader key is known ahead-of-time. That means we can precompile shaders
at CSO create time, hopefully avoiding some draw-time jank.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19363>
2022-11-02 16:52:11 +00:00
Alyssa Rosenzweig
2316b80d77 panfrost: Don't use nir_variable to link varyings
NIR deemphasizes nir_variable. We want to transition off it. Instead of walking
the list of variables and playing games with the GLSL types to collect varying
information, walk the list of instructions and use the I/O semantics to collect
similar information.

In addition to avoiding the reliance on nir_variable, this fixes handling of
struct varyings under certain circumstances. Such programs are compiled by the
GLES3.1 CTS but not used, so without this fix, the affected tests would regress
when precompiling.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19363>
2022-11-02 16:52:11 +00:00
Alyssa Rosenzweig
d0281fc16a pan/mdg: Use bifrost_nir_lower_store_component
Move the pass from the Bifrost compiler to the Midgard/Bifrost common code
directory, and take advantage of it on Midgard, where it fixes the same
tests as it fixed originally on Bifrost.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19363>
2022-11-02 16:52:11 +00:00
Alyssa Rosenzweig
17589be72b pan/mdg: Use .u32 for flat shading
This is simple and matches what we do on Bifrost.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19363>
2022-11-02 16:52:11 +00:00
Alyssa Rosenzweig
225a8f6e27 pan/mdg: Don't pair ST_VARY.a32 with other instrs
For some reason, LD_ATTR/ST_VARY.a32 bundles raise INSTR_INVALID_ENC, at
least on Mali-T860. Don't construct such pairs. This is a blunt hack but
I don't know where this curveball requirement is coming from and this
unblocks the rest of this series.

total instructions in shared programs: 99879 -> 99788 (-0.09%)
instructions in affected programs: 3179 -> 3088 (-2.86%)
helped: 49
HURT: 9
helped stats (abs) min: 1.0 max: 6.0 x̄: 2.04 x̃: 2
helped stats (rel) min: 0.93% max: 10.53% x̄: 5.46% x̃: 4.88%
HURT stats (abs)   min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.61% max: 2.13% x̄: 1.41% x̃: 1.14%
95% mean confidence interval for instructions value: -1.93 -1.20
95% mean confidence interval for instructions %-change: -5.37% -3.41%
Instructions are helped.

total bundles in shared programs: 43778 -> 45102 (3.02%)
bundles in affected programs: 10737 -> 12061 (12.33%)
helped: 10
HURT: 369
helped stats (abs) min: 1.0 max: 3.0 x̄: 1.50 x̃: 1
helped stats (rel) min: 2.90% max: 18.75% x̄: 6.93% x̃: 5.21%
HURT stats (abs)   min: 1.0 max: 10.0 x̄: 3.63 x̃: 4
HURT stats (rel)   min: 0.82% max: 44.44% x̄: 15.27% x̃: 13.33%
95% mean confidence interval for bundles value: 3.29 3.69
95% mean confidence interval for bundles %-change: 13.68% 15.69%
Bundles are HURT.

total quadwords in shared programs: 76783 -> 77914 (1.47%)
quadwords in affected programs: 18633 -> 19764 (6.07%)
helped: 9
HURT: 370
helped stats (abs) min: 1.0 max: 2.0 x̄: 1.22 x̃: 1
helped stats (rel) min: 0.87% max: 8.33% x̄: 3.71% x̃: 3.85%
HURT stats (abs)   min: 1.0 max: 7.0 x̄: 3.09 x̃: 3
HURT stats (rel)   min: 0.82% max: 35.00% x̄: 7.82% x̃: 6.11%
95% mean confidence interval for quadwords value: 2.82 3.15
95% mean confidence interval for quadwords %-change: 7.02% 8.06%
Quadwords are HURT.

total registers in shared programs: 7266 -> 7076 (-2.61%)
registers in affected programs: 1224 -> 1034 (-15.52%)
helped: 171
HURT: 25
helped stats (abs) min: 1.0 max: 3.0 x̄: 1.27 x̃: 1
helped stats (rel) min: 8.33% max: 50.00% x̄: 21.85% x̃: 20.00%
HURT stats (abs)   min: 1.0 max: 2.0 x̄: 1.12 x̃: 1
HURT stats (rel)   min: 10.00% max: 100.00% x̄: 35.73% x̃: 33.33%
95% mean confidence interval for registers value: -1.10 -0.84
95% mean confidence interval for registers %-change: -17.69% -11.32%
Registers are helped.

total threads in shared programs: 4956 -> 5019 (1.27%)
threads in affected programs: 99 -> 162 (63.64%)
helped: 43
HURT: 6
helped stats (abs) min: 1.0 max: 2.0 x̄: 1.74 x̃: 2
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
HURT stats (abs)   min: 2.0 max: 2.0 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 50.00% max: 50.00% x̄: 50.00% x̃: 50.00%
95% mean confidence interval for threads value: 0.91 1.66
95% mean confidence interval for threads %-change: 67.36% 95.90%
Threads are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19363>
2022-11-02 16:52:11 +00:00
Alyssa Rosenzweig
e04156b42a pan/mdg: Disassemble the .a32 bit
Corresponds to .auto32 on Bifrost. This is helpful for a conformant
implementation of flat shading.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19363>
2022-11-02 16:52:11 +00:00
Illia Abernikhin
aa4ac5ff8b utils: Merge util/debug.* into util/u_debug.* and remove util/debug.*
Rename env_var_as_unsigned() -> debug_get_num_option(), because duplicate
Rename env_var_as_bool() -> debug_get_bool_option(), because duplicate

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7177

Signed-off-by: Illia Abernikhin <illia.abernikhin@globallogic.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19336>
2022-11-02 07:25:39 +00:00
Alyssa Rosenzweig
2a6338722e panfrost: Don't use nir_variable in the compilers
More future proof, simpler, and works with early I/O lowering.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19456>
2022-11-02 04:22:06 +00:00
Alyssa Rosenzweig
6a87719d35 pan/bi: Don't lower outputs for compute
Useless.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19456>
2022-11-02 04:22:06 +00:00
Anton Bambura
18e7f5c428 panfrost: Enable Mali-T620
Support of this GPU is now good enough to enable it

Signed-off-by: Anton Bambura <jenneron@protonmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19408>
2022-10-31 13:02:06 +00:00
Alyssa Rosenzweig
1ff3b87ba2 panfrost: Enable rendering to 16-bit and 32-bit
Bifrost onwards handle this in hardware, and the Midgard lowering isn't
too terrible. Enable the format, otherwise desktop GL apps such as
Hacknet try to render to the format and get an incomplete framebuffer.

Cc stable because apparently we've been advertising this format
unintentionally as a result of some other interaction? Unclear how
Hacknet is hitting this, maybe it's an app bug. Shrug, it's not a big
deal regardless.

Additionally, we need to restrict texturing from 32-bit normalized due
to a restriction added with the v7 pixel format fiasco. That means
restricting rendering to 32-bit normalized on v7 onwards.

Closes: #7251
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Tested-by: Dang Huynh <danct12@disroot.org>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19358>
2022-10-29 18:23:55 +00:00
Alyssa Rosenzweig
3a9cdd780d panfrost/ci: Disable trace-based testing
Trace-based testing has not worked for Panfrost. It was a neat
experiment, and I'm glad we tried it, but the results have been mostly
negative for the driver. Disable the trace-based tests.

For testing that specific API features work correctly, we run the
conformance tests (dEQP), which are thorough for OpenGL ES. For big GL
features, we run Piglit, and if there are big GL features that we are
not testing adequately, we should extend Piglit for these. For
fine-grained driver correctness, we are already covered.

Where trace-based testing can fit in is as a smoke test, ensuring that
the overall rendering of complex scenes does not regress. In principle,
that's a lovely idea, but the current implementation has not worked out
for Panfrost thus far. The crux of the issue is that the trace based
tests are based on checksums, not fuzzy-compared reference images. That
requires updating checksums any time rendering changes. However, a
rendering change to a trace is NOT a regression. The behaviour of OpenGL
is specified very loosely. For a given trace, there are many different
valid checksums. That means that correct changes to core code frequently
fail CI after running through the rest of CI, only because a checksum
changed in a still correct way. That's a pain to deal with, exacerbated
by rebase pains, and provides negative value to the project. Some recent
examples of this I've hit in the past two weeks alone:

   panfrost: Enable rendering to 16-bit and 32-bit
   4b49241f7d ("panfrost: Use proper formats for pntc varying")
   ac2964dfbd ("nir: Be smarter fusing ffma")

The last example were virgl traces, but were especially bad: due to a
rebase fail, I had to update traces /twice/, wasting two full runs of
pre-merge CI across *all* hardware. This was extremely wasteful.

The value of trace-based testing is as a smoke test to check that traces
still render correctly. That is useful, but it turns out that checksums
are the wrong way to go about it. A better implementation would be
storing only a single reference image from a software rasterizer per
trace. No driver-specific references would be stored. That reference
image must never change, provided the trace never changes. CI would then
check rendered results against that image with tolerant fuzzy
comparisons. That tolerance matches with the fuzzy comparison that the
human eye would do when investigating a checksum change anyway. Yes, the
image comparison JavaScript will now report that
0 pixels changed within the tolerance, but there's nothing a human eye
can do with that information other than an error prone copypaste of new
checksums back in the yaml file and kicking it back to CI, itself a
waste of time.

Finally, in the time we've had trace-based testing alongside the
conformance tests, I cannot remember a single actual regression in one
of my commits the trace jobs have identified that the conformance tests
have not also identified. By contrast, the conformance test coverage has
prevented the merge of a number of actual regressions, with very few
flakes or xfail changes, and I am grateful we have that coverage. That
means the value added from the trace jobs is close to zero, while the
above checksum issues means that the cost is tremendous, even ignoring
the physical cost of the extra CI jobs.

If you work on trace-based testing and would like to understand how it
could adapted to be useful for Panfrost, see my recommendations above.
If you work on CI in general and would like to improve Panfrost's CI
coverage, what we need right now is not trace-based testing, it's
GLES3.1 conformance runs on MediaTek MT8192 or MT8195. That hardware is
already in the Collabora LAVA lab, but it's not being used for Mesa CI
as the required kernel patches haven't made their way to mainline yet
and nobody has cherry-picked them to the gfx-ci kernel. If you are a
Collaboran and interested in improving Panfrost CI, please ping
AngeloGioacchino for information on which specific patches need to be
backported or cherry-picked to our gfx-ci kernel. Thank you.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19358>
2022-10-29 18:23:55 +00:00
Alyssa Rosenzweig
78785f3b18 pan/mdg: Don't schedule across memory barrier
Fixes KHR-GLES31.core.shader_image_load_store.basic-glsl-misc-cs

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19238>
2022-10-27 20:13:11 +00:00
Alyssa Rosenzweig
934f9bbae7 panfrost: Avoid a XFB special case
This worked around an issue that doesn't apply to the Valhall XFB lowering.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19238>
2022-10-27 20:13:11 +00:00
Alyssa Rosenzweig
0955fe8fe2 panfrost: Use compute-based XFB on Midgard
Now we're back to a single XFB implementation for all gens. Fixes:

   KHR-GLES31.core.draw_indirect.advanced-twoPasses-transformFeedback-arrays
   KHR-GLES31.core.draw_indirect.advanced-twoPasses-transformFeedback-elements

Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19238>
2022-10-27 20:13:11 +00:00
Alyssa Rosenzweig
9e2ce225e6 pan/mdg: Fix 64-bit address arithmetic
Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19238>
2022-10-27 20:13:11 +00:00
Alyssa Rosenzweig
4a626d9829 pan/bi: Clean up sysval handling a bit
Combine some cases.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19238>
2022-10-27 20:13:11 +00:00
Alyssa Rosenzweig
4b49241f7d panfrost: Use proper formats for pntc varying
The formats of special attributes are supposed to match their architectural
definitions, and point coordinates are architecturally defined as RGBA32F. In
practice this doesn't seem to fix anything.

Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19237>
2022-10-26 01:56:08 +00:00
Alyssa Rosenzweig
21a4dbb720 panfrost: Don't use lower_wpos_pntc on Midgard
gl_PointCoord is implemented via a special attribute descriptor on Midgard. This
descriptor has an orientation bit, the orientation is driver-controlled. That
means we can map rast->sprite_coord_mode to this bit, rather than lowering in
the shader.

This is a bug fix for point sprites, which are implemented natively on Midgard
for dubious reasons and need to be flipped this way. It is also an optimization
for apps reading gl_PointCoord, removing the extra arithmetic to flip, although
the value of this is somewhat dubious.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19237>
2022-10-26 01:56:08 +00:00
David Heidelberg
2b750cacd7 ci/panfrost: re-enable traces on t760
Also make a note, why we don't run gles2 piglit.

Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19211>
2022-10-24 21:24:46 +00:00
David Heidelberg
c6f575f663 ci/panfrost: Humus Portal trace got fixed, update checksum
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19211>
2022-10-24 21:24:46 +00:00
David Heidelberg
9ba7164d2f ci/panfrost: enable piglit-gl on g52 again and deparalelize
The job fits into 15 minutes of runtime, so deparalelize.

Stress-tested.

Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19211>
2022-10-24 21:24:46 +00:00
David Heidelberg
b970e25890 ci/panfrost: deduplicate gitlab-ci.yml
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19211>
2022-10-24 21:24:46 +00:00
Alyssa Rosenzweig
829f769e60 pan/mdg: Fix 16-bit alignment with spiller
The loop over sources has to happen for every instruction, regardless of whether
we also need to register allocate the destination. The other source loops handle
this properly, but this one was missed.

Fixes spilling failure in shaders/android/angle/aztec_ruins/16.shader_test when
the input NIR is shuffled a bit (from reordering passes).

Fixes: 129d390bd8 ("pan/mdg: Fix bound setting in RA for sources")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19093>
2022-10-17 19:11:10 +00:00
Alyssa Rosenzweig
2c446b6636 pan/mdg: Limit work registers for large workgroups
When more than 8 registers are used, Midgard can only fit 64 threads in a
thread group. For barriers to work properly, a threadgroup must fit an entire
work group. The GL driver configures the hardware to have threadgroups the size
of work groups. That means if more than 64 threads are used in a workgroup, and
more than 8 registers are used, the hardware will fault spawning threads.

To workaround this hardware limitation, we need to limit the number of work
registers used depending on the size of the workgroup. Typically, the work group
size is known at compile-time so that determination can usually be made without
variants. To avoid variants, we make a pessimistic estimate in the case when
it's not known at compile-time.

shader-db shows 6 shaders affected. I expect that all of these would fault with
DATA_INVALID_FAULT if they tried to execute before this patch, due to the
oversize local size, and faulting is even slower than spilling ;-)

Fixes dEQP-GLES31.functional.synchronization.* on Mali-T860.

instructions HURT:   shaders/android/gfxbench/carchase/6.shader_test MESA_SHADER_COMPUTE: 121 -> 157 (29.75%)
instructions HURT:   shaders/android/gfxbench/carchase/386.shader_test MESA_SHADER_COMPUTE: 121 -> 157 (29.75%)
instructions HURT:   shaders/android/gfxbench/carchase/374.shader_test MESA_SHADER_COMPUTE: 141 -> 184 (30.50%)
instructions HURT:   shaders/android/gfxbench/carchase/4-1.shader_test MESA_SHADER_COMPUTE: 141 -> 184 (30.50%)
instructions HURT:   shaders/android/com.miHoYo.GenshinImpact/18.shader_test MESA_SHADER_COMPUTE: 513 -> 933 (81.87%)
instructions HURT:   shaders/android/com.miHoYo.GenshinImpact/16.shader_test MESA_SHADER_COMPUTE: 505 -> 1002 (98.42%)

bundles HURT:   shaders/android/gfxbench/carchase/374.shader_test MESA_SHADER_COMPUTE: 73 -> 116 (58.90%)
bundles HURT:   shaders/android/gfxbench/carchase/4-1.shader_test MESA_SHADER_COMPUTE: 73 -> 116 (58.90%)
bundles HURT:   shaders/android/gfxbench/carchase/6.shader_test MESA_SHADER_COMPUTE: 61 -> 97 (59.02%)
bundles HURT:   shaders/android/gfxbench/carchase/386.shader_test MESA_SHADER_COMPUTE: 61 -> 97 (59.02%)
bundles HURT:   shaders/android/com.miHoYo.GenshinImpact/18.shader_test MESA_SHADER_COMPUTE: 281 -> 701 (149.47%)
bundles HURT:   shaders/android/com.miHoYo.GenshinImpact/16.shader_test MESA_SHADER_COMPUTE: 278 -> 775 (178.78%)

registers helped:   shaders/android/gfxbench/carchase/374.shader_test MESA_SHADER_COMPUTE: 11 -> 8 (-27.27%)
registers helped:   shaders/android/gfxbench/carchase/4-1.shader_test MESA_SHADER_COMPUTE: 11 -> 8 (-27.27%)
registers helped:   shaders/android/gfxbench/carchase/6.shader_test MESA_SHADER_COMPUTE: 14 -> 8 (-42.86%)
registers helped:   shaders/android/gfxbench/carchase/386.shader_test MESA_SHADER_COMPUTE: 14 -> 8 (-42.86%)
registers helped:   shaders/android/com.miHoYo.GenshinImpact/16.shader_test MESA_SHADER_COMPUTE: 16 -> 8 (-50.00%)
registers helped:   shaders/android/com.miHoYo.GenshinImpact/18.shader_test MESA_SHADER_COMPUTE: 16 -> 8 (-50.00%)

threads helped:   shaders/android/gfxbench/carchase/6.shader_test MESA_SHADER_COMPUTE: 1 -> 2 (100.00%)
threads helped:   shaders/android/gfxbench/carchase/386.shader_test MESA_SHADER_COMPUTE: 1 -> 2 (100.00%)
threads helped:   shaders/android/gfxbench/carchase/374.shader_test MESA_SHADER_COMPUTE: 1 -> 2 (100.00%)
threads helped:   shaders/android/gfxbench/carchase/4-1.shader_test MESA_SHADER_COMPUTE: 1 -> 2 (100.00%)
threads helped:   shaders/android/com.miHoYo.GenshinImpact/16.shader_test MESA_SHADER_COMPUTE: 1 -> 2 (100.00%)
threads helped:   shaders/android/com.miHoYo.GenshinImpact/18.shader_test MESA_SHADER_COMPUTE: 1 -> 2 (100.00%)

spills HURT:   shaders/android/gfxbench/carchase/374.shader_test MESA_SHADER_COMPUTE: 0 -> 5
spills HURT:   shaders/android/gfxbench/carchase/4-1.shader_test MESA_SHADER_COMPUTE: 0 -> 5
spills HURT:   shaders/android/gfxbench/carchase/6.shader_test MESA_SHADER_COMPUTE: 0 -> 8
spills HURT:   shaders/android/gfxbench/carchase/386.shader_test MESA_SHADER_COMPUTE: 0 -> 8
spills HURT:   shaders/android/com.miHoYo.GenshinImpact/18.shader_test MESA_SHADER_COMPUTE: 0 -> 112
spills HURT:   shaders/android/com.miHoYo.GenshinImpact/16.shader_test MESA_SHADER_COMPUTE: 0 -> 146

fills HURT:   shaders/android/gfxbench/carchase/6.shader_test MESA_SHADER_COMPUTE: 0 -> 26
fills HURT:   shaders/android/gfxbench/carchase/386.shader_test MESA_SHADER_COMPUTE: 0 -> 26
fills HURT:   shaders/android/gfxbench/carchase/374.shader_test MESA_SHADER_COMPUTE: 0 -> 33
fills HURT:   shaders/android/gfxbench/carchase/4-1.shader_test MESA_SHADER_COMPUTE: 0 -> 33
fills HURT:   shaders/android/com.miHoYo.GenshinImpact/18.shader_test MESA_SHADER_COMPUTE: 0 -> 209
fills HURT:   shaders/android/com.miHoYo.GenshinImpact/16.shader_test MESA_SHADER_COMPUTE: 0 -> 234

total instructions in shared programs: 1521691 -> 1522766 (0.07%)
instructions in affected programs: 1542 -> 2617 (69.71%)
helped: 0
HURT: 6
HURT stats (abs)   min: 36.0 max: 497.0 x̄: 179.17 x̃: 43
HURT stats (rel)   min: 29.75% max: 98.42% x̄: 50.13% x̃: 30.50%
95% mean confidence interval for instructions value: -49.36 407.69
95% mean confidence interval for instructions %-change: 17.14% 83.12%
Inconclusive result (value mean confidence interval includes 0).

total bundles in shared programs: 649296 -> 650371 (0.17%)
bundles in affected programs: 827 -> 1902 (129.99%)
helped: 0
HURT: 6
HURT stats (abs)   min: 36.0 max: 497.0 x̄: 179.17 x̃: 43
HURT stats (rel)   min: 58.90% max: 178.78% x̄: 94.01% x̃: 59.02%
95% mean confidence interval for bundles value: -49.36 407.69
95% mean confidence interval for bundles %-change: 36.20% 151.83%
Inconclusive result (value mean confidence interval includes 0).

total registers in shared programs: 90681 -> 90647 (-0.04%)
registers in affected programs: 82 -> 48 (-41.46%)
helped: 6
HURT: 0
helped stats (abs) min: 3.0 max: 8.0 x̄: 5.67 x̃: 6
helped stats (rel) min: 27.27% max: 50.00% x̄: 40.04% x̃: 42.86%
95% mean confidence interval for registers value: -8.03 -3.30
95% mean confidence interval for registers %-change: -50.95% -29.13%
Registers are helped.

total threads in shared programs: 55717 -> 55723 (0.01%)
threads in affected programs: 6 -> 12 (100.00%)
helped: 6
HURT: 0
helped stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for threads value: 1.00 1.00
95% mean confidence interval for threads %-change: 100.00% 100.00%
Threads are helped.

total spills in shared programs: 1108 -> 1392 (25.63%)
spills in affected programs: 0 -> 284
helped: 0
HURT: 6

total fills in shared programs: 4721 -> 5282 (11.88%)
fills in affected programs: 0 -> 561
helped: 0
HURT: 6

Cc: mesa-stable
Closes: #7228
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19092>
2022-10-17 18:56:13 +00:00
Alyssa Rosenzweig
5c95be85ab panfrost/ci: Remove stale fail
Due to fractional run. This whole section passes.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19092>
2022-10-17 18:56:13 +00:00
Daniel Stone
2e774180c6 Revert "panfrost/ci: Disable t720 jobs"
This reverts commit b3a69d1c31.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19113>
2022-10-17 12:13:47 +01:00
Alyssa Rosenzweig
b3a69d1c31 panfrost/ci: Disable t720 jobs
They're dead, Jim!

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Suggested-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19084>
2022-10-15 00:53:22 +00:00
Yonggang Luo
44ccaca41d util/mesa/wide: Rename _SIMPLE_MTX_INITIALIZER_NP to SIMPLE_MTX_INITIALIZER
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18393>
2022-10-14 03:27:41 +00:00
Alyssa Rosenzweig
ab2d5deec2 asahi,panfrost: Remove exact attribute
Not used, although in the future it might be...

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18922>
2022-10-13 18:06:52 -04:00
Alyssa Rosenzweig
a64e38b0aa panfrost,asahi: Remove unused function
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18922>
2022-10-13 18:06:51 -04:00
Alyssa Rosenzweig
0f24c8ef5f panfrost,asahi: Remove unused prepare macro
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18922>
2022-10-13 18:06:51 -04:00
Alyssa Rosenzweig
847361ba07 panfrost: Remove load_kernel_input path
Now the state tracker's responsible to lower away for us (and the state tracker
can do it correctly, our implementation is incorrect with a strict reading of
the Gallium contract).

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18658>
2022-10-05 16:09:21 +00:00
Thomas H.P. Andersen
0dd58bd115 panfrost: avoid warning about unused function
This function is only used if PAN_ARCH >= 5

Fixes a clang warning about unused static inlined functions.

Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18800>
2022-09-25 03:53:15 +00:00
Mike Blumenkrantz
03d7273292 ci: add a panfrost flake
https://gitlab.freedesktop.org/mesa/mesa/-/jobs/28669388

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18714>
2022-09-20 21:18:39 +00:00
Alyssa Rosenzweig
bf8c08a0df pan/bi: Implement unpack_64_2x32
This duplicates the lowering from nir_lower_packing. However, nir_lower_packing
also lowers a pile of other instructions that we do implement natively, and this
is easier than adding a bunch of knobs to nir_lower_packing to get just what we
need.

Fixes test-printf address_space_4.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18656>
2022-09-19 17:22:58 +00:00
Alyssa Rosenzweig
e9b69c2f79 pan/bi: Stub out scoped_barrier
Implement like other workgroup barriers. No subgroup barriers yet, but that
doesn't seem needed yet.

Fixes test_basic.async_copy_global_to_local and a pile of other OpenCL tests.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18656>
2022-09-19 17:22:58 +00:00
Alyssa Rosenzweig
bd8c9442f9 pan/bi: Fix 1D array indexing on Valhall
Array index always goes in the fourth 16-bit component on Valhall. I'm unsure
whether that should also apply to Bifrost. f256ec2a88 ("pan/bi: Fix 1DArray
image coordinate retrieval") says that it should be in the third component on
Bifrost, but I can't remember why that would be the case.

Fixes OpenCL test image_streams.write.1darray on Valhall.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18656>
2022-09-19 17:22:58 +00:00
Alyssa Rosenzweig
76d6bb4822 pan/bi: Use .auto for image stores
Works around LLVM/SPIR-V stupidity. In effect this means we always use typeless
image stores, which is good enough for both CL and GL.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18656>
2022-09-19 17:22:58 +00:00
Alyssa Rosenzweig
8b6611f4bf pan/bi: Call nir_lower_64bit_phis
Fixes test_basic.local_kernel_scope

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18656>
2022-09-19 17:22:58 +00:00
Alyssa Rosenzweig
1b03a04239 pan/bi: Scalarize phis before the opt loop
Scalarizing phis results in vector constructions (nir_op_vec) of the same size
as the phi, so a wide phi (>128-bit) will result in a wide vector op that the
backend can't handle. These wide vector ops can always be copypropped away, but
that relies on running NIR copy/prop after scalarizing phis, which was not
always happening before. By scalarizing phis before the opt loop instead of
after, we guarantee that copyprop and DCE run to completion and we get
appropriately lowered code in the backend.

Fixes parts of integer_ops.integer_divideAssign with longs.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18656>
2022-09-19 17:22:58 +00:00