Everything required for conformant OpenGL ES 3.1 support on Valhall (v9) is now
upstream -- all that's left is to enable implementations! Add the GPU ID for the
Mali-G57 implemented in the MediaTek MT8192 system-on-chip.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16890>
Like other optimizations, breaking this pass may not affect functional
correctness. It's also dead simple to unit test the pass, so we have no excuse
not to. Add unit tests for the functionality we currently support, since we just
extended it and want to make sure everything still works.
This includes tests for use of modifiers to get more small constants. There are
lots of subtle gotchas there, so let's add lots of unit tests to make sure we
got it right.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16862>
We need to distinguish signed integer instructions from unsigned integer
instructions, to distinguish sign-extension and zero-extension of sources.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16862>
CSEL executes on the conversion unit (CVT), while MUX executes on the special
function unit (SFU). Throughput on CVT is 4x higher than SFU, so this is
(almost) always an optimization.
The "real" MUX is still used for unusual cases, like 8-bit and bitselect.
Note that it's easier for us to use MUX everywhere for the IR. This is an easy
fixup to get better codegen on Valhall without touching the core Bifrost code.
shader-db is a bit of a toss up: register pressure and instruction count are
hurt in some cases due to restrictions on FAU access. In particular, a shader
that muxes between two uniforms needs an extra move due to extra constant
(zero). However, in terms of throughput this is still a win: 2 CVT instructions
(MOV + CSEL) have 2x throughput to 1 SFU instruction (MUX). The MOV has
opportunities for CSE, but that can hurt pressure in turn. Overall, cycles are
helped substantially.
total instructions in shared programs: 2728438 -> 2731597 (0.12%)
instructions in affected programs: 414391 -> 417550 (0.76%)
helped: 87
HURT: 1063
helped stats (abs) min: 1.0 max: 6.0 x̄: 5.17 x̃: 6
helped stats (rel) min: 0.19% max: 15.79% x̄: 4.12% x̃: 4.11%
HURT stats (abs) min: 1.0 max: 56.0 x̄: 3.40 x̃: 2
HURT stats (rel) min: 0.11% max: 23.43% x̄: 1.15% x̃: 0.63%
95% mean confidence interval for instructions value: 2.47 3.03
95% mean confidence interval for instructions %-change: 0.61% 0.90%
Instructions are HURT.
total cycles in shared programs: 142103 -> 142015.75 (-0.06%)
cycles in affected programs: 1263.45 -> 1176.20 (-6.91%)
helped: 281
HURT: 176
helped stats (abs) min: 0.015625 max: 2.234375 x̄: 0.50 x̃: 0
helped stats (rel) min: 0.71% max: 54.17% x̄: 16.93% x̃: 15.31%
HURT stats (abs) min: 0.015625 max: 30.0 x̄: 0.30 x̃: 0
HURT stats (rel) min: 0.84% max: 120.00% x̄: 7.16% x̃: 5.00%
95% mean confidence interval for cycles value: -0.33 -0.05
95% mean confidence interval for cycles %-change: -9.08% -6.22%
Cycles are helped.
total cvt in shared programs: 13983.34 -> 14891.70 (6.50%)
cvt in affected programs: 7498.36 -> 8406.72 (12.11%)
helped: 71
HURT: 4711
helped stats (abs) min: 0.0625 max: 0.0625 x̄: 0.06 x̃: 0
helped stats (rel) min: 5.41% max: 40.00% x̄: 10.23% x̃: 9.30%
HURT stats (abs) min: 0.015625 max: 2.640625 x̄: 0.19 x̃: 0
HURT stats (rel) min: 0.18% max: 141.18% x̄: 16.21% x̃: 9.52%
95% mean confidence interval for cvt value: 0.18 0.20
95% mean confidence interval for cvt %-change: 15.21% 16.42%
Cvt are HURT.
total sfu in shared programs: 11320.44 -> 7882.56 (-30.37%)
sfu in affected programs: 7618.50 -> 4180.62 (-45.13%)
helped: 4782
HURT: 0
helped stats (abs) min: 0.0625 max: 10.5625 x̄: 0.72 x̃: 0
helped stats (rel) min: 1.34% max: 100.00% x̄: 41.91% x̃: 37.50%
95% mean confidence interval for sfu value: -0.75 -0.68
95% mean confidence interval for sfu %-change: -42.68% -41.14%
Sfu are helped.
total ls in shared programs: 129660 -> 129690 (0.02%)
ls in affected programs: 25 -> 55 (120.00%)
helped: 0
HURT: 1
total quadwords in shared programs: 1482728 -> 1484128 (0.09%)
quadwords in affected programs: 58624 -> 60024 (2.39%)
helped: 24
HURT: 195
helped stats (abs) min: 8.0 max: 8.0 x̄: 8.00 x̃: 8
helped stats (rel) min: 3.70% max: 20.00% x̄: 10.34% x̃: 10.00%
HURT stats (abs) min: 8.0 max: 24.0 x̄: 8.16 x̃: 8
HURT stats (rel) min: 1.41% max: 50.00% x̄: 4.84% x̃: 2.56%
95% mean confidence interval for quadwords value: 5.70 7.09
95% mean confidence interval for quadwords %-change: 2.22% 4.14%
Quadwords are HURT.
total spills in shared programs: 125 -> 127 (1.60%)
spills in affected programs: 0 -> 2
helped: 0
HURT: 1
total fills in shared programs: 800 -> 828 (3.50%)
fills in affected programs: 0 -> 28
helped: 0
HURT: 1
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16857>
It's portable, and useful to both Bifrost and Valhall, in the clause scheduler
and in an instruction selection respectively. Move it from the Bifrost clause
scheduler to common code so we can share the benefits.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16857>
It's unnecessary and breaks the empty shader optimizations. Noticed while
inspecting a trace from dEQP-GLES3.functional.color_clear.masked_scissored_rgb,
which does not produce any varyings other than gl_Position in its vertex shader
and hence should omit the varying shader.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16868>
We now have infrastructure in place to generate variants of vertex shaders
specialized for transform feedback. All that's left is launching these
compute-like kernels before the IDVS job, implementing both the
transform feedback and the regular rasterization pipeline. This implements
transform feedback on Valhall, passing the relevant GLES3.1 tests.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>
Wire the Gallium interface for transform feedback up to the system values that
will be fed into our lowering code. This is based on our existing transform
feedback implementation for Midgard.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>
Translate the intrinsics we introduced to lower away transform feedback into
Panfrost system values which the GL driver can handle.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>
Add a simple NIR-based implementation of transform feedback, appropriate for
OpenGL ES 3.1 class hardware (compute but no geometry or tessellation shaders).
Stores to varyings that will be captured are replaced by stores to transform
feedback buffers and some addressing math. This allows implementing the semantic
of transform feedback in a compute-like stage.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>
These keep flaking. Icecream95 observes the issue relates to AFBC in the
discussion of the flake in issue 6604. Until the root cause can be identified
and fixed, mark the tests as known flakes for CI.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16855>
Mali-G31 has the old CLPER instruction, not the new one, which means we don't
get to specify a custom lane op. But the clper_xor helper incorrectly checked
the arch, not the implementation quirk.
Fixes: c00e7b729f ("pan/bi: Optimize abs(derivative)")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reported-by: Icecream95 <ixn@disroot.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16846>
Because we lower SPLIT and COLLECT before RA, we need to consider offsets when
determining the dimensions of vectors, in order to align properly. Lowering
COLLECT post-RA would avoid this special case.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16780>
Speeds up compiling shaders/skia/781.shader_test in shader-db by 8x
(Icecream95).
...At least it did before I extended to support register allocation of vec8. On
Valhall, texture instructions require up to 8 consecutive registers. To handle
this, provide for vec8 register allocation. Liveness was already (accidentally?)
vec8. The increased memory requirement is acceptable given that the interference
matrix is now stored sparsely (Alyssa).
Icecream95 reports the vec8 changes hurt RA performance by about 1% on average.
I consider this acceptable for now.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16780>
This is an array which can either be sparse or dense, and was designed
to be used to track liveness and interference information.
Either a sparse array with sorted indices or dense array is used.
Other data structures were tried, such as red-black trees or hash
tables, but they were slower. When used for storing constraints, the
indices do not have to be sorted as duplicating elements is okay, but
the speedup from that was not enough to justify the extra complexity.
v2: Add a comment about how to potentially speed it up. But it seems
fast enough even without this change.
v3: Use a custom struct rather than relying on util_dynarray.
v4: Split out functions only used for liveness analysis, rather than the simpler
data structure needed for the register interference matrix. If we need to
optimize liveness, that can follow on after. Also make it for vec8 (Alyssa).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16780>
Triggered a BIR validation error, which made debugging a breeze. That validation
pass (dimensionality checks) gets a lot of use, it seems :-)
Fixes:
dEQP-VK.ssbo.layout.2_level_array.std430.row_major_mat4x2_comp_access_store_cols
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16724>
This handles VK_REMAINING_* for us, instead of underflowing and clearing no
levels/layers.
Fixes dEQP-VK.api.image_clearing.core.clear_color_image.2d.linear.single_layer.*
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16724>
We need to pack special AFBC-specific plane descriptors instead of the generic
plane descriptor. Nothing too fancy here, though.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16800>
Add the required handling when packing render target and depth buffer
descriptors on Valhall. This is mostly equivalent to Bifrost.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16800>
Map a canonical format (a hardware-independent pipe_format) to a compression
mode (Valhall-specific hardware enum defined in GenXML). To be used for packing
plane descriptors and render target descriptors when AFBC is in use on Valhall.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16800>
For callers that have a device object, it's easy to pass dev->arch instead of
dev. But this requires callers to have a reference to the device, which is
tricky for callers that only have the arch via PAN_ARCH. Pass dev->arch instead
of dev to accommodate them.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16800>
..and assert the other sources are null. The one place this might fail in the
future is for real FMA, but we don't support that for GL.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16798>
These do not convey any additional information, and fail to account for
shrinking. In particular, a 64-bit writemask with .keephi would fail to
disassemble and instead trip the assertion, since that would be the ZW
components. Just delete the broken code.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16798>
Otherwise, we can get vec3 with u2u32 with 64-bit sources which we need lowered.
Since our current approach is "scalarize all 64-bit ops", we need to check for
conversions too.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16798>
Logically at the same part of the compile pipeline as clause scheduling on
Bifrost. Lots of similarities, too. Now that we generate flow control only as a
late pass, various hacks in the compiler are no longer necessary and are
dropped.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
Test that we correctly track the scoreboard, helper invocations, reconvergence,
and ends and insert NOPs to effect this expected flow control.
As the pass inserts NOPs but does not otherwise modify the shader, this is easy
to test with well-defined behaviour of the pass.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
On Bifrost, to terminate helper threads we set the td bit on the clause. On
Valhall, we need to use the .discard flow control. Extend the flow control NOP
insertion to insert NOP.discard where necessary to terminate helper threads.
This should reduce wasted work in fragment shaders.
This requires fairly involved data flow analysis, but the handling here should
be optimal.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
To set flow control modifiers correctly and efficiently, we need a pass that
runs after register allocation and scheduling, but before packing. Add such a
pass.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
The current helper termination analysis code is hardwired for clauses, so it
won't work for Valhall. However, the bulk of it is dataflow analysis which is
portable between Bifrost and Valhall. Export the interesting bits so we can
reuse them on Valhall.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>