We don't need blending in the blitter. That means blend shaders are only needed
on Midgard. Simplify accordingly.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
This way, blend info that's left zeroed out defaults to disabled. Fixes an
assertion failure in KHR-GLES31.core.draw_buffers_indexed.color_masks due to
the blend CSO max_rt being set unreliably. This is also probably clearer.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
This is silly. They'll be masked out later, but it's still pointless, and adds
noise when grepping BIFROST_MESA_DEBUG=shaders,internal for "pan_blend" when
debugging why an app is using blend shaders.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
On Midgard, we need a "blend" shader even if blending is disabled, if the format
isn't blendable. This is inefficient. Bifrost solves this by decoupling the
format conversion from the blending, allowing opaque (unblended) output to any
format without a blend shader or fragment key.
Unfortunately, our blend code is from the Midgard era -- I wrote an early
version of nir_lower_blend when I was still in high school! -- so we've been
using blend shaders for opaque output even on Bifrost. Whoops!
In SuperTuxKart, reduces blend shader calls by 30%, translating to a 15%
reduction in i-cache misses.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
..on Bifrost and later, where the conversion hardware makes this reasonable.
This saves us from inserting a pile of conversions in the compiler to lower away
the 8-bit input/output. This also generates substantially better code.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
It's unnecessary since the hardware already does the conversion for us.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
The type of the output variable will propagate through the store_output
intrinsic's src_type field to the BLEND instruction's register format field. On
Valhall, the register format for a BLEND comes from the instruction -- the
register format specified in the conversion descriptor (used on Bifrost) is
ignored. That means it has to match.
Previously, we always used a blend shader for integer rendering. Since blend
shaders ignore the register format of the BLEND instruction, that masked this
issue. That also means we don't need to backport this.
Will prevent a regression from the following commit.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
For untyped_color_outputs, we need to ignore the type of the colour output in
the shader and instead use the type from the format. We have all the information
to do this at blend descriptor pack time, but not at shader compile time. This
means we need a (somewhat expensive) fixup in this edge case to ingest
NIR-to-TGSI. This will prevent a regression from the rest of the series.
Although the register_format field is also present on Valhall blend descriptors,
it is ignored so we don't need the fixup there.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
Colour outputs in TGSI are untyped so we have to ignore the register format on
the store_output. Luckily, Valhall gives us the auto32 escape hatch to deal with
it, and Bifrost doesn't care anyway. This will avoid regressions from native int
output without corrupting the compiler for GLSL and SPIR-V shaders that are
well-typed.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
Clause scheduler edition of db2bdc1dc3 ("pan/bi: Require ATEST coverage mask
input in R60"). ATEST wants to read r60, which can't work if its input isn't
even in a register.
When per-sample shading isn't in use, prevents regressions in:
KHR-GLES31.core.sample_variables.mask.*
These tests previously passed because per-sample shading was forced. It's
not clear whether the bug addressed in this patch is possible to hit "in the
wild", i.e. without the optimizations in this series that allow us to use
per-pixel shading in more cases.
No shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
Fixes flaking in
dEQP-GLES31.functional.image_load_store.cube.qualifiers.volatile_r32i due to
image reads being moved past a BARRIER.
To make this more robust/optimal, we probably need scheduling information
(coherent/volatile/etc) added to instructions like ACO does. That's left for a
future extension, for now I just want the test to stop flaking.
Fixes: 569e5dc745 ("pan/bi: Schedule for pressure pre-RA")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
Unnecessary rename that breaks forward compatibility... but Apple says
this is just NULL. Do the simpler thing. Note that the argument is a
mach_port_t, which is a natural_t == uint32_t in userspace... even
though it's a pointer in the kernel. Although Apple's docs claim that
kIOMasterPortDefault is NULL, it's really just 0.
../src/asahi/lib/agx_device.c:290:35: warning: 'kIOMasterPortDefault' is deprecated: first deprecated in macOS 12.0 [-Wdeprecated-declarations]
IOServiceGetMatchingService(kIOMasterPortDefault, matching);
^~~~~~~~~~~~~~~~~~~~
kIOMainPortDefault
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18121>
Shader-db stats with RV530:
total instructions in shared programs: 166499 -> 164362 (-1.28%)
instructions in affected programs: 80056 -> 77919 (-2.67%)
total temps in shared programs: 21658 -> 21565 (-0.43%)
temps in affected programs: 1780 -> 1687 (-5.22%)
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17560>
Shader-db stats with RV530:
total instructions in shared programs: 169509 -> 166013 (-2.06%)
instructions in affected programs: 99126 -> 95630 (-3.53%)
total presub in shared programs: 10975 -> 10758 (-1.98%)
presub in affected programs: 744 -> 527 (-29.17%)
total temps in shared programs: 21722 -> 21649 (-0.34%)
temps in affected programs: 1350 -> 1277 (-5.41%)
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17560>
Skip the merge if one of the instructions writes just w channel
and we are compiling a fragment shader. We can pair-schedule it together
later anyway and it will also give the scheduler a bit more flexibility.
Shader-db stats with RV530:
total instructions in shared programs: 169522 -> 169509 (<.01%)
instructions in affected programs: 14170 -> 14157 (-0.09%)
total temps in shared programs: 21712 -> 21722 (0.05%)
temps in affected programs: 324 -> 334 (3.09%)
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17560>
To allow a simple extension with more merging combinations in the
later commits.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17560>
Constant folding first, than copy propagate simple movs, after that
we run the merge movs pass and finally peephole. The important part
is to do the copy propagate for the whole program before running merge
movs.
We no longer check the return from merge_movs so convert it to void.
Shader-db changes with RV530:
total instructions in shared programs: 155361 -> 154787 (-0.37%)
instructions in affected programs: 67920 -> 67346 (-0.85%)
total temps in shared programs: 20836 -> 20773 (-0.30%)
temps in affected programs: 711 -> 648 (-8.86%)
total presub in shared programs: 8226 -> 8202 (-0.29%)
presub in affected programs: 223 -> 199 (-10.76%)
total temps in shared programs: 20836 -> 20773 (-0.30%)
temps in affected programs: 711 -> 648 (-8.86%)
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17560>
The main problem here is we can have a negate bit set for an unused
channel, so we can't just OR together the negates when channel merging.
Right now the bug is hidden because how we run the pass order, but
that will change in a later commit. Add some helpers for merging of the
negates, they will be also used more in a later commits. As a bonus
construct the new source separatelly and only rewrite the original
instructions after checking that the final swizzle is valid.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17560>
This will prevent a regression in the number of inlined constants in
a later commit. Constructs like 4.000000 (0x48).w110 works just fine.
There is a small behavioral change. We would previously allow positive
and negative same-value contants to be produced, e.g.,
4.000000 (0x48).w-w__ and this would be later split into some extra
movs in the dataflow swizzle pass. We now explicitly check that the
final swizzle is valid while inlining. So there is a minor decrease
in inlined constants and in the total instructions.
total lits in shared programs: 4328 -> 4194 (-3.10%)
lits in affected programs: 554 -> 420 (-24.19%)
total instructions in shared programs: 155488 -> 155361 (-0.08%)
instructions in affected programs: 5707 -> 5580 (-2.23%)
Additonally, a fix for pair translation is needed since the constant
inlining can now produce swizzles like this: 4.000000 (0x48).w-0-0-_
so we have to teach pair translation to also ignore the sign for zero
swizzle.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17560>
When SCC is clobbered between s_cmp and its operand's writer,
the current optimization that eliminates s_cmp won't kick in.
However, when s_cmp is the only user of its operand temporary,
it is possible to "pull down" the instruction that wrote the operand.
Fossil DB stats on Navi 21:
Totals from 63302 (46.92% of 134906) affected shaders:
CodeSize: 176689272 -> 176418332 (-0.15%)
Instrs: 33552237 -> 33484502 (-0.20%)
Latency: 205847485 -> 205816205 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 34321285 -> 34319908 (-0.00%); split: -0.00%, +0.00%
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16266>
This was simply left out by accident when I wrote this.
Fossil DB stats on Navi 21:
Totals from 70165 (52.01% of 134906) affected shaders:
CodeSize: 246375656 -> 245814396 (-0.23%)
Instrs: 46519773 -> 46379458 (-0.30%)
Latency: 385159303 -> 385089261 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 66490172 -> 66487867 (-0.00%); split: -0.00%, +0.00%
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16266>
Also delete them when they are already dead in process_instruction().
No Fossil DB changes.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16266>
if the final hash was the same as the last-used hash for this program,
it should be safe to reuse the last pipeline object in the presence of
dynamic vertex input since the number of permutations is low
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18135>