NIR has two implementations of lower_idiv, keyed on the imprecise_32bit_lowering
flag. This flag is misleading: the results when setting this flag "imprecise",
they're completely wrong for some values. Non-constant highp integer division is
rare enough this shouldn't noticeably harm performance of real applications.
Address r600 portion of #6555.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18154>
When unpacking the texture sample result ensure it is moved to the
proper expected dest register.
This fixes incorrect texturing in Chromium using PixiJS framework.
CC: mesa-stable
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18122>
This way, blend info that's left zeroed out defaults to disabled. Fixes an
assertion failure in KHR-GLES31.core.draw_buffers_indexed.color_masks due to
the blend CSO max_rt being set unreliably. This is also probably clearer.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
This is silly. They'll be masked out later, but it's still pointless, and adds
noise when grepping BIFROST_MESA_DEBUG=shaders,internal for "pan_blend" when
debugging why an app is using blend shaders.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
On Midgard, we need a "blend" shader even if blending is disabled, if the format
isn't blendable. This is inefficient. Bifrost solves this by decoupling the
format conversion from the blending, allowing opaque (unblended) output to any
format without a blend shader or fragment key.
Unfortunately, our blend code is from the Midgard era -- I wrote an early
version of nir_lower_blend when I was still in high school! -- so we've been
using blend shaders for opaque output even on Bifrost. Whoops!
In SuperTuxKart, reduces blend shader calls by 30%, translating to a 15%
reduction in i-cache misses.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
For untyped_color_outputs, we need to ignore the type of the colour output in
the shader and instead use the type from the format. We have all the information
to do this at blend descriptor pack time, but not at shader compile time. This
means we need a (somewhat expensive) fixup in this edge case to ingest
NIR-to-TGSI. This will prevent a regression from the rest of the series.
Although the register_format field is also present on Valhall blend descriptors,
it is ignored so we don't need the fixup there.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17841>
Shader-db stats with RV530:
total instructions in shared programs: 166499 -> 164362 (-1.28%)
instructions in affected programs: 80056 -> 77919 (-2.67%)
total temps in shared programs: 21658 -> 21565 (-0.43%)
temps in affected programs: 1780 -> 1687 (-5.22%)
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17560>
Shader-db stats with RV530:
total instructions in shared programs: 169509 -> 166013 (-2.06%)
instructions in affected programs: 99126 -> 95630 (-3.53%)
total presub in shared programs: 10975 -> 10758 (-1.98%)
presub in affected programs: 744 -> 527 (-29.17%)
total temps in shared programs: 21722 -> 21649 (-0.34%)
temps in affected programs: 1350 -> 1277 (-5.41%)
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17560>
Skip the merge if one of the instructions writes just w channel
and we are compiling a fragment shader. We can pair-schedule it together
later anyway and it will also give the scheduler a bit more flexibility.
Shader-db stats with RV530:
total instructions in shared programs: 169522 -> 169509 (<.01%)
instructions in affected programs: 14170 -> 14157 (-0.09%)
total temps in shared programs: 21712 -> 21722 (0.05%)
temps in affected programs: 324 -> 334 (3.09%)
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17560>
To allow a simple extension with more merging combinations in the
later commits.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17560>
Constant folding first, than copy propagate simple movs, after that
we run the merge movs pass and finally peephole. The important part
is to do the copy propagate for the whole program before running merge
movs.
We no longer check the return from merge_movs so convert it to void.
Shader-db changes with RV530:
total instructions in shared programs: 155361 -> 154787 (-0.37%)
instructions in affected programs: 67920 -> 67346 (-0.85%)
total temps in shared programs: 20836 -> 20773 (-0.30%)
temps in affected programs: 711 -> 648 (-8.86%)
total presub in shared programs: 8226 -> 8202 (-0.29%)
presub in affected programs: 223 -> 199 (-10.76%)
total temps in shared programs: 20836 -> 20773 (-0.30%)
temps in affected programs: 711 -> 648 (-8.86%)
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17560>
The main problem here is we can have a negate bit set for an unused
channel, so we can't just OR together the negates when channel merging.
Right now the bug is hidden because how we run the pass order, but
that will change in a later commit. Add some helpers for merging of the
negates, they will be also used more in a later commits. As a bonus
construct the new source separatelly and only rewrite the original
instructions after checking that the final swizzle is valid.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17560>
This will prevent a regression in the number of inlined constants in
a later commit. Constructs like 4.000000 (0x48).w110 works just fine.
There is a small behavioral change. We would previously allow positive
and negative same-value contants to be produced, e.g.,
4.000000 (0x48).w-w__ and this would be later split into some extra
movs in the dataflow swizzle pass. We now explicitly check that the
final swizzle is valid while inlining. So there is a minor decrease
in inlined constants and in the total instructions.
total lits in shared programs: 4328 -> 4194 (-3.10%)
lits in affected programs: 554 -> 420 (-24.19%)
total instructions in shared programs: 155488 -> 155361 (-0.08%)
instructions in affected programs: 5707 -> 5580 (-2.23%)
Additonally, a fix for pair translation is needed since the constant
inlining can now produce swizzles like this: 4.000000 (0x48).w-0-0-_
so we have to teach pair translation to also ignore the sign for zero
swizzle.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17560>
if the final hash was the same as the last-used hash for this program,
it should be safe to reuse the last pipeline object in the presence of
dynamic vertex input since the number of permutations is low
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18135>
non-generated tcs has unique mechanics in that it doesn't have a base shader key,
so split that out to avoid unnecessary complexity
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18135>
these values don't need to be checked at all if dynamic vertex is enabled,
which wouldn't previously have been possible without adding even more
data to check to the pipeline state
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18135>
this is one of the hottest paths in the driver, and having to load
a function variant with all the extra dynamic state paths is not optimal
when they can never be used
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18135>