This replaces the new_src parameter of nir_ssa_def_rewrite_uses_after()
with an SSA def, and rewrites all the users as needed.
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9383>
This commit replaces the new_src parameter of nir_ssa_def_rewrite_uses()
with an SSA def, removes nir_ssa_def_rewrite_uses_ssa(), and rewrites
all the users as needed.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9383>
Instead of lowering it in NIR, use the lookup tables as inputs to a
second-order Taylor expansion. shader-db results aren't amazing but keep
in mind this is without backend CSE yet.
total instructions in shared programs: 115913 -> 115707 (-0.18%)
instructions in affected programs: 3151 -> 2945 (-6.54%)
helped: 12
HURT: 0
Instructions are helped.
total nops in shared programs: 84045 -> 84041 (<.01%)
nops in affected programs: 1571 -> 1567 (-0.25%)
helped: 1
HURT: 7
Inconclusive result (value mean confidence interval includes 0).
total clauses in shared programs: 20498 -> 20489 (-0.04%)
clauses in affected programs: 188 -> 179 (-4.79%)
helped: 6
HURT: 0
Clauses are helped.
total quadwords in shared programs: 90395 -> 90291 (-0.12%)
quadwords in affected programs: 2287 -> 2183 (-4.55%)
helped: 12
HURT: 0
Quadwords are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9420>
Useful for representing -0 in transcendental sequences matching the
blob.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9420>
Midgard writeout arguments need to be written to in the same bundle the
writeout happens. Both csel, csel_v and their float variants also
require their conditional to be performed on the same bundle.
This patch prevents scheduling csel the same bundle as a writeout,
fixing the scheduling issue.
But... there's still room for optimizations since in some cases it might
be possible to fit all these instructions in the same bundle.
No shader-db changes.
Signed-off-by: Italo Nicola <italonicola@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9340>
We need to iterate the whole row, we can't be clever and only look at
one side, the symmetry doesn't work like that. See the original paper.
total instructions in shared programs: 69392 -> 69322 (-0.10%)
instructions in affected programs: 9002 -> 8932 (-0.78%)
helped: 82
HURT: 28
Instructions are helped.
total bundles in shared programs: 32225 -> 32155 (-0.22%)
bundles in affected programs: 4286 -> 4216 (-1.63%)
helped: 82
HURT: 28
Bundles are helped.
total quadwords in shared programs: 56102 -> 56102 (0.00%)
quadwords in affected programs: 0 -> 0
helped: 0
HURT: 0
total registers in shared programs: 4552 -> 4572 (0.44%)
registers in affected programs: 298 -> 318 (6.71%)
helped: 18
HURT: 38
Registers are HURT.
total threads in shared programs: 3772 -> 3775 (0.08%)
threads in affected programs: 84 -> 87 (3.57%)
helped: 15
HURT: 14
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 0 -> 0
spills in affected programs: 0 -> 0
helped: 0
HURT: 0
total fills in shared programs: 0 -> 0
fills in affected programs: 0 -> 0
helped: 0
HURT: 0
Fixes: 66ad64d73d ("pan/midgard: Implement linearly-constrained register allocation")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9338>
Per discussion with Daniel Schürmann on IRC about the joys of SSA form
and why you don't actually need use-def chains. Indeed, I didn't. No
shader-db changes, time difference in shader-db is neglible since the
win from this is particularly for large shaders.
Total runtime of
dEQP-GLES31.functional.ssbo.layout.single_struct_array.single_buffer.std430_instance_array
reduced from 1.04s to 0.77s (25%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9164>
If there are no destinations, don't produce a _to version, and let the
bare version return the bi_instr.
If there are multiple destinations, take each in the _to version and
don't produce a bare version.
Both cares are probably what you wanted anyway.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9164>
Assuming it's only the first destination breaks assumptions across the
compiler. Add a destination source and fix up the many corresponding
issues. Nothing to backport as far as I understand since multidest
instruction are new.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9164>
Only used in one place (and should never be used elsewhere -- even this
use is questionable). By inlining we avoid O(N^2) behaviour on the
number of sources in liveness updates.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9164>
Fixes excessive (and failing) spilling in dEQP-GLES31.functional.ssbo.layout.*
shader-db results are a toss up (I suspect we'd see better results if we
tracked register pressure directly):
total instructions in shared programs: 161377 -> 161377 (0.00%)
total nops in shared programs: 121159 -> 121203 (0.04%)
nops in affected programs: 1839 -> 1883 (2.39%)
Nops are HURT.
total clauses in shared programs: 31604 -> 31606 (<.01%)
clauses in affected programs: 38 -> 40 (5.26%)
Inconclusive result (value mean confidence interval includes 0).
total quadwords in shared programs: 130847 -> 130845 (<.01%)
quadwords in affected programs: 1246 -> 1244 (-0.16%)
Inconclusive result (value mean confidence interval includes 0).
total loops in shared programs: 18 -> 18 (0.00%)
total spills in shared programs: 705 -> 705 (0.00%)
total fills in shared programs: 1645 -> 1645 (0.00%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9164>
dEQP-GLES31.functional.texture.multisample.samples_16.sample_position is
failing on Bifrost, and we already weren't advertising on Midgard. Until
someone gets spare cycles to debug all the problems, keep it hidden
behind a debug flag to avoid introducing bugs.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9164>
At least this way failed RA will crash (by having no spill node to pick)
instead? Seen in
dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.21 on
Bifrost.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9164>
How many times can I break such a small pass?
Fixes: a805d999c0 ("pan/bi: Fix jumps to terminal block again")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9137>
Here's another edge case: there could be instructions in the last block
after NIR->BIR but they could be optimized out by backend DCE, causing
the block to become a terminal block.
Noticed while toying with geometry shaders.
Fixes: a805d999c0 ("pan/bi: Fix jumps to terminal block again")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9137>
Otherwise we get a bad RA if RT 0 = RT 3 (for example), fixes
dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler.const_literal.fragment.sampler2d
Fixes: a6f1500bed ("pan/bi: Workaround BLEND precolour with explicit moves")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9105>
Passes the relevant tests of
dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler.*, a few
failures that seem to relate to MRT instead of this.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9105>
Used in the LD_VAR_IMM. Wondering if preload requirements shouldn't
instead be pushed from the compiler based on actual usage instead of
guessing from the NIR...
Fixes dEQP-GLES31.functional.shaders.multisample_interpolation.sample_qualifier.*
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9105>
Used to order fragments. With that clarified it's clear that we need to
wait on slot 7 for LD_TILE too (outside the limited context of a blend
shader).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9105>
Needs to support non-blend shader operation (conversion descriptor
sourced from a sysval), as well as MRT. Fixes fbfetch on Bifrost.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9105>
Don't hardcode the RT to 0. Affects ES3.0 which already exposes MRT --
despite no dEQP coverage of this particular corner case, apps could hit
this in the wild on 21.0. Fixes
dEQP-GLES31.functional.draw_buffers_indexed.overwrite_indexed.common_blend_func_buffer_blend_func
Fixes: c7e1ef7c0c ("panfrost: Advertise ES3.0 on Bifrost")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9105>
We want to be able to set a descriptor table and have the instruction
pair "magically" come to be. To do so, we adjust the definition of
DTSEL_IMM (deviating a bit from the architectural definition but in
practice simplifying disassembly immensely) and add a scheduler
lowering. This ensures DTSEL is always paired correctly.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9105>
All the same formula: calculate an address, emit a pseudoinstruction for
the atomic, emit a postprocess that can be DCE'd if not needed.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Icecream95 <ixn@disroot.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9105>