The header size is the header stride times the number of rows in the header
(number of tiles of superblocks). We already calculate the header stride, so
eliminate the separate header size calculation. The old calculation has no
notion of wide blocks, let alone tiled AFBC headers.
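As a rough sketch of the relationship described above (not the actual
pan_layout code; the function and parameter names here are hypothetical), the
header size falls out of the stride and the row count:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not taken from pan_layout: round-up division. */
static inline uint32_t
sketch_div_round_up(uint32_t n, uint32_t d)
{
   assert(d != 0);
   return (n + d - 1) / d;
}

/* Header size in bytes, derived from a row stride that is already known.
 * "rows" counts rows in the header (rows of superblocks, or of superblock
 * tiles when the header is tiled). Both parameter names are assumptions. */
static inline uint32_t
sketch_afbc_header_size(uint32_t header_row_stride_B, uint32_t rows)
{
   return header_row_stride_B * rows;
}
```

With the stride as the single source of truth, any future change to how rows
are laid out only has to be made in one place.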
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16697>
Extract a helper for calculating AFBC strides. This is used in two places in
pan_layout. It will need extension for tiled AFBC, and the extended version
could benefit from unit testing.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16697>
Let's keep all the AFBC computations inside the layout code, to keep pan_cs
dumb. This helper will need some extension for tiled AFBC.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16697>
Otherwise we can fail to allocate tied operands if we spill the tied operand.
Seen in shaders/android/com.miHoYo.GenshinImpact/16.shader_test, where
particularly bad scheduling causes excessive spilling.
No shader-db changes.
Fixes: bc17288697 ("pan/bi: Lower split/collect before RA")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16378>
It's meaningful for this intrinsic and so does not add noise to the
lowering pass.
(Although dual-source writes must be to RT 0, depth and stencil
writes, which store_combined_output_pan is also used for, can still be
done with MRT enabled.)
Fixes: 5c168f09eb ("nir: Eliminate store_combined_output_pan BASE")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16685>
The struct is returned from a function, so in debug builds the address
may change after returning, and pointers to patched_s will be broken.
Pass the pointer to the patched stencil view as a parameter to
pan_preload_get_views to avoid this.
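A minimal sketch of the pattern, assuming toy types and names (sketch_view,
sketch_views and sketch_get_views are hypothetical and do not mirror the real
pan_preload_get_views signature). The caller owns the storage for the patched
view, so the pointer stays valid however the returned struct is copied:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the real view types; names are assumptions. */
struct sketch_view {
   int format;
};

struct sketch_views {
   const struct sketch_view *s; /* stencil view to use */
};

/* The patched view is written to caller-provided storage rather than to a
 * member of the returned struct, so v.s never points into a temporary. */
static struct sketch_views
sketch_get_views(const struct sketch_view *orig_s, int wanted_format,
                 struct sketch_view *patched_s)
{
   struct sketch_views v = { .s = orig_s };

   if (orig_s != NULL && orig_s->format != wanted_format) {
      assert(patched_s != NULL);
      *patched_s = *orig_s;
      patched_s->format = wanted_format;
      v.s = patched_s;
   }

   return v;
}
```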
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16343>
Now that our IR is much more strongly typed, and RA code quality depends on
correct typing, add a validation pass to make sure we didn't screw it up. This
pass found a massive number of bugs in early versions of this series.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
We tightened the rules around preloading substantially and now rely on those
rules in RA. The safe helpers introduced alongside them should ensure the rules
are followed, but just in case, add a validation pass to check our work. This
pass found multiple bugs in early versions of this series.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
In the current IR, any register may be preloaded by reading it anywhere, and any
register may be precoloured by writing it anywhere. This is convenient for
instruction selection, but requires the register allocator to do considerable
gymnastics to ensure it doesn't clobber precoloured registers. It also breaks
the purity of our SSA representation, which complicates optimization passes
(e.g. copyprop).
Let's trade some instruction selection complexity for simplifying register
allocation by constraining how register precolouring works. Under the new model:
* Registers may only be preloaded at the start of the program.
* Precoloured destinations are handled explicitly by RA.
Internally, a stronger invariant is placed for preloading: registers may only be
preloaded by MOV.i32 instructions at the beginning of the block, and these moves
must be unique. These invariants ensure RA can trivially coalesce the moves.
A bi_preload helper is added as a safe version of bi_register respecting these
invariants, allowing a smooth transition for instruction selection.
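For illustration only, the internal invariant could be checked along these
lines. This is a sketch over a toy instruction list, not the real IR: toy_op,
toy_instr and toy_validate_preload are hypothetical names, and registers are
assumed to be numbered below 64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy IR: an instruction is either a preload move or anything else. */
enum toy_op { TOY_MOV_PRELOAD, TOY_OTHER };

struct toy_instr {
   enum toy_op op;
   unsigned reg; /* preloaded register, only meaningful for TOY_MOV_PRELOAD */
};

/* Returns true if every preload move sits in the leading run of preload
 * moves of the first block and no register is preloaded twice. */
static bool
toy_validate_preload(const struct toy_instr *instrs, unsigned count)
{
   uint64_t seen = 0;
   bool in_prologue = true;

   for (unsigned i = 0; i < count; ++i) {
      if (instrs[i].op != TOY_MOV_PRELOAD) {
         in_prologue = false;
         continue;
      }

      /* A preload after any other instruction breaks the invariant. */
      if (!in_prologue)
         return false;

      assert(instrs[i].reg < 64);
      uint64_t bit = UINT64_C(1) << instrs[i].reg;

      /* Preload moves must be unique per register. */
      if (seen & bit)
         return false;

      seen |= bit;
   }

   return true;
}
```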
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
Move can take in a vector and write a scalar, depending on the swizzle. We need
to handle this case. Split out mov and pack_32_2x16 so we can specify correct
behaviour for both. Also drop unused 1-bit boolean stuff which obscured the fix.
Fixes: 76cea8e27b ("panfrost: Fix pack_32_2x16 implementation")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
These move-like instructions will be generated during instruction selection and
lowered before/after register allocation.
These need special printer support until we get dynamic sources/destinations.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
In preparation for dynamic allocation, as needed for phi nodes and parallel
copies. For now, it just serves to simplify the semantics of splits and
collects.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
"Revisiting Out-of-SSA Translation for Correctness, Code Quality, and
Efficiency" discusses "value-based interference": two variables interfere if and
only if there exists a point in the program where they are both live *with
different values*. In particular, the source and destination of a move do not
interfere a priori, because they have the same value at that point in the
program. (If a later instruction overwrites one, the required interference will
be added there.)
We can use this idea to avoid some extra interferences, avoiding a regression in
moves from split/collect.
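A sketch of the idea on a toy straight-line block, assuming single-source
instructions and at most 32 variables (toy_ins and toy_build_interference are
hypothetical names, not the actual RA code). The only difference from the
usual backwards liveness walk is that a move's source is dropped from the set
its destination interferes with:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_VARS 32

struct toy_ins {
   bool is_move;
   unsigned dest;
   unsigned src;
};

/* Walk the block backwards, tracking liveness as a bitset. adj[] must be
 * zero-initialized by the caller; adj[a] has bit b set iff a and b interfere. */
static void
toy_build_interference(const struct toy_ins *ins, unsigned count,
                       uint32_t live_out, uint32_t adj[TOY_MAX_VARS])
{
   uint32_t live = live_out;

   for (unsigned i = count; i-- > 0;) {
      const struct toy_ins *I = &ins[i];
      assert(I->dest < TOY_MAX_VARS && I->src < TOY_MAX_VARS);

      /* The destination interferes with everything else live after it... */
      uint32_t others = live & ~(UINT32_C(1) << I->dest);

      /* ...except the source of a move, which holds the same value here. */
      if (I->is_move)
         others &= ~(UINT32_C(1) << I->src);

      for (unsigned v = 0; v < TOY_MAX_VARS; ++v) {
         if (others & (UINT32_C(1) << v)) {
            adj[I->dest] |= UINT32_C(1) << v;
            adj[v] |= UINT32_C(1) << I->dest;
         }
      }

      /* Standard liveness update: kill the def, then gen the use. */
      live &= ~(UINT32_C(1) << I->dest);
      live |= UINT32_C(1) << I->src;
   }
}
```

For a block that copies v0 into v1 and then computes v2 from v1, with v0 and
v2 live out, v0 and v1 end up without an interference edge even though both
are live across the second instruction.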
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
If we don't lower phis to scalar, when we go out of SSA, we can get vector
nir_registers. In particular, we can get code like:
    r0 = vec2 r0.y, r0.x
This code looks like a move, but is in fact a swap. The trivial lowering of vec2
would not work -- the following fails to swap correctly:
    r0.x = r0.y
    r0.y = r0.x
Currently, we generate temporaries to handle these cases. It's easy to move the
complexity to NIR, though, and we'll want to scalarize phis for SSA-based RA
anyway.
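To make the hazard concrete, here is a small sketch with a two-component
register modelled as an int array (the function names are hypothetical). The
naive lowering reads r0.x after it has already been overwritten, while the
temporary-based lowering reads both sources first:

```c
#include <assert.h>

/* Naive lowering of r = vec2 r[sx], r[sy]: component-wise, in order.
 * Wrong when a later source was overwritten by an earlier write. */
static void
lower_vec2_naive(int r[2], unsigned sx, unsigned sy)
{
   assert(sx < 2 && sy < 2);
   r[0] = r[sx];
   r[1] = r[sy];
}

/* Lowering with temporaries: read every source before writing anything. */
static void
lower_vec2_with_temps(int r[2], unsigned sx, unsigned sy)
{
   assert(sx < 2 && sy < 2);
   int t0 = r[sx];
   int t1 = r[sy];
   r[0] = t0;
   r[1] = t1;
}
```

Starting from {1, 2} with sources (y, x), the naive version produces {2, 2}
and the temporary-based version produces the intended {2, 1}.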
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
Minor ISA detail missed in the Bifrost scheduler. I hit this in an early version
of this series (where a move feeding into a blend shader return was not
coalesced). Let's get it fixed in the scheduler.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
Texture instructions on Valhall take 64-bit sources. Now that we have
infrastructure to handle this properly, we don't need to use a non-SSA node to
hack around the optimization.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>