In the bi_emit_load_attr call site, we can use the imm_index value even
if the function returns false. The bifrost path handles this correctly.
Fixes: 652e1c2e13 ("pan/bi: Rework indices for attributes on Valhall")
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37464>
To lower 64bit sources we emit a COLLECT -> SPLIT pair to force
allocation into consecutive registers. When the sources for COLLECT
are outputs of the same SPLIT already, we can omit the COLLECT + SPLIT
pair.
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Aksel Hjerpbakk <aksel.hjerpbakk@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37398>
On Valhall, we can end up with a lot of convertions for 8-bit and 16-bit
values.
However, since Valhall, we have access to a lot more swizzles on widen
sources.
The idea of this pass is to propagate replicate swizzle usages to
simplify things.
We do not attempt to propagate MKVEC.v2i16 as it is already handled by
bi_lower_swizzle.
This changes the following:
9 = V2S8_TO_V2S16 !7.b0
11 = IADD.v2s16 !9.h00, u4
88 = MKVEC.v2i8 11.b0, u256.b0, u256
13 = IMUL.v4i8 !88.b0, 8.b0
14 = V2S8_TO_V2S16 !13.b0
15 = IADD.v2s16 14.h00, !11.h00
89 = MKVEC.v2i8 !15.b0, u256.b0, u256
17 = IMUL.v4i8 !89.b0, !8.b0
Into this:
11 = IADD.v2s16 !7.b0, u4
13 = IMUL.v4i8 11.b0, 8.b0
15 = IADD.v2s16 13.b0, !11.h00
17 = IMUL.v4i8 !15.b0, !8.b0
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37167>
We are going to do more things here that will likely benefit from
possibly running multiple time.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37167>
When spilling registers on Valhall we are careful to leave the TLS
pointer aligned on 16 byte boundaries (so as to avoid accesses
crossing those boundaries). However, within the spill code we don't
need to have 16 byte alignment for spills of 32 or 64 bit values.
In the common case where most spills are 32 bits, we can save nearly
75% of the memory used by just aligning to 32 bit boundaries.
Reviewed-by: Aksel Hjerpbakk <aksel.hjerpbakk@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36676>
We were testing some conditions in the wrong order, so spilled
registers were being printed as if they were uniforms. This is
incorrect, but only subtly so, and lead to confusion.
Fixes: 6c64ad934f ("panfrost: spill registers in SSA form")
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Ashley Smith <ashley.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37092>
The intention of the code was to allow PHI values to be propagated
if they were in registers (as opposed to in memory). As written though
values were never propagated. I think this typo was due to some
debug code that wasn't removed properly.
Fixes: 6c64ad934f ("panfrost: spill registers in SSA form")
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Ashley Smith <ashley.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37092>
On v11+, all small integers instruction variants are gone, however we
can now use widen on src0 just fine.
That mean we can get ride of mid conversion by relying on swizzle
instead while respecting signess of the inner instruction.
This helps a little bit on clpeak with panvk+clvk, shader-db is also
happy:
Totals:
Instrs: 109541 -> 109354 (-0.17%)
CodeSize: 1110528 -> 1108864 (-0.15%)
Estimated normalized CVT cycles: 667.609375 -> 664.5625 (-0.46%)
Totals from 17 (2.12% of 803) affected shaders:
Instrs: 13637 -> 13450 (-1.37%)
CodeSize: 112256 -> 110592 (-1.48%)
Estimated normalized CVT cycles: 100.203125 -> 97.15625 (-3.04%)
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37125>
nir_opt_if was unable to optimize some ifs later on so let's get ride of
them as soon as we generated them for simplicity.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37030>
Instead of allocating constants to the FAU entries on a
first-come-first-serve basis, it would be more efficient to put the most
frequently used constants in the FAU so we save the most amount of ADD_IMM
to push constants into registers.
This commit does so using a simple pass before the main constant lowering
pass.
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36872>
Following on the previous commit, this commit adds support for selecting
and reporting constants to pull from the FAU instead of loading them
into registers first with ADD_IMM. This is beneficial because we can then
use them as a source directly and save ourselves one instruction to move
them into the register first.
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36872>
The branch offset needs to fit in 8 bits, and with the shr(3) modifier,
this means the max legal value is 2040. Let's verify that while packing.
CID: 1503283
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36724>
We already have a more robust helper for this, so let's use it rather
than open-coding the same.
While we're at it, return early on error for readability here. There's
no need to continue the logic in those cases.
CID: 1444074
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36724>
Allocations can fail, and since this is an optimization pass, let's just
skip the pass and let some other code deal with the OOM situation.
Fixes: 800a861431 ("pan/bi: Fuse FCMP/ICMP on Valhall")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36724>
We can do some movement for UBO and SSBO after they are lowered in
preprocess.
We already do this in postprocess but this now also catch SSBOs as they
are lowered in postprocess.
Overall, reduce fills (less load from TLS) in fossils (excluding
parallel-rdp as it crash still):
Totals:
Instrs: 115242 -> 115046 (-0.17%); split: -0.20%, +0.03%
CodeSize: 1168896 -> 1164928 (-0.34%); split: -0.35%, +0.01%
Estimated normalized CVT cycles: 762.015625 -> 757.109375 (-0.64%); split: -0.75%, +0.11%
Estimated normalized Load/Store cycles: 12693.0 -> 12680.0 (-0.10%); split: -0.11%, +0.01%
Number of spill instructions: 358 -> 359 (+0.28%)
Number of fill instructions: 1600 -> 1584 (-1.00%)
Totals from 127 (15.82% of 803) affected shaders:
Instrs: 31753 -> 31557 (-0.62%); split: -0.73%, +0.12%
CodeSize: 335104 -> 331136 (-1.18%); split: -1.22%, +0.04%
Estimated normalized CVT cycles: 205.546875 -> 200.640625 (-2.39%); split: -2.78%, +0.40%
Estimated normalized Load/Store cycles: 3935.0 -> 3922.0 (-0.33%); split: -0.36%, +0.03%
Number of spill instructions: 124 -> 125 (+0.81%)
Number of fill instructions: 452 -> 436 (-3.54%)
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36776>
This should have no effect apart cleaning up NIR_DEBUG print outputs a
bit.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36776>
We now have lower_texture_early and lower_texture.
lower_texture_early handle nir_lower_tex and (in the future) could handle
anything that is backend specific that need to happen before nir_lower_io.
lower_texture handles actual lowering of backend specific things that
must happen after nir_lower_tex and nir_lower_io.
This allows us to finally not run nir_lower_tex two times in panvk.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36776>
Moving it out of there will allow us to shuffle and move API specific parts
out of there.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36776>
As we are going to move texture and IO lowering, this split preprocess
functions in two, one handling preprocess the other postprocess.
The split is done right before lower_io and has no functional change for
now.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36776>
Unused outside of pan/bi and also remove orphan bifrost_nir_lower_xfb
declaration.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36776>
This should only run on frag shaders, let's group it the same way we
have it in midgard compiler.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36776>
This reorder things a bit, ensure we attempt more agressive vectorisation,
attempt to optimize cf and more.
Inspiration from NAK's optimize_nir function.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36629>
We can end up with conversion instructions to the same type of integer
with nir_lower_bool_to_bitsize so let's make
bifrost_nir_lower_algebraic_late handle those cases.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36629>
We can benifit from it, let's allow it when we aren't forced to follow
robustness2.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36629>
Embrace modernity and consistency between alu and vectorization passes.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36629>
That pass tries to remove blocks that are not needed:
blocks with only 1 predecessor and 1 successor containing no
instruction or only 1 branch instruction.
Also remove unnecessary branch at the end of block with only 1
successor which is the next block
Run this pass at the end of the compiler flow once all optimisations
have been applied.
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36021>
Use bi_make_vec_to to allow to use only 1 MKVEC.v2i8 when possible.
Also add support for all swizzles instead of only mono-byte ones,
using bi_swizzle_to_byte_channels.
Update assert in bi_byte.
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35643>
When making a vector of 4 elements, try to make it with only 1
instruction instead of 2 at the moment, if the last 2 elements respect
the pattern supported by MKVEC.v2i8
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35643>
It's not only for GL, change to a generic name.
Use command:
find . -type f -not -path '*/.git/*' -exec sed -i 's/\bgl_shader_stage\b/mesa_shader_stage/g' {} +
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36569>
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: John Anthony <john.anthony@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36516>
In the C23 standard unreachable() is now a predefined function-like
macro in <stddef.h>
See https://android.googlesource.com/platform/bionic/+/HEAD/docs/c23.md#is-now-a-predefined-function_like-macro-in
And this causes build errors when building for C23:
-----------------------------------------------------------------------
In file included from ../src/util/log.h:30,
from ../src/util/log.c:30:
../src/util/macros.h:123:9: warning: "unreachable" redefined
123 | #define unreachable(str) \
| ^~~~~~~~~~~
In file included from ../src/util/macros.h:31:
/usr/lib/gcc/x86_64-linux-gnu/14/include/stddef.h:456:9: note: this is the location of the previous definition
456 | #define unreachable() (__builtin_unreachable ())
| ^~~~~~~~~~~
-----------------------------------------------------------------------
So don't redefine it with the same name, but use the name UNREACHABLE()
to also signify it's a macro.
Using a different name also makes sense because the behavior of the
macro was extending the one of __builtin_unreachable() anyway, and it
also had a different signature, accepting one argument, compared to the
standard unreachable() with no arguments.
This change improves the chances of building mesa with the C23 standard,
which for instance is the default in recent AOSP versions.
All the instances of the macro, including the definition, were updated
with the following command line:
git grep -l '[^_]unreachable(' -- "src/**" | sort | uniq | \
while read file; \
do \
sed -e 's/\([^_]\)unreachable(/\1UNREACHABLE(/g' -i "$file"; \
done && \
sed -e 's/#undef unreachable/#undef UNREACHABLE/g' -i src/intel/isl/isl_aux_info.c
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36437>
We were not supporting non replicate swizzle and this trigger an
assertion on fossils/parallel-rdp/small_subgroup.foz.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Fixes: 1481b14fcb ("pan/bi: Lower SWZ.v4i8 to multiple MKVEC.v2i8 on v11+")
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36349>
We have a lot of pattern like this on Valhall:
FCMP_OR.f32.lt.m1 r28, ^r28, r27, 0x0
FCMP_OR.f32.lt.m1 r29, r27, r25, 0x0
LSHIFT_AND.i32 r28, ^r28, 0x0.b00, ^r29
That can be simplified into:
FCMP_OR.f32.lt.m1 r29, r27, r25, 0x0
FCMP_AND.f32.lt.m1 r28, ^r28, r27, ^r29
This pass merge those specific cases while setting the appropriate
logical variant of the CMP instruction.
We do not try to merge the srcs that do not originate from a matching CMP
instruction with matching result type as the logical operation is
performed before the result type transformation.
Now this is enough to optimise a lot of common cases anyway so it is
still a win.
Results on fossils/sascha-willems:
Totals:
Instrs: 42157 -> 42059 (-0.23%)
CodeSize: 582784 -> 581760 (-0.18%)
Estimated normalized SFU cycles: 159.9375 -> 153.75 (-3.87%)
Totals from 13 (2.07% of 627) affected shaders:
Instrs: 3490 -> 3392 (-2.81%)
CodeSize: 29696 -> 28672 (-3.45%)
Estimated normalized SFU cycles: 15.8125 -> 9.625 (-39.13%)
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36327>
We were allocating a fixed number of temporary registers; this isn't
always enough, and in fact we should have calculated the number of
temporaries required.
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Fixes: 6c64ad934f ("panfrost: spill registers in SSA form")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36135>
Adds support for 64-bit atomic operations for KHR_shader_atomic_int64
using 64-bit atomic instructions. Valhall is working but Bifrost will
require some more work to implement as it requires two instructions to
execute a 64-bit atomic.
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Signed-off-by: Ashley Smith <ashley.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35789>
Add 64-bit atomic instructions for bifrost/valhall
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Signed-off-by: Ashley Smith <ashley.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35789>
Previously this was allowing invalid forms like
"CLPER.i32.subgroup8.zero lane-id, src1" to reach bi_pack.
This fixes the assert that can be seen with
"dEQP-VK.glsl.derivate.dfdxsubgroup.*" but doesn't fix failures.
Fixes: 0acc6b564e ("pan/bi: Rework FAU lowering")
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36006>
Previously we were allowing passthrough to temps without using
bi_reads_temps.
This was causing instructions like CLPER to create undefined encodings.
We now check if the instruction support temps.
Fixes: 4252fb84f4 ("pan/bi: Add passthrough register rewriting helper")
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36006>