It turns out that mova executes on the normal pipeline, which means that
users of a0.x on the scalar pipeline might cause a WAR hazard with mova.
Fixes: 876c5396a7 ("ir3: Add support for "scalar ALU"")
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
fullsync would only sync after cat4/5/6 instructions. However, since the
introduction of scalar ALU, we also need to sync after writes to shared
registers. This commit fixes this by using the is_ss/sy_producer
helpers. This should also catch all cases where (ss) is need for WAR
hazards.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
These can be repeated like other instructions with one interesting
wrinkle: their immediate input location can also be repeated and its
value gets incremented by one for every repeat. They seem to be the only
instructions to support this.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
Some store intrinsics (e.g., store_const_ir3) don't have a wrmask so
don't assume it always exists.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
Some load/store intrinsics (e.g., load/store_const_ir3) use offsets in
units other than bytes. Currently, byte offsets were assumed in multiple
places.
This patch adds a new offset_scale field to intrinsic_info and uses it
were needed.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
Phi nodes are mostly handled the same way as ALU instructions: if all
sources point to the same def (which happens if they are scalar or have
been previously vectorized), combine them into a single vectorized phi
node.
There is one case where this doesn't work, however: sources that come
from a loop back-edge. Since their defs haven't been processed yet, they
are generally not the same. We could simply refuse to vectorize such
phi nodes but this could leave many values used in loops unnecessarily
scalarized.
Instead, this patch implements a simple heuristic: if all defs coming
from a back-edge have the same instructions type and, in case of ALU,
the same operation, assume they will be vectorized later. Since we
require that normal edges are vectorized already, chances are that the
back-edge can also be vectorized.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
To handle phi nodes, it's important that all sources have been processed
before processing the phi node itself. The current traversal order
(depth-first on dom_children) does not guarantee this. This patch
rewrites the pass to visit blocks in source-code order.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
Dispatch to different functions inside instr_try_combine. To prepare for
upcoming support for phi nodes.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
Use the new repeat group builders to lower vectorized NIR instructions.
Add NIR pass to vectorize NIR before lowering.
Support for repeated instruction is added over a number of different
commits. Here's how they all tie together:
ir3 is a scalar architecture and as such most instructions cannot be
vectorized. However, many instructions support the (rptN) modifier that
allows us to mimic vector instructions. Whenever an instruction has the
(rptN) modifier set it will execute N more times, incrementing its
destination register for each repetition. Additionally, source registers
with the (r) flag set will also be incremented.
For example:
(rpt1)add.f r0.x, (r)r1.x, r2.x
is the same as:
add.f r0.x, r1.x, r2.x
add.f r0.y, r1.y, r2.x
The main benefit of using repeated instructions is a reduction in code
size. Since every iteration is still executed as a scalar instruction,
there's no direct benefit in terms of runtime. The only exception seems
to be for 3-source instructions pre-a7xx: if one of the sources is
constant (i.e., without the (r) flag), a repeated instruction executes
faster than the equivalent expanded sequence. Presumably, this is
because the ALU only has 2 register read ports. I have not been able to
measure this difference on a7xx though.
Support for repeated instructions consists of two parts. First, we need
to make sure NIR is (mostly) vectorized when translating to ir3. I have
not been able to find a way to keep NIR vectorized all the way and still
generate decent code. Therefore, I have taken the approach of
vectorizing the (scalarized) NIR right before translating it to ir3.
Secondly, ir3 needs to be adapted to ingest vectorized NIR and translate
it to repeated instructions. To this end, I have introduced the concept
of "repeat groups" to ir3. A repeat group is a group of instructions
that were produced from a vectorized NIR operation and linked together.
They are, however, still separate scalar instructions until quite late.
More concretely:
1. Instruction emission: for every vectorized NIR operation, emit
separate scalar instructions for its components and link them
together in a repeat group. For every instruction builder ir3_X, a
new repeat builder ir3_X_rpt has been added to facilitate this.
2. Optimization passes: for now, repeat groups are completely ignored by
optimizations.
3. Pre-RA: clean up repeat groups that can never be merged into an
actual rptN instruction (e.g., because their instructions are not
consecutive anymore). This ensures no useless merge sets will be
created in the next step.
4. RA: create merge sets for the sources and defs of the instructions in
repeat groups. This way, RA will try to allocate consecutive
registers for them. This will not be forced though because we prefer
to split-up repeat groups over creating movs to reorder registers.
5. Post-RA: create actual rptN instructions for repeat groups where the
allocated registers allow it.
The idea for step 2 is that we prefer that any potential optimizations
take precedence over creating rptN instructions as the latter will only
yield a code size benefit. However, it might be interesting to
investigate if we could make some optimizations repeat aware. For
example, the scheduler could try to schedule instructions of a repeat
group together.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
Clean up repeat groups that can never be merged into an actual rptN
instruction (e.g., because their instructions are not consecutive
anymore). This ensures no useless merge sets will be created for RA.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
Create merge sets for the sources and defs of the instructions in repeat
groups. This way, RA will try to allocate consecutive registers for
them. This will not be forced though because we prefer to split-up
repeat groups over creating movs to reorder registers.
When choosing a register for a repeat group's merge set, if its merge
set is unique (i.e., only used for these repeated instructions), try to
first allocate one of their sources (for the same reason as for ALU/SFU
instructions). This also prevents us from allocating a new register
range for this merge set when the one from a source could be reused.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
For every instruction builder ir3_X, this patch adds new repeat builder
ir3_X_rpt to create a repeated version of an instruction.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
In order to represent repeated instructions (rptN) in ir3, this patch
introduces the concept of "repeat groups". A repeat group is a group of
instructions that were produced from a vectorized NIR operation and
linked together. They are, however, still separate scalar instructions.
Repeat groups are created by linking together multiple instructions
using a new rpt_node list. This patch adds this list as well as a number
of helper functions the can be used to create and manipulate repeat
groups.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
The tex prefetch heuristic simply counts the number of NIR instructions.
Since a vectorized NIR instruction expands to an ir3 instruction per
component, we have to take this into account while counting them.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
After spilling during regular RA, merge sets need to be fixed up. To
find all merge sets, fixup_merge_sets used ra_foreach_dst. However,
after shared RA has run, shared dsts wouldn't have the IR3_REG_SSA flag
set anymore leaving their merge sets lingering. This patch fixes this by
using foreach_dst instead.
Fixes: fa22b0901a ("ir3/ra: Add specialized shared register RA/spilling")
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
The preferred register for merge sets was not updated after allocating
one. This caused a new merge set to be allocated for every register it
contains. This patch fixes this by reusing the update function from the
standard RA.
Fixes: fa22b0901a ("ir3/ra: Add specialized shared register RA/spilling")
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
This accidentally allowed DCC with format conversion, which is not supported.
Also disable EFC with VCN5 for now.
Fixes: 40c3a53fec ("radeonsi: Implement is_video_target_buffer_supported")
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30562>
Instead of allowing defining it in the job, but then not doing that.
The alternative being to delete only the dead `${LLVM_VERSION:=` and `}`
parts, but this way allows for the next commit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30657>
Previously only `-mtls-dialect=gnu2` was probed, which was appropriate
for arm, x86 and x86_64, but not for newer architectures such as
aarch64, loongarch64 and riscv64 which all use `-mtls-dialect=desc`
instead. Because the driver option is not consistent across
architectures (and probably will not), try both variants and choose the
first one working.
While at it, rename "gnu2_*" variables to "tlsdesc_*" respectively, for
clarity.
Cc: mesa-stable
Reviewed-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Yukari Chiba <i@0x7f.cc>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30599>
Although the ORCJIT codepath is fresh and relatively less tested, this
is still better than no llvmpipe at all for those newer architectures
that will not gain MCJIT support, such as LoongArch or RISC-V.
Fixes: 6f02ec5ed1 ("llvmpipe: add an implementation with llvm orcjit")
Reviewed-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Yukari Chiba <i@0x7f.cc>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30599>