Phi nodes are mostly handled the same way as ALU instructions: if all
sources point to the same def (which happens if they are scalar or have
been previously vectorized), combine them into a single vectorized phi
node.
There is one case where this doesn't work, however: sources that come
from a loop back-edge. Since their defs haven't been processed yet, they
are generally not the same. We could simply refuse to vectorize such
phi nodes but this could leave many values used in loops unnecessarily
scalarized.
Instead, this patch implements a simple heuristic: if all defs coming
from a back-edge have the same instruction type and, in the case of ALU,
the same operation, assume they will be vectorized later. Since we
require that normal edges are vectorized already, chances are that the
back-edge can also be vectorized.
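
As an illustration only, the back-edge heuristic described above could be
sketched as follows. The types and the function name are hypothetical,
simplified stand-ins and not the actual NIR data structures or the code in
this patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for NIR instruction data. */
enum instr_type { INSTR_ALU, INSTR_PHI, INSTR_INTRINSIC };

struct def_info {
   enum instr_type type; /* kind of instruction producing the def */
   int alu_op;           /* only meaningful when type == INSTR_ALU */
};

/* Returns true if all back-edge defs look alike enough that we assume
 * they will be vectorized later: same instruction type and, for ALU
 * instructions, the same operation. */
static bool
back_edge_defs_match(const struct def_info *defs, size_t count)
{
   assert(defs != NULL && count > 0);

   for (size_t i = 1; i < count; i++) {
      if (defs[i].type != defs[0].type)
         return false;
      if (defs[0].type == INSTR_ALU && defs[i].alu_op != defs[0].alu_op)
         return false;
   }
   return true;
}
```

Two ALU defs with the same operation pass the check, while defs that differ
in type or ALU operation make the phi node ineligible for vectorization.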
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
To handle phi nodes, it's important that all sources have been processed
before processing the phi node itself. The current traversal order
(depth-first on dom_children) does not guarantee this. This patch
rewrites the pass to visit blocks in source-code order.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
Dispatch to different functions inside instr_try_combine to prepare for
upcoming support for phi nodes.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
Use the new repeat group builders to lower vectorized NIR instructions.
Add NIR pass to vectorize NIR before lowering.
Support for repeated instructions is added over a number of different
commits. Here's how they all tie together:
ir3 is a scalar architecture and as such most instructions cannot be
vectorized. However, many instructions support the (rptN) modifier that
allows us to mimic vector instructions. Whenever an instruction has the
(rptN) modifier set it will execute N more times, incrementing its
destination register for each repetition. Additionally, source registers
with the (r) flag set will also be incremented.
For example:

    (rpt1)add.f r0.x, (r)r1.x, r2.x

is the same as:

    add.f r0.x, r1.x, r2.x
    add.f r0.y, r1.y, r2.x
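
The expansion rule above can be modelled with a small sketch. It assumes a
hypothetical flat register numbering (r0.x = 0, r0.y = 1, r1.x = 4, r2.x = 8)
and made-up names; it is not how ir3 represents instructions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: registers are flat scalar indices. */
struct scalar_add {
   int dst, src0, src1;
};

/* Expand "(rptN)add dst, src0, src1" into N + 1 scalar adds.
 * src0_r and src1_r model the (r) flag on each source.
 * `out` must have room for rpt + 1 entries. */
static void
expand_rpt(int rpt, int dst, int src0, bool src0_r, int src1, bool src1_r,
           struct scalar_add *out)
{
   assert(rpt >= 0 && out != NULL);

   for (int i = 0; i <= rpt; i++) {
      out[i].dst = dst + i;                  /* dst always increments */
      out[i].src0 = src0 + (src0_r ? i : 0); /* only with (r) set */
      out[i].src1 = src1 + (src1_r ? i : 0);
   }
}
```

Calling expand_rpt(1, 0, 4, true, 8, false, out) reproduces the two add.f
instructions from the example: the second one writes r0.y, reads r1.y, and
still reads r2.x.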
The main benefit of using repeated instructions is a reduction in code
size. Since every iteration is still executed as a scalar instruction,
there's no direct benefit in terms of runtime. The only exception seems
to be for 3-source instructions pre-a7xx: if one of the sources is
constant (i.e., without the (r) flag), a repeated instruction executes
faster than the equivalent expanded sequence. Presumably, this is
because the ALU only has 2 register read ports. I have not been able to
measure this difference on a7xx though.
Support for repeated instructions consists of two parts. First, we need
to make sure NIR is (mostly) vectorized when translating to ir3. I have
not been able to find a way to keep NIR vectorized all the way and still
generate decent code. Therefore, I have taken the approach of
vectorizing the (scalarized) NIR right before translating it to ir3.
Secondly, ir3 needs to be adapted to ingest vectorized NIR and translate
it to repeated instructions. To this end, I have introduced the concept
of "repeat groups" to ir3. A repeat group is a group of instructions
that were produced from a vectorized NIR operation and linked together.
They are, however, still separate scalar instructions until quite late.
More concretely:
1. Instruction emission: for every vectorized NIR operation, emit
separate scalar instructions for its components and link them
together in a repeat group. For every instruction builder ir3_X, a
new repeat builder ir3_X_rpt has been added to facilitate this.
2. Optimization passes: for now, repeat groups are completely ignored by
optimizations.
3. Pre-RA: clean up repeat groups that can never be merged into an
actual rptN instruction (e.g., because their instructions are not
consecutive anymore). This ensures no useless merge sets will be
created in the next step.
4. RA: create merge sets for the sources and defs of the instructions in
repeat groups. This way, RA will try to allocate consecutive
registers for them. This will not be forced, though, because we prefer
to split up repeat groups over creating movs to reorder registers.
5. Post-RA: create actual rptN instructions for repeat groups where the
allocated registers allow it.
The idea for step 2 is that we prefer that any potential optimizations
take precedence over creating rptN instructions as the latter will only
yield a code size benefit. However, it might be interesting to
investigate if we could make some optimizations repeat aware. For
example, the scheduler could try to schedule instructions of a repeat
group together.
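
To illustrate step 5, here is a minimal sketch of the kind of check that
decides whether allocated registers allow a merge. It assumes flat register
indices and a single source per instruction; the names are hypothetical and
the real pass has to consider more conditions than this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum reg_pattern { REGS_MISMATCH, REGS_CONSTANT, REGS_INCREMENTING };

/* Classify the registers used at the same operand position across all
 * instructions of a repeat group. */
static enum reg_pattern
classify_regs(const int *regs, size_t count)
{
   assert(regs != NULL && count > 0);

   bool constant = true, incrementing = true;
   for (size_t i = 1; i < count; i++) {
      if (regs[i] != regs[0])
         constant = false;
      if (regs[i] != regs[0] + (int)i)
         incrementing = false;
   }
   if (incrementing)
      return REGS_INCREMENTING;
   return constant ? REGS_CONSTANT : REGS_MISMATCH;
}

/* A group is mergeable in this model if its dsts are consecutive and its
 * source is either consecutive (would get the (r) flag) or identical for
 * every repetition. */
static bool
can_merge_rpt(const int *dsts, const int *srcs, size_t count)
{
   return classify_regs(dsts, count) == REGS_INCREMENTING &&
          classify_regs(srcs, count) != REGS_MISMATCH;
}
```

In this model, a group whose registers end up non-consecutive is simply left
as separate scalar instructions, matching the preference stated in step 4.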
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
Clean up repeat groups that can never be merged into an actual rptN
instruction (e.g., because their instructions are not consecutive
anymore). This ensures no useless merge sets will be created for RA.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
Create merge sets for the sources and defs of the instructions in repeat
groups. This way, RA will try to allocate consecutive registers for
them. This will not be forced, though, because we prefer to split up
repeat groups over creating movs to reorder registers.
When choosing a register for a repeat group's merge set, if its merge
set is unique (i.e., only used for these repeated instructions), try to
first allocate one of their sources (for the same reason as for ALU/SFU
instructions). This also prevents us from allocating a new register
range for this merge set when the one from a source could be reused.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
For every instruction builder ir3_X, this patch adds a new repeat builder
ir3_X_rpt to create a repeated version of an instruction.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
In order to represent repeated instructions (rptN) in ir3, this patch
introduces the concept of "repeat groups". A repeat group is a group of
instructions that were produced from a vectorized NIR operation and
linked together. They are, however, still separate scalar instructions.
Repeat groups are created by linking together multiple instructions
using a new rpt_node list. This patch adds this list as well as a number
of helper functions that can be used to create and manipulate repeat
groups.
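
As a rough sketch of the idea, instructions can be linked into a group
through an intrusive circular list. Only the rpt_node field name comes from
this patch; the link type and the helper names below are hypothetical and do
not mirror the helpers actually added:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal intrusive circular list link (hypothetical). */
struct rpt_link {
   struct rpt_link *prev, *next;
};

struct instr {
   int id;
   struct rpt_link rpt_node; /* links instructions of one repeat group */
};

/* A fresh instruction forms a group of one: its node points to itself. */
static void
rpt_init(struct instr *instr)
{
   assert(instr != NULL);
   instr->rpt_node.prev = instr->rpt_node.next = &instr->rpt_node;
}

/* Append `instr` at the end of the group that `first` belongs to. */
static void
rpt_append(struct instr *first, struct instr *instr)
{
   struct rpt_link *head = &first->rpt_node;
   struct rpt_link *node = &instr->rpt_node;

   node->prev = head->prev;
   node->next = head;
   head->prev->next = node;
   head->prev = node;
}

/* Number of instructions in the group containing `instr`. */
static size_t
rpt_count(const struct instr *instr)
{
   size_t n = 1;
   for (const struct rpt_link *l = instr->rpt_node.next;
        l != &instr->rpt_node; l = l->next)
      n++;
   return n;
}
```

Because the list is circular, any member of the group can be used to reach
all the others, while each instruction remains a separate scalar
instruction.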
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
The tex prefetch heuristic simply counts the number of NIR instructions.
Since a vectorized NIR instruction expands to an ir3 instruction per
component, we have to take this into account while counting them.
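
A minimal sketch of the adjusted counting, assuming a hypothetical
simplified instruction record that only carries its component count (the
real heuristic operates on NIR instructions):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a NIR instruction. */
struct counted_instr {
   unsigned num_components; /* 1 for scalar, N for a vectorized op */
};

/* Count the ir3 instructions a sequence is expected to expand to:
 * one per component instead of one per NIR instruction. */
static unsigned
count_expanded_instrs(const struct counted_instr *instrs, size_t count)
{
   assert(instrs != NULL || count == 0);

   unsigned total = 0;
   for (size_t i = 0; i < count; i++)
      total += instrs[i].num_components;
   return total;
}
```

With this weighting, one scalar instruction followed by a 4-component
instruction counts as 5 rather than 2.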
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
After spilling during regular RA, merge sets need to be fixed up. To
find all merge sets, fixup_merge_sets used ra_foreach_dst. However,
after shared RA has run, shared dsts wouldn't have the IR3_REG_SSA flag
set anymore leaving their merge sets lingering. This patch fixes this by
using foreach_dst instead.
Fixes: fa22b0901a ("ir3/ra: Add specialized shared register RA/spilling")
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
The preferred register for merge sets was not updated after allocating
one. This caused a new merge set to be allocated for every register it
contains. This patch fixes this by reusing the update function from the
standard RA.
Fixes: fa22b0901a ("ir3/ra: Add specialized shared register RA/spilling")
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28341>
This accidentally allowed DCC with format conversion, which is not supported.
Also disable EFC with VCN5 for now.
Fixes: 40c3a53fec ("radeonsi: Implement is_video_target_buffer_supported")
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30562>
Instead of allowing defining it in the job, but then not doing that.
The alternative being to delete only the dead `${LLVM_VERSION:=` and `}`
parts, but this way allows for the next commit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30657>
Previously only `-mtls-dialect=gnu2` was probed, which was appropriate
for arm, x86 and x86_64, but not for newer architectures such as
aarch64, loongarch64 and riscv64 which all use `-mtls-dialect=desc`
instead. Because the driver option is not consistent across
architectures (and probably never will be), try both variants and choose
the first one that works.
While at it, rename "gnu2_*" variables to "tlsdesc_*" respectively, for
clarity.
Cc: mesa-stable
Reviewed-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Yukari Chiba <i@0x7f.cc>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30599>
Although the ORCJIT codepath is fresh and relatively less tested, this
is still better than no llvmpipe at all for those newer architectures
that will not gain MCJIT support, such as LoongArch or RISC-V.
Fixes: 6f02ec5ed1 ("llvmpipe: add an implementation with llvm orcjit")
Reviewed-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Yukari Chiba <i@0x7f.cc>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30599>
Fixes a "regression" where comically large FPS tests regressed.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Fixes: 19dba854 ("wsi/x11: Rewrite implementation to always use threads.")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30638>
Currently, sequence headers (VPS, SPS, PPS) are always inserted
on each IDR frame and AUD is inserted on every frame, but the
application should decide which headers it wants.
AUD is optional and is almost never needed. In some cases, sequence
headers are also not needed on each IDR frame, so inserting them
unconditionally only wastes bits.
With FFmpeg/GStreamer, this changes AUD to not be inserted by default.
There is no change to sequence headers, as those are already requested
to be inserted on each IDR.
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30585>
FFmpeg sends AUD as part of VA_ENC_PACKED_HEADER_SEQUENCE and
VA_ENC_PACKED_HEADER_SLICE.
GStreamer sends it separately as VA_ENC_PACKED_HEADER_RAW_DATA.
It's now also necessary to keep track of which packed headers were
enabled, so that VPS/SPS/PPS are included with
VAEncSequenceParameterBuffer when sequence packed headers are disabled.
Reviewed-By: Sil Vilerino <sivileri@microsoft.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30585>
Applications should not send types that were not enabled when creating
the config, and even if they do, it will not cause any unexpected issues.
Remove the checks, as they are yet another place that would need to be
updated when adding support for new packed header types.
Reviewed-By: Sil Vilerino <sivileri@microsoft.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30585>
An extra control register word, tpu_tag_cdm_ctrl, will be present when
the TPU_DM_GLOBAL_REGISTERS feature is present.
Emit it when it's needed.
Documentation for this register is available; however, I don't think any
of the bits need to be set for our current feature set, so just emit 0
for now.
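
The conditional emission can be pictured with the following sketch. The
function name, the stream layout, and the feature flag parameter are
hypothetical; only the idea of emitting one extra zero word when the feature
is present comes from this patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Write the optional tpu_tag_cdm_ctrl word into `stream` and return the
 * number of 32-bit words written (0 or 1 in this simplified model). */
static size_t
emit_tpu_tag_cdm_ctrl(uint32_t *stream, size_t capacity,
                      bool has_tpu_dm_global_registers)
{
   size_t n = 0;
   assert(stream != NULL);

   if (has_tpu_dm_global_registers) {
      assert(n < capacity);
      stream[n++] = 0; /* no bits needed for the current feature set */
   }
   return n;
}
```

Returning the word count lets the caller advance its write pointer only on
hardware that has the feature.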
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30552>
This corresponds to the RGX_FEATURE_TPU_DM_GLOBAL_REGISTERS in the DDK
kernel module source code, and will introduce one more control word to
compute command streams.
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30552>
The currently allocated transfer fw_stream buffer lacks the space for a
field that exists conditionally for multicore GPUs, frag_screen.
Enlarge the transfer fw_stream buffer for this field.
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30543>
The versioned libgallium library can be confusing on Android, and it is
probably not even needed there, so simplify the build on Android by
always building the unversioned `libgallium_dri.so`, overriding the
`-Dunversion-libgallium=true` option added in
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30579
Also remove all the bits that deal with the versioned library, which are
not needed anymore.
Fixes: 9568976c52 ("android: fix build in multiple ways")
Acked-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Mauro Rossi <issor.oruam@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30641>