Previously, it would only work when the ballot size was set to the
lane mask. This patch makes is possible to set the ballot size
to either 32-bit or 64-bit for wave32 mode.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Currently all usages of exec and vcc are hardcoded to use s2 regclass.
This commit makes it possible to use s1 in wave32 mode and
s2 in wave64 mode.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
ACO considers discards jumps and creates edges in the CFG for them but NIR
does neither of these.
This can be fixed instead by keeping track of whether a side of an IF had
a break/discard, but this doesn't solve the issue with discards affecting
loop exit phis. So this reworks phi handling a bit.
Fixes these tests:
dEQP-VK.graphicsfuzz.disc-and-add-in-func-in-loop
dEQP-VK.graphicsfuzz.loop-call-discard
dEQP-VK.graphicsfuzz.complex-nested-loops-and-call
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Moved to RADV. No pipeline-db changes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
The former was always true and hence dead code. We will want to
explicitly declare the ring offset register with ACO, but we also want
to declare the scratch offset too, and we can't try to disable it since
ACO also supports spilling and the determination of whether spilling has
to happen occurs well after setting up registers. So replace
supports_spill with something that will actually be used for ACO.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Due to how LLVM works we have to make some of the FS inputs become
vectors, and therefore have to split them early so that they don't take
up extra register pressure due to how RA currently works.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
This actually supports more of the extension than the LLVM backend but we
can't enable it because ACO doesn't work with all stages yet.
With more of it enabled, some CTS tests fail because our 64-bit sqrt
is very imprecise. I can't find any precision requirements for it
anywhere, so I'm thinking it might be a CTS issue.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Previously, instruction selection had two kinds of booleans:
1. divergent which was per-lane and stored in s2 (VCC size)
2. uniform which was stored in s1
Additionally, uniform booleans were made per-lane when they resulted
from operations which were supported only by the VALU.
To decide which type was used, we relied on the destination size,
which was not reliable due to the per-lane uniform bools, but it
mostly works on wave64.
However, in wave32 mode (where VCC is also s1) this approach
makes it impossible keep track of which boolean is uniform and
which is divergent.
This commit makes all booleans per-lane.
The resulting excess code size will be taken care of by the optimizer.
v2 (by Daniel Schürmann):
- Better names for some functions
- Use s_andn2_b64 with exec for nir_op_inot
- Simplify code due to using s_and_b64 in bool_to_scalar_condition
v3 (by Timur Kristóf):
- Fix several subgroups regressions
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
In particular, increase the cost of 64-bit integer division.
Fixes huge shaders with dEQP-VK.spirv_assembly.type.scalar.i64.mod_geom
, with ACO used for GS this creates shaders requiring a branch with
>32767 dword offset.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The workaround got accidentally moved to the wrong place
Fixes: 08d510010b aco: increase accuracy of SGPR limits
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
v2: make variable names snake_case
v2: minor cleanups in emit_udiv()
v2: fix Panfrost build failure
v3: use an enum instead of a boolean flag in nir_lower_idiv()'s signature
v4: remove nir_op_urcp
v5: drop nv50 path
v5: rebase
v6: add back nv50 path
v6: add comment for nir_lower_idiv_path enum
v7: rename _nv50/_llvm to _fast/_precise
v8: fix etnaviv build failure
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes the UBO/SSBO dEQP-VK.descriptor_indexing.* tests
v2: remove bld.copy() usage
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
This simplifies ACO and allows the lowered code to be optimized (in
particular, constant folded).
Totals from affected shaders:
SGPRS: 1776 -> 1776 (0.00 %)
VGPRS: 1436 -> 1436 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 203452 -> 203564 (0.06 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 103 -> 103 (0.00 %)
At least some of the code size increase seems to be from literals being
applied to instructions as a result of constant folding.
v2: remove fmod/frem handling in init_context()
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes a few building errors similar to the following:
In file included from external/mesa/src/amd/compiler/aco_instruction_selection.cpp:26:
In file included from external/libcxx/include/algorithm:639:
external/libcxx/include/utility:321:9:
error: implicit instantiation of undefined template 'std::__1::array<aco::Temp, 4>'
_T2 second;
^
Fixes: 93c8ebf ("aco: Initial commit of independent AMD compiler")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
We currently lower them, but nir_opt_algebraic() can add new ones because
lower_sub=true.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
nir_lower_idiv() asserts on 64-bit integers.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
ACO (short for AMD Compiler) is a new compiler backend with the goal to replace
LLVM for Radeon hardware for the RADV driver.
ACO currently supports only VS, PS and CS on VI and Vega.
There are some optimizations missing because of unmerged NIR changes
which may decrease performance.
Full commit history can be found at
https://github.com/daniel-schuermann/mesa/commits/backend
Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Co-authored-by: Rhys Perry <pendingchaos02@gmail.com>
Co-authored-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Co-authored-by: Connor Abbott <cwabbott0@gmail.com>
Co-authored-by: Michael Schellenberger Costa <mschellenbergercosta@googlemail.com>
Co-authored-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>