This patch tries to give instructions with the same execution
mask also the same pass_flags and enables VN for SALU instructions
using exec as Operand.
This patch also adds back VN for VOPC instructions and removes VN for phis.
v2 (by Timur Kristóf):
- Fix some regressions.
v3 (by Daniel Schürmann):
- Fix additional issues
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 93c8ebfa ('aco: Initial commit of independent AMD compiler')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 93c8ebfa ('aco: Initial commit of independent AMD compiler')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 93c8ebfa ('aco: Initial commit of independent AMD compiler')
Seems we can use DPP's row_mask field to get an effect similar to
modifying exec.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
It happens that some games try to access a vertex buffer without
a valid format. This case was incorrectly handled by
ac_get_tbuffer_format which made ACO emit an invalid instruction.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Cc: 19.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
In particular, increase the cost of 64-bit integer division.
Fixes huge shaders with dEQP-VK.spirv_assembly.type.scalar.i64.mod_geom
, with ACO used for GS this creates shaders requiring a branch with
>32767 dword offset.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The workaround got accidentally moved to the wrong place
Fixes: 08d510010b aco: increase accuracy of SGPR limits
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Variables spilled on both branch legs need to be assigned to the same spilling slot.
These affinities can be transitive through multiple merge blocks.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
This patch makes the live variable analysis more precise
w.r.t. killed phi operands and the block's register pressure.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Converting to 'Conventional SSA Form' ensures correctness w.r.t. spilling of phi nodes.
Previously, it was possible that phi operands have intersecting live-ranges, and thus,
couldn't get spilled to the same spilling slot. For this reason, ACO tried to avoid to
spill phis, even if it was beneficial.
This patch implements a conversion pass which is currently only called if spilling is necessary.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Previously, the scheduler tried to move up instructions from below depending
VMEM instructions only to move them down again when scheduling the VMEM
instruction.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
These got lost due to some refactoring.
Due to the way our scheduler works currently, for now
we add back the reorder flag for divergent loads only.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
This patch changes VMEM scheduling in a way that they can only
be moved upwards by previous VMEM instructions but not downwards.
This way, it improves the order of VMEM instructions in relation
to their users.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Previously, we allowed all shaders to reduce the number of max_waves to as low as 5.
Restricting this on shaders with low register demand, increases the total number of waves
while the VMEM def-use distances hardly change.
This patch also changes the max number of move operations per MEM instruction.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Previously subgroup shuffle was implemented using the bpermute
instruction, which only works accross half-waves, so by itself it's
not suitable for implementing subgroup shuffle when the shader is
running in wave64 mode.
This commit adds a trick using shared VGPRs that allows to implement
subgroup shuffle still relatively effectively in this mode.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes p_reduce (all cluster sizes), p_inclusive_scan and p_exclusive_scan
with all reduction operations.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
GFX10 hazards require a different approach compared to previous
generations, for example it doesn't need s_nop, and most hazards
can't be solved by adding NOPs at all. Also, they are not
resolved by branch instructions.
This commit reorganizes aco_insert_NOPs so that there is now a
separate pass for GFX10. The new GFX10 pass also respects the
control flow of the shader.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
This commit refines the VMEMtoScalarWriteHazard mitigation, based
upon a closer look at what LLVM does. Also changes the code to
match the structure of the other hazard mitigations.
* The hazard is not only triggered by VMEM, FLAT and GLOBAL
but also SCRATCH and DS instructions.
* The SMEM/SALU instructions only cause a hazard when they
write a register that the VMEM/etc. are reading.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
There is a hazard caused by there is a branch between a
VMEM/GLOBAL/SCRATCH instruction and a DS instruction.
This commit adds a workaround that avoids the problem.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
There is a hazard that happens when an SMEM instruction
reads an SGPR and then a VALU instruction writes that same SGPR.
This commit adds a workaround that avoids the problem.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
There is a hazard when a non-VALU instruction reads the EXEC mask
and then a VALU instruction writes the EXEC mask.
This commit adds a workaround that avoids the problem.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Any permlane instruction that follows any VOPC instruction can cause a hazard,
this commit implements a workaround that avoids this causing a problem.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
ACO currently mitigates VMEMtoScalarWriteHazard and Offset3fBug
(names from LLVM). There are some bugs that ACO needn't care about.
Just to be on the safe side, add an assertion that makes sure
that we aren't hit by FlatSegmentOffsetBug.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>