Commit graph

185 commits

Author SHA1 Message Date
Daniel Schürmann
9d3e070524 aco: value number instructions using the execution mask
This patch tries to give instructions with the same execution
mask also the same pass_flags and enables VN for SALU instructions
using exec as Operand.
This patch also adds back VN for VOPC instructions and removes VN for phis.

v2 (by Timur Kristóf):
- Fix some regressions.
v3 (by Daniel Schürmann):
- Fix additional issues

Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-11-14 17:27:10 +01:00
Daniel Schürmann
8657eede8a aco: check if SALU instructions are predeceeded by exec when calculating WQM needs
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-11-14 17:27:10 +01:00
Rhys Perry
6914b0236f aco: combine read_invocation and shuffle implementations
They do mostly the same thing now.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-12 17:21:38 +00:00
Rhys Perry
2c98d79d11 aco: don't propagate vgprs into v_readlane/v_writelane
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 93c8ebfa ('aco: Initial commit of independent AMD compiler')
2019-11-12 17:21:38 +00:00
Rhys Perry
5a1bacb6f9 aco: fix read_invocation with VGPR lane index
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 93c8ebfa ('aco: Initial commit of independent AMD compiler')
2019-11-12 17:21:38 +00:00
Rhys Perry
f97d933426 aco: fix shuffle with uniform operands
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 93c8ebfa ('aco: Initial commit of independent AMD compiler')
2019-11-12 17:21:38 +00:00
Rhys Perry
3204e83768 aco: use DPP instead of exec modification when lowering GFX10 shuffles
Seems we can use DPP's row_mask field to get an effect similar to
modifying exec.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-12 17:21:38 +00:00
Daniel Schürmann
746b9380bd aco: rematerialize s_movk instructions
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-11-12 15:59:48 +00:00
Daniel Schürmann
b6f5085dfe aco: preserve kill flag on moved operands during RA
Fixes: 93c8ebfa78 aco: Initial commit of independent AMD compiler

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-11-12 15:59:48 +00:00
Daniel Schürmann
a2a6880743 aco: fix invalid access on Pseudo_instructions
Fixes: 93c8ebfa78 aco: Initial commit of independent AMD compiler

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-11-12 15:59:48 +00:00
Timur Kristóf
911a826141 ac: Handle invalid GFX10 format correctly in ac_get_tbuffer_format.
It happens that some games try to access a vertex buffer without
a valid format. This case was incorrectly handled by
ac_get_tbuffer_format which made ACO emit an invalid instruction.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Cc: 19.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-11-08 13:30:30 +01:00
Rhys Perry
78e3ea9a0f aco: add Instruction::usesModifiers() and add more checks in the optimizer
No pipeline-db changes.

v2: use early-exit for VOP3

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v1)
2019-11-08 00:14:06 +00:00
Rhys Perry
76544f632d radv: adjust loop unrolling heuristics for int64
In particular, increase the cost of 64-bit integer division.

Fixes huge shaders with dEQP-VK.spirv_assembly.type.scalar.i64.mod_geom
, with ACO used for GS this creates shaders requiring a branch with
>32767 dword offset.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-07 23:29:12 +00:00
Daniel Schürmann
a47e232ccd aco: workaround Tonga/Iceland hardware bug
The workaround got accidentally moved to the wrong place

Fixes: 08d510010b aco: increase accuracy of SGPR limits

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-11-07 09:19:50 +01:00
Samuel Pitoiset
d3f9957de4 radv: determine shaders wavesize at pipeline level
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-06 09:20:34 +01:00
Daniel Schürmann
efe737fc4f aco: fix accidential reordering of instructions when scheduling
Fixes: 8678699918 "aco: implement VGPR spilling"

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-11-04 20:14:14 +01:00
Daniel Schürmann
5c7dcb15e0 aco: only use single-dword loads/stores for spilling
Fixes: 8678699918 "aco: implement VGPR spilling"

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-11-04 20:14:14 +01:00
Daniel Schürmann
d97c0bdd55 aco: fix immediate offset for spills if scratch is used
Fixes: 8678699918 "aco: implement VGPR spilling"

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-11-04 20:14:14 +01:00
Daniel Schürmann
8678699918 aco: implement VGPR spilling
VGPR spilling is implemented via MUBUF instructions and scratch memory.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-10-30 19:48:33 +00:00
Daniel Schürmann
c79972b604 aco: always set scratch_offset in startpgm
This patch also moves private_segment_buffer and
scratch_offset to Program to easily access it.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-10-30 19:48:33 +00:00
Daniel Schürmann
b0de16b7de aco: omit linear VGPRs as spill variables
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-10-30 19:48:33 +00:00
Daniel Schürmann
aded548e66 aco: ensure that spilled VGPR reloads are done after p_logical_start
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-10-30 19:48:33 +00:00
Daniel Schürmann
a7ff1bb5b9 aco: simplify calculation of target register pressure when spilling
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-10-30 19:48:33 +00:00
Rhys Perry
e73de4e1d8 aco: fix new_demand calculation for first instructions
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-30 19:48:33 +00:00
Daniel Schürmann
93b42a1907 aco: don't add interferences between spilled phi operands
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-10-30 19:48:33 +00:00
Daniel Schürmann
fdf8ad0256 aco: consider loop_exit blocks like merge blocks, even if they have only one predecessor
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-10-30 19:48:33 +00:00
Daniel Schürmann
d48d72e98a aco: don't insert the exec mask into set of live-out variables when spilling
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-10-30 19:48:33 +00:00
Daniel Schürmann
cd20e29de1 aco: fix transitive affinities of spilled variables
Variables spilled on both branch legs need to be assigned to the same spilling slot.
These affinities can be transitive through multiple merge blocks.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-10-30 19:48:33 +00:00
Daniel Schürmann
8023dcd71e aco: fix live-range splits of phis
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-10-30 19:48:33 +00:00
Daniel Schürmann
655a703349 aco: remove potential critical edge on loops.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-10-30 19:48:33 +00:00
Daniel Schürmann
78bca0d0ce aco: improve live variable analysis
This patch makes the live variable analysis more precise
w.r.t. killed phi operands and the block's register pressure.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-10-30 19:48:32 +00:00
Daniel Schürmann
0b8216b2cd aco: Lower to CSSA
Converting to 'Conventional SSA Form' ensures correctness w.r.t. spilling of phi nodes.
Previously, it was possible that phi operands have intersecting live-ranges, and thus,
couldn't get spilled to the same spilling slot. For this reason, ACO tried to avoid to
spill phis, even if it was beneficial.
This patch implements a conversion pass which is currently only called if spilling is necessary.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-10-30 19:48:32 +00:00
Rhys Perry
e1bcc7a828 aco: rename README to README.md
Closes: #1974
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
2019-10-30 18:16:00 +00:00
Rhys Perry
d4684a294b aco: a couple loop handling fixes for GFX10 hazard pass
It was joining from the wrong blocks and block.kind is a bitmask instead
of an enum.

Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
2019-10-30 18:13:53 +00:00
Rhys Perry
8235bc6411 aco: try to group together VMEM loads of the same resource
v2: remove accidental shaderInt16 change
v2: simplify can_move_down initialization
v2: simplify VMEM_CLAUSE_MAX_GRAB_DIST

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-30 17:23:49 +01:00
Daniel Schürmann
8b5aee78cc aco: don't schedule instructions through depending VMEM instructions
Previously, the scheduler tried to move up instructions from below depending
VMEM instructions only to move them down again when scheduling the VMEM
instruction.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-10-30 16:12:10 +00:00
Daniel Schürmann
636d45e46a aco: add can_reorder flags to load_ubo and load_constant
These got lost due to some refactoring.
Due to the way our scheduler works currently, for now
we add back the reorder flag for divergent loads only.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-10-30 16:12:10 +00:00
Daniel Schürmann
576f92d900 aco: only skip RAR dependencies if the variable is killed somewhere
This patch changes VMEM scheduling in a way that they can only
be moved upwards by previous VMEM instructions but not downwards.
This way, it improves the order of VMEM instructions in relation
to their users.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-10-30 16:12:10 +00:00
Daniel Schürmann
703ce617ca aco: restrict scheduling depending on max_waves
Previously, we allowed all shaders to reduce the number of max_waves to as low as 5.
Restricting this on shaders with low register demand, increases the total number of waves
while the VMEM def-use distances hardly change.
This patch also changes the max number of move operations per MEM instruction.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-10-30 16:12:10 +00:00
Timur Kristóf
c52ebbcea4 aco: Introduce vgpr_limit to keep track of available VGPRs.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-28 23:52:50 +00:00
Timur Kristóf
d59f702e26 aco: Implement subgroup shuffle in GFX10 wave64 mode.
Previously subgroup shuffle was implemented using the bpermute
instruction, which only works accross half-waves, so by itself it's
not suitable for implementing subgroup shuffle when the shader is
running in wave64 mode.

This commit adds a trick using shared VGPRs that allows to implement
subgroup shuffle still relatively effectively in this mode.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-28 23:52:50 +00:00
Rhys Perry
c2eebfe3ea aco: Remove dead code in reduction lowering.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-28 23:52:50 +00:00
Rhys Perry
3865448012 aco: Fix reductions on GFX10.
Fixes p_reduce (all cluster sizes), p_inclusive_scan and p_exclusive_scan
with all reduction operations.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-28 23:52:50 +00:00
Timur Kristóf
c580f134ae aco: Refactor hazard mitigations, separate pass for GFX10.
GFX10 hazards require a different approach compared to previous
generations, for example it doesn't need s_nop, and most hazards
can't be solved by adding NOPs at all. Also, they are not
resolved by branch instructions.

This commit reorganizes aco_insert_NOPs so that there is now a
separate pass for GFX10. The new GFX10 pass also respects the
control flow of the shader.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-25 10:10:42 +02:00
Timur Kristóf
b01847bd94 aco/gfx10: Fix mitigation of VMEMtoScalarWriteHazard.
This commit refines the VMEMtoScalarWriteHazard mitigation, based
upon a closer look at what LLVM does. Also changes the code to
match the structure of the other hazard mitigations.

* The hazard is not only triggered by VMEM, FLAT and GLOBAL
  but also SCRATCH and DS instructions.
* The SMEM/SALU instructions only cause a hazard when they
  write a register that the VMEM/etc. are reading.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-25 10:10:42 +02:00
Timur Kristóf
c037ba1bb7 aco/gfx10: Mitigate LdsBranchVmemWARHazard.
There is a hazard caused by there is a branch between a
VMEM/GLOBAL/SCRATCH instruction and a DS instruction.
This commit adds a workaround that avoids the problem.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-25 10:10:42 +02:00
Timur Kristóf
09d676d81a aco/gfx10: Mitigate SMEMtoVectorWriteHazard.
There is a hazard that happens when an SMEM instruction
reads an SGPR and then a VALU instruction writes that same SGPR.
This commit adds a workaround that avoids the problem.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-25 10:10:42 +02:00
Timur Kristóf
d6dfce02d0 aco/gfx10: Mitigate VcmpxExecWARHazard.
There is a hazard when a non-VALU instruction reads the EXEC mask
and then a VALU instruction writes the EXEC mask.
This commit adds a workaround that avoids the problem.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-25 10:10:42 +02:00
Timur Kristóf
e5a8616973 aco/gfx10: Mitigate VcmpxPermlaneHazard.
Any permlane instruction that follows any VOPC instruction can cause a hazard,
this commit implements a workaround that avoids this causing a problem.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-25 10:10:42 +02:00
Timur Kristóf
99aed688d3 aco/gfx10: Add notes about some GFX10 hazards.
ACO currently mitigates VMEMtoScalarWriteHazard and Offset3fBug
(names from LLVM). There are some bugs that ACO needn't care about.
Just to be on the safe side, add an assertion that makes sure
that we aren't hit by FlatSegmentOffsetBug.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-25 10:10:41 +02:00