Commit graph

182 commits

Author SHA1 Message Date
Timur Kristóf
07754a9c9e aco/wave32: Replace hardcoded numbers in spiller with wave size.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Timur Kristóf
c0dbf42a03 aco/wave32: Change uniform bool optimization to work with wave32.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Timur Kristóf
dd9dad731b aco: Optimize load_subgroup_id to one bit field extract instruction.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Timur Kristóf
753670e902 aco: Remove lower_linear_bool_phi, it is not needed anymore.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Timur Kristóf
0d2d672020 aco: Remove superfluous argument from emit_boolean_logic.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Timur Kristóf
9a43d26b74 aco: Fix operand of s_bcnt1_i32_b64 in emit_boolean_reduce.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Rhys Perry
01cacdb71e aco: fix block_kind_discard s_andn2 definition to exec
Improves generated code of dEQP-VK.graphicsfuzz.disc-and-add-in-func-in-loop
because a loop exit phi can then be fixed to exec, removing copies and
improving jump threading.

No pipeline-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-02 16:56:24 +00:00
Rhys Perry
0e8da9f607 aco: handle loop exit and IF merge phis with break/discard
ACO considers discards jumps and creates edges in the CFG for them but NIR
does neither of these.

This can be fixed instead by keeping track of whether a side of an IF had
a break/discard, but this doesn't solve the issue with discards affecting
loop exit phis. So this reworks phi handling a bit.

Fixes these tests:
dEQP-VK.graphicsfuzz.disc-and-add-in-func-in-loop
dEQP-VK.graphicsfuzz.loop-call-discard
dEQP-VK.graphicsfuzz.complex-nested-loops-and-call

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-02 16:56:19 +00:00
Rhys Perry
06fc83989c aco: validate the CFG
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-02 16:56:05 +00:00
Samuel Pitoiset
0eb78a078e aco: drop useless lowering of deref operations for shared memory
Moved to RADV. No pipeline-db changes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-29 21:58:25 +01:00
Daniel Schürmann
b690543851 aco: fix a couple of value numbering issues
Fixes: 3a20ef4a32 'aco: refactor value numbering'

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-11-29 21:54:27 +01:00
Daniel Schürmann
8861a82be7 aco: don't split live-ranges of linear VGPRs
Fixes: 93c8ebfa78 'aco: Initial commit of independent AMD compiler'

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-11-29 21:54:27 +01:00
Rhys Perry
73783ed389 aco: implement global atomics
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-29 17:46:02 +00:00
Rhys Perry
389ee819c0 aco: improve FLAT/GLOBAL scheduling
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-29 17:46:02 +00:00
Rhys Perry
cc742562c1 aco: don't enable store_global for helper invocations
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-29 17:46:02 +00:00
Rhys Perry
31e68e230f aco: fix SADDR with FLAT on GFX10
The reference guide is incorrect and SADDR is actually used with FLAT on
GFX10.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-29 17:46:01 +00:00
Rhys Perry
082e3a68fa aco: fix assembly of FLAT/GLOBAL atomics
They can take both a definition and data operand

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-29 17:46:01 +00:00
Rhys Perry
f1381e6715 aco: fix GFX10 opcodes for some global/flat atomics
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-29 17:46:01 +00:00
Rhys Perry
5986e00194 aco: improve WAR hazard workaround with >64bit stores
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-29 17:46:01 +00:00
Rhys Perry
a9fc81b098 aco: add v_nop inbetween exec write and VMEM/DS/FLAT
LLVM and the proprietary compiler seem to do this

Fixes: b01847bd9 ("aco/gfx10: Fix mitigation of VMEMtoScalarWriteHazard.")
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-29 17:46:01 +00:00
Rhys Perry
54742e157d aco: fix incorrect cast in parse_wait_instr()
s_waitcnt is SOPP, not SOPK

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-29 17:46:01 +00:00
Rhys Perry
11f43caaec aco: fix i2i64
Fixes: 93c8ebfa ('aco: Initial commit of independent AMD compiler')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-29 17:46:01 +00:00
Rhys Perry
ff70ccad16 aco: propagate p_wqm on an image_sample's coordinate p_create_vector
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2156
Fixes: 93c8ebfa78 ('aco: Initial commit of independent AMD compiler')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-29 17:19:52 +00:00
Daniel Schürmann
7cd548d352 aco: don't value-number instructions from within a loop with ones after the loop.
Fixes:
Wolfenstein:Youngblood (w/o shader_ballot)
dEQP-VK.descriptor_indexing.combined_image_sampler_in_loop_with_lod

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-11-26 14:39:27 +00:00
Rhys Perry
46420dd294 aco: set dlc/glc correctly for image loads
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
2019-11-26 14:39:27 +00:00
Rhys Perry
37843e454e aco: allow constant offsets for global/scratch instructions on GFX10
I don't think the bug applies for global/scratch instructions and
load_barycentric_at_sample selection expects this feature to work.

Fixes various dEQP-VK.pipeline.multisample_interpolation.* tests on GFX10.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
2019-11-26 14:39:27 +00:00
Rhys Perry
459bc77763 aco: enable load/store vectorizer
Totals from affected shaders:
SGPRS: 1890373 -> 1900772 (0.55 %)
VGPRS: 1210024 -> 1215244 (0.43 %)
Spilled SGPRs: 828 -> 828 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 252 -> 252 (0.00 %) dwords per thread
Code Size: 81937504 -> 74608304 (-8.94 %) bytes
LDS: 746 -> 746 (0.00 %) blocks
Max Waves: 230491 -> 230158 (-0.14 %)

In NeiR:Automata and GTA V, the code decrease is especially large: -13.79%
and -15.32%, respectively.

v9: rework the callback function
v10: handle load_shared/store_shared in the callback

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v9)
2019-11-25 13:59:11 +00:00
Connor Abbott
01eb6ef870 aco: Make unused workgroup id's 0
It shouldn't matter, but the 1 was leftover from when it was handled
together with workgroup_size and num_work_groups.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-25 14:17:51 +01:00
Connor Abbott
bb78f9b4e4 aco: Use common argument handling
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-25 14:17:51 +01:00
Connor Abbott
e7f4cadd02 radv: Replace supports_spill with explict_scratch_args
The former was always true and hence dead code. We will want to
explicitly declare the ring offset register with ACO, but we also want
to declare the scratch offset too, and we can't try to disable it since
ACO also supports spilling and the determination of whether spilling has
to happen occurs well after setting up registers. So replace
supports_spill with something that will actually be used for ACO.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-11-25 14:17:51 +01:00
Connor Abbott
4d6676d78a aco: Make num_workgroups and local_invocation_ids one argument each
To match the LLVM argument setup code.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-25 14:17:51 +01:00
Connor Abbott
a7f1c63442 aco: Split vector arguments at the beginning
Due to how LLVM works we have to make some of the FS inputs become
vectors, and therefore have to split them early so that they don't take
up extra register pressure due to how RA currently works.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-25 14:17:51 +01:00
Connor Abbott
b45c54ff8d aco: Use radv_shader_args in aco_compile_shader()
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-25 14:17:51 +01:00
Connor Abbott
680b086db1 aco: Constify radv_nir_compiler_options in isel
It's already const for everything else.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-25 14:17:51 +01:00
Rhys Perry
517728477c aco: fix waitcnts for barriers at block ends
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: d1b9deee ('aco: improve waitcnt insertion around loops')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-22 19:56:31 +00:00
Rhys Perry
29d131d619 aco: fix copy+paste error
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-21 20:28:57 +00:00
Rhys Perry
d1b9deeea8 aco: improve waitcnt insertion around loops
Do this by repeating processing of loops until no progress is made.

Totals from affected shaders:
SGPRS: 162576 -> 162576 (0.00 %)
VGPRS: 145228 -> 145228 (0.00 %)
Spilled SGPRs: 668 -> 668 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 15778640 -> 15771336 (-0.05 %) bytes
LDS: 146 -> 146 (0.00 %) blocks
Max Waves: 6087 -> 6087 (0.00 %)

v2: use block_kind_loop_header/block_kind_loop_exit to repeat at the end
    of loops instead of at each continue

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-21 20:28:57 +00:00
Marek Olšák
ebe7579655 nir: move data.image.access to data.access
The size of the data structure doesn't change.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-19 18:20:05 -05:00
Rhys Perry
56c06c79fc aco: implement 64-bit integer reductions
The multiplication reduction is larger than it could be, but it should be
easier to implement this way.

No failures with dEQP-VK.subgroups.*int64* except those caused by LLVM
being used for other stages.

v2: don't call setFixed() for v_add carry-out, since setHint sets physReg
v3: add and use emit_vadd32() helper
v4: use num_opcodes instead of last_opcode

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v3)
2019-11-19 18:58:04 +00:00
Rhys Perry
33277bd66e aco: refactor reduction lowering helpers
Should make 64-bit integer reductions easier to implement.

v4: use num_opcodes instead of last_opcode

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v3)
2019-11-19 18:56:21 +00:00
Rhys Perry
df645fa369 aco: implement VK_KHR_shader_float_controls
This actually supports more of the extension than the LLVM backend but we
can't enable it because ACO doesn't work with all stages yet.

With more of it enabled, some CTS tests fail because our 64-bit sqrt
is very imprecise. I can't find any precision requirements for it
anywhere, so I'm thinking it might be a CTS issue.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-15 17:36:21 +00:00
Rhys Perry
be1d11249b aco: fix 64-bit fsign with 0
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 93c8ebfa ('aco: Initial commit of independent AMD compiler')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-15 17:36:21 +00:00
Rhys Perry
b062b92ab1 aco: don't combine literals into v_cndmask_b32/v_subb/v_addc
No pipeline-db changes

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 93c8ebfa ('aco: Initial commit of independent AMD compiler')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-15 17:36:21 +00:00
Timur Kristóf
9b8dc6929e aco: Optimize out trivial code from uniform bools.
This should remove most of the excess code size that was
introduced by making all booleans per-lane.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-14 17:27:11 +01:00
Timur Kristóf
8995c0b30a aco: Treat all booleans as per-lane.
Previously, instruction selection had two kinds of booleans:
1. divergent which was per-lane and stored in s2 (VCC size)
2. uniform which was stored in s1
Additionally, uniform booleans were made per-lane when they resulted
from operations which were supported only by the VALU.

To decide which type was used, we relied on the destination size,
which was not reliable due to the per-lane uniform bools, but it
mostly works on wave64.
However, in wave32 mode (where VCC is also s1) this approach
makes it impossible keep track of which boolean is uniform and
which is divergent.

This commit makes all booleans per-lane.
The resulting excess code size will be taken care of by the optimizer.

v2 (by Daniel Schürmann):
- Better names for some functions
- Use s_andn2_b64 with exec for nir_op_inot
- Simplify code due to using s_and_b64 in bool_to_scalar_condition

v3 (by Timur Kristóf):
- Fix several subgroups regressions

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-14 17:27:11 +01:00
Daniel Schürmann
a1622c1a11 aco: use s_and_b64 exec to reduce uniform booleans to one bit
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-11-14 17:27:10 +01:00
Timur Kristóf
94e355148f aco: Make sure not to mistakenly propagate 64-bit constants.
ACO's optimizer would try to propagate 64-bit constants, but
does so in such a way that wouldn't work due to how the 64-bit
constants are handled in the IR.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-14 17:27:10 +01:00
Daniel Schürmann
9d3e070524 aco: value number instructions using the execution mask
This patch tries to give instructions with the same execution
mask also the same pass_flags and enables VN for SALU instructions
using exec as Operand.
This patch also adds back VN for VOPC instructions and removes VN for phis.

v2 (by Timur Kristóf):
- Fix some regressions.
v3 (by Daniel Schürmann):
- Fix additional issues

Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-11-14 17:27:10 +01:00
Daniel Schürmann
8657eede8a aco: check if SALU instructions are predeceeded by exec when calculating WQM needs
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-11-14 17:27:10 +01:00
Rhys Perry
6914b0236f aco: combine read_invocation and shuffle implementations
They do mostly the same thing now.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-12 17:21:38 +00:00