Commit graph

16 commits

Author SHA1 Message Date
Rhys Perry
9fea90ad51 aco: keep track of which events are used in a barrier
And properly handle unordered events so that they always wait for 0.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 93c8ebfa78 ('aco: Initial commit of independent AMD compiler')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3774>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3774>
2020-03-03 15:38:13 +00:00
Samuel Pitoiset
34fd894e42 aco: fix waiting for scalar stores before "writing back" data on GFX8-GFX9
Seems required also on GFX8-GFX9 to achieve correct behaviour. This
is an undocumented behaviour but it makes real sense to me.

pipeline-db on GFX9:
Totals from affected shaders:
SGPRS: 1018 -> 1018 (0.00 %)
VGPRS: 516 -> 516 (0.00 %)
Code Size: 40516 -> 40636 (0.30 %) bytes
Max Waves: 280 -> 280 (0.00 %)

This fixes some sort of sun flickering with Assassins Creed Origins.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2488
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3750>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3750>
2020-02-10 12:07:25 +00:00
Daniel Schürmann
71440ba0f5 aco: reorder VMEM operands in ACO IR
For all VMEM instructions, the resource constant is now
in operands[0]. For MIMG instructions, the sampler shares
operands[1] with write data in case this instruction writes memory.
Moving the VADDR to be the last operand for MIMG is the first step to
support Navi NSA encoding.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3602>
2020-01-29 18:45:23 +00:00
Rhys Perry
40bb81c9dd radv/aco,aco: implement GS on GFX9+
v2: implement GFX10
v3: rebase
v7: rebase after shader args MR
v8: fix gs_vtx_offset usage on GFX9/GFX10
v8: use unreachable() instead of printing intrinsic
v8: rename output_state to ge_output_state
v8: fix formatting around nir_foreach_variable()
v8: rename some helpers in the scheduler
v8: rename p_memory_barrier_all to p_memory_barrier_common
v8: fix assertion comparing ctx.stage against vertex_geometry_gs

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2421>
2020-01-24 13:35:07 +00:00
Rhys Perry
70f63c1988 aco: improve support for s_sendmsg
In particular, the messages needed for GS.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2421>
2020-01-24 13:35:07 +00:00
Timur Kristóf
ed815d503e aco/wave32: Use wave_size for barrier intrinsic.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Rhys Perry
54742e157d aco: fix incorrect cast in parse_wait_instr()
s_waitcnt is SOPP, not SOPK

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-29 17:46:01 +00:00
Rhys Perry
517728477c aco: fix waitcnts for barriers at block ends
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: d1b9deee ('aco: improve waitcnt insertion around loops')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-22 19:56:31 +00:00
Rhys Perry
29d131d619 aco: fix copy+paste error
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-21 20:28:57 +00:00
Rhys Perry
d1b9deeea8 aco: improve waitcnt insertion around loops
Do this by repeating processing of loops until no progress is made.

Totals from affected shaders:
SGPRS: 162576 -> 162576 (0.00 %)
VGPRS: 145228 -> 145228 (0.00 %)
Spilled SGPRs: 668 -> 668 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 15778640 -> 15771336 (-0.05 %) bytes
LDS: 146 -> 146 (0.00 %) blocks
Max Waves: 6087 -> 6087 (0.00 %)

v2: use block_kind_loop_header/block_kind_loop_exit to repeat at the end
    of loops instead of at each continue

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-21 20:28:57 +00:00
Rhys Perry
118a32e5ba Revert "aco: only emit waitcnt on loop continues if we there was some load or export"
We don't properly pass on ctx.lgkm_cnt/ctx.barrier_imm/etc, so this
waitcnt was necessary for barriers and correctly waiting for SMEM before
s_dcache_wb on GFX10.

Totals from affected shaders:
SGPRS: 33200 -> 33200 (0.00 %)
VGPRS: 31376 -> 31376 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 2431804 -> 2433956 (0.09 %) bytes
LDS: 316 -> 316 (0.00 %) blocks
Max Waves: 1609 -> 1609 (0.00 %)

This reverts commit 2c050b49b3.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-22 18:52:29 +00:00
Timur Kristóf
1749953ea3 aco/gfx10: Wait for pending SMEM stores before loads
Currently if you have an SMEM store followed by an SMEM load that
loads the same location as was written, it won't work because the
store isn't finished before the load is executed. This is NOT
mitigated by an s_nop instruction on GFX10.

Since we currently don't have proper alias analysis, this commit adds
a workaround which will insert an s_waitcnt lgkmcnt(0) before each
SSBO load if they follow a store. We should further refine this in
the future when we can make sure to only add the wait when we load the
same thing as has been stored.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-21 14:33:54 +00:00
Timur Kristóf
a89153d038 aco: Fix s_dcache_wb on GFX10.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-10 09:57:53 +02:00
Rhys Perry
68c9554732 aco: Have s_waitcnt_vscnt write to NULL.
Not sure if this instruction actually writes anything, but LLVM
disassembles a destination and sets it to NULL.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-10 09:57:53 +02:00
Daniel Schürmann
2c050b49b3 aco: only emit waitcnt on loop continues if we there was some load or export
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-09-23 13:39:33 +02:00
Daniel Schürmann
93c8ebfa78 aco: Initial commit of independent AMD compiler
ACO (short for AMD Compiler) is a new compiler backend with the goal to replace
LLVM for Radeon hardware for the RADV driver.

ACO currently supports only VS, PS and CS on VI and Vega.
There are some optimizations missing because of unmerged NIR changes
which may decrease performance.

Full commit history can be found at
https://github.com/daniel-schuermann/mesa/commits/backend

Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Co-authored-by: Rhys Perry <pendingchaos02@gmail.com>
Co-authored-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Co-authored-by: Connor Abbott <cwabbott0@gmail.com>
Co-authored-by: Michael Schellenberger Costa <mschellenbergercosta@googlemail.com>
Co-authored-by: Timur Kristóf <timur.kristof@gmail.com>

Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-19 12:10:00 +02:00