Commit graph

6 commits

Author SHA1 Message Date
Timur Kristóf
e0bcefc3a0 aco/wave32: Use lane mask regclass for exec/vcc.
Currently all usages of exec and vcc are hardcoded to use s2 regclass.
This commit makes it possible to use s1 in wave32 mode and
s2 in wave64 mode.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
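
A minimal sketch of the idea described in the commit message above: the register class used for lane masks such as exec and vcc follows the wave size instead of being fixed at s2. The names `ScalarRegClass`, `lane_mask_regclass` and `lane_mask_bits` are hypothetical and used only for illustration. They are not taken from the ACO sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a scalar register class: the value is the
// number of 32-bit SGPRs the class spans (s1 = 1 dword, s2 = 2 dwords).
enum class ScalarRegClass : uint8_t { s1 = 1, s2 = 2 };

// Pick the register class that holds one bit per lane of the wave.
// Only wave32 and wave64 are modelled here.
inline ScalarRegClass lane_mask_regclass(unsigned wave_size)
{
    assert(wave_size == 32 || wave_size == 64);
    return wave_size == 32 ? ScalarRegClass::s1 : ScalarRegClass::s2;
}

// Width in bits of a lane mask for the given wave size.
inline unsigned lane_mask_bits(unsigned wave_size)
{
    return 32u * static_cast<unsigned>(lane_mask_regclass(wave_size));
}
```

With a helper of this kind, code that previously hardcoded s2 for exec or vcc would ask for the lane mask class once and work in both modes.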
Daniel Schürmann
78bca0d0ce aco: improve live variable analysis
This patch makes the live variable analysis more precise
w.r.t. killed phi operands and the block's register pressure.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-10-30 19:48:32 +00:00
Timur Kristóf
c52ebbcea4 aco: Introduce vgpr_limit to keep track of available VGPRs.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-28 23:52:50 +00:00
Rhys Perry
fc04a2fc31 aco: take LDS into account when calculating num_waves
pipeline-db (Vega):
SGPRS: 344 -> 344 (0.00 %)
VGPRS: 424 -> 524 (23.58 %)
Spilled SGPRs: 84 -> 80 (-4.76 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 52812 -> 52484 (-0.62 %) bytes
LDS: 135 -> 135 (0.00 %) blocks
Max Waves: 56 -> 53 (-5.36 %)

v2: consider WGP, rework to be clearer and apply the
    "maximum 16 workgroups per CU" limit properly
v2: use "SIMD" instead of "EU"
v2: fix spiller by introducing "Program::max_waves"
v2: rename "lds_size" to "lds_limit"
v3: make max_waves actually independent of register usage
v3: fix issue where max_waves was way too high
v3: use DIV_ROUND_UP(a, b) instead of max(a / b, 1)
v3: rename "workgroups_per_cu" to "workgroups_per_cu_wgp"
v4: fix typo from "workgroups_per_cu" rename

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v3)
2019-10-23 19:11:21 +01:00
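
The v3 note about replacing max(a / b, 1) with DIV_ROUND_UP(a, b) can be illustrated with a small sketch. The two expressions agree when a is a positive multiple of b or smaller than b, but differ whenever a is larger than b and leaves a remainder. The function names below are hypothetical, and the listing does not say which quantities the commit applies the macro to, so the arguments here are plain numbers.

```cpp
#include <algorithm>
#include <cassert>

// Ceiling division: the smallest q such that q * b >= a.
// Assumes a + b - 1 does not overflow unsigned.
inline unsigned div_round_up(unsigned a, unsigned b)
{
    assert(b != 0);
    return (a + b - 1) / b;
}

// The older form: truncating division, clamped to at least 1.
inline unsigned div_floor_min_one(unsigned a, unsigned b)
{
    assert(b != 0);
    return std::max(a / b, 1u);
}
```

For a = 5 and b = 4 the clamped form yields 1, while the rounding form yields 2, so the clamped form undercounts whenever a partial unit still has to be paid for in full. The two also differ at a = 0, where the rounding form yields 0 and the clamped form yields 1.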
Rhys Perry
08d510010b aco: increase accuracy of SGPR limits
SGPRs are allocated in groups of 16 on GFX8/GFX9. GFX10 allocates a fixed
number of SGPRs and has 106 addressable SGPRs.

pipeline-db (Vega):
SGPRS: 5912 -> 6232 (5.41 %)
VGPRS: 1772 -> 1780 (0.45 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 88228 -> 87904 (-0.37 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 559 -> 571 (2.15 %)

pipeline-db (Navi):
SGPRS: 341256 -> 363384 (6.48 %)
VGPRS: 171536 -> 170960 (-0.34 %)
Spilled SGPRs: 832 -> 581 (-30.17 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 14207332 -> 14190872 (-0.12 %) bytes
LDS: 33 -> 33 (0.00 %) blocks
Max Waves: 18072 -> 18251 (0.99 %)

v2: unconditionally count vcc as an extra sgpr on GFX10+
v3: pass SGPRs rounded to 8

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-23 19:11:21 +01:00
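
As a rough illustration of the allocation granularity mentioned in the commit message above, the sketch below rounds an SGPR demand up to the next multiple of 16, the group size stated for GFX8/GFX9. The function names are hypothetical and this is not the code from the commit. GFX10, which the message says allocates a fixed number of SGPRs, is not modelled.

```cpp
#include <cassert>

// Round count up to the next multiple of granule.
// Assumes count + granule - 1 does not overflow unsigned.
inline unsigned round_up_to_granule(unsigned count, unsigned granule)
{
    assert(granule != 0);
    return (count + granule - 1) / granule * granule;
}

// SGPRs actually reserved for a shader that needs `demand` SGPRs,
// assuming the group size of 16 given in the commit message.
inline unsigned allocated_sgprs_gfx8_gfx9(unsigned demand)
{
    return round_up_to_granule(demand, 16);
}
```

Under this assumption a shader that needs 17 SGPRs occupies 32, so any estimate of how many waves fit has to use the rounded figure rather than the raw demand.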
Daniel Schürmann
93c8ebfa78 aco: Initial commit of independent AMD compiler
ACO (short for AMD Compiler) is a new compiler backend with the goal to replace
LLVM for Radeon hardware for the RADV driver.

ACO currently supports only VS, PS and CS on VI and Vega.
There are some optimizations missing because of unmerged NIR changes
which may decrease performance.

Full commit history can be found at
https://github.com/daniel-schuermann/mesa/commits/backend

Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Co-authored-by: Rhys Perry <pendingchaos02@gmail.com>
Co-authored-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Co-authored-by: Connor Abbott <cwabbott0@gmail.com>
Co-authored-by: Michael Schellenberger Costa <mschellenbergercosta@googlemail.com>
Co-authored-by: Timur Kristóf <timur.kristof@gmail.com>

Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-19 12:10:00 +02:00