Commit graph

54 commits

Author SHA1 Message Date
Daniel Schürmann
39c828cb9f aco: remove aco::rt_stack variable
Since we initialize scratch in the RT proglog,
there is no need for this variable anymore.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21780>
2023-03-16 01:40:30 +00:00
Daniel Schürmann
b338d59047 radv: unconditionally enable scratch for RT shaders
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21159>
2023-02-16 19:37:25 +00:00
Rhys Perry
50073d6135 aco/gfx11: increase gfx1100/gfx1101 physical vgprs
https://reviews.llvm.org/D134522

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18825>
2022-11-02 17:09:32 +00:00
Daniel Schürmann
f676326a1a aco/live_var_analysis: implement faster merging of live_out sets for some cases
If we know that logical and linear predecessors are the same,
we don't need to check for the register type of each variable.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18105>
2022-08-30 16:03:26 +00:00
Rhys Perry
d2d94b62f2 aco: initialize scratch base registers on GFX9-GFX10.3
fossil-db (navi21):
Totals from 1142 (0.70% of 162293) affected shaders:
Instrs: 271636 -> 271974 (+0.12%)
CodeSize: 1532020 -> 1533792 (+0.12%)
Latency: 7484066 -> 7485698 (+0.02%)
InvThroughput: 4048824 -> 4049579 (+0.02%)
SClause: 4171 -> 4212 (+0.98%)
PreSGPRs: 11203 -> 12276 (+9.58%)

fossil-db (vega10):
Totals from 3327 (2.06% of 161355) affected shaders:
Instrs: 257413 -> 257601 (+0.07%)
CodeSize: 1424244 -> 1425372 (+0.08%)
Latency: 8598402 -> 8600466 (+0.02%)
InvThroughput: 7906335 -> 7908234 (+0.02%)
SClause: 4932 -> 4973 (+0.83%)
PreSGPRs: 22010 -> 25405 (+15.42%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17079>
2022-07-08 14:49:03 +00:00
Marek Olšák
39800f0fa3 amd: change chip_class naming to "enum amd_gfx_level gfx_level"
This aligns the naming with PAL.

Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Pierre-Eric Pellou-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16469>
2022-05-13 14:56:22 -04:00
Dave Airlie
a2701bfdb8 aco: move info pointer to a copy.
This is just setup to move this to a different struct later.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16342>
2022-05-11 19:07:11 +00:00
Daniel Schürmann
8d8c59b4cd aco: split num_waves adjustment into separate function
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16039>
2022-04-29 15:39:10 +00:00
Daniel Schürmann
6220046ad1 aco: remove 'max_waves' and use 'num_waves' to adjust for LDS and workgroup size
Totals from 21 (0.02% of 134913) affected shaders: (GFX10.3)
VGPRs: 1024 -> 1176 (+14.84%)
CodeSize: 127824 -> 127664 (-0.13%); split: -0.17%, +0.04%
MaxWaves: 416 -> 378 (-9.13%)
Instrs: 22521 -> 22502 (-0.08%); split: -0.17%, +0.09%
Latency: 146386 -> 143154 (-2.21%); split: -2.21%, +0.00%
InvThroughput: 28379 -> 28944 (+1.99%); split: -0.23%, +2.22%
VClause: 575 -> 579 (+0.70%); split: -0.87%, +1.57%
SClause: 692 -> 645 (-6.79%)
Copies: 780 -> 747 (-4.23%); split: -4.74%, +0.51%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16039>
2022-04-29 15:39:10 +00:00
Daniel Schürmann
b10c4d7dee aco: make program->needs_vcc independent of VCC hints
Totals from 5 (0.00% of 135048) affected shaders: (GFX9)
SGPRs: 208 -> 160 (-23.08%)
CodeSize: 2700 -> 2692 (-0.30%)
Instrs: 533 -> 531 (-0.38%)
Latency: 41688 -> 41680 (-0.02%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15408>
2022-04-13 21:52:43 +00:00
Daniel Schürmann
44fb9ba84a aco/ra: only use VCC if program->needs_vcc == true
A future commit will make VCC register assignment independent
from register hints. Up to GFX9, VCC can alternatively be used
as regular SGPR, so prevent overlap.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15408>
2022-04-13 21:52:43 +00:00
Daniel Schürmann
40a93e271c aco: clang-format
No changes, just formatting.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13087>
2021-09-28 19:48:00 +00:00
Timur Kristóf
589ccf3d77 aco: Consider maximum number of workgroups per CU/WGP on Navi.
No Fossil DB changes.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12517>
2021-08-27 16:41:08 +00:00
Timur Kristóf
c8698199a1 aco: Consider LDS usage by PS inputs in MaxWaves calculation.
Before PS waves are launched, PS inputs are moved from PC to LDS
and the corresponding part of the PC is deallocated.
Each PS input occupies 3 * vec4 (3 * 16 = 48 bytes) of LDS space.
See Figure 10.3 in the GCN3 ISA manual.

These limit occupancy the same way as other stages' LDS usage does.
Note that PS can request additional LDS space via EXTRA_LDS_SIZE,
so that also must be taken into account here.

No Fossil DB changes.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12517>
2021-08-27 16:41:08 +00:00
Rhys Perry
5a536eca9c aco: calculate correct register demand for branch instructions
Since copies for the successor's linear phis are inserted before the
branch, we should consider the definitions and operands of the successor's
linear phis.

Fixes a Detroit: Become Human spilling failure with GCM+GVN.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12035>
2021-08-05 12:01:58 +00:00
Daniel Schürmann
71aab9607d aco/live_var_analysis: change worklist to a single integer
Reduces overall compile times by ~0.45%.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11879>
2021-07-14 18:10:56 +02:00
Daniel Schürmann
1e2639026f aco: Format.
Manually adjusted some comments for more intuitive line breaks.

Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11258>
2021-07-12 21:27:31 +00:00
Daniel Schürmann
3f9e986d33 aco: add missing Licenses and remove Authors from files
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11271>
2021-07-12 12:09:31 +00:00
Rhys Perry
776ba40115 aco: add and use Program::progress
This is used when printing the program and to avoid updating register
demand during post-RA liveness analysis.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10315>
2021-04-21 11:09:33 +00:00
Rhys Perry
655ba1e3a9 aco: don't update register demand during RA validation
It isn't intended to be accurate after RA, so num_waves can become zero,
breaking the sgpr_limit calculation.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10315>
2021-04-21 11:09:33 +00:00
Daniel Schürmann
d75c73e6a6 aco: fix kill flags on phi operands
Fossil-db changes are likely due to how the CSSA pass works.
Totals from 1782 (1.31% of 136546) affected shaders (Raven):
CodeSize: 25333292 -> 25294020 (-0.16%); split: -0.16%, +0.00%
Instrs: 4916059 -> 4908218 (-0.16%); split: -0.16%, +0.00%
Latency: 282860167 -> 282707176 (-0.05%); split: -0.08%, +0.03%
InvThroughput: 136487564 -> 136394958 (-0.07%); split: -0.12%, +0.05%
VClause: 74791 -> 74795 (+0.01%)
Copies: 542115 -> 534280 (-1.45%); split: -1.48%, +0.04%
Branches: 168977 -> 168966 (-0.01%); split: -0.01%, +0.01%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9196>
2021-04-13 18:40:57 +00:00
Rhys Perry
3d4c13f3b8 aco: add DeviceInfo
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8761>
2021-02-15 13:44:22 +00:00
Rhys Perry
b759557cac aco: consider that GFX10.3 allocates LDS in 1024 byte blocks
fossil-db (GFX10.3):
Totals from 3 (0.00% of 139391) affected shaders:
VMEM: 513 -> 511 (-0.39%)
SMEM: 94 -> 92 (-2.13%)
VClause: 31 -> 30 (-3.23%)

fossil-db (GFX10.3, wave32):
Totals from 4 (0.00% of 139391) affected shaders:
VClause: 82 -> 81 (-1.22%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8761>
2021-02-15 13:35:38 +00:00
Rhys Perry
f520f4c299 aco: add Program::wgp_mode
Instead of assuming WGP mode on GFX10+ in different places, add a member
to Program that can be used instead.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8761>
2021-02-15 13:35:14 +00:00
Rhys Perry
592d64611c aco: fix waves calculation for wave32
fossil-db (GFX10.3, wave32):
Totals from 176 (0.13% of 139391) affected shaders:
SGPRs: 16648 -> 16640 (-0.05%)
VGPRs: 18920 -> 19076 (+0.82%); split: -0.30%, +1.12%
CodeSize: 2354172 -> 2354288 (+0.00%); split: -0.01%, +0.01%
MaxWaves: 1618 -> 1627 (+0.56%); split: +0.68%, -0.12%
Instrs: 435756 -> 435761 (+0.00%); split: -0.02%, +0.02%
Cycles: 8858360 -> 8869960 (+0.13%); split: -0.01%, +0.14%
VMEM: 55899 -> 57220 (+2.36%); split: +2.53%, -0.17%
SMEM: 10323 -> 10374 (+0.49%); split: +0.73%, -0.23%
VClause: 8307 -> 8290 (-0.20%); split: -0.24%, +0.04%
SClause: 16573 -> 16577 (+0.02%); split: -0.01%, +0.03%
Copies: 24641 -> 24652 (+0.04%); split: -0.24%, +0.28%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8761>
2021-02-15 13:35:13 +00:00
Daniel Schürmann
5d7b3bf1a7 aco: handle non-temp phi definitions and operands
This will be necessary as we make exec non-temp.

No fossil-db changes.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8870>
2021-02-12 22:41:31 +00:00
Daniel Schürmann
44a76ba16d aco: use VCC as regular SGPR pair on GFX10
There is no need to reserve it for special purposes, only.

Totals from 139391 (100.00% of 139391) affected shaders (Navi10):
VGPRs: 4738296 -> 4738156 (-0.00%); split: -0.01%, +0.00%
SpillSGPRs: 16188 -> 14968 (-7.54%); split: -7.60%, +0.06%
CodeSize: 294204472 -> 294118048 (-0.03%); split: -0.04%, +0.01%
MaxWaves: 2119584 -> 2119619 (+0.00%); split: +0.00%, -0.00%
Instrs: 56075079 -> 56056235 (-0.03%); split: -0.05%, +0.01%
Cycles: 1757781564 -> 1755354032 (-0.14%); split: -0.16%, +0.02%
VMEM: 52995887 -> 52996319 (+0.00%); split: +0.07%, -0.07%
SMEM: 9005338 -> 9004858 (-0.01%); split: +0.16%, -0.17%
VClause: 1178436 -> 1178331 (-0.01%); split: -0.02%, +0.01%
SClause: 2403649 -> 2404542 (+0.04%); split: -0.14%, +0.18%
Copies: 3447073 -> 3432417 (-0.43%); split: -0.66%, +0.23%
Branches: 1166542 -> 1166422 (-0.01%); split: -0.11%, +0.10%
PreSGPRs: 4229322 -> 4235538 (+0.15%)
PreVGPRs: 3817111 -> 3817040 (-0.00%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8921>
2021-02-12 19:00:18 +00:00
Daniel Schürmann
947bf0bd67 aco: don't decrease the vgpr_limit when encountering bpermute
Instead we recalculate vgpr_limit on demand, depending on
the number of needed shared VGPRs.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8921>
2021-02-12 19:00:18 +00:00
Daniel Schürmann
b98a4d4dd7 aco: refactor GPR limit calculation
This patch delays the calculation of GPR limits in order to
precisely incorporate extra registers (VCC etc.) and shared VGPRs.

Additionally, the allocation granularity is used to set the config.
This has some effect on the reported SGPR stats.

Totals (Navi10):
SGPRs: 6971787 -> 17753642 (+154.65%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8921>
2021-02-12 19:00:18 +00:00
Daniel Schürmann
eaf681724e aco: change gpr_alloc_granule to full alignment
This also switches the alloc_granule of Tonga and Iceland
to 96, so that the calculation is consistent.
Also changes the granularity for RDNA to 16 to keep
better stats with the upcoming patch.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8921>
2021-02-12 19:00:18 +00:00
Rhys Perry
d1f93261b1 aco: always set exec_live=false
Register demand calculation for exec masks doesn't always match
get_live_changes() and get_temp_registers(). For now, just set
exec_live=false.

fossil-db (GFX10.3):
Totals from 108230 (77.64% of 139391) affected shaders:
SGPRs: 5759658 -> 5756818 (-0.05%); split: -0.08%, +0.03%
VGPRs: 4061104 -> 4061248 (+0.00%); split: -0.00%, +0.01%
SpillSGPRs: 14114 -> 15198 (+7.68%); split: -0.10%, +7.78%
CodeSize: 266548396 -> 266603288 (+0.02%); split: -0.01%, +0.03%
MaxWaves: 1390885 -> 1390855 (-0.00%); split: +0.00%, -0.00%
Instrs: 50983353 -> 50992972 (+0.02%); split: -0.02%, +0.04%
Cycles: 1733042048 -> 1735443264 (+0.14%); split: -0.02%, +0.16%
VMEM: 41933625 -> 41914722 (-0.05%); split: +0.04%, -0.09%
SMEM: 7197675 -> 7197789 (+0.00%); split: +0.16%, -0.16%
VClause: 1050885 -> 1050978 (+0.01%); split: -0.02%, +0.03%
SClause: 2074913 -> 2071844 (-0.15%); split: -0.23%, +0.08%
Copies: 3181464 -> 3188125 (+0.21%); split: -0.38%, +0.59%
Branches: 1127526 -> 1127716 (+0.02%); split: -0.10%, +0.12%
PreSGPRs: 3376687 -> 3586076 (+6.20%); split: -0.00%, +6.20%
PreVGPRs: 3339740 -> 3339811 (+0.00%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8807>
2021-02-05 17:03:17 +00:00
Rhys Perry
489aa8c7cb aco: fix num_waves on GFX10+
There are half the SIMDs per CU and physical_vgprs should be 512 instead
of 256.

fossil-db (GFX10.3):
Totals from 3622 (2.60% of 139391) affected shaders:
VGPRs: 298192 -> 289732 (-2.84%); split: -3.43%, +0.59%
CodeSize: 29443432 -> 29458388 (+0.05%); split: -0.00%, +0.06%
MaxWaves: 21703 -> 23395 (+7.80%); split: +7.84%, -0.05%
Instrs: 5677920 -> 5681438 (+0.06%); split: -0.01%, +0.07%
Cycles: 280715524 -> 280895676 (+0.06%); split: -0.00%, +0.07%
VMEM: 981142 -> 981894 (+0.08%); split: +0.18%, -0.10%
SMEM: 243315 -> 243454 (+0.06%); split: +0.07%, -0.02%
VClause: 88991 -> 89767 (+0.87%); split: -0.02%, +0.89%
SClause: 200660 -> 200659 (-0.00%); split: -0.00%, +0.00%
Copies: 430729 -> 434160 (+0.80%); split: -0.07%, +0.86%
Branches: 158004 -> 158021 (+0.01%); split: -0.01%, +0.02%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8523>
2021-01-20 16:46:54 +00:00
Samuel Pitoiset
97afb2a0a9 aco: remove unused radv_shader.h includes
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7061>
2020-10-14 15:09:34 +00:00
Samuel Pitoiset
408195ec53 aco: remove useless occurences of radv_nir_compiler_options
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7061>
2020-10-14 15:09:34 +00:00
Rhys Perry
d2c18b7bf3 aco: use bit vectors for liveness sets
This seems to be much faster than hash sets. When compiling pipelines from
5 games, live_var_analysis takes about a third the time it used to and
fossilize-replay is ~1.77% faster.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6733>
2020-09-21 13:47:28 +00:00
Samuel Pitoiset
c2b1978aa4 aco: rework the way various compilation/validation errors are reported
The upcoming change will allow to report all ACO errors (or warnings)
directly to the app via VK_EXT_debug_report. This is similar to what
we already do for reporting various SPIRV->NIR errors.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6318>
2020-08-20 08:15:06 +02:00
Rhys Perry
37988b5b8e aco: fix max_waves_per_simd on Polaris, VegaM and GFX10.3
fossil-db (Polaris):
Totals from 20263 (14.75% of 137414) affected shaders:
SGPRs: 871407 -> 871679 (+0.03%); split: -0.00%, +0.03%
VGPRs: 513828 -> 550028 (+7.05%); split: -1.68%, +8.72%
CodeSize: 18869680 -> 18828148 (-0.22%); split: -0.23%, +0.01%
MaxWaves: 162012 -> 162030 (+0.01%); split: +0.01%, -0.00%
Instrs: 3629172 -> 3618817 (-0.29%); split: -0.30%, +0.02%
Cycles: 15682244 -> 15638244 (-0.28%); split: -0.30%, +0.02%
VMEM: 10675942 -> 10673344 (-0.02%); split: +0.18%, -0.21%
SMEM: 1209717 -> 1206088 (-0.30%); split: +0.03%, -0.33%
VClause: 81780 -> 81227 (-0.68%); split: -0.73%, +0.06%
SClause: 231724 -> 231561 (-0.07%); split: -0.07%, +0.00%
Copies: 187126 -> 180831 (-3.36%); split: -3.62%, +0.26%
Branches: 26841 -> 26837 (-0.01%); split: -0.03%, +0.01%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5546>
2020-08-04 20:39:33 +01:00
Daniel Schürmann
2ae27b96ef aco: change live_out variables to std::unordered_set
Improves performance of live_var_analysis for larger shaders

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4130>
2020-04-09 15:08:57 +00:00
Daniel Schürmann
b66f474121 aco: improve speed of live_var_analysis
by merging live_sgprs and live_vgprs sets.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4130>
2020-04-09 15:08:57 +00:00
Samuel Pitoiset
2f424c83e0 aco: only break SMEM clauses if XNACK is enabled (mostly APUs)
According to LLVM, it seems only required for APUs like RAVEN, but
we still ensure that SMEM stores are in their own clause.

pipeline-db (VEGA10):
Totals from affected shaders:
SGPRS: 1775364 -> 1775364 (0.00 %)
VGPRS: 1287176 -> 1287176 (0.00 %)
Spilled SGPRs: 725 -> 725 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 65386620 -> 65107460 (-0.43 %) bytes
Max Waves: 287099 -> 287099 (0.00 %)

pipeline-db (POLARIS10):
Totals from affected shaders:
SGPRS: 1797743 -> 1797743 (0.00 %)
VGPRS: 1271108 -> 1271108 (0.00 %)
Spilled SGPRs: 730 -> 730 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 64046244 -> 63782324 (-0.41 %) bytes
Max Waves: 254875 -> 254875 (0.00 %)

This only affects GFX6-GFX9 chips because the compiler uses a
different pass for GFX10.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4349>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4349>
2020-04-01 17:50:31 +00:00
Timur Kristóf
0f35b3795d aco: Fix workgroup size calculation.
Clear the workgroup size for all supported shader stages.
Also, unify the workgroup size calculation accross various places.

As a result, insert_waitcnt can use the proper workgroup size
which means that some waits can be dropped from tessellation
shaders. Also, in cases where the previous calculation was wrong,
we now insert s_barrier instructions.

Totals from affected shaders (GFX10):
Code Size: 340116 -> 338484 (-0.48 %) bytes

Fixes: a8d15ab6da
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
2020-03-30 13:09:08 +00:00
Rhys Perry
1872759f55 aco: add a late kill flag
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3914>
2020-03-16 16:09:02 +00:00
Rhys Perry
c51348bd9b aco: move some register demand helpers into aco_live_var_analysis.cpp
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3914>
2020-03-16 16:09:02 +00:00
Rhys Perry
b088a4b113 aco: only reserve sgprs for vcc if it's used
pipeline-db (Vega):

Totals:
SGPRS: 5186302 -> 5075616 (-2.13 %)
VGPRS: 3704580 -> 3704580 (0.00 %)
Spilled SGPRs: 144859 -> 144859 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Scratch size: 4124 -> 4124 (0.00 %) dwords per thread
Code Size: 247315944 -> 247315944 (0.00 %) bytes
LDS: 1311 -> 1311 (0.00 %) blocks
Max Waves: 674560 -> 674562 (0.00 %)

Totals from affected shaders:
SGPRS: 536992 -> 426306 (-20.61 %)
VGPRS: 356404 -> 356404 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 8498748 -> 8498748 (0.00 %) bytes
LDS: 8 -> 8 (0.00 %) blocks
Max Waves: 113832 -> 113834 (0.00 %)

There are some small code size changes in a few RotTR shaders and a small
increase in max_waves in two Detroit: Become Human shaders.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3906>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3906>
2020-03-05 20:18:34 +00:00
Rhys Perry
4e83e05e62 aco: error when block has no logical preds but VGPRs are live at the start
This would have caught the liveness error fixed in the previous commit.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3257>
2020-01-29 18:02:27 +00:00
Rhys Perry
3f96a1ed86 aco: fix operand kill flags when a temporary is used more than once
Helps create v_mac_f32 from v_mad_f32(b, a, b)

Totals from affected shaders:
SGPRS: 35824 -> 35824 (0.00 %)
VGPRS: 33460 -> 33456 (-0.01 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 2187264 -> 2180976 (-0.29 %) bytes
LDS: 127 -> 127 (0.00 %) blocks
Max Waves: 3802 -> 3802 (0.00 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3486>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3486>
2020-01-22 15:55:00 +00:00
Rhys Perry
b5c9688516 aco: limit register usage for large work groups
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2020-01-10 12:10:37 +00:00
Rhys Perry
afe1a8ff5b aco: fix vgpr alloc granule with wave32
We still need to increase the number of physical vgprs

Totals from affected shaders:
SGPRS: 671976 -> 675288 (0.49 %)
VGPRS: 550112 -> 562596 (2.27 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 27621660 -> 27606532 (-0.05 %) bytes
Max Waves: 81083 -> 87833 (8.32 %)
Instructions: 5391560 -> 5389031 (-0.05 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-21 12:38:42 +01:00
Timur Kristóf
e0bcefc3a0 aco/wave32: Use lane mask regclass for exec/vcc.
Currently all usages of exec and vcc are hardcoded to use s2 regclass.
This commit makes it possible to use s1 in wave32 mode and
s2 in wave64 mode.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Daniel Schürmann
78bca0d0ce aco: improve live variable analysis
This patch makes the live variable analysis more precise
w.r.t. killed phi operands and the block's register pressure.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-10-30 19:48:32 +00:00