Rhys Perry
9934c86761
aco: use v_fma_mix to combine mul/add/fma input conversions
...
fossil-db (Sienna Cichlid):
Totals from 11558 (8.57% of 134913) affected shaders:
VGPRs: 829392 -> 825200 (-0.51%); split: -0.52%, +0.02%
SpillSGPRs: 7845 -> 8399 (+7.06%)
CodeSize: 101822704 -> 101677172 (-0.14%); split: -0.25%, +0.11%
MaxWaves: 172216 -> 173182 (+0.56%); split: +0.59%, -0.03%
Instrs: 19061343 -> 18883450 (-0.93%); split: -0.93%, +0.00%
Latency: 256011590 -> 255177378 (-0.33%); split: -0.39%, +0.06%
InvThroughput: 46104438 -> 45604059 (-1.09%); split: -1.12%, +0.04%
VClause: 352211 -> 351948 (-0.07%); split: -0.21%, +0.13%
SClause: 676506 -> 676961 (+0.07%); split: -0.04%, +0.11%
Copies: 1246571 -> 1237745 (-0.71%); split: -0.97%, +0.26%
Branches: 626229 -> 626241 (+0.00%); split: -0.02%, +0.03%
PreSGPRs: 882176 -> 888853 (+0.76%); split: -0.00%, +0.76%
PreVGPRs: 796705 -> 792304 (-0.55%); split: -0.56%, +0.00%
fossil-db (Navi):
Totals from 11558 (8.57% of 134913) affected shaders:
VGPRs: 803900 -> 798660 (-0.65%); split: -0.73%, +0.08%
SpillSGPRs: 7894 -> 8492 (+7.58%); split: -0.10%, +7.68%
CodeSize: 96892596 -> 97134716 (+0.25%); split: -0.05%, +0.29%
MaxWaves: 181454 -> 183014 (+0.86%); split: +0.94%, -0.08%
Instrs: 18186813 -> 18093994 (-0.51%); split: -0.56%, +0.05%
Latency: 253385909 -> 253325528 (-0.02%); split: -0.15%, +0.12%
InvThroughput: 43315355 -> 42805541 (-1.18%); split: -1.33%, +0.15%
VClause: 338755 -> 338535 (-0.06%); split: -0.16%, +0.10%
SClause: 656561 -> 656829 (+0.04%); split: -0.07%, +0.11%
Copies: 1162235 -> 1153558 (-0.75%); split: -1.07%, +0.32%
Branches: 588536 -> 588542 (+0.00%); split: -0.03%, +0.03%
PreSGPRs: 854849 -> 861640 (+0.79%); split: -0.00%, +0.80%
PreVGPRs: 783401 -> 779031 (-0.56%); split: -0.56%, +0.00%
fossil-db (Vega):
Totals from 11516 (8.53% of 135048) affected shaders:
SGPRs: 1072128 -> 1076288 (+0.39%); split: -0.01%, +0.40%
VGPRs: 821312 -> 818124 (-0.39%); split: -0.43%, +0.04%
SpillSGPRs: 11952 -> 12677 (+6.07%)
CodeSize: 96378496 -> 96707596 (+0.34%); split: -0.04%, +0.38%
MaxWaves: 42614 -> 42883 (+0.63%); split: +0.68%, -0.04%
Instrs: 18672844 -> 18600274 (-0.39%); split: -0.44%, +0.05%
Latency: 296658786 -> 296338296 (-0.11%); split: -0.21%, +0.10%
InvThroughput: 111665547 -> 111283559 (-0.34%); split: -0.40%, +0.06%
VClause: 343001 -> 342826 (-0.05%); split: -0.14%, +0.09%
SClause: 646684 -> 646657 (-0.00%); split: -0.05%, +0.04%
Copies: 1715316 -> 1712895 (-0.14%); split: -0.53%, +0.39%
PreSGPRs: 850737 -> 856543 (+0.68%); split: -0.04%, +0.72%
PreVGPRs: 775293 -> 772215 (-0.40%); split: -0.41%, +0.02%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14769 >
2022-03-17 19:04:17 +00:00
Rhys Perry
eeef1bbe65
aco: refactor selection of mad/fma
...
In the future, whether we need to use fma will depend on which
multiplication is chosen.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14769 >
2022-03-17 19:04:17 +00:00
Rhys Perry
6b1dfa7eac
aco: fix neg(mul)/abs(mul) optimization with different bit-size
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14810 >
2022-02-03 16:02:04 +00:00
Rhys Perry
13bbc7c882
aco: don't combine add/mul of different bit-size
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14810 >
2022-02-03 16:02:04 +00:00
Rhys Perry
3d8a8c6fc1
aco: don't apply omod/clamp of different bit-size
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14810 >
2022-02-03 16:02:04 +00:00
Rhys Perry
7e30f99b0a
aco: don't combine fneg/fabs of different bit-size
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14810 >
2022-02-03 16:02:04 +00:00
Rhys Perry
16e0c312fa
aco: preserve pass_flags during format conversions
...
This helps the "vopc() & exec" optimization.
fossil-db (Sienna Cichlid):
Totals from 1638 (1.21% of 134913) affected shaders:
CodeSize: 3331804 -> 3327520 (-0.13%); split: -0.19%, +0.06%
Instrs: 611807 -> 610096 (-0.28%)
Latency: 5579326 -> 5574874 (-0.08%)
InvThroughput: 936782 -> 936731 (-0.01%); split: -0.01%, +0.00%
Copies: 43324 -> 43302 (-0.05%); split: -0.06%, +0.01%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14773 >
2022-01-31 13:45:01 +00:00
Rhys Perry
1804c21fb5
aco: optimize abs(mul(a, b))
...
fossil-db (Sienna Cichlid):
Totals from 18 (0.01% of 134913) affected shaders:
CodeSize: 173924 -> 173852 (-0.04%)
Instrs: 33864 -> 33846 (-0.05%)
Latency: 122233 -> 122211 (-0.02%)
InvThroughput: 22482 -> 22462 (-0.09%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14773 >
2022-01-31 13:45:01 +00:00
Rhys Perry
452975f257
aco: fix neg(abs(mul(a, b))) if the mul is not VOP3
...
Previously, is_abs was just ignored if mul_instr->isVOP3()==false.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 93c8ebfa78 ("aco: Initial commit of independent AMD compiler")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14773 >
2022-01-31 13:45:01 +00:00
Rhys Perry
43e32ad074
aco: consider legacy multiplications in optimizer
...
Optimize omod, -(a*b), b2f(a)*b, a*1, a*0 and create MAD/FMA.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436 >
2022-01-20 22:54:42 +00:00
Tatsuyuki Ishi
da0412e55b
aco: support DPP8
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13971 >
2021-12-31 20:56:39 +00:00
Daniel Schürmann
91f17e1c73
aco/optimizer: apply extract from subdword p_split_vector
...
Totals from 1345 (1.00% of 134572) affected shaders: (GFX10.3)
VGPRs: 76752 -> 76744 (-0.01%); split: -0.02%, +0.01%
SpillSGPRs: 1459 -> 1460 (+0.07%)
SpillVGPRs: 1776 -> 1784 (+0.45%); split: -0.39%, +0.84%
CodeSize: 13310964 -> 13309420 (-0.01%); split: -0.06%, +0.05%
Scratch: 178176 -> 179200 (+0.57%)
Instrs: 2516874 -> 2516860 (-0.00%); split: -0.05%, +0.05%
Latency: 23228506 -> 23230338 (+0.01%); split: -0.14%, +0.15%
InvThroughput: 6002384 -> 6000158 (-0.04%); split: -0.24%, +0.21%
VClause: 41115 -> 41117 (+0.00%); split: -0.28%, +0.29%
SClause: 104639 -> 104664 (+0.02%); split: -0.07%, +0.09%
Copies: 185121 -> 184862 (-0.14%); split: -0.69%, +0.55%
Branches: 100740 -> 100735 (-0.00%); split: -0.01%, +0.00%
PreVGPRs: 70119 -> 69968 (-0.22%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13576 >
2021-12-31 14:52:14 +00:00
Daniel Schürmann
fb622775b5
aco/optimizer: optimize extract(extract())
...
Totals from 53 (0.04% of 134572) affected shaders: (GFX10.3)
SpillVGPRs: 1780 -> 1776 (-0.22%); split: -0.34%, +0.11%
CodeSize: 968352 -> 963196 (-0.53%); split: -0.55%, +0.02%
Scratch: 180224 -> 178176 (-1.14%)
Instrs: 169800 -> 169158 (-0.38%); split: -0.39%, +0.01%
Latency: 6186064 -> 6141408 (-0.72%); split: -1.16%, +0.44%
InvThroughput: 2605044 -> 2582967 (-0.85%); split: -1.37%, +0.52%
VClause: 4851 -> 4866 (+0.31%); split: -0.16%, +0.47%
SClause: 1744 -> 1746 (+0.11%)
Copies: 42874 -> 42325 (-1.28%); split: -1.40%, +0.12%
Branches: 5762 -> 5765 (+0.05%); split: -0.02%, +0.07%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13576 >
2021-12-31 14:52:14 +00:00
Daniel Schürmann
5ad9c20d4a
aco/optimizer: apply extract from p_extract_vector
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13576 >
2021-12-31 14:52:14 +00:00
Daniel Schürmann
11712729eb
aco/optimizer: keep instr_mod_labels after applying extract
...
This allows to use clamp on SDWA and VOP3 opsel instructions.
Totals from 32 (0.02% of 134572) affected shaders: (GFX10.3)
SpillVGPRs: 1783 -> 1780 (-0.17%); split: -0.50%, +0.34%
CodeSize: 881480 -> 881496 (+0.00%); split: -0.15%, +0.15%
Instrs: 154400 -> 154388 (-0.01%); split: -0.13%, +0.12%
Latency: 5021791 -> 5033485 (+0.23%); split: -0.67%, +0.90%
InvThroughput: 2486454 -> 2492312 (+0.24%); split: -0.67%, +0.91%
VClause: 4763 -> 4755 (-0.17%); split: -0.52%, +0.36%
Copies: 42866 -> 42965 (+0.23%); split: -0.25%, +0.48%
Branches: 5640 -> 5639 (-0.02%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13576 >
2021-12-31 14:52:14 +00:00
Daniel Schürmann
30a7199e37
aco/optimizer: propagate and fold inline constants on VOP3P instructions
...
This patch aims to propagate and fold constants on VOP3P instructions
by using omod selection and the fneg modifier.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13688 >
2021-12-21 13:23:36 +01:00
Daniel Schürmann
62bcfcd0a8
aco: change fneg for VOP3P to use fmul with +1.0
...
This will be useful to be able to also apply
fneg_lo and fneg_hi.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13688 >
2021-12-21 13:23:36 +01:00
Daniel Schürmann
193bd740ab
aco/optimizer: fix fneg modifier propagation on VOP3P
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13688 >
2021-12-21 13:23:36 +01:00
Rhys Perry
fa4e08112e
aco: remove SMEM constant/addition combining out of the loop
...
There's no reason for this optimization to be in this loop.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13755 >
2021-12-17 22:14:36 +00:00
Rhys Perry
dd18925f86
aco: skip &-4 before SMEM
...
The hardware ignores the low 2 bits. I'm not sure if they are ignored
before or after the address is calculated, but this optimization should be
cautious enough.
fossil-db (Sienna Cichlid):
Totals from 259 (0.19% of 134572) affected shaders:
SpillSGPRs: 1381 -> 1382 (+0.07%)
SpillVGPRs: 1783 -> 1782 (-0.06%); split: -0.67%, +0.62%
CodeSize: 1598612 -> 1596084 (-0.16%); split: -0.30%, +0.14%
Scratch: 180224 -> 179200 (-0.57%); split: -1.14%, +0.57%
Instrs: 284885 -> 284268 (-0.22%); split: -0.34%, +0.12%
Latency: 6585634 -> 6603388 (+0.27%); split: -0.48%, +0.75%
InvThroughput: 2638983 -> 2648474 (+0.36%); split: -0.58%, +0.94%
VClause: 6797 -> 6820 (+0.34%); split: -0.15%, +0.49%
SClause: 6569 -> 6574 (+0.08%); split: -1.11%, +1.19%
Copies: 50561 -> 50586 (+0.05%); split: -0.61%, +0.66%
Branches: 10058 -> 10062 (+0.04%); split: -0.01%, +0.05%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13755 >
2021-12-17 22:14:36 +00:00
Rhys Perry
cf5fc4b973
aco: disallow SMEM offsets that are not multiples of 4
...
These can't be encoded on GFX6/7, and combining these additions causes
CTS failures on GFX10.3.
I think the low 2 MSBs are ignored before the addition, not after, so
load(a + 3, 0) becomes load(a, 3), which is the same as load(a, 0).
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13755 >
2021-12-17 22:14:36 +00:00
Rhys Perry
94603786c5
aco: fix check_vop3_operands() for f16vec2 ffma fneg combine
...
For v_pk_fma_f16, we should consider all three operands, not the first
two.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 15a375b4c8 ("radv,aco: don't lower some ffma instructions")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14229 >
2021-12-17 11:16:12 +00:00
Rhys Perry
165ca5088b
radv,aco: implement nir_op_ffma
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9805 >
2021-12-13 11:22:33 +00:00
Rhys Perry
f4f5d577fc
aco: swap operands if necessary to create v_madak/v_fmaak
...
Also rewrite the check_literal logic to be more straightforward.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9805 >
2021-12-13 11:22:33 +00:00
Rhys Perry
2665320c78
aco: create v_fmamk_f32/v_fmaak_f32 from nir_op_ffma
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9805 >
2021-12-13 11:22:33 +00:00
Rhys Perry
a487747ebd
aco: use more predictable tiebreaker when forming MADs
...
fossil-db (GFX10.3):
Totals from 84981 (58.10% of 146267) affected shaders:
VGPRs: 3829896 -> 3820480 (-0.25%); split: -0.33%, +0.08%
CodeSize: 270860472 -> 270850132 (-0.00%); split: -0.08%, +0.08%
MaxWaves: 2035822 -> 2042516 (+0.33%); split: +0.39%, -0.06%
Instrs: 51285526 -> 51308869 (+0.05%); split: -0.03%, +0.08%
Latency: 931503706 -> 932556231 (+0.11%); split: -0.19%, +0.30%
InvThroughput: 217084232 -> 217070849 (-0.01%); split: -0.12%, +0.11%
fossil-db (GFX10):
Totals from 85520 (58.47% of 146267) affected shaders:
VGPRs: 3729132 -> 3725344 (-0.10%); split: -0.21%, +0.10%
CodeSize: 272796500 -> 272783084 (-0.00%); split: -0.09%, +0.08%
MaxWaves: 2246410 -> 2249012 (+0.12%); split: +0.17%, -0.05%
Instrs: 51643962 -> 51664865 (+0.04%); split: -0.04%, +0.08%
Latency: 932331949 -> 933274979 (+0.10%); split: -0.19%, +0.29%
InvThroughput: 214187040 -> 214130994 (-0.03%); split: -0.13%, +0.11%
fossil-db (GFX9):
Totals from 84619 (57.80% of 146401) affected shaders:
SGPRs: 5366240 -> 5366944 (+0.01%); split: -0.09%, +0.10%
VGPRs: 3765608 -> 3764972 (-0.02%); split: -0.23%, +0.22%
CodeSize: 263634732 -> 263616320 (-0.01%); split: -0.08%, +0.08%
MaxWaves: 546617 -> 547091 (+0.09%); split: +0.18%, -0.09%
Instrs: 51426195 -> 51458334 (+0.06%); split: -0.03%, +0.10%
Latency: 1164445660 -> 1161923480 (-0.22%); split: -0.46%, +0.24%
InvThroughput: 542964697 -> 542329595 (-0.12%); split: -0.26%, +0.14%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9805 >
2021-12-13 11:22:33 +00:00
Rhys Perry
65a78b2252
aco: properly update use counts if a extract is still used
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13909 >
2021-11-29 18:52:12 +00:00
Samuel Pitoiset
add883bf9b
aco: fix right shift of exponent 32 detected by UBSAN
...
src/amd/compiler/aco_optimizer.cpp:1316:17: runtime error: shift
exponent 32 is too large for 32-bit type 'unsigned int'
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13951 >
2021-11-25 16:15:30 +00:00
Rhys Perry
9bc0fc89c8
aco: disable mul(cndmask(0, 1, b), a) optimization sometimes
...
This optimization doesn't work for SDWA or DPP multiplications and we
can't do it if denormal flushing is required because v_cndmask_b32 doesn't
do that and we can't do it if we can't assume operands are finite because
0.0 * inf is NaN, not 0.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13434 >
2021-10-20 07:35:52 +00:00
Timur Kristóf
ca6ef505ff
aco/optimizer: Skip SDWA on v_lshlrev when unnecessary in apply_extract.
...
In the following cases:
- lower 16 bits are extracted and the shift amount is 16 or more
- lower 8 bits are extracted and the shift amount is 24 or more
the undesireable upper bits are already shifted out, and therefore
there is no need to add SDWA to the v_lshlrev instruction.
Fossil DB stats on Sienna Cichlid with NGGC on:
Totals from 58239 (45.27% of 128647) affected shaders:
CodeSize: 153498624 -> 153265616 (-0.15%); split: -0.15%, +0.00%
Instrs: 29636304 -> 29578064 (-0.20%); split: -0.20%, +0.00%
Latency: 136931496 -> 136876379 (-0.04%); split: -0.04%, +0.00%
InvThroughput: 21134367 -> 21078861 (-0.26%); split: -0.26%, +0.00%
Copies: 2777550 -> 2777548 (-0.00%); split: -0.00%, +0.00%
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13121 >
2021-10-12 16:27:50 +00:00
Daniel Schürmann
40a93e271c
aco: clang-format
...
No changes, just formatting.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13087 >
2021-09-28 19:48:00 +00:00
Timur Kristóf
d3e0cf3d32
aco: Omit p_extract after ds_read with matching bit size.
...
Fossil DB stats on Sienna Cichlid:
Totals from 135 (0.10% of 128647) affected shaders:
CodeSize: 525184 -> 523704 (-0.28%)
Instrs: 92835 -> 92684 (-0.16%)
Latency: 311528 -> 311055 (-0.15%)
InvThroughput: 86572 -> 86455 (-0.14%)
Copies: 7666 -> 7650 (-0.21%)
Fossil DB stats on Sienna Cichlid with NGGC on:
Totals from 58374 (45.38% of 128647) affected shaders:
CodeSize: 160322912 -> 159622564 (-0.44%)
Instrs: 30755822 -> 30639193 (-0.38%)
Latency: 136713768 -> 136690360 (-0.02%)
InvThroughput: 21739219 -> 21658151 (-0.37%)
Copies: 3297969 -> 3297953 (-0.00%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11560 >
2021-09-28 17:59:27 +00:00
Timur Kristóf
f2e41eda9e
aco: Add ability to optimize v_lshl + v_sub into v_mad_i32_i24.
...
Also change combine_add_lshl to use check_vop3_operands instead
of its own checks of the operands.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12786 >
2021-09-20 12:39:03 +02:00
Filip Gawin
6b5e9352ef
aco: cleanup assignment of unique_ptrs
...
Reviewed-by: Joshua Ashton <joshua@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12903 >
2021-09-18 11:09:24 +00:00
Daniel Schürmann
0988f7b9ba
aco: remove explicit dst_preserve flag
...
Instead, we can rely on the fact that subdword definitions
must preserve the unused bits while dword definitions either
pad or sign-extend.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12640 >
2021-09-02 20:39:17 +02:00
Daniel Schürmann
9e3ff06c38
aco: rewrite SDWA selector
...
This commit introduces a new struct SubdwordSel
in order to ease and clean up the usage of SDWA
selections. This includes removing the distinction
between register-allocated and fixed SDWA selections.
Instead, SDWA selections can now also access the high
bits of subdword variables. Alignment and sizes are
validated accordingly. Size, offset and sign_extend
can be evaluated via helper methods.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12640 >
2021-09-02 20:39:17 +02:00
Rhys Perry
33ddbd220f
aco: remove DPP when applying constants/literals/sgprs
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12601 >
2021-08-31 16:58:20 +00:00
Rhys Perry
e27946ca11
aco: don't constant propagate to DPP instructions
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12601 >
2021-08-31 16:58:20 +00:00
Timur Kristóf
76b9dd6266
aco: Unset 16 and 24-bit flags from operands in apply_extract.
...
Consider the following sequence in a shader:
b = p_extract a
c = v_mad_u32_u16 b, X, 0
The optimizer applies extract, resulting in:
c = v_mad_u32_u16 a, X, 0 (correct)
Then it mistakenly turns that into:
c = v_mul_u32_u24 a, X, 0 (incorrect)
In this case, the p_extract is applied to v_mad_u32_u16 by
apply_extract. After this, we can no longer be sure that
the operands are still 16 or 24-bit, so we have to remove
this flag.
No Fossil DB changes.
Fixes: 54292e99c7
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12558 >
2021-08-30 14:05:33 +00:00
Daniel Schürmann
2eeaaabb8e
aco/optimizer: combine v_pk_mul_u16 + v_pk_add_u16 -> v_pk_mad_u16
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11678 >
2021-08-27 19:57:59 +00:00
Daniel Schürmann
be16ebc5ca
aco/optimizer: fuse v_mul_f64 + v_add_f64 -> v_fma_f64
...
No fossil-db changes.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11678 >
2021-08-27 19:57:59 +00:00
Daniel Schürmann
8e27ca9953
aco/optimizer: combine v_mul_lo_u16 + v_add_u16 -> v_mad_u16
...
Totals from 192 (0.13% of 150170) affected shaders: (GFX10.3)
CodeSize: 1027224 -> 1019872 (-0.72%)
Instrs: 174784 -> 173863 (-0.53%)
Latency: 4235742 -> 4232177 (-0.08%); split: -0.11%, +0.03%
InvThroughput: 1777026 -> 1775945 (-0.06%); split: -0.09%, +0.03%
Copies: 34098 -> 34099 (+0.00%); split: -0.03%, +0.03%
PreVGPRs: 4920 -> 4850 (-1.42%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11678 >
2021-08-27 19:57:59 +00:00
Daniel Schürmann
23d5865f42
aco: refactor nir_op_imul selection
...
Previously, the optimization to use v_mul_lo_u16 for
32bit multiplications was done in instruction_selection.
This was moved to the optimizer to ease some case distinctions.
The mixed results are due to increased use of SDWA.
Totals from 2616 (1.74% of 150170) affected shaders: (GFX10.3)
VGPRs: 143888 -> 143872 (-0.01%); split: -0.02%, +0.01%
CodeSize: 5604032 -> 5604080 (+0.00%); split: -0.01%, +0.01%
Instrs: 1086798 -> 1083915 (-0.27%); split: -0.27%, +0.01%
Latency: 8215793 -> 8213023 (-0.03%); split: -0.10%, +0.07%
InvThroughput: 20765157 -> 20773766 (+0.04%); split: -0.02%, +0.06%
VClause: 35256 -> 35260 (+0.01%); split: -0.02%, +0.03%
SClause: 29021 -> 29024 (+0.01%); split: -0.00%, +0.01%
Copies: 74163 -> 74306 (+0.19%); split: -0.05%, +0.24%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11678 >
2021-08-27 19:57:59 +00:00
Daniel Schürmann
d8eef134d8
aco: only apply extract if not used more than 4 times
...
Totals from 61 (0.04% of 150170) affected shaders: (GFX10.3)
CodeSize: 1087732 -> 1087380 (-0.03%); split: -0.22%, +0.18%
Instrs: 192343 -> 192205 (-0.07%); split: -0.16%, +0.09%
Latency: 7231670 -> 7148073 (-1.16%); split: -1.19%, +0.04%
InvThroughput: 3436715 -> 3394926 (-1.22%); split: -1.25%, +0.04%
VClause: 4831 -> 4833 (+0.04%)
Copies: 50130 -> 49934 (-0.39%); split: -0.67%, +0.28%
Branches: 5945 -> 5948 (+0.05%)
PreSGPRs: 3486 -> 3472 (-0.40%)
PreVGPRs: 5154 -> 5152 (-0.04%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11678 >
2021-08-27 19:57:59 +00:00
Timur Kristóf
abcc83e713
aco: Fix to_uniform_bool_instr when operands are not suitable.
...
Don't attempt to transform uniform boolean instructions when
their operands are unsuitable. This can happen eg. due to other
optimizations that combine SALU instructions which clear out
the uniform instruction labels.
Cc: mesa-stable
Fixes: 8a32f57fff
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11573 >
2021-08-25 12:43:50 +00:00
Rhys Perry
2201f5a58c
aco: remove label_extract if the extract is used by a non-VALU
...
If an extract is used by a non-VALU instruction, it can't be applied to
all instructions, so it's not beneficial to try to apply it.
This check isn't needed because can_apply_extract()/can_use_SDWA() should
already handle non-VALU instructions.
fossil-db (Sienna Cichlid):
Totals from 1020 (0.68% of 150170) affected shaders:
SpillSGPRs: 1577 -> 1571 (-0.38%)
CodeSize: 7863668 -> 7858336 (-0.07%); split: -0.07%, +0.00%
Instrs: 1431583 -> 1431083 (-0.03%); split: -0.04%, +0.01%
Latency: 25891250 -> 25890916 (-0.00%); split: -0.01%, +0.01%
InvThroughput: 7248683 -> 7248655 (-0.00%); split: -0.01%, +0.01%
SClause: 49072 -> 49071 (-0.00%)
Copies: 126649 -> 126580 (-0.05%); split: -0.11%, +0.06%
Branches: 39129 -> 39120 (-0.02%); split: -0.03%, +0.01%
PreSGPRs: 53071 -> 52943 (-0.24%); split: -0.26%, +0.02%
PreVGPRs: 57437 -> 57435 (-0.00%); split: -0.01%, +0.01%
fossil-db (Polaris10):
Totals from 654 (0.43% of 151696) affected shaders:
CodeSize: 5814552 -> 5811568 (-0.05%); split: -0.05%, +0.00%
Instrs: 1105783 -> 1105049 (-0.07%); split: -0.07%, +0.00%
Latency: 20261458 -> 20259744 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 9011785 -> 9011749 (-0.00%); split: -0.00%, +0.00%
Copies: 104693 -> 103904 (-0.75%); split: -0.76%, +0.00%
PreSGPRs: 36105 -> 36095 (-0.03%); split: -0.03%, +0.01%
PreVGPRs: 43813 -> 43809 (-0.01%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12212 >
2021-08-23 14:56:37 +01:00
Rhys Perry
2e6834d4f6
aco: combine DPP into VALU before RA
...
Mostly helps a bunch of Cyberpunk 2077 shaders. Catches some of the cases
that the post-RA can't optimize because of register assignment.
fossil-db (Siena Cichlid):
Totals from 25 (0.02% of 150170) affected shaders:
CodeSize: 78808 -> 75764 (-3.86%)
Instrs: 14311 -> 13547 (-5.34%)
Latency: 278697 -> 277885 (-0.29%)
InvThroughput: 63428 -> 62754 (-1.06%)
Copies: 1348 -> 1349 (+0.07%); split: -0.07%, +0.15%
PreVGPRs: 1035 -> 1011 (-2.32%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11924 >
2021-08-19 18:17:33 +00:00
Rhys Perry
b97cfd72af
aco: handle DPP in the optimizer
...
There are a bunch of optimizations that are broken when DPP is involved.
fossil-db (Sienna Cichlid):
Totals from 100 (0.07% of 150170) affected shaders:
CodeSize: 325204 -> 325192 (-0.00%); split: -0.06%, +0.05%
Instrs: 62773 -> 62664 (-0.17%); split: -0.18%, +0.00%
Latency: 295348 -> 295266 (-0.03%); split: -0.03%, +0.00%
InvThroughput: 73990 -> 73946 (-0.06%); split: -0.06%, +0.01%
Copies: 1650 -> 1609 (-2.48%); split: -2.55%, +0.06%
PreSGPRs: 3554 -> 3520 (-0.96%)
Fossil-db changes are probably because v_sub_f32_dpp(v_mul_f32) is no
longer being combined into MAD and then split back into separate
instructions.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11924 >
2021-08-19 18:17:33 +00:00
Rhys Perry
1d894a8c85
aco: move a bunch of helpers into aco_ir.h/aco_ir.cpp
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11924 >
2021-08-19 18:17:33 +00:00
Rhys Perry
211d1dfd34
aco: don't create v_madmk_f32/v_madak_f32 from v_fma_legacy_f16
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5105
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12004 >
2021-07-22 15:43:31 +00:00