Georg Lehmann
87d13ee73d
aco: combine a | ~b to bfi(b, a, -1)
...
Somehow I missed this when writing the a & ~b patch.
Foz-DB Navi21:
Totals from 1591 (1.20% of 132657) affected shaders:
Instrs: 2316379 -> 2315940 (-0.02%)
CodeSize: 12524240 -> 12528724 (+0.04%); split: -0.00%, +0.04%
Latency: 45393195 -> 45389285 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 8658991 -> 8657944 (-0.01%); split: -0.01%, +0.00%
Copies: 135777 -> 135778 (+0.00%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24505 >
2023-08-23 20:06:49 +00:00
Rhys Perry
9169fbf83c
aco: clarify bpermute pseudo opcode names
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24693 >
2023-08-23 12:36:46 +01:00
Georg Lehmann
1659d982c3
aco: combine a & ~b to bfi(b, 0, a)
...
Foz-DB Navi21:
Totals from 905 (0.68% of 132657) affected shaders:
Instrs: 1223583 -> 1221016 (-0.21%); split: -0.22%, +0.01%
CodeSize: 6567272 -> 6567064 (-0.00%); split: -0.04%, +0.03%
SpillSGPRs: 1231 -> 1223 (-0.65%)
SpillVGPRs: 829 -> 823 (-0.72%); split: -1.45%, +0.72%
Latency: 40952209 -> 40946230 (-0.01%); split: -0.02%, +0.01%
InvThroughput: 9411929 -> 9397932 (-0.15%); split: -0.17%, +0.02%
VClause: 29108 -> 29112 (+0.01%); split: -0.04%, +0.05%
Copies: 105272 -> 105221 (-0.05%); split: -0.28%, +0.23%
Branches: 29330 -> 29329 (-0.00%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24383 >
2023-08-01 07:15:28 +00:00
Georg Lehmann
5099137612
aco/optimizer: delete s_bitcmp optimization
...
This is now done in NIR.
No Foz-DB changes.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23298 >
2023-06-29 13:39:30 +00:00
Timur Kristóf
05928f4200
aco: Use ac_hw_stage instead of aco-specific HWStage.
...
The new ac_hw_stage is going to be used by drivers as well.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23597 >
2023-06-23 12:49:04 +00:00
Georg Lehmann
71d30bcede
aco: combine scalar mul+pk_add to pk_fma
...
Foz-DB Navi21:
Totals from 12 (0.01% of 134913) affected shaders:
CodeSize: 37860 -> 37668 (-0.51%)
Instrs: 6757 -> 6733 (-0.36%)
Latency: 25632 -> 25589 (-0.17%)
InvThroughput: 2637 -> 2622 (-0.57%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21819 >
2023-06-20 14:29:24 +00:00
Georg Lehmann
6db61d0dc0
aco: use uses helpers for pk_fma opt
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21819 >
2023-06-20 14:29:24 +00:00
Eric Engestrom
6b21653ab4
aco: reformat according to its .clang-format
...
Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23253 >
2023-06-16 19:59:52 +00:00
Georg Lehmann
dfb6d3e443
aco: use v_fma_mix for f2f32 and f2f16 on gfx11 if wave64
...
v_fma_mix can be dual issued, trade some code size for throughput.
Foz-DB GFX1100:
Totals from 8204 (6.08% of 134864) affected shaders:
CodeSize: 89608584 -> 89693968 (+0.10%)
Latency: 160744811 -> 160699309 (-0.03%); split: -0.03%, +0.00%
InvThroughput: 19737977 -> 19678308 (-0.30%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21402 >
2023-06-07 12:30:11 +00:00
Georg Lehmann
177dba62a1
aco: use v_add_f{16,32} with clamp for fsat
...
v_add can be dual issued on gfx11, v_med3 cannot.
Don't use v_add directly to still optimize omod(fsat(x)).
Foz-DB GFX1100:
Totals from 32702 (24.24% of 134913) affected shaders:
Latency: 475008203 -> 474928037 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 59226198 -> 59140787 (-0.14%); split: -0.14%, +0.00%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21402 >
2023-06-07 12:30:11 +00:00
Georg Lehmann
a7cef01db1
aco/optimizer: allow DPP to use VOP3 on GFX11
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23059 >
2023-05-18 22:57:24 +00:00
Georg Lehmann
476149f90d
aco: use can_use_input_modifiers helper
...
Foz-DB GFX1100:
Totals from 80 (0.06% of 132657) affected shaders:
CodeSize: 504500 -> 503660 (-0.17%)
Instrs: 95033 -> 94824 (-0.22%)
Latency: 629695 -> 629235 (-0.07%)
InvThroughput: 97105 -> 97008 (-0.10%)
VClause: 1779 -> 1777 (-0.11%)
Copies: 3233 -> 3236 (+0.09%); split: -0.03%, +0.12%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23059 >
2023-05-18 22:57:24 +00:00
Georg Lehmann
644c5e95a0
aco: use get_operand_size for dpp opt
...
This matters now that v_fma_mixlo_f16/v_fma_mixhi_f16 can use dpp.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23059 >
2023-05-18 22:57:24 +00:00
Georg Lehmann
12b28d64ab
aco/gfx11: use fmamk/fmaak with opsel
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23059 >
2023-05-18 22:57:23 +00:00
Georg Lehmann
6a53af3fc8
aco: introduce helper to swap valu operands with modifiers
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23059 >
2023-05-18 22:57:23 +00:00
Georg Lehmann
04976beac7
aco: don't apply dpp if the alu instr uses the operand twice
...
CP77 has a ton of fma(dpp(a), dpp(a), b).
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22698 >
2023-05-12 13:31:16 +00:00
Georg Lehmann
151bcc1e8b
aco: use VOP3+DPP
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22698 >
2023-05-12 13:31:16 +00:00
Georg Lehmann
d0e73cb313
aco/optimizer: copy pass flags for newly created valu instructions
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22698 >
2023-05-12 13:31:15 +00:00
Georg Lehmann
d27e03d719
aco/optimizer: don't use pass_flags for mad idx
...
fma can use DPP on GFX11+, so we want to keep the exec id in pass_flags
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22698 >
2023-05-12 13:31:15 +00:00
Timur Kristóf
38447b3f63
aco: Disallow constant propagation on SOPP and fixed operands.
...
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22690 >
2023-05-04 19:08:58 +00:00
Rhys Perry
1a6095b36e
aco: remove SMEM_instruction::prevent_overflow
...
This doesn't seem useful anymore, and it seems we forgot to set it in a
few places.
This commit changes the behaviour of the optimizer so that
prevent_overflow is always true.
fossil-db (navi21):
Totals from 7421 (5.47% of 135636) affected shaders:
Instrs: 5402823 -> 5440126 (+0.69%); split: -0.00%, +0.69%
CodeSize: 28731300 -> 28974152 (+0.85%); split: -0.00%, +0.85%
VGPRs: 317528 -> 317552 (+0.01%)
SpillSGPRs: 419 -> 415 (-0.95%)
Latency: 40712478 -> 40783115 (+0.17%); split: -0.01%, +0.19%
InvThroughput: 7612708 -> 7616751 (+0.05%); split: -0.00%, +0.06%
VClause: 123824 -> 123848 (+0.02%); split: -0.09%, +0.11%
SClause: 161915 -> 172741 (+6.69%); split: -0.03%, +6.71%
Copies: 393015 -> 394429 (+0.36%); split: -0.20%, +0.56%
PreSGPRs: 288658 -> 289603 (+0.33%); split: -0.04%, +0.36%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8864
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22553 >
2023-04-19 19:29:48 +00:00
Harri Nieminen
aea48a4ff1
amd: fix typos
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22432 >
2023-04-13 23:08:22 +00:00
Timur Kristóf
cff02468c6
aco: Fix optimization of v_cmp with subgroup invocation.
...
There was a typo in this optimization which went unnoticed.
Fixes: 2c40215ab9
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22393 >
2023-04-10 19:15:27 +00:00
Rhys Perry
0f60c18f29
aco: don't optimize s_or_b64(v_cmp_u_f32(a, b), cmp(a, a))
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22214 >
2023-03-31 19:41:54 +00:00
Georg Lehmann
bb7c2b70c1
aco/optimizer: remove to_SDWA
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22069 >
2023-03-30 03:34:34 +00:00
Georg Lehmann
e699a4181c
aco: keep label_mul/usedef/minmax in apply_extract
...
16bit int mad/fma/minmax combining can work with opsel set.
All other optimizations should already check if the instruction uses sdwa,
because we don't check this when applying the label initially.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22069 >
2023-03-30 03:34:34 +00:00
Georg Lehmann
7014145ab2
aco/optimizer: use opsel for VOP12C
...
Foz-DB GFX1100:
Totals from 11759 (8.72% of 134864) affected shaders:
VGPRs: 848288 -> 844556 (-0.44%); split: -0.44%, +0.00%
SpillSGPRs: 8527 -> 8543 (+0.19%)
SpillVGPRs: 1411 -> 1423 (+0.85%); split: -0.21%, +1.06%
CodeSize: 114337120 -> 113882472 (-0.40%); split: -0.40%, +0.01%
Scratch: 128768 -> 129024 (+0.20%); split: -0.20%, +0.40%
MaxWaves: 250962 -> 252014 (+0.42%)
Instrs: 22187426 -> 22062378 (-0.56%); split: -0.57%, +0.00%
Latency: 232655375 -> 232376977 (-0.12%); split: -0.20%, +0.08%
InvThroughput: 28292530 -> 28217699 (-0.26%); split: -0.45%, +0.18%
VClause: 352463 -> 352364 (-0.03%); split: -0.12%, +0.10%
SClause: 659282 -> 659354 (+0.01%); split: -0.02%, +0.04%
Copies: 1371369 -> 1342340 (-2.12%); split: -2.30%, +0.19%
Branches: 495903 -> 495941 (+0.01%); split: -0.00%, +0.01%
PreSGPRs: 867295 -> 863664 (-0.42%)
PreVGPRs: 793480 -> 790549 (-0.37%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22069 >
2023-03-30 03:34:34 +00:00
Georg Lehmann
3907c54443
aco: don't label mul with opsel as abs/neg
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22069 >
2023-03-30 03:34:34 +00:00
Georg Lehmann
a60b9313d3
aco: swap opsel when swapping VOP2/C operands
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22069 >
2023-03-30 03:34:34 +00:00
Georg Lehmann
82f7b3acfa
aco: support neg(mul)/abs(mul) optimization in more cases
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22069 >
2023-03-30 03:34:34 +00:00
Georg Lehmann
9d841507e1
aco: support v_cvt_f32_f16 with opsel in combine_mad_mix
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22069 >
2023-03-30 03:34:34 +00:00
Georg Lehmann
9d6e223a7a
aco: update match_op3_for_vop3 for VOP12C opsel
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22069 >
2023-03-30 03:34:34 +00:00
Georg Lehmann
0896ecec9a
aco: handle opsel in combine_constant_comparison_ordering
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22069 >
2023-03-30 03:34:34 +00:00
Georg Lehmann
d8f07a0ddc
aco: handle opsel in combine_ordering_test
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22069 >
2023-03-30 03:34:34 +00:00
Georg Lehmann
4db43415e5
aco: handle opsel in combine_comparison_ordering
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22069 >
2023-03-30 03:34:34 +00:00
Georg Lehmann
8e6d79d10d
aco/optimizer: preserve opsel when fusing fma
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22069 >
2023-03-30 03:34:34 +00:00
Georg Lehmann
fd3ea4ffc2
aco: clean up to_mad_mix
...
These instructions are 32bit, so they don't support opsel anyway.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22103 >
2023-03-28 23:30:08 +00:00
Georg Lehmann
b1668aedaf
aco: don't check usesModifiers for pseudo instructions
...
This can't happen.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22106 >
2023-03-27 14:22:07 +00:00
Timur Kristóf
b3933ffe60
aco: Don't add soffset to swizzled MUBUF base.
...
No Fossil DB changes on Rembrandt (GFX10.3).
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21930 >
2023-03-17 00:34:20 +00:00
Georg Lehmann
13ff4a5f64
aco: use bitfield_array for temporary neg/abs/opsel
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21766 >
2023-03-09 14:15:14 +00:00
Georg Lehmann
828aff2a2d
aco: use array indexing for opsel/opsel_lo/opsel_hi
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21766 >
2023-03-09 14:15:13 +00:00
Georg Lehmann
a47c3f84fb
aco: use integer access for neg_lo/neg_hi
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21766 >
2023-03-09 14:15:13 +00:00
Georg Lehmann
60cd3ba39f
aco: copy abs/neg with assignment
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21766 >
2023-03-09 14:15:13 +00:00
Georg Lehmann
0614c2e8bd
aco: don't reallocate fma{mk,ak,_mix} instruction
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21762 >
2023-03-08 18:42:21 +00:00
Georg Lehmann
a4873071e6
aco/optimizer: don't reallocate instruction when converting to VOP3
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21762 >
2023-03-08 18:42:21 +00:00
Georg Lehmann
de4805f25f
aco: use bitfield array helpers for valu modifiers
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21023 >
2023-03-07 11:53:23 +00:00
Georg Lehmann
097a97cc42
aco: remove VOP[123C]P? structs
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21023 >
2023-03-07 11:53:23 +00:00
Georg Lehmann
08542318e7
aco/optimizer: simplify using VALU instruction
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21023 >
2023-03-07 11:53:23 +00:00
Georg Lehmann
ede0630f9e
aco: use v_fma_mix_f32 for v_fma_f32 with 2 fp16 representable, different literals
...
We can pack two fp16 literals into one 32bit literal and use opsel to select
the correct value. Note that LLVM currently disassembles these instructions
incorrectly.
Foz-DB Navi21:
Totals from 13365 (9.91% of 134913) affected shaders:
VGPRs: 840880 -> 840016 (-0.10%); split: -0.11%, +0.01%
SpillSGPRs: 724 -> 722 (-0.28%)
CodeSize: 82439364 -> 82451336 (+0.01%); split: -0.06%, +0.08%
MaxWaves: 244858 -> 244980 (+0.05%)
Instrs: 15265976 -> 15247201 (-0.12%); split: -0.13%, +0.01%
Latency: 223316180 -> 223272495 (-0.02%); split: -0.03%, +0.02%
InvThroughput: 41981375 -> 41969917 (-0.03%); split: -0.04%, +0.01%
VClause: 266775 -> 266558 (-0.08%); split: -0.14%, +0.06%
SClause: 646602 -> 645996 (-0.09%); split: -0.16%, +0.07%
Copies: 794703 -> 776075 (-2.34%); split: -2.46%, +0.12%
Branches: 296317 -> 296316 (-0.00%)
PreSGPRs: 658796 -> 656479 (-0.35%); split: -0.35%, +0.00%
PreVGPRs: 744014 -> 743679 (-0.05%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20587 >
2023-03-02 10:59:05 +00:00
Georg Lehmann
ed349951cb
aco: mark mad definition as precise if the mul/add were precise
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20587 >
2023-03-02 10:59:05 +00:00