Commit graph

307 commits

Author SHA1 Message Date
Timur Kristóf
b3933ffe60 aco: Don't add soffset to swizzled MUBUF base.
No Fossil DB changes on Rembrandt (GFX10.3).

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21930>
2023-03-17 00:34:20 +00:00
Georg Lehmann
13ff4a5f64 aco: use bitfield_array for temporary neg/abs/opsel
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21766>
2023-03-09 14:15:14 +00:00
Georg Lehmann
828aff2a2d aco: use array indexing for opsel/opsel_lo/opsel_hi
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21766>
2023-03-09 14:15:13 +00:00
Georg Lehmann
a47c3f84fb aco: use integer access for neg_lo/neg_hi
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21766>
2023-03-09 14:15:13 +00:00
Georg Lehmann
60cd3ba39f aco: copy abs/neg with assignment
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21766>
2023-03-09 14:15:13 +00:00
Georg Lehmann
0614c2e8bd aco: don't reallocate fma{mk,ak,_mix} instruction
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21762>
2023-03-08 18:42:21 +00:00
Georg Lehmann
a4873071e6 aco/optimizer: don't reallocate instruction when converting to VOP3
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21762>
2023-03-08 18:42:21 +00:00
Georg Lehmann
de4805f25f aco: use bitfield array helpers for valu modifiers
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21023>
2023-03-07 11:53:23 +00:00
Georg Lehmann
097a97cc42 aco: remove VOP[123C]P? structs
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21023>
2023-03-07 11:53:23 +00:00
Georg Lehmann
08542318e7 aco/optimizer: simplify using VALU instruction
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21023>
2023-03-07 11:53:23 +00:00
Georg Lehmann
ede0630f9e aco: use v_fma_mix_f32 for v_fma_f32 with 2 fp16 representable, different literals
We can pack two fp16 literals into one 32bit literal and use opsel to select
the correct value. Note that LLVM currently disassembles these instructions
incorrectly.

Foz-DB Navi21:
Totals from 13365 (9.91% of 134913) affected shaders:
VGPRs: 840880 -> 840016 (-0.10%); split: -0.11%, +0.01%
SpillSGPRs: 724 -> 722 (-0.28%)
CodeSize: 82439364 -> 82451336 (+0.01%); split: -0.06%, +0.08%
MaxWaves: 244858 -> 244980 (+0.05%)
Instrs: 15265976 -> 15247201 (-0.12%); split: -0.13%, +0.01%
Latency: 223316180 -> 223272495 (-0.02%); split: -0.03%, +0.02%
InvThroughput: 41981375 -> 41969917 (-0.03%); split: -0.04%, +0.01%
VClause: 266775 -> 266558 (-0.08%); split: -0.14%, +0.06%
SClause: 646602 -> 645996 (-0.09%); split: -0.16%, +0.07%
Copies: 794703 -> 776075 (-2.34%); split: -2.46%, +0.12%
Branches: 296317 -> 296316 (-0.00%)
PreSGPRs: 658796 -> 656479 (-0.35%); split: -0.35%, +0.00%
PreVGPRs: 744014 -> 743679 (-0.05%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20587>
2023-03-02 10:59:05 +00:00
Georg Lehmann
ed349951cb aco: mark mad definition as precise if the mul/add were precise
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20587>
2023-03-02 10:59:05 +00:00
Rhys Perry
ab3184c0a2 aco: don't apply modifiers through DPP to unsupported instructions
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21201>
2023-02-21 14:59:38 +00:00
Georg Lehmann
3bd5b583f9 aco: combine a ^ ~b and ~(a ^ b) to v_xnor_b32
Foz-DB Navi21:
Totals from 13 (0.01% of 134913) affected shaders:
CodeSize: 225432 -> 225180 (-0.11%)
Instrs: 41973 -> 41908 (-0.15%)
Latency: 297464 -> 297326 (-0.05%)
InvThroughput: 82536 -> 82467 (-0.08%)
Copies: 2452 -> 2440 (-0.49%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21410>
2023-02-21 13:35:31 +00:00
Timur Kristóf
2c40215ab9 aco/optimizer: Change v_cmp with subgroup invocation to constant.
When a shader has a comparison with the subgroup invocation id,
we can use a constant instead, saving a VALU instruction.
When the constant can't be represented as a 64-bit literal,
use the s_bfm_b64 instruction to generate it instead, which
is still a win.

Fossil DB stats on GFX11:
Totals from 300 (0.22% of 134913) affected shaders:
CodeSize: 2223052 -> 2214336 (-0.39%); split: -0.43%, +0.04%
Instrs: 430216 -> 429882 (-0.08%); split: -0.14%, +0.06%
Latency: 5881180 -> 5878181 (-0.05%); split: -0.05%, +0.00%
InvThroughput: 731846 -> 729293 (-0.35%)
Copies: 31662 -> 31847 (+0.58%); split: -0.03%, +0.61%
Branches: 8241 -> 8100 (-1.71%)
PreVGPRs: 15788 -> 15786 (-0.01%)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20843>
2023-02-18 21:16:58 +01:00
Timur Kristóf
084d10a702 aco: Remove MTBUF zero operand.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21363>
2023-02-16 17:16:34 +00:00
Georg Lehmann
5038a049f1 aco: add mov/cndmask opcodes to does_fp_op_flush_denorms
For completeness sake also add v_mov_b32, even if we don't use imod for it
because it's only supported since gfx10.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21170>
2023-02-08 13:07:46 +00:00
Georg Lehmann
e527f686ca Revert "aco: Combine v_cvt_u32_f32 with insert to v_cvt_pk_u8_f32."
This reverts commit 6d02054047.

v_cvt_pk_u8_f32 returns 0xff instead of v_cvt_u32_f32 & 0xff if the input is
larger than 255.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8128

Cc: mesa-stable
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20829>
2023-01-23 16:22:55 +00:00
Timur Kristóf
faba30a8f3 aco/optimizer: Optimize p_extract + v_mul_u32_u24 to v_mad_u32_u16.
This should perform the same but removes SDWA from the address
calculations in NGG culling shaders for example.

This is done because SDWA is no longer available on GFX11.

Fossil DB stats on GFX1100:
Totals from 36 (0.03% of 134913) affected shaders:
CodeSize: 300968 -> 300884 (-0.03%); split: -0.04%, +0.01%
Instrs: 60955 -> 60863 (-0.15%); split: -0.15%, +0.00%
Latency: 426809 -> 426819 (+0.00%); split: -0.06%, +0.06%
InvThroughput: 39076 -> 39025 (-0.13%); split: -0.14%, +0.01%
VClause: 1440 -> 1443 (+0.21%)
Copies: 5714 -> 5725 (+0.19%)

Fossil DB stats on GFX1100 with NGG culling enabled:
Totals from 60953 (45.18% of 134913) affected shaders:
VGPRs: 2273172 -> 2273160 (-0.00%)
CodeSize: 186401864 -> 186403036 (+0.00%); split: -0.00%, +0.00%
Instrs: 37038048 -> 36977353 (-0.16%); split: -0.16%, +0.00%
Latency: 146466770 -> 146350172 (-0.08%); split: -0.08%, +0.00%
InvThroughput: 15342790 -> 15228585 (-0.74%); split: -0.74%, +0.00%
VClause: 669662 -> 669665 (+0.00%)
Copies: 2972380 -> 2972482 (+0.00%); split: -0.01%, +0.01%

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17924>
2023-01-16 19:27:39 +00:00
Timur Kristóf
171d76ded1 aco/optimizer: Add missing v_lshlrev condition to can_apply_extract.
This was already handled by apply_extract but missing from
can_apply_extract, therefore may not be properly applied everywhere.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17924>
2023-01-16 19:27:39 +00:00
Rhys Perry
254b178d5b aco: disallow SGPRS/constants with interpolation instructions
https://reviews.llvm.org/D137575

The VINTRP format cannot encode anything except VGPRs.

Reading VINTERPInstructions.td, looks like it's the same for GFX11.

No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20251>
2023-01-10 16:01:38 +00:00
Timur Kristóf
db5c3f170f aco: Emulate Wave64 bpermute on GFX11.
Similar to emit_gfx10_wave64_bpermute, but uses the new
v_permlane64_b32 instruction to swap data between wave halves.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>
2022-12-14 13:54:04 +00:00
Timur Kristóf
640e801651 aco: Split opcodes for GFX6 and GFX10 emulated bpermute.
Different sequences are emitted for these, so it makes sense to
have different opcodes too.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>
2022-12-14 13:54:04 +00:00
Timur Kristóf
614348f28b aco: Don't accept constants on p_bpermute.
The sequence emitted for this pseudo instruction is not ready
to handle constants or literals at all.

Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>
2022-12-14 13:54:04 +00:00
Rhys Perry
381de3c809 aco: more carefully apply constant offsets into scratch accesses
Death stranding does scratch_arr[80-idx]. This doesn't seem to work if we
try to combine the subtraction into the access.

fossil-db (navi21):
Totals from 52 (0.04% of 135636) affected shaders:
Instrs: 78560 -> 79036 (+0.61%)
CodeSize: 427940 -> 431188 (+0.76%)
Latency: 1313809 -> 1318142 (+0.33%)
InvThroughput: 292833 -> 293842 (+0.34%)
VClause: 2361 -> 2555 (+8.22%); split: -0.51%, +8.73%
Copies: 8767 -> 8746 (-0.24%); split: -0.35%, +0.11%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 0e783d687a ("aco: use scratch_* for scratch load/store on GFX9+")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7735
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20117>
2022-12-06 15:23:38 +00:00
Rhys Perry
917cfd587c aco: use v_minmax/v_maxmin opcodes
fossil-db (gfx1100):
Totals from 29868 (22.12% of 135032) affected shaders:
MaxWaves: 741336 -> 741344 (+0.00%)
Instrs: 34624902 -> 34539766 (-0.25%); split: -0.25%, +0.00%
CodeSize: 187196804 -> 187192100 (-0.00%); split: -0.01%, +0.01%
VGPRs: 1816860 -> 1816788 (-0.00%); split: -0.01%, +0.01%
Latency: 502597202 -> 502245627 (-0.07%); split: -0.08%, +0.01%
InvThroughput: 84813176 -> 84586122 (-0.27%); split: -0.28%, +0.01%
VClause: 633826 -> 633749 (-0.01%); split: -0.02%, +0.01%
SClause: 1317738 -> 1317047 (-0.05%); split: -0.06%, +0.01%
Copies: 2130610 -> 2130954 (+0.02%); split: -0.03%, +0.05%
Branches: 766093 -> 765969 (-0.02%); split: -0.02%, +0.00%
PreSGPRs: 1630250 -> 1630034 (-0.01%); split: -0.02%, +0.00%
PreVGPRs: 1590777 -> 1590664 (-0.01%); split: -0.01%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19933>
2022-12-01 21:43:28 +00:00
Rhys Perry
dfbc8e0192 aco: change order in combine_minmax()
Prepare for future optimizations.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19933>
2022-12-01 21:43:28 +00:00
Georg Lehmann
7aa94efe82 aco: Combine constant bit test to s_bitcmp.
Foz-DB Navi21:
Totals from 73988 (54.84% of 134913) affected shaders:
VGPRs: 2959768 -> 2959752 (-0.00%)
SpillSGPRs: 10250 -> 10697 (+4.36%); split: -0.64%, +5.00%
SpillVGPRs: 2326 -> 2291 (-1.50%); split: -2.24%, +0.73%
CodeSize: 261339476 -> 261045912 (-0.11%); split: -0.12%, +0.00%
Scratch: 239616 -> 238592 (-0.43%)
Instrs: 49214044 -> 49188242 (-0.05%); split: -0.06%, +0.00%
Latency: 413214139 -> 413296229 (+0.02%); split: -0.03%, +0.05%
InvThroughput: 71741622 -> 71786300 (+0.06%); split: -0.07%, +0.13%
VClause: 856838 -> 856973 (+0.02%); split: -0.01%, +0.02%
SClause: 1504502 -> 1504567 (+0.00%); split: -0.01%, +0.02%
Copies: 4058433 -> 4060424 (+0.05%); split: -0.03%, +0.08%
Branches: 1502953 -> 1502945 (-0.00%); split: -0.00%, +0.00%
PreSGPRs: 3081927 -> 3081531 (-0.01%); split: -0.02%, +0.01%
PreVGPRs: 2513990 -> 2513992 (+0.00%)

The vast majority of instruction count regressions are caused by parallel-rdp.

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18870>
2022-11-28 18:43:53 +00:00
Georg Lehmann
73be938c48 aco: Combine bit test to s_bitcmp.
Foz-DB Navi21:
Totals from 6396 (4.74% of 134913) affected shaders:
VGPRs: 483280 -> 483152 (-0.03%); split: -0.03%, +0.01%
SpillSGPRs: 8119 -> 7941 (-2.19%)
CodeSize: 63377880 -> 63268556 (-0.17%); split: -0.20%, +0.03%
MaxWaves: 86778 -> 86810 (+0.04%)
Instrs: 11745621 -> 11725857 (-0.17%); split: -0.20%, +0.03%
Latency: 162400148 -> 162282230 (-0.07%); split: -0.08%, +0.01%
InvThroughput: 29179429 -> 29133173 (-0.16%); split: -0.16%, +0.00%
VClause: 208032 -> 208100 (+0.03%); split: -0.01%, +0.05%
SClause: 431390 -> 430849 (-0.13%); split: -0.24%, +0.11%
Copies: 896222 -> 893285 (-0.33%); split: -0.62%, +0.30%
Branches: 349806 -> 348770 (-0.30%); split: -0.90%, +0.60%
PreSGPRs: 618908 -> 613773 (-0.83%); split: -0.83%, +0.00%
PreVGPRs: 482901 -> 482893 (-0.00%)

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18870>
2022-11-28 18:43:53 +00:00
Georg Lehmann
853d2cb6f1 aco: Combine s_abs and s_sub/s_add to s_absdiff.
Totals from 2 (0.00% of 134913) affected shaders:
CodeSize: 1344 -> 1336 (-0.60%)
Instrs: 277 -> 275 (-0.72%)

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18870>
2022-11-28 18:43:53 +00:00
Georg Lehmann
7e1d77fd90 aco: Ignore instructions with exec operands in follow_operand.
No Foz-DB changes.

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18870>
2022-11-28 18:43:53 +00:00
Georg Lehmann
65a3328b4c aco/optimizer: Cleanup ctx.uses handling for patterns which use follow_operand(..., true).
No Foz-DB changes.

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18870>
2022-11-28 18:43:53 +00:00
Yonggang Luo
cdbe1ad570 aco: Fixes -Werror,-Wbitwise-instead-of-logical for clang-15 in aco_optimizer.cpp
error message:
error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical]

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19969>
2022-11-24 02:56:03 +00:00
Samuel Pitoiset
bb90d29660 aco: add p_dual_src_export_gfx11 for dual source blending on GFX11
Dual source blending must be in strict WQM mode.

Cc: 22.3 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19643>
2022-11-16 18:35:10 +00:00
Georg Lehmann
6d02054047 aco: Combine v_cvt_u32_f32 with insert to v_cvt_pk_u8_f32.
No Foz-DB difference on Navi21.

Foz-DB GFX11:
Totals from 746 (0.55% of 134913) affected shaders:
CodeSize: 8430248 -> 8416128 (-0.17%); split: -0.17%, +0.00%
Instrs: 1617202 -> 1614707 (-0.15%)
Latency: 13943398 -> 13934161 (-0.07%); split: -0.07%, +0.00%
InvThroughput: 2601620 -> 2596624 (-0.19%); split: -0.20%, +0.01%
Copies: 114346 -> 114334 (-0.01%); split: -0.01%, +0.00%
PreVGPRs: 48314 -> 48312 (-0.00%)

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18492>
2022-11-16 16:49:04 +00:00
Rhys Perry
6113ee650a aco/gfx11: fix FS input loads in quad-divergent control flow
This is not ideal and it would be great to somehow make it better some
day.

fossil-db (gfx1100):
Totals from 5208 (3.86% of 135032) affected shaders:
MaxWaves: 127058 -> 126962 (-0.08%); split: +0.01%, -0.09%
Instrs: 3983440 -> 4072736 (+2.24%); split: -0.00%, +2.24%
CodeSize: 21872468 -> 22230852 (+1.64%); split: -0.00%, +1.64%
VGPRs: 206688 -> 206984 (+0.14%); split: -0.05%, +0.20%
Latency: 37447383 -> 37491197 (+0.12%); split: -0.05%, +0.17%
InvThroughput: 6421955 -> 6422348 (+0.01%); split: -0.03%, +0.03%
VClause: 71579 -> 71545 (-0.05%); split: -0.09%, +0.04%
SClause: 148289 -> 147146 (-0.77%); split: -0.84%, +0.07%
Copies: 259011 -> 258084 (-0.36%); split: -0.61%, +0.25%
Branches: 101366 -> 101314 (-0.05%); split: -0.10%, +0.05%
PreSGPRs: 223482 -> 223460 (-0.01%); split: -0.21%, +0.20%
PreVGPRs: 184448 -> 184744 (+0.16%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19370>
2022-11-01 12:42:43 +00:00
Georg Lehmann
2eac571d61 aco: Use opsel for the third operand.
Foz-DB Navi21:
Totals from 2 (0.00% of 134913) affected shaders:
CodeSize: 7788 -> 7772 (-0.21%)
Instrs: 1305 -> 1303 (-0.15%)
Latency: 7175 -> 7163 (-0.17%)
InvThroughput: 2082 -> 2078 (-0.19%)
Copies: 57 -> 55 (-3.51%)

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19380>
2022-10-31 09:54:01 +00:00
Georg Lehmann
361b47b1f0 aco: Implement signed idot instructions on GFX11.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19114>
2022-10-24 19:07:16 +00:00
Georg Lehmann
616d3908dc aco: Don't use opsel for p_insert.
This doesn't make sense, opsel preserves the not selected half of the register,
p_insert zeros it.

No Foz-DB changes.

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>

Fixes: 54292e99c7 ("aco: optimize 32-bit extracts and inserts using SDWA")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19253>
2022-10-24 18:40:40 +00:00
Timur Kristóf
dd90273aaa aco: Optimize MUBUF 0 offset when idxen is also being used.
Now that we added an index src to the NIR intrinsic, it can
happen that these generate MUBUF instructions which have both
an index and an offset.

Extend this ACO optimization to the case when idxen is used.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17551>
2022-10-20 20:00:50 +00:00
Rhys Perry
7cecc81683 aco/gfx11: fix s_waitcnt printing
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17710>
2022-09-30 20:57:02 +00:00
Daniel Schürmann
cd36a29759 aco/optimizer: change inverse_comparison in-place
This avoids creating a second comparison which is more expensive
than just using the existing s_not.

Totals from 1089 (0.81% of 134913) affected shaders: (GFX10.3)
VGPRs: 50472 -> 50184 (-0.57%)
CodeSize: 4724692 -> 4760824 (+0.76%); split: -0.03%, +0.79%
MaxWaves: 23964 -> 24012 (+0.20%)
Instrs: 859588 -> 859687 (+0.01%); split: -0.11%, +0.12%
Latency: 10674653 -> 10650353 (-0.23%); split: -0.41%, +0.18%
InvThroughput: 1752987 -> 1750238 (-0.16%); split: -0.20%, +0.04%
VClause: 20921 -> 20872 (-0.23%); split: -0.68%, +0.45%
SClause: 31417 -> 31550 (+0.42%)
Copies: 69428 -> 68738 (-0.99%); split: -1.52%, +0.53%
PreSGPRs: 48033 -> 49649 (+3.36%)
PreVGPRs: 44490 -> 43699 (-1.78%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18253>
2022-09-22 12:35:11 +00:00
Timur Kristóf
c8445c1691 aco: Change inverse-comparison optimization to work with s_not
Some time ago we stopped using s_andn2 with exec for boolean NOT.
The reasoning behind that change was that those booleans will be
always ANDed with exec when necessary.

This inhibited the inverse-comparison optimization in most cases
which is fixed by this patch.

Fossil DB stats on Navi 21:

Totals from 12251 (9.08% of 134913) affected shaders:
VGPRs: 801744 -> 802016 (+0.03%); split: -0.00%, +0.04%
SpillSGPRs: 8863 -> 8893 (+0.34%)
CodeSize: 100593244 -> 100370684 (-0.22%); split: -0.22%, +0.00%
MaxWaves: 204994 -> 204948 (-0.02%); split: +0.00%, -0.02%
Instrs: 18717001 -> 18668965 (-0.26%); split: -0.26%, +0.00%
Latency: 263255046 -> 262874896 (-0.14%); split: -0.16%, +0.02%
InvThroughput: 52760249 -> 52721736 (-0.07%); split: -0.08%, +0.01%
VClause: 329631 -> 329680 (+0.01%); split: -0.03%, +0.04%
SClause: 681563 -> 681435 (-0.02%); split: -0.02%, +0.00%
Copies: 1331612 -> 1372446 (+3.07%); split: -0.03%, +3.10%
Branches: 548325 -> 548301 (-0.00%)
PreSGPRs: 911317 -> 909700 (-0.18%)
PreVGPRs: 766279 -> 767070 (+0.10%)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18253>
2022-09-22 12:35:11 +00:00
Daniel Schürmann
cf5f9854bc aco/optimizer: optimize s_and(exec, s_and(x, y)) more aggressively
It is enough if either of the two operands is respecting the
current exec mask.

Totals from 1680 (1.25% of 134913) affected shaders: (GFX10.3)
CodeSize: 3929436 -> 3922372 (-0.18%); split: -0.18%, +0.00%
Instrs: 730305 -> 728536 (-0.24%); split: -0.24%, +0.00%
Latency: 6839314 -> 6835154 (-0.06%); split: -0.07%, +0.01%
InvThroughput: 1371351 -> 1371267 (-0.01%); split: -0.01%, +0.00%
SClause: 32819 -> 32802 (-0.05%); split: -0.09%, +0.04%
Copies: 33264 -> 33271 (+0.02%); split: -0.01%, +0.03%

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18253>
2022-09-22 12:35:11 +00:00
Daniel Schürmann
79a8e8b5b2 aco/optimizer: do can_eliminate_and_exec() optimization later
This will allow to optimize s_not(v_cmp()) safely.

Totals from 1024 (0.76% of 134913) affected shaders: (GFX10.3)
CodeSize: 5695424 -> 5701860 (+0.11%); split: -0.00%, +0.11%
Scratch: 242688 -> 241664 (-0.42%)
Instrs: 1040656 -> 1041635 (+0.09%); split: -0.00%, +0.09%
Latency: 16842282 -> 16922790 (+0.48%); split: -0.06%, +0.54%
InvThroughput: 4772728 -> 4810868 (+0.80%); split: -0.10%, +0.90%
VClause: 20013 -> 20000 (-0.06%); split: -0.12%, +0.05%
Copies: 115057 -> 114384 (-0.58%); split: -1.22%, +0.63%
Branches: 34531 -> 34532 (+0.00%); split: -0.00%, +0.01%
PreSGPRs: 46263 -> 46267 (+0.01%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18253>
2022-09-22 12:35:11 +00:00
Georg Lehmann
84c0529258 aco: Unswizzle v_pk_fma_f16 literals to produce more v_pk_fmac_f16.
No Foz-DB difference, but it reduces code size in some angle shaders.

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18676>
2022-09-21 23:04:06 +00:00
Daniel Schürmann
98e3c446d8 aco/optimizer: disallow can_eliminate_and_exec() with s_not
Totals from 295 (0.22% of 134913) affected shaders: (GFX10.3)
CodeSize: 1016564 -> 1016896 (+0.03%); split: -0.05%, +0.09%
Instrs: 187659 -> 187724 (+0.03%); split: -0.08%, +0.11%
Latency: 2839516 -> 2839541 (+0.00%); split: -0.01%, +0.01%
Copies: 12301 -> 12305 (+0.03%); split: -0.01%, +0.04%
PreSGPRs: 10266 -> 10268 (+0.02%)

Closes: #7024
Cc: mesa-stable
Tested-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18722>
2022-09-21 15:33:43 +00:00
Georg Lehmann
94b5225c9b aco: Use v_fmaak/v_fmamk if two operands are the same literal.
Foz-DB Navi21:
Totals from 5744 (4.26% of 134913) affected shaders:
VGPRs: 237128 -> 237056 (-0.03%); split: -0.04%, +0.01%
CodeSize: 16654484 -> 16620668 (-0.20%); split: -0.23%, +0.03%
MaxWaves: 152838 -> 152846 (+0.01%)
Instrs: 3063214 -> 3058572 (-0.15%); split: -0.17%, +0.02%
Latency: 23935195 -> 23934827 (-0.00%); split: -0.03%, +0.03%
InvThroughput: 5478562 -> 5478160 (-0.01%); split: -0.01%, +0.01%
VClause: 60432 -> 60435 (+0.00%); split: -0.02%, +0.03%
SClause: 121032 -> 120896 (-0.11%); split: -0.20%, +0.09%
Copies: 147865 -> 143144 (-3.19%); split: -3.59%, +0.40%
PreSGPRs: 195722 -> 195661 (-0.03%); split: -0.06%, +0.03%
PreVGPRs: 182849 -> 182787 (-0.03%)

Foz-DB Vega10:
Totals from 5290 (3.92% of 135041) affected shaders:
SGPRs: 357952 -> 359616 (+0.46%); split: -0.11%, +0.57%
VGPRs: 204048 -> 203928 (-0.06%); split: -0.08%, +0.02%
CodeSize: 14043176 -> 14003100 (-0.29%); split: -0.29%, +0.00%
MaxWaves: 39401 -> 39398 (-0.01%); split: +0.01%, -0.02%
Instrs: 2636739 -> 2631246 (-0.21%); split: -0.21%, +0.00%
Latency: 25264088 -> 25256482 (-0.03%); split: -0.05%, +0.02%
InvThroughput: 12039643 -> 12039346 (-0.00%); split: -0.00%, +0.00%
VClause: 55603 -> 55584 (-0.03%); split: -0.04%, +0.00%
SClause: 101577 -> 101342 (-0.23%); split: -0.30%, +0.07%
Copies: 213344 -> 207929 (-2.54%); split: -2.58%, +0.05%
Branches: 34053 -> 34054 (+0.00%)
PreSGPRs: 172405 -> 172260 (-0.08%)

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18645>
2022-09-21 12:16:47 +00:00
Daniel Schürmann
3d6ea4f666 aco: use std::vector::reserve() more often
This removes the majority of vector re-allocations.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18105>
2022-08-30 16:03:26 +00:00
Rhys Perry
f60cb8d0af aco: rename is_cmp to is_fp_cmp
The old name is no longer accurate.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18077>
2022-08-16 17:31:33 +00:00