Commit graph

292 commits

Author SHA1 Message Date
Daniel Schürmann
9e3ff06c38 aco: rewrite SDWA selector
This commit introduces a new struct SubdwordSel
in order to ease and clean up the usage of SDWA
selections. This includes removing the distinction
between register-allocated and fixed SDWA selections.
Instead, SDWA selections can now also access the high
bits of subdword variables. Alignment and sizes are
validated accordingly. Size, offset and sign_extend
can be evaluated via helper methods.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12640>
2021-09-02 20:39:17 +02:00
Rhys Perry
33ddbd220f aco: remove DPP when applying constants/literals/sgprs
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12601>
2021-08-31 16:58:20 +00:00
Rhys Perry
e27946ca11 aco: don't constant propagate to DPP instructions
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12601>
2021-08-31 16:58:20 +00:00
Timur Kristóf
76b9dd6266 aco: Unset 16 and 24-bit flags from operands in apply_extract.
Consider the following sequence in a shader:
b = p_extract a
c = v_mad_u32_u16 b, X, 0

The optimizer applies extract, resulting in:
c = v_mad_u32_u16 a, X, 0 (correct)

Then it mistakenly turns that into:
c = v_mul_u32_u24 a, X, 0 (incorrect)

In this case, the p_extract is applied to v_mad_u32_u16 by
apply_extract. After this, we can no longer be sure that
the operands are still 16 or 24-bit, so we have to remove
this flag.

No Fossil DB changes.

Fixes: 54292e99c7
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12558>
2021-08-30 14:05:33 +00:00
Daniel Schürmann
2eeaaabb8e aco/optimizer: combine v_pk_mul_u16 + v_pk_add_u16 -> v_pk_mad_u16
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11678>
2021-08-27 19:57:59 +00:00
Daniel Schürmann
be16ebc5ca aco/optimizer: fuse v_mul_f64 + v_add_f64 -> v_fma_f64
No fossil-db changes.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11678>
2021-08-27 19:57:59 +00:00
Daniel Schürmann
8e27ca9953 aco/optimizer: combine v_mul_lo_u16 + v_add_u16 -> v_mad_u16
Totals from 192 (0.13% of 150170) affected shaders: (GFX10.3)
CodeSize: 1027224 -> 1019872 (-0.72%)
Instrs: 174784 -> 173863 (-0.53%)
Latency: 4235742 -> 4232177 (-0.08%); split: -0.11%, +0.03%
InvThroughput: 1777026 -> 1775945 (-0.06%); split: -0.09%, +0.03%
Copies: 34098 -> 34099 (+0.00%); split: -0.03%, +0.03%
PreVGPRs: 4920 -> 4850 (-1.42%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11678>
2021-08-27 19:57:59 +00:00
Daniel Schürmann
23d5865f42 aco: refactor nir_op_imul selection
Previously, the optimization to use v_mul_lo_u16 for
32bit multiplications was done in instruction_selection.
This was moved to the optimizer to ease some case distinctions.

The mixed results are due to increased use of SDWA.

Totals from 2616 (1.74% of 150170) affected shaders: (GFX10.3)
VGPRs: 143888 -> 143872 (-0.01%); split: -0.02%, +0.01%
CodeSize: 5604032 -> 5604080 (+0.00%); split: -0.01%, +0.01%
Instrs: 1086798 -> 1083915 (-0.27%); split: -0.27%, +0.01%
Latency: 8215793 -> 8213023 (-0.03%); split: -0.10%, +0.07%
InvThroughput: 20765157 -> 20773766 (+0.04%); split: -0.02%, +0.06%
VClause: 35256 -> 35260 (+0.01%); split: -0.02%, +0.03%
SClause: 29021 -> 29024 (+0.01%); split: -0.00%, +0.01%
Copies: 74163 -> 74306 (+0.19%); split: -0.05%, +0.24%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11678>
2021-08-27 19:57:59 +00:00
Daniel Schürmann
d8eef134d8 aco: only apply extract if not used more than 4 times
Totals from 61 (0.04% of 150170) affected shaders: (GFX10.3)
CodeSize: 1087732 -> 1087380 (-0.03%); split: -0.22%, +0.18%
Instrs: 192343 -> 192205 (-0.07%); split: -0.16%, +0.09%
Latency: 7231670 -> 7148073 (-1.16%); split: -1.19%, +0.04%
InvThroughput: 3436715 -> 3394926 (-1.22%); split: -1.25%, +0.04%
VClause: 4831 -> 4833 (+0.04%)
Copies: 50130 -> 49934 (-0.39%); split: -0.67%, +0.28%
Branches: 5945 -> 5948 (+0.05%)
PreSGPRs: 3486 -> 3472 (-0.40%)
PreVGPRs: 5154 -> 5152 (-0.04%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11678>
2021-08-27 19:57:59 +00:00
Timur Kristóf
abcc83e713 aco: Fix to_uniform_bool_instr when operands are not suitable.
Don't attempt to transform uniform boolean instructions when
their operands are unsuitable. This can happen eg. due to other
optimizations that combine SALU instructions which clear out
the uniform instruction labels.

Cc: mesa-stable
Fixes: 8a32f57fff
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11573>
2021-08-25 12:43:50 +00:00
Rhys Perry
2201f5a58c aco: remove label_extract if the extract is used by a non-VALU
If an extract is used by a non-VALU instruction, it can't be applied to
all instructions, so it's not beneficial to try to apply it.

This check isn't needed because can_apply_extract()/can_use_SDWA() should
already handle non-VALU instructions.

fossil-db (Sienna Cichlid):
Totals from 1020 (0.68% of 150170) affected shaders:
SpillSGPRs: 1577 -> 1571 (-0.38%)
CodeSize: 7863668 -> 7858336 (-0.07%); split: -0.07%, +0.00%
Instrs: 1431583 -> 1431083 (-0.03%); split: -0.04%, +0.01%
Latency: 25891250 -> 25890916 (-0.00%); split: -0.01%, +0.01%
InvThroughput: 7248683 -> 7248655 (-0.00%); split: -0.01%, +0.01%
SClause: 49072 -> 49071 (-0.00%)
Copies: 126649 -> 126580 (-0.05%); split: -0.11%, +0.06%
Branches: 39129 -> 39120 (-0.02%); split: -0.03%, +0.01%
PreSGPRs: 53071 -> 52943 (-0.24%); split: -0.26%, +0.02%
PreVGPRs: 57437 -> 57435 (-0.00%); split: -0.01%, +0.01%

fossil-db (Polaris10):
Totals from 654 (0.43% of 151696) affected shaders:
CodeSize: 5814552 -> 5811568 (-0.05%); split: -0.05%, +0.00%
Instrs: 1105783 -> 1105049 (-0.07%); split: -0.07%, +0.00%
Latency: 20261458 -> 20259744 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 9011785 -> 9011749 (-0.00%); split: -0.00%, +0.00%
Copies: 104693 -> 103904 (-0.75%); split: -0.76%, +0.00%
PreSGPRs: 36105 -> 36095 (-0.03%); split: -0.03%, +0.01%
PreVGPRs: 43813 -> 43809 (-0.01%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12212>
2021-08-23 14:56:37 +01:00
Rhys Perry
2e6834d4f6 aco: combine DPP into VALU before RA
Mostly helps a bunch of Cyberpunk 2077 shaders. Catches some of the cases
that the post-RA can't optimize because of register assignment.

fossil-db (Siena Cichlid):
Totals from 25 (0.02% of 150170) affected shaders:
CodeSize: 78808 -> 75764 (-3.86%)
Instrs: 14311 -> 13547 (-5.34%)
Latency: 278697 -> 277885 (-0.29%)
InvThroughput: 63428 -> 62754 (-1.06%)
Copies: 1348 -> 1349 (+0.07%); split: -0.07%, +0.15%
PreVGPRs: 1035 -> 1011 (-2.32%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11924>
2021-08-19 18:17:33 +00:00
Rhys Perry
b97cfd72af aco: handle DPP in the optimizer
There are a bunch of optimizations that are broken when DPP is involved.

fossil-db (Sienna Cichlid):
Totals from 100 (0.07% of 150170) affected shaders:
CodeSize: 325204 -> 325192 (-0.00%); split: -0.06%, +0.05%
Instrs: 62773 -> 62664 (-0.17%); split: -0.18%, +0.00%
Latency: 295348 -> 295266 (-0.03%); split: -0.03%, +0.00%
InvThroughput: 73990 -> 73946 (-0.06%); split: -0.06%, +0.01%
Copies: 1650 -> 1609 (-2.48%); split: -2.55%, +0.06%
PreSGPRs: 3554 -> 3520 (-0.96%)

Fossil-db changes are probably because v_sub_f32_dpp(v_mul_f32) is no
longer being combined into MAD and then split back into separate
instructions.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11924>
2021-08-19 18:17:33 +00:00
Rhys Perry
1d894a8c85 aco: move a bunch of helpers into aco_ir.h/aco_ir.cpp
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11924>
2021-08-19 18:17:33 +00:00
Rhys Perry
211d1dfd34 aco: don't create v_madmk_f32/v_madak_f32 from v_fma_legacy_f16
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5105
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12004>
2021-07-22 15:43:31 +00:00
Daniel Schürmann
9b1a296172 aco/optimizer: ensure to not erase high bits when propagating packed constants
Packed constants with non-zero values in the high half
might have been propagated as 16 bit, dropping the high half.

Cc: mesa-stable
Closes: #5070
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11954>
2021-07-20 07:48:39 +00:00
Timur Kristóf
60c5abf685 aco: Remove s_and with exec when all lanes are active.
This helps NGG GS and culling shaders.
No Fossil DB changes without NGG culling.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11458>
2021-07-16 14:31:54 +00:00
Tony Wasserka
66e51dc474 aco: Remove use of deprecated Operand constructors
This migration was done with libclang-based automatic tooling, which
performed these replacements:
* Operand(uint8_t) -> Operand::c8
* Operand(uint16_t) -> Operand::c16
* Operand(uint32_t, false) -> Operand::c32
* Operand(uint32_t, bool) -> Operand::c32_or_c64
* Operand(uint64_t) -> Operand::c64
* Operand(0) -> Operand::zero(num_bytes)

Casts that were previously used for constructor selection have automatically
been removed (e.g. Operand((uint16_t)1) -> Operand::c16(1)).

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11653>
2021-07-13 17:43:26 +00:00
Daniel Schürmann
1e2639026f aco: Format.
Manually adjusted some comments for more intuitive line breaks.

Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11258>
2021-07-12 21:27:31 +00:00
Daniel Schürmann
3f9e986d33 aco: add missing Licenses and remove Authors from files
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11271>
2021-07-12 12:09:31 +00:00
Daniel Schürmann
59fdaa1985 aco: reorder and cleanup #includes
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11271>
2021-07-12 12:09:31 +00:00
Timur Kristóf
5713e059ea aco: Add validation for v_permlane instructions.
Previously there hasn't been any validation for these instructions,
but after shooting myself in the leg with it a few times, I decided
to add the validation now.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11072>
2021-06-09 16:48:51 +00:00
Rhys Perry
54292e99c7 aco: optimize 32-bit extracts and inserts using SDWA
Still need to use dst_u=preserve field to optimize packs

fossil-db (Sienna Cichlid):
Totals from 15974 (10.66% of 149839) affected shaders:
VGPRs: 1009064 -> 1008968 (-0.01%); split: -0.03%, +0.02%
SpillSGPRs: 7959 -> 7964 (+0.06%)
CodeSize: 101716436 -> 101159568 (-0.55%); split: -0.55%, +0.01%
MaxWaves: 284464 -> 284490 (+0.01%); split: +0.02%, -0.01%
Instrs: 19334216 -> 19224241 (-0.57%); split: -0.57%, +0.00%
Latency: 375465295 -> 375230478 (-0.06%); split: -0.14%, +0.08%
InvThroughput: 79006105 -> 78860705 (-0.18%); split: -0.25%, +0.07%

fossil-db (Polaris):
Totals from 11369 (7.51% of 151365) affected shaders:
SGPRs: 787920 -> 787680 (-0.03%); split: -0.04%, +0.01%
VGPRs: 681056 -> 681040 (-0.00%); split: -0.01%, +0.00%
CodeSize: 68127288 -> 67664120 (-0.68%); split: -0.69%, +0.01%
MaxWaves: 54370 -> 54371 (+0.00%)
Instrs: 13294638 -> 13214109 (-0.61%); split: -0.62%, +0.01%
Latency: 373515759 -> 373214571 (-0.08%); split: -0.11%, +0.03%
InvThroughput: 166529524 -> 166275291 (-0.15%); split: -0.20%, +0.05%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3151>
2021-06-08 08:57:43 +00:00
Rhys Perry
2f94353735 aco: add p_extract/p_insert
These will let us make the SDWA optimizer much simpler than if we were to
recognize combinations of shift/and/bfe.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3151>
2021-06-08 08:57:42 +00:00
Rhys Perry
3013670dfd aco: disallow SGPRs on DPP instructions
They can't be encoded.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10841>
2021-05-19 14:25:37 +00:00
Rhys Perry
e3c283e0bc aco: use -1.0*x and 1.0*|x| for fneg/fabs
Besides -1.0*x being 1 dword smaller than x^0x80000000, this commit also
improves generated code when the application requires that denormals are
flushed.

Future versions of DXVK will require that 32-bit denormals are flushed.

fossil-db (GFX8):
Totals from 21021 (14.22% of 147787) affected shaders:
SGPRs: 1288960 -> 1288944 (-0.00%); split: -0.01%, +0.01%
VGPRs: 792672 -> 792848 (+0.02%); split: -0.01%, +0.03%
CodeSize: 62439228 -> 62403552 (-0.06%); split: -0.11%, +0.05%
MaxWaves: 136182 -> 136181 (-0.00%); split: +0.00%, -0.00%
Instrs: 12230882 -> 12239927 (+0.07%); split: -0.01%, +0.08%

fossil-db (GFX10.3):
Totals from 20191 (13.80% of 146267) affected shaders:
VGPRs: 799992 -> 800032 (+0.01%)
CodeSize: 59763656 -> 59715484 (-0.08%); split: -0.12%, +0.03%
MaxWaves: 525378 -> 525376 (-0.00%)
Instrs: 11511082 -> 11517419 (+0.06%); split: -0.00%, +0.06%

fossil-db (GFX8, d3d float controls):
Totals from 87160 (58.98% of 147787) affected shaders:
SGPRs: 5395072 -> 5408480 (+0.25%); split: -0.06%, +0.31%
VGPRs: 3596716 -> 3581592 (-0.42%); split: -0.55%, +0.13%
CodeSize: 271347396 -> 266814460 (-1.67%); split: -1.67%, +0.00%
MaxWaves: 539669 -> 540400 (+0.14%); split: +0.15%, -0.02%
Instrs: 53395194 -> 52257505 (-2.13%); split: -2.13%, +0.00%

fossil-db (GFX10.3, d3d float controls):
Totals from 82306 (56.27% of 146267) affected shaders:
VGPRs: 3572312 -> 3558848 (-0.38%); split: -0.44%, +0.06%
CodeSize: 273494748 -> 269648968 (-1.41%); split: -1.41%, +0.00%
MaxWaves: 2007156 -> 2009950 (+0.14%); split: +0.15%, -0.01%
Instrs: 52251568 -> 51356424 (-1.71%); split: -1.71%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9079>
2021-03-24 14:02:41 +00:00
Rhys Perry
561fcfb50f aco: don't optimize min(a*1.0, ...) to min(a, ...) on GFX8
fossil-db (GFX8):
Totals from 2 (0.00% of 147787) affected shaders:
VMEM: 662 -> 642 (-3.02%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9079>
2021-03-24 14:02:41 +00:00
Daniel Schürmann
fc3606f29c aco/optimizer: set VCC hint on new v_cmp_* definitions
Totals from 11692 (7.99% of 146267) affected shaders (Navi10):
CodeSize: 97419384 -> 97352560 (-0.07%); split: -0.07%, +0.00%
Instrs: 18571138 -> 18570969 (-0.00%); split: -0.00%, +0.00%
Cycles: 1431348400 -> 1431346296 (-0.00%); split: -0.00%, +0.00%
SMEM: 696646 -> 696650 (+0.00%)
SClause: 668511 -> 668490 (-0.00%); split: -0.00%, +0.00%
Copies: 1279475 -> 1279474 (-0.00%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9531>
2021-03-18 17:15:00 +00:00
Rhys Perry
3d4c13f3b8 aco: add DeviceInfo
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8761>
2021-02-15 13:44:22 +00:00
Rhys Perry
0f178290ca aco: don't affect isPrecise() after applying output modifiers
fossil-db (GFX10.3):
Totals from 26679 (19.14% of 139391) affected shaders:
SGPRs: 1757155 -> 1757059 (-0.01%); split: -0.05%, +0.04%
VGPRs: 1175932 -> 1173556 (-0.20%); split: -0.21%, +0.01%
CodeSize: 86203592 -> 85572480 (-0.73%); split: -0.73%, +0.00%
MaxWaves: 315513 -> 315805 (+0.09%); split: +0.10%, -0.00%
Instrs: 16297785 -> 16143745 (-0.95%); split: -0.95%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8718>
2021-01-26 22:22:58 +00:00
Rhys Perry
2f0d480c73 aco: optimize out a*1.0 if it's used as a float
fossil-db (GFX10):
Totals from 370 (0.27% of 139391) affected shaders:
CodeSize: 641436 -> 634156 (-1.13%); split: -1.14%, +0.00%
Instrs: 117668 -> 115739 (-1.64%); split: -1.64%, +0.00%

fossil-db (GFX10.3):
Totals from 370 (0.27% of 139391) affected shaders:
SGPRs: 26888 -> 26912 (+0.09%)
VGPRs: 13964 -> 13916 (-0.34%)
CodeSize: 692008 -> 679008 (-1.88%)
MaxWaves: 4779 -> 4783 (+0.08%)
Instrs: 134265 -> 132026 (-1.67%); split: -1.67%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5523>
2021-01-26 11:36:13 +00:00
Rhys Perry
54a09545ec aco: optimize a*0.0
fossil-db (GFX10):
Totals from 1943 (1.39% of 139391) affected shaders:
SGPRs: 99952 -> 99544 (-0.41%); split: -0.44%, +0.03%
VGPRs: 60880 -> 60272 (-1.00%); split: -1.02%, +0.02%
CodeSize: 5138488 -> 5107500 (-0.60%); split: -0.61%, +0.01%
MaxWaves: 32193 -> 32380 (+0.58%)
Instrs: 983178 -> 975684 (-0.76%); split: -0.77%, +0.01%

fossil-db (GFX10.3):
Totals from 1943 (1.39% of 139391) affected shaders:
SGPRs: 99832 -> 99648 (-0.18%); split: -0.25%, +0.06%
VGPRs: 64708 -> 63944 (-1.18%); split: -1.27%, +0.09%
CodeSize: 5196732 -> 5157632 (-0.75%); split: -0.76%, +0.00%
MaxWaves: 27478 -> 27486 (+0.03%); split: +0.06%, -0.03%
Instrs: 1007222 -> 998737 (-0.84%); split: -0.84%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5523>
2021-01-26 11:36:13 +00:00
Rhys Perry
0c3d8e8e2e aco: disable a*1.0 optimization if the instruction is precise
fossil-db (GFX10):
Totals from 10370 (7.44% of 139391) affected shaders:
SGPRs: 564072 -> 564016 (-0.01%); split: -0.01%, +0.00%
VGPRs: 248312 -> 248532 (+0.09%); split: -0.02%, +0.11%
CodeSize: 12866732 -> 13208904 (+2.66%); split: -0.00%, +2.66%
MaxWaves: 190198 -> 190170 (-0.01%)
Instrs: 2460818 -> 2545351 (+3.44%)

fossil-db (GFX10.3):
Totals from 10370 (7.44% of 139391) affected shaders:
SGPRs: 563904 -> 564272 (+0.07%); split: -0.16%, +0.22%
VGPRs: 289344 -> 295016 (+1.96%); split: -0.88%, +2.84%
CodeSize: 13519204 -> 14197020 (+5.01%); split: -0.00%, +5.01%
MaxWaves: 155946 -> 154566 (-0.88%)
Instrs: 2719177 -> 2806919 (+3.23%); split: -0.00%, +3.23%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5523>
2021-01-26 11:36:13 +00:00
Rhys Perry
e115b01948 aco: return references in instruction cast methods
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8595>
2021-01-22 14:12:33 +00:00
Rhys Perry
1d245cd18b aco: use format-check methods
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8595>
2021-01-22 14:12:32 +00:00
Rhys Perry
70dbcfa1c9 aco: use instruction cast methods
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8595>
2021-01-22 14:12:32 +00:00
Rhys Perry
441ead5fb3 aco: remove Format::{VOP3A,VOP3B}
These are really the same as Format::VOP3.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8595>
2021-01-22 14:12:32 +00:00
Daniel Schürmann
7dcb9a0d8c aco/optimizer: convert extract_vector with index 0 into parallelcopies if possible
Totals from 273 (0.20% of 139391) affected shaders (Navi10):
VGPRs: 11600 -> 11792 (+1.66%)
CodeSize: 1389304 -> 1383152 (-0.44%); split: -0.53%, +0.08%
MaxWaves: 3848 -> 3752 (-2.49%)
Instrs: 240228 -> 239478 (-0.31%); split: -0.37%, +0.06%
Cycles: 20637708 -> 20580024 (-0.28%); split: -0.46%, +0.18%
VMEM: 39164 -> 38831 (-0.85%); split: +0.06%, -0.91%
SMEM: 21743 -> 22204 (+2.12%)
VClause: 4787 -> 4783 (-0.08%)
Copies: 39057 -> 38308 (-1.92%); split: -2.28%, +0.37%
Branches: 6556 -> 6557 (+0.02%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8260>
2021-01-21 11:05:36 +00:00
Daniel Schürmann
ebbf5fe716 aco/optimizer: expand subdword vectors with SGPRs on all generations
No fossildb changes.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8260>
2021-01-21 11:05:36 +00:00
Daniel Schürmann
96fafcca63 aco: propagate temporaries into PSEUDO instructions if it can take it
This patch relaxes copy-propagation for PSEUDO instructions with
subdword Operands / Definitions:
general:
- only propagate VGPR temps if the Definition is VGPR (or on p_as_uniform)

parallelcopy/create_vector/phis:
- size has to be the same

extract_vector/split_vector:
- propagate SGPR temps on GFX9+ or if the Definitions are not subdword
- split_vector: size must not increase

Totals from 282 (0.20% of 140985) affected shaders (Polaris10):
VGPRs: 14520 -> 14408 (-0.77%)
CodeSize: 2693956 -> 2694316 (+0.01%); split: -0.20%, +0.21%
Instrs: 512874 -> 512864 (-0.00%); split: -0.16%, +0.16%
Cycles: 26338860 -> 26320652 (-0.07%); split: -0.36%, +0.29%
VMEM: 49460 -> 49634 (+0.35%); split: +0.47%, -0.12%
SMEM: 10035 -> 10036 (+0.01%)
VClause: 7675 -> 7674 (-0.01%)
Copies: 66012 -> 65943 (-0.10%); split: -1.31%, +1.20%
Branches: 17265 -> 17281 (+0.09%); split: -0.10%, +0.19%
PreVGPRs: 12211 -> 12124 (-0.71%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8260>
2021-01-21 11:05:36 +00:00
Daniel Schürmann
856fd4750d aco/optimizer: don't propagate subdword temps of different size
It could happen that due to inconsistent copy-propagation

  v1 = p_parallelcopy v2b

instructions were left after optimization on GFX8.

Cc: 20.3
Cc: 21.0

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8260>
2021-01-21 11:05:36 +00:00
Daniel Schürmann
cd870d1b6a aco/optimizer: don't copy-prop logical phis
This is dangerous w.r.t. LCSSA-phis.

Totals from 746 (0.54% of 139391) affected shaders (Navi10):
CodeSize: 8592160 -> 8568156 (-0.28%); split: -0.30%, +0.02%
MaxWaves: 5172 -> 5171 (-0.02%); split: +0.02%, -0.04%
Instrs: 1653949 -> 1648489 (-0.33%); split: -0.36%, +0.03%
Cycles: 49474892 -> 49329224 (-0.29%); split: -0.33%, +0.03%
VMEM: 137574 -> 137421 (-0.11%); split: +0.18%, -0.29%
SMEM: 42391 -> 42439 (+0.11%); split: +0.12%, -0.01%
VClause: 26946 -> 26943 (-0.01%)
Copies: 130902 -> 126176 (-3.61%); split: -4.05%, +0.43%
Branches: 54891 -> 54556 (-0.61%); split: -0.64%, +0.03%
PreVGPRs: 53941 -> 53939 (-0.00%)

This has a slight effect on RA due to affinity changes.

Cc: 20.3
Cc: 21.0

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8260>
2021-01-21 11:05:36 +00:00
Daniel Schürmann
412291ddef aco: propagate swizzles when optimizing packed clamp & fma
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6680>
2021-01-13 17:46:56 +00:00
Daniel Schürmann
b03be30e07 aco: optimize packed fneg
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6680>
2021-01-13 17:46:56 +00:00
Daniel Schürmann
e3790fc458 aco: optimize packed clamp
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6680>
2021-01-13 17:46:56 +00:00
Daniel Schürmann
a9fd9187e8 aco: optimize packed mul+add to v_pk_fma_f16
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6680>
2021-01-13 17:46:56 +00:00
Daniel Schürmann
01134b0bfe aco: simplify multiply-add combining
When both operands of a v_sub (same apply for v_add) are mul and one
already uses clamp/omod, pick the other operand to get a chance to
combine to a MAD.

No fossils-db changes.

Co-authored-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6680>
2021-01-13 17:46:56 +00:00
Daniel Schürmann
178b33c870 aco: allow SGPRs on every src position for VOP3P
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6680>
2021-01-13 17:46:56 +00:00
Daniel Schürmann
0db4263a3a aco: allow constants/literals on every src position for VOP3P
and prevent literals on VOP3P pre-GFX10.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6680>
2021-01-13 17:46:56 +00:00
Rhys Perry
8c02a8e2d2 aco: add get_const/is_constant_representable helpers
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7798>
2020-12-04 14:44:48 +00:00