Commit graph

102 commits

Author SHA1 Message Date
Samuel Pitoiset
b8abfafe86 aco: fix constant folding of SMRD instructions on GFX6
SMRD instructions have an 8-bit dword offset on SI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3432>
2020-01-20 16:24:55 +00:00
Rhys Perry
e151398de6 aco: fix stack buffer overflow in apply_sgprs()
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: cef7879719 ('aco: rewrite apply_sgprs()')
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2361
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3442>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3442>
2020-01-20 11:13:11 +00:00
Samuel Pitoiset
a445cb35bd aco: do not combine additions of DS instructions on GFX6
The offset field doesn't work as expected on GFX6.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3412>
2020-01-16 14:06:06 +00:00
Timur Kristóf
dfaa3c0af6 aco: Flip s_cbranch / s_cselect to optimize out an s_not if possible.
When possible, get rid of an s_not when all it does is invert the SCC,
and its successor s_cbranch / s_cselect can be inverted instead.

Also modify some parts of instruction_selection to take advantage of
this feature.

Example:
s2: %3900,  s1: %3899:scc = s_andn2_b64 %0:exec, %406
s2: %3902 = s_cselect_b64 -1, 0, %3900:scc
s2: %407,  s1: %3903:scc = s_not_b64 %3902
s2: %3906,  s1: %3905:scc = s_and_b64 %407, %0:exec
p_cbranch_z %3905:scc
Can now be optimized to:
s2: %3900,  s1: %3899:scc = s_andn2_b64 %0:exec, %406
p_cbranch_nz %3900:scc

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2020-01-14 21:21:06 +01:00
Timur Kristóf
c0f82165a7 aco: Optimize out s_and with exec, when used on uniform bitwise values.
Previously all booleans needed an s_and with exec when they were turned
into a scalar condition. However, this is not needed for uniform booleans.

v2 by Daniel Schürmann:
- Make the code more readable
v3 by Timur Kristóf:
- Fix regressions, make it work in wave32 mode

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2020-01-14 21:21:06 +01:00
Timur Kristóf
1c44129db3 aco: Don't skip combine_instruction when definitions[1] is used.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2020-01-14 21:21:06 +01:00
Timur Kristóf
d962bbd895 aco: Implement 64-bit constant propagation.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2020-01-14 21:21:06 +01:00
Rhys Perry
f978e0e516 aco: add integer min/max to can_swap_operands
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
2020-01-14 12:56:28 +00:00
Rhys Perry
92ace0bb31 aco: replace extract_vector with copies
Helps a small number of small shaders with situations like this:
a = p_create_vector ...
b = p_extract_vector a, 3
and copy propagation can't be done

Totals from affected shaders:
SGPRS: 14304 -> 14416 (0.78 %)
VGPRS: 8716 -> 6592 (-24.37 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 184664 -> 176888 (-4.21 %) bytes
Max Waves: 6260 -> 6260 (0.00 %)
Instructions: 35561 -> 33617 (-5.47 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
2020-01-14 12:56:28 +00:00
Rhys Perry
e686e4765e aco: add min(-max(), ) and max(-min(), ) optimization
No pipeline-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
2020-01-14 12:56:28 +00:00
Rhys Perry
fa8357eb70 aco: improve clamp optimization
Not sure why it checked the use count, it doesn't apply the constants.

pipeline-db (Navi):
Totals from affected shaders:
SGPRS: 269409 -> 269745 (0.12 %)
VGPRS: 238120 -> 238132 (0.01 %)
Spilled SGPRs: 305 -> 305 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 22908584 -> 22904672 (-0.02 %) bytes
Max Waves: 20217 -> 20217 (0.00 %)
Instructions: 4275312 -> 4263869 (-0.27 %)

pipeline-db (Vega):
Totals from affected shaders:
SGPRS: 155409 -> 155233 (-0.11 %)
VGPRS: 153072 -> 153072 (0.00 %)
Spilled SGPRs: 269 -> 269 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 14650824 -> 14650396 (-0.00 %) bytes
Max Waves: 9609 -> 9609 (0.00 %)
Instructions: 2762802 -> 2755517 (-0.26 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
2020-01-14 12:56:28 +00:00
Rhys Perry
edc888ccb1 aco: fix clamp optimization
We can't do the optimization if there are neg/abs in-between.

No pipeline-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
2020-01-14 12:56:28 +00:00
Rhys Perry
f664cb01ec aco: improve creation of v_madmk_f32/v_madak_f32
Using needs_vop3 check was flawed because it would only combine the
literal if the first operand is the literal. If the second or third
operand is the literal, then needs_vop3 will be true and the literal will
not be combined.

pipeline-db (Navi):
Totals from affected shaders:
SGPRS: 782051 -> 782051 (0.00 %)
VGPRS: 630048 -> 630048 (0.00 %)
Spilled SGPRs: 195 -> 195 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 54743740 -> 54585548 (-0.29 %) bytes
Max Waves: 67340 -> 67340 (0.00 %)
Instructions: 10182030 -> 10182030 (0.00 %)

pipeline-db (Vega):
Totals from affected shaders:
SGPRS: 701990 -> 699590 (-0.34 %)
VGPRS: 566632 -> 566784 (0.03 %)
Spilled SGPRs: 218 -> 218 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 49173564 -> 49007856 (-0.34 %) bytes
Max Waves: 59650 -> 59612 (-0.06 %)
Instructions: 9315135 -> 9293330 (-0.23 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
2020-01-14 12:56:28 +00:00
Rhys Perry
15e25da3e5 aco: take advantage of GFX10's constant bus limit and VOP3 literals
pipeline-db (Navi):
Totals from affected shaders:
SGPRS: 2397159 -> 2392494 (-0.19 %)
VGPRS: 1756036 -> 1753920 (-0.12 %)
Spilled SGPRs: 461 -> 470 (1.95 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 110287304 -> 109946304 (-0.31 %) bytes
Max Waves: 318341 -> 318475 (0.04 %)
Instructions: 21019327 -> 20533618 (-2.31 %)

pipeline-db (Vega):
Totals from affected shaders:
SGPRS: 0 -> 0 (0.00 %)
VGPRS: 0 -> 0 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 0 -> 0 (0.00 %) bytes
Max Waves: 0 -> 0 (0.00 %)
Instructions: 0 -> 0 (0.00 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
2020-01-14 12:56:28 +00:00
Rhys Perry
9c2d37308f aco: allow an extra SGPR with multiple uses to be applied to VOP3
This is in a separate patch from the apply_sgprs() rewrite so that the
rewrite can be more easily tested.

pipeline-db (Navi):
Totals from affected shaders:
SGPRS: 3056 -> 3056 (0.00 %)
VGPRS: 1632 -> 1632 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 156468 -> 156304 (-0.10 %) bytes
Max Waves: 288 -> 288 (0.00 %)
Instructions: 29510 -> 29469 (-0.14 %)

pipeline-db (Vega):
Totals from affected shaders:
SGPRS: 2984 -> 2984 (0.00 %)
VGPRS: 1616 -> 1616 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 156132 -> 155968 (-0.11 %) bytes
Max Waves: 289 -> 289 (0.00 %)
Instructions: 29426 -> 29385 (-0.14 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
2020-01-14 12:56:28 +00:00
Rhys Perry
f4c2c90e1a aco: allow applying two sgprs to an instruction
We could create VALU instructions which read two sgprs, but only if isel
created an instruction which already read one of them.

This change is in a separate patch from the apply_sgprs() rewrite so that
it can be tested if the rewrite affected anything.

pipeline-db (Navi):
Totals from affected shaders:
SGPRS: 216 -> 216 (0.00 %)
VGPRS: 64 -> 64 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 1756 -> 1708 (-2.73 %) bytes
Max Waves: 120 -> 120 (0.00 %)
Instructions: 312 -> 300 (-3.85 %)

pipeline-db (Vega):
Totals from affected shaders:
SGPRS: 216 -> 216 (0.00 %)
VGPRS: 64 -> 64 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 1784 -> 1736 (-2.69 %) bytes
Max Waves: 120 -> 120 (0.00 %)
Instructions: 319 -> 307 (-3.76 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
2020-01-14 12:56:28 +00:00
Rhys Perry
7da07ca3e4 aco: follow through temporary when merging tests into constant comparisons
This can happen with v_mov_b32(s_mov_b32(literal))

pipeline-db (Navi):
Totals from affected shaders:
SGPRS: 632 -> 632 (0.00 %)
VGPRS: 492 -> 492 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 77488 -> 76928 (-0.72 %) bytes
Max Waves: 67 -> 67 (0.00 %)
Instructions: 14426 -> 14332 (-0.65 %)

pipeline-db (Vega):
Totals from affected shaders:
SGPRS: 632 -> 632 (0.00 %)
VGPRS: 492 -> 492 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 77512 -> 76952 (-0.72 %) bytes
Max Waves: 67 -> 67 (0.00 %)
Instructions: 14432 -> 14338 (-0.65 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
2020-01-14 12:56:28 +00:00
Rhys Perry
dc6c35e1c3 aco: be more careful with literals in combine_salu_{n2,lshl_add}
No pipeline-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
2020-01-14 12:56:28 +00:00
Rhys Perry
fcf52eb42d aco: add check_vop3_operands()
This will be useful when taking advantage of GFX10 features.

No pipeline-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
2020-01-14 12:56:28 +00:00
Rhys Perry
cef7879719 aco: rewrite apply_sgprs()
This will make it easier to apply two different sgprs (for GFX10) or apply
the same sgpr twice (just remove the break).

No pipeline-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
2020-01-14 12:56:28 +00:00
Rhys Perry
0be7409069 aco: rewrite literal combining
Should make taking advantage of GFX10's increased constant bus limit and
VOP3 literals easier.

No pipeline-db changes

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
2020-01-14 12:56:28 +00:00
Rhys Perry
84b9f3786b aco: improve can_use_VOP3()
No pipeline-db changes

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
2020-01-14 12:56:28 +00:00
Rhys Perry
3cb98ed939 aco: combine two sgprs into a VALU if they're the same
This was supposed to be done before but it wasn't done correctly and
everywhere.

pipeline-db (Navi):
Totals from affected shaders:
SGPRS: 784680 -> 786128 (0.18 %)
VGPRS: 574012 -> 573892 (-0.02 %)
Spilled SGPRs: 461 -> 461 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 45477088 -> 45478172 (0.00 %) bytes
Max Waves: 81294 -> 81277 (-0.02 %)
Instructions: 8657970 -> 8622483 (-0.41 %)

pipeline-db (Vega):
Totals from affected shaders:
SGPRS: 780664 -> 782072 (0.18 %)
VGPRS: 573880 -> 573760 (-0.02 %)
Spilled SGPRs: 629 -> 629 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 45445244 -> 45448340 (0.01 %) bytes
Max Waves: 81178 -> 81161 (-0.02 %)
Instructions: 8649902 -> 8614918 (-0.40 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
2020-01-14 12:56:28 +00:00
Rhys Perry
c240c1aecf aco: apply literals to split mads
Removing the return is also needed to apply literals to mads (which can be
done on GFX10).

pipeline-db (Navi):
Totals from affected shaders:
SGPRS: 368787 -> 367555 (-0.33 %)
VGPRS: 312436 -> 312448 (0.00 %)
Spilled SGPRs: 461 -> 461 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 26113388 -> 26098260 (-0.06 %) bytes
Max Waves: 35982 -> 35982 (0.00 %)
Instructions: 5038670 -> 5028941 (-0.19 %)

pipeline-db (Vega):
Totals from affected shaders:
SGPRS: 369843 -> 368659 (-0.32 %)
VGPRS: 317224 -> 317196 (-0.01 %)
Spilled SGPRs: 629 -> 629 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 26310540 -> 26295156 (-0.06 %) bytes
Max Waves: 36324 -> 36326 (0.01 %)
Instructions: 5073957 -> 5064164 (-0.19 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
2020-01-14 12:56:28 +00:00
Rhys Perry
809c8feb92 aco: check if multiplication/clamp is live when applying output modifier
It's possible that a multiplication/clamp is dead code and the single use
is from a different user.

Fixes portal rendering in Path of Exile when global illumination is
enabled.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Fixes: 93c8ebfa78 ('aco: Initial commit of independent AMD compiler')
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
2020-01-13 13:26:43 +00:00
Rhys Perry
ef8abfa790 aco: disable add combining for ds_swizzle_b32
ds_bpermute_b32/ds_permute_b32 are fine, I think

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 93c8ebfa78 ('aco: Initial commit of independent AMD compiler')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
2020-01-13 13:26:43 +00:00
Rhys Perry
69bed1c918 aco: don't DCE atomics with return values
We don't create atomics with definitions if they are not used in NIR, but
our own DCE can remove the uses if an export turns out to be null.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 93c8ebfa78 ('aco: Initial commit of independent AMD compiler')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
2020-01-13 13:26:43 +00:00
Rhys Perry
21eafe30df aco: better handle neg/abs of sgprs
isel/label_instruction currently doesn't create these but we should
probably check anyway.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
2020-01-13 13:26:43 +00:00
Rhys Perry
f29a5a205c aco: check usesModifiers() when identifying a neg/abs
This was fine because a literal used to mean that it didn't use modifiers,
but now VOP3 can take a literal on GFX10.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
2020-01-13 13:26:43 +00:00
Rhys Perry
46fb341b8d aco: handle omod successors with the constant in the first operand
No pipeline-db changes

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
2020-01-13 13:26:43 +00:00
Rhys Perry
7ce244b7d1 aco: handle VOP3 modifiers when combining a constant comparison's NaN test
No pipeline-db changes

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
2020-01-13 13:26:43 +00:00
Daniel Schürmann
8b7a42d6d0 aco: compact aco::span<T> to use uint16_t offset and size instead of pointer and size_t.
This reduces the size of the Instruction base class
from 40 bytes to 16 bytes. No pipelinedb changes.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3332>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3332>
2020-01-10 17:49:18 +00:00
Daniel Schürmann
ffb4790279 aco: compact various Instruction classes
No pipelinedb changes.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3332>
2020-01-10 17:49:18 +00:00
Rhys Perry
6ff92f3d68 aco/wave32: fix comparison optimizations
Previously, they weren't done in wave32.

Totals from affected shaders:
SGPRS: 507726 -> 508006 (0.06 %)
VGPRS: 450340 -> 450268 (-0.02 %)
Spilled SGPRs: 298 -> 298 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 39689708 -> 39384488 (-0.77 %) bytes
Max Waves: 39631 -> 39636 (0.01 %)
Instructions: 7865919 -> 7793650 (-0.92 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-21 12:38:42 +01:00
Daniel Schürmann
8259c97b2d aco: propagate temporaries into expanded vectors
Gives a very slight decrease in code size:
Totals from affected shaders:
Code Size: 1708488 -> 1702768 (-0.33 %) bytes
Max Waves: 2858 -> 2855 (-0.10 %)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
da7ff58835 aco: make 1/2*PI a literal constant on SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
6a586a6006 aco: split read/writelane opcode into VOP2/VOP3 version for SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
caea4bbfdc aco: fix SMEM offsets for SI/CI
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Timur Kristóf
c0dbf42a03 aco/wave32: Change uniform bool optimization to work with wave32.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Rhys Perry
df645fa369 aco: implement VK_KHR_shader_float_controls
This actually supports more of the extension than the LLVM backend but we
can't enable it because ACO doesn't work with all stages yet.

With more of it enabled, some CTS tests fail because our 64-bit sqrt
is very imprecise. I can't find any precision requirements for it
anywhere, so I'm thinking it might be a CTS issue.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-15 17:36:21 +00:00
Rhys Perry
b062b92ab1 aco: don't combine literals into v_cndmask_b32/v_subb/v_addc
No pipeline-db changes

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 93c8ebfa ('aco: Initial commit of independent AMD compiler')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-15 17:36:21 +00:00
Timur Kristóf
9b8dc6929e aco: Optimize out trivial code from uniform bools.
This should remove most of the excess code size that was
introduced by making all booleans per-lane.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-14 17:27:11 +01:00
Timur Kristóf
94e355148f aco: Make sure not to mistakenly propagate 64-bit constants.
ACO's optimizer would try to propagate 64-bit constants, but
does so in such a way that wouldn't work due to how the 64-bit
constants are handled in the IR.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-14 17:27:10 +01:00
Rhys Perry
2c98d79d11 aco: don't propagate vgprs into v_readlane/v_writelane
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 93c8ebfa ('aco: Initial commit of independent AMD compiler')
2019-11-12 17:21:38 +00:00
Rhys Perry
78e3ea9a0f aco: add Instruction::usesModifiers() and add more checks in the optimizer
No pipeline-db changes.

v2: use early-exit for VOP3

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v1)
2019-11-08 00:14:06 +00:00
Rhys Perry
c96289a70e aco: keep can_reorder/barrier when combining addition into SMEM
Affects 30 shaders in the pipeline-db (all youngblood).

Totals from affected shaders:
SGPRS: 2656 -> 2456 (-7.53 %)
VGPRS: 2260 -> 2260 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 240680 -> 240944 (0.11 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 90 -> 90 (0.00 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-22 18:52:29 +00:00
Rhys Perry
bdf47a1273 aco: properly combine additions into ds_write2_b64/ds_read2_b64
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-22 18:52:29 +00:00
Rhys Perry
a400928f4a aco: fix 64-bit p_extract_vector on 32-bit p_create_vector
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-22 18:52:29 +00:00
Daniel Schürmann
4b458b3e8f aco: don't combine minmax3 if there is a neg or abs modifier in between
This fixes a graphical corruption in HotS.
No pipelinedb changes other than that.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-10-17 16:21:19 +00:00
Rhys Perry
45d6c69b99 aco: use can_accept_constant in valu_can_accept_literal
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-11 14:26:58 +00:00