mirror of https://gitlab.freedesktop.org/mesa/mesa.git (synced 2025-12-25 02:10:11 +01:00)
nir/opt_algebraic: Fuse c - a * b to FMA
Algebraically it is clear that
-(a * b) + c = (-a) * b + c = fma(-a, b, c)
But this is not clear from the NIR:
('fadd', ('fneg', ('fmul', a, b)), c)
Add rules to handle this case specially. Note we don't necessarily want
to solve this by pushing fneg into fmul, because regular opt_algebraic
(not the late pass where FMA fusing happens) specifically pulls fneg out
of fmul to push fneg up multiplication chains.
Noticed in the big glmark2 "terrain" shader, which has a cycle count
reduced by 22% on Mali-G57 thanks to having this pattern a ton and being
FMA bound.
BEFORE: 1249 inst, 16.015625 cycles, 16.015625 fma, ... 632 quadwords
AFTER: 997 inst, 12.437500 cycles, ... 504 quadwords
Results on the same shader on AGX are also quite dramatic:
BEFORE: 1294 inst, 8600 bytes, 50 halfregs, ...
AFTER: 1154 inst, 8040 bytes, 50 halfregs, ...
Similar rules apply for fabs.
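The identities behind the fneg and fabs rules can be spot-checked numerically. A minimal sketch, assuming a toy fma() that keeps the intermediate rounding (a real fused multiply-add skips it, which is exactly why the NIR rules are marked inexact with '~'):

```python
import random

def fma(x, y, z):
    # Toy stand-in for a fused multiply-add. A hardware FMA would not
    # round x * y before adding z; this sketch only checks the algebra.
    return x * y + z

random.seed(0)
for _ in range(1000):
    a, b, c = (random.uniform(-10.0, 10.0) for _ in range(3))
    # c + (-(a * b))  ==  (-a) * b + c  ==  fma(-a, b, c)
    assert c + -(a * b) == fma(-a, b, c)
    # |a * b| == |a| * |b|, so fadd(fabs(fmul(a, b)), c) -> fma(|a|, |b|, c)
    assert c + abs(a * b) == fma(abs(a), abs(b), c)
```

Both equalities hold bit-exactly here because negation and absolute value are exact in IEEE arithmetic; only the fused rounding differs in real hardware.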
v2: Use a loop over the bit sizes (suggested by Emma).
shader-db on Valhall (open + small subset of closed), results on Bifrost
are similar:
total instructions in shared programs: 167975 -> 164970 (-1.79%)
instructions in affected programs: 92642 -> 89637 (-3.24%)
helped: 492
HURT: 25
helped stats (abs) min: 1.0 max: 252.0 x̄: 6.25 x̃: 3
helped stats (rel) min: 0.30% max: 20.18% x̄: 3.21% x̃: 2.91%
HURT stats (abs) min: 1.0 max: 5.0 x̄: 2.80 x̃: 3
HURT stats (rel) min: 0.46% max: 9.09% x̄: 3.89% x̃: 3.37%
95% mean confidence interval for instructions value: -6.95 -4.68
95% mean confidence interval for instructions %-change: -3.08% -2.65%
Instructions are helped.
total cycles in shared programs: 10556.89 -> 10538.98 (-0.17%)
cycles in affected programs: 265.56 -> 247.66 (-6.74%)
helped: 88
HURT: 2
helped stats (abs) min: 0.015625 max: 3.578125 x̄: 0.20 x̃: 0
helped stats (rel) min: 0.65% max: 22.34% x̄: 5.65% x̃: 4.25%
HURT stats (abs) min: 0.0625 max: 0.0625 x̄: 0.06 x̃: 0
HURT stats (rel) min: 8.33% max: 12.50% x̄: 10.42% x̃: 10.42%
95% mean confidence interval for cycles value: -0.28 -0.12
95% mean confidence interval for cycles %-change: -6.30% -4.30%
Cycles are helped.
total fma in shared programs: 1582.42 -> 1535.06 (-2.99%)
fma in affected programs: 871.58 -> 824.22 (-5.43%)
helped: 502
HURT: 9
helped stats (abs) min: 0.015625 max: 3.578125 x̄: 0.09 x̃: 0
helped stats (rel) min: 0.60% max: 25.00% x̄: 5.46% x̃: 4.82%
HURT stats (abs) min: 0.015625 max: 0.0625 x̄: 0.03 x̃: 0
HURT stats (rel) min: 4.35% max: 12.50% x̄: 6.22% x̃: 4.35%
95% mean confidence interval for fma value: -0.11 -0.08
95% mean confidence interval for fma %-change: -5.58% -4.93%
Fma are helped.
total cvt in shared programs: 665.55 -> 665.95 (0.06%)
cvt in affected programs: 61.72 -> 62.12 (0.66%)
helped: 33
HURT: 43
helped stats (abs) min: 0.015625 max: 0.359375 x̄: 0.04 x̃: 0
helped stats (rel) min: 1.01% max: 25.00% x̄: 6.68% x̃: 4.35%
HURT stats (abs) min: 0.015625 max: 0.109375 x̄: 0.04 x̃: 0
HURT stats (rel) min: 0.78% max: 38.46% x̄: 10.85% x̃: 6.90%
95% mean confidence interval for cvt value: -0.01 0.02
95% mean confidence interval for cvt %-change: 0.23% 6.24%
Inconclusive result (value mean confidence interval includes 0).
total quadwords in shared programs: 93376 -> 91736 (-1.76%)
quadwords in affected programs: 25376 -> 23736 (-6.46%)
helped: 169
HURT: 1
helped stats (abs) min: 8.0 max: 128.0 x̄: 9.75 x̃: 8
helped stats (rel) min: 1.52% max: 33.33% x̄: 8.35% x̃: 8.00%
HURT stats (abs) min: 8.0 max: 8.0 x̄: 8.00 x̃: 8
HURT stats (rel) min: 25.00% max: 25.00% x̄: 25.00% x̃: 25.00%
95% mean confidence interval for quadwords value: -11.18 -8.11
95% mean confidence interval for quadwords %-change: -8.95% -7.36%
Quadwords are helped.
total threads in shared programs: 4697 -> 4701 (0.09%)
threads in affected programs: 4 -> 8 (100.00%)
helped: 4
HURT: 0
helped stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for threads value: 1.00 1.00
95% mean confidence interval for threads %-change: 100.00% 100.00%
Threads are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19312>
This commit is contained in:
parent 07bac4094a
commit 45a111c21c
4 changed files with 42 additions and 14 deletions
@@ -2638,15 +2638,39 @@ late_optimizations = [
    # nir_lower_to_source_mods will collapse this, but its existence during the
    # optimization loop can prevent other optimizations.
-   (('fneg', ('fneg', a)), a),
-
-   # re-combine inexact mul+add to ffma. Do this before fsub so that a * b - c
-   # gets combined to fma(a, b, -c).
-   (('~fadd@16', ('fmul(is_only_used_by_fadd)', a, b), c), ('ffma', a, b, c), 'options->fuse_ffma16'),
-   (('~fadd@32', ('fmul(is_only_used_by_fadd)', a, b), c), ('ffma', a, b, c), 'options->fuse_ffma32'),
-   (('~fadd@64', ('fmul(is_only_used_by_fadd)', a, b), c), ('ffma', a, b, c), 'options->fuse_ffma64'),
-   (('~fadd@32', ('fmulz(is_only_used_by_fadd)', a, b), c), ('ffmaz', a, b, c), 'options->fuse_ffma32'),
+   (('fneg', ('fneg', a)), a)
+]
+
+# re-combine inexact mul+add to ffma. Do this before fsub so that a * b - c
+# gets combined to fma(a, b, -c).
+for sz, mulz in itertools.product([16, 32, 64], [False, True]):
+   # fmulz/ffmaz only for fp32
+   if mulz and sz != 32:
+      continue
+
+   # Fuse the correct fmul. Only consider fmuls where the only users are fadd
+   # (or fneg/fabs which are assumed to be propagated away), as a heuristic to
+   # avoid fusing in cases where it's harmful.
+   fmul = ('fmulz' if mulz else 'fmul') + '(is_only_used_by_fadd)'
+   ffma = 'ffmaz' if mulz else 'ffma'
+
+   fadd = '~fadd@{}'.format(sz)
+   option = 'options->fuse_ffma{}'.format(sz)
+
+   late_optimizations.extend([
+      ((fadd, (fmul, a, b), c), (ffma, a, b, c), option),
+
+      ((fadd, ('fneg(is_only_used_by_fadd)', (fmul, a, b)), c),
+       (ffma, ('fneg', a), b, c), option),
+
+      ((fadd, ('fabs(is_only_used_by_fadd)', (fmul, a, b)), c),
+       (ffma, ('fabs', a), ('fabs', b), c), option),
+
+      ((fadd, ('fneg(is_only_used_by_fadd)', ('fabs', (fmul, a, b))), c),
+       (ffma, ('fneg', ('fabs', a)), ('fabs', b), c), option),
+   ])
+
+late_optimizations.extend([
    # Subtractions get lowered during optimization, so we need to recombine them
    (('fadd@8', a, ('fneg', 'b')), ('fsub', 'a', 'b'), 'options->has_fsub'),
    (('fadd@16', a, ('fneg', 'b')), ('fsub', 'a', 'b'), 'options->has_fsub'),
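The itertools.product loop in the hunk above expands to four rule variants. A standalone sketch of just the enumeration, mirroring the filter in the diff:

```python
import itertools

variants = []
for sz, mulz in itertools.product([16, 32, 64], [False, True]):
    # fmulz/ffmaz only exist for fp32, matching the `continue` in the rule loop
    if mulz and sz != 32:
        continue
    fmul = ('fmulz' if mulz else 'fmul') + '(is_only_used_by_fadd)'
    ffma = 'ffmaz' if mulz else 'ffma'
    variants.append(('~fadd@{}'.format(sz), fmul, ffma,
                     'options->fuse_ffma{}'.format(sz)))

# Plain ffma at 16/32/64 bits, plus the fp32-only ffmaz variant.
assert [v[2] for v in variants] == ['ffma', 'ffma', 'ffmaz', 'ffma']
```

Each variant then gets the four fadd patterns (bare, fneg, fabs, fneg-of-fabs), so the loop generates sixteen rules where v1 spelled out four by hand.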
@@ -2823,7 +2847,7 @@ late_optimizations = [
    (('extract_i8', ('extract_u8', a, b), 0), ('extract_i8', a, b)),
    (('extract_u8', ('extract_i8', a, b), 0), ('extract_u8', a, b)),
    (('extract_u8', ('extract_u8', a, b), 0), ('extract_u8', a, b)),
-]
+])

 # A few more extract cases we'd rather leave late
 for N in [16, 32]:
@@ -433,8 +433,12 @@ is_only_used_by_fadd(const nir_alu_instr *instr)
       const nir_alu_instr *const user_alu = nir_instr_as_alu(user_instr);
       assert(instr != user_alu);

-      if (user_alu->op != nir_op_fadd)
+      if (user_alu->op == nir_op_fneg || user_alu->op == nir_op_fabs) {
+         if (!is_only_used_by_fadd(user_alu))
+            return false;
+      } else if (user_alu->op != nir_op_fadd) {
          return false;
+      }
    }

    return true;
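The C change above makes the helper recurse through fneg/fabs users instead of rejecting them outright. As a hypothetical Python miniature of that logic (Node and users are illustrative stand-ins, not the NIR API):

```python
class Node:
    def __init__(self, op):
        self.op = op
        self.users = []

def only_used_by_fadd(node):
    # Mirrors the patched helper: fneg/fabs users are transparent (they are
    # assumed to be propagated away into the fused op); any other non-fadd
    # user blocks fusing.
    for user in node.users:
        if user.op in ('fneg', 'fabs'):
            if not only_used_by_fadd(user):
                return False
        elif user.op != 'fadd':
            return False
    return True

mul = Node('fmul')
neg = Node('fneg')
add = Node('fadd')
mul.users.append(neg)   # fneg(fmul(a, b)) ...
neg.users.append(add)   # ... consumed only by an fadd
assert only_used_by_fadd(mul)       # fusable as ffma(-a, b, c)

store = Node('store')   # hypothetical unrelated user
mul.users.append(store)
assert not only_used_by_fadd(mul)   # the extra user blocks fusing
```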
@@ -30,7 +30,7 @@ traces:
       checksum: 32e8b627a33ad08d416dfdb804920371
   0ad/0ad-v2.trace:
     gl-virgl:
-      checksum: bf22fd7c3fc8baa7b0e9345728626d5f
+      checksum: 638fa405f78a6631ba829a8fc98392a6
   glmark2/buffer:update-fraction=0.5:update-dispersion=0.9:columns=200:update-method=map:interleave=false-v2.trace:
     gl-virgl:
       checksum: 040232e01e394a967dc3320bb9252870
@@ -42,7 +42,7 @@ traces:
       checksum: df21895268db3ab185ae5ffa5b2d7f37
   glmark2/bump:bump-render=height-v2.trace:
     gl-virgl:
-      checksum: cd32f46925906c53fae747372a8f2ed8
+      checksum: cceb2b8d4852b94709684b69c688638c
   glmark2/bump:bump-render=high-poly-v2.trace:
     gl-virgl:
       checksum: 11b7a4820b452934e6f12b57b8910a9a
@@ -126,7 +126,7 @@ traces:
       label: [crash]
   gputest/pixmark-julia-fp32-v2.trace:
     gl-virgl:
-      checksum: 0aa3a82a5b849cb83436e52c4e3e95ac
+      checksum: fbf5e44a6f46684b84e5bb5ad6d36c67
   gputest/pixmark-julia-fp64-v2.trace:
     gl-virgl:
       checksum: 1760aea00af985b8cd902128235b08f6
@@ -123,7 +123,7 @@ traces:
       label: [crash]
   gputest/pixmark-julia-fp32-v2.trace:
     gl-virgl:
-      checksum: 25f938c726c68c08a88193f28f7c4474
+      checksum: 8b3584b1dd8f1d1bb63205564bd78e4e
   gputest/pixmark-julia-fp64-v2.trace:
     gl-virgl:
       checksum: 73ccaff82ea764057fb0f93f0024cf84
@@ -183,7 +183,7 @@ traces:
       checksum: f4af4067b37c00861fa5911e4c0a6629
   supertuxkart/supertuxkart-mansion-egl-gles-v2.trace:
     gl-virgl:
-      checksum: 092e8ca38e58aaa83df2a9f0b7b8aee5
+      checksum: cc7092975dd6c9064aa54cd7f18053b6
   xonotic/xonotic-keybench-high-v2.trace:
     gl-virgl:
       checksum: f3b184bf8858a6ebccd09e7ca032197e