Commit graph

15 commits

Author SHA1 Message Date
Ian Romanick
f5815a003e intel/brw: Use def analysis for simple cases of saturate propagation
I had hoped this would improve compilation performance too. I tried
several different long running fossils, and there was no difference.

Fossil-db results are all over the place from platform to platform.

All of the Tiger Lake shaders hurt for spills and fills are fragment
shaders in rdr2.

shader-db:

All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19734088 -> 19733645 (<.01%)
instructions in affected programs: 71200 -> 70757 (-0.62%)
helped: 186
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 2.38 x̃: 1
helped stats (rel) min: 0.06% max: 2.79% x̄: 0.83% x̃: 0.48%
95% mean confidence interval for instructions value: -2.69 -2.07
95% mean confidence interval for instructions %-change: -0.93% -0.72%
Instructions are helped.

total cycles in shared programs: 916290473 -> 916180971 (-0.01%)
cycles in affected programs: 3403719 -> 3294217 (-3.22%)
helped: 89
HURT: 88
helped stats (abs) min: 1 max: 36685 x̄: 1424.13 x̃: 10
helped stats (rel) min: <.01% max: 26.75% x̄: 1.66% x̃: 0.46%
HURT stats (abs)   min: 1 max: 8750 x̄: 195.98 x̃: 7
HURT stats (rel)   min: <.01% max: 17.12% x̄: 1.57% x̃: 0.19%
95% mean confidence interval for cycles value: -1199.88 -37.43
95% mean confidence interval for cycles %-change: -0.66% 0.56%
Inconclusive result (%-change mean confidence interval includes 0).

fossil-db:

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 151458346 -> 151457413 (-0.00%)
Cycle count: 17202426472 -> 17202406469 (-0.00%); split: -0.00%, +0.00%
Max live registers: 31989626 -> 31989959 (+0.00%); split: -0.00%, +0.00%
Max dispatch width: 5500560 -> 5500384 (-0.00%)

Totals from 479 (0.08% of 628970) affected shaders:
Instrs: 398836 -> 397903 (-0.23%)
Cycle count: 18064565 -> 18044562 (-0.11%); split: -0.40%, +0.29%
Max live registers: 36663 -> 36996 (+0.91%); split: -0.02%, +0.92%
Max dispatch width: 4392 -> 4216 (-4.01%)

Tiger Lake
Totals:
Instrs: 149913036 -> 149912182 (-0.00%); split: -0.00%, +0.00%
Cycle count: 15560086488 -> 15560135139 (+0.00%); split: -0.00%, +0.00%
Spill count: 61241 -> 61251 (+0.02%)
Fill count: 107304 -> 107314 (+0.01%)
Max live registers: 31964752 -> 31965119 (+0.00%); split: -0.00%, +0.00%
Max dispatch width: 5517568 -> 5517248 (-0.01%)

Totals from 486 (0.08% of 628673) affected shaders:
Instrs: 396065 -> 395211 (-0.22%); split: -0.23%, +0.01%
Cycle count: 17677691 -> 17726342 (+0.28%); split: -0.23%, +0.51%
Spill count: 1302 -> 1312 (+0.77%)
Fill count: 3746 -> 3756 (+0.27%)
Max live registers: 37538 -> 37905 (+0.98%); split: -0.02%, +0.99%
Max dispatch width: 4576 -> 4256 (-6.99%)

Ice Lake
Totals:
Instrs: 151348422 -> 151347463 (-0.00%)
Cycle count: 15155678386 -> 15155691726 (+0.00%); split: -0.00%, +0.00%
Fill count: 108114 -> 108111 (-0.00%)
Max live registers: 32444479 -> 32444814 (+0.00%); split: -0.00%, +0.00%
Max dispatch width: 5611288 -> 5611256 (-0.00%)

Totals from 483 (0.08% of 634352) affected shaders:
Instrs: 393333 -> 392374 (-0.24%)
Cycle count: 16706439 -> 16719779 (+0.08%); split: -0.14%, +0.22%
Fill count: 3654 -> 3651 (-0.08%)
Max live registers: 37246 -> 37581 (+0.90%); split: -0.02%, +0.92%
Max dispatch width: 4312 -> 4280 (-0.74%)

Skylake
Totals:
Instrs: 140741190 -> 140734481 (-0.00%); split: -0.00%, +0.00%
Cycle count: 14659096516 -> 14659116346 (+0.00%); split: -0.00%, +0.00%
Max live registers: 31757558 -> 31757725 (+0.00%)
Max dispatch width: 5470040 -> 5469920 (-0.00%)

Totals from 3542 (0.57% of 624449) affected shaders:
Instrs: 3081309 -> 3074600 (-0.22%); split: -0.22%, +0.00%
Cycle count: 228843073 -> 228862903 (+0.01%); split: -0.11%, +0.12%
Max live registers: 304531 -> 304698 (+0.05%)
Max dispatch width: 31016 -> 30896 (-0.39%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>
2024-08-09 14:26:05 -07:00
Ian Romanick
adcce2bba4 intel/brw: Small code refactor in brw_fs_opt_saturate_propagation
This bit of code will have a second use in the next commit.

v2: Fix some broken indentation. Noticed by Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>
2024-08-09 14:26:03 -07:00
Ian Romanick
3d8fea0e09 intel/brw: Don't propagate saturate to an instruction that writes flags
There are two problems.

1. This is not NaN safe. 'add.le.sat dst F, Inf F, -Inf F' has a
   different result than 'add dst F, Inf F, -Inf F; cmp.le null, dst F, 0F'.

2. Ignoring the first problem, this only produces the desired flags
   for LE and G. All other cases can produce the wrong result.

For example, batman_arkham_city_goty.foz 6a63c4caacaa0dae has the
following code:

    mad.ge.f0.0(8)  g51<1>F         g50<8,8,1>F     g46<8,8,1>F     g11<1,1,1>F
    mov.sat(8)      g52<1>F         g51<1,1,0>F
    ...
    (+f0.0) sel(8)  g54<1>UD        g53<8,8,1>UD    0x3f000000UD

Without this commit, the saturate is incorrectly propagated to the MAD.

A similar case exists in witcher_3_dxvk_g2.foz 5b03243be667a275.

There are even worse cases like total_war_warhammer3.dx12vk-g6.foz
78328466761ef7ab and ee920491573860fc. The former has the following
code (and the latter has very similar code):

    mad.l.f0.0(16)  g95<1>F         g93<8,8,1>F     g62<8,8,1>F     g68<1,1,1>F
    ...
    mov.sat(16)     g109<1>F        -g95<1,1,0>F
    ...
    (+f0.0) sel(16) g68<1>UD        g111<1,1,0>UD   g54<1,1,0>UD
    (+f0.0) sel(16) g70<1>UD        g113<1,1,0>UD   g56<1,1,0>UD
    (+f0.0) sel(16) g72<1>UD        g115<1,1,0>UD   g58<1,1,0>UD

Saturate propagation makes a hash of this code:

    mad.sat.l.f0.0(16) g106<1>F     -g93<8,8,1>F    -g62<8,8,1>F    g68<1,1,1>F
    ...
    (+f0.0) sel(16) g70<1>UD        g110<1,1,0>UD   g56<1,1,0>UD
    (+f0.0) sel(16) g72<1>UD        g112<1,1,0>UD   g58<1,1,0>UD
    (+f0.0) sel(16) g68<1>UD        g108<1,1,0>UD   g54<1,1,0>UD

Not only is the saturate incorrectly applied to the MAD, but the MAD
result is negated without changing the conditional modifier to G!

NOTE: Backports of this commit to stable branches may need to be more
like the following commit to elk.

shader-db:

All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19729375 -> 19729377 (<.01%)
instructions in affected programs: 112 -> 114 (1.79%)
helped: 0
HURT: 2

total cycles in shared programs: 916234266 -> 916234288 (<.01%)
cycles in affected programs: 636 -> 658 (3.46%)
helped: 0
HURT: 2

fossil-db:

All Intel platforms had similar results. (Meteor Lake shown)
Totals:
Instrs: 151531594 -> 151531601 (+0.00%)
Cycle count: 17209107419 -> 17209107474 (+0.00%); split: -0.00%, +0.00%

Totals from 6 (0.00% of 630198) affected shaders:
Instrs: 4550 -> 4557 (+0.15%)
Cycle count: 194629 -> 194684 (+0.03%); split: -0.00%, +0.03%

Fixes: 947c828d5c ("i965/fs: Add a saturation propagation optimization pass.")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>
2024-08-09 14:25:57 -07:00
Caio Oliveira
8ba8e33c39 intel/brw: Simplify @file annotations
Doxygen documentation says

> If the file name is omitted (i.e. the line after \file is left
> blank) then the documentation block that contains the \file command will
> belong to the file it is located in.

so we can omit the filename itself when using the annotation.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30168>
2024-07-22 22:48:03 +00:00
Caio Oliveira
71ccf8e4cd intel/brw: Rename fs_reg_* helpers to brw_reg_*
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>
2024-07-03 02:53:19 +00:00
Kenneth Graunke
c18de3f048 intel/brw: Delay liveness calculations in saturate propagation
Wait and see if we actually have a candidate for saturate propagation
before requesting liveness info.  Saves the calculation in the case
where we have nothing to do.

Cuts compile time in Borderlands 3 by -0.304754% +/- 0.194162% (n=25).

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29849>
2024-06-24 19:12:00 -07:00
Caio Oliveira
082735750b intel/brw: Simplify usage of reg immediate helpers
Use fs_reg and don't take the type as argument.  In all uses the type
passed is the type of the register.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27904>
2024-03-01 17:52:09 +00:00
Caio Oliveira
1bd175f458 intel/brw: Pull opt_saturate_propagation out of fs_visitor
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26887>
2024-02-26 20:54:24 +00:00
Oleksii Bozhenko
d5d8bb1dbb brw: fix saturate propagation region overlap range
Fixes: 947c828d5c
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7691

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Oleksii Bozhenko <oleksii.bozhenko@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20206>
2022-12-09 00:39:05 +00:00
Francisco Jerez
ea44de6d8c intel/compiler/fs: Switch liveness analysis to IR analysis framework
This involves wrapping fs_live_variables in a BRW_ANALYSIS object and
hooking it up to invalidate_analysis() so it's properly invalidated.
Seems like a lot of churn but it's fairly straightforward.  The
fs_visitor invalidate_ and calculate_live_intervals() methods are no
longer necessary after this change.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
2020-03-06 10:20:57 -08:00
Juan A. Suarez Romero
b06ae53606 Revert "intel/compiler: split is_partial_write() into two variants"
This reverts commit 40b3abb4d1.

It is not clear that this commit was entirely correct, and unfortunately
it was pushed by error.

CC: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-25 09:19:10 +02:00
Rafael Antognolli
0778748eba intel/fs: Only propagate saturation if exec_size is the same.
Otherwise it could propagate the saturation from a SIMD16 instruction
into a SIMD8 instruction. With that, only part of the destination
register, which is the source of the move with saturation, would have
been updated.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-22 16:53:55 -07:00
Iago Toral Quiroga
40b3abb4d1 intel/compiler: split is_partial_write() into two variants
This function is used in two different scenarios that for 32-bit
instructions are the same, but for 16-bit instructions are not.

One scenario is that in which we are working at a SIMD8 register
level and we need to know if a register is fully defined or written.
This is useful, for example, in the context of liveness analysis or
register allocation, where we work with units of registers.

The other scenario is that in which we want to know if an instruction
is writing a full scalar component or just some subset of it. This is
useful, for example, in the context of some optimization passes
like copy propagation.

For 32-bit instructions (or larger), a SIMD8 dispatch will always write
at least a full SIMD8 register (32B) if the write is not partial. The
function is_partial_write() checks this to determine if we have a partial
write. However, when we deal with 16-bit instructions, that logic disables
some optimizations that should be safe. For example, a SIMD8 16-bit MOV will
only update half of a SIMD register, but it is still a complete write of the
variable for a SIMD8 dispatch, so we should not prevent copy propagation in
this scenario because we don't write all 32 bytes in the SIMD register
or because the write starts at offset 16B (wehere we pack components Y or
W of 16-bit vectors).

This is a problem for SIMD8 executions (VS, TCS, TES, GS) of 16-bit
instructions, which lose a number of optimizations because of this, most
important of which is copy-propagation.

This patch splits is_partial_write() into is_partial_reg_write(), which
represents the current is_partial_write(), useful for things like
liveness analysis, and is_partial_var_write(), which considers
the dispatch size to check if we are writing a full variable (rather
than a full register) to decide if the write is partial or not, which
is what we really want in many optimization passes.

Then the patch goes on and rewrites all uses of is_partial_write() to use
one or the other version. Specifically, we use is_partial_var_write()
in the following places: copy propagation, cmod propagation, common
subexpression elimination, saturate propagation and sel peephole.

Notice that the semantics of is_partial_var_write() exactly match the
current implementation of is_partial_write() for anything that is
32-bit or larger, so no changes are expected for 32-bit instructions.

Tested against ~5000 tests involving 16-bit instructions in CTS produced
the following changes in instruction counts:

            Patched  |     Master    |    %    |
================================================
SIMD8  |    621,900  |    706,721    | -12.00% |
================================================
SIMD16 |     93,252  |     93,252    |   0.00% |
================================================

As expected, the change only affects SIMD8 dispatches.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2019-04-18 11:05:18 +02:00
Matt Turner
a05af1f7b8 i965/fs: Handle negating immediates on MADs when propagating saturates
MADs don't take immediate sources, but we allow them in the IR since it
simplifies a lot of things. I neglected to consider that case.

Fixes: 4009a9ead4 ("i965/fs: Allow saturate propagation to propagate
                      negations into MADs.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103616
Reported-and-Tested-by: Ruslan Kabatsayev <b7.10110111@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-11-21 10:13:07 -08:00
Jason Ekstrand
700bebb958 i965: Move the back-end compiler to src/intel/compiler
Mostly a dummy git mv with a couple of noticable parts:
 - With the earlier header cleanups, nothing in src/intel depends
files from src/mesa/drivers/dri/i965/
 - Both Autoconf and Android builds are addressed. Thanks to Mauro and
Tapani for the fixups in the latter
 - brw_util.[ch] is not really compiler specific, so it's moved to i965.

v2:
 - move brw_eu_defines.h instead of brw_defines.h
 - remove no-longer applicable includes
 - add missing vulkan/ prefix in the Android build (thanks Tapani)

v3:
 - don't list brw_defines.h in src/intel/Makefile.sources (Jason)
 - rebase on top of the oa patches

[Emil Velikov: commit message, various small fixes througout]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:34 +00:00
Renamed from src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp (Browse further)