Commit graph

21 commits

Author SHA1 Message Date
Yevhenii Kolesnikov
36abb0c691 intel/compiler: don't propagate cmp to add if add is saturated
From the Kaby Lake PRM Vol. 7 "Assigning Conditional Flags":

   * Note that the [post condition signal] bits generated at
     the output of a compute are before the .sat.

Paragraph about post_zero does not mention saturation, but
testing it on actual GPUs shows that conditional modifiers
are applied after saturation.

   * post_zero bit: This bit reflects whether the final
     result is zero after all the clamping, normalizing,
     or format conversion logic.

For signed types we don't care about saturation: it won't
change the result of conditional modifier.

For floating and unsigned types there two special cases,
when we can remove inst even if scan_inst is saturated: G
and LE. Since conditional modifiers are just comparations
against zero, saturating positive values to the upper
limit never changes the result of comparation.

For negative values:
(sat(x) >  0) == (x >  0) --- false
(sat(x) <= 0) == (x <= 0) --- true

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2610

Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4167>
2020-07-11 00:25:48 +00:00
Yevhenii Kolesnikov
32b7ba66b0 intel/compiler: fix cmod propagation optimisations
Knowing following:
 - CMP writes to flag register the result of
   applying cmod to the `src0 - src1`.
   After that it stores the same value to dst.
   Other instructions first store their result to
   dst, and then store cmod(dst) to the flag
   register.
 - inst is either CMP or MOV
 - inst->dst is null
 - inst->src[0] overlaps with scan_inst->dst
 - inst->src[1] is zero
 - scan_inst wrote to a flag register

There can be three possible paths:

 - scan_inst is CMP:

   Considering that src0 is either 0x0 (false),
   or 0xffffffff (true), and src1 is 0x0:

   - If inst's cmod is NZ, we can always remove
     scan_inst: NZ is invariant for false and true. This
     holds even if src0 is NaN: .nz is the only cmod,
     that returns true for NaN.

   - .g is invariant if src0 has a UD type

   - .l is invariant if src0 has a D type

 - scan_inst and inst have the same cmod:

   If scan_inst is anything than CMP, it already
   wrote the appropriate value to the flag register.

 - else:

   We can change cmod of scan_inst to that of inst,
   and remove inst. It is valid as long as we make
   sure that no instruction uses the flag register
   between scan_inst and inst.

Nine new cmod_propagation unit tests:
 - cmp_cmpnz
 - cmp_cmpg
 - plnnz_cmpnz
 - plnnz_cmpz (*)
 - plnnz_sel_cmpz
 - cmp_cmpg_D
 - cmp_cmpg_UD (*)
 - cmp_cmpl_D (*)
 - cmp_cmpl_UD

(*) this would fail without changes to brw_fs_cmod_propagation.

This fixes optimisation that used to be illegal (see issue #2154)

= Before =
 0: linterp.z.f0.0(8) vgrf0:F, g2:F, attr0<0>:F
 1: cmp.nz.f0.0(8) null:F, vgrf0:F, 0f
= After =
 0: linterp.z.f0.0(8) vgrf0:F, g2:F, attr0<0>:F

Now it is optimised as such (note change of cmod in line 0):

= Before =
 0: linterp.z.f0.0(8) vgrf0:F, g2:F, attr0<0>:F
 1: cmp.nz.f0.0(8) null:F, vgrf0:F, 0f
= After =
 0: linterp.nz.f0.0(8) vgrf0:F, g2:F, attr0<0>:F

No shaderdb changes

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2154

Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3348>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3348>
2020-03-11 21:21:25 +00:00
Francisco Jerez
ab6d792986 intel/compiler: Pass detailed dependency classes to invalidate_analysis()
Have fun reading through the whole back-end optimizer to verify
whether I've missed any dependency flags -- Or alternatively, just
trust that any mistake here will trigger an assertion failure during
analysis pass validation if it ever poses a problem for the
consistency of any of the analysis passes managed by the framework.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
2020-03-06 10:20:39 -08:00
Francisco Jerez
d966a6b4c4 intel/compiler: Introduce backend_shader method to propagate IR changes to analysis passes
The invalidate_analysis() method knows what analysis passes there are
in the back-end and calls their invalidate() method to report changes
in the IR.  For the moment it just calls invalidate_live_intervals()
(which will eventually be fully replaced by this function) if anything
changed.

This makes all optimization passes invalidate DEPENDENCY_EVERYTHING,
which is clearly far from ideal -- The dependency classes passed to
invalidate_analysis() will be refined in a future commit.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
2020-03-06 10:20:32 -08:00
Ian Romanick
e13a5c7d67 intel/fs: Allow cmod propagation across reads and writes of different flags
This also helps a later patch (intel/fs: Improve discard_if code
generation) on about 200 shaders.

v2: Document that other instruction sequences are also valid in
subtract_merge_with_compare_intervening_mismatch_flag_write.  Suggested
by Caio.

All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224438 -> 17224434 (<.01%)
instructions in affected programs: 296 -> 292 (-1.35%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.99% max: 1.92% x̄: 1.43% x̃: 1.40%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -2.04% -0.81%
Instructions are helped.

total cycles in shared programs: 361468455 -> 361468458 (<.01%)
cycles in affected programs: 2862 -> 2865 (0.10%)
helped: 2
HURT: 2
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.24% max: 0.39% x̄: 0.31% x̃: 0.31%
HURT stats (abs)   min: 3 max: 4 x̄: 3.50 x̃: 3
HURT stats (rel)   min: 0.32% max: 0.70% x̄: 0.51% x̃: 0.51%
95% mean confidence interval for cycles value: -4.34 5.84
95% mean confidence interval for cycles %-change: -0.70% 0.90%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-05 17:03:45 -07:00
Ian Romanick
8030cb75c1 intel/fs: Fix flag_subreg handling in cmod propagation
There were two errors.  First, the pass could propagate conditional
modifiers from an instruction that writes on flag register to an
instruction that writes a different flag register.  For example,

    cmp.nz.f0.0(16) null:F, vgrf6:F, vgrf5:F
    cmp.nz.f0.1(16) null:F, vgrf6:F, vgrf5:F

could be come

    cmp.nz.f0.0(16) null:F, vgrf6:F, vgrf5:F

Second, if an instruction writes f0.1 has it's condition propagated, the
modified instruction will incorrectly write flag f0.0.  For example,

    linterp(16) vgrf6:F, g2:F, attr0:F
    cmp.z.f0.1(16) null:F, vgrf6:F, vgrf5:F
    (-f0.1) discard_jump(16) (null):UD

could become

    linterp.z.f0.0(16) vgrf6:F, g2:F, attr0:F
    (-f0.1) discard_jump(16) (null):UD

None of these cases will occur currently.  The only time we use f0.1 is
for generating discard intrinsics.  In all those cases, we generate a
squence like:

    cmp.nz.f0.0(16) vgrf7:F, vgrf6:F, vgrf5:F
    (+f0.1) cmp.z(16) null:D, vgrf7:D, 0d
    (-f0.1) discard_jump(16) (null):UD

Due to the mixed types and incompatible conditions, this sequence would
never see any cmod propagation.  The next patch will change this.

No shader-db changes on any Intel platform.

v2: Fix typo in comment in test case subtract_delete_compare_other_flag.
Noticed by Caio.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-05 17:03:40 -07:00
Ian Romanick
a79570099b intel/fs: Allow cmod propagation to instructions with saturate modifier
v2: Add unit tests.  Suggested by Matt.

All Intel GPUs had similar results. (Ice Lake shown)
total instructions in shared programs: 17229441 -> 17228658 (<.01%)
instructions in affected programs: 159574 -> 158791 (-0.49%)
helped: 489
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.60 x̃: 1
helped stats (rel) min: 0.07% max: 2.70% x̄: 0.61% x̃: 0.59%
95% mean confidence interval for instructions value: -1.72 -1.48
95% mean confidence interval for instructions %-change: -0.64% -0.58%
Instructions are helped.

total cycles in shared programs: 360944149 -> 360937144 (<.01%)
cycles in affected programs: 1072195 -> 1065190 (-0.65%)
helped: 254
HURT: 27
helped stats (abs) min: 2 max: 234 x̄: 30.51 x̃: 9
helped stats (rel) min: 0.04% max: 8.99% x̄: 0.75% x̃: 0.24%
HURT stats (abs)   min: 2 max: 83 x̄: 27.56 x̃: 24
HURT stats (rel)   min: 0.09% max: 3.79% x̄: 1.28% x̃: 1.16%
95% mean confidence interval for cycles value: -30.11 -19.75
95% mean confidence interval for cycles %-change: -0.70% -0.41%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
2019-05-14 11:38:21 -07:00
Juan A. Suarez Romero
b06ae53606 Revert "intel/compiler: split is_partial_write() into two variants"
This reverts commit 40b3abb4d1.

It is not clear that this commit was entirely correct, and unfortunately
it was pushed by error.

CC: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-25 09:19:10 +02:00
Iago Toral Quiroga
ddd1706ab3 intel/compiler: fix cmod propagation for non 32-bit types
v2:
 - Do not propagate if the bit-size changes

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-18 11:05:18 +02:00
Iago Toral Quiroga
40b3abb4d1 intel/compiler: split is_partial_write() into two variants
This function is used in two different scenarios that for 32-bit
instructions are the same, but for 16-bit instructions are not.

One scenario is that in which we are working at a SIMD8 register
level and we need to know if a register is fully defined or written.
This is useful, for example, in the context of liveness analysis or
register allocation, where we work with units of registers.

The other scenario is that in which we want to know if an instruction
is writing a full scalar component or just some subset of it. This is
useful, for example, in the context of some optimization passes
like copy propagation.

For 32-bit instructions (or larger), a SIMD8 dispatch will always write
at least a full SIMD8 register (32B) if the write is not partial. The
function is_partial_write() checks this to determine if we have a partial
write. However, when we deal with 16-bit instructions, that logic disables
some optimizations that should be safe. For example, a SIMD8 16-bit MOV will
only update half of a SIMD register, but it is still a complete write of the
variable for a SIMD8 dispatch, so we should not prevent copy propagation in
this scenario because we don't write all 32 bytes in the SIMD register
or because the write starts at offset 16B (wehere we pack components Y or
W of 16-bit vectors).

This is a problem for SIMD8 executions (VS, TCS, TES, GS) of 16-bit
instructions, which lose a number of optimizations because of this, most
important of which is copy-propagation.

This patch splits is_partial_write() into is_partial_reg_write(), which
represents the current is_partial_write(), useful for things like
liveness analysis, and is_partial_var_write(), which considers
the dispatch size to check if we are writing a full variable (rather
than a full register) to decide if the write is partial or not, which
is what we really want in many optimization passes.

Then the patch goes on and rewrites all uses of is_partial_write() to use
one or the other version. Specifically, we use is_partial_var_write()
in the following places: copy propagation, cmod propagation, common
subexpression elimination, saturate propagation and sel peephole.

Notice that the semantics of is_partial_var_write() exactly match the
current implementation of is_partial_write() for anything that is
32-bit or larger, so no changes are expected for 32-bit instructions.

Tested against ~5000 tests involving 16-bit instructions in CTS produced
the following changes in instruction counts:

            Patched  |     Master    |    %    |
================================================
SIMD8  |    621,900  |    706,721    | -12.00% |
================================================
SIMD16 |     93,252  |     93,252    |   0.00% |
================================================

As expected, the change only affects SIMD8 dispatches.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2019-04-18 11:05:18 +02:00
Ian Romanick
c9d5bd050c intel/fs: Relax type matching rules in cmod propagation from MOV instructions
To allow cmod propagation from a MOV in a sequence like:

    and(16)         g31<1>UD       g20<8,8,1>UD   g22<8,8,1>UD
    mov.nz.f0(16)   null<1>F       g31<8,8,1>D

A similar change to the vec4 backend had no effect.

Somewhere between c1ec582059 and 40fc4b5acd (1,094 commits) the
effectiveness of this patch diminished, and as of commit d7e0d47b9d
(nir: Add a bunch of b2[if] optimizations) this optimization no longer
has any effect on any platform.

A later patch "intel/fs: Use De Morgan's laws to avoid logical-not of a
logic result on Gen8+," generates some instruction sequences that
require this change in order for cmod propagation to make progress.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Matt Turner
2dff9a66b6 intel/compiler: Avoid propagating inequality cmods if types are different
v2: Fix silly bug in logic.  s/||/&&/

All but one of the affected shaders is in an Unreal4 demo.  The other is
in Tomb Raider.  All of the cases that Ian investigated appear to be
sequences like the following

    if (int(uint(some_float)) < 0) /* other relations too */
        ...

At least in Tomb Raider, it's not obvious that this sequence came from
the original shader.

In some of the Unreal demos, the shader contains code like

    if (int(uint(textureLod(...))) > 0)
        ...

which explicitly generates the offending sequence.

All Gen6+ platforms had similar results (Skylake shown):
total instructions in shared programs: 15437170 -> 15437187 (<.01%)
instructions in affected programs: 4492 -> 4509 (0.38%)
helped: 0
HURT: 17
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.05% max: 0.73% x̄: 0.66% x̃: 0.73%
95% mean confidence interval for instructions value: 1.00 1.00
95% mean confidence interval for instructions %-change: 0.57% 0.75%
Instructions are HURT.

total cycles in shared programs: 383007996 -> 383007992 (<.01%)
cycles in affected programs: 20542 -> 20538 (-0.02%)
helped: 6
HURT: 7
helped stats (abs) min: 2 max: 6 x̄: 5.33 x̃: 6
helped stats (rel) min: 0.11% max: 0.36% x̄: 0.32% x̃: 0.36%
HURT stats (abs)   min: 4 max: 4 x̄: 4.00 x̃: 4
HURT stats (rel)   min: 0.27% max: 0.27% x̄: 0.27% x̃: 0.27%
95% mean confidence interval for cycles value: -3.30 2.69
95% mean confidence interval for cycles %-change: -0.19% 0.19%
Inconclusive result (value mean confidence interval includes 0).

No changes on Iron Lake or GM45.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109404
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: nagrigoriadis@gmail.com
Tested-by: Danylo Piliaiev <danylo.piliaiev@gmail.com>
2019-02-15 11:11:02 -08:00
Ian Romanick
df9dbc03d3 i965/fs: Don't propagate conditional modifiers from integer compares to adds
No shader-db changes on any Intel platform... which probably explains
why no bugs have been bisected to this problem since it landed in Mesa
18.1. :( The commit mentioned below is in 18.2, so 18.1 would need a
slightly different fix (due to code refactoring).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes: 77f269bb56 "i965/fs: Refactor propagation of conditional modifiers from compares to adds"
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (reviewed the original patch)
Cc: Matt Turner <mattst88@gmail.com> (reviewed the original patch)
2018-09-17 00:38:22 -07:00
Ian Romanick
4467040cb6 i965/fs: Propagate conditional modifiers from not instructions
Skylake
total instructions in shared programs: 14399081 -> 14399010 (<.01%)
instructions in affected programs: 26961 -> 26890 (-0.26%)
helped: 57
HURT: 0
helped stats (abs) min: 1 max: 6 x̄: 1.25 x̃: 1
helped stats (rel) min: 0.16% max: 0.80% x̄: 0.30% x̃: 0.18%
95% mean confidence interval for instructions value: -1.50 -0.99
95% mean confidence interval for instructions %-change: -0.35% -0.25%
Instructions are helped.

total cycles in shared programs: 532978307 -> 532976050 (<.01%)
cycles in affected programs: 468629 -> 466372 (-0.48%)
helped: 33
HURT: 20
helped stats (abs) min: 3 max: 360 x̄: 116.52 x̃: 98
helped stats (rel) min: 0.06% max: 3.63% x̄: 1.66% x̃: 1.27%
HURT stats (abs)   min: 2 max: 172 x̄: 79.40 x̃: 43
HURT stats (rel)   min: 0.04% max: 3.02% x̄: 1.48% x̃: 0.44%
95% mean confidence interval for cycles value: -81.29 -3.88
95% mean confidence interval for cycles %-change: -1.07% 0.12%
Inconclusive result (%-change mean confidence interval includes 0).

All Gen6+ platforms, except Ivy Bridge, had similar results. (Haswell shown)
total instructions in shared programs: 12973897 -> 12973838 (<.01%)
instructions in affected programs: 25970 -> 25911 (-0.23%)
helped: 55
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.07 x̃: 1
helped stats (rel) min: 0.16% max: 0.62% x̄: 0.28% x̃: 0.18%
95% mean confidence interval for instructions value: -1.14 -1.00
95% mean confidence interval for instructions %-change: -0.32% -0.24%
Instructions are helped.

total cycles in shared programs: 410355841 -> 410352067 (<.01%)
cycles in affected programs: 578454 -> 574680 (-0.65%)
helped: 47
HURT: 5
helped stats (abs) min: 3 max: 360 x̄: 85.74 x̃: 18
helped stats (rel) min: 0.05% max: 3.68% x̄: 1.18% x̃: 0.38%
HURT stats (abs)   min: 2 max: 242 x̄: 51.20 x̃: 4
HURT stats (rel)   min: <.01% max: 0.45% x̄: 0.15% x̃: 0.11%
95% mean confidence interval for cycles value: -104.89 -40.27
95% mean confidence interval for cycles %-change: -1.45% -0.66%
Cycles are helped.

Ivy Bridge
total instructions in shared programs: 11679351 -> 11679301 (<.01%)
instructions in affected programs: 28208 -> 28158 (-0.18%)
helped: 50
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.12% max: 0.54% x̄: 0.23% x̃: 0.16%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.27% -0.19%
Instructions are helped.

total cycles in shared programs: 257445362 -> 257444662 (<.01%)
cycles in affected programs: 419338 -> 418638 (-0.17%)
helped: 40
HURT: 3
helped stats (abs) min: 1 max: 170 x̄: 65.05 x̃: 24
helped stats (rel) min: 0.02% max: 3.51% x̄: 1.26% x̃: 0.41%
HURT stats (abs)   min: 2 max: 1588 x̄: 634.00 x̃: 312
HURT stats (rel)   min: 0.05% max: 2.97% x̄: 1.21% x̃: 0.62%
95% mean confidence interval for cycles value: -97.96 65.41
95% mean confidence interval for cycles %-change: -1.56% -0.62%
Inconclusive result (value mean confidence interval includes 0).

No changes on Iron Lake or GM45.

v2: Move 'if (cond != BRW_CONDITIONAL_Z && cond != BRW_CONDITIONAL_NZ)'
check outside the loop.  Suggested by Iago.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-15 17:22:27 -07:00
Ian Romanick
f2d8bb7a7b i965/fs: Rearrange code to remove most of the gotos
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-15 17:22:27 -07:00
Ian Romanick
77f269bb56 i965/fs: Refactor propagation of conditional modifiers from compares to adds
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-15 17:22:27 -07:00
Ian Romanick
020b0055e7 i965/fs: Propagate conditional modifiers from compares to adds
The math inside the add and the cmp in this instruction sequence is the
same.  We can utilize this to eliminate the compare.

add(8)          g5<1>F          g2<8,8,1>F      g64.5<0,1,0>F   { align1 1Q compacted };
cmp.z.f0(8)     null<1>F        g2<8,8,1>F      -g64.5<0,1,0>F  { align1 1Q switch };
(-f0) sel(8)    g8<1>F          (abs)g5<8,8,1>F 3e-37F          { align1 1Q };

This is reduced to:

add.z.f0(8)     g5<1>F          g2<8,8,1>F      g64.5<0,1,0>F   { align1 1Q compacted };
(-f0) sel(8)    g8<1>F          (abs)g5<8,8,1>F 3e-37F          { align1 1Q };

This optimization pass could do even better.  The nature of converting
vectorized code from the GLSL front end to scalar code in NIR results in
sequences like:

add(8)          g7<1>F          g4<8,8,1>F      g64.5<0,1,0>F   { align1 1Q compacted };
add(8)          g6<1>F          g3<8,8,1>F      g64.5<0,1,0>F   { align1 1Q compacted };
add(8)          g5<1>F          g2<8,8,1>F      g64.5<0,1,0>F   { align1 1Q compacted };
cmp.z.f0(8)     null<1>F        g2<8,8,1>F      -g64.5<0,1,0>F  { align1 1Q switch };
(-f0) sel(8)    g8<1>F          (abs)g5<8,8,1>F 3e-37F          { align1 1Q };
cmp.z.f0(8)     null<1>F        g3<8,8,1>F      -g64.5<0,1,0>F  { align1 1Q switch };
(-f0) sel(8)    g10<1>F         (abs)g6<8,8,1>F 3e-37F          { align1 1Q };
cmp.z.f0(8)     null<1>F        g4<8,8,1>F      -g64.5<0,1,0>F  { align1 1Q switch };
(-f0) sel(8)    g12<1>F         (abs)g7<8,8,1>F 3e-37F          { align1 1Q };

In this sequence, only the first cmp.z is removed.  With different
scheduling, all 3 could get removed.

Skylake
total instructions in shared programs: 14407009 -> 14400173 (-0.05%)
instructions in affected programs: 1307274 -> 1300438 (-0.52%)
helped: 4880
HURT: 0
helped stats (abs) min: 1 max: 33 x̄: 1.40 x̃: 1
helped stats (rel) min: 0.03% max: 8.70% x̄: 0.70% x̃: 0.52%
95% mean confidence interval for instructions value: -1.45 -1.35
95% mean confidence interval for instructions %-change: -0.72% -0.69%
Instructions are helped.

total cycles in shared programs: 532943169 -> 532923528 (<.01%)
cycles in affected programs: 14065798 -> 14046157 (-0.14%)
helped: 2703
HURT: 339
helped stats (abs) min: 1 max: 1062 x̄: 12.27 x̃: 2
helped stats (rel) min: <.01% max: 28.72% x̄: 0.38% x̃: 0.21%
HURT stats (abs)   min: 1 max: 739 x̄: 39.86 x̃: 12
HURT stats (rel)   min: 0.02% max: 27.69% x̄: 1.38% x̃: 0.41%
95% mean confidence interval for cycles value: -8.66 -4.26
95% mean confidence interval for cycles %-change: -0.24% -0.14%
Cycles are helped.

LOST:   0
GAINED: 1

Broadwell
total instructions in shared programs: 14719636 -> 14712949 (-0.05%)
instructions in affected programs: 1288188 -> 1281501 (-0.52%)
helped: 4845
HURT: 0
helped stats (abs) min: 1 max: 33 x̄: 1.38 x̃: 1
helped stats (rel) min: 0.03% max: 8.00% x̄: 0.70% x̃: 0.52%
95% mean confidence interval for instructions value: -1.43 -1.33
95% mean confidence interval for instructions %-change: -0.72% -0.68%
Instructions are helped.

total cycles in shared programs: 559599253 -> 559581699 (<.01%)
cycles in affected programs: 13315565 -> 13298011 (-0.13%)
helped: 2600
HURT: 269
helped stats (abs) min: 1 max: 2128 x̄: 12.24 x̃: 2
helped stats (rel) min: <.01% max: 23.95% x̄: 0.41% x̃: 0.20%
HURT stats (abs)   min: 1 max: 790 x̄: 53.07 x̃: 20
HURT stats (rel)   min: 0.02% max: 15.96% x̄: 1.55% x̃: 0.75%
95% mean confidence interval for cycles value: -8.47 -3.77
95% mean confidence interval for cycles %-change: -0.27% -0.18%
Cycles are helped.

LOST:   0
GAINED: 8

Haswell
total instructions in shared programs: 12978609 -> 12973483 (-0.04%)
instructions in affected programs: 932921 -> 927795 (-0.55%)
helped: 3480
HURT: 0
helped stats (abs) min: 1 max: 33 x̄: 1.47 x̃: 1
helped stats (rel) min: 0.03% max: 7.84% x̄: 0.78% x̃: 0.58%
95% mean confidence interval for instructions value: -1.53 -1.42
95% mean confidence interval for instructions %-change: -0.80% -0.75%
Instructions are helped.

total cycles in shared programs: 410270788 -> 410250531 (<.01%)
cycles in affected programs: 10986161 -> 10965904 (-0.18%)
helped: 2087
HURT: 254
helped stats (abs) min: 1 max: 2672 x̄: 14.63 x̃: 4
helped stats (rel) min: <.01% max: 39.61% x̄: 0.42% x̃: 0.21%
HURT stats (abs)   min: 1 max: 519 x̄: 40.49 x̃: 16
HURT stats (rel)   min: 0.01% max: 12.83% x̄: 1.20% x̃: 0.47%
95% mean confidence interval for cycles value: -12.82 -4.49
95% mean confidence interval for cycles %-change: -0.31% -0.18%
Cycles are helped.

LOST:   0
GAINED: 5

Ivy Bridge
total instructions in shared programs: 11686082 -> 11681548 (-0.04%)
instructions in affected programs: 937696 -> 933162 (-0.48%)
helped: 3150
HURT: 0
helped stats (abs) min: 1 max: 33 x̄: 1.44 x̃: 1
helped stats (rel) min: 0.03% max: 7.84% x̄: 0.69% x̃: 0.49%
95% mean confidence interval for instructions value: -1.49 -1.38
95% mean confidence interval for instructions %-change: -0.71% -0.67%
Instructions are helped.

total cycles in shared programs: 257514962 -> 257492471 (<.01%)
cycles in affected programs: 11524149 -> 11501658 (-0.20%)
helped: 1970
HURT: 239
helped stats (abs) min: 1 max: 3525 x̄: 17.48 x̃: 3
helped stats (rel) min: <.01% max: 49.60% x̄: 0.46% x̃: 0.17%
HURT stats (abs)   min: 1 max: 1358 x̄: 50.00 x̃: 15
HURT stats (rel)   min: 0.02% max: 59.88% x̄: 1.84% x̃: 0.65%
95% mean confidence interval for cycles value: -17.01 -3.35
95% mean confidence interval for cycles %-change: -0.33% -0.08%
Cycles are helped.

LOST:   9
GAINED: 1

Sandy Bridge
total instructions in shared programs: 10432841 -> 10429893 (-0.03%)
instructions in affected programs: 685071 -> 682123 (-0.43%)
helped: 2453
HURT: 0
helped stats (abs) min: 1 max: 9 x̄: 1.20 x̃: 1
helped stats (rel) min: 0.02% max: 7.55% x̄: 0.64% x̃: 0.46%
95% mean confidence interval for instructions value: -1.23 -1.17
95% mean confidence interval for instructions %-change: -0.67% -0.62%
Instructions are helped.

total cycles in shared programs: 146133660 -> 146134195 (<.01%)
cycles in affected programs: 3991634 -> 3992169 (0.01%)
helped: 1237
HURT: 153
helped stats (abs) min: 1 max: 2853 x̄: 6.93 x̃: 2
helped stats (rel) min: <.01% max: 29.00% x̄: 0.24% x̃: 0.14%
HURT stats (abs)   min: 1 max: 1740 x̄: 59.56 x̃: 12
HURT stats (rel)   min: 0.03% max: 78.98% x̄: 1.96% x̃: 0.42%
95% mean confidence interval for cycles value: -5.13 5.90
95% mean confidence interval for cycles %-change: -0.17% 0.16%
Inconclusive result (value mean confidence interval includes 0).

LOST:   0
GAINED: 1

GM45 and Iron Lake had similar results (GM45 shown):
total instructions in shared programs: 4800332 -> 4798380 (-0.04%)
instructions in affected programs: 565995 -> 564043 (-0.34%)
helped: 1451
HURT: 0
helped stats (abs) min: 1 max: 20 x̄: 1.35 x̃: 1
helped stats (rel) min: 0.05% max: 5.26% x̄: 0.47% x̃: 0.31%
95% mean confidence interval for instructions value: -1.40 -1.29
95% mean confidence interval for instructions %-change: -0.50% -0.45%
Instructions are helped.

total cycles in shared programs: 122032318 -> 122027798 (<.01%)
cycles in affected programs: 8334868 -> 8330348 (-0.05%)
helped: 1029
HURT: 1
helped stats (abs) min: 2 max: 40 x̄: 4.43 x̃: 2
helped stats (rel) min: <.01% max: 1.83% x̄: 0.09% x̃: 0.04%
HURT stats (abs)   min: 38 max: 38 x̄: 38.00 x̃: 38
HURT stats (rel)   min: 0.25% max: 0.25% x̄: 0.25% x̃: 0.25%
95% mean confidence interval for cycles value: -4.70 -4.08
95% mean confidence interval for cycles %-change: -0.09% -0.08%
Cycles are helped.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-03-26 08:50:43 -07:00
Ian Romanick
5bbb3d60d3 i965/fs: Allow cmod propagation when src0 is a uniform or shader input
No shader-db changes.  This source must have been written by a previous
instruction, so it cannot be a uniform or a shader input.  However, this
change allows the next commit to help about 900 more shaders.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-03-26 08:50:43 -07:00
Jason Ekstrand
7463d50580 intel/compiler: Don't propagate cmod into integer multiplies
No shader-db change on Sky Lake.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2017-10-05 11:54:49 -07:00
Jason Ekstrand
b91ecee04a intel/compiler: Don't cmod propagate into a saturated operation
Shader-db results on Sky Lake:

    total instructions in shared programs: 12954445 -> 12955125 (0.01%)
    instructions in affected programs: 141862 -> 142542 (0.48%)
    helped: 0
    HURT: 626

Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2017-10-05 11:54:49 -07:00
Jason Ekstrand
700bebb958 i965: Move the back-end compiler to src/intel/compiler
Mostly a dummy git mv with a couple of noticable parts:
 - With the earlier header cleanups, nothing in src/intel depends
files from src/mesa/drivers/dri/i965/
 - Both Autoconf and Android builds are addressed. Thanks to Mauro and
Tapani for the fixups in the latter
 - brw_util.[ch] is not really compiler specific, so it's moved to i965.

v2:
 - move brw_eu_defines.h instead of brw_defines.h
 - remove no-longer applicable includes
 - add missing vulkan/ prefix in the Android build (thanks Tapani)

v3:
 - don't list brw_defines.h in src/intel/Makefile.sources (Jason)
 - rebase on top of the oa patches

[Emil Velikov: commit message, various small fixes througout]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:34 +00:00
Renamed from src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp (Browse further)