Commit graph

399 commits

Author SHA1 Message Date
Ian Romanick
8d6b35fffd nir/algebraic: Fail build when too many commutative expressions are used
Search patterns that are expected to have too many (e.g., the giant
bitfield_reverse pattern) can be added to a white list.

This would have saved me a few hours debugging. :(

v2: Implement the expected-failure annotation as a property of the
search-replace pattern instead of as a property of the whole list of
patterns.  Suggested by Connor.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-06-28 18:56:19 -07:00
Eric Anholt
8fd8964302 nir: Fix lowering of bitfield_insert to shifts.
The bfi/bfm behavior change replaced the bfi/bfm usage in
lower_bitfield_insert_to_shifts with actual shifts like the name says,
but it failed to handle the offset=0, bits==32 case in the new
lowering.

v2: Use 31 < bits instead of bits == 32, to get the 31 < (iand bits,
    31) -> false optimization.

Fixes regressions in dEQP-GLES31.*bitfield_insert* on freedreno.

Fixes: 165b7f3a44 ("nir: define behavior of nir_op_bfm and nir_op_u/ibfe according to SM5 spec.")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-06-28 16:38:23 -07:00
Caio Marcelo de Oliveira Filho
085c0f1f13 nir/algebraic: Add helpers and a rule involving wrapping
The helpers are needed so we can use the syntax `instr(cond)` in the
algebraic rules.  Add simple rule for dropping a pair of mul-div of
the same value when wrapping is guaranteed to not happen.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-26 14:13:02 -07:00
Jonathan Marek
a70ff70158 nir: remove fnot/fxor/fand/for opcodes
There doesn't seem to be any reason to keep these opcodes around:
* fnot/fxor are not used at all.
* fand/for are only used in lower_alu_to_scalar, but easily replaced

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-06-26 15:26:10 -04:00
Daniel Schürmann
a8b0b6e52b nir: introduce lowering of bitfield_insert to bfm and a new opcode bitfield_select.
bitfield_select is defined as:
bitfield_select(mask, base, insert) = (mask & base) | (~mask & insert)
matching the behavior of AMD's BFI instruction.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-24 18:42:20 +02:00
Daniel Schürmann
1403c3a7bf nir/algebraic: Use unsigned comparison when lowering bitfield insert/extract
This lets us use the optimization pattern
(('ult', 31, ('iand', b, 31)), False) to remove the
bcsel instruction for code originating in D3D shaders.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-24 18:42:20 +02:00
Daniel Schürmann
4eeb49ea71 nir/algebraic: Remove unnecessary iand of [iu]bfe and bfm sources
The [iu]bfe and bfm instructions are defined to only use the five
least significant bits.
This optimizes a common pattern from D3D -> SPIR-V translation.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-24 18:42:20 +02:00
Daniel Schürmann
165b7f3a44 nir: define behavior of nir_op_bfm and nir_op_u/ibfe according to SM5 spec.
That is: the five least significant bits provide the values of
'bits' and 'offset' which is the case for all hardware currently
supported by NIR and using the bfm/bfe instructions.
This patch also changes the lowering of bitfield_insert/extract
using shifts to not use bfm and removes the flag 'lower_bfm'.

Tested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-24 18:42:20 +02:00
Daniel Schürmann
a74f256c58 nir/algebraic: add optimization pattern for ('ult', a, ('and', b, a)) and friends.
These optimizations are based on the fact that
'and(a,b) <= umin(a,b)'.
For AMD, this series moves the optimization from LLVM to NIR,
so currently no vkpipeline-db changes here.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-06-24 18:42:20 +02:00
Eduardo Lima Mitev
fb2169040a nir/opt_algebraic: Fix rules for imadsh_mix16
The rules added in patch 3addd7c are inverted:

It should be:

(al * bh) << 16 + c

instead of:

(ah * bl) << 16 + c

Fixes a number of regressions under
dEQP-GLES31.functional.draw_indirect.compute_interop.large.*
on Freedreno.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-06-10 22:27:46 +02:00
Eduardo Lima Mitev
3addd7c8d9 nir_algebraic: Add basic optimizations for umul_low and imadsh_mix16
For umul_low (al * bl), zero is returned if the low 16-bits word of either
source is zero.

for imadsh_mix16 (ah * bl << 16 + c), c is returned if either 'ah' or 'bl'
is zero.

A couple of nir_search_helpers are added:

is_upper_half_zero() returns true if the highest word of all components of
an integer NIR alu src are zero.

is_lower_half_zero() returns true if the lowest word of all components of
an integer nir alu src are zero.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07 08:45:05 +02:00
Kenneth Graunke
c7d1b52a2c nir: Combine lower_fmod16/32 back into a single lower_fmod.
We originally had a single lower_fmod option.  In commit 2ab2d2e5, Sam
split 32 and 64-bit lowering into separate flags, with the rationale
that some drivers might want different options there.  This left 16-bit
unhandled, so Iago added a lower_fmod16 option in commit ca31df6f.

Now that lower_fmod64 is gone (in favor of nir_lower_doubles and
nir_lower_dmod), we re-combine lower_fmod16 and lower_fmod32 into a
single lower_fmod flag again.  I'm not aware of any hardware which
need lowering for one bitsize and not the other.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-05 16:45:12 -07:00
Kenneth Graunke
edd45af9ba nir: Drop lower_fmod64 option.
nir_lower_doubles offers a wide variety of fp64 lowering, including
lowering fmod@64.  The version there also better handles imprecisions
due to lowered frcp@64.  Let's consolidate on one version.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-05 16:45:12 -07:00
Alyssa Rosenzweig
d2d3cc66cf nir/algebraic: Simplify max(abs(a), 0.0) -> abs(a)
This pattern was noticed in glmark's jellyfish scene.

v2: Add inexact qualifier due to NaN behaviour.

Minimal shader-db changes (slightly helped).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
2019-06-04 19:57:19 +00:00
Jonathan Marek
f889180ee1 nir: add lower_bitshift option
Add a "lower_bitshift" option, which disables optimizations introducing
bitshifts and lowers ishl by constant to a multiply, so that we don't have
to deal with bitshifts in int_to_float lowering.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 21:35:26 +00:00
Alyssa Rosenzweig
46494c3dc1 nir/algebraic: Remove problematic "optimization"
This line is no longer relevant now that booleans are 1-bit, and in fact
causes issues (infinite progress loop between algebraic optimizations
and copy prop) with constant vector masks.

No shader-db changes on Intel platforms (Jason).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2019-05-16 02:08:37 +00:00
Ian Romanick
32d259713b nir/algebraic: Commute 1-fsat(a) to fsat(1-a) for all non-fmul instructions
The goal is to avoid having an extra MOV instruction to perform the
saturate.  Doing the subtraction first allows the saturate to be applied
to the ADD instruction making the MOV unnecessary.  Values generated in
different block and values from non-ALU instructions (e.g., texture
instructions) almost always need the extra MOV.

Multiply instructions are restricted because doing this rearrangement
can interfere with the generation of flrp and ffma instructions.

v2: Now that the final method has been selected, squash three commits
into one.

All Intel platforms has similar results. (Ice Lake shown)
total instructions in shared programs: 17223214 -> 17219386 (-0.02%)
instructions in affected programs: 1524376 -> 1520548 (-0.25%)
helped: 2686
HURT: 26
helped stats (abs) min: 1 max: 32 x̄: 1.44 x̃: 1
helped stats (rel) min: 0.03% max: 16.67% x̄: 0.54% x̃: 0.37%
HURT stats (abs)   min: 1 max: 2 x̄: 1.69 x̃: 2
HURT stats (rel)   min: 0.33% max: 1.67% x̄: 0.54% x̃: 0.35%
95% mean confidence interval for instructions value: -1.46 -1.36
95% mean confidence interval for instructions %-change: -0.56% -0.50%
Instructions are helped.

total cycles in shared programs: 360811571 -> 360791896 (<.01%)
cycles in affected programs: 103650214 -> 103630539 (-0.02%)
helped: 1557
HURT: 675
helped stats (abs) min: 1 max: 1773 x̄: 41.44 x̃: 16
helped stats (rel) min: <.01% max: 26.77% x̄: 1.37% x̃: 0.64%
HURT stats (abs)   min: 1 max: 1513 x̄: 66.44 x̃: 14
HURT stats (rel)   min: <.01% max: 46.16% x̄: 2.00% x̃: 0.49%
95% mean confidence interval for cycles value: -14.82 -2.81
95% mean confidence interval for cycles %-change: -0.50% -0.20%
Cycles are helped.

LOST:   2
GAINED: 0

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2019-05-14 11:38:23 -07:00
Ian Romanick
a7f0c57673 nir/algebraic: Eliminate useless fsat() on operand of comparison w/value in (0, 1)
v2: Fix copy-and-paste bug in a cmp b vs b cmp a cases.

All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224337 -> 17224269 (<.01%)
instructions in affected programs: 13578 -> 13510 (-0.50%)
helped: 68
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.31% max: 3.12% x̄: 0.84% x̃: 0.42%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -1.05% -0.63%
Instructions are helped.

total cycles in shared programs: 360826090 -> 360825137 (<.01%)
cycles in affected programs: 94867 -> 93914 (-1.00%)
helped: 58
HURT: 1
helped stats (abs) min: 2 max: 28 x̄: 17.74 x̃: 18
helped stats (rel) min: 0.08% max: 3.17% x̄: 1.39% x̃: 1.22%
HURT stats (abs)   min: 76 max: 76 x̄: 76.00 x̃: 76
HURT stats (rel)   min: 2.86% max: 2.86% x̄: 2.86% x̃: 2.86%
95% mean confidence interval for cycles value: -19.53 -12.78
95% mean confidence interval for cycles %-change: -1.56% -1.08%
Cycles are helped.

No changes on any other Intel platform.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2019-05-14 11:38:23 -07:00
Ian Romanick
281f20e26d nir/algebraic: Strip double negatives from comparison sources
All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224623 -> 17224337 (<.01%)
instructions in affected programs: 32648 -> 32362 (-0.88%)
helped: 148
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.93 x̃: 2
helped stats (rel) min: 0.16% max: 2.74% x̄: 1.07% x̃: 1.08%
95% mean confidence interval for instructions value: -1.97 -1.89
95% mean confidence interval for instructions %-change: -1.15% -1.00%
Instructions are helped.

total cycles in shared programs: 360828714 -> 360826090 (<.01%)
cycles in affected programs: 347416 -> 344792 (-0.76%)
helped: 148
HURT: 26
helped stats (abs) min: 1 max: 426 x̄: 26.33 x̃: 18
helped stats (rel) min: 0.03% max: 15.10% x̄: 1.78% x̃: 1.41%
HURT stats (abs)   min: 2 max: 337 x̄: 48.96 x̃: 6
HURT stats (rel)   min: 0.04% max: 18.82% x̄: 2.15% x̃: 0.27%
95% mean confidence interval for cycles value: -23.78 -6.38
95% mean confidence interval for cycles %-change: -1.59% -0.79%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2019-05-14 11:38:22 -07:00
Ian Romanick
45c7ff95fc intel/compiler: Repeat nir_opt_algebraic_late
A tiny bit of help seems to come from nir_copy_prop.  Future patches
will benefit from this change.

Doing more copy propagation on the vec4 backend led to a disaster in
hurt cycles.

v2: Fix typo in comment.  Noticed by Matt.

All Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224634 -> 17224623 (<.01%)
instructions in affected programs: 4586 -> 4575 (-0.24%)
helped: 11
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.19% max: 0.53% x̄: 0.27% x̃: 0.23%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.36% -0.19%
Instructions are helped.

total cycles in shared programs: 360828542 -> 360828714 (<.01%)
cycles in affected programs: 151159 -> 151331 (0.11%)
helped: 49
HURT: 28
helped stats (abs) min: 1 max: 254 x̄: 26.41 x̃: 6
helped stats (rel) min: 0.06% max: 12.02% x̄: 1.34% x̃: 0.42%
HURT stats (abs)   min: 1 max: 196 x̄: 52.36 x̃: 15
HURT stats (rel)   min: 0.05% max: 10.74% x̄: 2.55% x̃: 0.88%
95% mean confidence interval for cycles value: -13.48 17.95
95% mean confidence interval for cycles %-change: -0.69% 0.84%
Inconclusive result (value mean confidence interval includes 0).

Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 13529544 -> 13529542 (<.01%)
instructions in affected programs: 358 -> 356 (-0.56%)
helped: 2
HURT: 0

total cycles in shared programs: 357290311 -> 357289678 (<.01%)
cycles in affected programs: 178324 -> 177691 (-0.35%)
helped: 48
HURT: 40
helped stats (abs) min: 1 max: 201 x̄: 31.52 x̃: 13
helped stats (rel) min: 0.06% max: 10.92% x̄: 1.71% x̃: 0.66%
HURT stats (abs)   min: 1 max: 224 x̄: 22.00 x̃: 6
HURT stats (rel)   min: 0.05% max: 15.84% x̄: 1.29% x̃: 0.31%
95% mean confidence interval for cycles value: -18.28 3.89
95% mean confidence interval for cycles %-change: -1.01% 0.32%
Inconclusive result (value mean confidence interval includes 0).

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8159110 -> 8158980 (<.01%)
instructions in affected programs: 22719 -> 22589 (-0.57%)
helped: 65
HURT: 0
helped stats (abs) min: 1 max: 3 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.07% max: 1.05% x̄: 0.73% x̃: 0.74%
95% mean confidence interval for instructions value: -2.06 -1.94
95% mean confidence interval for instructions %-change: -0.78% -0.68%
Instructions are helped.

total cycles in shared programs: 188609448 -> 188609214 (<.01%)
cycles in affected programs: 1875852 -> 1875618 (-0.01%)
helped: 109
HURT: 104
helped stats (abs) min: 2 max: 46 x̄: 5.30 x̃: 4
helped stats (rel) min: 0.02% max: 0.90% x̄: 0.09% x̃: 0.07%
HURT stats (abs)   min: 2 max: 20 x̄: 3.31 x̃: 2
HURT stats (rel)   min: 0.01% max: 0.26% x̄: 0.04% x̃: 0.02%
95% mean confidence interval for cycles value: -1.95 -0.25
95% mean confidence interval for cycles %-change: -0.04% -0.01%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-14 11:38:22 -07:00
Ian Romanick
d2a9ba03e3 Revert "nir: add late opt to turn inot/b2f combos back to bcsel"
This reverts commit 7acc865226.

With these optimizations in place, the extra constant folding added in
the next commit extends some live ranges of 0.0 and ±1.0 constants, and
that causes several hundred shaders to have more spills and fills.

I believe this optimization we made basically irrelevant by 7725d60938
"intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))".

All Gen7.5+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17225303 -> 17224634 (<.01%)
instructions in affected programs: 879402 -> 878733 (-0.08%)
helped: 679
HURT: 1
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.03% max: 0.93% x̄: 0.24% x̃: 0.05%
HURT stats (abs)   min: 10 max: 10 x̄: 10.00 x̃: 10
HURT stats (rel)   min: 0.45% max: 0.45% x̄: 0.45% x̃: 0.45%
95% mean confidence interval for instructions value: -1.02 -0.95
95% mean confidence interval for instructions %-change: -0.26% -0.22%
Instructions are helped.

total cycles in shared programs: 360842595 -> 360828542 (<.01%)
cycles in affected programs: 110443594 -> 110429541 (-0.01%)
helped: 389
HURT: 265
helped stats (abs) min: 1 max: 7525 x̄: 162.81 x̃: 28
helped stats (rel) min: <.01% max: 18.66% x̄: 1.11% x̃: 0.11%
HURT stats (abs)   min: 1 max: 7614 x̄: 185.96 x̃: 48
HURT stats (rel)   min: <.01% max: 25.08% x̄: 0.95% x̃: 0.10%
95% mean confidence interval for cycles value: -75.65 32.67
95% mean confidence interval for cycles %-change: -0.49% -0.06%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 12159 -> 12161 (0.02%)
spills in affected programs: 13 -> 15 (15.38%)
helped: 0
HURT: 1

total fills in shared programs: 25207 -> 25208 (<.01%)
fills in affected programs: 25 -> 26 (4.00%)
helped: 0
HURT: 1

Ivy Bridge
total instructions in shared programs: 12082019 -> 12082013 (<.01%)
instructions in affected programs: 1033 -> 1027 (-0.58%)
helped: 6
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.41% max: 0.83% x̄: 0.61% x̃: 0.59%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.78% -0.45%
Instructions are helped.

total cycles in shared programs: 179849270 -> 179849157 (<.01%)
cycles in affected programs: 4735 -> 4622 (-2.39%)
helped: 4
HURT: 0
helped stats (abs) min: 2 max: 74 x̄: 28.25 x̃: 18
helped stats (rel) min: 0.13% max: 6.53% x̄: 2.85% x̃: 2.36%
95% mean confidence interval for cycles value: -82.73 26.23
95% mean confidence interval for cycles %-change: -7.98% 2.28%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge
total instructions in shared programs: 10882750 -> 10882748 (<.01%)
instructions in affected programs: 266 -> 264 (-0.75%)
helped: 2
HURT: 0

Iron Lake
total cycles in shared programs: 188609440 -> 188609448 (<.01%)
cycles in affected programs: 4320 -> 4328 (0.19%)
helped: 0
HURT: 2

GM45
total cycles in shared programs: 129016868 -> 129016872 (<.01%)
cycles in affected programs: 2302 -> 2306 (0.17%)
helped: 0
HURT: 1

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-14 11:38:22 -07:00
Ian Romanick
3cb091f8b4 nir/algebraic: Eliminate a tautological compare
The value-range tracking pass that is coming is not clever enough to
know that the result of the ffma must be non-negative.  Making it that
smart will require quite a bit of work.  It might be possible to add a
special case that detects that a whole tree of fadd(fmul(fsat(a),
fneg(fsat(a))), 1.0) cannot be negative.

For cases when the comparison is used in the domain guard for a
square-root (see nir/algebraic: Simplify fsqrt domain guard), the
compare may be converted to a fmax.  This patch also handles that case.

All of the affected cases are in DiRT: Showdown.

All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17225365 -> 17225303 (<.01%)
instructions in affected programs: 40051 -> 39989 (-0.15%)
helped: 62
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.07% max: 0.66% x̄: 0.27% x̃: 0.26%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.31% -0.22%
Instructions are helped.

total cycles in shared programs: 360842788 -> 360842595 (<.01%)
cycles in affected programs: 1818081 -> 1817888 (-0.01%)
helped: 29
HURT: 22
helped stats (abs) min: 1 max: 206 x̄: 20.66 x̃: 14
helped stats (rel) min: <.01% max: 9.55% x̄: 0.87% x̃: 0.42%
HURT stats (abs)   min: 1 max: 108 x̄: 18.45 x̃: 7
HURT stats (rel)   min: <.01% max: 4.48% x̄: 0.56% x̃: 0.19%
95% mean confidence interval for cycles value: -14.48 6.91
95% mean confidence interval for cycles %-change: -0.71% 0.21%
Inconclusive result (value mean confidence interval includes 0).

No changes on any other Intel platform.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2019-05-14 11:38:22 -07:00
Ian Romanick
9725e45b3d nir/algebraic: Simplify fsqrt domain guard
All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17228376 -> 17225365 (-0.02%)
instructions in affected programs: 280732 -> 277721 (-1.07%)
helped: 1072
HURT: 0
helped stats (abs) min: 1 max: 12 x̄: 2.81 x̃: 2
helped stats (rel) min: 0.16% max: 5.10% x̄: 1.43% x̃: 1.07%
95% mean confidence interval for instructions value: -2.92 -2.70
95% mean confidence interval for instructions %-change: -1.48% -1.37%
Instructions are helped.

total cycles in shared programs: 360935690 -> 360842788 (-0.03%)
cycles in affected programs: 7838017 -> 7745115 (-1.19%)
helped: 1569
HURT: 69
helped stats (abs) min: 1 max: 1198 x̄: 63.53 x̃: 20
helped stats (rel) min: 0.06% max: 26.17% x̄: 3.44% x̃: 2.12%
HURT stats (abs)   min: 1 max: 2820 x̄: 98.22 x̃: 47
HURT stats (rel)   min: 0.05% max: 16.67% x̄: 3.50% x̃: 2.31%
95% mean confidence interval for cycles value: -63.55 -49.89
95% mean confidence interval for cycles %-change: -3.33% -2.96%
Cycles are helped.

No changes on any other platform.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2019-05-14 11:38:22 -07:00
Ian Romanick
5116646a76 nir/algebraic: Recognize open-coded fsat with modifiers
This change also enables a later change (nir/algebraic: Replace
1-fsat(a) with fsat(1-a)) to affect more shaders.

Almost all of the affected shaders are in Bioshock Infinite, and all of
those shaders all require GLSL 4.10.

All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17228584 -> 17228376 (<.01%)
instructions in affected programs: 31438 -> 31230 (-0.66%)
helped: 105
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.98 x̃: 1
helped stats (rel) min: 0.08% max: 1.53% x̄: 0.73% x̃: 0.70%
95% mean confidence interval for instructions value: -2.20 -1.76
95% mean confidence interval for instructions %-change: -0.80% -0.67%
Instructions are helped.

total cycles in shared programs: 360936431 -> 360935690 (<.01%)
cycles in affected programs: 420100 -> 419359 (-0.18%)
helped: 71
HURT: 21
helped stats (abs) min: 1 max: 160 x̄: 19.28 x̃: 10
helped stats (rel) min: <.01% max: 9.78% x̄: 0.95% x̃: 0.48%
HURT stats (abs)   min: 1 max: 198 x̄: 29.90 x̃: 10
HURT stats (rel)   min: 0.05% max: 8.36% x̄: 1.24% x̃: 0.90%
95% mean confidence interval for cycles value: -16.77 0.66
95% mean confidence interval for cycles %-change: -0.85% -0.06%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2019-05-14 11:38:22 -07:00
Ian Romanick
c769641c8e nir/algebraic: Push unary operations into source operands of fsat source
Pushing a unary operation, like fneg, into the operation that generates
its operand allows the fsat to be applied to the inner instruction
instead of on a separate instruction that performs the unary operation.
This changes

    fmul	ssa_100, ssa_99, ssa_98
    fmov.sat	ssa_101, -ssa_100

into

    fmul.sat	ssa_100, -ssa_99, ssa_98

Ice Lake, Skylake, and Broadwell had similar results. (Ice Lake shown)
total instructions in shared programs: 17228658 -> 17228584 (<.01%)
instructions in affected programs: 3163 -> 3089 (-2.34%)
helped: 49
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.51 x̃: 2
helped stats (rel) min: 0.58% max: 9.09% x̄: 3.69% x̃: 3.51%
95% mean confidence interval for instructions value: -1.66 -1.37
95% mean confidence interval for instructions %-change: -4.37% -3.00%
Instructions are helped.

total cycles in shared programs: 360937144 -> 360936431 (<.01%)
cycles in affected programs: 24029 -> 23316 (-2.97%)
helped: 47
HURT: 2
helped stats (abs) min: 4 max: 18 x̄: 15.34 x̃: 16
helped stats (rel) min: 0.69% max: 6.18% x̄: 3.78% x̃: 4.27%
HURT stats (abs)   min: 4 max: 4 x̄: 4.00 x̃: 4
HURT stats (rel)   min: 0.34% max: 0.67% x̄: 0.50% x̃: 0.50%
95% mean confidence interval for cycles value: -16.05 -13.05
95% mean confidence interval for cycles %-change: -4.07% -3.15%
Cycles are helped.

All Gen7 and earlier platforms had similar results. (Haswell shown)
total instructions in shared programs: 13536059 -> 13535884 (<.01%)
instructions in affected programs: 8797 -> 8622 (-1.99%)
helped: 150
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1
helped stats (rel) min: 0.40% max: 11.11% x̄: 3.51% x̃: 1.96%
95% mean confidence interval for instructions value: -1.23 -1.11
95% mean confidence interval for instructions %-change: -3.97% -3.05%
Instructions are helped.

total cycles in shared programs: 357696119 -> 357694193 (<.01%)
cycles in affected programs: 50216 -> 48290 (-3.84%)
helped: 109
HURT: 14
helped stats (abs) min: 2 max: 92 x̄: 18.97 x̃: 16
helped stats (rel) min: 0.26% max: 19.09% x̄: 7.37% x̃: 5.37%
HURT stats (abs)   min: 2 max: 26 x̄: 10.14 x̃: 5
HURT stats (rel)   min: 0.18% max: 4.73% x̄: 1.84% x̃: 0.92%
95% mean confidence interval for cycles value: -19.27 -12.05
95% mean confidence interval for cycles %-change: -7.34% -5.31%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-14 11:38:22 -07:00
Ian Romanick
3b74790941 nir/algebraic: Recognize open-coded flrp(a, b, fsat(c))
All Gen6+ GPUs had similar results. (Skylake shown)
total instructions in shared programs: 15336712 -> 15336622 (<.01%)
instructions in affected programs: 3952 -> 3862 (-2.28%)
helped: 24
HURT: 0
helped stats (abs) min: 3 max: 5 x̄: 3.75 x̃: 4
helped stats (rel) min: 1.75% max: 2.70% x̄: 2.34% x̃: 2.46%
95% mean confidence interval for instructions value: -4.06 -3.44
95% mean confidence interval for instructions %-change: -2.47% -2.22%
Instructions are helped.

total cycles in shared programs: 355722052 -> 355721235 (<.01%)
cycles in affected programs: 27326 -> 26509 (-2.99%)
helped: 20
HURT: 4
helped stats (abs) min: 1 max: 227 x̄: 44.75 x̃: 14
helped stats (rel) min: 0.12% max: 22.95% x̄: 3.83% x̃: 1.23%
HURT stats (abs)   min: 2 max: 64 x̄: 19.50 x̃: 6
HURT stats (rel)   min: 0.21% max: 3.63% x̄: 1.24% x̃: 0.55%
95% mean confidence interval for cycles value: -61.61 -6.47
95% mean confidence interval for cycles %-change: -5.59% -0.39%
Cycles are helped.

No changes on Ice Lake, Iron Lake, or GM45.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-14 11:38:21 -07:00
Ian Romanick
a7724b1cbb nir/algebraic: Add missing ffma(-1, a, b) pattern
All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17229439 -> 17229377 (<.01%)
instructions in affected programs: 9859 -> 9797 (-0.63%)
helped: 41
HURT: 0
helped stats (abs) min: 1 max: 6 x̄: 1.51 x̃: 1
helped stats (rel) min: 0.08% max: 11.54% x̄: 1.65% x̃: 0.67%
95% mean confidence interval for instructions value: -1.88 -1.14
95% mean confidence interval for instructions %-change: -2.48% -0.81%
Instructions are helped.

total cycles in shared programs: 360944145 -> 360942989 (<.01%)
cycles in affected programs: 178167 -> 177011 (-0.65%)
helped: 36
HURT: 19
helped stats (abs) min: 1 max: 222 x̄: 38.03 x̃: 5
helped stats (rel) min: 0.01% max: 31.01% x̄: 4.01% x̃: 0.45%
HURT stats (abs)   min: 1 max: 34 x̄: 11.21 x̃: 6
HURT stats (rel)   min: 0.03% max: 2.74% x̄: 0.72% x̃: 0.50%
95% mean confidence interval for cycles value: -36.01 -6.02
95% mean confidence interval for cycles %-change: -4.18% -0.57%
Cycles are helped.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 11:25:03 -07:00
Ian Romanick
7b4ff6a1af nir: Mark ffma as 2src_commutative
This doesn't make any real difference now, but future work (not in this
series) will add a LOT of ffma patterns.  Having to duplicate all of
them for ffma(a, b, c) and ffma(b, a, c) is just terrible.

No shader-db changes on any Intel platform.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 11:25:02 -07:00
Ian Romanick
ab86926156 nir/algebraic: Reassociate open-coded flrp(1, b, c)
In a previous verion of this patch, Jason commented,

   "Re-associating based on whether or not something has a constant
   value of 1.0 seems a bit sneaky.  I think it's well within the rules
   but it seems like something that could bite you."

That is possibly true.  The reassociation will generate different
results if fabs(b) >= 2**24 and fabs(c) < 0.5.  The delta increases as
fabs(c) approaches 0.

However, i965 has done this same reassociation indirectly for years.
We would previously allow nir_op_flrp on all pre-Gen11 hardware even
though Gen4 and Gen5 do not have a LRP instruction.  Optimizations in
nir_opt_algebraic would convert expressions like a+c(b-a) into flrp(a,
b, c).  On Gen7+, the hardware performs the same arithmetic as
a(1-c)+bc.  Gen6 seems to implement LRP as a+c(b-a).  On Gen4 and
Gen5, we would lower LRP to a sequence of instructions that implement
a(1-c)+bc.  The lowering happens after all constant folding, so we
would litterally generate a 1+(-1) instruction sequence in this
scenario: one instruction to load either 1 or -1 in a register, and
another instruction to add either -1 or 1 to it.

This patch just cuts out the middle man.  Do the reassociation that
we've always done, but do it explicitly at a time when we can benefit
from other optimizations.

A few cases that were hurt by "nir: Lower flrp(±1, b, c) and flrp(a,
±1, c) differently" are restored by this patch.  This includes a few
shaders in ET:QW.

I tried a similar thing for open-coded flrp(-1, b, c), and it hurt
instructions on 35 shaders for ILK without helping any.  The helped /
hurt cycles was about even.

No changes on any other Intel platforms.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8172020 -> 8164367 (-0.09%)
instructions in affected programs: 1089851 -> 1082198 (-0.70%)
helped: 3285
HURT: 64
helped stats (abs) min: 1 max: 6 x̄: 2.35 x̃: 2
helped stats (rel) min: 0.13% max: 12.00% x̄: 1.15% x̃: 0.83%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.24% max: 0.64% x̄: 0.39% x̃: 0.38%
95% mean confidence interval for instructions value: -2.32 -2.25
95% mean confidence interval for instructions %-change: -1.16% -1.09%
Instructions are helped.

total cycles in shared programs: 188758338 -> 188719974 (-0.02%)
cycles in affected programs: 20004922 -> 19966558 (-0.19%)
helped: 3012
HURT: 477
helped stats (abs) min: 2 max: 142 x̄: 13.41 x̃: 12
helped stats (rel) min: 0.01% max: 6.37% x̄: 0.52% x̃: 0.24%
HURT stats (abs)   min: 2 max: 328 x̄: 4.27 x̃: 4
HURT stats (rel)   min: <.01% max: 1.55% x̄: 0.14% x̃: 0.11%
95% mean confidence interval for cycles value: -11.38 -10.62
95% mean confidence interval for cycles %-change: -0.46% -0.41%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick
d41cdef2a5 nir: Use the flrp lowering pass instead of nir_opt_algebraic
I tried to be very careful while updating all the various drivers, but I
don't have any of that hardware for testing. :(

i965 is the only platform that sets always_precise = true, and it is
only set true for fragment shaders.  Gen4 and Gen5 both set lower_flrp32
only for vertex shaders.  For fragment shaders, nir_op_flrp is lowered
during code generation as a(1-c)+bc.  On all other platforms 64-bit
nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old
nir_opt_algebraic method.

No changes on any other Intel platforms.

v2: Add panfrost changes.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total cycles in shared programs: 188647754 -> 188647748 (<.01%)
cycles in affected programs: 5096 -> 5090 (-0.12%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick
dc566a033c nir/algebraic: Pull common multiplication out of flrp arguments
All Intel platforms had similar results. (Skylake shown)
total instructions in shared programs: 15342485 -> 15337495 (-0.03%)
instructions in affected programs: 217456 -> 212466 (-2.29%)
helped: 1539
HURT: 1
helped stats (abs) min: 1 max: 17 x̄: 3.24 x̃: 3
helped stats (rel) min: 0.22% max: 18.75% x̄: 3.10% x̃: 1.91%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.56% max: 0.56% x̄: 0.56% x̃: 0.56%
95% mean confidence interval for instructions value: -3.39 -3.09
95% mean confidence interval for instructions %-change: -3.24% -2.96%
Instructions are helped.

total cycles in shared programs: 355734320 -> 355728237 (<.01%)
cycles in affected programs: 1851555 -> 1845472 (-0.33%)
helped: 835
HURT: 575
helped stats (abs) min: 1 max: 658 x̄: 40.62 x̃: 14
helped stats (rel) min: <.01% max: 35.69% x̄: 3.78% x̃: 1.81%
HURT stats (abs)   min: 1 max: 322 x̄: 48.40 x̃: 14
HURT stats (rel)   min: 0.04% max: 71.02% x̄: 8.06% x̃: 2.43%
95% mean confidence interval for cycles value: -8.50 -0.13
95% mean confidence interval for cycles %-change: 0.48% 1.62%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:28 -07:00
Ian Romanick
a83a6e9690 nir/algebraic: Pull common addition out of flrp arguments
v2: Augment the late optimization patterns with a couple pre-ffma pass
patterns.

All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15342982 -> 15342485 (<.01%)
instructions in affected programs: 56304 -> 55807 (-0.88%)
helped: 235
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.11 x̃: 1
helped stats (rel) min: 0.11% max: 8.82% x̄: 1.27% x̃: 0.74%
95% mean confidence interval for instructions value: -2.31 -1.92
95% mean confidence interval for instructions %-change: -1.46% -1.09%
Instructions are helped.

total cycles in shared programs: 355734740 -> 355734320 (<.01%)
cycles in affected programs: 1028807 -> 1028387 (-0.04%)
helped: 134
HURT: 104
helped stats (abs) min: 1 max: 212 x̄: 25.69 x̃: 8
helped stats (rel) min: <.01% max: 9.36% x̄: 1.33% x̃: 0.61%
HURT stats (abs)   min: 1 max: 203 x̄: 29.06 x̃: 8
HURT stats (rel)   min: 0.02% max: 15.76% x̄: 1.76% x̃: 0.46%
95% mean confidence interval for cycles value: -8.51 4.98
95% mean confidence interval for cycles %-change: -0.35% 0.39%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge
total instructions in shared programs: 10886815 -> 10886390 (<.01%)
instructions in affected programs: 36883 -> 36458 (-1.15%)
helped: 147
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 2.89 x̃: 3
helped stats (rel) min: 0.35% max: 8.00% x̄: 1.60% x̃: 1.23%
95% mean confidence interval for instructions value: -3.12 -2.67
95% mean confidence interval for instructions %-change: -1.83% -1.38%
Instructions are helped.

total cycles in shared programs: 154188360 -> 154186902 (<.01%)
cycles in affected programs: 388094 -> 386636 (-0.38%)
helped: 90
HURT: 58
helped stats (abs) min: 1 max: 243 x̄: 36.80 x̃: 15
helped stats (rel) min: 0.04% max: 9.23% x̄: 1.26% x̃: 0.83%
HURT stats (abs)   min: 1 max: 684 x̄: 31.97 x̃: 10
HURT stats (rel)   min: 0.03% max: 13.50% x̄: 1.15% x̃: 0.51%
95% mean confidence interval for cycles value: -22.62 2.92
95% mean confidence interval for cycles %-change: -0.68% 0.05%
Inconclusive result (value mean confidence interval includes 0).

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8221239 -> 8220357 (-0.01%)
instructions in affected programs: 54560 -> 53678 (-1.62%)
helped: 186
HURT: 0
helped stats (abs) min: 1 max: 14 x̄: 4.74 x̃: 3
helped stats (rel) min: 0.34% max: 10.77% x̄: 1.97% x̃: 1.17%
95% mean confidence interval for instructions value: -5.21 -4.28
95% mean confidence interval for instructions %-change: -2.23% -1.72%
Instructions are helped.

total cycles in shared programs: 188654442 -> 188650364 (<.01%)
cycles in affected programs: 1454384 -> 1450306 (-0.28%)
helped: 204
HURT: 0
helped stats (abs) min: 2 max: 84 x̄: 19.99 x̃: 18
helped stats (rel) min: 0.02% max: 4.69% x̄: 0.56% x̃: 0.22%
95% mean confidence interval for cycles value: -22.38 -17.60
95% mean confidence interval for cycles %-change: -0.67% -0.46%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:28 -07:00
Jason Ekstrand
00d4e78ea9 nir/algebraic: Optimize integer cast-of-cast
These have been popping up more and more with the OpenCL work and other
bits causing extra conversions to/from 64-bit.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-04-26 04:26:08 -05:00
Erico Nunes
4577eb7b7c nir/algebraic: add lowering for fsign
The mali utgard pp doesn't support a sign instruction.
In the ARM offline shader compiler, the sign function is implemented
using sub(gt(0.0, a), lt(0.0, a)).
This is a generic optimization, so implement it in the nir level when
lower_fsign is set, alongside the lowering for isign.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-04-19 15:42:23 +00:00
Ian Romanick
6b97fa9a99 nir/algebraic: Strength reduce some compares of x and -x
Converting the x vs -x comparison to an x vs 0 comparison enable cmod
propagation to help.

The seems to be a win everywhere except Gen7.

Skylake and Broadwell had similar results. (Broadwell shown)
total instructions in shared programs: 15566733 -> 15566014 (<.01%)
instructions in affected programs: 72617 -> 71898 (-0.99%)
helped: 302
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.38 x̃: 2
helped stats (rel) min: 0.15% max: 7.69% x̄: 1.28% x̃: 0.98%
95% mean confidence interval for instructions value: -2.55 -2.21
95% mean confidence interval for instructions %-change: -1.40% -1.16%
Instructions are helped.

total cycles in shared programs: 413014786 -> 413015475 (<.01%)
cycles in affected programs: 707594 -> 708283 (0.10%)
helped: 227
HURT: 101
helped stats (abs) min: 1 max: 612 x̄: 36.07 x̃: 20
helped stats (rel) min: 0.04% max: 19.39% x̄: 2.25% x̃: 1.49%
HURT stats (abs)   min: 2 max: 334 x̄: 87.90 x̃: 45
HURT stats (rel)   min: 0.07% max: 14.51% x̄: 4.54% x̃: 3.36%
95% mean confidence interval for cycles value: -8.12 12.32
95% mean confidence interval for cycles %-change: -0.67% 0.34%
Inconclusive result (value mean confidence interval includes 0).

Haswell and Ivy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 13828220 -> 13827881 (<.01%)
instructions in affected programs: 60887 -> 60548 (-0.56%)
helped: 253
HURT: 6
helped stats (abs) min: 1 max: 5 x̄: 1.36 x̃: 1
helped stats (rel) min: 0.16% max: 3.85% x̄: 0.81% x̃: 0.64%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.26% max: 0.89% x̄: 0.47% x̃: 0.27%
95% mean confidence interval for instructions value: -1.39 -1.23
95% mean confidence interval for instructions %-change: -0.85% -0.70%
Instructions are helped.

total cycles in shared programs: 386870095 -> 386894412 (<.01%)
cycles in affected programs: 1537307 -> 1561624 (1.58%)
helped: 127
HURT: 188
helped stats (abs) min: 1 max: 381 x̄: 17.89 x̃: 4
helped stats (rel) min: 0.02% max: 14.33% x̄: 1.00% x̃: 0.33%
HURT stats (abs)   min: 2 max: 5585 x̄: 141.43 x̃: 14
HURT stats (rel)   min: 0.03% max: 11.50% x̄: 1.65% x̃: 1.06%
95% mean confidence interval for cycles value: 21.95 132.45
95% mean confidence interval for cycles %-change: 0.32% 0.85%
Cycles are HURT.

Sandy Bridge
total instructions in shared programs: 10896339 -> 10896276 (<.01%)
instructions in affected programs: 10757 -> 10694 (-0.59%)
helped: 49
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.29 x̃: 1
helped stats (rel) min: 0.12% max: 1.85% x̄: 0.87% x̃: 0.89%
95% mean confidence interval for instructions value: -1.42 -1.15
95% mean confidence interval for instructions %-change: -1.03% -0.72%
Instructions are helped.

total cycles in shared programs: 155091003 -> 155090480 (<.01%)
cycles in affected programs: 102761 -> 102238 (-0.51%)
helped: 51
HURT: 0
helped stats (abs) min: 1 max: 36 x̄: 10.25 x̃: 4
helped stats (rel) min: 0.02% max: 2.57% x̄: 0.76% x̃: 0.36%
95% mean confidence interval for cycles value: -12.98 -7.53
95% mean confidence interval for cycles %-change: -0.97% -0.56%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8234667 -> 8234652 (<.01%)
instructions in affected programs: 2063 -> 2048 (-0.73%)
helped: 15
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.30% max: 1.56% x̄: 0.82% x̃: 0.81%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.97% -0.67%
Instructions are helped.

total cycles in shared programs: 188700906 -> 188700598 (<.01%)
cycles in affected programs: 283480 -> 283172 (-0.11%)
helped: 83
HURT: 3
helped stats (abs) min: 2 max: 8 x̄: 3.78 x̃: 4
helped stats (rel) min: 0.04% max: 0.55% x̄: 0.15% x̃: 0.12%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.02% max: 0.04% x̄: 0.03% x̃: 0.04%
95% mean confidence interval for cycles value: -3.87 -3.29
95% mean confidence interval for cycles %-change: -0.16% -0.12%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-18 12:37:48 -07:00
Ian Romanick
f3d6df719c nir/algebraic: Fix some 1-bit Boolean weirdness
Skylake, Broadwell, and Haswell had similar results. (Skylake shown)
total cycles in shared programs: 372594532 -> 372594460 (<.01%)
cycles in affected programs: 46854 -> 46782 (-0.15%)
helped: 9
HURT: 0
helped stats (abs) min: 2 max: 22 x̄: 8.00 x̃: 2
helped stats (rel) min: 0.02% max: 0.41% x̄: 0.16% x̃: 0.09%
95% mean confidence interval for cycles value: -14.34 -1.66
95% mean confidence interval for cycles %-change: -0.28% -0.04%
Cycles are helped.

Ivy Bridge
total instructions in shared programs: 12038379 -> 12038373 (<.01%)
instructions in affected programs: 1278 -> 1272 (-0.47%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.31% max: 0.77% x̄: 0.54% x̃: 0.55%

total cycles in shared programs: 180889027 -> 180888997 (<.01%)
cycles in affected programs: 29979 -> 29949 (-0.10%)
helped: 5
HURT: 0
helped stats (abs) min: 1 max: 16 x̄: 6.00 x̃: 5
helped stats (rel) min: 0.02% max: 0.34% x̄: 0.11% x̃: 0.07%
95% mean confidence interval for cycles value: -13.40 1.40
95% mean confidence interval for cycles %-change: -0.27% 0.05%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge
total cycles in shared programs: 155091021 -> 155091003 (<.01%)
cycles in affected programs: 8842 -> 8824 (-0.20%)
helped: 2
HURT: 0

No changes on Iron Lake or GM45.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-18 12:37:48 -07:00
Ian Romanick
403aac7500 nir/algebraic: Replace a pattern where iand with a Boolean is used as a bcsel
All of the affected shaders are in Mad Max.  I noticed this while
looking at some other things.  I tried a couple similar patterns, but
the affect on cycles was general negative.  It may be worth revisiting
this later.

v2: Rebase on 1-bit Boolean changes.

All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15282073 -> 15282053 (<.01%)
instructions in affected programs: 1192 -> 1172 (-1.68%)
helped: 14
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.43 x̃: 1
helped stats (rel) min: 1.16% max: 2.17% x̄: 1.65% x̃: 1.39%
95% mean confidence interval for instructions value: -1.73 -1.13
95% mean confidence interval for instructions %-change: -1.91% -1.38%
Instructions are helped.

total cycles in shared programs: 372595954 -> 372594532 (<.01%)
cycles in affected programs: 11477 -> 10055 (-12.39%)
helped: 14
HURT: 0
helped stats (abs) min: 76 max: 122 x̄: 101.57 x̃: 104
helped stats (rel) min: 7.76% max: 15.62% x̄: 12.94% x̃: 14.78%
95% mean confidence interval for cycles value: -111.05 -92.09
95% mean confidence interval for cycles %-change: -14.90% -10.98%
Cycles are helped.

No changes on any Gen6 or earlier platforms.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-18 12:37:48 -07:00
Ian Romanick
25bfba3335 nir/algebraic: Recognize open-coded copysign(1.0, a)
All of the affected shaders are in Mad Max.  The inner part of the
pattern is itself an open-coded sign(a).  I tried using that as a
pattern, but the results were not good.  A bunch of shaders were helped
for instructions, but overall cycles, spill, and fills were hurt.

v2: Rebase on 1-bit Boolean changes.

v3: Fix order of copysign() parameters in comments and commit message.
Noticed by Matt.

All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15282141 -> 15282073 (<.01%)
instructions in affected programs: 6106 -> 6038 (-1.11%)
helped: 17
HURT: 0
helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
helped stats (rel) min: 1.02% max: 2.20% x̄: 1.15% x̃: 1.06%
95% mean confidence interval for instructions value: -4.00 -4.00
95% mean confidence interval for instructions %-change: -1.30% -1.00%
Instructions are helped.

total cycles in shared programs: 372597886 -> 372595954 (<.01%)
cycles in affected programs: 32701 -> 30769 (-5.91%)
helped: 17
HURT: 0
helped stats (abs) min: 6 max: 216 x̄: 113.65 x̃: 118
helped stats (rel) min: 0.40% max: 21.86% x̄: 6.20% x̃: 5.83%
95% mean confidence interval for cycles value: -152.84 -74.45
95% mean confidence interval for cycles %-change: -8.89% -3.51%
Cycles are helped.

No changes on any Gen6 or earlier platforms.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-18 12:37:48 -07:00
Christian Gmeiner
b6bed115a5 nir: add lower_ftrunc
Port TGSI TRUNC lowering to nir

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-13 17:54:48 +00:00
Caio Marcelo de Oliveira Filho
d08a74d2bf nir/algebraic: Lower CS derivatives to zero when no group defined
In compute shaders if no derivative group is defined, the derivatives
will always be zero.  Specified in NV_compute_shader_derivatives.

To make the check more convenient, add a "info" local variable to the
generated code so we can refer to it in the Python rules.  (Jason)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-08 19:29:32 -07:00
Jason Ekstrand
ad8c145658 nir/algebraic: Add some logical OR and AND patterns
The new OR pattern has been seen in the wild and can end up being
generated by GLSLang.  Not sure about the other two new patterns but we
may as well throw them in for completeness.  While we're here, we can
drop the '@bool' specifier from the one pattern because specifying True
already implies 1-bit which basically implies boolean.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15321227 -> 15321129 (<.01%)
    instructions in affected programs: 3594 -> 3496 (-2.73%)
    helped: 6
    HURT: 0

    total cycles in shared programs: 357481321 -> 357479725 (<.01%)
    cycles in affected programs: 44109 -> 42513 (-3.62%)
    helped: 6
    HURT: 0

VkPipeline-DB results on Kaby Lake:

    total instructions in shared programs: 3770504 -> 3769734 (-0.02%)
    instructions in affected programs: 19058 -> 18288 (-4.04%)
    helped: 163
    HURT: 0

    total cycles in shared programs: 1417583701 -> 1417569727 (<.01%)
    cycles in affected programs: 750958 -> 736984 (-1.86%)
    helped: 158
    HURT: 1

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-04-05 18:39:06 -05:00
Jason Ekstrand
03a72d96d8 nir/algebraic: Drop some @bool specifiers
Now that we have one-bit booleans, we don't need to rely on looking at
parent instructions in order to figure out if a value is a Boolean most
of the time.  We can drop these specifiers and now the optimizations
will apply more generally.

Shader-DB results on Kaby Lake:

    total instructions in shared programs: 15321168 -> 15321227 (<.01%)
    instructions in affected programs: 8836 -> 8895 (0.67%)
    helped: 1
    HURT: 31

    total cycles in shared programs: 357481781 -> 357481321 (<.01%)
    cycles in affected programs: 146524 -> 146064 (-0.31%)
    helped: 22
    HURT: 10

    total spills in shared programs: 23675 -> 23673 (<.01%)
    spills in affected programs: 11 -> 9 (-18.18%)
    helped: 1
    HURT: 0

    total fills in shared programs: 32040 -> 32036 (-0.01%)
    fills in affected programs: 27 -> 23 (-14.81%)
    helped: 1
    HURT: 0

No change in VkPipeline-DB

Looking at the instructions hurt, a bunch of them seem to be a case
where doing exactly the right thing in NIR ends up doing the wrong-ish
thing in the back-end because flags are dumb.  In particular, there's a
case where we have a MUL followed by a CMP followed by a SEL and when we
turn that SEL into an OR, it uses the GRF result of the CMP rather than
the flag result so the CMP can't be merged with the MUL.  Those shaders
appear to schedule better according to the cycle estimates so I guess
it's a win?  Also it helps spilling in one Car Chase compute shader.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-04-05 18:39:00 -05:00
Ian Romanick
ae21b52e1d nir/algebraic: Add missing 16-bit extract_[iu]8 patterns
No shader-db changes on any Intel platform.

v2: Use a loop to generate patterns.  Suggested by Jason.

v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that would
replace an extract_i8 with and extract_u8.  This broke ~180 tests.  This
bug was introduced in v2.

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> [v2]
Acked-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
2019-03-28 15:35:52 -07:00
Ian Romanick
cbad201c2b nir/algebraic: Add missing 64-bit extract_[iu]8 patterns
No shader-db changes on any Intel platform.

v2: Use a loop to generate patterns.  Suggested by Jason.

v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that would
replace an extract_i8 with and extract_u8.  This broke ~180 tests.  This
bug was introduced in v2.

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> [v2]
Acked-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
2019-03-28 15:35:52 -07:00
Ian Romanick
bc17f5a2a3 nir/algebraic: Remove redundant extract_[iu]8 patterns
No shader-db changes on any Intel platform.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-28 15:35:52 -07:00
Ian Romanick
c152672e68 nir/algebraic: Fix up extract_[iu]8 after loop unrolling
Skylake, Broadwell, and Haswell had similar results. (Skylake shown)
total instructions in shared programs: 15256840 -> 15256837 (<.01%)
instructions in affected programs: 4713 -> 4710 (-0.06%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.06% max: 0.08% x̄: 0.06% x̃: 0.06%

total cycles in shared programs: 372286583 -> 372286583 (0.00%)
cycles in affected programs: 198516 -> 198516 (0.00%)
helped: 1
HURT: 1
helped stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10
helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01%
HURT stats (abs)   min: 10 max: 10 x̄: 10.00 x̃: 10
HURT stats (rel)   min: 0.01% max: 0.01% x̄: 0.01% x̃: 0.01%

No changes on any other Intel platform.

v2: Use a loop to generate patterns.  Suggested by Jason.

v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that would
replace an extract_i8 with and extract_u8.  This broke ~180 tests.  This
bug was introduced in v2.

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> [v2]
Acked-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
2019-03-28 15:35:52 -07:00
Iago Toral Quiroga
763c8aabed compiler/nir: add lowering for 16-bit ldexp
v2 (Topi):
 - Make bit-size handling order be 16-bit, 32-bit, 64-bit
 - Clamp lower exponent range at -28 instead of -30.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-25 16:08:25 +01:00
Iago Toral Quiroga
3766334923 compiler/nir: add lowering for 16-bit flrp
And enable it on Intel.

v2:
 - Squash the change to enable it on Intel (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-25 16:08:25 +01:00
Iago Toral Quiroga
ca31df6f1f compiler/nir: add lowering option for 16-bit fmod
And enable it on Intel.

v2:
 - Squash the change to enable this lowering on Intel (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-25 16:08:25 +01:00
Jason Ekstrand
2b76de9b5d nir/algebraic: Add a couple optimizations for iabs and ishr
Shader-db results on Kaby Lake:

    total instructions in shared programs: 15225213 -> 15222365 (-0.02%)
    instructions in affected programs: 43524 -> 40676 (-6.54%)
    helped: 203
    HURT: 0

Lots of shaders in Shadow Warrior had this pattern along with Deus Ex,
Civ, Shadow of Mordor, and several others.

Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-03-15 01:02:19 +00:00