Commit graph

4029 commits

Author SHA1 Message Date
Ian Romanick
a7724b1cbb nir/algebraic: Add missing ffma(-1, a, b) pattern
All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17229439 -> 17229377 (<.01%)
instructions in affected programs: 9859 -> 9797 (-0.63%)
helped: 41
HURT: 0
helped stats (abs) min: 1 max: 6 x̄: 1.51 x̃: 1
helped stats (rel) min: 0.08% max: 11.54% x̄: 1.65% x̃: 0.67%
95% mean confidence interval for instructions value: -1.88 -1.14
95% mean confidence interval for instructions %-change: -2.48% -0.81%
Instructions are helped.

total cycles in shared programs: 360944145 -> 360942989 (<.01%)
cycles in affected programs: 178167 -> 177011 (-0.65%)
helped: 36
HURT: 19
helped stats (abs) min: 1 max: 222 x̄: 38.03 x̃: 5
helped stats (rel) min: 0.01% max: 31.01% x̄: 4.01% x̃: 0.45%
HURT stats (abs)   min: 1 max: 34 x̄: 11.21 x̃: 6
HURT stats (rel)   min: 0.03% max: 2.74% x̄: 0.72% x̃: 0.50%
95% mean confidence interval for cycles value: -36.01 -6.02
95% mean confidence interval for cycles %-change: -4.18% -0.57%
Cycles are helped.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 11:25:03 -07:00
Ian Romanick
7b4ff6a1af nir: Mark ffma as 2src_commutative
This doesn't make any real difference now, but future work (not in this
series) will add a LOT of ffma patterns.  Having to duplicate all of
them for ffma(a, b, c) and ffma(b, a, c) is just terrible.

No shader-db changes on any Intel platform.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 11:25:02 -07:00
Ian Romanick
e049a9c92b nir: Add support for 2src_commutative ops that have 3 sources
v2: Instead of handling 3 sources as a special case, generalize with
loops to N sources.  Suggested by Jason.

v3: Further generalize by only checking that number of sources is >= 2.
Suggested by Jason.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 11:25:02 -07:00
Ian Romanick
ede45bf9cf nir: Rename commutative to 2src_commutative
The meaning of the new name is that the first two sources are
commutative.  Since this is only currently applied to two-source
operations, there is no change.

A future change will mark ffma as 2src_commutative.

It is also possible that future work will add 3src_commutative for
opcodes like fmin3.

v2: s/commutative_2src/2src_commutative/g.  I had originally considered
this, but I discarded it because I did't want to deal with identifiers
that (should) start with 2.  Jason suggested it in review, so we decided
that _2src_commutative would be used in nir_opcodes.py.  Also add some
comments documenting what 2src_commutative means.  Also suggested by
Jason.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 11:25:02 -07:00
Jason Ekstrand
712f99934c nir/validate: Use a single set for SSA def validation
The current SSA def validation we do in nir_validate validates three
things:

 1. That each SSA def is only ever used in the function in which it is
    defined.

 2. That an nir_src exists in an SSA def's use list if and only if it
    points to that SSA def.

 3. That each nir_src is in the correct use list (uses or if_uses) based
    on whether it's an if condition or not.

The way we were doing this before was that we had a hash table which
provided a map from SSA def to a small ssa_def_validate_state data
structure which contained a pointer to the nir_function_impl and two
hash sets, one for each use list.  This meant piles of allocation and
creating of little hash sets.  It also meant one hash lookup for each
SSA def plus one per use as well as two per src (because we have to look
up the ssa_def_validate_state and then look up the use.)  It also
involved a second walk over the instructions as a post-validate step.

This commit changes us to use a single low-collision hash set of SSA
sources for all of this by being a bit more clever.  We accomplish the
objectives above as follows:

 1. The list is clear when we start validating a function.  If the
    nir_src references an SSA def which is defined in a different
    function, it simply won't be in the set.

 2. When validating the SSA defs, we walk the uses and verify that they
    have is_ssa set and that the SSA def points to the SSA def we're
    validating.  This catches the case of a nir_src being in the wrong
    list.  We then put the nir_src in the set and, when we validate the
    nir_src, we assert that it's in the set.  This takes care of any
    cases where a nir_src isn't in the use list.  After checking that
    the nir_src is in the set, we remove it from the set and, at the end
    of nir_function_impl validation, we assert that the set is empty.
    This takes care of any cases where a nir_src is in a use list but
    the instruction is no longer in the shader.

 3. When we put a nir_src in the set, we set the bottom bit of the
    pointer to 1 if it's the condition of an if.  This lets us detect
    whether or not a nir_src is in the right list.

When running shader-db with an optimized debug build of mesa on my
laptop, I get the following shader-db CPU times:

   With NIR_VALIDATE=0       3033.34 seconds
   Before this commit       20224.83 seconds
   After this commit         6255.50 seconds

Assuming shader-db is a representative sampling of GLSL shaders, this
means that making this change yields an 81% reduction in the time spent
in nir_validate.  It still isn't cheap but enabling validation now only
increases compile times by 2x instead of 6.6x.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2019-05-13 14:43:47 +00:00
Jason Ekstrand
460567eabf nir/validate: Use a ralloc context for our temporary data
All of our hash tables and sets are already using ralloc.  There's
really no good reason why we don't just make a ralloc context rather
than try to remember to clean everything up manually.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2019-05-13 14:43:47 +00:00
Ruslan Kabatsayev
974c4d679c nir: Fix wrong sign in lower_rcp
The nested fma calls were supposed to implement

x_new = x + x * (1 - x*src),

but instead current code is equivalent to

x_new = x - x * (1 - x*src).

The result is that Newton-Raphson steps don't improve precision at all.
This patch fixes this problem.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110435
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-11 09:25:22 -07:00
Alyssa Rosenzweig
006cafc243 nir: Add blend_const_color_rgba sysval
This represents a float vec4 constant color, as passed to glBlendColor.
While the existing 4 shader sysvals are retained to minimize code churn,
a single vectorized intrinsic is required for efficient blending on
vector architectures. (This may also apply to archictectures like
Bifrost where ALU is scalar but load/store is vector; it largely depends
on how blending is implemented per-driver.)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-10 15:49:28 +00:00
Alyssa Rosenzweig
f41be53a17 compiler: Add enums for blend state
We add enums corresponding to (GLES) blend state to shader_enums.h,
complementing the existing advanced blending enums in the file. This
allows us to represent blending state in a driver-agnostic, API-agnostic
way to permit lowering.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Eric Anholt <eric@anholt.net>
2019-05-10 15:49:01 +00:00
Jonathan Marek
d0bff89159 nir: allow specifying a set of opcodes in lower_alu_to_scalar
This can be used by both etnaviv and freedreno/a2xx as they are both vec4
architectures with some instructions being scalar-only.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-10 15:10:41 +00:00
Brian Paul
2e28983ed2 glsl: s/GLboolean/bool/ to silence MSVC compiler warning
It complains about mixing GLboolean and bool in the |= expression.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-05-08 10:05:41 -06:00
Ian Romanick
ed5f024515 nir/flrp: Reassociate add in flrp(±1, b, c) lowering path
With this reassociation, this lowering path is still beneficial.

Ice Lake
total instructions in shared programs: 17220191 -> 17207181 (-0.08%)
instructions in affected programs: 999871 -> 986861 (-1.30%)
helped: 3703
HURT: 17
helped stats (abs) min: 1 max: 686 x̄: 3.52 x̃: 3
helped stats (rel) min: 0.09% max: 51.97% x̄: 2.21% x̃: 1.35%
HURT stats (abs)   min: 1 max: 9 x̄: 1.47 x̃: 1
HURT stats (rel)   min: 0.08% max: 4.55% x̄: 0.78% x̃: 0.55%
95% mean confidence interval for instructions value: -4.01 -2.99
95% mean confidence interval for instructions %-change: -2.29% -2.11%
Instructions are helped.

total cycles in shared programs: 360871298 -> 360755040 (-0.03%)
cycles in affected programs: 9931334 -> 9815076 (-1.17%)
helped: 2388
HURT: 1569
helped stats (abs) min: 1 max: 10228 x̄: 93.54 x̃: 18
helped stats (rel) min: <.01% max: 74.11% x̄: 3.36% x̃: 1.07%
HURT stats (abs)   min: 1 max: 1917 x̄: 68.27 x̃: 22
HURT stats (rel)   min: <.01% max: 44.90% x̄: 3.44% x̃: 1.72%
95% mean confidence interval for cycles value: -39.48 -19.28
95% mean confidence interval for cycles %-change: -0.86% -0.46%
Cycles are helped.

total spills in shared programs: 12355 -> 12159 (-1.59%)
spills in affected programs: 295 -> 99 (-66.44%)
helped: 2
HURT: 1

total fills in shared programs: 25398 -> 25207 (-0.75%)
fills in affected programs: 288 -> 97 (-66.32%)
helped: 2
HURT: 1

LOST:   3
GAINED: 44

Iron Lake
total instructions in shared programs: 8169225 -> 8159729 (-0.12%)
instructions in affected programs: 1025712 -> 1016216 (-0.93%)
helped: 3352
HURT: 0
helped stats (abs) min: 1 max: 6 x̄: 2.83 x̃: 3
helped stats (rel) min: 0.15% max: 12.00% x̄: 1.51% x̃: 1.05%
95% mean confidence interval for instructions value: -2.86 -2.80
95% mean confidence interval for instructions %-change: -1.56% -1.46%
Instructions are helped.

total cycles in shared programs: 188656796 -> 188612280 (-0.02%)
cycles in affected programs: 18633584 -> 18589068 (-0.24%)
helped: 3085
HURT: 14
helped stats (abs) min: 2 max: 72 x̄: 14.45 x̃: 12
helped stats (rel) min: 0.02% max: 5.73% x̄: 0.73% x̃: 0.31%
HURT stats (abs)   min: 2 max: 4 x̄: 3.71 x̃: 4
HURT stats (rel)   min: <.01% max: <.01% x̄: <.01% x̃: <.01%
95% mean confidence interval for cycles value: -14.55 -14.18
95% mean confidence interval for cycles %-change: -0.76% -0.69%
Cycles are helped.

GM45
total instructions in shared programs: 5026905 -> 5021856 (-0.10%)
instructions in affected programs: 584169 -> 579120 (-0.86%)
helped: 1776
HURT: 0
helped stats (abs) min: 1 max: 6 x̄: 2.84 x̃: 3
helped stats (rel) min: 0.15% max: 11.11% x̄: 1.43% x̃: 0.98%
95% mean confidence interval for instructions value: -2.88 -2.80
95% mean confidence interval for instructions %-change: -1.50% -1.37%
Instructions are helped.

total cycles in shared programs: 129047376 -> 129018918 (-0.02%)
cycles in affected programs: 12941924 -> 12913466 (-0.22%)
helped: 1722
HURT: 14
helped stats (abs) min: 4 max: 72 x̄: 16.56 x̃: 18
helped stats (rel) min: 0.02% max: 5.73% x̄: 0.72% x̃: 0.30%
HURT stats (abs)   min: 2 max: 4 x̄: 3.71 x̃: 4
HURT stats (rel)   min: <.01% max: <.01% x̄: <.01% x̃: <.01%
95% mean confidence interval for cycles value: -16.65 -16.13
95% mean confidence interval for cycles %-change: -0.76% -0.66%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-05-08 07:41:54 -07:00
Ian Romanick
ba203a3cd7 nir/flrp: Fix typo on the flrp(±1, b, c) path
After Samuel reported the bisect, I was able to find the bug by
inspection.  Good thing for well-named varibles. :)

Unfortunately, this undoes almost all of the benefit of the original
patch.

Ice Lake
total instructions in shared programs: 17183159 -> 17218166 (0.20%)
instructions in affected programs: 1308722 -> 1343729 (2.67%)
helped: 98
HURT: 4746
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.47% max: 2.70% x̄: 0.60% x̃: 0.57%
HURT stats (abs)   min: 1 max: 691 x̄: 7.40 x̃: 8
HURT stats (rel)   min: 0.10% max: 700.00% x̄: 5.82% x̃: 2.83%
95% mean confidence interval for instructions value: 6.82 7.64
95% mean confidence interval for instructions %-change: 5.22% 6.15%
Instructions are HURT.

total cycles in shared programs: 360705959 -> 360853522 (0.04%)
cycles in affected programs: 10754380 -> 10901943 (1.37%)
helped: 1594
HURT: 3331
helped stats (abs) min: 1 max: 1896 x̄: 119.81 x̃: 60
helped stats (rel) min: <.01% max: 35.48% x̄: 5.06% x̃: 3.64%
HURT stats (abs)   min: 1 max: 10208 x̄: 101.63 x̃: 38
HURT stats (rel)   min: 0.01% max: 878.95% x̄: 9.01% x̃: 2.78%
95% mean confidence interval for cycles value: 21.11 38.81
95% mean confidence interval for cycles %-change: 3.76% 5.15%
Cycles are HURT.

total spills in shared programs: 12158 -> 12355 (1.62%)
spills in affected programs: 98 -> 295 (201.02%)
helped: 1
HURT: 2

total fills in shared programs: 25204 -> 25398 (0.77%)
fills in affected programs: 94 -> 288 (206.38%)
helped: 0
HURT: 3

LOST:   15
GAINED: 8

Iron Lake
total instructions in shared programs: 8121430 -> 8166733 (0.56%)
instructions in affected programs: 1148353 -> 1193656 (3.95%)
helped: 2
HURT: 4046
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.85% max: 1.92% x̄: 1.89% x̃: 1.89%
HURT stats (abs)   min: 1 max: 43 x̄: 11.20 x̃: 11
HURT stats (rel)   min: 0.20% max: 716.67% x̄: 7.40% x̃: 3.87%
95% mean confidence interval for instructions value: 11.02 11.37
95% mean confidence interval for instructions %-change: 6.84% 7.94%
Instructions are HURT.

total cycles in shared programs: 188376326 -> 188601568 (0.12%)
cycles in affected programs: 27416674 -> 27641916 (0.82%)
helped: 68
HURT: 3947
helped stats (abs) min: 2 max: 222 x̄: 13.88 x̃: 6
helped stats (rel) min: <.01% max: 1.28% x̄: 0.15% x̃: 0.01%
HURT stats (abs)   min: 2 max: 670 x̄: 57.31 x̃: 64
HURT stats (rel)   min: <.01% max: 1811.11% x̄: 4.11% x̃: 1.09%
95% mean confidence interval for cycles value: 55.01 57.20
95% mean confidence interval for cycles %-change: 2.88% 5.19%
Cycles are HURT.

LOST:   35
GAINED: 3

GM45
total instructions in shared programs: 4979794 -> 5003551 (0.48%)
instructions in affected programs: 635174 -> 658931 (3.74%)
helped: 1
HURT: 2142
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.85% max: 1.85% x̄: 1.85% x̃: 1.85%
HURT stats (abs)   min: 1 max: 43 x̄: 11.09 x̃: 11
HURT stats (rel)   min: 0.20% max: 716.67% x̄: 7.00% x̃: 3.53%
95% mean confidence interval for instructions value: 10.85 11.33
95% mean confidence interval for instructions %-change: 6.25% 7.74%
Instructions are HURT.

total cycles in shared programs: 128519586 -> 128654990 (0.11%)
cycles in affected programs: 17635304 -> 17770708 (0.77%)
helped: 46
HURT: 2088
helped stats (abs) min: 4 max: 220 x̄: 18.13 x̃: 6
helped stats (rel) min: <.01% max: 1.28% x̄: 0.15% x̃: 0.01%
HURT stats (abs)   min: 2 max: 670 x̄: 65.25 x̃: 66
HURT stats (rel)   min: <.01% max: 1464.29% x̄: 4.05% x̃: 0.99%
95% mean confidence interval for cycles value: 61.75 65.15
95% mean confidence interval for cycles %-change: 2.58% 5.34%
Cycles are HURT.

LOST:   38
GAINED: 38

Fixes: 5b908db604 ("nir/flrp: Lower flrp(±1, b, c) and flrp(a, ±1, c) differently")
Reported-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-05-08 07:41:26 -07:00
Timothy Arceri
4fd8161773 glsl_to_nir: remove unused type_is_int()
This was missed in e00fa99b08.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-08 14:11:38 +10:00
Vasily Khoruzhick
e67e4e90b2 nir: implement lowering for fsin and fcos
Lower sin and cos using Nick's fast sin/cos approximation from
https://web.archive.org/web/20180105155939/http://forum.devmaster.net/t/fast-and-accurate-sine-cosine/9648

It's suitable for GLES2, but it throws warnings in dEQP GLES3 precision tests.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-07 15:25:21 +00:00
Ian Romanick
ab86926156 nir/algebraic: Reassociate open-coded flrp(1, b, c)
In a previous verion of this patch, Jason commented,

   "Re-associating based on whether or not something has a constant
   value of 1.0 seems a bit sneaky.  I think it's well within the rules
   but it seems like something that could bite you."

That is possibly true.  The reassociation will generate different
results if fabs(b) >= 2**24 and fabs(c) < 0.5.  The delta increases as
fabs(c) approaches 0.

However, i965 has done this same reassociation indirectly for years.
We would previously allow nir_op_flrp on all pre-Gen11 hardware even
though Gen4 and Gen5 do not have a LRP instruction.  Optimizations in
nir_opt_algebraic would convert expressions like a+c(b-a) into flrp(a,
b, c).  On Gen7+, the hardware performs the same arithmetic as
a(1-c)+bc.  Gen6 seems to implement LRP as a+c(b-a).  On Gen4 and
Gen5, we would lower LRP to a sequence of instructions that implement
a(1-c)+bc.  The lowering happens after all constant folding, so we
would litterally generate a 1+(-1) instruction sequence in this
scenario: one instruction to load either 1 or -1 in a register, and
another instruction to add either -1 or 1 to it.

This patch just cuts out the middle man.  Do the reassociation that
we've always done, but do it explicitly at a time when we can benefit
from other optimizations.

A few cases that were hurt by "nir: Lower flrp(±1, b, c) and flrp(a,
±1, c) differently" are restored by this patch.  This includes a few
shaders in ET:QW.

I tried a similar thing for open-coded flrp(-1, b, c), and it hurt
instructions on 35 shaders for ILK without helping any.  The helped /
hurt cycles was about even.

No changes on any other Intel platforms.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8172020 -> 8164367 (-0.09%)
instructions in affected programs: 1089851 -> 1082198 (-0.70%)
helped: 3285
HURT: 64
helped stats (abs) min: 1 max: 6 x̄: 2.35 x̃: 2
helped stats (rel) min: 0.13% max: 12.00% x̄: 1.15% x̃: 0.83%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.24% max: 0.64% x̄: 0.39% x̃: 0.38%
95% mean confidence interval for instructions value: -2.32 -2.25
95% mean confidence interval for instructions %-change: -1.16% -1.09%
Instructions are helped.

total cycles in shared programs: 188758338 -> 188719974 (-0.02%)
cycles in affected programs: 20004922 -> 19966558 (-0.19%)
helped: 3012
HURT: 477
helped stats (abs) min: 2 max: 142 x̄: 13.41 x̃: 12
helped stats (rel) min: 0.01% max: 6.37% x̄: 0.52% x̃: 0.24%
HURT stats (abs)   min: 2 max: 328 x̄: 4.27 x̃: 4
HURT stats (rel)   min: <.01% max: 1.55% x̄: 0.14% x̃: 0.11%
95% mean confidence interval for cycles value: -11.38 -10.62
95% mean confidence interval for cycles %-change: -0.46% -0.41%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick
c995d1ca3a nir/flrp: Lower flrp(a, b, #c) differently
This doesn't help on Intel GPUs now because we always take the
"always_precise" path first.  It may help on other GPUs, and it does
prevent a bunch of regressions in "intel/compiler: Don't always require
precise lowering of flrp".

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick
ae02622d8f nir/flrp: Lower flrp(a, b, c) differently if another flrp(_, b, c) exists
There is little effect on Intel GPUs now because we almost always take
the "always_precise" path first.  It may help on other GPUs, and it does
prevent a bunch of regressions in "intel/compiler: Don't always require
precise lowering of flrp".

No changes on any other Intel platforms.

GM45 and Iron Lake had similar results. (Iron Lake shown)
total cycles in shared programs: 188852500 -> 188852484 (<.01%)
cycles in affected programs: 14612 -> 14596 (-0.11%)
helped: 4
HURT: 0
helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
helped stats (rel) min: 0.09% max: 0.13% x̄: 0.11% x̃: 0.11%
95% mean confidence interval for cycles value: -4.00 -4.00
95% mean confidence interval for cycles %-change: -0.13% -0.09%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick
6698d861a5 nir/flrp: Lower flrp(a, b, c) differently if another flrp(a, _, c) exists
This doesn't help on Intel GPUs now because we always take the
"always_precise" path first.  It may help on other GPUs, and it does
prevent a bunch of regressions in "intel/compiler: Don't always require
precise lowering of flrp".

No changes on any Intel platform.  Before a number of large rebases this
helped cycles in a couple shaders on Iron Lake and GM45.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick
5b908db604 nir/flrp: Lower flrp(±1, b, c) and flrp(a, ±1, c) differently
No changes on any other Intel platforms.

v2: Rebase on 424372e5dd5 ("nir: Use the flrp lowering pass instead of
nir_opt_algebraic")

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8189888 -> 8153912 (-0.44%)
instructions in affected programs: 1199037 -> 1163061 (-3.00%)
helped: 4124
HURT: 10
helped stats (abs) min: 1 max: 40 x̄: 8.73 x̃: 9
helped stats (rel) min: 0.20% max: 86.96% x̄: 4.96% x̃: 3.02%
HURT stats (abs)   min: 1 max: 2 x̄: 1.20 x̃: 1
HURT stats (rel)   min: 1.06% max: 3.92% x̄: 1.62% x̃: 1.06%
95% mean confidence interval for instructions value: -8.84 -8.56
95% mean confidence interval for instructions %-change: -5.12% -4.77%
Instructions are helped.

total cycles in shared programs: 188606710 -> 188426964 (-0.10%)
cycles in affected programs: 27505596 -> 27325850 (-0.65%)
helped: 4026
HURT: 77
helped stats (abs) min: 2 max: 646 x̄: 44.99 x̃: 46
helped stats (rel) min: <.01% max: 94.58% x̄: 2.35% x̃: 0.85%
HURT stats (abs)   min: 2 max: 376 x̄: 17.79 x̃: 6
HURT stats (rel)   min: <.01% max: 2.60% x̄: 0.22% x̃: 0.04%
95% mean confidence interval for cycles value: -44.75 -42.87
95% mean confidence interval for cycles %-change: -2.44% -2.17%
Cycles are helped.

LOST:   3
GAINED: 35

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick
23c5501b77 nir/flrp: Lower flrp(#a, #b, c) differently
If the magnitudes of #a and #b are such that (b-a) won't lose too much
precision, lower as a+c(b-a).

No changes on any other Intel platforms.

v2: Rebase on 424372e5dd5 ("nir: Use the flrp lowering pass instead of
nir_opt_algebraic")

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8192503 -> 8192383 (<.01%)
instructions in affected programs: 18417 -> 18297 (-0.65%)
helped: 68
HURT: 0
helped stats (abs) min: 1 max: 18 x̄: 1.76 x̃: 1
helped stats (rel) min: 0.19% max: 7.89% x̄: 1.10% x̃: 0.43%
95% mean confidence interval for instructions value: -2.48 -1.05
95% mean confidence interval for instructions %-change: -1.56% -0.63%
Instructions are helped.

total cycles in shared programs: 188662536 -> 188661956 (<.01%)
cycles in affected programs: 744476 -> 743896 (-0.08%)
helped: 62
HURT: 0
helped stats (abs) min: 4 max: 60 x̄: 9.35 x̃: 6
helped stats (rel) min: 0.02% max: 4.84% x̄: 0.27% x̃: 0.06%
95% mean confidence interval for cycles value: -12.37 -6.34
95% mean confidence interval for cycles %-change: -0.48% -0.06%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick
d41cdef2a5 nir: Use the flrp lowering pass instead of nir_opt_algebraic
I tried to be very careful while updating all the various drivers, but I
don't have any of that hardware for testing. :(

i965 is the only platform that sets always_precise = true, and it is
only set true for fragment shaders.  Gen4 and Gen5 both set lower_flrp32
only for vertex shaders.  For fragment shaders, nir_op_flrp is lowered
during code generation as a(1-c)+bc.  On all other platforms 64-bit
nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old
nir_opt_algebraic method.

No changes on any other Intel platforms.

v2: Add panfrost changes.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total cycles in shared programs: 188647754 -> 188647748 (<.01%)
cycles in affected programs: 5096 -> 5090 (-0.12%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick
158370ed2a nir/flrp: Add new lowering pass for flrp instructions
This pass will soon grow to include some optimizations that are
difficult or impossible to implement correctly within nir_opt_algebraic.
It also include the ability to generate strictly correct code which the
current nir_opt_algebraic lowering lacks (though that could be changed).

v2: Document the parameters to nir_lower_flrp.  Rebase on top of
3766334923 ("compiler/nir: add lowering for 16-bit flrp")

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:28 -07:00
Ian Romanick
dc566a033c nir/algebraic: Pull common multiplication out of flrp arguments
All Intel platforms had similar results. (Skylake shown)
total instructions in shared programs: 15342485 -> 15337495 (-0.03%)
instructions in affected programs: 217456 -> 212466 (-2.29%)
helped: 1539
HURT: 1
helped stats (abs) min: 1 max: 17 x̄: 3.24 x̃: 3
helped stats (rel) min: 0.22% max: 18.75% x̄: 3.10% x̃: 1.91%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.56% max: 0.56% x̄: 0.56% x̃: 0.56%
95% mean confidence interval for instructions value: -3.39 -3.09
95% mean confidence interval for instructions %-change: -3.24% -2.96%
Instructions are helped.

total cycles in shared programs: 355734320 -> 355728237 (<.01%)
cycles in affected programs: 1851555 -> 1845472 (-0.33%)
helped: 835
HURT: 575
helped stats (abs) min: 1 max: 658 x̄: 40.62 x̃: 14
helped stats (rel) min: <.01% max: 35.69% x̄: 3.78% x̃: 1.81%
HURT stats (abs)   min: 1 max: 322 x̄: 48.40 x̃: 14
HURT stats (rel)   min: 0.04% max: 71.02% x̄: 8.06% x̃: 2.43%
95% mean confidence interval for cycles value: -8.50 -0.13
95% mean confidence interval for cycles %-change: 0.48% 1.62%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:28 -07:00
Ian Romanick
a83a6e9690 nir/algebraic: Pull common addition out of flrp arguments
v2: Augment the late optimization patterns with a couple pre-ffma pass
patterns.

All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15342982 -> 15342485 (<.01%)
instructions in affected programs: 56304 -> 55807 (-0.88%)
helped: 235
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.11 x̃: 1
helped stats (rel) min: 0.11% max: 8.82% x̄: 1.27% x̃: 0.74%
95% mean confidence interval for instructions value: -2.31 -1.92
95% mean confidence interval for instructions %-change: -1.46% -1.09%
Instructions are helped.

total cycles in shared programs: 355734740 -> 355734320 (<.01%)
cycles in affected programs: 1028807 -> 1028387 (-0.04%)
helped: 134
HURT: 104
helped stats (abs) min: 1 max: 212 x̄: 25.69 x̃: 8
helped stats (rel) min: <.01% max: 9.36% x̄: 1.33% x̃: 0.61%
HURT stats (abs)   min: 1 max: 203 x̄: 29.06 x̃: 8
HURT stats (rel)   min: 0.02% max: 15.76% x̄: 1.76% x̃: 0.46%
95% mean confidence interval for cycles value: -8.51 4.98
95% mean confidence interval for cycles %-change: -0.35% 0.39%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge
total instructions in shared programs: 10886815 -> 10886390 (<.01%)
instructions in affected programs: 36883 -> 36458 (-1.15%)
helped: 147
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 2.89 x̃: 3
helped stats (rel) min: 0.35% max: 8.00% x̄: 1.60% x̃: 1.23%
95% mean confidence interval for instructions value: -3.12 -2.67
95% mean confidence interval for instructions %-change: -1.83% -1.38%
Instructions are helped.

total cycles in shared programs: 154188360 -> 154186902 (<.01%)
cycles in affected programs: 388094 -> 386636 (-0.38%)
helped: 90
HURT: 58
helped stats (abs) min: 1 max: 243 x̄: 36.80 x̃: 15
helped stats (rel) min: 0.04% max: 9.23% x̄: 1.26% x̃: 0.83%
HURT stats (abs)   min: 1 max: 684 x̄: 31.97 x̃: 10
HURT stats (rel)   min: 0.03% max: 13.50% x̄: 1.15% x̃: 0.51%
95% mean confidence interval for cycles value: -22.62 2.92
95% mean confidence interval for cycles %-change: -0.68% 0.05%
Inconclusive result (value mean confidence interval includes 0).

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8221239 -> 8220357 (-0.01%)
instructions in affected programs: 54560 -> 53678 (-1.62%)
helped: 186
HURT: 0
helped stats (abs) min: 1 max: 14 x̄: 4.74 x̃: 3
helped stats (rel) min: 0.34% max: 10.77% x̄: 1.97% x̃: 1.17%
95% mean confidence interval for instructions value: -5.21 -4.28
95% mean confidence interval for instructions %-change: -2.23% -1.72%
Instructions are helped.

total cycles in shared programs: 188654442 -> 188650364 (<.01%)
cycles in affected programs: 1454384 -> 1450306 (-0.28%)
helped: 204
HURT: 0
helped stats (abs) min: 2 max: 84 x̄: 19.99 x̃: 18
helped stats (rel) min: 0.02% max: 4.69% x̄: 0.56% x̃: 0.22%
95% mean confidence interval for cycles value: -22.38 -17.60
95% mean confidence interval for cycles %-change: -0.67% -0.46%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:28 -07:00
Christian Gmeiner
e00fa99b08 glsl_to_nir: drop supports_ints
At initial nir level all drivers are supporting ints.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-07 07:35:59 +02:00
Christian Gmeiner
4e110eca42 nir: nir_shader_compiler_options: drop native_integers
Driver which do not support native integers should use a lowering
pass to go from integers to floats.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-07 07:35:52 +02:00
Vasily Khoruzhick
443c5a3cd6 nir: add int_to_float lowering pass
This new pass lowers ints and bools to floats. It allows hardware
that doesn't have native integers (e.g. Mali4x0) use the same
code paths as modern hardware.

It uses newly introduced pass to gather SSA types and should be
used as late as possible.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-07 01:07:27 +00:00
John Stultz
c7f2145b4b mesa: Makefile.sources: Add nir_lower_fb_read.c to Makefile.sources list
In commit a99c360a46 (nir: add pass to lower fb reads), a new
file was added that needs to also be added to the
Makefile.sources list used by the Android and SCons build system.

Cc: Rob Clark <robdclark@chromium.org>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Greg Hartman <ghartman@google.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Fixes: a99c360a46 ("nir: add pass to lower fb reads")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2019-05-06 11:29:26 +00:00
Alistair Strachan
0fda3eac31 mesa: android: Remove unnecessary dependency tracking rules
The current AOSP master build system breaks building mesa due to the
following error:

external/mesa3d/src/compiler/Android.glsl.gen.mk:94: error:
  writing to readonly directory: "external/mesa3d/src/compiler/glsl/ir.h"

This error is bogus -- nothing "writes" to ir.h -- but the rule is
unnecessary because the generated header that is a dependency of the
non-generated header should be added to LOCAL_GENERATED_SOURCES and this
will track if the dependency needs to be regenerated.

(This change fixes a similar problem affecting nir.h too.)

Cc: Rob Clark <robdclark@chromium.org>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Greg Hartman <ghartman@google.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Alistair Strachan <astrachan@google.com>
[jstultz: Forward ported and tweaked commit subject]
Signed-off-by: John Stultz <john.stultz@linaro.org>
2019-05-06 11:29:25 +00:00
Karol Herbst
7f85283103 spirv/cl: support vload/vstore
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-04 12:27:51 +02:00
Karol Herbst
d11b807da5 nir: Add nir_op_vec helper
with that we can simplify code where nir vectors are created

v2: merge both lines in nir_vec

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-04 12:27:51 +02:00
Karol Herbst
681fb7ea05 nir: Add a nir_builder_alu variant which takes an array of components
v2: rename to nir_build_alu_src_arr

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-04 12:27:51 +02:00
Karol Herbst
c91ea6343f vtn: handle bitcast with pointer src/dest
v2: use vtn_push_ssa and vtn_ssa_value

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-04 12:27:51 +02:00
Jason Ekstrand
91899495a1 nir: Add a SSA type gathering pass
This new pass (which isn't even compile-tested) attempts to determine
the ALU type of all the SSA values in a function impl.  It takes a
greedy approach and assigns intness or floatness to everything it thinks
can possibly contain an int or a float.  Some values will be labled as
both int and float and some will be labled as neither and it is up to
the caller to decide what to do with this information.  However, for a
"nice" shader where the original source contained no bit-casts and no
implicit bit-casts were introduced by optimizations, there shouldn't be
any overlap in the two sets save for the odd CSEd zero constant.

Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-04 03:52:05 +00:00
Connor Abbott
d0ea9877b8 nir/algebraic: Don't emit empty initializers for MSVC
Just don't emit the transform array at all if there are no transforms

v2:
- Don't use len(array) > 0 (Dylan)
- Keep using ARRAY_SIZE to make the generated C code easier to read
(Jason).
2019-05-04 00:13:21 +02:00
Dylan Baker
c613861b23 meson: Don't build glsl cache_test when shader cache is disabled
v2: - Use new with_shader_cache variable instead of
      host_machine.system() == 'windows'

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-03 10:58:31 -07:00
Dylan Baker
5eb0f33e4f glsl/tests: define ssize_t on windows
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-03 10:58:24 -07:00
Dylan Baker
113bb8d448 glsl: fix general_ir_test with mingw
Somewhere down in the depths of the mingw headers 'interface' is
defined, change it to iface like a similar patch did.

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-03 10:57:17 -07:00
Dave Airlie
6fd6246d92 nir: fix lower vars to ssa for larger vector sizes.
This has a couple of hardcoded vec4 limits in it, change them
to the proper sizing to avoid future issues.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-03 15:23:00 +10:00
Dave Airlie
2774d39366 spirv: fix SpvOpBitSize return value.
The spir-v spec says this returns a bool.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-03 15:22:57 +10:00
Rob Clark
b73dd91f60 nir: fix nir tex print harder
Fixes: 691d5a825a nir: rework tex instruction printing
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-02 15:06:01 -07:00
Marek Olšák
b3a26d4628 glsl: fix and clean up NV_compute_shader_derivatives support
- make sure compute shader derivatives are exposed for all extensions
- unify duplicated code

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-02 16:09:24 -04:00
Rob Clark
a99c360a46 nir: add pass to lower fb reads
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-05-02 11:19:22 -07:00
Rob Clark
a2c89a85f4 nir: fix lower_wpos_ytransform in load_frag_coord case
Apparently we never hit this path.  Or at least haven't for a rather
long time.  But in either case (load_deref or load_frag_coord), we can
just directly use the intrinsic's ssa dest.  So stop passing the
nir_variable (which would be NULL in the load_frag_coord case) around
and instead just use &intr->dest.ssa.

(This ofc means we need to setup the cursor to insert *after* the
instruction, which seems to be another bug of the original
implementation.)

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-05-02 11:19:22 -07:00
Rob Clark
691d5a825a nir: rework tex instruction printing
The extra comma at the end was annoying me.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-05-02 11:19:22 -07:00
Connor Abbott
6ec4ed48fc nir/search: Add debugging code to dump the pattern matched
This was useful while debugging the previous commit.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-02 16:14:06 +02:00
Connor Abbott
7ce86e6938 nir/search: Add automaton-based pre-searching
nir_opt_algebraic is currently one of the most expensive NIR passes,
because of the many different patterns we've added over the years. Even
though patterns are already sorted by opcode, there are still way too
many patterns for common opcodes like bcsel and fadd, which means that
many patterns are tried but only a few actually match. One way to fix
this is to add a pre-pass over the code that scans it using an automaton
constructed beforehand, similar to the automatons produced by lex and
yacc for parsing source code. This automaton has to walk the SSA graph
and recognize possible pattern matches.

It turns out that the theory to do this is quite mature already, having
been developed for instruction selection as well as other non-compiler
things. I followed the presentation in the dissertation cited in the
code, "Tree algorithms: Two Taxonomies and a Toolkit," trying to keep
the naming similar. To create the automaton, we have to perform
something like the classical NFA to DFA subset construction used by lex,
but it turns out that actually computing the transition table for all
possible states would be way too expensive, with the dissertation
reporting times of almost half an hour for an example of size similar to
nir_opt_algebraic. Instead, we adopt one of the "filter" approaches
explained in the dissertation, which trade much faster table generation
and table size for a few more table lookups per instruction at runtime.
I chose the filter which resulted the fastest table generation time,
with medium table size. Right now, the table generation takes around .5
seconds, despite being implemented in pure Python, which I think is good
enough. Based on the numbers in the dissertation, the other choice might
make table compilation time 25x slower to get 4x smaller table size, but
I don't think that's worth it. As of now, we get the following binary
size before and after this patch:

    text   data	    bss	     dec	   hex	filename
11979455 464720	 730864	13175039	c908ff	before i965_dri.so
   text	   data	    bss	    dec	           hex	filename
12037835 616244	 791792	13445871	cd2aef	after i965_dri.so

There are a number of places where I've simplified the automaton by
getting rid of details in the LHS patterns rather than complicate things
to deal with them. For example, right now the automaton doesn't
distinguish between constants with different values. This means that it
isn't as precise as it could be, but the decrease in compile time is
still worth it -- these are the compilation time numbers for a shader-db
run with my (admittedly old) database on Intel skylake:

Difference at 95.0% confidence
	-42.3485 +/- 1.375
	-7.20383% +/- 0.229926%
	(Student's t, pooled s = 1.69843)

We can always experiment with making it more precise later.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-02 16:14:06 +02:00
Brian Paul
48107b5a2b glsl: fix typo in #warning message
Trivial.  Spotted by Eric Engestrom.
2019-05-02 06:32:57 -06:00
Brian Paul
413e55b5b9 glsl: work around MinGW 7.x compiler bug
I'm not sure what triggered this, but building with
scons platform=windows toolchain=crossmingw machine=x86 build=profile
with MinGW g++ 7.3 or 7.4 causes an internal compiler error.

We can work around it by forcing -O1 optimization.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
2019-05-01 20:06:54 -06:00