Commit graph

298 commits

Author SHA1 Message Date
Kenneth Graunke
900498bd11 nir: Allocate nir_phi_src values out of the nir_phi_instr.
Phi sources are part of the phi instruction and should have the same
lifetime.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-07 14:34:13 -07:00
Kenneth Graunke
b05d53404c nir: Allocate nir_call_instr::params out of the nir_call itself.
The lifetime of the params array needs to be match the nir_call_instr
itself.  So, allocate it using the instruction itself as the context.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-07 14:34:13 -07:00
Jason Ekstrand
2e3b35a1cb nir/lower_tex_projector: Don't use designated initializers
These don't work in MSVC or in older versions of GCC

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89899
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2015-04-07 11:49:39 -07:00
Matt Turner
d131630c08 nir: Remove fsin_reduced/fcos_reduced.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-06 10:13:22 -07:00
Matt Turner
5c71cf8531 glsl: Remove never used sin_reduced/cos_reduced.
These were added in commit f2616e56, presumably in preparation for
translating ARB vp/fp into GLSL IR. That never happened, and neither did
a lowering pass that actually generated these instructions.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-06 10:13:22 -07:00
Rob Clark
f2ecc95e44 nir: add lowering for idiv/udiv/umod
Based on the algo from NV50LegalizeSSA::handleDIV() and handleMOD().
See also trans_idiv() in freedreno/ir3/ir3_compiler.c (which was an
adaptation of the nv50 code from Ilia Mirkin).

A python/numpy script which implements the same algorithm (and is
possibly useful for debugging or analysis) can be found here:

  http://people.freedesktop.org/~robclark/div-lowering.py

I've tested this on i965 hacked up to insert the idiv lowering pass,
and on freedreno with NIR frontend.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Eric Anholt <eric@anholt.net> (vc4)
2015-04-05 09:20:35 -04:00
Rob Clark
7880bea2fb nir: fix typo for f2b/i2b/b2i expressions (v2)
v2: discovered that i2b/b2i are also confused

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-05 08:56:24 -04:00
Rob Clark
6829d76e02 nir: add option to lower slt/sge/seq/sne
In freedreno these get implemented as the matching f* instruction plus a
u2f to convert the result to float 1.0/0.0.  But less lines of code to
just let nir_opt_algebraic handle this for us, plus opens up some small
window for other opt passes to improve (ie. if some shader ended up with
both a flt and slt with same src args, for example).

v2: use b2f rather than u2f

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-05 08:56:24 -04:00
Jason Ekstrand
9c53e80b9b nir/lower_samplers: Use the right memory context for realloc'ing tex sources
As of da5ec2a, we allocate instruction sources out of the instruction
itself.  When we realloc the texture sources we need to use the right
memory context or ralloc will get angry and assert-fail

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-04-03 17:02:20 -07:00
Jason Ekstrand
52e718097f nir: Add a cubemap normalizing pass
This commit adds a pass to L1-normalize cube-map coordinates.  Some hardware
such as i965 requires that largest cube-map coordinate is +-1.  We had a
pass to perform this normalization in GLSL IR but we need it in NIR for
cube maps on ARB programs to work correctly.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

v2 (Suggested by Eric):
 - Do a vector fabs and split into components later
 - Move to core NIR

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-03 14:12:49 -07:00
Jason Ekstrand
dccc57eaba nir/from_ssa: Don't set reg->parent_instr for ssa_undef instructions
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-04-03 14:04:31 -07:00
Jason Ekstrand
7bdba4a245 nir: Add a src_get_parent_instr function
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-04-03 14:04:12 -07:00
Eric Anholt
ea811b7868 nir: Add a lowering pass for texture projectors.
Not much hardware wants them these days, and it might give us a chance to
do CSE or algebraic at the NIR level.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-03 11:50:24 -07:00
Eric Anholt
64bdfc698d nir: Add an interface to turn a nir_src into a nir_ssa_def.
We use nir_ssa_defs for nir_builder args, so this takes a nir_src and
makes one so it can be passed in.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-03 11:50:22 -07:00
Eric Anholt
ec02970205 nir: Add an interface for the builder to insert instructions before.
So far we'd only used nir_builder to build brand new programs.  But if
we're doing modifications to instructions (like in a lowering pass), then
we want to generate new stuff before the instruction we're modifying.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-03 11:50:18 -07:00
Kenneth Graunke
da5ec2ac0b nir: Allocate nir_tex_instr::sources out of the instruction itself.
The lifetime of the sources array needs to be match the nir_tex_instr
itself.  So, allocate it using the instruction itself as the context.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-02 14:20:03 -07:00
Kenneth Graunke
7380c641b1 nir: Allocate predecessor and dominance frontier sets from block itself.
These sets are part of the block, and their lifetime needs to match the
block itself.  So, allocate them using the block itself as the context.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-02 14:20:02 -07:00
Kenneth Graunke
131444e1c5 nir: Allocate register fields out of the register itself.
The lifetime of each register's use/def/if_use sets needs to match the
register itself.  So, allocate them using the register itself as the
context.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-02 14:20:01 -07:00
Kenneth Graunke
587b3a20a1 nir: Make nir_create_function() strdup the function name.
glsl_to_nir passes in the ir_function's name field; we were copying the
pointer, but not duplicating the memory.

We want to be able to free the linked GLSL IR program after translating
to NIR, so we'll need to create a copy of the function name that the NIR
shader actually owns.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-02 14:20:00 -07:00
Kenneth Graunke
f61b6c3e48 nir: Free dead variables when removing them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-02 14:19:58 -07:00
Kenneth Graunke
f4e4491080 nir: Combine remove_dead_local_vars() and remove_dead_global_vars().
We can just pass a pointer to the list of variables, and reuse the code.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-02 14:19:56 -07:00
Jason Ekstrand
ca3b4d6d17 nir/opt_peephole_ffma: Fix a couple typos in a comment
Acked-by: Matt Turner <mattst88@gmail.com>
2015-04-02 11:09:37 -07:00
Jason Ekstrand
0573d0e484 nir/print: Correctly print swizzles for explicitly sized alu sources
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-02 10:21:18 -07:00
Matt Turner
781badee7a nir: Remove useless ftrunc inside f2i/f2u.
No shader-db changes, probably because they're all removed by the GLSL
compiler optimization added in commit 69ad5fd4.

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
Matt Turner
97e6c1b957 nir: Recognize (a < b || a < c) as a < max(b, c).
Doesn't work for analogous && cases, because of NaNs.

total instructions in shared programs: 6195712 -> 6194829 (-0.01%)
instructions in affected programs:     42000 -> 41117 (-2.10%)
helped:                                403

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
Matt Turner
a2b6e908cf nir: Add addition/multiplication identities of exp/log.
instructions in affected programs:     2858 -> 2808 (-1.75%)
helped:                                12

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
Matt Turner
099c729b4c nir: Add identities for the log function.
The rcp(log(x)) pattern affects instruction counts.

instructions in affected programs:     144 -> 138 (-4.17%)
helped:                                6

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
Matt Turner
8a6ae384b2 nir: Add identities for the exponential function.
No changes in shader-db.

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
Matt Turner
e26783d445 nir: Recognize another open coded lrp.
total instructions in shared programs: 6195924 -> 6195768 (-0.00%)
instructions in affected programs:     4876 -> 4720 (-3.20%)
helped:                                58
HURT:                                  10

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
Matt Turner
e82437e141 nir: Recognize open coded lrp.
total instructions in shared programs: 6197614 -> 6195924 (-0.03%)
instructions in affected programs:     34773 -> 33083 (-4.86%)
helped:                                147
HURT:                                  6

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
Jason Ekstrand
7f344721b1 nir/peephole_ffma: Be less agressive about fusing multiply-adds
shader-db results for fragment shaders on Haswell:
total instructions in shared programs: 4395688 -> 4389623 (-0.14%)
instructions in affected programs:     355876 -> 349811 (-1.70%)
helped:                                1455
HURT:                                  14
GAINED:                                5
LOST:                                  0

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:04 -07:00
Jason Ekstrand
a8c8b3b872 nir: Add a dedicated ffma peephole optimization
i965/nir: Use the dedicated ffma peephole

total instructions in shared programs: 4418748 -> 4394618 (-0.55%)
instructions in affected programs:     1292790 -> 1268660 (-1.87%)
helped:                                5999
HURT:                                  457
GAINED:                                4
LOST:                                  9

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:04 -07:00
Jason Ekstrand
e06a3d0282 nir: Move the compare-with-zero optimizations to the late section
total instructions in shared programs: 4422307 -> 4422363 (0.00%)
instructions in affected programs:     4230 -> 4286 (1.32%)
helped:                                0
HURT:                                  12

While this does hurt some things, the losses are minor and it prevents the
compare-with-zero optimization from fighting with ffma which is much more
important.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:03 -07:00
Jason Ekstrand
da294f9b2f nir/algebraic: Add a seperate section for "late" optimizations
i965/nir: Use the late optimizations

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:03 -07:00
Jason Ekstrand
1779dc060f nir/algebraic: Remove a duplicate optimization
This optimization is repeated verbatim above

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:03 -07:00
Jason Ekstrand
22ee7eeb4e nir/algebraic: #define around structure definitions
Previously, we couldn't generate two algebraic passes in the same file
because of multiple structure definitions.  To solve this, we play the
age-old header file trick and just #define around it.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:03 -07:00
Jason Ekstrand
793a94d6b5 nir/print: Don't print extra swizzzle components
Previously, NIR would just print 4 swizzle components if the swizzle was
anything other than foo.xyzw.  This creates lots of noise if, for example,
you have a one-component element with a swizzle of foo.xxxx.

Reviewed-by: Kenneth Grunke <kenneth@whitecape.org>
2015-04-01 12:49:49 -07:00
Eric Anholt
15b03b7964 nir: Recognize a pattern of bool frobbing from TGSI KILL_IF.
TGSI's conditional discards take float arg and negate it, so GLSL to TGSI
generates a b2f and negates that value.  Only, in NIR we want a proper
bool once again, so we compare with 0.  This is a lot of pointless extra
instructions.

total instructions in shared programs: 39735 -> 39702 (-0.08%)
instructions in affected programs:     1342 -> 1309 (-2.46%)

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-01 10:57:01 -07:00
Eric Anholt
6e8d4a2f80 nir: Recognize a pattern for doing b2f without the opcode.
Since we have patterns based on b2f, generate them if we see the b2f
equivalent using an iand.  This is common when generating NIR from TGSI.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-01 10:57:01 -07:00
Kenneth Graunke
72b06fb08e nir: Fix copy and pasted error message in nir_validate.
These are nir_cf_nodes, not ALU instructions.
Also, use unreachable() to preempt said review feedback.

v2: Do it right (thanks Ilia).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-03-28 09:36:46 -07:00
Kenneth Graunke
bf2c3bc316 nir: Lower subtraction to add with negation when !lower_negate.
prog->nir will generate fsub opcodes, but i965 doesn't implement them.
We may as well lower them at the NIR level, since it's trivial to do.

Suggested by Connor Abbott.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-03-27 21:16:34 -07:00
Kenneth Graunke
06f7bea96a nir: Add builder helpers for MOVs with ALU sources and swizzling MOVs.
These will be useful for prog->nir and tgsi->nir.

v2: Don't forget to mark nir_swizzle as inline (Eric).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-03-27 21:16:33 -07:00
Kenneth Graunke
75c922e0fe nir: Add nir_builder helpers for creating load_const intrinsics.
Both prog->nir and tgsi->nir will want to use these.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-03-27 21:16:33 -07:00
Eric Anholt
afa9fc1561 nir: Add optional lowering of flrp.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-03-27 13:29:48 -07:00
Kenneth Graunke
3120345f40 nir: Add glsl_float_type() wrapper.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-03-25 16:17:19 -07:00
Matt Turner
babd0fa3e2 nir: Fix typo. 2015-03-24 19:14:40 -07:00
Matt Turner
3fb56805f0 nir: Recognize sat(add(b2f(a), b2f(b))) as a logical OR.
Transform this into b2f(or(a, b)).

instructions in affected programs:     432 -> 430 (-0.46%)
helped:                                2

Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-03-24 14:43:37 -07:00
Matt Turner
c31158d2cb nir: Recognize mul(b2f(a), b2f(b)) as a logical AND.
Transform this into b2f(and(a, b)).

total instructions in shared programs: 6205448 -> 6204391 (-0.02%)
instructions in affected programs:     284030 -> 282973 (-0.37%)
helped:                                903
HURT:                                  6

Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-03-24 14:43:37 -07:00
Matt Turner
95729d2458 nir: Handle mixed scalar/vector arguments to logical and/or/xor.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-03-24 14:43:37 -07:00
Jason Ekstrand
8a33f95b7a nir/lower_io: Add a assign_locations function that sorts by [in]direct use
v2: Delete the set of indirectly accessed variables when we're done with it
v3: Rename from _packed to _scalar

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-03-19 13:18:39 -07:00