Commit graph

98 commits

Author SHA1 Message Date
Kenneth Graunke
30f51f1a1a glsl: Optimize "if (cond) discard;" to a conditional discard.
st_glsl_to_tgsi and ir_to_mesa have handled conditional discards for a
long time; the previous patch added that capability to i965.

i965 (Haswell) shader-db stats:

Without NIR:
total instructions in shared programs: 5792133 -> 5776360 (-0.27%)
instructions in affected programs:     737585 -> 721812 (-2.14%)
helped:                                6300
HURT:                                  68
GAINED:                                2

With NIR:
total instructions in shared programs: 5787538 -> 5769569 (-0.31%)
instructions in affected programs:     767843 -> 749874 (-2.34%)
helped:                                6522
HURT:                                  35
GAINED:                                6

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-02-24 15:24:53 -08:00
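A minimal sketch of the rewrite described above, on a toy tuple-based IR rather than the real GLSL IR. The node shapes `("if", cond, then_body, else_body)`, `("discard",)` and `("discard_if", cond)` are hypothetical and exist only for this illustration.

```python
def opt_conditional_discard(stmt):
    """Fold an 'if' whose only effect is a discard into a conditional discard.

    Toy IR (an assumption of this sketch, not Mesa's data structures):
      ("if", cond, then_body, else_body)  with bodies as lists of statements
      ("discard",)                        unconditional discard
      ("discard_if", cond)                conditional discard
    """
    is_if = isinstance(stmt, tuple) and len(stmt) == 4 and stmt[0] == "if"
    if is_if and stmt[2] == [("discard",)] and stmt[3] == []:
        return ("discard_if", stmt[1])
    # Anything else (an else branch, extra statements) is left untouched.
    return stmt
```

Only the exact `if (cond) discard;` shape is rewritten here; an `if` with an else branch or with additional statements in its body is returned unchanged.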
Jason Ekstrand
190073c737 nir: Add a global code motion (GCM) pass
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
 - Use nir_dominance_lca for computing least common ancestors
 - Use the block index for comparing dominance tree depths
 - Pin things that do partial derivatives

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-19 17:06:17 -08:00
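A sketch of computing a least common ancestor in a dominance tree, the operation the v2 notes attribute to nir_dominance_lca. This is not the NIR implementation: the `idom` and `depth` dictionaries and the block names are assumptions for illustration, and an explicit depth table stands in for whatever depth measure the pass actually uses.

```python
def dominance_lca(idom, depth, a, b):
    """Return the deepest block that dominates both a and b.

    idom maps each block to its immediate dominator (the root maps to itself);
    depth maps each block to its depth in the dominance tree (root is 0).
    """
    while a != b:
        # Walk the deeper block up one level; on a tie, either choice works.
        if depth[a] >= depth[b]:
            a = idom[a]
        else:
            b = idom[b]
    return a


# Diamond-shaped CFG: entry -> then/else -> merge -> tail.
idom = {"entry": "entry", "then": "entry", "else": "entry",
        "merge": "entry", "tail": "merge"}
depth = {"entry": 0, "then": 1, "else": 1, "merge": 1, "tail": 2}
```

A code motion pass can use such a query to find the latest block in which an instruction may be placed: the LCA of the blocks containing all of its uses.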
Eric Anholt
2a135c470e nir: Add an ALU op builder kind of like ir_builder.h
v2: Rebase on the nir_opcodes.h python code generation support.
v3: Use SSA values, and set an appropriate writemask on dot products.
v4: Make the arguments be SSA references as well.  This lets you stack up
    expressions in the arguments of other expressions, at the cost of
    having to insert a fmov/imov if you want to swizzle.  Also, add
    the generated file to NIR_GENERATED_FILES.
v5: Use more pythonish style for iterating the list.
v6: Infer the size of the dest from the size of the srcs, and auto-swizzle
    a single small src out to the appropriate size.
v7: Add little helpers for initializing the struct, add a typedef for the
    struct like other nir types have.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v6)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v7)
2015-02-18 22:28:42 -08:00
Emil Velikov
72e602905d nir: add missing header to the sources list
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-02-12 13:19:13 +00:00
Connor Abbott
a135f34080 nir: add an optimization to remove useless phi nodes
This removes phi nodes whose sources all point to the same thing.

Shader-db results:

total NIR instructions in shared programs: 2045293 -> 2041209 (-0.20%)
NIR instructions in affected programs:     126564 -> 122480 (-3.23%)
helped:                                615
HURT:                                  0

total FS instructions in shared programs: 4321840 -> 4320392 (-0.03%)
FS instructions in affected programs:     24622 -> 23174 (-5.88%)
helped:                                138
HURT:                                  0

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-03 16:00:13 -05:00
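A sketch of the idea, assuming phi nodes are modelled as a plain dictionary from phi name to a list of source names. This is a toy model and not the NIR pass; it iterates to a fixed point because removing one phi can make another one trivial.

```python
def remove_useless_phis(phis):
    """Remove phis whose sources all name the same value.

    phis: dict mapping phi name -> list of source value names.
    Returns (remaining_phis, replaced), where replaced maps each removed
    phi to the value that takes its place.
    """
    phis = {name: list(srcs) for name, srcs in phis.items()}
    replaced = {}
    progress = True
    while progress:
        progress = False
        for name, srcs in list(phis.items()):
            distinct = set(srcs)
            if len(distinct) != 1:
                continue
            (only,) = distinct
            if only == name:
                continue  # a phi that only feeds itself is not handled here
            del phis[name]
            replaced[name] = only
            # Rewrite the uses of the removed phi in the remaining phis.
            for other in phis.values():
                other[:] = [only if s == name else s for s in other]
            progress = True
    return phis, replaced
```

For example, with `p1 = phi(a, a)` and `p2 = phi(p1, a)`, removing `p1` turns `p2` into `phi(a, a)`, which is then removed as well.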
Jason Ekstrand
f2adcd36cb nir: Add a pass to lower vector phi nodes to scalar phi nodes
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
 - Add better comments
 - Use nir_ssa_dest_init and nir_src_for_ssa more places
 - Fix some void * casts

v3 Jason Ekstrand <jason.ekstrand@intel.com>:
 - Rework the way we determine whether or not to scalarize a phi node to
   make the recursion non-bogus
 - Treat load_const instructions as scalarizable

v4 Jason Ekstrand <jason.ekstrand@intel.com>:
 - Allow uniform and input loads to be scalarizable

v5 Jason Ekstrand <jason.ekstrand@intel.com>:
 - Also consider loads of inputs (varying, uniform, or ubo) to be
   scalarizable.  We were already doing this for load_var on uniforms and
   inputs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-02-03 12:33:11 -08:00
Jason Ekstrand
89285e4d47 nir: add new constant folding infrastructure
Add a required field to the Opcode class, const_expr, that contains an
expression or statement that computes the result of the opcode given known
constant inputs. Then take those const_expr's and expand them into a function
that takes an opcode and an array of constant inputs and spits out the constant
result. This means that when adding opcodes, there's one less place to update,
and almost all the opcodes are self-documenting since the information on how to
compute the result is right next to the definition.

The helper functions in nir_constant_expressions.c were taken from
ir_constant_expressions.cpp.

v3 Jason Ekstrand <jason.ekstrand@iastate.edu>
 - Use mako to generate one function per opcode instead of doing piles of
   string splicing

v4 Jason Ekstrand <jason.ekstrand@iastate.edu>
 - More comments and better indentation in the mako
 - Add a description of the constant expression language in nir_opcodes.py
 - Added nir_constant_expressions.py to EXTRA_DIST in Makefile.am

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-24 21:35:35 -08:00
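A sketch of the const_expr idea in plain Python. The commit generates C through Mako templates; here the per-opcode expressions are instead expanded into a single Python function with `exec`, and the opcode table, the `src0`/`src1` naming and the function names are assumptions made for this illustration.

```python
class Opcode:
    def __init__(self, name, num_inputs, const_expr):
        self.name = name
        self.num_inputs = num_inputs
        # Expression computing the result from constant inputs src0, src1, ...
        self.const_expr = const_expr


OPCODES = {}


def opcode(name, num_inputs, const_expr):
    OPCODES[name] = Opcode(name, num_inputs, const_expr)


opcode("iadd", 2, "src0 + src1")
opcode("imul", 2, "src0 * src1")
opcode("ineg", 1, "-src0")
opcode("imax", 2, "max(src0, src1)")


def generate_folder(opcodes):
    """Expand every const_expr into one function: (opcode, inputs) -> result."""
    lines = ["def eval_const_opcode(op, srcs):"]
    for op in opcodes.values():
        args = ", ".join("src%d" % i for i in range(op.num_inputs))
        lines.append("    if op == %r:" % op.name)
        lines.append("        %s, = srcs" % args)
        lines.append("        return %s" % op.const_expr)
    lines.append("    raise ValueError(op)")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["eval_const_opcode"]


fold = generate_folder(OPCODES)
```

Adding a new opcode then means adding a single `opcode(...)` line; the folding function picks it up the next time it is generated.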
Connor Abbott
fa4bc6c130 nir: use Python to autogenerate opcode information
Before, we used a system where a file, nir_opcodes.h, defined some macros that
were included to generate the enum values and the nir_op_infos structure. This
worked pretty well, but for development the error messages were never very
useful, Python tools couldn't understand the opcode list, and it was difficult
to use nir_opcodes.h to do other things like autogenerate a builder API. Now, we
store opcode information in nir_opcodes.py, and we have nir_opcodes_c.py to
generate the old nir_opcodes.c and nir_opcodes_h.py to generate nir_opcodes.h,
which contains all the enum names and gets included into nir.h like before.  In
addition to solving the above problems, using Python and Mako to generate
everything means that it's much easier to keep information centralized as we
add new things like constant propagation that require per-opcode information.

v2:
 - make Opcode derive from object (Dylan)
 - don't use assert like it's a function (Dylan)
 - style fixes for fnoise, use xrange (Dylan)
 - use iterkeys() in nir_opcodes_h.py (Dylan)
 - use pydoc-style comments (Jason)
 - don't make fmin/fmax commutative and associative yet (Jason)

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

v3 Jason Ekstrand <jason.ekstrand@intel.com>
 - Alphabetize source file lists
 - Generate nir_opcodes.h in the builddir instead of the source dir
 - Include $(builddir)/src/glsl/nir in the i965 build
 - Rework nir_opcodes.h generation so it generates a complete header file
   instead of one that has to be embedded inside an enum declaration
2015-01-24 21:33:56 -08:00
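A sketch of generating a complete opcode header from a Python table. It uses plain string building rather than Mako, and the opcode table, include-guard name and enum layout are assumptions for illustration, not the contents of the real nir_opcodes_h.py.

```python
# Hypothetical opcode table: name -> number of inputs.
OPCODES = {
    "fmul": 2,
    "fadd": 2,
    "fneg": 1,
}


def generate_header(opcodes):
    """Return the text of a self-contained C header declaring the opcode enum."""
    lines = [
        "#ifndef _NIR_OPCODES_",
        "#define _NIR_OPCODES_",
        "",
        "typedef enum {",
    ]
    # Sort so the generated enum is stable regardless of table order.
    for name in sorted(opcodes):
        lines.append("   nir_op_%s," % name)
    lines.append("   nir_num_opcodes")
    lines.append("} nir_op;")
    lines.append("")
    lines.append("#endif /* _NIR_OPCODES_ */")
    return "\n".join(lines) + "\n"


header = generate_header(OPCODES)
```

Because the table is ordinary Python data, other generators (a builder API, constant folding) can read the same table instead of re-parsing C macros.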
Eric Anholt
447ddfc137 nir: Add nir_lower_alu_to_scalar.
This is the equivalent of brw_fs_channel_expressions.cpp, which I wanted
for vc4.

v2: Use the nir_src_for_ssa() helper, and another instance of
    nir_alu_src_copy().
v3: Drop the non-SSA support.  All intended callers will have SSA-only ALU
    ops.
v4: Use insert_before, drop stale bcsel/fcsel comment, drop now-unused
    unsupported() function, drop lower_context struct.
v5: Completely rename the pass to nir_lower_alu_to_scalar(), add an assert
    about weird input_sizes[].

Reviewed-by: Jason Ekstrand <jason.ekstrand@iastate.edu>
2015-01-23 16:37:23 -08:00
Matt Turner
618c3b35f1 glsl: Build with subdir-objects.
Apparently $(top_srcdir) is not expanded in a source list when using
subdir-objects, so remove that. It's not clear to me why we were going
to such lengths to prefix each source file anyway.
2015-01-23 14:28:42 -08:00
Matt Turner
a8b880bd63 nir: Add headers to distribution.
2015-01-23 14:27:39 -08:00
Carl Worth
1c9877327e glsl: Add blob.c---a simple interface for serializing data
This new interface allows for writing a series of objects to a chunk
of memory (a "blob"). The allocated memory is maintained within the
blob itself (and re-allocated by doubling when necessary).

There are also functions for reading objects from a blob. If code
attempts to read beyond the available memory, the read functions
return 0 values (or their moral equivalent) without reading past the
allocated memory. Once the caller is done with the reads, it can check
blob->overrun to determine whether any invalid values were previously
returned due to attempts to read too far.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-16 13:47:40 -08:00
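A sketch of the behaviour described above, in Python instead of C. The class and method names (`Blob`, `BlobReader`, `write_uint32`, `read_uint32`) are assumptions for this illustration and are not the blob.c API; only the doubling growth and the overrun flag mirror the description.

```python
import struct


class Blob:
    """Append-only buffer that grows by doubling its capacity."""

    def __init__(self, initial_capacity=8):
        self.data = bytearray(max(1, initial_capacity))
        self.size = 0  # number of bytes actually written

    def _grow(self, additional):
        capacity = len(self.data)
        while capacity < self.size + additional:
            capacity *= 2
        self.data.extend(bytes(capacity - len(self.data)))

    def write_uint32(self, value):
        self._grow(4)
        struct.pack_into("<I", self.data, self.size, value)
        self.size += 4


class BlobReader:
    """Reads values back; reads past the end return 0 and set overrun."""

    def __init__(self, blob):
        self.data = bytes(blob.data[:blob.size])
        self.offset = 0
        self.overrun = False

    def read_uint32(self):
        if self.overrun or self.offset + 4 > len(self.data):
            self.overrun = True
            return 0
        (value,) = struct.unpack_from("<I", self.data, self.offset)
        self.offset += 4
        return value
```

A caller can issue a whole sequence of reads without checking each one and inspect `overrun` once at the end, which is the usage pattern the commit message describes.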
Jason Ekstrand
4839d1aed1 nir: Add a worklist helper structure
A worklist is a common concept in optimizations.  This adds a structure
that we can reuse for many different types of optimizations.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 16:54:21 -08:00
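A sketch of the worklist concept: a FIFO queue that ignores items already queued. The `Worklist` class and the `reachable_blocks` example are assumptions for illustration and do not mirror the NIR structure's interface.

```python
from collections import deque


class Worklist:
    """FIFO queue in which each item appears at most once at a time."""

    def __init__(self):
        self._queue = deque()
        self._present = set()

    def push(self, item):
        if item not in self._present:
            self._present.add(item)
            self._queue.append(item)

    def pop(self):
        item = self._queue.popleft()
        self._present.discard(item)
        return item

    def __bool__(self):
        return bool(self._queue)


def reachable_blocks(successors, entry):
    """Example client: collect every block reachable from entry."""
    seen = {entry}
    worklist = Worklist()
    worklist.push(entry)
    while worklist:
        block = worklist.pop()
        for succ in successors.get(block, ()):
            if succ not in seen:
                seen.add(succ)
                worklist.push(succ)
    return seen
```

Data-flow passes follow the same loop shape: pop a block, recompute its state, and push its neighbours again only if that state changed.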
Jason Ekstrand
d3636da902 nir: Add a pass for lowering copy instructions
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:20:24 -08:00
Jason Ekstrand
55b5058e69 nir: Rename lower_variables to lower_vars_to_ssa
The original name wasn't particularly descriptive.  This one indicates that
it actually gives you SSA values as opposed to the old pass which lowered
variables to registers.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:20:24 -08:00
Jason Ekstrand
d6fe35a418 nir: Remove the ffma peephole
This is no longer needed because it's now part of the algebraic
optimization pass

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:20:21 -08:00
Jason Ekstrand
f77f4c00ce nir: Add a basic constant folding pass
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:20:20 -08:00
Jason Ekstrand
d5410bd8f6 nir: Add an algebraic optimization pass
This pass uses the previously built algebraic transformations framework and
should act as an example for anyone else wanting to make an algebraic
transformation pass for NIR.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:20:20 -08:00
Jason Ekstrand
0057dfd673 nir: Add an expression matching framework
This framework provides a simple way to do simple search-and-replace
operations on NIR code.  The nir_search.h header provides four simple data
structures for representing expressions:  nir_value and three subtypes:
nir_variable, nir_constant, and nir_expression.  An expression tree can
then be represented by nesting these data structures as needed.  The
nir_replace_instr function takes an instruction, an expression, and a
value; if the instruction matches the expression, it is replaced with a new
chain of instructions to generate the given replacement value.  The
framework keeps track of swizzles on sources and automatically generates
the correct swizzles for the replacement value.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:20:20 -08:00
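A sketch of search-and-replace on expression trees, assuming expressions are nested tuples of the form `(opcode, operand, ...)`. The `Var` class and the `match`, `build` and `replace` functions are hypothetical names for this illustration, and swizzle tracking, which the framework handles, is left out entirely.

```python
class Var:
    """Pattern variable: matches any subexpression and remembers it."""

    def __init__(self, name):
        self.name = name


def match(pattern, expr, bindings):
    if isinstance(pattern, Var):
        if pattern.name in bindings:
            # A repeated variable must match the same subexpression.
            return bindings[pattern.name] == expr
        bindings[pattern.name] = expr
        return True
    if isinstance(pattern, tuple):
        if (not isinstance(expr, tuple) or len(expr) != len(pattern)
                or expr[0] != pattern[0]):
            return False
        return all(match(p, e, bindings)
                   for p, e in zip(pattern[1:], expr[1:]))
    return pattern == expr  # constant leaf


def build(replacement, bindings):
    if isinstance(replacement, Var):
        return bindings[replacement.name]
    if isinstance(replacement, tuple):
        return (replacement[0],) + tuple(build(r, bindings)
                                         for r in replacement[1:])
    return replacement


def replace(expr, search, replacement):
    """Return the rewritten expression, or expr unchanged if it does not match."""
    bindings = {}
    if match(search, expr, bindings):
        return build(replacement, bindings)
    return expr


a, b, c = Var("a"), Var("b"), Var("c")
fused = replace(("fadd", ("fmul", "x", "y"), "z"),
                ("fadd", ("fmul", a, b), c),
                ("ffma", a, b, c))
```

With rules expressed as data like this, an algebraic pass reduces to a table of (search, replacement) pairs tried against each instruction.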
Jason Ekstrand
919426631b nir: Add a lowering pass for adding source modifiers where possible
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:20:20 -08:00
Jason Ekstrand
d1d12efb36 nir: Remove the old variable lowering code
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:19:03 -08:00
Jason Ekstrand
6962c332e5 nir: Add a pass to lower global variables to local variables
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:19:02 -08:00
Jason Ekstrand
619b2e2499 nir: Add a pass for lowering input/output loads/stores
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:19:02 -08:00
Jason Ekstrand
aff431293b nir: Add a pass to lower local variables to registers
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:19:02 -08:00
Jason Ekstrand
d477beab07 nir: Add a pass to lower local variable accesses to SSA values
This pass analyzes all of the load/store operations and, when a variable is
never aliased (potentially used by an indirect operation), it is lowered
directly to an SSA value.  This pass translates to SSA directly and does
not require any fixup by the original to-SSA pass.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:19:02 -08:00
Jason Ekstrand
615ba5ad04 nir: Add a copy splitting pass
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:19:02 -08:00
Jason Ekstrand
6bdce55c44 nir: Add a basic CSE pass
This pass is still fairly basic.  It only handles ALU operations, constant
loads, and phi nodes.  No texture ops or intrinsics yet.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:19:01 -08:00
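A sketch of common subexpression elimination over a single straight-line block in SSA form, assuming instructions are `(dest, op, srcs)` tuples. It is a toy model: the real pass also has to respect dominance across blocks, and the instruction encoding here is an assumption for illustration.

```python
def cse(instrs):
    """Drop instructions that recompute an already available (op, srcs) pair.

    instrs: list of (dest, op, srcs) with srcs a tuple of value names.
    Returns (kept_instrs, rename), where rename maps each removed dest to
    the earlier value that replaces it.
    """
    available = {}  # (op, srcs) -> dest of the first instruction computing it
    rename = {}
    kept = []
    for dest, op, srcs in instrs:
        # Use the surviving names so chains of duplicates collapse too.
        srcs = tuple(rename.get(s, s) for s in srcs)
        key = (op, srcs)
        if key in available:
            rename[dest] = available[key]
        else:
            available[key] = dest
            kept.append((dest, op, srcs))
    return kept, rename
```

Renaming sources before the lookup is what lets `t3 = fmul(t1, c)` and `t4 = fmul(t2, c)` merge once `t2` has been found to duplicate `t1`.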
Jason Ekstrand
20a5812606 nir: Add a fused multiply-add peephole
2015-01-15 07:19:01 -08:00
Jason Ekstrand
13ec15bdbf nir: Add a peephole select optimization
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:19:01 -08:00
Jason Ekstrand
f86902e75d nir: Add an SSA-based liveness analysis pass.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:19:00 -08:00
Jason Ekstrand
49911cf4db nir: Add a basic metadata management system
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:19:00 -08:00
Jason Ekstrand
9d986d19d0 nir: Add a lower_vec_to_movs pass
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:19:00 -08:00
Jason Ekstrand
2943522d80 nir: Add a naive from-SSA pass
This pass is kind of stupidly implemented but it should be enough to get us
up and going.  We probably want something better that doesn't generate all
of the redundant moves eventually.  However, the i965 backend should be
able to handle the movs, so I'm not too worried about it in the short term.
2015-01-15 07:18:59 -08:00
Connor Abbott
7602385ac5 nir: add an SSA-based dead code elimination pass
v2: Jason Ekstrand <jason.ekstrand@intel.com>:
   whitespace fixes
2015-01-15 07:18:58 -08:00
Connor Abbott
8b7cb7674c nir: add an SSA-based copy propagation pass
2015-01-15 07:18:58 -08:00
Connor Abbott
4553887d4a nir: add a pass to convert to SSA
v2: Jason Ekstrand <jason.ekstrand@intel.com>:
   whitespace fixes
2015-01-15 07:18:58 -08:00
Connor Abbott
b559ee709b nir: calculate dominance information
2015-01-15 07:18:58 -08:00
Connor Abbott
cff1deff72 nir: add an optimization to turn global registers into local registers
After linking and inlining, this allows us to convert these registers
into SSA values and optimise more code.
2015-01-15 07:18:58 -08:00
Connor Abbott
613bf6818a nir: add a pass to lower atomics
v2: Jason Ekstrand <jason.ekstrand@intel.com>
   whitespace fixes
2015-01-15 07:18:58 -08:00
Connor Abbott
8692c6a023 nir: add a pass to lower system value reads
v2: Jason Ekstrand <jason.ekstrand@intel.com>:
   whitespace fixes
2015-01-15 07:18:58 -08:00
Connor Abbott
8cdcfce5ce nir: add a pass to lower sampler instructions
2015-01-15 07:18:58 -08:00
Connor Abbott
370e875b32 nir: add a pass to remove unused variables
After we lower variables, we want to delete them in order to free up
some memory.

v2: Jason Ekstrand <jason.ekstrand@intel.com>:
    whitespace fixes
2015-01-15 07:18:58 -08:00
Connor Abbott
c2f36cf125 nir: add a pass to lower variables for scalar backends
2015-01-15 07:18:58 -08:00
Connor Abbott
7f0daaa5e7 nir: add a glsl-to-nir pass
v2: Jason Ekstrand <jason.ekstrand@intel.com>:
   Make glsl_to_nir build again
   fix whitespace
2015-01-15 07:18:58 -08:00
Connor Abbott
dbb76421da nir: add a validation pass
This is similar to ir_validate.cpp.

v2: Jason Ekstrand <jason.ekstrand@intel.com>:
   whitespace fixes
2015-01-15 07:18:58 -08:00
Connor Abbott
98fa28bff7 nir: add a printer
This is similar to ir_print_visitor.cpp.

v2: Jason Ekstrand <jason.ekstrand@intel.com>:
   whitespace fixes
2015-01-15 07:18:58 -08:00
Connor Abbott
2812e5de93 nir: add core helper functions
These include functions for adding and removing various bits of IR and
helpers for iterating over all the sources and destinations of an
instruction. This is similar to ir.cpp.

v2: Jason Ekstrand <jason.ekstrand@intel.com>:
   whitespace and automake fixes
2015-01-15 07:18:58 -08:00
Connor Abbott
30c4678f64 nir: add the core datastructures
This includes all the instructions, ifs, loops, functions, etc. This is
similar to the information in ir.h.

v2: Jason Ekstrand <jason.ekstrand@intel.com>:
   Include ralloc and hash_table from the util directory
   whitespace fixes

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: glenn.kennard <glenn.kennard@gmail.com>
2015-01-15 07:18:57 -08:00
Connor Abbott
b5ca34a211 nir: add a simple C wrapper around glsl_types.h
v2: Jason Ekstrand <jason.ekstrand@intel.com>:
    whitespace and automake fixes

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-01-15 07:18:57 -08:00
Matt Turner
838ac978f4 glsl: Add headers to distribution.
2014-12-12 12:11:46 -08:00