This will be used by tgsi_to_nir, which needs to get vec4 types for
declaring shader input/output variables.
v2: Add a missing space.
Reviewed-by: Matt Turner <mattst88@gmail.com> (v2)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
It now emits vector MOVs instead of a series of individual MOVs, which
should be useful to any vector backends. This pushes the problem of
src/dest aliasing of channels on a scalar chip to the backend, but if
there are any vector operations in your shader then you needed to be
handling this already.
Fixes fs-swap-problem with my scalarizing patches.
v2: Rename to insert_mov(), and add a comment about what it does.
v3: Rewrite the comment.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v3)
This reverts commit d7d340fb2f.
We have an isnormal() implementation available, the only problem was that
we had the wrong return type (fixed in a later patch).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88806
Acked-by: Matt Turner <mattst88@gmail.com>
Before, we were only copying the first N channels, where N is the size
of the SSA destination, which is fine for per-component instructions,
but non-per-component instructions like fdot3 can have more source
components than destination components. Fix this using the helper
function introduced in the last patch.
v2: use new helper name
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Unlike with non-SSA ALU instructions, where if they're per-component
you have to look at the writemask to know which source channels are
being used, SSA ALU instructions always have all the possible channels
enabled so we can just look at the number of components in the SSA
definition for per-component instructions to say how many source
components are being used.
v2: use new name nir_ssa_alu_instr_src_components()
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Previously, we called the abs() function in math.h. However, this involves
unnecessarily going through double. This commit changes it to use integers
directly with a ternary.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, these functions were explicitly writing to dst.x and dst.y.
However they both return only one component so writing to dst.y is invalid.
Also, since they only return one component, we don't need the explicit
assignment in the expression and can simplify it use an implicit
assignment.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This avoids the overhead of copying structures and better matches the newly
added nir_alu_src_copy and nir_alu_dest_copy.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Add a required field to the Opcode class, const_expr, that contains an
expression or statement that computes the result of the opcode given known
constant inputs. Then take those const_expr's and expand them into a function
that takes an opcode and an array of constant inputs and spits out the constant
result. This means that when adding opcodes, there's one less place to update,
and almost all the opcodes are self-documenting since the information on how to
compute the result is right next to the definition.
The helper functions in nir_constant_expressions.c were taken from
ir_constant_expressions.cpp.
v3 Jason Ekstrand <jason.ekstrand@iastate.edu>
- Use mako to generate one function per opcode instead of doing piles of
string splicing
v4 Jason Ekstrand <jason.ekstrand@iastate.edu>
- More comments and better indentation in the mako
- Add a description of the constant expression language in nir_opcodes.py
- Added nir_constant_expressions.py to EXTRA_DIST in Makefile.am
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Before, we used a system where a file, nir_opcodes.h, defined some macros that
were included to generate the enum values and the nir_op_infos structure. This
worked pretty well, but for development the error messages were never very
useful, Python tools couldn't understand the opcode list, and it was difficult
to use nir_opcodes.h to do other things like autogenerate a builder API. Now, we
store opcode information in nir_opcodes.py, and we have nir_opcodes_c.py to
generate the old nir_opcodes.c and nir_opcodes_h.py to generate nir_opcodes.h,
which contains all the enum names and gets included into nir.h like before. In
addition to solving the above problems, using Python and Mako to generate
everything means that it's much easier to add keep information centralized as we
add new things like constant propagation that require per-opcode information.
v2:
- make Opcode derive from object (Dylan)
- don't use assert like it's a function (Dylan)
- style fixes for fnoise, use xrange (Dylan)
- use iterkeys() in nir_opcodes_h.py (Dylan)
- use pydoc-style comments (Jason)
- don't make fmin/fmax commutative and associative yet (Jason)
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v3 Jason Ekstrand <jason.ekstrand@intel.com>
- Alphabetize source file lists
- Generate nir_opcodes.h in the builddir instead of the source dir
- Include $(builddir)/src/glsl/nir in the i965 build
- Rework nir_opcodes.h generation so it generates a complete header file
instead of one that has to be embedded inside an enum declaration
It's nice to have this present in your default cases so you can see what
instruction is triggering an abort.
v2: Just pass a NULL state, now that it won't crash when you do.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This is the equivalent of brw_fs_channel_expressions.cpp, which I wanted
for vc4.
v2: Use the nir_src_for_ssa() helper, and another instance of
nir_alu_src_copy().
v3: Drop the non-SSA support. All intended callers will have SSA-only ALU
ops.
v4: Use insert_before, drop stale bcsel/fcsel comment, drop now-unused
unsupported() function, drop lower_context struct.
v5: Completely rename the pass to nir_lower_alu_to_scalar(), add an assert
about weird input_sizes[].
Reviewed-by: Jason Ekstrand <jason.ekstrand@iastate.edu>
There aren't many users yet, but I wanted to do this from my scalarizing
pass.
v2: Constify the src arguments.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Most of these exist in the GLSL IR algebraic pass already. However,
SSA allows us to find more instances of the patterns.
total NIR instructions in shared programs: 2015593 -> 2011430 (-0.21%)
NIR instructions in affected programs: 124189 -> 120026 (-3.35%)
helped: 604
total i965 instructions in shared programs: 6025505 -> 6018717 (-0.11%)
i965 instructions in affected programs: 261295 -> 254507 (-2.60%)
helped: 1295
HURT: 3
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The first batch removes bonus fnot/inot operations, possibly allowing
other optimizations to better recognize patterns.
The next batch replaces a fadd and constant 0.0 with an fneg - negation
is usually free on GPUs, while addition is not.
total NIR instructions in shared programs: 2020814 -> 2015593 (-0.26%)
NIR instructions in affected programs: 411143 -> 405922 (-1.27%)
helped: 2233
HURT: 214
A few shaders are hurt by a few instructions due to moving neg such
that it has a constant operand, which is then folded, resulting in two
distinct load_consts for x and -x. We can always clean that up later.
total i965 instructions in shared programs: 6035392 -> 6025505 (-0.16%)
i965 instructions in affected programs: 784980 -> 775093 (-1.26%)
helped: 4508
HURT: 2
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The GLSL IR optimization pass contained these; we may as well include
them too.
v2: Fix a >> 0 and a << 0 optimizations (caught by Matt).
No change in the number of NIR instructions on a shader-db run.
total i965 instructions in shared programs: 6035397 -> 6035392 (-0.00%)
i965 instructions in affected programs: 542 -> 537 (-0.92%)
helped: 2 (in glamor)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Matt and I noticed a bunch of "val <- ior a a" operations in a shader,
so we decided to add an algebraic optimization for that. While there,
I decided to add a bunch more of them.
v2: Delete bogus fand/for optimizations (caught by Jason).
total NIR instructions in shared programs: 2023511 -> 2020814 (-0.13%)
NIR instructions in affected programs: 149634 -> 146937 (-1.80%)
helped: 1032
total i965 instructions in shared programs: 6035392 -> 6035397 (0.00%)
i965 instructions in affected programs: 537 -> 542 (0.93%)
HURT: 2
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Matt and I noticed that one of the shaders hurt by INTEL_USE_NIR=1 had
load_input and load_uniform intrinsics repeated several times, with the
same parameters, but each one generating a distinct SSA value. This
made ALU operations on those values appear distinct as well.
Generating distinct SSA values is silly - these are read only variables.
CSE'ing them makes everything use a single SSA value, which then allows
other operations to be CSE'd away as well.
Generalizing a bit, it seems like we should be able to safely CSE any
intrinsics that can be eliminated and reordered. I didn't implement
support for variables for the time being.
v2: Assert that info->num_variables == 0 (requested by Jason).
total NIR instructions in shared programs: 2435936 -> 2023511 (-16.93%)
NIR instructions in affected programs: 2413496 -> 2001071 (-17.09%)
helped: 16872
total i965 instructions in shared programs: 6028987 -> 6008427 (-0.34%)
i965 instructions in affected programs: 640654 -> 620094 (-3.21%)
helped: 2071
HURT: 585
GAINED: 14
LOST: 25
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This should not be a change in behavior, as all current cases that
potentially answer "yes" require SSA.
The next patch will introduce another case that requires SSA.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
brw_fs_nir has only seen scalar bools so far, thanks to vector splitting,
and the ralloc of in glsl_to_nir.cpp will *usually* get you a 0-filled
chunk of memory, so reading too large of a value will usually get you the
right bool value. But once we start doing vector bools in a few commits,
we end up getting bad values.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Almost all instructions we nir_ssa_def_init() for are nir_dests, and you
have to keep from forgetting to set is_ssa when you do. Just provide the
simpler helper, instead.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Designated initializers with anonymous unions don't work in MSVC or
GCC < 4.6. With a couple of constructor methods, we don't need them any
more and the code is actually cleaner.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88467
Reviewed-by: Connor Abbot <cwabbott0@gmail.com>
Fix build error on Mac OS X.
CC nir_to_ssa.lo
nir_to_ssa.c:29:10: fatal error: 'malloc.h' file not found
^
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88478
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
This is a rework of the liveness algorithm using a worklist as suggested by
Connor. Doing so reduces the number of times we walk over the instructions
because we don't have to do an entire pointless walk over the instructions
just to figure out it's time to stop. Also, the stuff after the last loop
in the funciton will only ever get visited once.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
A worklist is a common concept in optimizations. This adds a structure
that we can reuse for many different types of optimizations.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, the set API required the user to do all of the hashing of keys
as it passed them in. Since the hashing function is intrinsically tied to
the comparison function, it makes sense for the hash set to know about
it. Also, it makes for a somewhat clumsy API as the user is constantly
calling hashing functions many of which have long names. This is
especially bad when the standard call looks something like
_mesa_set_add(ht, _mesa_pointer_hash(key), key);
In the above case, there is no reason why the hash set shouldn't do the
hashing for you. We leave the option for you to do your own hashing if
it's more efficient, but it's no longer needed. Also, if you do do your
own hashing, the hash set will assert that your hash matches what it
expects out of the hashing function. This should make it harder to mess up
your hashing.
This is analygous to 94303a0750 where we did this for hash_table
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We already have search_pre_hashed. This makes the APIs match better.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Without the break, it was possible that an instruction would match multiple
expressions. If this happened, you could end up trying to replace it
multiple times and get a segfault. This makes it so that, after a
successful replacement, it moves on to the next instruction.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This refactor allows you to more easily get the deref node associated with
a given variable. We then use that new functionality in the
deref_may_be_aliased function instead of creating a 1-element deref chain.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
The original name wasn't particularly descriptive. This one indicates that
it actually gives you SSA values as opposed to the old pass which lowered
variables to registers.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This solves a number of problems. First is the ability to change the
number of sources that a texture instruction has. Second, it solves the
delema that may occur if a texture instruction has more than 4 sources.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Additional description was added to a variety of places. Also, we no
longer use the term "leaf" to describe fully-qualified direct derefs.
Instead, we simply use the term "direct" or spell it out completely.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>