Commit graph

3254 commits

Author SHA1 Message Date
Matt Turner
69ad5fd4ce glsl: Optimize (f2i(trunc x)) into (f2i x).
total instructions in shared programs: 5950326 -> 5949286 (-0.02%)
instructions in affected programs:     88264 -> 87224 (-1.18%)
helped:                                692
2015-02-11 13:50:19 -08:00
Matt Turner
c262b2b582 glsl: Optimize round-half-up pattern.
Hurts some Psychonauts shaders, but after the next patch (which this
enables) they're fewer instructions than before this patch.
2015-02-11 13:50:19 -08:00
Matt Turner
a5455ab1ca glsl: Add trunc() to ir_builder. 2015-02-11 13:50:19 -08:00
Matt Turner
4c42e1116b nir: Recognize open-coded fmin/fmax.
And unfortunately other shaders do the same thing but with >=/<= which
we can't apply this optimization to because of NaNs.

instructions in affected programs:     23309 -> 22938 (-1.59%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-02-11 13:50:19 -08:00
Eric Anholt
56e21647e2 nir: Add algebraic opt for int comparisons with identical operands.
No change on shader-db on i965.

v2: Reword the comment due to feedback from Erik Faye-Lund

Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v1)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> (v1)
2015-02-11 11:52:38 -08:00
Eric Anholt
2919bdf466 nir: Fix load_const comparisons for CSE.
We want the size of a float per component, not the size of a whole vec4.

NIR instructions on i965:
total instructions in shared programs: 1261937 -> 1261929 (-0.00%)
instructions in affected programs:     114 -> 106 (-7.02%)

Looking at one of these examples (tesseract), it's from vec4 load_consts
for a MRT solid fill, which do get CSEed now that we don't memcmp off the
end of the const value and into the SSA def.  For the 1-component loads
that are common in i965, we were only memcmping off into the rest of the
usually zero-filled const_value.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-11 11:52:38 -08:00
Matt Turner
ea0f0eb6c0 glsl: Optimize 1/exp(x) into exp(-x).
Lots of shaders divide by exp2(...) which we turn into a multiplication
by the reciprocal. We can avoid the reciprocal by simply negating exp2's
argument.

total instructions in shared programs: 5947154 -> 5946695 (-0.01%)
instructions in affected programs:     118661 -> 118202 (-0.39%)
helped:                                380

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-02-10 17:48:44 -08:00
Matt Turner
a9065cef48 nir: Remove casts from void*.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-10 17:48:42 -08:00
Matt Turner
bb1e007157 nir: Replace assert(0) with unreachable().
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-10 17:48:31 -08:00
Matt Turner
942b56ad05 nir: Remove unused has_indirect variable.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-02-10 17:48:16 -08:00
Francisco Jerez
e6146e6f14 glsl: Forbid calling the constructor of any opaque type.
The spec doesn't define any opaque type constructors.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-02-10 15:49:43 +02:00
Francisco Jerez
c4111dfa0a glsl: Return correct number of coordinate components for cubemap array images.
Cubemap array images are unlike cubemap array samplers in that they don't need
an additional coordinate to index individual cubemaps in the array, instead
they behave like a 2D array of 6n layers, with n the number of cubemaps in the
array.  Take this exception into account.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-02-10 15:49:43 +02:00
Kenneth Graunke
480ee1f0b4 nir: Mark nir_print_instr's instr pointer as const.
Printing instructions doesn't modify them, so we can mark the parameter
const.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-02-10 03:37:55 -08:00
Eric Anholt
bff4cbdafa nir: Fix broken fsat recognizer.
We've probably never seen this ridiculous pattern in the wild, so it
didn't matter.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-02-06 15:57:55 -08:00
Eric Anholt
6706537dd4 nir: Slightly simplify algebraic code generation by reusing a struct.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-02-06 15:57:55 -08:00
Iago Toral Quiroga
71a36e0a2c glsl: GLSL ES identifiers cannot exceed 1024 characters
v2 (Ian Romanick)
- Move the check to the lexer before rallocing a copy of the large string.

Fixes the following 2 dEQP tests:
dEQP-GLES3.functional.shaders.keywords.invalid_identifiers.max_length_vertex
dEQP-GLES3.functional.shaders.keywords.invalid_identifiers.max_length_fragment

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-02-06 12:21:42 +01:00
Connor Abbott
a135f34080 nir: add an optimization to remove useless phi nodes
This removes phi nodes whose sources all point to the same thing.

Shader-db results:

total NIR instructions in shared programs: 2045293 -> 2041209 (-0.20%)
NIR instructions in affected programs:     126564 -> 122480 (-3.23%)
helped:                                615
HURT:                                  0

total FS instructions in shared programs: 4321840 -> 4320392 (-0.03%)
FS instructions in affected programs:     24622 -> 23174 (-5.88%)
helped:                                138
HURT:                                  0

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-03 16:00:13 -05:00
Jason Ekstrand
572d1f6e41 nir/validate: Ensure that phi sources are SSA-only
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-03 12:52:42 -08:00
Jason Ekstrand
5420774510 nir/validate: Validate that only float ALU outputs are saturated
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-03 12:46:55 -08:00
Jason Ekstrand
c0df85cca4 nir/lower_source_mods: Don't lower saturate for non-float outputs
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-03 12:46:38 -08:00
Jason Ekstrand
f2adcd36cb nir: Add a pass to lower vector phi nodes to scalar phi nodes
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
 - Add better comments
 - Use nir_ssa_dest_init and nir_src_for_ssa more places
 - Fix some void * casts

v3 Jason Ekstrand <jason.ekstrand@intel.com>:
 - Rework the way we determine whether or not to sccalarize a phi node to
   make the recursion non-bogus
 - Treat load_const instructions as scalarizable

v4 Jason Ekstrand <jason.ekstrand@intel.com>:
 - Allow uniform and input loads to be scalarizable

v5 Jason Ekstrand <jason.ekstrand@intel.com>:
 - Also consider loads of inputs (varying, uniform, or ubo) to be
   scalarizable.  We were already doing this for load_var on uniforms and
   inputs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-02-03 12:33:11 -08:00
Matt Turner
d8be1b9aba glsl/list: Note that exec_lists may not be realloc'd.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-02-03 12:25:14 -08:00
Iago Toral Quiroga
5dfb085ff3 glsl: Improve precision of mod(x,y)
Currently, Mesa uses the lowering pass MOD_TO_FRACT to implement
mod(x,y) as y * fract(x/y). This implementation has a down side though:
it introduces precision errors due to the fract() operation. Even worse,
since the result of fract() is multiplied by y, the larger y gets the
larger the precision error we produce, so for large enough numbers the
precision loss is significant. Some examples on i965:

Operation                           Precision error
-----------------------------------------------------
mod(-1.951171875, 1.9980468750)      0.0000000447
mod(121.57, 13.29)                   0.0000023842
mod(3769.12, 321.99)                 0.0000762939
mod(3769.12, 1321.99)                0.0001220703
mod(-987654.125, 123456.984375)      0.0160663128
mod( 987654.125, 123456.984375)      0.0312500000

This patch replaces the current lowering pass with a different one
(MOD_TO_FLOOR) that follows the recommended implementation in the GLSL
man pages:

mod(x,y) = x - y * floor(x/y)

This implementation eliminates the precision errors at the expense of
an additional add instruction on some systems. On systems that can do
negate with multiply-add in a single operation this new implementation
would come at no additional cost.

v2 (Ian Romanick)
- Do not clone operands because when they are expressions we would be
duplicating them and that can lead to suboptimal code.

Fixes the following 16 dEQP tests:
dEQP-GLES3.functional.shaders.builtin_functions.precision.mod.mediump_*
dEQP-GLES3.functional.shaders.builtin_functions.precision.mod.highp_*

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-02-03 13:19:36 +01:00
Iago Toral Quiroga
ec7dcaf578 glsl: can't have 'const' qualifier used with struct or interface block members
Fixes the following 2 dEQP tests:
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_const_vertex
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_const_fragment

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-02-03 13:19:36 +01:00
Iago Toral Quiroga
5d655a43e6 glsl: interface blocks must be declared at global scope
Fixes the following 2 dEQP tests:
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_in_main_vertex
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_in_main_fragment

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-02-03 13:19:36 +01:00
Kenneth Graunke
0f06f12c11 glsl: Pick ast_conditional branch regardless of op1/2 being constant.
If the ?: operator's condition is a constant value, and both branches
were pure expressions, we can just make the resulting value one or the
other.

Previously, we only did this if op[1] and op[2] were also constant
values - but there's no actual reason for that restriction.

No changes in shader-db, probably because we usually optimize this later
anyway.  But it does make us generate less stupid code up front.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-02-02 17:14:55 -08:00
Jason Ekstrand
604ae33c8b nir/opt_algebraic: Add some constant bcsel reductions
total instructions in shared programs: 5998190 -> 5997603 (-0.01%)
instructions in affected programs:     54276 -> 53689 (-1.08%)
helped:                                293

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-29 17:11:13 -08:00
Jason Ekstrand
7f19cd5a56 nir/opt_algebraic: Add some boolean simplifications
total instructions in shared programs: 5998321 -> 5998287 (-0.00%)
instructions in affected programs:     4520 -> 4486 (-0.75%)
helped:                                8

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-29 17:11:10 -08:00
Jason Ekstrand
70273c5cd5 nir/algebraic: Support specifying variable as constant or by type
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-29 17:07:45 -08:00
Jason Ekstrand
81f77e4f3a nir/algebraic: Fail to compile of a variable is used in a replace but not the search
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-29 17:07:45 -08:00
Jason Ekstrand
026b5cc792 nir/search: Allow for matching variables based on types
This allows you to match on an unknown value but only if it is of a given
type.  90% of the uses of this are for matching only booleans, but adding
the generality of arbitrary types is no more complex.

nir_algebraic.py doesn't handle this yet but that's ok because the C
language will ensure that the default type on all variables is void.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-29 17:07:45 -08:00
Jason Ekstrand
d8999bcdce nir/search: Add support for matching unknown constants
There are some algebraic transformations that we want to do but only if
certain things are constants.  For instance, we may want to replace
a * (b + c) with (a * b) + (a * c) as long as a and either b or c is constant.
While this generates more instructions, some of it will get constant
folded.

nir_algebraic.py doesn't handle this yet, but that's ok because the C
language will make sure that false is the default for now.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-29 17:07:45 -08:00
Jason Ekstrand
5ab1489ae6 nir: Add an invalid type
This allows us to indicate a concept of an invalid type.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-29 17:07:45 -08:00
Eric Anholt
fc884eadf1 nir: Add variants of some of the comparison simplifications.
We end up with these from TGSI-to-NIR because the pass generating the
comparisons doesn't know if the arg is actually a bool input or not.  vc4
results:

total instructions in shared programs: 41801 -> 41508 (-0.70%)
instructions in affected programs:     4253 -> 3960 (-6.89%)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-29 11:44:06 -08:00
Eric Anholt
9a3a60cb13 nir: Don't try to to-SSA ALU instructions that are already SSA.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-29 11:43:33 -08:00
Eric Anholt
68d476167c nir: Fix a bit of broken indentation.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-29 11:42:08 -08:00
Eric Anholt
36c604c824 nir: Add a couple of helpers for glsl types.
This will be used by tgsi_to_nir, which needs to get vec4 types for
declaring shader input/output variables.

v2: Add a missing space.

Reviewed-by: Matt Turner <mattst88@gmail.com> (v2)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-29 11:41:17 -08:00
Eric Anholt
dd4d9a4e62 nir: Make vec-to-movs handle src/dest aliasing.
It now emits vector MOVs instead of a series of individual MOVs, which
should be useful to any vector backends.  This pushes the problem of
src/dest aliasing of channels on a scalar chip to the backend, but if
there are any vector operations in your shader then you needed to be
handling this already.

Fixes fs-swap-problem with my scalarizing patches.

v2: Rename to insert_mov(), and add a comment about what it does.
v3: Rewrite the comment.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v3)
2015-01-28 16:33:34 -08:00
Jason Ekstrand
bb26ebac13 nir/opcodes: Use a return type of tfloat for ldexp
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-28 13:21:40 -08:00
Jason Ekstrand
f0340ff625 Revert "nir/opcodes: Use fpclassify() instead of isnormal() for ldexp"
This reverts commit d7d340fb2f.

We have an isnormal() implementation available, the only problem was that
we had the wrong return type (fixed in a later patch).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88806

Acked-by: Matt Turner <mattst88@gmail.com>
2015-01-28 13:19:47 -08:00
Jason Ekstrand
d7d340fb2f nir/opcodes: Use fpclassify() instead of isnormal() for ldexp
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88806
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-01-28 03:42:41 -08:00
Connor Abbott
f1a9252def nir: fix a bug with constant folding non-per-component instructions
Before, we were only copying the first N channels, where N is the size
of the SSA destination, which is fine for per-component instructions,
but non-per-component instructions like fdot3 can have more source
components than destination components. Fix this using the helper
function introduced in the last patch.

v2: use new helper name

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-26 21:26:36 -05:00
Connor Abbott
816f0515a2 nir: add a helper function for getting the number of source components
Unlike with non-SSA ALU instructions, where if they're per-component
you have to look at the writemask to know which source channels are
being used, SSA ALU instructions always have all the possible channels
enabled so we can just look at the number of components in the SSA
definition for per-component instructions to say how many source
components are being used.

v2: use new name nir_ssa_alu_instr_src_components()

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-26 21:26:36 -05:00
Jason Ekstrand
dd74369a0a nir/opcodes: Don't go through doubles when constant-folding iabs
Previously, we called the abs() function in math.h.  However, this involves
unnecessarily going through double.  This commit changes it to use integers
directly with a ternary.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-01-26 11:25:02 -08:00
Jason Ekstrand
9bd28fe3a3 nir/opcodes: Simplify and fix the unpack_half_*_split_* constant expressions
Previously, these functions were explicitly writing to dst.x and dst.y.
However they both return only one component so writing to dst.y is invalid.
Also, since they only return one component, we don't need the explicit
assignment in the expression and can simplify it use an implicit
assignment.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-26 11:25:02 -08:00
Jason Ekstrand
27c6e3e4ca nir: Use pointers for nir_src_copy and nir_dest_copy
This avoids the overhead of copying structures and better matches the newly
added nir_alu_src_copy and nir_alu_dest_copy.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-26 11:24:58 -08:00
Connor Abbott
0aa31bf9c3 nir/constant_folding: use the new constant folding infrastructure
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-24 21:35:35 -08:00
Jason Ekstrand
89285e4d47 nir: add new constant folding infrastructure
Add a required field to the Opcode class, const_expr, that contains an
expression or statement that computes the result of the opcode given known
constant inputs. Then take those const_expr's and expand them into a function
that takes an opcode and an array of constant inputs and spits out the constant
result. This means that when adding opcodes, there's one less place to update,
and almost all the opcodes are self-documenting since the information on how to
compute the result is right next to the definition.

The helper functions in nir_constant_expressions.c were taken from
ir_constant_expressions.cpp.

v3 Jason Ekstrand <jason.ekstrand@iastate.edu>
 - Use mako to generate one function per opcode instead of doing piles of
   string splicing

v4 Jason Ekstrand <jason.ekstrand@iastate.edu>
 - More comments and better indentation in the mako
 - Add a description of the constant expression language in nir_opcodes.py
 - Added nir_constant_expressions.py to EXTRA_DIST in Makefile.am

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-24 21:35:35 -08:00
Connor Abbott
fa4bc6c130 nir: use Python to autogenerate opcode information
Before, we used a system where a file, nir_opcodes.h, defined some macros that
were included to generate the enum values and the nir_op_infos structure. This
worked pretty well, but for development the error messages were never very
useful, Python tools couldn't understand the opcode list, and it was difficult
to use nir_opcodes.h to do other things like autogenerate a builder API. Now, we
store opcode information in nir_opcodes.py, and we have nir_opcodes_c.py to
generate the old nir_opcodes.c and nir_opcodes_h.py to generate nir_opcodes.h,
which contains all the enum names and gets included into nir.h like before.  In
addition to solving the above problems, using Python and Mako to generate
everything means that it's much easier to add keep information centralized as we
add new things like constant propagation that require per-opcode information.

v2:
 - make Opcode derive from object (Dylan)
 - don't use assert like it's a function (Dylan)
 - style fixes for fnoise, use xrange (Dylan)
 - use iterkeys() in nir_opcodes_h.py (Dylan)
 - use pydoc-style comments (Jason)
 - don't make fmin/fmax commutative and associative yet (Jason)

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

v3 Jason Ekstrand <jason.ekstrand@intel.com>
 - Alphabetize source file lists
 - Generate nir_opcodes.h in the builddir instead of the source dir
 - Include $(builddir)/src/glsl/nir in the i965 build
 - Rework nir_opcodes.h generation so it generates a complete header file
   instead of one that has to be embedded inside an enum declaration
2015-01-24 21:33:56 -08:00
Matt Turner
579157e6c1 glsl: Add a foreach_in_list_reverse_safe macro.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-23 17:57:39 -08:00