This could be done in a separate pass like we do in GLSL IR, but it seems
to me like having the definitions of the transformations in the two
directions next to each other makes a lot of sense.
v2: Reorder the comment about the transformation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Mesa has a shader compiler struct flagging whether GLSL IR's opt_algebraic
and other passes should try and generate certain types of opcodes or
patterns. Extend that to NIR by defining our own struct, which is
automatically generated from the Mesa struct in glsl_to_nir and provided
directly by the driver in TGSI-to-NIR.
v2: Split out the previous two prep patches.
v3: Rebase to master (no TGSI->NIR present)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v2)
This will be used so that we can customize the transforms for the target
GPU, so we don't un-lower expressions that had already been lowered (or
introduce new lowering transformations that not all GPUs want)
v2: Drop the complication of having the condition->index dictionary, since
we don't actually expect there to be many different conditions (change
by Kenneth).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be used to give the optimization passes a chance to customize
behavior for the particular target device.
v2: Rebase to master (no TGSI->NIR present)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
When compiling in C99 or C++11 modes, Solaris defines isnormal() as
a macro via <math.h>, which causes the function definition to become
too mangled to compile.
Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The macro is defined to provide a trailing ; so this caused the expansion
to end in ";;" which made the Solaris Studio compilers issue warnings for
every line of:
"builtin_type_macros.h", line 113: Warning: extra ";" ignored.
for every file that included the header, filling build logs with thousands
of useless warnings.
Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
opt_copy_propagation and opt_copy_propagation_elements create new ACP
and Kill sets each time they enter a new control flow block. For if
blocks, they also copy the entire existing ACP set contents into the
new set.
When we exit the control flow block, we discard the new sets. However,
we weren't freeing them - so they lived on until the pass finished.
This can waste a lot of memory (57MB on one pessimal shader).
This patch makes the pass allocate ACP entries using this->acp as the
memory context, and Kill entries out of this->kill. It also steals
kill entries when moving them from the inner kill list to the parent.
It then frees the lists, including their contents.
v2: Move ralloc_free(this->acp) just before this->acp = orig_acp
(suggested by Eric Anholt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.5 10.4" <mesa-stable@lists.freedesktop.org>
glcpp/glcpp.c:124:1: warning: ‘static’ is not at beginning of declaration [-Wold-style-declaration]
const static struct option
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This fixes a warning when running make check.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
GLSL IR labels gl_FrontFacing as an input variable and not a system value.
This commit makes NIR silently translate gl_FrontFacing to a system value
so that it properly gets translated into a load_system_value intrinsic.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Use nir/nir_opcodes.h as is (w/o the absolute path), as it is the target
name used to generate the actual file. Otherwise the target is missing,
the file won't get generated and the build will fail.
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
And unfortunately other shaders do the same thing but with >=/<= which
we can't apply this optimization to because of NaNs.
instructions in affected programs: 23309 -> 22938 (-1.59%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
No change on shader-db on i965.
v2: Reword the comment due to feedback from Erik Faye-Lund
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v1)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> (v1)
We want the size of a float per component, not the size of a whole vec4.
NIR instructions on i965:
total instructions in shared programs: 1261937 -> 1261929 (-0.00%)
instructions in affected programs: 114 -> 106 (-7.02%)
Looking at one of these examples (tesseract), it's from vec4 load_consts
for a MRT solid fill, which do get CSEed now that we don't memcmp off the
end of the const value and into the SSA def. For the 1-component loads
that are common in i965, we were only memcmping off into the rest of the
usually zero-filled const_value.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Lots of shaders divide by exp2(...) which we turn into a multiplication
by the reciprocal. We can avoid the reciprocal by simply negating exp2's
argument.
total instructions in shared programs: 5947154 -> 5946695 (-0.01%)
instructions in affected programs: 118661 -> 118202 (-0.39%)
helped: 380
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cubemap array images are unlike cubemap array samplers in that they don't need
an additional coordinate to index individual cubemaps in the array, instead
they behave like a 2D array of 6n layers, with n the number of cubemaps in the
array. Take this exception into account.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Printing instructions doesn't modify them, so we can mark the parameter
const.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We've probably never seen this ridiculous pattern in the wild, so it
didn't matter.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2 (Ian Romanick)
- Move the check to the lexer before rallocing a copy of the large string.
Fixes the following 2 dEQP tests:
dEQP-GLES3.functional.shaders.keywords.invalid_identifiers.max_length_vertex
dEQP-GLES3.functional.shaders.keywords.invalid_identifiers.max_length_fragment
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
- Add better comments
- Use nir_ssa_dest_init and nir_src_for_ssa more places
- Fix some void * casts
v3 Jason Ekstrand <jason.ekstrand@intel.com>:
- Rework the way we determine whether or not to sccalarize a phi node to
make the recursion non-bogus
- Treat load_const instructions as scalarizable
v4 Jason Ekstrand <jason.ekstrand@intel.com>:
- Allow uniform and input loads to be scalarizable
v5 Jason Ekstrand <jason.ekstrand@intel.com>:
- Also consider loads of inputs (varying, uniform, or ubo) to be
scalarizable. We were already doing this for load_var on uniforms and
inputs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Currently, Mesa uses the lowering pass MOD_TO_FRACT to implement
mod(x,y) as y * fract(x/y). This implementation has a down side though:
it introduces precision errors due to the fract() operation. Even worse,
since the result of fract() is multiplied by y, the larger y gets the
larger the precision error we produce, so for large enough numbers the
precision loss is significant. Some examples on i965:
Operation Precision error
-----------------------------------------------------
mod(-1.951171875, 1.9980468750) 0.0000000447
mod(121.57, 13.29) 0.0000023842
mod(3769.12, 321.99) 0.0000762939
mod(3769.12, 1321.99) 0.0001220703
mod(-987654.125, 123456.984375) 0.0160663128
mod( 987654.125, 123456.984375) 0.0312500000
This patch replaces the current lowering pass with a different one
(MOD_TO_FLOOR) that follows the recommended implementation in the GLSL
man pages:
mod(x,y) = x - y * floor(x/y)
This implementation eliminates the precision errors at the expense of
an additional add instruction on some systems. On systems that can do
negate with multiply-add in a single operation this new implementation
would come at no additional cost.
v2 (Ian Romanick)
- Do not clone operands because when they are expressions we would be
duplicating them and that can lead to suboptimal code.
Fixes the following 16 dEQP tests:
dEQP-GLES3.functional.shaders.builtin_functions.precision.mod.mediump_*
dEQP-GLES3.functional.shaders.builtin_functions.precision.mod.highp_*
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes the following 2 dEQP tests:
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_const_vertex
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_const_fragment
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes the following 2 dEQP tests:
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_in_main_vertex
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_in_main_fragment
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If the ?: operator's condition is a constant value, and both branches
were pure expressions, we can just make the resulting value one or the
other.
Previously, we only did this if op[1] and op[2] were also constant
values - but there's no actual reason for that restriction.
No changes in shader-db, probably because we usually optimize this later
anyway. But it does make us generate less stupid code up front.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This allows you to match on an unknown value but only if it is of a given
type. 90% of the uses of this are for matching only booleans, but adding
the generality of arbitrary types is no more complex.
nir_algebraic.py doesn't handle this yet but that's ok because the C
language will ensure that the default type on all variables is void.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There are some algebraic transformations that we want to do but only if
certain things are constants. For instance, we may want to replace
a * (b + c) with (a * b) + (a * c) as long as a and either b or c is constant.
While this generates more instructions, some of it will get constant
folded.
nir_algebraic.py doesn't handle this yet, but that's ok because the C
language will make sure that false is the default for now.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We end up with these from TGSI-to-NIR because the pass generating the
comparisons doesn't know if the arg is actually a bool input or not. vc4
results:
total instructions in shared programs: 41801 -> 41508 (-0.70%)
instructions in affected programs: 4253 -> 3960 (-6.89%)
Reviewed-by: Matt Turner <mattst88@gmail.com>
This will be used by tgsi_to_nir, which needs to get vec4 types for
declaring shader input/output variables.
v2: Add a missing space.
Reviewed-by: Matt Turner <mattst88@gmail.com> (v2)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>