Verify that interface blocks match when combining compilation
units at the same stage. (For example, when merging all vertex
shaders.)
Fixes piglit glsl-1.50 test:
* linker/interface-blocks-multiple-vs-member-count-mismatch.shader_test
v5 (Ken): Rename to link_interface_blocks.cpp and drop the separate .h
file for consistency with other linker code. Remove "ok" variable.
Fold cross_validate_interface_blocks into its caller.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Convert interface blocks with instance names into flat
interface blocks without an instance name.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will eventually replace do_vec_index_to_cond_assign. This lowering
pass is called in all the places where do_vec_index_to_cond_assign or
do_vec_index_to_swizzle is called.
v2: Use WRITEMASK_* instead of integer literals. Use a more concise
method of generating broadcast_index. Both suggested by Eric.
v3: Use a series of scalar compares instead of a single vector compare.
Suggested by Eric and Ken. It still uses 'if (cond) v.x = y;' instead
of conditional assignments because ir_builder doesn't do conditional
assignments, and I'd rather keep the code simple.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This pass flips (matrix * vector) operations to (vector *
matrixTranspose) for certain built-in matrices (currently
gl_ModelViewProjectionMatrix and gl_TextureMatrix).
This is equivalent, but results in dot products rather than multiplies
and adds. On some hardware, this is more efficient.
This pass is conditionalized on ctx->mvp_with_dp4, the flag drivers set
to indicate they prefer dot products.
Improves performance in Lightsmark by 1.01131% +/- 0.162069% (n = 10)
on a Haswell GT2 system. Passes Piglit on Ivybridge.
v2: Use struct gl_shader_compiler_options instead of plumbing through
another boolean flag for this purpose.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
I love 800+ line switch-statements as much as the next guy... Future
commits will make changes to this part of the AST-to-HIR conversion, and
extracting this code will make that a bit easier.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
GLBenchmark 2.7's shaders contain conditional blocks like:
if (x) {
if (y) {
...
}
}
where the outer conditional's then clause contains exactly one statement
(the nested if) and there are no else clauses. This can easily be
optimized into:
if (x && y) {
...
}
This saves a few instructions in GLBenchmark 2.7:
total instructions in shared programs: 11833 -> 11649 (-1.55%)
instructions in affected programs: 8234 -> 8050 (-2.23%)
It also helps CS:GO slightly (-0.05%/-0.22%). More importantly,
however, it simplifies the control flow graph, which could enable other
optimizations.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Also slightly change the compatibility test. Instead of comparing the
offsets of the block variables, compare the packing mode of the blocks.
Ideally we don't want to assign the offsets until a later stage of
linking.
This is put in a new file called link_uniform_blocks.cpp. Some new
functions related to uniform blocks are going to live in that file as
well.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Lower them to arithmetic and bit manipulation expressions.
v2: Rewrite using ir_builder [for idr].
v3: Comment typos. [for mattst88]
v4: Fix arithmetic error in comments.
Factor out a shift instruction.
Don't heap allocate factory.instructions.
[for paul]
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v2)
Reviewed-by: Matt Tuner <mattst88@gmail.com> (v3)
Reviewed-by: Paul Berry <stereotype441@gmail.com> (v4)
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Squashed with two reverts:
Revert "android: Update for builtin_stubs.cpp move"
This reverts commit c0def90ede.
Revert "scons: Update for builtin_stubs.cpp"
This reverts commit 8ac4b82699.
Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
Tested-on-Android-by: Chad Versace <chad.versace@linux.intel.com>
linker.cpp is getting pretty big, and we're about to add even more
varying packing code, so split out the linker code that concerns
varyings to its own file.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This lowering pass generates GLSL code that manually packs varyings
into vec4 slots, for the benefit of back-ends that don't support
packed varyings natively.
No functional change--the lowering pass is not yet used.
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Don't use ir_hierarchical_visitor--just loop over instructions
directly. Also, make the names of the packed varyings include the
names of the original varyings that were packed into them.
Like in src/mesa, use GLSL_BUILDDIR/GLSL_SRCDIR to unambiguously
distinguish between in-tree and generated files.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
v2: Reduce the impenetrable code in emit_ubo_loads() by 23 lines by keeping
the ir_variable as the variable part of the offset from handle_rvalue(),
and track the constant offsets from that with a plain old integer value,
avoiding a bunch of temporary variables in the array and struct handling.
Also, fix file description doxygen.
v3: Fix a row vs col typo, and fix spelling in a comment.
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Use AM_V_GEN to silence generated code rules. Add BUILT_SOURCES to CLEANFILES
v3:
- Fix an accidental // in a path
- Use automake make rules for lex/yacc rather than writing our own
- Update .gitignore appropriately
- Build a libglcpp convenience library rather than awkwardly including
the files in libglsl and delegating the generation
- Remove libglsl.a compatibility link on clean
v4:
- Automake's rules for lex/yacc make .cc if source is .ll or .yy, and apparently we
must use those extensions "because of scons", so update everywhere glsl_parser.cpp
-> glsl_parser.cc and glsl_lexer.cpp -> glsl_lexer.cc. This fixes 'make tarballs'
and building with dricore enabled.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Matt Turner <mattst88@gmail.com>
v2: Fix handling of arrays-of-structure. Thanks to Eric Anholt for
pointing this out.
v3: Minor comment change based on feedback from Ken.
Fixes piglit glsl-1.20/execution/uniform-initializer/fs-structure-array
and glsl-1.20/execution/uniform-initializer/vs-structure-array.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, I tried implementing this in the i965 driver, but did so
in a way that violated the intent of the spec, and broke Tropics.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The C++ constructors with placement new, while functional, are
extremely verbose, leading to generation of simple GLSL IR expressions
like (a * b + c * d) expanding to many lines of code and using lots of
temporary variables. By creating a new ir_builder.h that puts simple
generators in our namespace and taking advantage of ralloc_parent(),
we can generate much more compact code, at a minor runtime cost.
v2: Replace ir_instruction usage with just ir_rvalue.
v3: Drop remaining missed as_rvalue() in v2.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I've had this code laying around almost done for a long time. The
idea is like opt_structure_splitting, that we've got a bunch of
transforms at the GLSL IR level that only understand scalars and
vectors, which just skip complicated dereferences. While driver
backends may manage some optimization after they split matrices up
themselves, it would be better to bring all of our optimization to
bear on the problem.
While I wasn't expecting changes quite yet, a few programs end up
winning: a gstreamer convolution shader, and the Humus dynamic
branching demo:
Total instructions: 269430 -> 269342
3/2148 programs affected (0.1%)
1498 -> 1410 instructions in affected programs (5.9% reduction)
automake uses variables named *_SOURCES.
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Matt Turner <mattst88@gmail.com>
This is similar to Gallium's existing glsl_to_tgsi::remove_output_read
lowering pass, but done entirely inside the GLSL compiler.
Signed-off-by: Vincent Lejeune <vljn@ovi.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
With the hope that Android.mk and SConscript can share the file to reduce
future breakage.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Chad Versace <chad@chad-versace.us>