In this patch we also copy the offset value from the AST and
implement the offset linking rules by adding the offset to the
record_compare() function.
From Section 4.4.5 (Uniform and Shader Storage Block Layout Qualifiers)
of the GLSL 4.50 spec:
"Two blocks linked together in the same program with the same block
name must have the exact same set of members qualified with
offset and their integral-constant-expression values must be the
same, or a link-time error results."
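The quoted link-time rule can be sketched roughly as below. This is only an
illustration, not the record_compare() code itself: offsets_match() is a
hypothetical helper, it assumes the members of both blocks have already been
matched up by name and order, and it uses -1 as a made-up marker for "no
offset qualifier".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: a[i] and b[i] hold the explicit offset of member i
 * in each block, or -1 when the member has no offset qualifier (the -1
 * marker is an assumption made for this sketch). */
bool
offsets_match(const int *a, const int *b, size_t count)
{
   for (size_t i = 0; i < count; i++) {
      /* Both blocks must qualify the same members with the same value. */
      if (a[i] != b[i])
         return false;
   }
   return true;
}
```

A mismatch covers both cases in the spec text: a member qualified in only one
block, and a member qualified in both with different values.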
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
This implements the rules for the offset qualifier on block members.
From Section 4.4.5 (Uniform and Shader Storage Block Layout Qualifiers)
of the GLSL 4.50 spec:
"The offset qualifier can only be used on block members of blocks
declared with std140 or std430 layouts."
...
"It is a compile-time error to specify an offset that is smaller than
the offset of the previous member in the block or that lies within the
previous member of the block."
...
"The specified offset must be a multiple of the base alignment of the
type of the block member it qualifies, or a compile-time error results."
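As a rough sketch of the last two quoted rules (not the actual ast_to_hir
code), the check could look like the following. offset_is_valid() and its
parameters are hypothetical names; the previous member's offset and size and
the base alignment are assumed to be already known in bytes.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true when an explicit offset obeys the two
 * compile-time rules quoted above. All values are in bytes. */
bool
offset_is_valid(unsigned offset, unsigned prev_offset, unsigned prev_size,
                unsigned base_alignment)
{
   /* The offset must not be smaller than the previous member's offset and
    * must not lie within the previous member. */
   if (offset < prev_offset + prev_size)
      return false;

   /* The offset must be a multiple of the member's base alignment. */
   return base_alignment != 0 && offset % base_alignment == 0;
}
```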
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Validation of the global "in" qualifier is already handled; this does
the validation for variables, blocks and block members.
This fixes some CTS tests for the new enhanced layouts transform
feedback qualifiers.
V2: add some more valid input flags
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Previously interface blocks were given the global default flags of
uniform blocks. This meant we could not check for invalid qualifiers
on interface blocks because they always contained invalid flags.
This changes parsing so that interface blocks now get an empty
set of layouts.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
In the following patch we will stop setting these layouts by default
on interface blocks, so we need to do this to avoid hitting the
assert.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
No shader-db changes, but does recognize some extract_u16 which enables
the next patch to optimize some code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Two shaders that appear in Unigine benchmarks (Heaven and Valley) unpack
three bytes from an integer and convert each into a float:
    float((val >> 16u) & 0xffu)
    float((val >> 8u) & 0xffu)
    float((val >> 0u) & 0xffu)
Instead of shifting, masking, and type converting like this:
    shr(8) g15<1>UD g25<8,8,1>UD 0x00000010UD
    and(8) g16<1>UD g15<8,8,1>UD 0x000000ffUD
    mov(8) g17<1>F g16<8,8,1>UD

    shr(8) g18<1>UD g25<8,8,1>UD 0x00000008UD
    and(8) g19<1>UD g18<8,8,1>UD 0x000000ffUD
    mov(8) g20<1>F g19<8,8,1>UD

    and(8) g21<1>UD g25<8,8,1>UD 0x000000ffUD
    mov(8) g22<1>F g21<8,8,1>UD
i965 can simply extract a byte and convert to float in a single
instruction:
    mov(8) g17<1>F g25.2<32,8,4>UB
    mov(8) g20<1>F g25.1<32,8,4>UB
    mov(8) g22<1>F g25.0<32,8,4>UB
This patch implements the first step: recognizing byte extraction. A
later patch will optimize out the conversion to float.
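To illustrate why the shift-and-mask pattern can be treated as a byte
extraction, here is a small C sketch. The function names are made up for the
example and this is not the pattern-matching code from the patch; it only
shows that both forms produce the same value for byte indices 0 to 3.

```c
#include <stdint.h>

/* Shift-and-mask form, as written in the shaders. */
uint32_t
unpack_shift_mask(uint32_t val, unsigned byte)
{
   return (val >> (8u * byte)) & 0xffu;
}

/* Byte extraction: shift the wanted byte down and keep only the low eight
 * bits by truncating to uint8_t. Valid for byte indices 0..3. */
uint8_t
unpack_extract_u8(uint32_t val, unsigned byte)
{
   return (uint8_t)(val >> (8u * byte));
}
```

Converting the extracted byte to float afterwards corresponds to the
float(...) wrapper in the shader source.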
instructions in affected programs: 28568 -> 27450 (-3.91%)
helped: 7
cycles in affected programs: 210076 -> 203144 (-3.30%)
helped: 7
This patch decreases the number of instructions in the two Unigine
programs by:
#1721: 4520 -> 4374 instructions (-3.23%)
#1706: 3752 -> 3582 instructions (-4.53%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The adjusted polynomial coefficients come from the numerical
minimization of the L2 norm of the relative error. The old
coefficients would give a maximum relative error of about 15000 ULP in
the neighborhood around acos(x) = 0, the new ones give a relative
error bounded by less than 2000 ULP in the same neighborhood.
Fixes four dEQP subtests:
dEQP-GLES31.functional.shaders.builtin_functions.precision.acos.
highp_compute.{scalar,vec2,vec3,vec4}
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This will allow us to share the implementation while using different
polynomials for asin() and acos().
Francisco Jerez did this in the SPIR-V front-end; I'm merely porting
his idea to the GLSL world.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
When we find indirect indexing into an array, the current implementation
of the array splitting optimization pass does not look further into the
expression tree. However, if the variable expression involves variable
indexing into other arrays, we can miss that these other arrays also have
variable indexing. If that happens, the pass will crash later on after
hitting an assertion put there to ensure that split arrays are in fact
always indexed via constants:
shader_runner: opt_array_splitting.cpp:296:
void ir_array_splitting_visitor::split_deref(ir_dereference**): Assertion `constant' failed.
This patch fixes the problem by letting the pass step into the variable
index expression to identify these cases properly.
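The idea can be sketched on a toy expression tree as follows. The types and
the mark_variable_indexing() helper are hypothetical and much simpler than
the real IR visitor; the point is only that the walk continues into the
index expression, so for a[b[i]] both a and b end up excluded from
splitting.

```c
#include <stdbool.h>
#include <stddef.h>

enum expr_kind { EXPR_CONST, EXPR_VAR, EXPR_ARRAY_DEREF };

/* Toy expression node (an assumption for this sketch): for
 * EXPR_ARRAY_DEREF, array_id names the array and index is its index
 * expression. */
struct expr {
   enum expr_kind kind;
   int array_id;
   const struct expr *index;
};

/* Marks every array that is indexed by a non-constant expression. */
void
mark_variable_indexing(const struct expr *e, bool *no_split)
{
   if (e == NULL || e->kind != EXPR_ARRAY_DEREF)
      return;

   if (e->index != NULL && e->index->kind != EXPR_CONST)
      no_split[e->array_id] = true;

   /* Step into the index expression instead of stopping here, so nested
    * variable indexing such as a[b[i]] is seen as well. */
   mark_variable_indexing(e->index, no_split);
}
```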
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89607
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Commit 65dfb30 added exec_list EmptyUniformLocations, but only
initialized the list if ARB_explicit_uniform_location was enabled,
leading to crashes if the extension was not available.
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The makefile was implicitly picking up YACC_HEADER_SUFFIX from the Android
build system, but this variable is now gone. Add it locally to fix the
build with AOSP master.
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Commits a39a8fbbaa ("nir: move to compiler/") and eb63640c1d
("glsl: move to compiler/") broke Android builds. Fix them.
There is also a missing dependency between generated NIR headers and
several libraries. This isn't a new issue, but seems to have been
exposed by the NIR move.
Built with i915, i965, freedreno, r300g, r600g, vc4, and virgl enabled.
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Cc: Mauro Rossi <issor.oruam@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The two extensions are identical, and are largely taking bits of already
existing desktop functionality. We continue to do a poor job of
supporting the 'precise' keyword, just like we do on desktop.
This passes the relevant dEQP tests that I could find.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Unclear to me whether it actually is a horizontal operation that cannot
be vectorized, but the fact that i965 generates the same code in either
case makes me less interested in finding out.
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94199
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously loops like
    do {
        // ...
    } while (false);
that did not have any other loop-branch instructions would not be
unrolled. This is commonly used to wrap multiline preprocessor macros.
This produces IR like
(loop (
...
break
))
Since limiting_terminator was NULL, the loop unroller would
throw up its hands and say, "I don't know how many iterations. How
can I unroll this?"
We can detect this another way. If there is no limiting_terminator
and the only loop-branch is a break as the last IR, there's only one
iteration.
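A minimal sketch of that test, on a flattened model of the loop body rather
than the real IR, might look like this. The enum and
loop_runs_exactly_once() are hypothetical names, and nested control flow is
ignored for brevity.

```c
#include <stdbool.h>
#include <stddef.h>

enum instr_kind { INSTR_OTHER, INSTR_BREAK, INSTR_CONTINUE };

/* Hypothetical check: with no limiting terminator, the loop has exactly
 * one iteration when its only loop-branch is a break in last position. */
bool
loop_runs_exactly_once(const enum instr_kind *body, size_t count,
                       bool has_limiting_terminator)
{
   if (has_limiting_terminator || count == 0)
      return false;

   /* No break or continue may appear before the final instruction. */
   for (size_t i = 0; i + 1 < count; i++) {
      if (body[i] != INSTR_OTHER)
         return false;
   }

   return body[count - 1] == INSTR_BREAK;
}
```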
On my very old checkout of shader-db, this removes a loop from Orbital
Explorer, but it does not otherwise affect the shader. The loop removed
is the one the compiler inserts surrounding the switch statement.
This change does prevent some seriously bad code generation in some
patches to meta shaders that I recently sent out for review.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Unfortunately, glslang gives us cull/clip distance and GS streams even if
the shader doesn't use them, whenever a shader is declared as version 450.
This is a glslang bug, but we can easily enough ignore it for now.
This is basically just the same atomic functions exposed by
ARB_shader_image_load_store, with one exception:
"highp float imageAtomicExchange(
coherent IMAGE_PARAMS,
float data);"
There's no float atomic exchange overload in the original
ARB_shader_image_load_store or GL 4.2, so this seems like new
functionality that requires specific back-end support and a separate
availability condition in the built-in function generator.
v2: Move image availability predicate logic into a separate static
function for clarity. Had to pull out the image_function_flags
enum from the builtin_builder class for that to be possible.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Specifically, for the case where we initialize a dmat with a source
matrix that has fewer columns/rows.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Need to set some non-zero limits for MaxCombinedUniformComponents,
otherwise we hit a "Too many <type> shader uniform components" error
in the linker.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
src/compiler/glsl/lower_discard_flow.cpp:79:1: warning: ‘ir_visitor_status {anonymous}::lower_discard_flow_visitor::visit_enter(ir_loop_jump*)’ defined but not used [-Wunused-function]
lower_discard_flow_visitor::visit_enter(ir_loop_jump *ir)
^~~~~~~~~~~~~~~~~~~~~~~~~~
The base class method that was intended to be overridden was
'visit(ir_loop_jump *ir)', not visit_enter().
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
src/compiler/glsl/ast_to_hir.cpp: In function ‘unsigned int ast_process_struct_or_iface_block_members(exec_list*, _mesa_glsl_parse_state*, exec_list*, glsl_struct_field**, bool, glsl_matrix_layout, bool, ir_variable_mode, ast_type_qualifier*,
unsigned int, unsigned int)’:
src/compiler/glsl/ast_to_hir.cpp:6339:52: warning: ‘first_member_has_explicit_location’ may be used uninitialized in this function [-Wmaybe-uninitialized]
if (!layout->flags.q.explicit_location &&
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
((first_member_has_explicit_location &&
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
!qual->flags.q.explicit_location) ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(!first_member_has_explicit_location &&
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
qual->flags.q.explicit_location))) {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Both GCC and Clang disallow this, and glslang has recently started
disallowing it as well.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94188
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch moves the calculation of current uniforms to
link_uniforms, which makes use of UniformRemapTable which
stores all the reserved uniform locations.
Location assignment for implicit uniforms now tries to use
any gaps left in the table after the location assignment
for explicit uniforms. This gives us more space to store more
uniforms.
Patch is based on an earlier patch with the following changes/additions:
1: Move the counting of explicit locations to
check_explicit_uniform_locations and then pass
the number to link_assign_uniform_locations.
2: Count the number of empty slots in UniformRemapTable
and store them in a list_head.
3: Try to find an empty slot for implicit locations from
the list, if that fails resize UniformRemapTable.
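The gap-filling strategy in steps 2 and 3 can be sketched as below. This is
a simplified model, not the linker code: struct remap_table and
assign_implicit_location() are hypothetical, and the sketch scans the table
linearly instead of keeping the empty slots in a list as the patch does.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical remap table: used[i] != 0 means location i is taken. */
struct remap_table {
   int *used;
   size_t size;
};

/* Returns the first location of a run of `count` free slots, growing the
 * table when no gap is large enough. Returns (size_t)-1 on failure. */
size_t
assign_implicit_location(struct remap_table *t, size_t count)
{
   size_t run = 0;

   if (count == 0)
      return (size_t)-1;

   /* First try to reuse a gap left by the explicit locations. */
   for (size_t i = 0; i < t->size; i++) {
      run = t->used[i] ? 0 : run + 1;
      if (run == count) {
         size_t start = i + 1 - count;
         for (size_t j = start; j <= i; j++)
            t->used[j] = 1;
         return start;
      }
   }

   /* No suitable gap: resize the table and append at the end. */
   size_t start = t->size;
   int *grown = realloc(t->used, (t->size + count) * sizeof(*grown));
   if (grown == NULL)
      return (size_t)-1;
   for (size_t j = start; j < start + count; j++)
      grown[j] = 1;
   t->used = grown;
   t->size += count;
   return start;
}
```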
Fixes the following CTS tests:
ES31-CTS.explicit_uniform_location.uniform-loc-mix-with-implicit-max
ES31-CTS.explicit_uniform_location.uniform-loc-mix-with-implicit-max-array
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93696
"Result is the same as computing the sum of the absolute values of
OpDPdx and OpDPdy on P."
We were doing sum of absolute values of OpDPdx of P and OpDPdx of NULL.
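In scalar terms, the intended result is simply the following. The function
name is made up for this sketch, and dpdx and dpdy stand in for the values
of OpDPdx and OpDPdy on P, which are assumed to be already computed.

```c
/* Absolute value without pulling in libm. */
static float
abs_f(float x)
{
   return x < 0.0f ? -x : x;
}

/* Sum of the absolute values of the two partial derivatives of P. */
float
fwidth_sketch(float dpdx, float dpdy)
{
   return abs_f(dpdx) + abs_f(dpdy);
}
```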
We already have one in the IR code that can be used everywhere it's
needed in the AST code, so remove the one from the AST.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This is usually handled by the backends in order to handle the
various interactions with the gl_*Color built-ins.
The problem is that linking will fail if one side of the
interface adds the smooth qualifier to the varying and the other
side just uses the default, even though they match.
This fixes various dEQP tests. The spec is not clear about what to do
for desktop GL, so leave it as is for now.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92743
The ImageAccess array is statically sized to MAX_IMAGE_UNIFORMS:
GLenum ImageAccess[MAX_IMAGE_UNIFORMS];
There was no bounds checking ensuring we don't overflow. Passing in a
shader with too many uniforms would cause writes to extend into other
fields, such as sh->NumImages.
Later linker checks already handle reporting an error when there are too
many images, so just avoid corrupting structures here.
This rearranges the logic a bit to look more like the sampler case.
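The shape of the fix can be sketched as follows. This is an illustration
only: struct shader_images and add_images() are hypothetical, and the value
32 for MAX_IMAGE_UNIFORMS is an assumption made for the example.

```c
#define MAX_IMAGE_UNIFORMS 32 /* assumed value, for illustration only */

/* Hypothetical stand-in for the per-shader image state. */
struct shader_images {
   unsigned access[MAX_IMAGE_UNIFORMS];
   unsigned num_images;
};

void
add_images(struct shader_images *sh, unsigned count, unsigned access)
{
   for (unsigned i = 0; i < count; i++) {
      unsigned index = sh->num_images + i;

      /* Only write entries that fit, so nothing past the array is
       * overwritten. */
      if (index < MAX_IMAGE_UNIFORMS)
         sh->access[index] = access;
   }

   /* Keep the full count so a later limit check can still report that
    * there are too many images. */
   sh->num_images += count;
}
```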
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>