This allows us to first generate atomic operations for shared
variables using these opcodes, and then later we can lower those to
the shared atomics intrinsics with nir_lower_io.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Previously we were receiving shared variable accesses via a lowered
intrinsic function from GLSL. This change allows us to send in
variables instead, for example when converting from SPIR-V.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Since we aren't going to put the function parameters or the return variable
in the list of locals, it won't get a proper declaration. This changes
nir_print to print the type along with each parameter or return variable.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Otherwise, we have a problem when we go to print functions with arguments
because their names get added to the hash table during declaration, which
happens after we print the prototype.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
NIR has never been used on IR where we haven't already done function
inlining so this code has been dead from the beginning. Let's just get rid
of it for now. We can always put it back in if we decide to use NIR for
function inlining at some point in the future.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
We were failing to reset our location tracking when encountering a
NEWLINE in the <HASH> state. Rip the code from the <*>{NEWLINE} rule,
which handles this properly.
Also, update 146-version-first-hash.c to have proper expectations.
When I introduced the test, I didn't verify that the line/column
numbers were correct, and it turns out they varied based on the type
of newline ending.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94447
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Previously, we would always report 16 for both, and we would only fail if
either one exceeded 16. Now we fail if the maximum for each is exceeded,
even if it is smaller than 16, and we report the correct maximum.
Also, expand the size of to_assign[] to 32. There is code at the top
of the function handling max_index up to 32, so this just makes the
code more consistent.
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
This applies the rule to empty declarations.
Fixes:
dEQP-GLES3.functional.shaders.arrays.invalid.empty_declaration_without_var_name_vertex
dEQP-GLES3.functional.shaders.arrays.invalid.empty_declaration_without_var_name_fragment
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since we store some member qualifiers in the interface type
we need to be more careful about rejecting shaders just because
the pointer doesn't match. It's perfectly valid for some qualifiers,
such as precision, to not match across shader interfaces.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This new pass lowers load/store_var intrinsics that act on indirect derefs
to if-ladders of direct load/store_var intrinsics. The if-ladders perform a
simple binary search on the indirect.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Apparently this causes a slight difference in the parser's token
expectations, leading to a different error message.
It seems harmless, but I wanted to be cautious and separate it out.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I didn't want to pollute the previous patch with all the $4 -> $3
changes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We now have a bigger hammer. The HASH_TOKEN NEWLINE rule still needs
to exist to ensure the 146-version-hash-first.c test still passes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We resolved the implicit version directive when processing control lines,
such as #ifdef, to ensure any built-in macros exist. However, we failed
to resolve it when handling ordinary text.
For example,
int x = __VERSION__;
should resolve __VERSION__ to 110, but since we never resolved the implicit
version, none of the built-in macros exist, so it was left as is.
This also meant we allowed the following shader to slop through:
123
#version 120
Nothing would cause the implicit version to take effect, so when we saw
the #version directive, we thought everything was peachy.
This patch makes the lexer's per-token action resolve the implicit
version on the first non-space/newline/hash token that isn't part of
a #version directive, fulfilling the GLSL language spec:
"The #version directive must occur in a shader before anything else,
except for comments and white space."
Because we emit #version as HASH_TOKEN then VERSION_TOKEN, we have to
allow HASH_TOKEN to slop through as well, so we don't resolve the
implicit version as soon as we see the # character. However, this is
fine, because the parser's HASH_TOKEN NEWLINE rule does resolve the
version, disallowing cases like:
#
#version 120
This patch also adds the above shaders as new glcpp tests.
Fixes dEQP-GLES2.functional.shaders.preprocessor.predefined_macros.
{gl_es_1_vertex,gl_es_1_fragment}.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In a shader such as:
struct S { float f; };
float identity(float S) { return S; }
we would think that "S" in "return S" referred to a structure, even
though it's shadowed by the "float S" parameter in the inner scope.
This led to the parser's grammar seeing TYPE_IDENTIFIER and getting
confused.
Fixes dEQP-GLES2.functional.shaders.scoping.valid.
function_parameter_hides_struct_type_{vertex,fragment}.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
The lexer/parser use a symbol table to classify identifiers as
variables, functions, or structure types.
For some reason, we neglected to add variables to the symbol table
for simple declarations such as
int x = 5;
but did add subsequent variables in multi-declarations:
int x = 5, y = 6; // y gets added, but not x, for some reason
Fixes four dEQP-GLES2.functional.shaders.scoping.valid subcases:
- local_int_variable_hides_struct_type_vertex
- local_int_variable_hides_struct_type_fragment
- local_struct_variable_hides_struct_type_vertex
- local_struct_variable_hides_struct_type_fragment
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
This fixes a crash in
dEQP-GLES3.functional.transform_feedback.array_element.separate.points.lowp_mat3x2
and likely others. The vertex shader has > 16 input variables (without
explicit locations), which causes us to index outside of the to_assign
array.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
From Section 4.4.5 (Uniform and Shader Storage Block Layout
Qualifiers) of the GLSL 4.50 spec:
"The align qualifier makes the start of each block member have a
minimum byte alignment. It does not affect the internal layout
within each member, which will still follow the std140 or std430
rules. The specified alignment must be a power of 2, or a
compile-time error results.
The actual alignment of a member will be the greater of the
specified align alignment and the standard (e.g., std140) base
alignment for the member's type. The actual offset of a member is
computed as follows: If offset was declared, start with that
offset, otherwise start with the next available offset. If the
resulting offset is not a multiple of the actual alignment,
increase it to the first offset that is a multiple of the actual
alignment. This results in the actual offset the member will have.
When align is applied to an array, it affects only the start of
the array, not the array's internal stride. Both an offset and an
align qualifier can be specified on a declaration.
The align qualifier, when used on a block, has the same effect as
qualifying each member with the same align value as declared on
the block, and gets the same compile-time results and errors as if
this had been done. As described in general earlier, an individual
member can specify its own align, which overrides the block-level
align, but just for that member."
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
The old comment described the location rather than the offset. Since we
now use the field for block members, mention that as well.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
In this patch we also copy the offset value from the AST and
implement offset linking rules by adding it to the record_compare()
function.
From Section 4.4.5 (Uniform and Shader Storage Block Layout Qualifiers)
of the GLSL 4.50 spec:
"Two blocks linked together in the same program with the same block
name must have the exact same set of members qualified with
offset and their integral-constant-expression values must be the
same, or a link-time error results."
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
This implements the rules for the offset qualifier on block members.
From Section 4.4.5 (Uniform and Shader Storage Block Layout Qualifiers)
of the GLSL 4.50 spec:
"The offset qualifier can only be used on block members of blocks
declared with std140 or std430 layouts."
...
"It is a compile-time error to specify an offset that is smaller than
the offset of the previous member in the block or that lies within the
previous member of the block."
...
"The specified offset must be a multiple of the base alignment of the
type of the block member it qualifies, or a compile-time error results."
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Validation of global input layout qualifiers is already handled; this
adds the same validation for variables, blocks, and block members.
This fixes some CTS tests for the new enhanced layouts transform
feedback qualifiers.
V2: add some more valid input flags
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Previously interface blocks were given the global default flags of
uniform blocks. This meant we could not check for invalid qualifiers
on interface blocks because they always contained invalid flags.
This changes parsing so that interface blocks now get an empty
set of layouts.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
In the following patch we will stop setting these layouts by default
on interface blocks, so we need to do this to avoid hitting the
assert.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
No shader-db changes, but this does recognize some extract_u16 patterns,
which enables the next patch to optimize some code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Two shaders that appear in Unigine benchmarks (Heaven and Valley) unpack
three bytes from an integer and convert each into a float:
float((val >> 16u) & 0xffu)
float((val >> 8u) & 0xffu)
float((val >> 0u) & 0xffu)
Instead of shifting, masking, and type converting like this:
shr(8) g15<1>UD g25<8,8,1>UD 0x00000010UD
and(8) g16<1>UD g15<8,8,1>UD 0x000000ffUD
mov(8) g17<1>F g16<8,8,1>UD
shr(8) g18<1>UD g25<8,8,1>UD 0x00000008UD
and(8) g19<1>UD g18<8,8,1>UD 0x000000ffUD
mov(8) g20<1>F g19<8,8,1>UD
and(8) g21<1>UD g25<8,8,1>UD 0x000000ffUD
mov(8) g22<1>F g21<8,8,1>UD
i965 can simply extract a byte and convert to float in a single
instruction:
mov(8) g17<1>F g25.2<32,8,4>UB
mov(8) g20<1>F g25.1<32,8,4>UB
mov(8) g22<1>F g25.0<32,8,4>UB
This patch implements the first step: recognizing byte extraction. A
later patch will optimize out the conversion to float.
instructions in affected programs: 28568 -> 27450 (-3.91%)
helped: 7
cycles in affected programs: 210076 -> 203144 (-3.30%)
helped: 7
This patch decreases the number of instructions in the two Unigine
programs by:
#1721: 4520 -> 4374 instructions (-3.23%)
#1706: 3752 -> 3582 instructions (-4.53%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The adjusted polynomial coefficients come from the numerical
minimization of the L2 norm of the relative error. The old
coefficients would give a maximum relative error of about 15000 ULP in
the neighborhood around acos(x) = 0; the new ones keep the relative
error below 2000 ULP in the same neighborhood.
Fixes four dEQP subtests:
dEQP-GLES31.functional.shaders.builtin_functions.precision.acos.
highp_compute.{scalar,vec2,vec3,vec4}
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This will allow us to share the implementation while using different
polynomials for asin() and acos().
Francisco Jerez did this in the SPIR-V front-end; I'm merely porting
his idea to the GLSL world.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
When we find indirect indexing into an array, the current implementation
of the array splitting optimization pass does not look further into the
expression tree. However, if the variable index expression involves variable
indexing into other arrays, we can miss that these other arrays also have
variable indexing. If that happens, the pass will crash later on after
hitting an assertion put there to ensure that split arrays are in fact
always indexed via constants:
shader_runner: opt_array_splitting.cpp:296:
void ir_array_splitting_visitor::split_deref(ir_dereference**): Assertion `constant' failed.
This patch fixes the problem by letting the pass step into the variable
index expression to identify these cases properly.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89607
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Commit 65dfb30 added exec_list EmptyUniformLocations, but only
initialized the list if ARB_explicit_uniform_location was enabled,
leading to crashes if the extension was not available.
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>