GLSL-to-NIR generates system value usage, and vc4/freedreno would both
like the system value instead of the varying, so switch this pass over to
it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
According to Connor, it's safe to assume that the first operand of
bcsel, as well as the operand of b2f and b2i, must be well formed
booleans.
https://lists.freedesktop.org/archives/mesa-dev/2016-August/125658.html
With the previous improvements to a@bool handling, this now has no
change in shader-db instruction counts on Broadwell.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
I don't want src_is_bool() and src_is_type(x, nir_type_bool) to behave
differently. Having the logic spread out over three functions makes it
harder to decide where to put new logic, as well.
So, combine them all. It's a bit simpler because there's now only one
recursive function rather than a pair of mutually recursive functions.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Currently, 'a@type' can only match if 'a' is produced by an ALU
instruction. This is rather limited - there are other cases we
can easily detect which we should handle.
Extending the code in-place would be fairly messy, so we introduce
a new src_is_type() helper.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The first simply picks the bany_inequal[234] opcodes based on the SSA
def's number of components. The latter implicitly compares with zero
to achieve the same semantics of GLSL's any().
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
No change except to the copyright symbol. The next patch will generate
this file with Python, and Unicode + Python = pure rage.
v2: Massive rebase... I guess a lot can change in a year.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This won't affect the output, but it was, technically, wrong.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This won't affect the output, but it was, technically, wrong.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
ir_unop_fract already forbade integer types in ir_validate. ir_unop_rcp,
ir_unop_rsq, and ir_unop_sqrt should also forbid them in ir_validate.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Some shaders include code that looks like:
uniform int i;
uniform vec4 bones[...];
foo(bones[i * 3], bones[i * 3 + 1], bones[i * 3 + 2]);
CSE would do some work on this:
x = i * 3
foo(bones[x], bones[x + 1], bones[x + 2]);
The compiler may then add '<< 4 + base' to the index calculations.
This results in expressions like
x = i * 3
foo(bones[x << 4], bones[(x + 1) << 4], bones[(x + 2) << 4]);
Just rearranging the math to produce (i * 48) + 16 saves an
instruction, and it allows CSE to do more work.
x = i * 48;
foo(bones[x], bones[x + 16], bones[x + 32]);
So, ~6 instructions becomes ~3.
Some individual shader-db results look pretty bad. However, I have a
really, really hard time believing the change in estimated cycles in,
for example, 3dmmes-taiji/51.shader_test after looking that change in
the generated code.
G45
total instructions in shared programs: 4020840 -> 4010070 (-0.27%)
instructions in affected programs: 177460 -> 166690 (-6.07%)
helped: 894
HURT: 0
total cycles in shared programs: 98829000 -> 98784990 (-0.04%)
cycles in affected programs: 3936648 -> 3892638 (-1.12%)
helped: 894
HURT: 0
Ironlake
total instructions in shared programs: 6418887 -> 6408117 (-0.17%)
instructions in affected programs: 177460 -> 166690 (-6.07%)
helped: 894
HURT: 0
total cycles in shared programs: 143504542 -> 143460532 (-0.03%)
cycles in affected programs: 3936648 -> 3892638 (-1.12%)
helped: 894
HURT: 0
Sandy Bridge
total instructions in shared programs: 8357887 -> 8339251 (-0.22%)
instructions in affected programs: 432715 -> 414079 (-4.31%)
helped: 2795
HURT: 0
total cycles in shared programs: 118284184 -> 118207412 (-0.06%)
cycles in affected programs: 6114626 -> 6037854 (-1.26%)
helped: 2478
HURT: 317
Ivy Bridge
total instructions in shared programs: 7669390 -> 7653822 (-0.20%)
instructions in affected programs: 388234 -> 372666 (-4.01%)
helped: 2795
HURT: 0
total cycles in shared programs: 68381982 -> 68263684 (-0.17%)
cycles in affected programs: 1972658 -> 1854360 (-6.00%)
helped: 2458
HURT: 307
Haswell
total instructions in shared programs: 7082636 -> 7067068 (-0.22%)
instructions in affected programs: 388234 -> 372666 (-4.01%)
helped: 2795
HURT: 0
total cycles in shared programs: 68282020 -> 68164158 (-0.17%)
cycles in affected programs: 1891820 -> 1773958 (-6.23%)
helped: 2459
HURT: 261
Broadwell
total instructions in shared programs: 9002466 -> 8985875 (-0.18%)
instructions in affected programs: 658784 -> 642193 (-2.52%)
helped: 2795
HURT: 5
total cycles in shared programs: 78503092 -> 78450404 (-0.07%)
cycles in affected programs: 2873304 -> 2820616 (-1.83%)
helped: 2275
HURT: 415
Skylake
total instructions in shared programs: 9156978 -> 9140387 (-0.18%)
instructions in affected programs: 682625 -> 666034 (-2.43%)
helped: 2795
HURT: 5
total cycles in shared programs: 75591392 -> 75550574 (-0.05%)
cycles in affected programs: 3192120 -> 3151302 (-1.28%)
helped: 2271
HURT: 425
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Ian recently changed the preprocessor to allow this in most GLSL
versions, but not GLSL ES 3.00+. This patch converts the existing
test that expects a failure to a #version 300 es shader, and adds
a #version 110 shader to make sure that it's allowed.
Fixes 'make check'.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97307
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
For regular ast_add, we can implicitly change either a or b's type.
However in an assignment situation, the type of the lvalue is fixed. So
if the implicit conversion logic decides to change it, it means that the
rhs's type could not be converted to the lhs type.
Emit a specific error for this rather than the rather mysterious "is not
an lvalue" error that results from having a i2f or other operation as
the lvalue.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96729
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
The GL spec is very unclear on this point. Apparently this is discussed
without resolution in the closed Khronos bugtracker at
https://cvs.khronos.org/bugzilla/show_bug.cgi?id=7829 . The
recommendation is to allow dropping the [0] for looking up the bindings.
The approach taken in this patch is to instead tack on [0]'s for each
arrayness level of the output's type, and doing the lookup again. That
way, for
out vec4 foo[2][2][2]
we will end up looking for bindings for foo, foo[0], foo[0][0], and
foo[0][0][0], in that order of preference.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96765
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Eric Anholt <eric@anholt.net>
Old languages (GLSL <= 4.20 and GLSL ES 1.00) require "invariant"
to be specified on both inputs and outputs, and match when linking.
New languages only allow outputs to be qualified as "invariant"
and remove the "invariant must match" restriction when linking
varyings (because no input can have that qualifier).
Commit 426a50e208 introduced the new
behavior for ES 3.00. It also removed the "must match" restriction
for ES 1.00 shaders, which I believe is incorrect. This patch adds
that back, as well as making 4.30+ follow the new rules.
Thanks to Qiankun Miao for noticing this discrepancy.
Fixes a WebGL 2.0 conformance test when run in Chromium:
https://www.khronos.org/registry/webgl/sdk/tests/deqp/data/gles3/shaders/qualification_order.html?webglVersion=2
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96971
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
The previous commit fixed xfb_buffer handling, which was largely copy
and pasted from the stream handling. The difference is that stream
was set in input_layout_mask, so it worked.
However, that's totally rubbish: stream is only valid on geometry shader
outputs. Presumably this was to hack around inout. Instead, apply the
solution I used in the previous fix.
Really, we just need to separate shader interface and parameter
qualifier handling so this isn't a mess, but this patch at least
tidies it slightly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
inout variables have q.in and q.out set. We were trying to set
xfb_buffer = 1 for shader output variables (and inadvertantly setting
it on inout parameters, too). But input_layout_mask doesn't have
xfb_buffer set, so it was seen as in invalid input qualifier.
This meant that all 'inout' parameters were broken.
Caught by running a WebGL conformance test in Chromium:
https://www.khronos.org/registry/webgl/sdk/tests/deqp/data/gles3/shaders/qualification_order.html?webglVersion=2
Fixes Piglit's tests/spec/glsl-4.40/compiler/inout-parameter-qualifier.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Section 3.4 (Preprocessor) of the GLSL ES 3.00 spec says:
It is an error to undefine or to redefine a built-in (pre-defined)
macro name.
The GLSL ES 1.00 spec does not contain this text.
Section 3.3 (Preprocessor) of the GLSL 1.30 spec says:
#define and #undef functionality are defined as is standard for C++
preprocessors for macro definitions both with and without macro
parameters.
At least as far as I can tell GCC allow '#undef __FILE__'. Furthermore,
there are desktop OpenGL conformance tests that expect '#undef
__VERSION__' and '#undef GL_core_profile' to work.
Fixes:
GL45-CTS.shaders.preprocessor.definitions.undefine_version_vertex
GL45-CTS.shaders.preprocessor.definitions.undefine_version_fragment
GL45-CTS.shaders.preprocessor.definitions.undefine_core_profile_vertex
GL45-CTS.shaders.preprocessor.definitions.undefine_core_profile_fragment
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
Builtins already have locations assigned so this shouldn't
change anything. We want to call it earlier so we can tranform
GLSL IR to NIR earlier.
Reviewed-by: Eric Anholt <eric@anholt.net>
Here a new function link_varyings_and_uniforms() is created this
should help make it easier to follow the code in link_shader()
which was getting very large.
Note the end of the new function contains a for loop with some
lowering calls that currently don't seem related to varyings or
uniforms but they are a dependancy for converting to NIR ealier
so we move things here now to keep things easy to follow.
Reviewed-by: Eric Anholt <eric@anholt.net>
If a shader has an output array, it will get treated as though it were
gl_FragData and rewritten into gl_out_FragData instances. We only want
this to happen on the actual gl_FragData and not everything else.
This is a small part of the problem pointed out by the below bug.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96765
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
As requested with the initial creation of util/bitscan.h
now move other bitscan related functions into util.
v2: Split into two patches.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Previously we would not print a swizzle on ssa_52 when only its .x
component is used (as seen in the definition of ssa_53):
vec3 ssa_52 = fadd ssa_51, ssa_51
vec1 ssa_53 = flog2 ssa_52
vec1 ssa_54 = flog2 ssa_52.y
vec1 ssa_55 = flog2 ssa_52.z
But this makes the interpretation of the RHS of the definition difficult
to understand and dependent on the size of the LHS. Just print swizzles
when they are not the identity swizzle, so the previous example is now
printed as:
vec3 ssa_52 = fadd ssa_51.xyz, ssa_51.xyz
vec1 ssa_53 = flog2 ssa_52.x
vec1 ssa_54 = flog2 ssa_52.y
vec1 ssa_55 = flog2 ssa_52.z
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This reverts commit a37e46323c.
It broke the game Overlord such that it hung a GCN GNU. While I don't know
how the hang happened because of its randomness and gfx corruption precedes
it, many of the shaders contain this:
out vec4 FragData[gl_MaxDrawBuffers];
These are largely identical, except that the GS version has a few
extra error conditions. We can just pass in the stage and skip these.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
We need to subtract VARYING_SLOT_PATCH0, not VARYING_SLOT_VAR0.
Since "patch" only applies to inputs and outputs, we can just handle
this once outside the switch statement, rather than replicating the
check twice and complicating the earlier conditions.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
These are lowered to gl_TessLevel{Outer,Inner}MESA. We need them to
appear in the program resource list with their original names and types.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
This assertion is bogus. Varying structs, and arrays of structs, are
allowed by GLSL, and we can see them here. While we currently don't
have any partial-variable support for those, simply returning false
and marking the entire thing as used is certainly legitimate.
I believe this is often swept under the rug by varying packing,
but that's disabled in certain tessellation situations.
Hit by 20 dEQP-GLES31.functional.tessellation.user_defined_io.* tests.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
This better matches the grammar in section 4.3.9 of the GLSL 4.5 spec,
and also removes some redundant code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Similar to has_geometry_shader(), has_compute_shader(), and so on.
This will make it easier to add more conditions here later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
When an argument for a structure constructor or initializer doesn't
match the expected type, only Section 4.1.10 “Implicit Conversions”
are allowed to try to match that expected type.
From page 32 (page 38 of the PDF) of the GLSL 1.20 spec:
" The arguments to the constructor will be used to set the structure's
fields, in order, using one argument per field. Each argument must
be the same type as the field it sets, or be a type that can be
converted to the field's type according to Section 4.1.10 “Implicit
Conversions.”"
From page 35 (page 41 of the PDF) of the GLSL 4.20 spec:
" In all cases, the innermost initializer (i.e., not a list of
initializers enclosed in curly braces) applied to an object must
have the same type as the object being initialized or be a type that
can be converted to the object's type according to section 4.1.10
"Implicit Conversions". In the latter case, an implicit conversion
will be done on the initializer before the assignment is done."
v2: Remove also the now redundant constant conversion, the
constant_record_constructor helper and the replacement code
(Timothy).
Fixes GL44-CTS.shading_language_420pack.initializer_list_negative
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
v2: Refactor also the conversion to constant and replacement code
(Timothy).
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Implicit conversions were added in the GLSL 1.20 spec version.
v2: Join the checks for GLSL 1.10 and ESSL (Timothy).
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
I found a shader in Tales of Maj'Eyal that contains:
if ssa_21 {
block block_1:
/* preds: block_0 */
...instructions that prevent the select peephole...
vec1 32 ssa_23 = imov ssa_4
vec1 32 ssa_24 = imov ssa_4.y
vec1 32 ssa_25 = imov ssa_4.z
/* succs: block_3 */
} else {
block block_2:
/* preds: block_0 */
vec1 32 ssa_26 = imov ssa_4
vec1 32 ssa_27 = imov ssa_4.y
vec1 32 ssa_28 = imov ssa_4.z
/* succs: block_3 */
}
block block_3:
/* preds: block_1 block_2 */
vec1 32 ssa_29 = phi block_1: ssa_23, block_2: ssa_26
vec1 32 ssa_30 = phi block_1: ssa_24, block_2: ssa_27
vec1 32 ssa_31 = phi block_1: ssa_25, block_2: ssa_28
Here, copy propagation will bail because phis cannot perform swizzles,
and CSE won't do anything because there is no dominance relationship
between the imovs. By making nir_opt_remove_phis handle identical moves,
we can eliminate the phis and rewrite everything to use ssa_4 directly,
so all the moves become dead and get eliminated.
I don't think we need to check "exact" - just the alu sources.
Presumably phi sources should match in their exactness.
On Broadwell:
total instructions in shared programs: 11639872 -> 11638535 (-0.01%)
instructions in affected programs: 134222 -> 132885 (-1.00%)
helped: 338
HURT: 0
v2: Fix return value to be NULL, not false (caught by Iago).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>