Similar to has_geometry_shader(), has_compute_shader(), and so on.
This will make it easier to add more conditions here later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
When an argument for a structure constructor or initializer doesn't
match the expected type, only the conversions listed in Section 4.1.10
“Implicit Conversions” may be used to try to match that expected type.
From page 32 (page 38 of the PDF) of the GLSL 1.20 spec:
" The arguments to the constructor will be used to set the structure's
fields, in order, using one argument per field. Each argument must
be the same type as the field it sets, or be a type that can be
converted to the field's type according to Section 4.1.10 “Implicit
Conversions.”"
From page 35 (page 41 of the PDF) of the GLSL 4.20 spec:
" In all cases, the innermost initializer (i.e., not a list of
initializers enclosed in curly braces) applied to an object must
have the same type as the object being initialized or be a type that
can be converted to the object's type according to section 4.1.10
"Implicit Conversions". In the latter case, an implicit conversion
will be done on the initializer before the assignment is done."
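The rule quoted above can be sketched as a small type check. This is a toy model and not the code in the GLSL compiler: the toy_type struct and both function names are hypothetical, and it only models the int -> float and ivecN -> vecN conversions that GLSL 1.20 allows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy type model; not Mesa's glsl_type. */
enum toy_base { TOY_INT, TOY_FLOAT, TOY_BOOL };

struct toy_type {
   enum toy_base base;
   unsigned components;   /* 1 for scalars, 2..4 for vectors */
};

/* GLSL 1.20 section 4.1.10 allows int -> float and ivecN -> vecN. */
bool
toy_can_implicitly_convert(struct toy_type from, struct toy_type to)
{
   return from.base == TOY_INT && to.base == TOY_FLOAT &&
          from.components == to.components;
}

/* One argument per field, in order.  Each argument must either have the
 * field's type or be implicitly convertible to it; anything else is an
 * error rather than something to be converted by other means.
 */
bool
toy_constructor_args_ok(const struct toy_type *fields,
                        const struct toy_type *args, size_t count)
{
   for (size_t i = 0; i < count; i++) {
      bool same = fields[i].base == args[i].base &&
                  fields[i].components == args[i].components;
      if (!same && !toy_can_implicitly_convert(args[i], fields[i]))
         return false;
   }
   return true;
}
```

With fields (float, vec3), the arguments (int, vec3) pass this check while (bool, vec3) are rejected.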
v2: Also remove the now-redundant constant conversion, the
constant_record_constructor helper and the replacement code
(Timothy).
Fixes GL44-CTS.shading_language_420pack.initializer_list_negative
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
v2: Also refactor the conversion to constant and replacement code
(Timothy).
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Implicit conversions were added in the GLSL 1.20 spec version.
v2: Join the checks for GLSL 1.10 and ESSL (Timothy).
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
I found a shader in Tales of Maj'Eyal that contains:
   if ssa_21 {
      block block_1:
      /* preds: block_0 */
      ...instructions that prevent the select peephole...
      vec1 32 ssa_23 = imov ssa_4
      vec1 32 ssa_24 = imov ssa_4.y
      vec1 32 ssa_25 = imov ssa_4.z
      /* succs: block_3 */
   } else {
      block block_2:
      /* preds: block_0 */
      vec1 32 ssa_26 = imov ssa_4
      vec1 32 ssa_27 = imov ssa_4.y
      vec1 32 ssa_28 = imov ssa_4.z
      /* succs: block_3 */
   }
   block block_3:
   /* preds: block_1 block_2 */
   vec1 32 ssa_29 = phi block_1: ssa_23, block_2: ssa_26
   vec1 32 ssa_30 = phi block_1: ssa_24, block_2: ssa_27
   vec1 32 ssa_31 = phi block_1: ssa_25, block_2: ssa_28
Here, copy propagation will bail because phis cannot perform swizzles,
and CSE won't do anything because there is no dominance relationship
between the imovs. By making nir_opt_remove_phis handle identical moves,
we can eliminate the phis and rewrite everything to use ssa_4 directly,
so all the moves become dead and get eliminated.
I don't think we need to check "exact" - just the alu sources.
Presumably phi sources should match in their exactness.
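The "identical moves" test can be sketched roughly as follows. This is a toy model, not the NIR implementation: struct toy_mov and both functions are hypothetical, and a move is reduced to the SSA value it reads plus one swizzle component.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a scalar move such as "imov ssa_4.y". */
struct toy_mov {
   int src_ssa;   /* index of the SSA value being read */
   int swizzle;   /* component selected: 0 = x, 1 = y, ... */
};

/* Only the ALU source is compared, mirroring the note about "exact". */
bool
toy_movs_equal(const struct toy_mov *a, const struct toy_mov *b)
{
   return a->src_ssa == b->src_ssa && a->swizzle == b->swizzle;
}

/* Returns the move that every phi source agrees on, or NULL if the
 * sources differ (or there are none).  A non-NULL result means the phi
 * can be replaced by a single move of the original value.
 */
const struct toy_mov *
toy_phi_common_mov(const struct toy_mov *const *srcs, size_t num_srcs)
{
   if (num_srcs == 0)
      return NULL;

   for (size_t i = 1; i < num_srcs; i++) {
      if (!toy_movs_equal(srcs[0], srcs[i]))
         return NULL;
   }
   return srcs[0];
}
```

In the shader above, each phi has two sources that read the same component of ssa_4, so each phi would collapse in this model.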
On Broadwell:
total instructions in shared programs: 11639872 -> 11638535 (-0.01%)
instructions in affected programs: 134222 -> 132885 (-1.00%)
helped: 338
HURT: 0
v2: Fix return value to be NULL, not false (caught by Iago).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
nir_opt_peephole_select has the job of removing IF statements with no side
effects. However, if the IF statement's successor didn't have any
instructions in it, we were skipping it, which occurred in mupen64 on vc4
with glsl_to_nir enabled:
instructions in affected programs: 6134 -> 4120 (-32.83%)
total uniforms in shared programs: 38268 -> 38219 (-0.13%)
No changes on Haswell shader-db.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Looks like a copy and paste error from f752effa08
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
AST_NUM_OPERATORS stores the dimension of the ast_operators
enumeration but was not updated after its last modification.
This doesn't change the behavior of any code path, but it keeps the
value consistent with the enumeration.
v2 (Eric Engestrom): Just place the define at the end of the
enumeration, not below.
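One way to keep such a count from going stale is to derive it from the last enumerator. The sketch below uses a hypothetical toy_operators enumeration and TOY_NUM_OPERATORS define, not the real ast_operators list.

```c
#include <assert.h>

/* Hypothetical enumeration; the real operator list is much longer. */
enum toy_operators {
   toy_assign,
   toy_plus,
   toy_neg,

   /* Kept as a define rather than an extra enumerator, and placed at the
    * end of the enumeration so it is hard to miss when a new value is
    * appended.  It must name whichever enumerator is last.
    */
   #define TOY_NUM_OPERATORS (toy_neg + 1)
};

/* An array sized by the define covers every operator. */
static const char *const toy_operator_names[TOY_NUM_OPERATORS] = {
   "=", "+", "neg",
};

const char *
toy_operator_name(enum toy_operators op)
{
   return toy_operator_names[op];
}
```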
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
This was added with ARB_enhanced_layouts.
v2: Add an extra format specifier for the new qualifier.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Valgrind detected that variable ir_copy_propagation_visitor::killed_all
is uninitialized.
Signed-off-by: Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0x9b@gmail.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
The order of optimizations can lead to the conditional discard optimization
being applied twice to the same discard statement. In this case, we must
ensure that both conditions are applied.
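A minimal sketch of what "both conditions are applied" means, assuming the two conditions are combined with a logical AND. The toy_discard struct and function are hypothetical and only model the resulting behavior, not the IR rewrite itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy discard: condition == NULL means unconditional. */
struct toy_discard {
   const bool *condition;
};

/* Models "if (if_cond) discard" after the if has been folded into the
 * discard.  If the discard already carried a condition from an earlier
 * round of the optimization, the result must honor both.
 */
bool
toy_folded_discard_fires(bool if_cond, const struct toy_discard *d)
{
   bool existing = d->condition == NULL ? true : *d->condition;
   return if_cond && existing;
}
```

Overwriting the existing condition with the new one would make the discard fire whenever the outer condition holds, which changes the result of the shader.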
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96762
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These are only used by get_matching_input(), which has already been
called at this point, so free the hash tables.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
I do appreciate the cleverness, but unfortunately it prevents a lot more
cleverness in the form of additional compiler optimizations brought on
by -fstrict-aliasing.
No difference in OglBatch7 (n=20).
Co-authored-by: Davin McCall <davmac@davmac.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously we were only restricting based on ES/non-ES-ness and whether
the overall enable bit had been flipped on. However we have been adding
more fine-grained restrictions, such as based on compat profiles, as
well as specific ES versions. Most of the time this doesn't matter, but
it can create awkward situations and duplication of logic.
Here we separate the main extension table into a separate object file,
linked to the glsl compiler, which makes use of it with a custom
function which takes the ES-ness of the shader into account (thus
allowing desktop shaders to properly use ES extensions that would
otherwise have been disallowed.) We can also now use this logic to
generate #define's for all supported extensions automatically, removing
the duplicate (and often inaccurate) list in glcpp.
The effect of this change should be nil in most cases. However in some
situations, extensions like GL_ARB_gpu_shader5 which were formerly
available in compat contexts on the GLSL side of things will now become
inaccessible.
This regresses two ES CTS tests:
ES3-CTS.shaders.shader_integer_mix.define
ES31-CTS.shader_integer_mix.define
However, that is due to them using #version 100 instead of 300 es. As the
extension is only defined for ES3, I believe this is the correct
behavior.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v2)
v2 -> v3: integrate glcpp defines into the same mechanism
"flat centroid" and "flat sample" both just mean "flat", so we should
ignore interpolateAtCentroid/Sample and just return the flat value.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97032
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
For lod query instructions, we really don't care whether or not the sampler
is an array type because that doesn't factor into the LOD.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
On i965, we can't support coordinate offsets for texelFetch or rectangle
textures. Previously, we were doing this with a GLSL pass but we need to
do it in NIR if we want those workarounds for SPIR-V.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
While SPIR-V technically doesn't support "old style" shadow, the
shadow-compare gather instruction does return a vec4 so we need to be able
to set the old_style_shadow bit in NIR.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
We can't get an lod with txf_ms and SPIR-V considers textureGrad to be an
explicit-LOD texturing instruction.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
Subroutine variables are to be used just in the way functions are
called. Although the spec doesn't say it explicitly, this means that
these variables are not to be used in any other way than those left
for function calls. Therefore, a comparison between two subroutine
variables should also cause a compilation error.
From The OpenGL® Shading Language 4.40, page 117:
" To use subroutines, a subroutine type is declared, one or more
functions are associated with that subroutine type, and a
subroutine variable of that type is declared. The function
currently assigned to the variable function is then called by
using function calling syntax replacing a function name with the
name of the subroutine variable. Subroutine variables are
uniforms, and are assigned to specific functions only through
commands (UniformSubroutinesuiv) in the OpenGL API."
From The OpenGL® Shading Language 4.40, page 118:
" Subroutine uniform variables are called the same way functions
are called. When a subroutine variable (or an element of a
subroutine variable array) is associated with a particular
function, all function calls through that variable will call that
particular function."
Fixes GL44-CTS.shader_subroutine.subroutines_cannot_be_assigned_float_int_values_or_be_compared
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Commit 52e75dcb8c made nir_lower_io
start using nir_intrinsic_set_base instead of writing const_index[0]
directly. However, those intrinsics apparently don't /have/ a base,
so this caused assert failures.
However, the old code was happily setting non-existent const_index
fields, so it was pretty bogus too.
Jason pointed out that load_shared and store_shared have a base,
and that the i965 driver uses that field. So presumably atomics
should have one as well, so that loads/stores/atomics all refer
to variables with consistent addressing.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
We can still do packing; we just need to get the packing type from the
consumer rather than the producer.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97033
This makes sure we give the correct driver location
for doubles when using component packing. Specifically
it handles packing a dvec3 with a double which is the
only packing scenario allowed which spans across two
locations.
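The arithmetic behind the two-location case can be sketched as below. This assumes the usual layout of four 32-bit components per location with each double taking two of them; the function name is hypothetical and this is not the linker code.

```c
#include <assert.h>

/* Number of locations touched by num_doubles consecutive doubles that
 * start at the given 32-bit component (0..3) of their first location.
 */
unsigned
toy_locations_for_doubles(unsigned start_component, unsigned num_doubles)
{
   /* One past the last 32-bit component used, counted from component 0
    * of the first location.
    */
   unsigned end = start_component + 2 * num_doubles;

   return (end + 3) / 4;   /* round up to whole locations */
}
```

A dvec3 starting at component 0 uses six components, so it fills its first location and components 0 and 1 of the next one. A double at component 2 of that second location fits in the remaining space.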
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Now nir_lower_io can optionally produce load_interpolated_input
and load_barycentric_* intrinsics for fragment shader inputs.
flat inputs continue using regular load_input.
v2: Use a nir_shader_compiler_options flag rather than ad-hoc boolean
passing (in response to review feedback from Chris Forbes).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Backends can normally handle shader inputs solely by looking at
load_input intrinsics, and ignore the nir_variables in nir->inputs.
One exception is fragment shader inputs. load_input doesn't capture
the necessary interpolation information - flat, smooth, noperspective
mode, and centroid, sample, or pixel for the location. This means
that backends have to interpolate based on the nir_variables, then
associate those with the load_input intrinsics (say, by storing a
map of which variables are at which locations).
With GL_ARB_enhanced_layouts, we're going to have multiple varyings
packed into a single vec4 location. The intrinsics make this easy:
simply load N components from location <loc, component>. However,
working with variables and correlating the two is very awkward; we'd
much rather have intrinsics capture all the necessary information.
Fragment shader input interpolation typically works by producing a
set of barycentric coordinates, then using those to do a linear
interpolation between the values at the triangle's corners.
We represent this by introducing five new load_barycentric_* intrinsics:
- load_barycentric_pixel (ordinary variable)
- load_barycentric_centroid (centroid qualified variable)
- load_barycentric_sample (sample qualified variable)
- load_barycentric_at_sample (ARB_gpu_shader5's interpolateAtSample())
- load_barycentric_at_offset (ARB_gpu_shader5's interpolateAtOffset())
Each of these takes the interpolation mode (smooth or noperspective only)
as a const_index, and produce a vec2. The last two also take a sample
or offset source.
We then introduce a new load_interpolated_input intrinsic, which
is like a normal load_input intrinsic, but with an additional
barycentric coordinate source.
The intention is that flat inputs will still use regular load_input
intrinsics. This makes them distinguishable from normal inputs that
need fancy interpolation, while also providing all the necessary data.
This nicely unifies regular inputs and interpolateAt functions.
Qualifiers and variables become irrelevant; there are just
load_barycentric intrinsics that determine the interpolation.
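As a rough illustration of what a backend does with the vec2, here is a sketch of linear interpolation from two barycentric coordinates. The function name and the assignment of weights to corners are assumptions; real hardware defines its own convention, and any perspective correction is assumed to be folded into the coordinates already.

```c
#include <assert.h>

/* v0, v1, v2 are the input's values at the triangle's three corners.
 * (i, j) are the two barycentric coordinates; the third weight is
 * implied because all three weights sum to one.
 */
float
toy_interpolate(float v0, float v1, float v2, float i, float j)
{
   return v0 * (1.0f - i - j) + v1 * i + v2 * j;
}
```

In this model a load_interpolated_input amounts to calling toy_interpolate() with the coordinates produced by one of the load_barycentric_* intrinsics, while a flat input simply reads one corner's value.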
v2: Document the interp_mode const_index value, define a new
BARYCENTRIC() helper rather than using SYSTEM_VALUE() for
some of them (requested by Jason Ekstrand).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For intrinsics we don't care about, just skip to the next loop iteration
and process the next instruction. We don't want to execute the rest of
the code.
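The pattern looks roughly like the sketch below, which uses a hypothetical toy_op enumeration rather than real NIR intrinsics. Inside a switch nested in a loop, break only leaves the switch, so the code after it still runs; continue moves on to the next instruction.

```c
#include <assert.h>
#include <stddef.h>

enum toy_op { TOY_OP_LOAD, TOY_OP_STORE, TOY_OP_OTHER };

/* Counts how many instructions reach the shared lowering code. */
size_t
toy_count_lowered(const enum toy_op *ops, size_t n)
{
   size_t lowered = 0;

   for (size_t i = 0; i < n; i++) {
      switch (ops[i]) {
      case TOY_OP_LOAD:
      case TOY_OP_STORE:
         break;      /* leave the switch and run the lowering below */
      default:
         continue;   /* not interesting: go to the next instruction */
      }

      lowered++;     /* stands in for "the rest of the code" */
   }
   return lowered;
}
```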
This was a bug in commit cdfc05ea6e.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
I noticed this when I tried to do frexp(float(some_unsigned)) in the
ir_unop_find_lsb lowering pass. The code generated for frexp() uses
fabs, and this resulted in an extra instruction. Ultimately I ended up
not using frexp.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This isn't the lowering pass you want. Most GPUs that can support GLSL
1.30 have a multiply unit that can do something more interesting than
32x32->32. Many have 32x16->48. Any GPU that does should do the
lowering in the backend. This is just the thing that will always work.
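For reference, the "always works" approach can be sketched in C for the unsigned case: split each operand into 16-bit halves so that every partial product fits in a 32x32->32 multiply. This is a sketch of the general technique with a hypothetical function name, not the code the pass emits.

```c
#include <assert.h>
#include <stdint.h>

/* High 32 bits of the 64-bit product x * y, using only 32-bit math. */
uint32_t
toy_umul_high(uint32_t x, uint32_t y)
{
   uint32_t x_lo = x & 0xffff, x_hi = x >> 16;
   uint32_t y_lo = y & 0xffff, y_hi = y >> 16;

   /* Each 16x16 product fits in 32 bits. */
   uint32_t lo_lo = x_lo * y_lo;   /* contributes to bits  0..31 */
   uint32_t hi_lo = x_hi * y_lo;   /* contributes to bits 16..47 */
   uint32_t lo_hi = x_lo * y_hi;   /* contributes to bits 16..47 */
   uint32_t hi_hi = x_hi * y_hi;   /* contributes to bits 32..63 */

   /* Add up everything that lands in bits 16..31 to find the carry into
    * bit 32.  Three 16-bit values cannot overflow 32 bits.
    */
   uint32_t mid = (lo_lo >> 16) + (hi_lo & 0xffff) + (lo_hi & 0xffff);

   return hi_hi + (hi_lo >> 16) + (lo_hi >> 16) + (mid >> 16);
}
```

This costs four multiplies plus the shifts and adds, which is why a backend with a wider multiply unit would want to do its own lowering.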
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>