Constant propagation on arrays doesn't make a lot of sense. If the
array is only accessed with constant indexes, then opt_array_splitting
would split it up. Otherwise, we have variable indexing. If there are
multiple accesses, then constant propagation would end up replicating
the data.
The lower_const_arrays_to_uniforms pass creates uniforms for each
ir_constant with array type that it encounters. This means that it
creates redundant uniforms for each copy of the constant, which means
uploading too much data. It can even mean exceeding the maximum number
of uniform components, causing link failures.
We could try and teach the pass to de-duplicate the data by hashing
constants, but it makes more sense to avoid duplicating it in the first
place. We should promote constant arrays to uniforms, then propagate
the uniform access.
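To illustrate the component-count argument, here is a toy Python model (not Mesa code; the function names and the tuple representation are made up for this sketch). It compares how many uniform components are needed under each pass ordering:

```python
# Toy model: a "constant array" is a tuple of floats that is referenced
# from several places in a shader. Names here are hypothetical, not Mesa APIs.

def uniforms_if_propagated_first(const_array, num_uses):
    """Constant propagation copies the array into every use, then the
    lowering pass creates one uniform per copy it encounters."""
    copies = [tuple(const_array) for _ in range(num_uses)]
    return sum(len(c) for c in copies)

def uniforms_if_lowered_first(const_array, num_uses):
    """Lower the single constant to one uniform first; propagating the
    uniform access afterwards adds no data, so num_uses is irrelevant."""
    return len(const_array)

table = tuple(float(i) for i in range(64))
print(uniforms_if_propagated_first(table, 10))  # 640
print(uniforms_if_lowered_first(table, 10))     # 64
```

With ten uses of a 64-element table, the first ordering needs ten times the storage, which is how a shader can blow past the uniform component limit.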
Fixes the TressFX shaders from Tomb Raider, which exceeded the maximum
number of uniform components by a huge margin and failed to link.
On Broadwell:
total instructions in shared programs: 9067702 -> 9068202 (0.01%)
instructions in affected programs: 10335 -> 10835 (4.84%)
helped: 10 (Hoard, Shadow of Mordor, Amnesia: The Dark Descent)
HURT: 20 (Natural Selection 2)
loops in affected programs: 4 -> 0
The hurt programs appear to no longer have a constarray uniform, as
all constants were successfully propagated. Apparently before this
patch, we successfully unrolled a loop containing array access, but
only after promoting constant arrays to uniforms. With this patch,
we unroll it first, so all array access is direct, and the array
is split up, and individual constants are propagated. This seems
better.
Cc: mesa-stable@lists.freedesktop.org
Reported-by: Karol Herbst <nouveau@karolherbst.de>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
There's really no point in looking at ir_dereference_array of a
constant. It also misses cases like:
(assign () (var_ref tmp) (constant (array ...) ...))
No changes in shader-db, but keeps it working after the next commit.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
The new uniform may need precise as well.
Fixes copy propagation of constant array uniforms in Tomb Raider shaders.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Previously, we failed to split constant arrays. Code such as
int[2] numbers = int[](1, 2);
would generate a whole-array assignment:
(assign () (var_ref numbers)
(constant (array int 2) (constant int 1) (constant int 2)))
opt_array_splitting generally tried to visit ir_dereference_array nodes,
and avoid recursing into the inner ir_dereference_variable. So if it
ever saw an ir_dereference_variable, it assumed this was a whole-array
read and bailed. However, in the above case, there's no array deref,
and we can totally handle it - we just have to "unroll" the assignment,
creating assignments for each element.
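A minimal sketch of that "unrolling" step, in Python over a made-up tuple IR that loosely mirrors the s-expressions above (the node names and the helper are assumptions for illustration, not the actual visitor code):

```python
# Hypothetical tuple-based IR; not the real ir_* classes.
def unroll_array_assignment(var_name, elements):
    """Turn one whole-array constant assignment into per-element ones."""
    return [
        ("assign",
         ("array_ref", ("var_ref", var_name), ("constant", "int", i)),
         elem)
        for i, elem in enumerate(elements)
    ]

whole = [("constant", "int", 1), ("constant", "int", 2)]
for stmt in unroll_array_assignment("numbers", whole):
    print(stmt)
```

Each resulting assignment writes a single element through an array dereference, which is the form the splitting pass already knows how to handle.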
This was mitigated by the fact that we constant propagate whole arrays,
so a dereference of a single component would usually get the desired
single value anyway. However, I plan to stop doing that shortly;
early experiments with disabling constant propagation of arrays
revealed this shortcoming.
This patch causes some arrays in Gl32GSCloth's geometry shaders to be
split, which allows other optimizations to eliminate unused GS inputs.
The VS then doesn't have to write them, which eliminates the entire VS
(5 -> 2 instructions). It still renders correctly.
No other change in shader-db.
v2: Drop !AOA check and improve a comment (feedback from Tim Arceri).
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
opt_constant_propagation.cpp contains constant folding code which can
actually do constant propagation in some cases. It was happily
propagating constants into the left-hand-side of assignments.
For example,
(assign () (var_ref temp) (constant ...))
would brilliantly be turned into:
(assign () (constant ...) (constant ....))
This patch is a bigger hammer than necessary - it prevents propagation
into the left-hand-side altogether. We could certainly do better
someday. Notably, the constant propagation pass itself already
takes this approach - it's just the constant propagation pass's
built-in constant folding code (which actually propagates, too)
that was broken.
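The idea of never rewriting the left-hand side can be sketched in Python over a tiny made-up IR (the node kinds and the propagate helper are hypothetical, not the real visitor):

```python
# Hypothetical mini-IR: ("var_ref", name), ("constant", value),
# ("add", a, b), and ("assign", lhs, rhs).
def propagate(node, known):
    """Replace var_refs with known constants, but never on an assignment LHS."""
    kind = node[0]
    if kind == "var_ref":
        name = node[1]
        return ("constant", known[name]) if name in known else node
    if kind == "assign":
        lhs, rhs = node[1], node[2]
        # Leave the LHS untouched: it names a storage location, not a value.
        return ("assign", lhs, propagate(rhs, known))
    if kind == "add":
        return ("add", propagate(node[1], known), propagate(node[2], known))
    return node  # constants and anything else pass through unchanged

stmt = ("assign", ("var_ref", "temp"),
        ("add", ("var_ref", "temp"), ("constant", 1)))
print(propagate(stmt, {"temp": 5}))
```

The right-hand side becomes an add of two constants while the destination stays a var_ref.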
No change in shader-db, but prevents regressions after future commits.
It seems plausible that this could be hit today, but I haven't seen it
happen.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
We already store these in gl_shader and gl_program, so here we
remove them from gl_shader_program and just use the values
from gl_shader.
This will allow us to keep the shader cache restore code as
simple as it can be while making it somewhat clearer where these
values originate from.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We already store this in gl_shader and gl_program, so here we
remove it from gl_shader_program and just use the values
from gl_shader.
This will allow us to keep the shader cache restore code as
simple as it can be while making it somewhat clearer where these
values originate from.
V2: remove unnecessary NULL check
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral <itoral@igalia.com>
There's special logic around finding gl_FragData. It latches onto any
array with FRAG_RESULT_DATA0. However gl_SecondaryFragDataEXT[], added
by GL_EXT_blend_func_extended, fits those parameters as well. The real
frag data array should have index 0 though, so we can use that to
distinguish them.
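A rough Python sketch of the distinguishing check (the Output record, its field names, the numeric value of FRAG_RESULT_DATA0, and the secondary array having index 1 are all assumptions made for this example):

```python
from collections import namedtuple

# Hypothetical stand-in for a fragment shader output variable.
Output = namedtuple("Output", "name is_array location index")
FRAG_RESULT_DATA0 = 4  # placeholder value for this sketch only

def find_frag_data(outputs):
    """Pick the array at FRAG_RESULT_DATA0 whose index is 0."""
    for out in outputs:
        if out.is_array and out.location == FRAG_RESULT_DATA0 and out.index == 0:
            return out
    return None

outputs = [
    Output("gl_SecondaryFragDataEXT", True, FRAG_RESULT_DATA0, 1),
    Output("gl_FragData", True, FRAG_RESULT_DATA0, 0),
]
print(find_frag_data(outputs).name)  # gl_FragData
```

Without the index test, the first matching array in the list would win, which here would be the secondary one.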
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96617
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously some callers of precision_qualifier_allowed would strip the
arrayness from the type and some would not. As a result, some places
would not notice that float[6], for example, needed a precision
qualifier.
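As an illustration only, here is a Python sketch of stripping arrayness in one place before the check. The type model and the set of base types are simplified assumptions, not the real glsl_type logic:

```python
# Hypothetical type model: ("array", element_type, length) or a base-type name.
def without_array(t):
    """Strip every array level and return the innermost element type."""
    while isinstance(t, tuple) and t[0] == "array":
        t = t[1]
    return t

def precision_qualifier_allowed(t):
    # Assumption for this sketch: a small subset of types that take precision.
    return without_array(t) in {"float", "int", "uint"}

print(precision_qualifier_allowed(("array", "float", 6)))  # True
```

Doing the strip inside the helper means no caller can forget it, so float[6] is treated the same as float.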
Fixes the new piglit test no-default-float-array-precision.frag.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96358
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Cc: Gregory Hainaut <gregory.hainaut@gmail.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Instead use the internal gl_shader_stage enum everywhere. This
makes things more consistent and gets rid of unnecessary
conversions.
Ideally it would be nice to remove the Type field from gl_shader
altogether but currently it is used to differentiate between
gl_shader and gl_shader_program in the ShaderObjects hash table.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
i965 has no special hardware for this, so the best way to implement
this is to pass it in via a uniform.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
i965 has no special hardware for this, so we need to pass this value in
as a uniform (unless the TES is linked against a TCS, in which case the
linker can just replace this with a constant).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Built-in variable "gl_MaxCombinedShaderOutputResources" was added to GLSL 4.40
revision 9.
Section "1.2.1 Changes since revision 8 of GLSL version 4.40",
page 3 of the PDF states:
"Bug 11734: Add gl_MaxCombinedShaderOutputResources and mark
gl_MaxCombinedImageUnitsAndFragmentOutputs as deprecated."
Fixes: GL44-CTS.shader_image_load_store.basic-glsl-const
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This change makes sure to strip arrays when checking whether the type
is a double.
The check for the end of the first slot of a multi-slot double
is also fixed by bumping the check to 4 rather than 3.
Previously we were not reserving the last component.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since this extension allows more than one varying to share a single
location we can't just count the number of slots a varying takes and
add it to the total.
Instead we now reuse the reserved varyings bitfield to determine how
many slots are reserved for explicit locations.
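A small Python sketch of why counting set bits differs from summing per-varying slot counts (the varying list and the mark_slots helper are invented for illustration):

```python
def mark_slots(reserved, location, num_slots):
    """Set one bit per occupied location in an integer bitfield."""
    for slot in range(location, location + num_slots):
        reserved |= 1 << slot
    return reserved

# Two varyings share location 0 (different components), a third uses 1-2.
# Tuples are (location, slots); the values are made up for this example.
varyings = [(0, 1), (0, 1), (1, 2)]

naive_total = sum(slots for _, slots in varyings)

reserved = 0
for location, slots in varyings:
    reserved = mark_slots(reserved, location, slots)
bitfield_total = bin(reserved).count("1")

print(naive_total, bitfield_total)  # 4 3
```

The shared location is counted once in the bitfield, so the total reflects slots actually reserved rather than varyings declared.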
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Adding 64-bit integer support was going to make this file worse,
so just remove the tabs from it now.
Acked-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In the future int64 support will have the same requirements.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is just prep work for int64 support, changing
places where being 64-bit is what matters, not being a double.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just stops counting and assigning a storage location for
these uniforms. The count is only used to create the uniform storage.
These uniform types don't use this storage.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Could cause issues if you tried to read from an uninitialised pointer.
This just initialises the pointer to null to avoid that being a problem.
Discovered by Coverity.
CID: 1343616
Signed-off-by: Jakob Sinclair <sinclair.jakob@openmailbox.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We've had a FINISHME here since Eric originally wrote the code in 2011.
This patch implements his suggested approach, which makes us actually
able to copy propagate into the loops, at the unfortunate cost of making
this pass even more expensive.
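One common way to propagate into a loop safely is to scan the body first for variables it writes, discard any copies that involve them, and only then rewrite the body. The Python sketch below shows that general technique on a made-up statement list; whether it matches the suggested approach in detail is an assumption, and it also hints at the extra cost, since the body is walked twice:

```python
# Hypothetical statement forms: ("copy", dst, src) and ("use", var).
def written_vars(body):
    """First walk: collect every variable assigned inside the loop."""
    return {stmt[1] for stmt in body if stmt[0] == "copy"}

def propagate_into_loop(available, body):
    """available maps dst -> src for copies known before the loop."""
    killed = written_vars(body)
    # A copy survives only if neither side is written inside the loop.
    safe = {d: s for d, s in available.items()
            if d not in killed and s not in killed}
    out = []
    for stmt in body:  # second walk: rewrite reads using surviving copies
        if stmt[0] == "use":
            out.append(("use", safe.get(stmt[1], stmt[1])))
        elif stmt[0] == "copy":
            out.append(("copy", stmt[1], safe.get(stmt[2], stmt[2])))
        else:
            out.append(stmt)
    return out

before = {"a": "x", "b": "y"}  # a = x; b = y; both before the loop
loop_body = [("use", "a"), ("use", "b"), ("copy", "y", "z")]
print(propagate_into_loop(before, loop_body))
```

Here the use of a is rewritten to x, while b is left alone because y is overwritten inside the loop.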
The shader-db statistics are basically a wash:
No change in instruction counts.
total cycles in shared programs: 78685980 -> 78680730 (-0.01%)
cycles in affected programs: 2102646 -> 2097396 (-0.25%)
helped: 48
HURT: 83
I figured if we're going to do this for one copy propagation pass,
we may as well do it in both.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We've had a FINISHME here since Eric originally wrote the code in 2010.
This patch implements his suggested approach, which makes us actually
able to copy propagate into the loops, at the unfortunate cost of making
this pass even more expensive.
The shader-db statistics are not terribly impressive:
total instructions in shared programs: 9008589 -> 9008613 (0.00%)
instructions in affected programs: 4293 -> 4317 (0.56%)
helped: 0
HURT: 10
total cycles in shared programs: 78550978 -> 78575760 (0.03%)
cycles in affected programs: 655426 -> 680208 (3.78%)
helped: 75
HURT: 88
GAINED: 2
Most of the "regressions" appear to be us successfully copy propagating
uniforms, which i965 is loading as pull constants instead of push, so we
occasionally have two pulls instead of one. That doesn't seem like this
pass's job - it's propagating correctly, and we should be smarter about
pull loads in the backend.
This patch is still useful for a couple of reasons:
1. It can clean up copies created by varying packing (previously, we
couldn't if the uses were inside a loop).
This fixes a bug when interpolateAt*() is used on a packed varying
inside a loop: glsl_to_nir struggled to see through the extra copy
and mistakenly believed the variable was not an input.
2. It will help propagate uniform array access created by
lower_const_arrays_to_uniforms().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
From GLSL 4.5 spec, "4.4.2.3 Geometry Outputs".
"all geometry shader output vertex count declarations in a
program must declare the same count."
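A minimal Python sketch of such a link-time check (the function name, the error text, and the use of None for "not declared" are assumptions for this example, not the linker's actual code):

```python
def link_max_vertices(declared_counts):
    """Return the agreed count, or raise if declarations conflict.
    None means that shader did not declare max_vertices at all."""
    agreed = None
    for count in declared_counts:
        if count is None:
            continue
        if agreed is not None and count != agreed:
            raise ValueError(
                "conflicting geometry shader output vertex counts "
                "(%d and %d)" % (agreed, count))
        agreed = count
    return agreed

print(link_max_vertices([3, None, 3]))  # 3
```

Using a separate "not declared" marker rather than 0 keeps a declared count of zero distinguishable from a missing declaration.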
Fixes:
GL45-CTS.geometry_shader.output.conflicted_output_vertices_max
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Although glsl_types.h stores this in a bitfield,
we should hide that from everyone else. Hide the cast
in an accessor method and use the enum everywhere.
This makes things a bit nicer in gdb, and improves type
safety.
v2: fix a few pieces of interface I missed that caused some
piglit regressions.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
With tessellation shaders we can have cases where we have
arrays of anon structs, so make sure we match using without_array().
Fixes:
GL45-CTS.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_in
v2:
test lengths match as well (Ilia)
v3:
descend array lengths to check for matches as well (Ilia)
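The matching described by the v2 and v3 notes can be sketched in Python as follows. The type model is hypothetical, and comparing innermost types by simple equality stands in for whatever structural comparison the real code performs on anonymous structs:

```python
# Hypothetical type model: ("array", element_type, length) or a base name.
def types_match(a, b):
    """Compare array lengths level by level, then the innermost types."""
    while isinstance(a, tuple) and a[0] == "array":
        if not (isinstance(b, tuple) and b[0] == "array"):
            return False  # a has more array levels than b
        if a[2] != b[2]:
            return False  # lengths differ at this level
        a, b = a[1], b[1]
    if isinstance(b, tuple) and b[0] == "array":
        return False      # b has more array levels than a
    return a == b

x = ("array", ("array", "anon_struct", 2), 4)
y = ("array", ("array", "anon_struct", 2), 4)
z = ("array", ("array", "anon_struct", 3), 4)
print(types_match(x, y), types_match(x, z))  # True False
```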
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes a crash in
GL43-CTS.shader_subroutine.subroutines_not_allowed_as_variables_constructors_and_argument_or_return_types
If we can't find the func_name in one of these paths,
we have emitted an earlier error so just return here.
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
GL43-CTS.compute_shader.work-group-size does
uniform uint g_uniform[gl_WorkGroupSize.z + 20] = { 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24 };
The initializer triggers the GLSL 4.30/GLES3 checks
for constant sequence subexpressions, so this doesn't
happen unless you are using those versions. Just return
false, as this path is now reachable.
v2: update commit msg with diagnosis
Acked-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We were using this briefly in the i965 driver to trigger recompiles but we
haven't been using it since we switched to the NIR y-transform lowering
pass.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When lowering, we always want to use the clip dist varying.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Because apparently layout(max_vertices=0) is a thing.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
e2791b38b4
mesa/program_interface_query: fix transform feedback varyings.
caused a regression in
GL45-CTS.gtf40.GL3Tests.transform_feedback3.transform_feedback3_multiple_streams
on radeonsi.
The problem was that it was using the skip components varying to set
the stream id, when it should wait until a varying was written.
This just adds the varying checks in the right place.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This reverts commit aac90ba292.
The commit caused a regression in:
piglit.spec.glsl-1_50.compiler.gs-input-nonarray-named-block.geom
Also the CTS test it was meant to fix seems like it may be bogus.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
The CTS test:
GL45-CTS.multi_bind.dispatch_bind_image_textures
binds 192 image uniforms. We reject this later,
but not until after we trash the contents of the
struct gl_shader.
Error now reads:
Too many compute shader image uniforms (192 > 16)
instead of
Too many compute shader image uniforms (2745344416 > 16)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This partially fixes CTS test:
GL44-CTS.enhanced_layouts.xfb_get_program_resource_api
The test now fails at a tessellation evaluation shader with unsized output arrays.
The ARB_enhanced_layouts spec says:
"It is a compile-time error to apply xfb_offset to the declaration of an
unsized array."
So this seems like a bug in the CTS.
Reviewed-by: Dave Airlie <airlied@redhat.com>
The spec says gl_NextBuffer and gl_SkipComponents need to be
returned to userspace in the program interface queries.
We currently throw those away. This requires a complete piglit
run to make sure no drivers fall over due to the extra varyings.
This fixes:
GL45-CTS.program_interface_query.transform-feedback-built-in
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
These types can't be returned.
This fixes:
GL43-CTS.shader_subroutine.subroutines_not_allowed_as_variables_constructors_and_argument_or_return_types
for the return type case.
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This stops the offset being bumped again when an explicit
alignment has already been applied.
Fixes alignment issues in:
GL44-CTS.enhanced_layouts.uniform_block_alignment
Note the test still fails due to unrelated issues with doubles.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
The old code called this on the prelinked shader list,
but at this point we have the linked shader, so we should
call the interface on that alone.
This fixes a regression in:
dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.13
introduced in
5b2675093e
glsl: handle implicit sized arrays in ssbo
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96228
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reported-by: Mark James
Signed-off-by: Dave Airlie <airlied@redhat.com>
v2: Also support GL_EXT_shader_io_blocks. It's pretty much identical to
the OES extension. Suggested by Ilia.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
The interface type, interpolation mode, precision, the type of the
outermost structure, and whether or not the variable has an explicit
location will be used for SSO validation on OpenGL ES.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>