This increases memory pressure during linking but makes it easier
for the backend to free the IR once it is no longer needed.
v2: use resource list as ralloc context in case of relink (Kenneth)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
This fixes compilation failures in Dota 2 Reborn where a texture unit
binding point was used that was numerically higher than the max
per stage.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Tested-by: Nick Sarnie <commendsarnex@gmail.com>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
These are basically just moves, so they should be safe as well.
When disabling i965's GLSL IR level scalarizer (channel expressions)
pass, I started seeing NIR code like this:
if ssa_21 {
block block_1:
/* preds: block_0 */
vec4 ssa_120 = vec4 ssa_82, ssa_83, ssa_84, ssa_30
/* succs: block_3 */
} else {
block block_2:
/* preds: block_0 */
/* succs: block_3 */
}
block block_3:
/* preds: block_1 block_2 */
vec4 ssa_33 = phi block_1: ssa_120, block_2: ssa_2
Previously, the GLSL IR scalarizer pass would break the vec4 into a
series of fmovs, which were allowed by the peephole pass. But with
the vec4 operation, they were not. We want to keep getting selects.
Normal i965 on Broadwell:
instructions in affected programs: 200 -> 176 (-12.00%)
helped: 4
With brw_fs_channel_expressions() disabled:
instructions in affected programs: 1832 -> 1646 (-10.15%)
helped: 30
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
It's not totally clear whether other Mesa drivers can safely cope with
over-sized UBOs, but at least for llvmpipe receiving a UBO larger than
its limit causes problems, as it won't fit into its internal display
lists.
This fixes piglit "arb_uniform_buffer_object-maxuniformblocksize
fsexceed" without regressions for llvmpipe.
The NVIDIA driver also fails to link the shader from
"arb_uniform_buffer_object-maxuniformblocksize fsexceed".
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65525
PS: I don't recommend cherry-picking this for Mesa stable, as some app
might inadvertently have been relying on UBOs larger than
GL_MAX_UNIFORM_BLOCK_SIZE to work on other drivers, so even if this
commit is universally accepted it's probably best to let it mature in
master for a while.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
gl_NumSamples should only be enabled when ARB_sample_shading is enabled.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
A number of builtin variables have checks based on the extension being
enabled, but were missing enablement via a higher GLSL version.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
This allows mod(int, int) to become selected as float mod when doubles
are supported.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
This reverts commit adee54f826.
Further down in the GLSL ES 3.10 spec it says:
"If an array is declared as the last member of a shader storage block
and the size is not specified at compile-time, it is sized at run-time.
In all other cases, arrays are sized only at compile-time."
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Rather than forcing everyone to provide their own definition of the symbol,
provide a common (dummy) one.
This helps us resolve the build of the standalone pipe-drivers (amongst
others), which are missing the symbol.
Cc: Rob Clark <robclark@freedesktop.org>
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Without this patch, the following constructs (not an exhaustive list)
would crash mesa:
- mat2 foo = mat2(1); vec4 bar = vec4(foo);
- mat3 foo = mat3(1); vec4 bar = vec4(foo);
- mat3 foo = mat3(1); ivec4 bar = ivec4(foo);
The first case is explicitly allowed by the GLSL spec, as seen on
page 101 of the GLSL 4.40 spec:
"vec4(mat2) // the vec4 is column 0 followed by column 1"
The other cases are implicitly allowed as well.
The actual changes are quite minimal. We first split each column of
the matrix to a list of vectors and then use them to initialize the
vector. An additional check to make sure that we are not trying to
copy 0 elements of a vector fixes the (i)vec4(mat3) case, as the last
vector (3rd column) is not needed at all.
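
The column-splitting approach described above can be sketched as follows.
This is only an illustration in Python, not the code from this patch; the
helper names split_columns and vector_from_matrix are hypothetical, and
matrices are modelled as plain lists of columns.

```python
def split_columns(matrix):
    """Return the matrix columns as a list of vectors (input is a list of columns)."""
    return [list(column) for column in matrix]


def vector_from_matrix(matrix, size):
    """Build a `size`-component vector from the leading matrix components."""
    result = []
    for column in split_columns(matrix):
        remaining = size - len(result)
        if remaining == 0:
            # Nothing left to copy: stop instead of attempting a
            # zero-element copy (the (i)vec4(mat3) situation).
            break
        result.extend(column[:remaining])
    return result


# Lists of columns, mirroring GLSL's column-major order.
mat2 = [[1.0, 2.0], [3.0, 4.0]]
mat3 = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]

print(vector_from_matrix(mat2, 4))  # column 0 followed by column 1
print(vector_from_matrix(mat3, 4))  # column 0 plus the first component of column 1
```

In the mat3 case the loop stops before touching the third column, which is
the situation the additional check guards against.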
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
On Lollipop, apparently stlport is gone and libcxx must be used instead.
We still support stlport when building on earlier android releases.
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Eric Anholt <eric@anholt.net>
Flagged by Oracle's parfait static analyzer:
Error: Format string argument mismatch (CWE 628)
In call to printf with format string "usage: %s [options] <file.vert | file.geom | file.frag>\n\nPossible options are:\n"
Too many arguments for format string (got more than 1 arguments)
at line 285 of src/glsl/main.cpp in function 'usage_fail'.
Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This change introduces a new field in gl_uniform_storage to
explicitly say that a uniform is built-in. In the case where it is,
no storage is defined to make it clear that it is read-only from the
mesa side. I fixed all the places in the code that made use of the
structure that I changed. Any place making a wrong assumption and using
the storage straight away will just crash.
This patch seems to implement the path of least resistance towards
listing built-in uniforms in GL_ACTIVE_UNIFORM (and other APIs).
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
The lower_phis_to_scalar() pass recurses the instruction dependence graph to
determine if all the sources of a given instruction are scalarizable.
To prevent cycles, it temporarily marks the phi instruction before recursing in,
then updates the entry with the resulting value. However, it does not consider
that the entry value may have changed after a recursion pass, hence causing
a use-after-free situation and a crash.
This patch fixes this by reloading the entry corresponding to the 'phi'
after recursing and before updating its value.
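
The shape of the fix can be sketched like this. It is a simplified Python
model, not the pass itself: the graph encoding and the is_scalarizable name
are hypothetical, and the choice of an optimistic temporary mark is an
assumption of this sketch. Python dictionaries cannot reproduce the
use-after-free, so the sketch only shows the pattern of looking the entry
up again by key after recursing rather than reusing a handle obtained
before the recursion.

```python
def is_scalarizable(node, graph, cache):
    """graph maps a node to ("phi", [sources]) or ("leaf", bool)."""
    if node in cache:
        return cache[node]

    kind, payload = graph[node]
    if kind == "leaf":
        cache[node] = payload
        return payload

    # Temporarily mark the phi so that a cycle leading back to it
    # terminates (assumed here to be an optimistic mark).
    cache[node] = True
    result = all(is_scalarizable(src, graph, cache) for src in payload)

    # Find the entry again by key before updating it: the recursion may
    # have added entries and reorganized the table in the meantime.
    cache[node] = result
    return result


# Two phis that depend on each other, plus one scalarizable leaf.
graph = {
    "phi_a": ("phi", ["leaf", "phi_b"]),
    "phi_b": ("phi", ["phi_a"]),
    "leaf": ("leaf", True),
}
print(is_scalarizable("phi_a", graph, {}))
```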
The crash can be reproduced ~20% of the time with the dEQP test:
dEQP-GLES3.functional.shaders.loops.while_constant_iterations.nested_sequence_fragment
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
When we compute the output swizzle we want to consider the number of
components in the add operation. So far we were using the writemask
of the multiplication for this instead, which is not correct.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This makes piglit mixing-clip-distance-and-clip-vertex-disallowed have 0
definitely lost blocks with valgrind. (Same non-0 number of possibly
lost blocks though.)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Some shaders in Civilization V and Beyond Earth do
pow(pow(x, 2.2), 0.454545)
which is converting to and from sRGB colorspace.
A more general rule that replaces pow(pow(a, b), c) with pow(a, b * c)
actually regresses two shaders in Sun Temple in which the result of the
inner pow is used twice, once by another pow and once by another
instruction. Also, since 2.2 * 0.454545 isn't exactly one, the more
general pattern would have still left us with a pow, and I'm 2.2 *
0.454545 percent sure that's not what they want.
instructions in affected programs: 934 -> 886 (-5.14%)
helped: 16
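
A quick numeric check of the claim about the exponents, as a Python
sketch. The sample input x = 0.5 is an arbitrary choice for illustration
and does not come from the shaders mentioned above.

```python
import math

inner, outer = 2.2, 0.454545

# The exponents multiply to almost, but not exactly, one.
product = inner * outer
print(product)

# Round-tripping a sample value through both pows lands very close to x.
x = 0.5
round_trip = math.pow(math.pow(x, inner), outer)
print(round_trip, abs(round_trip - x))
```

This is why folding the exponents would still leave a pow behind, whereas
matching this specific pair of constants lets both pows go away.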
We now have is_array() and without_array() that make the
code much clearer and remove the need for this.
For all remaining calls to this we already knew that
the type was an array so returning a null wasn't adding any value.
v2: use without_array() in _mesa_ast_array_index_to_hir() and don't use
without_array() in lower_clip_distance_visitor() as we want to make sure the
array is 2D.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, we used intrinsic->const_index[1] to represent "the number of
array elements to load" for load/store intrinsics. However, this is set to 1
by every pass that ever creates a load/store intrinsic. Also, while it
might make some sense for registers, it makes no sense whatsoever in SSA.
On top of that, the i965 backend was the only backend to ever support it;
freedreno and vc4 just assert that it's always 1. Let's just delete it.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
This patch properly marks uniforms inside UBOs as referenced by stages.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90397
This fixes bugs with special cases where we have arrays of
structures containing samplers or arrays of samplers.
I've verified that the patch results in calculating the same index value as
returned by _mesa_get_sampler_uniform_value for IR. The patch makes the
following ES3 conformance test pass:
ES3-CTS.shaders.struct.uniform.sampler_array_fragment
v2: remove unnecessary comment (Topi)
simplify changes and the overall code (Jason)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90114
Previously, this case was being handled in match_expression prior to
calling match_value. However, there is really no good reason for this
given that match_value has all of the information it needs. Also, they
weren't being handled properly in the commutative case and putting it in
match_value gives us that for free.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This commit switches us from the current setup of using hash sets for
use/def sets to using linked lists. Doing so should save us quite a bit of
memory because we aren't carrying around 3 hash sets per register and 2 per
SSA value. It should also save us CPU time because adding/removing things
from use/def sets is 4 pointer manipulations instead of a hash lookup.
Running shader-db 50 times with USE_NIR=0, NIR, and NIR + use/def lists:
GLSL IR Only: 586.4 +/- 1.653833
NIR with hash sets: 675.4 +/- 2.502108
NIR + use/def lists: 641.2 +/- 1.557043
I also ran a memory usage experiment with Ken's patch to delete GLSL IR and
keep NIR. This patch cuts an additional 42.9 MiB of ralloc'd memory over
and above what we gained by deleting the GLSL IR on the same dota trace.
On the code complexity side of things, some things are now much easier and
others are a bit harder. One of the operations we perform constantly in
optimization passes is to replace one source with another. Due to the fact
that an instruction can use the same SSA value multiple times, we had to
iterate through the sources of the instruction and determine if the use we
were replacing was the only one before removing it from the set of uses.
With this patch, uses are per-source not per-instruction so we can just
remove it safely. On the other hand, trying to iterate over all of the
instructions that use a given value is more difficult. Fortunately, the
two places we do that are the ffma peephole where it doesn't matter and GCM
where we already gracefully handle duplicate visits to an instruction.
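
To illustrate the trade-off, here is a minimal Python sketch of per-source
use lists. The Use and Value classes and their methods are hypothetical
names for illustration, not the NIR data structures; the list is modelled
as a circular doubly linked list with a sentinel head.

```python
class Use:
    """One source operand; it doubles as a node in the value's use list."""

    def __init__(self, instr):
        self.instr = instr
        self.prev = self.next = self


class Value:
    def __init__(self):
        self.uses = Use(None)  # sentinel head of a circular list

    def add_use(self, use):
        head = self.uses
        # Four pointer writes link the node in right after the head.
        use.prev, use.next = head, head.next
        head.next.prev = use
        head.next = use

    def using_instrs(self):
        node = self.uses.next
        while node is not self.uses:
            yield node.instr
            node = node.next


def remove_use(use):
    # Unlinking needs no lookup and no scan of the instruction's other sources.
    use.prev.next = use.next
    use.next.prev = use.prev
    use.prev = use.next = use


value = Value()
a, b = Use("fmul"), Use("fmul")  # the same instruction uses the value twice
value.add_use(a)
value.add_use(b)
print(list(value.using_instrs()))  # the instruction shows up once per source
remove_use(a)
print(list(value.using_instrs()))
```

Removing one source never disturbs the other, but walking the list visits
an instruction once per source that uses the value, so callers that care
have to tolerate or filter duplicates.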
Another aspect here is that using linked lists in this way can be tricky to
get right. With sets, things were quite forgiving and the worst that
happened if you didn't properly remove a use was that it would get caught
in the validator. With linked lists, it can lead to linked list corruption
which can be harder to track. However, we do just as much validation of
the linked lists as we did of the sets so the validator should still catch
these problems. While working on this series, the vast majority of the
bugs I had to fix were caught by assertions. I don't think the lists are
going to be that much worse than the sets.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>