Found by address sanitizer:
==22621==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61400000cbd8 at pc 0x7f561610a4ff bp 0x7ffca85f9d50 sp 0x7ffca85f94f8
READ of size 344 at 0x61400000cbd8 thread T0
#0 0x7f561610a4fe (/usr/lib/x86_64-linux-gnu/libasan.so.3+0x5f4fe)
#1 0x7f560bb305a5 in memcpy /usr/include/x86_64-linux-gnu/bits/string3.h:53
#2 0x7f560bb305a5 in blob_write_bytes ../../../mesa-src/src/compiler/glsl/blob.c:136
#3 0x7f560be7d7ff in encode_type_to_blob ../../../mesa-src/src/compiler/glsl/shader_cache.cpp:153
#4 0x7f560be81222 in write_program_resource_data ../../../mesa-src/src/compiler/glsl/shader_cache.cpp:950
#5 0x7f560be81222 in write_program_resource_list ../../../mesa-src/src/compiler/glsl/shader_cache.cpp:1118
#6 0x7f560be81222 in shader_cache_write_program_metadata(gl_context*, gl_shader_program*) ../../../mesa-src/src/compiler/glsl/shader_cache.cpp:1407
#7 0x7f560b825fdb in link_program ../../../mesa-src/src/mesa/main/shaderapi.c:1163
Fixes: 073a84ff60 ("glsl: stop adding pointers from glsl_struct_field to the cache")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Steam is already analysing cache items, unfortunatly we did not
introduce a versioning mechanism for identifying structural changes
to cache entries earlier so the only way to do so is to rename the
cache directory.
Since we are renaming it we take the opportunity to give the directory
a more meaningful name.
Adding a version field to the header of cache entries will help us to
avoid having to rename the directory in future. Please note this is
versioning for the internal structure of the entries as defined in
disk_cache.{c,h} as opposed to the structure of the data provided to
the disk cache by the GLSL compiler and the various driver backends.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This is so we always create reproducible cache entries. Consistency
is required for verification of any third party distributed shaders.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This is so we always create reproducible cache entries. Consistency
is required for verification of any third party distributed shaders.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
In the following patch we will stop writing the pointer to cache.
Unfortunately adding empty strings to that cache seems to be the
only thing we can do here once we no longer have the pointers.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This is so we always create reproducible cache entries. Consistency
is required for verification of any third party distributed shaders.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This is so we always create reproducible cache entries. Consistency
is required for verification of any third party distributed shaders.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This is so we always create reproducible cache entries. Consistency
is required for verification of any third party distributed shaders.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
copy_constant_to_storage, set_uniform_initializer,
populate_consumer_input_sets, and get_matching_input are all used by
tests in src/compiler/glsl/tests:
glsl/tests/varyings_test.o: In function `link_varyings_single_simple_input_Test::TestBody()':
src/compiler/glsl/tests/varyings_test.cpp:131: undefined reference to `linker::populate_consumer_input_sets(void*, exec_list*, hash_table*, hash_table*, ir_variable**)'
glsl/tests/varyings_test.o: In function `link_varyings_gl_ClipDistance_Test::TestBody()':
src/compiler/glsl/tests/varyings_test.cpp:159: undefined reference to `linker::populate_consumer_input_sets(void*, exec_list*, hash_table*, hash_table*, ir_variable**)'
glsl/tests/varyings_test.o: In function `link_varyings_gl_CullDistance_Test::TestBody()':
src/compiler/glsl/tests/varyings_test.cpp:186: undefined reference to `linker::populate_consumer_input_sets(void*, exec_list*, hash_table*, hash_table*, ir_variable**)'
glsl/tests/varyings_test.o: In function `link_varyings_single_interface_input_Test::TestBody()':
src/compiler/glsl/tests/varyings_test.cpp:208: undefined reference to `linker::populate_consumer_input_sets(void*, exec_list*, hash_table*, hash_table*, ir_variable**)'
glsl/tests/varyings_test.o: In function `link_varyings_one_interface_and_one_simple_input_Test::TestBody()':
src/compiler/glsl/tests/varyings_test.cpp:241: undefined reference to `linker::populate_consumer_input_sets(void*, exec_list*, hash_table*, hash_table*, ir_variable**)'
glsl/tests/varyings_test.o:src/compiler/glsl/tests/varyings_test.cpp:272: more undefined references to `linker::populate_consumer_input_sets(void*, exec_list*, hash_table*, hash_table*, ir_variable**)' follow
glsl/tests/varyings_test.o: In function `link_varyings_interface_field_doesnt_match_noninterface_Test::TestBody()':
src/compiler/glsl/tests/varyings_test.cpp:289: undefined reference to `linker::get_matching_input(void*, ir_variable const*, hash_table*, hash_table*, ir_variable**)'
glsl/tests/varyings_test.o: In function `link_varyings_interface_field_doesnt_match_noninterface_vice_versa_Test::TestBody()':
src/compiler/glsl/tests/varyings_test.cpp:314: undefined reference to `linker::populate_consumer_input_sets(void*, exec_list*, hash_table*, hash_table*, ir_variable**)'
src/compiler/glsl/tests/varyings_test.cpp:328: undefined reference to `linker::get_matching_input(void*, ir_variable const*, hash_table*, hash_table*, ir_variable**)'
Fixes: ca73c3358c ("glsl: Mark functions static")
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Here we also make use of the UseSTD430AsDefaultPacking constant
and call the new get_internal_ifc_packing() helper.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes KHR-GL45.shader_ballot_tests.ShaderBallotAvailability, which
causes some silly swizzles to appear, triggering this optimization to
get hit.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
This fixes an assert during IR validation in LLVMpipe.
Fixes: e2e2c5abd2 (glsl: calculate number of operands in an expression once)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102274
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
We continue in the code to do some more things with the rhs, including
setting a constant initializer. If the type is wrong, this causes some
confusion down the line, leading to assertions. This makes sure that the
rhs processing continues to flow as-if the type was correct to start
with (even though the state has been marked as an error state).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101766
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
The cloning was introduced in f81ede4699 to fix a problem with
shaders including IR that was owned by builtins.
However the approach of cloning the whole function each time we
reference a builtin lead to a significant reduction in the GLSL
IR compilers performance.
The previous patch fixes the ownership problem in a more precise
way. So we can now remove this cloning.
Testing on a Ryzen 7 1800X shows a ~15% decreases in compiling the
Deus Ex: Mankind Divided shaders on radeonsi (which take 5min+ on
some machines). Looking just at the GLSL IR compiler the speed up
is ~40%.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The main motivation for this is that threaded compilation can fall
over if we were to allocate IR inside constant_expression_value()
when calling it on a builtin. This is because builtins are shared
across the whole OpenGL context.
f81ede4699 worked around the problem by cloning the entire
builtin before constant_expression_value() could be called on
it. However cloning the whole function each time we referenced
it lead to a significant reduction in the GLSL IR compiler
performance. This change along with the following patch
helps fix that performance regression.
Other advantages are that we reduce the number of calls to
ralloc_parent(), and for loop unrolling we free constants after
they are used rather than leaving them hanging around.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The Deus Ex: Mankind Divided shaders go from spending ~20 seconds
in the GLSL IR compilers front-end down to ~18.5 seconds on a
Ryzen 1800X.
Tested by compiling once with shader-db then deleting the index file
from the shader cache and compiling again.
v2:
- fix rebasing issue in v1
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
We are currently copying the name for each member dereference
but we can just share a single instance of the string provided
by the type.
This change also stops us recalculating the field index
repeatedly.
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Also add a comment that this should only be used by the ir_reader
interface for testing purposes.
v2:
- fix grammar in comment
- use unreachable rather than assert
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Extra validation is added to ir_validate to make sure this is
always updated to the correct numer of operands, as passes like
lower_instructions modify the instructions directly rather then
generating a new one.
The reduction in time is so small that it is not really
measurable. However callgrind was reporting this function as
being called just under 34 million times while compiling the
Deus Ex shaders (just pre-linking was profiled) with 0.20%
spent in this function.
v2:
- make num_operands a unit8_t
- fix unsigned/signed mismatches
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Other ones are either unsupported or don't have any helper
function checks.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Check if shaders have transform feedback varyings also after the
post-link step.
This fixes:
KHR-GL45.enhanced_layouts.xfb_vertex_streams
piglit/spec/arb_enhanced_layouts/gs-stream-location-aliasing
v2: add claryfing comments (Timothy)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The optimizations are only valid for 32-bit integers. They were
mistakenly firing for 64-bit integers as well.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
When we have an interface block like:
layout (xfb_buffer = 0, xfb_offset = 0) out Block {
vec4 var1;
layout (xfb_stride = 48) vec4 var2;
vec4 var3;
};
According to ARB_enhanced_layouts spec:
"The *xfb_stride* qualifier specifies how many bytes are consumed by
each captured vertex. It applies to the transform feedback buffer
for that declaration, whether it is inherited or explicitly
declared. It can be applied to variables, blocks, block members, or
just the qualifier out. [ ...] While *xfb_stride* can be declared
multiple times for the same buffer, it is a compile-time or
link-time error to have different values specified for the stride
for the same buffer."
This means xfb_stride actually applies to the buffer, and not to the
individual components.
In the above example, it means that var2 consumes 16 bytes, and var3 is
at offset 32.
This has been confirmed also by John Kessenich, the main contact for the
ARB_enhanced_layouts specs, and also because this commit fixes:
GL45.enhanced_layouts.xfb_block_member_stride
This commit is in practice a revert of 598790e856 (glsl: apply
xfb_stride to implicit offsets for ifc block members).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This is a further lowering of default-block uniform loads that transforms
load_uniform intrinsics into load_ubo intrinsics. This simplifies the rest
of the backend.
v2: transform from load_uniform instead of straight from variables
Reviewed-by: Eric Anholt <eric@anholt.net>
This pass is a replacement for the nir_lower_samplers pass, which has the
advantage of keeping sampler references as derefs. This allows a unified
treatment of texture instructions and image intrinsics in the backend.
Some drivers prefer to treat gl_FragCoord as a system value rather than
a fragment shader input, see Const.GLSLFragCoordIsSysVal.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
From the ARB_uniform_buffer_object spec:
""shared" uniform blocks, the default layout, ..."
This doesn't fix anything as the default layout is already applied
at this point but fixes the misleading code/comment.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This was added in 2d03f48a65 and seems like it was intended
as a TODO comment in a function stub rather than a useful
code comment.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
It's incorrect to use $(LOCAL_PATH) in makefile recipes since it's
changing. The typical way to handle it is to use private variable.
Fortunately in this case we can just simplify them to $^.
See further:
https://patchwork.freedesktop.org/patch/167718/
Also simplify LOCAL_GENERATED_SOURCES.
Fixes: 2dd4e2ec (spirv: Generate spirv_info.c)
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
current build did not find required include 'spirv_info.h'
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Two of the ARB_shader_ballot piglit tests hit the find_lsb case,
removing some of the noise allowed me to better debug the test when it
was failing.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Some hardware, like i965, doesn't support group sizes greater than 32.
In that case, we can reduce the destination size of the ballot
intrinsic, which will simplify our code generation.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We already had a channel_num system value, which I'm renaming to
subgroup_invocation to match the rest of the new system values.
Note that while ballotARB(true) will return zeros in the high 32-bits on
systems where gl_SubGroupSizeARB <= 32, the gl_SubGroup??MaskARB
variables do not consider whether channels are enabled. See issue (1) of
ARB_shader_ballot.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>