NV_compute_shader_derivatives allow selecting between two possible
arrangements (quads and linear) when calculating derivatives and
certain subgroup operations in case of Vulkan. So parse and propagate
those up to shader_info.h.
v2: Do not fail when ARB_compute_variable_group_size is being used,
since we are still clarifying what is the right thing to do here.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
EXT_texture_query_lod provides the same functionality for GLES like
the ARB extension with the same name for GL.
v2: Set ES 3.0 as minimum GLES version as required by the extension
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently we only add a cache key for a shader once it is linked.
However games like Team Fortress 2 compile a whole bunch of shaders
which are never actually linked. These compiled shaders can take
up a bunch of memory.
This patch changes things so that we add the key for the shader to
the cache as soon as it is compiled. This means on a warm cache we
can avoid the wasted memory from these shaders. Worst case scenario
is we need to compile the shaders at link time but this can happen
anyway if the shader has been evicted from the cache.
Reduces memory use in Team Fortress 2 from 1.3GB -> 770MB on a
warm cache from start up to the game menu.
V2: only add key to cache when compilation is successful.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Currently we only add a cache key for a shader once it is linked.
However games like Team Fortress 2 compile a whole bunch of shaders
which are never actually linked. These compiled shaders can take
up a bunch of memory.
This patch changes things so that we add the key for the shader to
the cache as soon as it is compiled. This means on a warm cache we
can avoid the wasted memory from these shaders. Worst case scenario
is we need to compile the shaders at link time but this can happen
anyway if the shader has been evicted from the cache.
Reduces memory use in Team Fortress 2 from 1.3GB -> 770MB on a
warm cache from start up to the game menu.
Acked-by: Marek Olšák <marek.olsak@amd.com>
This basically reverts c2bc0aa7b1.
By running the opts we reduce memory using in Team Fortress 2
from 1.5GB -> 1.3GB from start-up to game menu.
This will likely increase Deus Ex start up times as per commit
c2bc0aa7b1. However currently 32bit games like Team Fortress 2
can run out of memory on low memory systems, so that seems more
important.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This extension is not properly tested (testing for
GL_ARB_fragment_shader_interlock is not sufficient), and since this was
noted in review on August 28th no tests have been sent.
Revert "i965: Add INTEL_fragment_shader_ordering support."
Revert "mesa: Add GL/GLSL plumbing for INTEL_fragment_shader_ordering"
This reverts commit 03ecec9ed2.
This reverts commit 119435c877.
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Eric Anholt <eric@anholt.net>
Use #pragma warning(off) and #pragma warning(on) to disable or enable
all warnings. This is a big hammer. If we ever need a smaller hammer,
we can enhance this functionality.
There is one lame thing about this. Because we parse everything, create
an AST, then convert the AST to GLSL IR, we have to treat the #pragma
like a statment. This means that you can't do something like
' void
' #pragma warning(off)
' __foo
' #pragma warning(on)
' (float param0);
Fixing that would, as far as I can tell, require a huge amount of work.
I did try just handling the #pragma during parsing (like we do for
state for the whole shader.
v2: Fix the #pragma lines in the commit message that git-commit ate.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
EXT_shader_implicit_conversions adds support for implicit conversions
for GLES 3.1 and above.
This is essentially a subset of ARB_gpu_shader5, and augments
OES_gpu_shader5.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The spec is quite clear this is not allowed:
From Section 4.4. (Layout Qualifiers) of the GLSL 4.60 spec:
"Layout qualifiers can appear in several forms of declaration.
They can appear as part of an interface block definition or
block member, as shown in the grammar in the previous section.
They can also appear with just an interface-qualifier to establish
layouts of other declarations made with that qualifier:
layout-qualifier interface-qualifier ;
Or, they can appear with an individual variable declared with
an interface qualifier:
layout-qualifier interface-qualifier declaration ;"
From Section 4.10 (Memory Qualifiers) of the GLSL 4.60 spec:
"Layout qualifiers cannot be used on formal function parameters,
and layout qualification is not included in parameter matching."
However on the Nvidia binary driver they actually fail to compile
if image function params don't have a layout qualifier. This results
in applications such as No Mans Sky using layout qualifiers on params.
I've submitted a CTS test to expose this problem in the Nvidia driver
but until that is resolved this patch will help Mesa drivers work
around the issue.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This extension provides new GLSL built-in function
beginFragmentShaderOrderingIntel() that guarantees
(taking wording of GL_INTEL_fragment_shader_ordering
extension) that any memory transactions issued by
shader invocations from previous primitives mapped to
same xy window coordinates (and same sample when
per-sample shading is active), complete and are visible
to the shader invocation that called
beginFragmentShaderOrderingINTEL().
One advantage of INTEL_fragment_shader_ordering over
ARB_fragment_shader_interlock is that it provides a
function that operates as a memory barrie (instead
of a defining a critcial section) that can be called
under arbitary control flow from any function (in
contrast the begin/end of ARB_fragment_shader_interlock
may only be called once, from main(), under no control
flow.
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
because the closed driver exposes it.
It's equivalent to ARB_gpu_shader_int64.
In this patch, I did everything the same as we do for ARB_gpu_shader_int64.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The main purpose for having NV_fragment_shader_interlock
extension is because that extension is also for GLES31 while
the ARB extension is for GL only.
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Now that the elements version handles both cases, remove the
non-elements version.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
This extension provides new GLSL built-in functions
beginInvocationInterlockARB() and endInvocationInterlockARB()
that delimit a critical section of fragment shader code. For
pairs of shader invocations with "overlapping" coverage in a
given pixel, the OpenGL implementation will guarantee that the
critical section of the fragment shader will be executed for
only one fragment at a time.
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
- remove mtypes.h from most header files
- add main/menums.h for often used definitions
- remove main/core.h
v2: fix radv build
Reviewed-by: Brian Paul <brianp@vmware.com>
The compatibility and core tokens were not added until GLSL 1.50,
for GLSL 1.40 just assume all shaders built with a compat profile
are compat shaders.
Fixes rendering issues in Dawn of War II on radeonsi which has
enabled OpenGL 3.1 compat support.
Fixes: a0c8b49284 "mesa: enable OpenGL 3.1 with ARB_compatibility"
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105807
This fixes a bug in radeonsi where LLVM cannot handle the case where
a break exists but its not the last instruction in the block.
LLVM would fail with:
Terminator found in the middle of a basic block!
LLVM ERROR: Broken function found, compilation aborted!
Fixes: 96fe8834f5 "glsl_to_tgsi: do fewer optimizations with GLSLOptimizeConservatively"
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105317
This should end the drought of bits in the ast_type_qualifier object.
The bitset_t type works pretty much as a drop-in replacement for the
current uint64_t bitset.
The only catch is that the bitset_t type as defined in the previous
commit doesn't have a trivial constructor (because it has a
user-defined constructor), so it cannot be used as union member
without providing a user-defined constructor for the union (which
causes it in turn to be non-trivially constructible). This annoyance
could be easily addressed in C++11 by declaring the default
constructor of bitset_t to be the implicitly defined one -- IMO one
more reason to drop support for GCC 4.2-4.3.
The other minor change was required because glsl_parser_extras.cpp was
hard-coding the type of bitset temporaries as uint64_t, which (unlike
would have been the case if the uint64_t had been replaced with
e.g. an __int128) would otherwise have caused a build failure, because
the boolean conversion operator of bitset_t is marked explicit (if
C++11 is available), so the bitset won't be silently truncated down to
1 bit in order to use it to initialize the uint64_t temporaries
(yikes).
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
This effectively factorizes a couple of similar routines.
v2 (Neil Roberts): Non-trivial rebase on master
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Neil Roberts <nroberts@igalia.com>
Some symbols gathered in the symbols table during parsing are needed
later for the compile and link stages, so they are moved along the
process. Currently, only functions and non-temporary variables are
copied between symbol tables. However, the built-in gl_PerVertex
interface blocks are also needed during the linking stage (the last
step), to match re-declared blocks of inter-stage shaders.
This patch adds a new utility function that will factorize current code
that copies functions and variables between two symbol tables, and in
addition will copy explicitly declared gl_PerVertex blocks too.
The function will be used in a subsequent patch.
v2 (Neil Roberts):
Allow the src symbol table to be NULL and explicitly copy the
gl_PerVertex symbols in case they are not referenced in the exec_list.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Neil Roberts <nroberts@igalia.com>
There are two callers of the constructor, and they are right next to
each other. Move the "#anon_struct" name handling to the parser so that
the conditional can be removed.
I've also deleted part of the comment (about the memory leak) because I
don't think it's quite accurate or relevant.
text data bss dec hex filename
8310399 269336 294072 8873807 87674f 32-bit i965_dri.so before
8310339 269336 294072 8873747 876713 32-bit i965_dri.so after
7845611 346552 420592 8612755 836b93 64-bit i965_dri.so before
7845579 346552 420592 8612723 836b73 64-bit i965_dri.so after
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
There are two issues with the current implementation. First, it relies
on the layout(local_size_*) happening in the same shader as the main
function, and secondly it doesn't work for variable group sizes.
In both cases, the simplest fix is to move the setup of these derived
values to a later time, similar to how the gl_VertexID workarounds are
done. There already exist system values defined for both of the derived
values, so we use them unconditionally, and lower them after linking is
performed.
While we're at it, we move to using gl_LocalGroupSizeARB instead of
gl_WorkGroupSize for variable group sizes.
Also the dead code elimination avoidance can be removed, since there
can be situations where gl_LocalGroupSizeARB is needed but has not been
inserted for the shader with main function. As a result, the lowering
code has to insert its own copies of the system values if needed.
Reported-by: Stephane Chevigny <stephane.chevigny@polymtl.ca>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103393
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
c7affbf687 enabled GLSLOptimizeConservatively on some
drivers. The idea was to speed up compile times by running
the GLSL IR passes only once each time do_common_optimization()
is called. However loop unrolling can create a big mess and
with large loops can actually case compile times to increase
significantly due to a bunch of redundant if statements being
propagated to other IRs.
Here we make sure to clean things up before moving on.
There was no measureable difference in shader-db compile times,
but it makes compile times of some piglit tests go from a couple
of seconds to basically instant.
The shader-db results seemed positive also:
Totals:
SGPRS: 2829456 -> 2828376 (-0.04 %)
VGPRS: 1720793 -> 1721457 (0.04 %)
Spilled SGPRs: 7707 -> 7707 (0.00 %)
Spilled VGPRs: 33 -> 33 (0.00 %)
Private memory VGPRs: 3140 -> 2060 (-34.39 %)
Scratch size: 3308 -> 2180 (-34.10 %) dwords per thread
Code Size: 79441464 -> 79214616 (-0.29 %) bytes
LDS: 436 -> 436 (0.00 %) blocks
Max Waves: 558670 -> 558571 (-0.02 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Having this separate just makes the code harder to follow, and
requires an extra walk of the IR.
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
As a result, unnamed structs defined in different places of the program
are considered the same types if they have the same fields in the same
order.
This will simplify matching of global variables whose type is an unnamed
struct.
It also fixes a memory leak when the same shader containing unnamed
structs is compiled over and over again: instead of creating a new type
each time, the existing type is re-used.
Finally, this does have the effect that some previously rejected programs
are now accepted, such as:
struct {
float a;
} s1;
struct {
float a;
} s2;
s2 = s1;
C/C++ do not allow that, but GLSL does seem to want to treat unnamed
structs with the same fields as the same type at least during linking
(and apparently, some applications require it), so it seems odd to treat
them as different types elsewhere.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This changes the logic during the conversion of the declaration list
struct S {
...
} v;
from AST to IR, but should not change the end result.
When assigning the type of v, instead of looking `S' up in the symbol
table, we read the type from the member variable of ast_struct_specifier.
This change is necessary for the subsequent change to how anonymous types
are handled.
v2: remove a type override when redefining a structure; should be
the same type in that case anyway
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This adds bindless_sampler and bound_sampler (and respectively
bindless_image and bound_image) to the parser.
v3: - add an extra space in apply_bindless_qualifier_to_variable()
- fix indentation in merge_qualifier()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This also adds the extension to the standalone GLSL compiler.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This moves the hashing of shader source for the cache lookup to before
the preprocessor. In our experience, shaders are unlikely to hash the
same after preprocessing if they didn't hash the same before, so we can
skip preprocessing for cache hits.
Improves Deus Ex start-up times with a warm cache from ~30 seconds to
~22 seconds.
Also fixes the leaking of state.
V2: fix indentation
v3: add the value of MESA_EXTENSION_OVERRIDE to the hash of the shader.
Tested-by (v2): Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
Due to a max limit of 65,536 entries on the index table that we use to
decide if we can skip compiling individual shaders, it is very likely
we will have collisions.
To avoid doing too much work when the linked program may be in the
cache this patch delays calling the optimisations until link time.
Improves cold cache start-up times on Deus Ex by ~20 seconds.
When deleting the cache index to simulate a worst case scenario
of collisions in the index, warm cache start-up time improves by
~45 seconds.
V2: fix indentation, make sure to call optimisations on cache
fallback, make sure optimisations get called for XFB.
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This will allow to hash additional data into the cache keys or even
change the hashing algorithm easily, should we decide to do so.
v2: don't try to compute key (and crash) if cache is disabled
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>