This was added with ARB_enhanced_layouts.
V2: Add an extra format specifier for the new qualifier.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Valgrind detected that variable ir_copy_propagation_visitor::killed_all
is uninitialized.
Signed-off-by: Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0x9b@gmail.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
The order of optimizations can lead to the conditional discard optimization
being applied twice to the same discard statement. In this case, we must
ensure that both conditions are applied.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96762
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These are only used by get_matching_input() which has been call
at this point so free the hash tables.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
I do appreciate the cleverness, but unfortunately it prevents a lot more
cleverness in the form of additional compiler optimizations brought on
by -fstrict-aliasing.
No difference in OglBatch7 (n=20).
Co-authored-by: Davin McCall <davmac@davmac.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously we were only restricting based on ES/non-ES-ness and whether
the overall enable bit had been flipped on. However we have been adding
more fine-grained restrictions, such as based on compat profiles, as
well as specific ES versions. Most of the time this doesn't matter, but
it can create awkward situations and duplication of logic.
Here we separate the main extension table into a separate object file,
linked to the glsl compiler, which makes use of it with a custom
function which takes the ES-ness of the shader into account (thus
allowing desktop shaders to properly use ES extensions that would
otherwise have been disallowed.) We can also now use this logic to
generate #define's for all supported extensions automatically, removing
the duplicate (and often inaccurate) list in glcpp.
The effect of this change should be nil in most cases. However in some
situations, extensions like GL_ARB_gpu_shader5 which were formerly
available in compat contexts on the GLSL side of things will now become
inaccessible.
This regresses two ES CTS tests:
ES3-CTS.shaders.shader_integer_mix.define
ES31-CTS.shader_integer_mix.define
however that is due to them using #version 100 instead of 300 es. As the
extension is only defined for ES3, I believe this is the correct
behavior.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v2)
v2 -> v3: integrate glcpp defines into the same mechanism
"flat centroid" and "flat sample" both just mean "flat", so we should
ignore interpolateAtCentroid/Sample and just return the flat value.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97032
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
For lod query instructions, we really don't care whether or not the sampler
is an array type because that doesn't factor into the LOD.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
On i965, we can't support coordinate offsets for texelFetch or rectangle
textures. Previously, we were doing this with a GLSL pass but we need to
do it in NIR if we want those workarounds for SPIR-V.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
While SPIR-V technically doesn't support "old style" shadow, the
shadow-compare gather instruction does return a vec4 so we need to be able
to set the old_style_shadow bit in NIR.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
We can't get an lod with txf_ms and SPIR-V considers textureGrad to be an
explicit-LOD texturing instruction.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
subroutine variables are to be used just in the way functions are
called. Although the spec doesn't say it explicitely, this means that
these variables are not to be used in any other way than those left
for function calls. Therefore, a comparison between 2 subroutine
variables should also cause a compilation error.
From The OpenGL® Shading Language 4.40, page 117:
" To use subroutines, a subroutine type is declared, one or more
functions are associated with that subroutine type, and a
subroutine variable of that type is declared. The function
currently assigned to the variable function is then called by
using function calling syntax replacing a function name with the
name of the subroutine variable. Subroutine variables are
uniforms, and are assigned to specific functions only through
commands (UniformSubroutinesuiv) in the OpenGL API."
From The OpenGL® Shading Language 4.40, page 118:
" Subroutine uniform variables are called the same way functions
are called. When a subroutine variable (or an element of a
subroutine variable array) is associated with a particular
function, all function calls through that variable will call that
particular function."
Fixes GL44-CTS.shader_subroutine.subroutines_cannot_be_assigned_float_int_values_or_be_compared
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Commit 52e75dcb8c made nir_lower_io
start using nir_intrinsic_set_base instead of writing const_index[0]
directly. However, those intrinsics apparently don't /have/ a base,
so this caused assert failures.
However, the old code was happily setting non-existent const_index
fields, so it was pretty bogus too.
Jason pointed out that load_shared and store_shared have a base,
and that the i965 driver uses that field. So presumably atomics
should have one as well, so that loads/stores/atomics all refer
to variables with consistent addressing.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
We can still do packing we just need to get the packing type from the consumer
rather than the producer.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97033
This makes sure we give the correct driver location
for doubles when using component packing. Specifically
it handles packing a dvec3 with a double which is the
only packing scenario allowed which spans across two
locations.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Now nir_lower_io can optionally produce load_interpolated_input
and load_barycentric_* intrinsics for fragment shader inputs.
flat inputs continue using regular load_input.
v2: Use a nir_shader_compiler_options flag rather than ad-hoc boolean
passing (in response to review feedback from Chris Forbes).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Backends can normally handle shader inputs solely by looking at
load_input intrinsics, and ignore the nir_variables in nir->inputs.
One exception is fragment shader inputs. load_input doesn't capture
the necessary interpolation information - flat, smooth, noperspective
mode, and centroid, sample, or pixel for the location. This means
that backends have to interpolate based on the nir_variables, then
associate those with the load_input intrinsics (say, by storing a
map of which variables are at which locations).
With GL_ARB_enhanced_layouts, we're going to have multiple varyings
packed into a single vec4 location. The intrinsics make this easy:
simply load N components from location <loc, component>. However,
working with variables and correlating the two is very awkward; we'd
much rather have intrinsics capture all the necessary information.
Fragment shader input interpolation typically works by producing a
set of barycentric coordinates, then using those to do a linear
interpolation between the values at the triangle's corners.
We represent this by introducing five new load_barycentric_* intrinsics:
- load_barycentric_pixel (ordinary variable)
- load_barycentric_centroid (centroid qualified variable)
- load_barycentric_sample (sample qualified variable)
- load_barycentric_at_sample (ARB_gpu_shader5's interpolateAtSample())
- load_barycentric_at_offset (ARB_gpu_shader5's interpolateAtOffset())
Each of these take the interpolation mode (smooth or noperspective only)
as a const_index, and produce a vec2. The last two also take a sample
or offset source.
We then introduce a new load_interpolated_input intrinsic, which
is like a normal load_input intrinsic, but with an additional
barycentric coordinate source.
The intention is that flat inputs will still use regular load_input
intrinsics. This makes them distinguishable from normal inputs that
need fancy interpolation, while also providing all the necessary data.
This nicely unifies regular inputs and interpolateAt functions.
Qualifiers and variables become irrelevant; there are just
load_barycentric intrinsics that determine the interpolation.
v2: Document the interp_mode const_index value, define a new
BARYCENTRIC() helper rather than using SYSTEM_VALUE() for
some of them (requested by Jason Ekstrand).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For intrinsics we don't care about, just skip to the next loop iteration
and process the next instruction. We don't want to execute the rest of
the code.
This was a bug in commit cdfc05ea6e.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
I noticed this when I tried to do frexp(float(some_unsigned)) in the
ir_unop_find_lsb lowering pass. The code generated for frexp() uses
fabs, and this resulted in an extra instruction. Ultimately I ended up
not using frexp.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This isn't the lowering pass you want. Most GPUs that can support GLSL
1.30 have a multiply unit that can do something more interesting than
32x32->32. Many have 32x16->48. Any GPU that does, should do the
lowering in the backend. This is just the thing that will always work.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
At this point there is no reason not to be using the linked shaders,
using the linked shaders should be faster and will make things simpler
for upcoming shader cache work.
The previous variable name suggests the linked shaders were intended
to be used here anyway.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Likewise, rename the enum type to glsl_interp_mode.
Beyond the GLSL front-end, talking about "interpolation modes" seems
more natural than "interpolation qualifiers" - in the IR, we're removed
from how exactly the source language specifies how to interpolate an
input. Also, SPIR-V calls these "decorations" rather than "qualifiers".
Generated by:
$ find . -regextype egrep -regex '.*\.(c|cpp|h)' -type f -exec sed -i \
-e 's/INTERP_QUALIFIER_/INTERP_MODE_/g' \
-e 's/glsl_interp_qualifier/glsl_interp_mode/g' {} \;
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Dave Airlie <airlied@redhat.com>
I recently refactored this to share code between load and atomic
lowering. loads used intrin->num_components, while atomics used
intrin->dest.ssa.num_components. They should be equivalent, but
Jason wanted me to use the latter. I missed applying his review.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This is more readable and also offers assertions that protect against
setting const_index fields on the wrong kind of intrinsic.
Suggested by Jason.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The original function was becoming a bit hard to read, with the details
of creating and filling out load/store/atomic atomics all in one
function.
This patch makes helpers for creating each type of intrinsic, and also
combines them with the *_op() helpers, as they're closely coupled and
not too large.
v2: Minor style nits from Jason.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This can't happen, the caller asserts that mode is shader_out or shared.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Both loads and atomics had identical code to rewrite destinations,
and all cases had the same two lines to replace instructions.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>