This was last used with Mesa classic, in _mesa_ir_link_shader().
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15623>
This was last used by i915c; lower_fpow now covers this class of
lowering.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15623>
Nothing here that NIR doesn't do. No effect on shader-db of hsw or
softpipe.
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14249>
Nothing uses it any more; i965 was the last user. Even if I enable it for
softpipe or crocus, it quickly causes NIR validation failures in shader-db
from swizzles outside the bounds of vectors. Retire it in favor of
nir_opt_vectorize().
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14249>
Doing so allows you to easily tell what the pass did using the existing
infrastructure in the OPT macro.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10292>
Version 4.4 of the GLSL spec changed the definition of noise*() to
always return zero, and earlier versions of the spec allowed zero as a
valid implementation.
All drivers, as far as I can tell, unconditionally call lower_noise()
today, which turns ir_unop_noise into zero. We've got a 10-year-old
comment in there saying "In the future, ir_unop_noise may be replaced by
a call to a function that implements noise." Well, it's the future now
and we've not yet gotten around to that. In the meantime, the GLSL spec
has made doing so illegal.
To make things worse, we then pretend to handle the opcode in
glsl_to_nir, ir_to_mesa, and st_glsl_to_tgsi even though it should never
get there given the lowering. The lowering in st_glsl_to_tgsi defines
noise*() to be 0.5 which is an illegal implementation of the noise
functions according to pre-4.4 specs. We also have opcodes for this in
NIR which are never used because, again, we always call lower_noise().
Let's just kill the whole opcode and make builtin_builder.cpp build a
bunch of functions that just return zero.
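For illustration, the new built-ins behave as if they were written in
GLSL like this (a sketch of the behaviour, not the actual
builtin_builder.cpp code; only a few of the overloads are shown):

    float noise1(float x) { return 0.0; }
    float noise1(vec2  x) { return 0.0; }
    vec2  noise2(float x) { return vec2(0.0); }
    vec3  noise3(vec3  x) { return vec3(0.0); }
    vec4  noise4(vec4  x) { return vec4(0.0); }
    /* ...and so on for the remaining float/vec2/vec3/vec4 overloads. */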
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4624>
Previously, ir_calls to builtin functions were replaced with the inline
implementation immediately after being added to the instruction list.
This patch replaces that with a separate pass that lowers them after the
conversion from AST to IR is complete. This makes it possible to insert
handling for the precision lowering pass before the inlining, which is
needed because the precision of the operations in the inlined
implementation depends on the highest precision of all of the arguments
to the call.
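A small GLSL ES example of the rule in question (illustrative shader,
not from the tree):

    #version 300 es
    precision mediump float;
    uniform mediump float a;
    uniform highp float b;
    out vec4 color;
    void main() {
        /* Per the ES precision rules, a built-in call operates at the
         * highest precision of its arguments, so once min() is inlined
         * its body has to be evaluated at highp even though the result
         * is stored into a mediump variable. */
        mediump float m = min(a, b);
        color = vec4(m);
    }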
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885>
This works by finding the first rvalue that it can lower using an
ir_rvalue_visitor. In that case it adds a conversion to float16
after each rvalue and a conversion back to float before storing
the assignment.
It also uses a set to keep track of rvalues that have already been
lowered. The handle_rvalue method of the rvalue visitor doesn’t
provide any way to stop iteration, so if we handle a value in
find_precision_visitor we want to be able to stop it from descending
into the lowered rvalue again.
Additionally, this pass disallows converting nodes containing non-float
types. The can_lower_rvalue function explicitly excludes any branches
that have non-float types other than bools. This avoids the need for
special handling of functions that convert to int or double.
Co-authored-by: Hyunjun Ko <zzoon@igalia.com>
v2. Adds lowering for texture samples
v3. Instead of checking whether each node can be lowered while walking the
tree, a separate tree walk is now done to check all of the nodes in a
single pass. The lowerable nodes are added to a set which is checked
during find_precision_visitor instead of calling can_lower_rvalue.
v4. Move the special case for temporaries to find_lowerable_rvalues. This
needs to be handled while checking for lowerable rvalues so that any
later dereferences of the variable will see the right precision.
v5. Add an override to visit ir_call instructions and apply the same
technique to override the precision of the temporary variable in the
same way as done for builtin temporaries and ir_assignment calls.
v6. Changes the pass so that it doesn’t need to lower an entire subtree in
order to perform a lowering. Instead, certain instructions can be
marked as being independent of their child instructions. For example,
this is the case with array dereferences. The precision of the array
index doesn’t have any bearing on whether things using the result of
the array deref can be lowered.
Now, only toplevel lowerable nodes are added to the lowerable_rvalues
set instead of additionally adding all of the subnodes.
It now also only needs one hash table instead of two.
v7. Don’t try to lower sampler types. Instead, the sample instruction is
now treated as an independent point where the result of the sample can
be used in a lowered section. The precision of the sampler type
determines the precision of the sample instruction. This also means
the coordinates to the sampler can be lowered.
v8. Use f2fmp instead of f2f16.
v9. Disable lowering of derivative calculations, which might not work
properly on some hw backends.
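As a rough illustration of what the pass does to a simple mediump
expression (hypothetical shader; the conversions noted in the comment
are inserted at the GLSL IR level, not in the source):

    #version 300 es
    precision mediump float;
    in vec2 uv;
    uniform vec2 scale;
    out vec4 color;
    void main() {
        /* The multiply below is a lowerable rvalue: the pass converts
         * its operands to float16 (f2fmp, per v8 above), does the math
         * at reduced precision, and converts the result back to float
         * before the assignment stores it. */
        color = vec4(uv * scale, 0.0, 1.0);
    }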
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885>
When varying packing is disabled for transform feedback and an xfb
declaration points to an array element or structure member, the
element/member should be aligned to the start of a slot as well.
If that's not the case, a new varying is created and the
element/member value is copied.
There might be a way to further optimize the number of slots allocated
or the number of copies necessary if the performance cost is
problematic; for example, in some cases simply padding the top-level
variable might correctly align all the captured values.
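For example (hypothetical GLSL 4.40 shader using ARB_enhanced_layouts
style xfb qualifiers), only one member of an interface block is
captured:

    #version 440
    layout(xfb_buffer = 0) out;
    out VertexData {
        vec4 color;                            /* not captured */
        layout(xfb_offset = 0) vec2 texcoord;  /* captured for xfb */
    } v_out;
    void main() {
        v_out.color    = vec4(1.0);
        v_out.texcoord = vec2(0.5);
        gl_Position    = vec4(0.0);
    }

With packing disabled, the captured member either already starts at a
slot boundary or gets copied into a new varying that does.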
Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2433>
Some drivers (e.g. Panfrost) don't support packing of varyings when
used for transform feedback. This new constant ensures that any
varying used for xfb is aligned at the start of a slot and won't be
packed with other varyings.
Scenarios where transform feedback declarations are related to an
array element or a struct member will be handled in a subsequent
patch.
Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> (Fix order of arguments to varying_matches())
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2433>
Optimize mulExtended to use 32x32->64 multiplication.
Drivers that are not based on NIR can set the MUL64_TO_MUL_AND_MUL_HIGH
lowering flag in order to keep the old behavior.
v2: Add missing condition check (Jason Ekstrand)
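For reference, a minimal GLSL 4.00 use of the affected built-in
(illustrative shader). After this change the IR behind umulExtended()
and imulExtended() is a single 32x32->64 multiply whose high and low
halves feed msb and lsb, rather than separate mul/mul_high operations:

    #version 400
    uniform uint a, b;
    flat out uvec2 result;
    void main() {
        uint msb, lsb;
        umulExtended(a, b, msb, lsb);  /* full 32x32 -> 64-bit product */
        result = uvec2(msb, lsb);      /* high and low 32 bits */
        gl_Position = vec4(0.0);
    }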
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Now that the elements version handles both cases, remove the
non-elements version.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
- remove mtypes.h from most header files
- add main/menums.h for often used definitions
- remove main/core.h
v2: fix radv build
Reviewed-by: Brian Paul <brianp@vmware.com>
This requires passing an extra argument to the lowering pass because
the KHR_blend_equation_advanced specification doesn't seem to define
any mechanism for the implementation to determine at compile-time
whether coherent blending can ever be used (not even an "#extension
KHR_blend_equation_advanced_coherent" directive seems to be required
in the shader source AFAICT).
In the long run we'll probably want to do state-dependent recompiles
based on the value of ctx->Color.BlendCoherent, but right now there
would be no benefit from that: the only driver that supports coherent
framebuffer fetch is i965 on SKL+ hardware, which cannot support the
non-coherent path for the moment because of texture layout issues, so
framebuffer fetch coherency is always enabled there.
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
There are two issues with the current implementation. First, it relies
on the layout(local_size_*) declaration happening in the same shader as
the main function, and second, it doesn't work for variable group sizes.
In both cases, the simplest fix is to move the setup of these derived
values to a later time, similar to how the gl_VertexID workarounds are
done. There already exist system values defined for both of the derived
values, so we use them unconditionally, and lower them after linking is
performed.
While we're at it, we move to using gl_LocalGroupSizeARB instead of
gl_WorkGroupSize for variable group sizes.
Also, the dead code elimination avoidance can be removed, since there
can be situations where gl_LocalGroupSizeARB is needed but has not been
inserted for the shader with the main function. As a result, the lowering
code has to insert its own copies of the system values if needed.
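An illustrative variable-group-size compute shader; gl_GlobalInvocationID
is one of the derived values that the post-link lowering now computes,
in this case from gl_LocalGroupSizeARB:

    #version 430
    #extension GL_ARB_compute_variable_group_size : require
    layout(local_size_variable) in;
    layout(std430, binding = 0) buffer Buf { uint data[]; };
    void main() {
        /* With a variable group size, gl_GlobalInvocationID has to be
         * derived from gl_LocalGroupSizeARB rather than the compile-time
         * gl_WorkGroupSize, which is why the lowering is deferred until
         * after linking. */
        data[gl_GlobalInvocationID.x] = gl_LocalInvocationIndex;
    }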
Reported-by: Stephane Chevigny <stephane.chevigny@polymtl.ca>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103393
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Unlike uniforms, the limit on shared memory size is not called out
explicitly in the list of things that cause linker errors, but presumably
that's just an oversight in the spec.
Fixes dEQP-GLES31.functional.debug.negative_coverage.{callbacks,get_error,log}.compute.exceed_shared_memory_size_limit
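For instance, a compute shader along these lines must now fail to link
on an implementation whose MAX_COMPUTE_SHARED_MEMORY_SIZE is the minimum
32768 bytes (sizes chosen purely for illustration):

    #version 310 es
    layout(local_size_x = 1) in;
    shared vec4 big[4096];  /* 4096 * 16 bytes = 65536 bytes of shared storage */
    void main() {
        big[gl_LocalInvocationIndex] = vec4(0.0);
    }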
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is basically a wash now, but it simplifies later patches that
convert to using ir_builder.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Here we also make use of the UseSTD430AsDefaultPacking constant
and call the new get_internal_ifc_packing() helper.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Throughout the glsl headers we had an odd mix of guards, be it
"ifndef", "pragma once", neither, or both.
Simplify things by using the more common one (ifndef) and annotating
all the sources, barring the generated builtin header -
builtin_int64.h.
The final header - udivmod64.h - is [seemingly] unused and on its way
out (a patch to purge it is on the mailing list).
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Vedran Miletić <vedran@miletic.net>
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
v2: Rename lower_64bit.cpp and lower_64bit_test.cpp to lower_int64.
Suggested by Matt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Instead of packing varyings into vec4's, keep track of how many
components each slot uses and create varyings with matching types. This
ensures that we don't end up using more components than the original
shader, which is especially important for geometry shader output limits.
This comes up for NVIDIA hw, where the limit is 1024 output components
for a GS, and the hardware complains *loudly* if you even think about
going over.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Many GPUs cannot handle GL_KHR_blend_equation_advanced natively, and
need to emulate it in the pixel shader. This lowering pass implements
all the necessary math for advanced blending. It fetches the existing
framebuffer value using the MESA_shader_framebuffer_fetch built-in
variables, and the previous commit's state var uniform to select
which equation to use.
This is done at the GLSL IR level to make it easy for all drivers to
implement the GL_KHR_blend_equation_advanced extension and share code.
Drivers need to hook up MESA_shader_framebuffer_fetch functionality:
1. Hook up the fb_fetch_output variable
2. Implement BlendBarrier()
Then to get KHR_blend_equation_advanced, they simply need to:
3. Disable hardware blending based on ctx->Color._AdvancedBlendEnabled
4. Call this lowering pass.
Very little driver specific code should be required.
v2: Handle multiple output variables per render target (which may exist
due to ARB_enhanced_layouts), and array variables (even with one
render target, we might have out vec4 color[1]), and non-vec4
variables (it's easier than finding spec text to justify not
handling it). Thanks to Francisco Jerez for the feedback.
v3: Lower main returns so that we have a single exit point where we
can add our blending epilogue (caught by Francisco Jerez).
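As a rough sketch of the math the epilogue ends up computing, here is
the "multiply" equation written as plain GLSL, using
EXT_shader_framebuffer_fetch syntax in place of Mesa's internal
fb_fetch_output variable (illustrative only, not the pass's actual
output):

    #version 310 es
    #extension GL_EXT_shader_framebuffer_fetch : require
    precision mediump float;
    layout(location = 0) inout vec4 color;  /* framebuffer value on entry */
    void main() {
        vec4 dst = color;      /* fetched destination color */
        vec4 src = vec4(0.5);  /* whatever the shader originally wrote */
        /* Un-premultiply, apply f(Cs,Cd) for "multiply", then recombine
         * with the KHR_blend_equation_advanced blending formula. */
        vec3 Cs = src.a > 0.0 ? src.rgb / src.a : vec3(0.0);
        vec3 Cd = dst.a > 0.0 ? dst.rgb / dst.a : vec3(0.0);
        vec3 f  = Cs * Cd;
        float A = src.a + dst.a * (1.0 - src.a);
        vec3  C = f  * src.a * dst.a
                + Cs * src.a * (1.0 - dst.a)
                + Cd * dst.a * (1.0 - src.a);
        color = vec4(C, A);
    }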
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This isn't the lowering pass you want. Most GPUs that can support GLSL
1.30 have a multiply unit that can do something more interesting than
32x32->32. Many have 32x16->48. Any GPU that does should do the
lowering in the backend. This is just the thing that will always work.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
There are two distinctly different uses of this struct. The first
is to store GL shader objects. The second is to store information
about a shader stage that has been linked.
The two uses actually share few fields and there is clearly confusion
about their use. For example, the linked shaders map one-to-one with a
program, so they can simply be destroyed along with the program.
However, previously we were reference counting the linked shaders.
We were also creating linked shaders with a name even though it is
always 0, and calling the driver version of the _mesa_new_shader()
function unnecessarily for GL shader objects.
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
This prevents array overflow when the block is actually an array of UBOs or
SSBOs. On some hardware such as i965, such overflows can cause GPU hangs.
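The case in question looks roughly like this (hypothetical fragment
shader; the block index derived from 'idx' is now clamped to the
declared bounds):

    #version 430
    layout(std140, binding = 0) uniform Block { vec4 v; } blocks[4];
    uniform int idx;
    out vec4 color;
    void main() {
        /* If idx is out of range at run time, the clamped block index
         * keeps the access within blocks[0..3] instead of reading past
         * the end of the UBO binding array. */
        color = blocks[idx].v;
    }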
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The last version of this broke clipping, and I had to spend
some time getting this working properly.
I had to introduce a third pass to count the clip/cull totals,
all due to one messy corner case. We have a piglit test,
tes-input-gl_ClipDistance.shader_test,
that doesn't actually output the clip distances, it just passes
them like a varying from TCS->TES. The older lowering pass handled
this, but to lower clip/cull we need to know the total number of
clip+cull distances used to define the new variable correctly, and to
offset culls properly.
This adds an extra pass that works out the sizes for clip/cull,
then lowers gl_ClipDistance, then gl_CullDistance, into the new
gl_ClipDistanceMESA.
The pass uses the fixed array sizes code to check whether the array
has been referenced or is actually never used, and ignores it in the
latter case.
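For example, with a vertex shader like this (illustrative), the sizing
pass records 4 clip and 2 cull distances, so the combined total is
4 + 2 = 6 and the cull writes end up offset by 4 in the new
gl_ClipDistanceMESA variable:

    #version 450
    out float gl_ClipDistance[4];
    out float gl_CullDistance[2];
    void main() {
        gl_Position = vec4(0.0);
        for (int i = 0; i < 4; i++)
            gl_ClipDistance[i] = 1.0;
        gl_CullDistance[0] = 1.0;
        gl_CullDistance[1] = 1.0;
    }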
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>