This works by finding the first rvalue that it can lower using an
ir_rvalue_visitor. In that case it adds a conversion to float16
after each rvalue and a conversion back to float before storing
the assignment.
Also it uses a set to keep track of rvalues that have been
lowred already. The handle_rvalue method of the rvalue visitor doesn’t
provide any way to stop iteration. If we handle a value in
find_precision_visitor we want to be able to stop it from descending into
the lowered rvalue again.
Additionally this pass disallows converting nodes containing non-float.
The can_lower_rvalue function explicitly excludes any branches
that have non-float types except bools. This avoids the need to have
special handling for functions that convert to int or double.
Co-authored-by: Hyunjun Ko <zzoon@igalia.com>
v2. Adds lowering for texture samples
v3. Instead of checking whether each node can be lowered while walking the
tree, a separate tree walk is now done to check all of the nodes in a
single pass. The lowerable nodes are added to a set which is checked
during find_precision_visitor instead of calling can_lower_rvalue.
v4. Move the special case for temporaries to find_lowerable_rvalues. This
needs to be handled while checking for lowerable rvalues so that any
later dereferences of the variable will see the right precision.
v5. Add an override to visit ir_call instructions and apply the same
technique to override the precision of the temporary variable in the
same way as done for builtin temporaries and ir_assignment calls.
v6. Changes the pass so that it doesn’t need to lower an entire subtree in
order do perform a lowering. Instead, certain instructions can be
marked as being indepedent of their child instructions. For example,
this is the case with array dereferences. The precision of the array
index doesn’t have any bearing on whether things using the result of
the array deref can be lowered.
Now, only toplevel lowerable nodes are added to the lowerable_rvalues
instead instead of additionally adding all of the subnodes.
It now also only needs one hash table instead of two.
v7. Don’t try to lower sampler types. Instead, the sample instruction is
now treated as an independent point where the result of the sample can
be used in a lowered section. The precision of the sampler type
determines the precision of the sample instruction. This also means
the coordinates to the sampler can be lowered.
v8. Use f2fmp instead of f2f16.
v9. Disable lowering derivatives calcualtions, which might not work
properly on some hw backends.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885>
This is the same as ir_unop_f2f16 except that it comes with a promise
that it is safe to optimise it out if the result is immediately
converted back to float32 again. Normally this would be a lossy
operation but it is safe to do if the conversion was generated as part
of the precision lowering pass.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3929>
This is a more explicit name now that we don't want it to be doing any
memory barrier stuff for us.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
The GLSL barrier() intrinsic does an implicit shared memory barrier in
compute shaders and an implicit TCS patch output barrier in tessellation
control shaders. We'd like NIR's barrier intrinsic to just be a control
flow barrier and not have memory implications. To satisfy this, we need
to add an extra memory barrier in front of each nir_intrinsic_barrier.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
SPV_AMD_shader_image_load_store_lod allows to use a lod parameter
with OpImageRead, OpImageWrite and OpImageSparseRead.
According to the specification, this parameter should be a 32-bit
integer. It is initialized to 0 when no lod parameter is found
during SPIR-V->NIR translation.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
In order to be able to implement a NIR based glsl linker we need to
build the program resource list with NIR. This change delays the
remaping so that a later commit can call the NIR based resource
list builder.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Fixes a ton of regressions in image load store tests.
Fixes: 4319cc8c0f ("nir: pack nir_variable::data::xfb_*")
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Adds ir_binop_atan2 and ir_unop_atan. When converting to NIR these are
expanded out using the appropriate builtin generator. If they are used
with anything else then it will just hit an assert.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
From EXT_demote_to_helper_invocation, implemented with the existing
nir_intrinsic_is_helper_invocation.
Such builtin is necessary when using `demote` because we can't
redefine the value of gl_HelperInvocation (since it is an input
variable).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
To represent the new `demote` keyword when using
EXT_demote_to_helper_invocation extension. Most of the changes are to
include it in the visitors.
Demote is not considered a control flow, so also include an empty
visit member function in ir_control_flow_visitor.
Only NIR actually supports `demote`, so assert the translations for
TGSI and Mesa's gl_program -- since the demote is not expected to
appear for those.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This better matches all the other atomic intrinsics such as those for
SSBOs and shared variables where the sign is part of the intrinsic
opcode. Both generators (GLSL and SPIR-V) know the sign from the type
of the image variable or handle. In SPIR-V, signed min/max are separate
opcodes from unsigned.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
By optimizing the shader before inlining, we avoid having to redo this
work for each inlined copy of a function. It should also reduce the
memory consumption a bit.
This cuts the KHR-GL46.arrays_of_arrays_gl.SubroutineFunctionCalls2
runtime by 25% on my Icelake. That test compiles many shaders, which
contain large types (dmat4) and division (expensive operations).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We have a cap bit for gallium and a GLSL compiler flag to control this.
Just trust what GLSL gives us and stop forcing it. In order for this to
be safe, we have to advertise another cap in some of the gallium
drivers.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Most places in NIR, we treat matrices like arrays. The one annoying
exception to this has been nir_constant where a matrix is a first-class
thing. This commit changes that so a matrix nir_constant is the same as
an array nir_constant. This makes matrix nir_constants a tiny bit more
expensive but shrinks all others by 96B.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
We were completely ignoring these before, except for putting them on
variables. While we're here, don't set access qualifiers when converting
to bindless since glsl_to_nir will already have set a more accurate
qualifier that includes any qualifiers on struct members that are
dereferenced.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Improvements related to the patch that removed native_integers:
* In glsl_to_nir, special cases for i2f,u2f,etc are no longer needed
* In prog_to_nir, use sge/slt and let lower_scmp lower it if needed
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The difference between imov and fmov has been a constant source of
confusion in NIR for years. No one really knows why we have two or when
to use one vs. the other. The real reason is that they do different
things in the presence of source and destination modifiers. However,
without modifiers (which many back-ends don't have), they are identical.
Now that we've reworked nir_lower_to_source_mods to leave one abs/neg
instruction in place rather than replacing them with imov or fmov
instructions, we don't need two different instructions at all anymore.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Rob Clark <robdclark@chromium.org>
This flag has caused more confusion than good in most cases. You can
validly use imov for floats or fmov for integers because, without source
modifiers, neither modify their input in any way. Using imov for floats
is more reliable so we go that direction.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
glsl_to_nir.cpp:276: uninit_member: Non-static class member "sig" is not initialized in this constructor nor in any functions that it calls.
Reported by coverity
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
At initial nir level all drivers are supporting ints.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Driver which do not support native integers should use a lowering
pass to go from integers to floats.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
If we want to assert on found == true when the loop exits early, we
need to initialize it to false.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: remove & operator in a couple of memsets
add some memsets
v3: fixup lima
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
fixes retrieving the sampler type for bindless images stored inside structs.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2: add support for AMD
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Values inside the offsets parameter of textureGatherOffsets are required to be
constants in the range of [GL_MIN_PROGRAM_TEXTURE_GATHER_OFFSET,
GL_MAX_PROGRAM_TEXTURE_GATHER_OFFSET].
As this range is never outside [-32, 31] for all existing drivers inside mesa,
we can simply store the offsets as a int8_t[4][2] array inside nir_tex_instr.
Right now only Nvidia hardware supports this in hardware, so we can turn this
on inside Nouveau for the NIR path as it is already enabled with the TGSI one.
v2: use memcpy instead of for loops
add missing bits to nir_instr_set
don't show offsets if they are all 0
v3: default offsets aren't all 0
v4: rename offsets -> tg4_offsets
rename nir_tex_instr_has_explicit_offsets -> nir_tex_instr_has_explicit_tg4_offsets
Signed-off-by: Karol Herbst <kherbst@redhat.com>
We didn't have any of these before because all NIR consumers always
called lower_ubo_references. Soon, we want to pass the derefs straight
through to NIR so we need to handle these intrinsics directly.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We want to be able to use variables and derefs for UBO/SSBO access in
NIR. In order to do this, the rest of NIR needs to know the type layout
information.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
glsl_to_nir() is still missing support for converting certain
functions to NIR, so for those we use the GLSL IR optimisations
to remove the functions.
Reviewed-by: Eric Anholt <eric@anholt.net>
This doesn't really change anything as the functions will all get
inlined anyway. However it does let us do a bit of the work earlier and
in a common place.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Optimize mulExtended to use 32x32->64 multiplication.
Drivers which are not based on NIR, they can set the
MUL64_TO_MUL_AND_MUL_HIGH lowering flag in order to have same old
behavior.
v2: Add missing condition check (Jason Ekstrand)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Matt Turner <Matt Turner <mattst88@gmail.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
On GLSL that info is set as a layout qualifier when redeclaring
gl_FragCoord, so somehow tied to a specific variable. But in practice,
they behave as a global of the shader. On ARB programs they are set
using a global OPTION (defined at ARB_fragment_coord_conventions), and
on SPIR-V using ExecutionModes, that are also not tied specifically to
the builtin.
This patch moves that info from nir variable and ir variable to nir
shader and gl_program shader_info respectively, so the map is more
similar to SPIR-V, and ARB programs, instead of more similar to GLSL.
FWIW, shader_info.fs already had pixel_center_integer, so this change
also removes some redundancy. Also, as struct gl_program also includes
a shader_info, we removed gl_program::OriginUpperLeft and
PixelCenterInteger, as it would be superfluous.
This change was needed because recently spirv_to_nir changed the order
in which execution modes and variables are handled, so the variables
didn't get the correct values. Now the info is set on the shader
itself, and we don't need to go back to the builtin variable to set
it.
Fixes: e68871f6a ("spirv: Handle constants and types before execution
modes")
v2: (Jason)
* glsl_to_nir: get the info before glsl_to_nir, while all the rest
of the info gathering is happening
* prog_to_nir: gather the info on a general info-gathering pass,
not on variable setup.
v3: (Jason)
* Squash with the patch that removes that info from ir variable
* anv: assert that OriginUpperLeft is true. It should be already
set by spirv_to_nir.
* blorp: set origin_upper_left on its core "compile fragment
shader", not just on some specific places (for this we added an
helper on a previous patch).
* prog_to_nir: no need to gather specifically this fragcoord modes
as the full gl_program shader_info is copied.
* spirv_to_nir: assert that we are a fragment shader when handling
this execution modes.
v4: (reported by failing gitlab pipeline #18750)
* state_tracker: update too due changes on ir.h/gl_program
v5:
* blorp: minor change after change on previous patch
* radeonsi: update due this change.
v6: (Timothy Arceri)
* prog_to_nir: remove extra whitespace
* shader_info: don't use :1 on origin_upper_left
* glsl: program.fs.origin_upper_left/pixel_center_integer can be
move out of the shader list loop
nir_lower_clip_cull_distance_arrays() marks the combined clip/cull
distance array as compact. However, when translating in from GLSL
or SPIR-V, we were not marking the original float[] arrays as compact.
We should do so. That way, we can detect these corner cases properly.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>