We originally had a single lower_fmod option. In commit 2ab2d2e5, Sam
split 32 and 64-bit lowering into separate flags, with the rationale
that some drivers might want different options there. This left 16-bit
unhandled, so Iago added a lower_fmod16 option in commit ca31df6f.
Now that lower_fmod64 is gone (in favor of nir_lower_doubles and
nir_lower_dmod), we re-combine lower_fmod16 and lower_fmod32 into a
single lower_fmod flag again. I'm not aware of any hardware which
need lowering for one bitsize and not the other.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
nir_lower_doubles offers a wide variety of fp64 lowering, including
lowering fmod@64. The version there also better handles imprecisions
due to lowered frcp@64. Let's consolidate on one version.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Instead, we add a new helper which stomps one nir_shader and replaces it
with another. The new helper effectively just changes which pointer
gets used for the base nir_shader. It should be 99% as good at testing
cloning but without requiring that everything handle having the shader
swapped out from under it constantly.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108957
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rob Clark <robdclark@chromium.org>
This pattern was noticed in glmark's jellyfish scene.
v2: Add inexact qualifier due to NaN behaviour.
Minimal shader-db changes (slightly helped).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
This corresponds to commit 8b911bd2ba37677037b38c9bd286c7c05701bcda on
GitHub.
We previously tweaked OpenCL.std.h from upstream to be included in C
code. Now upstream header can be included, however the symbol names
are slightly different (include an OpenCLstd_ prefix), so this patch
also fixes vtn_opencl.c to use those.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
This corresponds to 8b911bd2ba37677037b38c9bd286c7c05701bcda in
https://github.com/KhronosGroup/SPIRV-Headers.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This is the same as SpvOpCopyObject but without the type checking,
which is how vtn_composite_copy works, so we just need to hook the
operation.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
SPIR-V 1.4 supports OpSelect over any composite type, and also allows
scalar boolean condition for vector types -- a case which we already
handled to support old GLSLang.
Added a helper function to recursively perform nir_bcsel, that makes
easier to support structs.
v2: Replace asserts() with vtn_fail_if(). (Jason)
v3: Simplify Condition and Result types verifications. (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: Fix comparing addresses from formats that have more than one
component by using nir_ball_iequal(). (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes: c1275052 "nir: add type information to load uniform/input and store output intrinsics"
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Tested-by: Erico Nunes <nunes.erico@gmail.com>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Now that the type gathering function look at instructions that might
have other types, return invalid type instead of crashing. That
invalid will be properly ignored later.
Fixes: c12750527b "nir: add type information to load uniform/input and store output intrinsics"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Removes the bool_to_float logic from the int_to_float pass, so that both
can be used separately. By having separate passes we have better validation
and it makes it possible to use with the lower_ftrunc option (int lowering
generates ftrunc, but lower_ftrunc generates bools, ftrunc lowering should
probably be reworked). For now we always expect lower_bool to come after
lower_int.
Also fixes f2i32 to become ftrunc and adds u2f/f2u cases.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
It is treated like the vecN instructions which also have no type.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Add a "lower_bitshift" option, which disables optimizations introducing
bitshifts and lowers ishl by constant to a multiply, so that we don't have
to deal with bitshifts in int_to_float lowering.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Consts and undefs can be used as different types (common with "0" constant)
so don't copy types from consts/undefs, only to them. It doesn't entirely
solve the problem that the type given to the const could be wrong , but
now the only realistic case is with "0" which is the same when casted to
float, so it doesn't matter for lower_int_to_float.
The other change is to get type information for load input/uniform and
store output, and use that to get correct results.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This type information will be used by gather_ssa_types to get usable results
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Improvements related to the patch that removed native_integers:
* In glsl_to_nir, special cases for i2f,u2f,etc are no longer needed
* In prog_to_nir, use sge/slt and let lower_scmp lower it if needed
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Before this change, we were searching for each instruction twice, once
when checking if it exists and once when figuring out where to insert
it. By using the new function, we can do everything we need to do in one
operation.
Compilation time numbers for my shader-db database:
Difference at 95.0% confidence
-4.04706 +/- 0.669508
-0.922142% +/- 0.151948%
(Student's t, pooled s = 0.95824)
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
On some architectures, Boolean values used to control conditional
branches or condtional selection must be propagated into a flag. This
generally means that a stored Boolean value must be compared with zero.
Rather than force the generation of extra compares with zero, re-emit
the original comparison instruction. This can save register pressure by
not needing to store the Boolean value.
There are several possible ares for future improvement to this pass:
1. Be more conservative. If both sources to the comparison instruction
are non-constants, it may be better for register pressure to emit the
extra compare. The current shader-db results on Intel GPUs (next
commit) lead me to believe that this is not currently a problem.
2. Be less conservative. Currently the pass requires that all users of
the comparison match the pattern. The idea is that after the pass is
complete, no instruction will use the resulting Boolean value. The only
uses will be of the flag value. It may be beneficial to relax this
requirement in some cases.
3. Be less conservative. Also try to rematerialize comparisons used for
discard_if intrinsics. After changing the way the Intel compiler
generates cod e for discard_if (see MR!935), I tried implementing this
already. The changes were pretty small. Instructions were helped in 19
shaders, but, overall, cycles were hurt. A commit "nir: Rematerialize
comparisons for nir_intrinsic_discard_if too" is on my fd.o cgit.
4. Copy the preceeding ALU instruction. If the comparison is a
comparison with zero, and it is the only user of a particular ALU
instruction (e.g., (a+b) != 0.0), it may be a further improvment to also
copy the preceeding ALU instruction. On Intel GPUs, this may enable
cmod propagation to make additional progress.
v2: Use much simpler method to get the prev_block for an if-statement.
Suggested by Tim.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Found with Jasons new metadata rework (https://gitlab.freedesktop.org/mesa/mesa/merge_requests/950).
Fixes: af355aaa07 "nir: add nir_opt_move_load_ubo() optimization pass"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Because the core principle of the vars_to_ssa pass is that it globally
(within a function) looks at all of the uses of a never-indirected path
and does a full into-SSA on that path, it can't handle a path which has
any chance of having aliasing. If a function_temp variable has a cast
or anything else which may cause aliasing, we have to assume that all
paths to that variable may alias and ignore the entire variable.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We're about to change the meaning of get_deref_node returning NULL so we
need a non-NULL value to mean properly undefined.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This lets passes easily detect derefs which have uses that fall outside
the standard load/store/copy pattern so they can bail appropriately.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
When we inlined cf_node_has_side_effects into node_is_dead, all the
conditions flipped and we forgot to flip one. Fortunately, it doesn't
matter right now because no one uses this pass on shaders with more than
one function.
Fixes: b50465d197 "nir/dead_cf: Inline cf_node_has_side_effects"
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
When creating function parameters, we create pointers from ssa
values, this creates nir casts with stride 0, however we have
no where else to get this value from. Later passes to lower
explicit io need this stride value to do the right thing.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
This mode is used by PhysicalStorageBufferEXT storage class.
Fixes: 8bdf5a008b "nir: Allow derefs to be used as phi sources"
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Silence two unused var warnings. And init elem_size, elem_align to
zero to silence "maybe uninitialized" warnings.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
spirv_to_nir() returned the nir_function corresponding to the
entrypoint, as a way to identify it. There's now a bool is_entrypoint
in nir_function and also a helper function to get the entry_point from
a nir_shader.
The return type reflects better what the function name suggests. It
also helps drivers avoid the mistake of reusing internal shader
references after running NIR_PASS on it. When using NIR_TEST_CLONE or
NIR_TEST_SERIALIZE, those would be invalidated right in the first pass
executed.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It is possible and valid for a pointer to be selected based on a
conditional before used, and depending on the mode, those cases will
result in a phi with derefs as sources.
To achieve this, we don't rematerialize derefs that are used by phis.
As a consequence, when converting from SSA to regs, we may have phis
that come from different blocks and are used by phis. We now convert
those to regs too.
Validation was added to ensure only derefs of certain modes can be
used as phi sources. No extra validation is needed for the presence
of cast, any instruction that uses derefs will validate the
deref-chain is complete (ending in a cast or a var).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We scalarize IO to enable further optimizations, such as propagating
constant components across shaders, eliminating dead components, and
so on. This patch attempts to re-vectorize those operations after
the varying optimizations are done.
Intel GPUs are a scalar architecture, but IO operations work on whole
vec4's at a time, so we'd prefer to have a single IO load per vector
rather than 4 scalar IO loads. This re-vectorization can help a lot.
Broadcom GPUs, however, really do want scalar IO. radeonsi may want
this, or may want to leave it to LLVM. So, we make a new flag in the
NIR compiler options struct, and key it off of that, allowing drivers
to pick. (It's a bit awkward because we have per-stage settings, but
this is about IO between two stages...but I expect drivers to globally
prefer one way or the other. We can adjust later if needed.)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The difference between imov and fmov has been a constant source of
confusion in NIR for years. No one really knows why we have two or when
to use one vs. the other. The real reason is that they do different
things in the presence of source and destination modifiers. However,
without modifiers (which many back-ends don't have), they are identical.
Now that we've reworked nir_lower_to_source_mods to leave one abs/neg
instruction in place rather than replacing them with imov or fmov
instructions, we don't need two different instructions at all anymore.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Rob Clark <robdclark@chromium.org>
Unless source modifiers are present, fmov and imov are the same.
There's no good reason for having two helpers.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
It's potentially a tiny bit less efficient but the helpers make it much
easier to sort out the rules for updating source modifiers.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This flag has caused more confusion than good in most cases. You can
validly use imov for floats or fmov for integers because, without source
modifiers, neither modify their input in any way. Using imov for floats
is more reliable so we go that direction.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This greatly simplifies the code to calculate if we should add a
buffer to the resource list. This uses the spec rules and simple
math to decide if we should add the buffer rather than complex
string processing.
This patch refines a patch present in the ARB_gl_spriv merge
request for the NIR linker and applies it to the GLSL IR linker.
This is why we also move the function to the shared linker code,
because we will want to reuse the code for the NIR linker also.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
SPV_GOOGLE_decorate_string and SPV_GOOGLE_hlsl_functionality1 were
incorporated to SPIR-V. Let's pick the names used by SPIR-V core.
Reviewed-by: Karol Herbst <kherbst@redhat.com>