Complement the existing un/pack_32_2x16 opcodes. These are useful for
8-bit format packing. On Midgard, they are equivalent to just a 32-bit
move, but other GPUs could lower to other packs if needed.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5107>
Corresponds to the .pos modifier on all Mali GPUs (lima and panfrost).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5102>
Exists on later Mali. Equivalent to clamp(x, -1.0, 1.0)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5102>
This is how lower_fsat and ACO implements fsat and is a more useful
definition since it can be exactly created from fmin(fmax(a, 0.0), 1.0).
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3716>
So far only the singed versions are defined.
v2: Make umad24 and umul24 non-driver specific (Eric Anholt)
v3: Take care of nir_builder and automatic lowering of the
opcodes if they are not supported by the backend.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4610>
As of the previous commit, they are never used.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4624>
These exist to convert between different types of boolean values. In
particular, we want to use these for uniform and shared memory
operations where we need to convert to a reasonably sized boolean but we
don't care what its format is so we don't want to make the back-end
insert an actual i2b/b2i. In the case of uniforms, Mesa can tweak the
format of the uniform boolean to whatever the driver wants. In the case
of shared, every value in a shared variable comes from the shader so
it's already in the right boolean format.
The new boolean conversion opcodes get replaced with mov in
lower_bool_to_int/float32 so the back-end will hopefully never see them.
However, while we're in the middle of optimizing our NIR, they let us
have sensible load_uniform/ubo intrinsics and also have the bit size
conversion.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>
This opcode is the same as the f2f16 opcode except that it comes with
a promise that it is safe to optimise it out if the result is
immediately converted back to a 32-bit float again. Normally this
would be a lossy conversion and so it would be visible to the
application, but if the conversion is generated as part of the mediump
lowering process then this removal doesn’t matter. The opcode is
eventually replaced with a regular f2f16 in the late optimisations so
the backends don’t need to handle it.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3822>
uctz isn't added because it will implemented in the GLSL path and the
SPIR-V path using other pre-existing instructions.
v2: Avoid signed integer overflow for uabs_isub(0, INT_MIN). Noticed by
Caio.
v3: Alternate fix for signed integer overflow for abs_sub(0, INT_MIN).
I tried the previous methon in a small test program with -ftrapv, and it
failed.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
This introduces new vec8 and vec16 instructions (which are the only
instructions taking more than 4 sources), in order to construct 8 and 16
component vectors.
In order to avoid fixing up the non-autogenerated nir_build_alu() sites
and making them pass 16 src args for the benefit of the two instructions
that take more than 4 srcs (ie vec8 and vec16), nir_build_alu() is has
nir_build_alu_tail() split out and re-used by nir_build_alu2() (which is
used for the > 4 src args case).
v2 (Karol Herbst):
use nir_build_alu2 for vec8 and vec16
use python's array multiplication syntax
add nir_op_vec helper
simplify nir_vec
nir_build_alu_tail -> nir_builder_alu_instr_finish_and_insert
use nir_build_alu for opcodes with <= 4 sources
v3 (Karol Herbst):
fix nir_serialize
v4 (Dave Airlie):
fix serialization of glsl_type
handle vec8/16 in lowering of bools
v5 (Karol Herbst):
fix load store vectorizer
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Adds nir_type_bool8 as well as 8-bit versions of all the bool
opcodes.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Adds nir_type_bool16 as well as 16-bit versions of all the bool
opcodes.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Adds binop_reduce_all_sizes which generates both 1-bit and 32-bit
versions of the reduce operation. This reduces the code duplication a
bit and will make it easier to later add 16-bit versions as well.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Adds binop_compare_all_sizes which generates both 1-bit and 32-bit
versions of the comparison operation. This reduces the code
duplication a bit and will make it easier to later add 16-bit versions
as well.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Used for address/offset calculation (ie. array derefs), where we can
potentially use less than 32b for the multiply of array idx by element
size. For backends that support `imul24`, this gives a lowering pass
an easy way to find multiplies that potentially can be converted to
`imul24`.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Some hardware can do 24b multiply in a single instruction, but not 32b.
However in most cases 24b is sufficient for address/offset calculation.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
ir3 compiler has a signed integer multiply-add instruction (MAD_S24)
that is used for different offset calculations in the backend.
Since we intend to move some of these calculations to NIR, we need
a new ALU op that can directly represent it.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Having Python and C variables sharing name in the same block of code
makes its understanding a bit confusing. Make it explicit that the
Python bit_size variable refers to the destination bit size.
Suggested-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Until now, it was using the floating point version of fmin/fmax,
instead of the double version.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
According to VK_KHR_shader_float_controls:
"Denormalized values obtained via unpacking an integer into a vector
of values with smaller bit width and interpreting those values as
floating-point numbers must: be flushed to zero, unless the entry
point is declared with the code:DenormPreserve execution mode."
v2:
- Add nir_op_unpack_half_2x16_flush_to_zero opcode (Connor).
v3:
- Adapt to use the new NIR lowering framework (Andres).
v4:
- Updated to renamed shader info member and enum values (Andres).
v5:
- Simplify flags logic operations (Caio).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v2]
According to Vulkan spec, the new execution modes affect only
correctly rounded SPIR-V instructions, which includes fadd, fsub and
fmul.
v2:
- Fix fmul, fsub and fadd round-to-zero definitions, they should use
auxiliary functions to calculate the proper value because Mesa uses
round-to-nearest-even rounding mode by default (Connor).
v3:
- Do an actual fused multiply-add at ffma (Connor).
v4:
- Simplify fadd and fmul for bit sizes < 64 (Connor).
- Do not use double ffma for 32 bits float (Connor).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v3]
f2f16's rounding modes are already handled and f2f64 don't need it
as there is not a floating point type with higher bit size than 64 for
now.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
The Mali400 pp doesn't implement fdot but has fsum3 and fsum4, which can
be used to optimize fdot lowering. fsum2 is not implemented and can be
further lowered to an add with the vector components.
Currently lima ppir handles this lowering internally, however this
happens in a very late stage and requires a big chunk of code compared
to a nir_opt_algebraic lowering.
By having fsum in nir, we can reduce ppir complexity and enable the
lowered ops to be part of other nir optimizations in the optimization
loop.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
There doesn't seem to be any reason to keep these opcodes around:
* fnot/fxor are not used at all.
* fand/for are only used in lower_alu_to_scalar, but easily replaced
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
That is: the five least significant bits provide the values of
'bits' and 'offset' which is the case for all hardware currently
supported by NIR and using the bfm/bfe instructions.
This patch also changes the lowering of bitfield_insert/extract
using shifts to not use bfm and removes the flag 'lower_bfm'.
Tested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
'umul_low' is the low 32-bits of unsigned integer multiply. It maps
directly to ir3's MULL_U.
'imadsh_mix16' is multiply add with shift and mix, an ir3 specific
instruction that maps directly to ir3's IMADSH_M16.
Both are necessary for the lowering of integer multiplication on
Freedreno, which will be introduced later in this series.
Reviewed-by: Eric Anholt <eric@anholt.net>
The difference between imov and fmov has been a constant source of
confusion in NIR for years. No one really knows why we have two or when
to use one vs. the other. The real reason is that they do different
things in the presence of source and destination modifiers. However,
without modifiers (which many back-ends don't have), they are identical.
Now that we've reworked nir_lower_to_source_mods to leave one abs/neg
instruction in place rather than replacing them with imov or fmov
instructions, we don't need two different instructions at all anymore.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Rob Clark <robdclark@chromium.org>
This doesn't make any real difference now, but future work (not in this
series) will add a LOT of ffma patterns. Having to duplicate all of
them for ffma(a, b, c) and ffma(b, a, c) is just terrible.
No shader-db changes on any Intel platform.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The meaning of the new name is that the first two sources are
commutative. Since this is only currently applied to two-source
operations, there is no change.
A future change will mark ffma as 2src_commutative.
It is also possible that future work will add 3src_commutative for
opcodes like fmin3.
v2: s/commutative_2src/2src_commutative/g. I had originally considered
this, but I discarded it because I did't want to deal with identifiers
that (should) start with 2. Jason suggested it in review, so we decided
that _2src_commutative would be used in nir_opcodes.py. Also add some
comments documenting what 2src_commutative means. Also suggested by
Jason.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Compiler warns about overflow when assigning UINT64_MAX to something
smaller than a uin64_t:
src/compiler/nir/nir_constant_expressions.c:16909:50: warning: implicit conversion from 'unsigned long long' to 'uint1_t' (aka 'unsigned char') changes value from 18446744073709551615 to 255 [-Wconstant-conversion]
uint1_t dst = (src0 + src1) < src0 ? UINT64_MAX : (src0 + src1);
~~~ ^~~~~~~~~~
Shift UINT64_MAX down to the appropriate maximum value for the type
being assigned to.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Seems it was missing the "/ ma + 0.5" and the order was swapped.
Fixes: a1a2a8dfda ('nir: add AMD_gcn_shader extended instructions')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Only the exponent needs to be 32-bit signed integer.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is set to True only for numeric conversion opcodes.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Not complete, mostly just adding things as I encounter them in CTS. But
not getting far enough yet to hit most of the OpenCL.std instructions.
Anyway, this is better than nothing and covers the most common builtins.
v2: add hadd proof from Jason
move some of the lowering into opt_algebraic and create new nir opcodes
simplify nextafter lowering
fix normalize lowering for inf
rework upsample to use nir_pack_bits
add missing files to build systems
v3: split lines of iadd/sub_sat expressions
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
On Gen 8 and 9, "mul" instruction supports 64 bit destination type. We
can reduce our 64x64 int multiplication from 4 instructions to 3.
Also instead of emitting two mul instructions, we can emit single mul
instuction and extract low/high 32 bits from 64 bit result for
[i/u]mulExtended
v2: 1) Allow lower_mul_high64 to use new opcode (Jason Ekstrand)
2) Add lower_mul_2x32_64 flag (Matt Turner)
3) Remove associative property as bit size is different (Connor
Abbott)
v3: Fix indentation and variable naming convention (Jason Ekstrand)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
SPIR-V shifts are undefined for values >= bitsize, but SM5 shifts
are defined to only use the least significant bits.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We also have to add support for 1-bit integers while we're here so we
get 1-bit variants of iand, ior, and inot.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is a squash of a bunch of individual changes:
nir/builder: Generate 32-bit bool opcodes transparently
nir/algebraic: Remap Boolean opcodes to the 32-bit variant
Use 32-bit opcodes in the NIR producers and optimizations
Generated with a little hand-editing and the following sed commands:
sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c
Use 32-bit opcodes in the NIR back-ends
Generated with a little hand-editing and the following sed commands:
sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is
one if 8, 16, 32, or 64. This leads to having a few more opcodes but
now everything is consistent and booleans aren't a weird special case
anymore.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>