Commit graph

136 commits

Author SHA1 Message Date
Jason Ekstrand
7c43b8ce1b nir: Delete the fnoise opcodes
As of the previous commit, they are never used.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4624>
2020-04-21 06:16:13 +00:00
Rob Clark
bf64648864 nir: fix definition of imadsh_mix16 for vectors
Fixes: c27b3758fa ("nir/opcodes: Add new 'umul_low' and 'imadsh_mix16' opcodes")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4423>
2020-04-04 00:07:10 +00:00
Jason Ekstrand
b2db84153a nir: Add b2b opcodes
These exist to convert between different types of boolean values.  In
particular, we want to use these for uniform and shared memory
operations where we need to convert to a reasonably sized boolean but we
don't care what its format is so we don't want to make the back-end
insert an actual i2b/b2i.  In the case of uniforms, Mesa can tweak the
format of the uniform boolean to whatever the driver wants.  In the case
of shared, every value in a shared variable comes from the shader so
it's already in the right boolean format.

The new boolean conversion opcodes get replaced with mov in
lower_bool_to_int/float32 so the back-end will hopefully never see them.
However, while we're in the middle of optimizing our NIR, they let us
have sensible load_uniform/ubo intrinsics and also have the bit size
conversion.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>
2020-03-30 15:46:19 +00:00
Albert Astals Cid
d988061172 cube_face_index: Use fabsf instead of fabs since we know it's floats
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3933>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3933>
2020-02-26 21:47:01 +00:00
Albert Astals Cid
6db7467b59 cube_face_coord: Use fabsf instead of fabs since we know it's floats
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3933>
2020-02-26 21:47:01 +00:00
Neil Roberts
125f867d3d nir/opcodes: Add nir_op_f2fmp
This opcode is the same as the f2f16 opcode except that it comes with
a promise that it is safe to optimise it out if the result is
immediately converted back to a 32-bit float again. Normally this
would be a lossy conversion and so it would be visible to the
application, but if the conversion is generated as part of the mediump
lowering process then this removal doesn’t matter. The opcode is
eventually replaced with a regular f2f16 in the late optimisations so
the backends don’t need to handle it.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3822>
2020-02-24 17:24:13 +00:00
Ian Romanick
1d97d186fb nir: Mark fmin and fmax as commutative and associative
Per the resolution of Khronos GLSL issue 80
(https://github.com/KhronosGroup/GLSL/issues/80).  Spec updates have not
landed yet, but I'll get to it soon. :)

The extra hurt shaders on Gen8+ are a handful of shaders that see things like

    bcsel(fmin(b - a, a - c) >= 0, x, y)

converted to

   bcsel(a >= b && c >= a, x, y)

The former can be generated as a CSEL instruction.  If either b - a or a
- c is used elsewhere in the shader, this saves an instruction.

All Haswell+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 14550188 -> 14550048 (<.01%)
instructions in affected programs: 12168 -> 12028 (-1.15%)
helped: 30
HURT: 3
helped stats (abs) min: 1 max: 17 x̄: 4.77 x̃: 2
helped stats (rel) min: 0.05% max: 3.85% x̄: 1.77% x̃: 1.80%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.50% max: 0.50% x̄: 0.50% x̃: 0.50%
95% mean confidence interval for instructions value: -6.15 -2.33
95% mean confidence interval for instructions %-change: -2.00% -1.12%
Instructions are helped.

total cycles in shared programs: 203770286 -> 203771464 (<.01%)
cycles in affected programs: 688466 -> 689644 (0.17%)
helped: 172
HURT: 220
helped stats (abs) min: 1 max: 286 x̄: 12.15 x̃: 6
helped stats (rel) min: 0.03% max: 5.97% x̄: 0.70% x̃: 0.35%
HURT stats (abs)   min: 1 max: 578 x̄: 14.85 x̃: 6
HURT stats (rel)   min: 0.03% max: 32.36% x̄: 1.21% x̃: 0.52%
95% mean confidence interval for cycles value: -0.74 6.75
95% mean confidence interval for cycles %-change: 0.15% 0.59%
Inconclusive result (value mean confidence interval includes 0).

total fills in shared programs: 4525 -> 4523 (-0.04%)
fills in affected programs: 48 -> 46 (-4.17%)
helped: 1
HURT: 0

Ivy Bridge
total instructions in shared programs: 11858995 -> 11858898 (<.01%)
instructions in affected programs: 10822 -> 10725 (-0.90%)
helped: 25
HURT: 13
helped stats (abs) min: 1 max: 17 x̄: 5.32 x̃: 2
helped stats (rel) min: 0.40% max: 5.00% x̄: 2.16% x̃: 1.85%
HURT stats (abs)   min: 1 max: 15 x̄: 2.77 x̃: 2
HURT stats (rel)   min: 0.47% max: 2.90% x̄: 1.83% x̃: 2.15%
95% mean confidence interval for instructions value: -4.66 -0.45
95% mean confidence interval for instructions %-change: -1.54% -0.05%
Instructions are helped.

total cycles in shared programs: 177947023 -> 177946880 (<.01%)
cycles in affected programs: 822075 -> 821932 (-0.02%)
helped: 157
HURT: 175
helped stats (abs) min: 1 max: 164 x̄: 13.17 x̃: 4
helped stats (rel) min: 0.03% max: 6.72% x̄: 0.64% x̃: 0.17%
HURT stats (abs)   min: 1 max: 308 x̄: 11.00 x̃: 4
HURT stats (rel)   min: 0.03% max: 9.76% x̄: 0.70% x̃: 0.18%
95% mean confidence interval for cycles value: -3.86 3.00
95% mean confidence interval for cycles %-change: -0.09% 0.22%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 4185 -> 4188 (0.07%)
spills in affected programs: 146 -> 149 (2.05%)
helped: 0
HURT: 1

total fills in shared programs: 5248 -> 5249 (0.02%)
fills in affected programs: 347 -> 348 (0.29%)
helped: 0
HURT: 1

Sandy Bridge
total instructions in shared programs: 10680224 -> 10680144 (<.01%)
instructions in affected programs: 4702 -> 4622 (-1.70%)
helped: 15
HURT: 3
helped stats (abs) min: 1 max: 17 x̄: 5.53 x̃: 5
helped stats (rel) min: 0.39% max: 4.76% x̄: 2.17% x̃: 1.67%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.52% max: 0.52% x̄: 0.52% x̃: 0.52%
95% mean confidence interval for instructions value: -7.24 -1.65
95% mean confidence interval for instructions %-change: -2.55% -0.89%
Instructions are helped.

total cycles in shared programs: 152988780 -> 152985691 (<.01%)
cycles in affected programs: 1072850 -> 1069761 (-0.29%)
helped: 168
HURT: 145
helped stats (abs) min: 1 max: 592 x̄: 33.90 x̃: 12
helped stats (rel) min: 0.02% max: 10.73% x̄: 0.90% x̃: 0.31%
HURT stats (abs)   min: 1 max: 259 x̄: 17.98 x̃: 6
HURT stats (rel)   min: 0.02% max: 8.17% x̄: 0.77% x̃: 0.19%
95% mean confidence interval for cycles value: -17.95 -1.79
95% mean confidence interval for cycles %-change: -0.34% 0.08%
Inconclusive result (%-change mean confidence interval includes 0).

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8107033 -> 8107025 (<.01%)
instructions in affected programs: 696 -> 688 (-1.15%)
helped: 5
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.60 x̃: 2
helped stats (rel) min: 0.34% max: 7.14% x̄: 3.47% x̃: 4.65%
95% mean confidence interval for instructions value: -2.28 -0.92
95% mean confidence interval for instructions %-change: -7.22% 0.28%
Inconclusive result (%-change mean confidence interval includes 0).

total cycles in shared programs: 188348526 -> 188348404 (<.01%)
cycles in affected programs: 33618 -> 33496 (-0.36%)
helped: 23
HURT: 0
helped stats (abs) min: 2 max: 12 x̄: 5.30 x̃: 6
helped stats (rel) min: 0.05% max: 1.83% x̄: 0.47% x̃: 0.51%
95% mean confidence interval for cycles value: -6.70 -3.91
95% mean confidence interval for cycles %-change: -0.64% -0.30%
Cycles are helped.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1359>
2020-02-10 18:37:36 -08:00
Ian Romanick
21f0d020fe nir: Add new instructions for INTEL_shader_integer_functions2
uctz isn't added because it will implemented in the GLSL path and the
SPIR-V path using other pre-existing instructions.

v2: Avoid signed integer overflow for uabs_isub(0, INT_MIN).  Noticed by
Caio.

v3: Alternate fix for signed integer overflow for abs_sub(0, INT_MIN).
I tried the previous methon in a small test program with -ftrapv, and it
failed.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
2020-01-23 00:18:57 +00:00
Rob Clark
a8ec4082a4 nir+vtn: vec8+vec16 support
This introduces new vec8 and vec16 instructions (which are the only
instructions taking more than 4 sources), in order to construct 8 and 16
component vectors.

In order to avoid fixing up the non-autogenerated nir_build_alu() sites
and making them pass 16 src args for the benefit of the two instructions
that take more than 4 srcs (ie vec8 and vec16), nir_build_alu() is has
nir_build_alu_tail() split out and re-used by nir_build_alu2() (which is
used for the > 4 src args case).

v2 (Karol Herbst):
  use nir_build_alu2 for vec8 and vec16
  use python's array multiplication syntax
  add nir_op_vec helper
  simplify nir_vec
  nir_build_alu_tail -> nir_builder_alu_instr_finish_and_insert
  use nir_build_alu for opcodes with <= 4 sources
v3 (Karol Herbst):
  fix nir_serialize
v4 (Dave Airlie):
  fix serialization of glsl_type
  handle vec8/16 in lowering of bools
v5 (Karol Herbst):
  fix load store vectorizer

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-12-21 11:00:17 +00:00
Neil Roberts
634eb9c04b nir: Add a 8-bit bool type
Adds nir_type_bool8 as well as 8-bit versions of all the bool
opcodes.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-11-20 14:09:43 +01:00
Neil Roberts
0f5640c577 nir: Add a 16-bit bool type
Adds nir_type_bool16 as well as 16-bit versions of all the bool
opcodes.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-11-20 14:09:43 +01:00
Neil Roberts
2ec97e78a9 nir/opcodes: Add a helper function to generate reduce opcodes
Adds binop_reduce_all_sizes which generates both 1-bit and 32-bit
versions of the reduce operation. This reduces the code duplication a
bit and will make it easier to later add 16-bit versions as well.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-11-20 14:09:43 +01:00
Neil Roberts
9a96afb97e nir/opcodes: Add a helper function to generate the comparison binops
Adds binop_compare_all_sizes which generates both 1-bit and 32-bit
versions of the comparison operation. This reduces the code
duplication a bit and will make it easier to later add 16-bit versions
as well.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-11-20 14:09:43 +01:00
Rob Clark
6320e37d4b nir: add amul instruction
Used for address/offset calculation (ie. array derefs), where we can
potentially use less than 32b for the multiply of array idx by element
size.  For backends that support `imul24`, this gives a lowering pass
an easy way to find multiplies that potentially can be converted to
`imul24`.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2019-10-18 15:08:54 -07:00
Rob Clark
0568761f8e nir: Add a new ALU nir_op_imul24
Some hardware can do 24b multiply in a single instruction, but not 32b.
However in most cases 24b is sufficient for address/offset calculation.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2019-10-18 15:08:54 -07:00
Eduardo Lima Mitev
32e5fbf47c nir: Add a new ALU nir_op_imad24_ir3
ir3 compiler has a signed integer multiply-add instruction (MAD_S24)
that is used for different offset calculations in the backend.
Since we intend to move some of these calculations to NIR, we need
a new ALU op that can directly represent it.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2019-10-18 15:08:54 -07:00
Andres Gomez
d9760f8935 nir/opcodes: Clear variable names confusion
Having Python and C variables sharing name in the same block of code
makes its understanding a bit confusing. Make it explicit that the
Python bit_size variable refers to the destination bit size.

Suggested-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-09-18 23:59:07 +03:00
Samuel Iglesias Gonsálvez
dba4d0a319 nir: fix fmin/fmax support for doubles
Until now, it was using the floating point version of fmin/fmax,
instead of the double version.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-09-17 23:39:18 +03:00
Samuel Iglesias Gonsálvez
1e0e3ed15a nir: fix denorms in unpack_half_1x16()
According to VK_KHR_shader_float_controls:

"Denormalized values obtained via unpacking an integer into a vector
 of values with smaller bit width and interpreting those values as
 floating-point numbers must: be flushed to zero, unless the entry
 point is declared with the code:DenormPreserve execution mode."

v2:
- Add nir_op_unpack_half_2x16_flush_to_zero opcode (Connor).

v3:
- Adapt to use the new NIR lowering framework (Andres).

v4:
- Updated to renamed shader info member and enum values (Andres).

v5:
- Simplify flags logic operations (Caio).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v2]
2019-09-17 23:39:18 +03:00
Samuel Iglesias Gonsálvez
ef681cf971 nir/opcodes: make sure f2f16_rtz and f2f16_rtne behavior is not overriden by the float controls execution mode
Suggested-by: Connor Abbott <cwabbott0@gmail.com>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-09-17 23:39:18 +03:00
Samuel Iglesias Gonsálvez
7580707345 nir: mind rounding mode on fadd, fsub, fmul and fma opcodes
According to Vulkan spec, the new execution modes affect only
correctly rounded SPIR-V instructions, which includes fadd, fsub and
fmul.

v2:
- Fix fmul, fsub and fadd round-to-zero definitions, they should use
  auxiliary functions to calculate the proper value because Mesa uses
  round-to-nearest-even rounding mode by default (Connor).

v3:
- Do an actual fused multiply-add at ffma (Connor).

v4:
- Simplify fadd and fmul for bit sizes < 64 (Connor).
- Do not use double ffma for 32 bits float (Connor).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v3]
2019-09-17 23:39:18 +03:00
Samuel Iglesias Gonsálvez
0ac07c7ca7 nir: add support for round to zero rounding mode to nir_op_f2f32
f2f16's rounding modes are already handled and f2f64 don't need it
as there is not a floating point type with higher bit size than 64 for
now.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-09-17 23:39:18 +03:00
Erico Nunes
4a407df682 nir/algebraic: add new fsum ops and fdot lowering
The Mali400 pp doesn't implement fdot but has fsum3 and fsum4, which can
be used to optimize fdot lowering. fsum2 is not implemented and can be
further lowered to an add with the vector components.
Currently lima ppir handles this lowering internally, however this
happens in a very late stage and requires a big chunk of code compared
to a nir_opt_algebraic lowering.
By having fsum in nir, we can reduce ppir complexity and enable the
lowered ops to be part of other nir optimizations in the optimization
loop.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-31 21:35:58 +02:00
Sagar Ghuge
81d342e2a1 nir: Add urol and uror opcodes
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 10:14:22 -07:00
Jonathan Marek
a70ff70158 nir: remove fnot/fxor/fand/for opcodes
There doesn't seem to be any reason to keep these opcodes around:
* fnot/fxor are not used at all.
* fand/for are only used in lower_alu_to_scalar, but easily replaced

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-06-26 15:26:10 -04:00
Daniel Schürmann
a8b0b6e52b nir: introduce lowering of bitfield_insert to bfm and a new opcode bitfield_select.
bitfield_select is defined as:
bitfield_select(mask, base, insert) = (mask & base) | (~mask & insert)
matching the behavior of AMD's BFI instruction.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-24 18:42:20 +02:00
Daniel Schürmann
165b7f3a44 nir: define behavior of nir_op_bfm and nir_op_u/ibfe according to SM5 spec.
That is: the five least significant bits provide the values of
'bits' and 'offset' which is the case for all hardware currently
supported by NIR and using the bfm/bfe instructions.
This patch also changes the lowering of bitfield_insert/extract
using shifts to not use bfm and removes the flag 'lower_bfm'.

Tested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-24 18:42:20 +02:00
Eduardo Lima Mitev
c27b3758fa nir/opcodes: Add new 'umul_low' and 'imadsh_mix16' opcodes
'umul_low' is the low 32-bits of unsigned integer multiply. It maps
directly to ir3's MULL_U.

'imadsh_mix16' is multiply add with shift and mix, an ir3 specific
instruction that maps directly to ir3's IMADSH_M16.

Both are necessary for the lowering of integer multiplication on
Freedreno, which will be introduced later in this series.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07 08:45:05 +02:00
Jason Ekstrand
f2dc0f2872 nir: Drop imov/fmov in favor of one mov instruction
The difference between imov and fmov has been a constant source of
confusion in NIR for years.  No one really knows why we have two or when
to use one vs. the other.  The real reason is that they do different
things in the presence of source and destination modifiers.  However,
without modifiers (which many back-ends don't have), they are identical.
Now that we've reworked nir_lower_to_source_mods to leave one abs/neg
instruction in place rather than replacing them with imov or fmov
instructions, we don't need two different instructions at all anymore.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Rob Clark <robdclark@chromium.org>
2019-05-24 08:38:11 -05:00
Ian Romanick
7b4ff6a1af nir: Mark ffma as 2src_commutative
This doesn't make any real difference now, but future work (not in this
series) will add a LOT of ffma patterns.  Having to duplicate all of
them for ffma(a, b, c) and ffma(b, a, c) is just terrible.

No shader-db changes on any Intel platform.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 11:25:02 -07:00
Ian Romanick
ede45bf9cf nir: Rename commutative to 2src_commutative
The meaning of the new name is that the first two sources are
commutative.  Since this is only currently applied to two-source
operations, there is no change.

A future change will mark ffma as 2src_commutative.

It is also possible that future work will add 3src_commutative for
opcodes like fmin3.

v2: s/commutative_2src/2src_commutative/g.  I had originally considered
this, but I discarded it because I did't want to deal with identifiers
that (should) start with 2.  Jason suggested it in review, so we decided
that _2src_commutative would be used in nir_opcodes.py.  Also add some
comments documenting what 2src_commutative means.  Also suggested by
Jason.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 11:25:02 -07:00
Ian Romanick
85e6865ff6 nir: Saturating integer arithmetic is not associative
In 8-bits,

    iadd_sat(iadd_sat(0x7f, 0x7f), -1) =
    iadd_sat(0x7f, -1) =
    0x7e

but,

    iadd_sat(0x7f, iadd_sat(0x7f, -1)) =
    iadd_sat(0x7f, 0x7e) =
    0x7f

Fixes: 272e927d0e ("nir/spirv: initial handling of OpenCL.std extension opcodes")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-01 09:07:47 -07:00
Kristian H. Kristensen
41593f3c37 nir_opcodes.py: Saturate to expression that doesn't overflow
Compiler warns about overflow when assigning UINT64_MAX to something
smaller than a uin64_t:

src/compiler/nir/nir_constant_expressions.c:16909:50: warning: implicit conversion from 'unsigned long long' to 'uint1_t' (aka 'unsigned char') changes value from 18446744073709551615 to 255 [-Wconstant-conversion]
            uint1_t dst = (src0 + src1) < src0 ? UINT64_MAX : (src0 + src1);
                    ~~~                          ^~~~~~~~~~

Shift UINT64_MAX down to the appropriate maximum value for the type
being assigned to.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-19 16:17:37 +00:00
Rhys Perry
8671cfe2a2 nir,ac/nir: fix cube_face_coord
Seems it was missing the "/ ma + 0.5" and the order was swapped.

Fixes: a1a2a8dfda ('nir: add AMD_gcn_shader extended instructions')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-15 17:22:47 +01:00
Samuel Pitoiset
6ae5797243 nir: use generic float types for frexp_exp and frexp_sig
Only the exponent needs to be 32-bit signed integer.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-22 19:41:44 +01:00
Iago Toral Quiroga
ca2b5e9069 compiler/nir: add an is_conversion field to nir_op_info
This is set to True only for numeric conversion opcodes.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Karol Herbst
272e927d0e nir/spirv: initial handling of OpenCL.std extension opcodes
Not complete, mostly just adding things as I encounter them in CTS. But
not getting far enough yet to hit most of the OpenCL.std instructions.

Anyway, this is better than nothing and covers the most common builtins.

v2: add hadd proof from Jason
    move some of the lowering into opt_algebraic and create new nir opcodes
    simplify nextafter lowering
    fix normalize lowering for inf
    rework upsample to use nir_pack_bits
    add missing files to build systems
v3: split lines of iadd/sub_sat expressions

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-05 22:28:29 +01:00
Sagar Ghuge
e551040c60 nir/glsl: Add another way of doing lower_imul64 for gen8+
On Gen 8 and 9, "mul" instruction supports 64 bit destination type. We
can reduce our 64x64 int multiplication from 4 instructions to 3.

Also instead of emitting two mul instructions, we can emit single mul
instuction and extract low/high 32 bits from 64 bit result for
[i/u]mulExtended

v2: 1) Allow lower_mul_high64 to use new opcode (Jason Ekstrand)
    2) Add lower_mul_2x32_64 flag (Matt Turner)
    3) Remove associative property as bit size is different (Connor
       Abbott)

v3: Fix indentation and variable naming convention (Jason Ekstrand)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-04 15:50:25 -08:00
Daniel Schürmann
0525bdc225 nir: Define shifts according to SM5 specification.
SPIR-V shifts are undefined for values >= bitsize, but SM5 shifts
are defined to only use the least significant bits.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-25 12:59:43 -06:00
Jason Ekstrand
191a1dce92 nir: Add 1-bit Boolean opcodes
We also have to add support for 1-bit integers while we're here so we
get 1-bit variants of iand, ior, and inot.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-16 21:03:02 +00:00
Jason Ekstrand
80e8dfe9de nir: Rename Boolean-related opcodes to include 32 in the name
This is a squash of a bunch of individual changes:

    nir/builder: Generate 32-bit bool opcodes transparently

    nir/algebraic: Remap Boolean opcodes to the 32-bit variant

    Use 32-bit opcodes in the NIR producers and optimizations

        Generated with a little hand-editing and the following sed commands:

        sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
        sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
        sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
        sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
        sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
        sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c

     Use 32-bit opcodes in the NIR back-ends

        Generated with a little hand-editing and the following sed commands:

        sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
        sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
        sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
        sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
        sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
        sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-16 21:03:02 +00:00
Ian Romanick
090e282407 nir: Add a saturated unsigned integer add opcode
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-13 17:49:48 +00:00
Jason Ekstrand
9525971e2b nir: Allow [iu]mul_high on non-32-bit types
Reviewed-by: Ian Romanick ian.d.romanick@intel.com
2018-12-13 17:49:48 +00:00
Jason Ekstrand
dca6cd9ce6 nir: Make boolean conversions sized just like the others
Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is
one if 8, 16, 32, or 64.  This leads to having a few more opcodes but
now everything is consistent and booleans aren't a weird special case
anymore.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2018-12-05 15:03:07 -06:00
Jason Ekstrand
85f0ea9d8f nir/opcodes: Rename tbool to tbool32
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2018-12-05 15:02:49 -06:00
Jason Ekstrand
03571a7a6c nir/opcodes: Pull in the type helpers from constant_expressions
While we're at it, we rework them a bit to all use regular expressions
and assert more.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2018-12-05 15:02:06 -06:00
Jason Ekstrand
80c424148b nir/opcodes: Make unpack_half_2x16_split_* variable-width
There is nothing inherent about these opcodes that requires them to only
take scalars.  It's very convenient if we let them take vectors as well.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-29 14:04:02 -05:00
Karol Herbst
7f95564a22 nir: rename f2f16_undef to f2f16
we need rounding modes on other conversions involving floats and it is easier
to rename f2f16_undef than renaming all the other ones.

v2: rebased on master

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
2018-07-24 20:40:05 +02:00
Mathieu Bridon
9ebd8372b9 python: Use range() instead of xrange()
Python 2 has a range() function which returns a list, and an xrange()
one which returns an iterator.

Python 3 lost the function returning a list, and renamed the function
returning an iterator as range().

As a result, using range() makes the scripts compatible with both Python
versions 2 and 3.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-07-24 11:07:04 -07:00
Iago Toral Quiroga
c9653cc14c nir: add opcodes for 16-bit packing and unpacking
Noitice that we don't need 'split' versions of the 64-bit to / from
16-bit opcodes which we require during pack lowering to implement these
operations. This is because these operations can be expressed as a
collection of 32-bit from / to 16-bit and 64-bit to / from 32-bit
operations, so we don't need new opcodes specifically for them.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:26 +02:00