SPV_KHR_bfloat16 requires only a small set of operations,
since it doesn't support all arithmetic ops.
This patch adds conversions to/from Float32 and also
the necessary ops (bfdot, bffma, bfmul) to implement
SpvOpDot using the same lowering approach as the
Float32 counterpart.
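Because bfloat16 is the upper half of a binary32, widening is a 16-bit shift and narrowing is a rounding of the low half. The following is a minimal Python sketch of those conversions. It assumes round-to-nearest-even for the narrowing, since the rounding mode is not stated above. `bfmul` is a hypothetical model of the op as an f32 multiply followed by a narrowing, not the patch's actual lowering:

```python
import math
import struct


def f32_bits(x: float) -> int:
    """Round x to binary32 and return its 32-bit pattern."""
    try:
        packed = struct.pack("<f", x)
    except OverflowError:
        # Finite value beyond binary32 range: round-to-nearest gives infinity.
        packed = struct.pack("<f", math.copysign(math.inf, x))
    return struct.unpack("<I", packed)[0]


def bf16_to_f32(h: int) -> float:
    """Widen: a bfloat16 is exactly the top 16 bits of a binary32."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]


def f32_to_bf16(x: float) -> int:
    """Narrow to bfloat16, assuming round-to-nearest-even."""
    bits = f32_bits(x)
    if (bits & 0x7FFFFFFF) > 0x7F800000:
        # NaN: keep the sign and top payload bits, force the quiet bit.
        return (bits >> 16) | 0x0040
    # Adding 0x7FFF plus the kept half's LSB makes exact ties round to even.
    bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + bias) >> 16) & 0xFFFF


def bfmul(a: int, b: int) -> int:
    """Hypothetical bfmul model: multiply in f32, then narrow to bf16."""
    return f32_to_bf16(bf16_to_f32(a) * bf16_to_f32(b))
```

For example, `1.0 + 2**-8` sits exactly halfway between two bfloat16 values and rounds down to the even one (`0x3F80`).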
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34105>
Bifrost LDEXP.v2f16 takes a 16-bit exponent, which requires messy
lowering. The codegen for this is quite bad currently, but would be
improved by implementing unpack_32_2x16_split_*, and by fusing
comparisons with CSEL.
The main alternative is converting to F32, then LDEXP.f32, then
converting back to F16. This has better codegen for dynamic exponents
currently, but worse in the common case with a constant exponent where
all the saturating cast logic can be folded.
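The saturating cast mentioned above is what makes a 16-bit exponent operand safe. For any nonzero finite f16 input, an exponent beyond roughly +/-40 already overflows to infinity or flushes to zero, so clamping the 32-bit exponent to the 16-bit range cannot change the result. The following minimal Python sketch shows that clamp. It models the arithmetic in double precision rather than f16, and the helper names are hypothetical:

```python
import math


def sat_i32_to_i16(e: int) -> int:
    """Saturating cast of a 32-bit exponent into the signed 16-bit range."""
    return max(-32768, min(32767, e))


def ldexp_via_i16_exponent(x: float, e: int) -> float:
    """ldexp whose exponent operand is assumed to be only 16 bits wide."""
    try:
        return math.ldexp(x, sat_i32_to_i16(e))
    except OverflowError:
        # math.ldexp raises on overflow; real hardware would return +/-inf.
        return math.copysign(math.inf, x)
```

With a constant exponent, the clamp folds away entirely, which is the common case described above.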
Fixes dEQP-VK.glsl.builtin.precision_fp16_storage16b.ldexp.compute.vec2
when shaderFloat16 is enabled in panvk.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33637>
In ir3, `sel.b32` selects based on whether its condition is zero, so add
`icsel_eqz` to fuse the cmp and sel that we'd otherwise need.
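The following is a minimal Python sketch of the fusion. It assumes `icsel_eqz(c, a, b)` yields `a` when `c == 0` and `b` otherwise. That operand order is an assumption and is not taken from the patch. `fuse_cmp_sel` is a hypothetical helper showing that both an `ieq` and an `ine` comparison against zero map to a single `icsel_eqz`:

```python
def icsel_eqz(cond: int, a: int, b: int) -> int:
    """Assumed semantics: pick `a` when cond == 0, otherwise `b`."""
    return a if cond == 0 else b


def fuse_cmp_sel(cmp_op: str, x: int, a: int, b: int) -> int:
    """Hypothetical lowering of bcsel(cmp_op(x, 0), a, b) to one icsel_eqz.

    ieq(x, 0) keeps the operand order; ine(x, 0) just swaps a and b,
    so neither form needs a separate comparison instruction.
    """
    if cmp_op == "ieq":
        return icsel_eqz(x, a, b)
    if cmp_op == "ine":
        return icsel_eqz(x, b, a)
    raise ValueError(f"unsupported comparison: {cmp_op}")
```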
total Instruction Count in shared programs: 1112814 -> 1110473 (-0.21%)
Instruction Count in affected programs: 162701 -> 160360 (-1.44%)
helped: 81
HURT: 29
Instruction count are helped.
total MOV Count in shared programs: 86777 -> 88671 (2.18%)
MOV Count in affected programs: 28119 -> 30013 (6.74%)
helped: 1
HURT: 292
Mov count are HURT.
total COV Count in shared programs: 15070 -> 14962 (-0.72%)
COV Count in affected programs: 5770 -> 5662 (-1.87%)
helped: 76
HURT: 2
Cov count are helped.
total Last helper instruction in shared programs: 592729 -> 590638 (-0.35%)
Last helper instruction in affected programs: 91331 -> 89240 (-2.29%)
helped: 30
HURT: 1
Last helper instruction are helped.
total Instructions with SS sync bit in shared programs: 29336 -> 29546 (0.72%)
Instructions with SS sync bit in affected programs: 4702 -> 4912 (4.47%)
helped: 8
HURT: 43
Instructions with ss sync bit are HURT.
total Estimated cycles stalled on SS in shared programs: 111590 -> 112401 (0.73%)
Estimated cycles stalled on SS in affected programs: 27708 -> 28519 (2.93%)
helped: 21
HURT: 61
Estimated cycles stalled on ss are HURT.
total cat1 instructions in shared programs: 101933 -> 103695 (1.73%)
cat1 instructions in affected programs: 35804 -> 37566 (4.92%)
helped: 18
HURT: 290
Cat1 instructions are HURT.
total cat2 instructions in shared programs: 380299 -> 377499 (-0.74%)
cat2 instructions in affected programs: 128609 -> 125809 (-2.18%)
helped: 322
HURT: 0
Cat2 instructions are helped.
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32189>
ir3 has a number of bitwise triops (e.g., shrm == (src0 >> src1) & src2)
that don't have NIR equivalents. Doing instruction selection for them is
a lot more convenient using algebraic patterns than manually matching
them. This patch adds NIR opcodes for these instructions.
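The following is a minimal Python sketch of the semantics on 32-bit values. Only `shrm` is spelled out above. `shlm` is included as an assumed analogue, and the sketch also assumes a logical right shift with the shift count masked to 5 bits, as with NIR's `ushr`/`ishl`:

```python
MASK32 = 0xFFFFFFFF


def shrm(src0: int, src1: int, src2: int) -> int:
    """(src0 >> src1) & src2, using an assumed logical 32-bit shift."""
    return ((src0 & MASK32) >> (src1 & 31)) & src2 & MASK32


def shlm(src0: int, src1: int, src2: int) -> int:
    """Assumed analogue: (src0 << src1) & src2, wrapping at 32 bits."""
    return ((src0 << (src1 & 31)) & MASK32) & src2
```

An algebraic pattern would then rewrite something like `iand(ushr(a, b), c)` into a single `shrm`, rather than a backend matcher searching for that shape by hand.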
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32181>
V3D can use these too.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31480>
Rewrote most of the impl, but shrug.
This regresses codegen for mediump, but I'm not too bothered given the lackluster
perf of fp16 on Bifrost :(
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30567>
SPIR-V strengthened the semantics around signed zero, requiring fmin(-0, +0) =
-0. Since nir_op_fmin is commutative, we must also require fmin(+0, -0) = -0 to
match, although it's unclear if SPIR-V requires that. We must strengthen NIR's
definitions accordingly.
This strengthening is additionally motivated by existing nir_opt_algebraic
rules like:
(('fmin', a, ('fneg', a)), ('fneg', ('fabs', a))),
With the strengthened definition, this transform is clearly exact. With the
weaker definition, the transform could change the sign of zero based on
implementation-defined behaviour, which, while not exactly unsound, is
semantically undesirable.
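The following minimal Python sketch implements the strengthened rule and checks the `fmin(a, -a) -> -fabs(a)` transform bit for bit, including the sign of zero. NaN handling is deliberately omitted, and the helper names are illustrative:

```python
import math


def nir_fmin(a: float, b: float) -> float:
    """fmin with the strengthened signed-zero rule (NaN handling omitted).

    If both inputs are zeros, the result is -0 when either one is -0, so
    fmin(-0, +0) and fmin(+0, -0) agree, as commutativity requires.
    """
    if a == 0.0 and b == 0.0:
        neg = math.copysign(1.0, a) < 0 or math.copysign(1.0, b) < 0
        return -0.0 if neg else 0.0
    return a if a < b else b


def same_float(x: float, y: float) -> bool:
    """Equal value and equal sign bit (distinguishes +0 from -0)."""
    return x == y and math.copysign(1.0, x) == math.copysign(1.0, y)
```

Under this definition, `nir_fmin(a, -a)` and `-abs(a)` are identical for every non-NaN `a`, zeros included.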
...
This is probably technically a bug fix, but I'm not convinced it's worth its
weight in backporting.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30075>
This is more idiomatic and already #include'd.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30075>
Ensure unsigned integers are used instead of signed ones when performing
left bit shifts.
This was detected by the Undefined Behaviour Sanitizer (UBSan).
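The rule UBSan enforces comes from C11 6.5.7. The shift count must be in `[0, width)`. For a signed left operand, the value must also be non-negative and `x * 2**s` must still fit in the type. The following minimal Python sketch checks that condition and shows the unsigned alternative, which wraps instead. Both helpers are illustrative, not Mesa code:

```python
def signed_shl_is_ub(x: int, s: int, bits: int = 32) -> bool:
    """True when C's `x << s` on a signed `bits`-wide int is undefined."""
    if s < 0 or s >= bits:
        return True
    return x < 0 or (x << s) > 2 ** (bits - 1) - 1


def ushl32(x: int, s: int) -> int:
    """Unsigned 32-bit left shift: well-defined, wraps modulo 2**32."""
    if not 0 <= s < 32:
        raise ValueError("shift count >= width is UB even for unsigned")
    return (x << s) & 0xFFFFFFFF
```

For example, `1 << 31` on a 32-bit `int` is undefined, while the unsigned `1u << 31` is simply `0x80000000`.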
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29772>
Undefined Behaviour Sanitizer (UBSan) detected the following when
running the test `dEQP-VK.graphicsfuzz.cov-fold-negate-min-int-value`:
`negation of -2147483648 cannot be represented in type 'int'; cast to an unsigned type to negate this value to itself`
The SPIR-V spec states that OpSNegate(0x80000000) has to return 0x80000000;
that is, negating -2147483648 must give -2147483648.
While this is not causing any issue, because compilers seem to
behave like that, it is still undefined behaviour, so it needs to be
handled explicitly, which is the purpose of this commit.
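The usual fix is to negate in unsigned arithmetic, where wraparound is defined, and then reinterpret the result as signed. The following minimal Python sketch models that idea with explicit 32-bit masking. `ineg32` is an illustrative name, not the function touched by this commit:

```python
def ineg32(x: int) -> int:
    """Two's-complement negation of a 32-bit int without signed overflow."""
    u = (-(x & 0xFFFFFFFF)) & 0xFFFFFFFF  # negate as unsigned, wraps mod 2**32
    return u - 2**32 if u & 0x80000000 else u  # reinterpret as signed int32
```

With this approach, INT32_MIN maps to itself, which is the result the SPIR-V spec requires.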
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29772>
It doesn't make sense to have two sets of opcodes for this when all backends
that support the flush_to_zero variant just rely on the global floating point
mode anyway.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29433>
The constant-folding definition and comments say that it takes the high
16 bits of the first source and the low 16 bits of the second source, but
it's actually the opposite. The algebraic optimization, which is actually
applied and needs to be correct, was already correct, but the comment above it
was wrong.
Note that in the way we use it when lowering multiplications, the
ordering doesn't matter.
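The order is irrelevant because of how a 32-bit multiply decomposes into 16-bit halves. Modulo 2**32, `a * b == al*bl + ((al*bh + ah*bl) << 16)`, and the two cross terms are simply added. The commit doesn't name the opcode, so `mix16` below is a hypothetical stand-in for the "low half of one source times high half of the other" product:

```python
MASK32 = 0xFFFFFFFF


def lo16(x: int) -> int:
    return x & 0xFFFF


def hi16(x: int) -> int:
    return (x >> 16) & 0xFFFF


def mix16(x: int, y: int) -> int:
    """Hypothetical mixed partial product: lo16(x) * hi16(y), shifted up."""
    return ((lo16(x) * hi16(y)) << 16) & MASK32


def mul32_via_halves(a: int, b: int) -> int:
    """a * b mod 2**32 from 16x16 pieces.

    ah * bh only affects bits >= 32 and drops out. The cross terms
    mix16(a, b) and mix16(b, a) are summed, so swapping which source
    supplies the high half does not change the result.
    """
    return (lo16(a) * lo16(b) + mix16(a, b) + mix16(b, a)) & MASK32
```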
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22075>
Since v71, Broadcom hardware includes specific packing/conversion
instructions, so this commit adds opcodes to be able to make use of
them, especially for image stores:
* pack_2x16_to_unorm_2x8 (backend: vftounorm8/vftosnorm8):
2x16-bit floating point to 2x8-bit unorm/snorm
* f2unorm_16/f2snorm_16 (backend: ftounorm16/ftosnorm16):
floating point to 16-bit unorm/snorm
* pack_2x16_to_unorm_2x10/pack_2x16_to_unorm_10_2 (backend:
vftounorm10lo/vftounorm10hi): used to convert floating point to
r10g10b10a2 unorm
* pack_32_to_r11g11b10 (backend: v11fpack): packs two 2x16-bit floats into
R11G11B10
* pack_uint_32_to_r10g10b10a2 (backend: v10pack): packs two 2x16-bit
integers into R10G10B10A2
* pack_4x16_to_4x8 (backend: v8pack): packs two 2x16-bit integers
into 4x8 bits
* pack_2x32_to_2x16 (backend: vpack): packs two 32-bit integers into
2x16 bits
The latter can easily be confused with the existing
pack_32_2x16_split. But note that pack_32_2x16_split receives two 16-bit
integers and packs them into a 32-bit integer, whereas the Broadcom opcode
takes two 32-bit integers, takes the lower halfword of each, and packs them
as 2x16 into a 32-bit integer.
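The difference is easiest to see side by side. The following minimal Python sketch assumes that src0 lands in the low half for both opcodes, which is an assumption for pack_2x32_to_2x16:

```python
def pack_32_2x16_split(lo: int, hi: int) -> int:
    """Existing NIR opcode: two 16-bit sources combined into one 32-bit word."""
    if not (0 <= lo <= 0xFFFF and 0 <= hi <= 0xFFFF):
        raise ValueError("sources must already be 16-bit values")
    return lo | (hi << 16)


def pack_2x32_to_2x16(a: int, b: int) -> int:
    """vpack-style: two 32-bit sources, the low halfword of each is kept."""
    return (a & 0xFFFF) | ((b & 0xFFFF) << 16)
```

Expressing vpack with the existing opcode would need a 32-to-16-bit truncation (such as u2u16) on each source first. The dedicated opcode folds that truncation in.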
Interestingly, Broadcom also defines a similar instruction that packs the
higher halfwords. It is not used yet.
Note that at this point we use hardware-agnostic names, even though we add
a _v3d suffix because they are only available on Broadcom, in order to follow
current NIR conventions.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25726>
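As noted in the previous commit, the intermediate cast from double to
float can produce wrong results: rounding twice (f64 to f32, then f32 to
f16) is not always the same as rounding once from f64 to f16.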
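The following concrete double-rounding case is reproduced in Python, relying on the standard `struct` module's `"f"` (binary32) and `"e"` (binary16) formats. The input lies just above the midpoint between two adjacent f16 values. A single round-to-nearest-even conversion therefore goes up. Going through f32 first discards the tiny excess, which leaves an exact tie that then rounds down to even:

```python
import struct


def round_to_f32(x: float) -> float:
    return struct.unpack("<f", struct.pack("<f", x))[0]


def round_to_f16(x: float) -> float:
    # The "e" format rounds a double directly to binary16 (nearest-even).
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Just above the halfway point between the f16 values 1.0 and 1.0 + 2**-10.
x = 1.0 + 2**-11 + 2**-40

direct = round_to_f16(x)                 # one rounding: 1.0 + 2**-10
via_f32 = round_to_f16(round_to_f32(x))  # f32 drops 2**-40, then tie -> 1.0
```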
Fixes upcoming Vulkan CTS tests:
dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.rounding_rte_sconst_conv_from_fp64_up
dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.rounding_rte_sconst_conv_from_fp64_up_nostorage
dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_sconst_conv_from_fp64_up_vert
dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_sconst_conv_from_fp64_up_nostorage_vert
dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_sconst_conv_from_fp64_up_frag
dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_sconst_conv_from_fp64_up_nostorage_frag
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25281>
Midgard has both integer and float versions of b32csel. The backend needs some way to
pick between the two, and it's a lot more convenient to choose in NIR before
going out-of-SSA than in the backend.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23769>
While this is a generic bit-twiddling ALU instruction, it's especially useful
for address calculations, since the architecture's tiled textures use Morton
coding within the tiles.
This will be used when lowering image_texel_address on AGX, as part of the image
atomics implementation. I don't know if there are any other neat uses I could
detect with opt_algebraic; this doesn't seem like an operation a shader would
open-code... Maybe useful for BVH building or something...
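Morton coding interleaves the bits of the two coordinates. The following minimal Python sketch uses the classic mask-and-shift bit spreading. It assumes x occupies the even bit positions and y the odd ones, and the function names are hypothetical:

```python
def spread_bits16(v: int) -> int:
    """Move bit i of a 16-bit value to bit 2*i."""
    v &= 0xFFFF
    v = (v | (v << 8)) & 0x00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F
    v = (v | (v << 2)) & 0x33333333
    v = (v | (v << 1)) & 0x55555555
    return v


def interleave(x: int, y: int) -> int:
    """Morton code: x bits in even positions, y bits in odd (assumed order)."""
    return spread_bits16(x) | (spread_bits16(y) << 1)
```

Under the same assumption, the texel at (x, y) within a tile would sit at `interleave(x, y) * bytes_per_texel`, where `bytes_per_texel` is a hypothetical parameter.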
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23513>
We already document a lot of ALU opcodes; let's make this machine-readable so we
can put the descriptions in our generated HTML documentation.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22929>
Models `(a * b) + (c << d)` in general, as implemented in various forms on AGX.
This will be formed by backend nir_opt_algebraic rules, both for the literal
pattern and to strength-reduce certain multiplications, e.g. replacing
a * 5 with `a + (a << 2)`, expressed as imadshl_agx(a, 1, a, 2).
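The strength reduction above generalizes to any multiplier of the form `2**k + 1`, because `a * (2**k + 1) == a + (a << k)`. The following minimal Python sketch models the opcode with 32-bit wraparound. `strength_reduce_mul` is a hypothetical helper, not one of the actual algebraic rules:

```python
MASK32 = 0xFFFFFFFF


def imadshl_agx(a: int, b: int, c: int, d: int) -> int:
    """(a * b) + (c << d), wrapped to 32 bits."""
    return (a * b + (c << d)) & MASK32


def strength_reduce_mul(a: int, m: int) -> int:
    """Multiply by a constant m == 2**k + 1 using a single imadshl_agx."""
    if m < 2:
        raise ValueError("multiplier must be at least 2")
    k = (m - 1).bit_length() - 1
    if (1 << k) + 1 != m:
        raise ValueError("multiplier is not 2**k + 1")
    return imadshl_agx(a, 1, a, k)
```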
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22695>
We need to do a full-precision pow if the source is 64-bit, and we can use
fpow() otherwise, not the other way around.
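The following minimal Python sketch shows constant folding of NIR's fpow with the branch in the correct order. The 32-bit path is approximated by rounding the double result to binary32. That is an assumption, because a real single-precision pow may differ in the last bit:

```python
import math
import struct


def round_to_f32(x: float) -> float:
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fold_fpow(a: float, b: float, bit_size: int) -> float:
    """64-bit needs double-precision pow; narrower sizes can use float."""
    if bit_size == 64:
        return math.pow(a, b)
    return round_to_f32(math.pow(a, b))  # approximation of a float pow
```

With the branches swapped, a 64-bit constant such as `pow(2.0, 0.5)` would be folded at float precision, which is the bug this commit fixes.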
Fixes: 9076c4e289 ("nir: update opcode definitions for different bit sizes")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22774>
For AMD GPUs, which have an instruction to normalize and pack two float16
inputs; it is used when the fragment shader exports color outputs.
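The following minimal Python sketch shows what "normalize and pack" means. It assumes a 16-bit unorm target, round-to-nearest scaling, NaN mapping to 0, and the first input in the low half. All of these are assumptions for illustration, not the hardware definition:

```python
import math


def float_to_unorm16(f: float) -> int:
    """Clamp to [0, 1] and scale to 16-bit unorm, rounding to nearest."""
    if math.isnan(f):
        return 0  # assumption: NaN maps to 0
    f = min(max(f, 0.0), 1.0)
    return math.floor(f * 65535.0 + 0.5)


def pack_2x16_unorm(x: float, y: float) -> int:
    """Normalize two inputs and pack them, x in the low half (assumed)."""
    return float_to_unorm16(x) | (float_to_unorm16(y) << 16)
```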
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21552>
Builds on the work of !15121. This gets to delete even more code
because many drivers shared a lot of code for i2b and f2b.
No shader-db or fossil-db changes on any Intel platform.
v2: Rebase on 1a35acd8d9.
v3: Update a comment in nir_opcodes_c.py. Suggested by Konstantin.
v4: Another rebase. Remove f2b stuff from Midgard.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20509>
Same as pack_half_2x16_split, but always uses RTZ mode.
Note that the rounding mode of pack_half_2x16 is unspecified.
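Round-toward-zero narrowing from f32 to f16 is plain truncation of the mantissa. Finite overflow clamps to the largest finite half, 65504, rather than infinity. The following minimal Python sketch converts via the bit pattern, assuming src0 goes in the low half as with pack_half_2x16_split:

```python
import struct


def f32_to_f16_rtz(x: float) -> int:
    """Convert binary32 (x is first rounded to f32) to binary16 with RTZ."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = (bits >> 16) & 0x8000
    exp = (bits >> 23) & 0xFF
    mant = bits & 0x7FFFFF
    if exp == 0xFF:  # Inf stays Inf; NaN becomes a quiet NaN
        return sign | 0x7C00 | (0x200 if mant else 0)
    e = exp - 127 + 15  # rebias the exponent for binary16
    if e >= 0x1F:  # too large: RTZ gives the largest finite half
        return sign | 0x7BFF
    if e <= 0:  # f16 subnormal or zero: shift the full significand down
        if e < -10:
            return sign
        return sign | ((mant | 0x800000) >> (14 - e))
    return sign | (e << 10) | (mant >> 13)  # drop low 13 bits = truncate


def pack_half_2x16_rtz_split(x: float, y: float) -> int:
    return f32_to_f16_rtz(x) | (f32_to_f16_rtz(y) << 16)
```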
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15838>
NIR has an fsin instruction that takes an argument in radians. Midgard instead
has an fsinpi instruction that takes its argument in multiples of pi. So, we had a
NIR pass that would change fsin(x) to fsin(x / pi) and then map fsin to fsinpi
in the backend.
But that's invalid! In NIR, the opcode fsin is well-defined. fsin(x) means
something very different than fsin(x / pi). They won't usually be equal. The
transform fsin(x) -> fsin(x / pi) is fundamentally unsound.
It did work before, by accident. Most NIR passes don't care about the semantics
of ALU instructions. fsin(x) and fsin(x / pi) are both well-defined but
fundamentally different NIR shaders. So the rewrite is wrong (the NIR we
get out is not equivalent to the NIR we put in, and the Midgard ops we generate
are not equivalent to the NIR), but if we don't run any passes that care about
the definition of fsin, the two wrongs cancel out to make a right.
However, some NIR passes do care about the definitions of ALU instructions,
instead of treating them as named black boxes. In particular, constant folding
(nir_opt_constant_fold) evaluates ALU instructions when their inputs are
constants, according to the definition in nir_opcodes.py. So our little charade
will only work if we don't call nir_opt_constant_fold, or if all the fsin
instructions have non-constant inputs. At the beginning of this series, that is
the case. With the later scalarization change, that's no longer the case, and
the unsoundness translates to real failing tests rather than a quibble of NIR's
semantics.
To mitigate, we define a new NIR opcode with the semantics we want and translate
fsin(x) = fsin_mdg(x / pi), where that equivalence does hold mathematically. So
the new translation is sound and doesn't rely on lucky pass ordering.
This matches the approach already used for AMD and AGX, which have fsin_amd and
fsin_agx opcodes respectively.
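The following minimal Python sketch shows why only the new translation survives constant folding. It models fsin with its radians definition and fsin_mdg as a sine whose argument is in multiples of pi. It then evaluates both rewrites the way a constant folder would:

```python
import math


def fsin(x: float) -> float:
    """NIR fsin: argument in radians."""
    return math.sin(x)


def fsin_mdg(t: float) -> float:
    """Midgard-style sine: argument in multiples of pi."""
    return math.sin(math.pi * t)


def lower_sound(x: float) -> float:
    # fsin(x) -> fsin_mdg(x / pi): equal by definition, so folding is safe.
    return fsin_mdg(x / math.pi)


def lower_unsound(x: float) -> float:
    # Old rewrite fsin(x) -> fsin(x / pi), folded with fsin's real meaning.
    return fsin(x / math.pi)
```

For x = 1.0, `lower_sound` matches `sin(1.0)` to rounding error. `lower_unsound` yields `sin(1/pi)`, about 0.31 instead of 0.84, which is the kind of mismatch that surfaced as failing tests.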
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19350>