While this is a generic bit twiddling ALU instruction, it's especially useful
for address calculations, since the architecture's tiled textures use Morton
coding within the tiles.
This will be used when lowering image_texel_address on AGX, as part of the image
atomics implementation. I don't know if there's any other neat uses I could
detect with opt_algebraic, this doesn't seem like an operation a shader would
open-code... Maybe useful for BVH building or something...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23513>
We already document a lot of ALU opcodes, let's make this machine-readable so we
can put the descriptions in our generated HTML documentation.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22929>
Models `(a * b) + (c << d)` in general, as implemented in various forms on AGX.
This will be fused with backend NIR opt algebraic rules, both for the literal
pattern as well as to strength reduce certain multiplications, e.g. replacing
a * 5 with `a + (a << 2)` expressed as imadshl_agx(a, 1, a, 2).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22695>
We need to do full pow if 64-bit, and we can do fpow() otherwise. Not
the other way around.
Fixes: 9076c4e289 ("nir: update opcode definitions for different bit sizes")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22774>
For AMD GPU which has instruction to normalize and pack two float16
inputs, and used when fragment shader export color output.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21552>
Builds on the work of !15121. This gets to delete even more code
because many drivers shared a lot of code for i2b and f2b.
No shader-db or fossil-db changes on any Intel platform.
v2: Rebase on 1a35acd8d9.
v3: Update a comment in nir_opcodes_c.py. Suggested by Konstantin.
v4: Another rebase. Remove f2b stuff from Midgard.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20509>
Same as pack_half_2x16_rtz_split, but always uses RTZ mode.
Note that pack_half_2x16 rounding mode is unspecified.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15838>
NIR has a fsin instruction that takes an argument in radians. Midgard instead
has an fsinpi argument that takes an argument in multiples of pi. So, we had a
NIR pass that would change fsin(x) to fsin(x / pi) and then map fsin to fsinpi
in the backend.
But that's invalid! In NIR, the opcode fsin is well-defined. fsin(x) means
something very different than fsin(x / pi). They won't usually be equal. The
transform fsin(x) -> fsin(x / pi) is fundamentally unsound.
It did work before, by accident. Most NIR passes don't care about the semantics
of ALU instructions. fsin(x) and fsin(x / pi) are both well-defined but
fundamentally different NIR shaders. So while rewriting is wrong -- the NIR we
get out is not equivalent to the NIR we put in, and the Midgard ops we generate
are not equivalent to the NIR -- but if we don't run any passes that care about
the definition of fsin the two wrongs will cancel out to make a right.
However, some NIR passes do care about the definitions of ALU instructions,
instead of treating them as named black boxes. In particular, constant folding
(nir_opt_constant_fold) evaluates ALU instructions when their inputs are
constants, according to the definition in nir_opcodes.py. So our little charade
will only work if we don't call nir_opt_constant_fold, or if all the fsin
instructions have non-constant inputs. At the beginning of this series, that is
the case. With the later scalarization change, that's no longer the case, and
the unsoundness translates to real failing tests rather than a quibble of NIR's
semantics.
To mitigate, we define a new NIR opcode with the semantics we want and translate
fsin(x) = fsin_mdg(x / pi), where that equivalence does hold mathematically. So
the new translation is sound and doesn't rely on lucky pass ordering.
This matches the approach already used for AMD and AGX, which have fsin_amd and
fsin_agx opcodes respectively.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19350>
There are a lot of optimizations in opt_algebraic that match ('ine', a,
0), but there are almost none that match i2b. Instead of adding a huge
pile of additional patterns (including variations that include both ine
and i2b), always lower i2b to a != 0.
At this point in the series, it should be impossible for anything to
generate i2b, so there /should not/ be any changes.
The failing test on d3d12 is a pre-existing bug that is triggered by
this change. I talked to Jesse about it, and, after some analysis, he
suggested just adding it to the list of known failures.
v2: Don't rematerialize i2b instructions in dxil_nir_lower_x2b.
v3: Don't rematerialize i2b instructions in zink_nir_algebraic.py.
v4: Fix zink-on-TGL CI failures by calling nir_opt_algebraic after
nir_lower_doubles makes progress. The latter can generate b2i
instructions, but nir_lower_int64 can't handle them (anymore).
v5: Add back most of the hunk at line 2125 of nir_opt_algebraic.py. I
had accidentally removed the f2b(bf2(x)) optimization.
v6: Just eliminate the i2b instruction.
v7: Remove missed i2b32 in midgard_compile.c. Remove (now unused)
emit_alu_i2orf2_b1 function from sfn_instr_alu.cpp. Previously this
function was still used. 🤷
No shader-db changes on any Intel platform.
All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141165875 -> 141165873 (-0.0%)
Instructions helped: 2
Cycles in all programs: 9098956382 -> 9098956350 (-0.0%)
Cycles helped: 2
The two Vulkan shaders are helped because of the "new" (('b2i32',
('ine', ('ubfe', a, b, 1), 0)), ('ubfe', a, b, 1)) algebraic pattern.
Acked-by: Jesse Natalie <jenatali@microsoft.com> [earlier version]
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev> [earlier version]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
For example if src0 is 0x80000000 we should return 1, not 0.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: a5747f8ab3 ("nir: add opcodes for *find_msb_rev and lowering")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18951>
These opcodes where fixed to return an integer instead of a boolean
value some time ago but the documentation for them was not updated
and still talked about a boolean result.
Fixes: b0d4ee520 ('nir/opcodes: Fix up uadd_carry and usub_borrow')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17372>
There are several places that should have supported the various sized
versions of bcsel and the various nir_op_[fi]csel_* opcodes. Rather
than enumerate the whole list, add a property.
v2: Make the comment for NIR_OP_IS_SELECTION more descriptive.
Suggested by Jason.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17048>
These instructions have AMD hardware equivalent and they will be used
to lower fragment shader outputs in NIR.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15231>
This required relaxing a core NIR assertion which I don't think is doing
any important validation.
The shader-db effects here are small, but they're important for avoiding a
regression when we start doing per-component DCE in opt_shrink_vectors
(https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12468)
softpipe shader-db:
total instructions in shared programs: 2859777 -> 2859454 (-0.01%)
instructions in affected programs: 18881 -> 18558 (-1.71%)
total temps in shared programs: 293994 -> 293914 (-0.03%)
temps in affected programs: 418 -> 338 (-19.14%)
i915g:
total instructions in shared programs: 407562 -> 407544 (<.01%)
instructions in affected programs: 570 -> 552 (-3.16%)
r300:
total instructions in shared programs: 1414450 -> 1414459 (<.01%)
instructions in affected programs: 44494 -> 44503 (0.02%)
total vinst in shared programs: 473782 -> 473727 (-0.01%)
vinst in affected programs: 1102 -> 1047 (-4.99%)
total sinst in shared programs: 231224 -> 231216 (<.01%)
sinst in affected programs: 432 -> 424 (-1.85%)
total temps in shared programs: 197605 -> 197607 (<.01%)
temps in affected programs: 103 -> 105 (1.94%)
crocus hsw:
total instructions in shared programs: 8158185 -> 8158134 (<.01%)
instructions in affected programs: 10927 -> 10876 (-0.47%)
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15178>
Extend 4195a9450b so that the next poor fool doesn't come along and
say, "sge does the right thing for 16-bit sources, but slt gives a NIR
validation failure. What the deuce?"
NOTE: This commit is necessary to prevent regressions in GLSLstd450Step
tests of 16-bit sources at "spriv: Produce correct result for
GLSLstd450Step with NaN".
Fixes: 4195a9450b ("nir: sge operation is defined for floating-point types")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13999>
This fixes dEQP-VK.graphicsfuzz.cov-condition-bitfield-extract-integer.
For example, nir_ibitfield_extract(3, 1, 2) should return 1.
Cc: 21.3 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13791>
Add derivative opcodes fddx_must_abs_mali/fddy_must_abs_mali satisfying:
fabs(fdd*_must_abs_mali(v)) = fabs(fdd*(v))
The sign of their result is undefined.
On Bifrost and Valhall, these unsigned derivatives can be implemented
more efficiently than the correctly-signed counterparts, since the sign
fixup requires extra ALU instructions. On backends where this is the
case, it is useful to optimize fabs(fdd*(v)) to
fabs(fdd*_must_abs_mali(v)). This pattern comes up with the GLSL builtin
`fwidth`.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12332>
Six opcodes are added: sdot_4x8_iadd, udot_4x8_uadd, sudot_4x8_iadd,
sdot_4x8_iadd_sat, udot_4x8_uadd_sate, and sudot_4x8_iadd_sat. These
represent the combinations of integer dot-product and add that operate
on packed source vectors. That is, the four 8-bit values for each
vector is stored in a single 32-bit integer.
Some hardware may prefer to operate on unpacked byte vectors. When such
hardware comes to Mesa, we'll have to figure out how to name things.
v2: Add nir_op_iudp4a and nir_op_iudp4a_sat instructions. These opcodes
are not 2-source commutative.
v3: Rename all opcodes to be more like some existing 4x8 opcodes.
Suggested by Timur. Change type of packed vector sources to uint32,
change types of constant folding variables to have explicit size, and
delete some extra casts. All suggested by Jason.
v4: Fix typo previously noticed by Alyssa but missed in v2.
v5: Add has_sudot_4x8 flag. Requested by Rhys.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
A lot of CTS tests write a u8vec4 or an i8vec4 to an SSBO. This results
in a lot of shifts and MOVs. When that pattern can be recognized, the
individual 8-bit components can be packed much more efficiently.
v2: Rebase on b4369de27f ("nir/lower_packing: use
shader_instructions_pass")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>
Many places need to know the maximum or minimum possible value for a
given size integer... so everyone just open-codes their favorite
version. There is some potential to hit either undefined or
implementation-defined behavior, so having one version that Just Works
seems beneficial.
v2: Fix copy-and-pasted bug (INT64_MAX instead of INT64_MIN) in
u_intmin. Noticed by CI. Lol. Rename functions
`s/u_(uint|int)(min|max)/u_\1N_\2/g`. Suggested by Jason. Add some
unit tests that would have caught the copy-and-paste bug before wasting
CI time. Change the implementation of u_intN_min to use the same
pattern as stdint.h. This avoids the integer division. Noticed by
Jason.
v3: Add changes to convert_clear_color
(src/gallium/drivers/iris/iris_clear.c). Suggested by Nanley.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12177>
ifind_msb_rev was introduced in a5747f8ab3.
ifind_msb_rev guards against src0 being both 0 or -1 at the same time.
That is always true. This patch changes it to check for those values
individually.
Spotted from a compile warning.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes: a5747f8ab3 (\"nir: add opcodes for *find_msb_rev and lowering\")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11630>
This should be a subtract, not an add. The comment's proof is correct,
but the (wrong) expression we actually use isn't what it's in the
comment! Correct the discrepancy.
The lowering in nir_opt_algebraic was correctly typed.
Fixes: 272e927d0e ("nir/spirv: initial handling of OpenCL.std extension opcodes")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11671>
Long ago, the semantics of bcsel were such that it took a single boolean
value and selected between whole vectors. These days, it takes a vector
boolean with the assumption that if you want the old behavior you can
just use a .xxxx swizzle. There currently are no opcodes which use a
output_size of 0 but have a scalar or fixed-vector input. Let's
disallow it for now to force us to think through the semantics again if
this ever comes up as something someone actually wants.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11438>
NIR currently doesn't have any intrinsics for a horizontal packed add,
so this one is modeled after AMD's v_sad_u8.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11072>
These are equivalent to the 32bit opcodes if there are no more efficient
24bit opcodes available, but inputs are guaranteed to already be 24bit,
so the 24bit opcodes can be used instead if they exist and are efficient.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10549>
Used to split up the fsin/fcos lowering for AGX between NIR and the
backend, to permit algebraic optimizations without polluting NIR with
too many hardware details. The backend NIR lowering produces an
fmul/ffma of the input so we can optimize code like sin(2*x).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10582>
HLSL doesn't support bitcasting a 64bit integer to a double. DXIL
doesn't have generic pack/unpack instructions, so we lower those to
integer bitwise ops. As a result, NIR generic double pack/unpack would
require our backend to emit a bitcast to get a double, but we want
to match HLSL semantics and emit MakeDouble/SplitDouble.
Adding a dedicated opcode for double pack/unpack allows us to add a
pass to emit that instead, which lets our backend emit the right
instruction to pack and unpack doubles.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10063>