diff --git a/src/panfrost/bifrost/valhall/ISA.xml b/src/panfrost/bifrost/valhall/ISA.xml
index 99fbbdcf9f2..3dcd81fb72b 100644
--- a/src/panfrost/bifrost/valhall/ISA.xml
+++ b/src/panfrost/bifrost/valhall/ISA.xml
@@ -576,7 +576,7 @@
v2inf
-
+
Do nothing. Useful at the start of a block for waiting on slots required
by the first actual instruction of the block, to reconcile dependencies
@@ -584,7 +584,7 @@
-
+
Branches to a specified relative offset if its source is nonzero (default)
or if its source is zero (if `.eq` is set). The offset is 27 bits and
@@ -605,7 +605,7 @@
-
+
Evaluates the given condition, and if it passes, discards the current
fragment and terminates the thread. The destination should be set to R60.
@@ -617,7 +617,7 @@
Right value to compare
-
+
Jump to an indirectly specified address. Used to jump to blend shaders at
the end of a fragment shader.
@@ -627,7 +627,7 @@
-
+
General-purpose barrier. Must use slot #7. Must be paired with a
`.barrier` action on the instruction.
@@ -635,7 +635,7 @@
-
+
@@ -649,7 +649,7 @@
Return value if false
-
+
@@ -670,7 +670,7 @@
Return value if false
-
+
@@ -680,7 +680,7 @@
-
+
Interpolates a given varying
@@ -694,7 +694,7 @@
-
+
@@ -705,7 +705,7 @@
-
+
The index must not diverge within a warp.
@@ -717,7 +717,7 @@
Index
-
+
Loads the effective address of the position buffer (in a position shader)
or the varying buffer (in a varying shader). That is, the base pointer
@@ -736,7 +736,7 @@
Linear ID
-
+
Loads from main memory
@@ -747,7 +747,7 @@
-
+
Loads from main memory
@@ -758,7 +758,7 @@
-
+
Loads from main memory
@@ -769,7 +769,7 @@
-
+
Loads from main memory
@@ -780,7 +780,7 @@
-
+
Loads from main memory
@@ -791,7 +791,7 @@
-
+
Loads from main memory
@@ -802,7 +802,7 @@
-
+
Loads from main memory
@@ -813,7 +813,7 @@
-
+
Loads from main memory
@@ -824,7 +824,7 @@
-
+
Stores to main memory
@@ -842,7 +842,7 @@
-
+
Stores to images
@@ -850,7 +850,7 @@
Address to store to after adding offset
-
+
Loads a given render target, specified in the pixel indices descriptor, at
a given location and sample, and converts it to the format specified in the
@@ -865,7 +865,7 @@
Conversion descriptor
-
+
Blends a given render target. This loads the API-specified blend state for
the render target from the first source. Blend descriptors are available
@@ -901,7 +901,7 @@
-
+
Does alpha-to-coverage testing, updating the sample coverage mask. ATEST
does not do an implicit discard. It should be executed before the first
@@ -914,7 +914,7 @@
-
+
Programmatically writes out depth, stencil, or both, depending on which
modifiers are set. Used to implement gl_FragDepth and gl_FragStencil.
@@ -927,7 +927,7 @@
Input coverage mask
-
+
Performs the given data conversion. Note that floating-point rounding is
handled via the same hardware and therefore shares an encoding. Round mode
@@ -950,7 +950,7 @@
Value to convert
-
+
Performs the given data conversion.
@@ -958,7 +958,7 @@
Value to convert
-
+
Performs the given data conversion.
@@ -968,13 +968,13 @@
Value to convert
-
+
Converts up with the specified round mode.
Value to convert
-
+
Performs the given data conversion.
@@ -992,7 +992,7 @@
Value to convert
-
+
Performs the given rounding, using the convert unit.
@@ -1004,33 +1004,33 @@
Value to convert
-
+
Canonical register-to-register move.
-
+
Used as a primitive for various bitwise operations.
-
+
Used as a primitive for various bitwise operations.
-
+
Used as a primitive for various bitwise operations.
-
+
64-bit abs may be constructed in 4 instructions (5 clocks) by checking the
sign with `ICMP.s32.lt.m1 hi, 0` and negating based on the result with
@@ -1039,15 +1039,15 @@
-
+
-
+
-
+
Only available as 32-bit. Smaller bitsizes require explicit conversions.
64-bit popcount may be constructed in 3 clocks by separate 32-bit
@@ -1057,28 +1057,29 @@
-
+
Only available as 32-bit. Other bitsizes may be derived with swizzles.
-
+
For fully featured bitwise operation, see the shift opcodes.
-
+
For fully featured bitwise operation, see the shift opcodes.
-
+
+
Returns the mask of lanes ever active within the warp (subgroup), such
that the source is nonzero. The number of work-items in a subgroup is
@@ -1094,7 +1095,7 @@
-
+
@@ -1109,7 +1110,7 @@
-
+
@@ -1121,10 +1122,10 @@
The logarithm instruction (`FLOGD.f32`) requires an argument reduction. See the
transcendentals section for more information.
-
+
-
+
@@ -1134,7 +1135,7 @@
-
+
$A + B$
@@ -1143,7 +1144,7 @@
B
-
+
$\min \{ A, B \}$
@@ -1152,7 +1153,7 @@
B
-
+
$\max \{ A, B \}$
@@ -1161,7 +1162,7 @@
B
-
+
Given a pair of 32-bit floats, output a pair of 16-bit floats packed into
@@ -1171,7 +1172,7 @@
B
-
+
@@ -1185,7 +1186,7 @@
B
-
+
Calculates the base-2 exponent of an argument specified as an 8:24
fixed-point. The original argument is passed as well for correct handling
@@ -1196,7 +1197,7 @@
Input as 32-bit float
-
+
Performs a floating-point addition specialized for logarithm computation.
@@ -1205,7 +1206,7 @@
B
-
+
$A + B$ with optional saturation.
@@ -1226,13 +1227,13 @@
-
+
Calculates $A | (B \ll 16)$. Used to implement `(ushort2)(A, B)`
A
B
-
+
@@ -1247,7 +1248,7 @@
-
+
Sign- or zero-extend B to 64 bits, left-shift by `shift`, and add the
64-bit value A. These instructions accelerate address arithmetic, but may
@@ -1260,7 +1261,7 @@
B
-
+
@@ -1281,7 +1282,8 @@
-
+
+
@@ -1298,7 +1300,7 @@
-
+
@@ -1320,7 +1322,7 @@
-
+
$A \cdot B + C$
@@ -1330,7 +1332,7 @@
C
-
+
@@ -1346,7 +1348,7 @@
B
-
+
@@ -1362,7 +1364,7 @@
B
-
+
@@ -1378,7 +1380,7 @@
B
-
+
@@ -1394,7 +1396,7 @@
B
-
+
@@ -1410,7 +1412,7 @@
B
-
+
@@ -1426,7 +1428,7 @@
B
-
+
Mux between A and B based on the provided mask. Equivalent to
`bitselect()` in OpenCL. `(A & mask) | (B & ~mask)`
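
A minimal Python sketch of a per-bit mux on 32-bit values, assuming that set mask bits select A and clear mask bits select B. The name `mux32` and the 32-bit width are illustrative assumptions, not part of the ISA description.

```python
MASK32 = 0xFFFFFFFF  # assumed operand width for this illustration


def mux32(a: int, b: int, mask: int) -> int:
    """Per-bit select: bits of a where mask is 1, bits of b where mask is 0."""
    # Python integers are unbounded, so clamp the result to 32 bits.
    return ((a & mask) | (b & ~mask)) & MASK32
```

With an all-ones mask the result is A, and with a zero mask it is B, which is a quick sanity check for any implementation of the operation.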
@@ -1436,21 +1438,21 @@
Mask
-
+
During a cube map transform, select the S coordinate given a selected face.
Z coordinate as 32-bit floating point
X coordinate as 32-bit floating point
Cube face index
-
+
During a cube map transform, select the T coordinate given a selected face.
Y coordinate as 32-bit floating point
Z coordinate as 32-bit floating point
Cube face index
-
+
Calculates $A | (B \ll 8) | (CD \ll 16)$ for 8-bit A and B and 16-bit CD.
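
The packing formula above can be sketched in Python as follows. The function name `pack_8_8_16` is hypothetical, and the explicit masking of each input to its stated width is an assumption made so the sketch is safe for out-of-range arguments.

```python
def pack_8_8_16(a: int, b: int, cd: int) -> int:
    """Pack 8-bit a, 8-bit b and 16-bit cd into one 32-bit word."""
    # a occupies bits 0-7, b bits 8-15, cd bits 16-31.
    return (a & 0xFF) | ((b & 0xFF) << 8) | ((cd & 0xFFFF) << 16)
```

For example, `pack_8_8_16(0x11, 0x22, 0x4433)` places `0x11` in the lowest byte, `0x22` in the next byte, and `0x4433` in the upper half-word.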
@@ -1465,21 +1467,22 @@
CD
-
+
Select the maximum absolute value of its arguments.
X coordinate as 32-bit floating point
Y coordinate as 32-bit floating point
Z coordinate as 32-bit floating point
-
+
Select the cube face index corresponding to the arguments.
X coordinate as 32-bit floating point
Y coordinate as 32-bit floating point
Z coordinate as 32-bit floating point
-
+
+
8-bit integer dot product between 4-channel vectors, intended for machine
learning. Available in both unsigned and signed variants, controlling
@@ -1500,7 +1503,7 @@
-
+
Evaluates the given condition, performs a logical and/or with the condition in
the result source, and return in the given result type (integer
@@ -1528,7 +1531,7 @@
C
-
+
Evaluates the given condition, performs a logical and/or with the condition in
the result source, and return in the given result type (integer
@@ -1547,7 +1550,7 @@
C
-
+
Evaluates the given condition, performs a logical and/or with the condition in
the result source, and return in the given result type (integer
@@ -1575,7 +1578,7 @@
C
-
+
Adds an arbitrary 32-bit immediate embedded within the instruction stream.
If no modifiers are required, this is preferred to `IADD.i32` with a
@@ -1588,7 +1591,7 @@
-
+
Adds an arbitrary pair of 16-bit immediates embedded within the
instruction stream. If no modifiers are required, this is preferred to
@@ -1600,7 +1603,7 @@
-
+
Adds an arbitrary quad of 8-bit immediates embedded within the
instruction stream. If no modifiers are required, this is preferred to
@@ -1612,7 +1615,7 @@
-
+
Adds an arbitrary 32-bit immediate embedded within the instruction stream.
If no modifiers are required, this is preferred to `FADD.f32` with a
@@ -1623,7 +1626,7 @@
-
+
Adds an arbitrary pair of 16-bit immediates embedded within the
instruction stream. If no modifiers are required, this is preferred to
@@ -1635,7 +1638,7 @@
-
+
@@ -1646,7 +1649,7 @@
-
+
@@ -1657,7 +1660,7 @@
-
+
Unfiltered texturing instruction.
@@ -1669,7 +1672,7 @@
Image to read from
-
+
Ordinary texturing instruction using a sampler.
@@ -1683,8 +1686,11 @@
-
- Only works for FP32 varyings.
+
+
+ Only works for FP32 varyings. Performance characteristics are similar
+ to LD_VAR_IMM_F32.v2.f32 followed by TEX, using both V and T units.
+
@@ -1692,7 +1698,7 @@
Image to read from
-
+
First calculates $A \cdot B + C$ and then biases the exponent by D. Used in
special transcendental function sequences. It should not be used for