diff --git a/src/panfrost/bifrost/valhall/ISA.xml b/src/panfrost/bifrost/valhall/ISA.xml index 99fbbdcf9f2..3dcd81fb72b 100644 --- a/src/panfrost/bifrost/valhall/ISA.xml +++ b/src/panfrost/bifrost/valhall/ISA.xml @@ -576,7 +576,7 @@ v2inf - + Do nothing. Useful at the start of a block for waiting on slots required by the first actual instruction of the block, to reconcile dependencies @@ -584,7 +584,7 @@ - + Branches to a specified relative offset if its source is nonzero (default) or if its source is zero (if `.eq` is set). The offset is 27-bits and @@ -605,7 +605,7 @@ - + Evaluates the given condition, and if it passes, discards the current fragment and terminates the thread. The destination should be set to R60. @@ -617,7 +617,7 @@ Right value to compare - + Jump to an indirectly specified address. Used to jump to blend shaders at the end of a fragment shader. @@ -627,7 +627,7 @@ - + General-purpose barrier. Must use slot #7. Must be paired with a `.barrier` action on the instruction. @@ -635,7 +635,7 @@ - + @@ -649,7 +649,7 @@ Return value if false - + @@ -670,7 +670,7 @@ Return value if false - + @@ -680,7 +680,7 @@ - + Interpolates a given varying @@ -694,7 +694,7 @@ - + @@ -705,7 +705,7 @@ - + The index must not diverge within a warp. @@ -717,7 +717,7 @@ Index - + Loads the effective address of the position buffer (in a position shader) or the varying buffer (in a varying shader). 
That is, the base pointer @@ -736,7 +736,7 @@ Linear ID - + Loads from main memory @@ -747,7 +747,7 @@ - + Loads from main memory @@ -758,7 +758,7 @@ - + Loads from main memory @@ -769,7 +769,7 @@ - + Loads from main memory @@ -780,7 +780,7 @@ - + Loads from main memory @@ -791,7 +791,7 @@ - + Loads from main memory @@ -802,7 +802,7 @@ - + Loads from main memory @@ -813,7 +813,7 @@ - + Loads from main memory @@ -824,7 +824,7 @@ - + Stores to main memory @@ -842,7 +842,7 @@ - + Stores to images @@ -850,7 +850,7 @@ Address to store to after adding offset - + Loads a given render target, specified in the pixel indices descriptor, at a given location and sample, and converts to the format specified in the @@ -865,7 +865,7 @@ Conversion descriptor - + Blends a given render target. This loads the API-specified blend state for the render target from the first source. Blend descriptors are available @@ -901,7 +901,7 @@ - + Does alpha-to-coverage testing, updating the sample coverage mask. ATEST does not do an implicit discard. It should be executed before the first @@ -914,7 +914,7 @@ - + Programmatically writes out depth, stencil, or both, depending on which modifiers are set. Used to implement gl_FragDepth and gl_FragStencil. @@ -927,7 +927,7 @@ Input coverage mask - + Performs the given data conversion. Note that floating-point rounding is handled via the same hardware and therefore shares an encoding. Round mode @@ -950,7 +950,7 @@ Value to convert - + Performs the given data conversion. @@ -958,7 +958,7 @@ Value to convert - + Performs the given data conversion. @@ -968,13 +968,13 @@ Value to convert - + Converts up with the specified round mode. Value to convert - + Performs the given data conversion. @@ -992,7 +992,7 @@ Value to convert - + Performs the given rounding, using the convert unit. @@ -1004,33 +1004,33 @@ Value to convert - + Canonical register-to-register move. - + Used as a primitive for various bitwise operations. 
- + Used as a primitive for various bitwise operations. - + Used as a primitive for various bitwise operations. - + 64-bit abs may be constructed in 4 instructions (5 clocks) by checking the sign with `ICMP.s32.lt.m1 hi, 0` and negating based on the result with @@ -1039,15 +1039,15 @@ - + - + - + Only available as 32-bit. Smaller bitsizes require explicit conversions. 64-bit popcount may be constructed in 3 clocks by separate 32-bit @@ -1057,28 +1057,29 @@ - + Only available as 32-bit. Other bitsizes may be derived with swizzles. - + For fully featured bitwise operation, see the shift opcodes. - + For fully featured bitwise operation, see the shift opcodes. - + + Returns the mask of lanes ever active within the warp (subgroup), such that the source is nonzero. The number of work-items in a subgroup is @@ -1094,7 +1095,7 @@ - + @@ -1109,7 +1110,7 @@ - + @@ -1121,10 +1122,10 @@ The logarithm instruction (`FLOGD.f32`) requires an argument reduction. See the transcendentals section for more information. - + - + @@ -1134,7 +1135,7 @@ - + $A + B$ @@ -1143,7 +1144,7 @@ B - + $\min \{ A, B \}$ @@ -1152,7 +1153,7 @@ B - + $\max \{ A, B \}$ @@ -1161,7 +1162,7 @@ B - + Given a pair of 32-bit floats, output a pair of 16-bit floats packed into @@ -1171,7 +1172,7 @@ B - + @@ -1185,7 +1186,7 @@ B - + Calculates the base-2 exponent of an argument specified as a 8:24 fixed-point. The original argument is passed as well for correct handling @@ -1196,7 +1197,7 @@ Input as 32-bit float - + Performs a floating-point addition specialized for logarithm computation. @@ -1205,7 +1206,7 @@ B - + $A + B$ with optional saturation. @@ -1226,13 +1227,13 @@ - + Calculates $A | (B \ll 16)$. Used to implement `(ushort2)(A, B)` A B - + @@ -1247,7 +1248,7 @@ - + Sign or zero extend B to 64-bits, left-shift by `shift`, and add the 64-bit value A. 
These instructions accelerate address arithmetic, but may @@ -1260,7 +1261,7 @@ B - + @@ -1281,7 +1282,8 @@ - + + @@ -1298,7 +1300,7 @@ - + @@ -1320,7 +1322,7 @@ - + $A \cdot B + C$ @@ -1330,7 +1332,7 @@ C - + @@ -1346,7 +1348,7 @@ B - + @@ -1362,7 +1364,7 @@ B - + @@ -1378,7 +1380,7 @@ B - + @@ -1394,7 +1396,7 @@ B - + @@ -1410,7 +1412,7 @@ B - + @@ -1426,7 +1428,7 @@ B - + Mux between A and B based on the provided mask. Equivalent to `bitselect()` in OpenCL. `(A & ~mask) | (B & mask)` @@ -1436,21 +1438,21 @@ Mask - + During a cube map transform, select the S coordinate given a selected face. Z coordinate as 32-bit floating point X coordinate as 32-bit floating point Cube face index - + During a cube map transform, select the T coordinate given a selected face. Y coordinate as 32-bit floating point Z coordinate as 32-bit floating point Cube face index - + Calculates $A | (B \ll 8) | (CD \ll 16)$ for 8-bit A and B and 16-bit CD. @@ -1465,21 +1467,22 @@ CD - + Select the maximum absolute value of its arguments. X coordinate as 32-bit floating point Y coordinate as 32-bit floating point Z coordinate as 32-bit floating point - + Select the cube face index corresponding to the arguments. X coordinate as 32-bit floating point Y coordinate as 32-bit floating point Z coordinate as 32-bit floating point - + + 8-bit integer dot product between 4 channel vectors, intended for machine learning. 
Available in both unsigned and signed variants, controlling @@ -1500,7 +1503,7 @@ - + Evaluates the given condition, does a logical and/or with the condition in the result source, and returns the result in the given result type (integer @@ -1528,7 +1531,7 @@ C - + Evaluates the given condition, does a logical and/or with the condition in the result source, and returns the result in the given result type (integer @@ -1547,7 +1550,7 @@ C - + Evaluates the given condition, does a logical and/or with the condition in the result source, and returns the result in the given result type (integer @@ -1575,7 +1578,7 @@ C - + Adds an arbitrary 32-bit immediate embedded within the instruction stream. If no modifiers are required, this is preferred to `IADD.i32` with a @@ -1588,7 +1591,7 @@ - + Adds an arbitrary pair of 16-bit immediates embedded within the instruction stream. If no modifiers are required, this is preferred to @@ -1600,7 +1603,7 @@ - + Adds an arbitrary quad of 8-bit immediates embedded within the instruction stream. If no modifiers are required, this is preferred to @@ -1612,7 +1615,7 @@ - + Adds an arbitrary 32-bit immediate embedded within the instruction stream. If no modifiers are required, this is preferred to `FADD.f32` with a @@ -1623,7 +1626,7 @@ - + Adds an arbitrary pair of 16-bit immediates embedded within the instruction stream. If no modifiers are required, this is preferred to @@ -1635,7 +1638,7 @@ - + @@ -1646,7 +1649,7 @@ - + @@ -1657,7 +1660,7 @@ - + Unfiltered texturing instruction. @@ -1669,7 +1672,7 @@ Image to read from - + Ordinary texturing instruction using a sampler. @@ -1683,8 +1686,11 @@ - - Only works for FP32 varyings. + + + Only works for FP32 varyings. Performance characteristics are similar + to LD_VAR_IMM_F32.v2.f32 followed by TEX, using both V and T units. + @@ -1692,7 +1698,7 @@ Image to read from - + First calculates $A \cdot B + C$ and then biases the exponent by D. Used in special transcendental function sequences. It should not be used for