From e62e3c201065c501daf74d70e06fe1e7893522cd Mon Sep 17 00:00:00 2001 From: Lars-Ivar Hesselberg Simonsen Date: Thu, 5 Mar 2026 17:28:04 +0100 Subject: [PATCH] pan/va/ISA: Add v15 opcodes --- src/panfrost/compiler/bifrost/valhall/ISA.xml | 1292 +++++++++++++++++ 1 file changed, 1292 insertions(+) diff --git a/src/panfrost/compiler/bifrost/valhall/ISA.xml b/src/panfrost/compiler/bifrost/valhall/ISA.xml index 9c412ebf469..6fc6e0d12de 100644 --- a/src/panfrost/compiler/bifrost/valhall/ISA.xml +++ b/src/panfrost/compiler/bifrost/valhall/ISA.xml @@ -787,6 +787,13 @@ + + + + + + + Do nothing. Useful at the start of a block for waiting on slots required by the first actual instruction of the block, to reconcile dependencies @@ -798,6 +805,12 @@ + + + + + + Branches to a specified relative offset if its source is nonzero (default) or if its source is zero (if `.eq` is set). The offset is 27-bits and @@ -823,6 +836,12 @@ + + + + + + Evaluates the given condition, and if it passes, discards the current fragment and terminates the thread. Only valid in a **fragment** shader. @@ -836,6 +855,12 @@ + + + + + + Jump to an indirectly specified (absolute or relative) address. Used to jump to blend shaders at the end of a fragment shader. @@ -851,6 +876,13 @@ + + + + + + + General-purpose barrier. Must use slot #7. Must be paired with a `.wait` flow on the instruction. @@ -863,11 +895,21 @@ + + + + + + + + + + Evaluates the given condition and outputs either the true source or the @@ -885,21 +927,41 @@ + + + + + + + + + + + + + + + + + + + + Evaluates the given condition and outputs either the true source or the @@ -921,6 +983,13 @@ + + + + + + + @@ -936,6 +1005,13 @@ + + + + + + + Fetches a given flat varying from hardware buffer @@ -949,6 +1025,13 @@ + + + + + + + Fetches a given flat varying from hardware buffer @@ -964,11 +1047,27 @@ + + + + + + + + + + + + + + + + @@ -988,11 +1087,25 @@ + + + + + + + + + + + + + + @@ -1010,6 +1123,12 @@ + + + + + + Interpolates a given varying from a software buffer @@ -1026,6 +1145,13 @@ + + + + + + + Interpolates a given varying from a software buffer @@ -1043,6 +1169,13 @@ + + + + + + + Fetches a given varying from a software buffer @@ -1056,6 +1189,13 @@ + + + + + + + Fetches a given varying from a software buffer @@ -1071,6 +1211,12 @@ + + + + + + Load `vecsize` components from the attribute descriptor at entry `index` of resource table `table` at index (vertex ID, instance ID), converting @@ -1092,6 +1238,13 @@ + + + + + + + Load `vecsize` components from the attribute descriptor at the specified location at index (vertex ID, instance ID), converting @@ -1113,6 +1266,13 @@ + + + + + + + Load the 64-bit global clock, either a cycle counter or the system clock. @@ -1124,6 +1284,12 @@ + + + + + + Load `vecsize` components from the texture descriptor at entry `index` of resource table `table`, converting @@ -1145,6 +1311,13 @@ + + + + + + + Load `vecsize` components from the texture descriptor at the specified location at index, converting @@ -1165,6 +1338,12 @@ + + + + + + Load the effective address of an attribute specified with the given immediate index. Returns three staging register: the low/high @@ -1184,6 +1363,13 @@ + + + + + + + Load the effective address of an attribute specified with the given index. Returns three staging register: the low/high @@ -1203,6 +1389,12 @@ + + + + + + Load the effective address of a texel from the image specified with the given immediate index. Returns three staging registers: the low/high @@ -1227,6 +1419,13 @@ + + + + + + + Load the effective address of a texel from the image specified with the given index. Returns three staging register: the low/high @@ -1251,6 +1450,13 @@ + + + + + + + Loads a buffer descriptor. If bits 25...31 of the mode descriptor are all-ones, load from the buffer descriptors in the table indexed by the @@ -1272,6 +1478,13 @@ + + + + + + + Loads a buffer descriptor. If bits 25...31 of the mode descriptor are all-ones, load from the buffer descriptors in the table indexed by the @@ -1293,6 +1506,13 @@ + + + + + + + Loads a buffer descriptor. If bits 25...31 of the mode descriptor are all-ones, load from the buffer descriptors in the table indexed by the @@ -1314,6 +1534,13 @@ + + + + + + + Loads a buffer descriptor. If bits 25...31 of the mode descriptor are all-ones, load from the buffer descriptors in the table indexed by the @@ -1335,6 +1562,13 @@ + + + + + + + Loads a buffer descriptor. If bits 25...31 of the mode descriptor are all-ones, load from the buffer descriptors in the table indexed by the @@ -1356,6 +1590,13 @@ + + + + + + + Loads a buffer descriptor. If bits 25...31 of the mode descriptor are all-ones, load from the buffer descriptors in the table indexed by the @@ -1377,6 +1618,13 @@ + + + + + + + Loads a buffer descriptor. If bits 25...31 of the mode descriptor are all-ones, load from the buffer descriptors in the table indexed by the @@ -1398,6 +1646,13 @@ + + + + + + + Loads a buffer descriptor. If bits 25...31 of the mode descriptor are all-ones, load from the buffer descriptors in the table indexed by the @@ -1419,6 +1674,11 @@ + + + + + Load effective address of a buffer with an offset added. @@ -1433,6 +1693,12 @@ + + + + + + Load effective address of a buffer with an immediate offset added. @@ -1449,6 +1715,15 @@ + + + + + + + + + Loads from main memory @@ -1465,6 +1740,15 @@ + + + + + + + + + Loads from main memory @@ -1481,6 +1765,15 @@ + + + + + + + + + Loads from main memory @@ -1497,6 +1790,15 @@ + + + + + + + + + Loads from main memory @@ -1513,6 +1815,15 @@ + + + + + + + + + Loads from main memory @@ -1529,6 +1840,15 @@ + + + + + + + + + Loads from main memory @@ -1545,6 +1865,15 @@ + + + + + + + + + Loads from main memory @@ -1561,6 +1890,15 @@ + + + + + + + + + Loads from main memory @@ -1580,48 +1918,120 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + @@ -1634,6 +2044,11 @@ + + + + + Load effective address of a simple buffer with an offset added. @@ -1648,6 +2063,13 @@ + + + + + + + Load from memory with data conversion. The address to load from is given in the first source, which must be a 64-bit register (a pair of 32-bit @@ -1668,6 +2090,13 @@ + + + + + + + Store to memory with data conversion. The address to store to is given in the first source, which must be a 64-bit register (a pair of 32-bit @@ -1690,6 +2119,12 @@ + + + + + + Loads a given render target, specified in the pixel indices descriptor, at a given location and sample, and convert to the format specified in the @@ -1710,6 +2145,13 @@ + + + + + + + Store to given render target, specified in the pixel indices descriptor, at a given location and sample, and convert to the format specified in the @@ -1729,6 +2171,12 @@ + + + + + + Blends a given render target. This loads the API-specified blend state for the render target from the first source. Blend descriptors are available @@ -1768,6 +2216,13 @@ + + + + + + + Does alpha-to-coverage testing, updating the sample coverage mask. ATEST does not do an implicit discard. It should be executed before the first @@ -1784,6 +2239,13 @@ + + + + + + + Programatically writes out depth, stencil, or both, depending on which modifiers are set. Used to implement gl_FragDepth and gl_FragStencil. @@ -1818,6 +2280,11 @@ + + + + + @@ -1833,6 +2300,11 @@ + + + + + @@ -1849,6 +2321,11 @@ + + + + + @@ -1863,6 +2340,11 @@ + + + + + @@ -1883,12 +2365,22 @@ + + + + + + + + + + Value to convert @@ -1939,6 +2431,11 @@ + + + + + Converts up with the specified round mode. Value to convert @@ -1954,6 +2451,11 @@ + + + + + @@ -1969,6 +2471,11 @@ + + + + + @@ -1992,6 +2499,11 @@ + + + + + @@ -2006,6 +2518,11 @@ + + + + + @@ -2029,6 +2546,11 @@ + + + + + @@ -2048,6 +2570,11 @@ + + + + + Canonical register-to-register move. @@ -2057,6 +2584,11 @@ + + + + + Used as a primitive for various bitwise operations. @@ -2068,6 +2600,11 @@ + + + + + Used as a primitive for various bitwise operations. @@ -2079,6 +2616,11 @@ + + + + + Used as a primitive for various bitwise operations. @@ -2090,6 +2632,11 @@ + + + + + 64-bit abs may be constructed in 4 instructions (5 clocks) by checking the sign with `ICMP.s32.lt.m1 hi, 0` and negating based on the result with @@ -2103,6 +2650,11 @@ + + + + + @@ -2120,6 +2672,11 @@ + + + + + Only available as 32-bit. Smaller bitsizes require explicit conversions. 64-bit popcount may be constructed in 3 clocks by separate 32-bit @@ -2134,6 +2691,11 @@ + + + + + Only available as 32-bit. Other bitsizes may be derived with swizzles. @@ -2166,6 +2728,11 @@ + + + + + Returns the mask of lanes ever active within the warp (subgroup), such that the source is nonzero. The number of work-items in a subgroup is @@ -2187,12 +2754,22 @@ + + + + + + + + + + Flush special float values. The ftz modifier flushes subnormal values to @@ -2212,6 +2789,11 @@ + + + + + @@ -2225,6 +2807,11 @@ + + + + + @@ -2251,60 +2838,110 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Performs a given special function. The floating-point reciprocal (`FRCP`) @@ -2323,24 +2960,44 @@ + + + + + + + + + + + + + + + + + + + + Performs a given special function. The trigonometric tables @@ -2356,12 +3013,22 @@ + + + + + + + + + + $A + B$ @@ -2377,12 +3044,22 @@ + + + + + + + + + + $\min \{ A, B \}$ @@ -2396,12 +3073,22 @@ + + + + + + + + + + $\max \{ A, B \}$ @@ -2433,12 +3120,23 @@ + + + + + + + + + + + Computes $A \cdot 2^B$ by adding B to the exponent of A. Used to calculate @@ -2457,6 +3155,11 @@ + + + + + Calculates the base-2 exponent of an argument specified as a 8:24 fixed-point. The original argument is passed as well for correct handling @@ -2472,6 +3175,11 @@ + + + + + Performs a floating-point addition specialized for logarithm computation. @@ -2485,6 +3193,12 @@ + + + + + + Used for `atan2()` implementation. Destination is two 16-bit values (int and float) for the first form, and a single 32-bit float when @@ -2507,12 +3221,22 @@ + + + + + + + + + + @@ -2526,12 +3250,22 @@ + + + + + + + + + + @@ -2545,12 +3279,24 @@ + + + + + + + + + + + + A B @@ -2562,6 +3308,11 @@ + + + + + Calculates $A | (B \ll 16)$. Used to implement `(ushort2)(A, B)` A B @@ -2573,12 +3324,22 @@ + + + + + + + + + + @@ -2592,12 +3353,22 @@ + + + + + + + + + + @@ -2611,12 +3382,24 @@ + + + + + + + + + + + + $A - B$ with optional saturation A @@ -2637,6 +3420,12 @@ + + + + + + @@ -2655,12 +3444,24 @@ + + + + + + + + + + + + A @@ -2673,42 +3474,78 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + $A \cdot B$ with optional saturation. Note the multipliers can only handle up to @@ -2775,6 +3612,11 @@ + + + + + Selects the value of A in the subgroup lane given by B. This implements subgroup broadcasts. It may be used as a primitive for screen space @@ -2792,11 +3634,21 @@ + + + + + + + + + + $A \cdot B + C$ @@ -2812,24 +3664,48 @@ + + + + + + + + + + + + + + + + + + + + + + + + Left shifts its first source by a specified amount and bitwise ANDs it with the @@ -2847,24 +3723,48 @@ + + + + + + + + + + + + + + + + + + + + + + + + Right shifts its first source by a specified amount and bitwise ANDs it with the @@ -2885,24 +3785,48 @@ + + + + + + + + + + + + + + + + + + + + + + + + Left shifts its first source by a specified amount and bitwise ORs it with the @@ -2920,24 +3844,48 @@ + + + + + + + + + + + + + + + + + + + + + + + + Right shifts its first source by a specified amount and bitwise ORs it with the @@ -2958,24 +3906,48 @@ + + + + + + + + + + + + + + + + + + + + + + + + Left shifts its first source by a specified amount and bitwise XORs it with the @@ -2993,24 +3965,48 @@ + + + + + + + + + + + + + + + + + + + + + + + + Right shifts its first source by a specified amount and bitwise XORs it with the @@ -3029,6 +4025,12 @@ + + + + + + Mux between A and B based on the provided mask. The condition specified as the `mux` modifier is evaluated on the mask. If true, `A` is chosen, @@ -3046,6 +4048,12 @@ + + + + + + Mux between A and B based on the provided mask. The condition specified as the `mux` modifier is evaluated on the mask. If true, `A` is chosen, @@ -3063,6 +4071,12 @@ + + + + + + Mux between A and B based on the provided mask. The condition specified as the `mux` modifier is evaluated on the mask. If true, `A` is chosen, @@ -3081,6 +4095,12 @@ + + + + + + During a cube map transform, select the S coordinate given a selected face. Z coordinate as 32-bit floating point X coordinate as 32-bit floating point @@ -3092,6 +4112,12 @@ + + + + + + During a cube map transform, select the T coordinate given a selected face. Y coordinate as 32-bit floating point Z coordinate as 32-bit floating point @@ -3102,6 +4128,11 @@ + + + + + Calculates $A | (B \ll 8) | (CD \ll 16)$ for 8-bit A and B and 16-bit CD. @@ -3120,6 +4151,11 @@ + + + + + Select the maximum absolute value of its arguments. X coordinate as 32-bit floating point Y coordinate as 32-bit floating point @@ -3130,6 +4166,11 @@ + + + + + Select the cube face index corresponding to the arguments. X coordinate as 32-bit floating point Y coordinate as 32-bit floating point @@ -3153,12 +4194,24 @@ + + + + + + + + + + + + A B @@ -3179,12 +4232,22 @@ + + + + + + + + + + @@ -3212,12 +4275,22 @@ + + + + + + + + + + @@ -3246,12 +4319,22 @@ + + + + + + + + + + @@ -3272,12 +4355,22 @@ + + + + + + + + + + @@ -3298,12 +4391,22 @@ + + + + + + + + + + @@ -3331,12 +4434,22 @@ + + + + + + + + + + @@ -3371,12 +4484,22 @@ + + + + + + + + + + @@ -3389,6 +4512,10 @@ + + + + Adds an arbitrary 32-bit immediate embedded within the instruction stream. If no modifiers are required, this is preferred to `IADD.i32` with a @@ -3405,6 +4532,10 @@ + + + + Adds an arbitrary pair of 16-bit immediates embedded within the instruction stream. If no modifiers are required, this is preferred to @@ -3436,6 +4567,10 @@ + + + + Adds an arbitrary 32-bit immediate embedded within the instruction stream. If no modifiers are required, this is preferred to `FADD.f32` with a @@ -3450,6 +4585,10 @@ + + + + Adds an arbitrary pair of 16-bit immediates embedded within the instruction stream. If no modifiers are required, this is preferred to @@ -3466,6 +4605,13 @@ + + + + + + + @@ -3481,6 +4627,13 @@ + + + + + + + @@ -3496,6 +4649,13 @@ + + + + + + + @@ -3510,6 +4670,13 @@ + + + + + + + @@ -3524,6 +4691,13 @@ + + + + + + + @@ -3544,6 +4718,13 @@ + + + + + + + @@ -3563,6 +4744,12 @@ + + + + + + Unfiltered textured instruction. @@ -3589,6 +4776,11 @@ + + + + + Ordinary texturing instruction using a sampler. @@ -3617,6 +4809,11 @@ + + + + + Texture gather instruction. @@ -3646,6 +4843,12 @@ + + + + + + Texture sample with explicit gradient. @@ -3672,6 +4875,11 @@ + + + + + Pair of texture instructions. @@ -3697,6 +4905,14 @@ + + + + + + + + Only works for FP32 varyings. Performance characteristics are similar to LD_VAR_BUF_IMM_F32.v2.f32 followed by TEX, using both V and T units. @@ -3721,6 +4937,14 @@ + + + + + + + + Only works for FP32 varyings. Performance characteristics are similar to LD_VAR_BUF_IMM_F32.v2.f32 followed by TEX, using both V and T units. @@ -3746,6 +4970,14 @@ + + + + + + + + Only works for FP32 varyings. Performance characteristics are similar to LD_VAR_BUF_IMM_F32.v2.f32 followed by TEX, using both V and T units. @@ -3771,6 +5003,14 @@ + + + + + + + + Only works for FP32 varyings. Performance characteristics are similar to LD_VAR_BUF_IMM_F32.v2.f32 followed by TEX_DUAL, using both V and T units. @@ -3795,6 +5035,14 @@ + + + + + + + + Only works for FP32 varyings. Performance characteristics are similar to LD_VAR_IMM_F32.v2.f32 followed by TEX, using both V and T units. @@ -3819,6 +5067,14 @@ + + + + + + + + Only works for FP32 varyings. Performance characteristics are similar to LD_VAR_IMM_F32.v2.f32 followed by TEX, using both V and T units. @@ -3844,6 +5100,14 @@ + + + + + + + + Only works for FP32 varyings. Performance characteristics are similar to LD_VAR_IMM_F32.v2.f32 followed by TEX, using both V and T units. @@ -3869,6 +5133,14 @@ + + + + + + + + Only works for FP32 varyings. Performance characteristics are similar to LD_VAR_IMM_F32.v2.f32 followed by TEX_DUAL, using both V and T units. @@ -3893,6 +5165,11 @@ + + + + + First calculates $A \cdot B + C$ and then biases the exponent by D. Used in special transcendental function sequences. It should not be used for @@ -3911,6 +5188,11 @@ + + + + + First calculates $A \cdot B + C$ and then biases the exponent by D. If $A = 0$ or $B = 0$, the multiply $A \cdot B$ is treated as zero even if an @@ -3930,6 +5212,11 @@ + + + + + First calculates $A \cdot B + C$ and then biases the exponent by D. If $A = 0$ or $B = 0$, the multiply is treated as $A$ even if an @@ -3949,6 +5236,11 @@ + + + + + First calculates $A \cdot B + C$ and then biases the exponent by D, interpreted as a 16-bit value. Used in special transcendental function