mirror of
https://gitlab.freedesktop.org/mesa/mesa.git
synced 2026-05-20 11:18:11 +02:00
This is the compiler for both Bifrost and Valhall, and presumably future Mali GPUs too. Give it a more generic name so we can use the bifrost/ path for something a bit more specific. For historical reasons the compiler's name is still "bifrost" and uses the prefix `bi_`. I think that's ok in the same way that i915 in the kernel supports way more than just i915. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20455>
# Notes on opcodes

_Notes mainly by Connor Abbott extracted from the disassembler_

LOG_FREXPM:

// From the ARM patent US20160364209A1:
// "Decompose v (the input) into numbers x1 and s such that v = x1 * 2^s,
// and x1 is a floating point value in a predetermined range where the
// value 1 is within the range and not at one extremity of the range (e.g.
// choose a range where 1 is towards middle of range)."
//
// This computes x1.

FRCP_FREXPM:

// Given a floating point number m * 2^e, returns m * 2^{-1}. This is
// exactly the same as the mantissa part of frexp().

FSQRT_FREXPM:
// Given a floating point number m * 2^e, returns m * 2^{-2} if e is even,
// and m * 2^{-1} if e is odd. In other words, scales by powers of 4 until
// within the range [0.25, 1). Used for square-root and reciprocal
// square-root.
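
The two mantissa ops above can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the function names are hypothetical, inputs are assumed to be finite, positive and normal, and `m` is taken to lie in [1, 2) so that `m * 2^{-1}` matches what `math.frexp` returns.

```python
import math

def frcp_frexpm(x: float) -> float:
    """Model of FRCP_FREXPM: m * 2^-1, i.e. the frexp() mantissa in [0.5, 1)."""
    mant, _ = math.frexp(x)  # x == mant * 2**exp with mant in [0.5, 1)
    return mant

def fsqrt_frexpm(x: float) -> float:
    """Model of FSQRT_FREXPM: scale x by a power of 4 into [0.25, 1)."""
    mant, exp = math.frexp(x)
    e = exp - 1  # exponent of x when written as m * 2^e with m in [1, 2)
    # Even e: m * 2^-2 == mant / 2.  Odd e: m * 2^-1 == mant.
    return mant / 2 if e % 2 == 0 else mant

if __name__ == "__main__":
    for x in (1.0, 2.0, 3.0, 4.0, 8.0):
        print(x, frcp_frexpm(x), fsqrt_frexpm(x))
```

Because the scale factor is always a power of 4, taking the square root of the reduced value only requires halving the removed exponent afterwards.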

FRCP_FREXPE:
// Given a floating point number m * 2^e, computes -e - 1 as an integer.
// Zero and infinity/NaN return 0.

FSQRT_FREXPE:
// Computes floor(e/2) + 1.

FRSQ_FREXPE:
// Given a floating point number m * 2^e, computes -floor(e/2) - 1 as an
// integer.
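
To see how the `*_FREXPE` exponents pair with the `*_FREXPM` mantissas, here is a hedged Python sketch that recombines them. All function names are hypothetical, inputs are assumed finite, positive and normal, and exact division and `math.sqrt` stand in for whatever approximation the hardware applies to the reduced mantissa.

```python
import math

def _split(x: float):
    """Return (m / 2, e) for x = m * 2^e with m in [1, 2)."""
    mant, exp = math.frexp(x)  # mant in [0.5, 1), x == mant * 2**exp
    return mant, exp - 1

def frcp_frexpm(x): return _split(x)[0]
def frcp_frexpe(x): return -_split(x)[1] - 1

def fsqrt_frexpm(x):
    mant, e = _split(x)
    return mant / 2 if e % 2 == 0 else mant

def fsqrt_frexpe(x): return _split(x)[1] // 2 + 1    # floor(e/2) + 1
def frsq_frexpe(x): return -(_split(x)[1] // 2) - 1  # -floor(e/2) - 1

def rcp(x):
    # 1/x = (1 / (m/2)) * 2^(-e-1)
    return math.ldexp(1.0 / frcp_frexpm(x), frcp_frexpe(x))

def sqrt(x):
    # sqrt(x) = sqrt(reduced mantissa) * 2^(floor(e/2)+1)
    return math.ldexp(math.sqrt(fsqrt_frexpm(x)), fsqrt_frexpe(x))

def rsq(x):
    # 1/sqrt(x) = (1 / sqrt(reduced mantissa)) * 2^(-floor(e/2)-1)
    return math.ldexp(1.0 / math.sqrt(fsqrt_frexpm(x)), frsq_frexpe(x))

if __name__ == "__main__":
    print(rcp(10.0), sqrt(10.0), rsq(10.0))
```

The point of the split is that the expensive part only ever sees a value in a fixed, narrow range, while the exponent is fixed up with integer arithmetic.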

LSHIFT_ADD_LOW32:
// These instructions in the FMA slot, together with LSHIFT_ADD_HIGH32.i32
// in the ADD slot, allow one to do a 64-bit addition with an extra small
// shift on one of the sources. There are three possible scenarios:
//
// 1) Full 64-bit addition. Do:
// out.x = LSHIFT_ADD_LOW32.i64 src1.x, src2.x, shift
// out.y = LSHIFT_ADD_HIGH32.i32 src1.y, src2.y
//
// The shift amount is applied to src2 before adding. The shift amount, and
// any extra bits from src2 plus the overflow bit, are sent directly from
// FMA to ADD instead of being passed explicitly. Hence, these two must be
// bundled together into the same instruction.
//
// 2) Add a 64-bit value src1 to a zero-extended 32-bit value src2. Do:
// out.x = LSHIFT_ADD_LOW32.u32 src1.x, src2, shift
// out.y = LSHIFT_ADD_HIGH32.i32 src1.x, 0
//
// Note that in this case, the second argument to LSHIFT_ADD_HIGH32 is
// ignored, so it can actually be anything. As before, the shift is applied
// to src2 before adding.
//
// 3) Add a 64-bit value to a sign-extended 32-bit value src2. Do:
// out.x = LSHIFT_ADD_LOW32.i32 src1.x, src2, shift
// out.y = LSHIFT_ADD_HIGH32.i32 src1.x, 0
//
// The only difference is the .i32 instead of .u32. Otherwise, this is
// exactly the same as before.
//
// In all these instructions, the shift amount is stored where the third
// source would be, so the shift has to be a small immediate from 0 to 7.
// This is fine for the expected use-case of these instructions, which is
// manipulating 64-bit pointers.
//
// These instructions can also be combined with various load/store
// instructions which normally take a 64-bit pointer in order to add a
// 32-bit or 64-bit offset to the pointer before doing the operation,
// optionally shifting the offset. The load/store op implicitly does
// LSHIFT_ADD_HIGH32.i32 internally. Letting ptr be the pointer, and offset
// the desired offset, the cases go as follows:
//
// 1) Add a 64-bit offset:
// LSHIFT_ADD_LOW32.i64 ptr.x, offset.x, shift
// ld_st_op ptr.y, offset.y, ...
//
// Note that the output of LSHIFT_ADD_LOW32.i64 is not used, instead being
// implicitly sent to the load/store op to serve as the low 32 bits of the
// pointer.
//
// 2) Add a 32-bit unsigned offset:
// temp = LSHIFT_ADD_LOW32.u32 ptr.x, offset, shift
// ld_st_op temp, ptr.y, ...
//
// Now, the low 32 bits of (offset << shift) + ptr are passed explicitly to
// the ld_st_op, to match the case where there is no offset and ld_st_op is
// called directly.
//
// 3) Add a 32-bit signed offset:
// temp = LSHIFT_ADD_LOW32.i32 ptr.x, offset, shift
// ld_st_op temp, ptr.y, ...
//
// Again, the same as the unsigned case except that the offset is
// sign-extended.
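
The split 64-bit add described above can be modelled in software as follows. This is a sketch, not a description of the actual datapath: the function names, the `mode` strings and the `extra` value forwarded from the low half to the high half are hypothetical stand-ins for whatever the hardware passes from FMA to ADD.

```python
MASK32 = 0xFFFFFFFF

def lshift_add_low32(src1_lo: int, src2_lo: int, shift: int, mode: str):
    """Low half: returns (low 32 result bits, extra bits for the high half)."""
    assert 0 <= shift <= 7 and mode in ("i64", "u32", "i32")
    operand = src2_lo & MASK32
    if mode == "i32" and operand & 0x80000000:
        operand -= 1 << 32  # sign-extend the 32-bit source
    total = (src1_lo & MASK32) + (operand << shift)
    # Python's >> floors, so `extra` carries both the bits shifted out of
    # src2 and the carry (or borrow, for a negative offset).
    return total & MASK32, total >> 32

def lshift_add_high32(src1_hi: int, src2_hi: int, shift: int, extra: int) -> int:
    """High half: adds the shifted high source plus the forwarded bits."""
    return (src1_hi + (src2_hi << shift) + extra) & MASK32

def add64(src1: int, src2: int, shift: int, mode: str) -> int:
    """src1 + (extend(src2) << shift), truncated to 64 bits."""
    lo, extra = lshift_add_low32(src1 & MASK32, src2 & MASK32, shift, mode)
    # In the 32-bit modes the second high source is ignored, so pass 0.
    src2_hi = (src2 >> 32) & MASK32 if mode == "i64" else 0
    hi = lshift_add_high32((src1 >> 32) & MASK32, src2_hi, shift, extra)
    return (hi << 32) | lo

if __name__ == "__main__":
    ptr = 0x0000007F_00000000
    print(hex(add64(ptr, 0xFFFFFFFF, 2, "i32")))  # ptr plus (-1 << 2)
    print(hex(add64(ptr, 0xFFFFFFFF, 2, "u32")))  # ptr plus (0xFFFFFFFF << 2)
```

For the load/store forms, the same model applies with the load/store op playing the role of `lshift_add_high32`.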

---

ADD ops..

F16_TO_F32.X: // take the low 16 bits and expand them to a 32-bit float
F16_TO_F32.Y: // take the high 16 bits and expand them to a 32-bit float
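
As a quick illustration of the `.X`/`.Y` lane selection, the following Python sketch converts one half of a packed 32-bit word. The function name and the `lane` argument are hypothetical; the conversion uses the standard `struct` half-precision format `e`.

```python
import struct

def f16_to_f32(word: int, lane: str) -> float:
    """Pick the low (X) or high (Y) 16 bits of a 32-bit word and widen them."""
    assert lane in ("X", "Y")
    half_bits = (word & 0xFFFF) if lane == "X" else (word >> 16) & 0xFFFF
    # Reinterpret the 16 raw bits as an IEEE 754 binary16 value.
    (value,) = struct.unpack("<e", struct.pack("<H", half_bits))
    return value

if __name__ == "__main__":
    packed = 0x40003C00  # high half 0x4000 (2.0), low half 0x3C00 (1.0)
    print(f16_to_f32(packed, "X"), f16_to_f32(packed, "Y"))
```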

MOV:
// Logically, this should be SWZ.XY, but that's equivalent to a move, and
// this seems to be the canonical way the blob generates a MOV.

FRCP_FREXPM:
// Given a floating point number m * 2^e, returns m * 2^{-1}.

FLOG_FREXPE:
// From the ARM patent US20160364209A1:
// "Decompose v (the input) into numbers x1 and s such that v = x1 * 2^s,
// and x1 is a floating point value in a predetermined range where the
// value 1 is within the range and not at one extremity of the range (e.g.
// choose a range where 1 is towards middle of range)."
//
// This computes s.
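
The decomposition quoted from the patent can be sketched in Python as below. The range [0.75, 1.5) is purely an assumption chosen so that 1 sits inside the range and away from its ends; these notes do not say which range the hardware uses, and the function name is hypothetical.

```python
import math

def log_frexp(v: float):
    """Split v > 0 into (x1, s) with v == x1 * 2**s and x1 in [0.75, 1.5)."""
    mant, exp = math.frexp(v)  # mant in [0.5, 1), v == mant * 2**exp
    if mant < 0.75:
        return mant * 2.0, exp - 1  # move [0.5, 0.75) up to [1.0, 1.5)
    return mant, exp                # already in [0.75, 1.0)

def log2_via_frexp(v: float) -> float:
    x1, s = log_frexp(v)
    # log2(v) = s + log2(x1); x1 is close to 1, so log2(x1) is small.
    return s + math.log2(x1)

if __name__ == "__main__":
    for v in (0.3, 1.0, 3.0, 100.0):
        print(v, log_frexp(v), log2_via_frexp(v))
```

Keeping x1 near 1 means the logarithm of the reduced value is near 0, which avoids cancellation against s for inputs close to 1.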

LD_UBO.v4i32:
// src0 = offset, src1 = binding

FRCP_FAST.f32:
// *_FAST does not exist on G71 (added to G51, G72, and everything after)

FRCP_TABLE:
// Given a floating point number m * 2^e, produces a table-based
// approximation of 2/m using the top 17 bits. Includes special cases for
// infinity, NaN, and zero, and copies the sign bit.

FRCP_FAST.f16.X:
// Exists on G71

FRSQ_TABLE:
// A similar table for inverse square root, using the high 17 bits of the
// mantissa as well as the low bit of the exponent.

FRCP_APPROX:
// Used in the argument reduction for log. Given a floating-point number
// m * 2^e, uses the top 4 bits of m to produce an approximation to 1/m
// with the exponent forced to 0 and only the top 5 bits nonzero. 0,
// infinity, and NaN all return 1.0.
// See the ARM patent for more information.

MUX:
// For each bit i, return src2[i] ? src0[i] : src1[i]. In other words, this
// is the same as (src2 & src0) | (~src2 & src1).
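
The bitwise select is easy to check in Python. A minimal sketch, assuming 32-bit operands (the function name and the explicit mask are illustrative only):

```python
MASK32 = 0xFFFFFFFF

def mux(src0: int, src1: int, src2: int) -> int:
    """Per bit: take src0 where the src2 bit is set, src1 where it is clear."""
    # The final mask keeps the result to 32 bits, since ~src2 is negative
    # for Python's unbounded integers.
    return ((src2 & src0) | (~src2 & src1)) & MASK32

if __name__ == "__main__":
    print(hex(mux(0xAAAAAAAA, 0x55555555, 0xFFFF0000)))
```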

ST_VAR:
// Store a varying, given the address and datatype from LD_VAR_ADDR.

LD_VAR_ADDR:
// Compute varying address and datatype (for storing in the vertex shader),
// and store the vec3 result in the data register. The result is passed as
// the 3 normal arguments to ST_VAR.

DISCARD:
// Implements conditional discards (discard_if in NIR). Compares the first
// two sources and discards if the result is true.

ATEST.f32:
// Implements alpha-to-coverage, as well as possibly the late depth and
// stencil tests. The first source is the existing sample mask in R60
// (possibly modified by gl_SampleMask), and the second source is the alpha
// value. The sample mask is written right away based on the
// alpha-to-coverage result using the normal register write mechanism,
// since that doesn't need to read from any memory, and then written again
// later based on the result of the stencil and depth tests using the
// special register.

BLEND:
// This takes the sample coverage mask (computed by ATEST above) as a
// regular argument, in addition to the vec4 color in the special register.