From 3b19545966b771cf44a32f186351eacbcf30e89b Mon Sep 17 00:00:00 2001 From: Emma Anholt Date: Wed, 2 Jun 2021 13:12:41 -0700 Subject: [PATCH] docs/freedreno: Rewrite the section on array access. We don't use collect/split for array access these days, instead use ir3_array structs that the ir3_register can point to. Part-of: --- docs/drivers/freedreno/ir3-notes.rst | 87 ++++++++++------------------ 1 file changed, 29 insertions(+), 58 deletions(-) diff --git a/docs/drivers/freedreno/ir3-notes.rst b/docs/drivers/freedreno/ir3-notes.rst index 91c1fe4c8cc..0d859813d5f 100644 --- a/docs/drivers/freedreno/ir3-notes.rst +++ b/docs/drivers/freedreno/ir3-notes.rst @@ -292,81 +292,52 @@ results in: The scheduling pass has some smarts to schedule things such that only a single ``a0.x`` value is used at any one time. -To implement variable arrays, values are stored in consecutive scalar registers. This has some overlap with `register groups`_, in that ``collect`` and ``split`` are used to help group things for the `register assignment`_ pass. - -To use a variable array as a src register, a slight variation of what is done for const array src. The instruction src is a `collect` instruction that groups all the array members: +To implement variable arrays, the NIR registers are stored as an ``ir3_array``, +which will be register allocated to consecutive hardware registers. The array +access uses the id field in the ``ir3_register`` to map to the array being +accessed, and the offset field for the fixed offset within the array. A NIR +indirect register read such as: :: - mova a0.x, hr1.y - sub r1.y, r2.x, r3.x - add r0.x, r1.y, r + decl_reg vec2 32 r0[2] + ... + vec2 32 ssa_19 = mov r0[0 + ssa_9] + results in: -.. graphviz:: +:: - digraph { - a0 [label="r0.z"]; - a1 [label="r0.w"]; - a2 [label="r1.x"]; - a3 [label="r1.y"]; - sub; - collect; - mova; - add; - add -> sub; - add -> collect [label="off=2"]; - add -> mova; - collect -> a0; - collect -> a1; - collect -> a2; - collect -> a3; - } + 0000:0000:001: shl.b hssa_19, hssa_17, himm[0.000000,1,0x1] + 0000:0000:002: mov.s16s16 hr61.x, hssa_19 + 0000:0000:002: mov.u32u32 ssa_21, arr[id=1, offset=0, size=4, ssa_12], address=_[0000:0000:002: mov.s16s16] + 0000:0000:002: mov.u32u32 ssa_22, arr[id=1, offset=1, size=4, ssa_12], address=_[0000:0000:002: mov.s16s16] -TODO better describe how actual deref offset is derived, i.e. based on array base register. -To do an indirect write to a variable array, a ``split`` is used. Say the array was assigned to registers ``r0.z`` through ``r1.y`` (hence the constant offset of 2): - - Note that only cat1 (mov) can do indirect write. +Array writes write to the array in ``instr->regs[0]->array.id``. A NIR indirect +register write such as: :: - mova a0.x, hr1.y - min r2.x, r2.x, c0.x - mov r, r2.x - mul r0.x, r0.z, c0.z + decl_reg vec2 32 r0[2] + ... + r0[0 + ssa_12] = mov ssa_13 +results in: -In this case, the ``mov`` instruction does not write all elements of the array (compared to usage of ``split`` for ``sam`` instructions in grouping_). But the ``mov`` instruction does need an additional dependency (via ``collect``) on instructions that last wrote the array element members, to ensure that they get scheduled before the ``mov`` in scheduling_ stage (which also serves to group the array elements for the `register assignment`_ stage). +:: -.. graphviz:: - - digraph { - a0 [label="r0.z"]; - a1 [label="r0.w"]; - a2 [label="r1.x"]; - a3 [label="r1.y"]; - min; - mova; - mov; - mul; - split [label="split\noff=0"]; - mul -> split; - split -> mov; - collect; - collect -> a0; - collect -> a1; - collect -> a2; - collect -> a3; - mov -> min; - mov -> mova; - mov -> collect; - } - -Note that there would in fact be ``split`` nodes generated for each array element (although only the reachable ones will be scheduled, etc). + 0000:0000:001: shl.b hssa_29, hssa_27, himm[0.000000,1,0x1] + 0000:0000:002: mov.s16s16 hr61.x, hssa_29 + 0000:0000:001: mov.u32u32 arr[id=1, offset=0, size=4, ssa_17], c2.y, address=_[0000:0000:002: mov.s16s16] + 0000:0000:004: mov.u32u32 arr[id=1, offset=1, size=4, ssa_31], c2.z, address=_[0000:0000:002: mov.s16s16] +Note that only cat1 (mov) can do indirect write, and thus NIR register stores +may need to introduce an extra mov. +ir3 array accesses in the DAG get serialized by the ``instr->barrier_class`` and +containing ``IR3_BARRIER_ARRAY_W`` or ``IR3_BARRIER_ARRAY_R``. Shader Passes -------------