mirror of https://gitlab.freedesktop.org/mesa/mesa.git (synced 2026-04-28 15:10:46 +02:00)
aco: Fixup markdown formatting of the README-ISA.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4164>
parent 086fafc4e0
commit b4d4c548eb
1 changed files with 27 additions and 17 deletions
|
|
@@ -1,38 +1,45 @@
|
|||
# Unofficial GCN/RDNA ISA reference errata
|
||||
|
||||
## v_sad_u32
|
||||
## `v_sad_u32`
|
||||
|
||||
The Vega ISA reference writes its behaviour as:
|
||||
|
||||
The Vega ISA reference writes its behaviour as:
|
||||
```
|
||||
D.u = abs(S0.i - S1.i) + S2.u.
|
||||
```
|
||||
|
||||
This is incorrect. The actual behaviour is what is written in the GCN3 reference
|
||||
guide:
|
||||
|
||||
```
|
||||
ABS_DIFF (A,B) = (A>B) ? (A-B) : (B-A)
|
||||
D.u = ABS_DIFF (S0.u,S1.u) + S2.u
|
||||
```
|
||||
|
||||
The instruction doesn't subtract the S0 and S1 and use the absolute value (the
|
||||
_signed_ distance), it uses the _unsigned_ distance between the operands. So
|
||||
`v_sad_u32(-5, 0, 0)` would return `4294967291` (`-5` interpreted as unsigned),
|
||||
not `5`.
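
The difference between the two descriptions can be checked with a small software model. This is only a sketch in Python, not ACO code: the function names are made up for illustration, and 32-bit wraparound of the final sum is an assumption.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Reinterpret the low 32 bits of x as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def v_sad_u32_vega_doc(s0, s1, s2):
    """Behaviour as written in the Vega reference: signed distance plus S2."""
    return (abs(to_signed32(s0) - to_signed32(s1)) + (s2 & MASK32)) & MASK32

def v_sad_u32_actual(s0, s1, s2):
    """Behaviour as written in the GCN3 reference: unsigned distance plus S2."""
    a, b = s0 & MASK32, s1 & MASK32
    abs_diff = a - b if a > b else b - a
    return (abs_diff + (s2 & MASK32)) & MASK32
```

With these definitions, `v_sad_u32_actual(-5, 0, 0)` gives `4294967291`, while `v_sad_u32_vega_doc(-5, 0, 0)` gives `5`.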
|
||||
|
||||
## s_bfe_*
|
||||
## `s_bfe_*`
|
||||
|
||||
Both the Vega and GCN3 ISA references write that these instructions don't write
|
||||
The RDNA, Vega and GCN3 ISA references all state that these instructions don't write
|
||||
SCC. They do.
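
As an illustration, here is a rough Python model of `s_bfe_u32` that returns SCC alongside the result. The text above only establishes that SCC is written. The operand layout (offset in `S1[4:0]`, width in `S1[22:16]`) and the rule that SCC is set when the result is non-zero are assumptions, not something this errata confirms.

```python
MASK32 = 0xFFFFFFFF

def s_bfe_u32(s0, s1):
    """Model of an unsigned bit field extract that also produces SCC.

    Assumed layout: S1[4:0] is the field offset, S1[22:16] the field width.
    Assumed SCC rule: SCC = 1 when the extracted field is non-zero.
    """
    offset = s1 & 0x1F
    width = (s1 >> 16) & 0x7F
    result = ((s0 & MASK32) >> offset) & ((1 << width) - 1)
    scc = 1 if result != 0 else 0
    return result, scc
```

A compiler that believes the reference guides would treat SCC as live across such an instruction, which is wrong if the hardware overwrites it.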
|
||||
|
||||
## v_bcnt_u32_b32
|
||||
## `v_bcnt_u32_b32`
|
||||
|
||||
The Vega ISA reference writes its behaviour as:
|
||||
|
||||
The Vega ISA reference writes its behaviour as:
|
||||
```
|
||||
D.u = 0;
|
||||
for i in 0 ... 31 do
|
||||
D.u += (S0.u[i] == 1 ? 1 : 0);
|
||||
endfor.
|
||||
```
|
||||
|
||||
This is incorrect. The actual behaviour (and number of operands) is what
|
||||
is written in the GCN3 reference guide:
|
||||
|
||||
```
|
||||
D.u = CountOneBits(S0.u) + S1.u.
|
||||
```
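
A minimal Python model of the two-operand form, for comparison with the single-operand loop in the Vega reference. The function name and the 32-bit wraparound of the sum are assumptions made for this sketch.

```python
MASK32 = 0xFFFFFFFF

def v_bcnt_u32_b32(s0, s1):
    """Count the set bits in S0 and add S1 (GCN3 reference behaviour)."""
    ones = bin(s0 & MASK32).count("1")
    return (ones + (s1 & MASK32)) & MASK32
```

Passing `0` as the second operand reproduces the plain population count that the Vega pseudocode describes.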
|
||||
|
|
@@ -54,10 +61,12 @@ are there in LLVM.
|
|||
## VMEM stores
|
||||
|
||||
All reference guides say (under "Vector Memory Instruction Data Dependencies"):
|
||||
|
||||
> When a VM instruction is issued, the address is immediately read out of VGPRs
|
||||
> and sent to the texture cache. Any texture or buffer resources and samplers
|
||||
> are also sent immediately. However, write-data is not immediately sent to the
|
||||
> texture cache.
|
||||
|
||||
Reading that, one might think that waitcnts need to be added when writing to
|
||||
the registers used for a VMEM store's data. Experimentation has shown that this
|
||||
does not seem to be the case on GFX8 and GFX9 (GFX6 and GFX7 are untested). It
|
||||
|
|
@@ -107,25 +116,26 @@ circumstances (eg. we needn't set DLC when only one shader array is used).
|
|||
|
||||
Stores and atomics always bypass the L1 cache, so they don't support the DLC bit,
|
||||
and it shouldn't be set in these cases. Setting the DLC for these cases can result
|
||||
in graphical glitches.
|
||||
in graphical glitches or hangs.
|
||||
|
||||
## RDNA S_DCACHE_WB
|
||||
## RDNA `s_dcache_wb`
|
||||
|
||||
The S_DCACHE_WB is not mentioned in the RDNA ISA doc, but it is needed in order
|
||||
The `s_dcache_wb` instruction is not mentioned in the RDNA ISA doc, but it is needed in order
|
||||
to achieve correct behavior in some SSBO CTS tests.
|
||||
|
||||
## RDNA subvector mode
|
||||
|
||||
The documentation of S_SUBVECTOR_LOOP_BEGIN and S_SUBVECTOR_LOOP_END is not clear
|
||||
The documentation of `s_subvector_loop_begin` and `s_subvector_loop_end` is not clear
|
||||
on what sort of addressing should be used, but it says that it
|
||||
"is equivalent to an S_CBRANCH with extra math", so the subvector loop handling
|
||||
in ACO is done according to the S_CBRANCH doc.
|
||||
"is equivalent to an `S_CBRANCH` with extra math", so the subvector loop handling
|
||||
in ACO is done according to the `s_cbranch` doc.
|
||||
|
||||
# Hardware Bugs
|
||||
|
||||
## SMEM corrupts VCCZ on SI/CI
|
||||
|
||||
https://github.com/llvm/llvm-project/blob/acb089e12ae48b82c0b05c42326196a030df9b82/llvm/lib/Target/AMDGPU/SIInsertWaits.cpp#L580-L616
|
||||
[See this LLVM source.](https://github.com/llvm/llvm-project/blob/acb089e12ae48b82c0b05c42326196a030df9b82/llvm/lib/Target/AMDGPU/SIInsertWaits.cpp#L580-L616)
|
||||
|
||||
After issuing SMEM instructions, we need to wait for the SMEM instructions to
|
||||
finish and then write to vcc (for example, `s_mov_b64 vcc, vcc`) to correct vccz
|
||||
|
||||
|
|
@@ -133,11 +143,11 @@ Currently, we don't do this.
|
|||
|
||||
## GCN / GFX6 hazards
|
||||
|
||||
### VINTRP followed by a read with v_readfirstlane or v_readlane
|
||||
### VINTRP followed by a read with `v_readfirstlane` or `v_readlane`
|
||||
|
||||
It's required to insert 1 wait state if the dst VGPR of any v_interp_* is
|
||||
followed by a read with v_readfirstlane or v_readlane to fix GPU hangs on GFX6.
|
||||
Note that v_writelane_* is apparently not affected. This hazard isn't
|
||||
It's required to insert 1 wait state if the dst VGPR of any `v_interp_*` is
|
||||
followed by a read with `v_readfirstlane` or `v_readlane` to fix GPU hangs on GFX6.
|
||||
Note that `v_writelane_*` is apparently not affected. This hazard isn't
|
||||
documented anywhere but AMD confirmed it.
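
The mitigation can be sketched as a pass over a linear instruction list. This is a simplified Python illustration, not ACO's actual hazard handling: the `(opcode, dst, srcs)` tuple representation and the function name are hypothetical, and it assumes that `s_nop 0` provides exactly one wait state.

```python
READLANE_OPS = ("v_readlane_b32", "v_readfirstlane_b32")

def fix_vintrp_readlane_hazard(instrs):
    """Insert `s_nop 0` between a v_interp_* and a lane read of its dst VGPR.

    instrs is a list of (opcode, dst, srcs) tuples (hypothetical format).
    Only the directly preceding instruction is checked, because the hazard
    needs a single wait state.
    """
    out = []
    for instr in instrs:
        opcode, dst, srcs = instr
        if out and opcode in READLANE_OPS:
            prev_op, prev_dst, _ = out[-1]
            if prev_op.startswith("v_interp_") and prev_dst in srcs:
                out.append(("s_nop", None, (0,)))
        out.append(instr)
    return out
```

Any unrelated instruction that already sits between the two counts as the required wait state, so the pass leaves such sequences unchanged.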
|
||||
|
||||
## RDNA / GFX10 hazards
|
||||
|
|
|
|||