aco: Fix s_dcache_wb on GFX10.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
This commit is contained in:
Timur Kristóf 2019-09-17 19:59:17 +02:00
parent 68c9554732
commit a89153d038
2 changed files with 13 additions and 0 deletions

View file

@ -109,6 +109,11 @@ Stores and atomics always bypass the L1 cache, so they don't support the DLC bit
and it shouldn't be set in these cases. Setting the DLC for these cases can result
in graphical glitches.
## RDNA S_DCACHE_WB
The S_DCACHE_WB is not mentioned in the RDNA ISA doc, but it is needed in order
to achieve correct behavior in some SSBO CTS tests.
## RDNA subvector mode
The documentation of S_SUBVECTOR_LOOP_BEGIN and S_SUBVECTOR_LOOP_END is not clear

View file

@ -323,6 +323,14 @@ wait_imm kill(Instruction* instr, wait_ctx& ctx)
if (ctx.exp_cnt || ctx.vm_cnt || ctx.lgkm_cnt)
imm.combine(check_instr(instr, ctx));
if (ctx.chip_class >= GFX10) {
/* Seems to be required on GFX10 to achieve correct behaviour.
* It shouldn't cost anything anyways since we're about to do s_endpgm.
*/
if (ctx.lgkm_cnt && instr->opcode == aco_opcode::s_dcache_wb)
imm.lgkm = 0;
}
if (instr->format == Format::PSEUDO_BARRIER) {
unsigned* bsize = ctx.program->info->cs.block_size;
unsigned workgroup_size = bsize[0] * bsize[1] * bsize[2];