broadcom: fix pairing tmu lookup with previous ldtmu

There are restrictions on pairing a new TMU lookup with a previous
LDTMU. We had code to handle this, but it applied the restriction to
any instruction being paired instead of only to TMU lookups.
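
As a rough illustration of the guard after this change, here is a
minimal standalone sketch. The struct and function names
(sketch_candidate, sketch_scoreboard, pairing_restricted) are
hypothetical stand-ins, not the real Mesa types; only the field names
and the "16 / threads" comparison are taken from the diff. What the
scheduler does when the guard fires is not shown in this change, so
the sketch just returns a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the scheduler state. The field
 * names mirror the diff, but these are not the real Mesa structs. */
struct sketch_candidate {
        bool is_tmu_sequence_terminator; /* result of the new helper */
        uint32_t ldtmu_count;
};

struct sketch_scoreboard {
        bool first_ldtmu_after_thrsw;
        uint32_t pending_ldtmu_count;
};

/* Returns true when pairing the candidate with a previous instruction
 * would hit the restriction, following the condition in the diff. */
static bool
pairing_restricted(bool prev_has_ldtmu,
                   const struct sketch_candidate *cand,
                   const struct sketch_scoreboard *sb,
                   uint32_t threads)
{
        assert(threads > 0);

        return prev_has_ldtmu &&
               /* New in this commit: only TMU lookups are restricted. */
               cand->is_tmu_sequence_terminator &&
               !sb->first_ldtmu_after_thrsw &&
               (sb->pending_ldtmu_count + cand->ldtmu_count > 16 / threads);
}
```

With 4 threads the limit is 16 / 4 = 4, so a lookup adding 2 LDTMUs on
top of 3 pending ones is restricted, while a candidate that is not a
TMU sequence terminator with the same counts is no longer rejected.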

total instructions in shared programs: 10856992 -> 10823967 (-0.30%)
instructions in affected programs: 1823670 -> 1790645 (-1.81%)
helped: 10212
HURT: 110
Instructions are helped.

total max-temps in shared programs: 2234069 -> 2233153 (-0.04%)
max-temps in affected programs: 15100 -> 14184 (-6.07%)
helped: 660
HURT: 3
Max-temps are helped.

total sfu-stalls in shared programs: 15935 -> 15967 (0.20%)
sfu-stalls in affected programs: 317 -> 349 (10.09%)
helped: 31
HURT: 57
Inconclusive result (%-change mean confidence interval includes 0).

total inst-and-stalls in shared programs: 10872927 -> 10839934 (-0.30%)
inst-and-stalls in affected programs: 1824656 -> 1791663 (-1.81%)
helped: 10199
HURT: 111
Inst-and-stalls are helped.

total nops in shared programs: 185612 -> 185767 (0.08%)
nops in affected programs: 4865 -> 5020 (3.19%)
helped: 164
HURT: 256
Nops are HURT.

Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31574>
Iago Toral Quiroga 2024-10-09 09:07:46 +02:00 committed by Marge Bot
parent 20d5020ad7
commit 4d1971f17f

@@ -200,6 +200,27 @@ tmu_write_is_sequence_terminator(uint32_t waddr)
         }
 }
 
+static bool
+is_tmu_sequence_terminator(struct qinst *inst)
+{
+        if (inst->qpu.type != V3D_QPU_INSTR_TYPE_ALU)
+                return false;
+
+        if (inst->qpu.alu.add.op != V3D_QPU_A_NOP) {
+                if (!inst->qpu.alu.add.magic_write)
+                        return false;
+                return tmu_write_is_sequence_terminator(inst->qpu.alu.add.waddr);
+        }
+
+        if (inst->qpu.alu.mul.op != V3D_QPU_M_NOP) {
+                if (!inst->qpu.alu.mul.magic_write)
+                        return false;
+                return tmu_write_is_sequence_terminator(inst->qpu.alu.mul.waddr);
+        }
+
+        return false;
+}
+
 static bool
 can_reorder_tmu_write(const struct v3d_device_info *devinfo, uint32_t waddr)
 {
@@ -1533,6 +1554,7 @@ retry:
                          * this aspect in the compiler yet.
                          */
                         if (prev_inst->inst->qpu.sig.ldtmu &&
+                            is_tmu_sequence_terminator(n->inst) &&
                             !scoreboard->first_ldtmu_after_thrsw &&
                             (scoreboard->pending_ldtmu_count +
                              n->inst->ldtmu_count > 16 / c->threads)) {