mesa/src/broadcom/compiler
Iago Toral Quiroga 1784dd22a3 broadcom/compiler: pipeline smooth ldvary sequences
Typically, we would schedule smooth varyings like this:

nop                  ; nop               ; ldvary.r4
nop                  ; fmul  r0, r4, rf0
fadd  rf13, r0, r5   ; nop               ; ldvary.r1
nop                  ; fmul  r2, r1, rf0
fadd  rf12, r2, r5   ; nop               ; ldvary.r3
nop                  ; fmul  r4, r3, rf0
fadd  rf11, r4, r5   ; nop               ; ldvary.r0

where we pair up an ldvary with the fadd of the previous sequence
instead of the previous fmul. This is because ldvary has an implicit
write to r5 which is read by the fadd of the previous sequence, so
our dependency tracking doesn't allow us to move the ldvary before the
fadd, however, the r5 write of the ldvary instruction happens in the
instruction after it is emitted so we can actually move it to the fmul
and the r5 write would still happen in the same instruction as the fadd,
which is fine.

This patch allows us to pipeline these sequences optimally. For that,
after merging an ldvary into a previous instruction in the middle of
a pipelineable ldvary sequence, we check if we can manually move it
to the last scheduled instruction instead (the one before the
instruction we are currently scheduling).

If we are successful at moving the ldvary to the previous instruction,
then we flag the ldvary as scheduled immediately, which may promote
its children (the follow-up fmul instruction for that ldvary) to DAG
heads and continue the merge loop so that fmul can be picked and
merged into the final fadd of the previous sequence (where we had
originally merged the ldvary). This leads to a result that looks like
this:

nop                  ; nop               ; ldvary.r4
nop                  ; fmul  r0, r4, rf0 ; ldvary.r1
fadd  rf13, r0, r5   ; fmul  r2, r1, rf0 ; ldvary.r3
fadd  rf12, r2, r5   ; fmul  r4, r3, rf0 ; ldvary.r0

Shader-db results:

total instructions in shared programs: 14071591 -> 13820690 (-1.78%)
instructions in affected programs: 7809692 -> 7558791 (-3.21%)
helped: 41209
HURT: 4528
Instructions are helped.

total max-temps in shared programs: 2335784 -> 2326435 (-0.40%)
max-temps in affected programs: 84302 -> 74953 (-11.09%)
helped: 4561
HURT: 293
Max-temps are helped.

total sfu-stalls in shared programs: 31537 -> 30683 (-2.71%)
sfu-stalls in affected programs: 3551 -> 2697 (-24.05%)
helped: 1713
HURT: 750
Sfu-stalls are helped.

total inst-and-stalls in shared programs: 14103128 -> 13851373 (-1.79%)
inst-and-stalls in affected programs: 7820726 -> 7568971 (-3.22%)
helped: 41411
HURT: 4535
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>
2021-03-02 07:56:00 +01:00
..
meson.build broadcom/compiler: add a constant alu optimization pass 2021-02-23 08:08:01 +00:00
nir_to_vir.c broadcom/compiler: track pipelineable ldvary sequences 2021-03-02 07:56:00 +01:00
qpu_schedule.c broadcom/compiler: pipeline smooth ldvary sequences 2021-03-02 07:56:00 +01:00
qpu_validate.c broadcom/compiler: don't check for GFXH-1633 on V3D 4.2.x 2021-02-12 08:24:21 +00:00
v3d33_tex.c broadcom/compiler: support pipelining of tex instructions 2021-02-04 10:33:10 +00:00
v3d33_vpm_setup.c broadcom/vc5: Move V3D 3.3 VPM write setup to a separate file. 2018-01-12 21:56:24 -08:00
v3d40_tex.c broadcom/compiler: Add a v3d_compile argument to vir_set_[pu]f 2021-02-12 07:05:33 +00:00
v3d_compiler.h broadcom/compiler: track pipelineable ldvary sequences 2021-03-02 07:56:00 +01:00
v3d_nir_lower_image_load_store.c v3d: Ask the state tracker to lower image accesses off of derefs. 2020-02-24 18:25:02 +00:00
v3d_nir_lower_io.c v3d: use intrinsic builders 2021-01-06 14:34:41 +00:00
v3d_nir_lower_line_smooth.c v3d: use intrinsic builders 2021-01-06 14:34:41 +00:00
v3d_nir_lower_logic_ops.c v3d: mark some variables static const 2021-01-13 07:24:32 +00:00
v3d_nir_lower_robust_buffer_access.c v3d/compiler: add a lowering pass for robust buffer access 2020-10-13 21:21:33 +00:00
v3d_nir_lower_scratch.c v3d: Use the new lower_to_scratch implementation for indirects on temps. 2019-04-12 16:16:58 -07:00
v3d_nir_lower_txf_ms.c v3d: Use nir_shader_lower_instructions() for txf_ms lowering. 2019-07-18 11:28:56 -07:00
vir.c v3d: Replace driver lowering of GL_CLAMP with mesa/st's. 2021-02-24 18:03:46 +00:00
vir_dump.c broadcom/compiler: name registers correctly based on V3D version 2021-02-12 08:24:21 +00:00
vir_live_variables.c util/hash_table: update users to use new optimal integer hash functions 2020-01-23 17:06:57 +00:00
vir_opt_constant_alu.c broadcom/compiler: add a constant alu optimization pass 2021-02-23 08:08:01 +00:00
vir_opt_copy_propagate.c v3d: Use ldunif instructions for uniforms. 2019-03-05 12:57:39 -08:00
vir_opt_dead_code.c broadcom/compiler: add a constant alu optimization pass 2021-02-23 08:08:01 +00:00
vir_opt_redundant_flags.c v3d: fix checking twice auf flag 2019-06-13 11:45:18 +02:00
vir_opt_small_immediates.c v3d: Use ldunif instructions for uniforms. 2019-03-05 12:57:39 -08:00
vir_register_allocate.c broadcom/compiler: use a helper function to decide on TMU spilling 2021-02-17 09:01:02 +01:00
vir_to_qpu.c broadcom/compiler: emit ldunifarf when needed 2021-02-12 08:24:21 +00:00