mesa/src/broadcom/compiler at 1784dd22a32dccff0fee0428f7cf7fea8dccc574 - fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-18 15:58:06 +02:00


Iago Toral Quiroga 1784dd22a3 broadcom/compiler: pipeline smooth ldvary sequences

Typically, we would schedule smooth varyings like this:

    nop                ; nop               ; ldvary.r4
    nop                ; fmul r0, r4, rf0
    fadd rf13, r0, r5  ; nop               ; ldvary.r1
    nop                ; fmul r2, r1, rf0
    fadd rf12, r2, r5  ; nop               ; ldvary.r3
    nop                ; fmul r4, r3, rf0
    fadd rf11, r4, r5  ; nop               ; ldvary.r0

where we pair up an ldvary with the fadd of the previous sequence instead of the previous fmul. This is because ldvary has an implicit write to r5 which is read by the fadd of the previous sequence, so our dependency tracking doesn't allow us to move the ldvary before the fadd, however, the r5 write of the ldvary instruction happens in the instruction after it is emitted so we can actually move it to the fmul and the r5 write would still happen in the same instruction as the fadd, which is fine.

This patch allows us to pipeline these sequences optimally. For that, after merging an ldvary into a previous instruction in the middle of a pipelineable ldvary sequence, we check if we can manually move it to the last scheduled instruction instead (the one before the instruction we are currently scheduling).

If we are successful at moving the ldvary to the previous instruction, then we flag the ldvary as scheduled immediately, which may promote its children (the follow-up fmul instruction for that ldvary) to DAG heads and continue the merge loop so that fmul can be picked and merged into the final fadd of the previous sequence (where we had originally merged the ldvary).

This leads to a result that looks like this:

    nop                ; nop               ; ldvary.r4
    nop                ; fmul r0, r4, rf0  ; ldvary.r1
    fadd rf13, r0, r5  ; fmul r2, r1, rf0  ; ldvary.r3
    fadd rf12, r2, r5  ; fmul r4, r3, rf0  ; ldvary.r0

Shader-db results:

total instructions in shared programs: 14071591 -> 13820690 (-1.78%)
instructions in affected programs: 7809692 -> 7558791 (-3.21%)
helped: 41209
HURT: 4528
Instructions are helped.

total max-temps in shared programs: 2335784 -> 2326435 (-0.40%)
max-temps in affected programs: 84302 -> 74953 (-11.09%)
helped: 4561
HURT: 293
Max-temps are helped.

total sfu-stalls in shared programs: 31537 -> 30683 (-2.71%)
sfu-stalls in affected programs: 3551 -> 2697 (-24.05%)
helped: 1713
HURT: 750
Sfu-stalls are helped.

total inst-and-stalls in shared programs: 14103128 -> 13851373 (-1.79%)
inst-and-stalls in affected programs: 7820726 -> 7568971 (-3.22%)
helped: 41411
HURT: 4535
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>

2021-03-02 07:56:00 +01:00
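
The timing argument in the commit message above can be checked with a small toy model. The sketch below is not Mesa code: `struct toy_inst`, `toy_schedule_naive`, `toy_schedule_pipelined` and `toy_r5_reads_ok` are hypothetical names invented for illustration. It assumes, as the message implies, that the implicit r5 write of an ldvary lands one instruction after the ldvary is emitted, and that an fadd issued in that landing instruction still reads the previous r5 value. Under those assumptions both layouts read the right r5, and N smooth varyings take 2N + 1 instructions in the old layout versus N + 2 in the pipelined one.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only (hypothetical, not the Mesa scheduler). Each instruction
 * has three slots; a slot holds the index of the varying sequence it works
 * on, or -1 when it is idle (nop). */
struct toy_inst {
    int fadd_seq;   /* sequence whose fadd (which reads r5) issues here */
    int fmul_seq;   /* sequence whose fmul issues here */
    int ldvary_seq; /* sequence whose ldvary signal is emitted here */
};

#define TOY_MAX_INSTS 64

static void toy_clear(struct toy_inst *out, int count)
{
    for (int i = 0; i < count; i++)
        out[i] = (struct toy_inst){ -1, -1, -1 };
}

/* Old layout: each ldvary is paired with the fadd of the previous sequence. */
static int toy_schedule_naive(struct toy_inst *out, int n)
{
    int count = 2 * n + 1;
    assert(n >= 1 && count <= TOY_MAX_INSTS);
    toy_clear(out, count);
    for (int s = 0; s < n; s++) {
        out[2 * s].ldvary_seq = s;
        out[2 * s + 1].fmul_seq = s;
        out[2 * s + 2].fadd_seq = s;
    }
    return count;
}

/* Pipelined layout: each ldvary is paired with the fmul of the previous
 * sequence, so one new sequence starts on every instruction. */
static int toy_schedule_pipelined(struct toy_inst *out, int n)
{
    int count = n + 2;
    assert(n >= 1 && count <= TOY_MAX_INSTS);
    toy_clear(out, count);
    for (int s = 0; s < n; s++) {
        out[s].ldvary_seq = s;
        out[s + 1].fmul_seq = s;
        out[s + 2].fadd_seq = s;
    }
    return count;
}

/* Returns true if every fadd reads the r5 value of its own sequence.
 * Assumption: an ldvary emitted in instruction i writes r5 during
 * instruction i + 1, and a read in instruction i + 1 sees the old value. */
static bool toy_r5_reads_ok(const struct toy_inst *insts, int count)
{
    int r5 = -1;      /* sequence whose coefficient r5 currently holds */
    int pending = -1; /* sequence whose r5 write lands in this instruction */

    for (int i = 0; i < count; i++) {
        if (insts[i].fadd_seq >= 0 && insts[i].fadd_seq != r5)
            return false;
        if (pending >= 0)
            r5 = pending; /* visible only to later instructions */
        pending = insts[i].ldvary_seq;
    }
    return true;
}
```

For three varyings this model gives 7 instructions for the old layout and 5 for the pipelined one, both passing the r5 check. Delaying an fadd by one more instruction makes the check fail, because the next ldvary has overwritten r5 by then.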
meson.build	broadcom/compiler: add a constant alu optimization pass	2021-02-23 08:08:01 +00:00
nir_to_vir.c	broadcom/compiler: track pipelineable ldvary sequences	2021-03-02 07:56:00 +01:00
qpu_schedule.c	broadcom/compiler: pipeline smooth ldvary sequences	2021-03-02 07:56:00 +01:00
qpu_validate.c	broadcom/compiler: don't check for GFXH-1633 on V3D 4.2.x	2021-02-12 08:24:21 +00:00
v3d33_tex.c	broadcom/compiler: support pipelining of tex instructions	2021-02-04 10:33:10 +00:00
v3d33_vpm_setup.c	broadcom/vc5: Move V3D 3.3 VPM write setup to a separate file.	2018-01-12 21:56:24 -08:00
v3d40_tex.c	broadcom/compiler: Add a v3d_compile argument to vir_set_[pu]f	2021-02-12 07:05:33 +00:00
v3d_compiler.h	broadcom/compiler: track pipelineable ldvary sequences	2021-03-02 07:56:00 +01:00
v3d_nir_lower_image_load_store.c	v3d: Ask the state tracker to lower image accesses off of derefs.	2020-02-24 18:25:02 +00:00
v3d_nir_lower_io.c	v3d: use intrinsic builders	2021-01-06 14:34:41 +00:00
v3d_nir_lower_line_smooth.c	v3d: use intrinsic builders	2021-01-06 14:34:41 +00:00
v3d_nir_lower_logic_ops.c	v3d: mark some variables static const	2021-01-13 07:24:32 +00:00
v3d_nir_lower_robust_buffer_access.c	v3d/compiler: add a lowering pass for robust buffer access	2020-10-13 21:21:33 +00:00
v3d_nir_lower_scratch.c	v3d: Use the new lower_to_scratch implementation for indirects on temps.	2019-04-12 16:16:58 -07:00
v3d_nir_lower_txf_ms.c	v3d: Use nir_shader_lower_instructions() for txf_ms lowering.	2019-07-18 11:28:56 -07:00
vir.c	v3d: Replace driver lowering of GL_CLAMP with mesa/st's.	2021-02-24 18:03:46 +00:00
vir_dump.c	broadcom/compiler: name registers correctly based on V3D version	2021-02-12 08:24:21 +00:00
vir_live_variables.c	util/hash_table: update users to use new optimal integer hash functions	2020-01-23 17:06:57 +00:00
vir_opt_constant_alu.c	broadcom/compiler: add a constant alu optimization pass	2021-02-23 08:08:01 +00:00
vir_opt_copy_propagate.c	v3d: Use ldunif instructions for uniforms.	2019-03-05 12:57:39 -08:00
vir_opt_dead_code.c	broadcom/compiler: add a constant alu optimization pass	2021-02-23 08:08:01 +00:00
vir_opt_redundant_flags.c	v3d: fix checking twice auf flag	2019-06-13 11:45:18 +02:00
vir_opt_small_immediates.c	v3d: Use ldunif instructions for uniforms.	2019-03-05 12:57:39 -08:00
vir_register_allocate.c	broadcom/compiler: use a helper function to decide on TMU spilling	2021-02-17 09:01:02 +01:00
vir_to_qpu.c	broadcom/compiler: emit ldunifarf when needed	2021-02-12 08:24:21 +00:00