fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-19 00:38:06 +02:00

Author	SHA1	Message	Date
Iago Toral Quiroga	1784dd22a3	broadcom/compiler: pipeline smooth ldvary sequences
Typically, we would schedule smooth varyings like this:
    nop ; nop ; ldvary.r4
    nop ; fmul r0, r4, rf0
    fadd rf13, r0, r5 ; nop ; ldvary.r1
    nop ; fmul r2, r1, rf0
    fadd rf12, r2, r5 ; nop ; ldvary.r3
    nop ; fmul r4, r3, rf0
    fadd rf11, r4, r5 ; nop ; ldvary.r0
where we pair up an ldvary with the fadd of the previous sequence instead of the previous fmul. This is because ldvary has an implicit write to r5 which is read by the fadd of the previous sequence, so our dependency tracking doesn't allow us to move the ldvary before the fadd. However, the r5 write of the ldvary instruction happens in the instruction after it is emitted, so we can actually move it to the fmul and the r5 write would still happen in the same instruction as the fadd, which is fine.
This patch allows us to pipeline these sequences optimally. For that, after merging an ldvary into a previous instruction in the middle of a pipelineable ldvary sequence, we check if we can manually move it to the last scheduled instruction instead (the one before the instruction we are currently scheduling). If we are successful at moving the ldvary to the previous instruction, then we flag the ldvary as scheduled immediately, which may promote its children (the follow-up fmul instruction for that ldvary) to DAG heads, and continue the merge loop so that fmul can be picked and merged into the final fadd of the previous sequence (where we had originally merged the ldvary).
This leads to a result that looks like this:
    nop ; nop ; ldvary.r4
    nop ; fmul r0, r4, rf0 ; ldvary.r1
    fadd rf13, r0, r5 ; fmul r2, r1, rf0 ; ldvary.r3
    fadd rf12, r2, r5 ; fmul r4, r3, rf0 ; ldvary.r0
Shader-db results:
total instructions in shared programs: 14071591 -> 13820690 (-1.78%)
instructions in affected programs: 7809692 -> 7558791 (-3.21%)
helped: 41209
HURT: 4528
Instructions are helped.
total max-temps in shared programs: 2335784 -> 2326435 (-0.40%)
max-temps in affected programs: 84302 -> 74953 (-11.09%)
helped: 4561
HURT: 293
Max-temps are helped.
total sfu-stalls in shared programs: 31537 -> 30683 (-2.71%)
sfu-stalls in affected programs: 3551 -> 2697 (-24.05%)
helped: 1713
HURT: 750
Sfu-stalls are helped.
total inst-and-stalls in shared programs: 14103128 -> 13851373 (-1.79%)
inst-and-stalls in affected programs: 7820726 -> 7568971 (-3.22%)
helped: 41411
HURT: 4535
Inst-and-stalls are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>	2021-03-02 07:56:00 +01:00
Iago Toral Quiroga	1d021539a2	broadcom/compiler: track pipelineable ldvary sequences
If we have two (or more) smooth varyings like this:
    nop t3; ldvary.rf0
    fmul t5, t3, t0
    fadd t6, t5, r5
    nop t7; ldvary.rf0
    fmul t9, t7, t0
    fadd t10, t9, r5
    nop t11; ldvary.rf0
    fmul t13, t11, t0
    fadd t14, t13, r5
We may be able to pipeline them like this:
    nop ; nop ; ldvary.r4
    nop ; fmul r0, r4, rf0 ; ldvary.r1
    fadd rf13, r0, r5 ; fmul r2, r1, rf0 ; ldvary.r3
    fadd rf12, r2, r5 ; fmul r4, r3, rf0 ; ldvary.r0
But in order to do this, we will need to manually tweak the QPU scheduling. This patch tracks information about ldvary sequences that are good candidates for pipelining, and a follow-up patch will use this information to pipeline them when we emit the QPU code.
v2 (apinheiro):
- Rename the v3d_compile fields to avoid confusion with the qinst fields.
- Assert that a sequence's start instruction is not the same as the end.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>	2021-03-02 07:56:00 +01:00
Iago Toral Quiroga	c2c2cdc3d3	broadcom/compiler: fix indentation style Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>	2021-03-02 07:56:00 +01:00
Iago Toral Quiroga	b41edee879	broadcom/compiler: fix DAG pre-remove for merged instructions When selecting an instruction to merge, we want to pre-remove that instruction from the DAG, not the one we are merging it in, which we had already pre-removed right before. The reason this was not causing problems before is that the consequence of this bug is we will choose the same instruction again in the merge loop and trying to merge that instruction twice will fail and we would break out of the merge loop and move on. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>	2021-03-02 07:56:00 +01:00
Eric Anholt	60573b443b	v3d: Replace driver lowering of GL_CLAMP with mesa/st's. Mesa core can do this logic for us now. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9228>	2021-02-24 18:03:46 +00:00
Iago Toral Quiroga	b17ec53c81	broadcom/compiler: use nir_opt_sink
total instructions in shared programs: 14072341 -> 14062334 (-0.07%)
instructions in affected programs: 1996685 -> 1986678 (-0.50%)
helped: 3038
HURT: 2432
Instructions are helped.
total uniforms in shared programs: 3797720 -> 3794523 (-0.08%)
uniforms in affected programs: 191711 -> 188514 (-1.67%)
helped: 831
HURT: 449
Uniforms are helped.
total max-temps in shared programs: 2340632 -> 2335124 (-0.24%)
max-temps in affected programs: 113632 -> 108124 (-4.85%)
helped: 2728
HURT: 436
Max-temps are helped.
total spills in shared programs: 6050 -> 5931 (-1.97%)
spills in affected programs: 2869 -> 2750 (-4.15%)
helped: 14
HURT: 4
total fills in shared programs: 13970 -> 13371 (-4.29%)
fills in affected programs: 8831 -> 8232 (-6.78%)
helped: 14
HURT: 4
total inst-and-stalls in shared programs: 14103668 -> 14093712 (-0.07%)
inst-and-stalls in affected programs: 2004035 -> 1994079 (-0.50%)
helped: 3009
HURT: 2426
Inst-and-stalls are helped.
LOST: 0
GAINED: 10
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9209>	2021-02-24 08:02:00 +01:00
Iago Toral Quiroga	54c17e45ae	broadcom/compiler: skip unnecessary unifa writes If a new UBO load happens to read exactly at the offset right after the previous UBO load (something that is fairly common, for example when reading a matrix), we can skip the unifa write (with its 3 delay slots) and just continue to call ldunifa to continue reading consecutive addresses. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9128>	2021-02-23 08:08:01 +00:00
Iago Toral Quiroga	e1cf2406da	broadcom/compiler: add a constant alu optimization pass Currently this is useful to clean up after DCEing leading ldunifa instructions, but it can be expanded to handle more cases, which may allow us to simplify the compiler code in places where we have been trying to optimize manually for similar cases. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9128>	2021-02-23 08:08:01 +00:00
Iago Toral Quiroga	89de085055	broadcom/compiler: remove unused leading ldunifa This requires that we go back to the unifa write and update the address to jump over the unused leading component. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9128>	2021-02-23 08:08:01 +00:00
Iago Toral Quiroga	9d16d2d0be	broadcom/compiler: allow dead code elimination of unused trailing ldunifa If a ldunifa is the last in a sequence and is not used, we can safely eliminate it. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9128>	2021-02-23 08:08:01 +00:00
Iago Toral Quiroga	e20ae14978	broadcom/compiler: fix ldunif optimization When we look back for a previous uniform definition we want to start looking from the current position of the cursor, not the end of the current block. The latter only works when translating from NIR, since in that case both always match, but any optimization pass may rewrite code and emit uniforms at any place in the middle of the program. Also, ntq_store_dest expects result to be written by the last instruction to handle the case where it is stored to a NIR register. That won't be the case if the result comes from an optimized uniform, so in that case we need to insert a MOV, like we do in non-uniform control flow. v2: fix ntq_store_dest for optimized uniforms. Fixes: `14af7b3085` ('broadcom/compiler: don't emit redundant ldunif') Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Acked-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9128>	2021-02-23 08:08:01 +00:00
Iago Toral Quiroga	064b846949	broadcom/compiler: don't dump shader-db stats for failed shaders Shaders that fail register allocation were dumped with an instruction count of 0, so getting them to compile would show up as an instruction count regression. Also, the LOST/GAINED stats depend on us not dumping data for failed shaders, which is why we were always seeing 0/0 there. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9077>	2021-02-17 09:01:02 +01:00
Iago Toral Quiroga	df6c19c1fd	broadcom/compiler: use a helper function to decide on TMU spilling As we add more compiler optimizations that can increase register pressure we may decide to disallow TMU spilling in more cases so it is probably better to move this to its own helper function. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9077>	2021-02-17 09:01:02 +01:00
Iago Toral Quiroga	14af7b3085	broadcom/compiler: don't emit redundant ldunif
If we emit a new uniform and that uniform has already been emitted in the same block we can just reuse that. There is a balancing game here between reducing ldunif instructions and not increasing register pressure too much though, so we put a limit to how far back we are willing to look for a previous definition of the uniform. Based on shader-db results, 20 instructions produces best results.
total instructions in shared programs: 14928266 -> 14907432 (-0.14%)
instructions in affected programs: 6431841 -> 6411007 (-0.32%)
helped: 15270
HURT: 10772
Instructions are helped.
total uniforms in shared programs: 3944672 -> 3840276 (-2.65%)
uniforms in affected programs: 1827184 -> 1722788 (-5.71%)
helped: 30423
HURT: 845
Uniforms are helped.
total inst-and-stalls in shared programs: 14957813 -> 14936873 (-0.14%)
inst-and-stalls in affected programs: 6475349 -> 6454409 (-0.32%)
helped: 15287
HURT: 10852
Inst-and-stalls are helped.
v2 (Eric):
- consider ldunifrf too
- check that no other instruction writes to the register
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9077>	2021-02-17 09:01:01 +01:00
Arcady Goldmints-Orlov	7f61ff7b4d	broadcom/compiler: Merge instructions more efficiently
Instructions are allowed to access up to two rf registers, or one rf register and a small immediate. This change allows qpu_merge_inst to take full advantage of this by allowing the merging of two instructions if they have no more than two different rf registers between them, or one rf register and one small immediate. qpu_merge_inst rewrites the instructions as needed to pack everything into raddr_a and raddr_b in the merged instruction.
shader-db stats:
total instructions in shared programs: 19938769 -> 18929664 (-5.06%)
instructions in affected programs: 17929438 -> 16920333 (-5.63%)
helped: 95008
HURT: 242
helped stats (abs) min: 1 max: 785 x̄: 10.62 x̃: 7
helped stats (rel) min: 0.30% max: 21.25% x̄: 5.37% x̃: 4.98%
HURT stats (abs) min: 1 max: 2 x̄: 1.10 x̃: 1
HURT stats (rel) min: 0.30% max: 3.12% x̄: 1.62% x̃: 1.54%
95% mean confidence interval for instructions value: -10.67 -10.52
95% mean confidence interval for instructions %-change: -5.37% -5.33%
Instructions are helped.
total max-temps in shared programs: 3122664 -> 3112446 (-0.33%)
max-temps in affected programs: 124881 -> 114663 (-8.18%)
helped: 5445
HURT: 0
helped stats (abs) min: 1 max: 15 x̄: 1.88 x̃: 1
helped stats (rel) min: 1.49% max: 40.54% x̄: 8.97% x̃: 6.67%
95% mean confidence interval for max-temps value: -1.91 -1.84
95% mean confidence interval for max-temps %-change: -9.12% -8.81%
Max-temps are helped.
total sfu-stalls in shared programs: 38028 -> 41231 (8.42%)
sfu-stalls in affected programs: 6053 -> 9256 (52.92%)
helped: 664
HURT: 3380
helped stats (abs) min: 1 max: 2 x̄: 1.04 x̃: 1
helped stats (rel) min: 9.09% max: 100.00% x̄: 70.81% x̃: 100.00%
HURT stats (abs) min: 1 max: 4 x̄: 1.15 x̃: 1
HURT stats (rel) min: 0.00% max: 300.00% x̄: 46.39% x̃: 25.00%
95% mean confidence interval for sfu-stalls value: 0.76 0.82
95% mean confidence interval for sfu-stalls %-change: 25.03% 29.26%
Sfu-stalls are HURT.
total inst-and-stalls in shared programs: 19976797 -> 18970895 (-5.04%)
inst-and-stalls in affected programs: 17963129 -> 16957227 (-5.60%)
helped: 95017
HURT: 245
helped stats (abs) min: 1 max: 785 x̄: 10.59 x̃: 7
helped stats (rel) min: 0.30% max: 21.25% x̄: 5.35% x̃: 4.95%
HURT stats (abs) min: 1 max: 2 x̄: 1.09 x̃: 1
HURT stats (rel) min: 0.30% max: 3.12% x̄: 1.61% x̃: 1.54%
95% mean confidence interval for inst-and-stalls value: -10.64 -10.48
95% mean confidence interval for inst-and-stalls %-change: -5.35% -5.31%
Inst-and-stalls are helped.
v2 (Iago):
- moved early return for naddrs > 2 even earlier.
- only update {add,mul}.b mux if instruction has more than one operand.
- don't OR b->raddr_{a,b} if we are not merging add/mul instructions.
- don't initialize packed to 0.
- minor style fixes.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9026>	2021-02-16 11:46:31 +00:00
Iago Toral Quiroga	82981ccbb1	broadcom/compiler: use unifa for UBO loads from uniform addresses This basically processes UBO loads as uniform loads by writing the load address to the unifa register and reading sequential values with ldunifa. This process is faster than going through the TMU, but we can only use it when the address we are reading from is uniform across all channels, since we are basically reading from the UBO address as if it was a uniform stream. This leads to better performance in the UE4 Shooter demo. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:22 +00:00
Iago Toral Quiroga	878555976e	broadcom/compiler: emit ldunifarf when needed Just like ldunif and ldunifrf, ldunifa writes to the r5 accumulator and ldunifarf writes to the register file. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Iago Toral Quiroga	c2a04aca48	broadcom/compiler: do not DCE ldunifa
ldunifa reads a uniform from the unifa address and updates the unifa address implicitly, so if we dead-code-eliminate one, a follow-up ldunifa will not read from the appropriate address.
We could avoid this if the compiler ensures that every ldunifa is paired with an explicit unifa, so for example if we are reading a vec4, we could emit:
    unifa (addr)
    ldunifa
    unifa (addr+4)
    ldunifa
    unifa (addr+8)
    ldunifa
    unifa (addr+12)
    ldunifa
instead of:
    unifa (addr)
    ldunifa
    ldunifa
    ldunifa
    ldunifa
But since each unifa has 3 delay slots before we can do ldunifa, that would end up being quite expensive.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Iago Toral Quiroga	efc75e13ea	broadcom/compiler: disallow reading two uniforms in the same instruction The simulator asserts on this, which can happen if we merge a ldunif (or any other instruction that reads a uniform implicitly) and ldunifa in the same instruction. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Iago Toral Quiroga	e8e4bdae8d	broadcom/compiler: ensure 3-slot delay between unifa and ldunifa Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Iago Toral Quiroga	42880fdf5d	broadcom/compiler: preserve ordering of unifa/ldunifa sequences unifa writes the address from which follow-up ldunifa loads, and each ldunifa increments the unifa address by 32 bits, so the loads need to be ordered too. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Iago Toral Quiroga	97c078488f	broadcom/compiler: disallow unifa overlap with thread switch/end Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Iago Toral Quiroga	4b929ae9f0	broadcom/compiler: don't check for GFXH-1633 on V3D 4.2.x This has been fixed since V3D 4.2.14 (Rpi4), which is the hardware we are targeting. Our version resolution doesn't allow us to check for 4.2 versions lower than .14, but that is okay because the simulator would still validate this in any case. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Iago Toral Quiroga	457ed5aa01	broadcom/compiler: name registers correctly based on V3D version So we can differentiate between TMU for V3D 3.x and UNIFA for V3D 4.x, which are aliased. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Iago Toral Quiroga	f85fcaa494	broadcom/compiler: pass a devinfo to check if an instruction writes to TMU V3D 3.x has V3D_QPU_WADDR_TMU which in V3D 4.x is V3D_QPU_WADDR_UNIFA (which isn't a TMU write address). This change passes a devinfo to any functions that need to do these checks so we can account for the target V3D version correctly. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Arcady Goldmints-Orlov	9909fe6bac	broadcom/compiler: Skip bool_to_cond where possible
This change keeps track of when a boolean temp is loaded into the flags by a comparison instruction and uses that information to skip emitting instructions to set the flags in ntq_emit_bool_to_cond when the flags already have the right contents.
total instructions in shared programs: 11116502 -> 11112225 (-0.04%)
instructions in affected programs: 631691 -> 627414 (-0.68%)
helped: 1591
HURT: 754
helped stats (abs) min: 1 max: 94 x̄: 4.14 x̃: 3
helped stats (rel) min: 0.11% max: 13.46% x̄: 2.10% x̃: 1.58%
HURT stats (abs) min: 1 max: 19 x̄: 3.07 x̃: 2
HURT stats (rel) min: 0.13% max: 19.67% x̄: 1.88% x̃: 1.15%
95% mean confidence interval for instructions value: -2.02 -1.63
95% mean confidence interval for instructions %-change: -0.94% -0.71%
Instructions are helped.
total uniforms in shared programs: 3281555 -> 3281513 (<.01%)
uniforms in affected programs: 1754 -> 1712 (-2.39%)
helped: 10
HURT: 5
helped stats (abs) min: 1 max: 19 x̄: 7.90 x̃: 5
helped stats (rel) min: 0.56% max: 11.11% x̄: 7.37% x̃: 11.05%
HURT stats (abs) min: 1 max: 15 x̄: 7.40 x̃: 3
HURT stats (rel) min: 0.64% max: 9.55% x̄: 5.31% x̃: 3.41%
95% mean confidence interval for uniforms value: -8.57 2.97
95% mean confidence interval for uniforms %-change: -7.35% 1.07%
Inconclusive result (value mean confidence interval includes 0).
total max-temps in shared programs: 1758419 -> 1758174 (-0.01%)
max-temps in affected programs: 7006 -> 6761 (-3.50%)
helped: 290
HURT: 14
helped stats (abs) min: 1 max: 8 x̄: 1.13 x̃: 1
helped stats (rel) min: 0.79% max: 22.86% x̄: 6.61% x̃: 4.88%
HURT stats (abs) min: 1 max: 13 x̄: 6.00 x̃: 3
HURT stats (rel) min: 1.54% max: 54.17% x̄: 23.99% x̃: 9.12%
95% mean confidence interval for max-temps value: -1.03 -0.58
95% mean confidence interval for max-temps %-change: -6.24% -4.16%
Max-temps are helped.
total sfu-stalls in shared programs: 23676 -> 23610 (-0.28%)
sfu-stalls in affected programs: 1578 -> 1512 (-4.18%)
helped: 257
HURT: 252
helped stats (abs) min: 1 max: 3 x̄: 1.37 x̃: 1
helped stats (rel) min: 11.11% max: 100.00% x̄: 46.70% x̃: 40.00%
HURT stats (abs) min: 1 max: 2 x̄: 1.14 x̃: 1
HURT stats (rel) min: 0.00% max: 200.00% x̄: 41.65% x̃: 25.00%
95% mean confidence interval for sfu-stalls value: -0.25 -0.01
95% mean confidence interval for sfu-stalls %-change: -8.24% 2.33%
Inconclusive result (%-change mean confidence interval includes 0).
total inst-and-stalls in shared programs: 11140178 -> 11135835 (-0.04%)
inst-and-stalls in affected programs: 633972 -> 629629 (-0.69%)
helped: 1581
HURT: 755
helped stats (abs) min: 1 max: 94 x̄: 4.26 x̃: 3
helped stats (rel) min: 0.11% max: 13.46% x̄: 2.12% x̃: 1.59%
HURT stats (abs) min: 1 max: 17 x̄: 3.17 x̃: 2
HURT stats (rel) min: 0.05% max: 19.67% x̄: 1.93% x̃: 1.20%
95% mean confidence interval for inst-and-stalls value: -2.06 -1.66
95% mean confidence interval for inst-and-stalls %-change: -0.93% -0.70%
Inst-and-stalls are helped.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8933>	2021-02-12 07:05:33 +00:00
Arcady Goldmints-Orlov	8762f29e9c	broadcom/compiler: Add a v3d_compile argument to vir_set_[pu]f Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8933>	2021-02-12 07:05:33 +00:00
Iago Toral Quiroga	bd0ef080d0	v3d/compiler: fix QPU scheduler TMU sequence shuffling
The QPU scheduler allows moving certain TMU instructions around, and since we enabled pipelining, we need to protect against the case where doing this might break a TMU sequence.
For example, this test:
    dEQP-VK.rasterization.line_continuity.line-strip
Was generating this VIR:
    mov tmud, t187
    mov.pushz null, t176
    mov.ifa tmua, t9
    nop null; wrtmuc (img[0].p0 | 0x0)
    mov tmut, t185
    mov tmud, t180
    mov.ifa tmusf, t183
    nop null; thrsw
where we have a general TMU access (tmud, tmua) followed by an image access (wrtmuc, tmut, tmud, tmusf), which the QPU scheduler was turning into:
    nop ; nop ; ldunifrf.rf22 (0xffffff00 / -nan)
    nop ; nop ; wrtmuc (img[0].p0 | 0x0)
    nop ; nop ; ldtmu.r2
    add r0, r2, 1 ; nop ; ldtmu.r3
    nop ; nop ; ldtmu.r4
    nop ; mov tmud, r0
    nop ; mov.ifa tmua, rf15
    nop ; mov tmut, r4 ; thrsw
    nop ; mov tmud, rf22
    nop ; mov.ifa tmusf, r3
where it allowed the wrtmuc to move up and before the general TMU access, leading to an incorrect TMU sequence.
Fix this by flagging TMUA writes (which are the sequence terminators for general TMU accesses) as writing new TMU configuration, like we do for all other TMU sequence terminators for textures and images.
Fixes: `197090a3fc` ('broadcom/compiler: implement pipelining for general TMU operations')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8954>	2021-02-10 13:18:25 +00:00
Eric Anholt	bcb5f9f94a	v3d: Stop advertising support for flat shading. The GL frontend can lower this weird GL feature away for us. This should fix redeclaration of the gl_Color/SecondaryColor as centroid, since that case had been missed in the !flat special case here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>	2021-02-09 20:06:48 -08:00
Eric Anholt	ff805f8ac7	v3d: Stop advertising support for PIPE_CAP_*_COLOR_CLAMPED. The GL frontend can lower away this deprecated GL feature for us. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>	2021-02-09 20:06:48 -08:00
Eric Anholt	2992dc7386	v3d: Stop advertising support for PIPE_CAP_TWO_SIDED_COLOR. The GL frontend can lower away this deprecated GL feature for us. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>	2021-02-09 20:06:48 -08:00
Eric Anholt	5ddc2f916f	v3d: Clean up vestiges of alpha test lowering. We had an unnecessary case in our uniforms upload switch statement, since we no longer advertise the cap. Fixes: `8ad931808e` ("v3d: do not report alpha-test as supported") Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>	2021-02-09 20:06:48 -08:00
Arcady Goldmints-Orlov	0b29a8a206	Revert "broadcom/compiler: improve generation of if conditions" This reverts commit `93f8f83a95`. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8903>	2021-02-08 06:52:59 +00:00
Iago Toral Quiroga	6630825dcf	broadcom/compiler: let QPUs stall on TMU input/config overflows
We have been trying to avoid this by tracking fifo usages in the driver and flushing all outstanding TMU sequences if we overflowed any of these. However, this is actually not the most efficient strategy. Instead, we would like to flush only enough operations to get things going again, which is better for pipelining. Doing that in the driver would require some additional work, but thankfully, it is not required, since this seems to be what the hardware does automatically, so we can just remove overflow tracking for these two fifos and enjoy the benefits.
This also further improves shader-db stats:
total instructions in shared programs: 8975062 -> 8955145 (-0.22%)
instructions in affected programs: 1637624 -> 1617707 (-1.22%)
helped: 4050
HURT: 2241
Instructions are helped.
total threads in shared programs: 236802 -> 237042 (0.10%)
threads in affected programs: 252 -> 492 (95.24%)
helped: 122
HURT: 2
Threads are helped.
total sfu-stalls in shared programs: 19901 -> 19592 (-1.55%)
sfu-stalls in affected programs: 4744 -> 4435 (-6.51%)
helped: 1248
HURT: 1051
Sfu-stalls are helped.
total inst-and-stalls in shared programs: 8994963 -> 8974737 (-0.22%)
inst-and-stalls in affected programs: 1636184 -> 1615958 (-1.24%)
helped: 4050
HURT: 2239
Inst-and-stalls are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	d57a358128	broadcom/compiler: log spilling shaders to perf output Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	0f90b729fb	broadcom/compiler: disallow spilling if TMU pipelining was enabled TMU pipelining makes TMU spilling difficult and can easily lead to doing large amounts of spills to compile a shader. It is best to only use pipelining if we can compile without spilling. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	e18d6bbf2f	broadcom/compiler: disable TMU pipelining if we fail to register allocate TMU pipelining can severely reduce our capacity to emit TMU spills, causing us to fail to compile a shader we may otherwise be able to compile. This is because pipelining extends the liveness of TMU sequences by postponing the thread switch and LDTMU until a result is needed, and we can't emit TMU spills while in the middle of a TMU sequence. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	ecd654bf00	broadcom/compiler: support pipelining of image load/store instructions Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	0bdc6dca6c	broadcom/compiler: refactor image load/store TMU emission code This mostly moves code around to group together the code involved with actually emitting a TMU sequence. This will make it a bit easier to then implement pipelining while reusing this code, similar to how we handled other cases of TMU pipelining. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	be45960d3e	broadcom/compiler: support pipelining of tex instructions This follows the same idea as for TMU general instructions of reusing the existing infrastructure to first count required register writes and flush outstanding TMU dependencies, and then emit the actual writes, which requires that we split the code that decides about register writes to a helper. We also need to start using a component mask instead of the number of components that we need to read with a particular TMU operation. v2: update tmu_writes for V3D_QPU_WADDR_TMUOFF Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	197090a3fc	broadcom/compiler: implement pipelining for general TMU operations
This creates the basic infrastructure to implement TMU pipelining and applies it to general TMU. Follow-up patches will expand this to texture and image load/store operations.
TMU pipelining means that we don't immediately end TMU sequences, and instead, we postpone the thread switch and LDTMU (for loads) or TMUWT (for stores) until we really need to do them.
For loads, we may need to flush them if another instruction reads the result of a load operation. We can detect this because in that case ntq_get_src() will not find the definition for that ssa/reg (since we have not emitted the LDTMU instructions for it yet), so when that happens, we flush all pending TMU operations and then try again to find the definition for the source.
We also need to flush pending TMU operations when we reach the end of a control flow block, to prevent the case where we emit a TMU operation in a block, but then we read the result in another block possibly under control flow.
It is also required to flush across barriers and discards to honor their semantics.
Since this change doesn't implement pipelining for texture and image load/store, we also need to flush outstanding TMU operations if we ever have to emit one of these. This will be corrected with follow-up patches.
Finally, the TMU has 3 fifos where it can queue TMU operations. These fifos have limited capacity, depending on the number of threads used to compile the shader, so we also need to ensure that we don't have too many outstanding TMU requests and flush pending TMU operations if a new TMU operation would overflow any of these fifos. While overflowing the Input and Config fifos only leads to stalls (which we want to avoid anyway), overflowing the Output fifo is incorrect and would end up with a broken shader.
This means that we need to know how many TMU register writes are required to emit a TMU operation and use that information to decide if we need to flush pending TMU operations before we emit any register writes for the new TMU operation.
v2: fix TMU flushing for NIR register reads (jasuarez)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	0e96f0f8cd	broadcom/compiler: prepare TMU spilling code to account for TMU pipelining Follow-up patches will implement support for TMU pipelining in the compiler, which basically means that we will be able to have more than one outstanding TMU operation. Our spilling code currently relies on properly identifying the end of a TMU sequence (since we can't emit a new TMU sequence for a spill in the middle of an existing TMU sequence), however, that code expects that only one TMU sequence may be outstanding, which won't be true once we implement pipelining. This change fixes the 'end of TMU sequence' checks to account for this in preparation for upcoming patches. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	3926030183	broadcom/compiler: fix indentation with TABs Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Arcady Goldmints-Orlov	93f8f83a95	broadcom/compiler: improve generation of if conditions Where it is safe to do so, avoid the generation of code to convert a condition code into a boolean which is then tested to generate a condition code. This is only done in uniform ifs, and only for condition values that are SSA and only used once (in that if statement). shader-db relative to MR 7726: total instructions in shared programs: 8985667 -> 8974151 (-0.13%) instructions in affected programs: 390140 -> 378624 (-2.95%) helped: 810 HURT: 276 helped stats (abs) min: 1 max: 49 x̄: 17.77 x̃: 16 helped stats (rel) min: 0.10% max: 33.63% x̄: 7.97% x̃: 6.45% HURT stats (abs) min: 1 max: 46 x̄: 10.42 x̃: 10 HURT stats (rel) min: 0.16% max: 21.54% x̄: 2.26% x̃: 2.03% 95% mean confidence interval for instructions value: -11.46 -9.75 95% mean confidence interval for instructions %-change: -5.76% -4.97% Instructions are helped. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8709>	2021-02-02 06:55:49 +00:00
Arcady Goldmints-Orlov	8f583df7b6	broadcom/compiler: Enable PER_QUAD TMU access only in uniform control flow PER_QUAD TMU lookups will partially override the predication mask on TMU writes. If some but not all lanes in a quad are predicated out, setting PER_QUAD will force them all to be enabled. This can result in TMU access to bogus addresses when in nonuniform control flow. Also, since PER_QUAD is needed to make sure derivatives work with helper invocations, and derivatives are undefined in nonuniform control flow, there is no reason to leave it enabled in this case. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7726>	2021-02-01 08:11:48 +00:00
Arcady Goldmints-Orlov	79bde75131	broadcom/compiler: Emit uniform loops using uniform control flow Similarly to if statements, uniform loops are now emitted without predication, using simple branches for breaks and continues. The uniformity of the loop is determined by running the nir_divergence_analysis pass. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7726>	2021-02-01 08:11:48 +00:00
Arcady Goldmints-Orlov	6643bdbd53	broadcom/compiler: Use ANYA for branches in uniform ifs Using ANYAP instead of ALLAP makes things work correctly in cases where all lanes are masked out. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7726>	2021-02-01 08:11:48 +00:00
Caio Marcelo de Oliveira Filho	9f3d5e99ea	compiler: Use util/bitset.h for system_values_read It is currently a bitset on top of a uint64_t but there are already more than 64 values. Change to use BITSET to cover all the SYSTEM_VALUE_MAX bits. Cc: mesa-stable Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jesse Natalie <jenatali@microsoft.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8585>	2021-01-26 20:20:47 +00:00
Alejandro Piñeiro	212b1516df	v3d/compiler: enable lower_add_sat NIR option We are enabling this option for the Vulkan driver, so it makes sense to enable it for the OpenGL one. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8582>	2021-01-20 12:41:52 +00:00
Christian Gmeiner	36e1c902b9	v3d: mark some variables static const Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8438>	2021-01-13 07:24:32 +00:00

455 commits