fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-18 11:38:06 +02:00

Author	SHA1	Message	Date
Iago Toral Quiroga	df6c19c1fd	broadcom/compiler: use a helper function to decide on TMU spilling As we add more compiler optimizations that can increase register pressure we may decide to disallow TMU spilling in more cases so it is probably better to move this to its own helper function. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9077>	2021-02-17 09:01:02 +01:00
Iago Toral Quiroga	14af7b3085	broadcom/compiler: don't emit redundant ldunif If we emit a new uniform and that uniform has already been emitted in the same block we can just reuse that. There is a balancing game here between reducing ldunif instructions and not increasing register pressure too much though, so we put a limit to how far back we are willing to look for a previous definition of the uniform. Based on shader-db results, 20 instructions produces best results. total instructions in shared programs: 14928266 -> 14907432 (-0.14%) instructions in affected programs: 6431841 -> 6411007 (-0.32%) helped: 15270 HURT: 10772 Instructions are helped. total uniforms in shared programs: 3944672 -> 3840276 (-2.65%) uniforms in affected programs: 1827184 -> 1722788 (-5.71%) helped: 30423 HURT: 845 Uniforms are helped. total inst-and-stalls in shared programs: 14957813 -> 14936873 (-0.14%) inst-and-stalls in affected programs: 6475349 -> 6454409 (-0.32%) helped: 15287 HURT: 10852 Inst-and-stalls are helped. v2 (Eric): - consider ldunifrf too - check that no other instruction writes to the register Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9077>	2021-02-17 09:01:01 +01:00
Arcady Goldmints-Orlov	7f61ff7b4d	broadcom/compiler: Merge instructions more efficiently Instructions are allowed to access up to two rf registers, or one rf register and a small immediate. This change allows qpu_merge_inst to take full advantage of this by allowing the merging of two instructions if they have no more than two different rf registers between them, or one rf register and one small immediate. qpu_merge_inst rewrites the instructions as needed to pack everything into raddr_a and raddr_b in the merged instruction. shader-db stats: total instructions in shared programs: 19938769 -> 18929664 (-5.06%) instructions in affected programs: 17929438 -> 16920333 (-5.63%) helped: 95008 HURT: 242 helped stats (abs) min: 1 max: 785 x̄: 10.62 x̃: 7 helped stats (rel) min: 0.30% max: 21.25% x̄: 5.37% x̃: 4.98% HURT stats (abs) min: 1 max: 2 x̄: 1.10 x̃: 1 HURT stats (rel) min: 0.30% max: 3.12% x̄: 1.62% x̃: 1.54% 95% mean confidence interval for instructions value: -10.67 -10.52 95% mean confidence interval for instructions %-change: -5.37% -5.33% Instructions are helped. total max-temps in shared programs: 3122664 -> 3112446 (-0.33%) max-temps in affected programs: 124881 -> 114663 (-8.18%) helped: 5445 HURT: 0 helped stats (abs) min: 1 max: 15 x̄: 1.88 x̃: 1 helped stats (rel) min: 1.49% max: 40.54% x̄: 8.97% x̃: 6.67% 95% mean confidence interval for max-temps value: -1.91 -1.84 95% mean confidence interval for max-temps %-change: -9.12% -8.81% Max-temps are helped. 
total sfu-stalls in shared programs: 38028 -> 41231 (8.42%) sfu-stalls in affected programs: 6053 -> 9256 (52.92%) helped: 664 HURT: 3380 helped stats (abs) min: 1 max: 2 x̄: 1.04 x̃: 1 helped stats (rel) min: 9.09% max: 100.00% x̄: 70.81% x̃: 100.00% HURT stats (abs) min: 1 max: 4 x̄: 1.15 x̃: 1 HURT stats (rel) min: 0.00% max: 300.00% x̄: 46.39% x̃: 25.00% 95% mean confidence interval for sfu-stalls value: 0.76 0.82 95% mean confidence interval for sfu-stalls %-change: 25.03% 29.26% Sfu-stalls are HURT. total inst-and-stalls in shared programs: 19976797 -> 18970895 (-5.04%) inst-and-stalls in affected programs: 17963129 -> 16957227 (-5.60%) helped: 95017 HURT: 245 helped stats (abs) min: 1 max: 785 x̄: 10.59 x̃: 7 helped stats (rel) min: 0.30% max: 21.25% x̄: 5.35% x̃: 4.95% HURT stats (abs) min: 1 max: 2 x̄: 1.09 x̃: 1 HURT stats (rel) min: 0.30% max: 3.12% x̄: 1.61% x̃: 1.54% 95% mean confidence interval for inst-and-stalls value: -10.64 -10.48 95% mean confidence interval for inst-and-stalls %-change: -5.35% -5.31% Inst-and-stalls are helped. v2 (Iago): - moved early return for naddrs > 2 even earlier. - only update {add,mul}.b mux if instruction has more than one operand. - don't OR b->raddr_{a,b} if we are not merging add/mul instructions. - don't initialize packed to 0. - minor style fixes. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9026>	2021-02-16 11:46:31 +00:00
Iago Toral Quiroga	82981ccbb1	broadcom/compiler: use unifa for UBO loads from uniform addresses This basically processes UBO loads as uniform loads by writing the load address to the unifa register and reading sequential values with ldunifa. This process is faster than going through the TMU, but we can only use it when the address we are reading from is uniform across all channels, since we are basically reading from the UBO address as if it was a uniform stream. This leads to better performance in the UE4 Shooter demo. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:22 +00:00
Iago Toral Quiroga	878555976e	broadcom/compiler: emit ldunifarf when needed Just like ldunif and ldunifrf, ldunifa writes to the r5 accumulator and ldunifarf writes to the register file. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Iago Toral Quiroga	c2a04aca48	broadcom/compiler: do not DCE ldunifa ldunifa reads a uniform from the unifa address and updates the unifa address implicitly, so if we dead-code-eliminate one, a follow-up ldunifa will not read from the appropriate address. We could avoid this if the compiler ensures that every ldunifa is paired with an explicit unifa, so for example if we are reading a vec4, we could emit: unifa (addr) ldunifa unifa (addr+4) ldunifa unifa (addr+8) ldunifa unifa (addr+12) ldunifa instead of: unifa (addr) ldunifa ldunifa ldunifa ldunifa But since each unifa has a 3-slot delay before we can do ldunifa, that would end up being quite expensive. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Iago Toral Quiroga	efc75e13ea	broadcom/compiler: disallow reading two uniforms in the same instruction The simulator asserts on this, which can happen if we merge a ldunif (or any other instruction that reads a uniform implicitly) and ldunifa in the same instruction. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Iago Toral Quiroga	e8e4bdae8d	broadcom/compiler: ensure 3-slot delay between unifa and ldunifa Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Iago Toral Quiroga	42880fdf5d	broadcom/compiler: preserve ordering of unifa/ldunifa sequences unifa writes the address from which follow-up ldunifa loads, and each ldunifa increments the unifa address by 32 bits, so the loads need to be ordered too. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Iago Toral Quiroga	97c078488f	broadcom/compiler: disallow unifa overlap with thread switch/end Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Iago Toral Quiroga	4b929ae9f0	broadcom/compiler: don't check for GFXH-1633 on V3D 4.2.x This has been fixed since V3D 4.2.14 (Rpi4), which is the hardware we are targeting. Our version resolution doesn't allow us to check for 4.2 versions lower than .14, but that is okay because the simulator would still validate this in any case. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Iago Toral Quiroga	457ed5aa01	broadcom/compiler: name registers correctly based on V3D version So we can differentiate between TMU for V3D 3.x and UNIFA for V3D 4.x, which are aliased. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Iago Toral Quiroga	f85fcaa494	broadcom/compiler: pass a devinfo to check if an instruction writes to TMU V3D 3.x has V3D_QPU_WADDR_TMU which in V3D 4.x is V3D_QPU_WADDR_UNIFA (which isn't a TMU write address). This change passes a devinfo to any functions that need to do these checks so we can account for the target V3D version correctly. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Arcady Goldmints-Orlov	9909fe6bac	broadcom/compiler: Skip bool_to_cond where possible This change keeps track of when a boolean temp is loaded into the flags by a comparison instruction and uses that information to skip emitting instructions to set the flags in ntq_emit_bool_to_cond when the flags already have the right contents. total instructions in shared programs: 11116502 -> 11112225 (-0.04%) instructions in affected programs: 631691 -> 627414 (-0.68%) helped: 1591 HURT: 754 helped stats (abs) min: 1 max: 94 x̄: 4.14 x̃: 3 helped stats (rel) min: 0.11% max: 13.46% x̄: 2.10% x̃: 1.58% HURT stats (abs) min: 1 max: 19 x̄: 3.07 x̃: 2 HURT stats (rel) min: 0.13% max: 19.67% x̄: 1.88% x̃: 1.15% 95% mean confidence interval for instructions value: -2.02 -1.63 95% mean confidence interval for instructions %-change: -0.94% -0.71% Instructions are helped. total uniforms in shared programs: 3281555 -> 3281513 (<.01%) uniforms in affected programs: 1754 -> 1712 (-2.39%) helped: 10 HURT: 5 helped stats (abs) min: 1 max: 19 x̄: 7.90 x̃: 5 helped stats (rel) min: 0.56% max: 11.11% x̄: 7.37% x̃: 11.05% HURT stats (abs) min: 1 max: 15 x̄: 7.40 x̃: 3 HURT stats (rel) min: 0.64% max: 9.55% x̄: 5.31% x̃: 3.41% 95% mean confidence interval for uniforms value: -8.57 2.97 95% mean confidence interval for uniforms %-change: -7.35% 1.07% Inconclusive result (value mean confidence interval includes 0). total max-temps in shared programs: 1758419 -> 1758174 (-0.01%) max-temps in affected programs: 7006 -> 6761 (-3.50%) helped: 290 HURT: 14 helped stats (abs) min: 1 max: 8 x̄: 1.13 x̃: 1 helped stats (rel) min: 0.79% max: 22.86% x̄: 6.61% x̃: 4.88% HURT stats (abs) min: 1 max: 13 x̄: 6.00 x̃: 3 HURT stats (rel) min: 1.54% max: 54.17% x̄: 23.99% x̃: 9.12% 95% mean confidence interval for max-temps value: -1.03 -0.58 95% mean confidence interval for max-temps %-change: -6.24% -4.16% Max-temps are helped. 
total sfu-stalls in shared programs: 23676 -> 23610 (-0.28%) sfu-stalls in affected programs: 1578 -> 1512 (-4.18%) helped: 257 HURT: 252 helped stats (abs) min: 1 max: 3 x̄: 1.37 x̃: 1 helped stats (rel) min: 11.11% max: 100.00% x̄: 46.70% x̃: 40.00% HURT stats (abs) min: 1 max: 2 x̄: 1.14 x̃: 1 HURT stats (rel) min: 0.00% max: 200.00% x̄: 41.65% x̃: 25.00% 95% mean confidence interval for sfu-stalls value: -0.25 -0.01 95% mean confidence interval for sfu-stalls %-change: -8.24% 2.33% Inconclusive result (%-change mean confidence interval includes 0). total inst-and-stalls in shared programs: 11140178 -> 11135835 (-0.04%) inst-and-stalls in affected programs: 633972 -> 629629 (-0.69%) helped: 1581 HURT: 755 helped stats (abs) min: 1 max: 94 x̄: 4.26 x̃: 3 helped stats (rel) min: 0.11% max: 13.46% x̄: 2.12% x̃: 1.59% HURT stats (abs) min: 1 max: 17 x̄: 3.17 x̃: 2 HURT stats (rel) min: 0.05% max: 19.67% x̄: 1.93% x̃: 1.20% 95% mean confidence interval for inst-and-stalls value: -2.06 -1.66 95% mean confidence interval for inst-and-stalls %-change: -0.93% -0.70% Inst-and-stalls are helped. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8933>	2021-02-12 07:05:33 +00:00
Arcady Goldmints-Orlov	8762f29e9c	broadcom/compiler: Add a v3d_compile argument to vir_set_[pu]f Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8933>	2021-02-12 07:05:33 +00:00
Iago Toral Quiroga	bd0ef080d0	v3d/compiler: fix QPU scheduler TMU sequence shuffling The QPU scheduler is allowed to move certain TMU instructions around and, since we enabled pipelining, we need to protect against the case where doing this might break a TMU sequence. For example, this test: dEQP-VK.rasterization.line_continuity.line-strip Was generating this VIR: mov tmud, t187 mov.pushz null, t176 mov.ifa tmua, t9 nop null; wrtmuc (img[0].p0 | 0x0) mov tmut, t185 mov tmud, t180 mov.ifa tmusf, t183 nop null; thrsw where we have a general TMU access (tmud,tmua) followed by an image access (wrtmuc, tmut, tmud, tmusf), which the QPU scheduler was turning into: nop ; nop ; ldunifrf.rf22 (0xffffff00 / -nan) nop ; nop ; wrtmuc (img[0].p0 | 0x0) nop ; nop ; ldtmu.r2 add r0, r2, 1 ; nop ; ldtmu.r3 nop ; nop ; ldtmu.r4 nop ; mov tmud, r0 nop ; mov.ifa tmua, rf15 nop ; mov tmut, r4 ; thrsw nop ; mov tmud, rf22 nop ; mov.ifa tmusf, r3 where it allowed the wrtmuc to move up and before the general TMU access, leading to an incorrect TMU sequence. Fix this by flagging TMUA writes (which are the sequence terminators for general TMU accesses) as writing new TMU configuration, like we do for all other TMU sequence terminators for textures and images. Fixes: `197090a3fc` ('broadcom/compiler: implement pipelining for general TMU operations') Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8954>	2021-02-10 13:18:25 +00:00
Eric Anholt	bcb5f9f94a	v3d: Stop advertising support for flat shading. The GL frontend can lower this weird GL feature away for us. This should fix redeclaration of the gl_Color/SecondaryColor as centroid, since that case had been missed in the !flat special case here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>	2021-02-09 20:06:48 -08:00
Eric Anholt	ff805f8ac7	v3d: Stop advertising support for PIPE_CAP_*_COLOR_CLAMPED. The GL frontend can lower away this deprecated GL feature for us. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>	2021-02-09 20:06:48 -08:00
Eric Anholt	2992dc7386	v3d: Stop advertising support for PIPE_CAP_TWO_SIDED_COLOR. The GL frontend can lower away this deprecated GL feature for us. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>	2021-02-09 20:06:48 -08:00
Eric Anholt	5ddc2f916f	v3d: Clean up vestiges of alpha test lowering. We had an unnecessary case in our uniforms upload switch statement, since we no longer advertise the cap. Fixes: `8ad931808e` ("v3d: do not report alpha-test as supported") Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>	2021-02-09 20:06:48 -08:00
Arcady Goldmints-Orlov	0b29a8a206	Revert "broadcom/compiler: improve generation of if conditions" This reverts commit `93f8f83a95`. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8903>	2021-02-08 06:52:59 +00:00
Iago Toral Quiroga	6630825dcf	broadcom/compiler: let QPUs stall on TMU input/config overflows We have been trying to avoid this by tracking fifo usages in the driver and flushing all outstanding TMU sequences if we overflowed any of these, however, this is actually not the most efficient strategy. Instead, we would like to flush only enough operations to get things going again, which is better for pipelining. Doing that in the driver would require some additional work, but thankfully, it is not required, since this seems to be what the hardware does automatically, so we can just remove overflow tracking for these two fifos and enjoy the benefits. This also further improves shader-db stats: total instructions in shared programs: 8975062 -> 8955145 (-0.22%) instructions in affected programs: 1637624 -> 1617707 (-1.22%) helped: 4050 HURT: 2241 Instructions are helped. total threads in shared programs: 236802 -> 237042 (0.10%) threads in affected programs: 252 -> 492 (95.24%) helped: 122 HURT: 2 Threads are helped. total sfu-stalls in shared programs: 19901 -> 19592 (-1.55%) sfu-stalls in affected programs: 4744 -> 4435 (-6.51%) helped: 1248 HURT: 1051 Sfu-stalls are helped. total inst-and-stalls in shared programs: 8994963 -> 8974737 (-0.22%) inst-and-stalls in affected programs: 1636184 -> 1615958 (-1.24%) helped: 4050 HURT: 2239 Inst-and-stalls are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	d57a358128	broadcom/compiler: log spilling shaders to perf output Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	0f90b729fb	broadcom/compiler: disallow spilling if TMU pipelining was enabled TMU pipelining makes TMU spilling difficult and can easily lead to doing large amounts of spills to compile a shader. It is best to only use pipelining if we can compile without spilling. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	e18d6bbf2f	broadcom/compiler: disable TMU pipelining if we fail to register allocate TMU pipelining can severely reduce our capacity to emit TMU spills, causing us to fail to compile a shader we may otherwise be able to compile. This is because pipelining extends the liveness of TMU sequences by postponing the thread switch and LDTMU until a result is needed, and we can't emit TMU spills while in the middle of a TMU sequence. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	ecd654bf00	broadcom/compiler: support pipelining of image load/store instructions Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	0bdc6dca6c	broadcom/compiler: refactor image load/store TMU emission code This mostly moves code around to group together the code involved with actually emitting a TMU sequence. This will make it a bit easier to then implement pipelining while reusing this code, similar to how we handled other cases of TMU pipelining. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	be45960d3e	broadcom/compiler: support pipelining of tex instructions This follows the same idea as for TMU general instructions of reusing the existing infrastructure to first count required register writes and flush outstanding TMU dependencies, and then emit the actual writes, which requires that we split the code that decides about register writes to a helper. We also need to start using a component mask instead of the number of components that we need to read with a particular TMU operation. v2: update tmu_writes for V3D_QPU_WADDR_TMUOFF Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	197090a3fc	broadcom/compiler: implement pipelining for general TMU operations This creates the basic infrastructure to implement TMU pipelining and applies it to general TMU. Follow-up patches will expand this to texture and image/load store operations. TMU pipelining means that we don't immediately end TMU sequences, and instead, we postpone the thread switch and LDTMU (for loads) or TMUWT (for stores) until we really need to do them. For loads, we may need to flush them if another instruction reads the result of a load operation. We can detect this because in that case ntq_get_src() will not find the definition for that ssa/reg (since we have not emitted the LDTMU instructions for it yet), so when that happens, we flush all pending TMU operations and then try again to find the definition for the source. We also need to flush pending TMU operations when we reach the end of a control flow block, to prevent the case where we emit a TMU operation in a block, but then we read the result in another block possibly under control flow. It is also required to flush across barriers and discards to honor their semantics. Since this change doesn't implement pipelining for texture and image load/store, we also need to flush outstanding TMU operations if we ever have to emit one of these. This will be corrected with follow-up patches. Finally, the TMU has 3 fifos where it can queue TMU operations. These fifos have limited capacity, depending on the number of threads used to compile the shader, so we also need to ensure that we don't have too many outstanding TMU requests and flush pending TMU operations if a new TMU operation would overflow any of these fifos. While overflowing the Input and Config fifos only leads to stalls (which we want to avoid anyway), overflowing the Output fifo is incorrect and would end up with a broken shader. 
This means that we need to know how many TMU register writes are required to emit a TMU operation and use that information to decide if we need to flush pending TMU operations before we emit any register writes for the new TMU operation. v2: fix TMU flushing for NIR registers reads (jasuarez) Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	0e96f0f8cd	broadcom/compiler: prepare TMU spilling code to account for TMU pipelining Follow-up patches will implement support for TMU pipelining in the compiler, which basically means that we will be able to have more than one outstanding TMU operation. Our spilling code currently relies on properly identifying the end of a TMU sequence (since we can't emit a new TMU sequence for a spill in the middle of an existing TMU sequence), however, that code expects that only one TMU sequence may be outstanding, which won't be true once we implement pipelining. This change fixes the 'end of TMU sequence' checks to account for this in preparation for upcoming patches. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	3926030183	broadcom/compiler: fix indentation with TABs Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Arcady Goldmints-Orlov	93f8f83a95	broadcom/compiler: improve generation of if conditions Where it is safe to do so, avoid the generation of code to convert a condition code into a boolean which is then tested to generate a condition code. This is only done in uniform ifs, and only for condition values that are SSA and only used once (in that if statement). shader-db relative to MR 7726: total instructions in shared programs: 8985667 -> 8974151 (-0.13%) instructions in affected programs: 390140 -> 378624 (-2.95%) helped: 810 HURT: 276 helped stats (abs) min: 1 max: 49 x̄: 17.77 x̃: 16 helped stats (rel) min: 0.10% max: 33.63% x̄: 7.97% x̃: 6.45% HURT stats (abs) min: 1 max: 46 x̄: 10.42 x̃: 10 HURT stats (rel) min: 0.16% max: 21.54% x̄: 2.26% x̃: 2.03% 95% mean confidence interval for instructions value: -11.46 -9.75 95% mean confidence interval for instructions %-change: -5.76% -4.97% Instructions are helped. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8709>	2021-02-02 06:55:49 +00:00
Arcady Goldmints-Orlov	8f583df7b6	broadcom/compiler: Enable PER_QUAD TMU access only in uniform control flow PER_QUAD TMU lookups will partially override the predication mask on TMU writes. If some but not all lanes in a quad are predicated out, setting PER_QUAD will force them all to be enabled. This can result in TMU access to bogus addresses when in nonuniform control flow. Also, since PER_QUAD is needed to make sure derivatives work with helper invocations, and derivatives are undefined in nonuniform control flow, there is no reason to leave it enabled in this case. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7726>	2021-02-01 08:11:48 +00:00
Arcady Goldmints-Orlov	79bde75131	broadcom/compiler: Emit uniform loops using uniform control flow Similarly to if statements, uniform loops are now emitted without predication, using simple branches for breaks and continues. The uniformity of the loop is determined by running the nir_divergence_analysis pass. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7726>	2021-02-01 08:11:48 +00:00
Arcady Goldmints-Orlov	6643bdbd53	broadcom/compiler: Use ANYA for branches in uniform ifs Using ANYAP instead of ALLAP makes things work correctly in cases where all lanes are masked out. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7726>	2021-02-01 08:11:48 +00:00
Caio Marcelo de Oliveira Filho	9f3d5e99ea	compiler: Use util/bitset.h for system_values_read It is currently a bitset on top of a uint64_t but there are already more than 64 values. Change to use BITSET to cover all the SYSTEM_VALUE_MAX bits. Cc: mesa-stable Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jesse Natalie <jenatali@microsoft.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8585>	2021-01-26 20:20:47 +00:00
Alejandro Piñeiro	212b1516df	v3d/compiler: enable lower_add_sat NIR option We are enabling this option for the Vulkan driver, so it makes sense to enable it for the OpenGL one. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8582>	2021-01-20 12:41:52 +00:00
Christian Gmeiner	36e1c902b9	v3d: mark some variables static const Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8438>	2021-01-13 07:24:32 +00:00
Christian Gmeiner	9151dab967	v3d: update fallthrough comments Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8438>	2021-01-13 07:24:32 +00:00
Christian Gmeiner	4ec956a2b0	v3d: drop unused function parameter Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8438>	2021-01-13 07:24:32 +00:00
Daniel Schürmann	bd8e84eb8d	nir: replace .lower_sub with .has_fsub and .has_isub This allows a more fine-grained control about whether a backend supports one of these instructions. Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6597>	2021-01-11 19:13:51 +00:00
Christian Gmeiner	66d51965af	v3d: use intrinsic builders Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8295>	2021-01-06 14:34:41 +00:00
Rob Clark	790144e65a	util+treewide: container_of() cleanup Replace mesa's slightly different container_of() with one more aligned to the linux kernel's version which takes a type as the 2nd param. This avoids warnings like: freedreno_context.c:396:44: warning: variable 'batch' is uninitialized when used within its own initialization [-Wuninitialized] At the same time, we can add additional build-time type-checking asserts Signed-off-by: Rob Clark <robdclark@chromium.org> Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7941>	2020-12-10 16:48:36 +00:00
Jason Ekstrand	630e54a08b	nir: Add a halt instruction type Halt is like a return for the entire shader or exit() if you prefer to think of it that way. Once an invocation hits a halt, it's 100% dead. Any writes to output variables which happened before the halt do, however, still apply. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>	2020-11-25 05:37:09 +00:00
Alejandro Piñeiro	429c336412	broadcom/compiler: separate texture/sampler info from v3d_key So far the v3d compiler has them combined, as for OpenGL both are the same. This change is intended to fit the v3d compiler better with Vulkan, where they are separate concepts. Note that NIR has had them separate for a long time, both on nir_variable and on some NIR lowerings. v2: (from Iago feedback) * Use key->num_tex/sampler_used to iterate through the array * Fill up num_samplers_used on v3d, assert that it is the same as num_tex_used if possible. v3: (Iago) * Assert num_tex/samplers_used is smaller than the tex/sampler array size. v4: Update assert mentioned on v3 to use <= instead of < (detected by CI) Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> squash! broadcom/compiler: separate texture/sampler info from v3d_key Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7545>	2020-11-14 15:59:02 +00:00
Arcady Goldmints-Orlov	a1a365e818	broadcom/compiler: Allow spills of temporaries from TMU reads Since spills and fills use the TMU, special care has to be taken to avoid putting one between a TMU setup instruction and the corresponding reads or writes. This change adds logic to move fills up and move spills down to avoid interrupting such sequences. This allows compiling 6 more programs from shader-db. Other stats: total spills in shared programs: 446 -> 446 (0.00%) spills in affected programs: 0 -> 0 helped: 0 HURT: 0 total fills in shared programs: 606 -> 610 (0.66%) fills in affected programs: 38 -> 42 (10.53%) helped: 0 HURT: 2 total instructions in shared programs: 19330 -> 19363 (0.17%) instructions in affected programs: 3299 -> 3332 (1.00%) helped: 0 HURT: 5 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6606>	2020-11-09 20:45:58 +00:00
Juan A. Suarez Romero	1e723745dd	v3d/compiler: extend swapping R/B support to all vertex attributes So far, R/B swapping in vertex attributes was supported only for the generic attributes. But there are cases like glSecondaryColorPointer() supporting BGRA formats that require the R/B swapping to be also allowed in the non-generic vertex attributes (in this case, in the COLOR1 attribute). v2: - Don't split line (Iago) Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7196>	2020-11-05 12:15:28 +00:00
Arcady Goldmints-Orlov	0b30336906	broadcom/compiler: Handle non-SSA destinations for tex instructions The NIR that is given to the VIR compiler is not in SSA form, and so the v3d*_vir_emit_tex() functions must be able to handle both SSA and register destinations. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7318>	2020-11-05 09:03:46 +00:00
Alejandro Piñeiro	09b2bd1df9	broadcom/compiler: remove v3d_fs_key depth_enabled field. It is not used right now, so keeping it adds some noise/confusion. So far, configuring the Z test is done through the CFG_BITS. See v3dX(emit_state) at v3dx_emit.c for v3d, and pack_cfg_bits at v3dv_pipeline.c for v3dv. There, flags like z_updates_enable and others are filled in. That key field seems like a leftover coming from using vc4 as reference, as that driver defines and uses a field with the same name. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7421>	2020-11-03 10:55:08 +00:00
Iago Toral Quiroga	40788be134	v3d/compiler: fix BGRA vertex attributes for vec2/float size. We don't natively support BGRA format, instead we handle these as RGBA and we lower the loads to swap components R and B. However, the driver emits VPM loads based on the size of the input variables so when we have a vec2 or float BGRA input, it would only emit VPM loads for components 0 and 1, which is not correct since we emit a load of component 2 to swap with component 0. v2: handle GL legacy vertex inputs gracefully. Fixes: dEQP-VK.draw.output_location.array.b8g8r8a8-unorm-highp-output-vec2 dEQP-VK.draw.output_location.array.b8g8r8a8-unorm-mediump-output-vec2 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7271>	2020-10-23 09:19:02 +02:00
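
The "don't emit redundant ldunif" entry (14af7b3085) describes reusing a uniform that was already loaded in the same block, looking back at most 20 instructions and checking that no other instruction has written the register since. A minimal sketch of that idea follows. It is not Mesa's code: `struct toy_inst`, `enum toy_op`, `MAX_LOOKBACK`, and `find_reusable_ldunif` are hypothetical names invented for illustration, and the real compiler works on its own VIR structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified instruction record (not Mesa's VIR types). */
enum toy_op { TOY_OP_LDUNIF, TOY_OP_OTHER };

struct toy_inst {
    enum toy_op op;
    uint32_t uniform; /* uniform index, meaningful only for TOY_OP_LDUNIF */
    int dst;          /* temp written by this instruction, -1 if none */
};

/* Lookback window; the commit message reports 20 as the best value. */
#define MAX_LOOKBACK 20

/*
 * Scan backwards over the last MAX_LOOKBACK instructions of a block for
 * an ldunif of the same uniform whose destination has not been
 * overwritten since. Returns the temp to reuse, or -1 if a new ldunif
 * has to be emitted.
 */
static int
find_reusable_ldunif(const struct toy_inst *block, size_t count,
                     uint32_t uniform)
{
    assert(block != NULL || count == 0);

    size_t limit = count < MAX_LOOKBACK ? count : MAX_LOOKBACK;
    for (size_t back = 1; back <= limit; back++) {
        size_t idx = count - back;
        const struct toy_inst *cand = &block[idx];

        if (cand->op != TOY_OP_LDUNIF || cand->uniform != uniform)
            continue;

        /* Reject the candidate if a later instruction wrote its temp. */
        int clobbered = 0;
        for (size_t i = idx + 1; i < count; i++) {
            if (block[i].dst == cand->dst) {
                clobbered = 1;
                break;
            }
        }
        if (!clobbered)
            return cand->dst;
    }
    return -1;
}
```

The fixed window reflects the trade-off named in the commit message: a longer lookback removes more ldunif instructions but keeps the loaded value live for longer, which raises register pressure.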

443 commits