fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-18 15:58:06 +02:00

Author	SHA1	Message	Date
Iago Toral Quiroga	df6c19c1fd	broadcom/compiler: use a helper function to decide on TMU spilling As we add more compiler optimizations that can increase register pressure we may decide to disallow TMU spilling in more cases so it is probably better to move this to its own helper function. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9077>	2021-02-17 09:01:02 +01:00
Iago Toral Quiroga	14af7b3085	broadcom/compiler: don't emit redundant ldunif If we emit a new uniform and that uniform has already been emitted in the same block we can just reuse that. There is a balancing game here between reducing ldunif instructions and not increasing register pressure too much though, so we put a limit to how far back we are willing to look for a previous definition of the uniform. Based on shader-db results, 20 instructions produces best results. total instructions in shared programs: 14928266 -> 14907432 (-0.14%) instructions in affected programs: 6431841 -> 6411007 (-0.32%) helped: 15270 HURT: 10772 Instructions are helped. total uniforms in shared programs: 3944672 -> 3840276 (-2.65%) uniforms in affected programs: 1827184 -> 1722788 (-5.71%) helped: 30423 HURT: 845 Uniforms are helped. total inst-and-stalls in shared programs: 14957813 -> 14936873 (-0.14%) inst-and-stalls in affected programs: 6475349 -> 6454409 (-0.32%) helped: 15287 HURT: 10852 Inst-and-stalls are helped. v2 (Eric): - consider ldunifrf too - check that no other instruction writes to the register Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9077>	2021-02-17 09:01:01 +01:00
Arcady Goldmints-Orlov	7f61ff7b4d	broadcom/compiler: Merge instructions more efficiently Instructions are allowed to access up to two rf registers, or one rf register and a small immediate. This change allows qpu_merge_inst to take full advantage of this by allowing the merging of two instructions if they have no more than two different rf registers between them, or one rf register and one small immediate. qpu_merge_inst rewrites the instructions as needed to pack everything into raddr_a and raddr_b in the merged instruction. shader-db stats: total instructions in shared programs: 19938769 -> 18929664 (-5.06%) instructions in affected programs: 17929438 -> 16920333 (-5.63%) helped: 95008 HURT: 242 helped stats (abs) min: 1 max: 785 x̄: 10.62 x̃: 7 helped stats (rel) min: 0.30% max: 21.25% x̄: 5.37% x̃: 4.98% HURT stats (abs) min: 1 max: 2 x̄: 1.10 x̃: 1 HURT stats (rel) min: 0.30% max: 3.12% x̄: 1.62% x̃: 1.54% 95% mean confidence interval for instructions value: -10.67 -10.52 95% mean confidence interval for instructions %-change: -5.37% -5.33% Instructions are helped. total max-temps in shared programs: 3122664 -> 3112446 (-0.33%) max-temps in affected programs: 124881 -> 114663 (-8.18%) helped: 5445 HURT: 0 helped stats (abs) min: 1 max: 15 x̄: 1.88 x̃: 1 helped stats (rel) min: 1.49% max: 40.54% x̄: 8.97% x̃: 6.67% 95% mean confidence interval for max-temps value: -1.91 -1.84 95% mean confidence interval for max-temps %-change: -9.12% -8.81% Max-temps are helped. 
total sfu-stalls in shared programs: 38028 -> 41231 (8.42%) sfu-stalls in affected programs: 6053 -> 9256 (52.92%) helped: 664 HURT: 3380 helped stats (abs) min: 1 max: 2 x̄: 1.04 x̃: 1 helped stats (rel) min: 9.09% max: 100.00% x̄: 70.81% x̃: 100.00% HURT stats (abs) min: 1 max: 4 x̄: 1.15 x̃: 1 HURT stats (rel) min: 0.00% max: 300.00% x̄: 46.39% x̃: 25.00% 95% mean confidence interval for sfu-stalls value: 0.76 0.82 95% mean confidence interval for sfu-stalls %-change: 25.03% 29.26% Sfu-stalls are HURT. total inst-and-stalls in shared programs: 19976797 -> 18970895 (-5.04%) inst-and-stalls in affected programs: 17963129 -> 16957227 (-5.60%) helped: 95017 HURT: 245 helped stats (abs) min: 1 max: 785 x̄: 10.59 x̃: 7 helped stats (rel) min: 0.30% max: 21.25% x̄: 5.35% x̃: 4.95% HURT stats (abs) min: 1 max: 2 x̄: 1.09 x̃: 1 HURT stats (rel) min: 0.30% max: 3.12% x̄: 1.61% x̃: 1.54% 95% mean confidence interval for inst-and-stalls value: -10.64 -10.48 95% mean confidence interval for inst-and-stalls %-change: -5.35% -5.31% Inst-and-stalls are helped. v2 (Iago): - moved early return for naddrs > 2 even earlier. - only update {add,mul}.b mux if instruction has more than one operand. - don't OR b->raddr_{a,b} if we are not merging add/mul instructions. - don't initialize packed to 0. - minor style fixes. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9026>	2021-02-16 11:46:31 +00:00
Alejandro Piñeiro	3f614c6f7c	v3dv/meta_copy: get tlb compatible BC compressed formats for copies So we can use the tlb path for several operations (copy image, clear, copy buffer to image, etc). Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8929>	2021-02-12 22:04:13 +00:00
Alejandro Piñeiro	6fdf375a90	v3dv/formats: expose support for BC1-3 compressed formats Even though we can't expose textureCompressedBC as the hw doesn't support all the formats, we can expose as supported individual formats. This gets ~850 CTS tests going from skip to pass, with patterns like: * dEQP-VK.texture.compressed.bc* * dEQP-VK.api.copy_and_blit.core.image_to_image.all_formats.color.2dbc * dEQP-VK.api.copy_and_blit.core.image_to_image.all_formats.color.3dbc * dEQP-VK.api.info.image_format_propertiesbc * etc v2: BC1-3 formats are texture filterable (Iago) Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8929>	2021-02-12 22:04:13 +00:00
Alejandro Piñeiro	fcb229cbe0	v3dv/device: clarify that we can't expose textureCompressionBC From spec: "textureCompressionBC specifies whether all of the BC compressed texture formats are supported. If this feature is enabled" Note the all. v3d hw supports BC1, BC2, and BC3, but not BC4 through BC7. Let's clarify that we can't expose textureCompressionBC even if we support some of them. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8929>	2021-02-12 22:04:13 +00:00
Iago Toral Quiroga	82981ccbb1	broadcom/compiler: use unifa for UBO loads from uniform addresses This basically processes UBO loads as uniform loads by writing the load address to the unifa register and reading sequential values with ldunifa. This process is faster than going through the TMU, but we can only use it when the address we are reading from is uniform across all channels, since we are basically reading from the UBO address as if it was a uniform stream. This leads to better performance in the UE4 Shooter demo. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:22 +00:00
Iago Toral Quiroga	878555976e	broadcom/compiler: emit ldunifarf when needed Just like ldunif and ldunifrf, ldunifa writes to the r5 accumulator and ldunifarf writes to the register file. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Iago Toral Quiroga	c2a04aca48	broadcom/compiler: do not DCE ldunifa ldunifa reads a uniform from the unifa address and updates the unifa address implicitly, so if we dead-code-eliminate one a follow-up ldunifa will not read from the appropriate address. We could avoid this if the compiler ensures that every ldunifa is paired with an explicit unifa, so for example if we are reading a vec4, we could emit: unifa (addr) ldunifa unifa (addr+4) ldunifa unifa (addr+8) ldunifa unifa (addr+12) ldunifa instead of: unifa (addr) ldunifa ldunifa ldunifa ldunifa But since each unifa has a 3-slot delay before we can do ldunifa, that would end up being quite expensive. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Iago Toral Quiroga	efc75e13ea	broadcom/compiler: disallow reading two uniforms in the same instruction The simulator asserts on this, which can happen if we merge a ldunif (or any other instruction that reads a uniform implicitly) and ldunifa in the same instruction. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Iago Toral Quiroga	e8e4bdae8d	broadcom/compiler: ensure 3-slot delay between unifa and ldunifa Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Iago Toral Quiroga	42880fdf5d	broadcom/compiler: preserve ordering of unifa/ldunifa sequences unifa writes the address from which follow-up ldunifa loads, and each ldunifa increments the unifa address by 32 bits so the loads need to be ordered too. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Iago Toral Quiroga	97c078488f	broadcom/compiler: disallow unifa overlap with thread switch/end Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Iago Toral Quiroga	24db1a5112	broadcom/compiler: add a helper to check if an instruction writes unifa Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Iago Toral Quiroga	4b929ae9f0	broadcom/compiler: don't check for GFXH-1633 on V3D 4.2.x This has been fixed since V3D 4.2.14 (Rpi4), which is the hardware we are targeting. Our version resolution doesn't allow us to check for 4.2 versions lower than .14, but that is okay because the simulator would still validate this in any case. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Iago Toral Quiroga	457ed5aa01	broadcom/compiler: name registers correctly based on V3D version So we can differentiate between TMU for V3D 3.x and UNIFA for V3D 4.x, which are aliased. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Iago Toral Quiroga	f85fcaa494	broadcom/compiler: pass a devinfo to check if an instruction writes to TMU V3D 3.x has V3D_QPU_WADDR_TMU which in V3D 4.x is V3D_QPU_WADDR_UNIFA (which isn't a TMU write address). This change passes a devinfo to any functions that need to do these checks so we can account for the target V3D version correctly. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Iago Toral Quiroga	449af48f42	broadcom/compiler: add V3D_QPU_WADDR_UNIFA This only exists in V3D 4.x and aliases V3D_QPU_WADDR_TMU from V3D 3.x. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Arcady Goldmints-Orlov	9909fe6bac	broadcom/compiler: Skip bool_to_cond where possible This change keeps track of when a boolean temp is loaded into the flags by a comparison instruction and uses that information to skip emitting instructions to set the flags in ntq_emit_bool_to_cond when the flags already have the right contents. total instructions in shared programs: 11116502 -> 11112225 (-0.04%) instructions in affected programs: 631691 -> 627414 (-0.68%) helped: 1591 HURT: 754 helped stats (abs) min: 1 max: 94 x̄: 4.14 x̃: 3 helped stats (rel) min: 0.11% max: 13.46% x̄: 2.10% x̃: 1.58% HURT stats (abs) min: 1 max: 19 x̄: 3.07 x̃: 2 HURT stats (rel) min: 0.13% max: 19.67% x̄: 1.88% x̃: 1.15% 95% mean confidence interval for instructions value: -2.02 -1.63 95% mean confidence interval for instructions %-change: -0.94% -0.71% Instructions are helped. total uniforms in shared programs: 3281555 -> 3281513 (<.01%) uniforms in affected programs: 1754 -> 1712 (-2.39%) helped: 10 HURT: 5 helped stats (abs) min: 1 max: 19 x̄: 7.90 x̃: 5 helped stats (rel) min: 0.56% max: 11.11% x̄: 7.37% x̃: 11.05% HURT stats (abs) min: 1 max: 15 x̄: 7.40 x̃: 3 HURT stats (rel) min: 0.64% max: 9.55% x̄: 5.31% x̃: 3.41% 95% mean confidence interval for uniforms value: -8.57 2.97 95% mean confidence interval for uniforms %-change: -7.35% 1.07% Inconclusive result (value mean confidence interval includes 0). total max-temps in shared programs: 1758419 -> 1758174 (-0.01%) max-temps in affected programs: 7006 -> 6761 (-3.50%) helped: 290 HURT: 14 helped stats (abs) min: 1 max: 8 x̄: 1.13 x̃: 1 helped stats (rel) min: 0.79% max: 22.86% x̄: 6.61% x̃: 4.88% HURT stats (abs) min: 1 max: 13 x̄: 6.00 x̃: 3 HURT stats (rel) min: 1.54% max: 54.17% x̄: 23.99% x̃: 9.12% 95% mean confidence interval for max-temps value: -1.03 -0.58 95% mean confidence interval for max-temps %-change: -6.24% -4.16% Max-temps are helped. 
total sfu-stalls in shared programs: 23676 -> 23610 (-0.28%) sfu-stalls in affected programs: 1578 -> 1512 (-4.18%) helped: 257 HURT: 252 helped stats (abs) min: 1 max: 3 x̄: 1.37 x̃: 1 helped stats (rel) min: 11.11% max: 100.00% x̄: 46.70% x̃: 40.00% HURT stats (abs) min: 1 max: 2 x̄: 1.14 x̃: 1 HURT stats (rel) min: 0.00% max: 200.00% x̄: 41.65% x̃: 25.00% 95% mean confidence interval for sfu-stalls value: -0.25 -0.01 95% mean confidence interval for sfu-stalls %-change: -8.24% 2.33% Inconclusive result (%-change mean confidence interval includes 0). total inst-and-stalls in shared programs: 11140178 -> 11135835 (-0.04%) inst-and-stalls in affected programs: 633972 -> 629629 (-0.69%) helped: 1581 HURT: 755 helped stats (abs) min: 1 max: 94 x̄: 4.26 x̃: 3 helped stats (rel) min: 0.11% max: 13.46% x̄: 2.12% x̃: 1.59% HURT stats (abs) min: 1 max: 17 x̄: 3.17 x̃: 2 HURT stats (rel) min: 0.05% max: 19.67% x̄: 1.93% x̃: 1.20% 95% mean confidence interval for inst-and-stalls value: -2.06 -1.66 95% mean confidence interval for inst-and-stalls %-change: -0.93% -0.70% Inst-and-stalls are helped. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8933>	2021-02-12 07:05:33 +00:00
Arcady Goldmints-Orlov	8762f29e9c	broadcom/compiler: Add a v3d_compile argument to vir_set_[pu]f Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8933>	2021-02-12 07:05:33 +00:00
Iago Toral Quiroga	bd0ef080d0	v3d/compiler: fix QPU scheduler TMU sequence shuffling The QPU scheduler allows moving certain TMU instructions around and since we enabled pipelining, we need to protect against the case where doing this might break a TMU sequence. For example, this test: dEQP-VK.rasterization.line_continuity.line-strip Was generating this VIR: mov tmud, t187 mov.pushz null, t176 mov.ifa tmua, t9 nop null; wrtmuc (img[0].p0 | 0x0) mov tmut, t185 mov tmud, t180 mov.ifa tmusf, t183 nop null; thrsw where we have a general TMU access (tmud,tmua) followed by an image access (wrtmuc, tmut, tmud, tmusf), which the QPU scheduler was turning into: nop ; nop ; ldunifrf.rf22 (0xffffff00 / -nan) nop ; nop ; wrtmuc (img[0].p0 | 0x0) nop ; nop ; ldtmu.r2 add r0, r2, 1 ; nop ; ldtmu.r3 nop ; nop ; ldtmu.r4 nop ; mov tmud, r0 nop ; mov.ifa tmua, rf15 nop ; mov tmut, r4 ; thrsw nop ; mov tmud, rf22 nop ; mov.ifa tmusf, r3 where it allowed the wrtmuc to move up and before the general TMU access, leading to an incorrect TMU sequence. Fix this by flagging TMUA writes (which are the sequence terminators for general TMU accesses) as writing new TMU configuration, like we do for all other TMU sequence terminators for textures and images. Fixes: `197090a3fc` ('broadcom/compiler: implement pipelining for general TMU operations') Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8954>	2021-02-10 13:18:25 +00:00
Alejandro Piñeiro	f758b1a25b	v3dv: support for depthBiasClamp Gets tests like the following working: dEQP-VK.dynamic_state.rs_state.depth_bias_clamp Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8928>	2021-02-10 10:29:09 +00:00
Eric Anholt	bcb5f9f94a	v3d: Stop advertising support for flat shading. The GL frontend can lower this weird GL feature away for us. This should fix redeclaration of the gl_Color/SecondaryColor as centroid, since that case had been missed in the !flat special case here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>	2021-02-09 20:06:48 -08:00
Eric Anholt	ff805f8ac7	v3d: Stop advertising support for PIPE_CAP_*_COLOR_CLAMPED. The GL frontend can lower away this deprecated GL feature for us. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>	2021-02-09 20:06:48 -08:00
Eric Anholt	2992dc7386	v3d: Stop advertising support for PIPE_CAP_TWO_SIDED_COLOR. The GL frontend can lower away this deprecated GL feature for us. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>	2021-02-09 20:06:48 -08:00
Eric Anholt	5ddc2f916f	v3d: Clean up vestiges of alpha test lowering. We had an unnecessary case in our uniforms upload switch statement, since we no longer advertise the cap. Fixes: `8ad931808e` ("v3d: do not report alpha-test as supported") Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>	2021-02-09 20:06:48 -08:00
Arcady Goldmints-Orlov	9e1aa23448	v3dv: initialize render_fd at the top of physical_device_init This fixes an uninitialized variable warning. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8902>	2021-02-09 06:45:41 +00:00
Iago Toral Quiroga	8eeb61a3bf	v3dv: add a perf trace when a device is created with robust buffer access Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8913>	2021-02-08 13:00:16 +00:00
Iago Toral Quiroga	e6f8202749	v3dv: serialize pipeline compilation when debugging shaders It is possible to compile pipelines in multiple threads, but when we are dumping debug information for shaders, we want all the outputs serialized so we can make sense of it. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8913>	2021-02-08 13:00:16 +00:00
Iago Toral Quiroga	44dcc4c24d	v3d/common: use spaces instead of TABs Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8913>	2021-02-08 13:00:16 +00:00
Arcady Goldmints-Orlov	0b29a8a206	Revert "broadcom/compiler: improve generation of if conditions" This reverts commit `93f8f83a95`. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8903>	2021-02-08 06:52:59 +00:00
Iago Toral Quiroga	c72d99550c	v3dv: allow a component swizzle in copy_buffer_to_image_shader This is trivial because this path relies on our blit_shader interface which supports this already, so it just needs to pass it along. I don't think this is ever triggered in practice, since we should be able to handle any case that could require this with the texel buffer path, but at least it allows us to simplify the code a bit. Tested by manually disabling the priority paths to ensure we exercise component swizzles with this path. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8875>	2021-02-05 13:31:25 +01:00
Iago Toral Quiroga	4d4a0797ce	v3dv: batch copies in the copy_buffer_to_image_blit path This path is very memory hungry and batching allows us to reduce this by allocating memory just once and reuse it for all regions in the batch instead of allocating once per region. v2: document return value for this function (apinheiro). Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8875>	2021-02-05 13:31:25 +01:00
Iago Toral Quiroga	7aa04ad065	v3dv: handle D/S buffer to image copies with the texel buffer path We do this by converting them to a compatible color copy and using a destination color mask as well as a source component swizzle to handle D24 format semantics according to the V3D hardware requirements, similar to what we do with our blit shader interface. This path is faster than the terrible copy_buffer_to_image_blit, which requires copying the source buffer to a tiled image first and should be avoided as much as possible, since it is slow and can also quickly increase device memory usage. This fixes occasional OOM errors when loading traces in renderdoc. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8875>	2021-02-05 13:31:25 +01:00
Jason Ekstrand	0260b4a7e7	vulkan: Add a common helper for enumerating instance extension properties Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8792>	2021-02-04 20:02:12 +00:00
Iago Toral Quiroga	6630825dcf	broadcom/compiler: let QPUs stall on TMU input/config overflows We have been trying to avoid this by tracking fifo usages in the driver and flushing all outstanding TMU sequences if we overflowed any of these, however, this is actually not the most efficient strategy. Instead, we would like to flush only enough operations to get things going again, which is better for pipelining. Doing that in the driver would require some additional work, but thankfully, it is not required, since this seems to be what the hardware does automatically, so we can just remove overflow tracking for these two fifos and enjoy the benefits. This also further improves shader-db stats: total instructions in shared programs: 8975062 -> 8955145 (-0.22%) instructions in affected programs: 1637624 -> 1617707 (-1.22%) helped: 4050 HURT: 2241 Instructions are helped. total threads in shared programs: 236802 -> 237042 (0.10%) threads in affected programs: 252 -> 492 (95.24%) helped: 122 HURT: 2 Threads are helped. total sfu-stalls in shared programs: 19901 -> 19592 (-1.55%) sfu-stalls in affected programs: 4744 -> 4435 (-6.51%) helped: 1248 HURT: 1051 Sfu-stalls are helped. total inst-and-stalls in shared programs: 8994963 -> 8974737 (-0.22%) inst-and-stalls in affected programs: 1636184 -> 1615958 (-1.24%) helped: 4050 HURT: 2239 Inst-and-stalls are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	d57a358128	broadcom/compiler: log spilling shaders to perf output Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	0f90b729fb	broadcom/compiler: disallow spilling if TMU pipelining was enabled TMU pipelining makes TMU spilling difficult and can easily lead to doing large amounts of spills to compile a shader. It is best to only use pipelining if we can compile without spilling. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	e18d6bbf2f	broadcom/compiler: disable TMU pipelining if we fail to register allocate TMU pipelining can severely reduce our capacity to emit TMU spills, causing us to fail to compile a shader we may otherwise be able to compile. This is because pipelining extends the liveness of TMU sequences by postponing the thread switch and LDTMU until a result is needed, and we can't emit TMU spills while in the middle of a TMU sequence. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	ecd654bf00	broadcom/compiler: support pipelining of image load/store instructions Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	0bdc6dca6c	broadcom/compiler: refactor image load/store TMU emission code This mostly moves code around to group together the code involved with actually emitting a TMU sequence. This will make it a bit easier to then implement pipelining while reusing this code, similar to how we handled other cases of TMU pipelining. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	be45960d3e	broadcom/compiler: support pipelining of tex instructions This follows the same idea as for TMU general instructions of reusing the existing infrastructure to first count required register writes and flush outstanding TMU dependencies, and then emit the actual writes, which requires that we split the code that decides about register writes to a helper. We also need to start using a component mask instead of the number of components that we need to read with a particular TMU operation. v2: update tmu_writes for V3D_QPU_WADDR_TMUOFF Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	197090a3fc	broadcom/compiler: implement pipelining for general TMU operations This creates the basic infrastructure to implement TMU pipelining and applies it to general TMU. Follow-up patches will expand this to texture and image load/store operations. TMU pipelining means that we don't immediately end TMU sequences, and instead, we postpone the thread switch and LDTMU (for loads) or TMUWT (for stores) until we really need to do them. For loads, we may need to flush them if another instruction reads the result of a load operation. We can detect this because in that case ntq_get_src() will not find the definition for that ssa/reg (since we have not emitted the LDTMU instructions for it yet), so when that happens, we flush all pending TMU operations and then try again to find the definition for the source. We also need to flush pending TMU operations when we reach the end of a control flow block, to prevent the case where we emit a TMU operation in a block, but then we read the result in another block possibly under control flow. It is also required to flush across barriers and discards to honor their semantics. Since this change doesn't implement pipelining for texture and image load/store, we also need to flush outstanding TMU operations if we ever have to emit one of these. This will be corrected with follow-up patches. Finally, the TMU has 3 fifos where it can queue TMU operations. These fifos have limited capacity, depending on the number of threads used to compile the shader, so we also need to ensure that we don't have too many outstanding TMU requests and flush pending TMU operations if a new TMU operation would overflow any of these fifos. While overflowing the Input and Config fifos only leads to stalls (which we want to avoid anyway), overflowing the Output fifo is incorrect and would end up with a broken shader. 
This means that we need to know how many TMU register writes are required to emit a TMU operation and use that information to decide if we need to flush pending TMU operations before we emit any register writes for the new TMU operation. v2: fix TMU flushing for NIR registers reads (jasuarez) Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	0e96f0f8cd	broadcom/compiler: prepare TMU spilling code to account for TMU pipelining Follow-up patches will implement support for TMU pipelining in the compiler, which basically means that we will be able to have more than one outstanding TMU operation. Our spilling code currently relies on properly identifying the end of a TMU sequence (since we can't emit a new TMU sequence for a spill in the middle of an existing TMU sequence), however, that code expects that only one TMU sequence may be outstanding, which won't be true once we implement pipelining. This change fixes the 'end of TMU sequence' checks to account for this in preparation for upcoming patches. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	3926030183	broadcom/compiler: fix indentation with TABs Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Arcady Goldmints-Orlov	93f8f83a95	broadcom/compiler: improve generation of if conditions Where it is safe to do so, avoid the generation of code to convert a condition code into a boolean which is then tested to generate a condition code. This is only done in uniform ifs, and only for condition values that are SSA and only used once (in that if statement). shader-db relative to MR 7726: total instructions in shared programs: 8985667 -> 8974151 (-0.13%) instructions in affected programs: 390140 -> 378624 (-2.95%) helped: 810 HURT: 276 helped stats (abs) min: 1 max: 49 x̄: 17.77 x̃: 16 helped stats (rel) min: 0.10% max: 33.63% x̄: 7.97% x̃: 6.45% HURT stats (abs) min: 1 max: 46 x̄: 10.42 x̃: 10 HURT stats (rel) min: 0.16% max: 21.54% x̄: 2.26% x̃: 2.03% 95% mean confidence interval for instructions value: -11.46 -9.75 95% mean confidence interval for instructions %-change: -5.76% -4.97% Instructions are helped. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8709>	2021-02-02 06:55:49 +00:00
Jason Ekstrand	f2545f22f4	vulkan: Drop the type_prefix parameter from gen_extensions Now that all the drivers are converted, it's set to 'vk' by everyone so there's no point in having the parameter. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8676>	2021-02-01 18:54:25 +00:00
Jason Ekstrand	bafd0c680d	vulkan: Rework vk_device_init and friends Now that all drivers are converted over, we can make a few changes. First off, vk_device_init no longer takes two separate allocators because we can assume that the parent instance is non-null and it can pull the instance allocator from that. Second, dispatch tables and the instance extension table are no longer optional. We leave the device extension table optional for now because we don't do any verification at vk_init_physical_device time and some drivers find it more convenient to set the extensions later in their own physical_device_init for various reasons. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8676>	2021-02-01 18:54:25 +00:00
Jason Ekstrand	7fe36c1187	v3dv: Switch to the common VK_EXT_debug_report Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8676>	2021-02-01 18:54:24 +00:00
Jason Ekstrand	9933b188d2	v3dv: Use common entrypoints for VK_EXT_private_data Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8676>	2021-02-01 18:54:24 +00:00

1297 commits