fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-24 08:28:16 +02:00

Author	SHA1	Message	Date
Iago Toral Quiroga	1174f37609	broadcom/compiler: avoid using ldvary sequence to hide latency of branching This can cause us to stomp the contents of r5 before we have a chance to read it, like this: 0x3d103186bb800000 nop ; nop ; ldvary.r0 0x3d105686bbf40000 nop ; mov rf26, r5 ; ldvary.r1 0x020000ef0000d000 bu.allna 232, r:unif (0x0000001c / 0.000000) 0x3d1096c6bbf40000 nop ; mov rf27, r5 ; ldvary.r2 Here, the MOV in the last instruction is supposed to read r5 produced from ldvary.r0, but because we have inserted the bu instruction in between now that read happens at the same time that ldvary.r1 updates r5, stomping the value we were supposed to read. Fix this by disallowing injection of a branch instruction in between an ldvary instruction and its write to the r5 register 2 instructions later. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7062 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19616>	2022-11-09 20:51:25 +00:00
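The constraint described above can be modeled with a small placement check. This is a hypothetical, heavily simplified sketch, not the actual Mesa scheduler code. `struct sketch_inst`, `can_insert_branch_before()` and `LDVARY_R5_DELAY` are illustrative names, and the two-instruction delay is taken from the commit text. The check refuses to place a branch in the slots between an ldvary and its delayed write to r5.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified instruction model: it only records whether an
 * instruction carries an ldvary signal. Real QPU instructions are richer. */
struct sketch_inst {
        bool has_ldvary;
};

/* Assumption from the commit text: ldvary writes r5 two instructions later. */
#define LDVARY_R5_DELAY 2

/* Returns true if a branch can be inserted before insts[pos] without landing
 * between an ldvary and its delayed r5 write. pos == count means "append". */
static bool
can_insert_branch_before(const struct sketch_inst *insts, int count, int pos)
{
        assert(pos >= 0 && pos <= count);
        (void)count; /* only used by the assert */
        for (int d = 1; d <= LDVARY_R5_DELAY; d++) {
                int k = pos - d;
                if (k >= 0 && insts[k].has_ldvary)
                        return false; /* r5 write of insts[k] still pending */
        }
        return true;
}
```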
Iago Toral Quiroga	c7150ad8e6	broadcom/compiler: drop unused v3d_compile parameter for nir pass Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19519>	2022-11-04 09:58:10 +00:00
Iago Toral Quiroga	8cd50ef071	broadcom/compiler: handle vec2 load/store index In Vulkan, we load descriptors via the Vulkan resource index, which returns a vec2, of which we want component 0, which holds the actual index. Typically, this will be cleaned up by the time we get to emitting VIR, so the index is a single scalar component, but there are some cases where this might not be the case, so make sure we don't assume it to be a scalar, like we do in other places. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19313>	2022-10-28 08:23:32 +02:00
Iago Toral Quiroga	24d9a80247	v3dv: implement VK_EXT_pipeline_robustness Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18883>	2022-10-27 08:17:11 +00:00
Iago Toral Quiroga	1a2ca58aed	v3dv: use NIR_PASS with v3d_nir_lower_robust_image_access Reviewed-by: Eric Engestrom <eric@igalia.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18883>	2022-10-27 08:17:11 +00:00
Alejandro Piñeiro	019529aa11	broadcom/compiler: call nir_opt_gcm with a custom strategy nir_opt_gcm gets us worse shader-db stats, but that is expected. However, we want to avoid getting worse spill/fill counts. Analyzing the outcome with shader-db, this mostly happens with shaders that are already complex and already spilling/filling. So the best option here is adding a new strategy that falls back if we get spills/fills using nir_opt_gcm. It is not clear in which order we should disable gcm. For now we disable it before loop unrolling. We get a slight performance gain (on average) using nir_opt_gcm. We don't show the shader-db stats, as they are worse, but as mentioned, this is expected. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>	2022-10-26 12:29:30 +00:00
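A fallback strategy list of this kind can be sketched as follows. The struct, its fields and the callback type are hypothetical and do not reflect the real `v3d_compile` strategy table. `sketch_fake_compile()` is a stand-in that pretends GCM causes spills. Strategies are tried in order, and a result with spills/fills is only accepted when the strategy explicitly allows it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical strategy description; these field names are illustrative. */
struct sketch_strategy {
        const char *name;
        bool use_gcm;       /* run global code motion (nir_opt_gcm-like) */
        bool allow_spills;  /* accept a result that needs spills/fills */
};

/* Hypothetical compile callback: returns spills + fills for a strategy. */
typedef int (*sketch_compile_fn)(const struct sketch_strategy *s, void *ctx);

/* Try strategies in order and return the index of the first acceptable one,
 * or -1 if none is. A GCM strategy that spills therefore falls back to the
 * next entry unless it explicitly allows spilling. */
static int
pick_strategy(const struct sketch_strategy *strategies, size_t count,
              sketch_compile_fn compile, void *ctx)
{
        for (size_t i = 0; i < count; i++) {
                int spills_fills = compile(&strategies[i], ctx);
                assert(spills_fills >= 0);
                if (spills_fills == 0 || strategies[i].allow_spills)
                        return (int)i;
        }
        return -1;
}

/* Stand-in compiler for demonstration: pretend GCM causes 3 spills/fills. */
static int
sketch_fake_compile(const struct sketch_strategy *s, void *ctx)
{
        (void)ctx;
        return s->use_gcm ? 3 : 0;
}
```

Ordering the GCM variant first means it is used whenever it does not hurt register allocation, which matches the goal of keeping the performance gain without regressing spills.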
Alejandro Piñeiro	afc6de356a	broadcom/compiler: pass a strategy struct to vir_compile_init This reduces the number of parameters of the function, which were already being filled from an existing strategy struct anyway. It also makes it easier to add new fields to a strategy. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>	2022-10-26 12:29:30 +00:00
Alejandro Piñeiro	0bf31b0710	broadcom/compiler: add more lowerings/optimizations on v3d_optimize_nir These are optimizations that we already call in the Vulkan driver, added as preparation for the Vulkan frontend to use v3d_optimize_nir too. We need to add a new parameter to v3d_optimize_nir in order to know if we can call nir_opt_find_array_copies. As we don't track whether nir_var_lower_copies has been called, we call it explicitly when we create the uncompiled shader. So instead of tracking, we assume that each driver (v3d/v3dv) calls it when the shader is created, and when v3d_optimize_nir is called as part of compiling the shader in the compiler, we call it with allow_copies set to false. We deliberately exclude nir_opt_gcm, as it is an optimization that could help performance even if it hurts shader-db stats. shaderdb stats: total instructions in shared programs: 11705923 -> 11705034 (<.01%) instructions in affected programs: 88350 -> 87461 (-1.01%) helped: 201 HURT: 80 Instructions are helped. total threads in shared programs: 375552 -> 375558 (<.01%) threads in affected programs: 6 -> 12 (100.00%) helped: 3 HURT: 0 total uniforms in shared programs: 3486108 -> 3485789 (<.01%) uniforms in affected programs: 7473 -> 7154 (-4.27%) helped: 90 HURT: 1 Uniforms are helped. total max-temps in shared programs: 2021860 -> 2021802 (<.01%) max-temps in affected programs: 800 -> 742 (-7.25%) helped: 21 HURT: 3 Max-temps are helped. total sfu-stalls in shared programs: 19299 -> 19296 (-0.02%) sfu-stalls in affected programs: 18 -> 15 (-16.67%) helped: 10 HURT: 7 Inconclusive result (value mean confidence interval includes 0). total inst-and-stalls in shared programs: 11725222 -> 11724330 (<.01%) inst-and-stalls in affected programs: 88402 -> 87510 (-1.01%) helped: 201 HURT: 80 Inst-and-stalls are helped. total nops in shared programs: 269674 -> 269386 (-0.11%) nops in affected programs: 3641 -> 3353 (-7.91%) helped: 103 HURT: 29 Nops are helped. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>	2022-10-26 12:29:30 +00:00
Alejandro Piñeiro	9cbc3ab239	broadcom/compiler: update how we compute return_words_of_texture_data on non-ssa For the non-SSA case, we were trying to use reg->num_components. But this is not the same as nir_ssa_def_components_read: it is the number of components of the destination register. And in the 16-bit case, even if nir_lower_tex packs the outcome, it doesn't update the number of components, as nir_tex_instr_dest_size would still return 4, and NIR validation checks that those values are the same. So this change focuses on the last part of this comment in nir_lower_tex: * Note that we don't change the destination num_components, because * nir_tex_instr_dest_size() will still return 4. The driver is just * expected to not store the other channels, given that nothing at the * NIR level will read them. We just limit how many channels we use for the f16 case. It is also worth noting, based on the CTS and the different applications we test, that this is a corner case. This was detected when we experimented with enabling nir_opt_gcm for v3d, which led to an assertion being raised slightly below with some shader-db tests, but technically it could happen without it. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>	2022-10-26 12:29:30 +00:00
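To make the f16 case concrete, here is a minimal sketch under the assumption that 16-bit texture results are packed two per 32-bit word, while 32-bit results use one word per channel. `sketch_return_words()` is a hypothetical helper, not the Mesa function. It derives the word count from the channels actually read rather than from the destination size, which stays 4.

```c
#include <assert.h>

/* Hypothetical helper: number of 32-bit words needed for the channels the
 * shader reads. Assumes f16 results are packed two per 32-bit word. */
static unsigned
sketch_return_words(unsigned components_read, unsigned return_bits)
{
        assert(components_read >= 1 && components_read <= 4);
        assert(return_bits == 16 || return_bits == 32);
        if (return_bits == 16)
                return (components_read + 1) / 2; /* round up to whole words */
        return components_read;
}
```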
Alejandro Piñeiro	ec10a37a52	broadcom/compiler: don't call nir_opt_load_store_vectorize on all v3d_optimize_nir calls For compute shaders, avoiding a crash with that optimization requires doing some optimizations and lowerings first. Example: static void lower_cs_shared(struct nir_shader *nir) { NIR_PASS_V(nir, nir_lower_vars_to_explicit_types, nir_var_mem_shared, shared_type_info); NIR_PASS_V(nir, nir_lower_explicit_io, nir_var_mem_shared, nir_address_format_32bit_offset); } Similarly, other drivers (like anv) call nir_opt_load_store_vectorize as part of their post-process NIR step. So one option would be to move nir_opt_load_store_vectorize outside the common v3d_nir_optimize into a post-process NIR function. To keep things simpler, this change calls that optimization only if we have a v3d_compiler object, that is, when each frontend has already done its lowerings and calls the v3d_compiler to get the final assembly (so we are already in a kind of post-processing NIR step). This avoids dEQP-VK.memory_model.shared.basic_types.3 crashing if we start calling v3d_optimize_nir from v3dv directly. Slight shader-db changes, but not significant. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>	2022-10-26 12:29:30 +00:00
Iago Toral Quiroga	b6093ffbe7	v3dv: expose VK_EXT_image_robustness Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18820>	2022-09-27 09:08:29 +00:00
Iago Toral Quiroga	c7e022abfd	broadcom/compiler: add a lowering for robust image access Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18820>	2022-09-27 09:08:29 +00:00
Iago Toral Quiroga	adcfd9bc2f	broadcom/compiler: rename static helpers involved with robust buffer access To make it explicit that they involve buffers, since we will be adding robust image access shortly. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18820>	2022-09-27 09:08:29 +00:00
Iago Toral Quiroga	5e5eaa3f1a	broadcom/compiler: rename v3d_nir_lower_robust_buffer_access.c We are going to add code to handle image robustness shortly, so better rename this to v3d_nir_lower_robust_access.c Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18820>	2022-09-27 09:08:29 +00:00
Iago Toral Quiroga	4ea916f704	broadcom/compiler: don't apply robust buffer access to shared variables This feature is only concerned with buffers bound through a descriptor set. We are still keeping the code for this (disabled by default) since it may be useful for debugging some scenarios. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18744>	2022-09-23 06:27:54 +00:00
Iago Toral Quiroga	44b02b5cb1	broadcom/compiler: handle shared stores with robust buffer access For some reason we supported all shared intrinsics but this one. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18744>	2022-09-23 06:27:54 +00:00
Iago Toral Quiroga	b2bce9c98a	broadcom/compiler: fix robust buffer access Our implementation was bogus: it was only capping the offset based on the aligned buffer size, and this doesn't ensure the access to the buffer happens within its valid range. I think the only reason we have been passing the tests is that we align all buffer sizes to 256B and the tests create buffers with a size that is smaller than that (like 64B). When we get the size of the buffer from the shader, we get the actual bound range (so 64B in this case), and by capping to that we don't ensure the access will stay within that range, but we ensure it will stay within the underlying memory bound to the buffer (256B), and this is fine by the spec. However, I think if the actual buffer range was the same as the underlying allocation we would fail the tests. A valid behavior for robust buffer access on an out-of-bounds access is to return any valid bytes within the buffer, so we can just make that offset 0. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18744>	2022-09-23 06:27:54 +00:00
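The "make that offset 0" rule can be illustrated with a scalar sketch. `sketch_robust_offset()` is hypothetical, and taking the access size into account is an assumption of this sketch rather than something the commit spells out. The actual lowering operates on NIR, not on C integers. An in-bounds offset is kept; anything else is replaced by 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: keep an in-bounds offset, otherwise use offset 0.
 * An access of access_size bytes at offset is in bounds when
 * offset + access_size <= range; the check is written to avoid overflow. */
static uint32_t
sketch_robust_offset(uint32_t offset, uint32_t access_size, uint32_t range)
{
        assert(access_size > 0);
        if (access_size > range || offset > range - access_size)
                return 0; /* out of bounds: fall back to offset 0 */
        return offset;
}
```

With a 64B bound range, an offset of 200 becomes 0, even though a cap based on a 256B aligned allocation would have let it through.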
Iago Toral Quiroga	15cdf5bb48	v3dv: optimize ldunif load into unifa write If we emit a ldunif to load the ubo/ssbo base address and then we are immediately moving it to the unifa register we can have the ldunif write directly to unifa and avoid the mov in between, which won't be done by copy propagation because that only works with temp registers. Also, since we can't read from unifa we must be careful to disallow reuse of the ldunif result for a future ldunif of the same base address. We do that by only reusing ldunif results from temp registers. total instructions in shared programs: 12468943 -> 12455139 (-0.11%) instructions in affected programs: 1661233 -> 1647429 (-0.83%) helped: 8307 HURT: 3994 total uniforms in shared programs: 3704532 -> 3704522 (<.01%) uniforms in affected programs: 339 -> 329 (-2.95%) helped: 7 HURT: 0 total max-temps in shared programs: 2148158 -> 2148290 (<.01%) max-temps in affected programs: 9320 -> 9452 (1.42%) helped: 175 HURT: 295 total spills in shared programs: 2202 -> 2202 (0.00%) spills in affected programs: 0 -> 0 helped: 0 HURT: 0 total fills in shared programs: 3059 -> 3057 (-0.07%) fills in affected programs: 27 -> 25 (-7.41%) helped: 1 HURT: 0 total sfu-stalls in shared programs: 21167 -> 21056 (-0.52%) sfu-stalls in affected programs: 497 -> 386 (-22.33%) helped: 209 HURT: 127 total inst-and-stalls in shared programs: 12490110 -> 12476195 (-0.11%) inst-and-stalls in affected programs: 1662875 -> 1648960 (-0.84%) helped: 8312 HURT: 3987 total nops in shared programs: 316563 -> 313553 (-0.95%) nops in affected programs: 24269 -> 21259 (-12.40%) helped: 2158 HURT: 1006 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18667>	2022-09-20 06:56:28 +00:00
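The peephole and the reuse restriction can be sketched on a toy IR. Everything here (`struct sketch_inst`, `fold_ldunif_into_unifa()`, `ldunif_result_reusable()`) is hypothetical and much simpler than VIR. The "no other readers" condition is an assumption of the sketch, added so that removing the temp is safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum sketch_file { SKETCH_TEMP, SKETCH_UNIFA };
enum sketch_op { SKETCH_LDUNIF, SKETCH_MOV };

struct sketch_reg {
        enum sketch_file file;
        int index;              /* temp number; unused for unifa */
};

struct sketch_inst {
        enum sketch_op op;
        struct sketch_reg dst;
        struct sketch_reg src;  /* only meaningful for SKETCH_MOV */
};

static bool
reads_temp(const struct sketch_inst *inst, int temp)
{
        return inst->op == SKETCH_MOV && inst->src.file == SKETCH_TEMP &&
               inst->src.index == temp;
}

static int
count_temp_reads(const struct sketch_inst *insts, int count, int temp)
{
        int n = 0;
        for (int i = 0; i < count; i++)
                n += reads_temp(&insts[i], temp);
        return n;
}

/* Fold "ldunif tN; mov unifa, tN" into an ldunif that writes unifa directly
 * when tN has no other readers. Returns the new instruction count. */
static int
fold_ldunif_into_unifa(struct sketch_inst *insts, int count)
{
        for (int i = 0; i + 1 < count; i++) {
                struct sketch_inst *ld = &insts[i], *mov = &insts[i + 1];
                if (ld->op != SKETCH_LDUNIF || ld->dst.file != SKETCH_TEMP)
                        continue;
                if (mov->op != SKETCH_MOV || mov->dst.file != SKETCH_UNIFA ||
                    !reads_temp(mov, ld->dst.index))
                        continue;
                if (count_temp_reads(insts, count, ld->dst.index) != 1)
                        continue;
                ld->dst = mov->dst;
                /* Drop the mov by shifting the remaining instructions down. */
                memmove(mov, mov + 1, (size_t)(count - i - 2) * sizeof(*mov));
                count--;
        }
        return count;
}

/* unifa cannot be read back, so only an ldunif that wrote a temp may be
 * reused for a later load of the same uniform. */
static bool
ldunif_result_reusable(const struct sketch_inst *ld)
{
        assert(ld->op == SKETCH_LDUNIF);
        return ld->dst.file == SKETCH_TEMP;
}
```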
Iago Toral Quiroga	cbc5169ef9	broadcom/compiler: check signal writes to magic regs when updating scoreboard We have only been checking magic writes from ADD and MUL ports, but signals can potentially write to magic registers too. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18667>	2022-09-20 06:56:28 +00:00
Eric Engestrom	5bfca00d31	broadcom: fix dependencies in static_library() calls The first argument is the name of the library, and the second argument is the list of files; those two got a bit mixed up. Fixes: `1ae8018a6a` ("meson: Add support for the vc4 driver.") Fixes: `4f3e380fa0` ("meson: Add support for the vc5 driver.") Signed-off-by: Eric Engestrom <eric@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18593>	2022-09-14 09:38:28 +00:00
Iago Toral Quiroga	ca33c319e5	v3dv: implement VK_KHR_zero_initialize_workgroup_memory This only requires that we call the relevant lowering pass in NIR. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18312>	2022-08-31 07:33:19 +02:00
Eric Engestrom	e767f54f28	v3d: introduce V3D_DBG() macro to make V3D_DEBUG checks consistent The main issue was the inconsistent use of `unlikely()`, but the macro also simplifies the code a little bit. Signed-off-by: Eric Engestrom <eric@igalia.com> Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18086>	2022-08-24 23:03:57 +00:00
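One possible shape for such a macro is sketched below. `SKETCH_DBG`, the flag names and the global flags variable are hypothetical stand-ins, not Mesa's actual definitions. Because the `unlikely()` hint lives inside the macro, every call site gets it consistently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__GNUC__)
#define sketch_unlikely(x) __builtin_expect(!!(x), 0)
#else
#define sketch_unlikely(x) (!!(x))
#endif

/* Hypothetical debug flags and global flag word. */
#define SKETCH_DEBUG_NIR (1u << 0)
#define SKETCH_DEBUG_QPU (1u << 1)

static uint32_t sketch_debug_flags;

/* SKETCH_DBG(QPU) expands to an unlikely() test of SKETCH_DEBUG_QPU. */
#define SKETCH_DBG(flag) sketch_unlikely(sketch_debug_flags & SKETCH_DEBUG_##flag)
```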
Iago Toral Quiroga	87a9951073	broadcom/compiler: track number of TMU operations in prog data Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17854>	2022-08-15 23:35:16 +00:00
Iago Toral Quiroga	8ecea47f06	broadcom/compiler: simplify code emitted for centroid coordinates Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17909>	2022-08-06 22:34:25 +00:00
Iago Toral Quiroga	20591573f1	broadcom/compiler: use nir_opt_idiv_const total instructions in shared programs: 12463625 -> 12463571 (<.01%) instructions in affected programs: 1758 -> 1704 (-3.07%) helped: 12 HURT: 0 total uniforms in shared programs: 3704589 -> 3704591 (<.01%) uniforms in affected programs: 17 -> 19 (11.76%) helped: 0 HURT: 1 total max-temps in shared programs: 2148088 -> 2148138 (<.01%) max-temps in affected programs: 170 -> 220 (29.41%) helped: 0 HURT: 10 Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17871>	2022-08-05 09:28:22 +00:00
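Division by a constant can be replaced by a multiply-high and a shift. The sketch below uses the well-known magic numbers for unsigned division by 3 and 5 to illustrate the idea. It is not the code nir_opt_idiv_const emits, and `sketch_umul_high()` uses a 64-bit multiply purely for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* High 32 bits of a 32x32-bit unsigned product (64-bit math for brevity). */
static uint32_t
sketch_umul_high(uint32_t a, uint32_t b)
{
        return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* n / 3 == umul_high(n, ceil(2^33 / 3)) >> 1 for every 32-bit n. */
static uint32_t
sketch_udiv3(uint32_t n)
{
        return sketch_umul_high(n, 0xAAAAAAABu) >> 1;
}

/* n / 5 == umul_high(n, ceil(2^34 / 5)) >> 2 for every 32-bit n. */
static uint32_t
sketch_udiv5(uint32_t n)
{
        return sketch_umul_high(n, 0xCCCCCCCDu) >> 2;
}
```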
Iago Toral Quiroga	73e8fc3efb	broadcom/compiler: don't use imprecise_32bit_lowering for idiv lowering This is known to produce bogus results for certain combinations of operands, so don't use it. See this issue for details: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6555 With this change, the idiv lowering will produce mul_high instructions, so we need to instruct the compiler to lower those with the ALU lowering right after the idiv lowering by adding the lower_mul_high option (we only need to add this to V3D, since V3DV already had it set). This will cause injection of uadd_carry instructions, for which we have backend implementations that produce better code for us than the NIR lowering. total instructions in shared programs: 12457692 -> 12463625 (0.05%) instructions in affected programs: 23115 -> 29048 (25.67%) helped: 0 HURT: 111 total threads in shared programs: 416372 -> 416368 (<.01%) threads in affected programs: 8 -> 4 (-50.00%) helped: 0 HURT: 2 total uniforms in shared programs: 3704067 -> 3704589 (0.01%) uniforms in affected programs: 5804 -> 6326 (8.99%) helped: 2 HURT: 109 total max-temps in shared programs: 2147845 -> 2148088 (0.01%) max-temps in affected programs: 2456 -> 2699 (9.89%) helped: 6 HURT: 91 Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17871>	2022-08-05 09:28:22 +00:00
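Without a native 32x32 multiply-high, the high word has to be assembled from narrower products plus carries. The following is a portable-C illustration of that arithmetic. It is not NIR's actual lower_mul_high expansion, and `sketch_umul_high_16x16()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* High 32 bits of a * b using only 16x16->32-bit partial products. */
static uint32_t
sketch_umul_high_16x16(uint32_t a, uint32_t b)
{
        uint32_t a_lo = a & 0xffff, a_hi = a >> 16;
        uint32_t b_lo = b & 0xffff, b_hi = b >> 16;

        uint32_t lo_lo = a_lo * b_lo;
        uint32_t hi_lo = a_hi * b_lo;
        uint32_t lo_hi = a_lo * b_hi;
        uint32_t hi_hi = a_hi * b_hi;

        /* Middle 16-bit column; its upper bits are the carry into the
         * high word (the sum stays well below 2^32). */
        uint32_t mid = (lo_lo >> 16) + (hi_lo & 0xffff) + (lo_hi & 0xffff);

        return hi_hi + (hi_lo >> 16) + (lo_hi >> 16) + (mid >> 16);
}
```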
Alejandro Piñeiro	efc827ceea	v3d/v3dv: use NIR_PASS(_ Instead of NIR_PASS_V, when possible. This was done recently on anv (see commit `ce60195ec` and MR#17014) Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17609>	2022-07-20 11:35:25 +00:00
Alejandro Piñeiro	0a50330c3d	broadcom/compiler: make several passes to return a progress Two advantages: * When using NIR_DEBUG=nir_print_xx, the outcome will be printed only if there is a change * We can use NIR_PASS(_, ...) instead of NIR_PASS_V, which has slightly more validation checks. This includes: * v3d_nir_lower_image_load_store * v3d_nir_lower_io * v3d_nir_lower_line_smooth * v3d_nir_lower_load_store_bitsize * v3d_nir_lower_robust_buffer_access * v3d_nir_lower_scratch * v3d_nir_lower_txf_ms While we are here, we also simplify some of them by using the nir_shader_instructions_pass helper. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17609>	2022-07-20 11:35:25 +00:00
Alejandro Piñeiro	81ca0b4191	broadcom/compiler: removed unused function It is not even implemented. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17609>	2022-07-20 11:35:25 +00:00
Alejandro Piñeiro	d8fee4cdaa	broadcom/compiler: use NIR_PASS for nir_lower_vars_to_ssa at v3d_optimize_nir There's no reason to not take into account progress at that point. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17609>	2022-07-20 11:35:24 +00:00
Alejandro Piñeiro	dea0fe8a06	broadcom/compiler: wrap nir_convert_to_lcssa with NIR_PASS_V So we get it included with the NIR_DEBUG=print_xx debug options. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17609>	2022-07-20 11:35:24 +00:00
Iago Toral Quiroga	90054e9c5d	broadcom/compiler: track if a shader uses global intrinsics Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17275>	2022-07-19 09:47:34 +02:00
Iago Toral Quiroga	fa03d9c8be	broadcom/compiler: implement 2x32 global intrinsics Notice we ignore the high 32-bit component of the address because we know it must be 0. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17275>	2022-07-19 09:47:34 +02:00
Iago Toral Quiroga	871a7536e8	broadcom/compiler: don't over-estimate latency of TMU instructions Over-estimating latency can cause us to delay the critical paths of the shader unnecessarily, producing larger QPU programs that take more time to execute as a result (and it also adds register pressure), so striking a balance is important. The thread switching model in V3D is quite effective at hiding latency and usually we just need to hint it to delay TMU instructions a little bit to find the best compromise for performance. The new latency numbers have been chosen empirically by testing V3DV with Sponza and a few UE4 samples. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17451>	2022-07-11 10:34:58 +00:00
Iago Toral Quiroga	f227aa7c98	broadcom/compiler: don't try to hide TMU latency at QPU scheduling Based on empirical testing with Sponza and a few UE4 samples, this is consistently slightly beneficial for performance. The most likely reason why this helps is that thrsw is probably already quite effective at hiding latency and we are already trying to hide latency at NIR scheduling and also via TMU pipelining, so piling up on this when scheduling QPU typically ends up providing no benefit at all for latency and is instead possibly preventing us from unblocking critical paths in the shader that depend on the TMU result, requiring us to execute more cycles to complete the program. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17451>	2022-07-11 10:34:58 +00:00
Iago Toral Quiroga	152fc4fd28	v3dv: don't lower uadd_carry and usub_borrow We can produce slightly better code for these in the backend, so do that. For this we need to: 1. Fix our implementation of uadd_carry (which wasn't used) to return an integer instead of a boolean value. 2. Add an implementation of usub_borrow. Notice these are only used in Vulkan. In GL these instructions are always unconditionally lowered by the state tracker in GLSL IR so we never get to see them in the backend. Shader-db stats from a collection of Vulkan samples: total instructions in shared programs: 122351 -> 122345 (<.01%) instructions in affected programs: 196 -> 190 (-3.06%) helped: 2 HURT: 0 total uniforms in shared programs: 18670 -> 18672 (0.01%) uniforms in affected programs: 59 -> 61 (3.39%) helped: 0 HURT: 2 total max-temps in shared programs: 13145 -> 13147 (0.02%) max-temps in affected programs: 27 -> 29 (7.41%) helped: 0 HURT: 2 total inst-and-stalls in shared programs: 123052 -> 123046 (<.01%) inst-and-stalls in affected programs: 197 -> 191 (-3.05%) helped: 2 HURT: 0 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17372>	2022-07-07 09:16:24 +00:00
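For reference, here are plain C equivalents of what these two opcodes compute, as we understand NIR's definitions: `uadd_carry` is `src0 + src1 < src0` and `usub_borrow` is `src0 < src1`, both returned as integers (0 or 1) rather than booleans. The function names are hypothetical. `sketch_add64()` is an illustrative usage showing the carry feeding a 64-bit add built from 32-bit halves.

```c
#include <assert.h>
#include <stdint.h>

/* Carry-out of a + b, returned as an integer (0 or 1), not a boolean. */
static uint32_t
sketch_uadd_carry(uint32_t a, uint32_t b)
{
        return (uint32_t)(a + b) < a ? 1u : 0u;
}

/* Borrow-out of a - b, returned as an integer (0 or 1). */
static uint32_t
sketch_usub_borrow(uint32_t a, uint32_t b)
{
        return a < b ? 1u : 0u;
}

/* Usage: 64-bit addition from 32-bit halves, propagating the carry. */
static uint64_t
sketch_add64(uint64_t x, uint64_t y)
{
        uint32_t x_lo = (uint32_t)x, y_lo = (uint32_t)y;
        uint32_t lo = x_lo + y_lo;
        uint32_t hi = (uint32_t)(x >> 32) + (uint32_t)(y >> 32) +
                      sketch_uadd_carry(x_lo, y_lo);
        return ((uint64_t)hi << 32) | lo;
}
```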
Iago Toral Quiroga	cfccd93efc	broadcom/compiler: don't predicate postponed spills The postponed spill is predicated using the condition from the last write, but this is only correct if the register was only written once in the TMU sequence, or if it is always written with the same predication. While we could try to track whether this is the case or not, it would make the postponed spill path even more complex than it already is, so let's just avoid predicating these. We are already discouraging TMU spilling of registers in the middle of TMU sequences, so this should not be a very common case. Cc: mesa-stable Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17201>	2022-06-28 05:49:51 +00:00
Iago Toral Quiroga	98420408d0	broadcom/compiler: fix postponed TMU spills with multiple writes If we are spilling a register that is used in the middle of a TMU sequence, we postpone the spill until the TMU sequence finishes, at which point we inject the spill and rewrite the original instruction to write to the new temp. However, this doesn't work if the register is written multiple times during the TMU sequence. In that scenario, we need to ensure that all writes are rewritten to use the new temp, not just the last one. Cc: mesa-stable Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17201>	2022-06-28 05:49:51 +00:00
Iago Toral Quiroga	a97f78eb14	broadcom/compiler: disable flags optimization for loop conditions This is not safe because it may skip regenerating the flags for the loop condition in the loop continue block and these flags may be stomped in the loop body by other conditionals. Fixes: `9909fe6ba` ('broadcom/compiler: Skip bool_to_cond where possible') Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17020>	2022-06-14 11:30:33 +00:00
Erik Faye-Lund	873ec432b3	broadcom/compiler: use macro for power-of-two check This will allow the use of static_assert here instead of our compiler-specific implementation. Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16670>	2022-06-03 07:14:43 +00:00
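The point of using a macro is that, given constant arguments, it expands to an integer constant expression, which C11 `static_assert` accepts; a call to an inline function is not one. The sketch below is hypothetical. `SKETCH_IS_POT_NONZERO` and `SKETCH_REGISTER_COUNT` are illustrative names, not Mesa's.

```c
#include <assert.h>

/* Hypothetical power-of-two check. Note that it evaluates v more than once,
 * so it should not be passed expressions with side effects. */
#define SKETCH_IS_POT_NONZERO(v) ((v) != 0 && (((v) & ((v) - 1)) == 0))

/* Hypothetical compile-time check on a constant (C11 static_assert). */
#define SKETCH_REGISTER_COUNT 64
static_assert(SKETCH_IS_POT_NONZERO(SKETCH_REGISTER_COUNT),
              "register count must be a power of two");
```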
Iago Toral Quiroga	b90d7b9b38	broadcom/compiler: don't promote early fragment tests when writing sample mask If the sample mask is being written it means we want to discard some of the samples generated so we should not be promoting the fragment shader to do early tests, since that would not take into account the sample mask written from the shader. Fixes: dEQP-VK.fragment_operations.early_fragment.sample_count_early_fragment_tests_depth_samples_4 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16626>	2022-05-20 13:04:32 +00:00
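A promotion predicate of this kind might look like the sketch below. Only the sample-mask condition comes from this commit. The depth-write, discard and memory-write conditions are general assumptions about when early tests change observable behavior, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what a fragment shader does. */
struct sketch_fs_info {
        bool writes_depth;
        bool uses_discard;
        bool writes_memory;       /* stores/atomics visible outside the shader */
        bool writes_sample_mask;  /* the condition added by this commit */
};

/* Early tests are only safe when the shader cannot change depth or coverage
 * after the tests and has no side effects that must still happen for
 * fragments the tests would reject. */
static bool
sketch_can_promote_early_tests(const struct sketch_fs_info *fs)
{
        assert(fs != NULL);
        return !fs->writes_depth && !fs->uses_discard &&
               !fs->writes_memory && !fs->writes_sample_mask;
}
```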
Iago Toral Quiroga	487c213142	v3d/compiler: add more stats to prog_data So we can expose them via VK_KHR_pipeline_executable_properties. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16370>	2022-05-09 12:12:35 +00:00
Emma Anholt	536c8ee96d	nir/lower_tex: Make the adding a 0 LOD to nir_op_tex in the VS optional. This controls the whole lowering of "make tex ops with implicit derivatives on non-implicit-derivative stages be tex ops with an explicit lod of 0 instead", but it's really hard to describe that in a git commit summary. All existing callers get it added except: - nir_to_tgsi which didn't want it. - nouveau, which didn't want it (fixes regressions in shadowcube and shadow2darray with NIR, since the shading languages don't expose txl of those sampler types and thus it's not supported in HW) - optional lowering passes in mesa/st (lower_rect, YUV lowering, etc) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16156>	2022-04-28 21:26:08 +00:00
Iago Toral Quiroga	cf4b3cb563	broadcom/compiler: prefer reconstruction over TMU spills when possible We have been reconstructing/rematerializing uniforms for a while, but we can do this in more scenarios, namely instructions whose result is immutable along the execution of a shader across all channels. By doing this we gain the capacity to eliminate TMU spills, which are not only slower but can also make us drop to a fallback compilation strategy. Shader-db results show a small increase in instruction counts caused by us now being able to choose preferential compiler strategies that are intended to reduce TMU latency. In some cases, we are now also able to avoid dropping thread counts: total instructions in shared programs: 12658092 -> 12659245 (<.01%) instructions in affected programs: 75812 -> 76965 (1.52%) helped: 55 HURT: 107 total threads in shared programs: 416286 -> 416412 (0.03%) threads in affected programs: 126 -> 252 (100.00%) helped: 63 HURT: 0 total uniforms in shared programs: 3716916 -> 3716396 (-0.01%) uniforms in affected programs: 19327 -> 18807 (-2.69%) helped: 94 HURT: 50 total max-temps in shared programs: 2161796 -> 2161578 (-0.01%) max-temps in affected programs: 3961 -> 3743 (-5.50%) helped: 80 HURT: 24 total spills in shared programs: 3274 -> 3266 (-0.24%) spills in affected programs: 98 -> 90 (-8.16%) helped: 6 HURT: 0 total fills in shared programs: 4657 -> 4642 (-0.32%) fills in affected programs: 130 -> 115 (-11.54%) helped: 6 HURT: 0 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15710>	2022-04-08 05:37:28 +00:00
Iago Toral Quiroga	597560e27c	broadcom/compiler: always enable per-quad on spill operations This ensures that any channels used for helper invocations are also spilled/filled correctly. Alternatively, we could recursively track all temps that get involved in computing values that are then used in explicit (dfdx,dfdy) or implicit (texture coordinates for mipmap or anisotropic filtering, etc) derivatives, and only enable per-quad on these (or disable spilling of any of these values). Fixes: dEQP-VK.graphicsfuzz.cov-dfdx-dfdy-after-nested-loops Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15705>	2022-04-01 08:53:50 +00:00
Iago Toral Quiroga	ce849032a4	broadcom/compiler: allow ldunifa with indirect uniform loads We handle uniforms by copying them into the uniform stream to be consumed with ldunif when they have a constant offset. Otherwise we fall back to general TMU access, which has more latency. However, just like we did for UBOs and read-only SSBOs, we can also try to use the unifa mechanism to handle indirect accesses in certain cases instead of the TMU fallback. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15575>	2022-03-28 10:44:13 +00:00
Iago Toral Quiroga	ea3223e7a4	v3dv: implement VK_EXT_inline_uniform_block Inline uniform blocks store their contents in pool memory rather than a separate buffer, and are intended to provide a way in which some platforms may provide more efficient access to the uniform data, similar to push constants but with more flexible size constraints. We implement these in a similar way as push constants: for constant access we copy the data into the uniform stream (using the new QUNIFORM_UNIFORM_UBO_* enums to identify the inline buffer from which we need to copy), and for indirect access we fall back to regular UBO access. Because at NIR level there is no distinction between inline and regular UBOs and the compiler isn't aware of Vulkan descriptor sets, we use the UBO index on UBO load intrinsics to identify inline UBOs, just like we do for push constants. In particular, we reserve indices 1..MAX_INLINE_UNIFORM_BUFFERS for this; however, unlike push constants, inline buffers are accessed through descriptor sets, and therefore we need to make sure they are located in the first slots of the UBO descriptor map. This means we store them in the first MAX_INLINE_UNIFORM_BUFFERS slots of the map, with regular UBOs always coming after these slots. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15575>	2022-03-28 10:44:13 +00:00
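The index scheme can be sketched as a classification function. Several details are assumptions of this sketch, not statements from the commit: that push constants use UBO index 0, the placeholder value 4 for the maximum, and the `index - 1` slot mapping, which follows from inline blocks occupying the first map slots with regular UBOs right after them. All names are hypothetical.

```c
#include <assert.h>

/* Placeholder value for illustration only. */
#define SKETCH_MAX_INLINE_UNIFORM_BUFFERS 4

enum sketch_ubo_kind {
        SKETCH_PUSH_CONSTANTS,  /* assumed to use UBO index 0 */
        SKETCH_INLINE_UBO,      /* indices 1..MAX_INLINE_UNIFORM_BUFFERS */
        SKETCH_REGULAR_UBO,     /* everything after the inline range */
};

static enum sketch_ubo_kind
sketch_classify_ubo_index(unsigned index)
{
        if (index == 0)
                return SKETCH_PUSH_CONSTANTS;
        if (index <= SKETCH_MAX_INLINE_UNIFORM_BUFFERS)
                return SKETCH_INLINE_UBO;
        return SKETCH_REGULAR_UBO;
}

/* Descriptor-map slot for a non-push-constant UBO index: inline blocks take
 * the first slots and regular UBOs follow them. */
static unsigned
sketch_ubo_map_slot(unsigned index)
{
        assert(index > 0);
        return index - 1;
}
```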
Alejandro Piñeiro	e3d905ec39	v3dv/pipeline: use new helper vk_shader_module_to_nir In addition to using the helper, we also remove some of the lowerings we had in preprocess_nir, as they are now called by the helper. While we are here, we also move the call to nir_lower_sysvals_to_varyings, which for some reason we were calling before preprocess_nir. It is worth noting that with this change we lose the ability to debug the NIR just after spirv_to_nir using V3D_DEBUG, as this is now done in vk_spirv_to_nir, which, as mentioned, now includes several lowerings. The workaround is to use NIR_DEBUG. We also needed to change how we check the entrypoint in the broadcom compiler, checking just whether it is an entrypoint instead of assuming that the name will be "main". v2: tweak comment, squash v3dv and compiler change (Iago) Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15449>	2022-03-18 11:05:11 +00:00
Iago Toral Quiroga	49b5431197	broadcom/compiler: remove unused functions Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15302>	2022-03-10 07:25:37 +00:00
Iago Toral Quiroga	44feff93c2	broadcom/compiler: don't always assign r5 if available Instead, only favor assigning r5 if we have first decided to assign an accumulator. This helps with assigning r5 to short-lived uniforms, favoring accumulator rotation to facilitate QPU merges. total instructions in shared programs: 12656164 -> 12628339 (-0.22%) instructions in affected programs: 5368373 -> 5340548 (-0.52%) helped: 17420 HURT: 9996 total uniforms in shared programs: 3704776 -> 3704863 (<.01%) uniforms in affected programs: 12247 -> 12334 (0.71%) helped: 23 HURT: 78 total max-temps in shared programs: 2153505 -> 2152684 (-0.04%) max-temps in affected programs: 26468 -> 25647 (-3.10%) helped: 569 HURT: 328 total fills in shared programs: 4656 -> 4657 (0.02%) fills in affected programs: 43 -> 44 (2.33%) helped: 0 HURT: 1 total sfu-stalls in shared programs: 34728 -> 34403 (-0.94%) sfu-stalls in affected programs: 3411 -> 3086 (-9.53%) helped: 842 HURT: 534 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>	2022-03-09 15:53:04 +00:00


672 commits