fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-16 07:38:14 +02:00

Author	SHA1	Message	Date
Alejandro Piñeiro	9cbc3ab239	broadcom/compiler: update how we compute return_words_of_texture_data on non-ssa For the non-ssa case, we were trying to use reg->num_components. But this is not the same as nir_ssa_def_components_read. It is the number of components of the destination register. And in the 16bit case, even if nir_lower_tex packs the outcome, it doesn't update the number of components, as nir_tex_instr_dest_size would still return 4. And nir validate would check that those values are the same. So this change focuses on the last part of this comment at nir_lower_tex: * Note that we don't change the destination num_components, because * nir_tex_instr_dest_size() will still return 4. The driver is just * expected to not store the other channels, given that nothing at the * NIR level will read them. We just limit how many channels we would use for the f16 case. It is also worth noting, based on the CTS and different applications we test, that this is a corner case. This was detected when we experimented with enabling nir_opt_gcm for v3d, which led to an assertion being raised slightly below with some shaderdb tests, but technically it could happen without it. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>	2022-10-26 12:29:30 +00:00
Alejandro Piñeiro	ec10a37a52	broadcom/compiler: don't call nir_opt_load_store_vectorize on all v3d_optimize_nir calls For compute shaders, to avoid a crash with that optimization, it requires doing some optimizations and lowerings before. Example: static void lower_cs_shared(struct nir_shader *nir) { NIR_PASS_V(nir, nir_lower_vars_to_explicit_types, nir_var_mem_shared, shared_type_info); NIR_PASS_V(nir, nir_lower_explicit_io, nir_var_mem_shared, nir_address_format_32bit_offset); } In the same way other drivers (like anv) call nir_opt_load_store_vectorize as part of their post-process-nir. So one option would be to move nir_opt_load_store_vectorize outside the common v3d_nir_optimize, to a post-process nir method. To make things simpler, this change calls that optimization only if we have a v3d_compiler object, that is when each frontend has already done their lowerings, and call the v3d_compiler to get the final assembly (so we are already on a kind of post processing nir step). This avoids dEQP-VK.memory_model.shared.basic_types.3 crashing if we start to call v3d_optimize_nir on v3dv directly. Slight shaderdb changes, but not significant. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>	2022-10-26 12:29:30 +00:00
Iago Toral Quiroga	b6093ffbe7	v3dv: expose VK_EXT_image_robustness Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18820>	2022-09-27 09:08:29 +00:00
Iago Toral Quiroga	c7e022abfd	broadcom/compiler: add a lowering for robust image access Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18820>	2022-09-27 09:08:29 +00:00
Iago Toral Quiroga	adcfd9bc2f	broadcom/compiler: rename static helpers involved with robust buffer access To make it explicit that they involve buffers, since we will be adding robust image access shortly. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18820>	2022-09-27 09:08:29 +00:00
Iago Toral Quiroga	5e5eaa3f1a	broadcom/compiler: rename v3d_nir_lower_robust_buffer_access.c We are going to add code to handle image robustness shortly, so better rename this to v3d_nir_lower_robust_access.c Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18820>	2022-09-27 09:08:29 +00:00
Iago Toral Quiroga	4ea916f704	broadcom/compiler: don't apply robust buffer access to shared variables This feature is only concerned with buffers bound through a descriptor set. We are still keeping the code for this (disabled by default) since it may be useful for debugging some scenarios. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18744>	2022-09-23 06:27:54 +00:00
Iago Toral Quiroga	44b02b5cb1	broadcom/compiler: handle shared stores with robust buffer access For some reason we supported all shared intrinsics but this one. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18744>	2022-09-23 06:27:54 +00:00
Iago Toral Quiroga	b2bce9c98a	broadcom/compiler: fix robust buffer access Our implementation was bogus, it was only putting a cap on the offset based on the aligned buffer size and this doesn't ensure the access to the buffer happens within its valid range. I think the only reason we have been passing the tests is that we align all buffer sizes to 256B and the tests create buffers with a size that is smaller than that (like 64B). When we get the size of the buffer from the shader, we get the actual bound range (so 64B in this case) and by capping to that we don't ensure the access will stay within that range, but we ensure it will stay within the underlying memory bound to the buffer (256B), and this is fine by the spec, however, I think if the actual buffer range was the same as the underlying allocation we would fail the tests. A valid behavior for robust buffer access on an out-of-bounds access is to return any valid bytes within the buffer, so we can just make that offset 0. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18744>	2022-09-23 06:27:54 +00:00
Iago Toral Quiroga	15cdf5bb48	v3dv: optimize ldunif load into unifa write If we emit a ldunif to load the ubo/ssbo base address and then we are immediately moving it to the unifa register we can have the ldunif write directly to unifa and avoid the mov in between, which won't be done by copy propagation because that only works with temp registers. Also, since we can't read from unifa we must be careful to disallow reuse of the ldunif result for a future ldunif of the same base address. We do that by only reusing ldunif results from temp registers. total instructions in shared programs: 12468943 -> 12455139 (-0.11%) instructions in affected programs: 1661233 -> 1647429 (-0.83%) helped: 8307 HURT: 3994 total uniforms in shared programs: 3704532 -> 3704522 (<.01%) uniforms in affected programs: 339 -> 329 (-2.95%) helped: 7 HURT: 0 total max-temps in shared programs: 2148158 -> 2148290 (<.01%) max-temps in affected programs: 9320 -> 9452 (1.42%) helped: 175 HURT: 295 total spills in shared programs: 2202 -> 2202 (0.00%) spills in affected programs: 0 -> 0 helped: 0 HURT: 0 total fills in shared programs: 3059 -> 3057 (-0.07%) fills in affected programs: 27 -> 25 (-7.41%) helped: 1 HURT: 0 total sfu-stalls in shared programs: 21167 -> 21056 (-0.52%) sfu-stalls in affected programs: 497 -> 386 (-22.33%) helped: 209 HURT: 127 total inst-and-stalls in shared programs: 12490110 -> 12476195 (-0.11%) inst-and-stalls in affected programs: 1662875 -> 1648960 (-0.84%) helped: 8312 HURT: 3987 total nops in shared programs: 316563 -> 313553 (-0.95%) nops in affected programs: 24269 -> 21259 (-12.40%) helped: 2158 HURT: 1006 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18667>	2022-09-20 06:56:28 +00:00
Iago Toral Quiroga	cbc5169ef9	broadcom/compiler: check signal writes to magic regs when updating scoreboard We have only been checking magic writes from ADD and MUL ports, but signals can potentially write to magic registers too. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18667>	2022-09-20 06:56:28 +00:00
Eric Engestrom	5bfca00d31	broadcom: fix dependencies in static_library() calls The first argument is the name of the library, and the second argument is the list of files; those two got a bit mixed up. Fixes: `1ae8018a6a` ("meson: Add support for the vc4 driver.") Fixes: `4f3e380fa0` ("meson: Add support for the vc5 driver.") Signed-off-by: Eric Engestrom <eric@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18593>	2022-09-14 09:38:28 +00:00
Iago Toral Quiroga	ca33c319e5	v3dv: implement VK_KHR_zero_initialize_workgroup_memory This only requires that we call the relevant lowering pass in NIR. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18312>	2022-08-31 07:33:19 +02:00
Eric Engestrom	e767f54f28	v3d: introduce V3D_DBG() macro to make V3D_DEBUG checks consistent The main issue was the inconsistent use of `unlikely()`, but the macro also simplifies the code a little bit. Signed-off-by: Eric Engestrom <eric@igalia.com> Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18086>	2022-08-24 23:03:57 +00:00
Iago Toral Quiroga	87a9951073	broadcom/compiler: track number of TMU operations in prog data Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17854>	2022-08-15 23:35:16 +00:00
Iago Toral Quiroga	8ecea47f06	broadcom/compiler: simplify code emitted for centroid coordinates Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17909>	2022-08-06 22:34:25 +00:00
Iago Toral Quiroga	20591573f1	broadcom/compiler: use nir_opt_idiv_const total instructions in shared programs: 12463625 -> 12463571 (<.01%) instructions in affected programs: 1758 -> 1704 (-3.07%) helped: 12 HURT: 0 total uniforms in shared programs: 3704589 -> 3704591 (<.01%) uniforms in affected programs: 17 -> 19 (11.76%) helped: 0 HURT: 1 total max-temps in shared programs: 2148088 -> 2148138 (<.01%) max-temps in affected programs: 170 -> 220 (29.41%) helped: 0 HURT: 10 Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17871>	2022-08-05 09:28:22 +00:00
Iago Toral Quiroga	73e8fc3efb	broadcom/compiler: don't use imprecise_32bit_lowering for idiv lowering This is known to produce bogus results for certain combinations of operands, so don't use it. See this issue for details: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6555 With this change, the idiv lowering will produce mul_high instructions, so we need to instruct the compiler to lower those with the ALU lowering right after the idiv lowering by adding the lower_mul_high option (we only need to add this to V3D, since V3DV already had it set). This will cause injection of uadd_carry instructions, for which we have backend implementations that produce better code for us than the NIR lowering. total instructions in shared programs: 12457692 -> 12463625 (0.05%) instructions in affected programs: 23115 -> 29048 (25.67%) helped: 0 HURT: 111 total threads in shared programs: 416372 -> 416368 (<.01%) threads in affected programs: 8 -> 4 (-50.00%) helped: 0 HURT: 2 total uniforms in shared programs: 3704067 -> 3704589 (0.01%) uniforms in affected programs: 5804 -> 6326 (8.99%) helped: 2 HURT: 109 total max-temps in shared programs: 2147845 -> 2148088 (0.01%) max-temps in affected programs: 2456 -> 2699 (9.89%) helped: 6 HURT: 91 Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17871>	2022-08-05 09:28:22 +00:00
Alejandro Piñeiro	efc827ceea	v3d/v3dv: use NIR_PASS(_ Instead of NIR_PASS_V, when possible. This was done recently on anv (see commit `ce60195ec` and MR#17014) Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17609>	2022-07-20 11:35:25 +00:00
Alejandro Piñeiro	0a50330c3d	broadcom/compiler: make several passes to return a progress Two advantages: * When using NIR_DEBUG=nir_print_xx, will print outcome only if there is a change * We can use NIR_PASS(_, ...) instead of NIR_PASS_V, that has slightly more validation checks. This includes: * v3d_nir_lower_image_load_store * v3d_nir_lower_io * v3d_nir_lower_line_smooth * v3d_nir_lower_load_store_bitsize * v3d_nir_lower_robust_buffer_access * v3d_nir_lower_scratch * v3d_nir_lower_txf_ms As we are here we also simplify some of them by using the nir_shader_instructions_pass helper. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17609>	2022-07-20 11:35:25 +00:00
Alejandro Piñeiro	81ca0b4191	broadcom/compiler: removed unused function It is not even implemented. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17609>	2022-07-20 11:35:25 +00:00
Alejandro Piñeiro	d8fee4cdaa	broadcom/compiler: use NIR_PASS for nir_lower_vars_to_ssa at v3d_optimize_nir There's no reason to not take into account progress at that point. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17609>	2022-07-20 11:35:24 +00:00
Alejandro Piñeiro	dea0fe8a06	broadcom/compiler: wrap nir_convert_to_lcssa with NIR_PASS_V So we get it included with the NIR_DEBUG=print_xx debug options. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17609>	2022-07-20 11:35:24 +00:00
Iago Toral Quiroga	90054e9c5d	broadcom/compiler: track if a shader uses global intrinsics Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17275>	2022-07-19 09:47:34 +02:00
Iago Toral Quiroga	fa03d9c8be	broadcom/compiler: implement 2x32 global intrinsics Notice we ignore the high 32-bit component of the address because we know it must be 0. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17275>	2022-07-19 09:47:34 +02:00
Iago Toral Quiroga	871a7536e8	broadcom/compiler: don't over-estimate latency of TMU instructions Over-estimating latency can cause us to delay the critical paths of the shader unnecessarily, producing larger QPU programs that take more time to execute as a result (and it also adds register pressure) so striking a balance is important. The thread switching model in V3D is quite effective at hiding latency and usually we just need to hint it to delay TMU instructions a little bit to find the best compromise for performance. The new latency numbers have been chosen empirically by testing V3DV with Sponza and a few UE4 samples. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17451>	2022-07-11 10:34:58 +00:00
Iago Toral Quiroga	f227aa7c98	broadcom/compiler: don't try to hide TMU latency at QPU scheduling Based on empirical testing with Sponza and a few UE4 samples this is consistently slightly beneficial for performance. The most likely reason why this helps is that thrsw is probably already quite effective at hiding latency and we are already trying to hide latency at NIR scheduling and also via TMU pipelining, so piling up on this when scheduling QPU typically ends up providing no benefit at all for latency and is instead possibly preventing us from unblocking critical paths in the shader that depend on the TMU result, requiring us to execute more cycles to complete the program. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17451>	2022-07-11 10:34:58 +00:00
Iago Toral Quiroga	152fc4fd28	v3dv: don't lower uadd_carry and usub_borrow We can produce slightly better code for these in the backend, so do that. For this we need to: 1. Fix our implementation of uadd_carry (which wasn't used) to return an integer instead of a boolean value. 2. Add an implementation of usub_borrow. Notice these are only used in Vulkan. In GL these instructions are always unconditionally lowered by the state tracker in GLSL IR so we never get to see them in the backend. Shader-db stats from a collection of Vulkan samples: total instructions in shared programs: 122351 -> 122345 (<.01%) instructions in affected programs: 196 -> 190 (-3.06%) helped: 2 HURT: 0 total uniforms in shared programs: 18670 -> 18672 (0.01%) uniforms in affected programs: 59 -> 61 (3.39%) helped: 0 HURT: 2 total max-temps in shared programs: 13145 -> 13147 (0.02%) max-temps in affected programs: 27 -> 29 (7.41%) helped: 0 HURT: 2 total inst-and-stalls in shared programs: 123052 -> 123046 (<.01%) inst-and-stalls in affected programs: 197 -> 191 (-3.05%) helped: 2 HURT: 0 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17372>	2022-07-07 09:16:24 +00:00
Iago Toral Quiroga	cfccd93efc	broadcom/compiler: don't predicate postponed spills The postponed spill is predicated using the condition from the last write, but this is only correct if the register was only written once in the TMU sequence, or if it is always written with the same predication. While we could try to track whether this is the case or not, it would make the postponed spill path even more complex than it already is, so let's just avoid predicating these. We are already discouraging TMU spilling of registers in the middle of TMU sequences, so this should not be a very common case. Cc: mesa-stable Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17201>	2022-06-28 05:49:51 +00:00
Iago Toral Quiroga	98420408d0	broadcom/compiler: fix postponed TMU spills with multiple writes If we are spilling a register that is used in the middle of a TMU sequence, we postpone the spill until the TMU sequence finishes, at which point we inject the spill and rewrite the original instruction to write to the new temp. However, this doesn't work if the register is written multiple times during the TMU sequence. In that scenario, we need to ensure that all writes are rewritten to use the new temp, not just the last one. Cc: mesa-stable Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17201>	2022-06-28 05:49:51 +00:00
Iago Toral Quiroga	a97f78eb14	broadcom/compiler: disable flags optimization for loop conditions This is not safe because it may skip regenerating the flags for the loop condition in the loop continue block and these flags may be stomped in the loop body by other conditionals. Fixes: `9909fe6ba` ('broadcom/compiler: Skip bool_to_cond where possible') Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17020>	2022-06-14 11:30:33 +00:00
Erik Faye-Lund	873ec432b3	broadcom/compiler: use macro for power-of-two check This will allow the use of static_assert here instead of our compiler-specific implementation. Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16670>	2022-06-03 07:14:43 +00:00
Iago Toral Quiroga	b90d7b9b38	broadcom/compiler: don't promote early fragment tests when writing sample mask If the sample mask is being written it means we want to discard some of the samples generated so we should not be promoting the fragment shader to do early tests, since that would not take into account the sample mask written from the shader. Fixes: dEQP-VK.fragment_operations.early_fragment.sample_count_early_fragment_tests_depth_samples_4 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16626>	2022-05-20 13:04:32 +00:00
Iago Toral Quiroga	487c213142	v3d/compiler: add more stats to prog_data So we can expose them via VK_KHR_pipeline_executable_properties. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16370>	2022-05-09 12:12:35 +00:00
Emma Anholt	536c8ee96d	nir/lower_tex: Make the adding a 0 LOD to nir_op_tex in the VS optional. This controls the whole lowering of "make tex ops with implicit derivatives on non-implicit-derivative stages be tex ops with an explicit lod of 0 instead", but it's really hard to describe that in a git commit summary. All existing callers get it added except: - nir_to_tgsi which didn't want it. - nouveau, which didn't want it (fixes regressions in shadowcube and shadow2darray with NIR, since the shading languages don't expose txl of those sampler types and thus it's not supported in HW) - optional lowering passes in mesa/st (lower_rect, YUV lowering, etc) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16156>	2022-04-28 21:26:08 +00:00
Iago Toral Quiroga	cf4b3cb563	broadcom/compiler: prefer reconstruction over TMU spills when possible We have been reconstructing/rematerializing uniforms for a while, but we can do this in more scenarios, namely instructions which result is immutable along the execution of a shader across all channels. By doing this we gain the capacity to eliminate TMU spills which not only are slower, but can also make us drop to a fallback compilation strategy. Shader-db results show a small increase in instruction counts caused by us now being able to choose preferential compiler strategies that are intended to reduce TMU latency. In some cases, we are now also able to avoid dropping thread counts: total instructions in shared programs: 12658092 -> 12659245 (<.01%) instructions in affected programs: 75812 -> 76965 (1.52%) helped: 55 HURT: 107 total threads in shared programs: 416286 -> 416412 (0.03%) threads in affected programs: 126 -> 252 (100.00%) helped: 63 HURT: 0 total uniforms in shared programs: 3716916 -> 3716396 (-0.01%) uniforms in affected programs: 19327 -> 18807 (-2.69%) helped: 94 HURT: 50 total max-temps in shared programs: 2161796 -> 2161578 (-0.01%) max-temps in affected programs: 3961 -> 3743 (-5.50%) helped: 80 HURT: 24 total spills in shared programs: 3274 -> 3266 (-0.24%) spills in affected programs: 98 -> 90 (-8.16%) helped: 6 HURT: 0 total fills in shared programs: 4657 -> 4642 (-0.32%) fills in affected programs: 130 -> 115 (-11.54%) helped: 6 HURT: 0 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15710>	2022-04-08 05:37:28 +00:00
Iago Toral Quiroga	597560e27c	broadcom/compiler: always enable per-quad on spill operations This ensures that any channels used for helper invocations are also spilled/filled correctly. Alternatively, we could recursively track all temps that get involved in computing values that are then used in explicit (dfdx,dfdy) or implicit (texture coordinates for mipmap or anisotropic filtering, etc) derivatives, and only enable per-quad on these (or disable spilling of any of these values). Fixes: dEQP-VK.graphicsfuzz.cov-dfdx-dfdy-after-nested-loops Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15705>	2022-04-01 08:53:50 +00:00
Iago Toral Quiroga	ce849032a4	broadcom/compiler: allow ldunifa with indirect uniform loads We handle uniforms by copying them into the uniform stream to be consumed with ldunif when they have a constant offset. Otherwise we fallback to general TMU access, which has more latency. However, just like we did for UBOs and read-only SSBOs, we can also try to use the unifa mechanism to handle indirect accesses in certain cases instead of the TMU fallback. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15575>	2022-03-28 10:44:13 +00:00
Iago Toral Quiroga	ea3223e7a4	v3dv: implement VK_EXT_inline_uniform_block Inline uniform blocks store their contents in pool memory rather than a separate buffer, and are intended to provide a way in which some platforms may provide more efficient access to the uniform data, similar to push constants but with more flexible size constraints. We implement these in a similar way as push constants: for constant access we copy the data in the uniform stream (using the new QUNIFORM_UNIFORM_UBO_*) enums to identify the inline buffer from which we need to copy and for indirect access we fallback to regular UBO access. Because at NIR level there is no distinction between inline and regular UBOs and the compiler isn't aware of Vulkan descriptor sets, we use the UBO index on UBO load intrinsics to identify inline UBOs, just like we do for push constants. Particularly, we reserve indices 1..MAX_INLINE_UNIFORM_BUFFERS for this, however, unlike push constants, inline buffers are accessed through descriptor sets, and therefore we need to make sure they are located in the first slots of the UBO descriptor map. This means we store them in the first MAX_INLINE_UNIFORM_BUFFERS slots of the map, with regular UBOs always coming after these slots. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15575>	2022-03-28 10:44:13 +00:00
Alejandro Piñeiro	e3d905ec39	v3dv/pipeline: use new helper vk_shader_module_to_nir In addition to use the helper, we also remove some of the lowering we had at preprocess_nir, as they are called now by the helper. As we are here we also move the call to nir_lower_sysvals_to_varyings, that for some reason we were calling it before preprocess_nir. It is worth to note that with this change we lose the ability to debug the NIR just after spirv_to_nir using V3D_DEBUG, as now this is done on vk_spirv_to_nir, and as mentioned that includes several lowerings now. The workaround to that is to use NIR_DEBUG. We also needed to change how to check the entrypoint on the broadcom compiler, checking just if it is an entrypoint, instead of assuming that the name will be "main". v2: tweak comment, squash v3dv and compiler change (Iago) Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15449>	2022-03-18 11:05:11 +00:00
Iago Toral Quiroga	49b5431197	broadcom/compiler: remove unused functions Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15302>	2022-03-10 07:25:37 +00:00
Iago Toral Quiroga	44feff93c2	broadcom/compiler: don't always assign r5 if available Instead, only favor assigning r5 if we have first decided to assign an accumulator. This helps with assigning r5 to short lived uniforms, favoring accumulator rotation to facilitate QPU merges. total instructions in shared programs: 12656164 -> 12628339 (-0.22%) instructions in affected programs: 5368373 -> 5340548 (-0.52%) helped: 17420 HURT: 9996 total uniforms in shared programs: 3704776 -> 3704863 (<.01%) uniforms in affected programs: 12247 -> 12334 (0.71%) helped: 23 HURT: 78 total max-temps in shared programs: 2153505 -> 2152684 (-0.04%) max-temps in affected programs: 26468 -> 25647 (-3.10%) helped: 569 HURT: 328 total fills in shared programs: 4656 -> 4657 (0.02%) fills in affected programs: 43 -> 44 (2.33%) helped: 0 HURT: 1 total sfu-stalls in shared programs: 34728 -> 34403 (-0.94%) sfu-stalls in affected programs: 3411 -> 3086 (-9.53%) helped: 842 HURT: 534 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>	2022-03-09 15:53:04 +00:00
Iago Toral Quiroga	77f58b46d9	broadcom/compiler: add comment on why we don't use r5 with ldunifa Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>	2022-03-09 15:53:04 +00:00
Iago Toral Quiroga	5b140428b0	broadcom/compiler: adjust register threshold for 2-thread compiles We have twice the registers in this case so it makes sense to double this as well. While this causes slight regressions in shader-db stats (due to additional register pressure), it helps us hide latency of memory reads better on 2-thread compiles, where the thread switch mechanism will be less effective. This shows a ~3% performance improvement on the UE4 SunTemple demo. total instructions in shared programs: 12642413 -> 12656164 (0.11%) instructions in affected programs: 2272652 -> 2286403 (0.61%) helped: 2924 HURT: 3389 total uniforms in shared programs: 3703861 -> 3704776 (0.02%) uniforms in affected programs: 213729 -> 214644 (0.43%) helped: 823 HURT: 1272 total max-temps in shared programs: 2150686 -> 2153505 (0.13%) max-temps in affected programs: 191332 -> 194151 (1.47%) helped: 1900 HURT: 1891 total spills in shared programs: 3255 -> 3274 (0.58%) spills in affected programs: 166 -> 185 (11.45%) helped: 3 HURT: 6 total fills in shared programs: 4630 -> 4656 (0.56%) fills in affected programs: 367 -> 393 (7.08%) helped: 7 HURT: 15 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>	2022-03-09 15:53:04 +00:00
Iago Toral Quiroga	a35b47a0b1	broadcom/compiler: add a strategy to disable scheduling of general TMU reads This can add quite a bit of register pressure so it makes sense to disable it to prevent us from dropping to 2 threads or increasing spills: total instructions in shared programs: 12672813 -> 12642413 (-0.24%) instructions in affected programs: 256721 -> 226321 (-11.84%) helped: 719 HURT: 77 total threads in shared programs: 415534 -> 416322 (0.19%) threads in affected programs: 788 -> 1576 (100.00%) helped: 394 HURT: 0 total uniforms in shared programs: 3711370 -> 3703861 (-0.20%) uniforms in affected programs: 28859 -> 21350 (-26.02%) helped: 204 HURT: 455 total max-temps in shared programs: 2159439 -> 2150686 (-0.41%) max-temps in affected programs: 32945 -> 24192 (-26.57%) helped: 585 HURT: 47 total spills in shared programs: 5966 -> 3255 (-45.44%) spills in affected programs: 2933 -> 222 (-92.43%) helped: 192 HURT: 4 total fills in shared programs: 9328 -> 4630 (-50.36%) fills in affected programs: 5184 -> 486 (-90.62%) helped: 196 HURT: 0 Compared to the stats before adding scheduling of non-filtered memory reads we see that we have now gotten back all that was lost and then some: total instructions in shared programs: 12663186 -> 12642413 (-0.16%) instructions in affected programs: 2051803 -> 2031030 (-1.01%) helped: 4885 HURT: 3338 total threads in shared programs: 415870 -> 416322 (0.11%) threads in affected programs: 896 -> 1348 (50.45%) helped: 300 HURT: 74 total uniforms in shared programs: 3711629 -> 3703861 (-0.21%) uniforms in affected programs: 158766 -> 150998 (-4.89%) helped: 1973 HURT: 499 total max-temps in shared programs: 2138857 -> 2150686 (0.55%) max-temps in affected programs: 177920 -> 189749 (6.65%) helped: 2666 HURT: 2035 total spills in shared programs: 3860 -> 3255 (-15.67%) spills in affected programs: 2653 -> 2048 (-22.80%) helped: 77 HURT: 21 total fills in shared programs: 5573 -> 4630 (-16.92%) fills in affected programs: 3839 -> 2896 (-24.56%) helped: 81 HURT: 15 total sfu-stalls in shared programs: 39583 -> 38154 (-3.61%) sfu-stalls in affected programs: 8993 -> 7564 (-15.89%) helped: 1808 HURT: 1038 total nops in shared programs: 324894 -> 323685 (-0.37%) nops in affected programs: 30362 -> 29153 (-3.98%) helped: 2513 HURT: 2077 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>	2022-03-09 15:53:04 +00:00
Iago Toral Quiroga	f783bd0d2a	broadcom/compiler: define v3d-specific delays for NIR instructions We do a few changes over NIR's defaults: 1. Lower delay for texture reads. Empirically, we don't observe any benefits with delays over 50 and since this delay value is still used by the scheduler in the "favor register pressure" case it is beneficial to avoid overestimating it too much. 2. Adjust delay for non-filtered TMU reads to the delay selected for texture reads. 3. In our case, UBO reads from dynamically uniform addresses don't use the TMU and have a latency of 1 instruction in the best case scenario or 4 at worst, so we go with 1 so we don't try to move this early. This helps us get back some of what we lost when updating the default scheduler configuration to add a delay for non-filtered memory reads: total instructions in shared programs: 13126587 -> 12671765 (-3.46%) instructions in affected programs: 3764097 -> 3309275 (-12.08%) helped: 14664 HURT: 4244 total threads in shared programs: 407208 -> 415522 (2.04%) threads in affected programs: 8716 -> 17030 (95.39%) helped: 4224 HURT: 67 total uniforms in shared programs: 3812698 -> 3711224 (-2.66%) uniforms in affected programs: 335170 -> 233696 (-30.28%) helped: 2816 HURT: 3551 total max-temps in shared programs: 2318430 -> 2159345 (-6.86%) max-temps in affected programs: 539991 -> 380906 (-29.46%) helped: 13173 HURT: 1440 total spills in shared programs: 49086 -> 5966 (-87.85%) spills in affected programs: 48306 -> 5186 (-89.26%) helped: 1655 HURT: 28 total fills in shared programs: 55810 -> 9328 (-83.29%) fills in affected programs: 54821 -> 8339 (-84.79%) helped: 1659 HURT: 22 LOST: 0 GAINED: 3 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>	2022-03-09 15:53:04 +00:00
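The three delay adjustments listed in f783bd0d2a can be sketched as a small lookup. This is only an illustration of the policy described in the message, not the driver's code: the enum, the function name, and the "return 0 to defer to the generic default" convention are hypothetical, and it assumes the texture delay was set to exactly 50.

```c
#include <assert.h>

/* Hypothetical classification of reads; the real driver inspects NIR
 * instructions instead of using an enum like this one. */
enum sketch_read_kind {
        SKETCH_READ_TEXTURE,          /* filtered texture read */
        SKETCH_READ_TMU_GENERAL,      /* non-filtered TMU read */
        SKETCH_READ_UBO_UNIFORM_ADDR, /* UBO read, dynamically uniform address */
        SKETCH_READ_OTHER,
};

/* Assumed values: the message reports no benefit above 50 for textures
 * and picks 1 for uniform-address UBO reads. */
#define SKETCH_TMU_DELAY 50
#define SKETCH_UBO_UNIFORM_DELAY 1

static_assert(SKETCH_UBO_UNIFORM_DELAY < SKETCH_TMU_DELAY,
              "uniform UBO reads should look cheaper than TMU reads");

/* Returns the delay to report to the scheduler, or 0 to mean "use the
 * scheduler's generic default" (a convention made up for this sketch). */
static unsigned
sketch_instr_delay(enum sketch_read_kind kind)
{
        switch (kind) {
        case SKETCH_READ_TEXTURE:
        case SKETCH_READ_TMU_GENERAL:
                /* Points 1 and 2: same delay for both kinds of TMU read. */
                return SKETCH_TMU_DELAY;
        case SKETCH_READ_UBO_UNIFORM_ADDR:
                /* Point 3: no TMU involved, so don't hoist these early. */
                return SKETCH_UBO_UNIFORM_DELAY;
        default:
                return 0;
        }
}
```

Keeping the uniform-address UBO delay at 1 means the scheduler has no reason to move those loads far from their uses, which is how the message explains the recovered register pressure.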
Iago Toral Quiroga	e7a4e97076	nir/schedule: use larger delay for non-filtered memory reads This has been pending for a long time. It is not very consistent to add a significant delay for textures and not do it for UBOs, etc. The reason we have not been doing this so far is the accumulated effect on register pressure for V3D as shown by shader-db results below, but from the point of view of a generic scheduler it makes sense to do this. Later patches will address V3D specific issues with register pressure derived from this by letting the driver control its instruction delay settings. total instructions in shared programs: 12662138 -> 13126587 (3.67%) instructions in affected programs: 1813091 -> 2277540 (25.62%) helped: 2410 HURT: 10499 total threads in shared programs: 415858 -> 407208 (-2.08%) threads in affected programs: 17348 -> 8698 (-49.86%) helped: 8 HURT: 4333 total uniforms in shared programs: 3711483 -> 3812698 (2.73%) uniforms in affected programs: 128012 -> 229227 (79.07%) helped: 3474 HURT: 2143 total max-temps in shared programs: 2138763 -> 2318430 (8.40%) max-temps in affected programs: 318780 -> 498447 (56.36%) helped: 588 HURT: 11997 total spills in shared programs: 3860 -> 49086 (1171.66%) spills in affected programs: 709 -> 45935 (6378.84%) helped: 23 HURT: 1595 total fills in shared programs: 5573 -> 55810 (901.44%) fills in affected programs: 1067 -> 51304 (4708.25%) helped: 23 HURT: 1595 LOST: 3 GAINED: 0 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>	2022-03-09 15:53:04 +00:00
Iago Toral Quiroga	9ef499b315	broadcom/compiler: stop moving UBO loads before NIR scheduling This doesn't have any significant impact on shader-db stats and would reduce our capacity to hide latency from the loads, so it is probably undesirable: total instructions in shared programs: 12663189 -> 12663186 (<.01%) instructions in affected programs: 4222 -> 4219 (-0.07%) helped: 9 HURT: 4 total uniforms in shared programs: 3711624 -> 3711629 (<.01%) uniforms in affected programs: 186 -> 191 (2.69%) helped: 0 HURT: 2 total max-temps in shared programs: 2138822 -> 2138857 (<.01%) max-temps in affected programs: 569 -> 604 (6.15%) helped: 1 HURT: 9 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>	2022-03-09 15:53:03 +00:00
Iago Toral Quiroga	f761f8fd9e	broadcom/compiler: simplify node/temp translation during register allocation Now that we don't sort our nodes we can arrange them so we can easily translate between nodes and temps without a mapping table, just applying an offset. To do this we have a single array of nodes where we put first the nodes for accumulators and then the nodes for temps. With this setup we can ensure that for any given temp T, its node is always T + ACC_COUNT. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>	2022-03-02 08:09:11 +00:00
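The offset scheme described in f761f8fd9e (accumulator nodes first, then one node per temp) can be sketched as below. The helper names are hypothetical and the value of ACC_COUNT is an assumption chosen for illustration; only the relation "node = T + ACC_COUNT" comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value for illustration; the driver defines its own count. */
#define ACC_COUNT 6

/* Node layout: [0, ACC_COUNT) are accumulators, temps follow. */
static inline uint32_t
sketch_temp_to_node(uint32_t temp)
{
        return temp + ACC_COUNT;
}

static inline uint32_t
sketch_node_to_temp(uint32_t node)
{
        /* Accumulator nodes have no corresponding temp. */
        assert(node >= ACC_COUNT);
        return node - ACC_COUNT;
}

static inline bool
sketch_node_is_acc(uint32_t node)
{
        return node < ACC_COUNT;
}
```

Because the mapping is a fixed offset, it stays valid when new temps are appended (for example after a spill), which a sorted node array with a lookup table would not guarantee.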
Iago Toral Quiroga	871b0a7f6a	broadcom/compiler: don't sort nodes for register allocation Nodes are allocated in order to registers so initially sorting was used to ensure that nodes with smaller life ranges would be assigned first and therefore be more likely to get accumulators. However, since `d81a6e5f1d` now we don't rely on order to make decisions about accumulators and instead we make policy decisions based on actual liveness, so sorting is no longer strictly relevant to this decision. Furthermore, we are not re-sorting nodes after each spill either, since that would probably require that we rebuild the interference graph after each spill (the graph identifies nodes by their index). Shader-db results show a significant improvement in instruction counts, due to more optimal accumulator assignments. The reason for this is that we use a round-robin policy for choosing the next accumulator to assign. The idea behind this is preventing nearby temps from being assigned to the same accumulator so that QPU scheduling is more flexible, but if we sort our nodes, we are basically not assigning temps in program order any more and the round-robin policy becomes less effective: total instructions in shared programs: 13000420 -> 12663189 (-2.59%) instructions in affected programs: 11791267 -> 11454036 (-2.86%) helped: 62890 HURT: 19987 total threads in shared programs: 415874 -> 415870 (<.01%) threads in affected programs: 20 -> 16 (-20.00%) helped: 2 HURT: 4 total uniforms in shared programs: 3711652 -> 3711624 (<.01%) uniforms in affected programs: 43430 -> 43402 (-0.06%) helped: 134 HURT: 173 total max-temps in shared programs: 2144876 -> 2138822 (-0.28%) max-temps in affected programs: 123334 -> 117280 (-4.91%) helped: 4112 HURT: 1195 total spills in shared programs: 3870 -> 3860 (-0.26%) spills in affected programs: 1013 -> 1003 (-0.99%) helped: 14 HURT: 12 total fills in shared programs: 5560 -> 5573 (0.23%) fills in affected programs: 1765 -> 1778 (0.74%) helped: 14 HURT: 
17 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>	2022-03-02 08:09:11 +00:00
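The round-robin accumulator policy mentioned in 871b0a7f6a can be illustrated with a minimal sketch. The struct, function name, and ACC_COUNT value are hypothetical, and the availability array stands in for whatever liveness and interference checks the real allocator performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value for illustration; the driver defines its own count. */
#define ACC_COUNT 6

/* Hypothetical state: remembers where the next search should start. */
struct sketch_acc_picker {
        uint32_t next_acc;
};

/* Picks the first available accumulator at or after next_acc, wrapping
 * around, and advances next_acc past it. Returns -1 if none is free. */
static int
sketch_pick_accumulator(struct sketch_acc_picker *p,
                        const bool available[ACC_COUNT])
{
        assert(p != NULL);
        for (uint32_t i = 0; i < ACC_COUNT; i++) {
                uint32_t acc = (p->next_acc + i) % ACC_COUNT;
                if (available[acc]) {
                        p->next_acc = (acc + 1) % ACC_COUNT;
                        return (int)acc;
                }
        }
        return -1;
}
```

When temps are visited in program order, consecutive calls tend to hand neighbouring temps different accumulators, which is the flexibility the message attributes to the policy; visiting nodes in sorted order breaks that correspondence.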


664 commits