fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-23 12:58:09 +02:00

Author	SHA1	Message	Date
Iago Toral Quiroga	994ad351f7	broadcom/compiler: increase peephole limit to 24 instructions This helps by reducing the number of branches with their corresponding delay slots, at the expense of additional register pressure. It also helps a lot with SFU stalls, probably because removing control-flow blocks gives us more QPU scheduling flexibility to hide them. Shader-db results below correspond to the "closed shaders" set, since the full set is very dominated by the massive impact this change has on Skia's shaders (for the better), so this is probably more representative of real impact: total instructions in shared programs: 11887255 -> 11854898 (-0.27%) instructions in affected programs: 538170 -> 505813 (-6.01%) helped: 1653 HURT: 43 Instructions are helped. total threads in shared programs: 385924 -> 385872 (-0.01%) threads in affected programs: 236 -> 184 (-22.03%) helped: 22 HURT: 48 Inconclusive result (%-change mean confidence interval includes 0). total uniforms in shared programs: 3552808 -> 3547894 (-0.14%) uniforms in affected programs: 157486 -> 152572 (-3.12%) helped: 1673 HURT: 35 Uniforms are helped. total max-temps in shared programs: 2062403 -> 2064720 (0.11%) max-temps in affected programs: 18209 -> 20526 (12.72%) helped: 168 HURT: 369 Max-temps are HURT. total spills in shared programs: 1937 -> 1994 (2.94%) spills in affected programs: 79 -> 136 (72.15%) helped: 0 HURT: 1 total fills in shared programs: 2652 -> 2717 (2.45%) fills in affected programs: 115 -> 180 (56.52%) helped: 0 HURT: 1 total sfu-stalls in shared programs: 19349 -> 18010 (-6.92%) sfu-stalls in affected programs: 2321 -> 982 (-57.69%) helped: 674 HURT: 74 Sfu-stalls are helped. total inst-and-stalls in shared programs: 11906604 -> 11872908 (-0.28%) inst-and-stalls in affected programs: 541339 -> 507643 (-6.22%) helped: 1656 HURT: 43 Inst-and-stalls are helped. 
total nops in shared programs: 245740 -> 238085 (-3.12%) nops in affected programs: 19282 -> 11627 (-39.70%) helped: 1335 HURT: 76 Nops are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22922>	2023-05-10 11:11:38 +00:00
Iago Toral Quiroga	c950098abb	broadcom/compiler: move buffer loads to lower register pressure If we are trying to lower register pressure this can make a big difference in some cases. To avoid adding even more strategies, merge this with disabling ubo load sorting, since they are basically trying to do the same. total instructions in shared programs: 12848024 -> 12844510 (-0.03%) instructions in affected programs: 236537 -> 233023 (-1.49%) helped: 195 HURT: 87 Instructions are helped. total uniforms in shared programs: 3815601 -> 3814932 (-0.02%) uniforms in affected programs: 31773 -> 31104 (-2.11%) helped: 67 HURT: 115 Inconclusive result (value mean confidence interval includes 0). total max-temps in shared programs: 2210803 -> 2210622 (<.01%) max-temps in affected programs: 9362 -> 9181 (-1.93%) helped: 114 HURT: 34 Max-temps are helped. total spills in shared programs: 2556 -> 2330 (-8.84%) spills in affected programs: 1391 -> 1165 (-16.25%) helped: 39 HURT: 9 total fills in shared programs: 3840 -> 3317 (-13.62%) fills in affected programs: 2379 -> 1856 (-21.98%) helped: 39 HURT: 23 total sfu-stalls in shared programs: 21965 -> 21978 (0.06%) sfu-stalls in affected programs: 2618 -> 2631 (0.50%) helped: 45 HURT: 81 Inconclusive result (value mean confidence interval includes 0). total inst-and-stalls in shared programs: 12869989 -> 12866488 (-0.03%) inst-and-stalls in affected programs: 238771 -> 235270 (-1.47%) helped: 193 HURT: 87 Inst-and-stalls are helped. total nops in shared programs: 303501 -> 303274 (-0.07%) nops in affected programs: 4159 -> 3932 (-5.46%) helped: 87 HURT: 105 Inconclusive result (value mean confidence interval includes 0). Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22824>	2023-05-03 13:01:58 +00:00
Iago Toral Quiroga	9f522ac0c6	broadcom/compiler: don't allocate undef to rf0 rf0 is affected by restrictions in some scenarios, so we would rather use a register that does not cause conflicts for scheduling. total instructions in shared programs: 12850958 -> 12848024 (-0.02%) instructions in affected programs: 331974 -> 329040 (-0.88%) helped: 2559 HURT: 201 Instructions are helped. total max-temps in shared programs: 2210893 -> 2210803 (<.01%) max-temps in affected programs: 1486 -> 1396 (-6.06%) helped: 96 HURT: 7 Max-temps are helped. total sfu-stalls in shared programs: 21975 -> 21965 (-0.05%) sfu-stalls in affected programs: 32 -> 22 (-31.25%) helped: 16 HURT: 6 Sfu-stalls are helped. total inst-and-stalls in shared programs: 12872933 -> 12869989 (-0.02%) inst-and-stalls in affected programs: 332036 -> 329092 (-0.89%) helped: 2560 HURT: 189 Inst-and-stalls are helped. total nops in shared programs: 305911 -> 303501 (-0.79%) nops in affected programs: 11215 -> 8805 (-21.49%) helped: 2131 HURT: 3 Nops are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22797>	2023-05-03 05:39:35 +00:00
Iago Toral Quiroga	0468ce3791	broadcom/compiler: try harder to merge thread switch earlier We have been stopping as soon as we find a conflict but that doesn't mean we can't merge it in an earlier slot, so keep going. Going by shader-db, this sometimes allows us to merge the final thrsw a bit earlier and avoid emitting NOP instructions at the program end to make up for its delay slots. I have not observed cases where this helps with regular thrsw though, but it doesn't hurt to try with those too. total instructions in shared programs: 11526876 -> 11526354 (<.01%) instructions in affected programs: 10760 -> 10238 (-4.85%) helped: 236 HURT: 0 Instructions are helped. total max-temps in shared programs: 2231705 -> 2231677 (<.01%) max-temps in affected programs: 276 -> 248 (-10.14%) helped: 27 HURT: 0 Max-temps are helped. total inst-and-stalls in shared programs: 11545177 -> 11544655 (<.01%) inst-and-stalls in affected programs: 10777 -> 10255 (-4.84%) helped: 236 HURT: 0 Inst-and-stalls are helped. total nops in shared programs: 321624 -> 321152 (-0.15%) nops in affected programs: 751 -> 279 (-62.85%) helped: 236 HURT: 0 Nops are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22679>	2023-04-27 08:43:29 +00:00
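A minimal sketch of the search change described in 0468ce3791, assuming a hypothetical per-slot `conflicts` array that records whether the thread switch could be merged into each candidate slot. The real QPU scheduler checks many more constraints than a single flag; this only contrasts the two loop shapes. The old loop stops at the first conflict, while the new one keeps scanning and remembers the earliest slot that works.

```c
#include <assert.h>
#include <stdbool.h>

/* Old behavior (sketch): walk backward from `last` and give up at the
 * first slot that conflicts. Returns the earliest accepted slot, or -1. */
static int
merge_slot_stop_at_conflict(const bool *conflicts, int last, int first)
{
        assert(first >= 0);
        int best = -1;
        for (int i = last; i >= first; i--) {
                if (conflicts[i])
                        break;
                best = i;
        }
        return best;
}

/* New behavior (sketch): a conflict in one slot does not rule out an
 * earlier one, so keep scanning and remember the earliest valid slot. */
static int
merge_slot_keep_going(const bool *conflicts, int last, int first)
{
        assert(first >= 0);
        int best = -1;
        for (int i = last; i >= first; i--) {
                if (!conflicts[i])
                        best = i;
        }
        return best;
}
```

With `conflicts = { false, true, false, false }` and scanning from slot 3, the old loop settles on slot 2, while the new one finds slot 0, which is the kind of earlier placement that lets the final thrsw avoid trailing NOPs.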
Iago Toral Quiroga	c2003535b9	broadcom/compiler: return early for SFU op latency calculation Since we are returning a fixed latency for these check for them earlier and return early if they match. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22675>	2023-04-25 11:15:34 +02:00
Iago Toral Quiroga	148473eae4	broadcom/compiler: fix incorrect ALU checks We had a bunch of cases where we would check ALU parameters without first checking if the ALU op was valid. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22675>	2023-04-25 11:15:26 +02:00
Iago Toral Quiroga	18a3a0d915	broadcom/compiler: fix incorrect check for SFU op Before testing the waddr for SFU we should first validate this is indeed a valid (not NOP) magic write. Use the helper we have for this which gets this right. total instructions in shared programs: 12898957 -> 12850958 (-0.37%) instructions in affected programs: 4328937 -> 4280938 (-1.11%) helped: 19974 HURT: 439 Instructions are helped. total max-temps in shared programs: 2211503 -> 2210893 (-0.03%) max-temps in affected programs: 12924 -> 12314 (-4.72%) helped: 509 HURT: 20 Max-temps are helped. total sfu-stalls in shared programs: 22233 -> 21975 (-1.16%) sfu-stalls in affected programs: 722 -> 464 (-35.73%) helped: 297 HURT: 54 Sfu-stalls are helped. total inst-and-stalls in shared programs: 12921190 -> 12872933 (-0.37%) inst-and-stalls in affected programs: 4337977 -> 4289720 (-1.11%) helped: 20015 HURT: 404 Inst-and-stalls are helped. total nops in shared programs: 333743 -> 305911 (-8.34%) nops in affected programs: 86902 -> 59070 (-32.03%) helped: 14545 HURT: 76 Nops are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22593>	2023-04-24 09:34:20 +00:00
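The bug pattern fixed in 18a3a0d915 can be reproduced in isolation. The sketch below uses a hypothetical, simplified `struct sketch_alu_write` and made-up waddr values; they are not the real V3D encodings or the real instruction layout. A NOP still carries a `waddr` field, so testing that field before confirming the write is a real (non-NOP) magic write can misclassify a NOP as an SFU write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of one ALU write slot. */
struct sketch_alu_write {
        bool is_nop;   /* ALU op is a NOP: waddr carries no meaning */
        bool magic;    /* waddr refers to a magic register */
        uint8_t waddr;
};

/* Made-up waddr range standing in for the SFU magic registers. */
enum { SKETCH_WADDR_SFU_FIRST = 10, SKETCH_WADDR_SFU_LAST = 15 };

/* Buggy version: checks the magic flag and waddr but never rules out NOPs. */
static bool
writes_sfu_buggy(const struct sketch_alu_write *w)
{
        return w->magic &&
               w->waddr >= SKETCH_WADDR_SFU_FIRST &&
               w->waddr <= SKETCH_WADDR_SFU_LAST;
}

/* Helper: only non-NOP writes to magic registers count. */
static bool
is_valid_magic_write(const struct sketch_alu_write *w)
{
        return !w->is_nop && w->magic;
}

/* Fixed version: validate the write first, then test the waddr. */
static bool
writes_sfu(const struct sketch_alu_write *w)
{
        assert(w);
        if (!is_valid_magic_write(w))
                return false;
        return w->waddr >= SKETCH_WADDR_SFU_FIRST &&
               w->waddr <= SKETCH_WADDR_SFU_LAST;
}
```

Routing the check through a single validating helper is the design choice the commit describes: every caller gets the NOP check for free instead of re-implementing it.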
Harri Nieminen	c3c63cb1d8	broadcom: fix typos Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22591>	2023-04-21 17:19:46 +00:00
Iago Toral Quiroga	9217c565b2	v3d,v3dv: stop trying to force 16-bit TMU output for shadow comparisons In V3D we were doing this incorrectly by peeking into the sampler state unconditionally, which is not correct if the TMU operations don't use sampler state at all (like PBOs). This was causing us to fail the second test in this sequence when both tests run back to back in the same process: dEQP-GLES3.functional.texture.shadow.2d.linear.greater_or_equal_depth_component32f dEQP-GLES3.functional.texture.specification.teximage2d_pbo.rg32f_cube Here, the first test would set up sampler state for shadow comparisons and the second test would set up a PBO upload, which would incorrectly pick up the sampler state to decide about the TMU output size for the PBO operation. In V3DV we were doing this right, looking through each texture/sampler instruction and checking if they all involved shadow comparisons or had relaxed precision, defaulting to 32-bit otherwise. This special-casing for shadow comparisons also leaks from drivers into the compiler, where we are forced to emit some pieces of sampler state for 32-bit outputs, so we had to special-case shadow instructions there as well, and we also had a fix for CS textures not having correct sampler state representing shadow operations too. Additionally, we had at least a couple of bugs where forcing 32-bit TMU output through V3D_DEBUG wasn't correctly forcing shadow comparisons to actually be 32-bit in all the right places, leading to visual bugs with the option enabled (Sponza being one example of this). This change eliminates all of these issues. Finally, the performance improvement observed from special-casing shadow comparisons is negligible, and in specific scenarios it can even be detrimental to performance due to increased register pressure (Sponza with PCF filtering set to 4 is an example of this again). 
Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8684 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22284>	2023-04-05 06:52:51 +00:00
Iago Toral Quiroga	1e28f2a6f2	broadcom/compiler: track pending ldtmu count with each TMU lookup And use this information when scheduling QPU to avoid merging a new TMU request into a previous ldtmu instruction when doing so may cause TMU output fifo overflow due to a stalling ldtmu. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22044>	2023-03-21 11:29:05 +00:00
Daniel Schürmann	2bb369dd8d	nir: add assertions that loops don't have a Continue Construct Hoping that I didn't miss any, this should add assertions to all functions and passes which explicitly handle 'nir_loop'. Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13962>	2023-02-21 10:41:11 +00:00
Alejandro Piñeiro	1a1fa2393e	v3d/v3dv: use shader_info->var_copies_lowered Instead of passing allow_copies as a parameter for v3d_optimize_nir (so manually doing that tracking). Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19338>	2023-02-06 22:11:34 +00:00
Ian Romanick	ea413e826b	nir: Eliminate nir_op_f2b Builds on the work of !15121. This gets to delete even more code because many drivers shared a lot of code for i2b and f2b. No shader-db or fossil-db changes on any Intel platform. v2: Rebase on `1a35acd8d9`. v3: Update a comment in nir_opcodes_c.py. Suggested by Konstantin. v4: Another rebase. Remove f2b stuff from Midgard. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20509>	2023-02-03 22:39:57 +00:00
Alejandro Piñeiro	2901066980	broadcom/compiler: fix indentation at v3d_nir_lower_image_load_store Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20972>	2023-01-30 21:57:45 +00:00
Alejandro Piñeiro	b56be4c37e	broadcom/compiler: treat PIPE_FORMAT_NONE as 32-bit formats for output type Needed to support Vulkan feature shaderStorageImageReadWithoutFormat. With that enabled we could receive a NONE format on a load image. For those we treat them as 32-bit formats, that would mean that the lowering would not need to do any format-specific unpacking. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20744>	2023-01-18 13:09:57 +00:00
Alejandro Piñeiro	41a081380a	broadcom/compiler: v3d_nir_lower_txf_ms doesn't need v3d_compile Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20744>	2023-01-18 13:09:57 +00:00
Iago Toral Quiroga	9bf525b4bd	broadcom/compiler: produce better code for f2f16 with RTZ rounding Suggested by Georg Lehmann, this generates far less code and should be more correct. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8090 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20721>	2023-01-16 12:03:24 +01:00
Iago Toral Quiroga	22ef66bcc9	v3d/compiler: remove unused sample_coverage field from fs key. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20634>	2023-01-11 10:54:05 +00:00
Iago Toral Quiroga	f40afe9883	v3d: add a debug option to optimize shader compile times Particularly, this makes compilation stop as soon as we get a valid shader and doesn't try to optimize spilling by trying fallback strategies. Might come in handy to reduce CTS execution time, for example, dEQP-VK.ssbo.layout.random.8bit.all_per_block_buffers.6 goes from 43m46.715s down to 15m15.068s. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20601>	2023-01-11 10:25:28 +00:00
Ian Romanick	eb76cee9f8	nir: Eliminate nir_op_i2b There are a lot of optimizations in opt_algebraic that match ('ine', a, 0), but there are almost none that match i2b. Instead of adding a huge pile of additional patterns (including variations that include both ine and i2b), always lower i2b to a != 0. At this point in the series, it should be impossible for anything to generate i2b, so there /should not/ be any changes. The failing test on d3d12 is a pre-existing bug that is triggered by this change. I talked to Jesse about it, and, after some analysis, he suggested just adding it to the list of known failures. v2: Don't rematerialize i2b instructions in dxil_nir_lower_x2b. v3: Don't rematerialize i2b instructions in zink_nir_algebraic.py. v4: Fix zink-on-TGL CI failures by calling nir_opt_algebraic after nir_lower_doubles makes progress. The latter can generate b2i instructions, but nir_lower_int64 can't handle them (anymore). v5: Add back most of the hunk at line 2125 of nir_opt_algebraic.py. I had accidentally removed the f2b(bf2(x)) optimization. v6: Just eliminate the i2b instruction. v7: Remove missed i2b32 in midgard_compile.c. Remove (now unused) emit_alu_i2orf2_b1 function from sfn_instr_alu.cpp. Previously this function was still used. 🤷 No shader-db changes on any Intel platform. All Intel platforms had similar results. (Ice Lake shown) Instructions in all programs: 141165875 -> 141165873 (-0.0%) Instructions helped: 2 Cycles in all programs: 9098956382 -> 9098956350 (-0.0%) Cycles helped: 2 The two Vulkan shaders are helped because of the "new" (('b2i32', ('ine', ('ubfe', a, b, 1), 0)), ('ubfe', a, b, 1)) algebraic pattern. Acked-by: Jesse Natalie <jenatali@microsoft.com> [earlier version] Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Tested-by: Daniel Schürmann <daniel@schuermann.dev> [earlier version] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>	2022-12-14 06:23:21 +00:00
Iago Toral Quiroga	1174f37609	broadcom/compiler: avoid using ldvary sequence to hide latency of branching This can cause us to stomp the contents of r5 before we have a chance to read it, like this: 0x3d103186bb800000 nop ; nop ; ldvary.r0 0x3d105686bbf40000 nop ; mov rf26, r5 ; ldvary.r1 0x020000ef0000d000 bu.allna 232, r:unif (0x0000001c / 0.000000) 0x3d1096c6bbf40000 nop ; mov rf27, r5 ; ldvary.r2 Here, the MOV in the last instruction is supposed to read r5 produced from ldvary.r0, but because we have inserted the bu instruction in between now that read happens at the same time that ldvary.r1 updates r5, stomping the value we were supposed to read. Fix this by disallowing injection of a branch instruction in between an ldvary instruction and its write to the r5 register 2 instructions later. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7062 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19616>	2022-11-09 20:51:25 +00:00
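A sketch of the constraint added in 1174f37609, assuming a hypothetical `struct sketch_inst` that carries only an `is_ldvary` flag and taking the two-instruction delay of the implicit r5 write from the commit message above. A branch placed right after `insts[pos]` lands between an ldvary and its delayed r5 write whenever either `insts[pos]` or `insts[pos - 1]` is an ldvary, so those positions are rejected. The real scheduler tracks this differently; this only illustrates the rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction record. */
struct sketch_inst {
        bool is_ldvary;
};

/* ldvary's implicit r5 write lands this many instructions after it. */
#define LDVARY_R5_WRITE_DELAY 2

/* Returns true if a branch may be placed right after insts[pos] without
 * splitting an ldvary from its delayed write to r5. */
static bool
can_place_branch_after(const struct sketch_inst *insts, size_t pos)
{
        assert(insts);
        for (size_t d = 0; d < LDVARY_R5_WRITE_DELAY; d++) {
                if (d > pos)
                        break; /* ran off the start of the block */
                if (insts[pos - d].is_ldvary)
                        return false;
        }
        return true;
}
```

For the sequence from the commit (two back-to-back ldvary instructions followed by plain ALU work), the first legal branch position is two instructions after the last ldvary.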
Iago Toral Quiroga	c7150ad8e6	broadcom/compiler: drop unused v3d_compile parameter for nir pass Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19519>	2022-11-04 09:58:10 +00:00
Iago Toral Quiroga	8cd50ef071	broadcom/compiler: handle vec2 load/store index In Vulkan, we load descriptors via vulkan resource index, which returns a vec2, of which we want component 0, which holds the actual index. Typically, this will be cleaned up by the time we get to emitting VIR so the index is a single scalar component, but there are some cases where this might not be the case, so make sure we don't assume it to be a scalar, like we do in other places. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19313>	2022-10-28 08:23:32 +02:00
Iago Toral Quiroga	24d9a80247	v3dv: implement VK_EXT_pipeline_robustness Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18883>	2022-10-27 08:17:11 +00:00
Iago Toral Quiroga	1a2ca58aed	v3dv: use NIR_PASS with v3d_nir_lower_robust_image_access Reviewed-by: Eric Engestrom <eric@igalia.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18883>	2022-10-27 08:17:11 +00:00
Alejandro Piñeiro	019529aa11	broadcom/compiler: call nir_opt_gcm with a custom strategy nir_opt_gcm gets us worse shader-db stats, but that is expected. However, we want to avoid getting worse spill/fill values. Analyzing the outcome with shader-db, this mostly happens with shaders that are already complex and already spilling/filling. So the best option here is adding a new strategy that falls back if we get spills/fills using nir_opt_gcm. It is not clear in which order we should disable gcm. For now we disable it before loop unrolling. We get a slight performance gain (on average) using nir_opt_gcm. We don't show the shader-db stats, as they are worse, but as mentioned, this is expected. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>	2022-10-26 12:29:30 +00:00
Alejandro Piñeiro	afc6de356a	broadcom/compiler: pass a strategy struct to vir_compile_init That allows us to reduce the number of parameters of the method. And after all, they were already filled using an existing strategy struct. This makes adding new fields to a strategy easier. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>	2022-10-26 12:29:30 +00:00
Alejandro Piñeiro	0bf31b0710	broadcom/compiler: add more lowerings/optimizations on v3d_optimize_nir Optimizations that we are already calling in the Vulkan driver, as preparation for the Vulkan frontend to use v3d_optimize_nir too. We need to add a new parameter to v3d_optimize_nir in order to know if we can call nir_opt_find_array_copies. As we don't track whether nir_lower_var_copies has been called, we call it explicitly when we create the uncompiled shader. So instead of tracking, we assume that each driver (v3d/v3dv) calls it when the shader is created. So when v3d_optimize_nir is called as part of the process to compile it in the compiler, we call it with allow_copies as false. We purposely exclude nir_opt_gcm, as it is a case of an optimization that could help performance even if it hurts shader-db stats. shader-db stats: total instructions in shared programs: 11705923 -> 11705034 (<.01%) instructions in affected programs: 88350 -> 87461 (-1.01%) helped: 201 HURT: 80 Instructions are helped. total threads in shared programs: 375552 -> 375558 (<.01%) threads in affected programs: 6 -> 12 (100.00%) helped: 3 HURT: 0 total uniforms in shared programs: 3486108 -> 3485789 (<.01%) uniforms in affected programs: 7473 -> 7154 (-4.27%) helped: 90 HURT: 1 Uniforms are helped. total max-temps in shared programs: 2021860 -> 2021802 (<.01%) max-temps in affected programs: 800 -> 742 (-7.25%) helped: 21 HURT: 3 Max-temps are helped. total sfu-stalls in shared programs: 19299 -> 19296 (-0.02%) sfu-stalls in affected programs: 18 -> 15 (-16.67%) helped: 10 HURT: 7 Inconclusive result (value mean confidence interval includes 0). total inst-and-stalls in shared programs: 11725222 -> 11724330 (<.01%) inst-and-stalls in affected programs: 88402 -> 87510 (-1.01%) helped: 201 HURT: 80 Inst-and-stalls are helped. total nops in shared programs: 269674 -> 269386 (-0.11%) nops in affected programs: 3641 -> 3353 (-7.91%) helped: 103 HURT: 29 Nops are helped. 
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>	2022-10-26 12:29:30 +00:00
Alejandro Piñeiro	9cbc3ab239	broadcom/compiler: update how we compute return_words_of_texture_data on non-ssa For the non-ssa case, we were trying to use reg->num_components. But this is not the same as nir_ssa_def_components_read: it is the number of components of the destination register. And in the 16-bit case, even if nir_lower_tex packs the outcome, it doesn't update the number of components, as nir_tex_instr_dest_size would still return 4. And nir validate would check that those values are the same. So this change focuses on the last part of this comment at nir_lower_tex: * Note that we don't change the destination num_components, because * nir_tex_instr_dest_size() will still return 4. The driver is just * expected to not store the other channels, given that nothing at the * NIR level will read them. We just limit how many channels we would use for the f16 case. It is also worth noting, based on the CTS and the different applications we test, that this is a corner case. This was detected when we experimented with enabling nir_opt_gcm for v3d, which led to an assertion being raised slightly below with some shader-db tests, but technically it could happen without it. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>	2022-10-26 12:29:30 +00:00
Alejandro Piñeiro	ec10a37a52	broadcom/compiler: don't call nir_opt_load_store_vectorize on all v3d_optimize_nir calls For compute shaders, avoiding a crash with that optimization requires doing some optimizations and lowerings first. Example: static void lower_cs_shared(struct nir_shader *nir) { NIR_PASS_V(nir, nir_lower_vars_to_explicit_types, nir_var_mem_shared, shared_type_info); NIR_PASS_V(nir, nir_lower_explicit_io, nir_var_mem_shared, nir_address_format_32bit_offset); } In the same way, other drivers (like anv) call nir_opt_load_store_vectorize as part of their post-process-nir. So one option would be to move nir_opt_load_store_vectorize outside the common v3d_optimize_nir, to a post-process nir method. To make things simpler, this change calls that optimization only if we have a v3d_compiler object, that is, when each frontend has already done their lowerings and calls the v3d_compiler to get the final assembly (so we are already in a kind of post-processing NIR step). This avoids dEQP-VK.memory_model.shared.basic_types.3 crashing if we start to call v3d_optimize_nir on v3dv directly. Slight shader-db changes, but not significant. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>	2022-10-26 12:29:30 +00:00
Iago Toral Quiroga	b6093ffbe7	v3dv: expose VK_EXT_image_robustness Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18820>	2022-09-27 09:08:29 +00:00
Iago Toral Quiroga	c7e022abfd	broadcom/compiler: add a lowering for robust image access Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18820>	2022-09-27 09:08:29 +00:00
Iago Toral Quiroga	adcfd9bc2f	broadcom/compiler: rename static helpers involved with robust buffer access To make it explicit that they involve buffers, since we will be adding robust image access shortly. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18820>	2022-09-27 09:08:29 +00:00
Iago Toral Quiroga	5e5eaa3f1a	broadcom/compiler: rename v3d_nir_lower_robust_buffer_access.c We are going to add code to handle image robustness shortly, so better rename this to v3d_nir_lower_robust_access.c Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18820>	2022-09-27 09:08:29 +00:00
Iago Toral Quiroga	4ea916f704	broadcom/compiler: don't apply robust buffer access to shared variables This feature is only concerned with buffers bound through a descriptor set. We are still keeping the code for this (disabled by default) since it may be useful for debugging some scenarios. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18744>	2022-09-23 06:27:54 +00:00
Iago Toral Quiroga	44b02b5cb1	broadcom/compiler: handle shared stores with robust buffer access For some reason we supported all shared intrinsics but this one. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18744>	2022-09-23 06:27:54 +00:00
Iago Toral Quiroga	b2bce9c98a	broadcom/compiler: fix robust buffer access Our implementation was bogus: it was only capping the offset based on the aligned buffer size, and this doesn't ensure the access to the buffer happens within its valid range. I think the only reason we have been passing the tests is that we align all buffer sizes to 256B and the tests create buffers with a size that is smaller than that (like 64B). When we get the size of the buffer from the shader, we get the actual bound range (so 64B in this case), and by capping to that we don't ensure the access will stay within that range, but we ensure it will stay within the underlying memory bound to the buffer (256B), and this is fine by the spec. However, I think if the actual buffer range was the same as the underlying allocation we would fail the tests. A valid behavior for robust buffer access on an out-of-bounds access is to return any valid bytes within the buffer, so we can just make that offset 0. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18744>	2022-09-23 06:27:54 +00:00
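The behavior adopted in b2bce9c98a reduces to one bounds check per access. This is a scalar C sketch with a hypothetical `robust_buffer_offset()` helper; the actual lowering rewrites NIR rather than running C code, and this is not its implementation. The comparison is written as `offset > range - access_size` so that a large `offset` cannot wrap around a 32-bit addition.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the offset to actually use for an access of `access_size` bytes
 * at `offset` into a buffer whose bound range is `range` bytes.
 * Out-of-bounds accesses are redirected to offset 0, which robust buffer
 * access permits because it touches valid bytes inside the buffer
 * (assuming range >= access_size). */
static uint32_t
robust_buffer_offset(uint32_t offset, uint32_t access_size, uint32_t range)
{
        assert(access_size > 0);
        /* Avoid computing offset + access_size, which could overflow. */
        if (access_size > range || offset > range - access_size)
                return 0;
        return offset;
}
```

With a 64-byte bound range, a 4-byte access at offset 60 is kept as-is, while one at offset 61 is redirected to 0, independently of how much the underlying allocation was padded.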
Iago Toral Quiroga	15cdf5bb48	v3dv: optimize ldunif load into unifa write If we emit a ldunif to load the ubo/ssbo base address and then we are immediately moving it to the unifa register we can have the ldunif write directly to unifa and avoid the mov in between, which won't be done by copy propagation because that only works with temp registers. Also, since we can't read from unifa we must be careful to disallow reuse of the ldunif result for a future ldunif of the same base address. We do that by only reusing ldunif results from temp registers. total instructions in shared programs: 12468943 -> 12455139 (-0.11%) instructions in affected programs: 1661233 -> 1647429 (-0.83%) helped: 8307 HURT: 3994 total uniforms in shared programs: 3704532 -> 3704522 (<.01%) uniforms in affected programs: 339 -> 329 (-2.95%) helped: 7 HURT: 0 total max-temps in shared programs: 2148158 -> 2148290 (<.01%) max-temps in affected programs: 9320 -> 9452 (1.42%) helped: 175 HURT: 295 total spills in shared programs: 2202 -> 2202 (0.00%) spills in affected programs: 0 -> 0 helped: 0 HURT: 0 total fills in shared programs: 3059 -> 3057 (-0.07%) fills in affected programs: 27 -> 25 (-7.41%) helped: 1 HURT: 0 total sfu-stalls in shared programs: 21167 -> 21056 (-0.52%) sfu-stalls in affected programs: 497 -> 386 (-22.33%) helped: 209 HURT: 127 total inst-and-stalls in shared programs: 12490110 -> 12476195 (-0.11%) inst-and-stalls in affected programs: 1662875 -> 1648960 (-0.84%) helped: 8312 HURT: 3987 total nops in shared programs: 316563 -> 313553 (-0.95%) nops in affected programs: 24269 -> 21259 (-12.40%) helped: 2158 HURT: 1006 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18667>	2022-09-20 06:56:28 +00:00
Iago Toral Quiroga	cbc5169ef9	broadcom/compiler: check signal writes to magic regs when updating scoreboard We have only been checking magic writes from ADD and MUL ports, but signals can potentially write to magic registers too. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18667>	2022-09-20 06:56:28 +00:00
Eric Engestrom	5bfca00d31	broadcom: fix dependencies in static_library() calls The first argument is the name of the library, and the second argument is the list of files; those two got a bit mixed up. Fixes: `1ae8018a6a` ("meson: Add support for the vc4 driver.") Fixes: `4f3e380fa0` ("meson: Add support for the vc5 driver.") Signed-off-by: Eric Engestrom <eric@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18593>	2022-09-14 09:38:28 +00:00
Iago Toral Quiroga	ca33c319e5	v3dv: implement VK_KHR_zero_initialize_workgroup_memory This only requires that we call the relevant lowering pass in NIR. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18312>	2022-08-31 07:33:19 +02:00
Eric Engestrom	e767f54f28	v3d: introduce V3D_DBG() macro to make V3D_DEBUG checks consistent The main issue was the inconsistent use of `unlikely()`, but the macro also simplifies the code a little bit. Signed-off-by: Eric Engestrom <eric@igalia.com> Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18086>	2022-08-24 23:03:57 +00:00
Iago Toral Quiroga	87a9951073	broadcom/compiler: track number of TMU operations in prog data Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17854>	2022-08-15 23:35:16 +00:00
Iago Toral Quiroga	8ecea47f06	broadcom/compiler: simplify code emitted for centroid coordinates Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17909>	2022-08-06 22:34:25 +00:00
Iago Toral Quiroga	20591573f1	broadcom/compiler: use nir_opt_idiv_const total instructions in shared programs: 12463625 -> 12463571 (<.01%) instructions in affected programs: 1758 -> 1704 (-3.07%) helped: 12 HURT: 0 total uniforms in shared programs: 3704589 -> 3704591 (<.01%) uniforms in affected programs: 17 -> 19 (11.76%) helped: 0 HURT: 1 total max-temps in shared programs: 2148088 -> 2148138 (<.01%) max-temps in affected programs: 170 -> 220 (29.41%) helped: 0 HURT: 10 Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17871>	2022-08-05 09:28:22 +00:00
Iago Toral Quiroga	73e8fc3efb	broadcom/compiler: don't use imprecise_32bit_lowering for idiv lowering This is known to produce bogus results for certain combinations of operands, so don't use it. See this issue for details: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6555 With this change, the idiv lowering will produce mul_high instructions, so we need to instruct the compiler to lower those with the ALU lowering right after the idiv lowering by adding the lower_mul_high option (we only need to add this to V3D, since V3DV already had it set). This will cause injection of uadd_carry instructions, for which we have backend implementations that produce better code for us than the NIR lowering. total instructions in shared programs: 12457692 -> 12463625 (0.05%) instructions in affected programs: 23115 -> 29048 (25.67%) helped: 0 HURT: 111 total threads in shared programs: 416372 -> 416368 (<.01%) threads in affected programs: 8 -> 4 (-50.00%) helped: 0 HURT: 2 total uniforms in shared programs: 3704067 -> 3704589 (0.01%) uniforms in affected programs: 5804 -> 6326 (8.99%) helped: 2 HURT: 109 total max-temps in shared programs: 2147845 -> 2148088 (0.01%) max-temps in affected programs: 2456 -> 2699 (9.89%) helped: 6 HURT: 91 Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17871>	2022-08-05 09:28:22 +00:00
Alejandro Piñeiro	efc827ceea	v3d/v3dv: use NIR_PASS(_ Instead of NIR_PASS_V, when possible. This was done recently on anv (see commit `ce60195ec` and MR#17014) Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17609>	2022-07-20 11:35:25 +00:00
Alejandro Piñeiro	0a50330c3d	broadcom/compiler: make several passes to return a progress Two advantages: * When using NIR_DEBUG=nir_print_xx, will print outcome only if there is a change * We can use NIR_PASS(_, ...) instead of NIR_PASS_V, that has slightly more validation checks. This includes: * v3d_nir_lower_image_load_store * v3d_nir_lower_io * v3d_nir_lower_line_smooth * v3d_nir_lower_load_store_bitsize * v3d_nir_lower_robust_buffer_access * v3d_nir_lower_scratch * v3d_nir_lower_txf_ms As we are here we also simplify some of them by using the nir_shader_instructions_pass helper. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17609>	2022-07-20 11:35:25 +00:00
Alejandro Piñeiro	81ca0b4191	broadcom/compiler: removed unused function It is not even implemented. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17609>	2022-07-20 11:35:25 +00:00
Alejandro Piñeiro	d8fee4cdaa	broadcom/compiler: use NIR_PASS for nir_lower_vars_to_ssa at v3d_optimize_nir There's no reason to not take into account progress at that point. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17609>	2022-07-20 11:35:24 +00:00
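The nir_opt_idiv_const commit rewrites integer division by a constant into cheaper arithmetic. The usual way to do that is a multiply-high (the same mul_high operation named in the idiv lowering commit) followed by shifts. The sketch below follows Granlund and Montgomery's "round-up" method for unsigned 32-bit division. It is a generic illustration, not Mesa's implementation. The names udiv_magic, udiv_magic_compute and udiv_magic_apply are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: constants that turn n / d into a multiply-high
 * plus shifts, for a fixed unsigned 32-bit divisor d >= 1. */
struct udiv_magic {
   uint32_t multiplier; /* m' = floor(2^32 * (2^l - d) / d) + 1 */
   unsigned pre_shift;  /* 1 if d > 1, else 0 */
   unsigned post_shift; /* l - 1 if d > 1, else 0 */
};

/* Computed once per divisor, e.g. while compiling the shader. */
static struct udiv_magic
udiv_magic_compute(uint32_t d)
{
   assert(d != 0);

   /* l = ceil(log2(d)), computed in 64 bits so d > 2^31 gives l = 32. */
   unsigned l = 0;
   while ((UINT64_C(1) << l) < d)
      l++;

   /* (2^l - d) < 2^31, so shifting it left by 32 still fits in 64 bits. */
   uint64_t num = ((UINT64_C(1) << l) - d) << 32;

   struct udiv_magic mg;
   mg.multiplier = (uint32_t)(num / d + 1);
   mg.pre_shift = l > 0 ? 1 : 0;
   mg.post_shift = l > 0 ? l - 1 : 0;
   return mg;
}

/* Per-division work: one multiply-high, a subtract, an add and shifts. */
static uint32_t
udiv_magic_apply(uint32_t n, struct udiv_magic mg)
{
   /* mul_high: upper 32 bits of the 32x32 -> 64-bit product. */
   uint32_t t = (uint32_t)(((uint64_t)mg.multiplier * n) >> 32);

   /* t <= n, so with pre_shift = 1 this is floor((n + t) / 2)
    * computed without a 33-bit intermediate. */
   return (t + ((n - t) >> mg.pre_shift)) >> mg.post_shift;
}
```

For example, udiv_magic_compute(7) yields the multiplier 0x24924925 with a post-shift of 2. Applying it to any uint32_t n gives n / 7 without a divide instruction.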
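The V3D_DBG() commit centralizes debug-flag checks so that every call site gets the same unlikely() hint. Below is a minimal sketch of such a macro. The flag names, the v3d_debug_flags variable and the want_nir_dump() helper are hypothetical stand-ins, not the driver's actual definitions. unlikely() is defined here with the GCC/Clang __builtin_expect builtin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Branch hint for GCC/Clang, only defined if nothing else provides it. */
#ifndef unlikely
#define unlikely(x) __builtin_expect(!!(x), 0)
#endif

/* Hypothetical debug flags, one bit each. */
#define V3D_DEBUG_NIR (1u << 0)
#define V3D_DEBUG_QPU (1u << 1)
static_assert((V3D_DEBUG_NIR & V3D_DEBUG_QPU) == 0, "debug flags must be distinct bits");

/* Hypothetical bitmask, assumed to be filled from V3D_DEBUG at startup. */
static uint32_t v3d_debug_flags;

/* One definition pairs every flag test with unlikely(), so call sites
 * cannot forget the hint or spell the check differently. */
#define V3D_DBG(flag) unlikely(v3d_debug_flags & V3D_DEBUG_##flag)

/* Example call site. */
static bool
want_nir_dump(void)
{
   return V3D_DBG(NIR);
}
```

Callers write V3D_DBG(NIR) instead of repeating the mask test and hint by hand, which is the consistency the commit message describes.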


892 commits