fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-18 07:18:06 +02:00

Author	SHA1	Message	Date
Iago Toral Quiroga	3530e3ffb2	broadcom/compiler: use scoped barriers Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23228>	2023-05-25 14:28:30 +02:00
Iago Toral Quiroga	e99ab86f77	broadcom/compiler: flag use of control barriers We have been relying on NIR's gather info pass for this but it is not safe unless we are certain we are always calling it after any other pass that may emit a control barrier. As it stands, nir_zero_initialize_shared_memory can emit a control barrier and we don't call the gather info pass after it, which is problematic. The only reason this is not really a problem right now is because for non-scoped barriers (which is what we currently use) it doesn't emit a scoped barrier, just a regular memory barrier (which is probably a bug in the pass!), but as soon as we move to scoped barriers, this is going to be a problem, since we need to know when we emit a control barrier to ensure supergroup calculations prevent deadlocks at the barrier op. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23228>	2023-05-25 14:28:30 +02:00
Erik Faye-Lund	c87e491107	nir: use nir_fsub_imm Now that we have nir_fsub_imm, let's use it to save some typing! Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23179>	2023-05-25 06:59:25 +00:00
Erik Faye-Lund	20d619cd84	nir: use more nir_fmul_imm This simplifies things a bit. Note that in some cases, the arguments are swapped, because multiplications are commutative, and nir_fmul_imm only allows the second operand to be an immediate. Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23179>	2023-05-25 06:59:24 +00:00
Alejandro Piñeiro	88ca89bea9	broadcom/compiler: disable tmu pipelining when needed disable_tmu_pipelining has been recently set to false on two strategies that should set it to true. Fixes the following CTS test: dEQP-VK.graphicsfuzz.spv-stable-maze-flatten-copy-composite Fixes: `c950098ab` - broadcom/compiler: move buffer loads to lower register pressure Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23207>	2023-05-24 15:17:03 +00:00
Alejandro Piñeiro	470b8567a5	broadcom/compiler: return NULL if we fail to register allocate Right now if we fail to register allocate, we return the qpu_insts that we had at that point, even if the driver can't really use it. Also v3dv_pipeline was already assuming that it would return NULL on failure, returning VK_ERROR_UNKNOWN in that case. This allows CTS tests with a lot of pressure, which regress now and then to not being able to allocate, to finish with an error instead of blocking forever. For example: dEQP-VK.graphicsfuzz.spv-stable-maze-flatten-copy-composite Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23203>	2023-05-24 14:19:12 +00:00
Iago Toral Quiroga	e401add741	broadcom/compiler: skip jumps in non-uniform if/then when block cost is small We have an optimization for non-uniform if/else where if all channels meet the jump condition we emit a branch to jump straight to the ELSE block. Similarly, if at the end of the THEN block we don't have any channels that would execute the ELSE block, we emit a branch to jump straight to the AFTER block. This optimization has a cost though: we need to emit the condition for the branch and a branch instruction (which also comes with 3 delay slots), so for very small blocks (just a couple of ALU for example) emitting the branch instruction is typically worse. Further, if the condition for the branch is not met, we still pay the cost for no benefit at all. Here is an example: nop ; fmul.ifa rf26, 0x3e800000, rf54 xor.pushz -, rf52, 2 ; nop bu.alla 32, r:unif (0x00000000 / 0.000000) nop ; nop nop ; nop nop ; nop xor.pushz -, rf52, 3 ; nop nop ; mov.ifa rf52, 0 nop ; mov.pushz -, rf52 nop ; mov.ifa rf26, 0x3f800000 The bu instruction here is set up to jump over the following 4 instructions (the last 4 instructions in there). To do this, we pay the price of the xor to generate the condition, the bu instruction, and the 3 delay slots right after it, so we end up paying 6 instructions to skip over 4 which we pay always, even if the branch is not taken and we still have to execute those 4 instructions. With this change, we produce: nop ; fmul.ifa rf56, 0x3e800000, rf28 xor.pushz -, rf9, 3 ; nop nop ; mov.ifa rf9, 0 nop ; mov.pushz -, rf9 nop ; mov.ifa rf56, 0x3f800000 Now we don't try to skip the small block, ever. At worst, if all channels would have met the branch condition, we only pay the cost of the 4 instructions instead of 6; at best, if any channel wouldn't take the branch, we save ourselves 5 cycles for the branch condition, the branch instruction and its 3 delay slots.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23161>	2023-05-22 09:23:41 +00:00
Alyssa Rosenzweig	01e9ee79f7	nir: Drop unused name from nir_ssa_dest_init Since `624e799cc3` ("nir: Drop nir_ssa_def::name and nir_register::name"), SSA defs don't have names, making the name argument unused. Drop it from the signature and fix the call sites. This was done with the help of the following Coccinelle semantic patch: @@ expression A, B, C, D, E; @@ -nir_ssa_dest_init(A, B, C, D, E); +nir_ssa_dest_init(A, B, C, D); Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23078>	2023-05-17 23:46:16 +00:00
Alyssa Rosenzweig	c323762f9f	treewide: Stop lowering legacy atomics There are no more producers of legacy atomics so these calls are inert. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23036>	2023-05-16 22:36:21 +00:00
Iago Toral Quiroga	66b3d34633	broadcom/compiler: use unified atomics Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22939>	2023-05-15 07:43:09 +00:00
Iago Toral Quiroga	994ad351f7	broadcom/compiler: increase peephole limit to 24 instructions This helps by reducing the number of branches with their corresponding delay slots, at the expense of additional register pressure. It also helps a lot with SFU stalls, probably because removing control-flow blocks gives us more QPU scheduling flexibility to hide them. Shader-db results below correspond to the "closed shaders" set, since the full set is very dominated by the massive impact this change has on Skia's shaders (for the better), so this is probably more representative of real impact: total instructions in shared programs: 11887255 -> 11854898 (-0.27%) instructions in affected programs: 538170 -> 505813 (-6.01%) helped: 1653 HURT: 43 Instructions are helped. total threads in shared programs: 385924 -> 385872 (-0.01%) threads in affected programs: 236 -> 184 (-22.03%) helped: 22 HURT: 48 Inconclusive result (%-change mean confidence interval includes 0). total uniforms in shared programs: 3552808 -> 3547894 (-0.14%) uniforms in affected programs: 157486 -> 152572 (-3.12%) helped: 1673 HURT: 35 Uniforms are helped. total max-temps in shared programs: 2062403 -> 2064720 (0.11%) max-temps in affected programs: 18209 -> 20526 (12.72%) helped: 168 HURT: 369 Max-temps are HURT. total spills in shared programs: 1937 -> 1994 (2.94%) spills in affected programs: 79 -> 136 (72.15%) helped: 0 HURT: 1 total fills in shared programs: 2652 -> 2717 (2.45%) fills in affected programs: 115 -> 180 (56.52%) helped: 0 HURT: 1 total sfu-stalls in shared programs: 19349 -> 18010 (-6.92%) sfu-stalls in affected programs: 2321 -> 982 (-57.69%) helped: 674 HURT: 74 Sfu-stalls are helped. total inst-and-stalls in shared programs: 11906604 -> 11872908 (-0.28%) inst-and-stalls in affected programs: 541339 -> 507643 (-6.22%) helped: 1656 HURT: 43 Inst-and-stalls are helped. 
total nops in shared programs: 245740 -> 238085 (-3.12%) nops in affected programs: 19282 -> 11627 (-39.70%) helped: 1335 HURT: 76 Nops are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22922>	2023-05-10 11:11:38 +00:00
Iago Toral Quiroga	c950098abb	broadcom/compiler: move buffer loads to lower register pressure If we are trying to lower register pressure this can make a big difference in some cases. To avoid adding even more strategies, merge this with disabling ubo load sorting, since they are basically trying to do the same. total instructions in shared programs: 12848024 -> 12844510 (-0.03%) instructions in affected programs: 236537 -> 233023 (-1.49%) helped: 195 HURT: 87 Instructions are helped. total uniforms in shared programs: 3815601 -> 3814932 (-0.02%) uniforms in affected programs: 31773 -> 31104 (-2.11%) helped: 67 HURT: 115 Inconclusive result (value mean confidence interval includes 0). total max-temps in shared programs: 2210803 -> 2210622 (<.01%) max-temps in affected programs: 9362 -> 9181 (-1.93%) helped: 114 HURT: 34 Max-temps are helped. total spills in shared programs: 2556 -> 2330 (-8.84%) spills in affected programs: 1391 -> 1165 (-16.25%) helped: 39 HURT: 9 total fills in shared programs: 3840 -> 3317 (-13.62%) fills in affected programs: 2379 -> 1856 (-21.98%) helped: 39 HURT: 23 total sfu-stalls in shared programs: 21965 -> 21978 (0.06%) sfu-stalls in affected programs: 2618 -> 2631 (0.50%) helped: 45 HURT: 81 Inconclusive result (value mean confidence interval includes 0). total inst-and-stalls in shared programs: 12869989 -> 12866488 (-0.03%) inst-and-stalls in affected programs: 238771 -> 235270 (-1.47%) helped: 193 HURT: 87 Inst-and-stalls are helped. total nops in shared programs: 303501 -> 303274 (-0.07%) nops in affected programs: 4159 -> 3932 (-5.46%) helped: 87 HURT: 105 Inconclusive result (value mean confidence interval includes 0). Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22824>	2023-05-03 13:01:58 +00:00
Iago Toral Quiroga	9f522ac0c6	broadcom/compiler: don't allocate undef to rf0 rf0 is affected by restrictions in some scenarios so we rather use a register that does not cause conflicts for scheduling. total instructions in shared programs: 12850958 -> 12848024 (-0.02%) instructions in affected programs: 331974 -> 329040 (-0.88%) helped: 2559 HURT: 201 Instructions are helped. total max-temps in shared programs: 2210893 -> 2210803 (<.01%) max-temps in affected programs: 1486 -> 1396 (-6.06%) helped: 96 HURT: 7 Max-temps are helped. total sfu-stalls in shared programs: 21975 -> 21965 (-0.05%) sfu-stalls in affected programs: 32 -> 22 (-31.25%) helped: 16 HURT: 6 Sfu-stalls are helped. total inst-and-stalls in shared programs: 12872933 -> 12869989 (-0.02%) inst-and-stalls in affected programs: 332036 -> 329092 (-0.89%) helped: 2560 HURT: 189 Inst-and-stalls are helped. total nops in shared programs: 305911 -> 303501 (-0.79%) nops in affected programs: 11215 -> 8805 (-21.49%) helped: 2131 HURT: 3 Nops are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22797>	2023-05-03 05:39:35 +00:00
Iago Toral Quiroga	0468ce3791	broadcom/compiler: try harder to merge thread switch earlier We have been stopping as soon as we find a conflict but that doesn't mean we can't merge it in an earlier slot, so keep going. Going by shader-db, this sometimes allows us to merge the final thrsw a bit earlier and avoid emitting NOP instructions at the program end to make up for its delay slots. I have not observed cases where this helps with regular thrsw though, but it doesn't hurt to try with those too. total instructions in shared programs: 11526876 -> 11526354 (<.01%) instructions in affected programs: 10760 -> 10238 (-4.85%) helped: 236 HURT: 0 Instructions are helped. total max-temps in shared programs: 2231705 -> 2231677 (<.01%) max-temps in affected programs: 276 -> 248 (-10.14%) helped: 27 HURT: 0 Max-temps are helped. total inst-and-stalls in shared programs: 11545177 -> 11544655 (<.01%) inst-and-stalls in affected programs: 10777 -> 10255 (-4.84%) helped: 236 HURT: 0 Inst-and-stalls are helped. total nops in shared programs: 321624 -> 321152 (-0.15%) nops in affected programs: 751 -> 279 (-62.85%) helped: 236 HURT: 0 Nops are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22679>	2023-04-27 08:43:29 +00:00
Iago Toral Quiroga	c2003535b9	broadcom/compiler: return early for SFU op latency calculation Since we are returning a fixed latency for these, check for them earlier and return early if they match. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22675>	2023-04-25 11:15:34 +02:00
Iago Toral Quiroga	148473eae4	broadcom/compiler: fix incorrect ALU checks We had a bunch of cases where we would check ALU parameters without first checking if the ALU op was valid. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22675>	2023-04-25 11:15:26 +02:00
Iago Toral Quiroga	18a3a0d915	broadcom/compiler: fix incorrect check for SFU op Before testing the waddr for SFU we should first validate this is indeed a valid (not NOP) magic write. Use the helper we have for this which gets this right. total instructions in shared programs: 12898957 -> 12850958 (-0.37%) instructions in affected programs: 4328937 -> 4280938 (-1.11%) helped: 19974 HURT: 439 Instructions are helped. total max-temps in shared programs: 2211503 -> 2210893 (-0.03%) max-temps in affected programs: 12924 -> 12314 (-4.72%) helped: 509 HURT: 20 Max-temps are helped. total sfu-stalls in shared programs: 22233 -> 21975 (-1.16%) sfu-stalls in affected programs: 722 -> 464 (-35.73%) helped: 297 HURT: 54 Sfu-stalls are helped. total inst-and-stalls in shared programs: 12921190 -> 12872933 (-0.37%) inst-and-stalls in affected programs: 4337977 -> 4289720 (-1.11%) helped: 20015 HURT: 404 Inst-and-stalls are helped. total nops in shared programs: 333743 -> 305911 (-8.34%) nops in affected programs: 86902 -> 59070 (-32.03%) helped: 14545 HURT: 76 Nops are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22593>	2023-04-24 09:34:20 +00:00
Harri Nieminen	c3c63cb1d8	broadcom: fix typos Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22591>	2023-04-21 17:19:46 +00:00
Iago Toral Quiroga	9217c565b2	v3d,v3dv: stop trying to force 16-bit TMU output for shadow comparisons In V3D we were doing this incorrectly by peeking into the sampler state unconditionally, which is not correct if the TMU operations don't use sampler state at all (like PBOs). This was causing us to fail the second test in this sequence when both tests run back to back in the same process: dEQP-GLES3.functional.texture.shadow.2d.linear.greater_or_equal_depth_component32f dEQP-GLES3.functional.texture.specification.teximage2d_pbo.rg32f_cube Here, the first test would set up sampler state for shadow comparisons and the second test would set up a PBO upload, which would incorrectly pick up the sampler state to decide about the TMU output size for the PBO operation. In V3DV we were doing this right, looking through each texture/sampler instruction and checking if they all involved shadow comparisons or had relaxed precision, defaulting to 32-bit otherwise. This special-casing for shadow comparisons also leaks from drivers into the compiler where we are forced to emit some pieces of sampler state for 32-bit outputs, so we had to special-case shadow instructions there as well and we also had a fix for CS textures not having correct sampler state representing shadow operations too. Finally, we also had at least a couple of bugs where forcing 32-bit TMU output through V3D_DEBUG wasn't correctly forcing shadow comparisons to actually be 32-bit in all the right places, leading to visual bugs with the option enabled (Sponza being one example of this). This change eliminates all of these issues. Finally, the performance improvement observed from special casing shadow comparison is negligible, and in specific scenarios it can even be detrimental to performance due to increased register pressure (Sponza with PCF filtering set to 4 is an example of this again).
Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8684 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22284>	2023-04-05 06:52:51 +00:00
Iago Toral Quiroga	1e28f2a6f2	broadcom/compiler: track pending ldtmu count with each TMU lookup And use this information when scheduling QPU to avoid merging a new TMU request into a previous ldtmu instruction when doing so may cause TMU output fifo overflow due to a stalling ldtmu. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22044>	2023-03-21 11:29:05 +00:00
Daniel Schürmann	2bb369dd8d	nir: add assertions that loops don't have a Continue Construct Hoping that I didn't miss any, this should add assertions to all functions and passes which explicitly handle 'nir_loop'. Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13962>	2023-02-21 10:41:11 +00:00
Alejandro Piñeiro	1a1fa2393e	v3d/v3dv: use shader_info->var_copies_lowered Instead of passing allow_copies as a parameter for v3d_optimize_nir (so manually doing that tracking). Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19338>	2023-02-06 22:11:34 +00:00
Ian Romanick	ea413e826b	nir: Eliminate nir_op_f2b Builds on the work of !15121. This gets to delete even more code because many drivers shared a lot of code for i2b and f2b. No shader-db or fossil-db changes on any Intel platform. v2: Rebase on `1a35acd8d9`. v3: Update a comment in nir_opcodes_c.py. Suggested by Konstantin. v4: Another rebase. Remove f2b stuff from Midgard. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20509>	2023-02-03 22:39:57 +00:00
Alejandro Piñeiro	2901066980	broadcom/compiler: fix indentation at v3d_nir_lower_image_load_store Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20972>	2023-01-30 21:57:45 +00:00
Alejandro Piñeiro	b56be4c37e	broadcom/compiler: treat PIPE_FORMAT_NONE as 32-bit formats for output type Needed to support Vulkan feature shaderStorageImageReadWithoutFormat. With that enabled we could receive a NONE format on a load image. For those we treat them as 32-bit formats, that would mean that the lowering would not need to do any format-specific unpacking. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20744>	2023-01-18 13:09:57 +00:00
Alejandro Piñeiro	41a081380a	broadcom/compiler: v3d_nir_lower_txf_ms doesn't need v3d_compile Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20744>	2023-01-18 13:09:57 +00:00
Iago Toral Quiroga	9bf525b4bd	broadcom/compiler: produce better code for f2f16 with RTZ rounding Suggested by Georg Lehmann, this generates far less code and should be more correct. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8090 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20721>	2023-01-16 12:03:24 +01:00
Iago Toral Quiroga	22ef66bcc9	v3d/compiler: remove unused sample_coverage field from fs key. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20634>	2023-01-11 10:54:05 +00:00
Iago Toral Quiroga	f40afe9883	v3d: add a debug option to optimize shader compile times Particularly, this makes compilation stop as soon as we get a valid shader and doesn't try to optimize spilling by trying fallback strategies. Might come in handy to reduce CTS execution time, for example, dEQP-VK.ssbo.layout.random.8bit.all_per_block_buffers.6 goes from 43m46.715s down to 15m15.068s. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20601>	2023-01-11 10:25:28 +00:00
Ian Romanick	eb76cee9f8	nir: Eliminate nir_op_i2b There are a lot of optimizations in opt_algebraic that match ('ine', a, 0), but there are almost none that match i2b. Instead of adding a huge pile of additional patterns (including variations that include both ine and i2b), always lower i2b to a != 0. At this point in the series, it should be impossible for anything to generate i2b, so there /should not/ be any changes. The failing test on d3d12 is a pre-existing bug that is triggered by this change. I talked to Jesse about it, and, after some analysis, he suggested just adding it to the list of known failures. v2: Don't rematerialize i2b instructions in dxil_nir_lower_x2b. v3: Don't rematerialize i2b instructions in zink_nir_algebraic.py. v4: Fix zink-on-TGL CI failures by calling nir_opt_algebraic after nir_lower_doubles makes progress. The latter can generate b2i instructions, but nir_lower_int64 can't handle them (anymore). v5: Add back most of the hunk at line 2125 of nir_opt_algebraic.py. I had accidentally removed the f2b(bf2(x)) optimization. v6: Just eliminate the i2b instruction. v7: Remove missed i2b32 in midgard_compile.c. Remove (now unused) emit_alu_i2orf2_b1 function from sfn_instr_alu.cpp. Previously this function was still used. 🤷 No shader-db changes on any Intel platform. All Intel platforms had similar results. (Ice Lake shown) Instructions in all programs: 141165875 -> 141165873 (-0.0%) Instructions helped: 2 Cycles in all programs: 9098956382 -> 9098956350 (-0.0%) Cycles helped: 2 The two Vulkan shaders are helped because of the "new" (('b2i32', ('ine', ('ubfe', a, b, 1), 0)), ('ubfe', a, b, 1)) algebraic pattern. Acked-by: Jesse Natalie <jenatali@microsoft.com> [earlier version] Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Tested-by: Daniel Schürmann <daniel@schuermann.dev> [earlier version] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>	2022-12-14 06:23:21 +00:00
Iago Toral Quiroga	1174f37609	broadcom/compiler: avoid using ldvary sequence to hide latency of branching This can cause us to stomp the contents of r5 before we have a chance to read it, like this: 0x3d103186bb800000 nop ; nop ; ldvary.r0 0x3d105686bbf40000 nop ; mov rf26, r5 ; ldvary.r1 0x020000ef0000d000 bu.allna 232, r:unif (0x0000001c / 0.000000) 0x3d1096c6bbf40000 nop ; mov rf27, r5 ; ldvary.r2 Here, the MOV in the last instruction is supposed to read r5 produced from ldvary.r0, but because we have inserted the bu instruction in between now that read happens at the same time that ldvary.r1 updates r5, stomping the value we were supposed to read. Fix this by disallowing injection of a branch instruction in between an ldvary instruction and its write to the r5 register 2 instructions later. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7062 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19616>	2022-11-09 20:51:25 +00:00
Iago Toral Quiroga	c7150ad8e6	broadcom/compiler: drop unused v3d_compile parameter for nir pass Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19519>	2022-11-04 09:58:10 +00:00
Iago Toral Quiroga	8cd50ef071	broadcom/compiler: handle vec2 load/store index In Vulkan, we load descriptors via vulkan resource index, which returns a vec2, of which we want component 0, which holds the actual index. Typically, this will be cleaned up by the time we get to emitting VIR so the index is a single scalar component, but there are some cases where this might not be the case, so make sure we don't assume it to be a scalar, like we do in other places. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19313>	2022-10-28 08:23:32 +02:00
Iago Toral Quiroga	24d9a80247	v3dv: implement VK_EXT_pipeline_robustness Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18883>	2022-10-27 08:17:11 +00:00
Iago Toral Quiroga	1a2ca58aed	v3dv: use NIR_PASS with v3d_nir_lower_robust_image_access Reviewed-by: Eric Engestrom <eric@igalia.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18883>	2022-10-27 08:17:11 +00:00
Alejandro Piñeiro	019529aa11	broadcom/compiler: call nir_opt_gcm with a custom strategy nir_opt_gcm gets us worse shader-db stats, but that is expected. However, we want to avoid getting worse values for spills/fills. Analyzing the outcome with shader-db, this mostly happens with shaders that are already complex and are already spilling/filling. So the best option here is adding a new strategy that falls back if we get spills/fills using nir_opt_gcm. It is not clear in which order we should disable gcm. For now we disable it before loop unrolling. We get a slight performance gain (on average) using nir_opt_gcm. We don't show the shaderdb stats, as they are worse, but as mentioned, this is expected. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>	2022-10-26 12:29:30 +00:00
Alejandro Piñeiro	afc6de356a	broadcom/compiler: pass a strategy struct to vir_compile_init That allows us to reduce the number of parameters of the method. After all, they were already filled using an existing strategy struct. This also makes it easier to add new fields to a strategy. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>	2022-10-26 12:29:30 +00:00
Alejandro Piñeiro	0bf31b0710	broadcom/compiler: add more lowerings/optimizations on v3d_optimize_nir Optimizations that we are already calling on the Vulkan driver. As preparation for the Vulkan frontend to use v3d_optimize_nir too. We need to add a new parameter to v3d_optimize_nir in order to know if we can call nir_opt_find_array_copies. As we don't track if we are calling nir_var_lower_copies, we explicitly call it when we create the uncompiled shader. So instead of tracking, we assume that each driver (v3d/v3dv) would call it when the shader is created. So when v3d_optimize_nir is called as part of the process to compile it at the compiler, we call it with allow_copies as false. We exclude on purpose nir_opt_gcm as it is a case of an optimization that could help performance even if it hurts shader db stats. shaderdb stats: total instructions in shared programs: 11705923 -> 11705034 (<.01%) instructions in affected programs: 88350 -> 87461 (-1.01%) helped: 201 HURT: 80 Instructions are helped. total threads in shared programs: 375552 -> 375558 (<.01%) threads in affected programs: 6 -> 12 (100.00%) helped: 3 HURT: 0 total uniforms in shared programs: 3486108 -> 3485789 (<.01%) uniforms in affected programs: 7473 -> 7154 (-4.27%) helped: 90 HURT: 1 Uniforms are helped. total max-temps in shared programs: 2021860 -> 2021802 (<.01%) max-temps in affected programs: 800 -> 742 (-7.25%) helped: 21 HURT: 3 Max-temps are helped. total sfu-stalls in shared programs: 19299 -> 19296 (-0.02%) sfu-stalls in affected programs: 18 -> 15 (-16.67%) helped: 10 HURT: 7 Inconclusive result (value mean confidence interval includes 0). total inst-and-stalls in shared programs: 11725222 -> 11724330 (<.01%) inst-and-stalls in affected programs: 88402 -> 87510 (-1.01%) helped: 201 HURT: 80 Inst-and-stalls are helped. total nops in shared programs: 269674 -> 269386 (-0.11%) nops in affected programs: 3641 -> 3353 (-7.91%) helped: 103 HURT: 29 Nops are helped.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>	2022-10-26 12:29:30 +00:00
Alejandro Piñeiro	9cbc3ab239	broadcom/compiler: update how we compute return_words_of_texture_data on non-ssa For the non-ssa case, we were trying to use reg->num_components. But this is not the same as nir_ssa_def_components_read. It is the number of components of the destination register. And in the 16bit case, even if nir_lower_tex packs the outcome, it doesn't update the number of components, as nir_tex_instr_dest_size would still return 4. And nir validate would check that those values are the same. So this change focuses on the last part of this comment at nir_lower_tex: * Note that we don't change the destination num_components, because * nir_tex_instr_dest_size() will still return 4. The driver is just * expected to not store the other channels, given that nothing at the * NIR level will read them. We just limit how many channels we would use for the f16 case. It is also worth noting, based on the CTS and different applications we test, that this is a corner case. This was detected when we experimented with enabling nir_opt_gcm for v3d, which triggered an assertion slightly below with some shaderdb tests, but technically it could happen without it. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>	2022-10-26 12:29:30 +00:00
Alejandro Piñeiro	ec10a37a52	broadcom/compiler: don't call nir_opt_load_store_vectorize on all v3d_optimize_nir calls For compute shaders, to avoid a crash with that optimization, it requires doing some optimizations and lowerings before. Example: static void lower_cs_shared(struct nir_shader *nir) { NIR_PASS_V(nir, nir_lower_vars_to_explicit_types, nir_var_mem_shared, shared_type_info); NIR_PASS_V(nir, nir_lower_explicit_io, nir_var_mem_shared, nir_address_format_32bit_offset); } In the same way, other drivers (like anv) call nir_opt_load_store_vectorize as part of their post-process-nir. So one option would be to move nir_opt_load_store_vectorize outside the common v3d_optimize_nir, to a post-process nir method. To make things simpler, this change calls that optimization only if we have a v3d_compiler object, that is, when each frontend has already done its lowerings and calls the v3d_compiler to get the final assembly (so we are already on a kind of post processing nir step). This avoids dEQP-VK.memory_model.shared.basic_types.3 crashing if we start to call v3d_optimize_nir on v3dv directly. Slight shaderdb changes, but not significant. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>	2022-10-26 12:29:30 +00:00
Iago Toral Quiroga	b6093ffbe7	v3dv: expose VK_EXT_image_robustness Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18820>	2022-09-27 09:08:29 +00:00
Iago Toral Quiroga	c7e022abfd	broadcom/compiler: add a lowering for robust image access Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18820>	2022-09-27 09:08:29 +00:00
Iago Toral Quiroga	adcfd9bc2f	broadcom/compiler: rename static helpers involved with robust buffer access To make it explicit that they involve buffers, since we will be adding robust image access shortly. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18820>	2022-09-27 09:08:29 +00:00
Iago Toral Quiroga	5e5eaa3f1a	broadcom/compiler: rename v3d_nir_lower_robust_buffer_access.c We are going to add code to handle image robustness shortly, so better rename this to v3d_nir_lower_robust_access.c Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18820>	2022-09-27 09:08:29 +00:00
Iago Toral Quiroga	4ea916f704	broadcom/compiler: don't apply robust buffer access to shared variables This feature is only concerned with buffers bound through a descriptor set. We are still keeping the code for this (disabled by default) since it may be useful for debugging some scenarios. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18744>	2022-09-23 06:27:54 +00:00
Iago Toral Quiroga	44b02b5cb1	broadcom/compiler: handle shared stores with robust buffer access For some reason we supported all shared intrinsics but this one. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18744>	2022-09-23 06:27:54 +00:00
Iago Toral Quiroga	b2bce9c98a	broadcom/compiler: fix robust buffer access Our implementation was bogus: it was only putting a cap on the offset based on the aligned buffer size, and this doesn't ensure the access to the buffer happens within its valid range. I think the only reason we have been passing the tests is that we align all buffer sizes to 256B and the tests create buffers with a size that is smaller than that (like 64B). When we get the size of the buffer from the shader, we get the actual bound range (so 64B in this case) and by capping to that we don't ensure the access will stay within that range, but we ensure it will stay within the underlying memory bound to the buffer (256B), and this is fine by the spec. However, I think if the actual buffer range was the same as the underlying allocation we would fail the tests. A valid behavior for robust buffer access on an out-of-bounds access is to return any valid bytes within the buffer, so we can just make that offset 0. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18744>	2022-09-23 06:27:54 +00:00
Iago Toral Quiroga	15cdf5bb48	v3dv: optimize ldunif load into unifa write If we emit a ldunif to load the ubo/ssbo base address and then we are immediately moving it to the unifa register we can have the ldunif write directly to unifa and avoid the mov in between, which won't be done by copy propagation because that only works with temp registers. Also, since we can't read from unifa we must be careful to disallow reuse of the ldunif result for a future ldunif of the same base address. We do that by only reusing ldunif results from temp registers. total instructions in shared programs: 12468943 -> 12455139 (-0.11%) instructions in affected programs: 1661233 -> 1647429 (-0.83%) helped: 8307 HURT: 3994 total uniforms in shared programs: 3704532 -> 3704522 (<.01%) uniforms in affected programs: 339 -> 329 (-2.95%) helped: 7 HURT: 0 total max-temps in shared programs: 2148158 -> 2148290 (<.01%) max-temps in affected programs: 9320 -> 9452 (1.42%) helped: 175 HURT: 295 total spills in shared programs: 2202 -> 2202 (0.00%) spills in affected programs: 0 -> 0 helped: 0 HURT: 0 total fills in shared programs: 3059 -> 3057 (-0.07%) fills in affected programs: 27 -> 25 (-7.41%) helped: 1 HURT: 0 total sfu-stalls in shared programs: 21167 -> 21056 (-0.52%) sfu-stalls in affected programs: 497 -> 386 (-22.33%) helped: 209 HURT: 127 total inst-and-stalls in shared programs: 12490110 -> 12476195 (-0.11%) inst-and-stalls in affected programs: 1662875 -> 1648960 (-0.84%) helped: 8312 HURT: 3987 total nops in shared programs: 316563 -> 313553 (-0.95%) nops in affected programs: 24269 -> 21259 (-12.40%) helped: 2158 HURT: 1006 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18667>	2022-09-20 06:56:28 +00:00
Iago Toral Quiroga	cbc5169ef9	broadcom/compiler: check signal writes to magic regs when updating scoreboard We have only been checking magic writes from ADD and MUL ports, but signals can potentially write to magic registers too. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18667>	2022-09-20 06:56:28 +00:00
Eric Engestrom	5bfca00d31	broadcom: fix dependencies in static_library() calls The first argument is the name of the library, and the second argument is the list of files; those two got a bit mixed up. Fixes: `1ae8018a6a` ("meson: Add support for the vc4 driver.") Fixes: `4f3e380fa0` ("meson: Add support for the vc5 driver.") Signed-off-by: Eric Engestrom <eric@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18593>	2022-09-14 09:38:28 +00:00

702 commits