fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-01 08:08:06 +02:00

Author	SHA1	Message	Date
Iago Toral Quiroga	21b0a4c80c	v3dv: don't lower vulkan resource index result to scalar The intrinsic produces a vec2, so let's honor that and avoid the weird lowering to scalar and later reconstruction to vec2 when we find load vulkan descriptor intrinsics. It fixes tests like this (which require that we expose KHR_spirv_1_4): dEQP-VK.spirv_assembly.instruction.spirv1p4.opptrequal.null_comparisons_ssbo_equal that otherwise produce bad code that tries to access a vec2 from the result of that intrinsic, leading to NIR validation errors. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11257>	2021-06-10 05:47:29 +00:00
Caio Marcelo de Oliveira Filho	8af6766062	nir: Move workgroup_size and workgroup_variable_size into common shader_info Move them out of the "cs" sub-struct, since these will be used for other shader stages in the future. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11225>	2021-06-08 09:23:55 -07:00
Caio Marcelo de Oliveira Filho	c8a7bd0dc8	nir: Rename WORK_GROUP (and similar) to WORKGROUP Be consistent with other usages in Vulkan and SPIR-V, and the recently added workgroup_size field. Acked-by: Emma Anholt <emma@anholt.net> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11190>	2021-06-07 22:34:42 +00:00
Caio Marcelo de Oliveira Filho	430d2206da	compiler: Rename local_size to workgroup_size Acked-by: Emma Anholt <emma@anholt.net> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11190>	2021-06-07 22:34:42 +00:00
Eric Anholt	ec3bc5da74	v3d: Use the ra_alloc_contig_reg_class() function to speed up RA. It means we don't need to do the n^2 loop over the regs to set up the pq values, nor do we need the register conflicts lists. Acked-by: Erico Nunes <nunes.erico@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9437>	2021-06-04 19:08:57 +00:00
Eric Anholt	95d41a3525	ra: Use struct ra_class in the public API. All these unsigned ints are awful to keep track of. Use pointers so we get some type checking. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9437>	2021-06-04 19:08:57 +00:00
Alejandro Piñeiro	4e9f1261ee	broadcom/compiler: use proper type field for atomic operations We were using the num_components to infer it, but in the end it is VEC2 for CMPXCHG and 32BIT for anything else. This doesn't affect any test with the real hw, but fixes an assert with the last version of the simulator. Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11039>	2021-06-01 12:22:22 +02:00
Iago Toral Quiroga	f07c797e93	v3dv: implement vkCmdDispatchBase This was added with VK_KHR_device_group and allows users to specify a base offset that will be automatically added to gl_WorkGroupID. Unfortunately, V3D doesn't support this natively, so we need to add the base to the workgroup id generated by hardware manually. For this, we inject add instructions that source from a QUNIFORM that will retrieve the actual dispatch base from the compute job when it is dispatched. Since a compute shader can be dispatched with CmdDispatch and/or CmdDispatchBase, we always need to add these additional add instructions and use a base of (0,0,0) for regular dispatches. Since we don't support any version of OpenGL with this dispatch base functionality we can avoid the extra instructions there. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11037>	2021-05-31 09:06:18 +00:00
Iago Toral Quiroga	39c41169ba	broadcom/compiler: consider RT component size when lowering logic ops in Vulkan In Vulkan we configure our integer RTs to clamp automatically, so with logic operations we need to be careful and avoid overflows by discarding any bits that won't fit in the RT component size. Fixes remaining CTS test failures in: dEQP-VK.pipeline.logic_op.* Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10801>	2021-05-18 11:28:17 +00:00
Iago Toral Quiroga	df7185d0d1	broadcom/compiler: don't emit TLB loads for components that don't exist This avoids assertion crashes in debug builds. Components that don't exist won't be used and will be eventually DCEd, so simply lower them to 0. Fixes CTS tests like these in debug builds: dEQP-VK.pipeline.logic_op.r8_uint.clear dEQP-VK.pipeline.logic_op.r8_uint.and dEQP-VK.pipeline.logic_op.r8_uint.and_reverse dEQP-VK.pipeline.logic_op.r8_uint.copy dEQP-VK.pipeline.logic_op.r8_uint.and_inverted dEQP-VK.pipeline.logic_op.r8_uint.no_op dEQP-VK.pipeline.logic_op.r8_uint.xor dEQP-VK.pipeline.logic_op.r8_uint.or Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10801>	2021-05-18 11:28:17 +00:00
Connor Abbott	a40714abf7	nir/lower_phis_to_scalar: Add "lower_all" option We don't want to have to deal with vector phis in freedreno, because vectors are always split/unsplit around vectorized instructions anyways, and the stated reason for not scalarising them (it hurting coalescing) won't apply to us because we won't be using nir_from_ssa. Add this option so that we don't have to do the equivalent thing while translating from NIR. Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10809>	2021-05-17 09:59:45 +00:00
Iago Toral Quiroga	6f93354bae	broadcom/compiler: clarify PIPE_SHADER_CAP_INDIRECT_INPUT_ADDR setting We enabled this in the past to fix some register allocation issues we faced with geometry shaders but we didn't document why it is safe for us to do this, which is not immediately obvious. Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10745>	2021-05-11 12:26:19 +02:00
Iago Toral Quiroga	f0fef41917	broadcom/compiler: don't unroll due to indirect indexing of outputs We can handle this natively now, so there is no point. Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10723>	2021-05-11 09:31:31 +00:00
Iago Toral Quiroga	0235ed18a7	broadcom/compiler: don't use nir_src_is_dynamically_uniform Now that we have divergence analysis we should use that. Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10723>	2021-05-11 09:31:31 +00:00
Iago Toral Quiroga	cb39dca2d3	broadcom/compiler: make vir_VPM_WRITE_indirect handle non-uniform offsets Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10723>	2021-05-11 09:31:31 +00:00
Iago Toral Quiroga	f71893a942	broadcom/compiler: implement non-uniform offset on vertex outputs Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10723>	2021-05-11 09:31:31 +00:00
Iago Toral Quiroga	067ad7eccc	broadcom/compiler: move vertex shader output handling to its own function Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10723>	2021-05-11 09:31:31 +00:00
Juan A. Suarez Romero	54ec9c95cf	broadcom/compiler: fix dynamic-stack-buffer-overflow error When spilling a register, the number of temps can increase when a temporary variable is introduced. Those nodes are not eligible to be spilled, but we need to take care not to access out of bounds the arrays defined with a size equal to the original number of temps. Fixes address sanitizer error on KHR-GLES3.shaders.uniform_block.random.all_shared_buffer.14 (and many others). v2 (Iago): - Add clarification in assertion. - Use `vir_get_temp` to increase num_temps. v3 (Iago): - Update clarification Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10643>	2021-05-11 07:46:17 +00:00
Iago Toral Quiroga	d81a6e5f1d	broadcom/compiler: change register allocation policy for accumulators The current policy is to always favor accumulators if possible, however, this is not always optimal. Particularly, accumulators play a crucial role in enabling QPU instruction merges, since these are limited to both the ADD and the ALU instructions addressing at most 2 physical registers. For 2-src instructions, this means that to be able to merge we need them to address at least 2 accumulators. While favoring accumulators does help the case for instruction merges in general, it is risky to assign accumulators to variables that have long life spans. Doing so will make the accumulator unavailable for any other instructions during that life span, and since we only have a few accumulators, we can quickly run out and lose our capacity to merge instructions for large parts of the QPU program. On the other hand, we also want to avoid the extreme case where we keep allocating physical registers to the point we run out, even if we have accumulators available, since accumulators have additional restrictions and may not be suitable for everything. This change continues the policy of favoring accumulators, but it only does so if the life span of the temps is short, to ensure that we can recycle accumulators often across instructions and avoid running out for sections of the QPU code, unless we are already running out of physical registers. total instructions in shared programs: 13654647 -> 13336921 (-2.33%) instructions in affected programs: 11015919 -> 10698193 (-2.88%) helped: 39758 HURT: 17325 Instructions are helped. total threads in shared programs: 412046 -> 412038 (<.01%) threads in affected programs: 16 -> 8 (-50.00%) helped: 0 HURT: 4 Threads are HURT. total uniforms in shared programs: 3745726 -> 3746003 (<.01%) uniforms in affected programs: 17296 -> 17573 (1.60%) helped: 76 HURT: 99 Uniforms are HURT.
total max-temps in shared programs: 2364430 -> 2359942 (-0.19%) max-temps in affected programs: 109117 -> 104629 (-4.11%) helped: 2893 HURT: 772 Max-temps are helped. total spills in shared programs: 5727 -> 5746 (0.33%) spills in affected programs: 221 -> 240 (8.60%) helped: 1 HURT: 2 total fills in shared programs: 13121 -> 13139 (0.14%) fills in affected programs: 466 -> 484 (3.86%) helped: 1 HURT: 2 total sfu-stalls in shared programs: 33432 -> 34491 (3.17%) sfu-stalls in affected programs: 18219 -> 19278 (5.81%) helped: 4459 HURT: 5087 Inconclusive result total inst-and-stalls in shared programs: 13688079 -> 13371412 (-2.31%) inst-and-stalls in affected programs: 11030017 -> 10713350 (-2.87%) helped: 39630 HURT: 17429 Inst-and-stalls are helped. total nops in shared programs: 335753 -> 333708 (-0.61%) nops in affected programs: 112659 -> 110614 (-1.82%) helped: 8726 HURT: 7383 Inconclusive result Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10686>	2021-05-08 13:15:42 +02:00
Iago Toral Quiroga	c11e479852	broadcom/compiler: specify maximum thread count in compile strategies Once we have exhausted compile strategies at 4 threads and we start enabling lower thread counts, there is no point in starting compiles with 4 threads for them, we know these will fail, so let's start at 2 in these cases. This also has another nice implication: if the driver compiles at 4 threads and fails to register allocate, we were allowing it to try with 2 threads, but this would only retry the register allocation process and would not really recompile the shader with 2 threads. This is not optimal, because at 2 threads we have more TMU fifo space for each thread and we can do more TMU pipelining, so we were missing that opportunity. This improves performance in Sponza by ~1.5% and also seems to help UE4 slightly. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10647>	2021-05-06 12:27:06 +02:00
Iago Toral Quiroga	d19ce36ff2	broadcom/compiler: refactor compile strategies Until now, if we can't compile at 4 threads we would lower thread count with optimizations disabled, however, lowering thread count doubles the amount of registers available per thread, so that alone is already a big relief for register pressure so it makes sense to enable optimizations when we do that, and progressively disable them until we enable spilling as a last resort. This can slightly improve performance for some applications. Sponza, for example, gets a ~1.5% boost. I see several UE4 shaders that also get compiled to better code at 2 threads with this, but it is more difficult to assess how much this improves performance in practice due to the large variance in frame times that we observe with UE4 demos. Also, if a compiler strategy disables an optimization that did not make any progress in the previous compile attempt, we would end up re-compiling the exact same shader code and failing again. This patch keeps track of which strategies won't make progress and skips them in that case to save some CPU time during shader compiles. Care should be taken to ensure that we try to compile with the default NIR scheduler at minimum thread count at least once though, so a specific strategy for this is added, to prevent the scenario where no optimizations are used and we skip directly to the fallback scheduler if the default strategy fails at 4 threads. Similarly, we now also explicitly specify which strategies are allowed to do TMU spills and make sure we take this into account when deciding to skip strategies. This prevents the case where no optimizations are used in a shader and we skip directly to the fallback scheduler after failing compilation at 2 threads with the default NIR scheduler but without trying to spill first. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10647>	2021-05-06 12:27:06 +02:00
Iago Toral Quiroga	296fe4daa6	broadcom/compiler: add a compiler strategy to disable loop unrolling Loop unrolling can increase register pressure significantly, leading to lower thread counts and spilling. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10647>	2021-05-06 12:27:06 +02:00
Iago Toral Quiroga	4742300e6b	v3d: move NIR compiler options to GL driver The Vulkan driver was already creating and using its own set of options, so the ones defined in the compiler are only used with GL, which is confusing. Move them to the GL driver. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10647>	2021-05-06 12:27:06 +02:00
Iago Toral Quiroga	ec72b876fe	broadcom/compiler: add a loop unrolling pass Right now this is useful for Vulkan only, because GL gets loop unrolling from the GLSL compiler and/or mesa state tracker NIR front-ends. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10647>	2021-05-06 12:24:29 +02:00
Iago Toral Quiroga	f099fc3e07	v3d: choose a larger CSD supergroup size if possible Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10541>	2021-05-04 15:53:23 +00:00
Iago Toral Quiroga	f514280524	broadcom/compiler: track if a shader has control barriers in prog_data Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10541>	2021-05-04 15:53:23 +00:00
Juan A. Suarez Romero	64943f2063	broadcom/compiler: use VPM offsets in GS load_per_vertex input The vertex shader has a store_out lowering pass that converts gallium driver locations into offsets inside the VPM. One of the consequences is that these offsets are consecutive; that is, if the VS stores VARYING_SLOT_VAR0.xyz and VARYING_SLOT_VAR1.xyzw, there isn't a hole in the VPM offsets for the un-stored VARYING_SLOT_VAR0.w. Thus we need to change how the VPM offset is computed in the Geometry Shader when loading the inputs. This bug is exposed by !9050. v2 (Iago): - Include explanatory comment. - Use assert. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10129>	2021-04-13 16:08:00 +00:00
Rhys Perry	a2619b97f5	nir/lower_idiv: add options to use fp32 for 8-bit division lowering Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10081>	2021-04-12 16:19:46 +00:00
Juan A. Suarez Romero	e7f4f1b582	broadcom/compiler: use signed pointers for packed condition `qpu.raddr_b` is an unsigned int, so it is always positive, even after casting to signed int. Fixes CID#1438117 "Operands don't affect result (CONSTANT_EXPRESSION_RESULT)": "result_independent_of_operands: (int)inst->qpu.raddr_b >= -16 is always true regardless of the values of its operands. This occurs as the logical first operand of "&&". v2: - Use signed pointers (Iago) Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10131>	2021-04-12 15:22:05 +00:00
Iago Toral Quiroga	0a3bfacabb	broadcom/compiler: rename unifa tracking fields The term 'last' may be misleading because the offset represents the current unifa offset, which is the offset used by the last load plus 4 bytes, so rename these to use the term 'current' instead. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10100>	2021-04-09 10:31:40 +00:00
Iago Toral Quiroga	8998666de7	broadcom/compiler: sort constant UBO loads by index and offset This implements a NIR pass that groups together constant UBO loads for the same UBO index in order of increasing offset when the distance between them is small enough that it enables the "skip unifa write" optimization. This may increase register pressure because it can move UBO loads earlier, so we also add a compiler strategy fallback to disable the optimization if we need to drop thread count to compile the shader with this optimization enabled. total instructions in shared programs: 13557555 -> 13550300 (-0.05%) instructions in affected programs: 814684 -> 807429 (-0.89%) helped: 4485 HURT: 2377 Instructions are helped. total uniforms in shared programs: 3777243 -> 3760990 (-0.43%) uniforms in affected programs: 112554 -> 96301 (-14.44%) helped: 7226 HURT: 36 Uniforms are helped. total max-temps in shared programs: 2318133 -> 2333761 (0.67%) max-temps in affected programs: 63230 -> 78858 (24.72%) helped: 23 HURT: 3044 Max-temps are HURT. total sfu-stalls in shared programs: 32245 -> 32567 (1.00%) sfu-stalls in affected programs: 389 -> 711 (82.78%) helped: 139 HURT: 451 Inconclusive result. total inst-and-stalls in shared programs: 13589800 -> 13582867 (-0.05%) inst-and-stalls in affected programs: 817738 -> 810805 (-0.85%) helped: 4478 HURT: 2395 Inst-and-stalls are helped. total nops in shared programs: 354365 -> 342202 (-3.43%) nops in affected programs: 31000 -> 18837 (-39.24%) helped: 4405 HURT: 265 Nops are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10100>	2021-04-09 10:31:40 +00:00
Iago Toral Quiroga	fb2214a441	broadcom/compiler: allow compilation strategies to limit minimum thread count This adds a minimum thread count parameter to each compilation strategy with the intention to limit the minimum allowed thread count that can be used to register allocate with that strategy. For now all strategies allow the minimum thread count supported by the hardware, but we will be using this infrastructure to impose a more strict limit in an upcoming optimization. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10100>	2021-04-09 10:31:40 +00:00
Iago Toral Quiroga	4b244dc64f	broadcom/compiler: add a definition for the unifa skip distance We will be using this distance to set up another optimization in a follow-up patch. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10100>	2021-04-09 10:31:40 +00:00
Juan A. Suarez Romero	cc8d4cd1ae	broadcom/compiler: fix first_component assertion first_component is a uint, so if it has value 0 we can't know whether that is because writemask has its first bit set to 1 or all bits set to 0. As we want to ensure that at least one bit is set, apply the assertion to writemask. Fixes CID#1472829 "Macro compares unsigned to 0 (NO_EFFECT)". v2: - Restore "first_component <= last_component" assertion (Iago) Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10103>	2021-04-09 07:55:41 +00:00
Bas Nieuwenhuizen	580f1ac473	nir: Extract shader_info->cs.shared_size out of union. It is valid for all stages, just 0 for most of them. In particular mesh/task shaders might be using it. Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10094>	2021-04-08 14:39:28 +00:00
Iago Toral Quiroga	9ca0e070e7	broadcom/compiler: optimize branch emission for uniform break/continue A break/continue in a loop is typically emitted like this: if (cond) { break/continue; } else { } If cond is uniform, we'll emit code for a uniform if statement and that will emit a branch right before the if to jump directly to the else (or the block after the else in this case, since the else is empty) in case cond evaluates to false. This means we end up emitting two consecutive branch instructions, one before the if and one for the THEN block right after: branch(!cond) -> jump to else (or after else) if cond is false nop nop nop branch -> unconditional jump to break/continue nop nop nop Instead, if we are in this scenario, we can do better by emitting the conditional jump directly and avoiding the "jump to else" case: branch(cond) -> jump to break/continue if cond is true nop nop nop We need to be careful when emitting the break/continue for the case where all lanes are disabled to avoid infinite loops: if we have a break we always want to take the jump, but we don't want to take it if it is a continue. total instructions in shared programs: 13563672 -> 13557348 (-0.05%) instructions in affected programs: 348034 -> 341710 (-1.82%) helped: 1158 HURT: 10 Instructions are helped. total uniforms in shared programs: 3779137 -> 3777535 (-0.04%) uniforms in affected programs: 90583 -> 88981 (-1.77%) helped: 1169 HURT: 0 Uniforms are helped. total max-temps in shared programs: 2317670 -> 2317575 (<.01%) max-temps in affected programs: 1943 -> 1848 (-4.89%) helped: 85 HURT: 4 Max-temps are helped. total sfu-stalls in shared programs: 32247 -> 32247 (0.00%) sfu-stalls in affected programs: 69 -> 69 (0.00%) helped: 7 HURT: 9 Inconclusive result (value mean confidence interval includes 0). total inst-and-stalls in shared programs: 13595919 -> 13589595 (-0.05%) inst-and-stalls in affected programs: 350674 -> 344350 (-1.80%) helped: 1154 HURT: 11 Inst-and-stalls are helped. 
total nops in shared programs: 358202 -> 354325 (-1.08%) nops in affected programs: 17367 -> 13490 (-22.32%) helped: 1168 HURT: 1 Nops are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9948>	2021-04-05 06:38:19 +00:00
Iago Toral Quiroga	14843ccc33	broadcom/compiler: implement restriction for branch after setmsf Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9948>	2021-04-05 06:38:19 +00:00
Iago Toral Quiroga	8f7640293d	broadcom/compiler: try to fill up delay slots after unconditional branch If we have an unconditional branch then we can try to fill up its delay slots with the initial instructions of its successor block by copying them into the delay slots and adjusting the branch offset to skip the copied instructions. total nops in shared programs: 365640 -> 364471 (-0.32%) nops in affected programs: 15416 -> 14247 (-7.58%) helped: 462 HURT: 0 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9918>	2021-03-31 05:51:22 +00:00
Iago Toral Quiroga	e266e6c634	broadcom/compiler: try to fill up delay slots after a branch instruction For this we do something similar to what we do with thrsw where we try to move the branch instruction earlier so the previous instructions execute in the delay slots of the branch. Generally, we can do this with any instruction except: - If the instruction reads a uniform: since our branches do as well and uniforms come from an ordered FIFO stream. - If the instruction writes flags, since our branch instruction will probably read them. - If the instruction is in the delay slots of another thread switch, branch, or unifa write, which is disallowed. total instructions in shared programs: 13648140 -> 13613972 (-0.25%) instructions in affected programs: 2209552 -> 2175384 (-1.55%) helped: 6765 HURT: 0 Instructions are helped. total max-temps in shared programs: 2318687 -> 2318436 (-0.01%) max-temps in affected programs: 5046 -> 4795 (-4.97%) helped: 152 HURT: 0 Max-temps are helped. total inst-and-stalls in shared programs: 13680494 -> 13646326 (-0.25%) inst-and-stalls in affected programs: 2220394 -> 2186226 (-1.54%) helped: 6765 HURT: 0 Inst-and-stalls are helped. total nops in shared programs: 399818 -> 365640 (-8.55%) nops in affected programs: 127311 -> 93133 (-26.85%) helped: 6765 HURT: 0 Nops are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9918>	2021-03-31 05:51:22 +00:00
Iago Toral Quiroga	f33ca092da	broadcom/compiler: add a NOP count stat to shader-db Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9918>	2021-03-31 05:51:22 +00:00
Iago Toral Quiroga	062eee7d33	broadcom/compiler: dump instruction index when failing to pack instructions Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9918>	2021-03-31 05:51:22 +00:00
Juan A. Suarez Romero	cc1f070a27	broadcom/compiler: fix unused value Do not assign to a variable that won't be used. Fixes CID#1451708 and CID#1451710 "Unused value (UNUSED_VALUE)". Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9910>	2021-03-30 14:15:43 +00:00
Iago Toral Quiroga	c3c251d98f	broadcom/compiler: flag TMU reads with a read dependency on last TMU config We were using a write dependency to ensure ordering since LDTMU sequences are ordered, but by using a write dependency with TMU config we were also preserving ordering with TMU config writes that are not a sequence terminator, which is not required and reduces scheduling flexibility. Instead, use a write dependency to ensure strict ordering of TMU reads, but only a read dependency with TMU config. With this change we also need to update CS barriers to also have a write dependency with TMU reads to ensure that we don't move TMU reads around CS barriers. total instructions in shared programs: 13602500 -> 13597851 (-0.03%) instructions in affected programs: 2681428 -> 2676779 (-0.17%) helped: 6567 HURT: 4960 Instructions are helped. total max-temps in shared programs: 2317927 -> 2317914 (<.01%) max-temps in affected programs: 13861 -> 13848 (-0.09%) helped: 355 HURT: 300 Inconclusive result (value mean confidence interval includes 0). total sfu-stalls in shared programs: 32074 -> 32247 (0.54%) sfu-stalls in affected programs: 848 -> 1021 (20.40%) helped: 160 HURT: 327 Inconclusive result (%-change mean confidence interval includes 0). total inst-and-stalls in shared programs: 13634574 -> 13630098 (-0.03%) inst-and-stalls in affected programs: 2703041 -> 2698565 (-0.17%) helped: 6558 HURT: 5020 Inst-and-stalls are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9856>	2021-03-29 06:21:22 +00:00
Iago Toral Quiroga	cbf9a7c3c1	broadcom/compiler: flag TMU read dependencies against last TMU config Instead of last TMU write. According to the documentation, the entries in the output FIFO are pushed with the final input write for the lookup, which is the one terminating the sequence. We flag these with last_tmu_config. This will allow us to move all TMU register writes for a lookup except the last one ahead of the LDTMUs for the previous lookup, possibly allowing us to pair up these writes with the wrtmuc instructions for the same lookup, turning code like this: nop ; nop ; wrtmuc (tex[0].p0 | 0x3) nop ; nop ; wrtmuc (tex[2].p1 | 0x1) nop ; nop ; ldunif (ubo[2]+0xe0) fadd r4, rf33, rf51 ; mov unifa, r5 ; ldunif (ubo[2]+0x110) fmax rf34, 0, r4 ; nop nop ; mov tmut, rf11 nop ; mov tmus, rf0 into: nop ; mov tmut, rf11 ; wrtmuc (tex[0].p0 | 0x3) nop ; nop ; wrtmuc (tex[2].p1 | 0x1) nop ; nop ; ldunif (ubo[2]+0xe0) fadd r4, rf33, rf51 ; mov unifa, r5 ; ldunif (ubo[2]+0x110) fmax rf34, 0, r4 ; nop nop ; mov tmus, rf0 total instructions in shared programs: 13648140 -> 13602500 (-0.33%) instructions in affected programs: 3497402 -> 3451762 (-1.30%) helped: 12044 HURT: 3484 Instructions are helped. total max-temps in shared programs: 2318687 -> 2317927 (-0.03%) max-temps in affected programs: 17234 -> 16474 (-4.41%) helped: 615 HURT: 198 Max-temps are helped. total sfu-stalls in shared programs: 32354 -> 32074 (-0.87%) sfu-stalls in affected programs: 1462 -> 1182 (-19.15%) helped: 461 HURT: 188 Sfu-stalls are helped. total inst-and-stalls in shared programs: 13680494 -> 13634574 (-0.34%) inst-and-stalls in affected programs: 3514405 -> 3468485 (-1.31%) helped: 12062 HURT: 3486 Inst-and-stalls are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9856>	2021-03-29 06:21:22 +00:00
Iago Toral Quiroga	e790c20403	broadcom/compiler: try to fill up delay slots after a thrsw The way we handle thrsw instructions is that we try to merge them back into previously scheduled instructions to fill up its delay slots. This is generally safe, because the thrsw won't happen until after the delay slots, so we are not really changing the execution order of the instructions and we just need to make sure we don't violate a few specific restrictions. If we have not managed to fill up all delay slots after doing this, then we emit as many NOPs as needed to fill them. This is to ensure that we don't schedule an instruction that needs to execute after the thread switch before the thread switch happens. However, doing this can lead to inefficient code, since sometimes the instructions we schedule after a thrsw are independent of the thrsw and could be safely executed in its delay slots. This change removes the fixed NOP emission after a thrsw to fill delay slots and instead adds code to ensure that our instruction scheduling is aware of when it is scheduling instructions in the delay slots of a previous thrsw to avoid selecting conflicting instructions. The only case where we still emit fixed NOPs is for the thread end that we emit to terminate the program after scheduling all instructions because we can't end the instruction stream before the thread end is properly executed. total instructions in shared programs: 13691004 -> 13648140 (-0.31%) instructions in affected programs: 4345951 -> 4303087 (-0.99%) helped: 19645 HURT: 652 Instructions are helped. total max-temps in shared programs: 2319317 -> 2318687 (-0.03%) max-temps in affected programs: 10510 -> 9880 (-5.99%) helped: 532 HURT: 9 Max-temps are helped. total sfu-stalls in shared programs: 31752 -> 32354 (1.90%) sfu-stalls in affected programs: 840 -> 1442 (71.67%) helped: 7 HURT: 467 Sfu-stalls are HURT.
total inst-and-stalls in shared programs: 13722756 -> 13680494 (-0.31%) inst-and-stalls in affected programs: 4335590 -> 4293328 (-0.97%) helped: 19453 HURT: 758 Inst-and-stalls are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9825>	2021-03-26 07:13:07 +00:00
Iago Toral Quiroga	22a979be65	broadcom/compiler: convert add to mul when possible to allow merge Integer add/sub can be implemented as either an add or a mul instruction but we always emit them as add instructions at VIR level. We can use this flexibility to improve our QPU scheduling so we can be more effective at instruction merging by converting these to mul instructions when we are attempting to merge them with another add instruction. total instructions in shared programs: 13721549 -> 13691004 (-0.22%) instructions in affected programs: 3340493 -> 3309948 (-0.91%) helped: 12805 HURT: 1656 Instructions are helped. total max-temps in shared programs: 2319528 -> 2319317 (<.01%) max-temps in affected programs: 5285 -> 5074 (-3.99%) helped: 195 HURT: 3 Max-temps are helped. total sfu-stalls in shared programs: 31616 -> 31752 (0.43%) sfu-stalls in affected programs: 469 -> 605 (29.00%) helped: 52 HURT: 161 Sfu-stalls are HURT. total inst-and-stalls in shared programs: 13753165 -> 13722756 (-0.22%) inst-and-stalls in affected programs: 3340383 -> 3309974 (-0.91%) helped: 12782 HURT: 1666 Inst-and-stalls are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9769>	2021-03-25 09:51:42 +00:00
Alejandro Piñeiro	b71fd5587e	broadcom/compiler: add driver_location_map at vs prog data This maps the nir shader data.location to its final data.driver_location. In general we are using the driver location as index (like vattr_sizes on the same struct), so having this map is useful if what we have is the data.location, and we don't have available the original nir shader. v2: use memset instead of for loop, and nir_foreach_shader_in_variable instead of nir_foreach_variable_with_modes (Iago) Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>	2021-03-22 17:10:47 +00:00
Alejandro Piñeiro	2be0c36775	broadcom/compiler: add local_size in v3d_compute_prog_data As we plan to try to get directly the compiled variant from the cache, it would be possible to not have available the nir shaders, so we add this info on prog data. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>	2021-03-22 17:10:47 +00:00
Iago Toral Quiroga	cbe24a0e9c	broadcom/compiler: use nir_lower_undef_to_zero total instructions in shared programs: 13731663 -> 13721549 (-0.07%) instructions in affected programs: 98242 -> 88128 (-10.29%) helped: 191 HURT: 131 Instructions are helped. total threads in shared programs: 412272 -> 412296 (<.01%) threads in affected programs: 24 -> 48 (100.00%) helped: 12 HURT: 0 Threads are helped. total uniforms in shared programs: 3780693 -> 3779137 (-0.04%) uniforms in affected programs: 10564 -> 9008 (-14.73%) helped: 114 HURT: 7 Uniforms are helped. total max-temps in shared programs: 2319942 -> 2319528 (-0.02%) max-temps in affected programs: 4191 -> 3777 (-9.88%) helped: 113 HURT: 22 Max-temps are helped. total sfu-stalls in shared programs: 31584 -> 31616 (0.10%) sfu-stalls in affected programs: 217 -> 249 (14.75%) helped: 51 HURT: 54 Inconclusive result (value mean confidence interval includes 0). total inst-and-stalls in shared programs: 13763247 -> 13753165 (-0.07%) inst-and-stalls in affected programs: 98719 -> 88637 (-10.21%) helped: 187 HURT: 134 Inst-and-stalls are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9681>	2021-03-22 12:17:13 +00:00
Iago Toral Quiroga	1c987f5db3	broadcom/compiler: optimize constant vfpack total instructions in shared programs: 13733627 -> 13731663 (-0.01%) instructions in affected programs: 174140 -> 172176 (-1.13%) helped: 1597 HURT: 310 Instructions are helped. total uniforms in shared programs: 3784601 -> 3780693 (-0.10%) uniforms in affected programs: 58678 -> 54770 (-6.66%) helped: 2886 HURT: 3 Uniforms are helped. total max-temps in shared programs: 2322714 -> 2319942 (-0.12%) max-temps in affected programs: 15729 -> 12957 (-17.62%) helped: 2189 HURT: 1 Max-temps are helped. total spills in shared programs: 6010 -> 6012 (0.03%) spills in affected programs: 61 -> 63 (3.28%) helped: 0 HURT: 1 total fills in shared programs: 13494 -> 13497 (0.02%) fills in affected programs: 89 -> 92 (3.37%) helped: 0 HURT: 1 total sfu-stalls in shared programs: 31521 -> 31584 (0.20%) sfu-stalls in affected programs: 328 -> 391 (19.21%) helped: 30 HURT: 94 Inconclusive result (%-change mean confidence interval includes 0). total inst-and-stalls in shared programs: 13765148 -> 13763247 (-0.01%) inst-and-stalls in affected programs: 174237 -> 172336 (-1.09%) helped: 1551 HURT: 316 Inst-and-stalls are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9681>	2021-03-22 12:17:13 +00:00

523 commits