fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-18 11:38:06 +02:00

Author	SHA1	Message	Date
Timothy Arceri	a9ed4538ab	nir: add indirect loop unrolling to compiler options This is where it should be rather than having to pass it into the optimisation pass every time. It also allows us to call the loop analysis pass without having to duplicate these options which we will do later in this series. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12064>	2021-08-03 10:54:50 +00:00
Iago Toral Quiroga	d5acae3206	broadcom/compiler: implement nir_intrinsic_load_view_index This is used for multiview's gl_ViewIndex built-in. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12034>	2021-07-27 07:31:31 +00:00
Juan A. Suarez Romero	dc40157888	broadcom/compiler: emit TMU flush before a jump Like in the case of emitting a block, process pending TMU operations before a jump is executed. Fixes dEQP-VK.graphicsfuzz.stable-binarysearch-tree-nested-if-and-conditional. Fixes: `197090a3fc` ("broadcom/compiler: implement pipelining for general TMU operations") Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11971>	2021-07-20 10:15:21 +00:00
Iago Toral Quiroga	940725a7d9	broadcom/compiler: implement gl_PrimitiveID in FS without a GS OpenGL ES 3.1 specifies that a geometry shader can write to gl_PrimitiveID, which can then be read by a fragment shader. OpenGL ES 3.2 additionally adds the capacity for the fragment shader to read gl_PrimitiveID even if there is no geometry shader. This commit adds support for this feature, which is also implicitly expected by the geometry shader feature in Vulkan 1.0. Fixes: dEQP-VK.pipeline.framebuffer_attachment.no_attachments dEQP-VK.pipeline.framebuffer_attachment.no_attachments_ms Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11874>	2021-07-14 12:05:56 +00:00
Thomas H.P. Andersen	cf05a7e66f	broadcom/compiler: fix add vs. mul Spotted by a compile warning Fixes: `7f61ff7b4d` ("broadcom/compiler: Merge instructions more efficiently") Reviewed-by: Iago Torral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11764>	2021-07-12 11:54:03 +00:00
Thomas H.P. Andersen	458801e2c3	broadcom/compiler: use correct flag enum They have the same value, so no functional change Reviewed-by: Iago Torral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11764>	2021-07-12 11:54:03 +00:00
Iago Toral Quiroga	ee11e9183d	broadcom/compiler: don't ignore constant offset on per-vertex input loads Fixes: dEQP-VK.clipping.user_defined.clip_distance.vert_geom.{5,6,7,8} Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11783>	2021-07-12 08:35:56 +02:00
Iago Toral Quiroga	e1a24a0047	broadcom/compiler: handle compact input arrays for geometry shaders Clip distance arrays will come as compact array variables, so we need to handle them as such, like we did for vertex inputs. Fixes: dEQP-VK.clipping.user_defined.clip_distance.vert_geom.{1,2,3,4} Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11783>	2021-07-12 08:35:56 +02:00
Iago Toral Quiroga	353f0a180f	broadcom/compiler: create a helper for computing VPM config This code is the same across drivers. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11783>	2021-07-12 08:35:55 +02:00
Iago Toral Quiroga	2733a17b14	broadcom/compiler: track if geometry shaders write gl_PointSize Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11783>	2021-07-12 08:35:55 +02:00
Iago Toral Quiroga	8fada5cb21	broadcom/compiler: use nir_sort_variables_with_modes Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11624>	2021-06-29 10:11:58 +00:00
Iago Toral Quiroga	10313b03b5	broadcom/compiler: track if a compute shader uses subgroup functionality Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11620>	2021-06-29 08:43:06 +02:00
Iago Toral Quiroga	5081de07f7	broadcom/compiler: add a set_a_flags_for_subgroup helper We will need this in the future to implement more subgroup operations, so make this code available in a helper. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11620>	2021-06-29 08:43:06 +02:00
Iago Toral Quiroga	b9f510087d	broadcom/compiler: add a ntq_emit_cond_to_bool helper Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11620>	2021-06-29 08:43:06 +02:00
Iago Toral Quiroga	53341e44ad	broadcom/compiler: implement more subgroup intrinsics Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11620>	2021-06-29 08:43:06 +02:00
Iago Toral Quiroga	87fa5908b3	broadcom/compiler: add FLAFIRST and FLNAFIRST opcodes We will at least need the former to implement subgroupElect() Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11620>	2021-06-29 08:43:06 +02:00
Iago Toral Quiroga	a9ad04f17d	broadcom/compiler: lower nir_intrinsic_load_num_subgroups The number of subgroups is the local workgroup size divided by the dispatch width. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11620>	2021-06-29 08:43:06 +02:00
Iago Toral Quiroga	30dec8b414	broadcom/compiler: implement nir_intrinsic_load_subgroup_id correctly For some reason, this was implemented with the bulk of the compute shader enablement, but this intrinsic is specific to subgroups and thus was not really used. Also, its implementation was not correct, since it was returning the element index within the subgroup, not the subgroup index itself, which is the index of the batch in the dispatch. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11620>	2021-06-29 08:43:06 +02:00
Iago Toral Quiroga	21b0a4c80c	v3dv: don't lower vulkan resource index result to scalar The intrinsic produces a vec2, so let's honor that and avoid the weird lowering to scalar and later reconstruction to vec2 when we find load vulkan descriptor intrinsics. It fixes tests like this (which require that we expose KHR_spirv_1_4): dEQP-VK.spirv_assembly.instruction.spirv1p4.opptrequal.null_comparisons_ssbo_equal that otherwise produce bad code that tries to access a vec2 from the result of that intrinsic, leading to NIR validation errors. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11257>	2021-06-10 05:47:29 +00:00
Caio Marcelo de Oliveira Filho	8af6766062	nir: Move workgroup_size and workgroup_variable_size into common shader_info Move it out the "cs" sub-struct, since these will be used for other shader stages in the future. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11225>	2021-06-08 09:23:55 -07:00
Caio Marcelo de Oliveira Filho	c8a7bd0dc8	nir: Rename WORK_GROUP (and similar) to WORKGROUP Be consistent with other usages in Vulkan and SPIR-V, and the recently added workgroup_size field. Acked-by: Emma Anholt <emma@anholt.net> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11190>	2021-06-07 22:34:42 +00:00
Caio Marcelo de Oliveira Filho	430d2206da	compiler: Rename local_size to workgroup_size Acked-by: Emma Anholt <emma@anholt.net> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11190>	2021-06-07 22:34:42 +00:00
Eric Anholt	ec3bc5da74	v3d: Use the ra_alloc_contig_reg_class() function to speed up RA. It means we don't need to do the n^2 loop over the regs to set up the pq values, nor do we need the register conflicts lists. Acked-by: Erico Nunes <nunes.erico@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9437>	2021-06-04 19:08:57 +00:00
Eric Anholt	95d41a3525	ra: Use struct ra_class in the public API. All these unsigned ints are awful to keep track of. Use pointers so we get some type checking. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9437>	2021-06-04 19:08:57 +00:00
Alejandro Piñeiro	4e9f1261ee	broadcom/compiler: use proper type field for atomic operations We were using the num_components to infer it, but in the end it is VEC2 for CMPXCHG and 32BIT for anything else. This doesn't affect any test with the real hw, but fixes an assert with the last version of the simulator. Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11039>	2021-06-01 12:22:22 +02:00
Iago Toral Quiroga	f07c797e93	v3dv: implement vkCmdDispatchBase This was added with VK_KHR_device_group and allows users to specify a base offset that will be automatically added to gl_WorkGroupID. Unfortunately, V3D doesn't support this natively, so we need to add the base to the workgroup id generated by hardware manually. For this, we inject add instructions that source from a QUNIFORM that will retrieve the actual dispatch base from the compute job when it is dispatched. Since a compute shader can be dispatched with CmdDispatch and/or CmdDispatchBase, we always need to add these additional add instructions and use a base of (0,0,0) for regular dispatches. Since we don't support any version of OpenGL with this dispatch base functionality we can avoid the extra instructions there. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11037>	2021-05-31 09:06:18 +00:00
Iago Toral Quiroga	39c41169ba	broadcom/compiler: consider RT component size when lowering logic ops in Vulkan In Vulkan we configure our integer RTs to clamp automatically, so with logic operations we need to be careful and avoid overflows by discarding any bits that won't fit in the RT component size. Fixes remaining CTS test failures in: dEQP-VK.pipeline.logic_op.* Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10801>	2021-05-18 11:28:17 +00:00
Iago Toral Quiroga	df7185d0d1	broadcom/compiler: don't emit TLB loads for components that don't exist This avoids debug builds to assert crash. Components that don't exist won't be used and will be eventually DCEd, so simply lower them to 0. Fixes CTS tests like these in debug builds: dEQP-VK.pipeline.logic_op.r8_uint.clear dEQP-VK.pipeline.logic_op.r8_uint.and dEQP-VK.pipeline.logic_op.r8_uint.and_reverse dEQP-VK.pipeline.logic_op.r8_uint.copy dEQP-VK.pipeline.logic_op.r8_uint.and_inverted dEQP-VK.pipeline.logic_op.r8_uint.no_op dEQP-VK.pipeline.logic_op.r8_uint.xor dEQP-VK.pipeline.logic_op.r8_uint.or Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10801>	2021-05-18 11:28:17 +00:00
Connor Abbott	a40714abf7	nir/lower_phis_to_scalar: Add "lower_all" option We don't want to have to deal with vector phis in freedreno, because vectors are always split/unsplit around vectorized instructions anyways, and the stated reason for not scalarising them (it hurting coalescing) won't apply to us because we won't be using nir_from_ssa. Add this option so that we don't have to do the equivalent thing while translating from NIR. Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10809>	2021-05-17 09:59:45 +00:00
Iago Toral Quiroga	6f93354bae	broadcom/compiler: clarify PIPE_SHADER_CAP_INDIRECT_INPUT_ADDR setting We enabled this in the past to fix some register allocation issues we faced with geometry shaders but we didn't document why it is safe for us to do this, which is not immediately obvious. Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10745>	2021-05-11 12:26:19 +02:00
Iago Toral Quiroga	f0fef41917	broadcom/compiler: don't unroll due to indirect indexing of outputs We can handle this natively now, so there is no point. Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10723>	2021-05-11 09:31:31 +00:00
Iago Toral Quiroga	0235ed18a7	broadcom/compiler: don't use nir_src_is_dynamically_uniform Now that we have divergence analysis we should use that. Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10723>	2021-05-11 09:31:31 +00:00
Iago Toral Quiroga	cb39dca2d3	broadcom/compiler: make vir_VPM_WRITE_indirect handle non-uniform offsets Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10723>	2021-05-11 09:31:31 +00:00
Iago Toral Quiroga	f71893a942	broadcom/compiler: implement non-uniform offset on vertex outputs Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10723>	2021-05-11 09:31:31 +00:00
Iago Toral Quiroga	067ad7eccc	broadcom/compiler: move vertex shader output handling to its own function Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10723>	2021-05-11 09:31:31 +00:00
Juan A. Suarez Romero	54ec9c95cf	broadcom/compiler: fix dynamic-stack-buffer-overflow error When spilling a register, the number of temps can be increased when introducing a temporal variable. Those nodes are not elegible to be spilled, but we need to take care of no accessing out-of-bounds of the arrays defined with a size equal to the original number of temps. Fixes address sanitizer error on KHR-GLES3.shaders.uniform_block.random.all_shared_buffer.14 (and many others). v2 (Iago): - Add clarification in assertion. - Use `vir_get_temp` to increase num_temps. v3 (Iago): - Update clarification Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10643>	2021-05-11 07:46:17 +00:00
Iago Toral Quiroga	d81a6e5f1d	broadcom/compiler: change register allocation policy for accumulators The current policy is to always favor accumulators if possible, however, this is not always optimal. Particularly, accumulators play a crucial role in enabling QPU instruction merges, since these are limited to both the ADD and the ALU instructions addressing at most 2 physical registers. For 2-src instructions, this means that to be able to merge we need them to address at least 2 accumulators. While favoring accumulators does help the case for instruction merges in general, it is risky to assign accumulators to variables that have long life spans. Doing so will make the accumulator unavailable for any other instructions during that life span, and since we only have a few accumulators, we can quickly run out and losing our capacity to merge instructions for large parts of the qpu program. On the other hand, we also want to avoid the extreme case were we keep allocating physical registers to the point we run out, even if we have accumulators available, since accumulators have additional restrictions and may not be suitable for everything. This change continues the policy of favoring accumulators, but it only does so if the life span of the temps is short, to ensure that we can recycle accumulators often across instructions and avoid running out for sections of the QPU code, unless we are already running out of physical registers. total instructions in shared programs: 13654647 -> 13336921 (-2.33%) instructions in affected programs: 11015919 -> 10698193 (-2.88%) helped: 39758 HURT: 17325 Instructions are helped. total threads in shared programs: 412046 -> 412038 (<.01%) threads in affected programs: 16 -> 8 (-50.00%) helped: 0 HURT: 4 Threads are HURT. total uniforms in shared programs: 3745726 -> 3746003 (<.01%) uniforms in affected programs: 17296 -> 17573 (1.60%) helped: 76 HURT: 99 Uniforms are HURT. total max-temps in shared programs: 2364430 -> 2359942 (-0.19%) max-temps in affected programs: 109117 -> 104629 (-4.11%) helped: 2893 HURT: 772 Max-temps are helped. total spills in shared programs: 5727 -> 5746 (0.33%) spills in affected programs: 221 -> 240 (8.60%) helped: 1 HURT: 2 total fills in shared programs: 13121 -> 13139 (0.14%) fills in affected programs: 466 -> 484 (3.86%) helped: 1 HURT: 2 total sfu-stalls in shared programs: 33432 -> 34491 (3.17%) sfu-stalls in affected programs: 18219 -> 19278 (5.81%) helped: 4459 HURT: 5087 Inconclusive result total inst-and-stalls in shared programs: 13688079 -> 13371412 (-2.31%) inst-and-stalls in affected programs: 11030017 -> 10713350 (-2.87%) helped: 39630 HURT: 17429 Inst-and-stalls are helped. total nops in shared programs: 335753 -> 333708 (-0.61%) nops in affected programs: 112659 -> 110614 (-1.82%) helped: 8726 HURT: 7383 Inconclusive result Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10686>	2021-05-08 13:15:42 +02:00
Iago Toral Quiroga	c11e479852	broadcom/compiler: specify maximum thread count in compile strategies Once we have exhausted compile strategies at 4 threads and we start enabling lower thread counts, there is no point in starting compiles with 4 threads for them, we know these will fail, so let's start at 2 in these cases. This also has another nice implication: if the driver compiles at 4 threads and fails to register allocate, we were allowing it to try with 2 threads, but this would only retry the register allocation process and would not really recompile the shader with 2 threads. This is not optimal, because at 2 threads we have more TMU fifo space for each thread and we can do more TMU pipelining, so we were missing that opportunity. This improves performance in Sponza by ~1.5% and also seems to help UE4 slightly. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10647>	2021-05-06 12:27:06 +02:00
Iago Toral Quiroga	d19ce36ff2	broadcom/compiler: refactor compile strategies Until now, if we can't compile at 4 threads we would lower thread count with optimizations disabled, however, lowering thread count doubles the amount of registers available per thread, so that alone is already a big relief for register pressure so it makes sense to enable optimizations when we do that, and progressively disable them until we enable spilling as a last resort. This can slightly improve performance for some applications. Sponza, for example, gets a ~1.5% boost. I see several UE4 shaders that also get compiled to better code at 2 threads with this, but it is more difficult to assess how much this improves performance in practice due to the large variance in frame times that we observe with UE4 demos. Also, if a compiler strategy disables an optimization that did not make any progress in the previous compile attempt, we would end up re-compiling the exact same shader code and failing again. This, patch keeps track of which strategies won't make progress and skips them in that case to save some CPU time during shader compiles. Care should be taken to ensure that we try to compile with the default NIR scheduler at minimum thread count at least once though, so a specific strategy for this is added, to prevent the scenario where no optimizations are used and we skip directly to the fallback scheduler if the default strategy fails at 4 threads. Similarly, we now also explicitly specify which strategies are allowed to do TMU spills and make sure we take this into account when deciding to skip strategies. This prevents the case where no optimizations are used in a shader and we skip directly to the fallback scheduler after failing compilation at 2 threads with the default NIR scheduler but without trying to spill first. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10647>	2021-05-06 12:27:06 +02:00
Iago Toral Quiroga	296fe4daa6	broadcom/compiler: add a compiler strategy to disable loop unrolling Loop unrolling can increase register pressure significantly, leading to lower thread counts and spilling. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10647>	2021-05-06 12:27:06 +02:00
Iago Toral Quiroga	4742300e6b	v3d: move NIR compiler options to GL driver The Vulkan driver was already creating and using its own set of options, so the ones defined in the compiler are only used with GL, which is confusing. Move them to the GL driver. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10647>	2021-05-06 12:27:06 +02:00
Iago Toral Quiroga	ec72b876fe	broadcom/compiler: add a loop unrolling pass Right now this is useful for Vulkan onnly, because GL gets loop unrolling from the GLSL compiler and/or mesa state tracker NIR front-ends. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10647>	2021-05-06 12:24:29 +02:00
Iago Toral Quiroga	f099fc3e07	v3d: choose a larger CSD supergroup size if possible Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10541>	2021-05-04 15:53:23 +00:00
Iago Toral Quiroga	f514280524	broadcom/compiler: track if a shader has control barriers in prog_data Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10541>	2021-05-04 15:53:23 +00:00
Juan A. Suarez Romero	64943f2063	broadcom/compiler: use VPM offsets in GS load_per_vertex input Vertex Shader has a store_out lowering pass that converts gallium driver locations in offsets inside the VPM. One of the consequences is that these offsets are consecutives; that is, if the VS stores VARYING_SLOT_VAR0.xyz and VARYING_SLOT_VAR1.xyzw, there isn't a hole in the VPM offsets for the un-stored VARYING_SLOT_VAR0.w. Thus we need to change how the VPM offset is computed in the Geometry Shader when loading the inputs. This bug is exposed by !9050. v2 (Iago): - Include explanatory comment. - Use assert. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10129>	2021-04-13 16:08:00 +00:00
Rhys Perry	a2619b97f5	nir/lower_idiv: add options to use fp32 for 8-bit division lowering Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10081>	2021-04-12 16:19:46 +00:00
Juan A. Suarez Romero	e7f4f1b582	broadcom/compiler: use signed pointers for packed condition `qpu.raddr_b` is an unsigned int, so it is always positive, even after casting to signed int. Fixes CID#1438117 "Operands don't affect result (CONSTANT_EXPRESSION_RESULT)": "result_independent_of_operands: (int)inst->qpu.raddr_b >= -16 is always true regardless of the values of its operands. This occurs as the logical first operand of "&&". v2: - Use signed pointers (Iago) Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10131>	2021-04-12 15:22:05 +00:00
Iago Toral Quiroga	0a3bfacabb	broadcom/compiler: rename unifa tracking fields The term 'last' may be misleading because the offset represents the current unifa offset, which is the offset used by the last load plus 4 bytes, so rename these to use the term 'current' instead. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10100>	2021-04-09 10:31:40 +00:00
Iago Toral Quiroga	8998666de7	broadcom/compiler: sort constant UBO loads by index and offset This implements a NIR pass that groups together constant UBO loads for the same UBO index in order of increasing offset when the distance between them is small enough that it enables the "skip unifa write" optimization. This may increase register pressure because it can move UBO loads earlier, so we also add a compiler strategy fallback to disable the optimization if we need to drop thread count to compile the shader with this optimization enabled. total instructions in shared programs: 13557555 -> 13550300 (-0.05%) instructions in affected programs: 814684 -> 807429 (-0.89%) helped: 4485 HURT: 2377 Instructions are helped. total uniforms in shared programs: 3777243 -> 3760990 (-0.43%) uniforms in affected programs: 112554 -> 96301 (-14.44%) helped: 7226 HURT: 36 Uniforms are helped. total max-temps in shared programs: 2318133 -> 2333761 (0.67%) max-temps in affected programs: 63230 -> 78858 (24.72%) helped: 23 HURT: 3044 Max-temps are HURT. total sfu-stalls in shared programs: 32245 -> 32567 (1.00%) sfu-stalls in affected programs: 389 -> 711 (82.78%) helped: 139 HURT: 451 Inconclusive result. total inst-and-stalls in shared programs: 13589800 -> 13582867 (-0.05%) inst-and-stalls in affected programs: 817738 -> 810805 (-0.85%) helped: 4478 HURT: 2395 Inst-and-stalls are helped. total nops in shared programs: 354365 -> 342202 (-3.43%) nops in affected programs: 31000 -> 18837 (-39.24%) helped: 4405 HURT: 265 Nops are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10100>	2021-04-09 10:31:40 +00:00
Iago Toral Quiroga	fb2214a441	broadcom/compiler: allow compilation strategies to limit minimum thread count This adds a minimum thread count parameter to each compilation strategy with the intention to limit the minimum allowed thread count that can be used to register allocate with that strategy. For now all strategies allow the minimum thread count supported by the hardware, but we will be using this infrastructure to impose a more strict limit in an upcoming optimization. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10100>	2021-04-09 10:31:40 +00:00

1 2 3 4 5 ...

541 commits