Commit graph

541 commits

Author SHA1 Message Date
Timothy Arceri
a9ed4538ab nir: add indirect loop unrolling to compiler options
This is where it should be rather than having to pass it into the
optimisation pass every time.

It also allows us to call the loop analysis pass without having to
duplicate these options which we will do later in this series.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12064>
2021-08-03 10:54:50 +00:00
Iago Toral Quiroga
d5acae3206 broadcom/compiler: implement nir_intrinsic_load_view_index
This is used for multiview's gl_ViewIndex built-in.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12034>
2021-07-27 07:31:31 +00:00
Juan A. Suarez Romero
dc40157888 broadcom/compiler: emit TMU flush before a jump
Like in the case of emitting a block, process pending TMU operations
before a jump is executed.

Fixes dEQP-VK.graphicsfuzz.stable-binarysearch-tree-nested-if-and-conditional.

Fixes: 197090a3fc ("broadcom/compiler: implement pipelining for general
TMU operations")

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11971>
2021-07-20 10:15:21 +00:00
Iago Toral Quiroga
940725a7d9 broadcom/compiler: implement gl_PrimitiveID in FS without a GS
OpenGL ES 3.1 specifies that a geometry shader can write to gl_PrimitiveID,
which can then be read by a fragment shader.

OpenGL ES 3.2 additionally adds the capacity for the fragment shader
to read gl_PrimitiveID even if there is no geometry shader. This
commit adds support for this feature, which is also implicitly
expected by the geometry shader feature in Vulkan 1.0.

Fixes:
dEQP-VK.pipeline.framebuffer_attachment.no_attachments
dEQP-VK.pipeline.framebuffer_attachment.no_attachments_ms

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11874>
2021-07-14 12:05:56 +00:00
Thomas H.P. Andersen
cf05a7e66f broadcom/compiler: fix add vs. mul
Spotted by a compile warning

Fixes: 7f61ff7b4d ("broadcom/compiler: Merge instructions more efficiently")

Reviewed-by: Iago Torral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11764>
2021-07-12 11:54:03 +00:00
Thomas H.P. Andersen
458801e2c3 broadcom/compiler: use correct flag enum
They have the same value, so no functional change

Reviewed-by: Iago Torral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11764>
2021-07-12 11:54:03 +00:00
Iago Toral Quiroga
ee11e9183d broadcom/compiler: don't ignore constant offset on per-vertex input loads
Fixes:
dEQP-VK.clipping.user_defined.clip_distance.vert_geom.{5,6,7,8}

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11783>
2021-07-12 08:35:56 +02:00
Iago Toral Quiroga
e1a24a0047 broadcom/compiler: handle compact input arrays for geometry shaders
Clip distance arrays will come as compact array variables, so we need
to handle them as such, like we did for vertex inputs.

Fixes:
dEQP-VK.clipping.user_defined.clip_distance.vert_geom.{1,2,3,4}

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11783>
2021-07-12 08:35:56 +02:00
Iago Toral Quiroga
353f0a180f broadcom/compiler: create a helper for computing VPM config
This code is the same across drivers.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11783>
2021-07-12 08:35:55 +02:00
Iago Toral Quiroga
2733a17b14 broadcom/compiler: track if geometry shaders write gl_PointSize
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11783>
2021-07-12 08:35:55 +02:00
Iago Toral Quiroga
8fada5cb21 broadcom/compiler: use nir_sort_variables_with_modes
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11624>
2021-06-29 10:11:58 +00:00
Iago Toral Quiroga
10313b03b5 broadcom/compiler: track if a compute shader uses subgroup functionality
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11620>
2021-06-29 08:43:06 +02:00
Iago Toral Quiroga
5081de07f7 broadcom/compiler: add a set_a_flags_for_subgroup helper
We will need this in the future to implement more subgroup operations,
so make this code available in a helper.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11620>
2021-06-29 08:43:06 +02:00
Iago Toral Quiroga
b9f510087d broadcom/compiler: add a ntq_emit_cond_to_bool helper
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11620>
2021-06-29 08:43:06 +02:00
Iago Toral Quiroga
53341e44ad broadcom/compiler: implement more subgroup intrinsics
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11620>
2021-06-29 08:43:06 +02:00
Iago Toral Quiroga
87fa5908b3 broadcom/compiler: add FLAFIRST and FLNAFIRST opcodes
We will at least need the former to implement subgroupElect()

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11620>
2021-06-29 08:43:06 +02:00
Iago Toral Quiroga
a9ad04f17d broadcom/compiler: lower nir_intrinsic_load_num_subgroups
The number of subgroups is the local workgroup size divided by the
dispatch width.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11620>
2021-06-29 08:43:06 +02:00
Iago Toral Quiroga
30dec8b414 broadcom/compiler: implement nir_intrinsic_load_subgroup_id correctly
For some reason, this was implemented with the bulk of the compute
shader enablement, but this intrinsic is specific to subgroups and
thus was not really used. Also, its implementation was not correct,
since it was returning the element index within the subgroup, not
the subgroup index itself, which is the index of the batch in the
dispatch.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11620>
2021-06-29 08:43:06 +02:00
Iago Toral Quiroga
21b0a4c80c v3dv: don't lower vulkan resource index result to scalar
The intrinsic produces a vec2, so let's honor that and avoid the weird
lowering to scalar and later reconstruction to vec2 when we find
load vulkan descriptor intrinsics.

It fixes tests like this (which require that we expose KHR_spirv_1_4):
dEQP-VK.spirv_assembly.instruction.spirv1p4.opptrequal.null_comparisons_ssbo_equal
that otherwise produce bad code that tries to access a vec2 from the result of
that intrinsic, leading to NIR validation errors.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11257>
2021-06-10 05:47:29 +00:00
Caio Marcelo de Oliveira Filho
8af6766062 nir: Move workgroup_size and workgroup_variable_size into common shader_info
Move it out the "cs" sub-struct, since these will be used for other
shader stages in the future.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11225>
2021-06-08 09:23:55 -07:00
Caio Marcelo de Oliveira Filho
c8a7bd0dc8 nir: Rename WORK_GROUP (and similar) to WORKGROUP
Be consistent with other usages in Vulkan and SPIR-V, and the recently
added workgroup_size field.

Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11190>
2021-06-07 22:34:42 +00:00
Caio Marcelo de Oliveira Filho
430d2206da compiler: Rename local_size to workgroup_size
Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11190>
2021-06-07 22:34:42 +00:00
Eric Anholt
ec3bc5da74 v3d: Use the ra_alloc_contig_reg_class() function to speed up RA.
It means we don't need to do the n^2 loop over the regs to set up the pq
values, nor do we need the register conflicts lists.

Acked-by: Erico Nunes <nunes.erico@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9437>
2021-06-04 19:08:57 +00:00
Eric Anholt
95d41a3525 ra: Use struct ra_class in the public API.
All these unsigned ints are awful to keep track of.  Use pointers so we
get some type checking.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9437>
2021-06-04 19:08:57 +00:00
Alejandro Piñeiro
4e9f1261ee broadcom/compiler: use proper type field for atomic operations
We were using the num_components to infer it, but in the end it is
VEC2 for CMPXCHG and 32BIT for anything else.

This doesn't affect any test with the real hw, but fixes an assert
with the last version of the simulator.

Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11039>
2021-06-01 12:22:22 +02:00
Iago Toral Quiroga
f07c797e93 v3dv: implement vkCmdDispatchBase
This was added with VK_KHR_device_group and allows users to specify
a base offset that will be automatically added to gl_WorkGroupID.

Unfortunately, V3D doesn't support this natively, so we need to add
the base to the workgroup id generated by hardware manually. For this,
we inject add instructions that source from a QUNIFORM that will
retrieve the actual dispatch base from the compute job when it is
dispatched.

Since a compute shader can be dispatched with CmdDispatch and/or
CmdDispatchBase, we always need to add these additional add
instructions and use a base of (0,0,0) for regular dispatches.
Since we don't support any version of OpenGL with this dispatch
base functionality we can avoid the extra instructions there.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11037>
2021-05-31 09:06:18 +00:00
Iago Toral Quiroga
39c41169ba broadcom/compiler: consider RT component size when lowering logic ops in Vulkan
In Vulkan we configure our integer RTs to clamp automatically, so with logic
operations we need to be careful and avoid overflows by discarding any bits
that won't fit in the RT component size.

Fixes remaining CTS test failures in:
dEQP-VK.pipeline.logic_op.*

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10801>
2021-05-18 11:28:17 +00:00
Iago Toral Quiroga
df7185d0d1 broadcom/compiler: don't emit TLB loads for components that don't exist
This avoids debug builds to assert crash. Components that don't exist
won't be used and will be eventually DCEd, so simply lower them to 0.

Fixes CTS tests like these in debug builds:
dEQP-VK.pipeline.logic_op.r8_uint.clear
dEQP-VK.pipeline.logic_op.r8_uint.and
dEQP-VK.pipeline.logic_op.r8_uint.and_reverse
dEQP-VK.pipeline.logic_op.r8_uint.copy
dEQP-VK.pipeline.logic_op.r8_uint.and_inverted
dEQP-VK.pipeline.logic_op.r8_uint.no_op
dEQP-VK.pipeline.logic_op.r8_uint.xor
dEQP-VK.pipeline.logic_op.r8_uint.or

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10801>
2021-05-18 11:28:17 +00:00
Connor Abbott
a40714abf7 nir/lower_phis_to_scalar: Add "lower_all" option
We don't want to have to deal with vector phis in freedreno, because
vectors are always split/unsplit around vectorized instructions anyways,
and the stated reason for not scalarising them (it hurting coalescing)
won't apply to us because we won't be using nir_from_ssa. Add this
option so that we don't have to do the equivalent thing while
translating from NIR.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10809>
2021-05-17 09:59:45 +00:00
Iago Toral Quiroga
6f93354bae broadcom/compiler: clarify PIPE_SHADER_CAP_INDIRECT_INPUT_ADDR setting
We enabled this in the past to fix some register allocation issues we
faced with geometry shaders but we didn't document why it is safe for
us to do this, which is not immediately obvious.

Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10745>
2021-05-11 12:26:19 +02:00
Iago Toral Quiroga
f0fef41917 broadcom/compiler: don't unroll due to indirect indexing of outputs
We can handle this natively now, so there is no point.

Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10723>
2021-05-11 09:31:31 +00:00
Iago Toral Quiroga
0235ed18a7 broadcom/compiler: don't use nir_src_is_dynamically_uniform
Now that we have divergence analysis we should use that.

Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10723>
2021-05-11 09:31:31 +00:00
Iago Toral Quiroga
cb39dca2d3 broadcom/compiler: make vir_VPM_WRITE_indirect handle non-uniform offsets
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10723>
2021-05-11 09:31:31 +00:00
Iago Toral Quiroga
f71893a942 broadcom/compiler: implement non-uniform offset on vertex outputs
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10723>
2021-05-11 09:31:31 +00:00
Iago Toral Quiroga
067ad7eccc broadcom/compiler: move vertex shader output handling to its own function
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10723>
2021-05-11 09:31:31 +00:00
Juan A. Suarez Romero
54ec9c95cf broadcom/compiler: fix dynamic-stack-buffer-overflow error
When spilling a register, the number of temps can be increased when
introducing a temporal variable.

Those nodes are not elegible to be spilled, but we need to take care of
no accessing out-of-bounds of the arrays defined with a size equal to
the original number of temps.

Fixes address sanitizer error on
KHR-GLES3.shaders.uniform_block.random.all_shared_buffer.14 (and many
others).

v2 (Iago):
 - Add clarification in assertion.
 - Use `vir_get_temp` to increase num_temps.

v3 (Iago):
 - Update clarification

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10643>
2021-05-11 07:46:17 +00:00
Iago Toral Quiroga
d81a6e5f1d broadcom/compiler: change register allocation policy for accumulators
The current policy is to always favor accumulators if possible, however,
this is not always optimal.

Particularly, accumulators play a crucial role in enabling QPU instruction
merges, since these are limited to both the ADD and the ALU instructions
addressing at most 2 physical registers. For 2-src instructions, this means
that to be able to merge we need them to address at least 2 accumulators.

While favoring accumulators does help the case for instruction merges in
general, it is risky to assign accumulators to variables that have
long life spans. Doing so will make the accumulator unavailable for
any other instructions during that life span, and since we only have a few
accumulators, we can quickly run out and losing our capacity to merge
instructions for large parts of the qpu program.

On the other hand, we also want to avoid the extreme case were we keep
allocating physical registers to the point we run out, even if we have
accumulators available, since accumulators have additional restrictions
and may not be suitable for everything.

This change continues the policy of favoring accumulators, but it only
does so if the life span of the temps is short, to ensure that we can
recycle accumulators often across instructions and avoid running out
for sections of the QPU code, unless we are already running out of
physical registers.

total instructions in shared programs: 13654647 -> 13336921 (-2.33%)
instructions in affected programs: 11015919 -> 10698193 (-2.88%)
helped: 39758
HURT: 17325
Instructions are helped.

total threads in shared programs: 412046 -> 412038 (<.01%)
threads in affected programs: 16 -> 8 (-50.00%)
helped: 0
HURT: 4
Threads are HURT.

total uniforms in shared programs: 3745726 -> 3746003 (<.01%)
uniforms in affected programs: 17296 -> 17573 (1.60%)
helped: 76
HURT: 99
Uniforms are HURT.

total max-temps in shared programs: 2364430 -> 2359942 (-0.19%)
max-temps in affected programs: 109117 -> 104629 (-4.11%)
helped: 2893
HURT: 772
Max-temps are helped.

total spills in shared programs: 5727 -> 5746 (0.33%)
spills in affected programs: 221 -> 240 (8.60%)
helped: 1
HURT: 2

total fills in shared programs: 13121 -> 13139 (0.14%)
fills in affected programs: 466 -> 484 (3.86%)
helped: 1
HURT: 2

total sfu-stalls in shared programs: 33432 -> 34491 (3.17%)
sfu-stalls in affected programs: 18219 -> 19278 (5.81%)
helped: 4459
HURT: 5087
Inconclusive result

total inst-and-stalls in shared programs: 13688079 -> 13371412 (-2.31%)
inst-and-stalls in affected programs: 11030017 -> 10713350 (-2.87%)
helped: 39630
HURT: 17429
Inst-and-stalls are helped.

total nops in shared programs: 335753 -> 333708 (-0.61%)
nops in affected programs: 112659 -> 110614 (-1.82%)
helped: 8726
HURT: 7383
Inconclusive result

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10686>
2021-05-08 13:15:42 +02:00
Iago Toral Quiroga
c11e479852 broadcom/compiler: specify maximum thread count in compile strategies
Once we have exhausted compile strategies at 4 threads and we start
enabling lower thread counts, there is no point in starting compiles
with 4 threads for them, we know these will fail, so let's start at
2 in these cases.

This also has another nice implication: if the driver compiles at 4
threads and fails to register allocate, we were allowing it to try
with 2 threads, but this would only retry the register allocation
process and would not really recompile the shader with 2 threads. This
is not optimal, because at 2 threads we have more TMU fifo space for
each thread and we can do more TMU pipelining, so we were missing that
opportunity.

This improves performance in Sponza by ~1.5% and also seems to help
UE4 slightly.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10647>
2021-05-06 12:27:06 +02:00
Iago Toral Quiroga
d19ce36ff2 broadcom/compiler: refactor compile strategies
Until now, if we can't compile at 4 threads we would lower thread count
with optimizations disabled, however, lowering thread count doubles the
amount of registers available per thread, so that alone is already a big
relief for register pressure so it makes sense to enable optimizations
when we do that, and progressively disable them until we enable spilling
as a last resort.

This can slightly improve performance for some applications. Sponza,
for example, gets a ~1.5% boost. I see several UE4 shaders that also get
compiled to better code at 2 threads with this, but it is more difficult
to assess how much this improves performance in practice due to the large
variance in frame times that we observe with UE4 demos.

Also, if a compiler strategy disables an optimization that did not make
any progress in the previous compile attempt, we would end up re-compiling
the exact same shader code and failing again. This, patch keeps track of
which strategies won't make progress and skips them in that case to save
some CPU time during shader compiles.

Care should be taken to ensure that we try to compile with the default
NIR scheduler at minimum thread count at least once though, so a specific
strategy for this is added, to prevent the scenario where no optimizations
are used and we skip directly to the fallback scheduler if the default
strategy fails at 4 threads.

Similarly, we now also explicitly specify which strategies are allowed to do
TMU spills and make sure we take this into account when deciding to skip
strategies. This prevents the case where no optimizations are used in a
shader and we skip directly to the fallback scheduler after failing
compilation at 2 threads with the default NIR scheduler but without trying
to spill first.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10647>
2021-05-06 12:27:06 +02:00
Iago Toral Quiroga
296fe4daa6 broadcom/compiler: add a compiler strategy to disable loop unrolling
Loop unrolling can increase register pressure significantly, leading to
lower thread counts and spilling.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10647>
2021-05-06 12:27:06 +02:00
Iago Toral Quiroga
4742300e6b v3d: move NIR compiler options to GL driver
The Vulkan driver was already creating and using its own set of options, so
the ones defined in the compiler are only used with GL, which is confusing.
Move them to the GL driver.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10647>
2021-05-06 12:27:06 +02:00
Iago Toral Quiroga
ec72b876fe broadcom/compiler: add a loop unrolling pass
Right now this is useful for Vulkan onnly, because GL gets loop
unrolling from the GLSL compiler and/or mesa state tracker
NIR front-ends.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10647>
2021-05-06 12:24:29 +02:00
Iago Toral Quiroga
f099fc3e07 v3d: choose a larger CSD supergroup size if possible
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10541>
2021-05-04 15:53:23 +00:00
Iago Toral Quiroga
f514280524 broadcom/compiler: track if a shader has control barriers in prog_data
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10541>
2021-05-04 15:53:23 +00:00
Juan A. Suarez Romero
64943f2063 broadcom/compiler: use VPM offsets in GS load_per_vertex input
Vertex Shader has a store_out lowering pass that converts gallium driver
locations in offsets inside the VPM.

One of the consequences is that these offsets are consecutives; that is,
if the VS stores VARYING_SLOT_VAR0.xyz and VARYING_SLOT_VAR1.xyzw, there
isn't a hole in the VPM offsets for the un-stored VARYING_SLOT_VAR0.w.

Thus we need to change how the VPM offset is computed in the Geometry
Shader when loading the inputs.

This bug is exposed by !9050.

v2 (Iago):
 - Include explanatory comment.
 - Use assert.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10129>
2021-04-13 16:08:00 +00:00
Rhys Perry
a2619b97f5 nir/lower_idiv: add options to use fp32 for 8-bit division lowering
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10081>
2021-04-12 16:19:46 +00:00
Juan A. Suarez Romero
e7f4f1b582 broadcom/compiler: use signed pointers for packed condition
`qpu.raddr_b` is an unsigned int, so it is always positive, even after
casting to signed int.

Fixes CID#1438117 "Operands don't affect result
(CONSTANT_EXPRESSION_RESULT)":

   "result_independent_of_operands: (int)inst->qpu.raddr_b >= -16 is
    always true regardless of the values of its operands. This occurs as
    the logical first operand of "&&".

v2:
 - Use signed pointers (Iago)

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10131>
2021-04-12 15:22:05 +00:00
Iago Toral Quiroga
0a3bfacabb broadcom/compiler: rename unifa tracking fields
The term 'last' may be misleading because the offset represents
the current unifa offset, which is the offset used by the last
load plus 4 bytes, so rename these to use the term 'current'
instead.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10100>
2021-04-09 10:31:40 +00:00
Iago Toral Quiroga
8998666de7 broadcom/compiler: sort constant UBO loads by index and offset
This implements a NIR pass that groups together constant UBO loads
for the same UBO index in order of increasing offset when the distance
between them is small enough that it enables the "skip unifa write"
optimization.

This may increase register pressure because it can move UBO loads
earlier, so we also add a compiler strategy fallback to disable the
optimization if we need to drop thread count to compile the shader
with this optimization enabled.

total instructions in shared programs: 13557555 -> 13550300 (-0.05%)
instructions in affected programs: 814684 -> 807429 (-0.89%)
helped: 4485
HURT: 2377
Instructions are helped.

total uniforms in shared programs: 3777243 -> 3760990 (-0.43%)
uniforms in affected programs: 112554 -> 96301 (-14.44%)
helped: 7226
HURT: 36
Uniforms are helped.

total max-temps in shared programs: 2318133 -> 2333761 (0.67%)
max-temps in affected programs: 63230 -> 78858 (24.72%)
helped: 23
HURT: 3044
Max-temps are HURT.

total sfu-stalls in shared programs: 32245 -> 32567 (1.00%)
sfu-stalls in affected programs: 389 -> 711 (82.78%)
helped: 139
HURT: 451
Inconclusive result.

total inst-and-stalls in shared programs: 13589800 -> 13582867 (-0.05%)
inst-and-stalls in affected programs: 817738 -> 810805 (-0.85%)
helped: 4478
HURT: 2395
Inst-and-stalls are helped.

total nops in shared programs: 354365 -> 342202 (-3.43%)
nops in affected programs: 31000 -> 18837 (-39.24%)
helped: 4405
HURT: 265
Nops are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10100>
2021-04-09 10:31:40 +00:00
Iago Toral Quiroga
fb2214a441 broadcom/compiler: allow compilation strategies to limit minimum thread count
This adds a minimum thread count parameter to each compilation strategy with
the intention to limit the minimum allowed thread count that can be used to
register allocate with that strategy.

For now all strategies allow the minimum thread count supported by the
hardware, but we will be using this infrastructure to impose a more
strict limit in an upcoming optimization.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10100>
2021-04-09 10:31:40 +00:00