This follows the same idea as for TMU general instructions of reusing
the existing infrastructure to first count required register writes and
flush outstanding TMU dependencies, and then emit the actual writes, which
requires that we split the code that decides about register writes to
a helper.
We also need to start using a component mask instead of the number
of components that we need to read with a particular TMU operation.
v2: update tmu_writes for V3D_QPU_WADDR_TMUOFF
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>
This creates the basic infrastructure to implement TMU pipelining and
applies it to general TMU. Follow-up patches will expand this
to texture and image/load store operations.
TMU pipelining means that we don't immediately end TMU sequences,
and instead, we postpone the thread switch and LDTMU (for loads)
or TMUWT (for stores) until we really need to do them.
For loads, we may need to flush them if another instruction reads
the result of a load operation. We can detect this because in that
case ntq_get_src() will not find the definition for that ssa/reg
(since we have not emitted the LDTMU instructions for it yet), so
when that happens, we flush all pending TMU operations and then
try again to find the definition for the source.
We also need to flush pending TMU operations when we reach the end
of a control flow block, to prevent the case where we emit a TMU
operation in a block, but then we read the result in another block
possibly under control flow.
It is also required to flush across barriers and discards to honor
their semantics.
Since this change doesn't implement pipelining for texture and
image load/store, we also need to flush outstanding TMU operations
if we ever have to emit one of these. This will be corrected with
follow-up patches.
Finally, the TMU has 3 fifos where it can queue TMU operations.
These fifos have limited capacity, depending on the number of threads
used to compile the shader, so we also need to ensure that we
don't have too many outstanding TMU requests and flush pending
TMU operations if a new TMU operation would overflow any of these
fifos. While overflowing the Input and Config fifos only leads
to stalls (which we want to avoid anyway), overflowing the Output
fifo is incorrect and would end up with a broken shader. This means
that we need to know how many TMU register writes are required
to emit a TMU operation and use that information to decide if we need
to flush pending TMU operations before we emit any register
writes for the new TMU operation.
v2: fix TMU flushing for NIR registers reads (jasuarez)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>
Follow-up patches will implement support for TMU pipelining in the
compiler, which basically means that we will be able to have more
than one outstanding TMU operation.
Our spilling code currently relies on properly identifying the end
of a TMU sequence (since we can't emit a new TMU sequence for a spill
in the middle of an existing TMU sequence), however, that code expects
that only one TMU sequence may be outstanding, which won't be true
once we implement pipelining.
This change fixes the 'end of TMU sequence' checks to account for this
in preparation for upcoming patches.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>
Where it is safe to do so, avoid the generation of code to convert a
condition code into a boolean which is then tested to generate a
condition code. This is only done in uniform ifs, and only for condition
values that are SSA and only used once (in that if statement).
shader-db relative to MR 7726:
total instructions in shared programs: 8985667 -> 8974151 (-0.13%)
instructions in affected programs: 390140 -> 378624 (-2.95%)
helped: 810
HURT: 276
helped stats (abs) min: 1 max: 49 x̄: 17.77 x̃: 16
helped stats (rel) min: 0.10% max: 33.63% x̄: 7.97% x̃: 6.45%
HURT stats (abs) min: 1 max: 46 x̄: 10.42 x̃: 10
HURT stats (rel) min: 0.16% max: 21.54% x̄: 2.26% x̃: 2.03%
95% mean confidence interval for instructions value: -11.46 -9.75
95% mean confidence interval for instructions %-change: -5.76% -4.97%
Instructions are helped.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8709>
Now that all the drivers are converted, it's set to 'vk' by everyone so
there's no point in having the parameter.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8676>
Now that all drivers are converted over, we can make a few changes.
First off, vk_device_init no longer takes two separate allocators
because we can assume that the parent instance is non-null and it can
pull the instance allocator from that. Second, dispatch tables and the
instance extension table are no longer optional. We leave the device
extension table optional for now because we don't do any verification at
vk_init_physical_device time and some drivers find it more convenient to
set the extensions later in their own physical_device_init for various
reasons.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8676>
This moves v3dv over to using the new common dispatch layer code.
v2 (Jason Ekstrand):
- Remove some now dead function declarations
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8676>
As we already have a reference to vk_instance at vk_physical_device,
that we are setting when calling vk_physical_device_init.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8676>
This moves to using the common base structs for these two objects, but
doesn't use any of the new features yet.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8676>
Things are going to start getting more complicated so let's avoid the
single mega-file approach.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8676>
Larger arrays get lowered to scratch access later, as that turns out
to be faster, while the if-chain is actually faster for arrays of
2 elements.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7726>
PER_QUAD TMU lookups will partially override the predication mask on TMU
writes. If some but not all lanes in a quad are predicated out, setting
PER_QUAD will force them all to be enabled. This can result in TMU
access to bogus addresses when in nonuniform control flow. Also, since
PER_QUAD is needed to make sure derivatives work with helper
invocations, and derivatives are undefined in nonuniform control flow,
there is no reason to leave it enabled in this case.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7726>
Similarly to if statements, uniform loops are now emitted without
predication, using simple branches for breaks and continues. The
uniformity of the loop is determined by running the
nir_divergence_analysis pass.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7726>
Fix defect reported by Coverity Scan.
Side effect in assertion (ASSERT_SIDE_EFFECT)
assignment_where_comparison_intended: Assignment job->ez_state =
VC5_EZ_DISABLED has a side effect. This code will work differently in a
non-debug build.
Fixes: cec2ed7c80 ("v3dv: fix disabling Early Z for the whole frame")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8666>
It is currently a bitset on top of a uint64_t but there are already
more than 64 values. Change to use BITSET to cover all the
SYSTEM_VALUE_MAX bits.
Cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8585>
From vkCmdBindPipeline spec:
"pipelineBindPoint is a VkPipelineBindPoint value specifying to
which bind point the pipeline is bound. Binding one does not disturb
the others."
But internally we were only handling one pipeline per command buffer,
so binding a pipeline of one type would override an alredy bound
pipeline of other type.
Note that for push constants, in the same way that we were keeping one
client array and one bo for the values, for all stages, independently
of the stageFlags specified by vkCmdPushConstants, we are keeping the
same idea here, so such client array and bo is still tied to the
command buffer, and used by the two pipeline bind points. That makes
far easier tracking the push constants. We could revisit in the future
if we want a more fine grained tracking.
Fixes the following crashes:
dEQP-VK.pipeline.push_constant.lifetime.pipeline_change_diff_range_bind_push_vert_and_comp
dEQP-VK.pipeline.push_constant.lifetime.pipeline_change_same_range_bind_push_vert_and_comp
v2 (from Iago review)
* Move removal of v3dv_resource definition to a different commit.
* Use the new v3dv_cmd_pipeline_state on the cmd buffer meta
sub-struct, call it gfx for consistency
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8613>
Although I assume that this should be caught by the validation layers,
recently while triaging the following tests:
dEQP-VK.ycbcr.query.*r8g8b8a8_unorm*
We found that they were setting a descriptorCount of zero, because it
was not handling correctly the differences between Vulkan 1.0 and
Vulkan 1.1.
So let's just assert, just in case it happens again, as that would
make the bugfixing far easier.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8614>
The documentation states that if we disable Early Z for the whole
frame in the RCL Tile Rendering Mode packet, then we should not
emit any draw calls with it enabled (which we can do by enabling
it in the CFG_BITS packet).
Since we emit our RCL after recording our draw calls in the BCL
and we were not considering there if any condition for global disable
would be met, it was possible that we end up with an incorrect
configuration when we decide for a global disable in the RCL, which
can cause rendering artifacts. This can be easily observed by simply
forcing the RCL bit to disable early Z in applications that are known
to enable it in CFG_BITS (such as the UE Shooter demo for example).
With this change we keep track of this scenario when we record
draw calls in the BCL and if decide that we need to disable EZ for
the entire job, we make sure we never enable it for any draw calls
in the frame.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8589>
This is an optimization that should make Z/S clears faster. To enable
this we can't have any Z/S loads or stores in the job. Also, it seems
that enabling early Z/S clearing is independent of whether early Z/S
testing is enabled.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8589>
There was a misunderstanding regarding the scope of some hardware bugs
that led us to think that:
1. The Clear Tile Buffer Z/S bit was broken
2. The Clear Tile Buffer RTs bit would also clear Z/S.
1) is not really true, what happened was that some other bugs for which
we need workarounds anyway would have that effect. 2) was only true
for V3D 4.1, so it doesn't affect v3dv.
This change makes proper use of the Z/S bit instead of falling back to
clearing all tile buffers every time we have a Z/S clear. This also
allows us to do color clears on the tile store (which is faster) rather
than falling back to the clear all RTs bit every time we have a Z/S clear.
v2: rewrite the original comment about the hardwarebug description to
include recent discussions with Broadcom instead of keeping it as
is and amending it with an update note.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8589>
We don't support them by hw, so we would need to get them
lowered. This fix some crashes while using renderdoc with UE4 shooter
demo traces.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8582>
Instead, wait for the specific queries being reset to
not be in use by the GPU.
This takes query pool resets in the UE4 Shooter demo from
50-60ms down to 0.5-2ms.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8554>
I saw this while inspecting CL dumps from the UE Shooter demo,
where they disable Z writes for occlusion queries. The hardware
is probably doing this internally, but it doesn't hurt
to do this explicitly and make CL traces consistent with intended
behavior.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8571>
If we have dirty descriptor set state we have to update our uniform
data to reference the new resources such as addresses for textures
or UBOs. This is known to have a high CPU cost, so we want to limit
this as much as we can.
It is a common rendering pattern in applications to render many objects
using the same pipeline, but modifying the descriptor sets bound to update
textures, UBOs, etc. In this scenario, we would be incurring in unnecessary
uniform stream updates for stages that don't access descriptor sets at all.
This change makes it so we track which shader stages in a pipeline
use descriptor set state and skips updating uniform streams for them
when dirty descriptor set state is the only reason requiring us to
generate new uniform streams for a draw call.
v2: reuse shader stage information from the pipeline set layouts
to track shader stages that use descriptor state.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8555>
Instead of checking whether the source and destination are the same,
we should check if the underlying BOs are the same, since we may
be suballocating resources from the same allocation and the kernel
will fail to execute jobs if the BO list has duplicated entries.
Fixes aborts with Unreal Engine due to failed TFU jobs.
Fixes: 30f1fc25ce ('v3dv: implement TFU blits')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8098>