AMD recommends doing this to speed up the CP when it processes
the draw ring entries. LLPC also does this.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22211>
When writing the draw ready bit, don't write the high 24 bits
of DWORD3, because that is used by the HW for something else
according to LLPC.
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22211>
We have corresponding global dirty bits for each of the per-stage dirty
bits. We can use this to skip iterating over shader stages when there
is no per-stage dirty state to handle.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22224>
If a resource is dirty but already tracked by the current batch, no need
to process it at draw time.
Note that the batch could change (ie. new fb state bound, etc) after the
check if we need resource dirty tracking, but in these cases all the
dirty-resource state is marked dirty.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22224>
This wasn't taking into account a change in corresponding bit in
writeable_bitmask, causing problem if an SSBO was first bound for
read, and then rebound for write, we wouldn't update the buffers
valid range. Instead just drop the premature optimization.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22224>
Previously when there was an & or | with two BitmaskEnums, the compiler
would try to cast the RHS and find a matching overload, but there were
many different casts (to the enum itself, to an integer, to a boolean,
etc.) each with a matching overload which meant that it couldn't pick
one and errored out due to an ambiguous overload. Fix this by
explicitly providing an overload that takes a BitmaskEnum on the RHS.
It has to also provide a BitmaskEnum output, so that subsequent
operators with the result on the LHS (e.g. when or'ing together three
BitmaskEnums without any parentheses tricks) also get the right
overload.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22224>
NIR loop unrolling is only working if the loop counter is a scalar.
So keep the loop counter separate and move the aL emulation and
the aL increment to a new register.
This allows loop unrolling with vec4 backends where unconditional
scalarizing of phi nodes is undesirable, like for example r300.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Axel Davy <davyaxel0@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7222
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21243>
If a compute pipeline is bound after a raytracing pipeline, the
computes shader slot (aka RT prolog) will be overwritten.
To fix this, move the RT prolog outside of the compute shader slot.
Fixes: d109362a3d ("radv: copy bound shaders to the cmdbuf state")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22235>
Inspired by Nicolai Hähnle's commit in LLPC.
Instead of using a SALU instruction to add to the scalar
offset, rely on the buffer swizzling and use constant offset.
Fossil DB stats on GFX1100:
Totals from 47910 (35.51% of 134913) affected shaders:
CodeSize: 87927612 -> 86968136 (-1.09%)
Instrs: 17584007 -> 17440094 (-0.82%)
Latency: 97232173 -> 97126311 (-0.11%)
InvThroughput: 9904586 -> 9905288 (+0.01%); split: -0.02%, +0.02%
VClause: 544430 -> 542566 (-0.34%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22227>
Bos that will be scanout in display need to be allocated with
flags = XE_GEM_CREATE_FLAG_SCANOUT in Xe and that implies to different
caching rules for this buffer.
So here not allowing to get scanout buffer from cache or allow it
to be placed in a cache bucket for reuse.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22060>
Bos that will be exported need to be allocated with vm_id = 0 in Xe,
so don't try to get a bo from cache that was allocated with a
valid vm_id.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22060>
Xe KMD requires special handling for exported buffers during creation.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22060>
This lets us provide a vk_device_memory_range helper similar to what's
provided for buffers for dealing with VK_WHOLE_SIZE. We can also handle
flags and some annoyance around Android hardware buffer import.
Reviewed-by: Lina Versace <lina@kiwitree.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22038>