There are blit shader specific optimizations available.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9002>
We don't need to create display-targets for shared or scanout, becuase
we never even see those in the sw-winsys case.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8858>
The only type that should really require to be host-visible is the
display-target, and that's just because of our silly flush_frontbuffer
implementation.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8858>
We only need these display-targets to be linear in the case of a
software winsys. In the DRM case, they can be tiled without issues.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8858>
The reason we check for staging-resources here is really because they
are the only images guaranteed to be host-visible.
But on UMA architectures, it's quite likely to have memory that is
*both* host-visible *and* device-local, so let's see what we found
instead.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8858>
spec@arb_timer_query@timestamp-get seems to fail on D3D12 / Windows
every now and then. Until that's been figured out, let's disable the
test in CI.
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8978>
The Vulkan spec says the following for vkCmdBeginTransformFeedbackEXT:
"For each element of pCounterBuffers that is VK_NULL_HANDLE, transform
feedback will start capturing vertex data to byte zero in the
corresponding bound transform feedback buffer."
While not quite as explicit, similar wording exists for
vkCmdEndTransformFeedbackEXT in "Valid Usage" section.
So, this means that we should handle NULL in this case, and simply
ignore the corresponding reads and writes.
This fixes a whole lot of crashes when using transform-feedback with
Zink.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8982>
By the Vulkan specification, and similarly to many other Vulkan calls,
it is allowed to destroy a null descriptor update template.
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Fixes: af5f13e58c ("anv: add VK_KHR_descriptor_update_template support")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9005>
This basically processes UBO loads as uniform loads by writing
the load address to the unifa register and reading sequential
values with ldunifa.
This process is faster than going through the TMU, but we can only
use it when the address we are reading from is uniform across all
channels, since we are basically reading from the UBO address
as if it was a uniform stream.
This leads to better performance in the UE4 Shooter demo.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
ldunifa reads a uniform from the unifa address and updates the unifa
address implicitly, so if we dead-code-eliminate one a follow-up
ldunifa will not read from the appropriate address.
We could avoid this if the compiler ensures that every ldunifa is
paired with an explicit unifa, so for example if we are reading a
vec4, we could emit:
unifa (addrr)
ldunifa
unifa (addr+4)
ldunifa
unifa (addr+8)
ldunifa
unifa (addr+12)
ldunifa
instead of:
unifa (addr)
ldunifa
ldunifa
ldunifa
ldunifa
But since each unifa has a 3 delay slot before we can do ldunifa,
that would end up being quite expensive.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
The simulator asserts on this, which can happen if we merge a ldunif
(or any other instruction that reads a uniform implicitly) and
ldunifa in the same instruction.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
unifa writes the addresss from which follow-up ldunifa loads,
and each ldunifa increments the unifa addeess by 32-bit so the
loads need to be ordered too.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
This has been fixed since V3D 4.2.14 (Rpi4), which is the hardware
we are targetting. Our version resolution doesn't allow us to check
for 4.2 versions lower than .14, but that is okay because the
simulator would still validate this in any case.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
V3D 3.x has V3D_QPU_WADDR_TMU which in V3D 4.x is V3D_QPU_WADDR_UNIFA
(which isn't a TMU write address). This change passes a devinfo to
any functions that need to do these checks so we can account for the
target V3D version correctly.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
Useful for determining whether certain optimizations are legal for a
compute shader (e.g. optimizing workgroup size in the driver).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6312>
Don't suppress inf/nan. Triggers bugs in broken apps like glmark2 (fixed
upstream but traces don't have the fix yet), so update the trace
expectations.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7550>
Shaders are now shared across contexts, so we'd like to avoid requiring
access to a full context. Instead, we pass the screen and an uploader
to use.
Fixes: 84a38ec133 ("iris: Enable PIPE_CAP_SHAREABLE_SHADERS.")
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8922>
Now that shaders are shared between contexts, we can't pre-bake the
shader scratch address into the derived 3DSTATE_XS packets. Scratch
buffers are and must be per-context, as multiple contexts could be
executing shaders using scratch at the same time.
So instead, we leave that field blank when pre-filling those packets
up-front, and merge in the actual address when emitting them. It's
a little more overhead, but only in the case where scratch is used.
Fixes: 84a38ec133 ("iris: Enable PIPE_CAP_SHAREABLE_SHADERS.")
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8922>
I cargo-culted these from the llvmpipe tests, but they seem to pass and
not take much time or memory at all, so let's try enabling them. If this
works fine, we might want to try the same for llvmpipe as well...
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8979>
Now that we lower FAU correctly, we don't need to write the extra move
explicitly, it will be lowered in later.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8973>
Move and reshape bi_lower_fau to bi_schedule.c. This generalizes the
pass for FAU reads, allowing copyprop to work with FAU without problems.
The pass must run immediately before scheduling. Its post-conditions are
directly specified as the scheduler's pre-conditions. It momentarily
will depend on internal scheduler predicates. It is, for all intents and
purposes, part of the scheduler. Keep it all together.
Finally, adjust the 0 handling to avoid a move at the expense of
constrained scheduling of something like `FADD.v2f16.clamp_0_1 u0, #0`
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8973>
Will prevent failures when we start using FAU together with modifiers in
a few commits.
Fixes: fc7770b1dd ("pan/bi: Add trivial rewrite helpers")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8973>
Ensure we don't fall over if we have an instruction like
FADD.f32 u0, #0
In this case, the tuple's FAU requirement implies the instruction can be
scheduler without lowering to the FMA slot but not the ADD slot.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8973>
Uses "u3, u3[1]" syntax which is close enough to the assembly syntax
"u3.w0, u3.w1".
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8973>