OP_TYPE_CONVERTS denotes an opcode that returns a different type than is
source (going from int-domain to float-domain or vice versa), named
after the f2i/i2f family of opcodes it covers. We care because source
mods are determined by the source type (i/f) but output modifiers are
determined by the output type (equals the source type, unless the op
type converts, in which case it's the opposite).
The upshot is that floating-point compares (feq/fne/etc) actually do
type-convert. That is, that take in floating-points and output in
integer space (a boolean), so we mark them off this way to ensure the
correct output modifiers are used.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
It's just -easier- to render to aligned framebuffers. For winsys
targets, we already align, but even for an internal linear FBO we ought
to align everything nicely.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Now that we support custom strides on mipmapped textures (theoretically,
at least), extend the stride check to support mipmaps. Fixes incorrect
strides of linear windows in Weston.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
We move some coding packing the texture/sampler descriptors into
dedicated functions (out of the terrifyingly long emit_for_draw
monolith), cleaning them up as we go.
The discovery triggering the cleanup is the format for including manual
strides in the presence of mipmaps/cubemaps. Rather than placed at the
end like previously assumed, they are interleaved after each address.
This difference is relevant when handling NPOT linear mipmaps.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We refactor the wallpaper rendering code to separate the
wallpaper-specific bits from the general blitting capabilities. In the
(hopefully near) future, we'll turn this on to implement real Gallium
blits, e.g. for automatic mipmap generation.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This patch does a substantial cleanup of the code for handling AFBC,
moving various disparate misplaced functions into a new central
pan_afbc.c file.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Use the infrastructure in wayland/ci-templates to build the container
images.
This prevents from getting into some situations in which the images
wouldn't be rebuilt, and allows us to share some infrastructure with
other projects in freedesktop.org.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Suggested-by: Michel Dänzer <michel@daenzer.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Since we don't normally flush before performing copy transfers, it's
possible in some scenarios to use too much memory for staging resources
and start failing. This can happen either because we exhaust the total
available memory (including system memory virtio-gpu swaps out to), or,
more commonly, because the total size of resources in a command buffer
doesn't fit in virtio-gpu video memory.
To reduce the chances of this happening, force a flush before a copy
transfer if the total size of queued staging resources exceeds a certain
limit. Since after a flush any queued staging resources will be
eventually released, this ensures both that each command buffer doesn't
require too much video memory, and that we don't end up consuming too
much memory for staging resources in total.
Fixes kernel errors reported when running texture_upload tests in glbench.
Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Now that we have copy transfers in place, we can remove the incorrect
resource wait condition. Copy transfers and other optimizations minimize
the performance impact of this removal, while providing the correct
behavior.
Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Extend copy transfers to also be used for busy textures.
Performance results:
Unigine Valley, qemu before: 22.7 FPS after: 23.1 FPS
Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
We typically need to wait for a buffer to become ready before mapping,
so that we don't write new contents while the host is still using the
old contents. However, if we are allowed to discard the contents of the
mapped buffer range, then we can avoid waiting by using a staging buffer
range which we guarantee to never be busy, copying from the staging
buffer range to the target buffer in the host.
This commit implements this optimization by utilizing a dedicated
u_upload_mgr for the staging buffer.
Performance results:
Twilight Struggle (Steam/Proton), qemu before: 7 FPS after: 25 FPS
glmark2 ubo, qemu before: 38 FPS after: 331 FPS
Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Suggested-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Support transfers that use a different resource as the source of data to
transfer. This will be used in upcoming commits to send data to host
buffers through a transfer upload buffer, in order to avoid waiting
when the buffer resource is busy.
Note that we don't support queueing copy transfers in the transfer
queue. Copy transfers should be emitted directly in the command queue,
allowing us to avoid flushes before them and leads to better
performance.
Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Introduce definitions for the copy_transfer3d protocol command and virgl
capability. This command transfers data to the host by copying through
another resource, and will be used in upcoming commits to avoid waiting
when transferring data for busy resources.
Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
This could help performance when trying to recreate such resources for
copy transfers.
Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Support a new virgl bind type for staging buffers which don't require
dedicated host-side storage. These will be used to implement copy
transfers.
Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
If we are not allowed to block, and we know that we will have to wait,
either because the resource is busy, or because it will become busy due
to a readback, return early to avoid performing an incomplete
transfer_get. Such an incomplete transfer_get may finish at any time,
during which another unsynchronized map could write to the resource
contents, leaving the contents in an undefined state.
Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Suggested-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Also fixes a missed check for VIRGL_BIND_CUSTOM in one of the duplicate
code snippets.
Note that legacy fences also use VIRGL_BIND_CUSTOM, but we ensured they
don't go through the cache in the previous commit.
Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Resources for fences should not be from the cache, since we are basing
the fence status on the resource creation busy status.
Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Add more info about why the value of VIRGL_MAP_BUFFER_ALIGNMENT.
Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
We will need the full info. This also speeds up
virgl_attach_res_atomic_buffers and fixes resource leaks when the
context is destroyed.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
virgl_shader_binding_state will be used to manage all per-stage
shader bindings. For now, it manages only sampler views.
This replaces virgl_textures_info and fixes some issues
- start_slot is now honored
- views outside of [start_slot, slart_slot+count) are unmodified
- views are released when the context is destroyed
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Fixes valgrind errors when running two CTS tests back to back:
- KHR-GL45.shader_image_load_store.basic-allTargets-loadStoreT*
(The first test has an actual TCS, the second uses passthrough.)
Split out a separate program config state group to run early before the
other groups.
This seems to help w/ intermittent "missed tiles" (although I had
assumed that was a mem2gmem issue), or at least I can't reproduce that
issue with this patch, but can without.
It has the benefit of HLSQ_VS_CNTL.CONSTLEN matching for VS and BS.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
With the newer (v1.76) fw, we were getting hangs (compared to older
v1.66 fw). Re-work the GMEM code to structure things a bit closer to
the blob. This moves some PKT7 packets from IB2 to IB1, which I think
is what was confusing SQE and causing it to get stuck in an infinite
loop. But in general structuring things at least closer to the same way
blob does makes it easier to compare cmdstream.
Note: this is a bit on the large side for what I'd normally consider for
stable.. but right now it is looking like it is the newer fw that is
headed for linux-firmware. This should defn have some soak time on
master, but probably a good idea for this patch to end up in distro mesa
builds by the time a630_sqe.fw hits linux-firmware.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
This seems to be in a block of non buffered/context regs. Blob always
WFIs before write, so probably a good idea.
Annoyingly, compared to ealier gens, it is a bit harder to tell from the
register offset whether it is a buffered reg, it isn't as simple as
everything below 0x2000, it seems.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
In some cases the draw for the text wasn't working. This seems to be
fixed by resyncing some of the "golded registers" from blob (initial
values were based on somewhat older blob version).
Perhaps good to have a bit of soak time on master, but would be good
to eventually land in 19.x stable branches.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
We were previously lowering to inand, but the second arg was not
duplicated so inot would always return ~0. Oops.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This uses the new mesa/st functionality for NIR I/O vectorization, which
eliminates a number of corner cases (resulting in assorted dEQP
failures and regressions) and should improve performance substantial due
to lessened pressure on the load/store pipe.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This pass interfered with the more delicate path required for
non-vectorized I/O. It's also ugly and duplicating the job of an actual
honest-to-goodness scheduler.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This now boils down to just picking between binning or vertex shader
and dummy_fs or real fs, which we can do in a couple of lines of code
instead. The constlen logic isn't doing what it thinks it's doing,
both constlens at this point
MAX2(s[VS].constlen, align(state->bs->constlen, 4));
are binning shader constlens. We'll have to revisit the constlen
logic, but this commit doesn't change how it works.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We have a similar function in fd6_program.c. Move to fd6_emit.h and
share.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
There's already a bit of duplicated logic here and tessellation will
add more. Build up dword 0 in fd6_draw_vbo() and drop the a4xx in the
process.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>