Need to flush the rasterizer and wait for everything to finish,
with new overlap flush isn't enough.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14923>
When llvmpipe is allowed execute fragment shaders overlapping
with other stuff, we have to start using the pipe fences.
With presentation, the acquire path needs to signal a semaphore
that can be waited on by the user, so add support for passing
signal/wait semaphores for non-timeline in, and just put a
fence pointer in the semaphore for that case.
This fixes rendering once we allow overlapping rasterization.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14923>
Refactor out the code for finishing a fence and used it in
pipeline barrier and event waiting.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14923>
If the users ask for an infinte timeout, just pass it through
to gallium.
When llvmpipe ends up allowing async fragment shaders, it's important
to get this right for lots of CTS tests.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14923>
The availability is meant to be the last integer value written,
but for pipeline stats this was being done wrong. calculate
the availability position properly.
With the old non-overlapping execution model queries never
were unavailable.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14923>
Currently neither llvmpipe or softpipe ever leave any drawing in
the pipeline, but I'd like to change that for llvmpipe.
This makes drisw block for completed rendering before sending
data to the X server.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14923>
The zw were already known, but throw them in here too. I'm not extremely
happy with the description of "y", feels like there's a simpler
explanation there, but I couldn't find it.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14672>
This causes the flushed_depth_texture is allocated without
multi sample. So the blit will cause VM fault.
cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14990>
If the luminance formats aren't renderable, they back out to R*
formats, but those will end up with a 1 in alpha rather than 0 when
textured. So instead make them explicitly renderable, which will cause
the correct texture format swizzle to be applied.
Fixes query-rgba-signed-components and probably others.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15097>
Until now we have lived without a refcount mechanism in the driver
because in Vulkan the user is responsible for handling the life
span of memory allocations for all Vulkan objects, however,
imported BOs are tricky because the kernel doesn't refcount
so user-space needs to make sure that:
1. When importing a BO into the same device used to create it
(self-importing) it does not double free the same BO.
2. Frees imported BOs that were not allocated through the same
device.
Our initial implementation always freed BOs when requested,
so we handled 2) correctly but not 1) on drm and we would
double-free self-imported BOs because kernel doesn't return
a unique gem_handle on each import.
Beside this the submit ioctl checks for duplicates in the
BO list and returns an error if there is one.
This fixes the problem for good by adding refcounts to BOs
so that self-imported BOs have a refcnt > 1 and are only freed
when all references are freed.
KGSL on the other hand does not have the same problems,
at least not with ION buffers which are used for exportable
BOs on pre 5.10 android kernels.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5936
Fixes CTS tests: dEQP-VK.drm_format_modifiers.export_import.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15031>
Bit is flipped compared to all the other packets.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 705395344d ("intel/fs: Add support for compiling bindless shaders with resume shaders")
Fixes: c3ac9afca3 ("anv: Create and return ray-tracing pipeline SBT handles")
Acked-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15078>
It is good to know that we run out of ring space and have to wait. This
happens easily with fossilize-replay because encoding a
vkCreateGraphicsPipeline takes microseconds while executing it can take
milliseconds, >100ms sometimes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14966>
This reverts commit 29d319c767.
Now that we use nir_lower_bool_to_bitsize, we don't see 1-bit booleans
anymore, so the issue this fixed doesn't apply. Actually, that issue was
(in part) why I started looking into boolean handling in the first
place.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14576>
lower_bool_to_bitsize can generate i2i32 from a 32-bit source, which is
trivial but needs to be handled explicitly to avoid going down the 8-bit
conversion path.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14576>
This lets us avoid generating SWZ instructions. Those instructions could
be constant folded but that complicates the replication analysis
introduced in the next commit.
Almost no shader-db changes.
quadwords HURT: shaders/glmark/1-22.shader_test MESA_SHADER_FRAGMENT: 718 -> 722 (0.56%)
total quadwords in shared programs: 68169 -> 68173 (<.01%)
quadwords in affected programs: 718 -> 722 (0.56%)
helped: 0
HURT: 1
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14576>
We've got lots of VK coverage on 618, so take some of the load off (but
leave a little bit of testing just to make sure we don't totally break
630). This should help with our Marge times since we've added some other
coverage to 630 that's started overloading the runners.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15085>
The old calculation for mat3 was clever, but it turns out that a
straightforward application of subdeterminants similar to how mat4 is
handled is more efficient: on a scalar architecture with some sort of
combined multiply+add instruction with a negate modifier (both fairly
common), the new determinant is 9 instructions vs. 15 for the old one,
and without the multiply-add it's 14 instructions vs. 18 for the old
one. When used as a routine for inverse() the savings are compounded,
because we now use the same method as used to compute the adjucate
matrix and so CSE can combine most of the calculations with the adjucate
matrix ones.
Once mat3 and mat4 use the same method for computing determinants, we
can combine them into a single recursive function. I also pulled up the
mat_subdet() function because it was doing basically what we need, so
it's now shared between determinant and inverse. This shrinks the
implementation significantly, as can be seen from the diffstat.
The real reason I want to change this, though, is that it fixes
dEQP-VK.glsl.builtin.precision_fp16_storage16b.inverse.compute.mat3 with
turnip. Qualcomm uses round-to-zero for 16-bit frcp, which combined with
some inaccuracy in the old method of calculating the determinant led us
to fail. Qualcomm's driver uses something like the new method to
calculate the determinant in the inverse. We could argue that Mesa's
method should be allowed, because round-to-zero for floating-point
division is within spec and there are no precision guarantees given for
determinant() or inverse(). However we might as well use the more
efficient method.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14652>
It won't work if the blob is fixed-size and we overrun the size, which
will be the case with the Vulkan pipeline cache.
This gets a bit tricky for the repeated-header optimization, because we
can't read the header from the blob. Instead we have to store the header
itself.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15028>
Label IDVS variants as being MESA_SHADER_{POSITION, VARYING} stages;
reserve the MESA_SHADER_VERTEX label for non-IDVS shaders. This reduces
confusion where a single shader compiles to two MESA_SHADER_VERTEX
shaders with different stats.
While we're at it, de-vendor the blend shader stage name; these stats
are internal anyway.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15086>
Automatically packs and unpacks float <==> clamped 4:6 fixed point, used
for min/max LOD fields on the Sampler descriptor.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14903>
C++ requires explicit casts from integers to enums. Fixes errors like
the following when trying to use Asahi GenXML from a GTest unit test.
src/asahi/lib/agx_pack.h:554:23: error: assigning to 'enum agx_channels' from incompatible type 'uint64_t' (aka 'unsigned long long')
values->channels = __gen_unpack_uint(cl, 0, 6);
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14903>
Required to clamp array indices against the array sizes per the GLSL
spec. Metal also does this, implying it's required by the hardware for
correct operation.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14903>
The texture descriptor we construct for reloading needs to respect the
surface's texture/layer selection. Fix exactly the same bug as
b8c31ac06d ("lima: fix glCopyTexSubImage2D").
Fixes:
dEQP-GLES2.functional.texture.specification.basic_copytexsubimage2d.2d_rgb
dEQP-GLES2.functional.texture.specification.basic_copytexsubimage2d.2d_rgba
dEQP-GLES2.functional.texture.specification.basic_copytexsubimage2d.cube_rgb
dEQP-GLES2.functional.texture.specification.basic_copytexsubimage2d.cube_rgba
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14903>
Streamline access to particular layer/levels. These patterns show up
across the driver and are easy to screw up, so add a helper.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14903>
As far as I can tell, these *must* be tiled. Other than that, the
implementation is completely routine. Passes
dEQP-GLES3.functional.texture.format.unsized.*2d_array*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14903>