gallium hud uses different tgsi fragment shaders for text and background
quads, which have different varying layouts. Since these are compiled
directly from tgsi they bypass some optimizations and may not work
properly on all backends.
A simple fix for the varying layout problem is to define a vs_text
shader to match the varyings in fs_text so the problem is avoided.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10222>
This field was introduced 2 months ago and it breaks virgl
compatibility between guest/host. Switch the new added field
to the end. We will still have compatibility issue but the
"bug window" is much smaller.
Fixes: e778aceaae ("virgl: update headers")
Signed-off-by: Lepton Wu <lepton@chromium.org>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10333>
As requested by Ken since we're now (after 20e2c7308f)
re-emitting constants at the beginning of every batch which may lead
to some redundant constant restore overhead. No statistically
significant performance changes observed with either change.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9903>
The code around it checks that it is found, so clearly it was meant to
be optional, not arequired.
Fixes: cd2832ee51
("meson: add an optional OpenMP dependency for AMD tests")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Tested-by: Bernd Kuhls <bernd.kuhls@t-online.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10286>
One of the conclusions of our recent clean up on the limits was that
the pipeline limits needed to be the per-stage limits multiplied by
the number of stages.
But until now we only have a set of descriptor maps for the full
pipeline. That would work if we could set the same limit per pipeline
that per stage, but that is not the case. So if, for example, we have
the fragment shader using V3D_MAX_TEXTURE_SAMPLERS textures, and then
the vertex shader, with a different descriptor set, using one texture,
we would get an index greater that V3D_MAX_TEXTURE_SAMPLERS. We assert
that index as an error on the vulkan backend, but fwiw, it would be
also asserted on the compiler.
With this commit we track and allocate a descriptor map per stage,
although we reuse the vertex shader descriptor map for the vertex bin.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10272>
Fixes piglit crashes doing glCopyTexSubImage from (for example)
PIPE_FORMAT_Z24_UNORM_S8_UINT to PIPE_FORMAT_Z32_FLOAT_S8X24_UINT where,
in addition to reading the source Z values incorrectly, we would try to
dereference the missing separate stencil of the Z24S8 buffer.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10328>
I have a note in -flakes.txt about how it can flake to being a Crash
instead of a Fail, but I haven't been able to reproduce that flake today.
It does always fail in the GS subtest, though, so quiet the flake spam in
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10330>
We set this in deqp-runner runs to disable Mesa debug/debugoptimized
builds printing to stderr for expected GL test behavior, and with the
addition of dEQP-EGL mapi got very verbose. Move it to a call_once() too
to avoid data races.
Acked-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10240>
This will be used to test llvmpipe against Xvfb and freedreno against
Xorg. We keep the core deqp testing on surfaceless because
--deqp-surface-type=pbuffer fails on x11_egl_glx, =fbo has never worked in
VK-GL-CTS, and =window would increase test runtime for all the
swapbuffers.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10240>
A fragment shader that forgets to write to one of the bound render
targets (in the presence of MRT) invalidates a core FPK invariant. Check
for this and add it to the naughty list.
I don't think this needs a backport since FPK isn't really used yet.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10271>
Easier to understand (and match to actual hardware behaviour) this way.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10271>
This was renamed when I was in high school. I remember updating the
Midgard compiler while sitting in AP Physics.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10296>
We already support everything necessary and just need to ask the frontend
to DTRT. This makes UBO0 get more tightly packed, saving upload space,
and allows for _mesa_optimize_state_parameters() as well.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10150>
Since we can handle arbitrary offsets in the load_ubo paths, we can let
the GLSL compiler pack UBO 0 tighter, saving uniform uploads. This may
cause some straddling loads that could reduce performance for vec3s, but
those are rare in shader-db and we expect this to be outweighed by the
wins for normal float/vec2 packing.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10151>
Without this extension, we misrender when custom border colors are
used. Let's document this, and emit a warning when the extension is
missing.
Reviewed-by: Joshua Ashton <joshua@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10316>
Add vn_queue_submission_count_batch_semaphores that works on a single
batch at a time. Also avoid code duplication.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10290>
In looking at the profile of dEQP, GLES3 was spending 5-10% of its time in
ReadPixels, and almost all of that is b8g8r8a8_unorm8. It's really slow
because we're getting about 47MB/s by doing uncached reads 32 bits at a
time in the code-generated unpack. If we use NEON to generate larger bus
transactions, we can speed things up to 136MB/s. In comparison, raw
ldr/str read/writes with no byte swapping can hit a max of 216MB/sec.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10014>
We have only a few callers of unpack that do rects, so add a helper that
iterates over y adding the strides. This saves us 36kb of generated code
and means that adding cpu-specific variants for RGBA format unpack will be
much simpler.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10014>
OpTerminateInvocation provides the behavior required by the GLSL
discard statement, which we already implement.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9460>
The "demote" intrinsic has the semantics of D3D discard, which means
it doesn't change the control flow, allowing derivatives to work.
On A6xx there is no known way to check whether invocation was demoted,
thus we use nir_lower_is_helper_invocation.
Add "logical" OPC_DEMOTE which is later translated to "kill".
Such separation is necessary to run "kill" specific optimizations
which are invalid for "demote".
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9460>
Some hardware doesn't have a way to check if invocation was demoted,
in such case we have to track it ourselves.
OpIsHelperInvocationEXT is specified as:
"An invocation is currently a helper invocation if it was originally
invoked as a helper invocation or if it has been demoted to a helper
invocation by OpDemoteToHelperInvocationEXT."
Therefore we:
- Set gl_IsHelperInvocationEXT = gl_HelperInvocation
- Add "gl_IsHelperInvocationEXT = true" right before each demote
- Add "gl_IsHelperInvocationEXT = gl_IsHelperInvocationEXT || condition"
right before each demote_if
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9460>
Finally LAVA users will be able to see deqp XMLs on failures from the
job's artifacts browser. This replaces a couple of one-off minio uploads
in the piglit runner.
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10297>
Make sure to preserve signed zeroes.
Fixes dEQP-VK.spirv_assembly.instruction.compute.opquantize.flush_to_zero
on GFX6 (Pitcairn). Untested on GFX7.
Fixes: 54a09545ec ("aco: optimize a*0.0")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10319>
immediates_count and immediates_size are supposed to have the same
units, but it was only incrementing immediates_count by 1. While we're
here, also fix the case where constants are specified out-of-order.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10291>
Otherwise we can end up in situations like having divide-by-zero. If the optimization is smart enough
that we end up with a *constant* divide-by-zero, then the DXIL validator will fail to sign, which
can trigger fatal errors with CLOn12.
We want to run an initial translation of all kernels during program build, but at that point we don't
know the local size to be able to specify it through kernel specialization data.
v2: Metadata output of 0 is used to indicate that the size wasn't explicitly specific. Copy the
size to the metadata before overriding it to (1,1,1). If conf was explicitly specified,
update the metadata again (though nobody should be paying attention to it).
Closes: https://github.com/microsoft/OpenCLOn12/issues/20
Closes: https://github.com/darktable-org/darktable/issues/8700
Reviewed-By: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10303>
Certain games create and destroy lots of resources without binding them.
This can take quite a bit of time and even create unneeded
synchronization. However, we know that if a resource was never bound to
anything, it can be cached. This change does that.
Counting the number of uncached allocation with a tabletop simulator trace:
Before: 2967 uncached allocations over the replay
After: 24 uncached allocations over the replay
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10225>
If a has been lowered to float16 here, then we end up trying to
construct a vector of mixed precision, which the validator asserts
about.
So let's make sure we use the same type for all arguments.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10201>