We previously had GLSL do ldexp lowering to bitops, but NIR can do it
instead. It's tempting to just pass the NIR op through to the host Vulkan
driver, but to do that we'd need to split up NIR's flag between 32 and
64-bit support, and that's not worth anyone's time for an op we've never
seen used.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22083>
In V3D we were doing this incorrectly by peeking into the sampler state
unconditionally, which is not correct if the TMU operations don't use
sampler state at all (like PBOs). This was causing us to fail the second
test in this sequence when both tests run back back to back in the same
process:
dEQP-GLES3.functional.texture.shadow.2d.linear.greater_or_equal_depth_component32f
dEQP-GLES3.functional.texture.specification.teximage2d_pbo.rg32f_cube
Here, the first test would setup sampler state for shadow comparisons and
the second test would setup a PBO upload, which would incorrectly pick
up the sampler state to decide about the TMU output size for the PBO
operation.
In V3DV we were doing this right looking through each texture/sampler
instruction and checking if they all involved shadow comparisons or had
relaxed precission, defaulting to 32-bit otherwise.
This special-casing for shadow comparisons also leaks from drivers
into the compiler where we are forced to emit some pieces of sampler
state for 32-bit outputs, so we had to special-case shadow instructions
there as well and we also had a fix for CS textures not having correct
sampler state representing shadow operations too. Finally,
we also had at least a couple of bugs where forcing 32-bit TMU output
through V3D_DEBUG wasn't correctly forcing shadow comparisons to actually
be 32-bit in all the right places, leading to visual bugs with the
option enabled (Sponza being one example of this). This change eliminates
all of these issues.
Finally, the performance improvement observed from special casing shadow
comparison is negligible, and in specific scenarios it can even be
detrimental to performance due to increased register pressure (Sponza with
PCF filtering set to 4 is an example of this again).
Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8684
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22284>
Get rid of the uint64 result pointer which was used by some query
types. Handle each switch case with self-contained code. Remove
unneeded casts. Use MIN2/MAX2 macros.
Signed-off-by: Brian Paul <brianp@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22281>
We were not initializing the PS invocation count to zero before
computing the sum of the per-thread results.
This fixes an issue where querying the result of the query more
than once would cause the result to grow larger each time.
Signed-off-by: Brian Paul <brianp@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22281>
In f6c06ef2f6 ("ci: Add manual rules variations to disable irrelevant
driver jobs."), we fixed this for *most* driver. This fixes up the last
driver, hopefully removing an annoying needless button in the UI for
some MRs.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Acked-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22263>
there are two cases to be handled here:
* normal
* software
the latter case requires env vars based on the frontend, and if a sw
device isn't found then init should fail
the former case should (in theory) just yolo the first device and assume
that's what the user wanted based on whatever env vars and layers are
in use
fixes#7508, #7132
maybe also affects #8152
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22184>
For instance, with "piglit/bin/arb_shader_image_load_store-host-mem-barrier --quick -auto -fbo":
==18549==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61200000a059 at pc 0x7f65d8937b80 bp 0x7fff6ed19a00 sp 0x7fff6ed199f8
READ of size 1 at 0x61200000a059 thread T0
#0 0x7f65d8937b7f in evergreen_set_shader_images ../src/gallium/drivers/r600/evergreen_state.c:4277
#1 0x7f65d6b471b8 in st_bind_images ../src/mesa/state_tracker/st_atom_image.c:172
#2 0x7f65d6b76b26 in st_validate_state ../src/mesa/state_tracker/st_util.h:129
#3 0x7f65d6b76b26 in prepare_draw ../src/mesa/state_tracker/st_draw.c:88
#4 0x7f65d6b77c8a in st_draw_gallium ../src/mesa/state_tracker/st_draw.c:141
#5 0x7f65d72698a2 in _mesa_draw_arrays ../src/mesa/main/draw.c:1202
Fixes: a6b3792843 ("r600: add core pieces of image support.")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22273>
Fixes: 5e1bd07a ("radeonsi: vcn: implement the get_decoder_fence vfunc")
The commit [5e1bd07a] puts a timeout on fence_wait which causes a 8k AV1
decoding regression on gfx940. By adding DECODER_FEEDBACK_TIMEOUT to
add fence wait time.
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22268>
compute is a special case because the zink_shader itself is created
in a thread, which means it cannot be accessed directly at bind time
since it may not have finished creating itself yet
to avoid prematurely waiting on an async fence, the one value needed
at bind time can instead be stored separately
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22266>
nir_shader objects are hefty, and they really add up when there's a lot
of them. there's also not much use in keeping them around, as any time
they'll be used, they're always cloned first, and deserializing isn't
likely to be any slower than a clone
cuts driver memory usage by ~40% for tomb raider
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22266>
It's currently used when LS store output to LDS.
The LS/HS bug fix seems does not affect this case.
But we'd better treat it as other fixed args.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22045>
We have corresponding global dirty bits for each of the per-stage dirty
bits. We can use this to skip iterating over shader stages when there
is no per-stage dirty state to handle.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22224>
If a resource is dirty but already tracked by the current batch, no need
to process it at draw time.
Note that the batch could change (ie. new fb state bound, etc) after the
check if we need resource dirty tracking, but in these cases all the
dirty-resource state is marked dirty.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22224>