Ensure that the register's liveness is not expanded to loops.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>
Those SENDs are still doing a full register write. We just inserted
some predication for a workaround.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>
There are a number of instances of the dead code elimination pass that
could reduce the count. For some reason this also seems to affect
register allocation itself.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>
These were running on armhf because that's the default in the custom
distro that Raspberry Pi provides, but arm64 is ~20% faster, and we
already run weekly tests on both arm64 & armhf, so let's keep only the
faster one in the pre-merge path.
Signed-off-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22272>
When a graphics pipeline library is created with only the vertex input
state, the driver binds this state at pipeline bind time. Though the
vertex binding stride is not necessarily dynamic, in this case the
pipeline stride should be used.
This fixes GPU hangs with recent
dEQP-VK.pipeline.fast_linked_library.vertex_input.*.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22285>
In V3D we were doing this incorrectly by peeking into the sampler state
unconditionally, which is not correct if the TMU operations don't use
sampler state at all (like PBOs). This was causing us to fail the second
test in this sequence when both tests run back to back in the same
process:
dEQP-GLES3.functional.texture.shadow.2d.linear.greater_or_equal_depth_component32f
dEQP-GLES3.functional.texture.specification.teximage2d_pbo.rg32f_cube
Here, the first test would set up sampler state for shadow comparisons and
the second test would set up a PBO upload, which would incorrectly pick
up the sampler state to decide about the TMU output size for the PBO
operation.
In V3DV we were doing this right, looking through each texture/sampler
instruction and checking if they all involved shadow comparisons or had
relaxed precision, defaulting to 32-bit otherwise.
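The per-instruction check described for V3DV can be sketched roughly as
below. This is an illustration only, not the driver code: struct tex_instr
and tmu_return_size() are hypothetical names, and the 16-bit narrow size
and the handling of an empty instruction list are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one texture/sampler instruction. */
struct tex_instr {
   bool is_shadow;         /* performs a shadow comparison */
   bool relaxed_precision; /* result may use reduced precision */
};

/* Hypothetical helper: pick the TMU return size in bits.
 * The narrow size is used only when every instruction is either a
 * shadow comparison or relaxed precision; anything else falls back
 * to 32-bit. An empty list also uses 32-bit (assumption). */
static unsigned
tmu_return_size(const struct tex_instr *instrs, size_t count)
{
   if (count == 0)
      return 32;

   for (size_t i = 0; i < count; i++) {
      if (!instrs[i].is_shadow && !instrs[i].relaxed_precision)
         return 32;
   }

   return 16;
}
```

The decision depends only on the instructions themselves, so state left
behind by an earlier, unrelated operation cannot influence it.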
This special-casing for shadow comparisons also leaks from drivers
into the compiler where we are forced to emit some pieces of sampler
state for 32-bit outputs, so we had to special-case shadow instructions
there as well and we also had a fix for CS textures not having correct
sampler state representing shadow operations too. On top of that,
we also had at least a couple of bugs where forcing 32-bit TMU output
through V3D_DEBUG wasn't correctly forcing shadow comparisons to actually
be 32-bit in all the right places, leading to visual bugs with the
option enabled (Sponza being one example of this). This change eliminates
all of these issues.
Finally, the performance improvement observed from special casing shadow
comparison is negligible, and in specific scenarios it can even be
detrimental to performance due to increased register pressure (Sponza with
PCF filtering set to 4 is an example of this again).
Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8684
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22284>
The spec for vkGetDrmDisplayEXT says:
"If there is no VkDisplayKHR corresponding to the connectorId on the
physicalDevice, the returning display must be set to VK_NULL_HANDLE.
The provided drmFd must correspond to the one owned by the physicalDevice.
If not, the error code VK_ERROR_UNKNOWN must be returned. (...)
The given connectorId must be a resource owned by the provided drmFd.
If not, the error code VK_ERROR_UNKNOWN must be returned"
We were only setting the display pointer to VK_NULL_HANDLE if the provided
drmFd was valid. However, there are CTS tests checking that it is also set
to VK_NULL_HANDLE when the drmFd is not valid.
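A minimal sketch of the pattern, using stand-in types rather than the real
Vulkan headers: clear the output handle before any validation so that every
error path leaves it NULL. All names here (struct device, get_drm_display,
the result enum) are hypothetical, and the validation is simplified to two
equality checks.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the Vulkan handle and result codes (hypothetical). */
typedef void *display_handle;
enum result { RESULT_SUCCESS = 0, RESULT_ERROR_UNKNOWN = -1 };

/* Hypothetical device owning one DRM fd and one connector. */
struct device {
   int drm_fd;
   uint32_t connector_id;
   display_handle display;
};

static enum result
get_drm_display(const struct device *dev, int drm_fd,
                uint32_t connector_id, display_handle *out_display)
{
   /* Clear the output first so all early returns leave it NULL. */
   *out_display = NULL;

   if (drm_fd != dev->drm_fd)
      return RESULT_ERROR_UNKNOWN;

   if (connector_id != dev->connector_id)
      return RESULT_ERROR_UNKNOWN;

   *out_display = dev->display;
   return RESULT_SUCCESS;
}
```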
Fixes the following test on all drivers exposing EXT_acquire_drm_display
(tested with Intel and V3DV):
dEQP-VK.wsi.acquire_drm_display.acquire_drm_display_invalid_fd
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22283>
Usually, we postpone acquisition until a swapchain is created, but there are
some cases with display extensions (at least with EXT_acquire_drm_display)
where we need to acquire before a swapchain is ever created.
Fixes various tests in:
dEQP-VK.wsi.acquire_drm_display.*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22283>
Get rid of the uint64 result pointer which was used by some query
types. Handle each switch case with self-contained code. Remove
unneeded casts. Use MIN2/MAX2 macros.
Signed-off-by: Brian Paul <brianp@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22281>
We were not initializing the PS invocation count to zero before
computing the sum of the per-thread results.
This fixes an issue where querying the result of the query more
than once would cause the result to grow larger each time.
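A minimal sketch of the accumulation pattern, not the driver code:
sum_ps_invocations() and NUM_THREADS are hypothetical names. The result is
zeroed before summing, so querying repeatedly returns the same value.

```c
#include <stdint.h>

#define NUM_THREADS 4 /* hypothetical thread count for illustration */

/* Hypothetical helper: sum the per-thread pixel shader invocation
 * counts into *result. The result is reset on every call, so calling
 * it again with the same counters gives the same total rather than a
 * growing one. */
static void
sum_ps_invocations(const uint64_t per_thread[NUM_THREADS], uint64_t *result)
{
   *result = 0; /* without this, stale data is added to the sum */

   for (unsigned i = 0; i < NUM_THREADS; i++)
      *result += per_thread[i];
}
```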
Signed-off-by: Brian Paul <brianp@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22281>
In f6c06ef2f6 ("ci: Add manual rules variations to disable irrelevant
driver jobs."), we fixed this for *most* drivers. This fixes up the last
driver, hopefully removing an annoying needless button in the UI for
some MRs.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Acked-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22263>
We're about to add a dependency on stuff from the intel-rules, and
moving virgl down here allows us to depend on them without having to
move the definition out of the intel-section.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Acked-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22263>
there are two cases to be handled here:
* normal
* software
the latter case requires env vars based on the frontend, and if a sw
device isn't found then init should fail
the former case should (in theory) just yolo the first device and assume
that's what the user wanted based on whatever env vars and layers are
in use
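a rough sketch of the two cases, assuming a hypothetical device list;
struct phys_dev and choose_device() are made-up names and the env var
handling is reduced to a single want_software flag:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of an enumerated physical device. */
struct phys_dev {
   const char *name;
   bool is_software; /* CPU (software) implementation */
};

/* Returns the index of the chosen device, or -1 if init should fail. */
static int
choose_device(const struct phys_dev *devs, size_t count, bool want_software)
{
   if (count == 0)
      return -1;

   /* normal case: take the first device and assume it is the right one */
   if (!want_software)
      return 0;

   /* software case: only a software device is acceptable */
   for (size_t i = 0; i < count; i++) {
      if (devs[i].is_software)
         return (int)i;
   }

   return -1; /* software requested but none found: fail init */
}
```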
fixes #7508, #7132
maybe also affects #8152
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22184>
Note: The meaning of clockwise vs counter-clockwise changes after the
yz flip, so the winding must be determined before the yz flip logic
runs. The yz flip is therefore moved to the GS and applied as a lowering
on top of the base GS.
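A small worked example of why the order matters, assuming for illustration
that the flip negates the y coordinate. struct vec2, signed_area2() and
flip_y() are hypothetical helpers, not code from the driver. The sign of
the signed area encodes the winding, and it changes once the flip is
applied.

```c
/* Hypothetical 2D position after projection. */
struct vec2 {
   float x, y;
};

/* Twice the signed area of triangle (a, b, c). Its sign encodes the
 * winding order; which sign means clockwise depends on the coordinate
 * convention in use. */
static float
signed_area2(struct vec2 a, struct vec2 b, struct vec2 c)
{
   return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}

/* Assumed form of the flip for this example: negate y. */
static struct vec2
flip_y(struct vec2 v)
{
   v.y = -v.y;
   return v;
}
```

Evaluating signed_area2() on the same triangle before and after flip_y()
gives values of opposite sign, so a winding test run after the flip would
classify the triangle the other way round.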
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22277>
For instance, with "piglit/bin/arb_shader_image_load_store-host-mem-barrier --quick -auto -fbo":
==18549==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61200000a059 at pc 0x7f65d8937b80 bp 0x7fff6ed19a00 sp 0x7fff6ed199f8
READ of size 1 at 0x61200000a059 thread T0
#0 0x7f65d8937b7f in evergreen_set_shader_images ../src/gallium/drivers/r600/evergreen_state.c:4277
#1 0x7f65d6b471b8 in st_bind_images ../src/mesa/state_tracker/st_atom_image.c:172
#2 0x7f65d6b76b26 in st_validate_state ../src/mesa/state_tracker/st_util.h:129
#3 0x7f65d6b76b26 in prepare_draw ../src/mesa/state_tracker/st_draw.c:88
#4 0x7f65d6b77c8a in st_draw_gallium ../src/mesa/state_tracker/st_draw.c:141
#5 0x7f65d72698a2 in _mesa_draw_arrays ../src/mesa/main/draw.c:1202
Fixes: a6b3792843 ("r600: add core pieces of image support.")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22273>
Fixes: 5e1bd07a ("radeonsi: vcn: implement the get_decoder_fence vfunc")
The commit [5e1bd07a] puts a timeout on fence_wait, which causes an 8k AV1
decoding regression on gfx940. Fix it by adding DECODER_FEEDBACK_TIMEOUT to
extend the fence wait time.
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22268>
When using multiple binaries, we don't know the required number of VGPRs beforehand,
which means we either have to over-allocate VGPRs or avoid shared VGPRs.
As bpermute is the only instruction needing shared VGPRs, we choose the latter.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22267>
These helper functions are only invoked for GFX < 11, whereas the
L3BypassDisable field is only present from GFX12 onwards.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22275>
This was the last missing feature for GPL. The main problem is that
the on-disk shaders cache size will increase a lot because we don't
deduplicate shaders, but there is ongoing work to improve that.
We also can't use the shaders cache for libraries created with the
RETAIN_LINK_TIME_OPTIMIZATION flag and module identifiers because we
don't know the SPIR-V and thus can't retain NIR shaders for linking.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22264>
Even if we are able to get the assembly from the shaders cache for
graphics pipeline libraries, we still need to retain NIR shaders in
case the LTO pipelines are not found in the cache.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22264>
This is to generate a different key for a library created with
FRAGMENT_SHADER_BIT and no FS (i.e. it would generate a noop FS) and
a library created with FRAGMENT_OUTPUT_INTERFACE with no CB attachments.
Otherwise, the same key would be generated and this would corrupt
the cache.
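A minimal sketch of the idea, not the actual key layout: struct lib_key,
its fields and hash_lib_key() are hypothetical, and FNV-1a stands in for
whatever hash the driver really uses. Two libraries that otherwise look
identical get different keys once the distinguishing bit is hashed in.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified pipeline library key. */
struct lib_key {
   uint32_t num_color_attachments;
   bool has_noop_fs; /* hypothetical bit telling the two libraries apart */
};

/* One step of 32-bit FNV-1a. */
static uint32_t
fnv1a_step(uint32_t hash, uint8_t byte)
{
   return (hash ^ byte) * 16777619u;
}

/* Hash every field of the key, including the distinguishing bit. */
static uint32_t
hash_lib_key(const struct lib_key *key)
{
   uint32_t hash = 2166136261u; /* FNV-1a 32-bit offset basis */

   for (unsigned i = 0; i < 4; i++)
      hash = fnv1a_step(hash, (uint8_t)(key->num_color_attachments >> (8 * i)));

   hash = fnv1a_step(hash, key->has_noop_fs ? 1 : 0);
   return hash;
}
```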
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22264>