nir_intrinsic_index_size() expects a nir_intrinsic_index_flag, not
the position in the intrinsic's index list. This could cause
part of a multi-slot index to be ignored.
Fixes: b2bc57551a ("nir/instr_set: allow cse with fp_math_ctrl mismatches for intrinsics")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41593>
Update the headless Android WSI patch to fix intermittent timeout issues. It
now uses an ImageReader listener to actively drain and instantly release frames
from the buffer queue. This acts as a "null compositor" that prevents buffer
starvation while maintaining stable GPU backpressure.
This fixes dEQP-VK.wsi.android.maintenance1.* in newer VKCTS versions and
resolves the race conditions that caused occasional teardown crashes.
Also rebase build-deqp-gl_Build-Don-t-build-Vulkan-utilities-for-GL-builds.patch
on top of the updated WSI patch.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41541>
Timeline assignment has been asynchronous upon vkGetDeviceQueue2.
However, fence submission via vq can get ahead of it for things like
ring and vq synchronization. This change fixes vkGetDeviceQueue2 to be
synchronous instead, which is fine since it's off the critical path.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41582>
nir_def_init sets divergent = true, this means for something like
reduce(reduce(convergent)) we previously only optimized the inner
reduce.
No fossil changes at the moment, but I hit this when trying to
optimize shared memory to subgroup operations.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41542>
This was fundamentally broken for workgroup sizes >= 8x8.
This fixes new VKCTS coverage
dEQP-VK.glsl.texture_functions.texture.*_compute, and also few tests
from the vkd3d-proton testsuite (note that quad derivatives is
currently disabled for < GFX12).
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41483>
Otherwise, if a cmdbuf is recycled it would assume that a gang CS is
always is present even if it's not used. That means, it would emit
useless synchronization and use gang submit with a mostly empty gang
CS for nothing.
It seems better to create the gang CS on-demand only when it's strictly
required (for compute fallback with SDMA and task shaders). Even for
heavy uses of task shaders, that shouldn't hurt.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41543>
Some cooperative properties are defined by the driver itself and
are not a property of the HW. In particular whether the scope is
subgroup or workgroup is not directly related to the HW.
It could make sense encode the DPAS combinations into intel_device_info
but we are not using all possible combinations yet and wouldn't be very
useful in practice.
The new scheme was based on radv and will set us up for also filling
the flexible dimensions properties too.
Note: this also fixes a subtle issue where ARL was incorrectly inheriting
the PRE_XEHP configurations which included FLOAT16/FLOAT16/FLOAT16/FLOAT16
which it does not support.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41564>
Since we don't have any DPAS-based implementation of those, it is odd to
support them in the emulation mode that is only enabled with the debug
flag INTEL_LOWER_DPAS nowadays. Remove it.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41564>
This splits the nir_move_to_top_input_loads option into 2 options. The latter
option is mainly for at_offset/at_sample loads. Then it updates most places to
use only the first option.
The rationale is that moving at_sample loads makes Control (game) shaders
worse, as per the code comment.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41167>
These tests were fixed by 68cb76de5d ("pco: Fix encoding of branch to an empty
block").
Fixes: ef860bcaa1 ("pvr/ci: Add dEQP-VK testing for BXS-4-64 on TI AM68 SK")
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Acked-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41544>
The generated util_format_description(),
util_format_pack_description(), and
util_format_unpack_description_generic() helpers assert
format < PIPE_FORMAT_COUNT but not format >= 0. MSVC's prefast
static analyzer reports C33010 (UNCHECKED_LOWER_BOUND_FOR_ENUMINDEX)
on the subsequent array subscript, since it cannot prove the
non-negative side of the bound. Extending the existing assert in
the generator silences the warning across all three accessors.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41447>
blitter_get_fs_texfetch_col asserts target < PIPE_MAX_TEXTURE_TYPES
but not >= 0. MSVC's prefast static analyzer reports C33010
(UNCHECKED_LOWER_BOUND_FOR_ENUMINDEX) when target is later used as
an array subscript, since it cannot prove the non-negative side of
the bound. Extending the existing assert to both sides silences the
warning and is a real bound check.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41447>
draw_set_sampler_views asserts shader_stage < DRAW_MAX_SHADER_STAGE
but not >= 0. MSVC's prefast static analyzer reports C33010
(UNCHECKED_LOWER_BOUND_FOR_ENUMINDEX) when shader_stage is
subsequently used as an array subscript, since it cannot prove the
non-negative side of the bound. Extending the existing assert to
both sides silences the warning and is a real bound check.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41447>
The fix for this test is merged between the start and the merging of the
Vulkan CTS uprev MR.
Remove it from the fails list because it was already fixed.
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41571>
f4812dc1 introduces optimizations that turn ior into bcsel. The MSL
compiler will incorrectly compile the shader internally when bcsel is used
leading to incorrect outputs. This commit adds a workaround that tricks
the MSL compiler into correctly compiling the shader internally.
Reviewed-by: squidbus <squidbus@proton.me>
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41548>
Metal provides device properties for the recommended maximum memory usage and
the current amount of memory used. These can be used to provide an estimate
of heap usage and calculate a budget of memory usage by the application before
performance may degrade.
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41523>
Also, re-title things to make it clear that the current text is about
implementing OpenGL[ES] extensions.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41400>
Unfortunately we have to disable concurrent binning by default
because it hurts performance in a number of desktop games without
any case where we know it helps.
There are less vertex fetch resource available in BV compared to BR,
so when binning runs in BV, there are many vertices, and vertices are
attribute heavy - BV has much worse performance than BR, sometimes more
than 50% worse.
Even with worse performance it won't be bad if concurrent binning
actually overlapped with other workload in those cases, but in case of
desktop games - there is almost never a chance for overlap.
However it's impossible to statically find out if binning on BV would
be much slower than on BR, and we also cannot statically predict if
there is enough overlap (if any) to cover for the performance penalty.
Given the above, I don't see a way out but to make concurrent binning
opt in via `tu_allow_concurrent_binning` driconf toggle.
Still allow concurrent binning in CI to catch issues early.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41394>
We can determine used components earlier.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41501>
Usually I'm able to run B580 capture on LNL, but in some cases the
oversubscription on replay would lead to allocation failures.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41501>