Passes the ACCESS_CAN_REORDER flag from NIR on to the backend so that we
can lower the loads to a non-volatile SEND. This allows the scheduler to
freely reorder them around stores or fences.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41008>
Our scheduler is overly conservative about reordering instructions
around memory writes or fences. Fortunately, there are several simple
assumptions we can make about our IR to schedule these things a lot
more fluidly:
* Unless it's an EOT, a SEND instruction's side effects will only be
observed through other SEND instructions
* The effects of workgroup barriers, memory fences, and BRW_OPCODE_SYNC
are only used in the IR to synchronize SEND instructions
* All other scheduler dependencies related to memory access are already
expressed through the source and destination operands
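
As a rough illustration of the assumptions listed above, the following toy
predicate decides whether two instructions must keep their program order for
memory reasons alone. Every name here (toy_kind, toy_needs_memory_order, and
so on) is hypothetical and is not the real brw scheduler code. It also does
not model finer distinctions such as reorderable loads; it is only a sketch of
the idea that non-SEND instructions need no ordering against memory writes or
fences beyond their operand dependencies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical instruction kinds; not the real brw IR. */
enum toy_kind {
   TOY_ALU,      /* no memory side effects */
   TOY_SEND,     /* memory message (load, store, fence, ...) */
   TOY_SEND_EOT, /* end-of-thread message */
   TOY_BARRIER,  /* workgroup barrier */
   TOY_SYNC,     /* stand-in for BRW_OPCODE_SYNC */
};

static bool
toy_is_memory_related(enum toy_kind kind)
{
   return kind != TOY_ALU;
}

/* True when the pair must stay in program order for memory reasons alone.
 * Register dependencies are assumed to be tracked separately through the
 * source and destination operands.
 */
static bool
toy_needs_memory_order(enum toy_kind a, enum toy_kind b)
{
   /* Assumption: EOT is the one case that is ordered against everything. */
   if (a == TOY_SEND_EOT || b == TOY_SEND_EOT)
      return true;

   /* Side effects and synchronization only matter between memory-related
    * instructions, so an ALU instruction can move past a store or a fence.
    */
   return toy_is_memory_related(a) && toy_is_memory_related(b);
}
```

In this model, an ALU instruction paired with a SEND or a barrier yields
false, while two SENDs, or anything paired with an EOT, yield true.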
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41008>
`1 << 31` invokes signed shift UB. When the int result is assigned
to uint64_t, sign extension produces 0xFFFFFFFF80000000 (~18 EiB)
instead of the intended 0x80000000 (2 GiB).
Use 1ull << 31 to perform the shift in an unsigned 64-bit type.
The 2 GiB value matches the surrounding finite cap values and
OpenCL minimum requirements, making the original intent clear.
Detected by UBSan with piglit.
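
The sign-extension effect can be reproduced without triggering the undefined
shift itself. The sketch below assumes a 32-bit int and uses hypothetical
helper names; it stores the bit pattern 0x80000000 in a signed 32-bit value,
widens it to uint64_t, and compares that with the unsigned 64-bit shift.

```c
#include <assert.h>
#include <stdint.h>

/* Models the value the old code ended up with on typical targets: a signed
 * 32-bit value with only the top bit set, widened to uint64_t. Converting a
 * negative value to an unsigned type is well defined (modulo 2^64), so this
 * does not rely on the undefined shift.
 */
static uint64_t
toy_cap_from_signed(void)
{
   int32_t as_signed = INT32_MIN;   /* bit pattern 0x80000000 */
   return (uint64_t)as_signed;      /* becomes 0xFFFFFFFF80000000 */
}

/* The fixed form: the shift happens in an unsigned 64-bit type. */
static uint64_t
toy_cap_from_unsigned_shift(void)
{
   return 1ull << 31;               /* 0x80000000, i.e. 2 GiB */
}
```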
Fixes: a65b74af51 ("llvmpipe: init shader and compute caps")
Signed-off-by: yserrr <dlwognsdc610@naver.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41464>
Intel uses nir_lower_bit_size to convert 8-bit integer values to 16-bit
for most instructions. By constant folding u2u8 or i2i8 through a bcsel,
this lowering is undone.
Fixes assertion failure in fossils/parallel-rdp/small_subgroup.foz.
fossilize-replay: src/intel/compiler/brw/brw_from_nir.cpp:852: void brw_from_nir_emit_alu(nir_to_brw_state&, nir_alu_instr*, bool): Assertion `brw_type_size_bytes(op[i].type) > 1' failed.
v2: Reject all integer conversions. Suggested by Daniel Schürmann.
Fixes: f4812dc11d ("nir/opt_constant_folding: constant-fold op(bcsel(), #c) -> bcsel(.., #c1, #c2)")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41412>
It seems like DaVinci Resolve conflicts on those symbols and we got
regressions from our static libstdc++ linking workaround.
Cc: mesa-stable
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41488>
nir_intrinsic_index_size() expects a nir_intrinsic_index_flag, not
the position in the intrinsic's index list. This could cause
part of a multi-slot index to be ignored.
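
The difference between looking up a size by flag and by list position can be
shown with a toy model. The enum, table, and functions below are hypothetical
stand-ins and do not reflect NIR's real definitions; the sketch only assumes
that some index kinds occupy more than one slot.

```c
#include <assert.h>

/* Hypothetical index kinds; TOY_INDEX_WIDE occupies two slots. */
enum toy_index_flag {
   TOY_INDEX_BASE,
   TOY_INDEX_WIDE,
   TOY_INDEX_ALIGN,
   TOY_NUM_INDEX_FLAGS,
};

/* Number of slots per index kind, indexed by flag. */
static const unsigned toy_index_sizes[TOY_NUM_INDEX_FLAGS] = { 1, 2, 1 };

static unsigned
toy_index_size(enum toy_index_flag flag)
{
   return toy_index_sizes[flag];
}

struct toy_intrinsic_info {
   unsigned num_indices;
   enum toy_index_flag indices[4]; /* which flags this intrinsic uses */
};

/* Correct: translate the list position into its flag before the lookup. */
static unsigned
toy_total_slots(const struct toy_intrinsic_info *info)
{
   unsigned total = 0;
   for (unsigned i = 0; i < info->num_indices; i++)
      total += toy_index_size(info->indices[i]);
   return total;
}

/* Buggy: passes the list position where a flag is expected. */
static unsigned
toy_total_slots_buggy(const struct toy_intrinsic_info *info)
{
   unsigned total = 0;
   for (unsigned i = 0; i < info->num_indices; i++)
      total += toy_index_size((enum toy_index_flag)i);
   return total;
}
```

For an intrinsic whose only index is TOY_INDEX_WIDE, the correct version counts
two slots while the buggy one counts one, so the second slot would be skipped.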
Fixes: b2bc57551a ("nir/instr_set: allow cse with fp_math_ctrl mismatches for intrinsics")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41593>
Update the headless Android WSI patch to fix intermittent timeout issues. It
now uses an ImageReader listener to actively drain and instantly release frames
from the buffer queue. This acts as a "null compositor" that prevents buffer
starvation while maintaining stable GPU backpressure.
This fixes dEQP-VK.wsi.android.maintenance1.* in newer VKCTS versions and
resolves the race conditions that caused occasional teardown crashes.
Also rebase build-deqp-gl_Build-Don-t-build-Vulkan-utilities-for-GL-builds.patch
on top of the updated WSI patch.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41541>
Timeline assignment has been asynchronous upon vkGetDeviceQueue2.
However, fence submission via vq can get ahead of it for things like
ring and vq synchronization. This change fixes vkGetDeviceQueue2 to be
synchronous instead, which is fine since it's off the critical path.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41582>
nir_def_init sets divergent = true, so for something like
reduce(reduce(convergent)) we previously only optimized the inner
reduce.
No fossil changes at the moment, but I hit this when trying to
optimize shared memory to subgroup operations.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41542>
This was fundamentally broken for workgroup sizes >= 8x8.
This fixes new VKCTS coverage
dEQP-VK.glsl.texture_functions.texture.*_compute, and also a few tests
from the vkd3d-proton testsuite (note that quad derivatives are
currently disabled for < GFX12).
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41483>
Otherwise, if a cmdbuf is recycled, it would assume that a gang CS is
always present even if it's not used. That means it would emit
useless synchronization and use gang submit with a mostly empty gang
CS for nothing.
It seems better to create the gang CS on-demand only when it's strictly
required (for compute fallback with SDMA and task shaders). Even for
heavy uses of task shaders, that shouldn't hurt.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41543>
Some cooperative properties are defined by the driver itself and
are not a property of the HW. In particular whether the scope is
subgroup or workgroup is not directly related to the HW.
It could make sense to encode the DPAS combinations into intel_device_info,
but we are not using all possible combinations yet and it wouldn't be very
useful in practice.
The new scheme is based on radv and will set us up for filling
the flexible dimensions properties too.
Note: this also fixes a subtle issue where ARL was incorrectly inheriting
the PRE_XEHP configurations which included FLOAT16/FLOAT16/FLOAT16/FLOAT16
which it does not support.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41564>
Since we don't have any DPAS-based implementation of those, it is odd to
support them in the emulation mode that is only enabled with the debug
flag INTEL_LOWER_DPAS nowadays. Remove it.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41564>
This splits the nir_move_to_top_input_loads option into 2 options. The latter
option is mainly for at_offset/at_sample loads. Then it updates most places to
use only the first option.
The rationale is that moving at_sample loads makes Control (game) shaders
worse, as per the code comment.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41167>
These tests were fixed by 68cb76de5d ("pco: Fix encoding of branch to an empty
block").
Fixes: ef860bcaa1 ("pvr/ci: Add dEQP-VK testing for BXS-4-64 on TI AM68 SK")
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Acked-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41544>
The generated util_format_description(),
util_format_pack_description(), and
util_format_unpack_description_generic() helpers assert
format < PIPE_FORMAT_COUNT but not format >= 0. MSVC's prefast
static analyzer reports C33010 (UNCHECKED_LOWER_BOUND_FOR_ENUMINDEX)
on the subsequent array subscript, since it cannot prove the
non-negative side of the bound. Extending the existing assert in
the generator silences the warning across all three accessors.
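
A minimal sketch of the two-sided check, using a made-up enum and table rather
than the generated util_format code (all toy_* names are hypothetical):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical format enum; stands in for a real format enumeration. */
enum toy_format {
   TOY_FORMAT_NONE,
   TOY_FORMAT_RGBA8,
   TOY_FORMAT_R32F,
   TOY_FORMAT_COUNT,
};

static const char *const toy_format_names[TOY_FORMAT_COUNT] = {
   "none",
   "rgba8",
   "r32f",
};

static const char *
toy_format_name(enum toy_format format)
{
   /* Check both sides of the bound. The cast to int keeps the lower-bound
    * comparison meaningful whether the compiler picks a signed or an
    * unsigned underlying type for the enum.
    */
   assert((int)format >= 0 && (int)format < TOY_FORMAT_COUNT);
   return toy_format_names[format];
}
```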
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41447>
blitter_get_fs_texfetch_col asserts target < PIPE_MAX_TEXTURE_TYPES
but not >= 0. MSVC's prefast static analyzer reports C33010
(UNCHECKED_LOWER_BOUND_FOR_ENUMINDEX) when target is later used as
an array subscript, since it cannot prove the non-negative side of
the bound. Extending the existing assert to both sides silences the
warning and is a real bound check.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41447>
draw_set_sampler_views asserts shader_stage < DRAW_MAX_SHADER_STAGE
but not >= 0. MSVC's prefast static analyzer reports C33010
(UNCHECKED_LOWER_BOUND_FOR_ENUMINDEX) when shader_stage is
subsequently used as an array subscript, since it cannot prove the
non-negative side of the bound. Extending the existing assert to
both sides silences the warning and is a real bound check.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41447>
The fix for this test was merged between the start and the merging of the
Vulkan CTS uprev MR.
Remove it from the fails list because it was already fixed.
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41571>
f4812dc1 introduced optimizations that turn ior into bcsel. The MSL
compiler will incorrectly compile the shader internally when bcsel is used,
leading to incorrect outputs. This commit adds a workaround that tricks
the MSL compiler into correctly compiling the shader internally.
Reviewed-by: squidbus <squidbus@proton.me>
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41548>
Metal provides device properties for the recommended maximum memory usage and
the current amount of memory used. These can be used to provide an estimate
of heap usage and calculate a budget of memory usage by the application before
performance may degrade.
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41523>
Also, re-title things to make it clear that the current text is about
implementing OpenGL[ES] extensions.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41400>
Unfortunately we have to disable concurrent binning by default
because it hurts performance in a number of desktop games without
any case where we know it helps.
There are fewer vertex fetch resources available in BV than in BR,
so when binning runs in BV and there are many attribute-heavy
vertices, BV performs much worse than BR, sometimes more
than 50% worse.
Even with worse performance, it wouldn't be bad if concurrent binning
actually overlapped with other workloads in those cases, but in desktop
games there is almost never a chance for overlap.
However, it's impossible to statically find out if binning on BV would
be much slower than on BR, and we also cannot statically predict if
there is enough overlap (if any) to cover the performance penalty.
Given the above, I don't see a way out but to make concurrent binning
opt-in via the `tu_allow_concurrent_binning` driconf toggle.
Still allow concurrent binning in CI to catch issues early.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41394>