This extension is part of Vulkan 1.2 core and the feature is already
exposed; we just weren't advertising the extension separately.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41624>
Android CTS CtsGpuProfilingDataTest#testProfilingDataProducersAvailable
intermittently fails with "Render stages reported before their
VkQueueSubmit events". Root cause is in the Perfetto clock correlation:
render-stage timestamps go through intel_device_info_timebase_scale()
while VkQueueSubmit packets use BOOTTIME directly, so any drift in the
scaler shows up as render stages preceding their submits.
intel_device_info_timebase_scale() scales the upper and lower halves
of the raw timestamp separately and recombines them, but silently
drops the upper-half division's remainder. When the frequency doesn't
evenly divide 1e9, every wrap past 2^32 loses a fixed number of ns
and shows up as a step in Perfetto's GPU-vs-BOOTTIME snapshot offset.
Carry the upper-half remainder into the lower-half numerator before
dividing, so no precision is lost. All intermediates still fit in
uint64_t.
Cc: mesa-stable
Signed-off-by: Nemallapudi, Jaikrishna <nemallapudi.jaikrishna@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41630>
This is just something I had to type again and again to figure out
layout bugs, maybe it's better to just write it once and leave it
disabled by default. Right now it's not printing nir_alu_type since the
printing functions are private in nir_print.c and I don't want to expose
that.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41304>
This pass fuses var loads + float conversions into the loads.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41304>
As discussed on gitlab, we should not convert before interpolation
at highp.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41304>
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41304>
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41304>
Since the change to use TXF instead of TXF_LZ, TXF takes sample
or lod on the final component, whereas before with TXF_LZ the
lod was ignored.
Don't pass in a sample if the input texture isn't multisampled.
KHR-GL46.packed_depth_stencil.blit.depth32f_stencil8
on nvk/zink
Fixes: 9440c0e1614a ("gallium/u_blitter: stop emitting TEX_LZ")
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41628>
This doesn't fix a known problem, but I spotted it in passing.
If copy_to_staging_dest fails then we end up using the
transformed 1D_ARRAY coords when we want the originals
in the other two paths.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41561>
We were entirely missing the local size in the accounting for CS
invocations.
Cc: mesa-stable
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41553>
We were not flushing sysmem and NVIDIA proprietary driver does.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: e1c1cdbd5f ("nvk: Implement vkCmdPipelineBarrier2 for real")
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41055>
Previously, we were not doing anything in case of
VK_QUEUE_FAMILY_FOREIGN_EXT while the proprietary driver would emit
invalidation or flush of sysmem caches.
This patch adds the missing handling while following what the
proprietary driver emit on various generations.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: e1c1cdbd5f ("nvk: Implement vkCmdPipelineBarrier2 for real")
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41055>
Passes the ACCESS_CAN_REORDER flag from NIR on to the backend so that we
can lower the loads to a non-volatile SEND. This allows the scheduler to
freely reorder them around stores or fences.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41008>
Our scheduler is overly conservative about reordering instructions
around memory writes or fences. Fortunately, there are several simple
assumptions we can make about our IR to schedule these things a lot
more fluidly:
* Unless its an EOT, a SEND instruction's side effects will only be
observed through other SEND instructions
* The effects of workgroup barriers, memory fences, and BRW_OPCODE_SYNC,
are only used in the IR to synchronize SEND instructions
* All other scheduler dependencies related to memory access are already
expressed through the source and destination operands
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41008>
`1 << 31` invokes signed shift UB. When the int result is assigned
to uint64_t, sign extension produces 0xFFFFFFFF80000000 (~18 EiB)
instead of the intended 0x80000000 (2 GiB).
Use 1ull << 31 to perform the shift in unsigned 64-bit type.
The 2 GiB value matches the surrounding finite cap values and
OpenCL minimum requirements, making the original intent clear.
Detected by UBSan with piglit.
Fixes: a65b74af51 ("llvmpipe: init shader and compute caps")
Signed-off-by: yserrr <dlwognsdc610@naver.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41464>
Intel uses nir_lower_bit_size to convert 8-bit integer values to 16-bit
for most instructions. By constant folding u2u8 or i2i8 through a bcsel,
this lowering is undone.
Fixes assertion failure in fossils/parallel-rdp/small_subgroup.foz.
fossilize-replay: src/intel/compiler/brw/brw_from_nir.cpp:852: void brw_from_nir_emit_alu(nir_to_brw_state&, nir_alu_instr*, bool): Assertion `brw_type_size_bytes(op[i].type) > 1' failed.
v2: Reject all integer conversions. Suggested by Daniel Schürmann.
Fixes: f4812dc11d ("nir/opt_constant_folding: constant-fold op(bcsel(), #c) -> bcsel(.., #c1, #c2)")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41412>
It seems like davinci resolve conflicts on those symbols and we got
regressions from our static libstdc++ linking workaround.
Cc: mesa-stable
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41488>
nir_intrinsic_index_size() expects a nir_intrinsic_index_flag, not
the position in the intrinsic's index list. This could cause
part of a multi-slot index to be ignored.
Fixes: b2bc57551a ("nir/instr_set: allow cse with fp_math_ctrl mismatches for intrinsics")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41593>
Update the headless Android WSI patch to fix intermittent timeout issues. It
now uses an ImageReader listener to actively drain and instantly release frames
from the buffer queue. This acts as a "null compositor" that prevents buffer
starvation while maintaining stable GPU backpressure.
This fixes dEQP-VK.wsi.android.maintenance1.* in newer VKCTS versions and
resolves the race conditions that caused occasional teardown crashes.
Also rebase build-deqp-gl_Build-Don-t-build-Vulkan-utilities-for-GL-builds.patch
on top of the updated WSI patch.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41541>
Timeline assignment has been asynchronous upon vkGetDeviceQueue2.
However, fence submission via vq can get ahead of it for things like
ring and vq synchronization. This change fixes vkGetDeviceQueue2 to be
synchronous instead, which is fine since it's off the critical path.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41582>
nir_def_init sets divergent = true, this means for something like
reduce(reduce(convergent)) we previously only optimized the inner
reduce.
No fossil changes at the moment, but I hit this when trying to
optimize shared memory to subgroup operations.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41542>
This was fundamentally broken for workgroup sizes >= 8x8.
This fixes new VKCTS coverage
dEQP-VK.glsl.texture_functions.texture.*_compute, and also few tests
from the vkd3d-proton testsuite (note that quad derivatives is
currently disabled for < GFX12).
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41483>
Otherwise, if a cmdbuf is recycled it would assume that a gang CS is
always is present even if it's not used. That means, it would emit
useless synchronization and use gang submit with a mostly empty gang
CS for nothing.
It seems better to create the gang CS on-demand only when it's strictly
required (for compute fallback with SDMA and task shaders). Even for
heavy uses of task shaders, that shouldn't hurt.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41543>
Some cooperative properties are defined by the driver itself and
are not a property of the HW. In particular whether the scope is
subgroup or workgroup is not directly related to the HW.
It could make sense encode the DPAS combinations into intel_device_info
but we are not using all possible combinations yet and wouldn't be very
useful in practice.
The new scheme was based on radv and will set us up for also filling
the flexible dimensions properties too.
Note: this also fixes a subtle issue where ARL was incorrectly inheriting
the PRE_XEHP configurations which included FLOAT16/FLOAT16/FLOAT16/FLOAT16
which it does not support.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41564>
Since we don't have any DPAS-based implementation of those, it is odd to
support them in the emulation mode that is only enabled with the debug
flag INTEL_LOWER_DPAS nowadays. Remove it.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41564>