Vulkan 1.4 raises the minimum for maxPushConstantSize to 256, and given
that we intend on supporting 1.4 eventually and the change is very simple
might as well do it now.
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35191>
All the Malis in existence out there support at most 64 user-supplied FAUs.
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35191>
"Native" is relative for UBOs. On the one hand, we don't loop in the
shader for non-uniform UBO access ever. On the other hand, uniformity
does affect UBOs on Turing since we can only use bindless UBOs if the
handle (and therefore the loaded descriptor) are uniform. But if it's
non-uniform, we fall back to ld.constant which is pretty fast. On
Volta and earlier where we don't have bindless UBOs, we use ld.uniform
or ld.ci which are just as fast uniform as non. On all hardware,
non-constant UBO indexing prevents cbuf promotion so that's always
slower no matter what.
The moral of the story is that "native" non-uniform for UBOs is a
nonsense anyway and we should just set NonUniformIndexingNative so we
don't scare apps into doing something silly. The proprietary driver
claims native non-uniform UBOs as well.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35268>
The UBO lowering did nothing because nir_lower_non_uniform_access
doesn't handle load_deref. For texture and storage image lowering,
nir_lower_non_uniform_access handles bindless handles just as well as
derefs. For textures, it's probably better this way anyway because we
combine the image and sampler into a single handle in
nvk_nir_lower_descriptors() and this way nir_lower_non_uniform_access()
will generate a loop on a single 32-bit handle instead of multiple array
indices.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35268>
This now includes image_msaa_load and the new atomic instructions in
GFX12.
It also treats point sample accelerated MIMG as either sample or load,
like the waitcnt insertion pass. I'm not sure if that's necessary or not,
though.
No fossil-db changes (gfx1201, gfx1150 and navi31).
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35235>
LLVM commit 62dea99a7d7df9daedbb86133f3d46699cd2728d made this instruction
a sample for all GFX levels, then with f898161bfa95723954a273a519180e070a5ccd2e
it was changed to be GFX12+. Now 34b6285735c999d2fab77b0ff8e5b497d86df3af
changed it to be all GFX levels again.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35235>
This try to mitigate the HiZ GPU hang by increasing a timeout. Loosely
based on PAL but I can confirm it delays the hang when
BOTTOM_OF_PIPE_TS is used as a workaround.
This must be emitted when the GFX queue is idle.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35212>
In newer Android versions, SurfaceFlinger uses Vulkan by default,
so `dumpsys SurfaceFlinger` no longer reveals the GLES implementation.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35232>
Streamline the conditions for when `RESET_NOTIFICATION_STRATEGY_EXT` can
be queried to match the conditions when it can be set - notably only
with GLES.
While on it, add support to query the KHR and suffix-less versions.
Cc: mesa-stable
Signed-off-by: Robert Mader <robert.mader@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35242>
They are the only APIs supported these days and, most likely,
going forward.
Cc: mesa-stable
Signed-off-by: Robert Mader <robert.mader@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35242>
When we have partial VGRF MOVs with offsets, we will reach
`channels_remaining == 0` with `inst` that is not writing the whole VGRF.
Currently, even though we check `can_coalesce_vars()` for each offset
separately, it will always check if the dst value is not changed only
for the offset from the instruction that satisfied the
`channels_remaining == 0` condition.
Instead, we should remember and use the correct instruction for each
written offset separately.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10916
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35062>
This matches what CUDA does. This makes Unigine Heaven go about 4x
faster on my GTX 750 Ti when run with NVK_DEBUG=no_cbuf (to force all
UBO loads down the global memory path).
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35265>
We just sort of YOLO'd it before, with no real plan. But it passed all
the tests so it never cared. It turns out the cache ops on Maxwell are
mostly the same as the ones we already added to Kepler, we just need to
encode them. The only big difference is that we no longer need to avoid
the L1 cache on Maxwell as it's either coherent or disabled in hardware
for global memory (I don't know which).
The only substantive change this MR makes is that images are now using
.ca by default rather than .cg. However, this is the same choice we're
currently making for global access and it still passes all the memory
model tests so it should be okay.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35265>
We never actually create any MemScope::System instructions anymore, but
it's worth handling it now just so we don't forget.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35265>
This variable appeared twice in debian-testing-msan, removing the first
one because it is probably being overridden by the second one.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35251>
We're about to change the way that lcssa is constructed, and we won't
be able to conclude that there are no lcssa phis based on this pass'
progress.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34912>