"Native" is relative for UBOs. On the one hand, we don't loop in the
shader for non-uniform UBO access ever. On the other hand, uniformity
does affect UBOs on Turing since we can only use bindless UBOs if the
handle (and therefore the loaded descriptor) are uniform. But if it's
non-uniform, we fall back to ld.constant which is pretty fast. On
Volta and earlier where we don't have bindless UBOs, we use ld.uniform
or ld.ci which are just as fast uniform as non. On all hardware,
non-constant UBO indexing prevents cbuf promotion so that's always
slower no matter what.
The moral of the story is that "native" non-uniform for UBOs is a
nonsense anyway and we should just set NonUniformIndexingNative so we
don't scare apps into doing something silly. The proprietary driver
claims native non-uniform UBOs as well.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35268>
The UBO lowering did nothing because nir_lower_non_uniform_access
doesn't handle load_deref. For texture and storage image lowering,
nir_lower_non_uniform_access handles bindless handles just as well as
derefs. For textures, it's probably better this way anyway because we
combine the image and sampler into a single handle in
nvk_nir_lower_descriptors() and this way nir_lower_non_uniform_access()
will generate a loop on a single 32-bit handle instead of multiple array
indices.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35268>
This now includes image_msaa_load and the new atomic instructions in
GFX12.
It also treats point sample accelerated MIMG as either sample or load,
like the waitcnt insertion pass. I'm not sure if that's necessary or not,
though.
No fossil-db changes (gfx1201, gfx1150 and navi31).
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35235>
LLVM commit 62dea99a7d7df9daedbb86133f3d46699cd2728d made this instruction
a sample for all GFX levels, then with f898161bfa95723954a273a519180e070a5ccd2e
it was changed to be GFX12+. Now 34b6285735c999d2fab77b0ff8e5b497d86df3af
changed it to be all GFX levels again.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35235>
This try to mitigate the HiZ GPU hang by increasing a timeout. Loosely
based on PAL but I can confirm it delays the hang when
BOTTOM_OF_PIPE_TS is used as a workaround.
This must be emitted when the GFX queue is idle.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35212>
Streamline the conditions for when `RESET_NOTIFICATION_STRATEGY_EXT` can
be queried to match the conditions when it can be set - notably only
with GLES.
While on it, add support to query the KHR and suffix-less versions.
Cc: mesa-stable
Signed-off-by: Robert Mader <robert.mader@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35242>
They are the only APIs supported these days and, most likely,
going forward.
Cc: mesa-stable
Signed-off-by: Robert Mader <robert.mader@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35242>
When we have partial VGRF MOVs with offsets, we will reach
`channels_remaining == 0` with `inst` that is not writing the whole VGRF.
Currently, even though we check `can_coalesce_vars()` for each offset
separately, it will always check if the dst value is not changed only
for the offset from the instruction that satisfied the
`channels_remaining == 0` condition.
Instead, we should remember and use the correct instruction for each
written offset separately.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10916
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35062>
This matches what CUDA does. This makes Unigine Heaven go about 4x
faster on my GTX 750 Ti when run with NVK_DEBUG=no_cbuf (to force all
UBO loads down the global memory path).
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35265>
We just sort of YOLO'd it before, with no real plan. But it passed all
the tests so it never cared. It turns out the cache ops on Maxwell are
mostly the same as the ones we already added to Kepler, we just need to
encode them. The only big difference is that we no longer need to avoid
the L1 cache on Maxwell as it's either coherent or disabled in hardware
for global memory (I don't know which).
The only substantive change this MR makes is that images are now using
.ca by default rather than .cg. However, this is the same choice we're
currently making for global access and it still passes all the memory
model tests so it should be okay.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35265>
We never actually create any MemScope::System instructions anymore, but
it's worth handling it now just so we don't forget.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35265>
We're about to change the way that lcssa is constructed, and we won't
be able to conclude that there are no lcssa phis based on this pass'
progress.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34912>
Create svga_framebuffer_state as a subclass of pipe_framebuffer_state.
This contains pointers to svga_surface objects which correspond to
pipe_framebuffer_state's surfaces.
Replace pipe_surface with svga_surface in many functions.
Stop using deprecated util_framebuffer_init() function.
See https://gitlab.freedesktop.org/mesa/mesa/-/issues/13262
Signed-off-by: Brian Paul <brian.paul@broadcom.com>x
Reviewed-by: Neha Bhende <neha.bhende@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35238>
The util_copy_framebuffer_state() function copies the width, height,
nr_cbufs fields.
Also move a loop variable.
Signed-off-by: Brian Paul <brian.paul@broadcom.com>
Reviewed-by: Neha Bhende <neha.bhende@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35238>
Applications using Mesa built with LLVM 20.1.4 fail to start with
strange segmentfaults/bus errors when radeonsi driver is used. The last
piece of stacktrace looks like
- pipe_reference_described
- pipe_reference
- radeon_bo_reference
- radeon_ws_bo_reference
- radeon_lookup_or_add_real_buffer
Coredump shows the pointer dst passed to pipe_reference_described() is
either unaligned or even invalid, which is the reason of crashing. The
crash goes away when Mesa is built without optimization.
Looking through the related functions, it's found that
radeon_ws_bo_reference() contains unsafe type cast from radeon_bo to
pb_buffer_lean: though the former's first field is just the later, this
violates strict aliasing rules as pb_buffer_lean isn't compatible with
radeon_bo. Such violation ultimately results in miscompilation.
Let's take the address of pb_buffer_lean field, avoiding the unsafe
cast. It's still required to cast pb_buffer_lean back to radeon_bo since
radeon_bo_reference may update the pointer, which is safe as radeon_bo
contains a pb_buffer_lean member and C language permits access members
through a pointer in type of the container.
Fixes: 6d913a2bcc ("r300,r600,radeonsi: switch to pb_buffer_lean")
Link: https://www.gnu.org/software/c-intro-and-ref/manual/html_node/Aliasing-Type-Rules.html
Signed-off-by: Yao Zi <ziyao@disroot.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35249>
With older FW this needs to be always enabled, but it can now be
disabled when using the new separate header instructions for
dependent_slice_segment_flag and slice_segment_address.
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35072>