Since 31af989270 ("nir/lower_continue_constructs: Simplify loops before
lowering continue constructs"), we never ingest loops with continues. That lets
us delete a bunch of now dead code (and outdated comments) around control flow.
This patch is part of the treewide effort to improve loops in NIR. I already
sent the Intel patch earlier this week and this weekend hit delete here too.
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40609
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenz.ca>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Tested-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40690>
struct nu_handle is hashed and deduplicated using struct nu_handle_key, which ignored
parent_deref. That means all instructions will use the first parent_deref when rewriting
the sources.
Avoid this by not including the parent deref in the struct, and instead querying it
when needed.
Fixes: 4d09cd7fa5 ("nir/lower_non_uniform_access: Group accesses using the same resource")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15173
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40654>
Initially I'd added the offset to make things match up to blob driver on
x2-85/a840. But this gets in the way on parts with smaller GMEM.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40655>
It didn't save states properly. The only correct place to save them is
si_blitter_begin. Unfortunately, we can't skip saving and restoring
those states because we don't know in advance whether the rectangle path
will be used.
Cc: mesa-stable
Reviewed-by: Pierre-Eric
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40634>
This should be faster because 2 triangles are inefficient on the diagonal,
generating helper invocations and potentially extra memory loads from dst
because tiles aren't fully covered.
Reviewed-by: Pierre-Eric
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40633>
The most egregious case was AS updates, in which case radv_copy_memory
would decide to use compute, which overwrites the bound pipeline with
a copy shader. Subsequent dispatches assumed the update pipeline to be
bound, but dispatched another copy shader instead.
There is also a chance of this happening for geometry info copying for
RRA, so add another pass for that.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39985>
The dirty state of stencil ops is not checked when deciding whether to
rebuild the ISP state, although the values are part of the ISP state
(the 27:16 bits of ISPB word).
Add MESA_VK_DYNAMIC_DS_STENCIL_OP to the condition for rebuilding ISP
control registers.
Fixes GLCTS tests when running on top of Zink:
dEQP-GLES2.functional.fragment_ops.stencil.zero_stencil_fail
Fixes: 88f1fad3f7 ("pvr: Use common pipeline & dynamic state frameworks")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40623>
When running GLES2 conformance tests with Zink on the PowerVR driver, I
found that the PowerVR driver has the same kind of weird behavior of not
ignoreing wrap mode for seamless cubes with Apple AGX (See !21978 for
the description of the quirk on AGX).
As GLES2 exposes non-seamless cubes, exposing non-seamless cube support
at PowerVR help seems to help lot about these GLES2 tests. Implementing
full GLES 3 and relying on the workaround for AGX is another choice, but
it's still too far.
Implementing non-seamless cube seems to be as easy as setting a bit in
the sampler control word, so do it.
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40660>
Use KGSL_CONTEXT_NO_FAULT_TOLERANCE to push context into an error state
when a GPU fault is detected. This is useful when dealing with replays of
captures that are producing a GPU fault but might seem to replay just fine
because of the KGSL kernel fault tolerance.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40667>
This allows us to avoid dirtying all of the state for user compute
dispatches when we run a precomp shader.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37970>
The VK_KHR_present_wait extension contains no functionality to announce
(the lack of) support for vkWaitForPresentKHR() on a WSI (or WSI-bound
object) granularity.
On any driver advertising that extension and the headless WSI, the
application will expect vkWaitForPresentKHR() to be usable with the
headless WSI, which leads to assertion failure in debug Mesa builds or
crash in release builds.
Create a trivial wait_for_present implementation for the headless WSI,
which just assumes the image is immediately presented at the time of
queue_present is called, so it only checks the WSI present semaphore.
Tested with `dEQP-VK.wsi.headless.present_id_wait.wait.*` on RADV
without any failures.
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40347>
Currently the wsi_headless_surface_create_swapchain() function abuses
the corresponding destroy function to perform cleanup operations when
any failure happens during images creation. This practice sounds
fragile and prevents further changes to the swapchain creation
procedure.
Implement a proper cleanup sequence to reverse all operations.
As another cleanup codepath above already contains call of vk_free(),
the call is changed to a goto targetting the corresponding label.
Regression tested with `dEQP-VK.wsi.headless.swapchain.simulate_oom.*`
on RADV.
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40347>
Just a bit cleaner, and we can unify point size too.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40677>
This could've saved me a lot of time debugging stack corruption.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40677>
The state uploader was hardcoded to 4096 bytes, which doesn't fill the
full page on systems with 16KB pages. Use devinfo->page_size instead so
the uploader default matches the actual allocation granularity.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40496>
The variable doesn't store a granularity specific to CLE buffers. It
stores the granularity that the OS imposes on buffer allocations (that
is, the OS page size). Therefore, rename the variable to best reflect
its meaning.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40496>
Previously, each sampler view allocated a dedicated BO for its,
TEXTURE_SHADER_STATE packet (~24 bytes), which got rounded up to a
full 4KB page. This wastes memory and inflates the per-job BO handle
count.
Use u_upload_alloc_ref() to sub-allocate texture shader state from the
shared state_uploader, matching the pattern already used by image views.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40496>
From the documentation, the state uploader should be used inside the
driver for long-term state inside buffers, while the stream uploader
should be used by Gallium's internals. Considering that the image view
texture shader state can be considered long-lived state data, use
`state_uploader` instead of `uploader` for consistency.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40496>
Searching device->queues only according to queueIndex and queueFamilyIndex
could cause this issue: if there are two queues A and B created with same
queueIndex and queueFamilyIndex but different flags. When user try to get
B but vk_foreach_queue loop return A when it get A and find it have the
request queueIndex and queueFamilyIndex.
So this add a check of queue flags and return the queue with matching
flags, queueIndex and queueFamilyIndex.
Signed-off-by: Julia Zhang <Julia.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40669>
Because there is no way to know where the address has been allocated
(GTT or VRAM), the existing entrypoints aren't dropped and the sparse
bit is derived from VK_ADDRESS_COMMAND_FULLY_BOUND_BIT_KHR.
It would be nice to figure out if the CP DMA vs compute heuristic for
GTT BOs on dGPUs could be removed to simplify this implementation.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40386>
This doubles vkoverhead's draw_16vattrib_change_dynamic performance.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40603>
Only use LDS for VGPR spilling if we can use addtid access, to avoid having a VGPR addr.
Limit to single wave workgroups, to avoid needing the wave_id for the offset.
If we have a scratch stack pointer, don't use LDS at all.
Limit LDS spilling to not reduce occupancy further.
Note that in theory, this can still limit occupancy of other shaders running
on the CU at the same time, but that's unlikely and impossible to know at this point.
Removes all scratch usage in emulated FSR4 and parallel_rdp.
Besides that, only a single GoW shader is affected.
Foz-DB Navi31:
Totals from 9 (0.01% of 114641) affected shaders:
Instrs: 68863 -> 68830 (-0.05%); split: -0.07%, +0.02%
CodeSize: 416108 -> 416000 (-0.03%); split: -0.05%, +0.02%
LDS: 2048 -> 45056 (+2100.00%)
Scratch: 261888 -> 220672 (-15.74%)
Latency: 727951 -> 657155 (-9.73%); split: -9.73%, +0.00%
InvThroughput: 418644 -> 383269 (-8.45%)
VClause: 1506 -> 1200 (-20.32%)
Copies: 10651 -> 10624 (-0.25%)
VALU: 48700 -> 48684 (-0.03%)
SALU: 6200 -> 6199 (-0.02%); split: -0.05%, +0.03%
VMEM: 4139 -> 3589 (-13.29%)
VOPD: 580 -> 574 (-1.03%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36367>
PyTorch Conv2d without explicit bias produces a NULL bias_tensor
in the Gallium pipe_ml_operation. Guard against NULL dereferences
in two places:
- ethosu_lower.c: pass NULL to fill_coefs when bias_tensor is NULL
- ethosu_coefs.c: treat missing biases as zero
Fixes crashes when running Conv2d models without bias through the
Ethos-U NPU backend.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40578>
Add ethosu_ml_subgraph_deserialize() which reconstructs a subgraph
from a serialized byte buffer. Parses the header (cmdstream size,
coefs size, io size, tensors size), restores the tensor array,
cmdstream, and coefficient buffers.
DRM buffer object creation is deferred to prepare_for_submission()
which is called lazily on first invoke.
Wire pctx->ml_subgraph_deserialize in ethosu_create_context().
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40578>