Loosely based on RadeonSI.
This is supposed to be faster because parsing the packet header seems
to be the main bottleneck on GFX12.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35282>
Scalar block layout doesn't allow anything that our memory load/store vectorizer
couldn't create on its own. So I assume whatever reason there was to only
expose this feature on GFX7+ was incorrect or ended up being fixed.
Passes vkcts in CI on tahiti.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35279>
Looks like it's actually also affected by the memory explosion caused
by zerovram alloc by default in AMDGPU. Though it's very random,
sometimes the job will finish in 40 minutes, sometimes it needs more
than 1h15m. Let's bump the timeout because it's a post-merge job.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35157>
"shift exponent 1020 is too large for 32-bit type 'unsigned int'" with
madmax/25b8180e05220b8c and UBSan
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35255>
Change how we package kernel modules: instead of storing them in
.tar.zst archives with uncompressed .ko files inside, we now compress
each .ko file individually with ZSTD and bundle them into a plain tar
archive.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35129>
This mirrors AMDVLK. 128-byte alignment is possible, but DOOM: The Dark
Ages screws up scratch allocation with alignments <256 bytes.
Fixes hangs in DOOM: The Dark Ages.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35152>
On Navi33, certain box sorting modes combined with infinity/-infinity in
the child AABBs cause image_bvh64_intersect_ray to return garbage node
pointers.
To avoid this, convert infinity to the maximum representable
floating-point value, which will still intersect with any non-inf ray.
Fixes consistent hangs in DOOM: The Dark Ages.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35254>
This now includes image_msaa_load and the new atomic instructions in
GFX12.
It also treats point sample accelerated MIMG as either sample or load,
like the waitcnt insertion pass. I'm not sure if that's necessary or not,
though.
No fossil-db changes (gfx1201, gfx1150 and navi31).
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35235>
LLVM commit 62dea99a7d7df9daedbb86133f3d46699cd2728d made this instruction
a sample for all GFX levels, then with f898161bfa95723954a273a519180e070a5ccd2e
it was changed to be GFX12+. Now 34b6285735c999d2fab77b0ff8e5b497d86df3af
changed it to be all GFX levels again.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35235>
This try to mitigate the HiZ GPU hang by increasing a timeout. Loosely
based on PAL but I can confirm it delays the hang when
BOTTOM_OF_PIPE_TS is used as a workaround.
This must be emitted when the GFX queue is idle.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35212>
With older FW this needs to be always enabled, but it can now be
disabled when using the new separate header instructions for
dependent_slice_segment_flag and slice_segment_address.
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35072>
Vector-aligned operands are handled by temporarily allocating
a vector-SSA value for the duration of the instruction.
On completion of the register assignment, the individual
operands are assigned to the reserved register space and,
if necessary, parallelcopies are emitted.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
The sparse image VA needs to be returned to the application for replay.
Reported by Baldur.
VKCTS has coverage but it doesn't verify this yet.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35162>
register shadow enabling for user queue is different code flow than
kernel queue. In case of kernel queue preamble ib is initialized which
is not requried for kernel queue.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34803>
valid_child_count_minus_one is 15 for box nodes without child so every
child was considered valid which made the code read invalid data and use
that for addressing.
Fixes: 2d48b2c ("radv: Use subgroup OPs for BVH updates on GFX12")
Closes: #13217
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35119>