Since radeonsi sets the alu_to_scalar callback, frontends like Rusticl
might end up generating vec2 b2i16. Support this just like it's done for
b2f16.
Fixes: d692d433f2 ("radeonsi: use nir_lower_alu_to_scalar correctly")
Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23551>
amdgpu enables gfxoff by default and the feature resets the RLC clock
counter on idle on raven/raven2. Querying AMDGPU_INFO_TIMESTAMP does
not work as expected on those platforms.
There was an attempt in amdgpu to read from the TSC register instead,
but it did not work without a firmware update[1]. Another possible
solution is to disable the clock counter reset by clearing
AMD_PG_SUPPORT_RLC_SMU_HS, but that causes a 0.2W increase of power
consumption on idle which is undesirable.
The clock counter reset affects vkCmdWriteTimestamp as well. The spec
is vague on whether that is allowed or not. The WG is aware of the
issue[2] but never really addresses it.
[1] https://lists.freedesktop.org/archives/amd-gfx/2023-May/093731.html
[2] https://github.com/KhronosGroup/Vulkan-Docs/issues/216
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23481>
nir_opt_uniform_atomics can create 64-bit ALU instructions which need to
be lowered.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23502>
This reduce LDS bank conflict and align with radeonsi,
so we don't assume LDS access 16 byte aligned for both
driver.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23314>
This reduce LDS bank conflict and align with radeonsi,
so we don't assume LDS access 16 byte aligned for both
driver.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23314>
This is done to better match the terminology used by the kernel
and also because the follower may not always be ACE in the future.
- Gang: a group of command streams that are submitted to
more than one HW queue at the same time.
- Leader: the main command stream of a command buffer that works
on the queue type of the command buffer.
- Follower: a command stream on a different HW queue that doesn't
have a separate command buffer state and is submitted together
with its leader.
During submission, a follower must always precede the leader in
the submitted command streams array.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23462>
The firmware can write an 8-bit output buffer, but still needs
a 10-bit dpb allocation.
This also puts the 8-bit format after the 10-bit format though
apps should be smart enough to pick the correct one.
Reviewed-by: Lynne <dev@lynne.ee>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23476>
With this patch, we compile separately
- general shaders (raygen, miss, callable)
- closest-hit shaders
- traversal shader (incl. all intersection / any-hit shaders)
Each shader uses the following scheme:
if (shader_pc == shader_va) {
<shader code>
}
next = select_next_shader(shader_va)
jump next
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22096>
We can remove the LLVM 13 Wave32 discard workaround and
SI_PROFILE_IGNORE_LLVM13_DISCARD_BUG that disabled the workaround.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23471>
The demote emulation can be removed, and FS_CORRECT_DERIVS_AFTER_KILL
can be removed because it's always enabled on LLVM >= 13.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23471>