When an exec write isn't used but writes other registers
besides exec, and also reads exec (such as s_and_saveexec),
we would mistakenly delete the previous instruction that
writes the exec value that this instruction uses.
No Fossil DB changes on Rembrandt (GFX10.3).
Fixes: 0211e66f65
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9036
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23576>
In order to prioritize the upcoming pre-merge jobs on VanGogh, we
limit all the post-merge jobs requiring VanGogh runners to the
low-priority pool of runners.
This will allow Marge to use any of the VanGogh runners we have (6),
while leaving 3 of them unused by post-merge jobs inside Mesa, or
DXVK-CI.
Suggested-by: Emma Anholt <emma@anholt.net>
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23548>
It's unclear why this is needed, but PAL uses RESET_FILTER_CAM
for some mesh shading draw packets:
- DISPATCH_MESH_INDIRECT_MULTI
- DISPATCH_TASKMESH_GFX
Let's do the same in radv.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23554>
Equivalent of 284e604872 but for acceleration structure queries.
If an app inserts a barrier between AS builds and writing AS properties,
we must respect it or things will blow up.
Cc: mesa-stable
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23568>
radeonsi switched over to CU wavefront execution mode, but didn't tell
LLVM. This can lead to shaders requiring too many VGPRs to be executed in
CU mode and so cause GPU resets.
Pass along +cumode to LLVM so it properly spills VGPRs.
Fixes: 9d7eab2ab1 ("radeonsi: don't enable WGP_MODE because of high cost of workgroup mem coherency")
Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23569>
PIPE_FUNC_ -> COMPARE_FUNC_
pipe_compare_func -> compare_func
Now include "pipe/p_state.h" is not needed and remove it in ac_nir.h
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23422>
This will be required for OpenCL subgroup support on radeonsi, but also
fixes some regressions today as radeonsi started to use the subgroup id
for invocation_index calculation.
Fixes: 39da12b7c7 ("ac/llvm: clean up visit_load_local_invocation_index and visit_load_subgroup_id")
Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23551>
Since radeonsi sets the alu_to_scalar callback, frontends like Rusticl
might end up generating vec2 b2i16. Support this just like it's done for
b2f16.
Fixes: d692d433f2 ("radeonsi: use nir_lower_alu_to_scalar correctly")
Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23551>
amdgpu enables gfxoff by default and the feature resets the RLC clock
counter on idle on raven/raven2. Querying AMDGPU_INFO_TIMESTAMP does
not work as expected on those platforms.
There was an attempt in amdgpu to read from the TSC register instead,
but it did not work without a firmware update[1]. Another possible
solution is to disable the clock counter reset by clearing
AMD_PG_SUPPORT_RLC_SMU_HS, but that causes a 0.2W increase of power
consumption on idle which is undesirable.
The clock counter reset affects vkCmdWriteTimestamp as well. The spec
is vague on whether that is allowed or not. The WG is aware of the
issue[2] but never really addresses it.
[1] https://lists.freedesktop.org/archives/amd-gfx/2023-May/093731.html
[2] https://github.com/KhronosGroup/Vulkan-Docs/issues/216
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23481>
nir_opt_uniform_atomics can create 64-bit ALU instructions which need to
be lowered.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23502>
This reduce LDS bank conflict and align with radeonsi,
so we don't assume LDS access 16 byte aligned for both
driver.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23314>
This reduce LDS bank conflict and align with radeonsi,
so we don't assume LDS access 16 byte aligned for both
driver.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23314>
This is done to better match the terminology used by the kernel
and also because the follower may not always be ACE in the future.
- Gang: a group of command streams that are submitted to
more than one HW queue at the same time.
- Leader: the main command stream of a command buffer that works
on the queue type of the command buffer.
- Follower: a command stream on a different HW queue that doesn't
have a separate command buffer state and is submitted together
with its leader.
During submission, a follower must always precede the leader in
the submitted command streams array.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23462>
The firmware can write an 8-bit output buffer, but still needs
a 10-bit dpb allocation.
This also puts the 8-bit format after the 10-bit format though
apps should be smart enough to pick the correct one.
Reviewed-by: Lynne <dev@lynne.ee>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23476>
With this patch, we compile separately
- general shaders (raygen, miss, callable)
- closest-hit shaders
- traversal shader (incl. all intersection / any-hit shaders)
Each shader uses the following scheme:
if (shader_pc == shader_va) {
<shader code>
}
next = select_next_shader(shader_va)
jump next
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22096>