radeonsi: Lower larger workgroups to 256 for CS regalloc bug

Even though radeonsi may not use compute queues, other processes
might run compute jobs in the background, so radeonsi must make
sure not to use workgroups with more than 256 invocations on GPUs
that are affected by the regalloc hang.
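
As a rough illustration of the condition described above, here is a
minimal standalone sketch. The helper name and the limit macro are
hypothetical and are not part of Mesa; only the 256 limit and the
product of the three workgroup dimensions come from this change.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for illustration; the commit hard-codes 256. */
#define CS_REGALLOC_MAX_WG_SIZE 256u

/* Returns true when a compute shader's flattened workgroup size
 * (x * y * z) exceeds the limit on a GPU affected by the hang,
 * i.e. when the workgroup would have to be lowered. */
static bool
needs_workgroup_lowering(const uint16_t workgroup_size[3], bool has_regalloc_hang_bug)
{
   const uint32_t wg_size = (uint32_t)workgroup_size[0] *
                            workgroup_size[1] *
                            workgroup_size[2];

   return has_regalloc_hang_bug && wg_size > CS_REGALLOC_MAX_WG_SIZE;
}
```

A 16x16x1 workgroup (256 invocations) would be left alone under this
check, while 32x32x1 (1024 invocations) would be lowered on affected
GPUs.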

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39288>
Timur Kristóf 2025-10-21 17:28:43 +02:00 committed by Marge Bot
parent d31b4451f2
commit 0961aba8a7

@@ -670,6 +670,22 @@ static void si_preprocess_nir(struct si_nir_shader_ctx *ctx)
   }
   if (mesa_shader_stage_is_compute(nir->info.stage)) {
      if (sel->screen->info.has_cs_regalloc_hang_bug) {
         const uint32_t wg_size = nir->info.workgroup_size[0] *
                                  nir->info.workgroup_size[1] *
                                  nir->info.workgroup_size[2];
         if (wg_size > 256) {
            si_nir_opts(sel->screen, nir, true);
            NIR_PASS(progress, nir, nir_lower_workgroup_size, 256);
            if (progress)
               si_nir_opts(sel->screen, nir, true);
            nir_shader_gather_info(nir, nir_shader_get_entrypoint(nir));
         }
      }
      /* gl_LocalInvocationIndex must be derived from gl_LocalInvocationID.xyz to make it correct
       * with quad derivatives. Using gl_SubgroupID for that (which is what we do by default) is
       * incorrect with a non-linear thread order.