radeonsi: Lower larger workgroups to 256 for CS regalloc bug

Even though radeonsi may not use compute queues, other processes might run compute jobs in the background, so radeonsi must not use workgroups larger than 256 invocations on GPUs that are affected by the CS regalloc hang.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39288>
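
The size check described in the message can be sketched in isolation. This is a minimal standalone illustration, not the driver code: the function name `wg_needs_lowering` and the macro `MAX_SAFE_WG_SIZE` are hypothetical, and only the limit of 256 and the product of the three workgroup dimensions are taken from the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; the value 256 is the limit stated in the commit. */
#define MAX_SAFE_WG_SIZE 256u

/* Returns true when the total number of invocations in a workgroup
 * (x * y * z) exceeds the limit on hardware affected by the hang.
 * On unaffected hardware no lowering is requested. */
static bool wg_needs_lowering(const uint16_t wg[3], bool has_regalloc_hang_bug)
{
   assert(wg != NULL);

   /* Widen before multiplying so the product is computed in 32 bits. */
   const uint32_t wg_size = (uint32_t)wg[0] * wg[1] * wg[2];

   return has_regalloc_hang_bug && wg_size > MAX_SAFE_WG_SIZE;
}
```

For example, a 32x32x1 workgroup (1024 invocations) would need lowering on an affected GPU, while a 16x16x1 workgroup (exactly 256 invocations) would not, because the comparison is strictly greater than 256.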
2026-05-06 09:28:07 +02:00 · 2025-10-21 17:28:43 +02:00 · 2025-10-21 17:28:43 +02:00 · 0961aba8a7
commit 0961aba8a7
parent d31b4451f2
1 changed file with 16 additions and 0 deletions
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -670,6 +670,22 @@ static void si_preprocess_nir(struct si_nir_shader_ctx *ctx)
   }

   if (mesa_shader_stage_is_compute(nir->info.stage)) {
+      if (sel->screen->info.has_cs_regalloc_hang_bug) {
+         const uint32_t wg_size = nir->info.workgroup_size[0] *
+                                  nir->info.workgroup_size[1] *
+                                  nir->info.workgroup_size[2];
+
+         if (wg_size > 256) {
+            si_nir_opts(sel->screen, nir, true);
+            NIR_PASS(progress, nir, nir_lower_workgroup_size, 256);
+
+            if (progress)
+               si_nir_opts(sel->screen, nir, true);
+
+            nir_shader_gather_info(nir, nir_shader_get_entrypoint(nir));
+         }
+      }
+
      /* gl_LocalInvocationIndex must be derived from gl_LocalInvocationID.xyz to make it correct
       * with quad derivatives. Using gl_SubgroupID for that (which is what we do by default) is
       * incorrect with a non-linear thread order.