nir/lower_int64: Fix [iu]mul_high handling

e551040c60, which added a new mechanism for 64-bit imul that is more
efficient on BDW and later Intel hardware, also introduced a bug where we
weren't properly walking both X and Y. No idea how testing didn't find
this.

Fixes: e551040c60 ("nir/glsl: Add another way of doing lower_imul64 for gen8+")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6306
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15829>
2025-12-25 19:30:11 +01:00 · 2022-04-08 15:06:11 -05:00 · 2022-04-08 15:06:11 -05:00 · d0ace28790
commit d0ace28790
parent 48ae404b42
1 changed file with 2 additions and 2 deletions
--- a/src/compiler/nir/nir_lower_int64.c
+++ b/src/compiler/nir/nir_lower_int64.c
@@ -455,7 +455,7 @@ lower_mul_high64(nir_builder *b, nir_ssa_def *x, nir_ssa_def *y,
   for (unsigned i = 0; i < 4; i++) {
      nir_ssa_def *carry = NULL;
      for (unsigned j = 0; j < 4; j++) {
-         /* The maximum values of x32[i] and y32[i] are UINT32_MAX so the
+         /* The maximum values of x32[i] and y32[j] are UINT32_MAX so the
          * maximum value of tmp is UINT32_MAX * UINT32_MAX.  The maximum
          * value that will fit in tmp is
          *
@@ -466,7 +466,7 @@ lower_mul_high64(nir_builder *b, nir_ssa_def *x, nir_ssa_def *y,
          * so we're guaranteed that we can add in two more 32-bit values
          * without overflowing tmp.
          */
-         nir_ssa_def *tmp = nir_umul_2x32_64(b, x32[i], y32[i]);
+         nir_ssa_def *tmp = nir_umul_2x32_64(b, x32[i], y32[j]);

         if (res[i + j])
            tmp = nir_iadd(b, tmp, nir_u2u64(b, res[i + j]));
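
For readers who want to see why the inner index matters, the following is a minimal scalar sketch of the same schoolbook scheme in plain C. It is not the NIR code: `umul_high64_sketch` is a hypothetical helper, it handles only the unsigned case, and it uses two 32-bit limbs per operand rather than the four iterations seen in the loops above. Every limb of x has to be multiplied by every limb of y, which is why the product must use `y32[j]` and not `y32[i]`.

```c
#include <stdint.h>

/* Hypothetical illustration, not part of Mesa: returns the high 64 bits
 * of the unsigned 128-bit product x * y using only 32-bit limbs and
 * 32x32 -> 64-bit multiplies.
 */
static uint64_t
umul_high64_sketch(uint64_t x, uint64_t y)
{
   uint32_t x32[2] = { (uint32_t)x, (uint32_t)(x >> 32) };
   uint32_t y32[2] = { (uint32_t)y, (uint32_t)(y >> 32) };
   uint32_t res[4] = { 0, 0, 0, 0 };

   for (unsigned i = 0; i < 2; i++) {
      uint32_t carry = 0;
      for (unsigned j = 0; j < 2; j++) {
         /* x32[i] * y32[j] is at most (2^32 - 1)^2 = 2^64 - 2^33 + 1,
          * so adding two more 32-bit values cannot overflow tmp.
          */
         uint64_t tmp = (uint64_t)x32[i] * y32[j];
         tmp += res[i + j];
         tmp += carry;

         res[i + j] = (uint32_t)tmp;      /* low half stays in this column */
         carry = (uint32_t)(tmp >> 32);   /* high half moves to the next */
      }
      /* Column i + 2 has not been written yet during this row. */
      res[i + 2] = carry;
   }

   return ((uint64_t)res[3] << 32) | res[2];
}
```

With `y32[i]` in place of `y32[j]`, the cross terms `x32[0] * y32[1]` and `x32[1] * y32[0]` would never be computed, and the diagonal terms would each be added twice.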