radv,aco: wait for all VMEM loads when the prolog loads large 64-bit attributes

Not the most optimal solution but 64-bit vertex attributes are rarely
used. Could still revisit if we find a real use case that matters.

This fixes recent VKCTS coverage:
dEQP-VK.pipeline.fast_linked_library.vertex_input.component_mismatch.r64g64b64.*_to_dvec2
dEQP-VK.pipeline.shader_object_.*.vertex_input.component_mismatch.r64g64b64.*_to_dvec2

Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14243
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit a0d607bfdb)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
2026-01-06 21:50:11 +01:00 · 2025-11-04 11:34:21 +01:00 · 2025-11-04 11:34:21 +01:00 · e817b525d8
commit e817b525d8
parent 8eec239517
3 changed files with 14 additions and 1 deletions
--- a/.pick_status.json
+++ b/.pick_status.json
@@ -1704,7 +1704,7 @@
        "description": "radv,aco: wait for all VMEM loads when the prolog loads large 64-bit attributes",
        "nominated": true,
        "nomination_type": 1,
-        "resolution": 0,
+        "resolution": 1,
        "main_sha": null,
        "because_sha": null,
        "notes": null
--- a/src/amd/compiler/instruction_selection/aco_select_vs_prolog.cpp
+++ b/src/amd/compiler/instruction_selection/aco_select_vs_prolog.cpp
@@ -643,6 +643,14 @@ select_vs_prolog(Program* program, const struct aco_vs_prolog_info* pinfo, ac_sh
      continue_pc = Operand(prolog_input, s2);
   }

+   /* Wait for all pending VMEM loads when the prolog loads large 64-bit
+    * attributes because the vertex shader isn't required to consume all of
+    * them and they might be overwritten. This isn't the most optimal solution
+    * but 64-bit vertex attributes are rarely used.
+    */
+   if (is_last_attr_large)
+      wait_for_vmem_loads(bld);
+
   bld.sop1(aco_opcode::s_setpc_b64, continue_pc);

   program->config->float_mode = program->blocks[0].fp_mode.val;
--- a/src/amd/vulkan/radv_shader_args.c
+++ b/src/amd/vulkan/radv_shader_args.c
@@ -191,6 +191,11 @@ declare_vs_input_vgprs(enum amd_gfx_level gfx_level, const struct radv_shader_in
      unsigned num_attributes = util_last_bit(info->vs.input_slot_usage_mask);
      for (unsigned i = 0; i < num_attributes; i++) {
         ac_add_arg(&args->ac, AC_ARG_VGPR, 4, AC_ARG_VALUE, &args->vs_inputs[i]);
+
+         /* The vertex shader isn't required to consume all components that are loaded by the prolog
+          * and it's possible that more VGPRs are written. This specific case is handled at the end
+          * of the prolog which waits for all pending VMEM loads if needed.
+          */
         args->ac.args[args->vs_inputs[i].arg_index].pending_vmem = true;
      }
   }
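
The hazard the commit message and the two new comments describe can be illustrated with a toy model. This is only a sketch of one reading of the problem, not ACO or RADV code: `ToyWave`, `PendingLoad`, `wait_for`, `wait_for_all`, `run_toy_shader`, the register numbers and the values are all hypothetical. It assumes a load writes its destination VGPR only when it completes, and that the shader waits only for the registers it actually consumes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model (hypothetical, not ACO): a load is issued now but writes its
// destination VGPR only when it completes.
struct PendingLoad {
    unsigned dst_vgpr; // register the load will eventually write
    uint32_t value;    // data returned by memory
};

struct ToyWave {
    std::array<uint32_t, 8> vgpr{};
    std::vector<PendingLoad> in_flight;

    void issue_load(unsigned dst, uint32_t value) {
        assert(dst < vgpr.size());
        in_flight.push_back({dst, value});
    }

    // Complete only the loads that target one register, as a shader that
    // tracks pending loads per consumed argument would.
    void wait_for(unsigned dst) {
        std::vector<PendingLoad> still_pending;
        for (const PendingLoad& l : in_flight) {
            if (l.dst_vgpr == dst)
                vgpr[dst] = l.value;
            else
                still_pending.push_back(l);
        }
        in_flight = still_pending;
    }

    // Complete every outstanding load (stand-in for a full VMEM wait).
    void wait_for_all() {
        for (const PendingLoad& l : in_flight)
            vgpr[l.dst_vgpr] = l.value;
        in_flight.clear();
    }
};

// The "prolog" loads into v4 and v5; the "shader" consumes only v4 and
// reuses v5 as a temporary. Returns the final content of v5.
uint32_t run_toy_shader(bool prolog_waits_for_all) {
    ToyWave w;
    w.issue_load(4, 0x1111u);
    w.issue_load(5, 0x2222u); // component the shader never reads
    if (prolog_waits_for_all)
        w.wait_for_all();

    w.wait_for(4);        // shader waits only for what it consumes
    w.vgpr[5] = 0xAAAAu;  // shader writes its own temporary to v5
    w.wait_for_all();     // any load still in flight completes late
    return w.vgpr[5];
}
```

In this model, `run_toy_shader(false)` ends with the late load's data in v5 instead of the shader's temporary, while `run_toy_shader(true)` keeps the temporary intact. The cost of the unconditional wait is that the prolog no longer overlaps those loads with the start of the shader, which is the trade-off the commit message accepts for rarely used 64-bit attributes.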