radv,aco: wait for all VMEM loads when the prolog loads large 64-bit attributes

This isn't the optimal solution, but 64-bit vertex attributes are rarely
used. It can be revisited if we find a real use case that matters.

This fixes recent VKCTS coverage:

dEQP-VK.pipeline.fast_linked_library.vertex_input.component_mismatch.r64g64b64.*_to_dvec2
dEQP-VK.pipeline.shader_object_.*.vertex_input.component_mismatch.r64g64b64.*_to_dvec2

Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14243
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit a0d607bfdb)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38432>
Samuel Pitoiset 2025-11-04 11:34:21 +01:00 committed by Dylan Baker
parent 8eec239517
commit e817b525d8
3 changed files with 14 additions and 1 deletions

@@ -1704,7 +1704,7 @@
 "description": "radv,aco: wait for all VMEM loads when the prolog loads large 64-bit attributes",
 "nominated": true,
 "nomination_type": 1,
-"resolution": 0,
+"resolution": 1,
 "main_sha": null,
 "because_sha": null,
 "notes": null

@@ -643,6 +643,14 @@ select_vs_prolog(Program* program, const struct aco_vs_prolog_info* pinfo, ac_sh
       continue_pc = Operand(prolog_input, s2);
    }
+   /* Wait for all pending VMEM loads when the prolog loads large 64-bit
+    * attributes because the vertex shader isn't required to consume all of
+    * them and they might be overwritten. This isn't the most optimal solution
+    * but 64-bit vertex attributes are rarely used.
+    */
+   if (is_last_attr_large)
+      wait_for_vmem_loads(bld);
    bld.sop1(aco_opcode::s_setpc_b64, continue_pc);
    program->config->float_mode = program->blocks[0].fp_mode.val;

@@ -191,6 +191,11 @@ declare_vs_input_vgprs(enum amd_gfx_level gfx_level, const struct radv_shader_in
       unsigned num_attributes = util_last_bit(info->vs.input_slot_usage_mask);
       for (unsigned i = 0; i < num_attributes; i++) {
          ac_add_arg(&args->ac, AC_ARG_VGPR, 4, AC_ARG_VALUE, &args->vs_inputs[i]);
+         /* The vertex shader isn't required to consume all components that are loaded by the prolog
+          * and it's possible that more VGPRs are written. This specific case is handled at the end
+          * of the prolog which waits for all pending VMEM loads if needed.
+          */
+         args->ac.args[args->vs_inputs[i].arg_index].pending_vmem = true;
       }
    }
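
The hazard described in the two comments above can be illustrated with a small standalone toy model. This is only a sketch of one plausible reading of the problem, not Mesa code: `ToyLoad`, `ToyGpu` and `run_toy` are hypothetical names invented for illustration, and the register numbers and values are arbitrary. It assumes that a prolog load for a large 64-bit attribute writes more VGPRs than the 4 declared for the input, and that the main shader, which never waits on the extra ones, reuses one of them as scratch before the load lands.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy types, not part of ACO or RADV.
struct ToyLoad {
   unsigned first_vgpr; // first VGPR written by the load
   unsigned num_vgprs;  // number of consecutive VGPRs written
   uint32_t value;      // value written to each VGPR when the load lands
};

struct ToyGpu {
   std::array<uint32_t, 16> vgprs{};
   std::vector<ToyLoad> in_flight; // issued, results not yet written

   void issue(const ToyLoad& load) { in_flight.push_back(load); }

   // Stand-in for waiting until every outstanding VMEM load has landed.
   void wait_all()
   {
      for (const ToyLoad& load : in_flight)
         for (unsigned i = 0; i < load.num_vgprs; i++)
            vgprs[load.first_vgpr + i] = load.value;
      in_flight.clear();
   }
};

// Returns the final content of v4, which the toy shader uses as scratch.
uint32_t run_toy(bool wait_at_end_of_prolog)
{
   ToyGpu gpu;

   // Toy prolog: the input is declared as 4 VGPRs (v0..v3), but the load
   // for a 3-component 64-bit attribute writes 6 VGPRs (v0..v5).
   gpu.issue({0, 6, 0xAAAAAAAAu});
   if (wait_at_end_of_prolog)
      gpu.wait_all();

   // Toy shader: it does not consume v4/v5, so it never waits on them
   // and is free to reuse v4 as a temporary.
   gpu.vgprs[4] = 42;

   // The load lands late (a no-op if the prolog already waited).
   gpu.wait_all();
   return gpu.vgprs[4];
}
```

In this model `run_toy(false)` returns `0xAAAAAAAA` because the late load clobbers the shader's temporary, while `run_toy(true)` returns `42`. Waiting unconditionally is the simple option the commit takes; tracking exactly which extra VGPRs are still pending would avoid the stall at the cost of more bookkeeping.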