Mirror of https://gitlab.freedesktop.org/mesa/mesa.git (synced 2026-05-21 17:38:08 +02:00)

We don't have v_permlane64_b32 yet, but we can still optimize it using shared vgprs. Using the DPP16 row mask, we can even avoid writing exec.

With v0 input/output and v24/v25 as shared vgprs, this results in:

    v_mov_b32_dpp v24, v0 quad_perm:[0,1,2,3] row_mask:0x3 bank_mask:0xf
    v_mov_b32_dpp v25, v0 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
    v_mov_b32_dpp v0, v24 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
    v_mov_b32_dpp v0, v25 quad_perm:[0,1,2,3] row_mask:0x3 bank_mask:0xf

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36390>
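
To see why the four DPP moves swap the two halves of a wave64, here is a small host-side simulation. It is a toy model and not ACO code: the types `Vgpr` and `SharedVgpr` and all function names are hypothetical. It assumes that a shared VGPR has 32 slots, with lane L and lane L + 32 aliasing the same slot, and that each bit of the DPP16 row mask enables one row of 16 lanes. The identity `quad_perm:[0,1,2,3]` and the full `bank_mask:0xf` are modeled as a plain per-lane copy.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model: a wave64 VGPR holds one 32-bit value per lane.
using Vgpr = std::array<uint32_t, 64>;
// Assumption: a shared VGPR has 32 slots; lanes L and L + 32 alias slot L % 32.
using SharedVgpr = std::array<uint32_t, 32>;

// Bit r of the 4-bit row mask enables lanes [16 * r, 16 * r + 15].
constexpr bool row_enabled(unsigned lane, unsigned row_mask)
{
   return ((row_mask >> (lane / 16)) & 1u) != 0;
}

// Models: v_mov_b32_dpp <shared>, <vgpr> quad_perm:[0,1,2,3] row_mask:<mask>
void mov_to_shared(SharedVgpr& dst, const Vgpr& src, unsigned row_mask)
{
   assert(row_mask <= 0xf);
   for (unsigned lane = 0; lane < 64; lane++) {
      if (row_enabled(lane, row_mask))
         dst[lane % 32] = src[lane];
   }
}

// Models: v_mov_b32_dpp <vgpr>, <shared> quad_perm:[0,1,2,3] row_mask:<mask>
void mov_from_shared(Vgpr& dst, const SharedVgpr& src, unsigned row_mask)
{
   assert(row_mask <= 0xf);
   for (unsigned lane = 0; lane < 64; lane++) {
      if (row_enabled(lane, row_mask))
         dst[lane] = src[lane % 32];
   }
}

// The four-instruction sequence from the commit message, with every lane
// left active; only the row mask selects which half is written.
void swap_wave_halves(Vgpr& v0)
{
   SharedVgpr v24{}, v25{};
   mov_to_shared(v24, v0, 0x3);   // low half (rows 0-1) -> v24
   mov_to_shared(v25, v0, 0xc);   // high half (rows 2-3) -> v25
   mov_from_shared(v0, v24, 0xc); // old low half -> high half of v0
   mov_from_shared(v0, v25, 0x3); // old high half -> low half of v0
}
```

In this model, lane i ends up holding the value that lane (i + 32) % 64 started with, which is the half swap that v_permlane64_b32 would otherwise perform. No step changes which lanes are active, which matches the commit message's point about not writing exec.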

| File |
|---|
| aco_instruction_selection.h |
| aco_isel_cfg.cpp |
| aco_isel_helpers.cpp |
| aco_isel_setup.cpp |
| aco_select_nir.cpp |
| aco_select_nir_alu.cpp |
| aco_select_nir_intrinsics.cpp |
| aco_select_ps_epilog.cpp |
| aco_select_ps_prolog.cpp |
| aco_select_rt_prolog.cpp |
| aco_select_trap_handler.cpp |
| aco_select_vs_prolog.cpp |