For cases when less than 4 components are read, the original code
would compute an incorrect dmask. eg: with a single component + is_sparse,
the dmask was 0x13:
- 0x 3 = coming from nir_def_components_read
- 0x10 = the sparse bit
While it should have at 2 bits set (1 for the color/depth, 1 for tfe).
This caused problem when expand_vector() used the dmask to generate
the final results, because the value for the sparse component was
read from the wrong index.
So after the call to emit_mimg() dmask needs to be adjusted
because the components will be stored in order, so if mask is 0x11
the tfe value would be stored at invalid index=5 (while it should
be at index=1).
This fixes KHR-GL46.sparse_texture_clamp_tests.SparseTextureClampLookupResidency_texture_2d_depth_component16
and KHR-GL46.sparse_texture2_tests.SparseTexture2Lookup_texture_2d_depth_component16
with ACO.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35206>
The ISA docs don't mention this, but instead of always truncating
like other integer conversions, this opcode actually uses the single
precision rounding mode.
We could continue to use the opcode and set the rounding mode to rtz
in lower_to_hw_instrs, but I think I should just concede that f2u8
isn't worth the effort.
Fixes: 9bb10b58 ("aco: use v_cvt_pk_u8_f32 for f2u8")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35391>
This just removes the if/endif wrapping for LLVM, and hopefully the ACO
change does the same thing.
ACO had redundant code in endif_merged_wave_info, which is removed here.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>