asahi: implement KHR_shader_subgroup

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36649>
Alyssa Rosenzweig 2025-08-07 13:42:37 -04:00 committed by Marge Bot
parent 70e3234570
commit 2610d2afaf
3 changed files with 15 additions and 6 deletions


@@ -342,7 +342,7 @@ Khronos, ARB, and OES extensions that are not part of any OpenGL or OpenGL ES ve
GL_EXT_texture_view DONE (all drivers that support GL_OES_texture_view)
GL_KHR_blend_equation_advanced_coherent DONE (freedreno/a6xx, panfrost, zink, asahi, iris/gen9+, v3d)
GL_KHR_robust_buffer_access_behavior DONE (panfrost)
-GL_KHR_shader_subgroup DONE (radeonsi, zink)
+GL_KHR_shader_subgroup DONE (radeonsi, zink, asahi)
GL_KHR_texture_compression_astc_hdr DONE (panfrost, asahi)
GL_KHR_texture_compression_astc_sliced_3d DONE (freedreno/a4xx+, r600, radeonsi, panfrost, softpipe, v3d, zink, lima, asahi, iris/gen9+)
GL_OES_depth_texture_cube_map DONE (all drivers that support GLSL 1.30+)
@@ -806,7 +806,7 @@ Rusticl Optional OpenCL 2.x Features:
Device and host timer synchronization DONE (freedreno, iris, llvmpipe, radeonsi, zink)
OpenCL C 2.0 in progress
- Memory Consistency Model (atomics) not started
-- Sub-groups DONE (iris, llvmpipe, radeonsi)
+- Sub-groups DONE (iris, llvmpipe, radeonsi, asahi)
- Work-group Collective Functions not started
- Generic Address Space in progress
cl_khr_il_program DONE
@@ -869,8 +869,8 @@ Rusticl extensions:
cl_khr_subgroup_non_uniform_arithmetic not started
cl_khr_subgroup_non_uniform_vote not started
cl_khr_subgroup_rotate not started
-cl_khr_subgroup_shuffle DONE (iris, llvmpipe, radeonsi)
-cl_khr_subgroup_shuffle_relative DONE (iris, llvmpipe, radeonsi)
+cl_khr_subgroup_shuffle DONE (iris, llvmpipe, radeonsi, asahi)
+cl_khr_subgroup_shuffle_relative DONE (iris, llvmpipe, radeonsi, asahi)
cl_khr_subgroups in progress
cl_khr_suggested_local_work_size DONE
cl_khr_terminate_context not started


@@ -406,4 +406,7 @@ static const nir_shader_compiler_options agx_nir_options = {
.discard_is_demote = true,
.scalarize_ddx = true,
.io_options = nir_io_always_interpolate_convergent_fs_inputs,
+.subgroup_size = 32,
+.ballot_bit_size = 32,
+.ballot_components = 1,
};


@@ -1955,8 +1955,8 @@ agx_init_compute_caps(struct pipe_screen *pscreen)
caps->max_compute_units = agx_get_num_cores(dev);
caps->subgroup_sizes = 32;
-caps->max_variable_threads_per_block = 1024; // TODO
+caps->max_variable_threads_per_block = 1024;
caps->max_subgroups = caps->max_variable_threads_per_block / 32;
}
static void
@@ -2000,6 +2000,12 @@ agx_init_screen_caps(struct pipe_screen *pscreen)
/* Timer resolution is the length of a single tick in nanos */
caps->timer_resolution = agx_gpu_timestamp_to_ns(agx_device(pscreen), 1);
+caps->shader_subgroup_size = 32;
+caps->shader_subgroup_supported_stages = BITFIELD_MASK(MESA_SHADER_STAGES);
+caps->shader_subgroup_supported_features =
+BITFIELD_MASK(PIPE_SHADER_SUBGROUP_NUM_FEATURES);
+caps->shader_subgroup_quad_all_stages = true;
caps->sampler_view_target = true;
caps->texture_swizzle = true;
caps->blend_equation_separate = true;
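
Note for readers: the values set in the diff above (a subgroup size of 32, a 32-bit ballot with one component, and `max_subgroups` derived from 1024 threads per block) can be illustrated with a small standalone sketch. This is not code from the commit or from Mesa. The helper names `sketch_ballot` and `sketch_num_subgroups` and both macros are hypothetical, and the bit layout (bit i for invocation i) is the usual convention for subgroup ballots, assumed here rather than taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants mirroring the values in the diff (assumed names). */
#define SUBGROUP_SIZE 32u
#define MAX_THREADS_PER_BLOCK 1024u

/* 1024 threads split into 32-wide subgroups gives 32 subgroups,
 * which is what max_subgroups evaluates to in the diff. */
static_assert(MAX_THREADS_PER_BLOCK / SUBGROUP_SIZE == 32,
              "1024 / 32 should be 32 subgroups");

/* Hypothetical helper: pack one vote per invocation into a single 32-bit
 * word, i.e. a ballot with ballot_bit_size = 32 and ballot_components = 1.
 * Bit i is set when invocation i voted true. */
static uint32_t
sketch_ballot(const bool votes[SUBGROUP_SIZE])
{
   uint32_t mask = 0;

   for (unsigned i = 0; i < SUBGROUP_SIZE; ++i) {
      if (votes[i])
         mask |= (uint32_t)1 << i;
   }

   return mask;
}

/* Hypothetical helper: number of subgroups needed for a given thread
 * count, rounding up so that a partial subgroup still counts as one. */
static unsigned
sketch_num_subgroups(unsigned threads)
{
   return (threads + SUBGROUP_SIZE - 1) / SUBGROUP_SIZE;
}
```

Because the subgroup size equals the ballot width, one component is enough to hold a vote from every invocation. A wider subgroup would need either a wider ballot type or more components.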