mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-16 22:48:05 +02:00

Rhys Perry ec59b59b97 nir: rename nir_src_parent_instr to nir_src_use_instr sed -i "s/nir_src_parent_instr/nir_src_use_instr/" `find ./ -type f` sed -i "s/nir_src_parent_if/nir_src_use_if/" `find ./ -type f` sed -i "s/nir_src_set_parent/nir_src_set_use/" `find ./ -type f` There are two kinds of "parent" in relation to a src/def: - the instruction where the def or src's def is defined - the instruction which the src is a part of and where the def is used Clarify that the parent here is where the src's def is used, not where it's defined. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Acked-by: Ian Romanick <ian.d.romanick@intel.com> Acked-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41344>		2026-05-06 17:09:22 +00:00
test	treewide: use UTIL_DYNARRAY_INIT	2025-11-04 13:39:48 +00:00
agx_builder.h.py	agx: use util_lut2	2025-09-16 21:48:37 +00:00
agx_compile.c	agx: drop NIR continue handling	2026-03-29 14:06:14 +00:00
agx_compile.h	nir: replace lower_ldexp with has_ldexp	2026-03-20 08:15:08 +00:00
agx_compiler.h	agx: drop NIR continue handling	2026-03-29 14:06:14 +00:00
agx_dce.c	treewide: use BITSET_CALLOC	2025-10-09 12:29:55 +00:00
agx_debug.h
agx_insert_waits.c	treewide: use BITSET_*_COUNT	2025-12-16 17:42:10 +00:00
agx_ir.c	build: avoid redefining unreachable() which is standard in C23	2025-07-31 17:49:42 +00:00
agx_liveness.c	agx: use sparse live-sets	2025-11-06 21:34:33 +00:00
agx_lower_64bit.c
agx_lower_divergent_shuffle.c
agx_lower_parallel_copy.c
agx_lower_pseudo.c	agx: use util_lut2	2025-09-16 21:48:37 +00:00
agx_lower_spill.c
agx_lower_uniform_sources.c	agx: optimize imgwblk uniform	2025-07-21 11:42:20 +00:00
agx_nir.h
agx_nir_algebraic.py	asahi/compiler: remove unpack_half support	2026-02-06 06:12:36 +00:00
agx_nir_lower_address.c	build: avoid redefining unreachable() which is standard in C23	2025-07-31 17:49:42 +00:00
agx_nir_lower_cull_distance.c	treewide: Switch to nir_progress	2025-02-26 15:19:53 +00:00
agx_nir_lower_discard_zs_emit.c	asahi/lib: Move alpha_to_one and alpha_to_coverage lowering to common code.	2025-04-23 09:03:41 +00:00
agx_nir_lower_fminmax.c	nir: move exact bit to nir_fp_math_control	2026-01-07 09:40:57 +00:00
agx_nir_lower_frag_sidefx.c	nir: Split nir_load_frag_coord_zw to separate z/w intrinsics.	2025-06-18 23:11:36 +00:00
agx_nir_lower_interpolation.c
agx_nir_lower_sample_mask.c	agx: Fix alpha-to-coverage bit size	2026-03-30 08:19:57 +00:00
agx_nir_lower_shared_bitsize.c	agx: use nir_is_shared_access	2026-01-09 20:51:12 +00:00
agx_nir_lower_subgroups.c	nir: rename nir_src_parent_instr to nir_src_use_instr	2026-05-06 17:09:22 +00:00
agx_nir_lower_texture.c	treewide: use nir_def_as_*	2025-08-01 15:34:24 +00:00
agx_nir_opt_preamble.c	nir: rename nir_src_parent_instr to nir_src_use_instr	2026-05-06 17:09:22 +00:00
agx_nir_texture.h
agx_opcodes.c.py
agx_opcodes.h.py	build: avoid redefining unreachable() which is standard in C23	2025-07-31 17:49:42 +00:00
agx_opcodes.py	agx: plumb texture state store instruction	2025-07-10 14:55:17 -04:00
agx_opt_break_if.c
agx_opt_compact_constants.c
agx_opt_cse.c
agx_opt_empty_else.c
agx_opt_jmp_none.c
agx_opt_promote_constants.c	treewide: use BITSET_BYTES, BITSET_RZALLOC	2025-11-05 18:44:23 +00:00
agx_opt_register_cache.c	treewide: use BITSET_*_COUNT	2025-12-16 17:42:10 +00:00
agx_optimizer.c	treewide: use BITSET_CALLOC	2025-10-09 12:29:55 +00:00
agx_pack.c	util/dynarray: infer type in append	2025-10-24 18:32:07 +00:00
agx_performance.c	agx: plumb is_alu query for reg cache opt	2025-08-03 14:40:54 -04:00
agx_pressure_schedule.c	agx: use sparse live-sets	2025-11-06 21:34:33 +00:00
agx_print.c	agx: use util_is_probably_float	2026-02-23 18:23:41 +00:00
agx_register_allocate.c	treewide: use BITSET_*_COUNT	2025-12-16 17:42:10 +00:00
agx_reindex_ssa.c
agx_repair_ssa.c	agx: fix SSA repair with phis with constants	2026-01-16 09:45:40 +00:00
agx_spill.c	agx: use sparse live-sets	2025-11-06 21:34:33 +00:00
agx_validate.c	treewide: use BITSET_CALLOC	2025-10-09 12:29:55 +00:00
agx_validate_ra.c	asahi: fix some copyright headers	2026-02-23 20:04:12 +00:00
meson.build	agx: set register cache hints	2025-08-03 14:40:54 -04:00
README.md	agx: plumb vertex_id_zero_base	2025-04-23 16:20:59 +00:00

README.md

Special registers

r0l is the hardware nesting counter.

r1 is the hardware link register.

r5 and r6 are preloaded in vertex shaders to the vertex ID and instance ID.

ABI

The following section describes the ABI used by non-monolithic programs.

Vertex

Registers have the following layout at the beginning of the vertex shader (written by the vertex prolog):

r0-r3 and r7 are undefined. This avoids preloading into the nesting counter or having unaligned values. The prolog is free to use these registers as temporaries.
r4 is the zero-based vertex ID if the vertex shader is running as a hardware compute shader, useful to avoid a redundant special register read in the main shader. Undefined in hardware vertex shaders.
r5-r6 retain their usual meanings, even if the vertex shader is running as a hardware compute shader. This allows software index fetch code to run in the prolog without contaminating the main shader key.
r8 onwards contains 128-bit uniform vectors, one for each attribute. This accommodates 30 attributes without spilling, exceeding the 16-attribute API minimum. To support 32 attributes, we will need to use function calls or the stack.
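
The attribute register arithmetic above can be sketched as follows. This is a minimal illustration, assuming a register file of 128 32-bit GPRs (r0-r127), which is what the 30-attribute figure implies. All names here are hypothetical and are not taken from the compiler.

```c
#include <assert.h>

/* Hypothetical constants for illustration only; not the compiler's names. */
#define NUM_GPRS         128 /* assumption: r0..r127, 32 bits each */
#define FIRST_ATTRIB_GPR 8   /* attributes are preloaded from r8 onwards */
#define GPRS_PER_ATTRIB  4   /* one 128-bit vector = 4 x 32-bit registers */

/* Number of attributes that fit in registers without spilling. */
static unsigned
max_preloaded_attribs(void)
{
   return (NUM_GPRS - FIRST_ATTRIB_GPR) / GPRS_PER_ATTRIB;
}

/* Index of the first 32-bit register holding attribute `index`. */
static unsigned
attrib_base_gpr(unsigned index)
{
   assert(index < max_preloaded_attribs());
   return FIRST_ATTRIB_GPR + GPRS_PER_ATTRIB * index;
}
```

Under these assumptions, attribute 0 occupies r8-r11 and the last attribute that fits occupies r124-r127.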

One useful property is that the GPR usage of the combined program is equal to the GPR usage of the main shader. The prolog cannot write registers higher than those read by the main shader.

Vertex prologs do not have any uniform registers allocated for preamble optimization or constant promotion, as this adds complexity without any legitimate use case.

For a vertex shader reading n attributes, the following layout is used:

The first n 64-bit uniforms are the base addresses of each attribute.
The next n 32-bit uniforms are the associated clamps (sizes). Presently robustness is always used.
The next 2x32-bit uniform is the base vertex and base instance. This must always be reserved because it is unknown at vertex shader compile-time whether any attribute will use instancing. Also reserving the base vertex allows us to push both conveniently with a single USC Uniform word.
The next 16-bit uniform is the draw ID.
For a hardware compute shader, the next 48 bits are padding.
For a hardware compute shader, the next 64-bit uniform is a pointer to the input assembly buffer.

In total, the first 6n + 5 16-bit uniform slots are reserved for a hardware vertex shader, or 6n + 12 for a hardware compute shader.
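
The slot totals can be checked by walking the layout in order. The following is a sketch only: the struct and function names are hypothetical, and offsets are counted in 16-bit uniform slots as in the text above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layout record; every offset is in 16-bit uniform slots. */
struct vs_uniform_layout {
   unsigned attrib_base;          /* n x 64-bit attribute base addresses */
   unsigned attrib_clamp;         /* n x 32-bit clamps (sizes) */
   unsigned base_vertex_instance; /* 2 x 32-bit base vertex + base instance */
   unsigned draw_id;              /* 1 x 16-bit draw ID */
   unsigned input_assembly;       /* 64-bit pointer, hardware compute only */
   unsigned total;                /* number of reserved slots */
};

static struct vs_uniform_layout
vs_uniform_layout(unsigned n, bool hw_compute)
{
   struct vs_uniform_layout l = {0};
   unsigned slot = 0;

   l.attrib_base = slot;          slot += 4 * n; /* 64 bits = 4 slots each */
   l.attrib_clamp = slot;         slot += 2 * n; /* 32 bits = 2 slots each */
   l.base_vertex_instance = slot; slot += 4;     /* 2 x 32 bits */
   l.draw_id = slot;              slot += 1;     /* 16 bits */

   if (hw_compute) {
      slot += 3;                                 /* 48 bits of padding */
      l.input_assembly = slot;    slot += 4;     /* 64-bit pointer */
   }

   l.total = slot;

   /* Matches the totals stated in the text. */
   assert(l.total == 6 * n + (hw_compute ? 12 : 5));
   return l;
}
```

For example, with n = 2 the clamps start at slot 8, the base vertex and base instance at slot 12, and the draw ID at slot 16, for 17 slots in a hardware vertex shader or 24 in a hardware compute shader.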

Fragment

When sample shading is enabled in a non-monolithic fragment shader, the fragment shader has the following register inputs:

r0l = 0. This is the hardware nesting counter.
r1l is the mask of samples currently being shaded. This usually equals 1 << sample ID, for "true" per-sample shading.
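
As a small illustration of the r1l value for "true" per-sample shading, the mask has exactly one bit set. The helper name below is hypothetical. The bound of 16 only reflects that r1l is a 16-bit half register and says nothing about how many samples the hardware supports.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask with only the bit for `sample_id` set. */
static uint16_t
single_sample_mask(unsigned sample_id)
{
   assert(sample_id < 16); /* the mask must fit in the 16-bit r1l */
   return (uint16_t)(1u << sample_id);
}
```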

When sample shading is disabled, no register inputs are defined. The fragment prolog (if present) may clobber whatever registers it pleases.

Registers have the following layout at the end of the fragment shader (read by the fragment epilog):

r0l = 0 if sample shading is enabled. This is implicitly true.
r1l preserved if sample shading is enabled.
r2 and r3l contain the emitted depth/stencil respectively, if depth and/or stencil are written by the fragment shader. Depth/stencil writes must be deferred to the epilog for correctness when the epilog can discard (i.e. when alpha-to-coverage is enabled).
r3h contains the logically emitted sample mask, if the fragment shader uses forced early tests. This predicates the epilog's stores.
The vec4 of 32-bit registers beginning at r(4 * (i + 1)) contains the colour output for render target i. When dual source blending is enabled, there is only a single render target and the dual source colour is treated as the second render target (registers r8-r11).
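
The colour output register mapping can be sketched as below. The function names are hypothetical and serve only to illustrate the r(4 * (i + 1)) formula and the dual source case.

```c
#include <assert.h>

/* First 32-bit register of the vec4 colour output for render target `rt`. */
static unsigned
colour_base_gpr(unsigned rt)
{
   return 4 * (rt + 1);
}

/* With dual source blending, the second colour is treated as render
 * target 1, so it occupies r8-r11.
 */
static unsigned
dual_source_colour_base_gpr(void)
{
   unsigned base = colour_base_gpr(1);
   assert(base == 8);
   return base;
}
```

Render target 0 therefore starts at r4, directly after the depth, stencil and sample mask values in r2 and r3.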

Uniform registers have the following layout:

u0_u1: 64-bit render target texture heap
u2...u5: Blend constant
u6_u7: Root descriptor, so we can fetch the 64-bit fragment invocation counter address and (OpenGL only) the 64-bit polygon stipple address