mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-18 11:38:06 +02:00

Alyssa Rosenzweig fcf1a8062b asahi: switch to VS/FS prolog/epilog system With the exception of some variants for framebuffer fetch (to be addressed in a follow up MR, this is big enough as it is) -- this switches us to a shader precompile path for VS & FS. VS prologs let us implement vertex buffer fetch with dynamic formats, FS prologs let us implement misc emulation like API sample masking and cull distance, while FS epilogs handle blending and tilebuffer stores. This should cut down shader recompile jank significantly in the GL driver. It also prepares us with most of what we need for big ticket Vulkan extensions like ESO, GPL, and EDS3. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28483>		2024-03-30 00:26:20 +00:00
test	agx: test constant compaction	2024-03-30 00:26:18 +00:00
agx_builder.h.py	agx: Remove and/or/xor pseudo ops	2024-02-14 21:02:29 +00:00
agx_compile.c	asahi: switch to VS/FS prolog/epilog system	2024-03-30 00:26:20 +00:00
agx_compile.h	asahi: switch to VS/FS prolog/epilog system	2024-03-30 00:26:20 +00:00
agx_compiler.h	agx: implement exports	2024-03-30 00:26:19 +00:00
agx_dce.c	agx: Fix atomics with no destination	2023-08-11 20:31:27 +00:00
agx_debug.h	agx: promote constants to uniforms	2024-03-30 00:26:18 +00:00
agx_insert_waits.c	agx: trust in agx_index size	2024-01-10 08:44:38 -04:00
agx_internal_formats.h	agx: use #pragma once	2024-02-14 21:02:32 +00:00
agx_ir.c	agx: allow 16-bit immediate on stack load/store	2024-02-14 21:02:31 +00:00
agx_liveness.c	agx: Put else instructions in the right block	2023-08-11 20:31:27 +00:00
agx_lower_64bit.c	asahi: Convert to SPDX headers	2023-03-28 05:14:00 +00:00
agx_lower_parallel_copy.c	agx: fix 16-bit mem swaps	2024-03-30 00:26:18 +00:00
agx_lower_pseudo.c	agx: implement exports	2024-03-30 00:26:19 +00:00
agx_lower_spill.c	agx: move spill/fills accounting to shaderdb	2024-03-30 00:26:18 +00:00
agx_lower_uniform_sources.c	agx: model 64-bit uniform restriction on ALU	2024-03-30 00:26:18 +00:00
agx_minifloat.h	agx: use #pragma once	2024-02-14 21:02:32 +00:00
agx_nir.h	agx: split select opt into its own pass	2024-03-30 00:26:18 +00:00
agx_nir_algebraic.py	agx: split select opt into its own pass	2024-03-30 00:26:18 +00:00
agx_nir_lower_address.c	agx/lower_address: Remove not used has_offset	2023-09-05 18:50:34 +00:00
agx_nir_lower_cull_distance.c	nir: add offset to load_coefficients_agx	2024-03-30 00:26:19 +00:00
agx_nir_lower_discard_zs_emit.c	agx: remove spurious z/s writes in force early-z shaders	2023-12-09 12:08:39 -04:00
agx_nir_lower_frag_sidefx.c	asahi: don't use NIR_PASS_V	2024-01-12 01:13:03 +00:00
agx_nir_lower_interpolation.c	asahi: delete layer id code	2024-03-30 00:26:19 +00:00
agx_nir_lower_sample_mask.c	asahi: switch to VS/FS prolog/epilog system	2024-03-30 00:26:20 +00:00
agx_nir_lower_shared_bitsize.c	treewide: Drop nir_ssa_for_src users	2023-09-18 10:25:17 -04:00
agx_nir_lower_subgroups.c	agx: optimize vote_eq	2024-02-14 21:02:29 +00:00
agx_nir_opt_preamble.c	agx/opt_preamble: improve rewrite cost est	2024-03-30 00:26:19 +00:00
agx_opcodes.c.py	agx: Include schedule class in the opcode info	2023-09-05 18:50:34 +00:00
agx_opcodes.h.py	agx: Include schedule class in the opcode info	2023-09-05 18:50:34 +00:00
agx_opcodes.py	agx: implement exports	2024-03-30 00:26:19 +00:00
agx_opt_break_if.c	agx: Augment if/else/while_cmp with a target	2023-10-01 12:32:11 -04:00
agx_opt_compact_constants.c	agx: compact 32-bit constants	2024-03-30 00:26:18 +00:00
agx_opt_cse.c	agx/opt_cse: alloc less	2024-03-30 00:26:18 +00:00
agx_opt_empty_else.c	agx: Use agx_first_instr	2023-09-05 18:50:34 +00:00
agx_opt_jmp_none.c	agx: Insert jmp_exec_none instructions	2023-10-01 12:32:11 -04:00
agx_opt_promote_constants.c	agx: promote constants to uniforms	2024-03-30 00:26:18 +00:00
agx_optimizer.c	agx: implement exports	2024-03-30 00:26:19 +00:00
agx_pack.c	asahi: switch to VS/FS prolog/epilog system	2024-03-30 00:26:20 +00:00
agx_performance.c	agx: start a crude cycle model	2024-03-30 00:26:19 +00:00
agx_pressure_schedule.c	agx: sink wait_pix	2024-02-14 21:02:32 +00:00
agx_print.c	agx: add parallel copy printing	2024-02-14 21:02:31 +00:00
agx_register_allocate.c	agx: implement exports	2024-03-30 00:26:19 +00:00
agx_reindex_ssa.c	agx: add SSA reindexing pass	2024-03-30 00:26:18 +00:00
agx_repair_ssa.c	agx: add SSA repair pass	2024-03-30 00:26:18 +00:00
agx_spill.c	agx: implement get_sr remat	2024-03-30 00:26:18 +00:00
agx_validate.c	agx: implement exports	2024-03-30 00:26:19 +00:00
meson.build	asahi: rewrite varying linking	2024-03-30 00:26:19 +00:00
README.md	agx: document non-monolithic ABI	2024-03-30 00:26:19 +00:00

README.md

Special registers

r0l is the hardware nesting counter.

r1 is the hardware link register.

r5 and r6 are preloaded in vertex shaders with the vertex ID and instance ID, respectively.

ABI

The following section describes the ABI used by non-monolithic programs.

Vertex

Registers have the following layout at the beginning of the vertex shader (written by the vertex prolog):

r0-r4 and r7 are undefined. This avoids preloading into the nesting counter or having unaligned values. The prolog is free to use these registers as temporaries.
r5-r6 retain their usual meanings, even if the vertex shader is running as a hardware compute shader. This allows software index fetch code to run in the prolog without contaminating the main shader key.
r8 onwards contains 128-bit uniform vectors for each attribute. This accommodates 30 attributes without spilling, exceeding the 16-attribute API minimum. For 32 attributes, we will need to use function calls or the stack.
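
The attribute placement above can be sketched as a small register calculation. This is only an illustration: the helper names are hypothetical, and the register file size of 128 32-bit GPRs is an assumption inferred from the 30-attribute figure (8 + 4 * 30 = 128), not something stated in this document.

```python
# Sketch of the vertex ABI attribute placement (illustrative, not driver code).
# ASSUMPTION: 128 32-bit GPRs (r0-r127), inferred from the 30-attribute limit.
NUM_GPRS = 128         # assumed register file size, in 32-bit registers
FIRST_ATTRIB_REG = 8   # attributes start at r8
REGS_PER_ATTRIB = 4    # one 128-bit vector is four 32-bit registers


def attrib_registers(index: int) -> range:
    """Return the 32-bit register numbers holding attribute `index`.

    Hypothetical helper: raises ValueError when the attribute would not fit
    in registers and would need function calls or the stack instead.
    """
    if index < 0:
        raise ValueError("attribute index must be non-negative")
    base = FIRST_ATTRIB_REG + REGS_PER_ATTRIB * index
    if base + REGS_PER_ATTRIB > NUM_GPRS:
        raise ValueError(f"attribute {index} does not fit in registers")
    return range(base, base + REGS_PER_ATTRIB)


def max_register_attribs() -> int:
    """Number of attributes that fit in registers under the assumed GPR count."""
    return (NUM_GPRS - FIRST_ATTRIB_REG) // REGS_PER_ATTRIB


if __name__ == "__main__":
    print(list(attrib_registers(0)))   # registers for attribute 0
    print(max_register_attribs())      # attributes that fit without spilling
```

Under these assumptions attribute 0 occupies r8-r11 and attribute 29 occupies r124-r127, so the last register-resident attribute ends exactly at the top of the assumed register file.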

One useful property is that the GPR usage of the combined program is equal to the GPR usage of the main shader. The prolog cannot write registers higher than those read by the main shader.

Vertex prologs do not have any uniform registers allocated for preamble optimization or constant promotion, as this adds complexity without any legitimate use case.

For a vertex shader reading n attributes, the following layout is used:

The first n 64-bit uniforms are the base addresses of each attribute.
The next n 32-bit uniforms are the associated clamps (sizes). Presently robustness is always used.
The next 32-bit uniform is the base instance. This must always be reserved because it is unknown at vertex shader compile-time whether any attribute will use instancing.
For a hardware compute shader, the next 32-bit uniform is the base/first vertex.
For a hardware compute shader, the next 64-bit uniform is a pointer to the input assembly buffer.

In total, the first 6n + 2 16-bit uniform slots are reserved for a hardware vertex shader, or 6n + 8 for a hardware compute shader.
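
To make the slot arithmetic concrete, here is a minimal sketch that lays out the uniforms in the order listed above and counts 16-bit slots. It assumes the fields are packed tightly with no alignment padding, which is what the 6n + 2 and 6n + 8 totals imply. The class and function names are hypothetical and do not come from the driver.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VertexUniformLayout:
    """Offsets in 16-bit uniform slots (illustrative only)."""
    base_addresses: int            # start of the n 64-bit attribute base addresses
    clamps: int                    # start of the n 32-bit clamps
    base_instance: int             # 32-bit base instance
    base_vertex: Optional[int]     # 32-bit base/first vertex (hardware compute only)
    input_assembly: Optional[int]  # 64-bit input assembly pointer (hardware compute only)
    total_slots: int               # number of reserved 16-bit slots


def vertex_uniform_layout(n: int, hw_compute: bool = False) -> VertexUniformLayout:
    """Lay out the reserved uniforms for a vertex shader reading n attributes.

    ASSUMPTION: tight packing in the documented order, no padding.
    """
    if n < 0:
        raise ValueError("attribute count must be non-negative")
    offset = 0
    base_addresses = offset
    offset += 4 * n                # n 64-bit values, 4 slots each
    clamps = offset
    offset += 2 * n                # n 32-bit values, 2 slots each
    base_instance = offset
    offset += 2                    # one 32-bit value, always reserved
    base_vertex = input_assembly = None
    if hw_compute:
        base_vertex = offset
        offset += 2                # one 32-bit value
        input_assembly = offset
        offset += 4                # one 64-bit pointer
    return VertexUniformLayout(base_addresses, clamps, base_instance,
                               base_vertex, input_assembly, offset)


if __name__ == "__main__":
    print(vertex_uniform_layout(3))
    print(vertex_uniform_layout(3, hw_compute=True))
```

For n = 3 this gives 20 reserved slots for a hardware vertex shader and 26 for a hardware compute shader, matching 6n + 2 and 6n + 8.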

Fragment

When sample shading is enabled in a non-monolithic fragment shader, the fragment shader has the following register inputs:

r0l = 0. This is the hardware nesting counter.
r0h is the mask of samples currently being shaded. This usually equals 1 << sample ID, for "true" per-sample shading.
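
As a small illustration of the r0h convention, the sketch below builds the mask for a single sample and decodes a mask back into sample IDs. The function names are hypothetical. The 16-bit bound follows from r0h being a 16-bit half register and is not a claim about how many samples the hardware supports.

```python
from typing import List

MASK_BITS = 16  # r0h is a 16-bit half register


def sample_mask(sample_id: int) -> int:
    """Mask for "true" per-sample shading: exactly one bit set."""
    if not 0 <= sample_id < MASK_BITS:
        raise ValueError("sample ID does not fit in a 16-bit mask")
    return 1 << sample_id


def samples_in_mask(mask: int) -> List[int]:
    """List the sample IDs covered by a mask, lowest first."""
    return [i for i in range(MASK_BITS) if mask & (1 << i)]


if __name__ == "__main__":
    print(sample_mask(3))
    print(samples_in_mask(0b0101))
```

A mask with more than one bit set, such as 0b0101, corresponds to the less common case where one invocation shades several samples at once.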

When sample shading is disabled, no register inputs are defined. The fragment prolog (if present) may clobber whatever registers it pleases.

Registers have the following layout at the end of the fragment shader (read by the fragment epilog):

r0l = 0 if sample shading is enabled. This is implicitly true.
r0h preserved if sample shading is enabled.
r2 and r3l contain the emitted depth/stencil respectively, if depth and/or stencil are written by the fragment shader. Depth/stencil writes must be deferred to the epilog for correctness when the epilog can discard (i.e. when alpha-to-coverage is enabled).
The vec4 of 32-bit registers beginning at r(4 * (i + 1)) contains the colour output for render target i. When dual source blending is enabled, there is only a single render target and the dual source colour is treated as the second render target (registers r8-r11).
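
The colour output rule above amounts to a one-line register calculation. The following sketch uses hypothetical helper names and only restates the layout described here. It does not model any limit on the number of render targets.

```python
# Sketch of the fragment epilog input registers (illustrative only).
DEPTH_REG = "r2"     # emitted depth, if written by the fragment shader
STENCIL_REG = "r3l"  # emitted stencil, if written by the fragment shader


def colour_output_registers(render_target: int) -> range:
    """32-bit registers holding the vec4 colour for a render target."""
    if render_target < 0:
        raise ValueError("render target index must be non-negative")
    base = 4 * (render_target + 1)
    return range(base, base + 4)


def dual_source_registers() -> range:
    """With dual source blending, the second colour is treated as render target 1."""
    return colour_output_registers(1)


if __name__ == "__main__":
    print(list(colour_output_registers(0)))
    print(list(dual_source_registers()))
```

Render target 0 therefore lands in r4-r7, which leaves r0-r3 free for the nesting counter, sample mask, depth, and stencil, and the dual source colour lands in r8-r11.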

Uniform registers have the following layout:

u0_u1: 64-bit render target texture heap
u2...u5: Blend constant
u6_u7: Root descriptor, so we can fetch the 64-bit fragment invocation counter address and (OpenGL only) the 64-bit polygon stipple address
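
For reference, the fragment uniform layout can be written down as a small table and checked for gaps. This sketch assumes each uN is a 32-bit uniform register, which is consistent with u0_u1 holding a 64-bit value, and that a 16-bit slot index is twice the register index. The dictionary keys and helper name are hypothetical.

```python
# name -> (first 32-bit uniform register, number of 32-bit registers)
# ASSUMPTION: each uN is 32 bits wide, so u0_u1 is one 64-bit value.
FRAGMENT_UNIFORMS = {
    "render_target_texture_heap": (0, 2),  # u0_u1
    "blend_constant": (2, 4),              # u2...u5
    "root_descriptor": (6, 2),             # u6_u7
}


def slot16_range(name: str) -> range:
    """Return the 16-bit uniform slots covered by a named uniform."""
    first, count = FRAGMENT_UNIFORMS[name]
    return range(2 * first, 2 * (first + count))


def reserved_registers() -> int:
    """Total number of 32-bit uniform registers reserved by this layout."""
    return max(first + count for first, count in FRAGMENT_UNIFORMS.values())


if __name__ == "__main__":
    for name in FRAGMENT_UNIFORMS:
        print(name, list(slot16_range(name)))
    print(reserved_registers())
```

Under these assumptions the three entries are contiguous and reserve u0 through u7.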