mesa/src/asahi/compiler
Marek Olšák 7f4e36ff7d gallium: replace PIPE_SHADER_CAP_INDIRECT_INPUT/OUTPUT_ADDR with NIR options
This is a prerequisite for enabling nir_opt_varyings for all gallium
drivers.

nir_lower_io_passes (called by the GLSL linker) only uses NIR options
to lower indirect IO access before lowering IO and calling
nir_opt_varyings.

Most drivers report full support for indirect IO and lower it themselves,
which prevents compaction of lowered indirectly accessed varyings because
nir_opt_varyings doesn't touch indirect varyings.

Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> (Rb for asahi)
Reviewed-by: Pavel Ondračka <pavel.ondracka@gmail.com> (for r300)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32423>
2024-12-03 12:57:36 +00:00
test agx: add tests for sign/zero-extend propagate 2024-11-11 14:33:02 +00:00
agx_builder.h.py
agx_compile.c agx: reduce preamble/main alignment 2024-12-02 19:26:46 +00:00
agx_compile.h gallium: replace PIPE_SHADER_CAP_INDIRECT_INPUT/OUTPUT_ADDR with NIR options 2024-12-03 12:57:36 +00:00
agx_compiler.h agx: rewrite address mode lowering 2024-11-08 21:15:42 -04:00
agx_dce.c agx: speed-up dce 2024-05-14 04:57:25 +00:00
agx_debug.h agx: promote constants to uniforms 2024-03-30 00:26:18 +00:00
agx_insert_waits.c
agx_ir.c agx: negate iadd/imsub constants 2024-10-30 10:14:07 -04:00
agx_liveness.c agx: reset kill bits in liveness 2024-10-05 18:30:12 +00:00
agx_lower_64bit.c
agx_lower_divergent_shuffle.c agx: handle non-immediate shuffles in divergent CF 2024-05-14 04:57:25 +00:00
agx_lower_parallel_copy.c agx: lower swaps late 2024-10-05 18:30:12 +00:00
agx_lower_pseudo.c agx: add pseudo for signext 2024-11-11 14:33:01 +00:00
agx_lower_spill.c agx: clarify spill lowering math 2024-10-05 18:30:11 +00:00
agx_lower_uniform_sources.c agx: fix corner with uniform source lowering 2024-10-05 18:30:12 +00:00
agx_minifloat.h
agx_nir.h asahi,hk: reenable rgb32 buffer textures 2024-11-24 13:06:08 +00:00
agx_nir_algebraic.py agx: fuse also 8-bit address math 2024-11-11 14:33:02 +00:00
agx_nir_lower_address.c agx: rewrite address mode lowering 2024-11-08 21:15:42 -04:00
agx_nir_lower_cull_distance.c treewide: use nir_metadata_control_flow 2024-06-17 16:28:14 -04:00
agx_nir_lower_discard_zs_emit.c agx: set discard_is_demote 2024-06-22 10:09:45 -04:00
agx_nir_lower_fminmax.c nir,agx: lower fmin/fmax in NIR 2024-10-30 10:14:07 -04:00
agx_nir_lower_frag_sidefx.c treewide: use nir_metadata_control_flow 2024-06-17 16:28:14 -04:00
agx_nir_lower_interpolation.c asahi: plumb tri fan flatshading through common 2024-05-14 04:57:27 +00:00
agx_nir_lower_sample_mask.c treewide: use nir_metadata_control_flow 2024-06-17 16:28:14 -04:00
agx_nir_lower_shared_bitsize.c treewide: use nir_metadata_control_flow 2024-06-17 16:28:14 -04:00
agx_nir_lower_subgroups.c agx: optimize popcount(ballot(true)) 2024-08-12 18:46:31 -04:00
agx_nir_lower_texture.c asahi,hk: reenable rgb32 buffer textures 2024-11-24 13:06:08 +00:00
agx_nir_opt_preamble.c agx: support bindless block image store 2024-09-02 23:27:14 +00:00
agx_nir_texture.h asahi,agx: move texture lowering into the compiler 2024-11-24 13:06:08 +00:00
agx_opcodes.c.py agx: encoding_32 -> encoding 2024-10-30 10:14:07 -04:00
agx_opcodes.h.py agx: drop encoding_16 2024-10-30 10:14:07 -04:00
agx_opcodes.py agx: add pseudo for signext 2024-11-11 14:33:01 +00:00
agx_opt_break_if.c
agx_opt_compact_constants.c agx: compact 32-bit constants 2024-03-30 00:26:18 +00:00
agx_opt_cse.c
agx_opt_empty_else.c
agx_opt_jmp_none.c agx: tweak jmp_exec_none heuristic 2024-08-12 18:46:31 -04:00
agx_opt_promote_constants.c agx: don't upload constant padding at the start 2024-10-30 10:14:07 -04:00
agx_optimizer.c agx: optimize signext imad 2024-11-11 14:33:02 +00:00
agx_pack.c agx: make needs_g13x_coherency a tri-state 2024-11-20 16:10:11 +00:00
agx_performance.c agx: fix bfeil timing 2024-11-08 21:15:42 -04:00
agx_pressure_schedule.c
agx_print.c agx: add reg to agx_index 2024-10-05 18:30:12 +00:00
agx_register_allocate.c agx: validate RA 2024-10-05 18:30:13 +00:00
agx_reindex_ssa.c agx: add SSA reindexing pass 2024-03-30 00:26:18 +00:00
agx_repair_ssa.c agx: don't propagate constants from trivial phis 2024-10-05 18:30:12 +00:00
agx_spill.c agx: don't reserve regs if we won't use them 2024-10-05 18:30:12 +00:00
agx_validate.c agx: validate sizes are consistent in the IR 2024-10-05 18:30:12 +00:00
agx_validate_ra.c agx: validate RA 2024-10-05 18:30:13 +00:00
meson.build asahi,agx: move texture lowering into the compiler 2024-11-24 13:06:08 +00:00
README.md asahi: move sample mask to r1l 2024-10-05 18:30:12 +00:00

Special registers

r0l is the hardware nesting counter.

r1 is the hardware link register.

In vertex shaders, r5 and r6 are preloaded with the vertex ID and instance ID, respectively.

ABI

The following section describes the ABI used by non-monolithic programs.

Vertex

Registers have the following layout at the beginning of the vertex shader (written by the vertex prolog):

  • r0-r4 and r7 are undefined. This avoids preloading into the nesting counter or having unaligned values. The prolog is free to use these registers as temporaries.
  • r5-r6 retain their usual meanings, even if the vertex shader is running as a hardware compute shader. This allows software index fetch code to run in the prolog without contaminating the main shader key.
  • r8 onwards contains 128-bit uniform vectors for each attribute. This accommodates 30 attributes without spilling, exceeding the 16-attribute API minimum. Supporting 32 attributes would require function calls or the stack.
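
The register arithmetic behind the 30-attribute figure can be sketched as follows. This is a minimal illustration, not code from the driver: the function and macro names are hypothetical, and the total of 128 32-bit GPRs is an assumption inferred from the statement that 30 four-register attributes fit from r8 onwards.

```c
#include <assert.h>

/* Hypothetical names for illustration; none are taken from the Mesa source. */
#define GPR_COUNT        128 /* assumption: 32-bit registers r0..r127 */
#define FIRST_ATTRIB_GPR 8   /* attributes start at r8 */
#define GPRS_PER_ATTRIB  4   /* one 128-bit vector = 4 x 32-bit registers */

/* Number of attributes that fit in registers without spilling. */
static unsigned max_attribs_in_gprs(void)
{
   return (GPR_COUNT - FIRST_ATTRIB_GPR) / GPRS_PER_ATTRIB;
}

/* Index of the first 32-bit register holding attribute `index`. */
static unsigned attrib_base_gpr(unsigned index)
{
   assert(index < max_attribs_in_gprs());
   return FIRST_ATTRIB_GPR + GPRS_PER_ATTRIB * index;
}
```

Under these assumptions, attribute 0 occupies r8-r11 and attribute 29 occupies r124-r127, which is why a 31st attribute would need another mechanism.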

One useful property is that the GPR usage of the combined program is equal to the GPR usage of the main shader. The prolog cannot write higher registers than read by the main shader.

Vertex prologs do not have any uniform registers allocated for preamble optimization or constant promotion, as this adds complexity without any legitimate use case.

For a vertex shader reading n attributes, the following layout is used:

  • The first n 64-bit uniforms are the base addresses of each attribute.
  • The next n 32-bit uniforms are the associated clamps (sizes). Presently robustness is always used.
  • The next two 32-bit uniforms are the base vertex and base instance. These must always be reserved because it is unknown at vertex shader compile-time whether any attribute will use instancing. Also reserving the base vertex allows us to push both conveniently with a single USC Uniform word.
  • The next 16-bit uniform is the draw ID.
  • For a hardware compute shader, the next 48 bits are padding.
  • For a hardware compute shader, the next 64-bit uniform is a pointer to the input assembly buffer.

In total, the first 6n + 5 16-bit uniform slots are reserved for a hardware vertex shader, or 6n + 12 for a hardware compute shader.
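
The slot counts above follow from adding up the fields in units of 16-bit uniform slots. The sketch below works through that arithmetic. All function names are hypothetical, and it assumes the base vertex is stored before the base instance, as the order of the description suggests.

```c
#include <assert.h>
#include <stdbool.h>

/* All offsets and sizes are in 16-bit uniform slots.
 * Names are hypothetical, not taken from the Mesa source. */

/* 64-bit base address of attribute i: 4 slots each, starting at slot 0. */
static unsigned vs_attrib_base_offset(unsigned i)
{
   return 4 * i;
}

/* 32-bit clamp of attribute i: 2 slots each, after the n base addresses. */
static unsigned vs_attrib_clamp_offset(unsigned n, unsigned i)
{
   assert(i < n);
   return 4 * n + 2 * i;
}

/* Base vertex and base instance: 2 slots each (order is an assumption). */
static unsigned vs_base_vertex_offset(unsigned n)   { return 6 * n; }
static unsigned vs_base_instance_offset(unsigned n) { return 6 * n + 2; }

/* Draw ID: 1 slot. */
static unsigned vs_draw_id_offset(unsigned n) { return 6 * n + 4; }

/* Hardware compute only: 3 slots of padding, then a 4-slot pointer. */
static unsigned vs_input_assembly_offset(unsigned n) { return 6 * n + 8; }

/* Total reserved slots for a shader reading n attributes. */
static unsigned vs_reserved_slots(unsigned n, bool hw_compute)
{
   return 6 * n + (hw_compute ? 12 : 5);
}
```

For example, with n = 2 the clamps start at slot 8, the draw ID sits at slot 16, and the totals are 17 slots for a hardware vertex shader and 24 for a hardware compute shader.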

Fragment

When sample shading is enabled in a non-monolithic fragment shader, the fragment shader has the following register inputs:

  • r0l = 0. This is the hardware nesting counter.
  • r1l is the mask of samples currently being shaded. This is usually equal to 1 << sample ID, for "true" per-sample shading.
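
For the "true" per-sample case, the value expected in r1l is a mask with a single bit set. A minimal sketch, using a hypothetical helper name and assuming the mask fits the 16-bit register half:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask with only the bit for `sample_id` set.
 * r1l is a 16-bit register half, so at most 16 samples are representable. */
static uint16_t sample_mask_for_sample(unsigned sample_id)
{
   assert(sample_id < 16);
   return (uint16_t)(1u << sample_id);
}
```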

When sample shading is disabled, no register inputs are defined. The fragment prolog (if present) may clobber whatever registers it pleases.

Registers have the following layout at the end of the fragment shader (read by the fragment epilog):

  • r0l = 0 if sample shading is enabled. This is implicitly true.
  • r1l is preserved if sample shading is enabled.
  • r2 and r3l contain the emitted depth/stencil respectively, if depth and/or stencil are written by the fragment shader. Depth/stencil writes must be deferred to the epilog for correctness when the epilog can discard (i.e. when alpha-to-coverage is enabled).
  • r3h contains the logically emitted sample mask, if the fragment shader uses forced early tests. This predicates the epilog's stores.
  • The vec4 of 32-bit registers beginning at r(4 * (i + 1)) contains the colour output for render target i. When dual source blending is enabled, there is only a single render target and the dual source colour is treated as the second render target (registers r8-r11).
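
The colour register mapping can be expressed as a one-line computation. The sketch below uses hypothetical names and restates the r(4 * (i + 1)) formula, treating the dual source colour as render target 1.

```c
#include <assert.h>

/* Hypothetical names for illustration; not taken from the Mesa source. */
#define GPRS_PER_COLOUR 4 /* vec4 of 32-bit registers */

/* First 32-bit register of the colour output for render target `rt`. */
static unsigned colour_base_gpr(unsigned rt)
{
   return GPRS_PER_COLOUR * (rt + 1);
}

/* With dual source blending, the second colour takes the slot of
 * render target 1. */
static unsigned dual_source_colour_base_gpr(void)
{
   return colour_base_gpr(1);
}
```

Render target 0 therefore lands in r4-r7, leaving r0-r3 for the nesting counter, sample mask, depth, and stencil described above, and the dual source colour lands in r8-r11.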

Uniform registers have the following layout:

  • u0_u1: 64-bit render target texture heap
  • u2...u5: Blend constant
  • u6_u7: Root descriptor, so we can fetch the 64-bit fragment invocation counter address and (OpenGL only) the 64-bit polygon stipple address