We can't have merged shaders where the first part is compiled using ACO
and the second part is compiled using LLVM.
Add ACO-specific main shader parts to fix that.
This happens when ACO is enabled for gfx12 streamout where GS can be paired
with a previous shader compiled by LLVM.
Fixes: 8ba718fb7d - radeonsi/gfx12: use ACO for streamout because it's faster
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34491>
This will prevent accidental crashes and hangs because of how we define
tracked enums.
The reg_enum parameter must be a compile-time constant.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32910>
SPI_BARYC_CNTL is moved to the preamble because it's always 0.
We set frag_coord_is_center for the NIR pass to indicate that sample_pos
should be lowered differently.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32910>
There is no test for this, but it's been broken.
ARB_sample_shading doesn't set fs.uses_sample_shading in shader_info,
which causes us to enter this path to force per-sample interpolation,
but doing so breaks the shader if the PS prolog is used.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32910>
It just indicates that sample shading is enabled, which we were
checking already. The state is redundant.
Just check shader_info::fs::uses_sample_shading. ARB_sample_shading (GL3.3)
doesn't set fs.uses_sample_shading in shader_info (which is for GL4.0), and
that's why we have this codepath that forces per-sample interpolation.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32910>
This just implements it in the PS prolog and LLVM IR (ACO already
implements it), and enables it for monolithic shaders where it's already
implemented in ac_nir_lower_ps_early.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32910>
This further reduces dependence on si_shader_info.
union si_ps_input_info is added because we don't need usage_mask in there.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32910>
This removes duplicated gathering from 3 places for shader variants,
and adds it where it should be, which is before late optimizations and
late lowering passes, which is where we want it for the radeonsi linker.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32910>
We don't need to look at the framebuffer state and record how many color
buffers to write. Instead, we can deduce which color buffers are enabled
from spi_shader_col_format, which already does the right thing.
So PS epilogs only need a single bool flag that determines whether all
enabled color buffers should be written.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32910>
This adds an option into the prolog key to replace frag_coord.xy with
pixel_coord when sample shading is disabled, which is most of the time.
This reduces the number of input VGPRs.
It's already implement in ac_nir_lower_ps_early for monolithic shaders
and the PS prolog in ACO, so this just implements it for the PS prolog
in LLVM IR.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32910>
Since the sample mask is always 1 << sample_id with full sample shading,
just use that instead of loading sample_mask_in. Set it to 0 if it's
a helper invocation. This removes the sample mask input VGPR.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33024>
- really update sample shading state when it's changed
- reduce log state bits in the shader key to 2 because we don't support
16x EQAA
- exit early from si_update_ps_iter_samples if ps_iter_sample has the same
value since the last call
- set missing wqm for the PS prolog (this might fix tests)
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33024>
- many non-inline functions are only used in 1 .c file: make them static
- some inline functions are only use in 1 .c file: move them there
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33046>
This compiles an optimized shader variant for NGG streamout where the output
primitive is known at compile time. This allows putting stores for all
vertices into the same VMEM clause.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32713>
This uses si_shader_info to store whether gl_FragDepth can be removed,
and it uses the kill_z epilog flag to do the removal without recompilation.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32713>
It works exactly like gfx11 except that COVERAGE_TO_MASK_ENABLE must be 1
to indicate that alpha for alpha-to-coverage should be read from mrtz.a.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32713>
alpha-to-coverage must be applied before alpha-to-one. The only way to do
that is to export alpha for alpha-to-coverage via mrtz, and export 1 via
mrt0.a.
ACO and monolithic shader support is already in place thanks to RADV,
so we only need to change the LLVM PS epilog and the shader key.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32713>
This adds kill_z and kill_stencil flags to the shader PS epilog key, which
removes those outputs if depth or stencil are disabled.
It must be implemented in:
* ACO PS epilog
* LLVM PS epilog
* ac_nir_lower_ps for monolithic shaders
Some of the samplemask code wasn't completely correct, but probably harmless.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32713>
Cross-invocation TCS input access doesn't prevent same-invocation access.
This improves shaders that use both for the same inputs.
Also, if some components of a vec4 slot only use same-invocation access and
other components only use cross-invocation access (it's possible after
compaction), this takes the VGPR path for the components with
same-invocation access, which didn't happen previously because all masks
only describe whole vec4s.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31673>
assert(!shader->key.ps.part.prolog.force_persp_center_interp ||
(!G_0286CC_PERSP_SAMPLE_ENA(input_ena) && !G_0286CC_PERSP_CENTROID_ENA(input_ena)));
failed when all FS inputs have been eliminated by optimizations, which
causes LLVM to set PERSP_SAMPLE_ENA because at least 1 of those must be
enabled, which this code didn't expect.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32186>