While the idea to take advantage of the higher throughput wasn't bad,
the hardware wasn't design with this in mind and doesn't behave like expected
with constant sources.
Fixes: bee487df48 ("aco/gfx11.5+: use vinterp for fddx/fddy")
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33969>
AMD BC-250 is a mining board based on an AMD APU with an integrated GPU
that kernel recognizes as Cyan Skillfish.
It is basically RDNA1/GFX10, but with added hardware ray tracing
support. LLVM calls it GFX1013, see
https://llvm.org/docs/AMDGPU/AMDGPUAsmGFX1013.html
Support for this GPU hasn't been extensively tested. Some games are
known to work, some non-trivial ray query compute and ray tracing
pipeline rendering works too. Q2RTX works.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33116>
Annotating ssa defs without affecting compilation is impossible with
debug info instructions since referencing a nir_def from the debug info
instr will add uses.
The old approach also stops worrking if passes reorder instructions.
This patch proposes a solution which should not regress performance just
like the old approach. The difference is that this one allocates a bit
more space for debug info instead of adding a new instruction for it.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33141>
This adds an option to replace frag_coord.xy with pixel_coord when sample
shading is disabled, which is most of the time. This reduces the number of
input VGPRs.
It's already implement in ac_nir_lower_ps_early for monolithic shaders.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33024>
Since the sample mask is always 1 << sample_id with full sample shading,
just use that instead of loading sample_mask_in. Set it to 0 if it's
a helper invocation. This removes the sample mask input VGPR.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33024>
This should be faster than 32_ABGR.
Also, stencil exports are changed from UINT16_ABGR to 32_GR,
which should have no effect on performance.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33046>
In nested control flow this is nessecary to not demote/terminate invocations
that are part of the global but not part of the local mask.
At the top level, the masks are the same and no additional invocations
can be accidentally disabled.
Foz-DB Navi21:
Totals from 2095 (2.64% of 79395) affected shaders:
Instrs: 1058326 -> 1056839 (-0.14%)
CodeSize: 5632480 -> 5626616 (-0.10%)
Latency: 12082761 -> 12080520 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 2246677 -> 2246636 (-0.00%); split: -0.00%, +0.00%
Copies: 114446 -> 114433 (-0.01%)
SALU: 230585 -> 229098 (-0.64%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32755>
This adds kill_z and kill_stencil flags to the shader PS epilog key, which
removes those outputs if depth or stencil are disabled.
It must be implemented in:
* ACO PS epilog
* LLVM PS epilog
* ac_nir_lower_ps for monolithic shaders
Some of the samplemask code wasn't completely correct, but probably harmless.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32713>
Regression on test:
dEQP-GLES31.functional.geometry_shading.basic.output_256
voffset is missing if buffer store base >=4096, we need to
re-calculate offen after resolve_excess_vmem_const_offset().
Fixes: cdaf269924 ("aco: inline store_vmem_mubuf/emit_single_mubuf_store")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32767>
Cross-invocation TCS input access doesn't prevent same-invocation access.
This improves shaders that use both for the same inputs.
Also, if some components of a vec4 slot only use same-invocation access and
other components only use cross-invocation access (it's possible after
compaction), this takes the VGPR path for the components with
same-invocation access, which didn't happen previously because all masks
only describe whole vec4s.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31673>
On AMD hardware, it's possible to export a separate alpha channel for
applying alpha-to-one after alpha-to-coverage and not before.
On GFX11+, it's already mostly supported but alpha needs to be exported
to MRTZ.a and one to MRT0.a. The hw always uses alpha for
alpha-to-coverage from MRTZ.a.
On older generations, the driver needs the same separate alpha export
but it also needs to configure the hardware with COVERAGE_TO_MASK_ENABLE
which selects alpha from MRTZ.a.
This should fix alpha-to-coverage with alpha-to-one when either
depth, stencil or samplemask are exported but it still needs a slightly
different solution without MRTZ. I will fix that later.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32523>