ps needs to handle wqm:
1. main part may compute with args from prolog in wqm mode, so
prolog need to compute these args in wqm mode too.
2. prolog and main part need to end with exact exec, so next
shader part which inherit previous shader part's exec won't
do valid job for helper threads
1 need p_end_with_regs to operate in wqm mode and itself can't
be exact, otherwise some move instruction added by it won't be
in wqm mode so helper threads' compute result is not passed to
next shader part as args.
2 is done by p_end_wqm added by finish_program automatically
after p_end_with_regs.
Piglit tests can trigger the problem:
1. gl-2.1-polygon-stipple-fs
a. ps prolog call discard_if
b. ps main pass wqm exec to epilog
c. ps epilog export color for discarded pixel
2. fs-fwidth-color.shader_test
a. ps prolog need to pass args computed in wqm mode
b. set p_end_with_regs to exact will end wqm mode before
the move instructions, so helper threads's result is not
passed to next shader part
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973>
Used by radeonsi shader parts to pass args from one part to another.
It has variable number of operands to reserve fixed registers with
wanted value.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24442>
The new ac_hw_stage is going to be used by drivers as well.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23597>
We don't want to rely on any NIR structures in ACO, because
we would like to avoid the need to include nir.h in aco_ir.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22241>
Dual source blending must be in strict WQM mode.
Cc: 22.3 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19643>
"Removing jumps" in ACO means skipping the jump instruction
at the beginning of a divergent branch (but still modify exec).
ACO already supports implicitly removing jumps when it decides
that executing a branch with empty exec mask is more beneficial
than a jump.
This commit adds the possibility to use this explicitly
through nir_selection_control. ACO will respect this
setting and remove the branch instructions when this is specified,
unless it decides that this would cause bugs (eg. exp instruction).
There are two cases that benefit from the new change:
1. When the application requests to "flatten" a branch (ie.
remove control flow), we now respect that.
2. When the compiler stack determines that a divergent branch
is always taken.
v2 by Georg Lehmann: fixed applying sel_ctrl to else blocks
Fossil DB stats on Navi 21:
Totals from 13 (0.01% of 134906) affected shaders:
CodeSize: 136616 -> 136496 (-0.09%)
Instrs: 26196 -> 26166 (-0.11%)
Latency: 417928 -> 417889 (-0.01%)
Branches: 1241 -> 1211 (-2.42%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-By: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17921>
Otherwise, in presence of p_exit_early_if the main FS will always
jump to the PS epilog regardless the exact mask.
This fixes dEQP-VK.draw.renderpass.shader_invocation.helper_invocation
and few vkd3d-proton regressions when PS epilogs are forced.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17617>
Because the data register of atomic VMEM instructions
is shared between src and dst, it might be necessary
to create live-range splits during RA.
Make the live-range splits explicit in WQM mode.
Totals from 7 (0.01% of 134913) affected shaders: (GFX10.3)
Latency: 17209 -> 17210 (+0.01%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15347>
After transitioning to exact after a discard, the exec stack might be:
[exact|global, wqm, exact]
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15244>
This was needed, so that in case of active helper lanes,
these contain the correct value. It is now handled implicitly.
Totals from 1004 (0.74% of 134913) affected shaders: (GFX10.3)
CodeSize: 7581020 -> 7580892 (-0.00%); split: -0.00%, +0.00%
Instrs: 1454940 -> 1454908 (-0.00%); split: -0.00%, +0.00%
Latency: 12984953 -> 12984894 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 3173037 -> 3173049 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 47498 -> 47273 (-0.47%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951>
This case doesn't seem to happen in practice.
No need to micro-optimize it.
This patch merges instruction selection for discard/discard_if.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14805>
Implement elect using a pseudo-op which is lowered during the
insert_exec_mask pass. This makes it possible to emit a more
optimal sequence when the exec mask is constant.
Fossil DB results on Sienna Cichlid:
Totals from 211 (0.16% of 128647) affected shaders:
CodeSize: 2254356 -> 2240468 (-0.62%); split: -0.62%, +0.00%
Instrs: 438471 -> 434996 (-0.79%); split: -0.80%, +0.01%
Latency: 2717082 -> 2709400 (-0.28%); split: -0.28%, +0.00%
InvThroughput: 566987 -> 566342 (-0.11%); split: -0.11%, +0.00%
Copies: 40058 -> 40162 (+0.26%)
Branches: 31209 -> 31211 (+0.01%)
PreSGPRs: 9927 -> 10125 (+1.99%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11458>
This migration was done with libclang-based automatic tooling, which
performed these replacements:
* Operand(uint8_t) -> Operand::c8
* Operand(uint16_t) -> Operand::c16
* Operand(uint32_t, false) -> Operand::c32
* Operand(uint32_t, bool) -> Operand::c32_or_c64
* Operand(uint64_t) -> Operand::c64
* Operand(0) -> Operand::zero(num_bytes)
Casts that were previously used for constructor selection have automatically
been removed (e.g. Operand((uint16_t)1) -> Operand::c16(1)).
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11653>