This is possible with two vectors which share a temporary, though I don't
think it currently happens in practice.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40825>
Entering a loop with empty exec mask might lead to
not be able to execute the break condition and
lead to infinite loops.
Totals from 81 (0.04% of 202440) affected shaders: (Navi48)
Instrs: 3040566 -> 3040716 (+0.00%)
CodeSize: 17506768 -> 17507188 (+0.00%)
Latency: 16342966 -> 16345166 (+0.01%)
InvThroughput: 3112932 -> 3113286 (+0.01%)
Branches: 82229 -> 82365 (+0.17%)
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40628>
isel.cf.deep_traversal is a new ACO test that verifies
that the iterative nir cf visitor allows arbitrary depth.
A depth of 10000 would cause a stack overflow on x86-64 linux
(4096 kB stack) for the old recursive code. This test
is by default not enabled.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40364>
When iterating control-flow recursively, we always run the
risk of causing a stack overflow if the control-flow
depth is too large. This patch resolves this by visiting
control-flow nodes in an iterative way, managing an explicit
stack on the heap.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40364>
By using the constant path we can combine the v_and and the v_cmp.
Foz-DB GFX1201:
Totals from 2 (0.00% of 205032) affected shaders:
Instrs: 2833 -> 2831 (-0.07%)
Latency: 27385 -> 27367 (-0.07%)
InvThroughput: 1712 -> 1710 (-0.12%)
VALU: 1301 -> 1299 (-0.15%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40705>
Nvidia implements both the same way as AMD does, so it makes sense to
allow for code sharing here.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40541>
Only use LDS for VGPR spilling if we can use addtid access, to avoid having a VGPR addr.
Limit to single wave workgroups, to avoid needing the wave_id for the offset.
If we have a scratch stack pointer, don't use LDS at all.
Limit LDS spilling to not reduce occupancy further.
Note that in theory, this can still limit occupancy of other shaders running
on the CU at the same time, but that's unlikely and impossible to know at this point.
Removes all scratch usage in emulated FSR4 and parallel_rdp.
Besides that, only a single GoW shader is affected.
Foz-DB Navi31:
Totals from 9 (0.01% of 114641) affected shaders:
Instrs: 68863 -> 68830 (-0.05%); split: -0.07%, +0.02%
CodeSize: 416108 -> 416000 (-0.03%); split: -0.05%, +0.02%
LDS: 2048 -> 45056 (+2100.00%)
Scratch: 261888 -> 220672 (-15.74%)
Latency: 727951 -> 657155 (-9.73%); split: -9.73%, +0.00%
InvThroughput: 418644 -> 383269 (-8.45%)
VClause: 1506 -> 1200 (-20.32%)
Copies: 10651 -> 10624 (-0.25%)
VALU: 48700 -> 48684 (-0.03%)
SALU: 6200 -> 6199 (-0.02%); split: -0.05%, +0.03%
VMEM: 4139 -> 3589 (-13.29%)
VOPD: 580 -> 574 (-1.03%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36367>
This is a bit unusual, as we otherwise only use the VOP2 codesize
optimization opcodes in the register allocator.
But unless we change the scheduler to not split v_mov_b32_dpp and
v_dot, we have no other choice.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40510>
We are going to disallow continue statements without
loop continue constructs.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39942>
This is now done directly in the VOPD scheduler.
Foz-DB GFX1201:
Totals from 600 (0.52% of 114655) affected shaders:
no stats changed
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40225>
This optimization was previously done in the post-RA optimizer,
but it is more fitting for the vopd scheduler.
Doing it here also has the benefit that we don't unnecessarily use
the constant bus when VOPD can't be used.
No Foz-DB changes on GFX12 until the next commit.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40225>
We might need to DCE users of dead instructions removed by
process_block().
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 9e8ba10447 ("aco/vn: remove dead instructions early")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40091>