GFX11 has a lot of complicated data dependency hazards.
For debugging GFX10+ data dependency hazards. This creates an excessive
amount of s_waitcnt_depctr.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18273>
fossil-db (gfx1100):
Totals from 57858 (42.85% of 135032) affected shaders:
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18273>
Same as GFX10, but in a separate pass because it's the only hazard that's
shared.
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18273>
It doesn't make sense for a "s_waitcnt vmcnt(0)" to affect a store or DS
instruction.
LLVM checks for "s_waitcnt vmcnt(0) lgkmcnt(0) expcnt(0)" but ignores
s_waitcnt_vscnt (which I assume is a bug).
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: bcf94bb933 ("aco: properly recognize that s_waitcnt mitigates VMEMtoScalarWriteHazard")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18270>
According to LLVM, we only need to care about VOPC which writes exec.
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17697>
This prevents issues where we insert a s_waitcnt_vscnt(0) at the start of
a block or very end of the shader because we're joining two blocks (for
example, one with has_VMEM=true and the other with
has_branch_after_DS=true).
fossil-db (navi10):
Totals from 2441 (1.51% of 161220) affected shaders:
Instrs: 1383964 -> 1384094 (+0.01%); split: -0.07%, +0.08%
CodeSize: 7438212 -> 7438760 (+0.01%); split: -0.05%, +0.06%
Latency: 13780665 -> 13679664 (-0.73%); split: -1.53%, +0.80%
InvThroughput: 2950835 -> 2921511 (-0.99%); split: -1.06%, +0.07%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17697>
For example, "DS -> branch -> VMEM -> branch -> DS".
fossil-db (navi10):
Totals from 639 (0.40% of 161220) affected shaders:
Instrs: 629090 -> 628254 (-0.13%); split: -0.19%, +0.06%
CodeSize: 3410164 -> 3406748 (-0.10%); split: -0.14%, +0.04%
Latency: 7834755 -> 7821011 (-0.18%); split: -0.70%, +0.52%
InvThroughput: 1369698 -> 1374495 (+0.35%); split: -0.12%, +0.47%
A lot of the fossil-db changes are noise.
threekingdoms.8db138826c386a62.1.foz/0b222ed175eebad0 is an example of a
shader that actually has this issue.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Fixes: c037ba1bb7 ("aco/gfx10: Mitigate LdsBranchVmemWARHazard.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17697>
By default, std::stack uses std::deque to allocate its elements, which has
poor cache efficiency. std::vector makes appending elements more expensive
(due to potential reallocations), but in the changed contexts the element
count should always be low anyway.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11925>
This migration was done with libclang-based automatic tooling, which
performed these replacements:
* Operand(uint8_t) -> Operand::c8
* Operand(uint16_t) -> Operand::c16
* Operand(uint32_t, false) -> Operand::c32
* Operand(uint32_t, bool) -> Operand::c32_or_c64
* Operand(uint64_t) -> Operand::c64
* Operand(0) -> Operand::zero(num_bytes)
Casts that were previously used for constructor selection have automatically
been removed (e.g. Operand((uint16_t)1) -> Operand::c16(1)).
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11653>
The container is moved from before and hence returns size 0. To get the
correct value, the new instruction container must be used instead.
This was flagged by clang-tidy. The fixed call still triggers the
corresponding diagnostic, hence this change silences it by adding a
redundant clear() after move.
Fixes: 7f1b537304 ("aco: add new NOP insertion pass for GFX6-9")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9432>
This new pass is more similar to the GFX10 pass and should be able to
handle control flow better.
No pipeline-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4004>
For all VMEM instructions, the resource constant is now
in operands[0]. For MIMG instructions, the sampler shares
operands[1] with write data in case this instruction writes memory.
Moving the VADDR to be the last operand for MIMG is the first step to
support Navi NSA encoding.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3602>
It's required to insert 1 wait state if the dst VGPR of any v_interp_*
is followed by a read with v_readfirstlane or v_readlane to fix GPU
hangs on GFX6. Note that v_writelane_* is apparently not affected.
This hazard isn't documented anywhere but AMD confirmed it.
This fixes a GPU hang with the texturemipmapgen Sascha demo on GFX6.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3533>
LLVM and the proprietary compiler seem to do this
Fixes: b01847bd9 ("aco/gfx10: Fix mitigation of VMEMtoScalarWriteHazard.")
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>