When the output patch size <= 32 we can be sure regardless
of wave size that each wave will take this branch, therefore
the jump can be removed.
Fossil DB stats on Navi 21:
Totals from 1385 (1.03% of 134906) affected shaders:
CodeSize: 2664436 -> 2658896 (-0.21%)
Instrs: 488618 -> 487233 (-0.28%)
Latency: 2290157 -> 2289199 (-0.04%)
InvThroughput: 898658 -> 898364 (-0.03%)
Branches: 6554 -> 5169 (-21.13%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-By: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17921>
The GPU can skip LDS instructions when LGKMCNT==0, and for these
branches this should be always faster than a jump.
Fossil DB stats on Navi 21:
Totals from 60918 (45.16% of 134906) affected shaders:
CodeSize: 158624792 -> 157893776 (-0.46%)
Instrs: 30234254 -> 30051500 (-0.60%)
Latency: 139521675 -> 139434597 (-0.06%); split: -0.06%, +0.00%
InvThroughput: 21184146 -> 21183653 (-0.00%); split: -0.00%, +0.00%
Branches: 1115134 -> 932380 (-16.39%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-By: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17921>
"Removing jumps" in ACO means skipping the jump instruction
at the beginning of a divergent branch (but still modify exec).
ACO already supports implicitly removing jumps when it decides
that executing a branch with empty exec mask is more beneficial
than a jump.
This commit adds the possibility to use this explicitly
through nir_selection_control. ACO will respect this
setting and remove the branch instructions when this is specified,
unless it decides that this would cause bugs (eg. exp instruction).
There are two cases that benefit from the new change:
1. When the application requests to "flatten" a branch (ie.
remove control flow), we now respect that.
2. When the compiler stack determines that a divergent branch
is always taken.
v2 by Georg Lehmann: fixed applying sel_ctrl to else blocks
Fossil DB stats on Navi 21:
Totals from 13 (0.01% of 134906) affected shaders:
CodeSize: 136616 -> 136496 (-0.09%)
Instrs: 26196 -> 26166 (-0.11%)
Latency: 417928 -> 417889 (-0.01%)
Branches: 1241 -> 1211 (-2.42%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-By: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17921>
For fast-linking, we really want to upload the binaries directly in
a library to avoid creating and uploading at pipeline creation time.
To achieve that, add a radeon_winsys_bo pointer to radv_shader in
order to indicate that a shader is already uploaded. When a lib is
imported, the pipeline slab BO is also incremented to make sure it's
not freed when the lib is destroyed.
This also allows to free binaries right after they are uploaded.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18860>
Otherwise the wrong parent link might be set.
This kinda relies on waves being launched in order which tends to
be the case on AMD. To avoid the busy-wait loop waiting on stuff
from the same subgroup we do the actual processing in the body of
the loop. This can have performance implications but mostly in the
case we'd otherwise deadlock, so meh.
Reviewed-By: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18799>
First usage of the offset field, Can put more in it in the follow
up.
Reviewed-By: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18799>
Instead of relying on a certain BVH layout, this patch traverses the BVH
from the root node which gets rid of any layout requirements.
Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18967>
v2: set third plane offset only for 3 plane formats (Boyuan Zhang)
Signed-off-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18914>
The layout calculation accidentally thought these would be
stored in variables, but that's not the case.
Fixes: 697ea02202
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18846>
This was missed. I guess CI doesn't have a recent enough LLVM for these
tests.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17710>
Add GFX11 support and use wait_imm::unset_counter. Looping in the waitcnt
pass was probably broken on GFX11 because of this.
fossil-db (gfx1100):
Totals from 899 (0.56% of 161689) affected shaders:
Instrs: 1319368 -> 1319179 (-0.01%)
CodeSize: 7124640 -> 7123884 (-0.01%)
Latency: 26554304 -> 26404606 (-0.56%)
InvThroughput: 9032485 -> 8978773 (-0.59%); split: -0.59%, +0.00%
No navi10 fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17710>
According to an LLVM comment, these are swapped in GFX11.
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17710>
It doesn't make sense for a "s_waitcnt vmcnt(0)" to affect a store or DS
instruction.
LLVM checks for "s_waitcnt vmcnt(0) lgkmcnt(0) expcnt(0)" but ignores
s_waitcnt_vscnt (which I assume is a bug).
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: bcf94bb933 ("aco: properly recognize that s_waitcnt mitigates VMEMtoScalarWriteHazard")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18270>
This seems to resolve the memory explosion without hurting performance.
This workaround is only applied for native Vulkan.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18884>
This option (radv_enable_unified_heap_on_apu) allows to force the
driver to expose only one heap of VRAM because some games seem to
perform better with that.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18884>
If a library with only the vertex input interface is created, this
would crash.
This fixes segfault with vkoverhead.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18869>
attribute_mask was unused when I first introduced this but now it's
used again.
This fixes a bunch of GPL regressions.
Fixes: 0feab7b9cf ("radv: prepare the VS input state for prologs created with GPL")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18862>
The driver no longer bind NULL graphics pipelines, so these checks
are useless.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18873>
The memory layout logic is duplicated between
radv_GetAccelerationStructureBuildSizesKHR and
radv_CmdBuildAccelerationStructuresKHR. This patch adds a helper that
computes the scratch and acceleration structure memory layout for a
given build configuration.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18842>
Culls like 99% of the triangles that are culled at all.
Reduces VALU usage in Q2RTX traversal by ~8%, though doesn't look
like VALU is a bottleneck at this point ...
For Control we get a ~5% reduction in VALU usage, but similarly it
doesn't look like a bottleneck.
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18830>