v2 by Timur Kristóf:
Do not add the affinity for instructions that can't write m0
reliably, such as readlane-like instructions on GFX8.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22690>
This information was wrong in some places, let's fix it now.
GFX6:
The GPU has 64KB LDS, but only 32KB is usable by a workgroup.
NGG:
There was some misinformation about NGG only being able to
address 32 KB LDS, it turns out this is actually not true
and it can address the full 64K.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21935>
We haven't measured any noteworthy perf improvement
from these, and they are difficult to port to NIR,
so remove them before the NIR based VS input lowering
in order to make it easier to bisect and analyze stats.
Fossil DB stats on Rembrandt (GFX10.3):
Totals from 21750 (16.12% of 134913) affected shaders:
VGPRs: 868512 -> 868664 (+0.02%); split: -0.00%, +0.02%
CodeSize: 64406804 -> 64397572 (-0.01%); split: -0.08%, +0.07%
MaxWaves: 567904 -> 567888 (-0.00%); split: +0.00%, -0.00%
Instrs: 12327212 -> 12324851 (-0.02%); split: -0.10%, +0.08%
Latency: 61367324 -> 61371204 (+0.01%); split: -0.04%, +0.05%
InvThroughput: 9687734 -> 9686000 (-0.02%); split: -0.03%, +0.01%
VClause: 248207 -> 303449 (+22.26%); split: -0.02%, +22.28%
SClause: 314942 -> 315564 (+0.20%); split: -0.09%, +0.29%
Copies: 921581 -> 921820 (+0.03%); split: -0.16%, +0.19%
Branches: 341964 -> 341967 (+0.00%); split: -0.00%, +0.00%
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16805>
It's just v_fma with fixed DPP8 and builtin s_waitcnt_expcnt, so it can mostly
be handled as a pure VALU instruction.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21023>
When a shader has a comparison with the subgroup invocation id,
we can use a constant instead, saving a VALU instruction.
When the constant can't be represented as a 64-bit literal,
use the s_bfm_b64 instruction to generate it instead, which
is still a win.
Fossil DB stats on GFX11:
Totals from 300 (0.22% of 134913) affected shaders:
CodeSize: 2223052 -> 2214336 (-0.39%); split: -0.43%, +0.04%
Instrs: 430216 -> 429882 (-0.08%); split: -0.14%, +0.06%
Latency: 5881180 -> 5878181 (-0.05%); split: -0.05%, +0.00%
InvThroughput: 731846 -> 729293 (-0.35%)
Copies: 31662 -> 31847 (+0.58%); split: -0.03%, +0.61%
Branches: 8241 -> 8100 (-1.71%)
PreVGPRs: 15788 -> 15786 (-0.01%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20843>
It looks like the LLVM assembler promotes true 16-bit instructions to VOP3
in this case.
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20251>
fossil-db (gfx1100):
Totals from 112 (0.08% of 134574) affected shaders:
Scratch: 1513472 -> 1455360 (-3.84%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20534>
p_cvt_f16_f32_rtne will be lowered to v_cvt_f16_f32 and we already know that
preserves the high bits.
I tested the others on GFX1036.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20574>
Apparently the TLS constructor doesn't work well if RADV
is instantiated multiple times and/or used by a program with
already existing threads.
Fixes: a128d444cb ('aco: use monotonic_buffer_resource for instructions')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19219>
GFX11 has a lot of complicated data dependency hazards.
For debugging GFX10+ data dependency hazards. This creates an excessive
amount of s_waitcnt_depctr.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18273>
Add GFX11 support and use wait_imm::unset_counter. Looping in the waitcnt
pass was probably broken on GFX11 because of this.
fossil-db (gfx1100):
Totals from 899 (0.56% of 161689) affected shaders:
Instrs: 1319368 -> 1319179 (-0.01%)
CodeSize: 7124640 -> 7123884 (-0.01%)
Latency: 26554304 -> 26404606 (-0.56%)
InvThroughput: 9032485 -> 8978773 (-0.59%); split: -0.59%, +0.00%
No navi10 fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17710>
As monotonic_buffer_resource is not thread-safe,
we use a thread_local instance which gets allocated once.
This change reduces the compile time spent in ACO by
approximately 10%.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18112>
We would assemble an instruction writing vcc instead.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Fixes: 5ffc73896f ("aco/assembler: Fix v_cmpx with SDWA.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18077>
The old name is no longer accurate.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18077>
Instead of SDWA v_mov_b32/v_xor_b32, we can use a combination of
v_add_u16/v_sub_u16 (add/sub swap, similar to xor swap) and v_perm_b32
with a literal.
I don't know yet if GFX11 adds any new instructions which makes this
easier, but this approach should have full functionality.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16595>