Rhys Perry
295b7d606f
aco: insert NOP before dealloc_vgpr in the insert_NOPs pass
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24884 >
2024-11-06 09:58:05 +00:00
Rhys Perry
0ad713ca9f
aco: add waitcnt build helper
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24884 >
2024-11-06 09:58:04 +00:00
Daniel Schürmann
bc2d166b50
aco/lower_to_hw: don't allocate new temporaries
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31362 >
2024-10-07 07:00:19 +00:00
Daniel Schürmann
30e7644e5f
aco: simplify Definition constructors
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31362 >
2024-10-07 07:00:19 +00:00
Georg Lehmann
9bb10b58f3
aco: use v_cvt_pk_u8_f32 for f2u8
...
Foz-DB Navi31:
Totals from 42 (0.05% of 79395) affected shaders:
Instrs: 3253747 -> 3248867 (-0.15%); split: -0.15%, +0.00%
CodeSize: 16690136 -> 16661772 (-0.17%); split: -0.17%, +0.00%
VGPRs: 4176 -> 4128 (-1.15%)
Latency: 18485157 -> 18479752 (-0.03%); split: -0.03%, +0.00%
InvThroughput: 3659404 -> 3658222 (-0.03%); split: -0.03%, +0.00%
Copies: 231891 -> 228145 (-1.62%)
VALU: 1785800 -> 1782054 (-0.21%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29532 >
2024-09-02 19:49:20 +00:00
Rhys Perry
aafb49f56b
aco: set prefer_remove for gfx9- too
...
This is a hint that the branch is worth removing. Assume that's the case,
regardless of the gfx level.
fossil-db (vega10):
Totals from 22 (0.03% of 63053) affected shaders:
Instrs: 23927 -> 23856 (-0.30%)
CodeSize: 125096 -> 124812 (-0.23%)
Latency: 138258 -> 137765 (-0.36%)
InvThroughput: 55900 -> 55884 (-0.03%)
Branches: 391 -> 320 (-18.16%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30321 >
2024-08-15 16:00:19 +00:00
Rhys Perry
9f1a5645cf
aco: completely skip branches if they're never taken
...
fossil-db (navi21):
Totals from 196 (0.25% of 79395) affected shaders:
Instrs: 101902 -> 101706 (-0.19%)
CodeSize: 576988 -> 576232 (-0.13%)
Latency: 750344 -> 750280 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 119170 -> 119161 (-0.01%)
Branches: 3933 -> 3737 (-4.98%)
fossil-db (vega10):
Totals from 585 (0.93% of 63053) affected shaders:
Instrs: 346877 -> 346292 (-0.17%)
CodeSize: 1810600 -> 1808260 (-0.13%)
Latency: 1817743 -> 1814233 (-0.19%)
InvThroughput: 652142 -> 651944 (-0.03%)
Branches: 5087 -> 4502 (-11.50%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30321 >
2024-08-15 16:00:19 +00:00
Rhys Perry
c29d9f1184
aco: only remove branch jumping over SMEM/barrier if it's never taken
...
SMEM might be an invalid access, and barriers are probably expensive.
fossil-db (navi21):
Totals from 126 (0.16% of 79395) affected shaders:
Instrs: 2764965 -> 2765377 (+0.01%)
CodeSize: 15155348 -> 15156788 (+0.01%)
Latency: 17604293 -> 17604296 (+0.00%)
Branches: 105211 -> 105623 (+0.39%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Backport-to: 24.1
Backport-to: 24.2
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30321 >
2024-08-15 16:00:19 +00:00
Rhys Perry
b934255510
aco: split selection_control_remove into rarely_taken and never_taken
...
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Backport-to: 24.1
Backport-to: 24.2
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30321 >
2024-08-15 16:00:18 +00:00
Georg Lehmann
e0818cb87b
aco/gfx11+: don't use VOP3 v_swap_b16
...
v_swap_b16 is not offically supported as VOP3, so it can't be used with v128-255.
Tests show that VOP3 appears to work correctly, but according to AMD that should
not be relied on.
https://github.com/llvm/llvm-project/pull/100442#discussion_r1703929676
Foz-DB Navi31:
Totals from 6 (0.01% of 79395) affected shaders:
Instrs: 64799 -> 65932 (+1.75%)
CodeSize: 360180 -> 368440 (+2.29%)
Latency: 1364648 -> 1365922 (+0.09%)
InvThroughput: 635843 -> 636475 (+0.10%)
Copies: 14766 -> 15698 (+6.31%)
VALU: 38743 -> 39675 (+2.41%)
Fixes: 80b8bbf0c5 ("aco/gfx11: use v_swap_b16")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30515 >
2024-08-06 20:40:12 +00:00
Georg Lehmann
53155ba12d
aco: add CompilationProgress::after_lower_to_hw
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30399 >
2024-07-29 18:35:33 +00:00
Georg Lehmann
4399c7bac3
aco: add aco_opcode::p_s_cvt_f16_f32_rtne
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29245 >
2024-07-18 08:36:14 +00:00
Rhys Perry
1643c933ef
aco/gfx11: don't use v_bfrev_b32 with wave64
...
v_mov_b32 can be dual issued.
fossil-db (navi31):
Totals from 1792 (2.26% of 79395) affected shaders:
CodeSize: 27462476 -> 27470308 (+0.03%); split: -0.00%, +0.03%
Latency: 29403214 -> 29402713 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 5005863 -> 5004702 (-0.02%); split: -0.02%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29912 >
2024-07-01 17:34:22 +00:00
Rhys Perry
f842bd81ca
aco: use s_pack_*_b32_b16 more in p_insert/p_extract lowering
...
This opcode doesn't write SCC, which gives later passes more freedom to
move instructions.
fossil-db (navi21):
Totals from 727 (0.92% of 79395) affected shaders:
Latency: 14943483 -> 14942704 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 3225790 -> 3225766 (-0.00%); split: -0.00%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29912 >
2024-07-01 17:34:22 +00:00
Rhys Perry
9fe3af1e2a
aco: insert s_nop before discard early exit sendmsg(dealloc_vgpr)
...
Forgot about this one.
fossil-db (gfx1100):
Totals from 3920 (2.94% of 133461) affected shaders:
Instrs: 6632088 -> 6636008 (+0.06%)
CodeSize: 34165376 -> 34181056 (+0.05%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Fixes: 37fbfa655a ("aco: insert s_nop before VGPR deallocation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29770 >
2024-06-18 20:17:38 +00:00
Georg Lehmann
046414e061
aco: add more anonymous namespaces
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29740 >
2024-06-18 17:53:07 +00:00
Georg Lehmann
2b56a97374
aco/lower_to_hw: optimize split 64bit constant copies
...
Foz-DB Navi21:
Totals from 3209 (4.04% of 79395) affected shaders:
Instrs: 6502065 -> 6496612 (-0.08%)
CodeSize: 35578300 -> 35556596 (-0.06%)
Latency: 66092924 -> 66092668 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 16968953 -> 16968900 (-0.00%); split: -0.00%, +0.00%
SClause: 198651 -> 198647 (-0.00%)
Copies: 597323 -> 591872 (-0.91%)
SALU: 930918 -> 925467 (-0.59%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29422 >
2024-05-29 11:59:22 +00:00
Georg Lehmann
5910a46101
aco/lower_to_hw: use copy_constant_sgpr for masks
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29422 >
2024-05-29 11:59:22 +00:00
Georg Lehmann
23d88e68fc
aco: small constant copy optimizations
...
Foz-DB Navi21:
Totals from 13 (0.02% of 79395) affected shaders:
CodeSize: 93432 -> 93376 (-0.06%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29422 >
2024-05-29 11:59:22 +00:00
Georg Lehmann
54ad07c32a
aco/lower_to_hw: add copy_constant_sgpr
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29422 >
2024-05-29 11:59:22 +00:00
Georg Lehmann
56354c6cd7
aco: don't pass program to emit_bpermute
...
Also change the param order, because the builder typically comes first.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29422 >
2024-05-29 11:59:22 +00:00
Rhys Perry
e79a8219d2
aco/gfx12: sign-extend s_getpc_b64
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29330 >
2024-05-28 10:52:11 +00:00
Rhys Perry
ae18c88409
aco/gfx12: implement workgroup barrier
...
Same sequence LLVM uses for llvm.amdgcn.s.barrier.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29330 >
2024-05-28 10:52:11 +00:00
Rhys Perry
fae2a85d57
aco/gfx12: implement subgroup shader clock
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29330 >
2024-05-28 10:52:11 +00:00
Rhys Perry
74aa6437d6
aco: add GFX11.5+ opcodes
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29162 >
2024-05-14 20:50:27 +00:00
Georg Lehmann
80b8bbf0c5
aco/gfx11: use v_swap_b16
...
I tested that v_swap_b16 can be encoded as VOP3, because the ISA doc doesn't list
it as a possible VOP3 opcode. VOP3 is nessecary to access v128+.
Foz-DB Navi31:
Totals from 32 (0.04% of 79395) affected shaders:
Instrs: 201865 -> 195168 (-3.32%)
CodeSize: 1082220 -> 1031228 (-4.71%); split: -4.71%, +0.00%
Latency: 2258198 -> 2238586 (-0.87%)
InvThroughput: 796731 -> 788934 (-0.98%)
Copies: 34514 -> 29220 (-15.34%)
VALU: 122457 -> 117163 (-4.32%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29143 >
2024-05-13 18:42:19 +00:00
Georg Lehmann
44cc0d31b8
aco/gfx10: use v_add_u16 with literal for constant copies
...
This also means the v_perm_b32 path is now unused.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29045 >
2024-05-06 13:38:14 +00:00
Georg Lehmann
7823065f64
aco/gfx11+: use v_cvt_pk_u8_f32 for 8bit constant copies
...
Foz-DB Navi31:
Totals from 201 (0.25% of 79395) affected shaders:
Instrs: 186869 -> 186857 (-0.01%)
CodeSize: 1026760 -> 1026700 (-0.01%); split: -0.01%, +0.00%
Latency: 2302050 -> 2301969 (-0.00%)
InvThroughput: 739466 -> 739431 (-0.00%)
Copies: 26467 -> 26454 (-0.05%)
VALU: 93529 -> 93516 (-0.01%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29045 >
2024-05-06 13:38:14 +00:00
Georg Lehmann
d4084f7f09
aco/lower_to_hw: remove gfx6/7 subdword paths
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28836 >
2024-05-02 11:09:36 +00:00
Georg Lehmann
6b35de971c
aco/lower_to_hw: don't use regClass to identify subdword reductions
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28836 >
2024-05-02 11:09:35 +00:00
Georg Lehmann
47d824a644
aco/lower_to_hw: fix 16bit p_insert on gfx8
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28881 >
2024-04-25 09:47:19 +00:00
Georg Lehmann
bb80ac7a70
aco/lower_to_hw: fix v_cvt_pk_u16_u32 instruction format
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28881 >
2024-04-25 09:47:18 +00:00
Georg Lehmann
4b5016a537
aco: support high_16bits FS IO
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28435 >
2024-04-10 07:49:27 +00:00
Samuel Pitoiset
7a69d78ba2
aco: use SPDX-License-Identifier
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28622 >
2024-04-08 15:49:25 +00:00
Rhys Perry
66153f7bfe
aco: always emit float mode for merged shaders compiled separately
...
We don't know what the float mode was by the end of the previous shader,
so we should always set it.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28392 >
2024-03-28 21:02:29 +00:00
Daniel Schürmann
a863c7951e
aco: remove create_instruction() template parameter
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28370 >
2024-03-28 11:25:43 +00:00
Daniel Schürmann
9b0ebcc39b
aco: change return type of create_instruction() to Instruction*
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28370 >
2024-03-28 11:25:43 +00:00
Daniel Schürmann
1187189235
aco: unify different SALU types into single struct SALU_instruction
...
This removes
- SOP1_instruction
- SOP2_instruction
- SOPC_instruction
- SOPK_instruction
- SOPP_instruction
and their corresponding methods.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28370 >
2024-03-28 11:25:43 +00:00
Daniel Schürmann
5d265257a0
aco: remove SOPP_instruction::block member
...
Re-use SOPP_instruction::imm instead.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28370 >
2024-03-28 11:25:43 +00:00
Daniel Schürmann
cef01e817d
aco: use instr_class::branch to identify SOPP branches
...
Also changes the instr_class of s_trap to instr_class::other.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28370 >
2024-03-28 11:25:43 +00:00
Rhys Perry
3b28ba8239
aco: optimize for purely linear VGPR copies
...
fossil-db:
Totals from 2 (0.00% of 79242) affected shaders:
Instrs: 1344 -> 1340 (-0.30%)
CodeSize: 6968 -> 6952 (-0.23%)
Latency: 4414 -> 4410 (-0.09%)
InvThroughput: 1018 -> 1020 (+0.20%)
Copies: 60 -> 56 (-6.67%)
SALU: 40 -> 36 (-10.00%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27697 >
2024-03-06 12:55:46 +00:00
Rhys Perry
5e17a39b15
aco: allow p_start_linear_vgpr to use multiple operands
...
Merging the p_create_vector into the p_start_linear_vgpr is useful since
we stopped attempting to place the p_start_linear_vgpr definition in the
same registers as the operand.
fossil-db (navi31):
Totals from 927 (1.17% of 79242) affected shaders:
MaxWaves: 26412 -> 26442 (+0.11%)
Instrs: 938328 -> 938181 (-0.02%); split: -0.14%, +0.13%
CodeSize: 4891448 -> 4890820 (-0.01%); split: -0.11%, +0.10%
VGPRs: 47016 -> 47004 (-0.03%); split: -0.13%, +0.10%
SpillSGPRs: 222 -> 226 (+1.80%)
Latency: 5076065 -> 5075191 (-0.02%); split: -0.12%, +0.10%
InvThroughput: 712316 -> 712421 (+0.01%); split: -0.09%, +0.10%
SClause: 27992 -> 27972 (-0.07%); split: -0.09%, +0.02%
Copies: 38042 -> 38104 (+0.16%); split: -1.95%, +2.12%
PreVGPRs: 39448 -> 39369 (-0.20%)
VALU: 570157 -> 570224 (+0.01%); split: -0.13%, +0.14%
SALU: 51672 -> 51678 (+0.01%); split: -0.01%, +0.02%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27697 >
2024-03-06 12:55:45 +00:00
Rhys Perry
f99443a68b
aco: don't combine linear and normal VGPR copies
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27697 >
2024-03-06 12:55:45 +00:00
Georg Lehmann
cd6d9c5918
aco: don't remove branches that skip v_writelane_b32
...
Cc: mesa-stable
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27537 >
2024-02-09 21:55:02 +00:00
Rhys Perry
174e37afb9
aco: fix >8 byte linear vgpr copies
...
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27436 >
2024-02-06 15:40:58 +00:00
Georg Lehmann
1d167d187e
aco/gfx10+: don't use v_cmpx with VCC def
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163 >
2023-11-15 12:35:32 +00:00
Georg Lehmann
e49c413a86
aco: use null operand for SOPK s_waitcnt
...
Both null def and op result in the same correct encoding, but these
instructions optionally read a sgpr, so it makes more sense to use an operand.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163 >
2023-11-15 12:35:32 +00:00
Qiang Yu
5ef7c54829
aco: wait memory ops done before go to next shader part
...
Next part don't know whether p_end_with_regs args are loaded from
memory ops or not, need to wait it's done here.
Other memory load needs to be waited too like:
a = load_mem()
b = ...
if (...) {
wait_mem(a)
store_mem(a)
}
p_end_with_regs(b)
"a" still needs to be waited, otherwise next shader part regs may
be overwritten by unfinished memory loads.
Memory stores are waited too. When >=gfx10 and last VGT has no
parameter export, we need to wait all memeory stores done before
pos export (see ac_nir_export_position). So when merged shader
(ES+GS or VS+GS) is partially built, first stage needs to wait
all memory stores done, otherwise second stage don't know if
any memory stores pending before.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Signe-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973 >
2023-10-10 02:36:34 +00:00
Qiang Yu
5ba68f92b4
aco: create exit block for p_end_with_regs to branch to
...
To handle ps discard in radeonsi part mode shader.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973 >
2023-10-10 02:36:34 +00:00
Georg Lehmann
34d8fa6185
aco/gfx11: optimize dual source export
...
We can combine dpp with the v_cndmask_b32.
Foz-DB Navi31:
Totals from 222 (0.28% of 79330) affected shaders:
Instrs: 564392 -> 563373 (-0.18%); split: -0.19%, +0.01%
CodeSize: 2867040 -> 2864728 (-0.08%); split: -0.09%, +0.01%
Latency: 4278957 -> 4275199 (-0.09%); split: -0.09%, +0.00%
InvThroughput: 586636 -> 585824 (-0.14%); split: -0.14%, +0.00%
SClause: 20210 -> 20211 (+0.00%); split: -0.02%, +0.02%
Copies: 39763 -> 39778 (+0.04%); split: -0.13%, +0.17%
PreVGPRs: 13924 -> 13922 (-0.01%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25541 >
2023-10-05 10:37:34 +00:00