Commit graph

229 commits

Author SHA1 Message Date
Rhys Perry
295b7d606f aco: insert NOP before dealloc_vgpr in the insert_NOPs pass
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24884>
2024-11-06 09:58:05 +00:00
Rhys Perry
0ad713ca9f aco: add waitcnt build helper
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24884>
2024-11-06 09:58:04 +00:00
Daniel Schürmann
bc2d166b50 aco/lower_to_hw: don't allocate new temporaries
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31362>
2024-10-07 07:00:19 +00:00
Daniel Schürmann
30e7644e5f aco: simplify Definition constructors
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31362>
2024-10-07 07:00:19 +00:00
Georg Lehmann
9bb10b58f3 aco: use v_cvt_pk_u8_f32 for f2u8
Foz-DB Navi31:
Totals from 42 (0.05% of 79395) affected shaders:
Instrs: 3253747 -> 3248867 (-0.15%); split: -0.15%, +0.00%
CodeSize: 16690136 -> 16661772 (-0.17%); split: -0.17%, +0.00%
VGPRs: 4176 -> 4128 (-1.15%)
Latency: 18485157 -> 18479752 (-0.03%); split: -0.03%, +0.00%
InvThroughput: 3659404 -> 3658222 (-0.03%); split: -0.03%, +0.00%
Copies: 231891 -> 228145 (-1.62%)
VALU: 1785800 -> 1782054 (-0.21%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29532>
2024-09-02 19:49:20 +00:00
Rhys Perry
aafb49f56b aco: set prefer_remove for gfx9- too
This is a hint that the branch is worth removing. Assume that's the case,
regardless of the gfx level.

fossil-db (vega10):
Totals from 22 (0.03% of 63053) affected shaders:
Instrs: 23927 -> 23856 (-0.30%)
CodeSize: 125096 -> 124812 (-0.23%)
Latency: 138258 -> 137765 (-0.36%)
InvThroughput: 55900 -> 55884 (-0.03%)
Branches: 391 -> 320 (-18.16%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30321>
2024-08-15 16:00:19 +00:00
Rhys Perry
9f1a5645cf aco: completely skip branches if they're never taken
fossil-db (navi21):
Totals from 196 (0.25% of 79395) affected shaders:
Instrs: 101902 -> 101706 (-0.19%)
CodeSize: 576988 -> 576232 (-0.13%)
Latency: 750344 -> 750280 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 119170 -> 119161 (-0.01%)
Branches: 3933 -> 3737 (-4.98%)

fossil-db (vega10):
Totals from 585 (0.93% of 63053) affected shaders:
Instrs: 346877 -> 346292 (-0.17%)
CodeSize: 1810600 -> 1808260 (-0.13%)
Latency: 1817743 -> 1814233 (-0.19%)
InvThroughput: 652142 -> 651944 (-0.03%)
Branches: 5087 -> 4502 (-11.50%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30321>
2024-08-15 16:00:19 +00:00
Rhys Perry
c29d9f1184 aco: only remove branch jumping over SMEM/barrier if it's never taken
SMEM might be an invalid access, and barriers are probably expensive.

fossil-db (navi21):
Totals from 126 (0.16% of 79395) affected shaders:
Instrs: 2764965 -> 2765377 (+0.01%)
CodeSize: 15155348 -> 15156788 (+0.01%)
Latency: 17604293 -> 17604296 (+0.00%)
Branches: 105211 -> 105623 (+0.39%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Backport-to: 24.1
Backport-to: 24.2
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30321>
2024-08-15 16:00:19 +00:00
Rhys Perry
b934255510 aco: split selection_control_remove into rarely_taken and never_taken
No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Backport-to: 24.1
Backport-to: 24.2
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30321>
2024-08-15 16:00:18 +00:00
Georg Lehmann
e0818cb87b aco/gfx11+: don't use VOP3 v_swap_b16
v_swap_b16 is not offically supported as VOP3, so it can't be used with v128-255.
Tests show that VOP3 appears to work correctly, but according to AMD that should
not be relied on.

https://github.com/llvm/llvm-project/pull/100442#discussion_r1703929676

Foz-DB Navi31:
Totals from 6 (0.01% of 79395) affected shaders:
Instrs: 64799 -> 65932 (+1.75%)
CodeSize: 360180 -> 368440 (+2.29%)
Latency: 1364648 -> 1365922 (+0.09%)
InvThroughput: 635843 -> 636475 (+0.10%)
Copies: 14766 -> 15698 (+6.31%)
VALU: 38743 -> 39675 (+2.41%)

Fixes: 80b8bbf0c5 ("aco/gfx11: use v_swap_b16")

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30515>
2024-08-06 20:40:12 +00:00
Georg Lehmann
53155ba12d aco: add CompilationProgress::after_lower_to_hw
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30399>
2024-07-29 18:35:33 +00:00
Georg Lehmann
4399c7bac3 aco: add aco_opcode::p_s_cvt_f16_f32_rtne
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29245>
2024-07-18 08:36:14 +00:00
Rhys Perry
1643c933ef aco/gfx11: don't use v_bfrev_b32 with wave64
v_mov_b32 can be dual issued.

fossil-db (navi31):
Totals from 1792 (2.26% of 79395) affected shaders:
CodeSize: 27462476 -> 27470308 (+0.03%); split: -0.00%, +0.03%
Latency: 29403214 -> 29402713 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 5005863 -> 5004702 (-0.02%); split: -0.02%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29912>
2024-07-01 17:34:22 +00:00
Rhys Perry
f842bd81ca aco: use s_pack_*_b32_b16 more in p_insert/p_extract lowering
This opcode doesn't write SCC, which gives later passes more freedom to
move instructions.

fossil-db (navi21):
Totals from 727 (0.92% of 79395) affected shaders:
Latency: 14943483 -> 14942704 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 3225790 -> 3225766 (-0.00%); split: -0.00%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29912>
2024-07-01 17:34:22 +00:00
Rhys Perry
9fe3af1e2a aco: insert s_nop before discard early exit sendmsg(dealloc_vgpr)
Forgot about this one.

fossil-db (gfx1100):
Totals from 3920 (2.94% of 133461) affected shaders:
Instrs: 6632088 -> 6636008 (+0.06%)
CodeSize: 34165376 -> 34181056 (+0.05%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Fixes: 37fbfa655a ("aco: insert s_nop before VGPR deallocation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29770>
2024-06-18 20:17:38 +00:00
Georg Lehmann
046414e061 aco: add more anonymous namespaces
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29740>
2024-06-18 17:53:07 +00:00
Georg Lehmann
2b56a97374 aco/lower_to_hw: optimize split 64bit constant copies
Foz-DB Navi21:
Totals from 3209 (4.04% of 79395) affected shaders:
Instrs: 6502065 -> 6496612 (-0.08%)
CodeSize: 35578300 -> 35556596 (-0.06%)
Latency: 66092924 -> 66092668 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 16968953 -> 16968900 (-0.00%); split: -0.00%, +0.00%
SClause: 198651 -> 198647 (-0.00%)
Copies: 597323 -> 591872 (-0.91%)
SALU: 930918 -> 925467 (-0.59%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29422>
2024-05-29 11:59:22 +00:00
Georg Lehmann
5910a46101 aco/lower_to_hw: use copy_constant_sgpr for masks
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29422>
2024-05-29 11:59:22 +00:00
Georg Lehmann
23d88e68fc aco: small constant copy optimizations
Foz-DB Navi21:
Totals from 13 (0.02% of 79395) affected shaders:
CodeSize: 93432 -> 93376 (-0.06%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29422>
2024-05-29 11:59:22 +00:00
Georg Lehmann
54ad07c32a aco/lower_to_hw: add copy_constant_sgpr
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29422>
2024-05-29 11:59:22 +00:00
Georg Lehmann
56354c6cd7 aco: don't pass program to emit_bpermute
Also change the param order, because the builder typically comes first.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29422>
2024-05-29 11:59:22 +00:00
Rhys Perry
e79a8219d2 aco/gfx12: sign-extend s_getpc_b64
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29330>
2024-05-28 10:52:11 +00:00
Rhys Perry
ae18c88409 aco/gfx12: implement workgroup barrier
Same sequence LLVM uses for llvm.amdgcn.s.barrier.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29330>
2024-05-28 10:52:11 +00:00
Rhys Perry
fae2a85d57 aco/gfx12: implement subgroup shader clock
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29330>
2024-05-28 10:52:11 +00:00
Rhys Perry
74aa6437d6 aco: add GFX11.5+ opcodes
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29162>
2024-05-14 20:50:27 +00:00
Georg Lehmann
80b8bbf0c5 aco/gfx11: use v_swap_b16
I tested that v_swap_b16 can be encoded as VOP3, because the ISA doc doesn't list
it as a possible VOP3 opcode. VOP3 is nessecary to access v128+.

Foz-DB Navi31:
Totals from 32 (0.04% of 79395) affected shaders:
Instrs: 201865 -> 195168 (-3.32%)
CodeSize: 1082220 -> 1031228 (-4.71%); split: -4.71%, +0.00%
Latency: 2258198 -> 2238586 (-0.87%)
InvThroughput: 796731 -> 788934 (-0.98%)
Copies: 34514 -> 29220 (-15.34%)
VALU: 122457 -> 117163 (-4.32%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29143>
2024-05-13 18:42:19 +00:00
Georg Lehmann
44cc0d31b8 aco/gfx10: use v_add_u16 with literal for constant copies
This also means the v_perm_b32 path is now unused.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29045>
2024-05-06 13:38:14 +00:00
Georg Lehmann
7823065f64 aco/gfx11+: use v_cvt_pk_u8_f32 for 8bit constant copies
Foz-DB Navi31:
Totals from 201 (0.25% of 79395) affected shaders:
Instrs: 186869 -> 186857 (-0.01%)
CodeSize: 1026760 -> 1026700 (-0.01%); split: -0.01%, +0.00%
Latency: 2302050 -> 2301969 (-0.00%)
InvThroughput: 739466 -> 739431 (-0.00%)
Copies: 26467 -> 26454 (-0.05%)
VALU: 93529 -> 93516 (-0.01%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29045>
2024-05-06 13:38:14 +00:00
Georg Lehmann
d4084f7f09 aco/lower_to_hw: remove gfx6/7 subdword paths
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28836>
2024-05-02 11:09:36 +00:00
Georg Lehmann
6b35de971c aco/lower_to_hw: don't use regClass to identify subdword reductions
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28836>
2024-05-02 11:09:35 +00:00
Georg Lehmann
47d824a644 aco/lower_to_hw: fix 16bit p_insert on gfx8
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28881>
2024-04-25 09:47:19 +00:00
Georg Lehmann
bb80ac7a70 aco/lower_to_hw: fix v_cvt_pk_u16_u32 instruction format
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28881>
2024-04-25 09:47:18 +00:00
Georg Lehmann
4b5016a537 aco: support high_16bits FS IO
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28435>
2024-04-10 07:49:27 +00:00
Samuel Pitoiset
7a69d78ba2 aco: use SPDX-License-Identifier
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28622>
2024-04-08 15:49:25 +00:00
Rhys Perry
66153f7bfe aco: always emit float mode for merged shaders compiled separately
We don't know what the float mode was by the end of the previous shader,
so we should always set it.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28392>
2024-03-28 21:02:29 +00:00
Daniel Schürmann
a863c7951e aco: remove create_instruction() template parameter
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28370>
2024-03-28 11:25:43 +00:00
Daniel Schürmann
9b0ebcc39b aco: change return type of create_instruction() to Instruction*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28370>
2024-03-28 11:25:43 +00:00
Daniel Schürmann
1187189235 aco: unify different SALU types into single struct SALU_instruction
This removes
- SOP1_instruction
- SOP2_instruction
- SOPC_instruction
- SOPK_instruction
- SOPP_instruction

and their corresponding methods.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28370>
2024-03-28 11:25:43 +00:00
Daniel Schürmann
5d265257a0 aco: remove SOPP_instruction::block member
Re-use SOPP_instruction::imm instead.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28370>
2024-03-28 11:25:43 +00:00
Daniel Schürmann
cef01e817d aco: use instr_class::branch to identify SOPP branches
Also changes the instr_class of s_trap to instr_class::other.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28370>
2024-03-28 11:25:43 +00:00
Rhys Perry
3b28ba8239 aco: optimize for purely linear VGPR copies
fossil-db:
Totals from 2 (0.00% of 79242) affected shaders:
Instrs: 1344 -> 1340 (-0.30%)
CodeSize: 6968 -> 6952 (-0.23%)
Latency: 4414 -> 4410 (-0.09%)
InvThroughput: 1018 -> 1020 (+0.20%)
Copies: 60 -> 56 (-6.67%)
SALU: 40 -> 36 (-10.00%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27697>
2024-03-06 12:55:46 +00:00
Rhys Perry
5e17a39b15 aco: allow p_start_linear_vgpr to use multiple operands
Merging the p_create_vector into the p_start_linear_vgpr is useful since
we stopped attempting to place the p_start_linear_vgpr definition in the
same registers as the operand.

fossil-db (navi31):
Totals from 927 (1.17% of 79242) affected shaders:
MaxWaves: 26412 -> 26442 (+0.11%)
Instrs: 938328 -> 938181 (-0.02%); split: -0.14%, +0.13%
CodeSize: 4891448 -> 4890820 (-0.01%); split: -0.11%, +0.10%
VGPRs: 47016 -> 47004 (-0.03%); split: -0.13%, +0.10%
SpillSGPRs: 222 -> 226 (+1.80%)
Latency: 5076065 -> 5075191 (-0.02%); split: -0.12%, +0.10%
InvThroughput: 712316 -> 712421 (+0.01%); split: -0.09%, +0.10%
SClause: 27992 -> 27972 (-0.07%); split: -0.09%, +0.02%
Copies: 38042 -> 38104 (+0.16%); split: -1.95%, +2.12%
PreVGPRs: 39448 -> 39369 (-0.20%)
VALU: 570157 -> 570224 (+0.01%); split: -0.13%, +0.14%
SALU: 51672 -> 51678 (+0.01%); split: -0.01%, +0.02%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27697>
2024-03-06 12:55:45 +00:00
Rhys Perry
f99443a68b aco: don't combine linear and normal VGPR copies
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27697>
2024-03-06 12:55:45 +00:00
Georg Lehmann
cd6d9c5918 aco: don't remove branches that skip v_writelane_b32
Cc: mesa-stable

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27537>
2024-02-09 21:55:02 +00:00
Rhys Perry
174e37afb9 aco: fix >8 byte linear vgpr copies
No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27436>
2024-02-06 15:40:58 +00:00
Georg Lehmann
1d167d187e aco/gfx10+: don't use v_cmpx with VCC def
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163>
2023-11-15 12:35:32 +00:00
Georg Lehmann
e49c413a86 aco: use null operand for SOPK s_waitcnt
Both null def and op result in the same correct encoding, but these
instructions optionally read a sgpr, so it makes more sense to use an operand.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163>
2023-11-15 12:35:32 +00:00
Qiang Yu
5ef7c54829 aco: wait memory ops done before go to next shader part
Next part don't know whether p_end_with_regs args are loaded from
memory ops or not, need to wait it's done here.

Other memory load needs to be waited too like:
  a = load_mem()
  b = ...
  if (...) {
    wait_mem(a)
    store_mem(a)
  }
  p_end_with_regs(b)

"a" still needs to be waited, otherwise next shader part regs may
be overwritten by unfinished memory loads.

Memory stores are waited too. When >=gfx10 and last VGT has no
parameter export, we need to wait all memeory stores done before
pos export (see ac_nir_export_position). So when merged shader
(ES+GS or VS+GS) is partially built, first stage needs to wait
all memory stores done, otherwise second stage don't know if
any memory stores pending before.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Signe-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973>
2023-10-10 02:36:34 +00:00
Qiang Yu
5ba68f92b4 aco: create exit block for p_end_with_regs to branch to
To handle ps discard in radeonsi part mode shader.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973>
2023-10-10 02:36:34 +00:00
Georg Lehmann
34d8fa6185 aco/gfx11: optimize dual source export
We can combine dpp with the v_cndmask_b32.

Foz-DB Navi31:
Totals from 222 (0.28% of 79330) affected shaders:
Instrs: 564392 -> 563373 (-0.18%); split: -0.19%, +0.01%
CodeSize: 2867040 -> 2864728 (-0.08%); split: -0.09%, +0.01%
Latency: 4278957 -> 4275199 (-0.09%); split: -0.09%, +0.00%
InvThroughput: 586636 -> 585824 (-0.14%); split: -0.14%, +0.00%
SClause: 20210 -> 20211 (+0.00%); split: -0.02%, +0.02%
Copies: 39763 -> 39778 (+0.04%); split: -0.13%, +0.17%
PreVGPRs: 13924 -> 13922 (-0.01%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25541>
2023-10-05 10:37:34 +00:00