Commit graph

187 commits

Author SHA1 Message Date
Rhys Perry
f99443a68b aco: don't combine linear and normal VGPR copies
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27697>
2024-03-06 12:55:45 +00:00
Georg Lehmann
cd6d9c5918 aco: don't remove branches that skip v_writelane_b32
Cc: mesa-stable

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27537>
2024-02-09 21:55:02 +00:00
Rhys Perry
174e37afb9 aco: fix >8 byte linear vgpr copies
No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27436>
2024-02-06 15:40:58 +00:00
Georg Lehmann
1d167d187e aco/gfx10+: don't use v_cmpx with VCC def
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163>
2023-11-15 12:35:32 +00:00
Georg Lehmann
e49c413a86 aco: use null operand for SOPK s_waitcnt
Both null def and op result in the same correct encoding, but these
instructions optionally read a sgpr, so it makes more sense to use an operand.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163>
2023-11-15 12:35:32 +00:00
Qiang Yu
5ef7c54829 aco: wait memory ops done before go to next shader part
Next part don't know whether p_end_with_regs args are loaded from
memory ops or not, need to wait it's done here.

Other memory load needs to be waited too like:
  a = load_mem()
  b = ...
  if (...) {
    wait_mem(a)
    store_mem(a)
  }
  p_end_with_regs(b)

"a" still needs to be waited, otherwise next shader part regs may
be overwritten by unfinished memory loads.

Memory stores are waited too. When >=gfx10 and last VGT has no
parameter export, we need to wait all memeory stores done before
pos export (see ac_nir_export_position). So when merged shader
(ES+GS or VS+GS) is partially built, first stage needs to wait
all memory stores done, otherwise second stage don't know if
any memory stores pending before.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Signe-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973>
2023-10-10 02:36:34 +00:00
Qiang Yu
5ba68f92b4 aco: create exit block for p_end_with_regs to branch to
To handle ps discard in radeonsi part mode shader.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973>
2023-10-10 02:36:34 +00:00
Georg Lehmann
34d8fa6185 aco/gfx11: optimize dual source export
We can combine dpp with the v_cndmask_b32.

Foz-DB Navi31:
Totals from 222 (0.28% of 79330) affected shaders:
Instrs: 564392 -> 563373 (-0.18%); split: -0.19%, +0.01%
CodeSize: 2867040 -> 2864728 (-0.08%); split: -0.09%, +0.01%
Latency: 4278957 -> 4275199 (-0.09%); split: -0.09%, +0.00%
InvThroughput: 586636 -> 585824 (-0.14%); split: -0.14%, +0.00%
SClause: 20210 -> 20211 (+0.00%); split: -0.02%, +0.02%
Copies: 39763 -> 39778 (+0.04%); split: -0.13%, +0.17%
PreVGPRs: 13924 -> 13922 (-0.01%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25541>
2023-10-05 10:37:34 +00:00
Rhys Perry
26fce534b5 aco: shrink DPP8_instruction
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25525>
2023-10-04 18:53:43 +00:00
Georg Lehmann
4ea611bca0 aco: fix p_extract with v1 dst and s1 operand
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: f14023666c ("aco: Allow p_extract to have different definition and operand sizes.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25403>
2023-09-27 14:12:29 +00:00
Daniel Schürmann
040142684c aco: make p_wqm a marker instruction without Operands/Definitions
Totals from 28277 (36.93% of 76572) affected shaders: (GFX11)

MaxWaves: 833930 -> 833898 (-0.00%); split: +0.01%, -0.01%
Instrs: 21366950 -> 21353346 (-0.06%); split: -0.11%, +0.05%
CodeSize: 112855368 -> 112610508 (-0.22%); split: -0.24%, +0.03%
VGPRs: 1157748 -> 1158540 (+0.07%); split: -0.10%, +0.17%
SpillSGPRs: 2465 -> 2463 (-0.08%); split: -0.16%, +0.08%
Latency: 168339886 -> 168383646 (+0.03%); split: -0.10%, +0.12%
InvThroughput: 25164895 -> 25158376 (-0.03%); split: -0.08%, +0.06%
VClause: 347660 -> 346256 (-0.40%); split: -0.55%, +0.15%
SClause: 794460 -> 799521 (+0.64%); split: -0.33%, +0.97%
Copies: 1151908 -> 1148370 (-0.31%); split: -0.54%, +0.23%
Branches: 359447 -> 359437 (-0.00%); split: -0.01%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038>
2023-09-14 09:25:22 +00:00
Rhys Perry
1d29a1e2fc aco: add adjust_bpermute_dst helper
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24693>
2023-08-23 12:36:46 +01:00
Rhys Perry
9169fbf83c aco: clarify bpermute pseudo opcode names
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24693>
2023-08-23 12:36:46 +01:00
Rhys Perry
8a024c985f aco: fix p_bpermute_gfx6's exec save/restore with wave32
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24693>
2023-08-23 12:36:46 +01:00
Rhys Perry
85957dd6e5 aco: fix p_bpermute_gfx6 with input at non-zero byte
Same as the other bpermute pseudo instructions.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24693>
2023-08-23 12:36:46 +01:00
Georg Lehmann
7a3e5dd2ec aco: use s_bitreplicate_b64_b32 to set exec to 0xffff0000ffff0000
Foz-DB Navi21:
Totals from 29 (0.02% of 132657) affected shaders:
Instrs: 19342 -> 19301 (-0.21%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24561>
2023-08-09 20:29:01 +00:00
Vitaliy Triang3l Kuzmin
e0f4b52559 aco: Add Primitive Ordered Pixel Shading waitcnt rules
When letting the overlapping waves enter their ordered sections, there must
be no memory accesses to resources which need primitive-ordered access that
are still pending, or there would be a race between the current wave and
the overlapping waves.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Vitaliy Triang3l Kuzmin <triang3l@yandex.ru>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22250>
2023-06-26 15:58:04 +00:00
Vitaliy Triang3l Kuzmin
a87628cd08 aco: Send MSG_ORDERED_PS_DONE where necessary
If the wave has set the Primitive Ordered Pixel Shading packer ID hardware
register, it must send MSG_ORDERED_PS_DONE once before the program ends.
It's also safe to send the message if the packer ID register hasn't been
set yet, therefore the message may be sent conservatively. For simplicity,
to ensure that it's sent on all execution paths after setting the packer ID
register, always sending it from a top-level block. This is required for
GFX9-10.3 POPS.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Vitaliy Triang3l Kuzmin <triang3l@yandex.ru>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22250>
2023-06-26 15:58:04 +00:00
Vitaliy Triang3l Kuzmin
f8e744f07f aco: Add Primitive Ordered Pixel Shading pseudo-instructions
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Vitaliy Triang3l Kuzmin <triang3l@yandex.ru>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22250>
2023-06-26 15:58:04 +00:00
Timur Kristóf
05928f4200 aco: Use ac_hw_stage instead of aco-specific HWStage.
The new ac_hw_stage is going to be used by drivers as well.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23597>
2023-06-23 12:49:04 +00:00
Rhys Perry
cfa7eec06c aco: don't set exec_hi for wave32 scan reductions
fossil-db (wave32):
Totals from 21 (0.02% of 133428) affected shaders:
Instrs: 10778 -> 10712 (-0.61%)
CodeSize: 56604 -> 56208 (-0.70%)
Latency: 168293 -> 168251 (-0.02%)
InvThroughput: 25256 -> 25253 (-0.01%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23745>
2023-06-21 17:58:44 +00:00
Georg Lehmann
68f7c53814 aco/gfx10+: use v_cndmask with literal for reduction identity
Totals from 10 (0.01% of 132657) affected shaders:
CodeSize: 171576 -> 171288 (-0.17%)
Instrs: 32127 -> 32055 (-0.22%)
Latency: 219145 -> 219027 (-0.05%)
InvThroughput: 130287 -> 130041 (-0.19%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23695>
2023-06-20 14:48:18 +00:00
Eric Engestrom
6b21653ab4 aco: reformat according to its .clang-format
Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23253>
2023-06-16 19:59:52 +00:00
Daniel Schürmann
f66f274304 aco: implement nir_intrinsic_load_resume_shader_address_amd
Similar to p_constaddr but targeting BBs.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22096>
2023-06-08 00:37:03 +00:00
Rhys Perry
35c133a77b aco: add MIMG_instruction::strict_wqm
This lets us use linear VGPRs for part of the texture sample's address.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22636>
2023-05-25 16:29:16 +00:00
Rhys Perry
1a6a57ac96 aco: let p_start_linear_vgpr take an operand
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22636>
2023-05-25 16:29:16 +00:00
Qiang Yu
3c59df7318 aco: get scratch addr from symbol for radeonsi
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22727>
2023-04-28 11:33:28 +08:00
Harri Nieminen
aea48a4ff1 amd: fix typos
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22432>
2023-04-13 23:08:22 +00:00
Timur Kristóf
8e9d269da6 aco: Don't use nir_selection_control in aco_ir.
We don't want to rely on any NIR structures in ACO, because
we would like to avoid the need to include nir.h in aco_ir.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22241>
2023-04-10 20:01:28 +00:00
Timur Kristóf
54da863956 aco: Consider p_cbranch_nz as divergent branch too.
A p_cbranch_nz instruction that reads exec is divergent too.

Fixes: f030b75b7d
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21493>
2023-04-03 14:36:07 +00:00
Georg Lehmann
8ee1519cee aco/to_hw_instr: use VOP1 opsel for v_mov_b16
Foz-DB GFX1100:
Totals from 4661 (3.46% of 134864) affected shaders:
CodeSize: 36500568 -> 36391704 (-0.30%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22069>
2023-03-30 03:34:34 +00:00
Daniel Schürmann
39c828cb9f aco: remove aco::rt_stack variable
Since we initialize scratch in the RT proglog,
there is no need for this variable anymore.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21780>
2023-03-16 01:40:30 +00:00
Daniel Schürmann
7d35bf24f6 aco: create hw_init_scratch() function for p_init_scratch lowering
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21780>
2023-03-16 01:40:30 +00:00
Daniel Schürmann
41ae2d0725 radv/rt: use terminate() when returning from raygen shaders
Q2RTX stats:
Totals from 7 (0.01% of 134913) affected shaders:

CodeSize: 204712 -> 204744 (+0.02%); split: -0.06%, +0.07%
Instrs: 37526 -> 37522 (-0.01%); split: -0.07%, +0.06%
Latency: 950563 -> 956024 (+0.57%)
InvThroughput: 187915 -> 188977 (+0.57%)
Copies: 4829 -> 4763 (-1.37%)
Branches: 1570 -> 1583 (+0.83%)
PreSGPRs: 407 -> 400 (-1.72%)
PreVGPRs: 614 -> 617 (+0.49%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21736>
2023-03-08 16:59:41 +00:00
Georg Lehmann
097a97cc42 aco: remove VOP[123C]P? structs
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21023>
2023-03-07 11:53:23 +00:00
Georg Lehmann
77afe7d960 aco: treat VINTERP_INREG as VALU
It's just v_fma with fixed DPP8 and builtin s_waitcnt_expcnt, so it can mostly
be handled as a pure VALU instruction.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21023>
2023-03-07 11:53:23 +00:00
Daniel Schürmann
b338d59047 radv: unconditionally enable scratch for RT shaders
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21159>
2023-02-16 19:37:25 +00:00
Rhys Perry
b4383821e7 aco: don't modify exec in p_interp_gfx11
The RDNA3 ISA docs say that lds_param_load write the entire quad
regardless of exec, so this isn't needed.

fossil-db (gfx1100):
Totals from 5291 (3.93% of 134574) affected shaders:
Instrs: 4891396 -> 4789628 (-2.08%)
CodeSize: 25519032 -> 25111960 (-1.60%)
Latency: 36122982 -> 36074300 (-0.13%); split: -0.14%, +0.00%
InvThroughput: 4162436 -> 4161424 (-0.02%); split: -0.02%, +0.00%
Copies: 263862 -> 263838 (-0.01%)
PreSGPRs: 225012 -> 224179 (-0.37%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21171>
2023-02-08 19:35:54 +00:00
Georg Lehmann
2b264455b5 aco: use s_pack_ll_b32_b16 for constant copies
Totals from 2 (0.00% of 134913) affected shaders:
CodeSize: 28636 -> 28628 (-0.03%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20970>
2023-02-01 17:07:25 +00:00
Georg Lehmann
9ee9b0859b aco: use s_bfm_64 for constant copies
Foz-DB Navi21:
Totals from 1025 (0.76% of 134913) affected shaders:
CodeSize: 1436752 -> 1432412 (-0.30%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20970>
2023-02-01 17:07:25 +00:00
Rhys Perry
c3dd1931d9 aco: allow Builder::Result to be dereferenced
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20251>
2023-01-10 16:01:38 +00:00
Rhys Perry
e386523380 aco/gfx11: fix discard early exit removal optimization
This optimization never happened because the NULL target was removed in
GFX11.

fossil-db (gfx1100):
Totals from 5439 (4.04% of 134574) affected shaders:
Instrs: 407865 -> 387123 (-5.09%)
CodeSize: 2163340 -> 2060644 (-4.75%)
Latency: 3432378 -> 3327802 (-3.05%)
InvThroughput: 270133 -> 262980 (-2.65%)
Branches: 8524 -> 3085 (-63.81%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20513>
2023-01-10 14:01:29 +00:00
Georg Lehmann
39b7502f04 aco: Use v_mov_b16 on GFX11.
Foz-DB GFX1100:
Totals from 4684 (3.47% of 134913) affected shaders:
CodeSize: 41086444 -> 41043476 (-0.10%)
Instrs: 8176019 -> 8175995 (-0.00%)
Latency: 83792071 -> 83792023 (-0.00%)
InvThroughput: 10311371 -> 10311369 (-0.00%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20369>
2023-01-03 22:49:46 +00:00
Rhys Perry
192486b7aa aco/gfx11: export mrtz in discard early exit for non-color shaders
If a shader doesn't export any color targets and instead only exports
mrtz, the discard early exit block should match.

Fixes artifacts on Lara in Rise of the Tomb Raider benchmark and hair in
The Witcher 3 (classic).

https://reviews.llvm.org/D128185

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: bc8da20dda ("aco: export MRT0 instead of NULL on GFX11")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20345>
2022-12-16 15:35:28 +00:00
Timur Kristóf
db5c3f170f aco: Emulate Wave64 bpermute on GFX11.
Similar to emit_gfx10_wave64_bpermute, but uses the new
v_permlane64_b32 instruction to swap data between wave halves.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>
2022-12-14 13:54:04 +00:00
Timur Kristóf
853e76f007 aco: Stylistic changes to emit_gfx10_wave64_bpermute.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>
2022-12-14 13:54:04 +00:00
Timur Kristóf
640e801651 aco: Split opcodes for GFX6 and GFX10 emulated bpermute.
Different sequences are emitted for these, so it makes sense to
have different opcodes too.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>
2022-12-14 13:54:04 +00:00
Bas Nieuwenhuizen
89663828ea aco: Don't use v_lshrrev_b64 for moves on GFX11.
Looking at VOPD things, shifts are not very likely to get dual issued
but plain moves are. Looking at RDNA2 v_lshrrev_b64 are half the perf
of v_mov_b32 (but you need twice as many moves), so on GFX11 this likely
reaches the threshold where moves are faster.

Totals from 68400 (50.70% of 134906) affected shaders:

CodeSize: 275489516 -> 275459536 (-0.01%); split: -0.01%, +0.00%
Instrs: 51775474 -> 51991286 (+0.42%)
Latency: 589884847 -> 589066439 (-0.14%); split: -0.15%, +0.01%
InvThroughput: 127154986 -> 126037619 (-0.88%); split: -0.88%, +0.00%
Copies: 3756157 -> 3976193 (+5.86%)
Branches: 1259604 -> 1260072 (+0.04%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19633>
2022-12-02 13:25:57 +00:00
Rhys Perry
9b6ab40b3b aco: improve do_pack_2x16() with zero constants
We can skip the v_or_b32 or use an instruction smaller than
v_alignbyte_b32.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19933>
2022-12-01 21:43:28 +00:00
Rhys Perry
ce5838599d aco/gfx11: use v_cvt_i32_i16/v_cvt_u32_u16
fossil-db (gfx1100):
Totals from 52753 (39.07% of 135032) affected shaders:
CodeSize: 153603860 -> 153163384 (-0.29%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19933>
2022-12-01 21:43:28 +00:00