Commit graph

2571 commits

Author SHA1 Message Date
Rhys Perry
c3dd1931d9 aco: allow Builder::Result to be dereferenced
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20251>
2023-01-10 16:01:38 +00:00
Rhys Perry
e386523380 aco/gfx11: fix discard early exit removal optimization
This optimization never happened because the NULL target was removed in
GFX11.

fossil-db (gfx1100):
Totals from 5439 (4.04% of 134574) affected shaders:
Instrs: 407865 -> 387123 (-5.09%)
CodeSize: 2163340 -> 2060644 (-4.75%)
Latency: 3432378 -> 3327802 (-3.05%)
InvThroughput: 270133 -> 262980 (-2.65%)
Branches: 8524 -> 3085 (-63.81%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20513>
2023-01-10 14:01:29 +00:00
Rhys Perry
810ced93f3 aco: align scratch size during assembly
This lets us use less scratch if both VGPR spilling and scratch intrinsics
are used.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20534>
2023-01-09 21:46:13 +00:00
Rhys Perry
c9846158cd aco/gfx11: reduce scratch allocation alignment
fossil-db (gfx1100):
Totals from 112 (0.08% of 134574) affected shaders:
Scratch: 1513472 -> 1455360 (-3.84%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20534>
2023-01-09 21:46:13 +00:00
Georg Lehmann
c241980751 aco: Mark more instructions as 16bit on GFX10.
p_cvt_f16_f32_rtne will be lowered to v_cvt_f16_f32 and we already know that
preserves the high bits.

I tested the others on GFX1036.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20574>
2023-01-09 18:54:35 +00:00
Rhys Perry
b64afc1d37 aco: use s_delay_alu skip field
fossil-db (gfx1100):
Totals from 130066 (96.65% of 134574) affected shaders:
Instrs: 80208817 -> 71420648 (-10.96%)
CodeSize: 403523036 -> 368370360 (-8.71%)
Latency: 658064779 -> 657935384 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 87698268 -> 87693326 (-0.01%); split: -0.01%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20512>
2023-01-09 18:22:59 +00:00
Rhys Perry
e2f083c0a7 aco: add more dependency instructions under waitcnt class
This makes these instructions free when considering pipeline statistics
and s_delay_alu insertion.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20512>
2023-01-09 18:22:59 +00:00
Rhys Perry
c8357136d4 aco: improve parse_delay_alu
Use gpr_map to determine how many cycles each dependency of the
s_delay_alu needs. This information helps the pass avoid further
s_delay_alu instructions.

fossil-db (gfx1100):
Totals from 13097 (9.73% of 134574) affected shaders:
Instrs: 30711894 -> 30702692 (-0.03%)
CodeSize: 153462500 -> 153425692 (-0.02%)
Latency: 372758612 -> 372741922 (-0.00%)
InvThroughput: 50164111 -> 50160717 (-0.01%); split: -0.01%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20512>
2023-01-09 18:22:59 +00:00
Rhys Perry
9e55b3b790 aco/gfx11: update s_code_end padding
Match ac_rtld_open().

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Cc: 22.3 <mesa-stable>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20536>
2023-01-06 16:09:51 +00:00
Georg Lehmann
39b7502f04 aco: Use v_mov_b16 on GFX11.
Foz-DB GFX1100:
Totals from 4684 (3.47% of 134913) affected shaders:
CodeSize: 41086444 -> 41043476 (-0.10%)
Instrs: 8176019 -> 8175995 (-0.00%)
Latency: 83792071 -> 83792023 (-0.00%)
InvThroughput: 10311371 -> 10311369 (-0.00%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20369>
2023-01-03 22:49:46 +00:00
Daniel Schürmann
83b31b11a5 aco: Reassign dead definitions of p_split_vector to associated register
Any unused split_vector definition can always use the same register
as the operand. This avoids creating unnecessary copies.

Fossil DB stats on Rembrandt (RDNA2):
Totals from 3904 (2.89% of 134906) affected shaders:
CodeSize: 18326692 -> 18271688 (-0.30%)
Instrs: 3386632 -> 3372888 (-0.41%)
Latency: 42337481 -> 42330085 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 6566731 -> 6566424 (-0.00%); split: -0.01%, +0.00%
Copies: 224301 -> 210559 (-6.13%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>
2023-01-01 15:04:07 +01:00
Timur Kristóf
75b1027722 aco: Try to reassign split vector registers post-RA.
Eliminate unnecessary copies when the operand registers of a
p_split_vector instruction are not clobbered between the p_split_vector
and the user of its definitions.

This happens when p_split_vector doesn't kill its operand and its
definitions have a shorter lifespan that the operand. It affects every
NGG culling shader among other things.

This optimization exists because it's too difficult to solve it
in RA, and should be removed after we solved this in RA.

v2 by Daniel Schürmann:
- Rearrange and simplify conditions for the new optimization
- Fix a few bugs

v3 by Daniel Schürmann:
- Check number of encoded ALU operands

Fossil DB stats on Rembrandt (RDNA2):
Totals from 64896 (48.10% of 134906) affected shaders:
CodeSize: 175693348 -> 175434944 (-0.15%)
Instrs: 33333912 -> 33269388 (-0.19%)
Latency: 183766084 -> 183763432 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 28589651 -> 28589340 (-0.00%); split: -0.00%, +0.00%
Copies: 2806550 -> 2742038 (-2.30%)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>
2023-01-01 15:04:07 +01:00
Timur Kristóf
3d29779a25 aco/optimizer_postRA: Distinguish overwritten untrackable and subdword.
This allows is_overwritten_since to return false when the last
writer instruction of a register can't be tracked but we know it wasn't
written in the current block.

Fossil DB stats on Rembrandt (RDNA2):
Totals from 1163 (0.86% of 134906) affected shaders:
CodeSize: 9815920 -> 9805016 (-0.11%)
Instrs: 1843688 -> 1840962 (-0.15%)
Latency: 19219153 -> 19209171 (-0.05%)
InvThroughput: 3354375 -> 3353852 (-0.02%)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>
2023-01-01 15:04:07 +01:00
Daniel Schürmann
d3b0f78110 aco/optimizer_postRA: Initialize loop header with preheader information
This works because of SSA and should be safer than just setting 'not_written_yet'.

No Fossil DB changes on Rembrandt (RDNA2).

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>
2023-01-01 15:03:57 +01:00
Daniel Schürmann
8f4eccb138 aco: fix reset_block_regs() in postRA-optimizer
Accidentally, we picked the index of the predecessors instead of the predecessors.

Totals from 8496 (6.30% of 134913) affected shaders: (GFX10.3)
CodeSize: 64070724 -> 64022516 (-0.08%); split: -0.08%, +0.00%
Instrs: 11932750 -> 11920698 (-0.10%); split: -0.10%, +0.00%
Latency: 144040266 -> 144017062 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 29327735 -> 29326421 (-0.00%); split: -0.00%, +0.00%

Fossil DB stats on Rembrandt (RDNA2):
Totals from 4488 (3.33% of 134906) affected shaders:
CodeSize: 42759736 -> 42735392 (-0.06%); split: -0.06%, +0.00%
Instrs: 7960522 -> 7954436 (-0.08%); split: -0.08%, +0.00%
Latency: 96192647 -> 96172571 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 19313576 -> 19312575 (-0.01%); split: -0.01%, +0.00%

Fixes: 75967a4814 ('aco/optimizer_postRA: Speed up reset_block() with predecessors.')
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>
2023-01-01 15:03:51 +01:00
Rhys Perry
98e83f19f9 aco/gfx11: implement load_input_vertex
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20341>
2022-12-16 17:45:34 +00:00
Rhys Perry
192486b7aa aco/gfx11: export mrtz in discard early exit for non-color shaders
If a shader doesn't export any color targets and instead only exports
mrtz, the discard early exit block should match.

Fixes artifacts on Lara in Rise of the Tomb Raider benchmark and hair in
The Witcher 3 (classic).

https://reviews.llvm.org/D128185

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: bc8da20dda ("aco: export MRT0 instead of NULL on GFX11")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20345>
2022-12-16 15:35:28 +00:00
Timur Kristóf
db5c3f170f aco: Emulate Wave64 bpermute on GFX11.
Similar to emit_gfx10_wave64_bpermute, but uses the new
v_permlane64_b32 instruction to swap data between wave halves.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>
2022-12-14 13:54:04 +00:00
Timur Kristóf
853e76f007 aco: Stylistic changes to emit_gfx10_wave64_bpermute.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>
2022-12-14 13:54:04 +00:00
Timur Kristóf
640e801651 aco: Split opcodes for GFX6 and GFX10 emulated bpermute.
Different sequences are emitted for these, so it makes sense to
have different opcodes too.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>
2022-12-14 13:54:04 +00:00
Timur Kristóf
614348f28b aco: Don't accept constants on p_bpermute.
The sequence emitted for this pseudo instruction is not ready
to handle constants or literals at all.

Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>
2022-12-14 13:54:04 +00:00
Ian Romanick
eb76cee9f8 nir: Eliminate nir_op_i2b
There are a lot of optimizations in opt_algebraic that match ('ine', a,
0), but there are almost none that match i2b.  Instead of adding a huge
pile of additional patterns (including variations that include both ine
and i2b), always lower i2b to a != 0.

At this point in the series, it should be impossible for anything to
generate i2b, so there /should not/ be any changes.

The failing test on d3d12 is a pre-existing bug that is triggered by
this change.  I talked to Jesse about it, and, after some analysis, he
suggested just adding it to the list of known failures.

v2: Don't rematerialize i2b instructions in dxil_nir_lower_x2b.

v3: Don't rematerialize i2b instructions in zink_nir_algebraic.py.

v4: Fix zink-on-TGL CI failures by calling nir_opt_algebraic after
nir_lower_doubles makes progress.  The latter can generate b2i
instructions, but nir_lower_int64 can't handle them (anymore).

v5: Add back most of the hunk at line 2125 of nir_opt_algebraic.py. I
had accidentally removed the f2b(bf2(x)) optimization.

v6: Just eliminate the i2b instruction.

v7: Remove missed i2b32 in midgard_compile.c. Remove (now unused)
emit_alu_i2orf2_b1 function from sfn_instr_alu.cpp. Previously this
function was still used. 🤷

No shader-db changes on any Intel platform.

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141165875 -> 141165873 (-0.0%)
Instructions helped: 2

Cycles in all programs: 9098956382 -> 9098956350 (-0.0%)
Cycles helped: 2

The two Vulkan shaders are helped because of the "new" (('b2i32',
('ine', ('ubfe', a, b, 1), 0)), ('ubfe', a, b, 1)) algebraic pattern.

Acked-by: Jesse Natalie <jenatali@microsoft.com> [earlier version]
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev> [earlier version]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:21 +00:00
Marek Olšák
716ac4a55d nir: replace IS_SWIZZLED flag with ACCESS_IS_SWIZZLED_AMD
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19422>
2022-12-13 20:33:05 +00:00
Marek Olšák
7998c3bdd3 nir: remove redundant SLC_AMD in favor of ACCESS_STREAM_CACHE_POLICY
ACCESS_STREAM_CACHE_POLICY was added to map to SLC for AMD.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19422>
2022-12-13 20:33:05 +00:00
Rhys Perry
20e670d060 aco/ra: don't swap create_vector operand with definition blocker for SGPRs
There is no SGPR swap instruction, we always need 3 XORs.

fossil-db (navi21):
Totals from 76 (0.06% of 135636) affected shaders:
Instrs: 58400 -> 58347 (-0.09%); split: -0.10%, +0.01%
CodeSize: 312580 -> 312368 (-0.07%); split: -0.08%, +0.01%
Latency: 843333 -> 843180 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 126431 -> 126412 (-0.02%)
Copies: 4008 -> 3955 (-1.32%); split: -1.47%, +0.15%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20240>
2022-12-09 15:58:43 +00:00
Rhys Perry
a05dd58309 aco/ra: don't swap p_create_vector operand with definition blocker for scc
SCC is 1-bit, and we can't copy a 32-bit value into it.

Fixes dEQP-VK.spirv_assembly.type.scalar.i32.iequal_tesse with
ACO_DEBUG=noopt.

No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 9476986e6f ("aco/ra: special-case get_reg_for_create_vector_copy()")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20240>
2022-12-09 15:58:43 +00:00
Samuel Pitoiset
011a0b97b2 radv,aco: move radv_ps_epilog_key to the graphics pipeline key
To avoid redundant structs.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20199>
2022-12-08 13:28:00 +00:00
Samuel Pitoiset
b7f49de625 radv,aco: use 8-bit for color_is_int{8,10} everywhere
Do not need 32-bits because there is only up to 8 MRTs.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20199>
2022-12-08 13:28:00 +00:00
Samuel Pitoiset
9079bd821c radv,aco: rename color output related fields for consistency
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20199>
2022-12-08 13:28:00 +00:00
Tatsuyuki Ishi
327c906424 aco: Migrate RA to use std::optional
The use of std::optional simplifies expressions and would be useful for some
upcoming RA tweaks.

C++17 has been available since the merge of rusticl and should be safe to use as
far as packaging is concerned.

A few style choices are:
- Testing for emptiness uses implicit bool conversion.
- Constructing an empty value uses {}.
- Constructing a filled value uses the implicit conversion constructor.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20125>
2022-12-08 12:08:01 +00:00
Bas Nieuwenhuizen
513442dc32 aco: Add s_delay_alu support for GFX11+
Roughly copied from LLVM. This facilitates better ALU usage by
switching between waves when there is an ALU stall, which isn't
automatic anymore on GFX11.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19743>
2022-12-07 22:05:25 +00:00
Bas Nieuwenhuizen
cd3bf56ace aco: Add helper to get cycle info for an instruction.
For use in s_delay_alu tracking

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19743>
2022-12-07 22:05:25 +00:00
Bas Nieuwenhuizen
352e492c7b aco: Add isTrans helper.
For the s_delay_alu tracking.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19743>
2022-12-07 22:05:25 +00:00
Rhys Perry
bd30adf89d aco: apply NUW to additions for scratch access
fossil-db (navi21):
Totals from 52 (0.04% of 135636) affected shaders:
Instrs: 79036 -> 78567 (-0.59%)
CodeSize: 431188 -> 427984 (-0.74%)
Latency: 1318142 -> 1313821 (-0.33%)
InvThroughput: 293842 -> 292836 (-0.34%)
VClause: 2555 -> 2361 (-7.59%); split: -8.06%, +0.47%
Copies: 8746 -> 8767 (+0.24%); split: -0.11%, +0.35%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20117>
2022-12-06 15:23:38 +00:00
Rhys Perry
381de3c809 aco: more carefully apply constant offsets into scratch accesses
Death stranding does scratch_arr[80-idx]. This doesn't seem to work if we
try to combine the subtraction into the access.

fossil-db (navi21):
Totals from 52 (0.04% of 135636) affected shaders:
Instrs: 78560 -> 79036 (+0.61%)
CodeSize: 427940 -> 431188 (+0.76%)
Latency: 1313809 -> 1318142 (+0.33%)
InvThroughput: 292833 -> 293842 (+0.34%)
VClause: 2361 -> 2555 (+8.22%); split: -0.51%, +8.73%
Copies: 8767 -> 8746 (-0.24%); split: -0.35%, +0.11%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 0e783d687a ("aco: use scratch_* for scratch load/store on GFX9+")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7735
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20117>
2022-12-06 15:23:38 +00:00
Samuel Pitoiset
da32cbb5c6 aco: fix missing uses of MRT output flags
Fixes regressions on GFX6 and the RAGE2 workaround.

Fixes: a297ac10a4 ("radv,aco: stop lowering FS outputs in NIR")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20154>
2022-12-05 15:01:19 +00:00
Samuel Pitoiset
a297ac10a4 radv,aco: stop lowering FS outputs in NIR
This was a bad idea because:
- it diverges too much with the fragment shader epilog
- it doesn't allow to implement alpha-to-coverage via MRTZ correctly
- it was supposed to be used by LLVM but this never happened

Reverting this back allows us to fix alpha-to-coverage via MRTZ
on GFX11 easily, including for fragment shader epilogs.

fossils-db (NAVI21):
Totals from 20411 (15.13% of 134913) affected shaders:
VGPRs: 972056 -> 971400 (-0.07%); split: -0.08%, +0.01%
CodeSize: 92284804 -> 92295392 (+0.01%); split: -0.05%, +0.06%
MaxWaves: 465010 -> 465166 (+0.03%); split: +0.03%, -0.00%
Instrs: 17034162 -> 17034963 (+0.00%); split: -0.00%, +0.01%
Latency: 252013190 -> 251971764 (-0.02%); split: -0.03%, +0.02%
InvThroughput: 45859625 -> 45842556 (-0.04%); split: -0.04%, +0.01%
VClause: 324627 -> 324629 (+0.00%); split: -0.03%, +0.03%
SClause: 672918 -> 672826 (-0.01%); split: -0.05%, +0.04%
Copies: 1172126 -> 1158152 (-1.19%); split: -1.20%, +0.01%
Branches: 420602 -> 420604 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 1025441 -> 1025481 (+0.00%)
PreVGPRs: 861787 -> 860650 (-0.13%); split: -0.17%, +0.03%

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20126>
2022-12-05 08:22:28 +00:00
Samuel Pitoiset
3be728f1d0 aco: fix indexing MRT0 alpha channel for alpha-to-coverage via MRTZ on GFX11
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20126>
2022-12-05 08:22:28 +00:00
Samuel Pitoiset
20856bfe0f aco: always use 32-bit for exporting alpha-to-coverage via MRTZ on GFX11
16-bit isn't possible. Note that this is currently style broken for
compressed formats because the w channel is never written to.

Ported from RadeonSI ('radeonsi/gfx11: fix alpha-to-coverage with
stencil or samplemask export')

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20126>
2022-12-05 08:22:28 +00:00
Bas Nieuwenhuizen
89663828ea aco: Don't use v_lshrrev_b64 for moves on GFX11.
Looking at VOPD things, shifts are not very likely to get dual issued
but plain moves are. Looking at RDNA2 v_lshrrev_b64 are half the perf
of v_mov_b32 (but you need twice as many moves), so on GFX11 this likely
reaches the threshold where moves are faster.

Totals from 68400 (50.70% of 134906) affected shaders:

CodeSize: 275489516 -> 275459536 (-0.01%); split: -0.01%, +0.00%
Instrs: 51775474 -> 51991286 (+0.42%)
Latency: 589884847 -> 589066439 (-0.14%); split: -0.15%, +0.01%
InvThroughput: 127154986 -> 126037619 (-0.88%); split: -0.88%, +0.00%
Copies: 3756157 -> 3976193 (+5.86%)
Branches: 1259604 -> 1260072 (+0.04%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19633>
2022-12-02 13:25:57 +00:00
Bas Nieuwenhuizen
91fe2a2361 aco: Use more detailed wave64 timing for GFX10+.
Also nabbed some dual issue stuff for GFX11 from LLVM.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19633>
2022-12-02 13:25:57 +00:00
Rhys Perry
9b6ab40b3b aco: improve do_pack_2x16() with zero constants
We can skip the v_or_b32 or use an instruction smaller than
v_alignbyte_b32.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19933>
2022-12-01 21:43:28 +00:00
Rhys Perry
917cfd587c aco: use v_minmax/v_maxmin opcodes
fossil-db (gfx1100):
Totals from 29868 (22.12% of 135032) affected shaders:
MaxWaves: 741336 -> 741344 (+0.00%)
Instrs: 34624902 -> 34539766 (-0.25%); split: -0.25%, +0.00%
CodeSize: 187196804 -> 187192100 (-0.00%); split: -0.01%, +0.01%
VGPRs: 1816860 -> 1816788 (-0.00%); split: -0.01%, +0.01%
Latency: 502597202 -> 502245627 (-0.07%); split: -0.08%, +0.01%
InvThroughput: 84813176 -> 84586122 (-0.27%); split: -0.28%, +0.01%
VClause: 633826 -> 633749 (-0.01%); split: -0.02%, +0.01%
SClause: 1317738 -> 1317047 (-0.05%); split: -0.06%, +0.01%
Copies: 2130610 -> 2130954 (+0.02%); split: -0.03%, +0.05%
Branches: 766093 -> 765969 (-0.02%); split: -0.02%, +0.00%
PreSGPRs: 1630250 -> 1630034 (-0.01%); split: -0.02%, +0.00%
PreVGPRs: 1590777 -> 1590664 (-0.01%); split: -0.01%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19933>
2022-12-01 21:43:28 +00:00
Rhys Perry
dfbc8e0192 aco: change order in combine_minmax()
Prepare for future optimizations.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19933>
2022-12-01 21:43:28 +00:00
Rhys Perry
ce5838599d aco/gfx11: use v_cvt_i32_i16/v_cvt_u32_u16
fossil-db (gfx1100):
Totals from 52753 (39.07% of 135032) affected shaders:
CodeSize: 153603860 -> 153163384 (-0.29%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19933>
2022-12-01 21:43:28 +00:00
Georg Lehmann
a3beb82cf6 aco: Use wave size specific opcode for s_or in cube map coord code.
Cc: mesa-stable

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20041>
2022-12-01 01:39:27 +00:00
Georg Lehmann
22be0d09a0 aco: Don't prematurely emit s_andn2.
Split s_not + s_and allows more inverse comparision and s_cbranch_vccz
optimizations.

Foz-DB Navi21:
Totals from 516 (0.38% of 134913) affected shaders:
CodeSize: 7273724 -> 7273720 (-0.00%)
Instrs: 1364408 -> 1364407 (-0.00%)
Latency: 14604862 -> 14604858 (-0.00%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19143>
2022-11-30 18:25:15 +00:00
Rhys Perry
0cb48ec3b7 radv,aco: remove old streamout code
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18898>
2022-11-29 14:28:11 +00:00
Rhys Perry
3a96977542 radv,aco: remove old GS copy shader code
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18898>
2022-11-29 14:28:11 +00:00
Rhys Perry
17bd2721e6 radv,aco: implement GS copy shaders using NIR
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18898>
2022-11-29 14:28:11 +00:00