Commit graph

2261 commits

Author SHA1 Message Date
Daniel Schürmann
83b31b11a5 aco: Reassign dead definitions of p_split_vector to associated register
Any unused split_vector definition can always use the same register
as the operand. This avoids creating unnecessary copies.

Fossil DB stats on Rembrandt (RDNA2):
Totals from 3904 (2.89% of 134906) affected shaders:
CodeSize: 18326692 -> 18271688 (-0.30%)
Instrs: 3386632 -> 3372888 (-0.41%)
Latency: 42337481 -> 42330085 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 6566731 -> 6566424 (-0.00%); split: -0.01%, +0.00%
Copies: 224301 -> 210559 (-6.13%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>
2023-01-01 15:04:07 +01:00
Timur Kristóf
75b1027722 aco: Try to reassign split vector registers post-RA.
Eliminate unnecessary copies when the operand registers of a
p_split_vector instruction are not clobbered between the p_split_vector
and the user of its definitions.

This happens when p_split_vector doesn't kill its operand and its
definitions have a shorter lifespan that the operand. It affects every
NGG culling shader among other things.

This optimization exists because it's too difficult to solve it
in RA, and should be removed after we solved this in RA.

v2 by Daniel Schürmann:
- Rearrange and simplify conditions for the new optimization
- Fix a few bugs

v3 by Daniel Schürmann:
- Check number of encoded ALU operands

Fossil DB stats on Rembrandt (RDNA2):
Totals from 64896 (48.10% of 134906) affected shaders:
CodeSize: 175693348 -> 175434944 (-0.15%)
Instrs: 33333912 -> 33269388 (-0.19%)
Latency: 183766084 -> 183763432 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 28589651 -> 28589340 (-0.00%); split: -0.00%, +0.00%
Copies: 2806550 -> 2742038 (-2.30%)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>
2023-01-01 15:04:07 +01:00
Timur Kristóf
3d29779a25 aco/optimizer_postRA: Distinguish overwritten untrackable and subdword.
This allows is_overwritten_since to return false when the last
writer instruction of a register can't be tracked but we know it wasn't
written in the current block.

Fossil DB stats on Rembrandt (RDNA2):
Totals from 1163 (0.86% of 134906) affected shaders:
CodeSize: 9815920 -> 9805016 (-0.11%)
Instrs: 1843688 -> 1840962 (-0.15%)
Latency: 19219153 -> 19209171 (-0.05%)
InvThroughput: 3354375 -> 3353852 (-0.02%)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>
2023-01-01 15:04:07 +01:00
Daniel Schürmann
d3b0f78110 aco/optimizer_postRA: Initialize loop header with preheader information
This works because of SSA and should be safer than just setting 'not_written_yet'.

No Fossil DB changes on Rembrandt (RDNA2).

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>
2023-01-01 15:03:57 +01:00
Daniel Schürmann
8f4eccb138 aco: fix reset_block_regs() in postRA-optimizer
Accidentally, we picked the index of the predecessors instead of the predecessors.

Totals from 8496 (6.30% of 134913) affected shaders: (GFX10.3)
CodeSize: 64070724 -> 64022516 (-0.08%); split: -0.08%, +0.00%
Instrs: 11932750 -> 11920698 (-0.10%); split: -0.10%, +0.00%
Latency: 144040266 -> 144017062 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 29327735 -> 29326421 (-0.00%); split: -0.00%, +0.00%

Fossil DB stats on Rembrandt (RDNA2):
Totals from 4488 (3.33% of 134906) affected shaders:
CodeSize: 42759736 -> 42735392 (-0.06%); split: -0.06%, +0.00%
Instrs: 7960522 -> 7954436 (-0.08%); split: -0.08%, +0.00%
Latency: 96192647 -> 96172571 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 19313576 -> 19312575 (-0.01%); split: -0.01%, +0.00%

Fixes: 75967a4814 ('aco/optimizer_postRA: Speed up reset_block() with predecessors.')
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>
2023-01-01 15:03:51 +01:00
Rhys Perry
98e83f19f9 aco/gfx11: implement load_input_vertex
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20341>
2022-12-16 17:45:34 +00:00
Rhys Perry
192486b7aa aco/gfx11: export mrtz in discard early exit for non-color shaders
If a shader doesn't export any color targets and instead only exports
mrtz, the discard early exit block should match.

Fixes artifacts on Lara in Rise of the Tomb Raider benchmark and hair in
The Witcher 3 (classic).

https://reviews.llvm.org/D128185

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: bc8da20dda ("aco: export MRT0 instead of NULL on GFX11")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20345>
2022-12-16 15:35:28 +00:00
Timur Kristóf
db5c3f170f aco: Emulate Wave64 bpermute on GFX11.
Similar to emit_gfx10_wave64_bpermute, but uses the new
v_permlane64_b32 instruction to swap data between wave halves.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>
2022-12-14 13:54:04 +00:00
Timur Kristóf
853e76f007 aco: Stylistic changes to emit_gfx10_wave64_bpermute.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>
2022-12-14 13:54:04 +00:00
Timur Kristóf
640e801651 aco: Split opcodes for GFX6 and GFX10 emulated bpermute.
Different sequences are emitted for these, so it makes sense to
have different opcodes too.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>
2022-12-14 13:54:04 +00:00
Timur Kristóf
614348f28b aco: Don't accept constants on p_bpermute.
The sequence emitted for this pseudo instruction is not ready
to handle constants or literals at all.

Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>
2022-12-14 13:54:04 +00:00
Ian Romanick
eb76cee9f8 nir: Eliminate nir_op_i2b
There are a lot of optimizations in opt_algebraic that match ('ine', a,
0), but there are almost none that match i2b.  Instead of adding a huge
pile of additional patterns (including variations that include both ine
and i2b), always lower i2b to a != 0.

At this point in the series, it should be impossible for anything to
generate i2b, so there /should not/ be any changes.

The failing test on d3d12 is a pre-existing bug that is triggered by
this change.  I talked to Jesse about it, and, after some analysis, he
suggested just adding it to the list of known failures.

v2: Don't rematerialize i2b instructions in dxil_nir_lower_x2b.

v3: Don't rematerialize i2b instructions in zink_nir_algebraic.py.

v4: Fix zink-on-TGL CI failures by calling nir_opt_algebraic after
nir_lower_doubles makes progress.  The latter can generate b2i
instructions, but nir_lower_int64 can't handle them (anymore).

v5: Add back most of the hunk at line 2125 of nir_opt_algebraic.py. I
had accidentally removed the f2b(bf2(x)) optimization.

v6: Just eliminate the i2b instruction.

v7: Remove missed i2b32 in midgard_compile.c. Remove (now unused)
emit_alu_i2orf2_b1 function from sfn_instr_alu.cpp. Previously this
function was still used. 🤷

No shader-db changes on any Intel platform.

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141165875 -> 141165873 (-0.0%)
Instructions helped: 2

Cycles in all programs: 9098956382 -> 9098956350 (-0.0%)
Cycles helped: 2

The two Vulkan shaders are helped because of the "new" (('b2i32',
('ine', ('ubfe', a, b, 1), 0)), ('ubfe', a, b, 1)) algebraic pattern.

Acked-by: Jesse Natalie <jenatali@microsoft.com> [earlier version]
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev> [earlier version]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:21 +00:00
Marek Olšák
716ac4a55d nir: replace IS_SWIZZLED flag with ACCESS_IS_SWIZZLED_AMD
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19422>
2022-12-13 20:33:05 +00:00
Marek Olšák
7998c3bdd3 nir: remove redundant SLC_AMD in favor of ACCESS_STREAM_CACHE_POLICY
ACCESS_STREAM_CACHE_POLICY was added to map to SLC for AMD.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19422>
2022-12-13 20:33:05 +00:00
Rhys Perry
20e670d060 aco/ra: don't swap create_vector operand with definition blocker for SGPRs
There is no SGPR swap instruction, we always need 3 XORs.

fossil-db (navi21):
Totals from 76 (0.06% of 135636) affected shaders:
Instrs: 58400 -> 58347 (-0.09%); split: -0.10%, +0.01%
CodeSize: 312580 -> 312368 (-0.07%); split: -0.08%, +0.01%
Latency: 843333 -> 843180 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 126431 -> 126412 (-0.02%)
Copies: 4008 -> 3955 (-1.32%); split: -1.47%, +0.15%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20240>
2022-12-09 15:58:43 +00:00
Rhys Perry
a05dd58309 aco/ra: don't swap p_create_vector operand with definition blocker for scc
SCC is 1-bit, and we can't copy a 32-bit value into it.

Fixes dEQP-VK.spirv_assembly.type.scalar.i32.iequal_tesse with
ACO_DEBUG=noopt.

No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 9476986e6f ("aco/ra: special-case get_reg_for_create_vector_copy()")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20240>
2022-12-09 15:58:43 +00:00
Samuel Pitoiset
011a0b97b2 radv,aco: move radv_ps_epilog_key to the graphics pipeline key
To avoid redundant structs.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20199>
2022-12-08 13:28:00 +00:00
Samuel Pitoiset
b7f49de625 radv,aco: use 8-bit for color_is_int{8,10} everywhere
Do not need 32-bits because there is only up to 8 MRTs.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20199>
2022-12-08 13:28:00 +00:00
Samuel Pitoiset
9079bd821c radv,aco: rename color output related fields for consistency
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20199>
2022-12-08 13:28:00 +00:00
Tatsuyuki Ishi
327c906424 aco: Migrate RA to use std::optional
The use of std::optional simplifies expressions and would be useful for some
upcoming RA tweaks.

C++17 has been available since the merge of rusticl and should be safe to use as
far as packaging is concerned.

A few style choices are:
- Testing for emptiness uses implicit bool conversion.
- Constructing an empty value uses {}.
- Constructing a filled value uses the implicit conversion constructor.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20125>
2022-12-08 12:08:01 +00:00
Bas Nieuwenhuizen
513442dc32 aco: Add s_delay_alu support for GFX11+
Roughly copied from LLVM. This facilitates better ALU usage by
switching between waves when there is an ALU stall, which isn't
automatic anymore on GFX11.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19743>
2022-12-07 22:05:25 +00:00
Bas Nieuwenhuizen
cd3bf56ace aco: Add helper to get cycle info for an instruction.
For use in s_delay_alu tracking

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19743>
2022-12-07 22:05:25 +00:00
Bas Nieuwenhuizen
352e492c7b aco: Add isTrans helper.
For the s_delay_alu tracking.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19743>
2022-12-07 22:05:25 +00:00
Rhys Perry
bd30adf89d aco: apply NUW to additions for scratch access
fossil-db (navi21):
Totals from 52 (0.04% of 135636) affected shaders:
Instrs: 79036 -> 78567 (-0.59%)
CodeSize: 431188 -> 427984 (-0.74%)
Latency: 1318142 -> 1313821 (-0.33%)
InvThroughput: 293842 -> 292836 (-0.34%)
VClause: 2555 -> 2361 (-7.59%); split: -8.06%, +0.47%
Copies: 8746 -> 8767 (+0.24%); split: -0.11%, +0.35%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20117>
2022-12-06 15:23:38 +00:00
Rhys Perry
381de3c809 aco: more carefully apply constant offsets into scratch accesses
Death stranding does scratch_arr[80-idx]. This doesn't seem to work if we
try to combine the subtraction into the access.

fossil-db (navi21):
Totals from 52 (0.04% of 135636) affected shaders:
Instrs: 78560 -> 79036 (+0.61%)
CodeSize: 427940 -> 431188 (+0.76%)
Latency: 1313809 -> 1318142 (+0.33%)
InvThroughput: 292833 -> 293842 (+0.34%)
VClause: 2361 -> 2555 (+8.22%); split: -0.51%, +8.73%
Copies: 8767 -> 8746 (-0.24%); split: -0.35%, +0.11%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 0e783d687a ("aco: use scratch_* for scratch load/store on GFX9+")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7735
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20117>
2022-12-06 15:23:38 +00:00
Samuel Pitoiset
da32cbb5c6 aco: fix missing uses of MRT output flags
Fixes regressions on GFX6 and the RAGE2 workaround.

Fixes: a297ac10a4 ("radv,aco: stop lowering FS outputs in NIR")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20154>
2022-12-05 15:01:19 +00:00
Samuel Pitoiset
a297ac10a4 radv,aco: stop lowering FS outputs in NIR
This was a bad idea because:
- it diverges too much with the fragment shader epilog
- it doesn't allow to implement alpha-to-coverage via MRTZ correctly
- it was supposed to be used by LLVM but this never happened

Reverting this back allows us to fix alpha-to-coverage via MRTZ
on GFX11 easily, including for fragment shader epilogs.

fossils-db (NAVI21):
Totals from 20411 (15.13% of 134913) affected shaders:
VGPRs: 972056 -> 971400 (-0.07%); split: -0.08%, +0.01%
CodeSize: 92284804 -> 92295392 (+0.01%); split: -0.05%, +0.06%
MaxWaves: 465010 -> 465166 (+0.03%); split: +0.03%, -0.00%
Instrs: 17034162 -> 17034963 (+0.00%); split: -0.00%, +0.01%
Latency: 252013190 -> 251971764 (-0.02%); split: -0.03%, +0.02%
InvThroughput: 45859625 -> 45842556 (-0.04%); split: -0.04%, +0.01%
VClause: 324627 -> 324629 (+0.00%); split: -0.03%, +0.03%
SClause: 672918 -> 672826 (-0.01%); split: -0.05%, +0.04%
Copies: 1172126 -> 1158152 (-1.19%); split: -1.20%, +0.01%
Branches: 420602 -> 420604 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 1025441 -> 1025481 (+0.00%)
PreVGPRs: 861787 -> 860650 (-0.13%); split: -0.17%, +0.03%

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20126>
2022-12-05 08:22:28 +00:00
Samuel Pitoiset
3be728f1d0 aco: fix indexing MRT0 alpha channel for alpha-to-coverage via MRTZ on GFX11
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20126>
2022-12-05 08:22:28 +00:00
Samuel Pitoiset
20856bfe0f aco: always use 32-bit for exporting alpha-to-coverage via MRTZ on GFX11
16-bit isn't possible. Note that this is currently style broken for
compressed formats because the w channel is never written to.

Ported from RadeonSI ('radeonsi/gfx11: fix alpha-to-coverage with
stencil or samplemask export')

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20126>
2022-12-05 08:22:28 +00:00
Bas Nieuwenhuizen
89663828ea aco: Don't use v_lshrrev_b64 for moves on GFX11.
Looking at VOPD things, shifts are not very likely to get dual issued
but plain moves are. Looking at RDNA2 v_lshrrev_b64 are half the perf
of v_mov_b32 (but you need twice as many moves), so on GFX11 this likely
reaches the threshold where moves are faster.

Totals from 68400 (50.70% of 134906) affected shaders:

CodeSize: 275489516 -> 275459536 (-0.01%); split: -0.01%, +0.00%
Instrs: 51775474 -> 51991286 (+0.42%)
Latency: 589884847 -> 589066439 (-0.14%); split: -0.15%, +0.01%
InvThroughput: 127154986 -> 126037619 (-0.88%); split: -0.88%, +0.00%
Copies: 3756157 -> 3976193 (+5.86%)
Branches: 1259604 -> 1260072 (+0.04%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19633>
2022-12-02 13:25:57 +00:00
Bas Nieuwenhuizen
91fe2a2361 aco: Use more detailed wave64 timing for GFX10+.
Also nabbed some dual issue stuff for GFX11 from LLVM.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19633>
2022-12-02 13:25:57 +00:00
Rhys Perry
9b6ab40b3b aco: improve do_pack_2x16() with zero constants
We can skip the v_or_b32 or use an instruction smaller than
v_alignbyte_b32.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19933>
2022-12-01 21:43:28 +00:00
Rhys Perry
917cfd587c aco: use v_minmax/v_maxmin opcodes
fossil-db (gfx1100):
Totals from 29868 (22.12% of 135032) affected shaders:
MaxWaves: 741336 -> 741344 (+0.00%)
Instrs: 34624902 -> 34539766 (-0.25%); split: -0.25%, +0.00%
CodeSize: 187196804 -> 187192100 (-0.00%); split: -0.01%, +0.01%
VGPRs: 1816860 -> 1816788 (-0.00%); split: -0.01%, +0.01%
Latency: 502597202 -> 502245627 (-0.07%); split: -0.08%, +0.01%
InvThroughput: 84813176 -> 84586122 (-0.27%); split: -0.28%, +0.01%
VClause: 633826 -> 633749 (-0.01%); split: -0.02%, +0.01%
SClause: 1317738 -> 1317047 (-0.05%); split: -0.06%, +0.01%
Copies: 2130610 -> 2130954 (+0.02%); split: -0.03%, +0.05%
Branches: 766093 -> 765969 (-0.02%); split: -0.02%, +0.00%
PreSGPRs: 1630250 -> 1630034 (-0.01%); split: -0.02%, +0.00%
PreVGPRs: 1590777 -> 1590664 (-0.01%); split: -0.01%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19933>
2022-12-01 21:43:28 +00:00
Rhys Perry
dfbc8e0192 aco: change order in combine_minmax()
Prepare for future optimizations.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19933>
2022-12-01 21:43:28 +00:00
Rhys Perry
ce5838599d aco/gfx11: use v_cvt_i32_i16/v_cvt_u32_u16
fossil-db (gfx1100):
Totals from 52753 (39.07% of 135032) affected shaders:
CodeSize: 153603860 -> 153163384 (-0.29%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19933>
2022-12-01 21:43:28 +00:00
Georg Lehmann
a3beb82cf6 aco: Use wave size specific opcode for s_or in cube map coord code.
Cc: mesa-stable

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20041>
2022-12-01 01:39:27 +00:00
Georg Lehmann
22be0d09a0 aco: Don't prematurely emit s_andn2.
Split s_not + s_and allows more inverse comparision and s_cbranch_vccz
optimizations.

Foz-DB Navi21:
Totals from 516 (0.38% of 134913) affected shaders:
CodeSize: 7273724 -> 7273720 (-0.00%)
Instrs: 1364408 -> 1364407 (-0.00%)
Latency: 14604862 -> 14604858 (-0.00%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19143>
2022-11-30 18:25:15 +00:00
Rhys Perry
0cb48ec3b7 radv,aco: remove old streamout code
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18898>
2022-11-29 14:28:11 +00:00
Rhys Perry
3a96977542 radv,aco: remove old GS copy shader code
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18898>
2022-11-29 14:28:11 +00:00
Rhys Perry
17bd2721e6 radv,aco: implement GS copy shaders using NIR
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18898>
2022-11-29 14:28:11 +00:00
Rhys Perry
12becb8839 radv: lower streamout in NIR
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18898>
2022-11-29 14:28:11 +00:00
Rhys Perry
19d0403594 radv,aco: export legacy vertex outputs in NIR
This new behaviour will let us insert exports in GS copy shader control
flow.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18898>
2022-11-29 14:28:11 +00:00
Georg Lehmann
7aa94efe82 aco: Combine constant bit test to s_bitcmp.
Foz-DB Navi21:
Totals from 73988 (54.84% of 134913) affected shaders:
VGPRs: 2959768 -> 2959752 (-0.00%)
SpillSGPRs: 10250 -> 10697 (+4.36%); split: -0.64%, +5.00%
SpillVGPRs: 2326 -> 2291 (-1.50%); split: -2.24%, +0.73%
CodeSize: 261339476 -> 261045912 (-0.11%); split: -0.12%, +0.00%
Scratch: 239616 -> 238592 (-0.43%)
Instrs: 49214044 -> 49188242 (-0.05%); split: -0.06%, +0.00%
Latency: 413214139 -> 413296229 (+0.02%); split: -0.03%, +0.05%
InvThroughput: 71741622 -> 71786300 (+0.06%); split: -0.07%, +0.13%
VClause: 856838 -> 856973 (+0.02%); split: -0.01%, +0.02%
SClause: 1504502 -> 1504567 (+0.00%); split: -0.01%, +0.02%
Copies: 4058433 -> 4060424 (+0.05%); split: -0.03%, +0.08%
Branches: 1502953 -> 1502945 (-0.00%); split: -0.00%, +0.00%
PreSGPRs: 3081927 -> 3081531 (-0.01%); split: -0.02%, +0.01%
PreVGPRs: 2513990 -> 2513992 (+0.00%)

The vast majority of instruction count regressions are caused by parallel-rdp.

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18870>
2022-11-28 18:43:53 +00:00
Georg Lehmann
73be938c48 aco: Combine bit test to s_bitcmp.
Foz-DB Navi21:
Totals from 6396 (4.74% of 134913) affected shaders:
VGPRs: 483280 -> 483152 (-0.03%); split: -0.03%, +0.01%
SpillSGPRs: 8119 -> 7941 (-2.19%)
CodeSize: 63377880 -> 63268556 (-0.17%); split: -0.20%, +0.03%
MaxWaves: 86778 -> 86810 (+0.04%)
Instrs: 11745621 -> 11725857 (-0.17%); split: -0.20%, +0.03%
Latency: 162400148 -> 162282230 (-0.07%); split: -0.08%, +0.01%
InvThroughput: 29179429 -> 29133173 (-0.16%); split: -0.16%, +0.00%
VClause: 208032 -> 208100 (+0.03%); split: -0.01%, +0.05%
SClause: 431390 -> 430849 (-0.13%); split: -0.24%, +0.11%
Copies: 896222 -> 893285 (-0.33%); split: -0.62%, +0.30%
Branches: 349806 -> 348770 (-0.30%); split: -0.90%, +0.60%
PreSGPRs: 618908 -> 613773 (-0.83%); split: -0.83%, +0.00%
PreVGPRs: 482901 -> 482893 (-0.00%)

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18870>
2022-11-28 18:43:53 +00:00
Georg Lehmann
853d2cb6f1 aco: Combine s_abs and s_sub/s_add to s_absdiff.
Totals from 2 (0.00% of 134913) affected shaders:
CodeSize: 1344 -> 1336 (-0.60%)
Instrs: 277 -> 275 (-0.72%)

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18870>
2022-11-28 18:43:53 +00:00
Georg Lehmann
7e1d77fd90 aco: Ignore instructions with exec operands in follow_operand.
No Foz-DB changes.

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18870>
2022-11-28 18:43:53 +00:00
Georg Lehmann
65a3328b4c aco/optimizer: Cleanup ctx.uses handling for patterns which use follow_operand(..., true).
No Foz-DB changes.

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18870>
2022-11-28 18:43:53 +00:00
Yonggang Luo
cdbe1ad570 aco: Fixes -Werror,-Wbitwise-instead-of-logical for clang-15 in aco_optimizer.cpp
error message:
error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical]

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19969>
2022-11-24 02:56:03 +00:00
Samuel Pitoiset
ce11c06429 aco: fix emitting DEALLOC_VGPRS in the discard block
It should be emitted right before s_endpgm.

Cc: 22.3 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19931>
2022-11-22 19:52:04 +00:00
Rhys Perry
3061bc792d aco: ensure MRT0 is written with dual source blending
Fixes crucible test func.shader.dualsrc_mrt0_undef on polaris10.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: 22.3 mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19806>
2022-11-21 15:01:56 +00:00