Commit graph

2129 commits

Author SHA1 Message Date
Rhys Perry
2bd16256a6 aco: fix VMEMtoScalarWriteHazard s_waitcnt mitigation
It doesn't make sense for a "s_waitcnt vmcnt(0)" to affect a store or DS
instruction.

LLVM checks for "s_waitcnt vmcnt(0) lgkmcnt(0) expcnt(0)" but ignores
s_waitcnt_vscnt (which I assume is a bug).

No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: bcf94bb933 ("aco: properly recognize that s_waitcnt mitigates VMEMtoScalarWriteHazard")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18270>
2022-09-30 11:44:38 +00:00
Daniel Schürmann
97850c0bf0 aco/opt_value_numbering: use monotonic_allocator for unordered_map
This patch also changes the rename map to unordered.
Roughly halves the time spent on CSE in ACO.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18112>
2022-09-28 09:25:20 +00:00
Daniel Schürmann
b39d2168a7 aco: implement allocator_traits for monotonic_allocator<T>
For easier usage, this patch also adds aliases for std::map
and std::unordered_map using this allocator.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18112>
2022-09-28 09:25:20 +00:00
Daniel Schürmann
a128d444cb aco: use monotonic_buffer_resource for instructions
As monotonic_buffer_resource is not thread-safe,
we use a thread_local instance which gets allocated once.

This change reduces the compile time spent in ACO by
approximately 10%.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18112>
2022-09-28 09:25:20 +00:00
Daniel Schürmann
15b3cc73bf aco: implement custom memory resource
This basic allocator implements an arena allocation strategy
and cannot free individual allocations.
It is intended for very fast memory allocations in situations
where memory is used to build up a few objects and then is
released all at once.

This class mimics std::pmr::monotonic_buffer_resource.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18112>
2022-09-28 09:25:20 +00:00
Daniel Schürmann
0b76e22a96 aco: simplify operands_offset calculation in create_instruction()
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18112>
2022-09-28 09:25:20 +00:00
Rhys Perry
3730be9873 aco: mostly implement FS input loads on GFX11
Quad-divergent CF and vertex selection doesn't work, but should at least
prevent crashes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17333>
2022-09-26 14:49:57 +00:00
Rhys Perry
826ed52174 aco/tests: add GFX11 assembly tests
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17333>
2022-09-26 14:49:57 +00:00
Rhys Perry
48c8c25e68 aco: omit read-only memory_sync_info when printing
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17333>
2022-09-26 14:49:57 +00:00
Rhys Perry
aadb7aef01 aco: add VINTERP instruction format
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17333>
2022-09-26 14:49:56 +00:00
Rhys Perry
55cd74d468 aco: add LDSDIR instruction format
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17333>
2022-09-26 14:49:56 +00:00
Rhys Perry
a7a9aad14d aco: limit GFX11 to 128 VGPRs for now
See https://reviews.llvm.org/D128054

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17333>
2022-09-26 14:49:56 +00:00
Rhys Perry
4e55b5b851 aco: update assembler for GFX11
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17333>
2022-09-26 14:49:56 +00:00
Rhys Perry
077dd12ac6 aco/gfx11: don't use more than 1 NSA dword
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17333>
2022-09-26 14:49:56 +00:00
Rhys Perry
d8d99c3c4f aco: add GFX11 opcode numbers
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17333>
2022-09-26 14:49:56 +00:00
Rhys Perry
2f74df7117 aco: fix assembly of MUBUF-to-LDS loads
These have an implicit m0 read and don't write VGPRs.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17333>
2022-09-26 14:49:56 +00:00
Rhys Perry
78779fd63d aco: add reg() helper to assembler
SGPR encoding is slightly different on GFX11.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17333>
2022-09-26 14:49:56 +00:00
Rhys Perry
7a1b522148 aco: rename Interp_instruction to VINTRP_instruction
These is clearer since GFX11 adds another interpolation format.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17333>
2022-09-26 14:49:56 +00:00
Yonggang Luo
091249dff4 aco: Fixes compiling error about char8_t with c++20
The error is:
../mesa/src/amd/compiler/aco_register_allocation.cpp:382:7: error: no matching function for call to 'printf'
      printf(u8"☐");

Fixes: 209a89e51d ("aco: Convert to use u8 literal for Unicode character to fixes msvc warning")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7318

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Marcus Seyfarth <m.seyfarth@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18796>
2022-09-26 13:28:33 +00:00
Daniel Schürmann
cd36a29759 aco/optimizer: change inverse_comparison in-place
This avoids creating a second comparison which is more expensive
than just using the existing s_not.

Totals from 1089 (0.81% of 134913) affected shaders: (GFX10.3)
VGPRs: 50472 -> 50184 (-0.57%)
CodeSize: 4724692 -> 4760824 (+0.76%); split: -0.03%, +0.79%
MaxWaves: 23964 -> 24012 (+0.20%)
Instrs: 859588 -> 859687 (+0.01%); split: -0.11%, +0.12%
Latency: 10674653 -> 10650353 (-0.23%); split: -0.41%, +0.18%
InvThroughput: 1752987 -> 1750238 (-0.16%); split: -0.20%, +0.04%
VClause: 20921 -> 20872 (-0.23%); split: -0.68%, +0.45%
SClause: 31417 -> 31550 (+0.42%)
Copies: 69428 -> 68738 (-0.99%); split: -1.52%, +0.53%
PreSGPRs: 48033 -> 49649 (+3.36%)
PreVGPRs: 44490 -> 43699 (-1.78%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18253>
2022-09-22 12:35:11 +00:00
Timur Kristóf
c8445c1691 aco: Change inverse-comparison optimization to work with s_not
Some time ago we stopped using s_andn2 with exec for boolean NOT.
The reasoning behind that change was that those booleans will be
always ANDed with exec when necessary.

This inhibited the inverse-comparison optimization in most cases
which is fixed by this patch.

Fossil DB stats on Navi 21:

Totals from 12251 (9.08% of 134913) affected shaders:
VGPRs: 801744 -> 802016 (+0.03%); split: -0.00%, +0.04%
SpillSGPRs: 8863 -> 8893 (+0.34%)
CodeSize: 100593244 -> 100370684 (-0.22%); split: -0.22%, +0.00%
MaxWaves: 204994 -> 204948 (-0.02%); split: +0.00%, -0.02%
Instrs: 18717001 -> 18668965 (-0.26%); split: -0.26%, +0.00%
Latency: 263255046 -> 262874896 (-0.14%); split: -0.16%, +0.02%
InvThroughput: 52760249 -> 52721736 (-0.07%); split: -0.08%, +0.01%
VClause: 329631 -> 329680 (+0.01%); split: -0.03%, +0.04%
SClause: 681563 -> 681435 (-0.02%); split: -0.02%, +0.00%
Copies: 1331612 -> 1372446 (+3.07%); split: -0.03%, +3.10%
Branches: 548325 -> 548301 (-0.00%)
PreSGPRs: 911317 -> 909700 (-0.18%)
PreVGPRs: 766279 -> 767070 (+0.10%)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18253>
2022-09-22 12:35:11 +00:00
Daniel Schürmann
cf5f9854bc aco/optimizer: optimize s_and(exec, s_and(x, y)) more aggressively
It is enough if either of the two operands is respecting the
current exec mask.

Totals from 1680 (1.25% of 134913) affected shaders: (GFX10.3)
CodeSize: 3929436 -> 3922372 (-0.18%); split: -0.18%, +0.00%
Instrs: 730305 -> 728536 (-0.24%); split: -0.24%, +0.00%
Latency: 6839314 -> 6835154 (-0.06%); split: -0.07%, +0.01%
InvThroughput: 1371351 -> 1371267 (-0.01%); split: -0.01%, +0.00%
SClause: 32819 -> 32802 (-0.05%); split: -0.09%, +0.04%
Copies: 33264 -> 33271 (+0.02%); split: -0.01%, +0.03%

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18253>
2022-09-22 12:35:11 +00:00
Daniel Schürmann
79a8e8b5b2 aco/optimizer: do can_eliminate_and_exec() optimization later
This will allow to optimize s_not(v_cmp()) safely.

Totals from 1024 (0.76% of 134913) affected shaders: (GFX10.3)
CodeSize: 5695424 -> 5701860 (+0.11%); split: -0.00%, +0.11%
Scratch: 242688 -> 241664 (-0.42%)
Instrs: 1040656 -> 1041635 (+0.09%); split: -0.00%, +0.09%
Latency: 16842282 -> 16922790 (+0.48%); split: -0.06%, +0.54%
InvThroughput: 4772728 -> 4810868 (+0.80%); split: -0.10%, +0.90%
VClause: 20013 -> 20000 (-0.06%); split: -0.12%, +0.05%
Copies: 115057 -> 114384 (-0.58%); split: -1.22%, +0.63%
Branches: 34531 -> 34532 (+0.00%); split: -0.00%, +0.01%
PreSGPRs: 46263 -> 46267 (+0.01%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18253>
2022-09-22 12:35:11 +00:00
Georg Lehmann
84c0529258 aco: Unswizzle v_pk_fma_f16 literals to produce more v_pk_fmac_f16.
No Foz-DB difference, but it reduces code size in some angle shaders.

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18676>
2022-09-21 23:04:06 +00:00
Timur Kristóf
ed76471d08 aco/optimizer_postRA: Clarify terminology.
Change the terminology around the post-RA optimizer, primarily this
changes the use of "clobbered" to "overwritten" to avoid confusion,
and it removes some redundant states.

Proposed for backporting to stable, to make sure it is easy to
backport further fixes (if any) on top of this.

Fossil DB stats unaffected on Navi 21.

Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18488>
2022-09-21 16:56:57 +00:00
Timur Kristóf
a8dd07518c aco/optimizer_postRA: Fix logical control flow handling.
Change reset_block() so it only considers the logical
predecessors for VGPRs. Relevant for some optimizations
across loops.

This commit fixes an assertion failure which was triggered
by Zink in a piglit test.

Fossil DB stats unaffected on Navi 21.

Fixes: 2e56e23420
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18488>
2022-09-21 16:56:57 +00:00
Timur Kristóf
2eab413cf7 aco/optimizer_postRA: Don't assume all operand registers were written by same instr.
This assumption is no longer true since the post-RA optimizer
can work across blocks. It is now possible that some control
flow paths overwrite some but not all registers of an operand.

This commit may prevent invalid optimizations and/or assertion
failures (on debug builds).

Fossil DB stats unaffected on Navi 21.

Fixes: 0e4747d3fb
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18488>
2022-09-21 16:56:57 +00:00
Timur Kristóf
63063dd5ce aco/optimizer_postRA: Mark a register overwritten when predecessors disagree.
Affects blocks whose some (but not all) predecessors overwrite a register.
This commit fixes glitches in some games which regressed because of the
improved SCC no-compare optimization.

Fossil DB stats on Navi 21:

Totals from 2816 (2.09% of 134906) affected shaders:
CodeSize: 24224276 -> 24241580 (+0.07%)
Instrs: 4570595 -> 4574921 (+0.09%)
Latency: 53680256 -> 53693655 (+0.02%); split: -0.00%, +0.02%
InvThroughput: 9829289 -> 9830573 (+0.01%)

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7257
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7305
Fixes: 2e56e23420
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18488>
2022-09-21 16:56:57 +00:00
Timur Kristóf
5e80edfa78 aco/tests: Add post-RA SCC no-compare tests cases with control flow.
- scc_nocmp_across_cf: passes
- scc_nocmp_across_cf_partially_overwritten: fails (fixed later)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18488>
2022-09-21 16:56:56 +00:00
Timur Kristóf
d4b3f81d94 aco/tests: Add post-RA DPP test cases with control flow.
These are intended to make sure that the post-RA optimizer works
correctly across control flow. The new tests emit a divergent
if-else branch (with full logical+linear CFG).

- dpp_across_cf:
  Simple case of DPP optimizable across control flow. Should pass.
- dpp_across_cf_overwritten:
  Similar case but the DPP source register is overwritten in CF.
  This shows a bug so the test fails now (will be fixed).

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18488>
2022-09-21 16:56:56 +00:00
Timur Kristóf
d7cd49d54b aco/tests: Add post-RA optimizer testcase for partially overwritten VCC.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18488>
2022-09-21 16:56:56 +00:00
Daniel Schürmann
98e3c446d8 aco/optimizer: disallow can_eliminate_and_exec() with s_not
Totals from 295 (0.22% of 134913) affected shaders: (GFX10.3)
CodeSize: 1016564 -> 1016896 (+0.03%); split: -0.05%, +0.09%
Instrs: 187659 -> 187724 (+0.03%); split: -0.08%, +0.11%
Latency: 2839516 -> 2839541 (+0.00%); split: -0.01%, +0.01%
Copies: 12301 -> 12305 (+0.03%); split: -0.01%, +0.04%
PreSGPRs: 10266 -> 10268 (+0.02%)

Closes: #7024
Cc: mesa-stable
Tested-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18722>
2022-09-21 15:33:43 +00:00
Georg Lehmann
94b5225c9b aco: Use v_fmaak/v_fmamk if two operands are the same literal.
Foz-DB Navi21:
Totals from 5744 (4.26% of 134913) affected shaders:
VGPRs: 237128 -> 237056 (-0.03%); split: -0.04%, +0.01%
CodeSize: 16654484 -> 16620668 (-0.20%); split: -0.23%, +0.03%
MaxWaves: 152838 -> 152846 (+0.01%)
Instrs: 3063214 -> 3058572 (-0.15%); split: -0.17%, +0.02%
Latency: 23935195 -> 23934827 (-0.00%); split: -0.03%, +0.03%
InvThroughput: 5478562 -> 5478160 (-0.01%); split: -0.01%, +0.01%
VClause: 60432 -> 60435 (+0.00%); split: -0.02%, +0.03%
SClause: 121032 -> 120896 (-0.11%); split: -0.20%, +0.09%
Copies: 147865 -> 143144 (-3.19%); split: -3.59%, +0.40%
PreSGPRs: 195722 -> 195661 (-0.03%); split: -0.06%, +0.03%
PreVGPRs: 182849 -> 182787 (-0.03%)

Foz-DB Vega10:
Totals from 5290 (3.92% of 135041) affected shaders:
SGPRs: 357952 -> 359616 (+0.46%); split: -0.11%, +0.57%
VGPRs: 204048 -> 203928 (-0.06%); split: -0.08%, +0.02%
CodeSize: 14043176 -> 14003100 (-0.29%); split: -0.29%, +0.00%
MaxWaves: 39401 -> 39398 (-0.01%); split: +0.01%, -0.02%
Instrs: 2636739 -> 2631246 (-0.21%); split: -0.21%, +0.00%
Latency: 25264088 -> 25256482 (-0.03%); split: -0.05%, +0.02%
InvThroughput: 12039643 -> 12039346 (-0.00%); split: -0.00%, +0.00%
VClause: 55603 -> 55584 (-0.03%); split: -0.04%, +0.00%
SClause: 101577 -> 101342 (-0.23%); split: -0.30%, +0.07%
Copies: 213344 -> 207929 (-2.54%); split: -2.58%, +0.05%
Branches: 34053 -> 34054 (+0.00%)
PreSGPRs: 172405 -> 172260 (-0.08%)

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18645>
2022-09-21 12:16:47 +00:00
Yonggang Luo
209a89e51d aco: Convert to use u8 literal for Unicode character to fixes msvc warning
Warning:
aco_register_allocation.cpp(383): warning C4819: The file contains a character that cannot be represented in the current code page (0). Save the file in Unicode format to prevent data loss

This warning was treated as error with compiling with msvc
u8 is belongs to c11 standard so it's safe to use it

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18682>
2022-09-20 18:40:50 +00:00
Samuel Pitoiset
19eec024d2 radv,aco: do not compact MRTs if the pipeline uses a PS epilog
We can't detect color attachment without exports when compiling a PS
epilog, so we can't compact MRTs.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18514>
2022-09-20 13:12:49 +00:00
Rhys Perry
6df5ff7f19 aco: DCE ra_ctx::defs_done
This was used to distinguish definitions fixed before and during RA, but
it seems it isn't used anymore.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18547>
2022-09-20 12:24:03 +00:00
Samuel Pitoiset
704ef1fd3b radv,aco: lower barycentric_at_sample in NIR
fossils-db (NAVI21):
Totals from 158 (0.12% of 134913) affected shaders:
CodeSize: 569456 -> 568824 (-0.11%)

Only Control seems affected.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18615>
2022-09-20 09:52:37 +00:00
Rhys Perry
7d26fafacf radv: fix dynamic RT stack size with VGPR spilling
VGPR spilling might cause VGPRs to be spilled at scratch offset 0, so we
can't use that.

fossil-db (Sienna Cichlid, Q2RTX and Control):
Totals from 4 (0.26% of 1524) affected shaders:
Instrs: 8734 -> 8737 (+0.03%)
CodeSize: 48492 -> 48504 (+0.02%)
Latency: 384375 -> 384369 (-0.00%)
InvThroughput: 256250 -> 256246 (-0.00%)
Copies: 1312 -> 1313 (+0.08%)
Branches: 256 -> 258 (+0.78%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18541>
2022-09-20 01:39:20 +00:00
Samuel Pitoiset
9a6aa3e23a aco: prevent a division by zero when patch control points is dynamic
tess_input_vertices is zero if the state is dynamic.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18344>
2022-09-13 08:24:14 +00:00
Georg Lehmann
28a69b72d8 aco: Use plain VOPC for vcmpx when possible.
Foz-DB Navi21:
Totals from 66947 (49.62% of 134913) affected shaders:
CodeSize: 210383024 -> 210033376 (-0.17%)

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18417>
2022-09-07 16:08:36 +00:00
Samuel Pitoiset
8fcb4aa0eb radv: compact MRTs to save PS export memory space
If there are holes between color outputs (e.g. a shader exports MRT1,
but not MRT0), we can remove the holes by moving higher MRTs lower. The
hardware will remap the MRTs to their correct locations if we remove
holes in SPI_SHADER_COL_FORMAT but not CB_SHADER_MASK. This is good for
performance because the hardware will allocate less space for color
MRTs.

This also allows to remove even more unused color exports because we no
longer need to force previous targets to be non-zero. Only SotTR seems
affected from our fossils db.

fossils-db (NAVI21):
Totals from 859 (0.64% of 134913) affected shaders:
VGPRs: 24328 -> 24216 (-0.46%)
CodeSize: 1433276 -> 1422576 (-0.75%)
Instrs: 255275 -> 253728 (-0.61%)
Latency: 1666836 -> 1661544 (-0.32%)
InvThroughput: 346038 -> 343406 (-0.76%)
Copies: 16520 -> 16506 (-0.08%)
PreSGPRs: 25934 -> 25920 (-0.05%)
PreVGPRs: 19903 -> 19662 (-1.21%)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5786>
2022-09-07 08:17:20 +00:00
Samuel Pitoiset
df997cf47d aco: remove unused isel_context::tcs_num_patches
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18278>
2022-09-01 17:02:16 +00:00
Rhys Perry
061b8bfd29 aco/ra: rework fixed operands
This moves all fixed operands at once, so they don't interfere with one
another.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17493>
2022-09-01 11:22:46 +00:00
Rhys Perry
ec867ef0e7 aco/ra: remove bounds parameter from get_regs_for_copies()
I don't think it makes sense for this to be anything but get_reg_bounds(),
and this change makes this function usuable with a mix of SGPRs and VGPRs.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17493>
2022-09-01 11:22:46 +00:00
Rhys Perry
efcbccaf0e aco/ra: handle empty def_reg interval in get_regs_for_copies
If def_reg is empty, then def_reg.lo() may be lower than bounds.lo() if
we're moving VGPRs and info.bounds will be invalid.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17493>
2022-09-01 11:22:46 +00:00
Timur Kristóf
16c14663e5 aco: Fix p_init_scratch for task shaders.
Fixes: d2d94b62f2
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18339>
2022-09-01 09:10:47 +00:00
Georg Lehmann
a82b9d6001 aco: Fix image instructions with lod when 2d_view_of_3d is enabled on GFX9.
If there's a lod parameter it matter if the image is 3d or 2d because
the hw reads either the fourth or third component as lod. So detect
3d images and place the lod at the third component otherwise.

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18114>
2022-08-31 17:44:38 +00:00
Rhys Perry
96df4499ac radv,aco: implement 64-bit vertex inputs
Note that, from 22.4.1. Vertex Input Extraction of Vulkan spec:
The input variable in the shader must be declared as a 64-bit data type if
and only if format is a 64-bit data type.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17894>
2022-08-30 19:02:11 +00:00
Rhys Perry
831257bdce radv,aco: use pipe_format for dynamic vertex input state
Also prepare for 64-bit and R8G8B8/R16G16B16 with the addition of
radv_vs_input_state::nontrivial_formats.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5021
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17894>
2022-08-30 19:02:11 +00:00
Rhys Perry
c06a5a5ebd radv,aco: use pipe_format for static vertex input state
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17894>
2022-08-30 19:02:11 +00:00