Daniel Schürmann
115ff5f95b
aco/insert_exec_mask: don't restore exec in continue_or_break blocks
...
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33527>
2025-02-24 13:11:20 +00:00
Daniel Schürmann
7f7c1d463a
aco/insert_exec_mask: Don't immediately set exec to zero in break/continue blocks
...
Instead, only indicate that exec should be zero and actually set it
to zero in the following helper block. This allows inserting
the parallelcopies from logical phis directly before the
branch in break and continue blocks.
Totals from 56 (0.07% of 79377) affected shaders: (Navi31)
Latency: 2472367 -> 2472422 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 253053 -> 253055 (+0.00%); split: -0.00%, +0.00%
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33527>
2025-02-24 13:11:20 +00:00
Georg Lehmann
272ff275fa
aco/insert_exec: reset top exec for p_discard_if
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12363
Fixes: 31f62a6123 ("aco/insert_exec: don't always reset top exec")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32830>
2025-01-02 15:18:48 +00:00
Georg Lehmann
42512208d8
aco/insert_exec: exit shader using exec for top level discard
...
Totals from 14538 (18.31% of 79395) affected shaders:
no changes
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32731>
2024-12-30 13:05:23 +00:00
Georg Lehmann
c279e63a79
aco: rename p_early_exit_if to if_not
...
It exits the shader if the condition is false, not true.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32731>
2024-12-30 13:05:23 +00:00
Daniel Schürmann
b64fff7731
aco: remove definition from Pseudo branch instructions
...
They are not needed anymore.
Totals from 7019 (8.84% of 79395) affected shaders: (Navi31)
Instrs: 14805400 -> 14824196 (+0.13%); split: -0.00%, +0.13%
CodeSize: 78079972 -> 78132932 (+0.07%); split: -0.01%, +0.08%
SpillSGPRs: 4485 -> 4515 (+0.67%); split: -0.76%, +1.43%
Latency: 165862000 -> 165836134 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 30061764 -> 30057781 (-0.01%); split: -0.01%, +0.00%
SClause: 392323 -> 392286 (-0.01%); split: -0.01%, +0.00%
Copies: 1012262 -> 1012234 (-0.00%); split: -0.04%, +0.04%
Branches: 365910 -> 365909 (-0.00%); split: -0.00%, +0.00%
PreSGPRs: 360167 -> 355363 (-1.33%)
VALU: 8837197 -> 8837276 (+0.00%); split: -0.00%, +0.00%
SALU: 1402593 -> 1402621 (+0.00%); split: -0.03%, +0.03%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32037>
2024-12-06 14:34:03 +00:00
Georg Lehmann
d2dcaf1f5e
aco/insert_exec: reuse old exec temp instead using s_and_saveexec
...
This means the v_cmpx optimization in ssa_elimination no longer
needs to insert a copy to save exec.
Foz-DB Navi31:
Totals from 13816 (17.40% of 79395) affected shaders:
Instrs: 23694267 -> 23670199 (-0.10%); split: -0.11%, +0.01%
CodeSize: 124559288 -> 124457508 (-0.08%); split: -0.09%, +0.01%
SpillSGPRs: 5324 -> 5354 (+0.56%); split: -1.00%, +1.56%
Latency: 207245846 -> 207213681 (-0.02%); split: -0.03%, +0.01%
InvThroughput: 35442657 -> 35437220 (-0.02%); split: -0.02%, +0.01%
VClause: 444672 -> 444670 (-0.00%); split: -0.00%, +0.00%
SClause: 639419 -> 639373 (-0.01%); split: -0.04%, +0.03%
Copies: 1529008 -> 1515871 (-0.86%); split: -1.02%, +0.16%
Branches: 557201 -> 557701 (+0.09%); split: -0.00%, +0.09%
PreSGPRs: 682840 -> 686048 (+0.47%)
VALU: 13978010 -> 13978032 (+0.00%); split: -0.00%, +0.00%
SALU: 2214600 -> 2197061 (-0.79%); split: -0.81%, +0.02%
VOPD: 5561 -> 5560 (-0.02%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31567>
2024-10-23 19:34:53 +00:00
Georg Lehmann
0471522377
aco/insert_exec: reuse old exec temp in loop pre-header
...
Avoid an exec copy.
Foz-DB Navi31:
Totals from 2315 (2.92% of 79395) affected shaders:
Instrs: 9082831 -> 9058990 (-0.26%); split: -0.27%, +0.00%
CodeSize: 48017244 -> 47858064 (-0.33%); split: -0.34%, +0.01%
SpillSGPRs: 1680 -> 1684 (+0.24%); split: -0.48%, +0.71%
Latency: 109511718 -> 109525041 (+0.01%); split: -0.01%, +0.02%
InvThroughput: 20287085 -> 20289370 (+0.01%); split: -0.00%, +0.02%
VClause: 192259 -> 192260 (+0.00%)
SClause: 234082 -> 234124 (+0.02%); split: -0.01%, +0.03%
Copies: 667271 -> 645577 (-3.25%); split: -3.27%, +0.02%
Branches: 264086 -> 264088 (+0.00%)
PreSGPRs: 136831 -> 136966 (+0.10%)
VALU: 5234735 -> 5234740 (+0.00%); split: -0.00%, +0.00%
SALU: 949283 -> 927327 (-2.31%); split: -2.32%, +0.01%
VOPD: 1529 -> 1535 (+0.39%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31567>
2024-10-23 19:34:53 +00:00
Georg Lehmann
31f62a6123
aco/insert_exec: don't always reset top exec
...
This allows reusing previous temporaries when exec was restored
from a Temp, rather than having to create a new copy of exec.
Foz-DB Navi31:
Totals from 545 (0.69% of 79395) affected shaders:
Instrs: 216563 -> 215698 (-0.40%)
CodeSize: 1183536 -> 1180076 (-0.29%)
Latency: 1135269 -> 1135294 (+0.00%); split: -0.00%, +0.00%
Copies: 11933 -> 11072 (-7.22%)
SALU: 18990 -> 18129 (-4.53%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31567>
2024-10-23 19:34:53 +00:00
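Note on reading the statistics in these entries: each line has the form "Name: before -> after (change%)". The small helper below is only a sketch of that arithmetic, checked against the numbers in 31f62a6123 (for example Copies: 11933 -> 11072 is -861/11933, about -7.22%). The function name and the rounding to two decimals are assumptions made for illustration, not part of the reporting tool.

```cpp
#include <cmath>
#include <cstdint>

// Relative change of one statistic in percent, rounded to two decimals.
// The rounding mode is an assumption; the report tool may round differently.
double percent_change(std::int64_t before, std::int64_t after)
{
   const double pct =
      100.0 * static_cast<double>(after - before) / static_cast<double>(before);
   return std::round(pct * 100.0) / 100.0;
}
```

A negative result means the statistic went down, which is an improvement for counts such as Instrs, Copies or SALU.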
Georg Lehmann
4f04e6f0c4
aco/insert_exec: avoid phis for masks in exec
...
Exec always contains the same value as the top of stack, even if the
top of stack is a temporary/constant. So if the predecessors have different
top of stack operands, don't insert a phi and use exec as the new top of stack.
No Foz-DB changes.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31567>
2024-10-23 19:34:53 +00:00
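Note on 4f04e6f0c4: the rule from the message can be sketched as a small merge function. This is an illustration under stated assumptions, not the code from the pass: MaskOperand and merge_top_of_stack are hypothetical names, and a real operand carries more state than an id.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an operand on the exec mask stack:
// either the exec register itself or an SSA temporary identified by an id.
struct MaskOperand {
   bool is_exec;
   std::uint32_t temp_id; // only meaningful when is_exec is false

   bool operator==(const MaskOperand& o) const
   {
      return is_exec == o.is_exec && (is_exec || temp_id == o.temp_id);
   }
};

// Pick the top-of-stack operand at a merge block.
// If all predecessors agree, keep their operand. If they differ, no phi is
// created: exec already holds the merged value, so exec becomes the new top.
MaskOperand merge_top_of_stack(const std::vector<MaskOperand>& pred_tops)
{
   const MaskOperand exec{true, 0};
   if (pred_tops.empty())
      return exec;
   for (const MaskOperand& op : pred_tops) {
      if (!(op == pred_tops.front()))
         return exec;
   }
   return pred_tops.front();
}
```

The fallback is only valid because of the invariant stated in the message: exec always contains the same value as the top of the stack.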
Georg Lehmann
5da34ebee4
aco/insert_exec: remove get_exec_op
...
We used to only store Temps in the stack, so undef meant exec.
Then the stack was changed to operands, and some places started storing exec
directly. Drop the undef handling by replacing everything with Operand(exec, lm).
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31560>
2024-10-22 17:03:26 +00:00
Georg Lehmann
6716fb08d8
aco/insert_exec: remove unused includes
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31560>
2024-10-22 17:03:26 +00:00
Georg Lehmann
23fb0883eb
aco/insert_exec: untangle add_branch_code control flow
...
All of the single ifs with return hide that this is effectively almost an
if-else chain, so convert it to one for real.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31560>
2024-10-22 17:03:26 +00:00
Georg Lehmann
de7d931962
aco/insert_exec: remove stray break_cond variable
...
This was always trivial since some discard rework.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31560>
2024-10-22 17:03:26 +00:00
Georg Lehmann
ade7f1a203
aco/insert_exec: replace pair with a named struct
...
.first and .second everywhere was hard to read.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31560>
2024-10-22 17:03:26 +00:00
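Note on ade7f1a203: the readability argument can be shown with a before/after sketch. All names and flag values here (ExecEntryPair, ExecEntry, mask_type_global, mask_type_loop, and the two helpers) are assumptions chosen for illustration; the operand is reduced to a plain id.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical flag values, for illustration only.
constexpr std::uint8_t mask_type_global = 1 << 0;
constexpr std::uint8_t mask_type_loop = 1 << 3;

// Before: the reader has to remember what .first and .second mean.
using ExecEntryPair = std::pair<std::uint32_t, std::uint8_t>;

bool top_is_global_pair(const std::vector<ExecEntryPair>& stack)
{
   return !stack.empty() && (stack.back().second & mask_type_global);
}

// After: the fields are named at every use site.
struct ExecEntry {
   std::uint32_t op;  // mask operand (here just an id)
   std::uint8_t type; // mask type flags
};

bool top_is_global(const std::vector<ExecEntry>& stack)
{
   return !stack.empty() && (stack.back().type & mask_type_global);
}
```

Both helpers compute the same result; only the spelling at the use site changes.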
Georg Lehmann
a3054499ba
aco/insert_exec: don't pretend WQMState is a bit mask
...
It's a simple enum.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31560>
2024-10-22 17:03:26 +00:00
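Note on a3054499ba: a plain enum is enough when the states are mutually exclusive. The sketch below assumes three states named Unspecified, Exact and WQM; needs_mode_switch is a hypothetical helper, not a function from the pass.

```cpp
#include <cstdint>

// Sequential values: a block or instruction is in exactly one state,
// so states are compared with == and != instead of being tested with &.
enum class WQMState : std::uint8_t {
   Unspecified,
   Exact,
   WQM,
};

// Hypothetical helper: a switch is needed only if the instruction
// requires a specific state and that state differs from the current one.
bool needs_mode_switch(WQMState current, WQMState required)
{
   return required != WQMState::Unspecified && required != current;
}
```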
Rhys Perry
b934255510
aco: split selection_control_remove into rarely_taken and never_taken
...
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Backport-to: 24.1
Backport-to: 24.2
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30321>
2024-08-15 16:00:18 +00:00
Rhys Perry
cccfbe6141
aco: move s_setprio to before NGG exec initialization
...
fossil-db (gfx1150):
Totals from 32 (0.04% of 79395) affected shaders:
Instrs: 17397 -> 17365 (-0.18%)
CodeSize: 83700 -> 83580 (-0.14%)
Latency: 59006 -> 58974 (-0.05%)
fossil-db (navi21):
Totals from 4 (0.01% of 79395) affected shaders:
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30241>
2024-07-23 13:14:52 +00:00
Rhys Perry
71afacff39
aco/insert_exec_mask: ensure top mask is not a temporary at loop exits
...
This is problematic when the successor of the loop exit is an invert
block. It assumes that the top mask is Operand(bld.lm) and doesn't change
it when entering the else branch.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11348
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29767>
2024-06-20 12:47:05 +00:00
Samuel Pitoiset
7a69d78ba2
aco: use SPDX-License-Identifier
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28622>
2024-04-08 15:49:25 +00:00
Daniel Schürmann
a863c7951e
aco: remove create_instruction() template parameter
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28370>
2024-03-28 11:25:43 +00:00
Daniel Schürmann
9b0ebcc39b
aco: change return type of create_instruction() to Instruction*
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28370>
2024-03-28 11:25:43 +00:00
Daniel Schürmann
1187189235
aco: unify different SALU types into single struct SALU_instruction
...
This removes
- SOP1_instruction
- SOP2_instruction
- SOPC_instruction
- SOPK_instruction
- SOPP_instruction
and their corresponding methods.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28370>
2024-03-28 11:25:43 +00:00
Daniel Schürmann
9bbb9f1104
aco: use small_vec as Block::edge_vec for predecessors and successors
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27984>
2024-03-19 13:06:58 +00:00
Daniel Schürmann
4fa27845e5
aco/insert_exec_mask: Reduce latency when switching to WQM.
...
Change pattern:
  s_mov_b64 s[0:1], exec        s_mov_b64 s[0:1], exec
  s_wqm_b64 exec, s[0:1]   ->   s_wqm_b64 exec, exec
Totals from 16667 (21.03% of 79242) affected shaders: (GFX11)
Instrs: 11317502 -> 11307484 (-0.09%); split: -0.09%, +0.00%
CodeSize: 60194272 -> 60155088 (-0.07%); split: -0.07%, +0.00%
Latency: 94345873 -> 94338374 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 13568872 -> 13568683 (-0.00%); split: -0.00%, +0.00%
Copies: 808334 -> 808332 (-0.00%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27112>
2024-02-02 18:55:15 +00:00
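Note on 4fa27845e5: both forms of the pattern compute the same mask, because s[0:1] is a copy of exec; reading exec directly means s_wqm_b64 no longer has to wait for the result of the s_mov_b64. For readers unfamiliar with the instruction, the following is a software model of what a 64-bit whole-quad-mode operation computes. The function name wqm64 is made up for this sketch.

```cpp
#include <cstdint>

// Software model of a 64-bit whole-quad-mode operation:
// lanes are grouped into quads of 4 consecutive bits, and if any lane
// of a quad is set in the input, all 4 lanes of that quad are set in the output.
std::uint64_t wqm64(std::uint64_t mask)
{
   std::uint64_t out = 0;
   for (int quad = 0; quad < 16; ++quad) {
      const std::uint64_t quad_bits = std::uint64_t{0xF} << (4 * quad);
      if (mask & quad_bits)
         out |= quad_bits;
   }
   return out;
}
```

For example, an input with only lane 5 active (0x20) yields 0xF0, the full second quad.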
Daniel Schürmann
e89977ff71
aco: always terminate quads if they have been demoted entirely
...
Previously, quads were only terminated in top-level control flow.
This patch makes the behavior consistent.
Totals from 7811 (9.86% of 79242) affected shaders: (GFX11)
Instrs: 7859667 -> 7850757 (-0.11%); split: -0.18%, +0.07%
CodeSize: 41642280 -> 41611836 (-0.07%); split: -0.13%, +0.06%
Latency: 73692815 -> 73707588 (+0.02%); split: -0.02%, +0.04%
InvThroughput: 10672160 -> 10672323 (+0.00%); split: -0.01%, +0.01%
VClause: 137478 -> 137469 (-0.01%); split: -0.02%, +0.02%
SClause: 314905 -> 314924 (+0.01%); split: -0.19%, +0.20%
Copies: 587014 -> 576039 (-1.87%); split: -2.10%, +0.23%
Branches: 213101 -> 213123 (+0.01%); split: -0.01%, +0.02%
PreSGPRs: 313588 -> 313355 (-0.07%); split: -0.09%, +0.01%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27112>
2024-02-02 18:55:15 +00:00
Daniel Schürmann
a42b83e3fb
aco/insert_exec_mask: tiny refactor
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27112>
2024-02-02 18:55:15 +00:00
Daniel Schürmann
c309d20172
aco/insert_exec_mask: Fix unconditional demote at top-level control flow.
...
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27362>
2024-01-31 13:50:46 +00:00
Daniel Schürmann
09413ff745
aco/insert_exec_mask: only create loop phis for exec mask if necessary
...
Totals from 195 (0.25% of 79242) affected shaders: (GFX11)
Instrs: 476457 -> 476031 (-0.09%); split: -0.23%, +0.14%
CodeSize: 2453964 -> 2452108 (-0.08%); split: -0.23%, +0.16%
SpillSGPRs: 944 -> 913 (-3.28%); split: -3.39%, +0.11%
SpillVGPRs: 838 -> 835 (-0.36%); split: -0.95%, +0.60%
Latency: 10811026 -> 10810125 (-0.01%); split: -0.08%, +0.07%
InvThroughput: 2276677 -> 2276698 (+0.00%); split: -0.12%, +0.12%
VClause: 9223 -> 9233 (+0.11%); split: -0.10%, +0.21%
SClause: 9025 -> 9005 (-0.22%); split: -0.38%, +0.16%
Copies: 67419 -> 67382 (-0.05%); split: -0.97%, +0.92%
PreSGPRs: 10830 -> 10668 (-1.50%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26937>
2024-01-12 09:05:15 +00:00
Daniel Schürmann
e83d8e1366
aco/insert_exec_mask: replace phi for loop restore mask with explicit copies
...
Totals from 1785 (2.25% of 79242) affected shaders: (GFX11)
Instrs: 6787574 -> 6787041 (-0.01%); split: -0.01%, +0.00%
CodeSize: 34906500 -> 34904704 (-0.01%); split: -0.01%, +0.01%
SpillSGPRs: 5848 -> 5816 (-0.55%)
Latency: 88616877 -> 88617209 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 16644948 -> 16644717 (-0.00%); split: -0.00%, +0.00%
VClause: 141122 -> 141121 (-0.00%)
SClause: 178929 -> 178906 (-0.01%); split: -0.03%, +0.02%
Copies: 569444 -> 569081 (-0.06%); split: -0.09%, +0.03%
Branches: 186980 -> 186961 (-0.01%); split: -0.01%, +0.00%
PreSGPRs: 133648 -> 133369 (-0.21%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26937>
2024-01-12 09:05:15 +00:00
Daniel Schürmann
d375d297cf
aco/insert_exec_mask: unify exec restore code after divergent control flow
...
No fossil-db changes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26937>
2024-01-12 09:05:15 +00:00
Daniel Schürmann
dce695b24f
aco: refactor and speed-up dead code analysis
...
Assuming that no loop header phis are dead code,
we can perform the dead code analysis in a single iteration.
Totals from 25 (0.03% of 79330) affected shaders: (GFX11)
MaxWaves: 664 -> 662 (-0.30%)
Instrs: 487618 -> 488822 (+0.25%)
CodeSize: 2451548 -> 2459756 (+0.33%)
VGPRs: 1296 -> 1332 (+2.78%)
Latency: 2337256 -> 2338098 (+0.04%); split: -0.00%, +0.04%
InvThroughput: 560682 -> 576158 (+2.76%)
VClause: 15782 -> 15790 (+0.05%)
Copies: 37905 -> 38731 (+2.18%)
PreVGPRs: 1124 -> 1156 (+2.85%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26901>
2024-01-08 09:43:53 +00:00
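Note on dce695b24f: the single-iteration idea can be sketched as one backward walk that counts uses. This is a simplified model, not the ACO implementation: Instr and find_live are hypothetical, the program is a flat list in which definitions precede their uses, and anything that must always be kept (including, per the commit's assumption, loop header phis) is expected to be marked via has_side_effects.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified instruction: temporaries are plain indices.
struct Instr {
   std::vector<std::uint32_t> defs; // temporaries written
   std::vector<std::uint32_t> ops;  // temporaries read
   bool has_side_effects;           // must be kept regardless of uses
};

// One backward pass: an instruction is live if it has side effects or if a
// later live instruction reads one of its definitions. Only live
// instructions contribute uses, so chains of dead code are found at once.
std::vector<bool> find_live(const std::vector<Instr>& instrs, std::size_t num_temps)
{
   std::vector<std::uint32_t> uses(num_temps, 0);
   std::vector<bool> live(instrs.size(), false);
   for (std::size_t i = instrs.size(); i-- > 0;) {
      const Instr& instr = instrs[i];
      bool used = instr.has_side_effects;
      for (std::uint32_t def : instr.defs)
         used = used || uses[def] > 0;
      if (!used)
         continue;
      live[i] = true;
      for (std::uint32_t op : instr.ops)
         uses[op]++;
   }
   return live;
}
```

Without the assumption about loop header phis, a use could appear on a back edge after the definition had already been judged dead, and the walk would have to be repeated until nothing changes.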
Qiang Yu
67244fc88a
aco: remove p_end_with_regs from needs_exact()
...
The PS needs to handle WQM:
1. The main part may compute with args from the prolog in WQM mode, so
   the prolog needs to compute these args in WQM mode too.
2. The prolog and main part need to end with exact exec, so that the next
   shader part, which inherits the previous part's exec, does not
   do work for helper threads.
Point 1 needs p_end_with_regs to operate in WQM mode, so it can't
be exact itself; otherwise some move instructions added by it won't be
in WQM mode and the helper threads' results are not passed to the
next shader part as args.
Point 2 is done by the p_end_wqm that finish_program adds automatically
after p_end_with_regs.
Piglit tests that trigger the problem:
1. gl-2.1-polygon-stipple-fs
   a. the PS prolog calls discard_if
   b. the PS main part passes the WQM exec to the epilog
   c. the PS epilog exports color for discarded pixels
2. fs-fwidth-color.shader_test
   a. the PS prolog needs to pass args computed in WQM mode
   b. making p_end_with_regs exact would end WQM mode before
      the move instructions, so the helper threads' results are not
      passed to the next shader part
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973>
2023-10-10 02:36:33 +00:00
Daniel Schürmann
6eaf416f35
aco/insert_exec_mask: Simplify WQM handling (2/2)
...
by calculating WQM requirements on demand.
No fossil-db changes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038>
2023-09-14 09:25:23 +00:00
Daniel Schürmann
5f66723188
aco/insert_exec_mask: Simplify WQM handling (1/2)
...
by using p_end_wqm as indicator for when to end WQM mode.
Totals from 10049 (13.12% of 76572) affected shaders: (GFX11)
MaxWaves: 301126 -> 301136 (+0.00%)
Instrs: 7061909 -> 7049272 (-0.18%); split: -0.21%, +0.03%
CodeSize: 37720684 -> 37664244 (-0.15%); split: -0.18%, +0.03%
VGPRs: 357204 -> 357180 (-0.01%); split: -0.13%, +0.12%
Latency: 62757830 -> 62827080 (+0.11%); split: -0.06%, +0.17%
InvThroughput: 8589248 -> 8589963 (+0.01%); split: -0.02%, +0.02%
VClause: 132541 -> 132547 (+0.00%); split: -0.03%, +0.03%
SClause: 322916 -> 322964 (+0.01%); split: -0.04%, +0.05%
Copies: 546446 -> 547657 (+0.22%); split: -0.13%, +0.35%
Branches: 189527 -> 188293 (-0.65%)
PreSGPRs: 332792 -> 332529 (-0.08%); split: -0.08%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038>
2023-09-14 09:25:23 +00:00
Daniel Schürmann
45f6d38a76
aco: insert a single p_end_wqm after the last derivative calculation
...
This new instruction replaces p_wqm.
Totals from 28065 (36.65% of 76572) affected shaders: (GFX11)
MaxWaves: 823922 -> 823952 (+0.00%); split: +0.01%, -0.01%
Instrs: 22221375 -> 22180465 (-0.18%); split: -0.26%, +0.08%
CodeSize: 117310676 -> 117040684 (-0.23%); split: -0.30%, +0.07%
VGPRs: 1183476 -> 1186656 (+0.27%); split: -0.19%, +0.46%
SpillSGPRs: 2305 -> 2302 (-0.13%)
Latency: 176559310 -> 176427793 (-0.07%); split: -0.21%, +0.14%
InvThroughput: 26245204 -> 26195550 (-0.19%); split: -0.26%, +0.07%
VClause: 368058 -> 369460 (+0.38%); split: -0.21%, +0.59%
SClause: 857077 -> 842588 (-1.69%); split: -2.06%, +0.37%
Copies: 1245650 -> 1249434 (+0.30%); split: -0.33%, +0.63%
Branches: 394837 -> 396070 (+0.31%); split: -0.01%, +0.32%
PreSGPRs: 1019139 -> 1019567 (+0.04%); split: -0.02%, +0.06%
PreVGPRs: 925739 -> 931860 (+0.66%); split: -0.00%, +0.66%
Changes are due to scheduling and re-enabling cross-lane optimizations.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038>
2023-09-14 09:25:23 +00:00
Daniel Schürmann
0907b53740
aco/insert_exec_mask: set Exact mode after p_discard_if when necessary
...
Fixes: 5e9df85b1a ('aco: optimize discard_if when WQM is not needed afterwards')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038>
2023-09-14 09:25:22 +00:00
Rhys Perry
41b6020ff3
aco: remove fast path in insert_exec_mask's process_instructions
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038>
2023-09-14 09:25:22 +00:00
Samuel Pitoiset
37aa6d25e1
aco: ensure to initialize exec manually for non-monolithic {VS,TES}/GS on GFX9+
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24862>
2023-08-25 10:22:41 +00:00
Samuel Pitoiset
196b355db6
aco: ensure to initialize exec manually for VS as LS on GFX9+
...
This applies when VS and TCS are compiled separately with shader objects on GFX9+.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24697>
2023-08-25 07:22:04 +00:00
Qiang Yu
85d9646288
aco: add p_end_with_regs pseudo instruction
...
Used by radeonsi shader parts to pass args from one part to another.
It takes a variable number of operands, which places the wanted
values in fixed registers.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24442>
2023-08-16 02:27:45 +00:00
Timur Kristóf
05928f4200
aco: Use ac_hw_stage instead of aco-specific HWStage.
...
The new ac_hw_stage is going to be used by drivers as well.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23597>
2023-06-23 12:49:04 +00:00
Eric Engestrom
6b21653ab4
aco: reformat according to its .clang-format
...
Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23253>
2023-06-16 19:59:52 +00:00
Friedrich Vock
9de8134410
aco: Fix assert in insert_exec_mask
...
This assert would trigger on unconditional demotes, because the demotes
don't remove the mask_type_global flag from the exec mask.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23594>
2023-06-12 14:20:28 +00:00
Timur Kristóf
8e9d269da6
aco: Don't use nir_selection_control in aco_ir.
...
We don't want to rely on any NIR structures in ACO, because
we would like to avoid the need to include nir.h in aco_ir.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22241>
2023-04-10 20:01:28 +00:00
Daniel Schürmann
caec48529b
aco/insert_exec_mask: allow for disconnected CFG
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20853>
2023-03-12 18:07:18 +00:00
Timur Kristóf
81620fc7b0
aco: Enable constant exec mask based optimization on compute shaders.
...
We know for sure exec is initially -1 when the shader always has full subgroups.
Fossil DB stats on GFX11:
Totals from 3884 (2.88% of 134913) affected shaders:
SpillSGPRs: 1673 -> 1697 (+1.43%); split: -1.67%, +3.11%
SpillVGPRs: 2316 -> 2310 (-0.26%); split: -0.65%, +0.39%
CodeSize: 19584436 -> 19567156 (-0.09%); split: -0.13%, +0.04%
Scratch: 217088 -> 216832 (-0.12%)
Instrs: 3784596 -> 3780303 (-0.11%); split: -0.15%, +0.03%
Latency: 39971204 -> 39794967 (-0.44%); split: -0.47%, +0.03%
InvThroughput: 7885552 -> 7801247 (-1.07%); split: -1.14%, +0.07%
VClause: 74654 -> 74611 (-0.06%); split: -0.07%, +0.01%
SClause: 103139 -> 103043 (-0.09%); split: -0.13%, +0.04%
Copies: 279864 -> 281995 (+0.76%); split: -0.72%, +1.48%
Branches: 92082 -> 92084 (+0.00%); split: -0.03%, +0.03%
PreSGPRs: 155637 -> 149491 (-3.95%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20670>
2023-01-26 01:59:26 +00:00
Rhys Perry
c3dd1931d9
aco: allow Builder::Result to be dereferenced
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20251>
2023-01-10 16:01:38 +00:00
Samuel Pitoiset
bb90d29660
aco: add p_dual_src_export_gfx11 for dual source blending on GFX11
...
Dual source blending must be in strict WQM mode.
Cc: 22.3 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19643>
2022-11-16 18:35:10 +00:00
Timur Kristóf
d8639b7a80
aco: Allow explicitly removing jumps on GFX10+ when beneficial.
...
"Removing jumps" in ACO means skipping the jump instruction
at the beginning of a divergent branch (while still modifying exec).
ACO already supports implicitly removing jumps when it decides
that executing a branch with an empty exec mask is more beneficial
than a jump.
This commit adds the possibility to request this explicitly
through nir_selection_control. ACO will respect this
setting and remove the branch instructions when it is specified,
unless it decides that this would cause bugs (e.g. an exp instruction).
There are two cases that benefit from the new change:
1. When the application requests to "flatten" a branch (i.e.
remove control flow), we now respect that.
2. When the compiler stack determines that a divergent branch
is always taken.
v2 by Georg Lehmann: fixed applying sel_ctrl to else blocks
Fossil DB stats on Navi 21:
Totals from 13 (0.01% of 134906) affected shaders:
CodeSize: 136616 -> 136496 (-0.09%)
Instrs: 26196 -> 26166 (-0.11%)
Latency: 417928 -> 417889 (-0.01%)
Branches: 1241 -> 1211 (-2.42%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-By: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17921>
2022-10-11 15:42:54 +00:00
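Note on d8639b7a80: the decision described in the message can be sketched as a small predicate. This is an illustration only; SelectionControl, emit_jump and both boolean inputs are hypothetical names, and the real pass derives these facts from the IR rather than taking them as parameters.

```cpp
// Hypothetical mirror of the selection-control hints named in the message.
enum class SelectionControl {
   None,                 // no hint: the compiler decides
   Flatten,              // application asked to remove control flow
   DivergentAlwaysTaken, // compiler stack knows the branch is always taken
};

// Returns true if the jump at the start of a divergent branch should be kept.
// skipping_is_unsafe: the branch body must not run with an empty exec mask
//                     (the message gives exp instructions as an example).
// heuristic_prefers_no_jump: the existing implicit cost decision.
bool emit_jump(SelectionControl ctrl, bool skipping_is_unsafe,
               bool heuristic_prefers_no_jump)
{
   if (skipping_is_unsafe)
      return true; // correctness wins over any hint
   if (ctrl != SelectionControl::None)
      return false; // explicit request: remove the jump
   return !heuristic_prefers_no_jump; // implicit behaviour, unchanged
}
```

In every case where the jump is removed, exec is still modified, so lanes that did not take the branch stay inactive while its body executes.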