Daniel Schürmann
08560b8ff8
aco/lower_branches: stitch linear blocks if there is exactly one successor with one predecessor
...
Totals from 12906 (16.26% of 79395) affected shaders: (Navi31)
Instrs: 22051521 -> 22049488 (-0.01%); split: -0.01%, +0.00%
CodeSize: 116591240 -> 116583920 (-0.01%)
Latency: 196625178 -> 196538410 (-0.04%); split: -0.04%, +0.00%
InvThroughput: 33943045 -> 33930615 (-0.04%); split: -0.04%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477 >
2025-01-23 00:11:06 +00:00
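The stitching condition in the commit title above can be sketched roughly as follows. This is only an illustration: the `Block` struct and `try_stitch()` are hypothetical stand-ins, not ACO's actual `aco::Block` or the code in `lower_branches()`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified block type (assumption, not aco::Block).
struct Block {
   std::vector<std::string> instrs;
   std::vector<unsigned> preds; // indices of predecessor blocks
   std::vector<unsigned> succs; // indices of successor blocks
};

// Appends the sole successor's instructions to block `idx` when that
// successor has no other predecessor. Returns true if blocks were stitched.
bool try_stitch(std::vector<Block>& cfg, unsigned idx)
{
   Block& block = cfg[idx];
   if (block.succs.size() != 1 || block.succs[0] == idx)
      return false;

   const unsigned succ_idx = block.succs[0];
   Block& succ = cfg[succ_idx];
   if (succ.preds.size() != 1)
      return false;

   // Move the successor's instructions and outgoing edges into `block`.
   block.instrs.insert(block.instrs.end(), succ.instrs.begin(), succ.instrs.end());
   block.succs = succ.succs;
   for (unsigned s : block.succs) {
      for (unsigned& p : cfg[s].preds) {
         if (p == succ_idx)
            p = idx;
      }
   }

   // Leave the stitched block empty and disconnected.
   succ.instrs.clear();
   succ.preds.clear();
   succ.succs.clear();
   return true;
}
```

In this sketch the successor block is left empty rather than erased, so block indices stay stable.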
Daniel Schürmann
c90ae5f773
aco: delete aco_jump_threading.cpp
...
This is now handled by lower_branches().
Totals from 47236 (59.49% of 79395) affected shaders: (Navi31)
Instrs: 29490400 -> 29490507 (+0.00%)
CodeSize: 152316812 -> 152317248 (+0.00%); split: -0.00%, +0.00%
Latency: 229665459 -> 229665106 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 36870605 -> 36870504 (-0.00%); split: -0.00%, +0.00%
Copies: 1966751 -> 2233467 (+13.56%)
SALU: 3122941 -> 3123048 (+0.00%)
Note that only about 20 shaders are actually affected.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477 >
2025-01-23 00:11:06 +00:00
Daniel Schürmann
c677809f25
aco/lower_branches: allow for non-fallthrough loop exits in try_merge_break_with_continue()
...
Totals from 211 (0.27% of 79395) affected shaders: (Navi31)
Instrs: 276961 -> 276545 (-0.15%)
CodeSize: 1404356 -> 1402248 (-0.15%)
Latency: 1344722 -> 1344887 (+0.01%); split: -0.00%, +0.01%
InvThroughput: 165624 -> 165622 (-0.00%); split: -0.00%, +0.00%
Branches: 6149 -> 5987 (-2.63%)
SALU: 25722 -> 25468 (-0.99%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477 >
2025-01-23 00:11:06 +00:00
Daniel Schürmann
12656ea5f5
aco: move try_merge_break_with_continue() to lower_branches()
...
Totals from 3 (0.00% of 79395) affected shaders: (Navi31)
Instrs: 12888 -> 12882 (-0.05%)
Latency: 83253 -> 83246 (-0.01%)
InvThroughput: 9251 -> 9249 (-0.02%)
Branches: 483 -> 480 (-0.62%)
SALU: 1329 -> 1326 (-0.23%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477 >
2025-01-23 00:11:06 +00:00
Daniel Schürmann
13ad3db43f
aco/lower_branches: implement try_remove_simple_block() in lower_branches()
...
This is mostly the same as in jump_threading, but can handle
multiple predecessors.
Totals from 3523 (4.44% of 79395) affected shaders: (Navi31)
Instrs: 10244892 -> 10244753 (-0.00%); split: -0.00%, +0.00%
CodeSize: 54171500 -> 54168540 (-0.01%); split: -0.01%, +0.00%
Latency: 75070425 -> 75059570 (-0.01%); split: -0.02%, +0.00%
InvThroughput: 11606911 -> 11605767 (-0.01%); split: -0.01%, +0.00%
Branches: 331778 -> 331675 (-0.03%); split: -0.05%, +0.02%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477 >
2025-01-23 00:11:06 +00:00
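As a rough illustration of removing a simple block that has multiple predecessors, the sketch below bypasses a block containing only a branch by retargeting every incoming edge. The `Block` struct, the `only_branch` flag and `try_bypass_simple_block()` are assumptions made for this example and do not mirror the real `try_remove_simple_block()`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified block type (assumption, not aco::Block).
struct Block {
   bool only_branch = false; // block contains nothing but an unconditional branch
   std::vector<unsigned> preds;
   std::vector<unsigned> succs;
};

// Redirects all predecessors of block `idx` to its single successor.
// Duplicate edges (a predecessor that already targets the successor)
// are not merged in this sketch.
bool try_bypass_simple_block(std::vector<Block>& cfg, unsigned idx)
{
   Block& block = cfg[idx];
   if (!block.only_branch || block.succs.size() != 1 || block.succs[0] == idx)
      return false;

   const unsigned target = block.succs[0];
   for (unsigned pred : block.preds) {
      std::replace(cfg[pred].succs.begin(), cfg[pred].succs.end(), idx, target);
      cfg[target].preds.push_back(pred);
   }

   // The bypassed block is no longer a predecessor of the target.
   std::vector<unsigned>& target_preds = cfg[target].preds;
   target_preds.erase(std::remove(target_preds.begin(), target_preds.end(), idx),
                      target_preds.end());

   block.preds.clear();
   block.succs.clear();
   return true;
}
```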
Daniel Schürmann
2b5a893e29
aco/lower_branches: do eliminate_useless_exec_writes_in_block() during branch lowering.
...
Totals from 728 (0.92% of 79395) affected shaders: (Navi31)
Instrs: 452926 -> 452161 (-0.17%)
CodeSize: 2255536 -> 2252504 (-0.13%)
Latency: 1683404 -> 1683470 (+0.00%); split: -0.01%, +0.01%
InvThroughput: 210887 -> 210888 (+0.00%); split: -0.00%, +0.00%
SALU: 77865 -> 77106 (-0.97%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477 >
2025-01-23 00:11:06 +00:00
Daniel Schürmann
eecdb45d61
aco: consider s_cbranch_exec* instructions in needs_exec_mask()
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477 >
2025-01-23 00:11:06 +00:00
Daniel Schürmann
de1e38e214
aco/assembler: Find loop exits using the successor's loop nest depth
...
Previously, we just used the next block after a loop that
has a back-edge. This assumes that loop-exit blocks can
only be removed when falling through to the next block,
when in fact the exit can also be a jump to somewhere else,
in the future even to some block before the actual loop.
12 (0.02% of 79395) affected shaders.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477 >
2025-01-23 00:11:06 +00:00
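One way to picture the depth-based approach: an edge leaves a loop when its target has a smaller loop nest depth than its source, wherever the target sits in the block order. The sketch below uses a hypothetical, simplified `Block` with only a `loop_nest_depth` and successor list; `loop_exit_targets()` is an illustrative name, not the assembler's actual function.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified block type (assumption, not aco::Block).
struct Block {
   unsigned loop_nest_depth;
   std::vector<unsigned> succs; // indices of successor blocks
};

// Returns the successors of block `from` that lie outside the loop
// containing `from`, judged purely by loop nest depth.
std::vector<unsigned> loop_exit_targets(const std::vector<Block>& cfg, unsigned from)
{
   std::vector<unsigned> exits;
   for (unsigned succ : cfg[from].succs) {
      if (cfg[succ].loop_nest_depth < cfg[from].loop_nest_depth)
         exits.push_back(succ);
   }
   return exits;
}
```

Because the check looks only at the edge's endpoints, the exit target does not have to be the block that immediately follows the loop.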
Daniel Schürmann
29c63de062
aco/jump_threading: don't remove loop preheaders
...
They might be needed as a convergence point in order to
insert code (e.g. for loop alignment, wait states, etc.).
Totals from 1 (0.00% of 79395) affected shaders:
CodeSize: 12672 -> 12716 (+0.35%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477 >
2025-01-23 00:11:06 +00:00
Georg Lehmann
71cb394b02
aco: implement some more std::vector functions for small_vec
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33043 >
2025-01-17 09:25:48 +00:00
Georg Lehmann
31de188bc2
aco: support less trivial component types in small_vec
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33043 >
2025-01-17 09:25:48 +00:00
Georg Lehmann
15cba08db0
aco: guard small_vector move/copy operator against self assignment
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33043 >
2025-01-17 09:25:48 +00:00
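For readers unfamiliar with the pattern, a self-assignment guard usually looks like the sketch below. `IntBuffer` is a hypothetical owning container written for this example; it is not ACO's `small_vec`, whose implementation is not shown in this log.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical owning buffer (assumption, not aco::small_vec).
class IntBuffer {
public:
   IntBuffer(std::size_t n, int fill) : size_(n), data_(new int[n])
   {
      std::fill(data_, data_ + size_, fill);
   }
   IntBuffer(const IntBuffer& other) : size_(other.size_), data_(new int[other.size_])
   {
      std::copy(other.data_, other.data_ + size_, data_);
   }
   ~IntBuffer() { delete[] data_; }

   IntBuffer& operator=(const IntBuffer& other)
   {
      if (this == &other) // without this, we would read from freed memory
         return *this;
      delete[] data_;
      size_ = other.size_;
      data_ = new int[size_];
      std::copy(other.data_, other.data_ + size_, data_);
      return *this;
   }

   IntBuffer& operator=(IntBuffer&& other) noexcept
   {
      if (this == &other) // without this, we would free our own storage
         return *this;
      delete[] data_;
      data_ = other.data_;
      size_ = other.size_;
      other.data_ = nullptr;
      other.size_ = 0;
      return *this;
   }

   std::size_t size() const { return size_; }
   int operator[](std::size_t i) const { return data_[i]; }

private:
   std::size_t size_;
   int* data_;
};
```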
Marek Olšák
d160252270
ac: use Z_EXPORT_FORMAT=32_AR for Z + Alpha mrtz exports
...
This should be faster than 32_ABGR.
Also, stencil exports are changed from UINT16_ABGR to 32_GR,
which should have no effect on performance.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33046 >
2025-01-16 02:58:03 +00:00
Timur Kristóf
50035f0316
ac/nir: Move all ac_nir_* files to a new folder.
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32966 >
2025-01-14 13:46:30 +01:00
Timur Kristóf
305fdfddb5
ac/nir: Move ac_set_nir_options to ac_nir.c
...
And rename it to ac_nir_set_options to match other functions.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32966 >
2025-01-14 13:45:34 +01:00
Samuel Pitoiset
10e424f586
aco: always use ds_bpermute for shuffle/rotate on GFX12
...
ds_bpermute supports both 32 and 64 lanes now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32974 >
2025-01-13 08:33:38 +00:00
Rhys Perry
2b10930b48
aco: use VOP3 v_mov_b16 if necessary
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Backport-to: 24.3
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32944 >
2025-01-10 15:05:00 +00:00
Rhys Perry
46787fc2d0
aco/util: fix bit_reference::operator&=
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Backport-to: 24.3
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32944 >
2025-01-10 15:05:00 +00:00
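The commit message does not say what was wrong with the operator, so the following is only a sketch of the semantics a compound AND on a single-bit proxy is expected to have. `BitRef` is a hypothetical class for illustration, not ACO's `bit_reference`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical proxy for one bit of a 32-bit word (assumption, not
// aco's bit_reference).
class BitRef {
public:
   BitRef(uint32_t& word, unsigned bit) : word_(word), mask_(1u << bit) {}

   operator bool() const { return (word_ & mask_) != 0; }

   BitRef& operator=(bool value)
   {
      if (value)
         word_ |= mask_;
      else
         word_ &= ~mask_;
      return *this;
   }

   // bit &= value: the bit can only be cleared, never set, and no other
   // bit of the word may change.
   BitRef& operator&=(bool value)
   {
      if (!value)
         word_ &= ~mask_;
      return *this;
   }

private:
   uint32_t& word_;
   uint32_t mask_;
};
```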
Rhys Perry
8ac4744706
aco/tests: fix skip_lines=True with remaining characters in matches
...
If the remaining character check fails, we should try a later line if
skip_lines=True. So the check has to be done earlier.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32902 >
2025-01-08 15:28:37 +00:00
Friedrich Vock
71392fff25
aco: Fix dead instruction/index handling for try_insert_saveexec_out_of_loop
...
The loop checking if exec is overwritten didn't check for NULL
instructions, and didn't fix up reg write indices after inserting
instructions.
Fixes: fcd94a8c ("aco: move try_optimize_branching_sequence() to postRA optimizations")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32746 >
2025-01-08 10:48:01 +00:00
Samuel Pitoiset
f94bd67b82
aco: fix VS prologs on GFX12
...
MTBUF/MUBUF instructions must use zero for SOFFSET, use const_offset
instead.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32904 >
2025-01-07 13:44:32 +00:00
Marek Olšák
7fbca998b1
amd: optimize atomics before lowering intrinsics
...
ac_nir_lower_intrinsics_to_args will lower most system values.
I have to keep the divergence analysis in ACO, otherwise it goes haywire.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:56 +00:00
Marek Olšák
0d5b03f2b9
ac/nir: split local_invocation_ids to 3 separate VGPR inputs
...
so that we can set the upper range per VGPR.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
ceb6f8fc32
amd: lower load_tess_rel_patch_id/primitive_id/tess_coord and overwrite.. in NIR
...
The overwrite instruction complicates it a little, which is why these
intrinsics are lowered together.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
61bfb4fa06
amd: lower load_subgroup_invocation in NIR
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
e69f47faee
amd: lower load_local_invocation_index in NIR
...
This is the last intrinsic that needed the LS VGPR bug workaround in ACO
and ac_nir_to_llvm.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
342dcbdc8b
amd: lower load_vertex_id/instance_id and overwrite_vs_arguments in NIR
...
2 things complicate this:
- overwrite_vs_arguments_amd
- the LS VGPR bug workaround
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
66dd70adc5
amd: lower load_gs_wave_id_amd in NIR
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
923f59c971
amd: lower load_barycentric_at_offset in NIR
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
16ab05fad1
amd: lower load_barycentric_pixel/centroid/sample in NIR
...
radeonsi needs to preserve interp_mode in the arg load.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
7e83f6ca8b
amd: lower load_front_face in NIR
...
radeonsi must do this after si_lower_nir_abi, which optimizes front_face,
but doesn't lower it.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
6ad5225b2a
amd: lower load_frag_shading_rate in NIR
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
6d2e29ff6e
amd: lower load_sample_pos in NIR
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
110e474b4f
amd: lower load_sample_id in NIR
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
684c8da553
amd: lower load_invocation_id in NIR
...
ACO can't look for it because it's lowered there.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
d281240c57
amd: lower load_first_vertex/base_instance/draw_id/view_index in NIR
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
0d372b043b
amd: lower load_local_invocation_id in NIR
...
This is based on ACO.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
13cb5c7b72
amd: lower load_frag_coord in NIR
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Marek Olšák
58cb155068
amd: lower load_pixel_coord in NIR
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782 >
2025-01-02 17:36:55 +00:00
Georg Lehmann
272ff275fa
aco/insert_exec: reset top exec for p_discard_if
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12363
Fixes: 31f62a6123 ("aco/insert_exec: don't always reset top exec")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32830 >
2025-01-02 15:18:48 +00:00
Georg Lehmann
3da2d96bc5
aco/optimizer: fix signed extract of sub dword temps with SDWA
...
If an instruction didn't already use SDWA, convert_to_SDWA in apply_extract
will add ubyte0/uword0 selections for v1b/v2b operands. This loses the information
that the instruction doesn't care about the high bits and makes the next
apply_extract_twice fail.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 6cb9d39bc2 ("aco: combine extracts with sub-dword definitions")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32803 >
2025-01-02 09:33:18 +00:00
Timur Kristóf
01bf998e17
aco: Update documentation
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32766 >
2024-12-31 23:01:23 +00:00
Georg Lehmann
43fca7fffe
amd: support load_front_face_fsign
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32791 >
2024-12-30 22:31:35 +00:00
Georg Lehmann
aee0c7274c
amd: switch to FRONT_FACE_ALL_BITS(0)
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32791 >
2024-12-30 22:31:34 +00:00
Georg Lehmann
6a6b26dfa5
aco: create v_cmpx with s_andn2(exec, v_cmp)
...
Foz-DB Navi21:
Totals from 3928 (4.95% of 79395) affected shaders:
Instrs: 1155370 -> 1151154 (-0.36%)
CodeSize: 6332192 -> 6314616 (-0.28%)
Latency: 11955231 -> 11933281 (-0.18%); split: -0.18%, +0.00%
InvThroughput: 1842283 -> 1841822 (-0.03%); split: -0.03%, +0.00%
SALU: 175431 -> 171215 (-2.40%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32731 >
2024-12-30 13:05:23 +00:00
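A plausible reading of the title above is that an `s_andn2(exec, v_cmp)` pair can be replaced by one compare that writes exec directly, using the inverse condition. The lane-mask model below checks that identity for integer compares; the 8-lane width, the `v_cmp()` helper and both wrapper functions are assumptions for illustration and say nothing about how the optimizer actually matches the pattern.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <functional>

constexpr unsigned kLanes = 8; // toy wave size, chosen for readability
using LaneValues = std::array<int, kLanes>;

// Models a vector compare: one result bit per active lane, 0 for inactive lanes.
template <typename Cmp>
uint32_t v_cmp(uint32_t exec, const LaneValues& a, const LaneValues& b, Cmp cmp)
{
   uint32_t mask = 0;
   for (unsigned lane = 0; lane < kLanes; ++lane) {
      if (((exec >> lane) & 1u) && cmp(a[lane], b[lane]))
         mask |= 1u << lane;
   }
   return mask;
}

// Two instructions: compare, then exec = exec & ~result.
uint32_t andn2_of_cmp(uint32_t exec, const LaneValues& a, const LaneValues& b)
{
   return exec & ~v_cmp(exec, a, b, std::less<int>());
}

// One instruction: the inverse compare writes the new exec mask directly.
uint32_t cmpx_inverse(uint32_t exec, const LaneValues& a, const LaneValues& b)
{
   return v_cmp(exec, a, b, std::greater_equal<int>());
}
```

The model uses integers on purpose: for floating-point compares, NaN handling determines which opcode is the true inverse.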
Georg Lehmann
42512208d8
aco/insert_exec: exit shader using exec for top level discard
...
Totals from 14538 (18.31% of 79395) affected shaders:
no changes
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32731 >
2024-12-30 13:05:23 +00:00
Georg Lehmann
6b35d6f75b
aco: allow p_exit_early_if_not with exec condition
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32731 >
2024-12-30 13:05:23 +00:00
Georg Lehmann
c279e63a79
aco: rename p_early_exit_if to if_not
...
It exits the shader if the condition is false, not true.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32731 >
2024-12-30 13:05:23 +00:00
Georg Lehmann
33a73203b0
aco/isel: skip and(exec) for top level demote_if/terminate_if
...
In nested control flow this is necessary to not demote/terminate invocations
that are part of the global but not part of the local mask.
At the top level, the masks are the same and no additional invocations
can be accidentally disabled.
Foz-DB Navi21:
Totals from 2095 (2.64% of 79395) affected shaders:
Instrs: 1058326 -> 1056839 (-0.14%)
CodeSize: 5632480 -> 5626616 (-0.10%)
Latency: 12082761 -> 12080520 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 2246677 -> 2246636 (-0.00%); split: -0.00%, +0.00%
Copies: 114446 -> 114433 (-0.01%)
SALU: 230585 -> 229098 (-0.64%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32755 >
2024-12-26 18:34:38 +00:00
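The mask argument above can be checked with plain bit operations. In this sketch, `global`, `local_exec` and `cond` are 64-bit lane masks and both function names are made up for the example; it models only the reasoning in the commit message, not the instruction selection code.

```cpp
#include <cassert>
#include <cstdint>

// Demote with the extra AND: only lanes active in the local mask may be
// removed from the global mask.
uint64_t demote_with_and(uint64_t global, uint64_t local_exec, uint64_t cond)
{
   return global & ~(cond & local_exec);
}

// Demote without the AND: every lane whose condition bit is set is removed.
uint64_t demote_without_and(uint64_t global, uint64_t cond)
{
   return global & ~cond;
}
```

With `global = 0xFF`, `local_exec = 0x0F` and `cond = 0x3C`, skipping the AND would also remove lanes 4 and 5, which are not locally active. When `local_exec == global`, as at the top level, both functions give the same result.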
Marek Olšák
de996ac481
radeonsi: kill Z and stencil PS outputs if depth or stencil is disabled
...
This adds kill_z and kill_stencil flags to the shader PS epilog key, which
removes those outputs if depth or stencil are disabled.
It must be implemented in:
* ACO PS epilog
* LLVM PS epilog
* ac_nir_lower_ps for monolithic shaders
Some of the samplemask code wasn't completely correct, but probably harmless.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32713 >
2024-12-24 12:02:20 +00:00