Commit graph

3539 commits

Author SHA1 Message Date
Georg Lehmann
4b332afb32 aco/sched_ilp: add dependencies of later clause instrs more aggressively
Foz-DB GFX1150:
Totals from 22246 (28.03% of 79377) affected shaders:
Instrs: 22689053 -> 22684012 (-0.02%); split: -0.06%, +0.03%
CodeSize: 117622416 -> 117603292 (-0.02%); split: -0.04%, +0.03%
Latency: 182725630 -> 182702465 (-0.01%); split: -0.06%, +0.05%
InvThroughput: 37963256 -> 37956961 (-0.02%); split: -0.03%, +0.02%
VClause: 471019 -> 467248 (-0.80%)
SClause: 592620 -> 590034 (-0.44%); split: -0.44%, +0.01%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Daniel Schürmann <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33111>
2025-01-27 11:59:45 +00:00
Georg Lehmann
ea514e9385 aco/sched_ilp: continue open clauses
Foz-DB GFX1150:
Totals from 13789 (17.37% of 79395) affected shaders:
Instrs: 16567395 -> 16570832 (+0.02%); split: -0.03%, +0.05%
CodeSize: 85737492 -> 85751072 (+0.02%); split: -0.02%, +0.04%
Latency: 140988872 -> 140831767 (-0.11%); split: -0.12%, +0.01%
InvThroughput: 29639206 -> 29614890 (-0.08%); split: -0.09%, +0.00%
VClause: 347065 -> 343779 (-0.95%); split: -0.96%, +0.01%
SClause: 424881 -> 418657 (-1.46%); split: -1.48%, +0.02%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Daniel Schürmann <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33111>
2025-01-27 11:59:45 +00:00
Georg Lehmann
997ea2e273 aco: update is_dual_issue_capable for gfx11.5+
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Daniel Schürmann <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33111>
2025-01-27 11:59:45 +00:00
Rhys Perry
e18e293e6c aco: don't use divergence information for most ALU defs
If one of the sources are divergent but all are SGPRs, then this can
cause issues (for example, imul with two SGPR sources and a VGPR definition).

No fossil-db changes (navi21)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12454
Backport-to: 24.3
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32997>
2025-01-27 10:48:07 +00:00
Marek Olšák
f98613d47c aco: implement replacement of sample_mask_in with helper_invocation in PS prolog
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33024>
2025-01-25 12:20:26 -05:00
Marek Olšák
a842f198d7 aco: simplify how broadcast_last_cbuf is implemented in PS epilog
So PS epilogs only need a single bool flag that determines whether all
enabled color buffers should be written.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33024>
2025-01-25 12:20:26 -05:00
Marek Olšák
5c4f737b84 aco: implement replacing frag_coord with pixel_coord in PS prolog
This adds an option to replace frag_coord.xy with pixel_coord when sample
shading is disabled, which is most of the time. This reduces the number of
input VGPRs.

It's already implement in ac_nir_lower_ps_early for monolithic shaders.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33024>
2025-01-25 12:20:26 -05:00
Marek Olšák
d7d4d56f5b ac,aco,radeonsi: replace SampleMaskIn with 1 << SampleID if full sample shading
Since the sample mask is always 1 << sample_id with full sample shading,
just use that instead of loading sample_mask_in. Set it to 0 if it's
a helper invocation. This removes the sample mask input VGPR.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33024>
2025-01-25 12:20:25 -05:00
Rhys Perry
0eb5f66660 nir/validate: validate ssa dominance by default
This no longer modifies dominance metadata, so enable it by default.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32005>
2025-01-23 23:35:44 +00:00
Daniel Schürmann
08560b8ff8 aco/lower_branches: stitch linear blocks if there is exactly one successor with one predecessor
Totals from 12906 (16.26% of 79395) affected shaders: (Navi31)

Instrs: 22051521 -> 22049488 (-0.01%); split: -0.01%, +0.00%
CodeSize: 116591240 -> 116583920 (-0.01%)
Latency: 196625178 -> 196538410 (-0.04%); split: -0.04%, +0.00%
InvThroughput: 33943045 -> 33930615 (-0.04%); split: -0.04%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>
2025-01-23 00:11:06 +00:00
Daniel Schürmann
c90ae5f773 aco: delete aco_jump_threading.cpp
This is now handled by lower_branches().

Totals from 47236 (59.49% of 79395) affected shaders: (Navi31)
Instrs: 29490400 -> 29490507 (+0.00%)
CodeSize: 152316812 -> 152317248 (+0.00%); split: -0.00%, +0.00%
Latency: 229665459 -> 229665106 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 36870605 -> 36870504 (-0.00%); split: -0.00%, +0.00%
Copies: 1966751 -> 2233467 (+13.56%)
SALU: 3122941 -> 3123048 (+0.00%)

Note, that only about 20 shaders are actually affected.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>
2025-01-23 00:11:06 +00:00
Daniel Schürmann
c677809f25 aco/lower_branches: allow for non-fallthrough loop exits in try_merge_break_with_continue()
Totals from 211 (0.27% of 79395) affected shaders: (Navi31)

Instrs: 276961 -> 276545 (-0.15%)
CodeSize: 1404356 -> 1402248 (-0.15%)
Latency: 1344722 -> 1344887 (+0.01%); split: -0.00%, +0.01%
InvThroughput: 165624 -> 165622 (-0.00%); split: -0.00%, +0.00%
Branches: 6149 -> 5987 (-2.63%)
SALU: 25722 -> 25468 (-0.99%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>
2025-01-23 00:11:06 +00:00
Daniel Schürmann
12656ea5f5 aco: move try_merge_break_with_continue() to lower_branches()
Totals from 3 (0.00% of 79395) affected shaders: (Navi31)

Instrs: 12888 -> 12882 (-0.05%)
Latency: 83253 -> 83246 (-0.01%)
InvThroughput: 9251 -> 9249 (-0.02%)
Branches: 483 -> 480 (-0.62%)
SALU: 1329 -> 1326 (-0.23%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>
2025-01-23 00:11:06 +00:00
Daniel Schürmann
13ad3db43f aco/lower_branches: implement try_remove_simple_block() in lower_branches()
This is mostly the same as in jump_threading, but can handle
multiple predecessors.

Totals from 3523 (4.44% of 79395) affected shaders: (Navi31)

Instrs: 10244892 -> 10244753 (-0.00%); split: -0.00%, +0.00%
CodeSize: 54171500 -> 54168540 (-0.01%); split: -0.01%, +0.00%
Latency: 75070425 -> 75059570 (-0.01%); split: -0.02%, +0.00%
InvThroughput: 11606911 -> 11605767 (-0.01%); split: -0.01%, +0.00%
Branches: 331778 -> 331675 (-0.03%); split: -0.05%, +0.02%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>
2025-01-23 00:11:06 +00:00
Daniel Schürmann
2b5a893e29 aco/lower_branches: do eliminate_useless_exec_writes_in_block() during branch lowering.
Totals from 728 (0.92% of 79395) affected shaders: (Navi31)

Instrs: 452926 -> 452161 (-0.17%)
CodeSize: 2255536 -> 2252504 (-0.13%)
Latency: 1683404 -> 1683470 (+0.00%); split: -0.01%, +0.01%
InvThroughput: 210887 -> 210888 (+0.00%); split: -0.00%, +0.00%
SALU: 77865 -> 77106 (-0.97%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>
2025-01-23 00:11:06 +00:00
Daniel Schürmann
eecdb45d61 aco: consider s_cbranch_exec* instructions in needs_exec_mask()
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>
2025-01-23 00:11:06 +00:00
Daniel Schürmann
de1e38e214 aco/assembler: Find loop exits using the successor's loop nest depth
Previously, we just used the next block after a loop that
has a back-edge. This assumes that loop-exit blocks can
only be removed when falling through to the next block,
when in fact it can also be a jump to somewhere else,
in future even to some block before the actual loop.

12 (0.02% of 79395) affected shaders.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>
2025-01-23 00:11:06 +00:00
Daniel Schürmann
29c63de062 aco/jump_threading: don't remove loop preheaders
They might be needed as convergence point in order to
insert code (e.g. for loop alignment, wait states, etc.).

Totals from 1 (0.00% of 79395) affected shaders:

CodeSize: 12672 -> 12716 (+0.35%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>
2025-01-23 00:11:06 +00:00
Georg Lehmann
71cb394b02 aco: implement some more std::vector functions for small_vec
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33043>
2025-01-17 09:25:48 +00:00
Georg Lehmann
31de188bc2 aco: support less trivial component types in small_vec
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33043>
2025-01-17 09:25:48 +00:00
Georg Lehmann
15cba08db0 aco: guard small_vector move/copy operator against self assignment
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33043>
2025-01-17 09:25:48 +00:00
Marek Olšák
d160252270 ac: use Z_EXPORT_FORMAT=32_AR for Z + Alpha mrtz exports
This should be faster than 32_ABGR.

Also, stencil exports are changed from UINT16_ABGR to 32_GR,
which should have no effect on performance.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33046>
2025-01-16 02:58:03 +00:00
Timur Kristóf
50035f0316 ac/nir: Move all ac_nir_* files to a new folder.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32966>
2025-01-14 13:46:30 +01:00
Timur Kristóf
305fdfddb5 ac/nir: Move ac_set_nir_options to ac_nir.c
And rename it to ac_nir_set_options to match other functions.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32966>
2025-01-14 13:45:34 +01:00
Samuel Pitoiset
10e424f586 aco: always use ds_bpermute for shuffle/rotate on GFX12
ds_bpermute supports both 32 and 64 lanes now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32974>
2025-01-13 08:33:38 +00:00
Rhys Perry
2b10930b48 aco: use VOP3 v_mov_b16 if necessary
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Backport-to: 24.3
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32944>
2025-01-10 15:05:00 +00:00
Rhys Perry
46787fc2d0 aco/util: fix bit_reference::operator&=
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Backport-to: 24.3
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32944>
2025-01-10 15:05:00 +00:00
Rhys Perry
8ac4744706 aco/tests: fix skip_lines=True with remaining characters in matches
If the remaining character check fails, we should try a later line if
skip_lines=True. So the check has to be done earlier.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32902>
2025-01-08 15:28:37 +00:00
Friedrich Vock
71392fff25 aco: Fix dead instruction/index handling for try_insert_saveexec_out_of_loop
The loop checking if exec is overwritten didn't check for NULL
instructions, and didn't fix up reg write indices after inserting
instructions.

Fixes: fcd94a8c ("aco: move try_optimize_branching_sequence() to postRA optimizations")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32746>
2025-01-08 10:48:01 +00:00
Samuel Pitoiset
f94bd67b82 aco: fix VS prologs on GFX12
MTBUF/MUBUF instructions must use zero for SOFFSET, use const_offset
instead.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32904>
2025-01-07 13:44:32 +00:00
Marek Olšák
7fbca998b1 amd: optimize atomics before lowering intrinsics
ac_nir_lower_intrinsics_to_args will lower most system values.

I have to keep the divergence analysis in ACO, otherwise it goes haywire.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:56 +00:00
Marek Olšák
0d5b03f2b9 ac/nir: split local_invocation_ids to 3 separate VGPR inputs
so that we can set the upper range per VGPR.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
ceb6f8fc32 amd: lower load_tess_rel_patch_id/primitive_id/tess_coord and overwrite.. in NIR
The overwrite instruction complicates it a little, which is why these
intrinsics are lowered together.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
61bfb4fa06 amd: lower load_subgroup_invocation in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
e69f47faee amd: lower load_local_invocation_index in NIR
This is the last intrinsic that needed the LS VGPR bug workaround in ACO
and ac_nir_to_llvm.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
342dcbdc8b amd: lower load_vertex_id/instance_id and overwrite_vs_arguments in NIR
2 things complicate this:
- overwrite_vs_arguments_amd
- the LS VGPR bug workaround

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
66dd70adc5 amd: lower load_gs_wave_id_amd in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
923f59c971 amd: lower load_barycentric_at_offset in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
16ab05fad1 amd: lower load_barycentric_pixel/centroid/sample in NIR
radeonsi needs to preserve interp_mode in the arg load.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
7e83f6ca8b amd: lower load_front_face in NIR
radeonsi must do this after si_lower_nir_abi, which optimizes front_face,
but doesn't lower it.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
6ad5225b2a amd: lower load_frag_shading_rate in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
6d2e29ff6e amd: lower load_sample_pos in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
110e474b4f amd: lower load_sample_id in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
684c8da553 amd: lower load_invocation_id in NIR
ACO can't look for it because it's lowered there.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
d281240c57 amd: lower load_first_vertex/base_instance/draw_id/view_index in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
0d372b043b amd: lower load_local_invocation_id in NIR
This is based on ACO.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
13cb5c7b72 amd: lower load_frag_coord in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
58cb155068 amd: lower load_pixel_coord in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Georg Lehmann
272ff275fa aco/insert_exec: reset top exec for p_discard_if
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12363
Fixes: 31f62a6123 ("aco/insert_exec: don't always reset top exec")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32830>
2025-01-02 15:18:48 +00:00
Georg Lehmann
3da2d96bc5 aco/optimizer: fix signed extract of sub dword temps with SDWA
If an instruction didn't already use SDWA convert_to_SDWA in apply_extract
will add ubyte0/uword0 selections for v1b/v2b operands. This loses information
that the instruction doesn't care about the high bits and makes the next
apply_extract_twice fail.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

Fixes: 6cb9d39bc2 ("aco: combine extracts with sub-dword definitions")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32803>
2025-01-02 09:33:18 +00:00