Daniel Schürmann
dce695b24f
aco: refactor and speed-up dead code analysis
...
Assuming that no loop header phis are dead code,
we can perform the dead code analysis in a single iteration.
Totals from 25 (0.03% of 79330) affected shaders: (GFX11)
MaxWaves: 664 -> 662 (-0.30%)
Instrs: 487618 -> 488822 (+0.25%)
CodeSize: 2451548 -> 2459756 (+0.33%)
VGPRs: 1296 -> 1332 (+2.78%)
Latency: 2337256 -> 2338098 (+0.04%); split: -0.00%, +0.04%
InvThroughput: 560682 -> 576158 (+2.76%)
VClause: 15782 -> 15790 (+0.05%)
Copies: 37905 -> 38731 (+2.18%)
PreVGPRs: 1124 -> 1156 (+2.85%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26901 >
2024-01-08 09:43:53 +00:00
Daniel Schürmann
023e78b4d7
aco: add new post-RA scheduler for ILP
...
Totals from 77247 (97.37% of 79330) affected shaders: (GFX11)
Instrs: 44371374 -> 43215723 (-2.60%); split: -2.64%, +0.03%
CodeSize: 227819532 -> 223188224 (-2.03%); split: -2.06%, +0.03%
Latency: 301016823 -> 290147626 (-3.61%); split: -3.70%, +0.09%
InvThroughput: 48551749 -> 47646212 (-1.87%); split: -1.88%, +0.01%
VClause: 870581 -> 834655 (-4.13%); split: -4.13%, +0.00%
SClause: 1487061 -> 1340851 (-9.83%); split: -9.83%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25676 >
2024-01-06 11:30:42 +00:00
Daniel Schürmann
72a5c659d4
aco: form clauses for LDS instructions
...
No fossil-db changes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25676 >
2024-01-06 11:30:42 +00:00
Daniel Schürmann
8f16745821
aco: fix should_form_clause() for memory instructions without operands
...
In particular, this applies to s_memtime and s_memrealtime.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25676 >
2024-01-06 11:30:41 +00:00
Rhys Perry
ae54cbeb3f
nir: remove sad_u8x4
...
All uses of this can be replaced with msad_4x8.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26907 >
2024-01-05 18:55:22 +00:00
Rhys Perry
1410735a62
aco: implement msad_4x8
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26907 >
2024-01-05 18:55:22 +00:00
Rhys Perry
3009dcd102
aco: correctly set min/max_subgroup_size for wave32-as-wave64
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26894 >
2024-01-05 17:35:48 +00:00
Friedrich Vock
1e3541728b
radv,aco: Convert 1D ray launches to 2D
...
Because we use unaligned dispatches, 1D launches only use 8 threads per
wave. Converting to 2D and fixing up launch IDs in the prolog
significantly increases occupancy.
Gives ~30% uplift in Ghostwire Tokyo.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26105 >
2024-01-05 17:08:05 +00:00
Georg Lehmann
71edf4de5e
aco/gfx12: implement broadcast dmask shrink behavior
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26897 >
2024-01-05 12:03:54 +00:00
Georg Lehmann
4a6ee2c483
aco: shrink buffer stores with undef/zero components
...
Buffer stores store 0 like image stores for unspecified components.
Foz-DB Navi21:
Totals from 91 (0.11% of 79330) affected shaders:
Instrs: 63327 -> 63121 (-0.33%)
CodeSize: 315312 -> 314440 (-0.28%); split: -0.28%, +0.00%
VGPRs: 3144 -> 3120 (-0.76%)
Latency: 441424 -> 441300 (-0.03%); split: -0.03%, +0.00%
InvThroughput: 65501 -> 65130 (-0.57%)
Copies: 6197 -> 5999 (-3.20%)
PreVGPRs: 2197 -> 2182 (-0.68%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26897 >
2024-01-05 12:03:54 +00:00
Rhys Perry
cad2c0915d
aco/tests: use more raw strings
...
Python 3.12 started giving a SyntaxWarning for unrecognized escapes such
as "\w". This might become a SyntaxError in a future python version.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26850 >
2024-01-03 13:33:52 +00:00
Georg Lehmann
9ecfd7919b
aco: optimize 32bit fsign by using fmulz with Inf
...
2 instruction fsign with the power of cursed DX9 floating point rules.
Foz-DB Navi31:
Totals from 3803 (4.86% of 78196) affected shaders:
Instrs: 8436366 -> 8412549 (-0.28%); split: -0.29%, +0.00%
CodeSize: 43174284 -> 43114676 (-0.14%); split: -0.14%, +0.01%
SpillSGPRs: 3241 -> 3247 (+0.19%)
Latency: 66333841 -> 66287361 (-0.07%); split: -0.08%, +0.01%
InvThroughput: 10331902 -> 10316916 (-0.15%); split: -0.15%, +0.01%
VClause: 165455 -> 165472 (+0.01%); split: -0.01%, +0.02%
SClause: 242352 -> 242335 (-0.01%); split: -0.02%, +0.01%
Copies: 604086 -> 605781 (+0.28%); split: -0.04%, +0.32%
Branches: 214017 -> 214013 (-0.00%)
PreSGPRs: 209413 -> 209726 (+0.15%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26765 >
2024-01-02 13:07:30 +01:00
Eric Engestrom
7e8db6aedf
meson: always define {,DRAW_}LLVM_AVAILABLE one way or the other
...
With the usual benefits of `#if` instead of `#ifdef` (mostly the fact
that typos can be build failures instead of silently being interpreted
as if 0).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3863 >
2023-12-24 10:01:39 +00:00
Daniel Schürmann
42e9ba1c70
aco: remove VCCZ and EXECZ register handling
...
We don't use these registers and since RDNA3 removed the explicit usage,
it is unlikely that we will properly support them in the future.
Removing the registers from the ACO IR prevents accidentally using them
without proper support.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26664 >
2023-12-14 20:08:28 +00:00
Daniel Schürmann
dd7b6898e6
radv: fix number of physical SGPRs on GFX10+
...
This change has no effect.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26521 >
2023-12-11 10:39:51 +00:00
Daniel Schürmann
5ebba87772
aco: rename max_wave64_per_simd -> max_waves_per_simd
...
and update usage. Changes are because the scheduler targets
a different number of waves.
Totals from 195 (0.25% of 79330) affected shaders: (GFX11)
MaxWaves: 3120 -> 3108 (-0.38%)
Instrs: 71202 -> 71070 (-0.19%); split: -0.27%, +0.09%
CodeSize: 383272 -> 382828 (-0.12%); split: -0.21%, +0.10%
VGPRs: 7392 -> 7752 (+4.87%)
Latency: 2280141 -> 2262487 (-0.77%); split: -0.79%, +0.02%
InvThroughput: 4759022 -> 5725442 (+20.31%); split: -0.01%, +20.32%
VClause: 1737 -> 1741 (+0.23%); split: -3.11%, +3.34%
SClause: 2385 -> 2376 (-0.38%); split: -0.80%, +0.42%
Copies: 5257 -> 5274 (+0.32%); split: -0.25%, +0.57%
Branches: 1213 -> 1212 (-0.08%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26521 >
2023-12-11 10:39:50 +00:00
Samuel Pitoiset
8b87c985b0
radv: prepare the PS epilog key for exporting MRTZ on RDNA3
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26413 >
2023-12-06 11:49:31 +00:00
Samuel Pitoiset
81eeb157f8
aco: export depth/stencil/samplemask in create_fs_jump_to_epilog()
...
This currently has no effects because the store_output instructions
are removed earlier (in ac_nir_lower_ps). Though, this will be needed
for exporting MRTZ from PS epilogs for alpha to coverage on RDNA3.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26413 >
2023-12-06 11:49:31 +00:00
Qiang Yu
7656251294
aco: fix set_wqm segfault when ps prolog
...
ps prolog does not have nir shader.
Fixes: 3b10547e67 ("aco: enable helper lanes if shader->info.fs.require_full_quads")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26512 >
2023-12-06 05:34:30 +00:00
Rhys Perry
e110eac171
aco: insert p_end_wqm before p_jump_to_epilog
...
Otherwise, we can transition to exact before p_jump_to_epilog, then
transition to WQM again and then back to exact:
p_jump_to_epilog //transitions to exact
p_logical_end //transitions to wqm
p_end_wqm //transitions to exact
We rely on ssa elimination to clean most of this up.
fossil-db (navi21):
Totals from 1 (0.00% of 79330) affected shaders:
Instrs: 111 -> 110 (-0.90%)
CodeSize: 572 -> 568 (-0.70%)
Copies: 16 -> 15 (-6.25%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25440 >
2023-12-05 21:02:04 +00:00
Rhys Perry
7a37a39fe0
aco: simplify v_mul_* labelling slightly
...
This was from before VALU_instruction existed.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26445 >
2023-12-05 16:56:58 +00:00
Rhys Perry
468ee8b80c
aco: implement 16-bit fsat on GFX8
...
GFX8 doesn't have v_med3_f16.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26445 >
2023-12-05 16:56:58 +00:00
Rhys Perry
de51a21e26
aco: implement 16-bit derivatives
...
These are used by radeonsi.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26445 >
2023-12-05 16:56:58 +00:00
Rhys Perry
997a0884a5
aco: implement 16-bit fsign on GFX8
...
GFX8 doesn't have v_med3_i16.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26445 >
2023-12-05 16:56:58 +00:00
Rhys Perry
b7725b072b
aco: flush denormals for 16-bit fmin/fmax on GFX8
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26445 >
2023-12-05 16:56:57 +00:00
Georg Lehmann
4b9618ceec
aco: add test for post-ra DPP clobbered in linear cfg
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26373 >
2023-11-28 12:48:56 +00:00
Georg Lehmann
576afa8540
aco: don't optimize DPP across more than one block
...
Register write tracking doesn't work for inactive lanes, so this was unsafe.
Foz-DB Navi31:
Totals from 8 (0.01% of 78196) affected shaders:
Instrs: 11513 -> 11515 (+0.02%)
CodeSize: 61056 -> 61064 (+0.01%)
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10197
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26373 >
2023-11-28 12:48:56 +00:00
Daniel Schürmann
3b10547e67
aco: enable helper lanes if shader->info.fs.require_full_quads
...
This enables helper invocations also for lowered quad group operations.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26026 >
2023-11-22 11:32:53 +01:00
Georg Lehmann
0a5d3ac8d2
aco/sched: treat p_dual_src_export_gfx11 like export
...
This prevents the scheduler from moving the dual source export above mrtz
export, which caused hangs.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10173
Cc: mesa-stable
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26317 >
2023-11-21 18:11:45 +00:00
Samuel Pitoiset
e1345c5295
aco: rename color_exports to exports in create_fs_jump_to_epilog()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26231 >
2023-11-21 08:47:50 +00:00
Qiang Yu
1fabf535fa
aco: handle GL_TEXTURE_RECTANGLE in tg4_integer_workarounds
...
Ported from LLVM side lower_gather4_integer().
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26244 >
2023-11-20 02:59:23 +00:00
Qiang Yu
695fc67baa
aco: set MIMG unrm for GL_TEXTURE_RECTANGLE
...
This fixes VDPAU compositor shaders compiled by ACO.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26244 >
2023-11-20 02:59:23 +00:00
Qiang Yu
dbbf566588
aco,ac/llvm,radeonsi: lower f2f16 to f2f16_rtz in nir
...
No need to handle f2f16 specially for OpenGL, and we can vectorize
f2f16 when using ACO.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25990 >
2023-11-20 02:20:17 +00:00
Qiang Yu
5932990e08
aco,radv: add aco_is_nir_op_support_packed_math_16bit
...
To be shared by radeonsi and radv.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25990 >
2023-11-20 02:20:17 +00:00
Daniel Schürmann
f2bb7b185d
aco: delete instruction selection for boolean subgroup operations
...
These are now lowered in NIR.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/218 >
2023-11-17 09:45:40 +00:00
Daniel Schürmann
88afbbba11
nir: optimize open-coded quadVote* directly to new nir_quad intrinsics
...
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/218 >
2023-11-17 09:45:40 +00:00
Connor Abbott
387e698bde
amd: Implement quad_vote intrinsics
...
Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/218 >
2023-11-17 09:45:40 +00:00
Samuel Pitoiset
d679d12359
aco: remove useless nir_intrinsic_load_force_vrs_rates_amd
...
It's lowered earlier.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26248 >
2023-11-17 08:39:48 +00:00
Rhys Perry
ae30edd2a7
aco: remove f16<->f64 conversions
...
radeonsi and RADV now use nir_lower_fp16_casts.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25566 >
2023-11-16 11:02:31 +00:00
Georg Lehmann
b12d7f10d4
aco: validate ALU operands and defs
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163 >
2023-11-15 12:35:32 +00:00
Georg Lehmann
91539713bb
aco: add src/def count and size for all ALU opcodes
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163 >
2023-11-15 12:35:32 +00:00
Georg Lehmann
d9c3ba3b90
aco: use correct operand size for int tg4 wa
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163 >
2023-11-15 12:35:32 +00:00
Georg Lehmann
1d167d187e
aco/gfx10+: don't use v_cmpx with VCC def
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163 >
2023-11-15 12:35:32 +00:00
Georg Lehmann
509ce19643
aco: add missing scc def for SALU quad broadcast
...
Cc: mesa-stable
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163 >
2023-11-15 12:35:32 +00:00
Georg Lehmann
18f6c2328f
aco: use lm for carry out in vsub32
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163 >
2023-11-15 12:35:32 +00:00
Georg Lehmann
9acd9c0100
aco/tests: use correct operand size for some 64bit ops
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163 >
2023-11-15 12:35:32 +00:00
Georg Lehmann
6a136b4e05
aco/tests: add some missing scc defs
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163 >
2023-11-15 12:35:32 +00:00
Georg Lehmann
2f4e53b22a
aco: fix detecting sgprs read by SMEM hazard
...
s_waitcnt_lgkmcnt is SOPK, not SOPP and there are other SOPK instructions
that don't mitigate the hazard.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163 >
2023-11-15 12:35:32 +00:00
Georg Lehmann
e49c413a86
aco: use null operand for SOPK s_waitcnt
...
Both null def and op result in the same correct encoding, but these
instructions optionally read a sgpr, so it makes more sense to use an operand.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163 >
2023-11-15 12:35:32 +00:00
Rhys Perry
b70c235e4a
aco: skip LS VGPR initialization bug workaround if the prolog exists
...
Otherwise, we would do this twice.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26111 >
2023-11-13 12:09:55 +00:00