Commit graph

2816 commits

Author SHA1 Message Date
Samuel Pitoiset
8b87c985b0 radv: prepare the PS epilog key for exporting MRTZ on RDNA3
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26413>
2023-12-06 11:49:31 +00:00
Samuel Pitoiset
81eeb157f8 aco: export depth/stencil/samplemask in create_fs_jump_to_epilog()
This currently has no effect because the store_output instructions
are removed earlier (in ac_nir_lower_ps). However, this will be needed
for exporting MRTZ from PS epilogs for alpha to coverage on RDNA3.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26413>
2023-12-06 11:49:31 +00:00
Qiang Yu
7656251294 aco: fix set_wqm segfault when ps prolog
The PS prolog does not have a NIR shader.

Fixes: 3b10547e67 ("aco: enable helper lanes if shader->info.fs.require_full_quads")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26512>
2023-12-06 05:34:30 +00:00
Rhys Perry
e110eac171 aco: insert p_end_wqm before p_jump_to_epilog
Otherwise, we can transition to exact before p_jump_to_epilog, then
transition to WQM again and then back to exact:
p_jump_to_epilog //transitions to exact
p_logical_end //transitions to wqm
p_end_wqm //transitions to exact

We rely on ssa elimination to clean most of this up.

fossil-db (navi21):
Totals from 1 (0.00% of 79330) affected shaders:
Instrs: 111 -> 110 (-0.90%)
CodeSize: 572 -> 568 (-0.70%)
Copies: 16 -> 15 (-6.25%)
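
The percentages in these fossil-db summaries appear to be the usual relative change, (after - before) / before, rounded to two decimals. The figures above are consistent with that formula. Below is a small C++ sketch of the calculation. The function names are illustrative and are not part of fossil-db.

```cpp
#include <cassert>
#include <cmath>

// Relative change in percent: (after - before) / before * 100.
double percent_change(double before, double after)
{
   return (after - before) / before * 100.0;
}

// Round to two decimals, the precision used in the summaries.
double round2(double x)
{
   return std::round(x * 100.0) / 100.0;
}
```

For example, Instrs 111 -> 110 gives -1 / 111 = -0.90%, and Copies 16 -> 15 gives -1 / 16 = -6.25%.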

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25440>
2023-12-05 21:02:04 +00:00
Rhys Perry
7a37a39fe0 aco: simplify v_mul_* labelling slightly
This was from before VALU_instruction existed.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26445>
2023-12-05 16:56:58 +00:00
Rhys Perry
468ee8b80c aco: implement 16-bit fsat on GFX8
GFX8 doesn't have v_med3_f16.
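
For reference, fsat clamps a value to [0, 1], which is the median of (x, 0, 1). Without a med3 instruction, the same clamp can be written as a max followed by a min. The sketch below models only the math, in 32-bit float because standard C++ has no half type. It is an assumption-level illustration, not the instruction sequence ACO emits on GFX8, and `med3` / `fsat_no_med3` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Median of three values: max(min(a, b), min(max(a, b), c)).
float med3(float a, float b, float c)
{
   return std::fmax(std::fmin(a, b), std::fmin(std::fmax(a, b), c));
}

// Clamp to [0, 1] without med3. std::fmax/std::fmin return the non-NaN
// operand, so in this sketch a NaN input yields 0.
float fsat_no_med3(float x)
{
   return std::fmin(std::fmax(x, 0.0f), 1.0f);
}
```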

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26445>
2023-12-05 16:56:58 +00:00
Rhys Perry
de51a21e26 aco: implement 16-bit derivatives
These are used by radeonsi.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26445>
2023-12-05 16:56:58 +00:00
Rhys Perry
997a0884a5 aco: implement 16-bit fsign on GFX8
GFX8 doesn't have v_med3_i16.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26445>
2023-12-05 16:56:58 +00:00
Rhys Perry
b7725b072b aco: flush denormals for 16-bit fmin/fmax on GFX8
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26445>
2023-12-05 16:56:57 +00:00
Georg Lehmann
4b9618ceec aco: add test for post-ra DPP clobbered in linear cfg
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26373>
2023-11-28 12:48:56 +00:00
Georg Lehmann
576afa8540 aco: don't optimize DPP across more than one block
Register write tracking doesn't work for inactive lanes, so this was unsafe.

Foz-DB Navi31:
Totals from 8 (0.01% of 78196) affected shaders:
Instrs: 11513 -> 11515 (+0.02%)
CodeSize: 61056 -> 61064 (+0.01%)

Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10197
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26373>
2023-11-28 12:48:56 +00:00
Daniel Schürmann
3b10547e67 aco: enable helper lanes if shader->info.fs.require_full_quads
This also enables helper invocations for lowered quad group operations.

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26026>
2023-11-22 11:32:53 +01:00
Georg Lehmann
0a5d3ac8d2 aco/sched: treat p_dual_src_export_gfx11 like export
This prevents the scheduler from moving the dual source export above the
MRTZ export, which caused hangs.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10173

Cc: mesa-stable
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26317>
2023-11-21 18:11:45 +00:00
Samuel Pitoiset
e1345c5295 aco: rename color_exports to exports in create_fs_jump_to_epilog()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26231>
2023-11-21 08:47:50 +00:00
Qiang Yu
1fabf535fa aco: handle GL_TEXTURE_RECTANGLE in tg4_integer_workarounds
Ported from lower_gather4_integer() on the LLVM side.

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26244>
2023-11-20 02:59:23 +00:00
Qiang Yu
695fc67baa aco: set MIMG unrm for GL_TEXTURE_RECTANGLE
This fixes VDPAU compositor shaders compiled by ACO.

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26244>
2023-11-20 02:59:23 +00:00
Qiang Yu
dbbf566588 aco,ac/llvm,radeonsi: lower f2f16 to f2f16_rtz in nir
There is no need to handle f2f16 specially for OpenGL, and this lets us
vectorize f2f16 when using ACO.
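
"rtz" means round toward zero: the extra mantissa bits are simply truncated. Below is a software reference model of that rounding for a float32 -> float16 conversion, returning the raw binary16 bits. It is a sketch for illustration only. It is neither the NIR lowering nor the hardware instruction ACO uses. The function names are hypothetical, and quieting NaNs to 0x7e00 is a choice made by this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a raw bit pattern as a float (handy for building inputs).
float f32_from_bits(uint32_t bits)
{
   float f;
   std::memcpy(&f, &bits, sizeof(f));
   return f;
}

// float32 -> float16 with round toward zero; returns the binary16 bits.
uint16_t f32_to_f16_rtz(float f)
{
   uint32_t bits;
   std::memcpy(&bits, &f, sizeof(bits));

   const uint32_t sign = (bits >> 16) & 0x8000u; // sign moved to bit 15
   const int32_t exp = static_cast<int32_t>((bits >> 23) & 0xffu);
   const uint32_t mant = bits & 0x7fffffu;
   const int32_t half_exp = exp - 127 + 15; // rebias 127 -> 15

   uint32_t result;
   if (exp == 0xff) {
      result = sign | (mant ? 0x7e00u : 0x7c00u); // NaN (quieted) or Inf
   } else if (half_exp >= 31) {
      result = sign | 0x7bffu; // overflow truncates to the largest finite half
   } else if (half_exp >= 1) {
      // Normal half: keep the top 10 mantissa bits, drop the low 13.
      result = sign | (static_cast<uint32_t>(half_exp) << 10) | (mant >> 13);
   } else {
      // Subnormal half or zero: shift the mantissa (with its implicit 1)
      // so that one unit equals 2^-24. Float32 subnormals and shifts past
      // 24 bits truncate to a signed zero.
      const int32_t shift = 14 - half_exp;
      result = (exp == 0 || shift > 24) ? sign : sign | ((0x800000u | mant) >> shift);
   }
   return static_cast<uint16_t>(result);
}
```

For example, the input with bits 0x3f801001 (slightly above 1.0 + 2^-11) truncates to 0x3c00 (1.0), whereas round-to-nearest-even would give 0x3c01. Likewise, 65520.0f becomes 65504 (0x7bff) rather than infinity.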

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25990>
2023-11-20 02:20:17 +00:00
Qiang Yu
5932990e08 aco,radv: add aco_is_nir_op_support_packed_math_16bit
To be shared by radeonsi and radv.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25990>
2023-11-20 02:20:17 +00:00
Daniel Schürmann
f2bb7b185d aco: delete instruction selection for boolean subgroup operations
These are now lowered in NIR.

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/218>
2023-11-17 09:45:40 +00:00
Daniel Schürmann
88afbbba11 nir: optimize open-coded quadVote* directly to new nir_quad intrinsics
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/218>
2023-11-17 09:45:40 +00:00
Connor Abbott
387e698bde amd: Implement quad_vote intrinsics
Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/218>
2023-11-17 09:45:40 +00:00
Samuel Pitoiset
d679d12359 aco: remove useless nir_intrinsic_load_force_vrs_rates_amd
It's lowered earlier.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26248>
2023-11-17 08:39:48 +00:00
Rhys Perry
ae30edd2a7 aco: remove f16<->f64 conversions
radeonsi and RADV now use nir_lower_fp16_casts.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25566>
2023-11-16 11:02:31 +00:00
Georg Lehmann
b12d7f10d4 aco: validate ALU operands and defs
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163>
2023-11-15 12:35:32 +00:00
Georg Lehmann
91539713bb aco: add src/def count and size for all ALU opcodes
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163>
2023-11-15 12:35:32 +00:00
Georg Lehmann
d9c3ba3b90 aco: use correct operand size for int tg4 wa
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163>
2023-11-15 12:35:32 +00:00
Georg Lehmann
1d167d187e aco/gfx10+: don't use v_cmpx with VCC def
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163>
2023-11-15 12:35:32 +00:00
Georg Lehmann
509ce19643 aco: add missing scc def for SALU quad broadcast
Cc: mesa-stable

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163>
2023-11-15 12:35:32 +00:00
Georg Lehmann
18f6c2328f aco: use lm for carry out in vsub32
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163>
2023-11-15 12:35:32 +00:00
Georg Lehmann
9acd9c0100 aco/tests: use correct operand size for some 64bit ops
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163>
2023-11-15 12:35:32 +00:00
Georg Lehmann
6a136b4e05 aco/tests: add some missing scc defs
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163>
2023-11-15 12:35:32 +00:00
Georg Lehmann
2f4e53b22a aco: fix detecting sgprs read by SMEM hazard
s_waitcnt_lgkmcnt is SOPK, not SOPP, and there are other SOPK instructions
that don't mitigate the hazard.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163>
2023-11-15 12:35:32 +00:00
Georg Lehmann
e49c413a86 aco: use null operand for SOPK s_waitcnt
Both a null definition and a null operand result in the same, correct encoding,
but these instructions optionally read an SGPR, so an operand makes more sense.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163>
2023-11-15 12:35:32 +00:00
Rhys Perry
b70c235e4a aco: skip LS VGPR initialization bug workaround if the prolog exists
Otherwise, we would do this twice.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26111>
2023-11-13 12:09:55 +00:00
Rhys Perry
967c52097e aco: workaround LS VGPR initialization bug in RADV prologs
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26111>
2023-11-13 12:09:53 +00:00
Tatsuyuki Ishi
55d21f2f12 radv, aco: Inline struct aco_vs_input_state.
Now that we no longer use the radv_vs_input_state pointer, we can simply
inline all the state-related fields.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26023>
2023-11-13 11:47:42 +00:00
Tatsuyuki Ishi
3fc3a94bce radv, aco: Rework VS prolog key handling.
The main change is to use struct radv_vs_prolog_key directly instead of
the compressed representation to simplify an upcoming rework in
prolog/epilog caching. In doing so, the state struct pointer was
replaced with an inline struct.

Care was also taken to pre-mask all the states with the active attribute
mask and other masks when it makes sense; this ensures that we don't
accidentally use information not hashed into the key during compilation.
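
Below is a minimal sketch of the pre-masking idea. It assumes a simplified key in which every field is a per-attribute bitmask. `vs_prolog_key_sketch` and its fields are hypothetical and do not mirror the real struct radv_vs_prolog_key. Clearing the bits of inactive attributes means that two states differing only in unused attributes produce equal keys. As a result, the compiler only sees information that is part of the key.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified key: bit i of each mask describes attribute i.
struct vs_prolog_key_sketch {
   uint32_t instance_rate_inputs;
   uint32_t zero_divisors;
   uint32_t nontrivial_divisors;
};

// Drop every bit that belongs to an attribute the shader does not read.
vs_prolog_key_sketch premask_key(vs_prolog_key_sketch key, uint32_t active_attribs)
{
   key.instance_rate_inputs &= active_attribs;
   key.zero_divisors &= active_attribs;
   key.nontrivial_divisors &= active_attribs;
   return key;
}

bool keys_equal(const vs_prolog_key_sketch &a, const vs_prolog_key_sketch &b)
{
   return a.instance_rate_inputs == b.instance_rate_inputs &&
          a.zero_divisors == b.zero_divisors &&
          a.nontrivial_divisors == b.nontrivial_divisors;
}
```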

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26023>
2023-11-13 11:47:42 +00:00
Tatsuyuki Ishi
d8a5b76307 aco: Replace aco_vs_input_state.divisors with bitfields.
Instead of the concrete divisor value, we only pass down whether the
divisor is zero or nontrivial (>1).
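
Below is a sketch of that encoding. It assumes at most 32 attributes and one divisor per attribute; `divisor_bits` and `pack_divisors` are hypothetical names. A divisor of 1 needs no bit, because the fetch index is then just the instance index (plus any base instance). The sketch does not model how the actual value of a nontrivial divisor reaches the shader.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bitfields: bit i describes attribute i.
struct divisor_bits {
   uint32_t zero_divisors;       // divisor == 0
   uint32_t nontrivial_divisors; // divisor > 1
};

divisor_bits pack_divisors(const uint32_t *divisors, unsigned num_attribs)
{
   divisor_bits bits = {0u, 0u};
   for (unsigned i = 0; i < num_attribs && i < 32; i++) {
      if (divisors[i] == 0)
         bits.zero_divisors |= 1u << i;
      else if (divisors[i] > 1)
         bits.nontrivial_divisors |= 1u << i;
      // divisor == 1 is the trivial case and sets no bit.
   }
   return bits;
}
```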

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26023>
2023-11-13 11:47:41 +00:00
Friedrich Vock
02942d6e7e aco: Update printed block kinds
Two were missing.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26103>
2023-11-09 09:58:28 +00:00
Georg Lehmann
b33aa7b01a aco: don't CSE v_permlane across exec
With bc=1 and fi=0 it needs to return 0 for inactive lanes.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26045>
2023-11-08 22:02:20 +00:00
Rhys Perry
09eb6e3106 aco/tests: fix tests with LLVM 18
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26092>
2023-11-08 10:40:17 +00:00
Rhys Perry
e4d9f6fb50 aco/tests: fix tests with LLVM 17
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10106
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26092>
2023-11-08 10:40:17 +00:00
Georg Lehmann
6cd78281f6 aco: deduplicate Format definition
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25943>
2023-11-06 23:16:38 +00:00
Georg Lehmann
6e0bf33a89 aco: deduplicate instr_class definition
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25943>
2023-11-06 23:16:38 +00:00
Georg Lehmann
bdd81c6be7 aco: namespace aco_opcode
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25943>
2023-11-06 23:16:38 +00:00
Georg Lehmann
1b9a3b7466 aco: stop using cstdint
We use stdint.h everywhere else.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25943>
2023-11-06 23:16:38 +00:00
Georg Lehmann
04956d54ce aco: force uniform result for LDS load with uniform address if it can be non uniform
Because an LDS load is 2 separate loads on gfx10+ with wave64, a different wave
can write LDS in between and cause a non-uniform result. Use v_readfirstlane_b32
instead of p_as_uniform because the former cannot be copy-propagated.
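
Below is a toy C++ model of the race. It assumes that each of the two loads serves one 32-lane half of the wave. All names are hypothetical, and this is not ACO code. Every lane reads the same address, but a store from another wave between the two halves makes the result non-uniform. Broadcasting one lane's value, as v_readfirstlane_b32 does, makes it uniform again.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr unsigned kWaveSize = 64;
using WaveValues = std::array<uint32_t, kWaveSize>;
using Lds = std::array<uint32_t, 16>;

// All lanes load the same LDS address, issued as two 32-lane halves;
// another wave stores `racing_value` to it between the halves.
WaveValues split_uniform_load(Lds &lds, unsigned addr, uint32_t racing_value)
{
   WaveValues result{};
   for (unsigned lane = 0; lane < kWaveSize / 2; lane++)
      result[lane] = lds[addr];
   lds[addr] = racing_value; // write from a different wave
   for (unsigned lane = kWaveSize / 2; lane < kWaveSize; lane++)
      result[lane] = lds[addr];
   return result;
}

// Models v_readfirstlane_b32: return the value of the lowest active lane.
uint32_t readfirstlane(const WaveValues &v, uint64_t exec)
{
   assert(exec != 0);
   unsigned lane = 0;
   while (!((exec >> lane) & 1u))
      lane++;
   return v[lane];
}
```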

Fixes an OpenCL CTS test with zink+rusticl.

Totals from 136 (0.17% of 78196) affected shaders:
MaxWaves: 3236 -> 3244 (+0.25%)
Instrs: 130069 -> 131221 (+0.89%)
CodeSize: 698048 -> 703436 (+0.77%)
VGPRs: 5464 -> 5440 (-0.44%)
SpillSGPRs: 94 -> 96 (+2.13%)
Latency: 5361017 -> 5363781 (+0.05%); split: -0.00%, +0.05%
InvThroughput: 883010 -> 884100 (+0.12%)
SClause: 3822 -> 3821 (-0.03%); split: -0.05%, +0.03%
Copies: 14220 -> 14314 (+0.66%); split: -0.01%, +0.68%
Branches: 4549 -> 4551 (+0.04%)
PreSGPRs: 4934 -> 4940 (+0.12%)
PreVGPRs: 4666 -> 4655 (-0.24%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25973>
2023-11-06 22:43:33 +00:00
Georg Lehmann
ab87831ae8 aco, radv: vectorize f2f16 if rounding mode is rtz
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25952>
2023-11-06 21:05:34 +00:00
Rhys Perry
b18f0dec41 aco: collect Pre-Sched SGPRs/VGPRs before spilling
The usage after spilling is usually either the same as before or the
maximum.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25559>
2023-11-01 19:41:30 +00:00
Rhys Perry
d200916ca2 aco: add VALU/SALU/VMEM/SMEM statistics
This lets us measure optimizations without interference from waitcnt
instructions.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25559>
2023-11-01 19:41:30 +00:00