Connor Abbott
387e698bde
amd: Implement quad_vote intrinsics
...
Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/218 >
2023-11-17 09:45:40 +00:00
Samuel Pitoiset
d679d12359
aco: remove useless nir_intrinsic_load_force_vrs_rates_amd
...
It's lowered earlier.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26248 >
2023-11-17 08:39:48 +00:00
Rhys Perry
ae30edd2a7
aco: remove f16<->f64 conversions
...
radeonsi and RADV now use nir_lower_fp16_casts.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25566 >
2023-11-16 11:02:31 +00:00
Georg Lehmann
b12d7f10d4
aco: validate ALU operands and defs
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163 >
2023-11-15 12:35:32 +00:00
Georg Lehmann
91539713bb
aco: add src/def count and size for all ALU opcodes
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163 >
2023-11-15 12:35:32 +00:00
Georg Lehmann
d9c3ba3b90
aco: use correct operand size for int tg4 wa
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163 >
2023-11-15 12:35:32 +00:00
Georg Lehmann
1d167d187e
aco/gfx10+: don't use v_cmpx with VCC def
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163 >
2023-11-15 12:35:32 +00:00
Georg Lehmann
509ce19643
aco: add missing scc def for SALU quad broadcast
...
Cc: mesa-stable
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163 >
2023-11-15 12:35:32 +00:00
Georg Lehmann
18f6c2328f
aco: use lm for carry out in vsub32
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163 >
2023-11-15 12:35:32 +00:00
Georg Lehmann
9acd9c0100
aco/tests: use correct operand size for some 64bit ops
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163 >
2023-11-15 12:35:32 +00:00
Georg Lehmann
6a136b4e05
aco/tests: add some missing scc defs
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163 >
2023-11-15 12:35:32 +00:00
Georg Lehmann
2f4e53b22a
aco: fix detecting sgprs read by SMEM hazard
...
s_waitcnt_lgkmcnt is SOPK, not SOPP and there are other SOPK instructions
that don't mitigate the hazard.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163 >
2023-11-15 12:35:32 +00:00
Georg Lehmann
e49c413a86
aco: use null operand for SOPK s_waitcnt
...
Both null def and op result in the same correct encoding, but these
instructions optionally read a sgpr, so it makes more sense to use an operand.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26163 >
2023-11-15 12:35:32 +00:00
Rhys Perry
b70c235e4a
aco: skip LS VGPR initialization bug workaround if the prolog exists
...
Otherwise, we would do this twice.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26111 >
2023-11-13 12:09:55 +00:00
Rhys Perry
967c52097e
aco: workaround LS VGPR initialization bug in RADV prologs
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26111 >
2023-11-13 12:09:53 +00:00
Tatsuyuki Ishi
55d21f2f12
radv, aco: Inline struct aco_vs_input_state.
...
Now that we no longer use the radv_vs_input_state pointer, we can simply
inline all the state-related fields.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26023 >
2023-11-13 11:47:42 +00:00
Tatsuyuki Ishi
3fc3a94bce
radv, aco: Rework VS prolog key handling.
...
The main change is to use struct radv_vs_prolog_key directly instead of
the compressed representation to simplify an upcoming rework in prolog /
epilog caching. In doing so the state struct pointer was replaced with
an inline struct.
Care was also taken to pre-mask all the states with the active attribute
mask and other masks when it makes sense; this ensures that we don't
accidentally use information not hashed into the key during compilation.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26023 >
2023-11-13 11:47:42 +00:00
Tatsuyuki Ishi
d8a5b76307
aco: Replace aco_vs_input_state.divisors with bitfields.
...
Instead of concrete divisor value, we only pass down the information
whether the divisor is zero or nontrivial (>1).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26023 >
2023-11-13 11:47:41 +00:00
Friedrich Vock
02942d6e7e
aco: Update printed block kinds
...
Two were missing.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26103 >
2023-11-09 09:58:28 +00:00
Georg Lehmann
b33aa7b01a
aco: don't CSE v_permlane across exec
...
With bc=1 and fi=0 it needs to return 0 for inactive lanes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26045 >
2023-11-08 22:02:20 +00:00
Rhys Perry
09eb6e3106
aco/tests: fix tests with LLVM 18
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26092 >
2023-11-08 10:40:17 +00:00
Rhys Perry
e4d9f6fb50
aco/tests: fix tests with LLVM 17
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10106
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26092 >
2023-11-08 10:40:17 +00:00
Georg Lehmann
6cd78281f6
aco: deduplicate Format definition
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25943 >
2023-11-06 23:16:38 +00:00
Georg Lehmann
6e0bf33a89
aco: deduplicate instr_class definition
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25943 >
2023-11-06 23:16:38 +00:00
Georg Lehmann
bdd81c6be7
aco: namespace aco_opcode
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25943 >
2023-11-06 23:16:38 +00:00
Georg Lehmann
1b9a3b7466
aco: stop using cstdint
...
We use stdint.h everywhere else.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25943 >
2023-11-06 23:16:38 +00:00
Georg Lehmann
04956d54ce
aco: force uniform result for LDS load with uniform address if it can be non uniform
...
Because a LDS load is 2 separate loads on gfx10+ with wave64, a different wave
can write LDS in between and cause a non uniform result. Use v_readfirst_lane
instead of p_as_uniform because it cannot be copy propagated.
Fixes a OpenCL CTS test with zink+rusticl.
Totals from 136 (0.17% of 78196) affected shaders:
MaxWaves: 3236 -> 3244 (+0.25%)
Instrs: 130069 -> 131221 (+0.89%)
CodeSize: 698048 -> 703436 (+0.77%)
VGPRs: 5464 -> 5440 (-0.44%)
SpillSGPRs: 94 -> 96 (+2.13%)
Latency: 5361017 -> 5363781 (+0.05%); split: -0.00%, +0.05%
InvThroughput: 883010 -> 884100 (+0.12%)
SClause: 3822 -> 3821 (-0.03%); split: -0.05%, +0.03%
Copies: 14220 -> 14314 (+0.66%); split: -0.01%, +0.68%
Branches: 4549 -> 4551 (+0.04%)
PreSGPRs: 4934 -> 4940 (+0.12%)
PreVGPRs: 4666 -> 4655 (-0.24%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25973 >
2023-11-06 22:43:33 +00:00
Georg Lehmann
ab87831ae8
aco, radv: vectorize f2f16 if rounding mode is rtz
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25952 >
2023-11-06 21:05:34 +00:00
Rhys Perry
b18f0dec41
aco: collect Pre-Sched SGPRs/VGPRs before spilling
...
The usage after spilling is usually either the same as before or the
maximum.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25559 >
2023-11-01 19:41:30 +00:00
Rhys Perry
d200916ca2
aco: add VALU/SALU/VMEM/SMEM statistics
...
This lets us measure optimizations without interference of waitcnt
instructions.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25559 >
2023-11-01 19:41:30 +00:00
Samuel Pitoiset
0477346c0b
aco: remove dead code in nir_intrinsic_xfb_counter_{add,sub}_amd
...
This code path is only used by GFX11 now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25903 >
2023-10-30 08:51:51 +00:00
Qiang Yu
23cb6768cb
aco: add aco_is_gpu_supported
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25631 >
2023-10-26 02:01:49 +00:00
Qiang Yu
9c63138ae3
aco: stop emit s_endpgm for first stage of merged shader
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25631 >
2023-10-26 02:01:49 +00:00
Qiang Yu
14022a3a0e
aco: move end program handling to select_shader
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25631 >
2023-10-26 02:01:49 +00:00
Qiang Yu
f3f2311d69
aco: extend max operands in a instruction to 128
...
We get more than 64 operands in p_end_with_regs when radeonsi.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25631 >
2023-10-26 02:01:49 +00:00
Qiang Yu
e2af0b0b3f
aco: add create_end_for_merged_shader
...
For radeonsi merged shader LS/ES part to pass args to next
stage.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25631 >
2023-10-26 02:01:49 +00:00
Qiang Yu
71fd3c2a35
aco: do not fix_exports when separately compiled ngg vs or es
...
For radeonsi not abort when this case, as it does not jump at
last.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25631 >
2023-10-26 02:01:49 +00:00
Rhys Perry
4c3677094e
aco,nir: add export_row_amd intrinsic
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25040 >
2023-10-24 21:36:06 +00:00
Bas Nieuwenhuizen
d8458c0559
aco: Make RA understand WMMA instructions.
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24683 >
2023-10-24 13:24:18 +00:00
Bas Nieuwenhuizen
5e7c828c0e
aco: Add WMMA instructions.
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24683 >
2023-10-24 13:24:18 +00:00
Samuel Pitoiset
f0abdaea9f
amd/llvm,aco,radv: implement NGG streamout with GDS_STRMOUT registers on GFX11
...
According to RadeonSI, this is required for preemption, user queues,
and we only have to wait for VS after streamout which should be more
performant.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25284 >
2023-10-12 16:58:22 +00:00
Rhys Perry
15a3515d0b
aco/tests: test that hazards are resolved at the end of shader parts
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25374 >
2023-10-11 15:14:04 +00:00
Rhys Perry
690cb0b211
aco: resolve all possible hazards at the end of shader parts
...
fossil-db (vega10):
Totals from 1266 (2.01% of 63055) affected shaders:
Instrs: 707116 -> 708382 (+0.18%)
CodeSize: 3512452 -> 3517516 (+0.14%)
Latency: 6661724 -> 6666788 (+0.08%)
InvThroughput: 4393626 -> 4393904 (+0.01%); split: -0.00%, +0.01%
fossil-db (navi10):
Totals from 1305 (2.07% of 63015) affected shaders:
Instrs: 719699 -> 722009 (+0.32%)
CodeSize: 3650836 -> 3660076 (+0.25%)
Latency: 5691633 -> 5693933 (+0.04%)
InvThroughput: 1532010 -> 1532024 (+0.00%); split: -0.00%, +0.00%
fossil-db (navi31):
Totals from 1580 (1.99% of 79332) affected shaders:
Instrs: 1678242 -> 1679879 (+0.10%)
CodeSize: 8463464 -> 8470168 (+0.08%)
Latency: 14273661 -> 14275298 (+0.01%)
InvThroughput: 3668049 -> 3668080 (+0.00%); split: -0.00%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25374 >
2023-10-11 15:14:04 +00:00
Rhys Perry
e4842c0270
aco: consider exec_hi in reads_exec()
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25374 >
2023-10-11 15:14:04 +00:00
Rhys Perry
ed3ca5b781
aco: fix s_setreg hazards
...
s_setreg doesn't have any definitions.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25374 >
2023-10-11 15:14:04 +00:00
Rhys Perry
ce27875f09
aco: only mitigate VcmpxExecWARHazard when necessary
...
fossil-db (navi10):
Totals from 5059 (8.03% of 63015) affected shaders:
Instrs: 7384947 -> 7351196 (-0.46%)
CodeSize: 39393180 -> 39299196 (-0.24%); split: -0.28%, +0.04%
Latency: 119683018 -> 119585224 (-0.08%); split: -0.08%, +0.00%
InvThroughput: 29647188 -> 29623895 (-0.08%); split: -0.08%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25374 >
2023-10-11 15:14:04 +00:00
Rhys Perry
a73f76750b
aco: fix LdsDirectVMEMHazard WaW with the wrong waitcnt
...
Seems we missed this case.
fossil-db (navi31):
Totals from 24 (0.03% of 79332) affected shaders:
Instrs: 3562 -> 3538 (-0.67%)
CodeSize: 18740 -> 18644 (-0.51%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 2cdb3e4b6b ("aco: add VMEMtoScalarWriteHazard tests")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25374 >
2023-10-11 15:14:04 +00:00
Alyssa Rosenzweig
c39896b17b
nir: Use getters for nir_src::parent_*
...
First, we need to give the parent_instr field a unique name to be able to
replace with a helper. We have parent_instr fields for both nir_src and
nir_def, so let's rename nir_src::parent_instr in preparation for rework.
This was done with a combination of sed and manual fix-ups.
Then we use semantic patches plus manual fixups:
@@
expression s;
@@
-s->renamed_parent_instr
+nir_src_parent_instr(s)
@@
expression s;
@@
-s.renamed_parent_instr
+nir_src_parent_instr(&s)
@@
expression s;
@@
-s->parent_if
+nir_src_parent_if(s)
@@
expression s;
@@
-s.renamed_parent_if
+nir_src_parent_if(&s)
@@
expression s;
@@
-s->is_if
+nir_src_is_if(s)
@@
expression s;
@@
-s.is_if
+nir_src_is_if(&s)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24671 >
2023-10-10 04:58:05 -04:00
Qiang Yu
5ef7c54829
aco: wait memory ops done before go to next shader part
...
Next part don't know whether p_end_with_regs args are loaded from
memory ops or not, need to wait it's done here.
Other memory load needs to be waited too like:
a = load_mem()
b = ...
if (...) {
wait_mem(a)
store_mem(a)
}
p_end_with_regs(b)
"a" still needs to be waited, otherwise next shader part regs may
be overwritten by unfinished memory loads.
Memory stores are waited too. When >=gfx10 and last VGT has no
parameter export, we need to wait all memeory stores done before
pos export (see ac_nir_export_position). So when merged shader
(ES+GS or VS+GS) is partially built, first stage needs to wait
all memory stores done, otherwise second stage don't know if
any memory stores pending before.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Signe-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973 >
2023-10-10 02:36:34 +00:00
Qiang Yu
5ba68f92b4
aco: create exit block for p_end_with_regs to branch to
...
To handle ps discard in radeonsi part mode shader.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973 >
2023-10-10 02:36:34 +00:00