Commit graph

1073 commits

Author SHA1 Message Date
Rhys Perry
b70c235e4a aco: skip LS VGPR initialization bug workaround if the prolog exists
Otherwise, we would do this twice.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26111>
2023-11-13 12:09:55 +00:00
Rhys Perry
967c52097e aco: workaround LS VGPR initialization bug in RADV prologs
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26111>
2023-11-13 12:09:53 +00:00
Tatsuyuki Ishi
55d21f2f12 radv, aco: Inline struct aco_vs_input_state.
Now that we no longer use the radv_vs_input_state pointer, we can simply
inline all the state-related fields.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26023>
2023-11-13 11:47:42 +00:00
Tatsuyuki Ishi
3fc3a94bce radv, aco: Rework VS prolog key handling.
The main change is to use struct radv_vs_prolog_key directly instead of
the compressed representation to simplify an upcoming rework in prolog /
epilog caching. In doing so the state struct pointer was replaced with
an inline struct.

Care was also taken to pre-mask all the states with the active attribute
mask and other masks when it makes sense; this ensures that we don't
accidentally use information not hashed into the key during compilation.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26023>
2023-11-13 11:47:42 +00:00
Tatsuyuki Ishi
d8a5b76307 aco: Replace aco_vs_input_state.divisors with bitfields.
Instead of concrete divisor value, we only pass down the information
whether the divisor is zero or nontrivial (>1).

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26023>
2023-11-13 11:47:41 +00:00
Georg Lehmann
04956d54ce aco: force uniform result for LDS load with uniform address if it can be non uniform
Because a LDS load is 2 separate loads on gfx10+ with wave64, a different wave
can write LDS in between and cause a non uniform result. Use v_readfirst_lane
instead of p_as_uniform because it cannot be copy propagated.

Fixes a OpenCL CTS test with zink+rusticl.

Totals from 136 (0.17% of 78196) affected shaders:
MaxWaves: 3236 -> 3244 (+0.25%)
Instrs: 130069 -> 131221 (+0.89%)
CodeSize: 698048 -> 703436 (+0.77%)
VGPRs: 5464 -> 5440 (-0.44%)
SpillSGPRs: 94 -> 96 (+2.13%)
Latency: 5361017 -> 5363781 (+0.05%); split: -0.00%, +0.05%
InvThroughput: 883010 -> 884100 (+0.12%)
SClause: 3822 -> 3821 (-0.03%); split: -0.05%, +0.03%
Copies: 14220 -> 14314 (+0.66%); split: -0.01%, +0.68%
Branches: 4549 -> 4551 (+0.04%)
PreSGPRs: 4934 -> 4940 (+0.12%)
PreVGPRs: 4666 -> 4655 (-0.24%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25973>
2023-11-06 22:43:33 +00:00
Georg Lehmann
ab87831ae8 aco, radv: vectorize f2f16 if rounding mode is rtz
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25952>
2023-11-06 21:05:34 +00:00
Samuel Pitoiset
0477346c0b aco: remove dead code in nir_intrinsic_xfb_counter_{add,sub}_amd
This code path is only used by GFX11 now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25903>
2023-10-30 08:51:51 +00:00
Qiang Yu
9c63138ae3 aco: stop emit s_endpgm for first stage of merged shader
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25631>
2023-10-26 02:01:49 +00:00
Qiang Yu
14022a3a0e aco: move end program handling to select_shader
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25631>
2023-10-26 02:01:49 +00:00
Qiang Yu
e2af0b0b3f aco: add create_end_for_merged_shader
For radeonsi merged shader LS/ES part to pass args to next
stage.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25631>
2023-10-26 02:01:49 +00:00
Qiang Yu
71fd3c2a35 aco: do not fix_exports when separately compiled ngg vs or es
For radeonsi not abort when this case, as it does not jump at
last.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25631>
2023-10-26 02:01:49 +00:00
Rhys Perry
4c3677094e aco,nir: add export_row_amd intrinsic
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25040>
2023-10-24 21:36:06 +00:00
Bas Nieuwenhuizen
5e7c828c0e aco: Add WMMA instructions.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24683>
2023-10-24 13:24:18 +00:00
Samuel Pitoiset
f0abdaea9f amd/llvm,aco,radv: implement NGG streamout with GDS_STRMOUT registers on GFX11
According to RadeonSI, this is required for preemption, user queues,
and we only have to wait for VS after streamout which should be more
performant.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25284>
2023-10-12 16:58:22 +00:00
Qiang Yu
6eb0910d45 aco: do not fix_exports when program has epilog
PS with epilog does not need to fix_exports. And radeonsi use
p_end_with_regs so does not have jump instruction at last.

radeonsi may also have exec restore instruction, so may break
before reach to p_end_with_regs.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973>
2023-10-10 02:36:34 +00:00
Qiang Yu
c9c18d3da5 aco: compact ps expilog color export for radeonsi
radeonsi need to compact color export for ps epilog while radv does not.
radv will fill empty color slot, so won't affected by this change.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973>
2023-10-10 02:36:34 +00:00
Qiang Yu
1517aa7a8a aco,radv: add radeonsi spec ps epilog code
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973>
2023-10-10 02:36:34 +00:00
Qiang Yu
9972a385fb aco: simplify export_fs_mrt_color
It's now used by ps epilog only.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973>
2023-10-10 02:36:34 +00:00
Qiang Yu
77d5966661 aco,radv: rename ps epilog info inputs to colors
Will add other mrtz args for radeonsi.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973>
2023-10-10 02:36:34 +00:00
Qiang Yu
57b0f19582 aco: add create_fs_end_for_epilog for radeonsi
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973>
2023-10-10 02:36:33 +00:00
Qiang Yu
90c901a987 aco: handle ps outputs from radeonsi
radeonsi will keep outputs <FRAG_RESULT_DATA0.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973>
2023-10-10 02:36:33 +00:00
Qiang Yu
49250f9fc5 aco: add ps prolog generation for radeonsi
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973>
2023-10-10 02:36:33 +00:00
Qiang Yu
80728a2e71 aco: do not eliminate final exec write when p_end_with_regs block
p_end_with_regs just partially end the program, next part need
exec mask to be set correctly. For example p_end_wqm will generate
a exec restore from WQM mode after p_end_with_regs.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973>
2023-10-10 02:36:33 +00:00
Georg Lehmann
34d8fa6185 aco/gfx11: optimize dual source export
We can combine dpp with the v_cndmask_b32.

Foz-DB Navi31:
Totals from 222 (0.28% of 79330) affected shaders:
Instrs: 564392 -> 563373 (-0.18%); split: -0.19%, +0.01%
CodeSize: 2867040 -> 2864728 (-0.08%); split: -0.09%, +0.01%
Latency: 4278957 -> 4275199 (-0.09%); split: -0.09%, +0.00%
InvThroughput: 586636 -> 585824 (-0.14%); split: -0.14%, +0.00%
SClause: 20210 -> 20211 (+0.00%); split: -0.02%, +0.02%
Copies: 39763 -> 39778 (+0.04%); split: -0.13%, +0.17%
PreVGPRs: 13924 -> 13922 (-0.01%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25541>
2023-10-05 10:37:34 +00:00
Rhys Perry
1a268dc59d aco: disable FI for quad/masked swizzle
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8330
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25525>
2023-10-04 18:53:43 +00:00
Rhys Perry
26fce534b5 aco: shrink DPP8_instruction
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25525>
2023-10-04 18:53:43 +00:00
Georg Lehmann
79b767f4c0 aco: remove -0.0 for 32 bit fsign with mul_legacy/omod when denorms are flushed
v_mul_legacy_f32 and omod return +0.0 if any operand is +0.0/-0.0.

Foz-DB Navi21:
Totals from 4289 (5.60% of 76572) affected shaders:
Instrs: 8100571 -> 8099319 (-0.02%); split: -0.02%, +0.00%
CodeSize: 43433200 -> 43435088 (+0.00%); split: -0.01%, +0.01%
Latency: 88151566 -> 88147232 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 22966705 -> 22965192 (-0.01%); split: -0.01%, +0.00%
VClause: 190010 -> 190009 (-0.00%)
SClause: 269697 -> 269689 (-0.00%)
Copies: 687294 -> 687296 (+0.00%); split: -0.00%, +0.00%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25347>
2023-10-02 14:02:49 +00:00
Rhys Perry
558738e3c5 aco: remove zero offset optimization
This is done in nir_opt_constant_folding now.

No fossil-db changes on navi31.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25477>
2023-10-02 10:11:37 +00:00
Rhys Perry
c3a894fb47 aco: disable zero offset optimization for strict WQM coords
If we try to do this, we end up using {undef,coordx} as the coordinates
for an image_sample instruction, because we can't shrink the linear VGPR.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9767
Fixes: 859e059aa9 ("radv: use fix_derivs_in_divergent_cf")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25477>
2023-10-02 10:11:37 +00:00
Georg Lehmann
b91616e800 aco: implement 64bit div find_lsb
This can be selected for divergent subgroupBallotFindLSB.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25407>
2023-09-27 14:47:42 +00:00
Georg Lehmann
336ec2a4b4 aco: simplify masked swizzle dpp selection by removing or_mask first
and_mask and xor_mask alone can represent all patterns without or_mask

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25115>
2023-09-21 10:07:27 +00:00
Connor Abbott
c93bcb32fe amd: Use inverse ballot intrinsic if available
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25123>
2023-09-20 14:41:18 +00:00
Daniel Schürmann
45f6d38a76 aco: insert a single p_end_wqm after the last derivative calculation
This new instruction replaces p_wqm.

Totals from 28065 (36.65% of 76572) affected shaders: (GFX11)
MaxWaves: 823922 -> 823952 (+0.00%); split: +0.01%, -0.01%
Instrs: 22221375 -> 22180465 (-0.18%); split: -0.26%, +0.08%
CodeSize: 117310676 -> 117040684 (-0.23%); split: -0.30%, +0.07%
VGPRs: 1183476 -> 1186656 (+0.27%); split: -0.19%, +0.46%
SpillSGPRs: 2305 -> 2302 (-0.13%)
Latency: 176559310 -> 176427793 (-0.07%); split: -0.21%, +0.14%
InvThroughput: 26245204 -> 26195550 (-0.19%); split: -0.26%, +0.07%
VClause: 368058 -> 369460 (+0.38%); split: -0.21%, +0.59%
SClause: 857077 -> 842588 (-1.69%); split: -2.06%, +0.37%
Copies: 1245650 -> 1249434 (+0.30%); split: -0.33%, +0.63%
Branches: 394837 -> 396070 (+0.31%); split: -0.01%, +0.32%
PreSGPRs: 1019139 -> 1019567 (+0.04%); split: -0.02%, +0.06%
PreVGPRs: 925739 -> 931860 (+0.66%); split: -0.00%, +0.66%

Changes are due to scheduling and re-enabling cross-lane optimizations.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038>
2023-09-14 09:25:23 +00:00
Daniel Schürmann
28904839da aco: don't insert a copy when emitting p_wqm
Totals from 351 (0.46% of 76572) affected shaders: (GFX11)

Instrs: 709202 -> 709600 (+0.06%); split: -0.02%, +0.08%
CodeSize: 3606364 -> 3608040 (+0.05%); split: -0.01%, +0.06%
Latency: 3589841 -> 3590756 (+0.03%); split: -0.01%, +0.03%
InvThroughput: 463303 -> 463324 (+0.00%)
SClause: 28147 -> 28201 (+0.19%); split: -0.02%, +0.22%
Copies: 43243 -> 43204 (-0.09%); split: -0.24%, +0.15%
PreSGPRs: 21028 -> 21042 (+0.07%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038>
2023-09-14 09:25:22 +00:00
Daniel Schürmann
040142684c aco: make p_wqm a marker instruction without Operands/Definitions
Totals from 28277 (36.93% of 76572) affected shaders: (GFX11)

MaxWaves: 833930 -> 833898 (-0.00%); split: +0.01%, -0.01%
Instrs: 21366950 -> 21353346 (-0.06%); split: -0.11%, +0.05%
CodeSize: 112855368 -> 112610508 (-0.22%); split: -0.24%, +0.03%
VGPRs: 1157748 -> 1158540 (+0.07%); split: -0.10%, +0.17%
SpillSGPRs: 2465 -> 2463 (-0.08%); split: -0.16%, +0.08%
Latency: 168339886 -> 168383646 (+0.03%); split: -0.10%, +0.12%
InvThroughput: 25164895 -> 25158376 (-0.03%); split: -0.08%, +0.06%
VClause: 347660 -> 346256 (-0.40%); split: -0.55%, +0.15%
SClause: 794460 -> 799521 (+0.64%); split: -0.33%, +0.97%
Copies: 1151908 -> 1148370 (-0.31%); split: -0.54%, +0.23%
Branches: 359447 -> 359437 (-0.00%); split: -0.01%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038>
2023-09-14 09:25:22 +00:00
Rhys Perry
41b6020ff3 aco: remove fast path in insert_exec_mask's process_instructions
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038>
2023-09-14 09:25:22 +00:00
Daniel Schürmann
0e8192a76b aco: append p_logical_end after monolithic RT shaders
Fixes: bdec044c88 ('aco: Do not fixup registers if there are no shader calls')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038>
2023-09-14 09:25:22 +00:00
Georg Lehmann
524a894ba4 aco/gfx11: don't use bfe for local_invocation_id if the others are always 0
Foz-DB GFX1100:
Totals from 4469 (3.37% of 132657) affected shaders:
Instrs: 3895053 -> 3893529 (-0.04%); split: -0.04%, +0.00%
CodeSize: 20244128 -> 20220952 (-0.11%); split: -0.11%, +0.00%
Latency: 37864147 -> 37862227 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 5578100 -> 5576469 (-0.03%); split: -0.03%, +0.00%
SClause: 108336 -> 108343 (+0.01%); split: -0.00%, +0.01%
Copies: 275897 -> 275900 (+0.00%); split: -0.00%, +0.00%

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24514>
2023-09-08 11:28:24 +00:00
Qiang Yu
b5eaec6c80 aco,radv,radeonsi: rename is_monolithic to merged_shader_compiled_separately
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24990>
2023-09-04 10:53:44 +08:00
Georg Lehmann
2ae94b3894 aco: implement some exclusive scans with inclusive scans
exclusive scan lowering uses full wave shift, for iadd/ixor it's faster
to do inclusive scans and subtract/xor the thread's source.

Foz-DB Navi21:
Totals from 21 (0.02% of 132657) affected shaders:
Instrs: 10925 -> 10727 (-1.81%)
CodeSize: 58064 -> 56488 (-2.71%)
Latency: 178471 -> 177928 (-0.30%)
InvThroughput: 24374 -> 24145 (-0.94%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24555>
2023-09-02 11:42:22 +00:00
Konstantin Seurer
2a5d8d4cf4 aco: Unify demote and demote_if selection
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24906>
2023-09-01 07:23:33 +00:00
Konstantin Seurer
9af91edda9 aco: Use bytes() instead of size() in emit_wqm
This should get most of the cases that would fail validation.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24906>
2023-09-01 07:23:33 +00:00
Samuel Pitoiset
e104718c9f aco: allow separate compilation of NGG shaders
Also prevent to emit a long-jump for VS as NGG without a GS.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24907>
2023-09-01 06:52:40 +00:00
Samuel Pitoiset
bfb39031f1 aco: flag blocks with long-jump as export_end for separate compilation
This will allow us to adjust fix_exports().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24907>
2023-09-01 06:52:40 +00:00
Qiang Yu
a2ba50aee6 aco: add vs prolog instruction selection for radeonsi
Port from llvm si_llvm_build_vs_prolog().

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24712>
2023-08-31 02:39:15 +00:00
Qiang Yu
3f87413811 aco: prepare fix_ls_vgpr_init_bug to be used by gl vs prolog
Prolog does not have nir shader.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24712>
2023-08-31 02:39:15 +00:00
Qiang Yu
eb4f871034 aco: pass sw_stage when setup_isel_context
We are going to add more shader parts.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24712>
2023-08-31 02:39:15 +00:00
Samuel Pitoiset
f30d47c518 aco: fix emitting TCS epilogs end on GFX9+
With merged shaders, the long-jump should be emitted inside the
divergent if (ie. only for TCS invocations) and other non TCS
invocations should just end the program.

This fixes a bunch of failures with CTS by forcing TCS epilogs on
RDNA2.

Not sure how RadeonSI will handle that but maybe doing the merged
wave info thing in epilogs would help.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24832>
2023-08-29 09:30:07 +00:00
Samuel Pitoiset
8ebb29245c aco: add support for compiling {VS,TES}+GS separately on GFX9+
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24862>
2023-08-25 10:22:41 +00:00