Commit graph

3777 commits

Author SHA1 Message Date
Georg Lehmann
f047a67fba nir,aco: optimize FP16_OFVL pattern created by vkd3d-proton
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35434>
2025-06-23 07:59:27 +00:00
Georg Lehmann
9e6adcbca0 aco: select fp32 to float8 conversions
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35434>
2025-06-23 07:59:26 +00:00
Georg Lehmann
3a45802514 aco/lower_to_hw: support saturating fp8 conversions
Sadly amd only made this behavior controlable with global state.
We add a new pseudo opcode for this purpose and change FP16_OVFL
for each instruction. Ideally we would only do it once for clauses
and after ilp scheduling, but this can be improved in the future.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35434>
2025-06-23 07:59:25 +00:00
Georg Lehmann
65650cfef8 aco: emit float8 wmma
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35434>
2025-06-23 07:59:25 +00:00
Rhys Perry
325dfd809a radv,aco: switch to shader statistics framework
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12756
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35583>
2025-06-20 09:26:58 +00:00
Rhys Perry
2cfd2d3b1d aco/tests: add lower_branches tests
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35202>
2025-06-19 10:58:39 +00:00
Rhys Perry
c45482e652 aco: validate that preds/succs match
This isn't done in validate_cfg() because that's called less frequently.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35202>
2025-06-19 10:58:39 +00:00
Rhys Perry
85db025cd7 aco: continue when try_remove_simple_block can't remove a predecessor
We should update linear_preds so that the predecessors we can remove are
actually removed.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35202>
2025-06-19 10:58:38 +00:00
Rhys Perry
5344abbc56 aco/lower_branches: keep blocks with multiple logical successors
It might be the case that both the branch and exec mask write in a
divergent branch block are removed. try_remove_simple_block() might then
try to remove it, but fail because it has multiple logical successors.
Instead, just skip these blocks.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.1
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35202>
2025-06-19 10:58:38 +00:00
Georg Lehmann
001fe8c236 aco: optimize boolean phi with empty else block
We can keep the else empty by handling the phi in the "then" block.

Foz-DB Navi21:
Totals from 921 (1.15% of 80065) affected shaders:
Instrs: 4532598 -> 4527309 (-0.12%); split: -0.12%, +0.00%
CodeSize: 24498484 -> 24481780 (-0.07%); split: -0.08%, +0.01%
Latency: 41016915 -> 41020477 (+0.01%); split: -0.10%, +0.11%
InvThroughput: 9998405 -> 9991873 (-0.07%); split: -0.08%, +0.02%
SClause: 128261 -> 128267 (+0.00%)
Copies: 409949 -> 408585 (-0.33%); split: -0.36%, +0.02%
Branches: 169740 -> 169222 (-0.31%); split: -0.58%, +0.27%
PreSGPRs: 64408 -> 64398 (-0.02%)
VALU: 2972521 -> 2972518 (-0.00%)
SALU: 673844 -> 668973 (-0.72%); split: -0.72%, +0.00%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35165>
2025-06-19 07:32:43 +00:00
Georg Lehmann
88753ddd1d aco: allow nir divergence to be printed again
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32990>
2025-06-19 07:02:20 +00:00
Samuel Pitoiset
d23de4918e aco: add support for image f32 atomic add
It's supported on GFX12.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35493>
2025-06-13 08:47:59 +00:00
Pierre-Eric Pelloux-Prayer
3bcbd11a33 aco/isel: fix visit_tex handling of is_sparse
For cases when less than 4 components are read, the original code
would compute an incorrect dmask. eg: with a single component + is_sparse,
the dmask was 0x13:
  - 0x 3 = coming from nir_def_components_read
  - 0x10 = the sparse bit
While it should have at 2 bits set (1 for the color/depth, 1 for tfe).

This caused problem when expand_vector() used the dmask to generate
the final results, because the value for the sparse component was
read from the wrong index.

So after the call to emit_mimg() dmask needs to be adjusted
because the components will be stored in order, so if mask is 0x11
the tfe value would be stored at invalid index=5 (while it should
be at index=1).

This fixes KHR-GL46.sparse_texture_clamp_tests.SparseTextureClampLookupResidency_texture_2d_depth_component16
and KHR-GL46.sparse_texture2_tests.SparseTexture2Lookup_texture_2d_depth_component16
with ACO.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35206>
2025-06-11 12:11:28 +00:00
Georg Lehmann
f36ac8434c aco: add a readme entry for v_pk_cvt_u8_f32
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35391>
2025-06-10 07:32:05 +00:00
Georg Lehmann
94c191e6d9 aco: remove p_v_cvt_pk_u8_f32
Now unused.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35391>
2025-06-10 07:32:04 +00:00
Georg Lehmann
d95e90ab5f aco: do not use v_cvt_pk_u8_f32 for f2u8
The ISA docs don't mention this, but instead of always truncating
like other integer conversions, this opcode actually uses the single
precision rounding mode.

We could continue to use the opcode and set the rounding mode to rtz
in lower_to_hw_instrs, but I think I should just concede that f2u8
isn't worth the effort.

Fixes: 9bb10b58 ("aco: use v_cvt_pk_u8_f32 for f2u8")

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35391>
2025-06-10 07:32:04 +00:00
Natalie Vock
a28515f096 aco/opt: Rename loop header phis
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Fossil stats on top of !35269:
Totals from 133 (0.16% of 81077) affected shaders:

Instrs: 4328456 -> 4327891 (-0.01%)
CodeSize: 22890004 -> 22887732 (-0.01%); split: -0.01%, +0.00%
Latency: 28406452 -> 28404732 (-0.01%)
InvThroughput: 5361458 -> 5361153 (-0.01%)
Copies: 376788 -> 376222 (-0.15%)
VALU: 2429210 -> 2428645 (-0.02%)
VOPD: 57 -> 56 (-1.75%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35270>
2025-06-09 14:36:44 +00:00
Rhys Perry
00dd0d0dd1 aco: update VALUReadSGPRHazard comment
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35387>
2025-06-09 10:12:25 +00:00
Rhys Perry
a714a19e16 aco/gfx12: fix VALUReadSGPRHazard with carry-out
fossil-db (gfx1201):
Totals from 370 (0.46% of 79653) affected shaders:
Instrs: 3933639 -> 3935914 (+0.06%)
CodeSize: 20743448 -> 20752068 (+0.04%); split: -0.00%, +0.04%
Latency: 26261246 -> 26261921 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 5363675 -> 5363760 (+0.00%); split: -0.00%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 65f95ae74e ("aco/insert_NOPs: implement VALU -> VALU case for VALUReadSGPRHazard on GFX12")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35387>
2025-06-09 10:12:25 +00:00
Marek Olšák
80236f2367 ac/nir/tess: add if/endif for HS threads in NIR instead of ACO/LLVM
This just removes the if/endif wrapping for LLVM, and hopefully the ACO
change does the same thing.

ACO had redundant code in endif_merged_wave_info, which is removed here.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>
2025-06-07 16:29:38 +00:00
Rhys Perry
86ccceb4de aco: don't consider gfx1153 to have point sample acceleration
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:55:13 +01:00
Rhys Perry
f10b49781d aco: make all wait entries linear
If we remove exec skips, then we can wait for an entry on all paths in the
linear cfg, but not the logical cfg.

fossil-db (gfx1201):
Totals from 0 (0.00% of 79653) affected shaders:

fossil-db (navi31):
Totals from 0 (0.00% of 79653) affected shaders:

fossil-db (navi21):
Totals from 1586 (1.99% of 79653) affected shaders:
Instrs: 5118897 -> 5113206 (-0.11%); split: -0.11%, +0.00%
CodeSize: 28365852 -> 28343696 (-0.08%); split: -0.08%, +0.00%
Latency: 47820341 -> 47799532 (-0.04%); split: -0.09%, +0.05%
InvThroughput: 9904391 -> 9908653 (+0.04%); split: -0.02%, +0.06%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:55:13 +01:00
Rhys Perry
1088ac49db aco: sometimes join linear wait entries on logical edges
fossil-db (gfx1201):
Totals from 1303 (1.64% of 79653) affected shaders:
Instrs: 6920949 -> 6917692 (-0.05%); split: -0.06%, +0.01%
CodeSize: 37112404 -> 37095728 (-0.04%); split: -0.05%, +0.01%
Latency: 70471343 -> 70365986 (-0.15%); split: -0.15%, +0.00%
InvThroughput: 11515673 -> 11504666 (-0.10%); split: -0.10%, +0.01%

fossil-db (navi31):
Totals from 1293 (1.62% of 79653) affected shaders:
Instrs: 6500186 -> 6496761 (-0.05%); split: -0.06%, +0.01%
CodeSize: 34562712 -> 34549236 (-0.04%); split: -0.04%, +0.01%
Latency: 68604746 -> 68666532 (+0.09%); split: -0.15%, +0.24%
InvThroughput: 11276591 -> 11284914 (+0.07%); split: -0.10%, +0.17%

fossil-db (navi21):
Totals from 811 (1.02% of 79653) affected shaders:
Instrs: 4110953 -> 4108788 (-0.05%); split: -0.05%, +0.00%
CodeSize: 22955984 -> 22948064 (-0.03%); split: -0.03%, +0.00%
Latency: 35070231 -> 35064448 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 6945610 -> 6945053 (-0.01%); split: -0.01%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:51:08 +01:00
Rhys Perry
c1f8537131 aco: skip waitcnt between two vmem writing different lanes
fossil-db (gfx1201):
Totals from 1382 (1.74% of 79653) affected shaders:
Instrs: 6531704 -> 6523935 (-0.12%); split: -0.12%, +0.00%
CodeSize: 34992076 -> 34933568 (-0.17%); split: -0.17%, +0.01%
Latency: 70183360 -> 69616066 (-0.81%); split: -0.81%, +0.00%
InvThroughput: 11155445 -> 11068667 (-0.78%); split: -0.78%, +0.00%

fossil-db (navi31):
Totals from 46 (0.06% of 79653) affected shaders:
Instrs: 1833768 -> 1833732 (-0.00%)
CodeSize: 9468788 -> 9468716 (-0.00%)
Latency: 11683092 -> 11667865 (-0.13%)
InvThroughput: 2274377 -> 2272872 (-0.07%)

fossil-db (navi21):
Totals from 0 (0.00% of 79653) affected shaders:

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:51:08 +01:00
Rhys Perry
9649deb50e aco: skip waitcnt between two vmem writing different halves
fossil-db (gfx1201):
Totals from 4 (0.01% of 79653) affected shaders:
Instrs: 41374 -> 41380 (+0.01%); split: -0.01%, +0.02%
CodeSize: 238912 -> 238924 (+0.01%); split: -0.01%, +0.01%
Latency: 706714 -> 706410 (-0.04%)
InvThroughput: 352269 -> 352118 (-0.04%)
VClause: 803 -> 798 (-0.62%)

fossil-db (navi31):
Totals from 0 (0.00% of 79653) affected shaders:

fossil-db (navi21):
Totals from 0 (0.00% of 79653) affected shaders:

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13028
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:51:08 +01:00
Rhys Perry
9a38ad3ca7 aco: add wait_entry::logical_events
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:51:08 +01:00
Rhys Perry
bb99de00f7 aco: add wait_entry::vm_mask
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:51:08 +01:00
Rhys Perry
b70ecfa588 aco: only join barrier_imm/barrier_events for logical edges
fossil-db (gfx1201):
Totals from 3 (0.00% of 79653) affected shaders:
Instrs: 2904 -> 2893 (-0.38%)
CodeSize: 14944 -> 14900 (-0.29%)
Latency: 14703 -> 14248 (-3.09%)
InvThroughput: 1237 -> 1210 (-2.18%)

fossil-db (navi31):
Totals from 3 (0.00% of 79653) affected shaders:
Instrs: 2742 -> 2731 (-0.40%)
CodeSize: 14136 -> 14092 (-0.31%)
Latency: 14744 -> 14287 (-3.10%)
InvThroughput: 1241 -> 1213 (-2.26%)

fossil-db (navi21):
Totals from 3 (0.00% of 79653) affected shaders:
Instrs: 2326 -> 2315 (-0.47%)
CodeSize: 12472 -> 12428 (-0.35%)
Latency: 14921 -> 14465 (-3.06%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:51:08 +01:00
Rhys Perry
62a9b4b976 aco: set vmem_types for args_pending_vmem
fossil-db (gfx1201):
Totals from 0 (0.00% of 79653) affected shaders:

fossil-db (navi31):
Totals from 11 (0.01% of 79653) affected shaders:
Instrs: 4543 -> 4554 (+0.24%)
CodeSize: 23256 -> 23300 (+0.19%)

fossil-db (navi21):
Totals from 8 (0.01% of 79653) affected shaders:
Instrs: 2333 -> 2341 (+0.34%)
CodeSize: 12328 -> 12360 (+0.26%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.0
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:51:08 +01:00
Georg Lehmann
a6675f35b2 aco: clamp exponent of 16bit ldexp
The hw uses only a 16bit int, but NIR's src is 32bit.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34073>
2025-06-03 06:34:18 +00:00
Rhys Perry
1fdfdbaf92 aco/hard_clauses: simplify and complete get_type()
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This now includes image_msaa_load and the new atomic instructions in
GFX12.

It also treats point sample accelerated MIMG as either sample or load,
like the waitcnt insertion pass. I'm not sure if that's necessary or not,
though.

No fossil-db changes (gfx1201, gfx1150 and navi31).

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35235>
2025-06-02 10:28:10 +00:00
Rhys Perry
8764ec0230 aco: consider image_msaa_load a sample operation before gfx12
LLVM commit 62dea99a7d7df9daedbb86133f3d46699cd2728d made this instruction
a sample for all GFX levels, then with f898161bfa95723954a273a519180e070a5ccd2e
it was changed to be GFX12+. Now 34b6285735c999d2fab77b0ff8e5b497d86df3af
changed it to be all GFX levels again.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35235>
2025-06-02 10:28:09 +00:00
Samuel Pitoiset
9692ef41a3 aco: implement bitfield_extract for 8-bit/16-bit
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35199>
2025-05-29 12:24:59 +00:00
Samuel Pitoiset
8596150ae8 aco: implement bitfield_reverse for types other than 32-bits
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34583>
2025-05-28 09:52:12 +00:00
Daniel Schürmann
5b4d284493 aco/isel: use vector-aligned operands for image_bvh64_intersect_ray
Totals from 93 (0.12% of 79377) affected shaders: (Navi48)
MaxWaves: 1376 -> 1368 (-0.58%)
Instrs: 3583500 -> 3581861 (-0.05%); split: -0.05%, +0.00%
CodeSize: 18792300 -> 18785296 (-0.04%); split: -0.04%, +0.00%
VGPRs: 8652 -> 8592 (-0.69%); split: -1.25%, +0.55%
Latency: 20861347 -> 20834407 (-0.13%); split: -0.17%, +0.04%
InvThroughput: 4032604 -> 4028020 (-0.11%); split: -0.14%, +0.03%
VClause: 90507 -> 90525 (+0.02%); split: -0.01%, +0.03%
Copies: 279429 -> 277839 (-0.57%); split: -0.58%, +0.01%
Branches: 100260 -> 100251 (-0.01%)
PreVGPRs: 8949 -> 8771 (-1.99%)
VALU: 1955635 -> 1954053 (-0.08%); split: -0.08%, +0.00%
SALU: 477347 -> 477329 (-0.00%); split: -0.01%, +0.01%
VOPD: 69 -> 61 (-11.59%)

Totals from 93 (0.12% of 79377) affected shaders: (Navi31)

MaxWaves: 1376 -> 1374 (-0.15%)
Instrs: 3442606 -> 3440344 (-0.07%); split: -0.07%, +0.00%
CodeSize: 17801008 -> 17790476 (-0.06%); split: -0.07%, +0.01%
VGPRs: 8652 -> 8556 (-1.11%); split: -1.25%, +0.14%
Latency: 20590943 -> 20542279 (-0.24%); split: -0.27%, +0.03%
InvThroughput: 3978133 -> 3969497 (-0.22%); split: -0.25%, +0.03%
VClause: 91784 -> 91769 (-0.02%); split: -0.05%, +0.03%
Copies: 277177 -> 275263 (-0.69%); split: -0.70%, +0.01%
Branches: 100098 -> 100092 (-0.01%); split: -0.02%, +0.01%
PreVGPRs: 9021 -> 8843 (-1.97%)
VALU: 2001794 -> 1999893 (-0.09%); split: -0.10%, +0.00%
SALU: 419504 -> 419559 (+0.01%); split: -0.01%, +0.02%
VOPD: 77 -> 64 (-16.88%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
2025-05-28 09:24:17 +00:00
Rhys Perry
c50f9541e4 aco/tests: Add tests for vector-aligned operands
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
2025-05-28 09:24:17 +00:00
Daniel Schürmann
b5382faa9c aco/validate: validate register assignment of vector-aligned operands
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
2025-05-28 09:24:17 +00:00
Daniel Schürmann
9091c3bf5b aco/ra: add affinities for MIMG vector-aligned operands
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
2025-05-28 09:24:17 +00:00
Daniel Schürmann
fb689f133e aco/ra: handle register assignment of vector-aligned operands
Vector-aligned operands are handled by temporarily allocating
a vector-SSA value for the duration of the instruction.
On completion of the register assignment, the individual
operands are assigned to the reserved register space and,
if necessary, parallelcopies are emitted.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
2025-05-28 09:24:17 +00:00
Daniel Schürmann
92b1154397 aco/ra: Always rename copy-kill operands, even if the temporary doesn't match
This makes it independent of whether the operand already got renamed or not.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
2025-05-28 09:24:17 +00:00
Daniel Schürmann
4fad3514a9 aco/ra: only change registers of already handled operands in update_renames()
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
2025-05-28 09:24:17 +00:00
Daniel Schürmann
51a2e1eb94 aco/ra: don't use kill-flags as indicator in get_reg_create_vector()
We are about to re-use this function for vector-aligned operands.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
2025-05-28 09:24:17 +00:00
Daniel Schürmann
3d8b355f22 aco/assembler: support vector-aligned operands on MIMG instructions
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
2025-05-28 09:24:17 +00:00
Daniel Schürmann
8cb1700c74 aco/print_ir: print parenthesis around vector-aligned operands
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
2025-05-28 09:24:16 +00:00
Daniel Schürmann
6aabcb02a1 aco/print_ir: only print 'lateKill' if requested via print_kill flag
Also only print lateKill for actually killed operands.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
2025-05-28 09:24:16 +00:00
Daniel Schürmann
a9645fdd89 aco: introduce concept of vector-aligned Operands
Operand::isVectorAligned indicates that the Operand is part of a vector
consisting of multiple operands. Therefore, it must reside in a register
aligned with the next Operand.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
2025-05-28 09:24:16 +00:00
Daniel Schürmann
a4fa3935fd aco/live_var_analysis: set same lateKill flags for same operands
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
2025-05-28 09:24:16 +00:00
Daniel Schürmann
ee0ee282b9 aco: simplify Operand() constructor
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
2025-05-28 09:24:16 +00:00
Rhys Perry
072e6d1ab5 aco/tests: add tests for tied definitions
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Some of these would have failed before the rewrite.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34700>
2025-05-20 15:40:47 +00:00
Rhys Perry
b341a12526 aco/ra: rewrite handling of tied definitions
The old version worked by precoloring both the operand and definition to
whatever register the operand was at the time. This didn't allow moving
the operand/definition after precoloring them, or two tied operands with
the same temporary.

The new version works by temporarily making the operands late kill and
creating a copy if the temporary is live-through or used as another tied
operand. Then we can simply later make the operands early kill again and
assign the definitions to the operand's register.

This way, we can move the operand to make space and the new location will
not intersect with any other definition and won't cause the operand and
definition registers to mismatch.

fossil-db (gfx1201):
Totals from 2253 (2.84% of 79377) affected shaders:
Instrs: 1634747 -> 1630799 (-0.24%); split: -0.27%, +0.03%
CodeSize: 8688148 -> 8672348 (-0.18%); split: -0.20%, +0.02%
VGPRs: 106500 -> 106512 (+0.01%)
Latency: 11385480 -> 11382965 (-0.02%); split: -0.04%, +0.01%
InvThroughput: 1754430 -> 1754326 (-0.01%); split: -0.01%, +0.00%
SClause: 38954 -> 38964 (+0.03%); split: -0.01%, +0.04%
Copies: 110772 -> 110800 (+0.03%); split: -0.02%, +0.04%
Branches: 29093 -> 29092 (-0.00%)
VALU: 902011 -> 902008 (-0.00%)
SALU: 260175 -> 260203 (+0.01%); split: -0.01%, +0.02%

fossil-db (navi31):
Totals from 2 (0.00% of 79377) affected shaders:
Latency: 1766 -> 1765 (-0.06%)
InvThroughput: 3219 -> 3215 (-0.12%)

fossil-db (navi21):
Totals from 14 (0.02% of 79377) affected shaders:
(no affected stats)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34700>
2025-05-20 15:40:47 +00:00