Commit graph

17706 commits

Author SHA1 Message Date
Marek Olšák
8d3e3c72e0 radv,radeonsi: merge PATCH_CONTROL_POINT & OUT_PATCH_CP into 1 field
One is only used by TCS, the other is only used by TES.
Use the same field for both, call it PATCH_VERTICES_IN.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>
2025-06-07 16:29:39 +00:00
Marek Olšák
534b282573 ac/nir/tess: adjust memory layout of TCS outputs to have aligned store offsets
There is a comment that explains it.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>
2025-06-07 16:29:38 +00:00
Marek Olšák
80236f2367 ac/nir/tess: add if/endif for HS threads in NIR instead of ACO/LLVM
This just removes the if/endif wrapping for LLVM, and hopefully the ACO
change does the same thing.

ACO had redundant code in endif_merged_wave_info, which is removed here.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>
2025-06-07 16:29:38 +00:00
Marek Olšák
cd366b57d9 ac/nir: implement load_subgroup_id/local_invocation_index for TCS on gfx6-10.x
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>
2025-06-07 16:29:38 +00:00
Rhys Perry
86ccceb4de aco: don't consider gfx1153 to have point sample acceleration
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:55:13 +01:00
Rhys Perry
f10b49781d aco: make all wait entries linear
If we remove exec skips, then we can wait for an entry on all paths in the
linear cfg, but not the logical cfg.

fossil-db (gfx1201):
Totals from 0 (0.00% of 79653) affected shaders:

fossil-db (navi31):
Totals from 0 (0.00% of 79653) affected shaders:

fossil-db (navi21):
Totals from 1586 (1.99% of 79653) affected shaders:
Instrs: 5118897 -> 5113206 (-0.11%); split: -0.11%, +0.00%
CodeSize: 28365852 -> 28343696 (-0.08%); split: -0.08%, +0.00%
Latency: 47820341 -> 47799532 (-0.04%); split: -0.09%, +0.05%
InvThroughput: 9904391 -> 9908653 (+0.04%); split: -0.02%, +0.06%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:55:13 +01:00
Rhys Perry
1088ac49db aco: sometimes join linear wait entries on logical edges
fossil-db (gfx1201):
Totals from 1303 (1.64% of 79653) affected shaders:
Instrs: 6920949 -> 6917692 (-0.05%); split: -0.06%, +0.01%
CodeSize: 37112404 -> 37095728 (-0.04%); split: -0.05%, +0.01%
Latency: 70471343 -> 70365986 (-0.15%); split: -0.15%, +0.00%
InvThroughput: 11515673 -> 11504666 (-0.10%); split: -0.10%, +0.01%

fossil-db (navi31):
Totals from 1293 (1.62% of 79653) affected shaders:
Instrs: 6500186 -> 6496761 (-0.05%); split: -0.06%, +0.01%
CodeSize: 34562712 -> 34549236 (-0.04%); split: -0.04%, +0.01%
Latency: 68604746 -> 68666532 (+0.09%); split: -0.15%, +0.24%
InvThroughput: 11276591 -> 11284914 (+0.07%); split: -0.10%, +0.17%

fossil-db (navi21):
Totals from 811 (1.02% of 79653) affected shaders:
Instrs: 4110953 -> 4108788 (-0.05%); split: -0.05%, +0.00%
CodeSize: 22955984 -> 22948064 (-0.03%); split: -0.03%, +0.00%
Latency: 35070231 -> 35064448 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 6945610 -> 6945053 (-0.01%); split: -0.01%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:51:08 +01:00
Rhys Perry
c1f8537131 aco: skip waitcnt between two vmem writing different lanes
fossil-db (gfx1201):
Totals from 1382 (1.74% of 79653) affected shaders:
Instrs: 6531704 -> 6523935 (-0.12%); split: -0.12%, +0.00%
CodeSize: 34992076 -> 34933568 (-0.17%); split: -0.17%, +0.01%
Latency: 70183360 -> 69616066 (-0.81%); split: -0.81%, +0.00%
InvThroughput: 11155445 -> 11068667 (-0.78%); split: -0.78%, +0.00%

fossil-db (navi31):
Totals from 46 (0.06% of 79653) affected shaders:
Instrs: 1833768 -> 1833732 (-0.00%)
CodeSize: 9468788 -> 9468716 (-0.00%)
Latency: 11683092 -> 11667865 (-0.13%)
InvThroughput: 2274377 -> 2272872 (-0.07%)

fossil-db (navi21):
Totals from 0 (0.00% of 79653) affected shaders:

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:51:08 +01:00
Rhys Perry
9649deb50e aco: skip waitcnt between two vmem writing different halves
fossil-db (gfx1201):
Totals from 4 (0.01% of 79653) affected shaders:
Instrs: 41374 -> 41380 (+0.01%); split: -0.01%, +0.02%
CodeSize: 238912 -> 238924 (+0.01%); split: -0.01%, +0.01%
Latency: 706714 -> 706410 (-0.04%)
InvThroughput: 352269 -> 352118 (-0.04%)
VClause: 803 -> 798 (-0.62%)

fossil-db (navi31):
Totals from 0 (0.00% of 79653) affected shaders:

fossil-db (navi21):
Totals from 0 (0.00% of 79653) affected shaders:

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13028
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:51:08 +01:00
Rhys Perry
9a38ad3ca7 aco: add wait_entry::logical_events
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:51:08 +01:00
Rhys Perry
bb99de00f7 aco: add wait_entry::vm_mask
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:51:08 +01:00
Rhys Perry
b70ecfa588 aco: only join barrier_imm/barrier_events for logical edges
fossil-db (gfx1201):
Totals from 3 (0.00% of 79653) affected shaders:
Instrs: 2904 -> 2893 (-0.38%)
CodeSize: 14944 -> 14900 (-0.29%)
Latency: 14703 -> 14248 (-3.09%)
InvThroughput: 1237 -> 1210 (-2.18%)

fossil-db (navi31):
Totals from 3 (0.00% of 79653) affected shaders:
Instrs: 2742 -> 2731 (-0.40%)
CodeSize: 14136 -> 14092 (-0.31%)
Latency: 14744 -> 14287 (-3.10%)
InvThroughput: 1241 -> 1213 (-2.26%)

fossil-db (navi21):
Totals from 3 (0.00% of 79653) affected shaders:
Instrs: 2326 -> 2315 (-0.47%)
CodeSize: 12472 -> 12428 (-0.35%)
Latency: 14921 -> 14465 (-3.06%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:51:08 +01:00
Rhys Perry
62a9b4b976 aco: set vmem_types for args_pending_vmem
fossil-db (gfx1201):
Totals from 0 (0.00% of 79653) affected shaders:

fossil-db (navi31):
Totals from 11 (0.01% of 79653) affected shaders:
Instrs: 4543 -> 4554 (+0.24%)
CodeSize: 23256 -> 23300 (+0.19%)

fossil-db (navi21):
Totals from 8 (0.01% of 79653) affected shaders:
Instrs: 2333 -> 2341 (+0.34%)
CodeSize: 12328 -> 12360 (+0.26%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.0
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:51:08 +01:00
Samuel Pitoiset
babeb975c4 radv,radeonsi: fix emitting UPDATE_DB_SUMMARIZER_TIMEOUT on GFX12
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Not all PFP firmwares for GFX12 have this packet.

Fixes: 47f5d25f93 ("radv,radeonsi: emit UPDATE_DB_SUMMARIZER_TIMEOUT on GFX12")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13312
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35370>
2025-06-05 16:51:07 +00:00
Rhys Perry
00a2ed60f8 radv/meta: use unsigned min in copy/fill shaders
Otherwise, this would break >2 GiB copy/fill.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport: 25.1
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35343>
2025-06-05 09:55:32 +00:00
Georg Lehmann
297fdc6636 radv: don't accidentally expose samplerFilterMinmax through Vulkan 1.2
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35339>
2025-06-05 09:01:19 +00:00
Marek Olšák
c3034fa82c amd: replace most u_bit_consecutive* with BITFIELD_MASK/RANGE
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35346>
2025-06-04 17:46:38 +00:00
David Rosca
e579b982b0 radv/video: Set all pic params for H264 encode refs
Fixes encoding B-frames with I-frame as L1 reference.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35283>
2025-06-04 11:33:02 +00:00
David Rosca
92e99e6169 radv/video: Add radv_enc_h264/5_pic_type
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35283>
2025-06-04 11:33:02 +00:00
Samuel Pitoiset
098c15bfc9 radv: use paired shader registers for graphics on GFX12
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Loosely based on RadeonSI.

This is supposed to be faster because parsing the packet header seems
to be the main bottleneck on GFX12.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35282>
2025-06-04 09:17:51 +00:00
Samuel Pitoiset
c8b3c92a3e radv: add macros for paired shader registers on GFX12
Imported from RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35282>
2025-06-04 09:17:51 +00:00
Samuel Pitoiset
c8f9e0fb05 radv: add a new dirty state for emitting tess user SGPRs
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35282>
2025-06-04 09:17:51 +00:00
Georg Lehmann
c27cdaac70 radv: expose scalarBlockLayout on GFX6
Scalar block layout doesn't allow anything that our memory load/store vectorizer
couldn't create on its own. So I assume whatever reason there was to only
expose this feature on GFX7+ was incorrect or ended up being fixed.

Passes vkcts in CI on tahiti.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35279>
2025-06-04 08:49:57 +00:00
Karol Herbst
4f5ce2d5aa ac/nir: fix unaligned single component load/stores
This fixes two problems:
1. we need to lower the bit_size according to the alignment.
2. num_components could end up being 0, so we need to round up instead.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13102
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34976>
2025-06-03 13:14:31 +00:00
Samuel Pitoiset
94a4ba5b4d radv/ci: bump the timeout for radv-polaris10-vkcts
Looks like it's actually also affected by the memory explosion caused
by zerovram alloc by default in AMDGPU. Though it's very random,
sometimes the job will finish in 40 minutes, sometimes it needs more
than 1h15m. Let's bump the timeout because it's a post-merge job.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35157>
2025-06-03 10:18:30 +00:00
Rhys Perry
2e82f481ca radv: fix too large shift exponent in radv_remove_color_exports
"shift exponent 1020 is too large for 32-bit type 'unsigned int'" with
madmax/25b8180e05220b8c and UBSan

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35255>
2025-06-03 09:45:01 +00:00
Valentine Burley
3a0cc0ee0d ci: Use zstd compressed kernel modules
Change how we package kernel modules: instead of storing them in
.tar.zst archives with uncompressed .ko files inside, we now compress
each .ko file individually with ZSTD and bundle them into a plain tar
archive.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35129>
2025-06-03 07:27:26 +00:00
Georg Lehmann
a6675f35b2 aco: clamp exponent of 16bit ldexp
The hw uses only a 16bit int, but NIR's src is 32bit.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34073>
2025-06-03 06:34:18 +00:00
Natalie Vock
dac6f09451 radv/rt: Report 256 byte alignment for scratch
This mirrors AMDVLK. 128-byte alignment is possible, but DOOM: The Dark
Ages screws up scratch allocation with alignments <256 bytes.

Fixes hangs in DOOM: The Dark Ages.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35152>
2025-06-02 19:52:51 +00:00
Natalie Vock
6628ac8ad9 radv/rt: Avoid encoding infinities in box node coords
On Navi33, certain box sorting modes combined with infinity/-infinity in
the child AABBs cause image_bvh64_intersect_ray to return garbage node
pointers.

To avoid this, convert infinity to the maximum representable
floating-point value, which will still intersect with any non-inf ray.

Fixes consistent hangs in DOOM: The Dark Ages.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35254>
2025-06-02 19:33:18 +00:00
Rhys Perry
1fdfdbaf92 aco/hard_clauses: simplify and complete get_type()
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This now includes image_msaa_load and the new atomic instructions in
GFX12.

It also treats point sample accelerated MIMG as either sample or load,
like the waitcnt insertion pass. I'm not sure if that's necessary or not,
though.

No fossil-db changes (gfx1201, gfx1150 and navi31).

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35235>
2025-06-02 10:28:10 +00:00
Rhys Perry
8764ec0230 aco: consider image_msaa_load a sample operation before gfx12
LLVM commit 62dea99a7d7df9daedbb86133f3d46699cd2728d made this instruction
a sample for all GFX levels, then with f898161bfa95723954a273a519180e070a5ccd2e
it was changed to be GFX12+. Now 34b6285735c999d2fab77b0ff8e5b497d86df3af
changed it to be all GFX levels again.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35235>
2025-06-02 10:28:09 +00:00
David Rosca
960f63596f radv/video: Add VCN5 encode support
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
New with VCN5 is separate reference images support.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35261>
2025-06-02 09:30:30 +00:00
David Rosca
4a3b3febda radv/video: Enable decode on VCN5
No differences from VCN4 for tier2.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13118
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35261>
2025-06-02 09:30:30 +00:00
David Rosca
25f7996395 radv/video: Set correct minCodedExtent for encode
Cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35261>
2025-06-02 09:30:30 +00:00
David Rosca
ef305f3875 radv: Use RADEON_SURF_VIDEO_REFERENCE for video DPB images
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35261>
2025-06-02 09:30:30 +00:00
Samuel Pitoiset
47f5d25f93 radv,radeonsi: emit UPDATE_DB_SUMMARIZER_TIMEOUT on GFX12
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This try to mitigate the HiZ GPU hang by increasing a timeout. Loosely
based on PAL but I can confirm it delays the hang when
BOTTOM_OF_PIPE_TS is used as a workaround.

This must be emitted when the GFX queue is idle.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35212>
2025-06-02 07:30:18 +00:00
David Rosca
8f4e251c98 radeonsi/vcn: Support disabling HEVC dependent slice segments
With older FW this needs to be always enabled, but it can now be
disabled when using the new separate header instructions for
dependent_slice_segment_flag and slice_segment_address.

Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35072>
2025-05-30 08:29:53 +00:00
Samuel Pitoiset
9692ef41a3 aco: implement bitfield_extract for 8-bit/16-bit
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35199>
2025-05-29 12:24:59 +00:00
Samuel Pitoiset
fe2c93a788 ac/nir: enable 64-bit lowering for bitfield_extract
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35187>
2025-05-29 08:45:41 +02:00
Samuel Pitoiset
8596150ae8 aco: implement bitfield_reverse for types other than 32-bits
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34583>
2025-05-28 09:52:12 +00:00
Daniel Schürmann
5b4d284493 aco/isel: use vector-aligned operands for image_bvh64_intersect_ray
Totals from 93 (0.12% of 79377) affected shaders: (Navi48)
MaxWaves: 1376 -> 1368 (-0.58%)
Instrs: 3583500 -> 3581861 (-0.05%); split: -0.05%, +0.00%
CodeSize: 18792300 -> 18785296 (-0.04%); split: -0.04%, +0.00%
VGPRs: 8652 -> 8592 (-0.69%); split: -1.25%, +0.55%
Latency: 20861347 -> 20834407 (-0.13%); split: -0.17%, +0.04%
InvThroughput: 4032604 -> 4028020 (-0.11%); split: -0.14%, +0.03%
VClause: 90507 -> 90525 (+0.02%); split: -0.01%, +0.03%
Copies: 279429 -> 277839 (-0.57%); split: -0.58%, +0.01%
Branches: 100260 -> 100251 (-0.01%)
PreVGPRs: 8949 -> 8771 (-1.99%)
VALU: 1955635 -> 1954053 (-0.08%); split: -0.08%, +0.00%
SALU: 477347 -> 477329 (-0.00%); split: -0.01%, +0.01%
VOPD: 69 -> 61 (-11.59%)

Totals from 93 (0.12% of 79377) affected shaders: (Navi31)

MaxWaves: 1376 -> 1374 (-0.15%)
Instrs: 3442606 -> 3440344 (-0.07%); split: -0.07%, +0.00%
CodeSize: 17801008 -> 17790476 (-0.06%); split: -0.07%, +0.01%
VGPRs: 8652 -> 8556 (-1.11%); split: -1.25%, +0.14%
Latency: 20590943 -> 20542279 (-0.24%); split: -0.27%, +0.03%
InvThroughput: 3978133 -> 3969497 (-0.22%); split: -0.25%, +0.03%
VClause: 91784 -> 91769 (-0.02%); split: -0.05%, +0.03%
Copies: 277177 -> 275263 (-0.69%); split: -0.70%, +0.01%
Branches: 100098 -> 100092 (-0.01%); split: -0.02%, +0.01%
PreVGPRs: 9021 -> 8843 (-1.97%)
VALU: 2001794 -> 1999893 (-0.09%); split: -0.10%, +0.00%
SALU: 419504 -> 419559 (+0.01%); split: -0.01%, +0.02%
VOPD: 77 -> 64 (-16.88%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
2025-05-28 09:24:17 +00:00
Rhys Perry
c50f9541e4 aco/tests: Add tests for vector-aligned operands
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
2025-05-28 09:24:17 +00:00
Daniel Schürmann
b5382faa9c aco/validate: validate register assignment of vector-aligned operands
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
2025-05-28 09:24:17 +00:00
Daniel Schürmann
9091c3bf5b aco/ra: add affinities for MIMG vector-aligned operands
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
2025-05-28 09:24:17 +00:00
Daniel Schürmann
fb689f133e aco/ra: handle register assignment of vector-aligned operands
Vector-aligned operands are handled by temporarily allocating
a vector-SSA value for the duration of the instruction.
On completion of the register assignment, the individual
operands are assigned to the reserved register space and,
if necessary, parallelcopies are emitted.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
2025-05-28 09:24:17 +00:00
Daniel Schürmann
92b1154397 aco/ra: Always rename copy-kill operands, even if the temporary doesn't match
This makes it independent of whether the operand already got renamed or not.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
2025-05-28 09:24:17 +00:00
Daniel Schürmann
4fad3514a9 aco/ra: only change registers of already handled operands in update_renames()
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
2025-05-28 09:24:17 +00:00
Daniel Schürmann
51a2e1eb94 aco/ra: don't use kill-flags as indicator in get_reg_create_vector()
We are about to re-use this function for vector-aligned operands.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
2025-05-28 09:24:17 +00:00
Daniel Schürmann
3d8b355f22 aco/assembler: support vector-aligned operands on MIMG instructions
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
2025-05-28 09:24:17 +00:00