Commit graph

11401 commits

Author SHA1 Message Date
Eric Engestrom
78578a6ddb vk: move radv's linker symbols scripts for use in all drivers
Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21631>
2023-03-04 07:41:10 +00:00
Konstantin Seurer
f094e69469 radv/rt: Use ushr for extracting the cull mask
Fixes the following tests:
dEQP-VK.ray_tracing_pipeline.acceleration_structures.ray_cull_mask.gpu_built.ahit.4_bits
dEQP-VK.ray_tracing_pipeline.acceleration_structures.ray_cull_mask.gpu_built.ahit.16_bits
dEQP-VK.ray_tracing_pipeline.acceleration_structures.ray_cull_mask.gpu_built.chit.4_bits
dEQP-VK.ray_tracing_pipeline.acceleration_structures.ray_cull_mask.gpu_built.chit.16_bits
dEQP-VK.ray_tracing_pipeline.acceleration_structures.ray_cull_mask.gpu_built.isec.4_bits
dEQP-VK.ray_tracing_pipeline.acceleration_structures.ray_cull_mask.gpu_built.isec.16_bits

Fixes: 2d93ab7 ("radv/rt: Pre shift cull_mask")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21659>
2023-03-03 23:56:49 +00:00
Timur Kristóf
05e6d945ad radv: Emulate VGT_ESGS_ITEMSIZE in shaders on GFX9+.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21434>
2023-03-03 20:15:10 +00:00
Rhys Perry
8aff7152a0 aco: make IDSet sparse
Improves compilation time of huge shaders.

A ray tracing pipeline of Hellblade: Senua's Sacrifice compiles in about
half the time, with this patch.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8179
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21022>
2023-03-03 17:45:14 +00:00
Rhys Perry
736d6643bb aco/tests: add tests for v_fma_f32 with 2 fp16 literals
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21633>
2023-03-03 14:20:55 +00:00
Samuel Pitoiset
3e4541bb56 radv/ci: adjust timeouts for Vega10 and Renoir
With latest CTS it takes much more time.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21390>
2023-03-03 08:23:22 +00:00
Samuel Pitoiset
f775873f81 ci: uprev CTS to 1.3.5.0
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21390>
2023-03-03 08:23:21 +00:00
Samuel Pitoiset
3b9937c85e radv: stop allocationg the attr ring BO for compute queues on GFX11
Only needed for graphics. This saves ~8Mib of 32-bit VRAM per compute
queue.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21632>
2023-03-03 07:27:21 +00:00
Hans-Kristian Arntzen
b7926303e6 radv: Expose VK_EXT_swapchain_maintenance1.
Passes dEQP-VK.wsi.*.maintenance1.*.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Joshua Ashton <joshua@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20235>
2023-03-03 03:59:13 +00:00
Marek Olšák
4f7e353237 amd: lower multi-component subdword SSBO loads in NIR
because the hw and LLVM only support subdword single-component SSBO loads,
and ac_nir_to_llvm splits multi-component loads because of that, which is
inefficient.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19399>
2023-03-03 03:27:40 +00:00
Marek Olšák
82919e2dcb amd: lower subdword UBO loads in NIR
This fixes broken subdword UBO loads with LLVM.

It's only needed for LLVM, but it's done for both LLVM and ACO because
the pass can be fully validated only with ACO and the Vulkan CTS right now.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19399>
2023-03-03 03:27:40 +00:00
Marek Olšák
1a424fee4a ac/llvm: implement nir_op_unpack_32_4x8
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19399>
2023-03-03 03:27:40 +00:00
Marek Olšák
6aee999131 aco: implement nir_op_unpack_32_4x8
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19399>
2023-03-03 03:27:40 +00:00
Marek Olšák
09005e6dfc ac/nir: add ac_nir_lower_subdword_loads to lower 8/16-bit loads to 32 bits
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19399>
2023-03-03 03:27:40 +00:00
Marek Olšák
0669d7c29b radeonsi: remove Smart Access Memory because CPU access has large overhead
Related: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8176

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21641>
2023-03-03 00:41:49 +00:00
Marek Olšák
3c9aa3e201 amd/rtld: allow 64K LDS for all shader stages except for gfx6
Gfx6 can only use 32K LDS per workgroup.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21641>
2023-03-03 00:41:48 +00:00
Marek Olšák
ccaaf8fe04 amd: massively simplify how info->spi_cu_en is applied
Instead of having ac_set_reg_cu_en that sets the register, replace it with
ac_apply_cu_en that only returns the modified register value,
which allows a large simplification in both drivers because a lot of code
becomes duplicated after it's switched to ac_apply_cu_en.

RADV also didn't apply it to a few registers. Fixed.

This removes 82 lines of code in total.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21641>
2023-03-03 00:41:48 +00:00
Marek Olšák
2b3f551ed8 ac/nir: don't use load_esgs_vertex_stride_amd on gfx6-8
An improvement for 9f1e6d8f70.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21641>
2023-03-03 00:41:48 +00:00
Marek Olšák
79732416fd amd: query cache sizes from the kernel
Also rename l1_cache_size -> tcp_cache_size. L1 means shader array cache.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21641>
2023-03-03 00:41:48 +00:00
Marek Olšák
6e2e89e6d8 amd,radeonsi: change enabled_rb_mask to 64 bits
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21641>
2023-03-03 00:41:48 +00:00
Marek Olšák
03ffb8d77c amd: update amdgpu_drm.h
From kernel commit 817714d9665e.

Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21641>
2023-03-03 00:41:48 +00:00
Rhys Perry
dc01f03d1b radv: remove is_internal pipeline creation parameter
Instead, check if the cache is the meta shader cache. This catches the
shaders created by radv_create_radix_sort_u64().

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21606>
2023-03-02 16:48:09 +00:00
Samuel Pitoiset
4ec6850210 radv: fix DCC decompress on GFX11
The hardware requires one color output to be set by CB registers,
otherwise the DCC decompression does nothing.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8127
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8175
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8370
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21629>
2023-03-02 16:03:31 +00:00
Tatsuyuki Ishi
57ab623f0b radv: Use common helpers to translate format in SDMA copy.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21585>
2023-03-02 15:29:47 +00:00
Tatsuyuki Ishi
4f681d5e2c radv: Remove SDMA padding from copy helpers.
These are handled in winsys already; no need to duplicate and complicate
the code paths.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21585>
2023-03-02 15:29:47 +00:00
Tatsuyuki Ishi
e9a55b332a radv: SDMA v4 size field is size - 1
After cross-checking with kernel and the old buffer copy code, it seems
that the size field should be size - 1 instead.

Fixes: 7b5ab48c40 ("radv: partial sdma support")

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21585>
2023-03-02 15:29:47 +00:00
Samuel Pitoiset
427fd83d27 radv: use new EVENT_WRITE_ZPASS packet3 on GFX11
Ported from RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21621>
2023-03-02 12:53:27 +00:00
Samuel Pitoiset
87444bb7ab radv: ignore alpha_is_on_msb on GFX11 because the hw ignores it
Ported from RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21621>
2023-03-02 12:53:27 +00:00
Georg Lehmann
ede0630f9e aco: use v_fma_mix_f32 for v_fma_f32 with 2 fp16 representable, different literals
We can pack two fp16 literals into one 32bit literal and use opsel to select
the correct value. Note that LLVM currently disassembles these instructions
incorrectly.

Foz-DB Navi21:
Totals from 13365 (9.91% of 134913) affected shaders:
VGPRs: 840880 -> 840016 (-0.10%); split: -0.11%, +0.01%
SpillSGPRs: 724 -> 722 (-0.28%)
CodeSize: 82439364 -> 82451336 (+0.01%); split: -0.06%, +0.08%
MaxWaves: 244858 -> 244980 (+0.05%)
Instrs: 15265976 -> 15247201 (-0.12%); split: -0.13%, +0.01%
Latency: 223316180 -> 223272495 (-0.02%); split: -0.03%, +0.02%
InvThroughput: 41981375 -> 41969917 (-0.03%); split: -0.04%, +0.01%
VClause: 266775 -> 266558 (-0.08%); split: -0.14%, +0.06%
SClause: 646602 -> 645996 (-0.09%); split: -0.16%, +0.07%
Copies: 794703 -> 776075 (-2.34%); split: -2.46%, +0.12%
Branches: 296317 -> 296316 (-0.00%)
PreSGPRs: 658796 -> 656479 (-0.35%); split: -0.35%, +0.00%
PreVGPRs: 744014 -> 743679 (-0.05%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20587>
2023-03-02 10:59:05 +00:00
Georg Lehmann
ed349951cb aco: mark mad definition as precise if the mul/add were precise
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20587>
2023-03-02 10:59:05 +00:00
Samuel Pitoiset
f19fccd9f8 amd,ac/rgp: fix SQTT memory types
This crashed on Steam Deck because the memory type is LPDDR5 and it
wasn't not handled in the switch. It seems the kernel changed the
memory type returned for VanGogh because it used to work.

Fixes: aef7ea868f ("ac/gpu_info: handle LPDDR4 and 5 in ac_memory_ops_per_clock")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21627>
2023-03-02 07:54:35 +00:00
Chia-I Wu
e2c67ed63e ci/radv: remove dEQP-VK.image.sample_texture.* fails/flakes
They were fixed since commit 11b2a063bf ("vulkan: Unconditionally add
barriers for missing external subpass deps").

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21616>
2023-03-02 00:56:16 +00:00
Samuel Pitoiset
3ced4ae816 radv: only expose EXT_pipeline_library_group_handles if RT is enabled
VK_EXT_pipeline_library_group_handles requires
VK_KHR_ray_tracing_pipeline to be enabled.

Fixes dEQP-VK.info.device_extensions.

Fixes: ed76833705 ("radv: Implement & expose VK_EXT_pipeline_library_group_handles.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21618>
2023-03-01 10:55:00 +00:00
Dave Airlie
24c09d4b16 radv: add video format support to format probing.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21457>
2023-03-01 07:16:47 +00:00
Tatsuyuki Ishi
bab235106e radv: Replace radv_trap_handler_shader with radv_shader.
Now that the upload memory is tied to the shader itself, the trap handler
shader no longer needs an additional wrapper.

This is a cleanup to ease introduction of a new shader uploading code path.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21541>
2023-03-01 05:12:10 +00:00
Konstantin Seurer
5ce99bc568 radv: Only init geometry infos if RRA is enabled
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21532>
2023-02-28 20:49:33 +00:00
Konstantin Seurer
7bd265bc86 radv: Move header and geometry info init into separate functions
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21532>
2023-02-28 20:49:33 +00:00
Samuel Pitoiset
c356f1b4ed radv: fix draw calls with 0-sized index buffers and robustness on NAVI10
The correct workaround is to bind an internal index buffer to handle
robustness2 correctly.

Fixes dEQP-VK.robustness.index_access.* in CTS 1.3.5.0 on NAVI10.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21471>
2023-02-28 14:12:29 +00:00
Samuel Pitoiset
7c62f6fa01 radv: fix flushing non-coherent images in EndCommandBuffer()
The condition was inverted.

This doesn't fix anything known.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21549>
2023-02-28 09:35:07 +00:00
Samuel Pitoiset
6750a9094f radv: fix flushing non-coherent images inside secondaries on GFX9+
Fixes
dEQP-VK.draw.dynamic_rendering.complete_secondary_cmd_buff.multi_draw.mosaic.*
on VEGA10 (related to the use of HTILE).

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21549>
2023-02-28 09:35:07 +00:00
Qiang Yu
4b3a22fcd4 aco: only ls and ps use store output now
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21435>
2023-02-28 07:19:29 +00:00
Qiang Yu
e9616d1d2a ac/llvm: only init outputs when fragment shader for radv
LS pass output to TCS by reg is not enabled when LLVM.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21435>
2023-02-28 07:19:29 +00:00
Georg Lehmann
e1eabab6fe aco/optimizer_postRA: assume all registers are untrackable in loop headers
Register writes from the pre-header might not be correct for any but
the first loop iteration because they can be clobbered inside the loop.

Foz-DB Navi21:
Totals from 18 (0.01% of 134913) affected shaders:
CodeSize: 251384 -> 251508 (+0.05%)
Instrs: 47644 -> 47664 (+0.04%)
Latency: 801801 -> 801852 (+0.01%)
InvThroughput: 177579 -> 177593 (+0.01%)
Copies: 4752 -> 4771 (+0.40%)

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8376
Fixes: d3b0f78110 ("aco/optimizer_postRA: Initialize loop header with preheader information")

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21540>
2023-02-28 04:27:05 +00:00
Emma Anholt
fef6e6588b ci: Update traces expectations for gutting glsl opt_algebraic.
All look like harmless changes.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21475>
2023-02-28 03:36:09 +00:00
Rhys Perry
75d9a4a6ce aco: always update orig_names in get_reg_phi()
No idea why this wasn't done if pc.first was a renamed temporary.

Fixes navi10 RA validation error with
dEQP-VK.binding_model.descriptor_buffer.multiple.graphics_geom_buffers1_sets3_imm_samplers

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8349
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21501>
2023-02-27 15:10:22 +00:00
Eric Engestrom
735df516e9 radv: split linker script for android since it requires different symbols
Fixes: 4956f6d0bf ("radv: Add Android module info to linker script.")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8338
Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21518>
2023-02-27 14:34:16 +00:00
Mike Blumenkrantz
7c8a5f6e37 vulkan/wsi: switch to using an options struct for last param
this makes adding values easier since the drivers won't need to be updated

Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21447>
2023-02-27 13:21:21 +00:00
Georg Lehmann
1c5c2f77c3 aco: use and swizzle mask in dpp quad perm
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21412>
2023-02-27 11:09:42 +00:00
Georg Lehmann
8fabde3be4 aco/gfx11: use dpp_row_xmask and dpp_row_share
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21412>
2023-02-27 11:09:42 +00:00
Georg Lehmann
b7cd0eb439 aco: use v_permlane(x)16_b32 for masked swizzle
Should be cheaper than ds_swizzle.

Totals from 8 (0.01% of 134913) affected shaders:
CodeSize: 16316 -> 16388 (+0.44%)
Instrs: 3088 -> 3086 (-0.06%)
Latency: 49558 -> 49508 (-0.10%)
InvThroughput: 9180 -> 9198 (+0.20%)
Copies: 376 -> 384 (+2.13%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21412>
2023-02-27 11:09:42 +00:00