Fixes the following clang warning:
mesa/src/amd/compiler/aco_optimizer.cpp:2928:15: warning: moving a temporary object prevents copy elision [-Wpessimizing-move]
ctx.uses = std::move(dead_code_analysis(program));
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5228>
GFX6 and GFX7 don't have the ds_bpermute (or permute) instruction,
but we would like to support subgroup shuffle on these old GPUs.
So we introduce a new pseudio instruction which will be lowered
to an "unrolled loop" that emulates bpermute on GFX6 and GFX7
using readlane instructions, while also respecting the exec mask
thanks to v_cmpx.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5223>
The emulated GFX10 wave64 bpermute no longer needs a linear_vgpr,
so we don't consider it a reduction anymore. Additionally, the
code is slightly reorganized in preparation for the GFX6 emulated
bpermute.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5223>
This uses a meson builtin to handle -fvisibility=hidden. This is nice
because we don't need to track which languages are used, if C++ is
suddenly added meson just does the right thing.
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4740>
If the ~/.cache dir already exists continue on without failing.
Fixes: cd61f5234d ("radv: Handle failing to create .cache dir.")
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5249>
Because some 16-bit instructions are already VOP3 on GFX10, we use
the 32-bit variants to remove the temporary VGPR and to use DDP with
the arithmetic instructions.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5148>
Some 16-bit instructions are VOP3 on GFX10 and we have to emit a
32-bit DPP mov followed by the ALU instruction.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5148>
ACO should already support Wave32 on GFX10 with all shader stages
and CTS pass. RADV currently only allows Wave32 with the compute
shader stage.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5056>
This extension only allows HLSL shader compilers to optionally embed
unambiguous type information which can be safely ignored by the driver.
This fixes a crash with the recent Vulkan backend of Path Of Exile
(it uses the extension without checking if it's supported).
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5237>
Unless we're reordering it around a barrier of the same type
No shader-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4880>
Totals from 11 (0.01% of 127638) affected shaders:
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 93c8ebfa78 ('aco: Initial commit of independent AMD compiler')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4880>
The cu bitmap in amd gpu info structure is
4x4 size array, and it's usually suitable for Vega
ASICs which has 4*2 SE/SH layout.
But for Arcturus, SE/SH layout is changed to 8*1.
To mostly reduce the impact, we make it compatible
with current bitmap array as below:
SE4,SH0 --> cu_bitmap[0][1]
SE5,SH0 --> cu_bitmap[1][1]
SE6,SH0 --> cu_bitmap[2][1]
SE7,SH0 --> cu_bitmap[3][1]
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5212>
This was implemented in version 6 of the VK_ANDROID_native_buffer
extension and we only implement version 5. However, the Android
Vulkan loader only checks whether vkGetInstanceProcAddr for the
function is not NULL.
This all went wrong when we switched to the layer code from ANV.
Because the function may now be different per device, it adds fallback
functions that dispatch to the dispatch table. So if we didn't implement
the function we still returned a pointer to the dispatch function,
which made the Android Vulkan loader believe it was supported.
Dispatch functions:
d555794f30/src/amd/vulkan/radv_entrypoints_gen.py (L328)
Fixes: d555794f30 "radv: update entrypoints generation from ANV"
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2936
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5198>
The compiler should emit s_memtime instead of s_memrealtime for
the subgroup scope. I don't know why this LLVM 9 checks was for
but LLVM 8 also has this amdgcn intrinsic.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5117>