16-bit med3 is only supported on GFX9+.
Fixes dEQP-VK.spirv_assembly.instruction.amd_trinary_minmax.mid3.f16.*.
Fixes: d6a07732c9 ("ac: use llvm.amdgcn.fmed3 intrinsic for nir_op_fmed3")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3962>
Lower 64-bit fmed3 because LLVM doesn't expose an intrinsic.
Fixes dEQP-VK.spirv_assembly.instruction.amd_trinary_minmax.mid3.f64.*.
Fixes: d6a07732c9 ("ac: use llvm.amdgcn.fmed3 intrinsic for nir_op_fmed3")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3962>
Because ac_build_optimization_barrier() overwrites the original
src_type, we have to keep track of it before emitting that barrier.
Otherwise, wrong conversions are expected for pointers or small
bitsizes.
By doing this, we no longer need to do the cast dance in
ac_build_readlane_no_opt_barrier(), it was just necessary for
ac_build_optimization_barrier().
This fixes a bunch of crashes with subgroups related tests when
RADV_DEBUG=checkir is enabled, and it also fixes a compiler crash
with The Surge 2.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2395
Fixes: 0f45d4dc2b ("ac: add ac_build_readlane without optimization barrier")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3535>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3535>
This is a more explicit name now that we don't want it to be doing any
memory barrier stuff for us.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
Right now, it's implemented as a no-op for everyone. For most drivers,
it's a switch case in the NIR -> whatever which just breaks. For ir3,
they already have code to delete tessellation barriers so we just add a
case to also delete memory_barrier_tcs_patch.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
Previously, when cluster_size was set to 0, it always worked as if
the cluster size was 64. This commit fixes it in wave32 mode by
changing to work as if the cluster size was set to 32.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Use image_load_mip and image_store_mip respectively if the lod
parameter isn't zero.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Global load/store instructions can't know if the offset is
out-of-bound because they don't use descriptors (no range).
Fix this by clamping the offset for arrays that are indexed
with a non-constant offset that's greater or equal to the array
size.
This fixes VM faults and GPU hangs with Dead Rising 4.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2148
Fixes: 71a6794200 ("ac/nir: Enable nir_opt_large_constants")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes some CTS regressions.
Fixes: e61a826f39 ("ac/llvm: fix pointer type for global atomics")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Stronger ordering is implemented in SPIRV->NIR with barriers.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This shouldn't introduce any functional changes for RadeonSI
when NIR is enabled because these operations are already lowered.
pipeline-db (NAVI10/LLVM):
SGPRS: 9043 -> 9051 (0.09 %)
VGPRS: 7272 -> 7292 (0.28 %)
Code Size: 638892 -> 621628 (-2.70 %) bytes
LDS: 1333 -> 1331 (-0.15 %) blocks
Max Waves: 1614 -> 1608 (-0.37 %)
Found this while glancing at some F12019 shaders.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Split out the logic for exclusive scans into a separate function
that makes clear what it does instead of having this opaque 60
line if.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
To avoid generating invalid LLVM IR when both operands don't have
the same type. This might happen when performing pointer comparisons
with SPIRV 1.4.
Fixes invalid LLVM IR for:
dEQP-VK.spirv_assembly.instruction.spirv1p4.opptrequal.variable_pointers_ssbo_equal
dEQP-VK.spirv_assembly.instruction.spirv1p4.opptrnotequal.variable_pointers_ssbo_not_equal
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This implementation is loosely based on ROCm.
https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/master/ockl/src/wfredscan.cl
This fixes dEQP-VK.subgroups.arithmetic.*.subgroupexclusive* on GFX10.
Fixes: 227c29a80d ("amd/common/gfx10: implement scan & reduce operations")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
../src/amd/llvm/ac_llvm_build.c: In function ‘ac_build_canonicalize’:
../src/amd/llvm/ac_llvm_build.c:4567:9: warning: ‘intr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
4567 | return ac_build_intrinsic(ctx, intr, type, params, 1,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4568 | AC_FUNC_ATTR_READNONE);
| ~~~~~~~~~~~~~~~~~~~~~~
../src/amd/llvm/ac_llvm_build.c:4567:9: warning: ‘type’ may be used uninitialized in this function [-Wmaybe-uninitialized]
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
ac_shader_args will be similar to ac_shader_abi, except for being free
from LLVM-specific concepts and therefore capable of being shared
between LLVM and ACO. This will help us accomplish a few different
things:
- Decouple setting up SGPR and VGPR arguments from translating to LLVM,
so that we can reference these arguments in NIR lowering passes, which
will let us lower e.g. descriptor sets in NIR.
- Stop using radv-specific structures for things like determining the
chip generation in ACO.
In the end, we should replace ac_shader_abi with this structure +
driver-specific lowering passes.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes dEQP-VK.compute.builtin_var.local_invocation_index with
RADV_PERFTEST=cswave32.
My initial fix was to lower it but Rhys suggested the shift-right
and it's much better like this.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The return type is always the src type (32 or 64 bits).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This option is useless and shouldn't be used at all.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>