Both radeonsi and radv use scoped barriers, so this should not be possible to
hit.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23191>
radeonsi switched over to CU wavefront execution mode, but didn't tell
LLVM. This can lead to shaders requiring too many VGPRs to be executed in
CU mode and so cause GPU resets.
Pass along +cumode to LLVM so it properly spills VGPRs.
Fixes: 9d7eab2ab1 ("radeonsi: don't enable WGP_MODE because of high cost of workgroup mem coherency")
Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23569>
This will be required for OpenCL subgroup support on radeonsi, but also
fixes some regressions today as radeonsi started to use the subgroup id
for invocation_index calculation.
Fixes: 39da12b7c7 ("ac/llvm: clean up visit_load_local_invocation_index and visit_load_subgroup_id")
Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23551>
Since radeonsi sets the alu_to_scalar callback, frontends like Rusticl
might end up generating vec2 b2i16. Support this just like it's done for
b2f16.
Fixes: d692d433f2 ("radeonsi: use nir_lower_alu_to_scalar correctly")
Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23551>
We can remove the LLVM 13 Wave32 discard workaround and
SI_PROFILE_IGNORE_LLVM13_DISCARD_BUG that disabled the workaround.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23471>
The demote emulation can be removed, and FS_CORRECT_DERIVS_AFTER_KILL
can be removed because it's always enabled on LLVM >= 13.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23471>
Now that radeonsi support pass desc to ssbo atomic ops,
we can use ssbo atomic instead. aco does not implement
nir_buffer_atomic_add either.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23096>
Now both radeonsi and radv call it in driver.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23018>
ACO need this to be done in nir. Remove the llvm round code
because both radv and radeonsi do this in nir for both aco
and llvm.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22573>
aco does not implement fpow, need nir to lower it
first. llvm will do by itself in the same way, so
we always lower fpow in nir now.
Remove the llvm fpow implementation that has special
handling for the muliplication. It's not used any
more and does not match GLSL spec as fpow(0,0)=NaN
but here we get 0.
There's some pixel changes for gl-radeonsi-stoney:
ror-default 2 (no tolerance), 0 (1% tol.)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22573>
ACO only support nir_fsin/cos_amd.
There's some pixel changes for gl-radeonsi-stoney trace.
Different pixels:
furmark 61 (no tolerance), 0 (1% tol.)
gimark 93867 (no tolerance), 888 (1% tol.)
tessmark 39 (no tolerance), 0 (1% tol.)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22573>
aco does not implement these idiv ops.
nir_lower_idiv is for idiv ops <= 32bit and ported from
llvm amdgpu, so llvm do the same.
nir_lower_divmod64 is for 64bit idiv ops.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22573>
Use ACCESS_* flags in call sites instead of GLC/DLC/SLC.
ACCESS_* flags are extended to describe other aspects of memory instructions
like load/store/atomic/smem.
Then add a function that converts the access flags to GLC, DLC, SLC.
The new functions are also usable by ACO.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22770>
Should not be seen, already would be stubbed out.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Suggested-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22914>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22690>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22690>
Will be used by ac/nir legacy and NGG lowerings.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22690>
Contains a global wave ID of legacy GS waves.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22690>
This intrinsic is going to be used for simplifying GS code.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22690>
The side effect is removing the aco/llvm backend bc optimization code
and linear/persp_centroid variable.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22199>
radv ps does not support epilog when llvm, so outputs will always
be lowered to exports in nir.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22199>
In OpenCL we can actually end up with those.
Fixes `basic astype` and those `integer_ops` OpenCL CTS tests:
integer_hadd
integer_rhadd
integer_upsample
quick_short_shift
quick_ushort_shift
Cc: mesa-stable
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22597>