This renames it to drop the ptr_as and makes it handle all of the stride
cases. There's a bit of a tricky bit in here around Booleans but we
currently use 32-bit for those always.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6472>
Instead of always lowering everything, we add a threshold such that if
the total indirected array size (AoA size) is above that threshold, it
won't lower. It's assumed that the driver will sort things out somehow
by, for instance, lowering to scratch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5909>
Input and output info is gathered from intrinsics. nir_variables are
ignored (and we'll remove them anyway).
This is a prerequisite for ACO, but also makes the IR prettier.
The ac_nir_to_llvm change has to be in this commit.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6445>
Replace div(x) by min(div(x), FLT_MAX)) to avoid getting a NaN result
when x is 0.
A cheaper alternative would be to use legacy mult instructions but they're
not exposed by LLVM.
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6259>
These consist of the variations nir_op_{i|u|f}{min|max|med}3 which are either
lowered in the backend (LLVM) anyway or can be recombined by the backend (ACO).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6421>
It was always fneu but naming it fne causes confusion from time to time. So
lets rename it. Later we also want to add other unordered and fne, this is
a smaller preparation for that.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6377>
The OpenCL image_width/height/depth functions have variants which can
take an LOD parameter. More importantly, LLVM-SPIRV-Translator always
generates OpImageQuerySizeLod even if the LOD is guaranteed to be zero.
Given that over half the hardware out there has an LOD field for image
size queries (based on a rudimentary scan through their NIR -> whatever
code), we may as well just add the source to the NIR intrinsic. If this
is ever a problem for anyone, the lowering is pretty trivial.
I've also added asserts to everyone's drivers that should alert them if
they ever see an LOD other than zero. This will never happen with GL or
Vulkan so there's no need for panic.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6396>
Having a single init function works as expected for shared llvm, but
when using a static llvm only one llvm will get initialized.
This commit introduces 2 separate init function:
- shared llvm = single public init function
- static llvm = one init function for each module using llvm
Fixes: 50d20dc055 ("ac/llvm: export ac_init_llvm_once in targets")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3376
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6253>
LLVM uses __declspec(restrict) which breaks because Mesa define restrict
as __restrict. Move the LLVM headerse up to dodge the macro.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6180>
ACCESS_COHERENT may be set for a specific load/store in the case of
atomic loads/stores.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6063>
If a program like mpv uses both radeon_dri.so (because --vo=gpu) and
radeonsi_drv_video.so (because --hwdec=vaapi) then LLVM will be inialized twice.
The commit exports the ac_init_llvm_once so there's only one instance of the
function.
See also 18b12bf533 ("targets: export radeon winsys_create functions to silence LLVM warning")
which implemented this workaround initially.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/1377
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5648>
Otherwise LLVM does not see the pointers as allowing speculative
loads.
The pipeline-db results are pretty wild, but mostly what is to be
expected from allowing more code movement in LLVM:
Totals from affected shaders:
SGPRS: 157728 -> 168336 (6.73 %)
VGPRS: 158628 -> 158664 (0.02 %)
Spilled SGPRs: 10845 -> 24753 (128.24 %)
Spilled VGPRs: 13 -> 13 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 8 -> 8 (0.00 %) dwords per thread
Code Size: 17189180 -> 17313712 (0.72 %) bytes
LDS: 204 -> 204 (0.00 %) blocks
Max Waves: 5700 -> 5687 (-0.23 %)
Wait states: 0 -> 0 (0.00 %)
This gives some boosts for shaders we can move a descriptor load
outside a loop.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3159>
If buffer or stride is unaligned we use the same trick as on gfx6:
load 1 byte at a time and recompose the output if needed.
This change fixes lots of deqp/glcts tests:
- dEQP-GLES2.functional.draw.random.1, 10, ...
- dEQP-GLES2.functional.vertex_arrays.multiple_attributes.stride.3_float2_0_float2_0_float2_17, ...
- dEQP-GLES2.functional.vertex_arrays.single_attribute.first.byte_first24_offset1_stride2_quads256, ...
- dEQP-GLES2.functional.vertex_arrays.single_attribute.strides.buffer_0_17_byte2_vec4_dynamic_draw_quads_1, ...
- dEQP-GLES31.functional.draw_indirect.random.14, ...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5502>
16-bit types can be used with MUBUF on GFX6-GFX7.
Fixes: c3e0ba52a0 ("ac/nir: support 16-bit data in buffer_load_format opcodes")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5325>
This uses a meson builtin to handle -fvisibility=hidden. This is nice
because we don't need to track which languages are used, if C++ is
suddenly added meson just does the right thing.
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4740>
The compiler should emit s_memtime instead of s_memrealtime for
the subgroup scope. I don't know why this LLVM 9 checks was for
but LLVM 8 also has this amdgcn intrinsic.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5117>