For 16-bit bank LDS (ie. Kabini/Stoney) we need a slightly different
path. It's completely untested though because I don't have these
chips but according to vkpipeline-db the generated assembly seems fine.
Note that 16-bit I/O is currently only exposed on GFX9+ for both
compiler backends.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4966>
This adds a separate emission path in the assembly for the 16-bit
interp instructions.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4966>
16-bit interp instructions are considered VINTRP by the compiler
but they are emitted as VOP3 by the assembler.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4966>
We only have to adjust some assertions to allow storing/loading
16-bit values.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4966>
SymbolInfoTy was modified in LLVM 11. It is also in MCDisassembler.h now
and we don't have to duplicate it anymore.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5060>
Since commit 8b8af6d398 there is a
performance regression in dirt 4 on picasso APUs.
The game ends up feeding a large value into this which overflows on the
conversion to 16bit float. With the old implementation (which now lives
in util_float_to_half_rtz) it would be clamped to inf-1, while the new
one returns inf. This causes a performance hit somehow at some point
down the line.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes: 8b8af6d398 "gallium/util: Switch util_float_to_half to _mesa_float_to_half()'s impl."
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5062>
Instead of relying it's read being entirely within the swap's definition.
No shader-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4950>
This is a requirement for the shaderResourceMinLod feature which
allows to clamp LOD. This uses all image_sample_*_cl variants.
All dEQP-VK.glsl.texture_functions.texture*clamp.* pass.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4989>
This is a requirement for the shaderResourceMinLod feature which
allows to clamp LOD. This uses all image_sample_*_cl variants.
All dEQP-VK.glsl.texture_functions.texture*clamp.* pass.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4989>
If one VMEM instruction uses a sampler and the other doesn't, we can't do
this optimization.
Totals from 47 (0.04% of 127638) affected shaders:
CodeSize: 271744 -> 271656 (-0.03%); split: -0.04%, +0.01%
Instrs: 52783 -> 52761 (-0.04%); split: -0.05%, +0.01%
Cycles: 5547040 -> 5546952 (-0.00%); split: -0.00%, +0.00%
VMEM: 10022 -> 9887 (-1.35%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4949>
When the LLVM version is too old or missing, SotTR applies shader
workarounds and that reduces performance by 2-5% with ACO.
SotTR workarounds are applied with LLVM 8 and older, so reporting
LLVM 9.0.1 should be fine.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4984>
Only UBO, SSBO, global and push constants accesses should matter.
This fixes a bunch of new robustness2 failures. Note that RADV/LLVM
isn't affected because it relies on LLVM for loads/stores
vectorization and LLVM doesn't vectorize in this situation as well.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4881>
This prevents vectorization for loads/stores that can overflow if
the low offset is negative and the range greater or equal than 0.
The caller can pass the list of variable modes that matter for
robust access.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4881>
v_frexp_exp returns the exponent as an unsigned value.
Also, v_ashr returns either 0 or -1 depending on the sign of the
source operand, but what we want is only the sign bit.
Fixes a bunch of recent dEQP-VK.glsl.builtin.precision_double.* tests.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4921>
Trivial.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4911>
It's already part of the device name, and it should now also
correctly report when ACO is used.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4911>
Now that ACO supports all shader stages (the only exception is NGG
GS on Navi10 but it fallbacks to legacy GS) it makes sense to remove
the LLVM version string reported as part of the device name.
The LLVM version string was added in the past for some Feral games
to workaround LLVM issues by detecting the version. With ACO, this
is unecessary because the Mesa version is enough to eventually enable
specific shader workarounds.
When the LLVM version string is missing, it is assumed that an old
LLVM is used and workarounds are automatically applied. The only
Vulkan games that might be affected is Shadow of The Tomb Raider
but the impact should be fairly small.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4911>
Don't do both at the same time as it breaks DCE
Fixes: 2dc550202e ('aco: copy-propagate p_create_vector copies of vectors')
Fixes: dEQP-VK.glsl.builtin.precision_double.ldexp.compute.scalar on GFX6-GFX7
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4922>