For 16-bit bank LDS (ie. Kabini/Stoney) we need a slightly different
path. It's completely untested though because I don't have these
chips but according to vkpipeline-db the generated assembly seems fine.
Note that 16-bit I/O is currently only exposed on GFX9+ for both
compiler backends.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4966>
This adds a separate emission path in the assembly for the 16-bit
interp instructions.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4966>
16-bit interp instructions are considered VINTRP by the compiler
but they are emitted as VOP3 by the assembler.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4966>
We only have to adjust some assertions to allow storing/loading
16-bit values.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4966>
SymbolInfoTy was modified in LLVM 11. It is also in MCDisassembler.h now
and we don't have to duplicate it anymore.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5060>
Since commit 8b8af6d398 there is a
performance regression in dirt 4 on picasso APUs.
The game ends up feeding a large value into this which overflows on the
conversion to 16bit float. With the old implementation (which now lives
in util_float_to_half_rtz) it would be clamped to inf-1, while the new
one returns inf. This causes a performance hit somehow at some point
down the line.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes: 8b8af6d398 "gallium/util: Switch util_float_to_half to _mesa_float_to_half()'s impl."
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5062>
Instead of relying it's read being entirely within the swap's definition.
No shader-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4950>
This is a requirement for the shaderResourceMinLod feature which
allows to clamp LOD. This uses all image_sample_*_cl variants.
All dEQP-VK.glsl.texture_functions.texture*clamp.* pass.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4989>
This is a requirement for the shaderResourceMinLod feature which
allows to clamp LOD. This uses all image_sample_*_cl variants.
All dEQP-VK.glsl.texture_functions.texture*clamp.* pass.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4989>
If one VMEM instruction uses a sampler and the other doesn't, we can't do
this optimization.
Totals from 47 (0.04% of 127638) affected shaders:
CodeSize: 271744 -> 271656 (-0.03%); split: -0.04%, +0.01%
Instrs: 52783 -> 52761 (-0.04%); split: -0.05%, +0.01%
Cycles: 5547040 -> 5546952 (-0.00%); split: -0.00%, +0.00%
VMEM: 10022 -> 9887 (-1.35%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4949>
When the LLVM version is too old or missing, SotTR applies shader
workarounds and that reduces performance by 2-5% with ACO.
SotTR workarounds are applied with LLVM 8 and older, so reporting
LLVM 9.0.1 should be fine.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4984>
Only UBO, SSBO, global and push constants accesses should matter.
This fixes a bunch of new robustness2 failures. Note that RADV/LLVM
isn't affected because it relies on LLVM for loads/stores
vectorization and LLVM doesn't vectorize in this situation as well.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4881>
This prevents vectorization for loads/stores that can overflow if
the low offset is negative and the range greater or equal than 0.
The caller can pass the list of variable modes that matter for
robust access.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4881>
v_frexp_exp returns the exponent as an unsigned value.
Also, v_ashr returns either 0 or -1 depending on the sign of the
source operand, but what we want is only the sign bit.
Fixes a bunch of recent dEQP-VK.glsl.builtin.precision_double.* tests.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4921>