This worked because the optimizer didn't consider that the 16-bit
instruction would interpret the inline constant differently. This will
change in the next commit.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5245>
Use sub-dword definitions so that the RA can use SDWA
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5245>
Shared subdword loads don't need byte alignment as they are split
into multiple loads if necessary.
Fixes: 5cde4989d3 ('aco: remove unnecessary split- and create_vector instructions for subdword loads')
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5441>
I think this case was just missing.
This fixes a bunch of 16-bit storage related CTS failures like
dEQP-VK.ssbo.phys.layout.single_basic_type.std430.u16vec4.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5226>
Use v_bfe to implement small bitsize conversions because the
compiler probably optimizes this better.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5226>
GFX6 and GFX7 don't have the ds_bpermute (or permute) instruction,
but we would like to support subgroup shuffle on these old GPUs.
So we introduce a new pseudio instruction which will be lowered
to an "unrolled loop" that emulates bpermute on GFX6 and GFX7
using readlane instructions, while also respecting the exec mask
thanks to v_cmpx.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5223>
The emulated GFX10 wave64 bpermute no longer needs a linear_vgpr,
so we don't consider it a reduction anymore. Additionally, the
code is slightly reorganized in preparation for the GFX6 emulated
bpermute.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5223>
Use s_memrealtime instead.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5117>
For 16-bit bank LDS (ie. Kabini/Stoney) we need a slightly different
path. It's completely untested though because I don't have these
chips but according to vkpipeline-db the generated assembly seems fine.
Note that 16-bit I/O is currently only exposed on GFX9+ for both
compiler backends.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4966>
We only have to adjust some assertions to allow storing/loading
16-bit values.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4966>
This is a requirement for the shaderResourceMinLod feature which
allows to clamp LOD. This uses all image_sample_*_cl variants.
All dEQP-VK.glsl.texture_functions.texture*clamp.* pass.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4989>
v_frexp_exp returns the exponent as an unsigned value.
Also, v_ashr returns either 0 or -1 depending on the sign of the
source operand, but what we want is only the sign bit.
Fixes a bunch of recent dEQP-VK.glsl.builtin.precision_double.* tests.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4921>
Should be fine now that RA take full registers for v2b if it's
not an SDWA instruction.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4879>
Instead of relying on calling shader_io_get_unique_index repeatedly,
remember the which output driver location corresponds to which
varying slot.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>
VS needs the number of TCS inputs, and TES needs the number of TCS
outputs.
It is error-prone to repeat those calculations in both instruction
selection and setup. Just set them in one place instead.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>
unreachable was true if the last block is unreachable in the linear cfg,
but it should also be true if it is unreachable in the logical cfg.
Fixes dEQP-VK.graphicsfuzz.for-with-ifs-and-return
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 8d8c864beb
('aco: improve check for unreachable loop continue blocks')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4764>
The SPIR-V spec doesn't say explicitly that the sample index
must be an unsigned integer.
This fixes crashes with some new VK_EXT_robustness2 tests.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>
With VK_EXT_robustness2, descriptors can be NULL and the number of
samples returned by nir_texop_texture_samples should be 0.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>
split_store_data() splits a vector and p_as_uniforms it if needed.
scan_write_mask()/advance_write_mask() are similar to
u_bit_scan_consecutive_range(), but makes it easier to only clear part of
the range and will also give ranges for zero'd bits.
split_buffer_store() is a helper for splitting VMEM/SMEM stores.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4639>