Now that we have bi_lower_mkvec_swz(), there's no need to be so careful
in the NIR -> bi translation. We can just emit MKVEC and move on. The
lowering pass will sort out the detaisl.
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
Now that we lower it, there's no advantage to one over the other at the
time this pass runs. Also, the is_8bit check was technically wrong
since it checks destination sizes, not source sizes. It's a lot safer
to just use SWZ.v4i8 and let the lowering pass do the right thing.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
Instead of trying very carefully in the bifrost emit code to only
generate valid MKVEC for the target hardware, this adds a lowering pass
which is capable of lowering any MKVEC or SWZ we can throw at it. Even
if the swizzle isn't supported or if it's a MKVEC.v4i8 on Valhall, we'll
lower it to something that does work on that platform. This frees up
the rest of the compiler so we can add and modify MKVEC and SWZ at-will
and never have to worry about hardware generation details.
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
At least bi_half() has the decency to assert if the swizzle isn't
BI_SWIZZLE_H01 to start with but bi_byte() did an irrelevant assert
and then overwrote the swizzle with BI_SWIZZLE_B<lane> regardless of
what was there before. In a lot of cases, this doesn't matter but we
use both in translating NIR to BI on things that may have already been
swizzled so we need to do the composition.
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
The only real requirement here is that the destination offset is zero
and that the destination is big enough to hold the source. The source
offset doesn't matter.
Fixes: bc17288697 ("pan/bi: Lower split/collect before RA")
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
The non-trivial non-replicate swizzles on IADD.v4x8 and ISUB.v4x8 are
either documented wrong or broken in hardware. Instead of swizzling
b0101 and b2323, they swizzle b0011 and b2233 on G52. This is either a
hardware bug or an issue with documentation. In either case, it's
probably best not to trust it. Those swizzles aren't all that useful
anyway. We also weren't using any of them before (or they'd have
broken) so this isn't a performance regression.
Cc: mesa-stable
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
Before this, everything was in the giant bifrost_compile.c file, now
preprocess, optimize and postproces are in their own "small"
bifrost_nir.c.
I also removed some dead functions and moved the passes closer to their
usage, (ex, passes only used in preprocess are now just before
preprocess). Otherwise it's all the same code we had before.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40717>
Enable the VK_KHR_shader_untyped_pointers extension and the
shaderUntypedPointers feature for Valhall and newer (v9+).
Bifrost (v6/v7) has issues with 8-bit vector loads through untyped
pointers combined with 16-bit storage, so restrict the extension
to architectures where it's fully functional.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40457>
To make sure all memcpy derefs are lowered before explicit IO lowering.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40457>
We have 4 image intrinsic variants now. This enum is useful for
nir_rewrite_image_intrinsic() and it will be used by other NIR passes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40709>
The only possible values are:
- VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER
- VK_DESCRIPTOR_TYPE_STORAGE_BUFFER
- VK_DESCRIPTOR_TYPE_ACCELERATION_STRUCTURE_KHR
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40670>
This will allow image_deref_size intrinsics generated in
pan_nir_lower_image_ms to be lowered in nir_lower_descriptors.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39207>
This allows us to avoid dirtying all of the state for user compute
dispatches when we run a precomp shader.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37970>
Just a bit cleaner, and we can unify point size too.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40677>
This could've saved me a lot of time debugging stack corruption.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40677>
Expose float32 atomic exchange support for buffer, shared, and image
operations on all architectures. The existing axchg instruction is
type-agnostic, so no compiler changes are needed. Image atomics are
already lowered to global atomics via nir_lower_image_atomics_to_global.
Also add R32_FLOAT to the STORAGE_IMAGE_ATOMIC format feature flag so
image atomic operations are accepted for r32f images.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40506>
Previously bifrost_nir_lower_shader_output grouped outputs in separate
if blocks and made a best-effort attempt to group them together. This
also assumed that pan_nir_lower_store_component wrote each output only
once and that nir_lower_io_vars_to_temporaries pulled them out of any
control flow.
Now all of these are handled by the new pan_nir_lower_vs_outputs pass
that handles write masks, control flow, per_view and grouping for IDVS.
This makes the overall dependencies much simpler, ensures that the
stores are grouped in the same ifs and should be more robust.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40537>
Intstead of focusing on numbers of components and bit sizes, focus on
the total number of bytes read.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>
These are just a fancy mov on Mali. We need to use bi_make_vec_to()
because it handles 64-bit movs as well.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>
These are the same as the split versions and vecN except they have one
source and they use src[0].swizzle[i] instead of src[i].swizzle[0].
While we're here, it's trivial to implement pack_64_4x16 as well.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>
They're literally the same thing since vectors are packed.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>
Previously, we used bi_src_index() directly and ignored the offset we
took all that care to calculate at the top of the function. For most
cases, this is fine since the offset is 0. But if we ever have an i8v8,
or larger, this doesn't work. It's not really more work to handle this
case. All we have to do is use the offset and &3 the swizzle. It just
means we can't have false code sharing with the bi_make_vec_to() case.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>
There's no real difference for us between robustness and robustness2.
The only thing robust_modes does in nir_opt_load_store_vectorize() is to
tell it to be a bit more careful about integer overflow in address
calculations so you don't end up wrapping something around and getting a
non-zero load when you should have gotten an zero from OOB. There's no
good reason why we should only set it for robustness2.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>
Now that we claim 16B robustness alignments, we can vectorize UBO
access, even when robustness2 is enabled.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>