These are based on nightlies. It's unclear if one of the t760 flakes
should really be a fail or a flake, but I have another MR that tries to
clean up the flakes pending; I'll address that there, as there's other
related flakes that probably should change as well.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40943>
Some code in gallium was making assumptions of how the gpu_id is laid
out, which will not work for 64 bit gpu_ids.
Expose pan_prod_id and pan_rev from the model to collect this logic in a
single place.
Reviewed-by: Marc Alcala Prieto <marc.alcalaprieto@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40921>
For Valhall, use SHADDX instruction for 64-bit integer addition instead
of lowering it to 32-bit operations. The instruction sequence for doing
it in 32-bit costs 3 cycles but SHADDX only takes 2 cycles to perform.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40841>
va_mark_last currently expects 64-bit registers to always be split in
two, this commit changes it to check first if a 64-bit register is split
or not.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40841>
Without this, printf messages that were sent before a crash will be
lost, which makes shader printf pretty useless for debugging crashes.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40816>
Some video decoders spit out AFBC(16x16,sparse,split) images. Advertise
support for this modifier so we can import such images.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40746>
The old implementation only worked for 16-bit because we assumed scalar
so we could stomp the whole destination as if it was 32-bit. This
version works for v2i16.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40769>
Now that bi_byte() does the right thing, we can just use it and not
worry about the rest.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40769>
We call bi_optimize_nir() a few lines later.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40769>
This isn't hard to do and it gives us a lot more flexibility in what NIR
we can consume.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40769>
Add asserts this time that we don't miss any and that the buckets
actually match the enum in bifrost/compiler.h.
Fixes: 82328a5245 ("pan/bi: Generate instruction packer for new IR")
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40769>
It appears nowhere so I don't know why we have it in the enum.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40769>
Now that all of the additional cases are handled, we can hook up the
allow_merging_workgroups flag in panvk.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Caterina Shablia <caterina.shablia@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38586>
Mali does not support divergent operands in some cases, and we are
already using lower_non_uniform_access to handle this for descriptor
indexing. We can extend this to handle merged workgroups by just tagging
every intrinsic as nonuniform and then letting divergence analysis sort
out which ones can actually be nonuniform in opt_non_uniform_access.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38586>
Merging workgroups affects divergence analysis, since subgroups can now
contain extra threads from other workgroups. We already have divergence
analysis flags to handle this case, but since the compiler options memory
is static, we need to define an entirely separate option set for merged
vs non-merged workgroups.
In gallium, we don't have to switch options because opengl requires
uniformity over the entire dispatch in application shaders.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38586>
Vulkan guarantees that all subgroup invocations will be part of the same
workgroup, so we need to disable merging workgroups for shaders where
the subgroup layout is observable.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38586>
In panvk, we will need to decide whether we are merging workgroups early
in shader compilation, before calling nir_lower_non_uniform_access. This
is because nonuniform lowering introduces new subgroup intrinsics which
would otherwise inhibit workgroup merging, and because the set of
instructions that need to be lowered may be different with merged
workgroups.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38586>
The only requirement for barriers is that the hardware doesn't support
allow_merging_workgroups with actual BARRIER instructions. We only emit
these for workgroup execution barriers though, so are safe to merge
workgroups when the shader uses memory barriers or subgroup execution
barriers.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Caterina Shablia <caterina.shablia@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38586>
While the swizzle code was producing the correct encoding, the
disassembly was slightly weird and swz_16 required an extra argument
that was always "false".
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40865>
The Vulkan runtime provides the dynamic state infrastructure via
vk_common_CmdSetAttachmentFeedbackLoopEnableEXT(). This builds on the
attachment feedback loop layout support.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40498>
PanVK treats image layouts as no-ops and already disables Forward Pixel
Kill when the same render target is both read and written.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40498>
While not currently required, it will be for future GPUs.
Also cleans up gpu_id as parameter to some functions that didn't use it.
Reviewed-by: Aksel Hjerpbakk <aksel.hjerpbakk@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40610>
The current implementation is a bit awkward and becomes tricky when
adding support for 64 bit gpu_ids.
Rather than keeping a mask of bits in gpu_id to compare with the stored
gpu_prod_id value, rely on macro functions for fetching the information
required from gpu_id and creating the comparison value.
Reviewed-by: Aksel Hjerpbakk <aksel.hjerpbakk@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40610>
Rather than having preload registers hardcoded over multiple files,
gather them in one place with an enum abstraction.
This should simplify updates to the preload registers.
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40643>
Replace the old partial IR state snapshotting with prebuilt
per-pass IR descriptor buffers and a queue-attached scratch FBD.
1. emit FBD+DBD+RTDs for each IR-pass+layer
2. store it into some side-band GPU buffer that's passed around
through the queue context
3. in the execption handler, copy FBD+DBD+RTDs from the IR desc
buffer to some space that's attached to the queue context and
not the cmdbuf
Fixes: 46f611c9 ("panvk: Also use resolve shaders for Z/S")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Ryan zhang <ryan.zhang@nxp.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40625>
This makes printing of BIR in SSA-form more similar to NIR and after
register allocation, it shows consecutive registers for operands
reading/writing to more than one register.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40711>
We used to splat out 8-bit vec2s to 16-bit by repeating both 8-bit
halves twice with the B0011 swizzle. I think the original idea here was
that 16-bit swizzles were more widely available in the hardware and that
this would make swizzling things easier. The problem is that nothing
actually knows that the value is half-repeated like this so nothing
knows it can upgrade a swizzle from B0022 to B0123 (H01). So instead we
get a bunch of B0022 swizzles, which nothing supports.
We can shave a lot of instructions if we just stop trying to be so
clever and instead repeat the whole thing with a B0101 swizzle.
The only real issue here is that v2[fiu]8_to_v2[fiu]16 needs a B0011
swizzle, which we have to apply on-the-fly. Fortunately, any swizzle
can be composed with B0011.
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
This adds a new bytewise copy propagation pass which chews through MKVEC
and SWZ instructions. The word-based copy propagation pass only existed
to chew through SPLIT/COLLECT but MKVEC is COLLECT for bytes and we had
nothing to help with that.
This is actually two passes in one: Byte propagation and swizzle
propagation. Any time we see a MKVEC, we look at its sources only as
bytes and chase individual bytes back, through other MKVEC and SWZ, to
their generating instruction and make the MKVEC only consume the
original bytes. If the MKVEC happens to construct something that's just
a swizzle of another def (this is fairly common), we record that as
well. The idea here is that a lot of MKVEC just consume other MKVEC and
we can get rid of the intermediate ones or even the whole chain if it
just ends up being a swizzle in the end.
For SWZ instructions, we first look at them like a MKVEC of the
individual bytes they consume. If that doesn't yield a single swizzled
word, we then crawl through the words table, just accumulating swizzles.
This gives us the best (closest to the generating instructions) coherent
word. We could also replace SWZ with MKVEC and just do byte propagation
but MKVEC is often 2 instructions whereas SWZ is often one (or folded
into a source) so this is probably the better balance.
Finally, we not only replace the MKVEC and SWZ instructions but we also
attempt to propagate swizzles into individual ALU op sources. For v4i8
ops, this often fails since the full generality isn't always available
but for fp16, we can almost always fold the swizzle into the consuming
instruction.
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
Now that we have bi_lower_mkvec_swz(), there's no need to be so careful
in the NIR -> bi translation. We can just emit MKVEC and move on. The
lowering pass will sort out the detaisl.
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
Now that we lower it, there's no advantage to one over the other at the
time this pass runs. Also, the is_8bit check was technically wrong
since it checks destination sizes, not source sizes. It's a lot safer
to just use SWZ.v4i8 and let the lowering pass do the right thing.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
Instead of trying very carefully in the bifrost emit code to only
generate valid MKVEC for the target hardware, this adds a lowering pass
which is capable of lowering any MKVEC or SWZ we can throw at it. Even
if the swizzle isn't supported or if it's a MKVEC.v4i8 on Valhall, we'll
lower it to something that does work on that platform. This frees up
the rest of the compiler so we can add and modify MKVEC and SWZ at-will
and never have to worry about hardware generation details.
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
At least bi_half() has the decency to assert if the swizzle isn't
BI_SWIZZLE_H01 to start with but bi_byte() did an irrelevant assert
and then overwrote the swizzle with BI_SWIZZLE_B<lane> regardless of
what was there before. In a lot of cases, this doesn't matter but we
use both in translating NIR to BI on things that may have already been
swizzled so we need to do the composition.
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
The only real requirement here is that the destination offset is zero
and that the destination is big enough to hold the source. The source
offset doesn't matter.
Fixes: bc17288697 ("pan/bi: Lower split/collect before RA")
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>