This isn't required for deref instructions because it's possible to
get the image format back from the variable but it will be useful for
descriptor heap.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40768>
These two new variable modes are used to relax restrictions on deref
casts through because it's possible to cast different modes from the
heap pointers.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40768>
It is sometimes useful to remove all elements of a bitset while retaining the
backing storage. With a dense bitset, we would just memset everything to 0,
which is O(capacity). With a sparse bitset, previously we would have to free and
reallocate, which is O(capacity) in the dense case and O(cardinality) in the
sparse case. That is the correct asymptoptic behaviour O(cardinality) in the
worst case, but there is an unfortunate constant-factor associated with the
redundant allocation & free in the dense case.
Therefore, we add a new helper to clear all elements of the sparse bitset in one
go, avoiding reallocation in the dense case.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40806>
Instead of keeping track of the topology with some scratch value in MME,
we can rely on SET_PRIMITIVE_TOPOLOGY to directly set it.
This simplify some of the MME codegen but does not seems to have any
impact on performance in general.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40749>
The sc7180-trogdor-lazor-limozeen devices have been dying off over the
past few weeks, so move the last two jobs to sc7180-trogdor-kingoftown
and retire the device type.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40818>
panfrost_bo_import() calls drmPrimeFDToHandle() then pan_kmod_bo_import(),
which also calls drmPrimeFDToHandle() internally. This double import causes
GEM handle refcount leaks because each drmPrimeFDToHandle() increments the
kernel's GEM handle refcount, but only one drmCloseBufferHandle() is called
during cleanup by panfrost_kmod_bo_free(or panthor_kmod_bo_free).
Fix by removing the redundant drmPrimeFDToHandle() and using
pan_kmod_bo_import() directly. On re-import of existing buffers, properly
release the extra pan_kmod_bo reference with pan_kmod_bo_put().
This ensures GEM handle refcount, pan_kmod_bo refcount, and panfrost_bo
refcount are all properly balanced.
Fixes: 5089a758df ("panfrost: Back panfrost_bo with pan_kmod_bo object")
Signed-off-by: Xianzhong Li <xianzhong.li@nxp.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40778>
The LLVM backend is unmaintained. Let's not encourage users to swap out
entire parts of the driver with an unsupported codepath. Enabling this
option is a footgun nowadays anyway, given that it disables many
features and thus may trigger bigger changes in behavior than intended.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40815>
Fixes multiple cts tests on blackwell, including eg.
dEQP-VK.spirv_assembly.instruction.graphics.float16.arithmetic_2.opfdiv_tessc
Fixes: d031365f7c ("nak: support MUFU.F16")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40804>
We used to splat out 8-bit vec2s to 16-bit by repeating both 8-bit
halves twice with the B0011 swizzle. I think the original idea here was
that 16-bit swizzles were more widely available in the hardware and that
this would make swizzling things easier. The problem is that nothing
actually knows that the value is half-repeated like this so nothing
knows it can upgrade a swizzle from B0022 to B0123 (H01). So instead we
get a bunch of B0022 swizzles, which nothing supports.
We can shave a lot of instructions if we just stop trying to be so
clever and instead repeat the whole thing with a B0101 swizzle.
The only real issue here is that v2[fiu]8_to_v2[fiu]16 needs a B0011
swizzle, which we have to apply on-the-fly. Fortunately, any swizzle
can be composed with B0011.
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
This adds a new bytewise copy propagation pass which chews through MKVEC
and SWZ instructions. The word-based copy propagation pass only existed
to chew through SPLIT/COLLECT but MKVEC is COLLECT for bytes and we had
nothing to help with that.
This is actually two passes in one: Byte propagation and swizzle
propagation. Any time we see a MKVEC, we look at its sources only as
bytes and chase individual bytes back, through other MKVEC and SWZ, to
their generating instruction and make the MKVEC only consume the
original bytes. If the MKVEC happens to construct something that's just
a swizzle of another def (this is fairly common), we record that as
well. The idea here is that a lot of MKVEC just consume other MKVEC and
we can get rid of the intermediate ones or even the whole chain if it
just ends up being a swizzle in the end.
For SWZ instructions, we first look at them like a MKVEC of the
individual bytes they consume. If that doesn't yield a single swizzled
word, we then crawl through the words table, just accumulating swizzles.
This gives us the best (closest to the generating instructions) coherent
word. We could also replace SWZ with MKVEC and just do byte propagation
but MKVEC is often 2 instructions whereas SWZ is often one (or folded
into a source) so this is probably the better balance.
Finally, we not only replace the MKVEC and SWZ instructions but we also
attempt to propagate swizzles into individual ALU op sources. For v4i8
ops, this often fails since the full generality isn't always available
but for fp16, we can almost always fold the swizzle into the consuming
instruction.
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
Now that we have bi_lower_mkvec_swz(), there's no need to be so careful
in the NIR -> bi translation. We can just emit MKVEC and move on. The
lowering pass will sort out the detaisl.
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
Now that we lower it, there's no advantage to one over the other at the
time this pass runs. Also, the is_8bit check was technically wrong
since it checks destination sizes, not source sizes. It's a lot safer
to just use SWZ.v4i8 and let the lowering pass do the right thing.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
Instead of trying very carefully in the bifrost emit code to only
generate valid MKVEC for the target hardware, this adds a lowering pass
which is capable of lowering any MKVEC or SWZ we can throw at it. Even
if the swizzle isn't supported or if it's a MKVEC.v4i8 on Valhall, we'll
lower it to something that does work on that platform. This frees up
the rest of the compiler so we can add and modify MKVEC and SWZ at-will
and never have to worry about hardware generation details.
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
At least bi_half() has the decency to assert if the swizzle isn't
BI_SWIZZLE_H01 to start with but bi_byte() did an irrelevant assert
and then overwrote the swizzle with BI_SWIZZLE_B<lane> regardless of
what was there before. In a lot of cases, this doesn't matter but we
use both in translating NIR to BI on things that may have already been
swizzled so we need to do the composition.
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
The only real requirement here is that the destination offset is zero
and that the destination is big enough to hold the source. The source
offset doesn't matter.
Fixes: bc17288697 ("pan/bi: Lower split/collect before RA")
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
The non-trivial non-replicate swizzles on IADD.v4x8 and ISUB.v4x8 are
either documented wrong or broken in hardware. Instead of swizzling
b0101 and b2323, they swizzle b0011 and b2233 on G52. This is either a
hardware bug or an issue with documentation. In either case, it's
probably best not to trust it. Those swizzles aren't all that useful
anyway. We also weren't using any of them before (or they'd have
broken) so this isn't a performance regression.
Cc: mesa-stable
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Martin Roukala <martin.roukala@mupuf.org>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenz.ca>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40617>
Reviewed-by: Martin Roukala <martin.roukala@mupuf.org>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenz.ca>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40617>
Once commited and have AABB or triangle intersection found, terminate
the traversal if TerminateOnFirstHit ray flag is present.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40773>
Commit e27e41a842 ("vulkan,spirv: update headers") exposed a
flaw in the cerealgenerator.
It modified -- among other things -- the VkDeviceCreateInfo
struct in vk.xml.
In the update, the len="enabledLayerCount,null-terminated"
attribute was removed from the ppEnabledLayerNames member.
The gfxstream code generator processes ppEnabledLayerNames
(which is a const char* const*), it identifies it as an "array of
strings". However, because the len attribute is now missing,
vulkanType.getLengthExpression() returns None.
This leads to errors like:
gfxstream_guest_vk_autogen_impl/gen/goldfish_vk_counting_guest.cpp:642:30:
error: use of undeclared identifier 'None'
642 | for (uint32_t i = 0; i < None; ++i)
| ^~~~
1 error generated.
This patch adds various length access checks to prevent this from
happening.
TEST=m vulkan.ranchu
Reviewed-by: David Gilhooley <djgilhooley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40785>
These failures turned out to be triggered by the new autotune
causing rendering mode transitions (such as GMEM -> SYSMEM) which
led to a new set of failures to be uncovered. They tend to work as
expected under either GMEM or SYSMEM being forced for all RPs but
the specific transitions caused by the autotuner leads them to fail.
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
The usage of two CP counters by latency sensitive autotuner will
affect the operation of fdperf, this detects when counters have
selectors that have been changed and marks them as invalid with
corresponding UI cues.
This also seems to detect selector values being dropped while the
GPU is in sleep states and tends to be useful to catch that too.
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>