The sc7180-trogdor-lazor-limozeen devices have been dying off over the
past few weeks, so move the last two jobs to sc7180-trogdor-kingoftown
and retire the device type.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40818>
panfrost_bo_import() calls drmPrimeFDToHandle() then pan_kmod_bo_import(),
which also calls drmPrimeFDToHandle() internally. This double import causes
GEM handle refcount leaks because each drmPrimeFDToHandle() increments the
kernel's GEM handle refcount, but only one drmCloseBufferHandle() is called
during cleanup by panfrost_kmod_bo_free() (or panthor_kmod_bo_free()).
Fix by removing the redundant drmPrimeFDToHandle() and using
pan_kmod_bo_import() directly. On re-import of existing buffers, properly
release the extra pan_kmod_bo reference with pan_kmod_bo_put().
This ensures GEM handle refcount, pan_kmod_bo refcount, and panfrost_bo
refcount are all properly balanced.
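A minimal model of the userspace side of this balancing. It assumes a simplified import path, and every name in it (`struct model`, `model_bo_import()`, and so on) is hypothetical rather than the panfrost or pan_kmod API. The kernel GEM handle is not modeled. The sketch only shows that one kmod import per call, plus dropping the extra kmod reference on re-import, leaves both counters at zero after matching releases.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters for one shared buffer: the references held on
 * the kmod BO and the panfrost_bo refcount. */
struct model {
   int kmod_refs;
   int bo_refs;
   bool bo_exists; /* BO already present in the lookup table */
};

/* Stand-in for pan_kmod_bo_import(): hands back one new kmod reference. */
static void model_kmod_import(struct model *m) { m->kmod_refs++; }

/* Stand-in for pan_kmod_bo_put(). */
static void model_kmod_put(struct model *m)
{
   assert(m->kmod_refs > 0);
   m->kmod_refs--;
}

/* Import path after the fix: exactly one kmod import per call. */
static void model_bo_import(struct model *m)
{
   model_kmod_import(m);
   if (m->bo_exists) {
      /* Re-import: reuse the existing BO and drop the extra kmod ref. */
      m->bo_refs++;
      model_kmod_put(m);
   } else {
      /* First import: the new BO takes ownership of the kmod ref. */
      m->bo_exists = true;
      m->bo_refs = 1;
   }
}

/* Release path: the kmod reference goes away with the last BO ref. */
static void model_bo_unreference(struct model *m)
{
   assert(m->bo_refs > 0);
   if (--m->bo_refs == 0) {
      m->bo_exists = false;
      model_kmod_put(m);
   }
}
```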
Fixes: 5089a758df ("panfrost: Back panfrost_bo with pan_kmod_bo object")
Signed-off-by: Xianzhong Li <xianzhong.li@nxp.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40778>
The LLVM backend is unmaintained. Let's not encourage users to swap out
entire parts of the driver with an unsupported codepath. Enabling this
option is a footgun nowadays anyway, given that it disables many
features and thus may trigger bigger changes in behavior than intended.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40815>
Fixes multiple CTS tests on Blackwell, including:
dEQP-VK.spirv_assembly.instruction.graphics.float16.arithmetic_2.opfdiv_tessc
Fixes: d031365f7c ("nak: support MUFU.F16")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40804>
We used to splat out 8-bit vec2s to 16-bit by repeating both 8-bit
halves twice with the B0011 swizzle. I think the original idea here was
that 16-bit swizzles were more widely available in the hardware and that
this would make swizzling things easier. The problem is that nothing
actually knows that the value is half-repeated like this so nothing
knows it can upgrade a swizzle from B0022 to B0123 (H01). So instead we
get a bunch of B0022 swizzles, which nothing supports.
We can shave a lot of instructions if we just stop trying to be so
clever and instead repeat the whole thing with a B0101 swizzle.
The only real issue here is that v2[fiu]8_to_v2[fiu]16 needs a B0011
swizzle, which we have to apply on-the-fly. Fortunately, any swizzle
can be composed with B0011.
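The following sketch shows why the on-the-fly B0011 works. It models a byte swizzle as four source-byte indices, where result byte i comes from source byte b[i]. The `byte_swz` type and `swz_compose()` helper are hypothetical and are not Mesa's `enum bi_swizzle`. Applying B0011 on top of the new B0101 splat recovers the B0011 layout of the original 8-bit vec2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical byte swizzle: result byte i is taken from source byte b[i]. */
typedef struct { uint8_t b[4]; } byte_swz;

/* Swizzle equivalent to applying `inner` first and then `outer` to its
 * result: (outer o inner)[i] = inner[outer[i]]. */
static byte_swz swz_compose(byte_swz outer, byte_swz inner)
{
   byte_swz r;
   for (int i = 0; i < 4; i++)
      r.b[i] = inner.b[outer.b[i]];
   return r;
}

static bool swz_equal(byte_swz x, byte_swz y)
{
   return memcmp(x.b, y.b, sizeof(x.b)) == 0;
}
```

With inner = B0101 and outer = B0011, the result is {inner[0], inner[0], inner[1], inner[1]} = B0011.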
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
This adds a new bytewise copy propagation pass which chews through MKVEC
and SWZ instructions. The word-based copy propagation pass only existed
to chew through SPLIT/COLLECT but MKVEC is COLLECT for bytes and we had
nothing to help with that.
This is actually two passes in one: Byte propagation and swizzle
propagation. Any time we see a MKVEC, we look at its sources only as
bytes and chase individual bytes back, through other MKVEC and SWZ, to
their generating instruction and make the MKVEC only consume the
original bytes. If the MKVEC happens to construct something that's just
a swizzle of another def (this is fairly common), we record that as
well. The idea here is that a lot of MKVEC just consume other MKVEC and
we can get rid of the intermediate ones or even the whole chain if it
just ends up being a swizzle in the end.
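The byte-chasing part can be sketched on a toy IR. The sketch assumes every value is a 32-bit word of four bytes and every byte source is a (def, byte) pair. MKVEC-like nodes forward bytes, and any other node is a generating instruction. The `node`/`byte_ref` types and the `chase_byte()`, `propagate_bytes()`, and `as_swizzle()` helpers are hypothetical and far simpler than the real pass. They only show how a chain of MKVECs can collapse into a swizzle of one original def.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node;

/* Hypothetical reference to one byte of another node's result. */
struct byte_ref {
   const struct node *def;
   unsigned byte;
};

/* Toy instruction: either a byte-forwarding MKVEC or a generating op. */
struct node {
   bool is_mkvec;
   struct byte_ref src[4]; /* only meaningful when is_mkvec */
};

/* Follow a byte back through any chain of MKVECs to its producer. */
static struct byte_ref chase_byte(struct byte_ref r)
{
   while (r.def->is_mkvec)
      r = r.def->src[r.byte];
   return r;
}

/* Byte propagation: make the MKVEC read the original bytes directly. */
static void propagate_bytes(struct node *mkvec)
{
   for (unsigned i = 0; i < 4; i++)
      mkvec->src[i] = chase_byte(mkvec->src[i]);
}

/* Swizzle detection: if all four bytes come from one generating def,
 * return it and store the equivalent byte swizzle in swz. */
static const struct node *as_swizzle(const struct node *mkvec, unsigned swz[4])
{
   const struct node *def = NULL;
   for (unsigned i = 0; i < 4; i++) {
      struct byte_ref r = chase_byte(mkvec->src[i]);
      if (def && r.def != def)
         return NULL;
      def = r.def;
      swz[i] = r.byte;
   }
   return def;
}
```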
For SWZ instructions, we first look at them like a MKVEC of the
individual bytes they consume. If that doesn't yield a single swizzled
word, we then crawl through the words table, just accumulating swizzles.
This gives us the best (closest to the generating instructions) coherent
word. We could also replace SWZ with MKVEC and just do byte propagation
but MKVEC is often 2 instructions whereas SWZ is often one (or folded
into a source) so this is probably the better balance.
Finally, we not only replace the MKVEC and SWZ instructions but we also
attempt to propagate swizzles into individual ALU op sources. For v4i8
ops, this often fails since the full generality isn't always available
but for fp16, we can almost always fold the swizzle into the consuming
instruction.
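For fp16 sources, folding a byte swizzle into the consumer comes down to checking whether the swizzle moves whole, aligned halves. Here is a sketch using the same four-index byte swizzle model, where the helper name is hypothetical. B2301 maps to H10 and B0101 to H00. B0022 has no half-swizzle equivalent.

```c
#include <assert.h>
#include <stdbool.h>

/* Try to express a byte swizzle (result byte i = source byte b[i]) as a
 * half swizzle (result half j = source half h[j]). This only works when
 * each result half reads an aligned, in-order byte pair (2k, 2k + 1). */
static bool byte_swz_to_half(const unsigned b[4], unsigned h[2])
{
   for (unsigned j = 0; j < 2; j++) {
      unsigned lo = b[2 * j], hi = b[2 * j + 1];
      if (lo % 2 != 0 || hi != lo + 1)
         return false;
      h[j] = lo / 2;
   }
   return true;
}
```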
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
Now that we have bi_lower_mkvec_swz(), there's no need to be so careful
in the NIR -> bi translation. We can just emit MKVEC and move on. The
lowering pass will sort out the details.
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
Now that we lower it, there's no advantage to one over the other at the
time this pass runs. Also, the is_8bit check was technically wrong
since it checks destination sizes, not source sizes. It's a lot safer
to just use SWZ.v4i8 and let the lowering pass do the right thing.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
Instead of trying very carefully in the bifrost emit code to only
generate valid MKVEC for the target hardware, this adds a lowering pass
which is capable of lowering any MKVEC or SWZ we can throw at it. Even
if the swizzle isn't supported or if it's a MKVEC.v4i8 on Valhall, we'll
lower it to something that does work on that platform. This frees up
the rest of the compiler so we can add and modify MKVEC and SWZ at will
and never have to worry about hardware generation details.
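To illustrate why any MKVEC.v4i8 or SWZ is lowerable in principle, the reference model below builds an arbitrary byte selection of 32-bit words using only shifts, masks, and ORs. `lower_mkvec_v4i8()` and `lower_swz_v4i8()` are hypothetical names. This is a semantic model only and does not reflect which instructions the real lowering pass emits on any given GPU.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model of MKVEC.v4i8: result byte i is byte sel[i] of src[i].
 * Only shift, mask, and OR are used. */
static uint32_t lower_mkvec_v4i8(const uint32_t src[4], const unsigned sel[4])
{
   uint32_t result = 0;
   for (unsigned i = 0; i < 4; i++) {
      uint32_t byte = (src[i] >> (8 * sel[i])) & 0xffu;
      result |= byte << (8 * i);
   }
   return result;
}

/* A SWZ is an MKVEC whose four sources are the same word. */
static uint32_t lower_swz_v4i8(uint32_t x, const unsigned sel[4])
{
   const uint32_t src[4] = {x, x, x, x};
   return lower_mkvec_v4i8(src, sel);
}
```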
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
At least bi_half() has the decency to assert if the swizzle isn't
BI_SWIZZLE_H01 to start with, but bi_byte() did an irrelevant assert
and then overwrote the swizzle with BI_SWIZZLE_B<lane> regardless of
what was there before. In a lot of cases, this doesn't matter but we
use both in translating NIR to BI on things that may have already been
swizzled so we need to do the composition.
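The difference between overwriting and composing can be shown with the same four-index byte swizzle model. The helper names are hypothetical, and this is not the actual bi_byte() code. Selecting lane L of a value that already carries swizzle S must replicate source byte S[L], not byte L.

```c
#include <assert.h>

/* Select byte `lane` of a value that already carries the byte swizzle
 * `swz` (result byte i = source byte swz[i]) and replicate it. */
static void select_byte_composed(const unsigned swz[4], unsigned lane,
                                 unsigned out[4])
{
   for (unsigned i = 0; i < 4; i++)
      out[i] = swz[lane]; /* look through the existing swizzle */
}

/* The old behavior: the existing swizzle is ignored. */
static void select_byte_overwrite(const unsigned swz[4], unsigned lane,
                                  unsigned out[4])
{
   (void)swz;
   for (unsigned i = 0; i < 4; i++)
      out[i] = lane;
}
```

For a value already swizzled with B2301 (H10), lane 0 is really source byte 2. Composing gives B2222, while overwriting gives B0000.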
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
The only real requirement here is that the destination offset is zero
and that the destination is big enough to hold the source. The source
offset doesn't matter.
Fixes: bc17288697 ("pan/bi: Lower split/collect before RA")
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
The non-trivial non-replicate swizzles on IADD.v4x8 and ISUB.v4x8 are
either documented wrong or broken in hardware. Instead of swizzling
b0101 and b2323, they swizzle b0011 and b2233 on G52. This is either a
hardware bug or an issue with documentation. In either case, it's
probably best not to trust it. Those swizzles aren't all that useful
anyway. We also weren't using any of them before (or they'd have
broken) so this isn't a performance regression.
Cc: mesa-stable
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40720>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Martin Roukala <martin.roukala@mupuf.org>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenz.ca>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40617>
Reviewed-by: Martin Roukala <martin.roukala@mupuf.org>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenz.ca>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40617>
Once an AABB or triangle intersection has been found and committed,
terminate the traversal if the TerminateOnFirstHit ray flag is set.
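The following is a simplified traversal loop that shows where the early-out sits. It assumes a flat list of candidate hits instead of a BVH. The flag value matches SPIR-V's `TerminateOnFirstHitKHR` ray flag (0x4). The `candidate` type and `traverse()` are hypothetical and unrelated to the Intel traversal code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RAY_FLAG_TERMINATE_ON_FIRST_HIT 0x4u /* SPIR-V TerminateOnFirstHitKHR */

/* Hypothetical candidate from a triangle or AABB test. */
struct candidate {
   float t;       /* hit distance along the ray */
   bool accepted; /* triangle hit, or AABB hit accepted by the shader */
};

/* Commits every accepted hit closer than the current one. With
 * TerminateOnFirstHit, the first commit ends traversal. Returns the
 * number of candidates visited. */
static size_t traverse(const struct candidate *c, size_t n, unsigned flags,
                       float *closest_t)
{
   size_t visited = 0;
   for (size_t i = 0; i < n; i++) {
      visited++;
      if (!c[i].accepted || c[i].t >= *closest_t)
         continue;
      *closest_t = c[i].t; /* commit */
      if (flags & RAY_FLAG_TERMINATE_ON_FIRST_HIT)
         break;
   }
   return visited;
}
```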
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40773>
Commit e27e41a842 ("vulkan,spirv: update headers") exposed a
flaw in the cerealgenerator.
It modified -- among other things -- the VkDeviceCreateInfo
struct in vk.xml.
In the update, the len="enabledLayerCount,null-terminated"
attribute was removed from the ppEnabledLayerNames member.
When the gfxstream code generator processes ppEnabledLayerNames
(a const char* const*), it identifies it as an "array of
strings". However, because the len attribute is now missing,
vulkanType.getLengthExpression() returns None.
This leads to errors like:
gfxstream_guest_vk_autogen_impl/gen/goldfish_vk_counting_guest.cpp:642:30:
error: use of undeclared identifier 'None'
642 | for (uint32_t i = 0; i < None; ++i)
| ^~~~
1 error generated.
This patch adds various length access checks to prevent this from
happening.
TEST=m vulkan.ranchu
Reviewed-by: David Gilhooley <djgilhooley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40785>
These failures turned out to be triggered by the new autotune
causing rendering mode transitions (such as GMEM -> SYSMEM), which
uncovered a new set of failures. They tend to work as expected when
either GMEM or SYSMEM is forced for all RPs, but the specific
transitions caused by the autotuner make them fail.
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
The latency-sensitive autotuner uses two CP counters, which affects
the operation of fdperf. This detects when counters have had their
selectors changed and marks them as invalid with corresponding UI
cues.
This also seems to detect selector values being dropped while the
GPU is in sleep states, and tends to be useful to catch that too.
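The detection boils down to comparing each counter's programmed selector with what the hardware currently reports. Here is a sketch with hypothetical types, where the caller supplies the readback value. It does not reflect fdperf's actual register access code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-counter state kept by a profiling tool. */
struct counter {
   uint32_t programmed_select; /* selector the tool wrote */
   bool valid;                 /* result of the most recent check */
};

/* A mismatch means someone else (e.g. the driver's autotuner)
 * reprogrammed the counter, or the selector was lost while the GPU
 * slept, so the sampled value cannot be trusted. */
static void check_counter(struct counter *c, uint32_t readback_select)
{
   c->valid = (readback_select == c->programmed_select);
}
```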
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
Many games have short periods where a certain mode wins
consistently, but this trend doesn't hold afterwards. Only allowing an
RP to be locked once a mode has consistently stayed the winner for 30s
partially mitigates these bad locks.
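One way to express the 30-second rule is shown below. This sketch uses hypothetical field names and takes a caller-supplied monotonic timestamp in nanoseconds. The winning mode must stay unchanged for the whole window before the RP is locked, and any change of winner restarts the window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCK_WINDOW_NS (30ull * 1000 * 1000 * 1000) /* 30 s */

enum mode { MODE_SYSMEM, MODE_GMEM };

/* Hypothetical per-RP lock tracking state. */
struct lock_state {
   enum mode winner;       /* mode that won the latest comparison */
   uint64_t winning_since; /* when the current streak started */
   bool has_winner;
   bool locked;
};

static void record_winner(struct lock_state *s, enum mode w, uint64_t now_ns)
{
   if (s->locked)
      return;
   if (!s->has_winner || s->winner != w) {
      /* New or changed winner: restart the streak. */
      s->winner = w;
      s->winning_since = now_ns;
      s->has_winner = true;
      return;
   }
   if (now_ns - s->winning_since >= LOCK_WINDOW_NS)
      s->locked = true;
}
```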
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
The maximum probability was previously limited to 95% due to the 5%
step delta (95+5=100%, which we wanted to avoid). This introduces a
new, slower step delta of 1% above 95%, going up to 99%. This is
significantly better at eliminating the performance loss or
stuttering when there is a large difference between the modes.
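The two-speed stepping can be sketched with the probability kept in integer percent. It steps by 5 up to 95, then by 1 up to a ceiling of 99. The function and macro names are hypothetical, and only the upward direction is shown.

```c
#include <assert.h>

#define PROB_COARSE_STEP 5
#define PROB_FINE_START  95
#define PROB_FINE_STEP   1
#define PROB_MAX         99

/* Move the probability of picking the winning mode one step up. */
static int step_up(int prob)
{
   if (prob < PROB_FINE_START) {
      int next = prob + PROB_COARSE_STEP;
      return next > PROB_FINE_START ? PROB_FINE_START : next;
   }
   int next = prob + PROB_FINE_STEP;
   return next > PROB_MAX ? PROB_MAX : next;
}
```

Starting at 50%, nine coarse steps reach 95%, and four fine steps then reach the 99% ceiling.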
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
There are certain scenarios where even switching to the other render
mode for a single invocation has significant negative implications
for performance. Now we try to heuristically pick out these cases and
lock them into the optimal mode. At the moment, the heuristic is
fairly conservative, but it manages to lock RPs in under a minute in
most cases.
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
PC games almost always run far better in SYSMEM due to their high FS
complexity, so preferring SYSMEM tends to be a winning policy until
profiled mode reaches a state where it can surpass it.
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
Certain games tend to use rendering patterns that strongly prefer
one mode over the other, and thus we're better off not bothering
with profiling them.
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
This introduces a new option that makes autotune optimize for low
preemption latency, which is crucial to ensure responsiveness on
systems with GPU-based composition. With draw-level preemption, a
large enough draw can entirely block the compositor from running.
This can be mitigated by preferring GMEM, which breaks up the draw
into smaller pieces and generally has a lower preemption latency.
As a further mitigation, tiles in GMEM are then divided into smaller
and smaller pieces which lowers the non-preemptible duration. There
are static checks in place to avoid doing this when it would incur a
cost that is too large.
This uses performance counters read during ambles to detect
preemption latency events while rendering in SYSMEM. This approach is
superior to using RBBM draw time thresholds, which could be imprecise
since only the average draw time was calculated rather than the true
maximum.
However, converting the preemption latency performance counter value
from CP ticks to wall clock time relies on the average GPU frequency
over the whole period from the start of the RP until the switch-away
amble, while the preemption latency starts counting from the request.
Thus, if the GPU frequency shifts rapidly throughout the RP, the
estimated wall clock time may be inaccurate, but it should be good
enough in the vast majority of cases.
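The conversion above amounts to deriving an average frequency over the RP and dividing by it. This worked sketch assumes one CP tick per GPU clock cycle and that both the RP's cycle count and its elapsed wall time are available. Both assumptions are made for illustration, and the parameter names are hypothetical. For example, 5,000,000 cycles over 10 ms gives 500 MHz, so a 250,000-cycle latency converts to 0.5 ms.

```c
#include <assert.h>
#include <stdint.h>

/* Average GPU frequency over the RP, in cycles per nanosecond (GHz). */
static double avg_freq_ghz(uint64_t rp_cycles, uint64_t rp_elapsed_ns)
{
   return (double)rp_cycles / (double)rp_elapsed_ns;
}

/* Convert a latency measured in GPU cycles to nanoseconds using the
 * RP's average frequency. If the clock changed during the RP, the
 * result is only an estimate. */
static double cycles_to_ns(uint64_t latency_cycles, uint64_t rp_cycles,
                           uint64_t rp_elapsed_ns)
{
   return (double)latency_cycles / avg_freq_ghz(rp_cycles, rp_elapsed_ns);
}
```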
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Co-authored-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
Tuning these small renderpasses is difficult due to their high
variability across command buffers and low impact on overall performance
in most cases. This change disables autotuning for renderpasses with 5
or fewer draw calls unless the TUNE_SMALL modifier flag is explicitly
set.
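The gating rule is small enough to state directly. The names below are a hypothetical rendering, not the driver's actual code. This includes treating `TUNE_SMALL` as a bit in a modifier mask.

```c
#include <assert.h>
#include <stdbool.h>

#define SMALL_RP_MAX_DRAWS 5
#define TUNE_SMALL (1u << 0) /* hypothetical bit value for the modifier */

/* RPs with 5 or fewer draws are skipped unless TUNE_SMALL is set. */
static bool should_autotune(unsigned draw_count, unsigned modifiers)
{
   return draw_count > SMALL_RP_MAX_DRAWS || (modifiers & TUNE_SMALL);
}
```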
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
In cases where only SW binning is possible and not using HW binning
would have a performance impact (i.e. more than 2 tiles), it is
preferable to default to SYSMEM, as using GMEM is almost certainly
not going to be worth it.
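Here is a sketch of this default, using hypothetical parameter names. SYSMEM becomes the default only when HW binning is unavailable and the framebuffer needs more than two tiles. Otherwise, the normal autotune decision applies, which is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>

static bool default_to_sysmem(bool hw_binning_possible, unsigned tile_count)
{
   return !hw_binning_possible && tile_count > 2;
}
```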
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
The default ROUND_DOWN_TO only handles POT alignment values, so
an additional variant was added which handles NPOT alignment too.
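This sketch shows the difference between the two variants. The power-of-two version can clear the low bits with a mask, while the general version has to subtract the remainder. `round_down_pot()` and `round_down_npot()` are hypothetical names, not the macros added by the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Power-of-two alignment only: clear the low bits. */
static uint32_t round_down_pot(uint32_t value, uint32_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return value & ~(alignment - 1);
}

/* Any non-zero alignment: subtract the remainder. */
static uint32_t round_down_npot(uint32_t value, uint32_t alignment)
{
   assert(alignment != 0);
   return value - value % alignment;
}
```

The mask trick gives the wrong answer for NPOT alignments. For example, masking 100 with ~(48 - 1) yields 64, while the correct result is 96.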
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
This algorithm measures the time taken by each RP as a whole and uses
that to shift a probability distribution over whether to use GMEM or
SYSMEM for that RP. This is done with a delta of 5% per run, and the
probability is clamped between 5% and 95% to avoid getting stuck when
conditions change.
Additionally, this adds an "immediate resolve" variant, which tries to
work off a single data point each in SYSMEM and GMEM, then immediately
resolves to the faster path. This is useful in CI, which runs a single
frame multiple times and where performance doesn't vary from frame to
frame.
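Here is a minimal sketch of the algorithm as described. It takes the latest timing for each mode as input, since how those timings are gathered and attributed is not modeled. The GMEM probability moves 5 points toward the faster mode and is clamped to [5, 95]. The immediate-resolve variant compares one sample of each mode. All names are hypothetical, and the random draw is passed in so the example stays deterministic.

```c
#include <assert.h>
#include <stdint.h>

#define PROB_STEP 5
#define PROB_MIN  5
#define PROB_MAX  95

enum mode { MODE_SYSMEM, MODE_GMEM };

static int clamp_prob(int p)
{
   return p < PROB_MIN ? PROB_MIN : p > PROB_MAX ? PROB_MAX : p;
}

/* Nudge the probability (in percent) of picking GMEM toward whichever
 * mode measured faster. */
static int update_gmem_prob(int gmem_prob, uint64_t sysmem_ns, uint64_t gmem_ns)
{
   if (gmem_ns < sysmem_ns)
      return clamp_prob(gmem_prob + PROB_STEP);
   if (sysmem_ns < gmem_ns)
      return clamp_prob(gmem_prob - PROB_STEP);
   return gmem_prob;
}

/* Pick a mode from a uniform random draw in [0, 100). */
static enum mode pick_mode(int gmem_prob, int draw)
{
   return draw < gmem_prob ? MODE_GMEM : MODE_SYSMEM;
}

/* Immediate resolve: one sample of each mode decides. Ties go to
 * SYSMEM here, which is an arbitrary choice for the sketch. */
static enum mode immediate_resolve(uint64_t sysmem_ns, uint64_t gmem_ns)
{
   return gmem_ns < sysmem_ns ? MODE_GMEM : MODE_SYSMEM;
}
```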
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>