This frees a fairly large amount of memory from the 2D matrix by
iterating over the rows to free them individually.
Liuqiang spotted some areas that we return early in the threaded
function and don't free some pointers.
To remedy this, we'll reorder the checks so that we don't have to
return early and can instead use an if/else flow to take care of
these problematic areas in a more elegant way.
Co-authored-by: Casey Bowman <casey.g.bowman@intel.com>
Co-authored-by: liuqiang <liuqiang@kylinos.cn>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31793>
This allows larger buffer sizes when using the env config as well
as filepath for the output directory.
This will allow, for example, using a large number of singular frames:
frames=1/2/3/4/5/6/7/8/.../300
Also fixed an issue with filepaths sometimes being appended with garbage
characters due to not being initialized.
Signed-off-by: Casey Bowman <casey.g.bowman@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31793>
Previously, only the first image in the swapchain was chosen at all times
to be copied to a file.
This meant that if a list of consecutive images were selected, multiple
duplicate images would be saved, instead of the proper frames actually
used in the workload.
Now, the index is properly obtained from AcquireNextImageKHR(), leading
to the same image being used for the workload to be copied and saved to
a file.
Signed-off-by: Casey Bowman <casey.g.bowman@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31793>
According to PAL, an image with DCC/HTILE and mipmaps isn't coherent
with L2 when the mip level is in the metadata mip-tail region.
This fix isn't super optimal because the driver should rely on the
subresource range to determine if the mip level is in the mip-tail,
but it's easier to backport. Upcoming commits will optimize that.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11939
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31920>
This is a workaround that is still in progress, see HSD 22020521218.
If we don't have these NOPs we may see rendering corruption or even
GPU hangs.
While we still don't fully understand the issue from the hardware
point of view, let's have this workaround so we can pass CTS and move
things forward. If we need to change this later, we can. Besides, the
impact is minimal. Shaderdb/fossilize report no changes for this
patch.
On our Blackops trace, the lack of this patch causes corruption in fog
rendering (rectangles where fog was supposed to be shown don't show
the fog).
On dEQP-VK.graphicsfuzz.cov-array-copies-loops-with-limiters, without
this patch we get a GPU hang.
Backport-to: 24.2
Testcase: dEQP-VK.graphicsfuzz.cov-array-copies-loops-with-limiters
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11813
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31331>
When running deqp-runner with a toml suite, the skip files can be
specified in the toml configuration or on the command line. The names of
most skip files are generated in `deqp-runner.sh` and passed through on
the command line so it’s not necessary to specify them again in the toml
suite. It doesn’t hurt, but it can be confusing.
Simplify the toml files by removing the duplicate skip files.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31912>
Rather than updating intel_device_info_update_l3_banks(), the Xe KMD
provides this info via the DRM_XE_DEVICE_QUERY_GT_TOPOLOGY query item.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31894>
Now that the LAVAJobSubmitter's `__post_init__` method is working for
unit tests, one of the job definitions parametrized tests started to
fail due to lack of LAVA proxy mocking.
This commit adds the missing pieces.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31882>
python-fire auto-converts `item1,item2` into a tuple, but if there is a
dash `-` inside the argument, it treats it as a string.
Let's validate the data, when it comes as a `str` or `tuple`.
For more details, here are the tested scenarios:
| --lava-tags= | Type | Value |
|--------------|-------|---------------------|
| None | bool | True |
| '' | str | '' |
| tag1 | str | "tag1" |
| tag1, | tuple | ("tag1",) |
| tag-1,tag-2 | str | 'tag-1,tag-2' |
| tag1,tag2 | tuple | ("tag1", "tag2") |
| ',' | str | ',' |
| ',,' | str | ',,' |
| 'tag1,,' | str | 'tag1,,' |
See also:
https://google.github.io/python-fire/guide/#argument-parsing
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31882>
We don't need the /done anymore, because we have better job
dependencies. But the mainline-or-fork query is still helpful, so
refactor that out into a common helper we can reuse for other things.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Co-authored-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31882>
Instead of providing a hardcoded set of arguments, allow overlays to be
added to the submitter script. Passing Python dicts as a string
representation and relying on coercion from strings is far from great,
but fire doesn't give anything else, so.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Co-authored-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31882>
We compose the rootfs from a mixture of the base rootfs (exported from
the container build stage, currently lava_build.sh, which can be reused
as long as the container isn't rebuilt), the Mesa build overlay
(exported from the debian-* build job, which can be reused for every job
in that pipeline), and the per-job rootfs (containing job-specific
variables which cannot be reused).
Instead of having LAVA pull the base rootfs and separately downloading
the build/per-job parts on the DUT, get LAVA to compose the whole thing
by using overlays.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Co-authored-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31882>
This enables the 16bit storage extensions, with the
uniformAndStorageBuffer16BitAccess feature-bit.
This seems to already be implemented, so let's just expose it!
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31907>
The compute stage has two EXCP_EN fields and the memory violation bit
is in EXCP_EN_MSB. Confirmed by writing a small test on GFX8.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31902>
On panthor, VK_SYNC_FEATURE_TIMELINE is always supported. On panfrost,
we can use vk_sync_timeline_get_type.
Note that there is a kernel issue regarding syncobj query that causes
dEQP-VK.synchronization.timeline_semaphore.wait.poll_signal_from_device
to time out when VK_SYNC_FEATURE_TIMELINE is set. It is considered a
kernel bug and is not dealt with here.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31720>
src_stages_to_subqueue_sb_mask calls stages_cover_subqueue, but also has
a special case for VK_PIPELINE_STAGE_2_DRAW_INDIRECT_BIT.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31720>
This bit is not needed for barriers and appears to trigger a
performance regression. So leave it for just for AUX-TT
flushing/invalidation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e3814dee1a ("anv: add plumbing/support for L3 fabric flush")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12090
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31915>
Detect when the temporary index buffer cannot be generated due to too
large primitive count, and simply drop the draw on the floor.
Fixes a webgl reachable asan/crash.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12092
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31914>
This fixes a CTS hang on Hawaii.
We previously only did a CB/DB flush,
but that doesn't include a L2 cache flush.
Also fix the comment that said this is for GFX9+.
Fixes: 7c62f6fa01
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31906>
It looks like the fixed-func hardware is very slow to cull primitives
with zero pos.w but shader based culling helps a lot.
This fixes a massive performance gap with the FSR2 demo compared to
AMDGPU-PRO, +228% on RDNA2.
Based on my investigation, AMDGPU-PRO seems to always cull these
primitives. Note that disabling NGG culling with AMDGPU-PRO reports the
same performance as RADV without that fix. Also note that the FSR2
sample doesn't specify any cull mode (ie. VK_CULL_MODE_NONE is used),
so this is the only reason PRO was culling more than RADV.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7260
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31891>
Out of 5 run during busy hours, all the jobs that once took 5+ min are:
build-for-tests / debian-arm64-asan 6 min
build-only / debian-s390x 6 min
build-only / debian-android 7 min
build-only / debian-clang 8 min
build-for-tests / debian-arm32-asan 8 min
build-only / debian-vulkan 11 min
build-for-tests / debian-testing 12 min
build-only / debian-testing-msan 12 min
build-only / debian-clang-release 13 min
build-only / alpine-build-testing 14 min
build-for-tests / debian-testing-asan 21 min
The jobs at 10+ min are considered to take long enough that they might
risk crossing the 15 min mark, so let's keep these ones at 30 min and
lower the timeout for everyone else to 15 min.
It's worth pointing out that debian-testing-asan is a build-for-tests
job and as such it blocks build-only jobs from running until it's
finished, which can be a problem for a job that has been seen taking 20+
minutes. We should do something about that, but that's not the topic of
this MR.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31846>
GitLab doesn't (yet) support `timeout:` being a variable, so let's put
the real `timeout:` at the max timeout we want, and internally use
another timeout (using coreutils' `timeout`) that we _can_ set using
a variable.
With that, we can set a 1h timeout on nightly LTO builds while keeping
our tighter 30min timeout the rest of the time.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31846>
We may still need to insert a continue block even if there is only one
backedge, in a situation like:
for (...) {
if (...) continue;
foo();
break;
}
We want foo() to be executed before reconverging. This is important for
the BVH encoding kernel, which launches an invocation for each node in
the tree and does a preorder traversal:
while (true) {
if (!ready[node]) continue;
encode();
for (child node)
ready[child] = true;
break;
}
For the first few nodes, which will be in the same wave, we need
encode() for the root node to be called first, then its children spin
until ready, then the children call encode(), and so on. This can only
work if the children that aren't ready yet are parked while the parent
executes encode(), which requires the continue block.
This is also required because divergence analysis will assume that
uniform values written before the continue are still uniform after it,
which isn't the case now and causes an RA validation failure with Godot.
Fixes: 0fa93fb662 ("ir3: Fix convergence behavior for loops with continues")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31905>