because the hw and LLVM only support subdword single-component SSBO loads,
and ac_nir_to_llvm splits multi-component loads because of that, which is
inefficient.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19399>
This fixes broken subdword UBO loads with LLVM.
It's only needed for LLVM, but it's done for both LLVM and ACO because
the pass can be fully validated only with ACO and the Vulkan CTS right now.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19399>
Instead of having ac_set_reg_cu_en that sets the register, replace it with
ac_apply_cu_en that only returns the modified register value,
which allows a large simplification in both drivers because a lot of code
becomes duplicated after it's switched to ac_apply_cu_en.
RADV also didn't apply it to a few registers. Fixed.
This removes 82 lines of code in total.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21641>
The result buffer is where the kernel places statistics and fault
information after the GPU executes a command. Dummy structure pending
UAPI.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21662>
This might be useful in the future, but it is best reimplemented in
terms of the upcoming Linux UAPI instead of having parallel codepaths.
Let's drop it.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21662>
After cross-checking with kernel and the old buffer copy code, it seems
that the size field should be size - 1 instead.
Fixes: 46c95047bd ("radeonsi: implement si_sdma_copy_image for gfx7+")
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21585>
We are seeing endless DRM_IOCTL_SYNCOBJ_WAIT ioctl when system memory is
under pressured.
Commit f9d8d9acbb ("iris: Avoid abort() if kernel can't allocate
memory") avoids the abort() on ENOMEM by resetting the batch. However,
when there's an ongoing OpenGL query, resetting the batch will make the
snapshots_landed never be flipped, so iris_get_query_result() gets stuck
in the while loop forever.
Since there's no guarantee that the next batch after resetting won't hit
ENOMEM, so instead of resetting the batch, be patient and wait until kernel has
enough memory. Once the batch is submiited and snapshots_landed gets
flipped, iris_get_query_result() can proceed normally.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6851
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20449>
So that the next person trying to cut down LLVM compile times doesn't trip
over this.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21142>
Again, this should reduce the complexity of the LLVM IR we emit in some
cases. We don't use it for shared loads, due to the noted corner case.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21142>
If we're just loading memory, we can take the scalar offset_is_uniform
paths even the first active invocation is nonzero, saving a bunch of
looping and bounds checking for per-element loads. And, if we don't have
an active invocation, doing the load for element 0 (which is
bounds-checked to return 0 if element 0 had a bad value in it) before
throwing away the result is still better than doing bounds-checked loads
for each element before throwing away the result.
dEQP-VK.ubo.random.16bit.scalar.92 goes from 16.5 to 14.0 seconds.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21142>
gallivm doesn't actuially jump across branches where no invocations are
active, so my previous assertion about the exec mask being nonzero was
incorrect. This means that we'll always use a defined invocation for the
various LLVMBuildExtractElements using the result value, which is an
improvement over my even the code before my cttz change that would use
undefined values for the element to be extracted.
Fixes: 8c2493d041 ("gallivm: Use cttz instead of a loop for first_active_invocation().")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21142>
This ensures that users of libintel_dev.a won't be compiled until
include files are generated, and that they are recompiled when the
header changes.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <markjanes@swizzler.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20825>
The offset handling should already work for flattening to our split vars,
just need to make sure we have enough (or any!) array elements.
Fixes: #7154
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13288>
Currently the iris driver is adding the buffer objects to zombie list
without checking if it is busy or not. It checks for it after 1 second
which adds delay to buffer release.
This fix checks if the bo is busy or not before adding it to zombie list.
Without this fix, the applications expecting immediate buffer release would
fail.
The fix is identified while debugging below android cts tests:
android.graphics.cts.BitmapTest#testDrawingHardwareBitmapNotLeaking
android.graphics.cts.BitmapTest#testHardwareBitmapNotLeaking
Signed-off-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21460>
It seems that when batches are submitted back to back, the USC can
retain cache contents between them. This causes a problem when the CPU
updates a VBO between batches, since some of those updates might not be
visible to the USC.
Looks like the VDM barrier command with one magic bit set fixes this, so
let's try that.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21538>
Coverity points out that we can pass a negative value to `close()`,
which results in an unchecked error. While this is technically true, it
really isn't a problem as `close()` is speced to return -1 in that case
(which we ignore). However, what is true is that if we fail to dup the
fd (the only case where we could end up with a negative value), then
we're in an unrecoverable error state anyway, and should go to the error
cleanup code.
CID: 1521539
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21568>
this avoids potentially splitting renderpasses by ensuring that
all (non-cs) query operations always occur inside renderpasses
zink_query_update_gs_states() now has to be called inside renderpass
to catch the active queries
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21534>