For multiple queue emulation, we need to change how queue-related
functions work on the host side and do custom unboxing
before submitting the commands to the underlying driver.
Reviewed-by: Marcin Radomski <dextero@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33018>
Indirect draw calls upload VS params themselves. If an indirect draw call
has zero instances, the GPU still seems to try to upload consts; however,
with zero instances it doesn't apply draw state. Without the draw state
the HLSQ_VS_CNTL value is stale, so fewer constants may be specified
than the draw call expects. It was found that if CP_DRAW_INDIRECT_MULTI_1_DST_OFF
is less than 0x3f, the GPU is happy even if the constlen is less than that.
As a workaround we allocate driver params first and ensure that VS
constlen always has the minimum size which is enough to upload driver
params.
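
The workaround above amounts to clamping the VS constlen from below. A minimal
sketch, assuming a hypothetical `example_const_layout` struct with sizes in vec4
units (these names are illustrative and are not ir3 identifiers):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout description, in vec4 units; not taken from ir3. */
struct example_const_layout {
   uint32_t driver_params_offset; /* 0 when driver params are allocated first */
   uint32_t driver_params_size;   /* number of vec4s the driver params occupy */
};

/* Return a constlen that always covers the driver params, even when the
 * shader itself references fewer constants.
 */
static uint32_t
example_vs_constlen(const struct example_const_layout *layout,
                    uint32_t shader_constlen)
{
   assert(layout);

   uint32_t driver_params_end =
      layout->driver_params_offset + layout->driver_params_size;

   return shader_constlen > driver_params_end ? shader_constlen
                                              : driver_params_end;
}
```

With driver params at offset 0, a shader that uses no constants at all still
gets a constlen large enough for the indirect draw's upload.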
Fixes one of the GPU hangs in "Disney Epic Mickey: Rebrushed" on a750.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32140>
With all consts going through generic allocations it's now possible
to call ir3_setup_const_state once, and have lowerings that dynamically
lower things to consts just update the max consts being used.
The only exception for now is immediates, since they eat up the space
that is left and are allocated much later.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32140>
The order of allocation was baked into ir3_setup_const_state and
some other parts of ir3, which is rather brittle.
Also, don't assume offsets for consts in other parts of the code; their
order and offset calculation are not guaranteed.
This also potentially fixes indirect UBO effect on constlen size.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32140>
In the current API, precomp implicitly assumes full barriers both before & after
every dispatch. That's not good for performance. However, dropping the barriers
and requiring users to explicitly call barrier functions before/after would have
bad ergonomics.
So, we add a new parameter to the standard MESA_DISPATCH_PRECOMP signature
representing the barriers required around the dispatch. As usual, the actual
type & semantic is left to drivers to define what makes sense for their
hardware. We just reserve the place for it. (I think most drivers will want
bitflags here, but I don't think the actual flags are worth standardizing. If a
driver wanted to use a struct here, that would work too.)
Since the asahi stack doesn't do anything clever with barriers yet, we
mechanically add an AGX_BARRIER_ALL barrier to all precomp users in-tree. We can
optimize that later; this just gets the flag-day change in with no functional
change.
For JM panfrost, this will provide a convenient place to stash both their "job
barrier" bit and their "suppress prefetch" bit (which is really a sort of
barrier / cache flush, if you think about it).
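
As a rough illustration of the bitflags variant described above, here is a
minimal sketch. Everything named `example_*` / `EXAMPLE_*` is hypothetical: it
is not the MESA_DISPATCH_PRECOMP signature nor any driver's actual flag set,
only one way a driver-defined barrier argument could be consumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical barrier bits; a real driver defines whatever suits its hardware. */
enum example_barrier {
   EXAMPLE_BARRIER_NONE   = 0,
   EXAMPLE_BARRIER_BEFORE = 1u << 0, /* wait for prior work */
   EXAMPLE_BARRIER_AFTER  = 1u << 1, /* make results visible to later work */
   EXAMPLE_BARRIER_ALL    = EXAMPLE_BARRIER_BEFORE | EXAMPLE_BARRIER_AFTER,
};

/* Hypothetical command stream that only counts what was recorded. */
struct example_cmdbuf {
   unsigned barriers_emitted;
   unsigned dispatches;
};

/* The barrier argument travels with the dispatch, so callers state their
 * requirements at the call site instead of calling separate barrier helpers.
 */
static void
example_dispatch(struct example_cmdbuf *cs, uint32_t barrier)
{
   assert(cs);

   if (barrier & EXAMPLE_BARRIER_BEFORE)
      cs->barriers_emitted++;

   cs->dispatches++;

   if (barrier & EXAMPLE_BARRIER_AFTER)
      cs->barriers_emitted++;
}
```

Passing the "all" value everywhere reproduces the old implicit behaviour, which
is what makes a mechanical conversion free of functional change.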
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32980>
When cross-building for Android, meson fails to find supported compiler
arguments for the C++ cross-compiler, e.g.:
```
Compiler for C++ supports arguments -Wno-array-bounds: NO (cached)
Compiler for C++ supports arguments -Wno-overflow: NO
Compiler for C++ supports arguments -Wno-c++11-narrowing: NO (cached)
Compiler for C++ supports arguments -Wno-vla-cxx-extension: NO (cached)
```
This is due to an **unrelated** and more generic compilation failure
when testing for the supported arguments, e.g.:
```
Command line: `/tmp/android-ndk-r27c/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android34-clang++ -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -static-libstdc++ /home/ao2/Collabora/GOO0042/mesa/_build/meson-private/tmpv6_hke9l/testfile.cpp -o /home/ao2/Collabora/GOO0042/mesa/_build/meson-private/tmpv6_hke9l/output.obj -c -Wno-error=c99-designator -Wno-error=unused-variable -Wno-error=unused-but-set-variable -Wno-error=self-assign -D_FILE_OFFSET_BITS=64 -O0 -fpermissive -Werror=implicit-function-declaration -Werror=unknown-warning-option -Werror=unused-command-line-argument -Werror=ignored-optimization-argument -Wvla-cxx-extension -Wno-vla-cxx-extension` -> 1
stderr:
clang++: error: argument unused during compilation: '-static-libstdc++' [-Werror,-Wunused-command-line-argument]
-----------
Compiler for C++ supports arguments -Wno-vla-cxx-extension: NO
```
The issue is caused by how the cross compiler is set up by
.gitlab-ci/container/create-android-cross-file.sh
Allow the cross compiler to still start even when the
`-Werror,-Wunused-command-line-argument` error occurs in order to be
able to actually detect other arguments:
```
Command line: `/tmp/android-ndk-r27c/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android34-clang++ -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables --start-no-unused-arguments -static-libstdc++ --end-no-unused-arguments /home/ao2/Collabora/GOO0042/mesa/_build/meson-private/tmpm6eolxed/testfile.cpp -o /home/ao2/Collabora/GOO0042/mesa/_build/meson-private/tmpm6eolxed/output.obj -c -Wno-error=c99-designator -Wno-error=unused-variable -Wno-error=unused-but-set-variable -Wno-error=self-assign -D_FILE_OFFSET_BITS=64 -O0 -fpermissive -Werror=implicit-function-declaration -Werror=unknown-warning-option -Werror=unused-command-line-argument -Werror=ignored-optimization-argument -Wvla-cxx-extension -Wno-vla-cxx-extension` -> 0
Compiler for C++ supports arguments -Wno-vla-cxx-extension: YES
```
This makes argument detection work again, e.g.:
```
Compiler for C++ supports arguments -Wno-array-bounds: YES (cached)
Compiler for C++ supports arguments -Wno-overflow: YES
Compiler for C++ supports arguments -Wno-c++11-narrowing: YES (cached)
Compiler for C++ supports arguments -Wno-vla-cxx-extension: YES (cached)
```
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12441
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Valentine Burley <valentine.burley@collabora.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33013>
../src/asahi/vulkan/hk_cmd_draw.c: In function ‘hk_draw’:
../src/asahi/vulkan/hk_cmd_draw.c:3471:32: error: expression in static assertion is not constant
3471 | static_assert(size > sizeof(VkDrawIndirectCommand),
| ~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
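
For context on this class of error: in C, a const-qualified local variable is
not an integer constant expression, so some compilers reject it inside
static_assert. The sketch below shows one way to avoid that, using hypothetical
stand-in structs rather than the Vulkan types; it is not necessarily the change
made in this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a smaller and a larger indirect command struct. */
struct example_draw_cmd { uint32_t v[4]; };
struct example_indexed_draw_cmd { uint32_t v[5]; };

static size_t
example_scratch_size(void)
{
   /* Writing
    *    const size_t size = sizeof(struct example_indexed_draw_cmd);
    *    static_assert(size > sizeof(struct example_draw_cmd), "...");
    * may fail with "expression in static assertion is not constant".
    * Asserting on the sizeof expressions directly is always constant.
    */
   static_assert(sizeof(struct example_indexed_draw_cmd) >
                    sizeof(struct example_draw_cmd),
                 "indexed command must be the larger of the two");

   const size_t size = sizeof(struct example_indexed_draw_cmd);
   return size;
}
```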
Fixes: 5bc89aa991 ("hk,libagx: handle adjacency without a GS")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12351
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Engestrom <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32988>
The Xe uAPI is designed to use bind queues such that binds without input
dependencies (sync objects) do not block on binds with input
dependencies.
For example:
- Bind A (sparse) is submitted with a list of input dependencies.
- Bind B (immediate) is subsequently submitted without a list of input
dependencies.
If Bind A and Bind B share a single bind queue, Bind B will not be
scheduled until Bind A completes. Using individual bind queues decouples
Bind A and Bind B, allowing Bind B to make immediate progress.
This change creates a separate bind queue for each ANV queue, enabling
support for sparse bindings that may have input dependencies.
v2:
- Bail on bind queue creation failure (Lionel)
- Only create bind queue if VK_QUEUE_SPARSE_BINDING_BIT is set (Jose)
v3:
- Add comment around submit->queue usage (Jose)
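
The creation policy described above (one bind queue per queue, only when sparse
binding is requested, and bail on failure) can be sketched roughly as follows.
All `example_*` names are hypothetical; the callbacks stand in for whatever
kernel call actually creates a queue, and only the flag value matches
VK_QUEUE_SPARSE_BINDING_BIT.

```c
#include <assert.h>
#include <stdint.h>

/* Same numeric value as VK_QUEUE_SPARSE_BINDING_BIT. */
#define EXAMPLE_QUEUE_SPARSE_BINDING_BIT 0x8u

struct example_queue {
   uint32_t flags;
   uint32_t exec_queue_id;
   uint32_t bind_queue_id; /* 0 means "no dedicated bind queue" */
};

/* Hypothetical creation callback: returns 0 on success and writes an id. */
typedef int (*example_create_fn)(uint32_t *id_out);

static uint32_t example_next_id = 1;

static int
example_create_ok(uint32_t *id_out)
{
   *id_out = example_next_id++;
   return 0;
}

static int
example_create_fail(uint32_t *id_out)
{
   (void)id_out;
   return -1;
}

static int
example_queue_init(struct example_queue *q, uint32_t flags,
                   example_create_fn create_exec,
                   example_create_fn create_bind)
{
   assert(q && create_exec && create_bind);

   q->flags = flags;
   q->bind_queue_id = 0;

   int ret = create_exec(&q->exec_queue_id);
   if (ret)
      return ret;

   /* Only queues that can do sparse binding get their own bind queue. */
   if (flags & EXAMPLE_QUEUE_SPARSE_BINDING_BIT) {
      ret = create_bind(&q->bind_queue_id);
      if (ret)
         return ret; /* bail on bind queue creation failure */
   }

   return 0;
}
```

A real implementation would also destroy the already-created exec queue on the
failure path; that cleanup is omitted here for brevity.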
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32873>
It's split into ac_nir_lower_ps_early and ac_nir_lower_ps_late.
ac_nir_lower_ps_early doesn't generate any AMD specific intrinsics except
some system values and is mainly an optimization pass with some lowering.
The new change here is that it also eliminates output components not needed
by spi_shader_col_format.
ac_nir_lower_ps_late lowers output stores to exports and does the bc_optimize
thing.
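
To illustrate the idea of dropping output components that the color export
format does not need, here is a minimal sketch. The 4-bits-per-output packing
and the `EXAMPLE_FMT_*` encoding are assumptions made for this example; they
are not taken from this commit or from the hardware headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-output export formats (illustrative values only). */
enum example_col_format {
   EXAMPLE_FMT_ZERO    = 0, /* output not exported at all */
   EXAMPLE_FMT_32_R    = 1, /* only R is exported */
   EXAMPLE_FMT_32_GR   = 2, /* R and G are exported */
   EXAMPLE_FMT_32_AR   = 3, /* R and A are exported */
   EXAMPLE_FMT_32_ABGR = 9, /* all four components are exported */
};

/* Which of the xyzw components (bits 0..3) the given color output needs. */
static unsigned
example_needed_components(uint32_t col_format, unsigned mrt)
{
   assert(mrt < 8);

   switch ((col_format >> (4 * mrt)) & 0xf) {
   case EXAMPLE_FMT_ZERO:  return 0x0;
   case EXAMPLE_FMT_32_R:  return 0x1;
   case EXAMPLE_FMT_32_GR: return 0x3;
   case EXAMPLE_FMT_32_AR: return 0x9;
   default:                return 0xf;
   }
}

/* Remove components from a store's write mask that would never be exported. */
static unsigned
example_trim_writemask(unsigned writemask, uint32_t col_format, unsigned mrt)
{
   return writemask & example_needed_components(col_format, mrt);
}
```

Once a component is removed from every store, ordinary dead-code elimination
can delete the instructions that computed it.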
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32966>
Add support for getting time elapsed values via glBeginQuery/glEndQuery.
When recording query start & end time, we ensure that all pending jobs have
been completed by using v3d cpu_queue & the multisync extension.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32547>
Add support for getting timestamp values via
glGet(GL_TIMESTAMP) and glQueryCounter(GL_TIMESTAMP). For the case of
glQueryCounter, we make use of v3d cpu jobs via
DRM_IOCTL_V3D_SUBMIT_CPU and DRM_V3D_EXT_ID_CPU_TIMESTAMP_QUERY.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32547>
Add support to check if v3d supports the multisync
extension. This will be used in future patches to enable support for
PIPE_CAP_QUERY_TIMESTAMP & PIPE_CAP_QUERY_TIME_ELAPSED.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32547>
Add support to check if v3d supports cpu_queue. This
will be used in future patches to enable support for
PIPE_CAP_QUERY_TIMESTAMP & PIPE_CAP_QUERY_TIME_ELAPSED.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32547>
The attribute ring size per SE is different from GFX11. It was
already computed correctly in common code, but RADV was still using the
old GFX11-style value.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32994>
The spec says
vkCmdCopyQueryPoolResults is considered to be a transfer operation,
and its writes to buffer memory must be synchronized using
VK_PIPELINE_STAGE_TRANSFER_BIT and VK_ACCESS_TRANSFER_WRITE_BIT before
using the results.
While STORE_MULTIPLE is not exactly VK_PIPELINE_STAGE_TRANSFER_BIT /
VK_ACCESS_TRANSFER_WRITE_BIT, we can still rely on user barriers to do
the right thing (e.g., flush caches for host access).
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32697>
When VK_QUERY_RESULT_WAIT_BIT is set, we rely on sync wait. When
VK_QUERY_RESULT_WAIT_BIT is not set, no wait is needed.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32697>
We can guarantee ordering with this sequence of async cmds:
RUN_FRAGMENT ->
(signal and wait SB_ITER) ->
FLUSH_CACHE2 ->
(signal and wait DEFERRED_FLUSH) ->
SYNC_SET32
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32697>
The spec says
VUID-vkCmdBeginQueryIndexedEXT-None-00807
All queries used by the command must be unavailable
and panvk_cmd_reset_occlusion_queries is synchronous.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32697>
The spec says
Resetting a query via vkCmdResetQueryPool or vkResetQueryPool sets the
status to unavailable and makes the numerical results undefined.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32697>