Commit graph

58 commits

Author SHA1 Message Date
Marek Olšák
3d0a4864ce winsys/amdgpu: add amdgpu_cs::ws to reduce dereferences
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9809>
2021-04-06 22:31:15 +00:00
Marek Olšák
ff311df6b5 winsys/amdgpu: remove amdgpu_winsys_bo::num_cs_references to remove atomics
This decreases the CPU time percentage of amdgpu_cs_add_buffer by 50%
on Ryzen 3900X.

We don't need to call amdgpu_bo_is_referenced_by_any_cs
in amdgpu_bo_can_reclaim. The reclaim function is only called for buffers
that have 0 references.

The only downside is that amdgpu_bo_is_referenced_by_cs might be slower
in some very rare cases. Overall the driver overhead is better.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8849>
2021-02-06 05:41:22 +00:00
Marek Olšák
06b9dedfd9 winsys/amdgpu: optimize out conditionals in amdgpu_lookup_buffer
Move them to a wrapper function.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8849>
2021-02-06 05:41:22 +00:00
Marek Olšák
3bd9db5be3 r300,r600,radeonsi: inline struct radeon_cmdbuf to remove dereferences
It's straightforward except that the amdgpu winsys had to be cleaned up
to allow this.

radeon_cmdbuf is inlined and optionally the winsys can save the pointer
to it. radeon_cmdbuf::priv points to the winsys cs structure.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7907>
2020-12-05 10:52:17 -05:00
Marek Olšák
2c61411f25 winsys/amdgpu: don't use debug_get_option_noop in a hot path
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7721>
2020-12-01 15:33:03 -05:00
Marek Olšák
9c239aa638 winsys/amdgpu: replace amdgpu_winsys_bo::flags with pb_buffer::usage
Let's use the field so as not to waste memory.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7585>
2020-11-18 23:50:40 -05:00
Marek Olšák
37cdce0146 winsys/amdgpu: remove amdgpu_winsys_bo::sparse
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7585>
2020-11-18 23:50:38 -05:00
Pierre-Eric Pelloux-Prayer
55b018b634 amd/winsys: add RADEON_FLUSH_TOGGLE_SECURE_SUBMISSION
Instead of exposing a cs_set_secure() callback that always needs a call
to si_flush_gfx_cs before a switch, this commit introduces a new
flag to switch between secure and non-secure on submissions.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6049>
2020-09-24 14:51:16 +00:00
Marek Olšák
9e2113c6dc radeonsi: set up IBs for preemption
- Execute cs_preamble_state as a separate IB with different flags.
- Set the PREEMPT flag for the main IB.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5798>
2020-07-22 12:08:33 -04:00
Pierre-Eric Pelloux-Prayer
977e19d5cf amdgpu/radeon: add secure api
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4401>
2020-05-11 10:25:53 +02:00
Bas Nieuwenhuizen
531728d6cb drm-uapi,radv,radeonsi: Add amdgpu_drm.h header.
Use it instead of the libdrm provided amdgpu_drm.h header. I used
the kernel revision from the README to get the header so the
header versions should be consistent.

Tested by removing /usr/include/libdrm/amdgpu_drm.h from my dev-machine.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4749>
2020-04-27 12:27:02 +00:00
Michel Dänzer
cb446dc0fa winsys/amdgpu: Add amdgpu_screen_winsys
It extends pipe_screen / radeon_winsys and references amdgpu_winsys.
Multiple amdgpu_screen_winsys instances may reference the same
amdgpu_winsys instance, which corresponds to an amdgpu_device_handle.

The purpose of amdgpu_screen_winsys is to keep a duplicate of the DRM
file descriptor passed to amdgpu_winsys_create, which will be needed
in the next change.

v2:
* Add comment in amdgpu_winsys_unref explaining why it always returns
  true (Marek Olšák)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-03 09:19:07 +00:00
Marek Olšák
b19884e08e winsys/amdgpu: add a parallel compute IB coupled with a gfx IB
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:07:00 -04:00
Marek Olšák
114a899cc8 winsys/amdgpu: cs_check_space sets the minimum IB size for future IBs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-02-11 12:35:48 -05:00
Marek Olšák
881ef14b32 winsys/amdgpu: use a separate fence list for syncobjs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-02-11 12:35:48 -05:00
Marek Olšák
9f00123d51 winsys/amdgpu: unify fence list code
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-02-11 12:35:48 -05:00
Marek Olšák
e0a6399eb4 winsys/amdgpu: rename rfence, rsrc, rdst -> afence, asrc, adst
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-22 12:26:45 -05:00
Marek Olšák
d2b2364313 radeonsi: stop command submission with PIPE_CONTEXT_LOSE_CONTEXT_ON_RESET only
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-11-09 14:55:04 -05:00
Marek Olšák
6b1e0e51e6 radeonsi: rework RADEON_PRIO flags to be <= 31
This decreases sizeof(struct amdgpu_cs_buffer) from 24 to 16 bytes.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-07-16 13:32:33 -04:00
Marek Olšák
caf41fb96d winsys/amdgpu: make amdgpu_cs_context::flags & handles local
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-07-16 13:32:33 -04:00
Marek Olšák
6703fec58c amd,radeonsi: rename radeon_winsys_cs -> radeon_cmdbuf
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-19 13:08:50 -04:00
Andres Rodriguez
cc9762d74d winsys/amdgpu: add support for syncobj signaling v3
Add the ability to signal a syncobj when a cs completes execution.

v2: corresponding changes for gallium fence->semaphore rename
v3: s/semaphore/fence for pipe objects

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-30 15:13:49 -05:00
Nicolai Hähnle
e6dbc804a8 winsys/amdgpu: handle cs_add_fence_dependency for deferred/unsubmitted fences
The idea is to fix the following interleaving of operations
that can arise from deferred fences:

 Thread 1 / Context 1          Thread 2 / Context 2
 --------------------          --------------------
 f = deferred flush
 <------- application-side synchronization ------->
                               fence_server_sync(f)
                               ...
                               flush()
 flush()

We will now stall in fence_server_sync until the flush of context 1
has completed.

This scenario was unlikely to occur previously, because applications
seem to be doing

 Thread 1 / Context 1          Thread 2 / Context 2
 --------------------          --------------------
 f = glFenceSync()
 glFlush()
 <------- application-side synchronization ------->
                               glWaitSync(f)

... and indeed they probably *have* to use this ordering to avoid
deadlocks in the GLX model, where all GL operations conceptually
go through a single connection to the X server. However, it's less
clear whether applications have to do this with other WSI (i.e. EGL).
Besides, even this sequence of GL commands can be translated into
the Gallium-level sequence outlined above when Gallium threading
and asynchronous flushes are used. So it makes sense to be more
robust.

As a side effect, we no longer busy-wait on submission_in_progress.

We won't enable asynchronous flushes on radeon, but add a
cs_add_fence_dependency stub anyway to document the potential
issue.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 14:00:22 +01:00
Marek Olšák
529cdce799 radeonsi: remove 'Authors:' comments
It's inaccurate. Instead, see the copyright and use "git log" and
"git blame" to know the authorship.

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-02 18:19:03 +01:00
Marek Olšák
1f2640bfa9 Revert "winsys/amdgpu: Add R600_DEBUG flag to reserve VMID per ctx."
This reverts commit f03b7c9ad9.

The libdrm interface is wrong.
2017-11-01 21:42:31 +01:00
Andrey Grodzovsky
f03b7c9ad9 winsys/amdgpu: Add R600_DEBUG flag to reserve VMID per ctx.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-10-31 16:55:24 +01:00
Marek Olšák
49f5ce39c1 winsys/amdgpu: don't do read-modify-write on command buffers
i.e. don't use |=

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-17 22:03:03 +02:00
Marek Olšák
162502370c winsys/amdgpu: implement sync_file import/export
syncobj is used internally for interactions with command submission.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-12 21:07:41 +02:00
Marek Olšák
a2a326e8f8 winsys/amdgpu: use the new raw CS API
This also cleans things up.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-11 16:29:52 +02:00
Marek Olšák
113278ee79 radeonsi: remove Constant Engine support
We have come to the conclusion that it doesn't improve performance.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-22 13:29:47 +02:00
Marek Olšák
58af1f6bb0 winsys/amdgpu: fix a deadlock when waiting for submission_in_progress
First this happens:

1) amdgpu_cs_flush (lock bo_fence_lock)
   -> amdgpu_add_fence_dependency
   -> os_wait_until_zero (wait for submission_in_progress) - WAITING

2) amdgpu_bo_create
   -> pb_cache_reclaim_buffer (lock pb_cache::mutex)
   -> pb_cache_is_buffer_compat
   -> amdgpu_bo_wait (lock bo_fence_lock) - WAITING

So both bo_fence_lock and pb_cache::mutex are held. amdgpu_bo_create can't
continue. amdgpu_cs_flush is waiting for the CS ioctl to finish the job,
but the CS ioctl is trying to release a buffer:

3) amdgpu_cs_submit_ib (CS thread - job entrypoint)
   -> amdgpu_cs_context_cleanup
   -> pb_reference
   -> pb_destroy
   -> amdgpu_bo_destroy_or_cache
   -> pb_cache_add_buffer (lock pb_cache::mutex) - DEADLOCK

The simple solution is not to wait for submission_in_progress, which we
need in order to create the list of dependencies for the CS ioctl. Instead
of building the list of dependencies as a direct input to the CS ioctl,
build the list of dependencies as a list of fences, and make the final list
of dependencies in the CS thread itself.

Therefore, amdgpu_cs_flush doesn't have to wait and can continue.
Then, amdgpu_bo_create can continue and return. And then amdgpu_cs_submit_ib
can continue.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101294

Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-20 12:53:46 +02:00
Nicolai Hähnle
79dae12b41 winsys/amdgpu: add sparse buffers to CS
... and implement the corresponding fence handling.

v2:
- add missing bit in amdgpu_bo_is_referenced_by_cs_with_usage
- remove pipe_mutex_*

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:37:18 +02:00
Nicolai Hähnle
f3e514361c winsys/amdgpu: extend amdgpu_add_fence to allow adding multiple fences
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:37:18 +02:00
Nicolai Hähnle
ae4f442304 winsys/amdgpu: build handles and flags list late on submit thread
This probably has only minor performance effects, but it simplifies some
subsequent code slightly.

Ideally, it could also be used to simplify the handling of slab buffers
in the same way, but unfortunately that's not possible as long as we need
indices for relocations.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:37:17 +02:00
Marek Olšák
2fc5fe0e85 winsys/amdgpu: add a fast exit path into amdgpu_cs_add_buffer
The time spent in the function dropped by 37% for torcs.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:57:09 +01:00
Marek Olšák
1840800860 winsys/amdgpu: report a rejected IB as a lost context
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-23 23:43:38 +01:00
Nicolai Hähnle
a3832590c6 winsys/amdgpu: add fence and buffer list logic for slab allocated buffers
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-27 16:45:20 +02:00
Nicolai Hähnle
12657a7abf winsys/amdgpu: remove unused field domains from amdgpu_cs_buffer
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-12 13:55:07 +02:00
Marek Olšák
63b99590db winsys/amdgpu: implement cs_get_next_fence
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 14:29:30 +02:00
Marek Olšák
9646ae7799 gallium/radeon/winsyses: expose per-IB used_vram and used_gart to drivers
The following patches will use this.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-06 13:56:14 +02:00
Marek Olšák
85388652f9 winsys/amdgpu: return an error on IB submission failures
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-14 22:00:54 +02:00
Marek Olšák
1c5a10497a gallium/radeon/winsyses: boolean -> bool, TRUE -> true, FALSE -> false
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-25 23:13:42 +02:00
Marek Olšák
404d0d50d8 gallium/u_queue: add an option to have multiple worker threads
independent jobs don't have to be stuck on only one thread

v2: use CALLOC & FREE

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-24 12:24:40 +02:00
Marek Olšák
562cb03d76 gallium/util: import the multithreaded job queue from amdgpu winsys (v2)
v2: rename the event to util_queue_fence

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-15 21:07:34 +02:00
Nicolai Hähnle
6aff6377b1 winsys/amdgpu: implement IB chaining on the gfx ring
As a consequence, CE IB size never triggers a flush anymore.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:20 +02:00
Nicolai Hähnle
83a01cb498 winsys/amdgpu: start with smaller IBs, growing as necessary
This avoids allocating giant IBs from the outset, especially for CE and DMA.

Since we now limit max_dw only by the size that the buffer happens to be
(which, due to the buffer cache, can be even larger than the rounded-up size
we request), the new function amdgpu_ib_max_submit_dwords controls when we
submit an IB.

With this change, we effectively never flush prematurely due to the CE IB,
after an initial warm-up phase.

v2:
- clean up buffer_size calculation

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:19 +02:00
Nicolai Hähnle
f80c6abb9e winsys/amdgpu: add amdgpu_ib and amdgpu_cs_from_ib helper functions
The latter function allows getting the containing amdgpu_cs from any IB
(including non-main ones).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:19 +02:00
Nicolai Hähnle
92d5d97b10 winsys/amdgpu: simplify interface of amdgpu_get_new_ib
We'll want to have an amdgpu_cs pointer for future changes.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:52:18 +02:00
Marek Olšák
53f33619a4 winsys/amdgpu: add back multithreaded command submission
Ported from the initial amdgpu winsys from the private AMD branch.

The thread creates the buffer list, submits IBs, and cleans up
the submission context, which can also destroy buffers.

3-5% reduction in CPU overhead is expected for apps submitting a lot
of IBs per frame. This is most visible with DMA IBs.

v2: use a semaphore instead of a busy loop in amdgpu_ws_queue_cs
    add another amdgpu_cs_sync_flush call into amdgpu_bo_map

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-26 16:43:45 +02:00
Marek Olšák
7997b5f005 winsys/amdgpu: Add support for const IB.
v2: Use the correct IB to update request (Bas Nieuwenhuizen)
v3: Add preamble IB. (Bas Nieuwenhuizen)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-19 18:10:30 +02:00