It's inaccurate. Instead, see the copyright notices and use "git log" and
"git blame" to determine authorship.
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We have come to the conclusion that it doesn't improve performance.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
First this happens:
1) amdgpu_cs_flush (lock bo_fence_lock)
   -> amdgpu_add_fence_dependency
      -> os_wait_until_zero (wait for submission_in_progress) - WAITING

2) amdgpu_bo_create
   -> pb_cache_reclaim_buffer (lock pb_cache::mutex)
      -> pb_cache_is_buffer_compat
         -> amdgpu_bo_wait (lock bo_fence_lock) - WAITING
So both bo_fence_lock and pb_cache::mutex are held. amdgpu_bo_create can't
continue. amdgpu_cs_flush is waiting for the CS ioctl to finish the job,
but the CS ioctl is trying to release a buffer:
3) amdgpu_cs_submit_ib (CS thread - job entrypoint)
   -> amdgpu_cs_context_cleanup
      -> pb_reference
         -> pb_destroy
            -> amdgpu_bo_destroy_or_cache
               -> pb_cache_add_buffer (lock pb_cache::mutex) - DEADLOCK
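Reduced to its essence, the cycle looks like this (a minimal sketch, not
the real code; mtx_lock stands in for the winsys' locking helpers):

   /* Thread 1 (amdgpu_cs_flush): holds bo_fence_lock and spins until the
    * CS thread clears submission_in_progress. */
   mtx_lock(&ws->bo_fence_lock);
   while (p_atomic_read(&cs->submission_in_progress))
      ; /* os_wait_until_zero */

   /* Thread 2 (amdgpu_bo_create): holds pb_cache::mutex and blocks on
    * bo_fence_lock, which thread 1 holds. */
   mtx_lock(&cache->mutex);
   mtx_lock(&ws->bo_fence_lock);

   /* CS thread (amdgpu_cs_submit_ib): blocks on pb_cache::mutex, which
    * thread 2 holds, so submission_in_progress is never cleared and the
    * cycle closes. */
   mtx_lock(&cache->mutex);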
The simple solution is to stop waiting for submission_in_progress; the wait
was only there so that the list of dependencies for the CS ioctl could be
built up front. Instead of building that list as a direct input to the CS
ioctl, build it as a list of fences, and assemble the final list of
dependencies in the CS thread itself.
Therefore, amdgpu_cs_flush no longer has to wait and can continue; then
amdgpu_bo_create can finish and return, and finally amdgpu_cs_submit_ib can
proceed.
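Sketched, with illustrative names (fence_dependencies and the helpers below
are assumptions about the shape of the code, not the exact patch;
amdgpu_cs_chunk_fence_to_dep is libdrm's helper):

   /* amdgpu_cs_flush side: no waiting anymore; only take a reference. */
   static void add_fence_dependency(struct amdgpu_cs_context *cs,
                                    struct pipe_fence_handle *fence)
   {
      unsigned idx = cs->num_fence_dependencies++;
      amdgpu_fence_reference(&cs->fence_dependencies[idx], fence);
   }

   /* CS thread side: no locks are held here and buffer destruction does
    * not depend on this thread's progress, so finalizing the dependency
    * list for the CS ioctl is safe. */
   static void fill_dependencies(struct amdgpu_cs_context *cs,
                                 struct drm_amdgpu_cs_chunk_dep *deps)
   {
      for (unsigned i = 0; i < cs->num_fence_dependencies; i++) {
         struct amdgpu_fence *fence =
            (struct amdgpu_fence *)cs->fence_dependencies[i];
         amdgpu_cs_chunk_fence_to_dep(&fence->fence, &deps[i]);
      }
   }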
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101294
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
... and implement the corresponding fence handling.
v2:
- add missing bit in amdgpu_bo_is_referenced_by_cs_with_usage
- remove pipe_mutex_*
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This probably has only minor performance effects, but it simplifies some
subsequent code slightly.
Ideally, it could also be used to simplify the handling of slab buffers
in the same way, but unfortunately that's not possible as long as we need
indices for relocations.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This avoids allocating giant IBs from the outset, especially for CE and DMA.
Since max_dw is now limited only by the size the buffer actually has
(which, due to the buffer cache, can be even larger than the rounded-up
size we requested), the new function amdgpu_ib_max_submit_dwords controls
when we submit an IB.
With this change, we effectively never flush prematurely due to the CE IB,
after an initial warm-up phase.
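The gist, as a simplified sketch (the concrete limits below are
illustrative, not the committed values, and flush_cs is a hypothetical
wrapper):

   /* Gates submission now that max_dw merely reflects how big the
    * underlying buffer happens to be. */
   static unsigned amdgpu_ib_max_submit_dwords(enum ib_type ib_type)
   {
      switch (ib_type) {
      case IB_MAIN:
         /* Submit early so the GPU starts working sooner. */
         return 20 * 1024;
      case IB_CONST:
      case IB_CONST_PREAMBLE:
         /* Let the CE IB grow as far as its buffer allows. */
         return ~0u;
      default:
         return 0;
      }
   }

   /* Space-check path (illustrative): submit once the limit is hit,
    * rather than when the initially allocated size would be full. */
   if (ib->base.cdw + needed_dw > amdgpu_ib_max_submit_dwords(ib->ib_type))
      flush_cs(cs); /* hypothetical wrapper around amdgpu_cs_flush */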
v2:
- clean up buffer_size calculation
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Ported from the initial amdgpu winsys in the private AMD branch.
The thread creates the buffer list, submits IBs, and cleans up
the submission context, which can also destroy buffers.
A 3-5% reduction in CPU overhead is expected for apps that submit many
IBs per frame. This is most visible with DMA IBs.
v2:
- use a semaphore instead of a busy loop in amdgpu_ws_queue_cs
- add another amdgpu_cs_sync_flush call into amdgpu_bo_map
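The v2 hand-off, sketched (names are illustrative; the real code uses the
winsys' own queue and mesa's semaphore wrapper):

   /* Producer: block on a semaphore instead of busy-looping until the
    * CS thread has consumed the previous job. */
   static void amdgpu_ws_queue_cs(struct amdgpu_winsys *ws,
                                  struct amdgpu_cs *cs)
   {
      pipe_semaphore_wait(&ws->cs_queue_has_space); /* free slot available */
      mtx_lock(&ws->cs_queue_lock);
      ws->cs_queue[ws->cs_queue_tail++ % CS_QUEUE_LEN] = cs;
      mtx_unlock(&ws->cs_queue_lock);
      pipe_semaphore_signal(&ws->cs_queued);        /* wake the CS thread */
   }

The CS thread does the mirror image: wait on cs_queued, pop a job, signal
cs_queue_has_space, then build the buffer list, submit the IB, and clean up
the context. The extra amdgpu_cs_sync_flush call in amdgpu_bo_map is
presumably needed so that a map that waits for idle first makes sure the
queued submission has actually reached the kernel.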
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Enlarge the buffer hashlist to prevent large numbers of misses
due to adding more buffers than can be cached in the hashlist.
The game I tested had CSes with up to 1500 buffers, and the overhead of
amdgpu_lookup_buffer for various hashlist sizes was (percentage of CPU
usage in the render thread, as measured by perf):

   hashlist size   overhead
   4096            1.97%  (new value)
   2048            4.37%
   1024            6.92%
    512            9.47%  (old value)
The self time of amdgpu_add_buffer is ~4.2% in all cases, and for 4096
entries the time needed to clear the hashlist is still < 0.10%, so I am
not expecting significant regressions.
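For reference, the lookup this table measures, roughly (simplified from
amdgpu_lookup_buffer; the field names are approximate):

   #define HASHLIST_SIZE 4096   /* was 512 */

   static int amdgpu_lookup_buffer(struct amdgpu_cs_context *cs,
                                   struct amdgpu_winsys_bo *bo)
   {
      unsigned hash = bo->unique_id & (HASHLIST_SIZE - 1);
      int i = cs->buffer_indices_hashlist[hash];

      /* Fast path: the cached index still refers to this buffer. */
      if (i >= 0 && cs->buffers[i].bo == bo)
         return i;

      /* Slow path: linear search, then re-prime the hashlist slot. With
       * 512 slots and ~1500 buffers per CS this path was hit constantly;
       * 4096 slots make such misses rare. */
      for (i = 0; i < (int)cs->num_buffers; i++) {
         if (cs->buffers[i].bo == bo) {
            cs->buffer_indices_hashlist[hash] = i;
            return i;
         }
      }
      return -1; /* not in the buffer list */
   }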
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
If the 32-bit types overflowed, the driver could submit an IB that uses much
more memory than is available.
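A self-contained illustration of the failure mode (the sizes are
hypothetical):

   #include <stdint.h>
   #include <stdio.h>

   int main(void)
   {
      /* Two 3 GiB buffers referenced by one IB. */
      uint32_t size = 3u * 1024 * 1024 * 1024;

      uint32_t sum32 = size + size;           /* wraps around to 2 GiB */
      uint64_t sum64 = (uint64_t)size + size; /* correct: 6 GiB */

      /* The 32-bit total claims the IB fits comfortably in memory. */
      printf("32-bit: %u MiB, 64-bit: %llu MiB\n",
             sum32 / (1024 * 1024),
             (unsigned long long)(sum64 / (1024 * 1024)));
      return 0;
   }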
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
v2: - lots of changes based on Emil Velikov's comments
- implemented radeon_winsys::read_registers
v3: - a lot of new work, much of it adapting to libdrm interface changes
Squashed patches:
winsys/amdgpu: implement radeon_winsys context support
winsys/amdgpu: add reference counting for contexts
winsys/amdgpu: add userptr support
winsys/amdgpu: allocate IBs like normal buffers
winsys/amdgpu: add IBs to the buffer list, adapt to interface changes
winsys/amdgpu: don't use KMS handles as reloc hash keys
winsys/amdgpu: sync buffer accesses to different rings
winsys/amdgpu: use dependencies instead of waiting for last fence v2
gallium/radeon: unify buffer_wait and buffer_is_busy in the winsys interface (amdgpu part)
winsys/amdgpu: track fences per ring and be thread-safe
winsys/amdgpu: simplify waiting on a variable in amdgpu_fence_wait
gallium/radeon: allow the winsys to choose the IB size (amdgpu part)
winsys/amdgpu: switch to new amdgpu_cs_query_fence_status interface
winsys/amdgpu: handle fence and dependencies merge
winsys/amdgpu: follow libdrm change to move user fence into UMD
winsys/amdgpu: use amdgpu_bo_va_op for va map/unmap v2
winsys/amdgpu: use the new tiling flags
winsys/amdgpu: switch to new GTT_USWC definition
winsys/amdgpu: expose amdgpu_cs_query_reset_state to drivers
winsys/amdgpu: fix valgrind warnings
winsys/amdgpu: don't use VRAM with APUs that don't have much of it
winsys/amdgpu: require LLVM 3.6.1 for VI because of bug fixes there
winsys/amdgpu: remove amdgpu_winsys::num_cpus
winsys/amdgpu: align BO size to page size
winsys/amdgpu: reduce BO cache timeout
winsys/amdgpu: remove useless flushing and waiting in amdgpu_bo_set_tiling
winsys/amdgpu: use amdgpu_device_handle as a unique device ID instead of fd
winsys/amdgpu: use safer access to amdgpu_fence_wait::signalled
winsys/amdgpu: allow maximum IB size of 4 MB
winsys/amdgpu: add ip_instance into amdgpu_fence
gallium/radeon: add RING_COMPUTE instead of RADEON_FLUSH_COMPUTE
winsys/amdgpu: set the ring type at CS initialization
winsys/amdgpu: query the GART page size from the kernel
winsys/amdgpu: correctly wait for shared buffers to become idle
winsys/amdgpu: set the amdgpu_cs_fence structure only once at fence creation
winsys/amdgpu: add a specific error message for cs_submit -> -ENOMEM
winsys/amdgpu: check num_active_ioctls before calling amdgpu_bo_wait_for_idle
winsys/amdgpu: clear user fence BO after allocating it
winsys/amdgpu: fix user fences
winsys/amdgpu: make amdgpu_winsys_create public
winsys/amdgpu: remove thread offloading
winsys/amdgpu: flatten the amdgpu_cs_context structure and simplify more
v4: require libdrm 2.4.63