This patch renames functions, structures, enums etc. with "gen_"
prefix defined in common code.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9413>
v2: Fixup crash spotted by Mark about missing alloc vfuncs
v3: Fixup double iteration over device->memory_objects (that ought to
be expensive...) (Ken)
v4: Add more asserts for non-softpin cases (Ken)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2371>
We would like to chain multiple primary command buffer to be submitted
together to i915. For prepare this, add end the command buffers with a
MI_BATCH_BUFFER_START and at submit time, replace it with
MI_BATHC_BUFFER_END if needed.
v2: Don't even consider non softpin platforms
v3: Fix inverted condition
v4: Limit is_chainable() to checking device->use_softpin (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2371>
When a secondary command buffer is encountered, insert an event that
links to the new batch.
This commit leaves intel_measure timestamp buffer objects mmapped,
which is more efficient than mapping/unmapping several times. With
the BOs mapped at all times, timestamp buffers can be managed directly
by intel_measure, where it will iterate over timestamps of linked
secondary buffers.
With timestamp buffers managed by intel_measure, a more efficient and
accurate check for render completion can be moved into intel_measure
from anv/iris.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7354>
This may vary based on the newer kernel engines based contexts.
v2 (Jason Ekstrand):
- Initialize anv_queue::exec_flags in anv_queue_init
- Don't conflate this with refactors to get_reset_stats
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8667>
It's included in declaration of INTEL_DEBUG.
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6732>
this catches some undefined behavior like e.g., using a stale descriptorset
that references deleted bos, which I would absolutely never do
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6747>
Fixes crashes in:
- Rise of the Tomb Rider (on benchmark start)
- Total War: Three Kingdoms (on game start)
- Total War: Warhammer II (on game start)
Fixes: 34a0ce58c7 ("anv: add a new execution mode for secondary command buffers")
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6546>
This implements timeline semaphores using a new type of dma-fence
stored into drm-syncobjs. We use a thread to implement delayed
submissions.
v2: Drop cloning of temporary semaphores and just transfer their ownership (Jason)
Drain queue when dealing with binary semaphore
Ensure we don't submit to the thread as long as we don't need to
v3: Use __u64 not uintptr_t for kernel pointers
Fix commented code for INTEL_DEBUG=bat
Set DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES in timeline fence execbuf extension
Add new anv_queue_set_lost()
Drop multi queue stuff meant for the fake multi queue patch
Rework temporary syncobj handling
Don't use syncobj when not available (DeviceWaitIdle/CreateDevice)
Use ANV_MULTIALLOC
And a few more tweaks...
v4: Drop drained condition helper (Lionel)
Fix missing EXEC_OBJECT_WRITE on BOs we want to wait on (Jason)
v5: Add missing device->lost_reported in _anv_device_report_lost (Lionel)
Fix missing free on submit->simple_bo (Lionel)
Don't drop setting the device in lost state on QueueSubmit error (Jason)
Store submit->fence_bos as an array of uintptr_t (Jason)
v6: condition device->has_thread_submit to i915 & core DRM support (Jason)
v7: Fix submit->in_fence leakage on error (Jason)
Keep dummy semaphore with no thread submission (Jason)
v8: Move ownership of submit->out_fence to submit (Jason)
v9: Don't forget to read the VkFence's syncobj binary payload (Lionel)
v10: Take the mutex lock on anv_gem_close() (Jason/Lionel)
v11: Fix void* -> u64 cast on 32bit (Lionel)
v12: Rebase after BO backed timeline semaphore (Lionel)
v13: Fix missing snippets lost after rebase (Lionel)
v14: Drop update_binary usage (Lionel)
v15: Use ANV_MULTIALLOC (Lionel)
v16: Fix some realloc issues (Ivan)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v8)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2901>
This is a left over from the earlier version of
VK_KHR_performance_query where we used kernel relocs to implement
multi passe queries.
We're using self modifying batches now so we shouldn't need any
relocation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 2001a80d4a ("anv: Implement VK_KHR_performance_query")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6291>
Those are currently hurting Felix' ability to look at the batches.
We can probably detect this in the aubinator but that's a bit more
work than falling back to the previous behavior.
v2: Condition VK_KHR_performance_query to not using this variable (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5391>
A buffer added to all execbufs so that we can attribute a batch that
caused a hang to a particular driver.
v2: Reuse workaround BO
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3203>
We initially used this debug option to mean "don't bother registering
the OA configuration into the kernel".
This change makes this option suppress any interaction with the
i915/perf interface. This is useful when debugging self modifying
batches with performance queries while running on the intel_mi_runner.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2775>
This has the same kernel requirements are VK_INTEL_performance_query
v2: Fix empty queue submit (Lionel)
v3: Fix autotool build issue (Piotr Byszewski)
v4: Fix Reset & Begin/End in same command buffer, using soft-pin &
relocation on the same buffer won't work currently. This version
uses a somewhat dirty trick in anv_execbuf_add_bo (Piotr Byszewski)
v5: Fix enumeration with null pointers for either pCounters or
pCounterDescriptions (Piotr)
Fix return condition on enumeration (Lionel)
Set counter uuid using sha1 hashes (Lionel)
v6: Fix counters scope, should be COMMAND_KHR not COMMAND_BUFFER_KHR (Lionel)
v7: Rebase (Lionel)
v8: Rework checking for loaded queries (Lionel)
v9: Use new i915-perf interface
v10: Use anv_multialloc (Jason)
v11: Implement perf query passes using self modifying batches (Lionel)
Limit support to softpin/gen8
v12: Remove spurious changes (Jason)
v13: Drop relocs (Jason)
v14: Avoid overwritting .sType in
VkPerformanceCounterKHR/VkPerformanceCounterDescriptionKHR (Lionel)
v15: Don't copy the entire
VkPerformanceCounterKHR/VkPerformanceCounterDescriptionKHR (Jason)
Reuse anv_batch rather than custom packing (Jason)
v16: Fix missing MI_BB_END in reconfiguration batch
Only report the extension with kernel support (perf_version >= 3)
v17: Some cleanup of unused stuff
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2775>
This change adds a call/return execution mode for secondary command
buffer rather than the existing copy into the primary batch mode.
v2: Rework convention to avoid burning an ALU register (Jason)
v3: Use anv_address_add() (Jason)
v4: Move command emissions to anv_batch_chain.c (Jason)
v5: Also move last MI_BBS emission in secondary command buffer to
anv_batch_chain.c (Jason)
v6: Fix end secondary command buffer end (Jason)
v7: Refactor anv_batch_address() to remove additional emit functions
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2775>
This allows a pool's allocations to start somewhere other than the base
address. Our first real use of this will be to use a negative offset
for the binding table pool to make it so that the offset is baked into
the pool and the code in anv_batch_chain.c doesn't have to understand
pool offsetting.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4897>
We intentionally throw away all but one BT block but then we set
cmd_buffer->bt_block to ANV_STATE_NULL instead of the one we hung on to.
This causes the command buffer to immediately re-emit STATE_BASE_ADDRESS
the first time a BT is needed for no good reason.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Having to always pull the physical device from the instance has been
annoying for almost as long as the driver has existed. It also won't
work in a world where we ever have more than one physical device. This
commit adds a new field called "physical" to anv_device and switches
every location where we use device->instance->physicalDevice to use the
new field instead.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3461>
Otherwise, we're trusting in the execbuf_add_bo which sets
EXEC_OBJECT_WRITE to to always be the first one that gets called. This
is likely true for fences but it seems somewhat fragile.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
See also 246261f0ad
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
CID: 1455892
Fixes: 246261f0ad ("anv: prepare the driver for delayed submissions")
This is a bit more natural because we're already getting an anv_state
most places in the pipeline. The important part here, however, is that
we're no longer calling anv_block_pool_map on every alloc_binding_table
call. While it's probably pretty cheap, it is potentially a linear walk
over the list of BOs and it was showing up in profiles.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Timeline semaphore introduce support for wait before signal behavior,
which means that it is now allowed to call vkQueueSubmit() with wait
semaphores not yet submitted for execution. Our kernel driver requires
all of the wait primitives to be created before calling the execbuf
ioctl. As a result, we must delay submissions in the userspace driver.
This change store the necessary information to be able to delay a
VkSubmitInfo submission to the kernel driver.
v2: Fold count++ into array access (Jason)
Move queue list to another patch (Jason)
v3: Document cleanup of temporary semaphores (Jason)
v4: Track semaphores of SYNC_FD type that needs updating after delayed
submission
v5: Don't forget to update sync_fd in signaled semaphores after
submission (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When we will submit to i915 from a submission thread, we won't be able
to directly report the error to the user (in particular through the
debug report callbacks). So prepare 2 paths to report errors device ->
notifying the user immediately, queue -> notifying the user the next
time an entry point is called.
In this change we still report directly for both paths, this will
change in the next commit.
v2: Split NULL batch parameter handling in
anv_queue_submit_simple_batch() in a different commit
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In the future we'll have 2 different allocations depending on whether
we're using threaded submission or not.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This doesn't seem to fix anything because those destroy() calls happen
right before the command buffer object & its list of batch_bo is also
destroyed. Still looks a bit cleaner.
v2: Found a second occurence
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
Fixes: 26ba0ad54d ("vk: Re-name command buffer implementation files")
Cc: <mesa-stable@lists.freedesktop.org>
We always close the in_fence at the end the anv_cmd_buffer_execbuf()
so when we take it from the semaphore, let's not forget to invalidate
it.
Note that the code leaks the fence_in if we get any error before
reaching the close(). Let's fix that in another patch or better,
rewrite the whole thing!
v2: drop redundant fd = -1 (Jason)
v3: Update commit message (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Now that we can conveniently map between GEM handles and struct anv_bo
pointers, we can use a simple bitset for residency tracking instead of
the complex hash set. This shaves about 3% off of a CPU-limited example
running with the Dawn WebGPU implementation.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We're about to start needing to lookup BO pointers by GEM handle so we
need access to the device.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instead of depending on a mutable BO in the state pool for handling
growing state pools, add a concept of "wrapper" BOs which just wrap an
actual BO. This way, the wrapper can exist once for all of time and we
can put it in relocation lists even if the actual BO it references gets
swapped out.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We're not THAT strapped for space that we can't burn one extra bit for
a boolean. If we're really worried about it, we can always shrink the
flags field to 16 bits because the kernel only uses 7 currently.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We have to go through and rewrite them all anyway so it doesn't do us
any good to put them in the list in anv_reloc_list_add. Also, for state
pools the handles are likely wrong by the time vkQueueSubmit is called.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Previously, we would read the offset from the BO in anv_reloc_list_add
to generate the presumed offset and then again in the caller to compute
the 64-bit address to write into the buffer. However, if the offset
somehow changed between these two points, the presumed offset would no
longer match the written offset. This is unlikely to actually ever be a
problem in practice because the presumed offset gets recorded first and
so if the written address is wrong then the presumed offset is almost
certainly wrong and the relocation will trigger. However, it's much
safer to simply have anv_reloc_list_add return the 64-bit address.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The original value of 256 was under the assumption that you're a batch
buffer which is likely going to have a large number of relocations.
However, pipeline objects on Gen7 will have at most 6 relocations (one
per shader stage and one for the workaround BO) so this is a lot of
per-pipeline wasted space.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>