This allows a pool's allocations to start somewhere other than the base
address. Our first real use of this will be a negative offset for the
binding table pool, so that the offset is baked into the pool and the
code in anv_batch_chain.c doesn't have to understand pool offsetting.
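As a rough sketch of the idea (illustrative names, not the actual anv
code):

    #include <stdint.h>

    /* Illustrative stand-in for the pool. */
    struct block_pool {
       int64_t start_offset; /* may be negative, e.g. binding tables */
       int64_t next;         /* next free offset */
    };

    static void block_pool_init(struct block_pool *pool, int64_t start_offset)
    {
       pool->start_offset = start_offset;
       pool->next = start_offset;
    }

    static int64_t block_pool_alloc(struct block_pool *pool, int64_t size)
    {
       int64_t offset = pool->next; /* the offset is already baked in */
       pool->next += size;
       return offset;
    }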
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4897>
Also update all of its callers.
In the next commit, the device will be used by anv_gem_munmap to choose
whether we need to call the valgrind code or not, depending on which
type of mmap we are using.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1675>
When we originally wrote a bunch of the allocation data structures, we
re-used the GPU memory for CPU-side data structures. It's a bit more
memory-efficient and usually OK. However, this has a couple of
problems:
1. It makes it MUCH more likely that the GPU will accidentally stomp
CPU-side data structures and cause nearly-impossible-to-debug
crashes.
2. With discrete GPUs, the memory will be mapped somehow and that map
may be across the BAR so it could have horribly slow CPU access.
This is bad for our CPU-side data structures.
In the case of anv_state_stream, it also made the data structure
massively more complex than it needed to be.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4336>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4336>
If we have an allocation that's exactly the block size, we end up
computing a new block size to allocate that's exactly the block size,
adding in the header, and then assert-failing. When computing the block
size, we need to account for the header.
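Roughly, the fixed computation accounts for the header up front (a
sketch with an illustrative header size):

    #include <assert.h>
    #include <stdint.h>

    #define HEADER_SIZE 32 /* illustrative, not the real anv value */

    static uint32_t pick_block_size(uint32_t alloc_size, uint32_t block_size)
    {
       /* Include the header before choosing the block size, so an
        * allocation of exactly block_size bytes still fits once the
        * header is added. */
       uint32_t required = alloc_size + HEADER_SIZE;
       uint32_t size = block_size;
       while (size < required)
          size *= 2;
       assert(alloc_size + HEADER_SIZE <= size);
       return size;
    }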
Fixes: 955127db93 "anv/allocator: Add support for large stream..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4336>
My understanding is that there's no reason for the scratch space
allocation to be different between iris, i965 and anv. Let's make all
the functions behave the same.
I don't know if this fixes any specific gen9 bugs, but it might, since
it increases the scratch space.
v2: Rebase.
v3: Rebase.
v4: Remove redundant gen 11 check (Jason).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4006>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4006>
This is the same idea as "intel: fix the gen 11 compute shader scratch
IDs".
The number of EUs on TGL is not the same as on ICL, but the
MEDIA_VFE_STATE restrictions stay the same, so adapt the code
accordingly.
Also, consider the base configuration instead of what we read from the
kernel.
According to Mark, this fixes the following piglit tests on TGL:
piglit.spec.arb_compute_shader.execution.shared-atomicmax-uint.tglm64
piglit.spec.arb_compute_shader.execution.shared-atomicmax-int.tglm64
piglit.spec.intel_shader_atomic_float_minmax.execution.shared-atomicmax-float.tglm64
v2: s/ICL+/Gen11+/ (Jason).
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4006>
Scratch space allocation is based on the number of threads in the base
configuration, and we only have one base configuration for ICL, with 8
subslices.
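Schematically (a sketch; the numbers are ICL's base configuration, and
the 8-IDs-per-EU rule comes from the MEDIA_VFE_STATE quote in a later
commit message below):

    #include <stdio.h>

    int main(void)
    {
       /* A fused-off part may report fewer subslices, but scratch must
        * still be sized for the base configuration. */
       unsigned base_subslices = 8;   /* not the kernel-reported topology */
       unsigned eus_per_subslice = 8;
       unsigned ids_per_eu = 8;
       printf("scratch IDs: %u\n",
              base_subslices * eus_per_subslice * ids_per_eu); /* 512 */
       return 0;
    }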
This fixes an issue with Aztec on Vulkan on a machine with a
configuration that's not the base one. The issue looks like a
regression from b9e93db208, but it seems things have been broken
forever, just not easily reproduced.
v2: Reimplement it using the subslices variable. Don't touch TGL.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4006>
Because softpin block pools are made up of a set of BOs with different
maps, it was possible for a single state to end up straddling blocks.
To fix this, we pass a contiguous size to anv_block_pool_grow and it
ensures that the next allocation in the pool will have at least that
size.
We also add an assert in anv_block_pool_map to ensure we always get
contiguous maps. Prior to the changes to anv_block_pool_grow, the unit
tests failed with this assert. With this patch, the tests pass.
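In outline (a sketch with hypothetical names):

    #include <stdint.h>

    /* Grow the pool so the next allocation of `contiguous_size` bytes
     * lands entirely within the single BO added here. */
    static uint64_t
    block_pool_grow(uint64_t pool_size, uint64_t contiguous_size)
    {
       uint64_t new_size = pool_size ? pool_size * 2 : contiguous_size;
       if (new_size - pool_size < contiguous_size)
          new_size = pool_size + contiguous_size;
       /* ...allocate one BO covering [pool_size, new_size)...
        * anv_block_pool_map then asserts that any requested state
        * range falls entirely inside one BO's map. */
       return new_size;
    }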
This was causing problems on Gen12 where we allocate the pages for the
AUX table from the dynamic state pool. The first chunk, which gets
allocated very early in the pool's history, is 1MB, which was large
enough that it spanned multiple BOs. This caused the gen_aux_map code to
write outside of the map and overwrite the instruction state pool
buffer, which led to GPU hangs.
Fixes: 731c4adcf9 "anv/allocator: Add support for non-userptr"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The previous way we were attempting to handle AUX tables on TGL-LP was
very GL-like. We used the same aux table management code that's shared
with iris and we updated the table on image create/destroy. The problem
with this is that Vulkan allows multiple VkImage objects to be bound to
the same memory location simultaneously and the app can ping-pong back
and forth between them in the same command buffer. Because the AUX
table contains format-specific data, we cannot support this ping-pong
behavior with only CPU updates of the AUX table.
The new mechanism switches things around a bit and instead makes the aux
data part of the BO. At BO creation time, a bit of space is appended to
the end of the BO for AUX data and the AUX table is updated in bulk for
the entire BO. The problem here, of course, is that we can't insert the
format-specific data into the AUX table at BO create time.
Fortunately, Vulkan has a requirement that every TILING_OPTIMAL image
must be initialized prior to use by transitioning the image from
VK_IMAGE_LAYOUT_UNDEFINED to something else. When doing the above
described ping-pong behavior, the app has to do such an initialization
transition every time it corrupts the underlying memory of the VkImage
by using it as something else. We can hook into this initialization and
use it to update the AUX-TT entries from the command streamer. This way
the AUX table gets its format information, apps get aliasing support,
and everyone is happy.
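Schematically, the allocation side then looks like this (a sketch; the
256:1 main-to-AUX ratio is for illustration of Gen12 CCS, and the names
are not the exact driver ones):

    #include <stdint.h>

    #define CCS_RATIO 256 /* bytes of main surface per byte of AUX data */

    static uint64_t bo_size_with_aux(uint64_t size)
    {
       /* Append space for AUX data at the end of the BO.  The AUX table
        * entries for the whole BO are written in bulk at creation time;
        * the format-specific bits are filled in later from the command
        * streamer during the UNDEFINED layout transition. */
       return size + size / CCS_RATIO;
    }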
One side-effect of this is that we disallow CCS on shared buffers.
We'll need to fix this for modifiers on the scanout path but that's a
task for another patch. We should be able to do it with dedicated
allocations.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3519>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3519>
All they do now is take a size, align, and flags and figure out which
heap to allocate in. All of the actual code to deal with the BO is in
anv_allocator.c. We want to leave anv_vma_alloc/free in anv_device.c
because it deals with API-exposed heaps so it still makes sense to have
it there.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3519>
Having to always pull the physical device from the instance has been
annoying for almost as long as the driver has existed. It also won't
work in a world where we ever have more than one physical device. This
commit adds a new field called "physical" to anv_device and switches
every location where we use device->instance->physicalDevice to use the
new field instead.
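The mechanical change, in effect (a sketch with stand-in structs):

    /* Illustrative stand-ins for the real anv structs. */
    struct anv_physical_device { int dummy; };
    struct anv_instance { struct anv_physical_device physicalDevice; };
    struct anv_device {
       struct anv_instance *instance;
       struct anv_physical_device *physical; /* the new field */
    };

    static struct anv_physical_device *
    get_pdevice(struct anv_device *device)
    {
       /* Before: return &device->instance->physicalDevice; */
       return device->physical;
    }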
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3461>
When we moved from allocating BOs directly to using the BO cache, we
lost the EXEC_OBJECT_CAPTURE flag on all our state buffers.
Fixes: 3119b96bdf "anv: Allocate block pool BOs from the cache"
Fixes: ee77938733 "anv: Allocate batch and fence buffers from..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
When a BO is flagged as having a client visible address, we put it in
its own heap. We also support the client explicitly specifying an
address in said heap. If an address collision happens, we return false
from anv_vma_alloc which turns into a VK_ERROR_OUT_OF_DEVICE_MEMORY.
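In rough pseudo-C (a sketch; the real code uses the driver's VMA heaps,
and this toy version tracks only a single reservation):

    #include <stdbool.h>
    #include <stdint.h>

    static uint64_t used_addr, used_size; /* one existing reservation */

    static bool vma_alloc_at(uint64_t addr, uint64_t size)
    {
       /* An address collision makes this fail; the caller turns the
        * failure into VK_ERROR_OUT_OF_DEVICE_MEMORY. */
       if (addr < used_addr + used_size && used_addr < addr + size)
          return false;
       used_addr = addr;
       used_size = size;
       return true;
    }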
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We already have a mechanism for specifying that we want a fixed address
provided by the driver internals. We're about to let the client start
specifying addresses in some very special scenarios as well so we want
to pass this through to the allocation function.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
In ee77938733, we started using the BO cache for anv_bo_pool and
stopped using the bo_flags parameter. However, we never dropped it from
the struct or the init function.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
BOs are now only ever allocated through the BO cache so there's no need
to have these exposed.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
While we're here, we get rid of the locking and use a lock-free
algorithm. The chances of contention when allocating scratch for
spilling are low, and this is actually a bit simpler in some ways.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This commit switches block pools over to being allocated from the BO
cache rather than being allocated manually by the block pool.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
All block pools are allocated with the same flags. There's no good
reason why it needs to be configurable.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This makes a number of changes to the current API (summarized in the
sketch after the list):
1. Everything is renamed to anv_device_* instead of anv_bo_cache_*
because the BO cache is soon going to be the sole BO allocation path
and not some special case to make import/export work.
2. Drop the cache parameter. It's totally redundant with the device
and just annoying to keep typing.
3. Rework flags so that they go the convenient direction for usage in
ANV rather than whichever awkward way the i915 specified it to
maintain backwards compatibility. This also gives us the
opportunity to set some defaults.
4. Add flags for mapping and coherency.
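Taken together, the resulting entry point looks roughly like this (a
sketch; the flag names are illustrative, not the exact anv enum):

    #include <stdint.h>

    enum bo_alloc_flags {
       BO_ALLOC_MAPPED   = (1 << 0), /* CPU-map at creation */
       BO_ALLOC_SNOOPED  = (1 << 1), /* coherent on non-LLC */
       BO_ALLOC_EXTERNAL = (1 << 2),
    };

    struct device;
    struct bo;

    /* anv_device_* rather than anv_bo_cache_*: no cache parameter, the
     * device finds its own cache, and the defaults go the convenient
     * direction for ANV. */
    int device_alloc_bo(struct device *device, uint64_t size,
                        enum bo_alloc_flags flags, struct bo **bo_out);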
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The growing algorithms for the softpin case and the userptr version are
almost entirely different. Having this weird join doesn't make the code
more comprehensible. This rework does a few things:
1. Move the comment about 48-bit addresses to anv_device_init where we
actually unset the EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag.
2. Separate the paths in anv_block_pool_expand_range so it's easier to
see what happens in the two different cases.
3. Use the anv_block_pool::bos array for storing all allocated BOs in
both paths rather than the cleanup list. This lets us use the
cleanups array only for mmaps of the memfd in the userptr case.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instead of depending on a mutable BO in the state pool for handling
growing state pools, add a concept of "wrapper" BOs which just wrap an
actual BO. This way, the wrapper can exist once for all of time and we
can put it in relocation lists even if the actual BO it references gets
swapped out.
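Conceptually (a sketch with hypothetical fields):

    #include <stdbool.h>

    struct bo {
       bool is_wrapper;
       struct bo *actual; /* for wrappers: the current real BO */
    };

    /* Relocation lists can hold the wrapper forever; unwrap at submit
     * time to find whichever real BO currently backs the pool. */
    static struct bo *bo_unwrap(struct bo *bo)
    {
       while (bo->is_wrapper)
          bo = bo->actual;
       return bo;
    }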
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We're not THAT strapped for space that we can't burn one extra bit for
a boolean. If we're really worried about it, we can always shrink the
flags field to 16 bits because the kernel only uses 7 currently.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It has exactly one caller and we're about to change some of the dynamics
which would make this confusing as a separate function.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This lets us do less allocation because the anv_bo's are now embedded in
the sparse array and it also allows lock-free translation from GEM
handle to BO which will be useful in future commits.
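The shape of the idea (a sketch; the real code uses util_sparse_array
rather than a flat table):

    #include <stdint.h>

    struct bo { uint32_t gem_handle; uint32_t refcount; };

    /* Entries are embedded and their addresses never move, which is
     * what makes lock-free lookup by GEM handle safe. */
    static struct bo bo_table[1 << 16];

    static struct bo *lookup_bo(uint32_t gem_handle)
    {
       return &bo_table[gem_handle]; /* no lock needed */
    }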
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
From the MEDIA_VFE_STATE docs:
"Starting with this configuration, the Maximum Number of Threads must
be set to (#EU * 8) for GPGPU dispatches.
Although there are only 7 threads per EU in the configuration, the
FFTID is calculated as if there are 8 threads per EU, which in turn
requires a larger amount of Scratch Space to be allocated by the
driver."
It's pretty clear that we need to increase this for scratch address
calculations, because the FFTID has a certain bit-pattern. The quote
above seems to indicate that we should increase the actual thread count
programmed in MEDIA_VFE_STATE as well, but we think the intention is to
only bump the scratch space.
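Numerically (a sketch; the EU count is illustrative):

    #include <stdio.h>

    int main(void)
    {
       unsigned eus = 64;            /* illustrative Gen11 EU count */
       unsigned threads_per_eu = 7;  /* what MEDIA_VFE_STATE programs */
       unsigned ids_per_eu = 8;      /* what the FFTID math assumes */
       printf("threads programmed: %u\n", eus * threads_per_eu); /* 448 */
       printf("scratch slots:      %u\n", eus * ids_per_eu);     /* 512 */
       return 0;
    }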
Fixes GPU hangs in Bioshock Infinite and Synmark's CSDof on Icelake 8x8.
Fixes: 5ac804bd9a ("intel: Add a preliminary device for Ice Lake")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Move the Weston os_create_anonymous_file code from egl/wayland into
util, add support for Linux memfd and FreeBSD SHM_ANON, and use that
code in anv/aubinator instead of explicit memfd calls, for portability.
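A trimmed-down sketch of the shim (memfd_create and SHM_ANON are the
real primitives; error handling and the generic fallback are elided):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>

    static int create_anonymous_file(off_t size)
    {
       int fd;
    #if defined(__linux__)
       fd = memfd_create("mesa shared memory", MFD_CLOEXEC);
    #elif defined(__FreeBSD__)
       fd = shm_open(SHM_ANON, O_RDWR | O_CREAT, 0600);
    #else
       fd = -1; /* the real helper falls back to a temporary file */
    #endif
       if (fd >= 0 && ftruncate(fd, size) < 0) {
          close(fd);
          fd = -1;
       }
       return fd;
    }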
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
When using softpin, the first allocation was not calculating the
padding and offset correctly for the case where the first allocation
needed to grow the pool. We were missing initializing state.end right
after expanding the pool for the first time.
This is not a problem for non-softpin, since there we don't use
leftover padding, so the ends would re-arrange incrementally.
This fixes running dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13 on
SKL -- the test uses a shader larger than the initial size of the
instruction pool.
Fixes: dfc9ab2ccd "anv/allocator: Add padding information."
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We didn't notice this issue much because the two structs share a
similar layout, except for the additional fields...
We ran into that issue in Anv:
==15236== Invalid write of size 8
==15236== at 0x8CF3939C: anv_state_table_expand_range (anv_allocator.c:211)
==15236== by 0x8CF394D5: anv_state_table_grow (anv_allocator.c:264)
==15236== by 0x8CF3967E: anv_state_table_add (anv_allocator.c:312)
==15236== by 0x8CF3B13C: anv_state_pool_alloc_no_vg (anv_allocator.c:1167)
==15236== by 0x8CF3B2B0: anv_state_pool_alloc (anv_allocator.c:1190)
==15236== by 0x8CF60871: alloc_surface_state (anv_image.c:1122)
==15236== by 0x8CF61FF9: anv_CreateImageView (anv_image.c:1519)
==15236== by 0x8BCBD2ED: vkCreateImageView (trampoline.c:1358)
==15236== Address 0x8994ef10 is 0 bytes after a block of size 128 alloc'd
==15236== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15236== by 0x8D2578E6: u_vector_init (u_vector.c:47)
==15236== by 0x8CF3929A: anv_state_table_init (anv_allocator.c:168)
==15236== by 0x8CF3A99A: anv_state_pool_init (anv_allocator.c:921)
==15236== by 0x8CF56517: anv_CreateDevice (anv_device.c:1909)
==15236== by 0x8BCB4FBA: terminator_CreateDevice (loader.c:6073)
==15236== by 0x8DD2CB3D: ??? (in /home/djdeath/.steam/ubuntu12_64/libVkLayer_steam_fossilize.so)
==15236== by 0x8DF4D241: vkCreateDevice (in /home/djdeath/.steam/ubuntu12_64/steamoverlayvulkanlayer.so)
==15236== by 0x8BCB35C6: loader_create_device_chain (loader.c:5449)
==15236== by 0x8BCBC230: vkCreateDevice (trampoline.c:838)
v2: Rename mmap_cleanups to avoid confusion (Caio)
v3: s/fail_mmap_cleanups/fail_cleanups/ (Caio)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110648
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Use anv_gem_munmap for unmap when softpin in use, this corresponds to
anv_gem_mmap used in anv_block_pool_expand_range. This fixes valgrind
errors seen for each pool when softpin is in use:
==25581== 262,144 bytes in 1 blocks are definitely lost in loss record 31 of 31
==25581== at 0x50E77E8: anv_gem_mmap (anv_gem.c:96)
==25581== by 0x50EEE2B: anv_block_pool_expand_range (anv_allocator.c:543)
==25581== by 0x50EEB51: anv_block_pool_init (anv_allocator.c:477)
==25581== by 0x50EF7EF: anv_state_pool_init (anv_allocator.c:920)
==25581== by 0x510B8EB: anv_CreateDevice (anv_device.c:2031)
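The fix, in outline (a sketch; anv_gem_munmap's signature here is the
pre-change one, before it gained a device parameter):

    #include <stdbool.h>
    #include <stdint.h>
    #include <sys/mman.h>

    void anv_gem_munmap(void *p, uint64_t size); /* real anv entry point */

    static void pool_unmap(bool use_softpin, void *map, uint64_t size)
    {
       if (use_softpin) {
          /* The range came from anv_gem_mmap, so release it with
           * anv_gem_munmap, which also runs the valgrind bookkeeping. */
          anv_gem_munmap(map, size);
       } else {
          munmap(map, size); /* plain mapping of the memfd */
       }
    }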
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Accessing bo->map and then pool->center_bo_offset without a lock is
racy. One way of avoiding such a race condition is to store bo->map +
center_bo_offset into pool->map at the time the block pool is growing,
which happens within a lock.
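Schematically (a sketch):

    #include <stdint.h>

    struct bo { char *map; };
    struct pool {
       struct bo *bo;
       uint64_t center_bo_offset;
       char *map; /* bo->map + center_bo_offset, updated under the lock */
       int use_softpin;
    };

    /* Called from the growth path, with the pool's mutex held. */
    static void pool_update_map(struct pool *pool, uint64_t center_bo_offset)
    {
       if (!pool->use_softpin) {
          pool->center_bo_offset = center_bo_offset;
          pool->map = pool->bo->map + center_bo_offset;
       }
    }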
v2: Only set pool->map if not using softpin (Jason).
v3: Move things around and only update center_bo_offset if not using
softpin too (Jason).
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reported-by: Ian Romanick <idr@freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109442
Fixes: fc3f588320
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
If softpin is supported, create new BOs for the required size and add
the respective BO maps. The other main change of this commit is that
anv_block_pool_map() now returns the map for the BO that the given
offset is part of, so there's no block_pool->map access anymore (when
softpin is used).
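A sketch of the new lookup (illustrative types, not the exact anv
code):

    #include <stddef.h>
    #include <stdint.h>

    struct pool_bo { uint64_t start, size; char *map; };

    /* Return the CPU map for the BO that contains `offset`. */
    static char *
    block_pool_map(struct pool_bo *bos, unsigned num_bos, uint64_t offset)
    {
       for (unsigned i = 0; i < num_bos; i++) {
          if (offset >= bos[i].start && offset < bos[i].start + bos[i].size)
             return bos[i].map + (offset - bos[i].start);
       }
       return NULL; /* offset not backed by any BO */
    }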
v3:
- set fd to -1 on softpin case (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We are not going to use userptr for anv block pool BOs anymore.
However, so far we have been relying on the fact that userptr BOs are
snooped on non-LLC platforms. Let's make sure that the block pool BOs
are still snooped, and we can also remove the clflush'ing that we do on
all state buffers.
And since we plan to remove the flushes, set the anv_bo_pool BOs to
cached (snooped on non-LLC platforms) too. For LLC platforms, they are
all cached by default, so this becomes a no-op.
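Roughly (a sketch, assuming a helper like anv_gem_set_caching wrapping
the kernel's I915_GEM_SET_CACHING ioctl):

    #include <stdbool.h>
    #include <stdint.h>

    #define I915_CACHING_CACHED 1 /* from drm/i915_drm.h */

    struct anv_device;
    int anv_gem_set_caching(struct anv_device *device,
                            uint32_t gem_handle, uint32_t caching);

    static void make_bo_snooped(struct anv_device *device,
                                uint32_t gem_handle, bool has_llc)
    {
       /* Snooped CPU access removes the need to clflush state buffers;
        * on LLC platforms everything is already coherent. */
       if (!has_llc)
          anv_gem_set_caching(device, gem_handle, I915_CACHING_CACHED);
    }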
v5:
- Add snooping to anv_bo_pool BOs too (Jason).
- Remove anv_gem_set_domain.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
It's possible that we still have some space left in the block pool, but
we try to allocate a state larger than that remaining space. Such a
state would start somewhere within the range of the old block_pool and
end after that range, within the range of the new size.
That's fine when we use userptr, since the memory in the block pool is
CPU-mapped contiguously. However, by the end of this series, we will
have the block_pool split into different BOs, with different CPU
mapping ranges that are not necessarily contiguous. So we must avoid
the case of a given state being part of two different BOs in the block
pool.
This commit solves the issue by detecting that we are growing the
block_pool even though we are not at the end of the range. If that
happens, we don't use the space left at the end of the old size, and
consider it as "padding" that can't be used in the allocation. We update
the size requested from the block pool to take the padding into account,
and return the offset after the padding, which happens to be at the
start of the new address range.
Additionally, we return the amount of padding we used, so the caller
knows that this happens and can return that padding back into a list of
free states, that can be reused later. This way we hopefully don't waste
any space, but also avoid having a state split between two different
BOs.
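In outline, the allocation path becomes (a sketch):

    #include <stdint.h>

    struct alloc_result { uint64_t offset; uint64_t padding; };

    /* If the state would straddle the old/new boundary, skip the tail
     * of the old range and report it as padding for the caller to
     * recycle into its free lists. */
    static struct alloc_result
    block_pool_alloc_new(uint64_t end, uint64_t old_size, uint64_t state_size)
    {
       struct alloc_result r = { .offset = end, .padding = 0 };
       if (end < old_size && end + state_size > old_size) {
          r.padding = old_size - end;
          r.offset = old_size; /* start of the new address range */
       }
       return r;
    }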
v3:
- Calculate offset + padding at anv_block_pool_alloc_new (Jason).
v4:
- Remove extra "leftover".
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This commit reworks the code that splits and returns chunks back to
the state pool, while still keeping the same logic.
The original code would get a chunk larger than we need and split it
into pool->block_size. Then it would return all but the first one, and
would split that first one into alloc_size chunks. Then it would keep
the first one (for the allocation), and return the others back to the
pool.
The new anv_state_pool_return_chunk() function will take a chunk (with
the alloc_size part removed), and a small_size hint. It then splits that
chunk into pool->block_size'd chunks and, if there's some space still
left, splits that into small_size chunks. small_size in this case is the
same size as alloc_size.
The idea is to keep the same logic, but make it in a way we can reuse it
to return other chunks to the pool when we are growing the buffer.
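A sketch of the splitting logic (hypothetical helper; the real free
lists are elided):

    #include <stdint.h>

    /* Returns a block to the pool's free lists (definition elided). */
    void push_block(uint64_t offset, uint64_t size);

    /* Return `chunk_size` bytes at `offset`: first as block_size'd
     * blocks (never larger), then whatever is left as small_size'd
     * states; small_size == alloc_size at the call site. */
    static void return_chunk(uint64_t offset, uint64_t chunk_size,
                             uint64_t block_size, uint64_t small_size)
    {
       uint64_t end = offset + chunk_size;
       while (offset + block_size <= end) {
          push_block(offset, block_size);
          offset += block_size;
       }
       while (small_size > 0 && offset + small_size <= end) {
          push_block(offset, small_size);
          offset += small_size;
       }
    }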
v2:
- Include Jason's suggestions to the algorithm that returns chunks.
- Update comments.
v3:
- Disallow returning 0 blocks (Jason).
- fix min_size in the loop (Jason).
- remove temporary variables (Jason)
v4:
- return_chunk() should never return blocks larger than
pool->block_size.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
So far we use only one BO (the last one created) in the block pool.
When we stop using the userptr API, we will need multiple BOs, so add
code now to store multiple BOs in the block pool.
This has several implications, the main one being that we can't use
pool->map as before. For that reason we update the getter to find which
BO a given offset is part of, and return the respective map.
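For instance (a sketch; the fixed capacity is illustrative):

    #define MAX_BOS 32 /* illustrative capacity */

    struct bo { char *map; };
    struct pool { struct bo bos[MAX_BOS]; unsigned nbos; };

    /* Note the (item, container) argument order; see v4 below. */
    #define pool_foreach_bo(bo, pool) \
       for (struct bo *bo = (pool)->bos; \
            bo < (pool)->bos + (pool)->nbos; bo++)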
v3:
- Simplify anv_block_pool_map (Jason).
- Use fixed size array for anv_bo's (Jason)
v4:
- Respect the order (item, container) in anv_block_pool_foreach_bo
(Jason).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Change block_pool->bo to be a pointer, and update its usage everywhere.
This makes it simpler to switch it later to a list of BOs.
v3:
- Use a static "bos" field in the struct, instead of malloc'ing it.
This will be later changed to a fixed length array of BOs.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
After switching to using anv_state_table, there are very few places left
still using pool->map directly. We want to avoid that because it won't
always be the right map once we split it into multiple BOs.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>