If we are on gen8+ and have context isolation support, just make that
constant buffer address be absolute, so we can use it for push UBOs too.
v2: Do not duplicate constant_buffer_0_is_relative flag (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
If the application asks for the maximum number of fragment input
components (128) and uses all of them plus some builtins that are
passed in the VUE, then we exceed the maximum number of used VUE
slots (32) and trip an assert that checks this limit.
Also, with separate shader objects, we add the CLIP_DIST0 and
CLIP_DIST1 builtins in brw_compute_vue_map() because we don't know
if gl_ClipDistance is going to be read/written by an adjacent stage.
Fixes VK-GL-CTS CL#2569.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2 (Jason Ekstrand):
- Break up Scott's mega-patch
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Now that we've done all that refactoring, addresses are written
directly into surface states by ISL and BLORP whenever a BO is
pinned, so there's really nothing to do besides enable it.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
It's safer to set them there because we have the opportunity to properly
handle combining flags if a BO is imported more than once.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Soft pinning lets us satisfy the binding table address
requirements without using both sides of a growing state_pool.
If you do use both sides of a state pool, then you need to read
the state pool's center_bo_offset (with the device mutex held) to
know the final offset of relocations that target the state pool
bo.
By having a separate pool for binding tables that only grows in
the forward direction, the center_bo_offset is always 0 and we no
longer need an update pass, with the mutex held, to adjust
relocations.
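To illustrate the difference (a sketch with placeholder names, not
the actual anv code):

    /* Two-sided pool: the relocation target depends on a center
     * offset that moves as the pool grows, so it must be read with
     * the device mutex held. */
    uint64_t two_sided = bt_block_offset + pool->center_bo_offset;

    /* Forward-only binding table pool: center_bo_offset == 0, so
     * the offset is stable and no fix-up pass is needed. */
    uint64_t forward_only = bt_block_offset;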
v2: - don't introduce a separate state flag for separate binding tables (Jason)
- replace bo and map accessors with a single binding_table_pool accessor (Jason)
v3: - assert bt_block->offset >= 0 for the separate binding table (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The state_pools reserve virtual address space of the full
BLOCK_POOL_MEMFD_SIZE, but maintain the current behavior of
growing from the middle.
v2: - rename block_pool::offset to block_pool::start_address (Jason)
- assign state pool start_address statically (Jason)
v3: - remove unnecessary bo_flags tampering for the dynamic pool (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Instead of storing a BO and offset separately, use an anv_address.
This changes anv_fill_buffer_surface_state to take an anv_address,
and we now call anv_address_physical and pass the result to ISL.
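For reference, a sketch of the shape of this (close to, but not
guaranteed to match, the real definitions):

    struct anv_address {
       struct anv_bo *bo;
       uint32_t offset;
    };

    /* Resolve an anv_address to the 64-bit GPU address ISL wants;
     * assumes bo->offset holds the pinned GPU address. */
    static uint64_t
    anv_address_physical(struct anv_address addr)
    {
       return (addr.bo ? addr.bo->offset : 0) + addr.offset;
    }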
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
These will be used to assign virtual addresses to soft pinned
buffers in a later patch.
Two allocators are added for separate 'low' and 'high' virtual
memory areas. An alternative would have been to add a double-sided
allocator; that wasn't done here simply because it didn't appear to
give any code-complexity advantage.
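As a sketch of how the two areas get picked (the heap names here
are assumptions; the address bounds match the v3 notes below):

    if (addr >= HIGH_HEAP_MIN_ADDRESS)
       util_vma_heap_free(&device->vma_hi, addr, size);
    else
       util_vma_heap_free(&device->vma_lo, addr, size);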
v2 (Scott Phillips):
- rename has_exec_softpin to use_softpin (Jason)
- Only remove bottom one page and top 4 GiB from virt (Jason)
- refer to comment in anv_allocator about state address + size
overflowing 48 bits (Jason)
- Mention hi/lo allocators vs double-sided allocator in
commit message (Chris)
- assign state pool memory ranges statically (Jason)
v3 (Jason Ekstrand):
- Use (LOW|HIGH)_HEAP_(MIN|MAX)_ADDRESS rather than (1 << 31) for
determining which heap to use in anv_vma_free
- Only return de-canonicalized addresses to the heap
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
This rolls back the revert of this patch that was introduced with
commit 7cf284f18e.
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Initially, I didn't understand this feature. Turns out that all it
means is that you can switch multisample rates in the middle of a
zero-attachment subpass. We've been able to do this since forever.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The previous logic of supports_48b_addresses wasn't actually
checking whether i915.ko was running with full_48bit_ppgtt. The
ENOENT it was checking for was actually coming from the invalid
context id provided in the test execbuffer. There is no path in
the kernel driver where the presence of
EXEC_OBJECT_SUPPORTS_48B_ADDRESS leads to an error.
Instead, check the default context's GTT_SIZE param for a value
greater than 4 GiB.
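Something along these lines (a sketch; error handling omitted):

    struct drm_i915_gem_context_param p = {
       .ctx_id = 0, /* default context */
       .param = I915_CONTEXT_PARAM_GTT_SIZE,
    };
    if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &p) == 0)
       supports_48b_addresses = p.value > (1ull << 32);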
v2 (Ken): Fix in i965 as well.
v3: Check GTT_SIZE instead of HAS_ALIASING_PPGTT (Chris Wilson)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch enables the Vulkan driver on Ice Lake hardware with an
added warning about preliminary support.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Store the default clear address for HiZ fast clears on a global bo, and
point to it when needed.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
We only get VK_SUCCESS if it was initialized, but apparently my compiler
doesn't track that far.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This requires us to bump the subgroup size to 32 for all shader stages
because Vulkan requires that to be a physical device query.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
From the Vulkan 1.1 spec:
"Vulkan 1.0 implementations were required to return
VK_ERROR_INCOMPATIBLE_DRIVER if apiVersion was larger than 1.0.
Implementations that support Vulkan 1.1 or later must not return
VK_ERROR_INCOMPATIBLE_DRIVER for any value of apiVersion."
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This is not strictly necessary, since users should not be
requesting any flags that are not valid for the list of enabled
features they requested, and we already fail if they attempt to use
an unsupported feature. However, it is an easy-to-implement sanity
check that would help developers realize that they are doing things
wrong, so we might as well do it.
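A minimal sketch of the shape of the check, assuming we support no
queue creation flags (the error code here is an assumption):

    for (uint32_t i = 0; i < pCreateInfo->queueCreateInfoCount; i++) {
       if (pCreateInfo->pQueueCreateInfos[i].flags != 0)
          return vk_error(VK_ERROR_INITIALIZATION_FAILED);
    }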
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
From the Vulkan 1.1 spec, VkDeviceQueueInfo2 structure:
"The queue returned by vkGetDeviceQueue2 must have the same flags value
from this structure as that used at device creation time in a
VkDeviceQueueCreateInfo instance. If no matching flags were specified
at device creation time then pQueue will return VK_NULL_HANDLE."
For us this means no flags at all since we don't support any.
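That boils down to something like this sketch (the handle helper
name is an assumption):

    if (pQueueInfo->flags != 0) {
       /* All our queues were created with flags == 0, so no match. */
       *pQueue = VK_NULL_HANDLE;
       return;
    }
    *pQueue = anv_queue_to_handle(&device->queue);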
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This belongs to the protected memory feature but there's nothing about
it that's specific to protected memory.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This advertises the VK_KHR_shader_draw_parameters functionality as a
"core optimal feature" in Vulkan 1.1.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This requires us to rename any Vulkan API entrypoints which became core
in 1.1 to no longer have the KHR suffix.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Enables the storageBuffer16BitAccess and
uniformAndStorageBuffer16BitAccess features of VK_KHR_16bit_storage
for Gen8+.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The surfaces that back the GPU buffers have a boundary check that
treats accesses to partial dwords as out-of-bounds. For example,
buffers with 1 or 3 16-bit elements have size 2 or 6, and the last
two bytes would always be read as 0 or have their writes ignored.
The introduction of 16-bit types implies that we need to align the
size to 4-byte multiples so that partial dwords can be read/written.
Adding an unconditional +2 to the size of buffers that are not a
multiple of 4 solves this issue for the general cases of UBO or
SSBO.
But when unsized arrays of 16-bit elements are used, it is not
possible to know whether the size was padded or not. To solve this
issue, the implementation calculates the needed size of the buffer
surfaces as suggested by Jason:
surface_size = isl_align(buffer_size, 4) +
(isl_align(buffer_size, 4) - buffer_size)
So when we calculate the buffer_size backwards in the backend, we
update the resinfo return value with:
buffer_size = (surface_size & ~3) - (surface_size & 3)
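For example, a buffer holding three 16-bit elements has
buffer_size = 6:
surface_size = isl_align(6, 4) + (isl_align(6, 4) - 6) = 8 + 2 = 10
buffer_size = (10 & ~3) - (10 & 3) = 8 - 2 = 6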
This buffer size requirement is also exposed when robust buffer
access is enabled, so buffer sizes are recommended to be multiples
of 4.
v2: (Jason Ekstrand)
Move the padding logic from anv to isl_surface_state.
Move the calculation of the original size from spirv to the driver
backend.
v3: (Jason Ekstrand)
Rename some variables and use a similar expression when calculating
the padding as when obtaining the original buffer size.
Avoid an unnecessary component call at brw_fs_nir.
v4: (Jason Ekstrand)
Complete the comment in brw_fs_nir with an explanation of the buffer
size calculation.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We don't zalloc the physical device so we need to unconditionally set
everything. Crucible helpfully initializes all allocations to 139 so it
was getting true regardless of whether or not the kernel actually
supports context priorities.
Fixes: 6d8ab53303 "anv: implement VK_EXT_global_priority extension"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
anv_gem_set_context_param is to be used directly instead!
Fixes: 6d8ab53303 "anv: implement VK_EXT_global_priority extension"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: add ANV_CONTEXT_REALTIME_PRIORITY (Chris)
use unreachable with unknown priority (Samuel)
v3: add stubs in gem_stubs.c (Emil)
use priority defines from gen_defines.h
v4: cleanup, add anv_gem_set_context_param (Jason)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> (v2)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v3)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
For a bit there, we had a bug in i965 where it ignored the tiling
of the modifier and used the one from the BO instead. At one point,
we thought this was best fixed by setting a tiling from Vulkan.
However, we've decided that i965 was just doing the wrong thing and
have fixed it as of 5048572352.
The old assumptions also affected the solution we used for legacy
scanout in Vulkan. Instead of treating it specially, we just treated it
like a modifier like we do in GL. This commit goes back to making it
its own thing so that it's clear in the driver when we're using
modifiers and when we're using legacy paths.
v2 (Jason Ekstrand):
- Rename legacy_scanout to needs_set_tiling
Reviewed-by: Daniel Stone <daniels@collabora.com>
The loop goes through the list of enabled extensions marking them as
enabled in the list, but this relies on every other extension being
initialized to false by default.
This bug would make us, for example, advertise certain device extension
entry points as available even when the corresponding extensions had
not been enabled.
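Roughly, the fix makes the "false by default" assumption explicit
(a sketch; the index-lookup helper is an assumption):

    memset(&device->enabled_extensions, 0,
           sizeof(device->enabled_extensions));
    for (uint32_t i = 0; i < pCreateInfo->enabledExtensionCount; i++) {
       int idx =
          find_extension_index(pCreateInfo->ppEnabledExtensionNames[i]);
       device->enabled_extensions.extensions[idx] = true;
    }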
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes: abc62282b5 "anv: Add a per-device table of enabled extensions"
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
Technically, the Vulkan spec requires that we return valid entrypoints
for all core functionality and any available device extensions. This
means that, for gen-specific functions, we need to return a trampoline
which looks at the device and calls the right device function. In 99%
of cases, the loader will do this for us but, apparently, we're supposed
to do it too. It's a tiny increase in binary size for us to carry this
around but really not bad.
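For illustration, a trampoline has roughly this shape (hand-written
here with placeholder case bodies; the real ones are generated):

    VKAPI_ATTR void VKAPI_CALL
    anv_tramp_CmdDraw(VkCommandBuffer commandBuffer,
                      uint32_t vertexCount, uint32_t instanceCount,
                      uint32_t firstVertex, uint32_t firstInstance)
    {
       ANV_FROM_HANDLE(anv_cmd_buffer, cmd_buffer, commandBuffer);
       switch (cmd_buffer->device->info.gen) {
       case 9:
          gen9_CmdDraw(commandBuffer, vertexCount, instanceCount,
                       firstVertex, firstInstance);
          break;
       /* ... one case per supported gen ... */
       default:
          unreachable("unsupported gen");
       }
    }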
Before:
text data bss dec hex filename
3541775 204112 6136 3752023 394057 libvulkan_intel.so
After:
text data bss dec hex filename
3551463 205632 6136 3763231 396c1f libvulkan_intel.so
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>