Previously, i965/iris tried to reuse the currently programmed URB config
if it was good enough for BLORP, rather than reprogramming it each time.
However, this will make some things harder on Gen12+ and we've not seen
any performance impact from emitting URB more frequently in ANV.
This makes the blorp <-> driver interface a bit simpler on Gen7+ because
now all the driver has to do is to provide the L3$ config rather than
trying to hand off URB re-config to blorp.
Cc: "20.0" mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3454>
SML is no longer in the L3$ on Gen11+. It's not incredibly clear from
the docs but no Gen11 platforms are in the list of platforms on which
this bit exists. Also, we've been always setting it false on Gen11 in
ANV and i965 thanks to GEN_L3P_SLM being zero with no ill effects.
Cc: "20.0" mesa-stable@lists.freedesktop.org
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3454>
According to the BSpec, this should prevent hangs when using shaders
with large URB entries. A more precise fix can be done but it requires
re-arranging URB setup.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3454>
Because softpin block pools are made up of a set of BOs with different
maps, it was possible for a single state to end up straddling blocks.
To fix this, we pass a contiguous size to anv_block_pool_grow and it
ensures that the next allocation in the pool will have at least that
size.
We also add an assert in anv_block_pool_map to ensure we always get
contiguous maps. Prior to the changes to anv_block_pool_grow, the unit
tests failed with this assert. With this patch, the tests pass.
This was causing problems on Gen12 where we allocate the pages for the
AUX table from the dynamic state pool. The first chunk, which gets
allocated very early in the pool's history, is 1MB which was enough that
it was getting multiple BOs. This caused the gen_aux_map code to write
outside of the map and overwrite the instruction state pool buffer which
lead to GPU hangs.
Fixes: 731c4adcf9 "anv/allocator: Add support for non-userptr"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We intentionally throw away all but one BT block but then we set
cmd_buffer->bt_block to ANV_STATE_NULL instead of the one we hung on to.
This causes the command buffer to immediately re-emit STATE_BASE_ADDRESS
the first time a BT is needed for no good reason.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Before flushing the instruction cache with a pipe control, we need to
use a CS Stall pipe control.
Ref: GEN:BUG:1409226450
Rework: Add stall-at-scoreboard (Lionel)
Rework: Merge with other anvil pre-invalidate stalls (Lionel)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3457>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3457>
If VK_QUERY_RESULT_WAIT_BIT is not set, there is currently no
special handling of unavailable queries in vkCmdCopyQueryPoolResults,
and anv will write an invalid value for the query result.
This commit updates vkCmdCopyQueryPoolResults for unavailable
queries to return 0 if the VK_QUERY_RESULT_PARTIAL_BIT flag is set
and if not, skip writing altogether.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3586>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3586>
Currently, fetching the partial results (VK_QUERY_RESULT_PARTIAL_BIT)
of an unavailable occlusion query via vkGetQueryPoolResults can
return invalid values. anv returns slot.end - slot.begin, but in the
case of unavailable queries, slot.end is still at the initial value
of 0. If slot.begin is non-zero, the occlusion count underflows to
a value that is likely outside the acceptable range of the partial
result.
This commit fixes vkGetQueryPoolResults by always returning 0 if the
query is unavailable and the VK_QUERY_RESULT_PARTIAL_BIT is set.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3586>
This cuts away dependency to libgrallocusage.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3532>
The previous way we were attempting to handle AUX tables on TGL-LP was
very GL-like. We used the same aux table management code that's shared
with iris and we updated the table on image create/destroy. The problem
with this is that Vulkan allows multiple VkImage objects to be bound to
the same memory location simultaneously and the app can ping-pong back
and forth between them in the same command buffer. Because the AUX
table contains format-specific data, we cannot support this ping-pong
behavior with only CPU updates of the AUX table.
The new mechanism switches things around a bit and instead makes the aux
data part of the BO. At BO creation time, a bit of space is appended to
the end of the BO for AUX data and the AUX table is updated in bulk for
the entire BO. The problem here, of course, is that we can't insert the
format-specific data into the AUX table at BO create time.
Fortunately, Vulkan has a requirement that every TILING_OPTIMAL image
must be initialized prior to use by transitioning the image from
VK_IMAGE_LAYOUT_UNDEFINED to something else. When doing the above
described ping-pong behavior, the app has to do such an initialization
transition every time it corrupts the underlying memory of the VkImage
by using it as something else. We can hook into this initialization and
use it to update the AUX-TT entries from the command streamer. This way
the AUX table gets its format information, apps get aliasing support,
and everyone is happy.
One side-effect of this is that we disallow CCS on shared buffers.
We'll need to fix this for modifiers on the scanout path but that's a
task for another patch. We should be able to do it with dedicated
allocations.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3519>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3519>
All they do now is take a size, align, and flags and figure out which
heap to allocate in. All of the actual code to deal with the BO is in
anv_allocator.c. We want to leave anv_vma_alloc/free in anv_device.c
because it deals with API-exposed heaps so it still makes sense to have
it there.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3519>
This commit moves it in with all the other cache invalidation operations
as if it were done by PIPE_CONTROL even though it's a pair of register
writes. This means we only have to write the GFX_AUX_TABLE_BASE_ADDR
register once at device initialization instead of every invalidate.
Invalidates are now a single LRI instead of two.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3519>
We compute the same thing with the same variable name at the top of the
function.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3519>
Pass down stencil data from the subpass attachment like we do
elsewhere. Only stencil attachments will make use of it.
Fixes warnings like
../src/intel/vulkan/genX_cmd_buffer.c: In function ‘cmd_buffer_begin_subpass’:
../src/intel/vulkan/genX_cmd_buffer.c:4656:41: warning: ‘target_stencil_layout’ may be used uninitialized in this function [-Wmaybe-uninitialized]
4656 | att_state->current_stencil_layout = target_stencil_layout;
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3557>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3557>
Previously, we set aux_usage=ISL_AUX_USAGE_NONE when we really meant
CCS_D. This sort-of made sense before we had anv_layout_to_aux_usage
but now that we have that helper. However, in our more modern aux
tracking model, all aux usage goes through anv_layout_to_* and we're
better off making the meaning of anv_image::planes[]::aux_usage be
AUX_USAGE_NONE if and only if there is no compression.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3556>
This commit makes two changes:
1. We set pending_pipe_bits instead of emitting PIPE_CONTROL directly
for the flush at the end of cmd_buffer_begin_subpass.
2. Because BLORP ops such as vkCmdClearAttachments may come in the
middle of a render pass, we have to also flag the need for a cache
flush after the blorp op.
Fixes: 185630c6bc "anv/blorp: Do the gen11 BTI flush"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3547>
Most places we actually know the usage and can provide it. There are
two exceptions to this:
1. We pass 0 into get_blorp_surf_for_anv_image when we use
ANV_IMAGE_LAYOUT_EXPLICIT_AUX because anv_layout_to_aux_usage is
never actually called so it doesn't matter.
2. We pass 0 into anv_layout_to_aux_usage in transition_color_buffer.
However, the coming commits which will begin using the usage
parameter only care about depth.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2605>
Rather than looking at the aux usage, we look at the isl_aux_state which
provides us with more detailed information. This commit adds a couple
helpers to isl which let us quickly determine if we have valid depth/hiz
on the initial layout and if we need valid depth/hiz for the final
layout.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2605>
This new helper maps VkImageLayout enums to isl_aux_state enums which
are the hardware's concept of image layouts. We can then use the aux
state to get the fast clear type and the aux usage. This should yield
no functional change in driver behavior.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2605>
As of 52ad1712ed, TRANSFER_SRC_OPTIMAL and SHADER_READ_ONLY_OPTIMAL
are now identical for depth buffers so there's no reason why we need to
use the "wrong" layout. Technically, according to Vulkan, blits and
MSAA resolves are transfer ops so we should use the transfer layout now
that we can.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2605>
Currently only implemented in the scalar backend, so only enable for
Gen8+. If support for the other opcodes is added to the vec4 backend,
Gen7 could be supported.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
As a result of 9baa33cef0 our backend compiler leaves params pretty
much untouched. So in order to avoid storing uninitialized values in
the shader cache blobs, just 0 out this array.
I've considered not even allocating this array which works on gen8+
but the vec4 backend still makes a copy of this array and so it
crashes on memcpy on HSW.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 9baa33cef0 ("anv: Rework push constant handling")
Reported-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3516>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3516>
Instead of having a single physical device in anv_instance, have a
linked list of them. What we have now works today because we our GPUs
are build into the CPU and so you're guaranteed to only ever have one of
them. One day, that will change and we want ANV to be ready.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3461>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3461>
This commit simply moves fetching the device info and checking if ANV
supports the device a bit higher up. This way we fail earlier and it'll
make error checking easier in the next commit.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3461>
We don't actually have genX versions of any physical device level
commands so we don't need the trampoline versions and we don't need to
have a separate table per physical device.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3461>
There are very few times when we actually want to fetch the instance
from the anv_device. We can put up with a bit of pain there in exchange
for strongly discouraging people from doing this in general.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3461>
Having to always pull the physical device from the instance has been
annoying for almost as long as the driver has existed. It also won't
work in a world where we ever have more than one physical device. This
commit adds a new field called "physical" to anv_device and switches
every location where we use device->instance->physicalDevice to use the
new field instead.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3461>