We've been missing this workaround for a while and since it's required
for Gen12, let's implement it for Gen9 first.
v2: Update comment for Gen9.
v3: Fix clearing of bits... (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3405>
This patch fixes the way_size_per_bank for Gen12+
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge<sagar.ghuge@intel.com>
This patch is required to fix 11K+ vulkan CTS failures we were
getting with way_size_per_bank of 4 (see next patch).
Thanks to Sagar Ghuge and Jordan Justen for all the hard work of
debugging and testing.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge<sagar.ghuge@intel.com>
The old code would only break at stride boundaries if the stride was
less than 32B; otherwise it would just break every 32B. This commit
makes it break at stride boundaries and 32B boundaries (starting from
the last stride). This makes reading large vertex buffers in aubinator
much nicer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3642>
Otherwise, the validator tries to read the type of src1 of a SEND/SENDS
which doesn't actually have a type field. This prevents validation
issues in the next commit.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3642>
Previously, i965/iris tried to reuse the currently programmed URB config
if it was good enough for BLORP, rather than reprogramming it each time.
However, this will make some things harder on Gen12+ and we've not seen
any performance impact from emitting URB more frequently in ANV.
This makes the blorp <-> driver interface a bit simpler on Gen7+ because
now all the driver has to do is to provide the L3$ config rather than
trying to hand off URB re-config to blorp.
Cc: "20.0" mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3454>
SML is no longer in the L3$ on Gen11+. It's not incredibly clear from
the docs but no Gen11 platforms are in the list of platforms on which
this bit exists. Also, we've been always setting it false on Gen11 in
ANV and i965 thanks to GEN_L3P_SLM being zero with no ill effects.
Cc: "20.0" mesa-stable@lists.freedesktop.org
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3454>
According to the BSpec, this should prevent hangs when using shaders
with large URB entries. A more precise fix can be done but it requires
re-arranging URB setup.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3454>
Because softpin block pools are made up of a set of BOs with different
maps, it was possible for a single state to end up straddling blocks.
To fix this, we pass a contiguous size to anv_block_pool_grow and it
ensures that the next allocation in the pool will have at least that
size.
We also add an assert in anv_block_pool_map to ensure we always get
contiguous maps. Prior to the changes to anv_block_pool_grow, the unit
tests failed with this assert. With this patch, the tests pass.
This was causing problems on Gen12 where we allocate the pages for the
AUX table from the dynamic state pool. The first chunk, which gets
allocated very early in the pool's history, is 1MB which was enough that
it was getting multiple BOs. This caused the gen_aux_map code to write
outside of the map and overwrite the instruction state pool buffer which
lead to GPU hangs.
Fixes: 731c4adcf9 "anv/allocator: Add support for non-userptr"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We intentionally throw away all but one BT block but then we set
cmd_buffer->bt_block to ANV_STATE_NULL instead of the one we hung on to.
This causes the command buffer to immediately re-emit STATE_BASE_ADDRESS
the first time a BT is needed for no good reason.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Before flushing the instruction cache with a pipe control, we need to
use a CS Stall pipe control.
Ref: GEN:BUG:1409226450
Rework: Add stall-at-scoreboard (Lionel)
Rework: Merge with other anvil pre-invalidate stalls (Lionel)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3457>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3457>
In f132e0fddf, I attempted to allow BLORP to do CCS_E copies by using
the UNORM formats instead. However, the old BLORP bit-cast code could
only handle RGBA formats and asserted on anything other than UINT
formats. The reason we didn't catch this is because it only comes up on
Gen12 platforms which aren't in our normal CI yet.
Fixes: f132e0fddf "intel/blorp: Add support for CCS_E copies with..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3593>
If VK_QUERY_RESULT_WAIT_BIT is not set, there is currently no
special handling of unavailable queries in vkCmdCopyQueryPoolResults,
and anv will write an invalid value for the query result.
This commit updates vkCmdCopyQueryPoolResults for unavailable
queries to return 0 if the VK_QUERY_RESULT_PARTIAL_BIT flag is set
and if not, skip writing altogether.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3586>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3586>
Currently, fetching the partial results (VK_QUERY_RESULT_PARTIAL_BIT)
of an unavailable occlusion query via vkGetQueryPoolResults can
return invalid values. anv returns slot.end - slot.begin, but in the
case of unavailable queries, slot.end is still at the initial value
of 0. If slot.begin is non-zero, the occlusion count underflows to
a value that is likely outside the acceptable range of the partial
result.
This commit fixes vkGetQueryPoolResults by always returning 0 if the
query is unavailable and the VK_QUERY_RESULT_PARTIAL_BIT is set.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3586>
This cuts away dependency to libgrallocusage.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3532>
v2: (Francisco Jerez)
- Drop vec4 changes.
- Handle explicit acc0 operand and implicit one.
- Make sure instruction is SIMD16, prediction is off and default mask
control set to true.
v3: (Francisco Jerez)
- Clear accumulator only when it's written.
- Use BRW_MASK_DISABLE instead of true.
- Use correct width for brw_acc_reg().
- Fix last_inst_offset.
v4: (Francisco Jerez)
- Don't check for last instruction for accummulator write.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3376>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3376>
We were applying row pitch constraint of CCS surfaces to linear
surfaces. But CCS is only supported in linear tiling under some
condition (more on that in the following commit). So let's drop that
requirement for now.
Fixes a bunch of crucible assert where the byte size of a linear image
is expected to be similar to the byte size of buffer for the same
extent in the following category :
func.miptree.r8g8b8a8-unorm.aspect-color.view-2d.*download-copy-with-draw.*
v2: Move restriction to isl_calc_tiled_min_row_pitch()
v3: Move restrinction to isl_calc_row_pitch_alignment() (Jason)
v4: Update message (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 07e16221d9 ("isl: Round up some pitches to 512B for Gen12's CCS")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3551>
Gen12 does not support RENDER_SURFACE_STATE::SurfaceArray = true &&
RENDER_SURFACE_STATE::Depth = 0. SurfaceArray can only be set to true
if Depth >= 1.
We workaround this limitation by adding the max(value, 1) snippet in
the shaders on the 3 components for texture array sizes.
Tested on Gen9 with the following Vulkan CTS tests :
dEQP-VK.image.image_size.2d_array.*
v2: Drop debug print (Tapani)
Switch to GEN:BUG instead of Wa_
v3: Fix dEQP-VK.image.image_size.1d_array.* cases (Lionel)
v4: Fix dEQP-VK.glsl.texture_functions.query.texturesize.* cases
(Missing tex_op handling) (Lionel)
v5: Missing break statement (Lionel)
v6: Fixup comment (Tapani)
v7: Fixup comment again (Tapani)
v8: Don't use sample_dim as index (Jason)
Rename pass
Simplify control flow
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com> (v7)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3362>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3362>
Some of the smaller bit-size formats which support CCS_E don't have a
UINT representative in their compression class. However, we should be
able to use UNORM just fine and still get bit-exact copies. We just
have to do a conversion to/from UNORM when we bitcast.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3554>
The previous way we were attempting to handle AUX tables on TGL-LP was
very GL-like. We used the same aux table management code that's shared
with iris and we updated the table on image create/destroy. The problem
with this is that Vulkan allows multiple VkImage objects to be bound to
the same memory location simultaneously and the app can ping-pong back
and forth between them in the same command buffer. Because the AUX
table contains format-specific data, we cannot support this ping-pong
behavior with only CPU updates of the AUX table.
The new mechanism switches things around a bit and instead makes the aux
data part of the BO. At BO creation time, a bit of space is appended to
the end of the BO for AUX data and the AUX table is updated in bulk for
the entire BO. The problem here, of course, is that we can't insert the
format-specific data into the AUX table at BO create time.
Fortunately, Vulkan has a requirement that every TILING_OPTIMAL image
must be initialized prior to use by transitioning the image from
VK_IMAGE_LAYOUT_UNDEFINED to something else. When doing the above
described ping-pong behavior, the app has to do such an initialization
transition every time it corrupts the underlying memory of the VkImage
by using it as something else. We can hook into this initialization and
use it to update the AUX-TT entries from the command streamer. This way
the AUX table gets its format information, apps get aliasing support,
and everyone is happy.
One side-effect of this is that we disallow CCS on shared buffers.
We'll need to fix this for modifiers on the scanout path but that's a
task for another patch. We should be able to do it with dedicated
allocations.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3519>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3519>
All they do now is take a size, align, and flags and figure out which
heap to allocate in. All of the actual code to deal with the BO is in
anv_allocator.c. We want to leave anv_vma_alloc/free in anv_device.c
because it deals with API-exposed heaps so it still makes sense to have
it there.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3519>
This commit moves it in with all the other cache invalidation operations
as if it were done by PIPE_CONTROL even though it's a pair of register
writes. This means we only have to write the GFX_AUX_TABLE_BASE_ADDR
register once at device initialization instead of every invalidate.
Invalidates are now a single LRI instead of two.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3519>
We compute the same thing with the same variable name at the top of the
function.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3519>
This breaks add_mapping() into three pieces:
1. get_aux_entry() adds AUX-TT pages as needed and returns the
L1 entry index, L1 entry address, and L1 entry map.
2. gen_aux_map_format_bits_for_isl_surf() computes the format-
specific information that goes in the AUX-TT entry.
3. add_mapping() is a lot dumber function that now just adds the
requested mapping with the requested format bits.
This lets us break out some additional helpers in the API which we want
to use for more direct AUX-TT management in ANV.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3519>
Pass down stencil data from the subpass attachment like we do
elsewhere. Only stencil attachments will make use of it.
Fixes warnings like
../src/intel/vulkan/genX_cmd_buffer.c: In function ‘cmd_buffer_begin_subpass’:
../src/intel/vulkan/genX_cmd_buffer.c:4656:41: warning: ‘target_stencil_layout’ may be used uninitialized in this function [-Wmaybe-uninitialized]
4656 | att_state->current_stencil_layout = target_stencil_layout;
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3557>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3557>