In the fast-link case, we should just use the library shader and assume
that it's correct. We have to do this because we may not have a
precompiled shader in this case so we can't even generate the shader
hash.
In the link optimization case, we could still have a library shader
coming in from some library pipeline. That shader may happen to be
correct, in which case we can just use it and not even bother digging
around in the cache. In the more likely case, the keys won't match and
we should throw it away before we look up a different shader in the
cache and leak our reference.
Fixes: 9308e8d90d ("vulkan: Add generic graphics and compute VkPipeline implementation")
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27860>
KMD needs to be able to read and write the clear-color from CPU.
i915 can workaround it but Xe KMD will reject page flips with
clear-color bos that can be read from CPU.
So here it make sure that bos with the clear color information
are placed in a lmem portion that is CPU-visible, that is important in
PCIe small bar systems.
And as CCS in discrete GPUs are only supported in lmem this bo can't
become a IRIS_HEAP_DEVICE_LOCAL_PREFERRED(lmem + smem).
So here the IRIS_HEAP_DEVICE_LOCAL_CPU_VISIBLE_SMALL_BAR heap is selected.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26700>
This is intented to be used in cases where BOs can only be placed in
lmem but needs to accesible from CPU side.
Actual usage will be added in the next patch, here just adding the
heap type and all the handling.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26700>
../src/intel/nullhw-layer/intel_nullhw.c: In function ‘new_device_data’:
../src/intel/nullhw-layer/intel_nullhw.c:65:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
65 | #define HKEY(obj) ((uint64_t)(obj))
| ^
../src/intel/nullhw-layer/intel_nullhw.c:193:15: note: in expansion of macro ‘HKEY’
193 | map_object(HKEY(data->device), data);
Reviewed-by: Mark Janes <markjanes@swizzler.org>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27854>
Input attachments which read GMEM via the UCHE aperture need to
flush the blit cache on A7XX and wait for the writes to land, this
implements it as access flags and a pending flush with special
semantics.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
This register is set by the proprietary driver along with other VSC state
for binning, the stale value of this register set by the prop driver was
being used by Turnip resulting in crashes that were exclusive to Android
due to only running the prop driver alongside Turnip there.
The fix is to emit this new register alongside all other VSC state inside
the `update_vsc_pipe` function.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
A7XX needs the CCU blit caches to be flushed before a CP_BLIT to
ensure any writes from a CP_EVENT_WRITE::BLIT have landed, without
this the source buffer may have an incomplete load/clear when the
2D blit starts resulting in what's written out being broken.
The corruption can be seen with GMEM passes using CP_BLIT especially
when forced using `TU_DEBUG=gmem,unaligned_store`.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
On A7XX, A6XX_RB_CCU_CNTL was broken into two registers, A7XX_RB_CCU_CNTL which
has static properties that can be set once, this requires a WFI to take effect.
As a result, it's now set during `tu6_hw_init` rather than being set every time.
While the newly introduced register A7XX_RB_CCU_CNTL2 has properties that may
change per-RP and don't require a WFI to take effect, only CCU inval/flush
events are required. This is now the only register set in `emit_rb_ccu_cntl`.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
LRZ wasn't entirely disabled due to the register `A7XX_GRAS_LRZ_DEPTH_BUFFER_INFO`
not being set to `0` in all circumstances, this register affects rendering even
when LRZ is disabled so needs to be set to `0` until LRZ is properly implemented.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
A7XX has corruption when 2D blits are performed on D24S8 images
from GMEM when the source format is FMT6_8_8_8_8_UNORM, this is
fixed by using FMT6_Z24_UNORM_S8_UINT_AS_R8G8B8A8.
Fixes VK-CTS: dEQP-VK.pipeline.monolithic.multisample.misc.*
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
These were broken due to the new window offset register not being
set for every tile, even with this the 2D blit path is broken for
MSAA D24S8 resolves but since outside of FDM that should be handled
by the event blit path it's not a major concern but should be fixed.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
This seemingly works on A7XX with no issues and the comment there
prior suggests that it should work on A6XX so this case is now
allowed to go through the event blit rather than the slow path.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
The CCU layout logic needed to match the full `use_fast_path` case
in `tu_store_gmem_attachment`, not just unaligned but also for the
stencil storage logic.
The current code works since depth/stencil formats are forced to use the
slow path by `blit_can_resolve`. However, that will be removed since only
seperate stencil stores are unable to use the fast path while combined
stores can use it without any issues. This change prevents a regression
due to no longer choosing the sysmem CCU layout for seperate stencil
stores when fast-path resolves are allowed for DS formats.
Fixes VK-CTS cases (when fast-path stores for DS formats are enabled):
dEQP-VK.renderpass2.depth_stencil_resolve.image_2d_32_32.samples_2.d24_unorm_s8_uint.compatibility_depth_zero_stencil_zero_testing_stencil
dEQP-VK.renderpass2.depth_stencil_resolve.image_2d_32_32.samples_2.d24_unorm_s8_uint_separate_layouts.compatibility_depth_zero_stencil_zero_testing_stencil
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
Only a fraction of GMEM was being used by the color CCU even in
sysmem mode where it would go unused aside from the portion used by
the depth CCU. This can help with color CCU bottlenecks on both
A6XX and A7XX.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
The tile align size was incorrect resulting in certain invalid bins
being selected that would cause rendering to entirely break down. In
addition, the maximum tile size has been further increased on A7XX.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>