This works around graphics corruption seen on MTL and DG2 platforms
when sampling from a HIZ-CCS depth surface that was previously fast
cleared and resolved for sampling. Apparently full resolves no longer
guarantee that the CCS surface ends up in a pass-through state due to
the behavior of the L3 cache in presence of compressible data. In
order to work around the problem this makes sure that we use a
CCS-enabled AUX mode for depth textures if the base surface has a CCS
control surface, even if we are instructed to use ISL_AUX_USAGE_NONE.
This appears to fix the corruption without the need to add extra L3
flushes after resolves (as was done in the Vulkan driver, see
5178ad761c).
v2: Use ISL_AUX_USAGE_HIZ_CCS_WT instead of ISL_AUX_USAGE_HIZ_CCS
usage to represent the requirements of sampling from a depth
surface (Nanley).
v3: Add some comments, remove redundant check, disallow creation of
ISL_AUX_USAGE_NONE surface state for depth sampler views since the
hardware is buggy (Nanley).
v4: Preserve use of ISL_AUX_STATE_CLEAR when fast-clearing a surface
(Nanley).
v5: Set ISL_AUX_STATE_COMPRESSED_NO_CLEAR state after clearing a HiZ
CCS WT resource on xe2+ (Nanley).
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31139>
The clear color state has to be allocated since we will be sampling
from non-WT HiZ CCS depth surfaces without disabling compression.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31139>
Documentation is kinda of ambiguos but at least gfx12.5 is allowed to
do hierarchial depth buffer write through for multi sampled surfaces.
BSpec: 46965
BSpec: 56419
Suggested-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31139>
This state is helpful to track when resolves are needed for HiZ-CCS
non-WT surfaces, since the ISL_AUX_STATE_COMPRESSED_* states that
currently exist don't distinguish between the CCS and the HiZ surfaces
being in a non-passthrough compression state, so we would have had to
pre-emptively issue a resolve before sampling from any
ISL_AUX_STATE_COMPRESSED_* HiZ-CCS surface just in case its HiZ
surface has non-trivial contents, even if its HiZ surface is in
pass-through state and the surface only has non-trivial CCS
compression.
This commit introduces a new ISL_AUX_STATE_COMPRESSED_HIER_DEPTH state
that indicates that the hierarchical depth surface has non-trivial
contents that have to be considered to get a complete representation
of the image. While in this state the surface may also have
fast-cleared blocks. The pre-existing ISL_AUX_STATE_COMPRESSED_*
states now unambiguously indicate that the HiZ surface is in an
identity state, so it's unnecessary to obtain a complete
representation of the image e.g. while sampling from a HiZ-CCS depth
surface.
v2: Use more abstract aux state name instead of
ISL_AUX_STATE_COMPRESSED_HIZ, don't transition legacy HIZ surfaces
to new aux state on write by using COMPRESS write behavior instead
of COMPRESS_HIZ (Nanley).
v3: Comment clarifications (Nanley).
v4: Re-apply change to transition legacy HIZ surfaces to new aux state
on write by using COMPRESS_HIZ for consistent semantics of the aux
state irrespective of the aux usage, this is particularly
important because the HIZ aux usage coexists with HIZ_CCS in some
platforms, so pretending write_behavior is just "COMPRESS" for HIZ
as on v2 would cause the ISL_AUX_STATE_COMPRESSED_CLEAR state to
have different meaning and require different handling depending on
the aux usage that was used with the surface before.
v5: Additional comment clarifications, express aux_state_possible()
result and isl_aux_prepare_access() check in terms of
aux_usage_info::write_behavior (Nanley). Move changes in behavior
for ISL_AUX_STATE_CLEAR from future ISL partial resolve commit
into this commit since the change is already required for
correctness as part of the split of hierarchical depth states.
Acked-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31139>
Use fd after dup instead of the one before dup to avoid
drm_syncobj_find failed in guest kernel when dev is found in
dev_list.
When dev is not found in dev_list, it uses device fd which is
duplicated, to init sync provider. And when it's found, the same
device fd should be used. Otherwise, it would caused inconsistency
and failures like in the Android domU CTS test where the guest
kernel attempts to locate a syncobj. This occurs because
vdrm_device_connect and VIRTGPU_EXECBUFFER ioctl use fd after dup
while util_sync_provider_drm uses the one before dup.
The fix has been validated with the CtsSdkSandboxWebkitTestCases in
Android domU, and the previously failing test cases no longer occur.
Signed-off-by: Ruitang.Wang@amd.com
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39520>
This commit adds support for masked clear operations in the BLT path,
allowing partial clears of specific color channels and stencil bits.
For color clears, calculate which bits to clear based on the clear_mask
by examining the format's channel layout. The clear_bits field is now
set according to the mask instead of clearing all channels.
For stencil clears, use the clear_mask parameter through to mask the
stencil bits in the S8_UINT_Z24_UNORM format path, which was previously
hardcoded to 0xff.
Update etna_blt_will_fastclear() to check that clear_mask is 0xf (all
channels) before allowing fast clear, since masked clears require the
full clear path.
Enable the clear_masked capability when BLT is available and the
BLT_64bpp_MASKED_CLEAR_FIX cap is supported.
Passes the following dEQPs:
- dEQP-GLES2.functional.*_clear.*masked*
- dEQP-GLES3.functional.*_clear.*masked*
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31512>
Add a new PIPE_CAP_CLEAR_MASKED capability that allows drivers to
handle buffer clears with color and stencil masks directly, instead
of falling back to drawing a quad in Mesa.
This patch introduces several changes:
1. Add the new pipe cap PIPE_CAP_CLEAR_MASKED to pipe_defines.h and
document it in the Gallium screen documentation.
2. Add color_clear_mask and stencil_clear_mask parameters to the
pipe_context::clear() hook:
- color_clear_mask (uint32_t): contains 4 color mask bits per draw buffer
(max 8 buffers = 32 bits)
- stencil_clear_mask (uint8_t): contains the stencil write mask (8 bits)
3. Update the state tracker to use the masked clear path when the
driver supports it:
- Pass ctx->Color.ColorMask for color buffer clears
- Pass ctx->Stencil.WriteMask for stencil clears
- Allow both color and stencil clears to avoid the quad path when
masks are present and the driver advertises support
4. Update all existing driver clear() hooks to accept the new
color_clear_mask and stencil_clear_mask parameter.
This optimization allows drivers that can efficiently handle masked
clears in hardware to do so, improving performance for applications
that frequently clear buffers with masks enabled.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31512>
Enable CCS with Ys on all systems, and with Yf on gfx9-11.
Unfortunately, Yf + CCS isn't supported on gfx12. Tests fail and systems
hang in the CI with this enabled. The simulator also complains about
this combination on tests such as:
dEQP-VK.api.image_clearing.core.clear_color_attachment.multiple_layers.r4g4b4a4_unorm_pack16
dEQP-VK.api.image_clearing.core.clear_color_attachment.single_layer.r4g4b4a4_unorm_pack16_200x180_sample_count_2
The simulator doesn't complain about this combination on depth/stencil
surfaces, but actual hardware still has issues with this.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11057
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38063>
BSpec 46969 (r45602) tells us that we get no fast-clears for 3D:
3D/Volumetric surfaces do not support Fast Clear operation.
For Y-tiled surfaces, we work around this in BLORP with
convert_rt_from_3d_to_2d(). However, that function doesn't support Ys-tiling.
We could modify our surface redescription code paths to support clearing
entire Ys tiles, but we choose to hold off on the added complexity until
we have a use-case.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38063>
We'll use isl_surf_supports_ccs() in a scenario in which we want to
check for CCS support without creating a HIZ or MCS surface beforehand.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38063>
When choosing between the suggested tilings, create one of each allowed
and pick the smallest one. One benefit of using the standard tilings is
that miptails can avoid space waste in mipmapped compressed textures.
From the ICL PRM, Volume 5: Memory Data Formats, "MIP Layout":
If Tiling is enabled, then each MIP is layed out using one or more
tiles. If TileYf or TileYs tiling is enabled (TR_MODE != NONE), then
some of the MIPs may actually be stored in a MIPTail which fits in a
single 64K or 4K tile. The layout above, then only applied to MIPs
which are not packed in the MIP Tail. Note that, depending on surface
height the Vertical Alignment that surface can actually have the last
few mips layed out below LOD1. Using MIP Tail (if supported)
eliminates this possibility.
In the performance CI, this helps:
* Hogwarts Legacy on DG2 by 0.64%
* Satisfactory on BMG by 0.89%
* Wukong on BMG by 0.77%
Highlights on memory saved by using Tile64 from at most 10k frames in
game traces on DG2:
* Hogwarts. 32 instances of:
Saved 128 4KB page(s). extent=4096x4096x1 dim=2d levels=13 fmt=BC7_UNORM
* Assassin's Creed. 8 instances of:
Saved 768 4KB page(s). extent=120x68x192 dim=3d levels=1 fmt=R16G16B16A16_FLOAT
* Black Ops 3. 3 instances of:
Saved 864 4KB page(s). extent=172x140x288 dim=3d levels=1 fmt=BC6H_UF16
* God of War. 1 instance of:
Saved 1920 4KB page(s). extent=320x170x192 dim=3d levels=1 fmt=R16G16B16A16_FLOAT
This patch may cause regressions on SKL-TGL because the smaller surface
may not support compression. This will be fixed in a coming patch.
v2. Don't factor in the image alignments when comparing their sizes.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14074
Reviewed-by: Rohan Garg <rohan.garg@intel.com> (v1)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38063>
Does nothing for now. This will be used in future patch where a
64K-aligned image may be selected over a 4K-aligned one.
Follows the alignment request behavior specified in
VkImageAlignmentControlCreateInfoMESA. Specifically, this preference
does not override attempts by ISL to enable compression.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38063>
Prevent assert failures in a future commit where Tile64 will be selected
more often.
Fixes: 42ef23ecd1 ("intel/blorp: Don't redescribe some Tile64 clears")
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38063>
ISL's tiled-memcpy functions don't support Yf, Ys, and Tile64. Remove
those tilings when creating an image which will be used with host-image
copies.
The identical memory layout flag is checked by tests such as:
dEQP-VK.image.host_image_copy.identical_memory_layout.optimal.bc5_snorm_block
dEQP-VK.image.host_image_copy.query.linear.r16_unorm
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38063>
We don't actually handle this case. The next patch will limit the amount
of tilings used when an image is created with
VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT. This prevents zink failures on DG2
for various multisampled test cases. For example:
arb_internalformat_query2-internalformat-size-checks -auto -fbo
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38063>
The missing bits for correct operation with compressed textures and
multisampled textures were added in previous commits.
The issues with lossless compression and higher miptail slots seem to
affect 128bpb formats as well. However, we're only failing tests which
use compression (even if those tests never actually use the compression
format, just blorp_copy() up and down). Limit the workaround only to
compressed formats until we get more information/testing.
Tests:
dEQP-VK.api.copy_and_blit.core.image_to_buffer.3d_images.mip_copies_etc2_r8g8b8a8_unorm_block_16x8x24
dEQP-VK.pipeline.monolithic.sampler.view_type.3d.format.astc_10x6_unorm_block.mipmap.linear.lod.select_bias_3_1
dEQP-VK.api.copy_and_blit.core.image_to_buffer.2d_images.mip_copies_astc_12x12_unorm_block_64x192
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38063>
This will be used to clarify some undocumented restrictions with 64bpb
and 128bpb formats. Changes include:
* Drop a redundant tiling check
* Restrict workarounds to the right ISL_SURF_DIM
* Handle the Yf case for the 2D workaround
* Implement a narrower workaround for the 3D workaround
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38063>
Allow them in all cases except for one which prevents
dEQP-GLES31.functional.image_load_store.3d.atomic.xor_r32i_return_value
from hitting the following assertion on TGL:
convert_rt_from_3d_to_2d:
Assertion `!isl_tiling_is_std_y(info->surf.tiling)' failed.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38063>
GL only allows atomics on R32 formats. So, for a shader which does
atomic operations, only decompress the bound R32-formatted images
instead of every image.
Aside from the performance improvement, explicitly limiting the formats
here makes it clear which formats may be resolved on gfx12.0. This helps
us to limit the scope of the Ys + 3D-dim restriction that will be added
in the next patch.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38063>
ISL prevents certain tilings from being used on 3D shader images prior
to gfx12 due to an undocumented dataport issue. We're going to allow
these tilings soon, so increase use of the shader flag to make use of
ISL's workaround.
Test case:
arb_shader_image_load_store-layer
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38063>
The BO may contain a surface that is tiled with a 64K tiling. Without
this change, the following piglit test assert fails on ICL:
ext_external_objects-vk-stencil-display -auto -fbo
The assertion is:
isl_gfx11_emit_depth_stencil_hiz_s: Assertion
`info->depth_address % info->depth_surf->alignment_B == 0' failed.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38063>
Prevents the following piglit test from failing on DG2 when Tile64 is
force-enabled:
fbo-clear-formats GL_ARB_texture_rg -auto -fbo
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38063>
When determining if an LOD can fit within a miptail, we must minify in
pixel space and then convert to elements.
Prevents the following test case from failing when Yf is force-enabled:
dEQP-VK.image.texel_view_compatible.graphic.extended.3d_image.texture_read.astc_8x5_srgb_block.r32g32b32a32_uint
Fixes: 46f45d62d1 ("intel/isl: Start using miptails")
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38063>
This bit seems to affect whether the SKL or ICL swizzles are used for
multisampled surfaces.
Prevents the following test case from failing when Yf is force-enabled:
dEQP-VK.pipeline.monolithic.multisample.misc.dynamic_rendering.multi_renderpass.r8g8b8a8_unorm_r16g16b16a16_sfloat_r16g16b16a16_sint_d32_sfloat_s8_uint.random_203
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38063>
From the ICL PRMs Volume 5: Memory Data Formats, "Compressed
Multisampled Surfaces":
Tiling for CMS and UMS Surfaces
Multisampled CMS and UMS use a modified table from
non-mulitsampled 2D surfaces.
[...]
TileYS: In addition to u and v, the sample slice index “ss” is
included in the address swizzling according to the following
table.
[...]
TileYF: In addition to u and v, the sample slice index “ss” is
included in the address swizzling according to the following
table.
For depth/stencil surfaces with Yf/Ys tiling, don't use the MSAA
swizzles.
With the driver modified forced to prefer Ys/Yf for depth buffers, this
fixes 14 failing tests in the VK CTS group:
dEQP-VK.pipeline.monolithic.multisample.misc.clear*16x*
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38063>