Apparently, this is a major footgun since it is not uncommon for apps to
enable all the features exposed by a driver. Having UBWC disabled for
D24S8 can result in a major performance loss, and the reason can be hard
for devs to spot. This footgun is already known to have happened a few
times. Furthermore, disabling UBWC depending on a Vulkan feature being
requested broke D24S8 sharing via external memory when only one device
was created with customBorderColorWithoutFormat.
Fortunately, there is the depthStencilSwizzleOneSupport feature, which
was added after the above hardware deficiency was found and, when false,
forbids the problematic state combination.
To prevent the footgun described above, we now set
depthStencilSwizzleOneSupport to false by default. This allows UBWC to be
enabled for D24S8 in all cases while remaining conformant. We also have
the tu_enable_d24s8_border_color_workaround driconf option, which enables
the previous workaround for apps that don't know about
depthStencilSwizzleOneSupport; currently the only such app is the ANGLE
translation layer.
One caveat is that we cannot use the fast border color HW feature for
D24S8+USAGE_SAMPLED+VK_FORMAT_UNDEFINED, so a new driconf toggle,
enable_fast_border_color_for_undefined_formats, is added. It is set for
DXVK and vkd3d-proton since they are known not to use border colors with
D24S8.
Lacking fast border colors is a much smaller penalty than not having UBWC
for D24S8.
For some context also see: https://gitlab.khronos.org/Tracker/vk-gl-cts/-/issues/4346
This partially reverts 36916949.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41514>
Suspending render pass jobs have more things than render targets to
preserve, e.g. occlusion query related information, atomic / compute
overlap enablement information etc.
Preserve them too when suspending. When resuming, OR the boolean
properties together and assign the other preserved values. This ensures
that the last resuming fragment job is compatible with all suspending
geometry jobs, since the fragment job is omitted for suspending render
passes.
The case where the suspending render pass and the resuming render pass
use different query pools is still not supported, and would be quite
difficult to support.
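The merge rule on resume can be sketched roughly as follows. This is only
an illustration: the struct, its field names and the helper are
hypothetical and do not match the driver's actual data structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the state preserved across a suspend. */
struct preserved_state {
   bool uses_occlusion_query; /* boolean property: OR'ed on resume */
   bool uses_atomic_ops;      /* boolean property: OR'ed on resume */
   uint32_t query_pool_id;    /* non-boolean value: assigned on resume */
};

/* Fold the state saved by the suspending pass into the resuming pass. */
static void
merge_on_resume(struct preserved_state *resume,
                const struct preserved_state *suspended)
{
   /* Booleans: the final fragment job must honour anything that any
    * suspending geometry job required. */
   resume->uses_occlusion_query |= suspended->uses_occlusion_query;
   resume->uses_atomic_ops |= suspended->uses_atomic_ops;

   /* Everything else is taken over from the suspended pass. */
   resume->query_pool_id = suspended->query_pool_id;
}
```

Because the non-boolean values are plainly assigned, a resuming pass with
a different query pool would lose its own pool in this sketch, which
mirrors the unsupported case noted above.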
Backport-to: 26.0
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Nick Hamilton <nick.hamilton@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41002>
As more data than just render targets needs to be kept for suspending
render passes, add a structure to organize it.
Backport-to: 26.0
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Nick Hamilton <nick.hamilton@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41002>
As we're going to kick frag for suspending render passes to mitigate
frag job inconsistency between suspending render passes and resuming
render passes, deriving render target datasets based on the
geometry_terminate property will be incorrect.
Stop using geometry_terminate to decide whether to remember render
target datasets; use is_suspend directly instead.
In addition, is_resume is now also used instead of checking whether
suspended render target datasets are available. This will help when
either the suspending render pass or the resuming render pass has
multiple graphics sub_cmds.
Backport-to: 26.0
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Nick Hamilton <nick.hamilton@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41002>
When executing a secondary command buffer outside a renderpass, the
sub_cmds of that secondary command buffer are simply copied into the
primary command buffer. However, the 4 flags outside the type-specific
structures are not copied. Although the owned flag is intentionally set
to false, the other 3 flags should be preserved.
Copy these 3 flags when executing sub_cmds of a secondary command buffer
outside renderpasses.
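A minimal sketch of the intended copy, assuming a simplified sub_cmd
layout. Only the owned flag is named in this message; flag_a, flag_b and
flag_c are hypothetical placeholders for the other three flags, and the
helper name is made up for illustration.

```c
#include <stdbool.h>

/* Simplified, hypothetical sub_cmd: a type plus the 4 common flags. */
struct sub_cmd {
   int type;
   bool owned;
   bool flag_a; /* placeholder name */
   bool flag_b; /* placeholder name */
   bool flag_c; /* placeholder name */
};

/* Copy a secondary sub_cmd into a primary command buffer slot. */
static void
copy_secondary_sub_cmd(struct sub_cmd *dst, const struct sub_cmd *src)
{
   dst->type = src->type;

   /* The primary does not own the secondary's resources. */
   dst->owned = false;

   /* The remaining flags must survive the copy. */
   dst->flag_a = src->flag_a;
   dst->flag_b = src->flag_b;
   dst->flag_c = src->flag_c;
}
```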
Backport-to: 26.0
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Nick Hamilton <nick.hamilton@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41002>
The attachments field of the render pass state could be
MESA_VK_RP_ATTACHMENT_INFO_INVALID, which indicates no attachment
information is valid. If this happens when initializing the fragment
state of a pipeline, neither a render pass nor a
VkPipelineRenderingCreateInfo structure is available. In this case, the
specification for that structure says colorAttachmentCount is treated as
0, so the loop iterating over color attachments should not run at all.
Skip iterating over color attachments if the attachments field of the
render pass state is MESA_VK_RP_ATTACHMENT_INFO_INVALID.
This fixes a regression in the Vulkan CTS test case
dEQP-VK.pipeline.monolithic.misc.no_rendering introduced by !40870, which
sets the attachments field of the render pass state to
MESA_VK_RP_ATTACHMENT_INFO_INVALID instead of 0 if neither a render pass
nor a VkPipelineRenderingCreateInfo structure is available.
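The guard amounts to something like the sketch below. The struct, the
constant's value and the helper are hypothetical stand-ins; only the idea
of treating the invalid marker as "zero color attachments" comes from
this change.

```c
#include <stdint.h>

/* Hypothetical stand-in for MESA_VK_RP_ATTACHMENT_INFO_INVALID; the real
 * value in the Vulkan runtime may differ. */
#define RP_ATTACHMENT_INFO_INVALID 0xffffffffu

/* Simplified render pass state for illustration only. */
struct rp_state {
   uint32_t attachments;
   uint32_t color_attachment_count;
};

/* Returns how many color attachments were visited. */
static uint32_t
visit_color_attachments(const struct rp_state *rp)
{
   uint32_t visited = 0;

   /* No valid attachment info: behave as if colorAttachmentCount == 0. */
   if (rp->attachments == RP_ATTACHMENT_INFO_INVALID)
      return 0;

   for (uint32_t i = 0; i < rp->color_attachment_count; i++)
      visited++; /* per-attachment fragment state setup would go here */

   return visited;
}
```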
Fixes: 1950b6c1a7 ("vulkan: mark RP attachments as invalid when no rendering create info")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Acked-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41032>
This reverts commit 2ee6b4d96e.
The previous change avoided a 0.25MB (1%) change in the size of the
driver binary, but it blocks runtime enablement of some Intel tools that
are critical to our optimization tasks.
Given the new need for these tools at runtime, that is not a good
tradeoff, so revert this change.
Test: meson setup builddir -Dallow-fallback-for=libdrm -D build-tests=true -Dbuildtype=release --reconfigure && ninja -C builddir && cd builddir && meson test
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: hwandy <hwandy@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41525>
This allows us to use LD_VAR_BUF instead of LD_VAR when the shaders are
linked together.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40761>
This will make it easier to create new default keys in other places.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40761>
pan_varying_layout contains both layout and format. In lower_fs_inputs,
however, the layout refers to the VS layout, and its format might differ
from what the FS layout expects. We cannot use the VS format as the FS
format, otherwise we risk interpolating an integer.
Fixes: 66bee415ad ("pan/compiler: Split lower_varyings_io into fs_inputs and vs_outputs")
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40761>
textureGrad() has to be split into two halves on Mali: computing the
gradient/LOD and doing the actual texture operation. On Valhall, we do
this with LOD_MODE_GRDESC but on Bifrost, we use LOD_MODE_EXPLICIT. When
converting to NIR, I missed this.
Fixes: 05a066c921 ("pan/nir: Add bifrost support to pan_nir_lower_tex()")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41513>
Trailing zeroes should be harmless, but they seem to cause issues with
the latest ffmpeg (which looks like an ffmpeg bug).
The extra bytes are useless, so we can just skip them like we already
do on VCN to work around it.
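Skipping the trailing zeroes boils down to shrinking the reported size, as
in this minimal sketch. The helper name is hypothetical and the actual
driver code may apply the trimming at a different point.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the size of the buffer once trailing zero bytes are dropped. */
static size_t
size_without_trailing_zeroes(const uint8_t *data, size_t size)
{
   while (size > 0 && data[size - 1] == 0)
      size--;

   return size;
}
```

Zero bytes in the middle of the data are left alone; only the run at the
end of the buffer is dropped.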
Cc: mesa-stable
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41485>
SDMA doesn't support MSAA, but it's still possible to get there because
RADV falls back to the compute queue in this case.
Some tests only pass because RDNA2 and older don't support image
stores with depth/stencil and MSAA.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41492>
allows deleting piles of moves & pressure.
simd16 results:
Totals:
Instrs: 2759547 -> 2753358 (-0.22%); split: -0.29%, +0.06%
CodeSize: 41141280 -> 41071072 (-0.17%); split: -0.23%, +0.06%
Totals from 332 (12.54% of 2647) affected shaders:
Instrs: 648080 -> 641891 (-0.95%); split: -1.23%, +0.28%
CodeSize: 9782272 -> 9712064 (-0.72%); split: -0.97%, +0.25%
simd32 is a loss because of RA being stupid. again, this is obviously the right
thing to do so we're doing it. stats are just a hint.
Totals:
Instrs: 4683556 -> 4689193 (+0.12%); split: -0.25%, +0.37%
CodeSize: 70072256 -> 70171920 (+0.14%); split: -0.23%, +0.38%
Number of spill instructions: 50320 -> 50316 (-0.01%)
Number of fill instructions: 51530 -> 51526 (-0.01%)
Totals from 351 (13.26% of 2647) affected shaders:
Instrs: 1349954 -> 1355591 (+0.42%); split: -0.86%, +1.28%
CodeSize: 20484224 -> 20583888 (+0.49%); split: -0.80%, +1.29%
Number of spill instructions: 21762 -> 21758 (-0.02%)
Number of fill instructions: 26328 -> 26324 (-0.02%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41510>
this is both a correctness fix (insufficient MEM registers reserved in some
cases) and a performance fix (unnecessary allocations & zeroing in the RA when
we don't spill).
fixes dEQP-VK.dgc.ext.compute.misc.scratch_space
stats are noise but positive i guess.
Totals from 35 (1.32% of 2647) affected shaders:
Instrs: 396770 -> 396690 (-0.02%)
CodeSize: 6040832 -> 6039600 (-0.02%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41510>
poking around, it seems branches stall the pipelines so we don't need to do any
dataflow analysis, but we do need to fall through for correctness. just keep
going across block boundaries. this isn't optimal yet but it reduces a
pile of A@1's already.
Totals from 1389 (52.47% of 2647) affected shaders:
CodeSize: 56385376 -> 56325776 (-0.11%); split: -0.13%, +0.03%
--
this also fixes issues where the first instruction of a block is a SEND that has
an unmet register dependency, since the old code was fundamentally broken. oops.
lol. fixes
dEQP-VK.compute.pipeline.workgroup_memory_explicit_layout.zero.uint8_t_array_to_uint_array_1
among many others.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41510>
Lets us use more accumulators, I think this is well motivated. Saw this in a
test shader.
Totals from 242 (9.14% of 2647) affected shaders:
Instrs: 1365060 -> 1365035 (-0.00%); split: -0.00%, +0.00%
CodeSize: 20678592 -> 20680096 (+0.01%); split: -0.01%, +0.02%
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41510>