It seems we haven't encountered this before because
nir_lower_io_to_scalar_early usually scalarizes it.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12486>
This has never been used because it requires knowing the previous
clear values, which is not really possible in Vulkan.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12326>
Using separate aspects is required.
Fixes a few CTS failures (dEQP-VK.api.copy_and_blit.*) when the compute
path is forced in the driver. Note that CTS coverage of the compute
queue is rather limited.
Cc: 21.2 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12287>
This seems to be unsupported.
COMPRESSION_EN=1 and WRITE_COMPRESS_ENABLE=1 don't update HTILE
with image stores.
Note that there is no issue because depth/stencil images will be
decompressed for image stores, and TC-compat HTILE is disabled.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12450>
The driver doesn't enable TC-compat HTILE for storage images, so this
was actually always TRUE.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12450>
There are a bunch of optimizations that are broken when DPP is involved.
fossil-db (Sienna Cichlid):
Totals from 100 (0.07% of 150170) affected shaders:
CodeSize: 325204 -> 325192 (-0.00%); split: -0.06%, +0.05%
Instrs: 62773 -> 62664 (-0.17%); split: -0.18%, +0.00%
Latency: 295348 -> 295266 (-0.03%); split: -0.03%, +0.00%
InvThroughput: 73990 -> 73946 (-0.06%); split: -0.06%, +0.01%
Copies: 1650 -> 1609 (-2.48%); split: -2.55%, +0.06%
PreSGPRs: 3554 -> 3520 (-0.96%)
Fossil-db changes are probably because v_sub_f32_dpp(v_mul_f32) is no
longer being combined into MAD and then split back into separate
instructions.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11924>
Images that support comp-to-single don't have to be fast-cleared at
all, so the predicate is unnecessary.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12323>
It doesn't make sense to ask for the depth-only or stencil-only format
if there is no depth or stencil. One bit of radv_image.c did seem to
take advantage of the default case in vk_format_depth_only so throw an
`if (vk_format_has_depth(format))` around it.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12023>
viewportCount is the number of viewports in pViewports while
firstViewport is the index.
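
A minimal sketch of the indexing described above. The struct and function
names (viewport_state, viewport, set_viewports) and MAX_VIEWPORTS are
hypothetical stand-ins for illustration, not the driver's actual code:
the source array is indexed by i, the destination by firstViewport + i.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_VIEWPORTS 16 /* hypothetical limit, for illustration only */

/* Hypothetical stand-in for the driver's per-viewport state. */
struct viewport_state {
   float min_depth[MAX_VIEWPORTS];
   float max_depth[MAX_VIEWPORTS];
};

/* Hypothetical stand-in for VkViewport, reduced to the depth range. */
struct viewport {
   float minDepth;
   float maxDepth;
};

static void
set_viewports(struct viewport_state *state, uint32_t firstViewport,
              uint32_t viewportCount, const struct viewport *pViewports)
{
   assert(firstViewport + viewportCount <= MAX_VIEWPORTS);

   /* viewportCount bounds the loop over pViewports; firstViewport is only
    * the offset into the destination state. */
   for (uint32_t i = 0; i < viewportCount; i++) {
      state->min_depth[firstViewport + i] = pViewports[i].minDepth;
      state->max_depth[firstViewport + i] = pViewports[i].maxDepth;
   }
}
```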
Fixes new CTS dEQP-VK.draw.depth_clamp.*_clamp_four_viewports
Fixes: a2ef92d7a5 ("radv: pre-calculate viewport transforms")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12353>
It seems we can enable DCC if the possible formats differ in signedness
and are otherwise compatible. We just need a fast-clear eliminate for
certain clear colors.
Improves Trine 4 performance.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9387>
Only GFX10+ is affected because older chips don't support
comp-to-single. For them, we need to implement FCE on compute with DCC
and eventually CMASK.
Fixes the gap between concurrent and exclusive queues with Scarlet Nexus
and also gives a boost with Doom Eternal.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12088>
When an image supports comp-to-single, DCC is cleared to 0x10 (single)
and the clear color value is written to the beginning of each 256B
block in the image.
This allows skipping FCE.
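
A CPU-side sketch of the image layout described above, assuming a linear
byte buffer. The helper name write_comp_to_single_color is hypothetical,
and this only models where the clear color ends up in memory; it is not
how the driver performs the clear, and the DCC metadata itself is not
modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 256B block size taken from the description above. */
#define COMP_TO_SINGLE_BLOCK_SIZE 256

/* Hypothetical helper: store the clear color at the beginning of every
 * 256B block of the image and leave the rest of each block untouched. */
static void
write_comp_to_single_color(uint8_t *image, size_t image_size,
                           const void *clear_color, size_t bytes_per_pixel)
{
   assert(bytes_per_pixel <= COMP_TO_SINGLE_BLOCK_SIZE);
   assert(image_size % COMP_TO_SINGLE_BLOCK_SIZE == 0);

   for (size_t offset = 0; offset < image_size;
        offset += COMP_TO_SINGLE_BLOCK_SIZE)
      memcpy(image + offset, clear_color, bytes_per_pixel);
}
```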
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10518>
When DCC is cleared with that code, the hardware expects the clear
color value to be stored at the beginning of each 256B block in
the image.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10518>
The Vulkan spec requires ~0 for 1x1.
Fixes dEQP-VK.fragment_shading_rate.misc.shading_rates.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12245>
Minimum required value is 16 but we support up to 32
(2x2 VRS with MSAA 8x).
Fixes dEQP-VK.fragment_shading_rate.misc.limits.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12245>
From the Vulkan spec 1.2.187:
"fragmentShadingRateWithCustomSampleLocations specifies whether
custom sample locations are supported for multi-pixel fragments.
It must be VK_FALSE if VK_EXT_sample_locations is not supported."
VK_EXT_sample_locations is disabled on GFX10+.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12245>
It's an RMW operation. Also note that DB doesn't use L2 on GFX6-8.
Fixes test_clear_depth_stencil_view() and test_discard_resource() tests
from vkd3d-proton.
Cc: 21.2 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12223>
In the compute dispatch path, we do not allocate a huge amount
of space to cover everything, so the individual functions have to
allocate it themselves. This was missing here, causing a hang in
Cyberpunk when accessing the system menu at some locations with thread
tracing enabled.
Fixes: bd1186572f ("radv: add support for push constants inlining when possible")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12271>
Lots of the MAX2 args end up subtracting two unsigned numbers, which
blows up when the result is negative.
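
A small sketch of the failure mode. MAX2 is redefined locally so the
example is self-contained, and extent_unsigned/extent_signed are
hypothetical names. Doing the subtraction in a wider signed type is one
possible way to avoid the wrap-around, not necessarily what this change
does.

```c
#include <assert.h>
#include <stdint.h>

/* Local MAX2-style macro so the example is self-contained. */
#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Unsigned subtraction wraps around when offset > size, so MAX2 ends up
 * picking the huge wrapped value instead of the lower bound. */
static uint32_t
extent_unsigned(uint32_t size, uint32_t offset)
{
   return MAX2(size - offset, 1u);
}

/* Subtracting in a wider signed type keeps a negative result negative,
 * so MAX2 clamps it to the lower bound as intended. */
static uint32_t
extent_signed(uint32_t size, uint32_t offset)
{
   return (uint32_t)MAX2((int64_t)size - (int64_t)offset, (int64_t)1);
}
```

With size = 4 and offset = 8, extent_unsigned() returns 0xFFFFFFFC while
extent_signed() returns 1.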
Fixes: 4c99d6ff54 ("radv: flush L2 for images affected by the pipe misaligned issue on GFX10+")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12272>
We shouldn't overwrite the clear value of the other aspect (in case
separate depth/stencil layouts are used).
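
A sketch of the intended behavior, with hypothetical names throughout
(ASPECT_DEPTH, ASPECT_STENCIL, ds_clear_metadata and
update_ds_clear_metadata are illustrative, not the driver's): only the
aspects being cleared are updated, so the other aspect's stored clear
value survives.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical aspect bits, mirroring the idea of VkImageAspectFlagBits. */
enum {
   ASPECT_DEPTH = 1u << 0,
   ASPECT_STENCIL = 1u << 1,
};

/* Hypothetical stand-in for the stored depth/stencil clear values. */
struct ds_clear_metadata {
   float depth;
   uint32_t stencil;
};

static void
update_ds_clear_metadata(struct ds_clear_metadata *md, uint32_t aspects,
                         float depth, uint32_t stencil)
{
   /* Touch only the aspects named in the mask. */
   if (aspects & ASPECT_DEPTH)
      md->depth = depth;
   if (aspects & ASPECT_STENCIL)
      md->stencil = stencil;
}
```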
Found by inspection.
Cc: 21.2 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12222>
FMASK_DECOMPRESS can't eliminate DCC fast clears. This will allow
enabling DCC MSAA fast clears that require an FCE.
Only supported on GFX10+.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12180>