There are a bunch of optimizations that are broken when DPP is involved.
fossil-db (Sienna Cichlid):
Totals from 100 (0.07% of 150170) affected shaders:
CodeSize: 325204 -> 325192 (-0.00%); split: -0.06%, +0.05%
Instrs: 62773 -> 62664 (-0.17%); split: -0.18%, +0.00%
Latency: 295348 -> 295266 (-0.03%); split: -0.03%, +0.00%
InvThroughput: 73990 -> 73946 (-0.06%); split: -0.06%, +0.01%
Copies: 1650 -> 1609 (-2.48%); split: -2.55%, +0.06%
PreSGPRs: 3554 -> 3520 (-0.96%)
Fossil-db changes are probably because v_sub_f32_dpp(v_mul_f32) is no
longer being combined into MAD and then split back into separate
instructions.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11924>
Images that support comp-to-single don't have to be fast-cleared at
all, so the predicate is unnecessary.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12323>
It doesn't make sense to ask for the depth-only or stencil-only format
if there is no depth or stencil. One bit of radv_image.c did seem to
take advantage of the default case in vk_format_depth_only so throw an
`if (vk_format_has_depth(format))` around it.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12023>
viewportCount is the number of viewports in pViewports while
firstViewport is the index.
Fixes new CTS dEQP-VK.draw.depth_clamp.*_clamp_four_viewports
Fixes: a2ef92d7a5 ("radv: pre-calculate viewport transforms")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12353>
It seems we can enable DCC if the possible formats differ in signedness
and are otherwise compatible. We just need a fast-clear eliminate for
certain clear colors.
Improves Trine 4 performance.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9387>
Only GFX10+ is affected because older chips don't support
comp-to-single. For them, we need to implement FCE on compute with DCC
and eventually CMASK.
Fixes the gap between concurrent vs exclusive queue with Scarlet Nexus,
also gives a boost with Doom Eternal.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12088>
When an image supports comp-to-single, DCC is cleared to 0x10 (single)
and the clear color value is written to the beginning of each 256B
block in the image.
This allows to skip FCE.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10518>
When DCC is cleared with that code, the hardware expects the clear
color value to be stored at the beginning of each 256B block in
the image.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10518>
The Vulkan spec requires ~0 for 1x1.
Fixes dEQP-VK.fragment_shading_rate.misc.shading_rates.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12245>
Minimum required value is 16 but we support up to 32
(2x2 VRS with MSAA 8x).
Fixes dEQP-VK.fragment_shading_rate.misc.limits.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12245>
From the Vulkan spec 1.2.187.
"fragmentShadingRateWithCustomSampleLocations specifies whether
custom sample locations are supported for multi-pixel fragments.
It must be VK_FALSE if VK_EXT_sample_locations is not supported."
VK_EXT_sample_locations is disabled on GFX10+.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12245>
It's a RMW operation, also note that DB doesn't use L2 on GFX6-8.
Fixes test_clear_depth_stencil_view() and test_discard_resource() tests
from vkd3d-proton.
Cc: 21.2 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12223>
In the compute dispatch path we do not allocate a huge amount
of space to cover everything so the individual functions have to
allocate. This was missing here, causing a hang in Cyberpunk when
accessing the system menu at some locations with thread tracing
enabled.
Fixes: bd1186572f ("radv: add support for push constants inlining when possible")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12271>
Lots of the MAX2 args end up subtracting two unsigned numbers, which
blows up when the result is negative.
Fixes: 4c99d6ff54 ("radv: flush L2 for images affected by the pipe misaligned issue on GFX10+")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12272>
We shouldn't overwrite the clear value of the other aspect (in case
separate depth/stencil layouts are used).
Found by inspection.
Cc: 21.2 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12222>
FMASK_DECOMPRESS can't eliminate DCC fast clears. This will allow to
enable DCC MSAA fast clears that require a FCE.
Only supported on GFX10+.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12180>
The radv_emit_ngg_culling_state function won't write the
SPI_SHADER_PGM_RSRC2_GS register when it knows in advance that
radv_emit_graphics_pipeline will overwrite it anyway.
However, there is an unhandled case:
radv_emit_graphics_pipeline will not emit anything (including this
register) when the pipeline is already emitted. Hence, improve
the check in radv_emit_ngg_culling_state to consider this.
Fixes: 9a95f5487f
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12237>
Since copies for the successor's linear phis are inserted before the
branch, we should consider the definitions and operands of the successor's
linear phis.
Fixes a Detroit: Become Human spilling failure with GCM+GVN.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12035>
This aligns RADV with RadeonSI in how it handles late alloc,
making it easier for us to deal with deadlocks and such.
Also move setting the RSRC3 registers for VS, GS and NGG
into radv_pipeline.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11905>
We are going to add this directly to the pipeline.
If a pipeline has such a shader, NGG culling is turned on
most of the time, so it's not useful to toggle this setting.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11905>
Navi 10 can hang when an NGG workgroup has no output,
so we work around that by always exporting a single zero-area
triangle with a single vertex that has all-NaN coordinates.
Thus far, we only employed this for NGG GS, because on all
other stages, the output can't be empty.
However, with NGG culling, the output can be empty, so let's
apply the same workaround there too.
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12169>