For MSAA storage images with DCC, we also need to perform a MSAA
color decompression.
Fixes dEQP-VK.pipeline.multisample.storage_image.* if DCC stores
is enabled.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9854>
The format_feature_flags bitfield is derived from the modifier if
the tiling is set to VK_IMAGE_TILING_DRM_FORMAT_MODIFIER_EXT.
However radv will reset the tiling to either LINEAR or OPTIMAL if
the caller supplied a VkPhysicalDeviceImageDrmFormatModifierInfoEXT
in the chain.
Stop resetting the tiling, so that we can compute the correct feature
flags.
Signed-off-by: Simon Ser <contact@emersion.fr>
Fixes: 6c83e3ea98 ("radv: Add format modifier format queries.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9978>
To remove bubbles during layout transitions from UNDEFINED, especially
with MSAA because we might have all.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10004>
From the Vulkan spec 1.2.172:
"If there is no subpass dependency from VK_SUBPASS_EXTERNAL to the
first subpass that uses an attachment, then an implicit subpass
dependency exists from VK_SUBPASS_EXTERNAL to the first subpass
it is used in."
"Similarly, if there is no subpass dependency from the last subpass
that uses an attachment to VK_SUBPASS_EXTERNAL, then an implicit
subpass dependency exists from the last subpass it is used in to
VK_SUBPASS_EXTERNAL."
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9714>
It's not enabled by default because it requires performance testing.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9919>
Since RADV requires DRM 3.35+, this code can be simplified.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9953>
From the Vulkan spec:
"VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL specifies a
layout for both the depth and stencil aspects of a depth/stencil
format image allowing read only access as a depth/stencil
attachment or in shaders as a sampled image, combined
image/sampler, or input attachment. It is equivalent to
VK_IMAGE_LAYOUT_DEPTH_READ_ONLY_OPTIMAL and
VK_IMAGE_LAYOUT_STENCIL_READ_ONLY_OPTIMAL."
So, it should be safe to keep HTILE compressed if the depth/stencil
image isn't going to be sampled. We could probably extend this
to separate depth/stencil layout but that seems a bit more
complicated.
This gives a huge boost to the deferredmultisampling Vulkan demo.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10008>
Across every driver...
v2: Add casts to appease -fpermissive used on CI.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9477>
It was parsing it as SQ_WAVE_GPR_ALLOC instead of COMMAND.
Change the offset to an odd number to work around it.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9795>
LLVM prints an error if xnack is unsupported and it uses a global stream
object that is not thread-safe. Since Mesa uses multiple threads to compile
shaders, there is a small chance that it will crash.
Just don't set any xnack options to use LLVM defaults.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4439
Cc: 20.3 21.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9795>
This should be equivalent without needed to force enable FMASK for
some specific internal pipelines.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9940>
This is no longer needed since FMASK is also compressed for
transfer dst operations.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9940>
The COMPRESSION bit is FMASK and this is much faster! Should
speedup transfer dst operations with MSAA images considerably.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9990>
This will allow us to reduce the number of situations where the
compiler workaround is needed on GFX10.3.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9924>
this breaks up the monolithic draw path used for all draw calls into
pre/post functions to handle general setup and a couple helper functions
to more directly handle different draw modes in a way that's both more readable
and, potentially, more optimizable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8788>
With compressed DCC writes supported, the image should still be
compressed after resolving using the compute path.
Fixes various dEQP-VK.api.copy_and_blit.core.resolve_image.*
failures with RADV_DEBUG=forcecompress on GFX10.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9908>
MAX2(count * struct size, 1) results in 1 for count=0, not the size of a struct.
Since this MAX only seems to exist so we can keep using NULL for error reporting,
just refactor to return a VkResult.
Fixes: ad241b15a9 ("vk: consolidate dynamic descriptor binding sorting")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4522
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9880>
FMASK/CMASK make no sense with depth/stencil or stencil-only surfaces.
gfx9_compute_miptree() was called twice for stencil-only surfaces,
and the first call was allocating FMASK/CMASK.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9612>
Linux Kernel 4.15+ is now required for RADV, this kernel has been
released 3 years ago and should be in most modern distros.
This allows us to remove a lot of legacy code for fence/semaphore.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9800>
Fixes the following building error:
In file included from external/mesa/src/amd/addrlib/src/gfx9/gfx9addrlib.cpp:36:
external/mesa/src/amd/addrlib/src/chip/gfx9/gfx9_gb_reg.h:43:2: error: "BIGENDIAN_CPU or LITTLEENDIAN_CPU must be defined"
^
NOTE: Android ABI specifies Little Endian for arm and x86 targets
Fixes: 3616e02ef3 ("amd/addrlib: define endianess differently")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9869>
Since image stores can now compress and we can't track image stores
this also stops using predication for DCC decompression.
In GFX10 this was benchmarked to be faster. For GFX10.3 the microbenchmarks
are not as possible though I haven't tested any games, so this is not enabled
there yet.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6796>
For 16x16 we get 4 16x4 waves, which is bad for DCC image stores.
The workgroup size doesn't really matter for speed, the important
part is the number of waves, which should stay constant here.
(Though some optimization would be nice, but out of scope for this
patch)
The compute DCC compress shader still uses 16x16 due to functional
requirements (and we're sure it won't write with DCC compression ...)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6796>
The upper 32 bits are truncated before converting, which still produces
correct results since they never meaningfully contribute to the result.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9597>
The previous code produced incorrect results for inputs outside the
range [INT16_MIN, INT16_MAX].
A problematic case is e.g. i2f16 32768, which previously would be
converted to -32768.0 instead of returning the exactly representable
floating point result.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9597>