radv_can_dump_shader() already handles if module is NULL.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
These are going to be crazy and we are probably going to add
more scan stuff in the future. Also use switch cases instead.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
I don't think the hw resolve path can't handle multi-layer images.
This fixes all the:
dEQP-VK.renderpass.multisample_resolve.layers_*
tests on my VI card.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: <mesa-stable@lists.freedesktop.org>
This path should iterate across all layers, I've some ideas
for doing this in a single pass, but this is simpler for now.
This passes the tests because we don't use the fragment path
unless we have DCC, and we don't have DCC on layered images.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: <mesa-stable@lists.freedesktop.org>
For a multi-layer subpass resolve we want to make sure we flush all
the layers.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: <mesa-stable@lists.freedesktop.org>
When VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT is set we skip NIR
linking optimisations and only run over the NIR optimisation loop
once similar to the GLSLOptimizeConservatively constant used by
some GL drivers.
We need to run over the opts at least once to avoid errors in LLVM
(e.g. dead vars it can't handle) and also to reduce the time spent
compiling the IR in LLVM.
With this change the Blacksmith Unity demos compilation times
go from 329760 ms -> 299881 ms when using Wine and DXVK.
V2: add bit to radv_pipeline_key
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106246
These helpers will be needed for future work.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This can result in 2x increase in performance on non-harvested Kaveris.
v2: don't do it on radeon
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This seems to be broken at the moment for opengl interop.
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
It's a per-application optimization, so it makes more sense
to do that in radv_handle_per_app_options().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This fixes
dEQP-VK.api.device_init.create_instance_invalid_api_version
CC: 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Apparently the somewhere between 1.1.70 and 1.1.73 the loader started
depending on this. The loader then creates a 1.0 instance, which gets
into funny situation because we have a 1.1 device.
No idea how to do line wrapping in Mako though, my random guesses
did not work.
CC: 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Previously before fb077b0728, the LOD parameter was being used in place of the
sample index, which would only copy the first sample to all samples in the
destination image. After that multisample image copies wouldn't copy anything
from my observations.
This fixes some copy_and_blit CTS tests.
v3.1: - set lod to 0 for nir_txf_ms (Samuel)
v2: - use GLSL_SAMPLER_DIM_MS instead of 2D (Samuel)
- updated commit description (Samuel)
Fix this properly by copying each sample in a separate radv_CmdDraw and using a
pipeline with the correct rasterizationSamples for the destination image.
Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
As the implementation is conservative, we can now enable it
by default. It can be disabled with RADV_DEBUG=nooutoforder.
Don't expect much more than 1% of improvements, but the gain
seems consistent.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Only count color attachments twice if resolves are used, also
account for the depth stencil attachment if present.
Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This is needed for gfx9 and later for all fmask surface index.
(Mentioned by Marek on irc)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
A bo's ref_count was not being initialized when imported from an fd.
Therefore, we would fail to free the resource during VkFreeMemory().
This patch fixes applications like hifi VR in threaded mode, which
perform frequent imports/releases of IPC shared memory.
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
CC: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
If loading 64-bit vec3 values, a 4 component load would be followed
by a 2 component load and the resulting shuffle would fail as it
requires 2 4 components. This just expands the second results
vector out to 4 components.
This fixes 100 CTS tests:
dEQP-VK.spirv_assembly.type.vec3.*64*
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
v2: require the previous level to be clearable for determining whether
the last unaligned level is clearable
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
1D textures are allocated as 2D which means we only need
one coordinate for texture query LOD.
Fixes: 625dcbbc45 ("amd/common: pass address components individually to
ac_build_image_intrinsic")
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>