By using the geometry shader output usage mask.
This improves all Vulkan demos that use a geometry shader
(ie. geometryshader, deferredshadows, viewportarray).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For reducing the number of parameters that are exported by
the GS copy shader.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
An upcoming patch will run the shader info pass on the
geometry shader just before emitting the GS copy shader.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The hardware always interprets the alpha as unsigned and fixing it
in the shader is going to add unacceptable overheads.
CC: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106480
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Pre-Vega HW always interprets the alpha for this format as unsigned,
so we have to implement a fixup to do the sign correctly for signed
formats.
v2: Improve indexing mess.
CC: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106480
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Basic sampling support for linear tiling.
No CTS regressions, but it seems the blitting coverage is not very
extensive.
https://bugs.freedesktop.org/show_bug.cgi?id=106331
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
radeonsi could pass them through but the enum changed between
Gallium and Vulkan, so we have to translate.
In progress I made the register defines a bit more readable.
CC: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100430
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This moves the extra queries to after the main query ended, instead
of doing it after the begin and hence doing nesting.
We also emit only (view count - 1) extra queries, as the main query
is already there for the first view.
This fixes the CTS occasionally getting stuck in
dEQP-VK.multiview.queries* waiting on results.
Fixes: 32b4f3c38d "radv/query: handle multiview queries properly. (v3)"
CC: 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
radv_can_dump_shader() already handles if module is NULL.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
These are going to be crazy and we are probably going to add
more scan stuff in the future. Also use switch cases instead.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
I don't think the hw resolve path can't handle multi-layer images.
This fixes all the:
dEQP-VK.renderpass.multisample_resolve.layers_*
tests on my VI card.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: <mesa-stable@lists.freedesktop.org>
This path should iterate across all layers, I've some ideas
for doing this in a single pass, but this is simpler for now.
This passes the tests because we don't use the fragment path
unless we have DCC, and we don't have DCC on layered images.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: <mesa-stable@lists.freedesktop.org>
For a multi-layer subpass resolve we want to make sure we flush all
the layers.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: <mesa-stable@lists.freedesktop.org>
When VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT is set we skip NIR
linking optimisations and only run over the NIR optimisation loop
once similar to the GLSLOptimizeConservatively constant used by
some GL drivers.
We need to run over the opts at least once to avoid errors in LLVM
(e.g. dead vars it can't handle) and also to reduce the time spent
compiling the IR in LLVM.
With this change the Blacksmith Unity demos compilation times
go from 329760 ms -> 299881 ms when using Wine and DXVK.
V2: add bit to radv_pipeline_key
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106246
These helpers will be needed for future work.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This seems to be broken at the moment for opengl interop.
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
It's a per-application optimization, so it makes more sense
to do that in radv_handle_per_app_options().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This fixes
dEQP-VK.api.device_init.create_instance_invalid_api_version
CC: 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Apparently the somewhere between 1.1.70 and 1.1.73 the loader started
depending on this. The loader then creates a 1.0 instance, which gets
into funny situation because we have a 1.1 device.
No idea how to do line wrapping in Mako though, my random guesses
did not work.
CC: 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Previously before fb077b0728, the LOD parameter was being used in place of the
sample index, which would only copy the first sample to all samples in the
destination image. After that multisample image copies wouldn't copy anything
from my observations.
This fixes some copy_and_blit CTS tests.
v3.1: - set lod to 0 for nir_txf_ms (Samuel)
v2: - use GLSL_SAMPLER_DIM_MS instead of 2D (Samuel)
- updated commit description (Samuel)
Fix this properly by copying each sample in a separate radv_CmdDraw and using a
pipeline with the correct rasterizationSamples for the destination image.
Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
As the implementation is conservative, we can now enable it
by default. It can be disabled with RADV_DEBUG=nooutoforder.
Don't expect much more than 1% of improvements, but the gain
seems consistent.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Only count color attachments twice if resolves are used, also
account for the depth stencil attachment if present.
Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This is needed for gfx9 and later for all fmask surface index.
(Mentioned by Marek on irc)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
A bo's ref_count was not being initialized when imported from an fd.
Therefore, we would fail to free the resource during VkFreeMemory().
This patch fixes applications like hifi VR in threaded mode, which
perform frequent imports/releases of IPC shared memory.
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
CC: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
num_channels has been introduced since "ac/surface: don't set
the display flag for obviously unsupported cases".
Based on RadeonSI.
Fixes: e29facff31 ("ac/surface: don't set the display flag for obviously unsupported cases (v2)")
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
dcc_msaa_allowed is always false on GFX9+ and only true on VI
if RADV_PERFTEST=dccmsaa is set. This means DCC was disabled
in some situations where it should not.
This is likely going to fix a performance regression.
Fixes: 2f63b3dd09 ("radv: enable DCC for MSAA 2x textures on VI under an option")
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This refactors the code out to share it between radv and radeonsi.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This follows what radeonsi does.
Ported from radeonsi:
radeonsi: emit PA_SC_RASTER_CONFIG_1 only once
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This just makes this common code between the two drivers.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Missed this on initial radeonsi port, we shouldn't use this value
on gfx9, but also in gfx8 only for when we have a geom shader.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The copr repo from che was using LTO and he reported radv broke
recently with it. When testing with lto builds here I noticed
that we weren't seeing any instance extensions reported.
It appears LTO was treating the const without extern as an empty
struct, this is possibly a gcc bug, but we can work around it
just by marking these with extern.
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Otherwise a lot of games complain about not having enough memory,
and it is sort of local so this seems reasonable to me.
CC: 18.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The SI family doesn't support chaining which means the maximum
size in dwords per CS is limited. When that limit was reached
we failed to submit the CS and the application crashed.
This patch allows to submit up to 4 IBs which is currently the
limit, but recent amdgpu supports more than that.
Please note that we can reach the limit of 4 IBs per submit
but currently we can't improve that. The only solution is to
upgrade libdrm. That will be improved later but for now this
should fix crashes on SI or when using RADV_DEBUG=noibs.
Fixes: 36cb5508e8 ("radv/winsys: Fail early on overgrown cs.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105775
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Ported from RadeonSI.
Local BOs ignore BO priorities, and we don't need those on APUs.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>