DCC_DECOMPRESS doesn't work. Instead of trying to figure out why,
use a compute blit where the load is compressed and the store is
uncompressed.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4761>
the implicit sync flag gets set at the beginning at the function,
but I used = instead of |= later.
Fixes: bec9285027 "radv: Stop using memory type indices."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4814>
So that drivers can enable it without worrying how the texture was
allocated.
v2: reworked the mechanism, hopefully fixes now
added Bas Nieuwenhuizen's diff to fix radv
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> (v1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697>
The way the new locations are set up has much fewer gaps
between each I/O slot, so this results in a massive reduction
in the LDS usage of tessellation shaders.
Totals (GFX10):
VGPRS: 3976792 -> 3974864 (-0.05 %)
Code Size: 260552784 -> 260532860 (-0.01 %) bytes
LDS: 48723 -> 30179 (-38.06 %) blocks
Max Waves: 1053407 -> 1053583 (0.02 %)
Totals from affected shaders (1407 shaders on GFX10):
SGPRS: 59144 -> 59216 (0.12 %)
VGPRS: 63024 -> 61096 (-3.06 %)
Code Size: 2695508 -> 2675584 (-0.74 %) bytes
LDS: 47109 -> 28565 (-39.36 %) blocks
Max Waves: 12999 -> 13175 (1.35 %)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>
This doesn't fix anything, just reports the LDS size used by
merged ESGS shaders, such as vertex_geometry_gs and
tess_eval_geometry_gs.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>
Instead of relying on calling shader_io_get_unique_index repeatedly,
remember the which output driver location corresponds to which
varying slot.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>
VS needs the number of TCS inputs, and TES needs the number of TCS
outputs.
It is error-prone to repeat those calculations in both instruction
selection and setup. Just set them in one place instead.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>
Previously these functions needed the bit mask of the TCS outputs
and patch outputs written, and concluded the number of outputs
from that.
Now, they take the number of outputs and patch outputs instead.
This will allow the backend compiler to better optimize the
LDS layout.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>
unreachable was true if the last block is unreachable in the linear cfg,
but it should also be true if it is unreachable in the logical cfg.
Fixes dEQP-VK.graphicsfuzz.for-with-ifs-and-return
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 8d8c864beb
('aco: improve check for unreachable loop continue blocks')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4764>
With VK_EXT_robustness2, an element of pBuffers can be NULL.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>
The SPIR-V spec doesn't say explicitly that the sample index
must be an unsigned integer.
This fixes crashes with some new VK_EXT_robustness2 tests.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>
With VK_EXT_robustness2, descriptors can be NULL and the number of
samples returned by nir_texop_texture_samples should be 0.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>
With VK_EXT_robustness2, descriptors can be NULL and the number of
samples returned by nir_texop_texture_samples should be 0.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>
Example situation:
v1 = {v0.hi, v1.lo}
v0.hi = v1.hi
The 4-byte copy's definition is completely used, but swapping it makes no
sense. We have to split it to generate correct code:
swap(v0.hi, v1.lo)
swap(v0.hi, v1.hi)
Found in dEQP-VK.spirv_assembly.type.vec3.i16.constant_composite_vert
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>
When splitting a v6b vector into v1 and v2b components, we should ensure
the v1 definition doesn't start at the upper half.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>
Doom Eternal explicitly allows overallocation via this extension
but that shouldn't change anything because it's the default RADV
behavior.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4785>
By default, RADV supports overallocation by the sense that it doesn't
reject an allocation if the target heap is full.
With VK_AMD_overallocation_behaviour, apps can disable overallocation
and the driver should account for all allocations explicitly made by
the application, and reject if the heap is full.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4785>
The reason behind this is that FMASK requires CMASK and also that
FMASK for non color attachments looks unnecessary. It's currently
much easier to add this simple check because the driver tries to
always enable DCC first and if we enable FMASK only if CMASK, we
might loose some FMASK compressions.
This helps fixing some new robustness2 tests which fails because
only FMASK is enabled.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4783>
This would be necessary for an application to figure out if the
memory was allocated using a memory type with VK_MEMORY_PROPERTY_PROTECTED_BIT.
It also allows one to determine VRAM vs. GTT etc.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4751>
Lots of extra coding was involved in managing them.
And for protected memory I was thinking of making a function that
goes from domain+flags to memory types, which can reuse this array.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4751>
On APUs, the memory is unified (all heaps are equally fast) and
apps should count all memory heaps together. But some games like
Id Tech games (Youngblood and such) don't manage memory correctly
on APUs and they spill everything when one VRAM heap is full.
Instead of spilling buffers, they should just allocate new buffers
in the second heap but it seems like these games are confused if
two memory heaps have the DEVICE_LOCAL_BIT set.
This is probably a first step towards better memory management on
APUs but there is still some work to do if we want to run most apps
with a small dedicated VRAM (256MB or so).
This gives a huge boost for Id Tech games on APUs, and doesn't
seem to reduce Feral games performance.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4771>
Also reverse the BO list removal loop. This way typical WSI usage
should find the entry in O(active swapchains) iterations, which
should not be a performance issues. Tested with Doom(2106) which
found the entry in 1 iteration every time.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4306>