Make sure to preserve signed zeroes.
Fixes dEQP-VK.spirv_assembly.instruction.compute.opquantize.flush_to_zero
on GFX6 (Pitcairn). Untested on GFX7.
Fixes: 54a09545ec ("aco: optimize a*0.0")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10319>
because gfx103.json is automatically generated and can't be changed
manually. This fixes the file generator without changing the generated
header.
Missing registers must be in registers-manually-defined.json, and
missing fields must be in parse_kernel_headers.py.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10261>
Fixes
dEQP-VK.api.image_clearing.core.clear_color_image.2d.optimal.single_layer.e5b9g9r9_ufloat_pack32_33x128
with RADV_DEBUG=forcecompress on GFX10.3.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: 21.1 <mesa-stable>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10176>
Without image stores, DCC is always decompressed on compute.
Cc: 21.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10168>
This would be easy to support except that it doesn't support RDNA 2.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10199>
It doesn't support Navi1x and the removal enables this nice code cleanup.
v2: rebase - mareko
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v1)
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10199>
clang generates a warning if there's no explicit break or fall-through
annotation. The latter would be kind of silly in this case, and not
robust against any future changes turning the fall-through invalid.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10220>
One exception is src/amd/addrlib/, for which -Wimplicit-fallthrough is
explicitly disabled.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10220>
ACO doesn't create a waitcnt for barriers between texture samples and
image stores because texture samples are supposed to use read-only
memory. It could also schedule the barrier to above the texture sample.
We also have use a larger memory scope to avoid an ACO optimization.
Tested on GFX8 with Sachsa Willems deferred sample. With some DCC
decompressions and the compute path forced.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: 21.1 <mesa-stable>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9496>
Allocating a descriptor set is aligned to 32 bytes, so just like the
other buffer types, bump the descriptor size to 32 bytes when allocating
MUTABLE descriptor types from a pool.
Fixes: 86644b84b9 ("radv: Implement VK_VALVE_mutable_descriptor_type.")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10132>
Accept non-linear tiling for multi-planar formats on GFX9+, as long
as DCC is disabled. DCC support is possible in theory, but untested
for now.
GFX8 is still restricted to linear tiling because it's not yet clear
how modifiers should be handled on these chips for multi-planar
formats. Each plane may need a different modifier.
Signed-off-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10134>
util_format_get_blocksize asserts that the blocksize isn't zero.
However the blocksize will be zero if the format's channel encoding
is unspecified. The channel encoding is only meaningful for the
plain u_format layout, so util_format_get_blocksize can't be used
for formats with another layout. For example, YUV formats don't have
the channel encoding specified.
Use util_format_get_blocksizebits, which just returns zero without
an assertion for formats which don't have a channel encoding.
Signed-off-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10134>
Otherwise, handle_loop_phis() might pass it to handle_live_in() and then
we could have two phis for this variable.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 7c64623e94 ("aco/ra: refactor SSA repairing during register allocation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10236>
In most games I tested we use 32 MiB of cmdbuffers+cmd upload buffers
at most. Especially since we have mutable descriptors it seems
somewhat unlikely anything else will eat it up so be a bit more
aggressive allocating them in VRAM.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10198>
Late export is theoretically better if used with LATE_ALLOC,
but in practice, the early export has an advantage of
lower register usage, therefore more concurrent waves.
The idea of this commit is that "small" shaders benefit from early
primitive export more, due to being able to launch much more waves.
Let's consider a NIR shader "small" when it has only 1 block.
This yields both better performance, and better stats, than always
using late export.
Fossil DB on Sienna:
Totals from 12807 (8.76% of 146265) affected shaders:
VGPRs: 609128 -> 620216 (+1.82%); split: -0.01%, +1.83%
SpillSGPRs: 1458 -> 1538 (+5.49%)
CodeSize: 37028204 -> 37019320 (-0.02%); split: -0.17%, +0.14%
MaxWaves: 282902 -> 278516 (-1.55%)
Instrs: 7163142 -> 7162925 (-0.00%); split: -0.18%, +0.18%
VClause: 169285 -> 169547 (+0.15%); split: -1.15%, +1.30%
SClause: 267373 -> 267151 (-0.08%); split: -0.24%, +0.16%
Copies: 446442 -> 444567 (-0.42%); split: -2.68%, +2.26%
Branches: 156245 -> 156195 (-0.03%); split: -0.30%, +0.26%
PreSGPRs: 434701 -> 447396 (+2.92%)
PreVGPRs: 527783 -> 540527 (+2.41%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10106>
The user-set priority of shaders matters very little, but we hope
this might still help speed up VS input loads especially.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10106>
We learned that the gs_alloc_req is not actually when the export
space allocation happens. So it makes no sense to prioritize it.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10106>
Previously, every wave had multiple active lanes read the LDS, and
the data was processed by VALU DPP instructions.
Now, only the first lane reads the LDS in order to avoid bank
conflicts, and the results are processed by SALU.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10155>
No fossil-db changes, probably because all fp16 shaders have at least one
16-bit mov or vec2 somehwere.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10227>
Layered VRS attachments is for later.
The CTS failures are similar to the existing ones, I will investigate
soon.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10187>
The global VRS image is created on-demand to avoid wasting space.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10187>
When a subpass uses a VRS attachment without binding a depth/stencil
attachment (yes, this is allowed by the Vulkan spec), we have to bind
our internal depth buffer that contains the VRS data.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10187>
The Vulkan spec doesn't require the application to always binds
a depth/stencil attachment when a VRS attachment is used inside the
same subpass.
To handle this situation, the driver creates a global 4096x4096
VRS image that will be bind at draw-time if needed. This isn't
super ideal but we have to do that unfortunately.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10187>
The stencil data needs to be included for storing the VRS rates
into the HTILE buffer.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10187>
When VRS attachment, any depth buffer can potentially be used for VRS.
We also have to create a global depth buffer if the app doesn't
provide one.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10187>
The Vulkan spec requires the implementation to only supports
VK_SAMPLE_COUNT_1_BIT with fragment shading rate attachments.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10187>
This will be used to copy VRS rates to the HTILE buffer.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10187>