This removes a very old hack which will also allow us to enable DCC
for multiplanar formats eventually and to reduce the combined
image+sampler descriptor size from 96 to 48 on RDNA3+.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35457>
The GFX12 ISA doc describes other layouts for A/B, but they are identical
to the C layout with the exception of the order of the rows (columns for A).
And as long as these are swapped in the same way for both A and B, the muladd
result will be the same. So we use the C layout for all uses.
This will simplify conversions between uses, and allows A/B to use a single
memory access for load/store in wave32.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35570>
It's a stride of 1 output, which isn't 16. It's 16 * num_threads,
aligned to 256.
tcs_offchip_layout has 5 unused bits, so let's use them.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>
One is only used by TCS, the other is only used by TES.
Use the same field for both, call it PATCH_VERTICES_IN.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>
Use the same path as GFX12 using SSBO atomics because performance
should be equal or slightly better due to less synchronization.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35017>
This increases this buffer by 20 bytes but it will be re-used for
emulated queries on GFX11-GFX11.5 in order to remove the GDS path.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35017>
This optimization used to optimize the allocated space for descriptors
when immutable samplers are equal. Though, this was basically broken :
- descriptor copies were broken for combiner image sampler (or sampler)
with equal immutable samplers because 96 bytes were copied instead of
64 bytes (cf. the linked ticket). This could be fixed but it's not
worth it.
- the value returned by vkGetDescriptorLayoutSupport() was broken, it
should have been 96 with no immutable samplers (or when they aren't
equal)
This optimization was also not applied for descriptor buffers which is
the default for vkd3d-proton and Zink. DXVK doesn't use db but it
doesn't use immutable samplers, so basically only native vulkan games
would be concerned.
Note that immutable samplers would still be inlined in shaders if no
indirect access which should be 99.9% of the usecase.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11165
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34928>
fossil-db (gfx1201):
Totals from 5 (0.01% of 79377) affected shaders:
(no affected stats)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34162>
Invalid for now, but used by vkd3d-proton, where the use case is to convert
a result matrix to lower precision, followed by a store.
For 16bit accumulation matrices, GFX11 only uses 16bits per 32bit register.
RADV's coop matrix code pads the unused space with undefs and uses a vector
with twice as many elements as the matrix length. Extending that to 8bit by
leaving 24 bits unused is unnecessary as these matrices as there
is no hw unit that requires it. And in wave32, it would also result in
vectors larger than NIR's limit.
So tightly pack 8bit matrices without any undef padding.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34382>
Recently, I renamed most of the helpers for future work but I forgot
few things like meta keys, etc.
This is for consistency.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34558>
A cooperative matrix conversion operation was represented in NIR by the
cmat_unary_op intrinsic with an nir_alu_op as extra parameter,
that was already lowered to a specific conversion operation
based on the matrix types.
Instead of that, add a new intrinsic `cmat_convert` that is specific
for that conversion. In addition to the src/dst matrix descriptions
already available, also include the signedness information in the
intrinsic (reuse nir_cmat_signed for that). This is needed because
different Convert operations define different interpretations for
integers, regardless their original type.
In this patch, both radv and intel were changed to use the same logic
that was previously used to pick the lowered ALU op.
This change will help represent cmat conversions involving BFloat16,
because it avoids having to create new NIR ALU ops for all the
combinations involving BFloat16.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34511>
When we are using compute resolve, we can get
values the CTS does not expect due to the value
we end up writing for UNORM in
`nir_image_deref_store`.
Make the compute resolve rounding path match with
the output of the fragment shader resolve path,
by going through the same FP16 RTZ conversion as
we do for UNORM/SNORM formats.
This is why VK_EXT_sample_locations CTS was
failing on > GFX9.
On <= GFX9, I am assuming we are falling back to
RESOLVE_FRAGMENT, due to DCC stuff, which is why
it works there.
I tested a handful of images from the Vulkan CTS
for the sample locations and resolve tests for
diff UNORM formats from the qpa file forcing
FRAGMENT and with this change.
With this change, we now match on the compute
resolve path the same sha for the ones I compared
with ImageMagick `identify`.
CTS passes for: *resolve*, *image_clearing* and
*sample_locations* on RX 7900XTX.
Signed-off-by: Autumn Ashton <misyl@froggi.es>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28237>