for intrinsics, we have these really nice builders using designated initializers
+ macros to specify optional indices. texture instrs have even more craziness
involved, but we can do the same trick. this commit takes the existing "fixed
form" deref-centric tex builders and generalizes them to work with non-deref
textures, making it useful also for GL and late VK passes, while providing an
API that strives to be ergonomic and consistent.
this series only implements a subset of possible texture operations for now, but
more generalizing could be added as people have need.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36050>
The imod instructions are lowered to 4 alu instructions each. We can do
better by packing the results with the values for kz.
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Autumn Ashton <misyl@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36213>
Instead of using fp64 (Which is broken in some cases) the new approach
only uses fp32 and implements tiebreaking for edge/vertex hits. Using
fp32 is also much faster, improving performance of q2rtx by around 40%.
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Autumn Ashton <misyl@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36213>
This increases primitive throughput for all hw with NGG if the shader
culls and the removal of cull distances reduces the number of position
exports.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35473>
Clip and cull distance outputs decrease primitive throughput, so culling
against them in the shader has even more benefit than other culling
options.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35473>
This simplifies shader info in drivers by returning GSVS emit sizes from
ac_nir_lower_legacy_gs. The pass knows the sizes, so drivers shouldn't
have to determine them independently.
This also makes the values more accurate because both drivers were
computing the GSVS emit sizes inaccurately and had redundant fields
in shader info. RADV had a lot of redudancy there.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35473>
To be shared with radeonsi.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35931>
To be shared with radeonsi.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35931>
To be shared with radeonsi.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35931>
This is a cleanup.
Old gs LDS layout: [es outputs][gs outputs][scratch]
Old nogs LDS layout: [xfb/cull][scratch]
New gs LDS layout: [es outputs][scratch|gs outputs]
New nogs LDS layout: [scratch|xfb/cull]
The LDS scratch is moved to the beginning of the preceding buffer in LDS,
while the addresses in that LDS buffer are offset by the scratch size.
It effectively merges the LDS scratch with the preceding buffer in LDS.
Thanks to that, we no longer need the ngg_scratch ABI and the offset
in a user SGPR.
The lowering passes now return the LDS scratch size, which is used
by the drivers to determine the final LDS size.
The ngg_lds_layout SGPR is now unused without GS in RADV.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35352>
This is a prerequisite for NGG lowering passes to return LDS vertex and
scratch sizes, which will lead to further simplifications. That will
require calling gfx10_get_ngg_info after radv_postprocess_nir, which means
LDS offsets are unknown when the passes are called.
This makes the 2 values no longer compile-time constants.
A later commit will remove NGG_LDS_LAYOUT_SCRATCH_BASE (the passes will
determine it), so only NGG_LDS_LAYOUT_GS_OUT_VERTEX_BASE will come from
an SGPR, though that could be removed too (non-trivially) or handled as
a relocation.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35351>
This removes a very old hack which will also allow us to enable DCC
for multiplanar formats eventually and to reduce the combined
image+sampler descriptor size from 96 to 48 on RDNA3+.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35457>
The GFX12 ISA doc describes other layouts for A/B, but they are identical
to the C layout with the exception of the order of the rows (columns for A).
And as long as these are swapped in the same way for both A and B, the muladd
result will be the same. So we use the C layout for all uses.
This will simplify conversions between uses, and allows A/B to use a single
memory access for load/store in wave32.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35570>
It's a stride of 1 output, which isn't 16. It's 16 * num_threads,
aligned to 256.
tcs_offchip_layout has 5 unused bits, so let's use them.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34780>