6698753cdb switched our GS output stores to use MUBUF.
The stride doesn't matter for the ESGS descriptor (because idxen=false and
the index stride is 64), but this fixes it anyway.
This also changes ACO to use MUBUF store too, since MTBUF doesn't seem to
work correctly with an invalid data format in the descriptor.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: 6698753cdb ("ac/llvm: don't use tbuffer_store as a fallback for swizzled stores")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18885>
FMask doesn't exist on GFX11. Have txf_ms take the fragment_fetch_amd
path.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19375>
TES rel patch id is <256, so we can use an existing unused LDS
byte instead of extra dword.
To ease the programing, change the index of repacked_arg_vars
for these variables.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18832>
According to LLVM, s_sendmsg_rtn(GET_REALTIME) should be used instead
of s_memrealtime.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19267>
Also modify all existing uses to pass a zero to this new src.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> (nir)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17551>
Remove unused arguments, clean up allow_combining vs. swizzled etc.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17551>
Previously, we always treated these as coherent, but now let's make
this configurable. Also set all current users to ACCESS_COHERENT.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> (nir)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17551>
Avoids later moves with extractions from the vector.
Reduces VALU operation in the raytrace loop by ~6%, increasing
the RT performance in Q2RTX on a 6800 XT by about ~1.3%.
Suggested by Georg.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19148>
"Removing jumps" in ACO means skipping the jump instruction
at the beginning of a divergent branch (but still modify exec).
ACO already supports implicitly removing jumps when it decides
that executing a branch with empty exec mask is more beneficial
than a jump.
This commit adds the possibility to use this explicitly
through nir_selection_control. ACO will respect this
setting and remove the branch instructions when this is specified,
unless it decides that this would cause bugs (eg. exp instruction).
There are two cases that benefit from the new change:
1. When the application requests to "flatten" a branch (ie.
remove control flow), we now respect that.
2. When the compiler stack determines that a divergent branch
is always taken.
v2 by Georg Lehmann: fixed applying sel_ctrl to else blocks
Fossil DB stats on Navi 21:
Totals from 13 (0.01% of 134906) affected shaders:
CodeSize: 136616 -> 136496 (-0.09%)
Instrs: 26196 -> 26166 (-0.11%)
Latency: 417928 -> 417889 (-0.01%)
Branches: 1241 -> 1211 (-2.42%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-By: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17921>
According to an LLVM comment, these are swapped in GFX11.
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17710>
Quad-divergent CF and vertex selection doesn't work, but should at least
prevent crashes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17333>
We can't detect color attachment without exports when compiling a PS
epilog, so we can't compact MRTs.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18514>
fossils-db (NAVI21):
Totals from 158 (0.12% of 134913) affected shaders:
CodeSize: 569456 -> 568824 (-0.11%)
Only Control seems affected.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18615>
If there are holes between color outputs (e.g. a shader exports MRT1,
but not MRT0), we can remove the holes by moving higher MRTs lower. The
hardware will remap the MRTs to their correct locations if we remove
holes in SPI_SHADER_COL_FORMAT but not CB_SHADER_MASK. This is good for
performance because the hardware will allocate less space for color
MRTs.
This also allows to remove even more unused color exports because we no
longer need to force previous targets to be non-zero. Only SotTR seems
affected from our fossils db.
fossils-db (NAVI21):
Totals from 859 (0.64% of 134913) affected shaders:
VGPRs: 24328 -> 24216 (-0.46%)
CodeSize: 1433276 -> 1422576 (-0.75%)
Instrs: 255275 -> 253728 (-0.61%)
Latency: 1666836 -> 1661544 (-0.32%)
InvThroughput: 346038 -> 343406 (-0.76%)
Copies: 16520 -> 16506 (-0.08%)
PreSGPRs: 25934 -> 25920 (-0.05%)
PreVGPRs: 19903 -> 19662 (-1.21%)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5786>
If there's a lod parameter it matter if the image is 3d or 2d because
the hw reads either the fourth or third component as lod. So detect
3d images and place the lod at the third component otherwise.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18114>
Note that, from 22.4.1. Vertex Input Extraction of Vulkan spec:
The input variable in the shader must be declared as a 64-bit data type if
and only if format is a 64-bit data type.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17894>
Size is in bytes, not bits.
Fixes plenty of crashes in CI, like
dEQP-VK.synchronization.op.single_queue.event.write_image_fragment_read_image_tess_eval.image_128_r32_uint.
Fixes: 46f6e2ddbb ("aco: Implement storage image A16.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18266>
get_ssa_temp's and NIR's bit size can differ for scalar sources.
This causes broken packing of the MIMG operands with A16/G16.
Fixes: f5f73db846 ("aco: Support 16bit sources for texture ops.")
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18008>
This makes it more likely to hit the fast path for count == 1
in the split_store_data function.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17923>