Reported by static analysis. Multiplication may overflow
before being converted to the larger type, so fix this
by casting one of the operands to the destination type.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35877>
ac_estimate_size() triggers an assertion because the block size isn't
aligned to a power of two for ASTC formats.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35879>
This changes legacy GS outputs to use the same logic as NGG GS.
It enables the same optimizations that NGG has such as forwarding
constant GS output components to the GS copy shader at compile time.
ac_nir_gs_output_info is removed.
GS output info is no longer passed to ac_nir_lower_legacy_gs and
ac_nir_create_gs_copy_shader separately.
ac_nir_lower_legacy_gs now gathers ac_nir_prerast_out, generates GSVS ring
stores, and also generates the GS copy shader with GSVS ring loads.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35352>
This way we won't have to pass output info between the two functions.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35352>
This is a cleanup.
Old gs LDS layout: [es outputs][gs outputs][scratch]
Old nogs LDS layout: [xfb/cull][scratch]
New gs LDS layout: [es outputs][scratch|gs outputs]
New nogs LDS layout: [scratch|xfb/cull]
The LDS scratch is moved to the beginning of the preceding buffer in LDS,
while the addresses in that LDS buffer are offset by the scratch size.
It effectively merges the LDS scratch with the preceding buffer in LDS.
Thanks to that, we no longer need the ngg_scratch ABI and the offset
in a user SGPR.
The lowering passes now return the LDS scratch size, which is used
by the drivers to determine the final LDS size.
The ngg_lds_layout SGPR is now unused without GS in RADV.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35352>
We incorrectly used it to determine whether the shader should cull, which
luckily had no effect because it wasn't used everywhere.
cull_clipdist_mask should be used instead, which also reflects whether
clip planes are enabled in GL.
clip_cull_dist_mask is renamed to export_clipdist_mask to make it clear.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35352>
This removes LDS space and loads/stores for constant GS & XFB output
components. Constant output components skip LDS stores, and LDS loads
are replaced with the gathered constants.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35352>
This simplifies the code and scalarizes the loads/stores.
Scalar loads/stores will allow forwarding constant output components
from stores to loads easily.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35352>
This simplifies the code and scalarizes the loads/stores.
Scalar loads/stores will allow forwarding constant output components
from stores to loads easily.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35352>
This looks like a hw bug on GFX10.3-GFX11.5 because RB+ seems to only
work as expected when all channels (RGBA) are written. With that format,
RGB channels must be all set or unset but setting the A channel is
legal so far.
This will reduce rendering performance with that format but it's the
less intrusive solution for now. This might be revisited in the near
future, also with more VKCTS coverage.
This has been tested and verified on GFX10.3 (NAVI21) and GFX11
(NAVI31) and GFX12 (NAVI48), unfortunately I don't have GFX11.5 but
let's assume it's broken there too.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13371
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35631>
For each dimension, we `threads *= lws`.. which is still zero if threads
is initialized to zero.
Fixes: eca4f0f632 ("rusticl/kernel: check that local size on dispatch doesn't exceed limits")
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35864>
Recent nvidia hardware has a native instruction for
nir_intrinsic_vote_ieq but not for nir_intrinsic_vote_feq. So, split
this boolean into two so we can contol the lowering separately for each
instruction.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35778>
This switches the code to the new slot offsets from ac_nir_prerast_out
instead of using a prefix bitmask over outputs_written.
The LDS layout no longer includes these:
- GS: output components that are not written by GS
- VS/TES+XFB: output components that are not written by XFB
- VS/TES+XFB: slots that are not written by XFB (this could be significant)
This is also a cleanup because it unduplicates the bitcounts.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35351>
instead of computing it separately. This is better because
ac_nir_lower_ngg_gs knows the final LDS size anyway, and it will be
easier to modify the size calculation this way.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35351>
instead of computing it separately. This is better because
ac_nir_lower_ngg_nogs knows the final LDS size anyway, and it will be
easier to modify the size calculation this way.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35351>
This is a prerequisite for NGG lowering passes to return LDS vertex and
scratch sizes, which will lead to further simplifications. That will
require calling gfx10_get_ngg_info after radv_postprocess_nir, which means
LDS offsets are unknown when the passes are called.
This makes the 2 values no longer compile-time constants.
A later commit will remove NGG_LDS_LAYOUT_SCRATCH_BASE (the passes will
determine it), so only NGG_LDS_LAYOUT_GS_OUT_VERTEX_BASE will come from
an SGPR, though that could be removed too (non-trivially) or handled as
a relocation.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35351>
We need to gather outputs before lowering because lowering requires that we
know the LDS vertex stride, so that we can lower output stores to LDS
stores.
The pass will determine the LDS vertex stride, not drivers.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35351>
This will be used to reduce the NGG LDS size for uncompacted GS and XFB
outputs.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35351>
This enables emulating clip planes without ClipVertex via clip distances
(max 8) instead of the fixed-func hw (max 6 planes).
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35351>