Gamescope relies on legacy scanout support when explicit modifier isn't
available and it chains the mesa wsi hint requesting such. Venus doesn't
support legacy scanout with optimal tiling on its own, so venus disables
legacy scanout in favor of prime buffer blit for optimal performance. As
a workaround here, venus can once again force linear tiling when legacy
scanout is requested outside of common wsi.
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35811>
Brings the vertex shader in
dEQP-VK.subgroups.vote.framebuffer.subgroupallequal_dvec4_vertex
from 234 to 169 instructions on NAK.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35778>
No shader-db changes here, but it does improve some cts shaders, eg. the
vertex shader in
dEQP-VK.subgroups.vote.framebuffer.subgroupallequal_i64vec4_vertex
goes from 80 to 56 instructions with NAK
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35778>
Recent nvidia hardware has a native instruction for
nir_intrinsic_vote_ieq but not for nir_intrinsic_vote_feq. So, split
this boolean into two so we can contol the lowering separately for each
instruction.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35778>
AMD hardware can early-cull box nodes if all leaves are either opaque or
not and the ray flags are set to discard (non-)opaque geometries. This
works even across TLAS/BLAS boundaries.
Propagate info on whether all child nodes are opaque or not through the
BVH to allow RADV to set these flags per box node.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32417>
This switches the code to the new slot offsets from ac_nir_prerast_out
instead of using a prefix bitmask over outputs_written.
The LDS layout no longer includes these:
- GS: output components that are not written by GS
- VS/TES+XFB: output components that are not written by XFB
- VS/TES+XFB: slots that are not written by XFB (this could be significant)
This is also a cleanup because it unduplicates the bitcounts.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35351>
instead of computing it separately. This is better because
ac_nir_lower_ngg_gs knows the final LDS size anyway, and it will be
easier to modify the size calculation this way.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35351>
instead of computing it separately. This is better because
ac_nir_lower_ngg_nogs knows the final LDS size anyway, and it will be
easier to modify the size calculation this way.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35351>
This is a prerequisite for NGG lowering passes to return LDS vertex and
scratch sizes, which will lead to further simplifications. That will
require calling gfx10_get_ngg_info after radv_postprocess_nir, which means
LDS offsets are unknown when the passes are called.
This makes the 2 values no longer compile-time constants.
A later commit will remove NGG_LDS_LAYOUT_SCRATCH_BASE (the passes will
determine it), so only NGG_LDS_LAYOUT_GS_OUT_VERTEX_BASE will come from
an SGPR, though that could be removed too (non-trivially) or handled as
a relocation.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35351>
We need to gather outputs before lowering because lowering requires that we
know the LDS vertex stride, so that we can lower output stores to LDS
stores.
The pass will determine the LDS vertex stride, not drivers.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35351>
This will be used to reduce the NGG LDS size for uncompacted GS and XFB
outputs.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35351>
This enables emulating clip planes without ClipVertex via clip distances
(max 8) instead of the fixed-func hw (max 6 planes).
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35351>
Streamout will require prerast info, which is gathered by
lower_ngg_gs_intrinsics.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35351>
just code reordering (position exports should be at the end for perf)
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35351>
It has no effect, but the extra export instructions is unnecessary and
we can't gather the effective number of position exports from NIR if we
insert incorrect exports.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35351>
We simplify the implementation by assuming the worse case, copying
entire per-vertex regions if necessary.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35103>
The formula uses scalar indices (4bytes), not slots (16bytes).
We also incorrectly passed a scalar (vertex case) & slot (mesh case)
offset in the push constants. Use slots instead so that the value is
smaller and we can pack more stuff into fs_msaa_flags.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 18bbcf9a63 ("intel: introduce new VUE layout for separate compiled shader with mesh")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35103>
load intrinsics don't have range
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 18bbcf9a63 ("intel: introduce new VUE layout for separate compiled shader with mesh")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35103>
Align register names to internal docs to avoid having to mentally remap
register names between the names we invented over the years and what
they are actually called.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35803>