This function calculates two things at once:
1. The total number of vertices emitted by the threadgroup.
2. Exclusive scan of emitted vertex count accross the threadgroup.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
For NGG GS, we need to store the following in LDS:
1. The ESGS ring, similarly to legacy ESGS.
2. Emitted vertices from the GS threads.
3. Temporary space used by the workgroup scan.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
Make it possible for ACO to recognize when to use HW NGG GS.
Also add a few notes about the various GS stages in the comments.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
NGG GS need to use the same instructions to export vertex
attributes at the end.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
Make the NGG VS/TES code easier to follow, give better names to
some functions and make ngg_nogs_early_prim_export a variable.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
Use lshl_or instead of lshl_add, which makes it more robust in
handling -1 and -2 indices which will now just become null
exports, which is what we want.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
Previously, this function inferred the vertex and primitive counts
from the gs_tg_info shader argument, but in case of NGG GS, it will
need to be calculated in runtime.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
Will be useful for NGG GS and probably testing. The helpers take care of
divergence but not creating correct phis.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
This makes it easier for ACO to implement NGG GS:
1. No need to keep track of vertex and primitive counts.
2. No need to discard incomplete primitives.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
After each end_primitive and at the end of the shader before emitting
set_vertex_and_primitive_count, we check if the primitive that is being
emitted has enough vertices or not, and we adjust the vertex and
primitive counters accordingly.
As a result, if the backend uses this option, the backend compiler
will not have to worry about discarding the unneeded vertices
and primitives.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
Add an option to nir_lower_gs_intrinsics so that it can also track
the number of emitted vertices per primitive, not just the total
vertex count.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
Add an option to nir_lower_gs_intrinsics which tells it to track
the number of emitted primitives, not just vertices. Additionally,
also make it per-stream.
Also rename the set_vertex_count intrinsic to
set_vertex_and_primitive_count.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
Based on the OpenBSD variant.
The only difference between those two system is the sysctl mib.
Signed-off-by: Emmanuel Vadot <manu@FreeBSD.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6995>
To match ACO.
Totals from 268 (0.20% of 136420) affected shaders:
CodeSize: 1214060 -> 1214096 (+0.00%); split: -0.05%, +0.06%
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6938>
mpTrackMemAccessFuncTy is not used anywhere.
Fix defect reported by Coverity Scan.
Uninitialized pointer field (UNINIT_CTOR)
uninit_member: Non-static class member mpTrackMemAccessFuncTy is not
initialized in this constructor nor in any functions that it calls.
Suggested-by: Jan Zielinski <jan.zielinski@intel.com>
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6930>
GCC and Clang support --std and -std options but Intel C++
Compiler only supports -std.
icpc: command line warning #10159: invalid argument for option '--std'
Fixes: 8a05d6ffc6 ("driconf: Make the driver's declarations be structs instead of XML.")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7020>
And document where to find information on qcom gralloc's private handle
layout. I chose not to #include the gralloc_priv because it seems that
there's not much we need yet, and I'm hoping we can avoid the build-time
dependency on the specific platform.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7015>
This reverts commit bcfec61d1e.
The previous patch fixed the underlying issue that the above commit was
actually working around. It turns out that the previously observed
performance regression was due to invalid aux-map entries for
multi-layer HiZ+CCS buffers.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7046>
Fixes rendering corruption in the shadowmappingcascade Sascha Willems
Vulkan demo. To see the corruption, I adjusted the demo options as
follows:
1. Enable "Display depth map"
2. Set "Split lambda" to 0.100
3. Make "Cascade" non-zero.
Fixes: 80ffbe915f ("anv: Add support for HiZ+CCS")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7046>
Shaders may not use a particular region of a UBO in a given shader (think
UBOs shared between stages, or between shaders), and by just always
extending the existing range for a given UBO, we'd waste bandwidth
uploading it, and also waste our precious const space in storing the
unused data.
Instead, only upload exactly the ranges we can use, and merge ranges when
they're neighbors. We may end up with more upload packets, but the
bandwidth savings is surely going to be worth it (and if find we want a
distance threshold for merging with nearby uploads, that would be easy to
add).
total instructions in shared programs: 9266114 -> 9255092 (-0.12%)
total full in shared programs: 343162 -> 341709 (-0.42%)
total constlen in shared programs: 1454368 -> 1275236 (-12.32%)
total cat6 in shared programs: 93073 -> 82589 (-11.26%)
total (ss) in shared programs: 212402 -> 206404 (-2.82%)
total (sy) in shared programs: 122905 -> 114007 (-7.24%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7036>
rehashing a populated hash table is very expensive, so for the case where
the maximum/likely table size is already known, this function allows for
pre-sizing the table to avoid ever needing a rehash
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7037>
To get more consistent performance and results, use the performance
devfreq governor and disable PM runtime.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7011>