Avoiding NaNs should have the same effect but it's good practice to not
rely on float OPs for correctness.
Fixes: 95a89f7 ("radv: Report smaller bvh sizes when possible")
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39640>
It is more efficient to compute the child index of the current node
inside the parent node and write the bounds when available. The previous
code could load up to 16 AABBs to compute the new ones. The new code
also only needs 1/7 of the previously used scratch memory. The new code
seems to be around 30% faster (0.5ms) in GOTG on a 6700XT.
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39139>
This calculation needs to happen in the same loop as the
geometry/triangle id calculations in case the selected invocation is
before all invocations that were already selected.
Totals from 1269 (15.10% of 8406) affected BVHs:
compacted_size: 137581888 -> 137606464 (+0.02%); split: -0.08%, +0.10%
sah: 6496048424 -> 6496048450 (+0.00%); split: -0.00%, +0.00%
primitive_node_count: 604384 -> 604656 (+0.05%); split: -0.14%, +0.19%
Fixes: c18a7d0 ("radv: Emit compressed primitive nodes on GFX12")
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38462>
In updateable AS, we keep all nodes active even if they're
degenerate/NaN, because too many games ignore API rules about not
making inactive nodes active (and some vendor tips outright advise this
behavior). We also need to match this by keeping everything active in
the update side. The ALWAYS_ACTIVE macro has been long removed and
replaced by VK_BVH_BUILD_FLAG, too. Since updating only happens to
updateable AS, don't even check for the flag, just implement the
always-active handling.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38488>
The normal encode pass writes batches to a section in build scratch
memory. Those batches contain information about the internal node and
the primitive nodes. The encoder is split to avoid the register
pressure of the compressor and maximize occupancy.
The compressor works in two passes because one pass can not guarantee
that every primitive node (except) has at least two triangles. This
guarantee is used to advertise a smaller acceleration structure size to
the application.
During compression, every invocation processes at most two triangles.
Groups of 8 invocations are used to support the maximum triangle count
of 16 that the hardware supports.
The first step of compression is loading the triangle(s). Shared
vertices are deduplicated early to avoid doing it in the compression
loop. The compression loop tries to add triangles to a list of triangles
until the computed node size needed for storing the triangles reaches
the hardware node size. For this, each invocation first deduplicates
vertices with the triangles that have already been picked. It then
computes the node size of the picked triangles plus the candidate
triangles of the current invocation. The invocation that computed the
smallest size is added to the list.
Because it may not be possible to fit every triangle into the same node,
there can be multiple hardware nodes which are written in parallel for
optimal performance. If there are no nodes with only one triangle, all
nodes are written. If there is, compression of the batch is aborted and
the index of the batch is written to build scratch memory. The second
compression pass will repeat the steps above but only for those aborted
batches. The nodes with only one triangle can and are now merged.
It can not be determined during box node encode which triangles will be
compressed together so the encoder also has to fix up the parent box
node's child infos.
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36965>
If there are no leaves, the root node bounds still span -inf/inf.
Making empty BLASs infinite-sized guarantees ray traversal needs to
enter the BLAS (and immediately exit because it's empty). Remove the
BLAS from the BVH entirely by marking its bounds as NaN. As a bonus,
this works around RADV encountering issues in Silent Hill 2 on RDNA4 due
to infinite-sized BVHs.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37492>
Having a list of all enabled/used extensions in meson allows us to get
rid of a lot of boilerplate in every bvh build shader.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35326>
Emits two triangles per node whenever possible. The nir code will
revisit the triangle node to handle the second triangle only if both
triangles are interescted by the ray.
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35734>
On Navi33, certain box sorting modes combined with infinity/-infinity in
the child AABBs cause image_bvh64_intersect_ray to return garbage node
pointers.
To avoid this, convert infinity to the maximum representable
floating-point value, which will still intersect with any non-inf ray.
Fixes consistent hangs in DOOM: The Dark Ages.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35254>
valid_child_count_minus_one is 15 for box nodes without child so every
child was considered valid which made the code read invalid data and use
that for addressing.
Fixes: 2d48b2c ("radv: Use subgroup OPs for BVH updates on GFX12")
Closes: #13217
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35119>
This patch changes the update code to launch 8 invocations for every
internal node. The internal nodes update their child leaf nodes using
the geometry index and primitive index stored inside the primitive node.
Processing 8 child nodes in parallel is faster than looping over them.
Moving to one dispatch that updates all nodes in one go lets us get rid
of atomics and will also enable updatable BVHs to use pair compression.
Improves Elden Ring (high settings, max RT settings, 1080p) by around
10%.
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34601>
The advantage of using spec constants is that we do not have to include
multiple spirv binaries for multiple variants of a build stage.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34594>
For empty BVHs we shouldn't emit any leaf nodes, but there is one
invocation to encode the root node. Guard leaf node encoding so that
invocation doesn't try writing any leaves.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33985>
This custom builder implements fine-grained instance node bounds
calculation by looking at all AABBs at tree depth 2.
Shaves off 0.3ms in the start scene for Indiana Jones: The Great Circle
on Deck (roughly 29.1ms->28.7ms).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32797>