Fixes the following build error with clang:
FAILED: src/amd/vulkan/libvulkan_radeon.so.p/radv_pipeline_rt.c.o
...
../src/amd/vulkan/radv_pipeline_rt.c:934:38: error: use of GNU empty initializer extension [-Werror,-Wgnu-empty-initializer]
   struct nir_function raygen_stub = {};
                                     ^
1 error generated.
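For illustration, a minimal sketch of the kind of change that avoids this diagnostic, assuming the initializer is switched from `{}` to `{0}`. `struct stub_function` and `make_stub()` are hypothetical stand-ins, not the real `struct nir_function` or the actual patch:

```c
#include <stddef.h>

/* Hypothetical stand-in for a larger struct such as nir_function. */
struct stub_function {
   const char *name;
   unsigned num_params;
   void *impl;
};

/* `= {}` is only standard C as of C23; before that it is a GNU extension,
 * which clang reports under -Wgnu-empty-initializer.
 * `= {0}` is standard C and zero-initializes every member. */
static struct stub_function
make_stub(void)
{
   struct stub_function stub = {0};
   return stub;
}
```

Calling `make_stub()` yields a struct whose pointers are NULL and whose counters are 0, the same result the empty initializer was meant to produce.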
Fixes: 0a1911b2 ("radv,aco: Use function call structure for RT programs")
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39370>
If two devices/instances are created, the VMID reservation will just
fail. It seems fine as long as it's reserved before SPM is used.
Fixes: a7a4abc8d8 ("radv: Move VMID reservation to vkCreateDevice")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39311>
This is currently unused by any callers, but will be useful for nir
algebraic pattern testing, and as a way to turn our comments in
nir_opcodes.py into actual C code. For now, it always returns false.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39076>
ACO got a lot better at forming VOPD instructions, and testing
feedback seems to point in a slightly positive direction for this.
gfx12 will also start requiring wave32 for dynamic VGPR allocation at
some point.
Measurements on navi31:
Cyberpunk 2077:
Difference at 95.0% confidence
1.12333 +/- 0.42876
1.88216% +/- 0.718391%
(Student's t, pooled s = 0.189165)
Black Myth Wukong benchmark:
Difference at 95.0% confidence
4 +/- 1.30862
13.9535% +/- 4.56495%
(Student's t, pooled s = 0.57735)
Portal with RTX:
66.2ms->61.5ms (~7.64% improvement)
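As a worked check of the arithmetic, the ~7.64% figure matches a gain measured relative to the new frame time (a throughput gain), not a reduction relative to the old frame time. The helper names below are hypothetical and only illustrate the two ways of reading the numbers:

```c
/* Relative gain in frames per unit time when the frame time drops
 * from old_ms to new_ms: (old / new - 1) * 100. */
static double
throughput_gain_percent(double old_ms, double new_ms)
{
   return (old_ms / new_ms - 1.0) * 100.0;
}

/* Relative reduction in frame time: (old - new) / old * 100. */
static double
frame_time_reduction_percent(double old_ms, double new_ms)
{
   return (old_ms - new_ms) / old_ms * 100.0;
}
```

With 66.2 ms and 61.5 ms, the first helper gives about 7.64 and the second about 7.10.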
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39275>
We just... didn't do this at all??? I have no idea how this didn't blow
up before, given that plenty of apps should generate a traversal shader
that spills (and thus has a large stack size), but it did finally blow
up in function-call related work.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29580>
These clocks need to be the clocks at trace time. This shouldn't fix
anything given that RADV sets profile_peak when SQTT is enabled, but
it is better to report them correctly anyway.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39208>
This would always wait on the queue syncobj if there was any other
wait syncobj, but it should only wait after a zero submit.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39193>
Zero-initialize the layout struct to fix the following warning, which
is emitted because radv_use_bvh8() might return false.
../src/amd/vulkan/radv_acceleration_structure.c: In function ‘radv_update_as_gfx12’:
../src/amd/vulkan/radv_acceleration_structure.c:873:70: warning: ‘layout.bounds_offsets’ may be used uninitialized [-Wmaybe-uninitialized]
873 | .bounds = state->build_info->scratchData.deviceAddress + layout.bounds_offsets,
| ~~~~~~^~~~~~~~~~~~~~~
../src/amd/vulkan/radv_acceleration_structure.c:866:33: note: ‘layout.bounds_offsets’ was declared here
866 | struct update_scratch_layout layout;
| ^~~~~~
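A minimal sketch of the pattern behind this warning, assuming the layout is only filled in on one path. The struct, the helper names and the offsets are hypothetical and do not reproduce the RADV code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct update_scratch_layout. */
struct scratch_layout {
   uint32_t bounds_offsets;
   uint32_t size;
};

/* Only fills the layout on one path, like a helper guarded by a
 * feature check. The offsets are made-up values. */
static void
get_layout(bool use_bvh8, struct scratch_layout *layout)
{
   if (use_bvh8) {
      layout->bounds_offsets = 64;
      layout->size = 128;
   }
}

static uint64_t
bounds_address(bool use_bvh8, uint64_t scratch_base)
{
   /* Without `= {0}`, the read below could see an indeterminate value
    * when use_bvh8 is false, which is what -Wmaybe-uninitialized flags. */
   struct scratch_layout layout = {0};
   get_layout(use_bvh8, &layout);
   return scratch_base + layout.bounds_offsets;
}
```

With the zero initializer, the untaken path reads a well-defined 0 offset instead of stack garbage.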
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39228>
If the main CS is SDMA and the gang CS is ACE, this would emit an
SDMA_FENCE packet on ACE, which just hangs.
Fixes: b1938901d0 ("radv: Use SDMA fence packet when flushing gang semaphores")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39211>
Unifies NIR per-instruction float control.
In the future this can be split into contract/reassoc/transform
like SPIR-V.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (except SPIR-V)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39103>
It is more efficient to compute the child index of the current node
inside the parent node and write the bounds when available. The previous
code could load up to 16 AABBs to compute the new ones. The new code
also only needs 1/7 of the previously used scratch memory. The new code
seems to be around 30% faster (0.5ms) in GOTG on a 6700XT.
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39139>