ACO got a lot better at forming VOPD instructions, and testing
feedback seems to point in a slightly positive direction for this.
gfx12 will also start requiring wave32 for dynamic VGPR allocation at
some point.
Measurements on navi31:
Cyberpunk 2077:
Difference at 95.0% confidence
1.12333 +/- 0.42876
1.88216% +/- 0.718391%
(Student's t, pooled s = 0.189165)
Black Myth Wukong benchmark:
Difference at 95.0% confidence
4 +/- 1.30862
13.9535% +/- 4.56495%
(Student's t, pooled s = 0.57735)
Portal with RTX:
66.2ms->61.5ms (~7.1% lower time, equivalent to a ~7.64% speedup)
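For reference, the Portal with RTX figure can be read in two ways. The sketch below assumes the two numbers are times per frame (the message does not say) and uses hypothetical helper names; it only shows the arithmetic behind each reading.

```c
#include <assert.h>

/* Percentage by which the time per frame (or per run) dropped,
 * relative to the old time. */
static double
time_reduction_percent(double old_ms, double new_ms)
{
   assert(old_ms > 0.0);
   return (old_ms - new_ms) / old_ms * 100.0;
}

/* Percentage by which throughput (work per unit time) rose,
 * relative to the old throughput. */
static double
speedup_percent(double old_ms, double new_ms)
{
   assert(new_ms > 0.0);
   return (old_ms / new_ms - 1.0) * 100.0;
}
```

With 66.2 and 61.5 as inputs, the first helper gives roughly 7.1 and the second roughly 7.64, which matches the quoted figure.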
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39275>
We just... didn't do this at all??? I have no idea how this didn't blow
up before, given that plenty of apps should generate a traversal shader
that spills (and thus has a large stack size), but it did finally blow
up in function-call related work.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29580>
These clocks need to be the clocks at trace time. This shouldn't fix
anything, given that RADV sets profile_peak when SQTT is enabled, but
it is better to report it correctly anyway.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39208>
This would always wait on the queue syncobj if there was any other
wait syncobj, but it should only wait after a zero submit.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39193>
Just zero-initialize the layout struct to fix the following warning
because radv_use_bvh8() might return FALSE.
../src/amd/vulkan/radv_acceleration_structure.c: In function ‘radv_update_as_gfx12’:
../src/amd/vulkan/radv_acceleration_structure.c:873:70: warning: ‘layout.bounds_offsets’ may be used uninitialized [-Wmaybe-uninitialized]
  873 |       .bounds = state->build_info->scratchData.deviceAddress + layout.bounds_offsets,
      |                                                                ~~~~~~^~~~~~~~~~~~~~~
../src/amd/vulkan/radv_acceleration_structure.c:866:33: note: ‘layout.bounds_offsets’ was declared here
  866 |    struct update_scratch_layout layout;
      |                                 ^~~~~~
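A minimal sketch of the pattern behind the warning and of the fix. The struct contents, the fill helper and the offset value are hypothetical stand-ins; only the names `update_scratch_layout` and `bounds_offsets` come from the warning above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the real struct; only the field named in the
 * warning is modelled. */
struct update_scratch_layout {
   uint32_t bounds_offsets;
};

/* Hypothetical helper: fills the layout only on the BVH8 path, which is the
 * pattern that makes GCC suspect an uninitialized read on the other path. */
static void
fill_update_scratch_layout(bool use_bvh8, struct update_scratch_layout *layout)
{
   assert(layout != NULL);
   if (use_bvh8)
      layout->bounds_offsets = 256; /* arbitrary illustrative offset */
}

static uint64_t
bounds_address(bool use_bvh8, uint64_t scratch_va)
{
   /* Zero-initialize so every field has a defined value on both paths. */
   struct update_scratch_layout layout = {0};

   fill_update_scratch_layout(use_bvh8, &layout);
   return scratch_va + layout.bounds_offsets;
}
```

Without the `= {0}` initializer, the non-BVH8 path would read an indeterminate `bounds_offsets`, which is what the compiler is pointing at.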
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39228>
If the main CS is SDMA and the gang CS is ACE, this would emit an
SDMA_FENCE packet on ACE, which just hangs.
Fixes: b1938901d0 ("radv: Use SDMA fence packet when flushing gang semaphores")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39211>
Unifies NIR per-instruction float control.
In the future this can be split into contract/reassoc/transform
like SPIR-V.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (except SPIR-V)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39103>
It is more efficient to compute the child index of the current node
inside the parent node and write the bounds when available. The previous
code could load up to 16 AABBs to compute the new ones. The new code
also only needs 1/7 of the previously used scratch memory. The new code
seems to be around 30% faster (0.5ms) in GOTG on a 6700XT.
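A rough sketch of the idea, not the actual update shader: the node layout, the child count of 8 and both helper names are assumptions made for illustration. The current node looks up its own slot in the parent and writes its bounds there, so nobody has to reload every child's AABB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHILDREN 8 /* assumption: BVH8-style box node */

struct aabb {
   float min[3];
   float max[3];
};

/* Hypothetical box node: child ids plus one bounds slot per child. */
struct box_node {
   uint32_t child_ids[MAX_CHILDREN];
   struct aabb child_bounds[MAX_CHILDREN];
};

/* Returns the slot of node_id inside parent, or -1 if it is not a child. */
static int
find_child_index(const struct box_node *parent, uint32_t node_id)
{
   for (int i = 0; i < MAX_CHILDREN; i++) {
      if (parent->child_ids[i] == node_id)
         return i;
   }
   return -1;
}

/* Writes the freshly computed bounds of node_id into its slot in parent.
 * Returns the slot index, or -1 if nothing was written. */
static int
write_bounds_to_parent(struct box_node *parent, uint32_t node_id,
                       const struct aabb *bounds)
{
   assert(parent != NULL && bounds != NULL);
   int index = find_child_index(parent, node_id);
   if (index >= 0)
      parent->child_bounds[index] = *bounds;
   return index;
}
```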
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39139>
Implement a mitigation for VM faults caused by SMEM reading
out of bounds when using robust buffer access.
- Pad uniform and storage buffer allocations with a read-only VM page
- Clamp SMEM offsets that can potentially read past the next page
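How the two steps could fit together is sketched below. This is not the driver code: the 4096-byte padding size, the helper name and the assumption that the buffer ends exactly where the padding page begins are all hypothetical. An out-of-bounds read that still lands in the padding page is harmless, so only offsets that would reach beyond it are clamped.

```c
#include <assert.h>
#include <stdint.h>

#define PAD_PAGE_SIZE 4096u /* assumed size of the read-only padding page */

/* Hypothetical helper: returns an offset whose load_size-byte read ends no
 * later than the end of the padding page that follows the buffer. */
static uint32_t
clamp_smem_offset(uint32_t offset, uint32_t buffer_size, uint32_t load_size)
{
   assert(load_size <= PAD_PAGE_SIZE);

   /* Last offset at which the whole load still fits in buffer + padding. */
   uint64_t last_safe = (uint64_t)buffer_size + PAD_PAGE_SIZE - load_size;

   return offset > last_safe ? (uint32_t)last_safe : offset;
}
```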
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38769>
Implement a mitigation for VM faults caused by SMEM reading
from NULL descriptors.
In order to satisfy VKD3D-Proton's expectations on mutable
descriptors, we must do this in shader code; it is not
sufficient to use the address of a mapped BO when writing
null descriptors. It is not feasible to mitigate this
in VKD3D-Proton.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38769>
Map the first page of the same BO as read-only after the BO itself
in order to pad each BO with an extra page. This doesn't require
us to allocate any memory.
This is going to be used for a HW bug mitigation.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38769>
Much faster because CB is optimal with 2D swizzle modes. This isn't
applied for storage images because it depends on the access pattern,
and benchmark results are very different.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38084>