because the hw and LLVM only support subdword single-component SSBO loads,
and ac_nir_to_llvm splits multi-component loads because of that, which is
inefficient.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19399>
This fixes broken subdword UBO loads with LLVM.
It's only needed for LLVM, but it's done for both LLVM and ACO because
the pass can be fully validated only with ACO and the Vulkan CTS right now.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19399>
Instead of having ac_set_reg_cu_en that sets the register, replace it with
ac_apply_cu_en that only returns the modified register value,
which allows a large simplification in both drivers because a lot of code
becomes duplicated after it's switched to ac_apply_cu_en.
RADV also didn't apply it to a few registers. Fixed.
This removes 82 lines of code in total.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21641>
Instead, check if the cache is the meta shader cache. This catches the
shaders created by radv_create_radix_sort_u64().
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21606>
After cross-checking with kernel and the old buffer copy code, it seems
that the size field should be size - 1 instead.
Fixes: 7b5ab48c40 ("radv: partial sdma support")
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21585>
Now that the upload memory is tied to the shader itself, the trap handler
shader no longer needs an additional wrapper.
This is a cleanup to ease introduction of a new shader uploading code path.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21541>
The correct workaround is to bind an internal index buffer to handle
robustness2 correctly.
Fixes dEQP-VK.robustness.index_access.* in CTS 1.3.5.0 on NAVI10.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21471>
Fixes
dEQP-VK.draw.dynamic_rendering.complete_secondary_cmd_buff.multi_draw.mosaic.*
on VEGA10 (related to the use of HTILE).
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21549>
RGP traces need a dump of shader code in order to display ISA and
instruction trace. Previously, this was read back from GPU at trace
creation time. However, for future changes that implements upload shader
to invisible VRAM, the upload destination will be a temporary staging
buffer and will be only accessible during shader creation.
To allow dumping in such cases, copy the shader code to a separate buffer
at creation time, if thread tracing is enabled.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21513>
Since cull_mask is only one byte, we can trivially store it in the same
register as the flags. This leaves us with a 2% performance gain in
Quake II RTX:
Totals from 7 (14.00% of 50) affected shaders:
VGPRs: 720 -> 688 (-4.44%)
CodeSize: 213052 -> 212980 (-0.03%); split: -0.05%, +0.02%
MaxWaves: 67 -> 70 (+4.48%)
Instrs: 39429 -> 39394 (-0.09%); split: -0.15%, +0.06%
Latency: 1096258 -> 1096943 (+0.06%); split: -0.05%, +0.11%
InvThroughput: 230661 -> 222963 (-3.34%); split: -3.42%, +0.08%
VClause: 1208 -> 1206 (-0.17%); split: -0.25%, +0.08%
Copies: 5321 -> 5269 (-0.98%); split: -1.22%, +0.24%
Branches: 1903 -> 1902 (-0.05%)
PreVGPRs: 650 -> 645 (-0.77%)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21470>
This will emulate VGT_ESGS_RING_ITEMSIZE, which does the multiplication
for us. It's beneficial to stop setting VGT_ESGS_RING_ITEMSIZE to reduce
context rolls, and also the register will be removed in the future.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21525>
For testing, the conformant behavior can be enabled by setting
conformant_trunc_coord to true manually and running this to enable
the conformant behavior in hw:
umr -w *.*.regTA_CNTL2 0x40000
The layer index rounding and TRUNC_COORD resetting workarounds can disabled
in the shader compiler.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21525>