Since cull_mask is only one byte, we can trivially store it in the same
register as the flags. This leaves us with a 2% performance gain in
Quake II RTX:
Totals from 7 (14.00% of 50) affected shaders:
VGPRs: 720 -> 688 (-4.44%)
CodeSize: 213052 -> 212980 (-0.03%); split: -0.05%, +0.02%
MaxWaves: 67 -> 70 (+4.48%)
Instrs: 39429 -> 39394 (-0.09%); split: -0.15%, +0.06%
Latency: 1096258 -> 1096943 (+0.06%); split: -0.05%, +0.11%
InvThroughput: 230661 -> 222963 (-3.34%); split: -3.42%, +0.08%
VClause: 1208 -> 1206 (-0.17%); split: -0.25%, +0.08%
Copies: 5321 -> 5269 (-0.98%); split: -1.22%, +0.24%
Branches: 1903 -> 1902 (-0.05%)
PreVGPRs: 650 -> 645 (-0.77%)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21470>
This will emulate VGT_ESGS_RING_ITEMSIZE, which does the multiplication
for us. It's beneficial to stop setting VGT_ESGS_RING_ITEMSIZE to reduce
context rolls, and also the register will be removed in the future.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21525>
For testing, the conformant behavior can be enabled by setting
conformant_trunc_coord to true manually and running this to enable
the conformant behavior in hw:
umr -w *.*.regTA_CNTL2 0x40000
The layer index rounding and TRUNC_COORD resetting workarounds can disabled
in the shader compiler.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21525>
On GFX10.3, all auxiliary position exports are optimized, so set it
for clip/cull distances. Both RadeonSI and llpc set it too.
Suggested by Marek.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21439>
This avoids waiting for instance_data which can improve performance:
vk_ray_tracing_ao_KHR_app: 0.2% (The TLAS has 2 instances)
Quake II RTX: 1%
Control: 1%
We also have to shuffle around some code to avoid increasing VGPR usage.
That leaves us with the following stats:
Quake II RTX:
Totals from 7 (14.29% of 49) affected shaders:
CodeSize: 165612 -> 165716 (+0.06%)
Instrs: 31446 -> 31460 (+0.04%)
Latency: 596709 -> 554292 (-7.11%)
InvThroughput: 121998 -> 113327 (-7.11%)
VClause: 596 -> 587 (-1.51%)
Copies: 4664 -> 4646 (-0.39%)
PreVGPRs: 620 -> 639 (+3.06%)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21421>
For weird reasons, the hardware doesn't return any data for other SEs.
RadeonSI is also affected by the same issue, enable only SE0 for now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20338>
This is to make dEQP-VK.api.buffer.basic.size_max_uint64 pass on
android.
The test creates a buffer of size UINT64_MAX and makes sure the memory
requirement for the buffer is sane. It fails because our memory
requirement is "align64(UINT64_MAX, 16)" which is 0 after overflow.
The test checks maintenance4's maxBufferSize and is skipped normally.
But the extension can be disabled on an android build.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21346>
This is a small optimization for fragment shaders that only write
depth/stencil/sample mask without any color outputs.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21341>
The blend state is emitted from the command buffer when the FS uses
an epilog (either compiled from a lib with GPL or compiled on-demand).
This shouldn't change anything but it will allow to disable using a
PS epilog when the fragment shader doesn't write any color outputs.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21341>
AMDGPU pstate is per context but if there is multiple AMDGPU contexts
in flight, the kernel can return -EBUSY.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21222>
This change is wrong for two reasons:
- it hangs most of the time maybe, because changing PSTATE when the
application is running is broken somehow
- it increases the time between triggering and generating the capture
considerably, because there is a delay for changing PSTATE
This restores previous logic where PSTATE is set to profile_peak at
logical device creation. Though, it also re-introduces an issue with
multiple logical devices (kernel returns -EBUSY) but this will be
fixed in the next commit.
This fixes GPU hangs when trying to record RGP captures on my NAVI21.
Note that profile_peak is only required for some RDNA2 chips (including
VanGogh).
Cc: mesa-stable
This reverts commit 923a864d94.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21222>
Using 8191.875 seems to big for the hardware to correctly render wide
rectangular lines. This can also be reproduced with AMDVLK by forcing
rectangularLines = True, and fixed by reducing the maximum size as well.
Other drivers seem to expose that value.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21287>