This fixes a performance regression for games (eg. Youngblood) that
declare all images as concurrent. This is likely buggy for compute
queues but this just restores the previous behaviour for now.
Fixes: f4f096805b ("radv: fix TC-compat HTILE images with DST_OPTIMAL on the compute queue")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8351>
It wasn't expected to also enable inside render loops.
Fixes: 4bb92d9145 ("radv: enable TC-compat HTILE in GENERAL on GFX10+")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8351>
This was missing, it can be enabled with RADV_PERFTEST=tccompatcmask.
Note that this feature is still experimental.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8350>
Compute the number of components of the destination vector from the
bitsize when eg. a 16-bit vec2 vertex fetches is splitted. This is
because the dst will be a v1, so the p_create_vector should be created
from two v2b fro both sizes to match.
This prevents a regression from the next change which will split
typed vertex buffer loads on GFX6 and GFX10+.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8363>
This assumption becomes incorrect with RADV_PERFTEST=cswave32.
Games include Detroit: Become Human and Doom Eternal.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7918>
GFX10+ supports compressed writes to HTILE, so it should just work
to skip decompressions when transitioning from/to GENERAL.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8039>
Otherwise it's useless because we are unlikely to perform a
fast depth stencil clear.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8039>
This is probably rare but can happen if someone performs a depth-stencil
copy on the compute queue. This might work (untested by CTS) but it
looks more conservative to decompress before perfoming the operation.
Found by inspection.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8039>
We can only use the entire HTILE buffer if TILE_STENCIL_DISABLE is
TRUE. On GFX8+, this is only true if the depth image has no stencil
and if it's not TC-compatible because of the ZRANGE_PRECISION issue.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8039>
To make sure the stencil compare state is properly initialized and
cleared when the driver performs a fast depth clear.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8039>
Not doing this for APUs because spilling is quite likely, due to
overall VRAM pressure.
Also adding a flag to disable for performance debugging.
Finally adds some memset for places where we depended on the memory
being initialized to zero, which we won't get with VRAM anymore.
(I think these places should stop depending on it since it hides
issues with executing the cmdbuffer multiple times, but this
preserves behavior)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7979>
Now that this function does not block RegisterFile entries anymore,
the temporary copy is only needed upon reaching the collect_vars call.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8261>
ACO attempts to store the output of an instruction in the same register
occupied by its operands where possible. Importantly this only works if
the operands are large enough to store the result register size. The code
failed to consider subdword operands when checking for this, causing
entire register slots to be freed up even though subdword parts were still
used.
In Mafia 3, this affected the following code:
v2b: %363:v[2][0:16], v2b: %362:v[2][16:32] = p_split_vector %360:v[2]
v1: %116:v[2] = v_cvt_f32_f16 %362:v[2][16:32]
v1: %117:v[2] = v_cvt_f32_f16 %363:v[2][0:16]
where v[2] is allocated to %116 even though its original lower 16 bits are
still used in the instruction after.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3717
Fixes: 031edbc4a5
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7461>
The write() to the communication pipe shared with check_output.py would block
for large test output streams since the pipe's consumer wouldn't be launched
until the write already completed.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7461>
This new policy parameter allows disabling the optimistic path of get_reg
(i.e. get_reg_simple) to improve test coverage of the pessimistic path
provided by get_reg_impl.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7461>