Not doing this for APUs because spilling is quite likely, due to
overall VRAM pressure.
Also adding a flag to disable for performance debugging.
Finally adds some memset for places where we depended on the memory
being initialized to zero, which we won't get with VRAM anymore.
(I think these places should stop depending on it since it hides
issues with executing the cmdbuffer multiple times, but this
preserves behavior)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7979>
Now that this function does not block RegisterFile entries anymore,
the temporary copy is only needed upon reaching the collect_vars call.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8261>
ACO attempts to store the output of an instruction in the same register
occupied by its operands where possible. Importantly this only works if
the operands are large enough to store the result register size. The code
failed to consider subdword operands when checking for this, causing
entire register slots to be freed up even though subdword parts were still
used.
In Mafia 3, this affected the following code:
v2b: %363:v[2][0:16], v2b: %362:v[2][16:32] = p_split_vector %360:v[2]
v1: %116:v[2] = v_cvt_f32_f16 %362:v[2][16:32]
v1: %117:v[2] = v_cvt_f32_f16 %363:v[2][0:16]
where v[2] is allocated to %116 even though its original lower 16 bits are
still used in the instruction after.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3717
Fixes: 031edbc4a5
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7461>
The write() to the communication pipe shared with check_output.py would block
for large test output streams since the pipe's consumer wouldn't be launched
until the write already completed.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7461>
This new policy parameter allows disabling the optimistic path of get_reg
(i.e. get_reg_simple) to improve test coverage of the pessimistic path
provided by get_reg_impl.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7461>
Fixes dEQP-VK.api.copy_and_blit.*.4_bit. I think the MSAA2x and
MSAA8x just passed by luck.
Fixes: 7b21ce401f ("radv: disable FMASK compression when drawing with GENERAL layout")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7915>
It was disabled because some CTS failed but they pass now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8189>
This can result in meaningful compression changes so we shouldn't skip.
Fixes: 66131ceb8b "radv: Pass through render loop detection to internal layout decisions."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7004>
All depth/stencil formats are incompatible each others, so the
mutable bit and the image format list can be ignored.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8126>
Depth/stencil resolves are only allowed inside a subpass, which means
the offset is always 0 and the draw/dispatch covers the whole image.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8127>
It should actually be 4 because the maximum fragment size supported
by the hardware is 2x2.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8100>
LLVM (like NIR) requires phi instructions to be before any other
instructions in the block. ac_branch_exited() can insert non-phi
instructions before visit_block() adds phis, so visit_block() should add
phi instructions before the non-phi instructions ac_branch_exited()
inserts.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes: aa757f4f8c ("ac/llvm: fix demote inside conditional branches")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8054>
This was disabled due to some depth/stencil resolve CTS failures
which are now fixed.
I figured that disabling TC-compat HTILE for D32_SFLOAT+MSAA reduced
performance in Control by -11% on Vega10. In fact, the game only uses
D32_SFLOAT for depth rendering.
This gives a huge boost in Control on Navi10 (eg. +17% in MSAA4x).
Note that the game is still slower than PRO without MSAA on Navi10,
but as fast (or even a bit faster) on Vega10.
I think TC-compat HILE could also be enabled for D32_SFLOAT_S8_UINT
but it needs more testing first.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8143>
Since we're requiring the branch condition to be in WQM, we have to ensure
that the block is in the worklist.
Fixes Trials Fusion hang at 4K and High settings.
fossil-db (Sienna):
Totals from 216 (0.15% of 139391) affected shaders:
SGPRs: 13392 -> 13360 (-0.24%)
CodeSize: 1321184 -> 1318592 (-0.20%)
Instrs: 255310 -> 254662 (-0.25%)
Cycles: 2178360 -> 2174652 (-0.17%)
Affected fossils in fossil-db are dirt4, nier and youngblood.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3863
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8145>
I thought this was a bug in CTS but the Vulkan spec says:
"VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT specifies write access
to a color, resolve, or depth/stencil resolve attachment during
a render pass or via certain subpass load and store operations."
So, VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT is used to synchronize
depth/stencil resolve attachments. Yes, it's counterintuitive.
This can't actually be fixed properly for now because RADV performs
the end subpass barrier *before* resolve attachments instead of after.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8138>
In case one operand was renamed and another operand came
from an incomplete phi, it could happen, that the original
name was not restored.
This has no impact on the code, but ensures correct SSA
is maintained during RA.
Cc: mesa-stable
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8109>
The small DCE of the spiller only removes the original instructions
of rematerialized variables in case they are unused. If a variable
has been renamed, it cannot match any original instruction anymore.
Thus, the lookup is then unnecessary and can be omitted.
No fossil-db changes.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8055>
Without it, FragCoord.z will have the value of one of the fine pixels
instead of the center of the coarse pixel.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7837>