We never took the time to actually test this, but it works fine.
Improves performance on Navi 10 in the following test cases:
Baldur's Gate 3 Vulkan: up to 10%
Witcher 3 D3D11: around 4%
Granite primitive stress test: 107%
FSR2 sample app: 57%
Notes:
NGG is still disabled on Navi 14.
Not tested on Navi 12.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31971>
Helps performance in Baldur's Gate 3 on Navi 10
when NGG culling is enabled.
Also fix the description of the RADV_PERFTEST=nggc env var.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31971>
\[WHY\]
For color fill only case we see performance on Vpe1.1 are not doubled due
to CD are all 0, no odd CD
\[HOW\]
1. Dummy stream dst rect should be in the middle of target rect so the
two (dummy seg + bg only seg) are balanced, instead of target at upper
left corner which makes it imbalance
2. BG gap generation should consider more for collaboration mode
num_multiple
3. When pure bg case, skip dummy stream handling and go ahead do BG gap
generation
4. Update memory requirement for the new pure BG case flow to avoid run out of embedded buffer
4. Additional -- fix the random Collaborate data generation bug (benign)
\[TESTING\]
Vpelibtest app + nv12torgb case with debug flag bgcolorfill set on in
vpelibtestapp
Media player with/without bgcolorfillonly flag
Teams
Reviewed-by: Roy Chan <Roy.Chan@amd.com>
Reviewed-by: Navid Assadian <Navid.Assadian@amd.com>
Acked-by: Chenyu Chen <Chen-Yu.Chen@amd.com>
Signed-off-by: Tomson Chang <tomson.chang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31918>
\[WHY\]
1.Remove TODO comments that don't need action item
2.Delete the legacy command number check as it is now using a vector (i.e. without hard limit)
\[HOW\]
Remove TODO comments and delete the legacy command number check
Signed off by <tvisan@amd.com>
Reviewed-by: Roy Chan <Roy.Chan@amd.com>
Reviewed-by: Jesse Agate <Jesse.Agate@amd.com>
Acked-by: Chenyu Chen <Chen-Yu.Chen@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31918>
According to PAL, an image with DCC/HTILE and mipmaps isn't coherent
with L2 when the mip level is in the metadata mip-tail region.
This fix isn't super optimal because the driver should rely on the
subresource range to determine if the mip level is in the mip-tail,
but it's easier to backport. Upcoming commits will optimize that.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11939
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31920>
The compute stage has two EXCP_EN fields and the memory violation bit
is in EXCP_EN_MSB. Confirmed by writing a small test on GFX8.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31902>
This fixes a CTS hang on Hawaii.
We previously only did a CB/DB flush,
but that doesn't include a L2 cache flush.
Also fix the comment that said this is for GFX9+.
Fixes: 7c62f6fa01
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31906>
It looks like the fixed-func hardware is very slow to cull primitives
with zero pos.w but shader based culling helps a lot.
This fixes a massive performance gap with the FSR2 demo compared to
AMDGPU-PRO, +228% on RDNA2.
Based on my investigation, AMDGPU-PRO seems to always cull these
primitives. Note that disabling NGG culling with AMDGPU-PRO reports the
same performance as RADV without that fix. Also note that the FSR2
sample doesn't specify any cull mode (ie. VK_CULL_MODE_NONE is used),
so this is the only reason PRO was culling more than RADV.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7260
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31891>
This used to abort (see the previous commit) when the hardware wasn't
able to sample all SPM counters because the BO was too small. The SPM
BO can now be resized like the SQTT BO.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31883>
Computing 'htile_size/meta_size' is allowed for RADEON_SURF_MODE_1D when
RADEON_SURF_TC_COMPATIBLE_HTILE isn't set.
Lacking of computing causes performance degradation in some scenarios.
Fixes: d4d9ec55c5 ("radeonsi: implement TC-compatible HTILE")
Signed-off-by: Lu Yao <yaolu@kylinos.cn>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31617>