Commit graph

16124 commits

Author SHA1 Message Date
Timur Kristóf
766617e8da radv: Enable NGG culling by default on GFX10.
We never took the time to actually test this, but it works fine.
Improves performance on Navi 10 in the following test cases:

Baldur's Gate 3 Vulkan: up to 10%
Witcher 3 D3D11: around 4%
Granite primitive stress test: 107%
FSR2 sample app: 57%

Notes:
NGG is still disabled on Navi 14.
Not tested on Navi 12.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31971>
2024-11-06 03:16:54 +00:00
Timur Kristóf
6bf19b2d70 radv: Increase NGG culling PS param limit to 12 on GFX10.
Helps performance in Baldur's Gate 3 on Navi 10
when NGG culling is enabled.

Also fix the description of the RADV_PERFTEST=nggc env var.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31971>
2024-11-06 03:16:53 +00:00
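For readers unfamiliar with the RADV_PERFTEST=nggc variable mentioned above: RADV_PERFTEST takes a comma-separated list of flags. The snippet below is only an illustrative sketch of building such an environment for a child process. The helper name `with_perftest_flag` is hypothetical and is not part of Mesa.

```python
def with_perftest_flag(env, flag):
    """Return a copy of env with flag added to the comma-separated RADV_PERFTEST list."""
    new_env = dict(env)
    # Split the existing value and drop empty entries (e.g. when the variable is unset).
    flags = [f for f in new_env.get("RADV_PERFTEST", "").split(",") if f]
    if flag not in flags:
        flags.append(flag)
    new_env["RADV_PERFTEST"] = ",".join(flags)
    return new_env

env = with_perftest_flag({}, "nggc")
print(env["RADV_PERFTEST"])  # nggc
```

The resulting dictionary could be passed as the `env` argument of `subprocess.run` when launching a Vulkan application.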
Evan
c3c80491f9 amd/vpelib: Input Format Adjustment
Reviewed-by: Jiali Zhao <Jiali.Zhao@amd.com>
Reviewed-by: Jesse Agate <Jesse.Agate@amd.com>
Acked-by: Chenyu Chen <Chen-Yu.Chen@amd.com>
Signed-off-by: Evan <evan.damphousse@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31918>
2024-11-06 02:19:39 +00:00
Chang, Tomson
d1b790c028 amd/vpelib: Fix color fill performance issue on VPE1.1 (#419)
[WHY]
For the color-fill-only case, performance on VPE1.1 is not doubled because
the CDs are all 0, with no odd CD

[HOW]
1. The dummy stream dst rect should be in the middle of the target rect so the
two segments (dummy seg + bg only seg) are balanced, instead of at the upper
left corner of the target, which makes them imbalanced
2. BG gap generation should take the collaboration mode num_multiple into
account
3. In the pure BG case, skip dummy stream handling and go straight to BG gap
generation
4. Update the memory requirement for the new pure BG case flow to avoid running out of embedded buffer
5. Additional: fix the random Collaborate data generation bug (benign)

[TESTING]
Vpelibtest app + nv12torgb case with debug flag bgcolorfill set on in
vpelibtestapp
Media player with/without bgcolorfillonly flag
Teams

Reviewed-by: Roy Chan <Roy.Chan@amd.com>
Reviewed-by: Navid Assadian <Navid.Assadian@amd.com>
Acked-by: Chenyu Chen <Chen-Yu.Chen@amd.com>
Signed-off-by: Tomson Chang <tomson.chang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31918>
2024-11-06 02:19:39 +00:00
Visan, Tiberiu
4661bf3659 amd/vpelib: Remove TODO comments and legacy check (#421)
[WHY]
1. Remove TODO comments that need no action
2. Delete the legacy command number check, as commands are now stored in a vector (i.e. without a hard limit)

[HOW]
Remove TODO comments and delete the legacy command number check

Signed off by <tvisan@amd.com>

Reviewed-by: Roy Chan <Roy.Chan@amd.com>
Reviewed-by: Jesse Agate <Jesse.Agate@amd.com>
Acked-by: Chenyu Chen <Chen-Yu.Chen@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31918>
2024-11-06 02:19:39 +00:00
Chenyu Chen
e0754a6dc7 amd/vpelib: Remove unused define macro
Reviewed-by: Roy Chan <Roy.Chan@amd.com>
Acked-by: Chenyu Chen <Chen-Yu.Chen@amd.com>
Signed-off-by: Chenyu Chen <Chen-Yu.Chen@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31918>
2024-11-06 02:19:39 +00:00
Samuel Pitoiset
64774f9c19 radv: cleanup tools related resources when destroying logical device
This was missing.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31986>
2024-11-05 15:31:00 +00:00
Marek Olšák
5882b5b93b amd/ci: adjust stoney traces checksums
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31968>
2024-11-05 14:13:40 +00:00
Samuel Pitoiset
32a537b25b aco: use inlined constant offsets for storing SGPRs in the trap handler
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31976>
2024-11-05 11:55:24 +00:00
Samuel Pitoiset
9bcf17ef5a aco: add support for the trap handler shader on GFX11
This has been verified on navi31.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31960>
2024-11-05 07:58:38 +00:00
Samuel Pitoiset
6d5a2ae928 aco: clear the current wave exception in the trap handler
This is required to re-enable VALU instructions in this wave; only
float exceptions seem to be affected.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31960>
2024-11-05 07:58:38 +00:00
Samuel Pitoiset
e85fc0f869 aco: fix validation for VOP1 instructions without any dest/src
Like v_clrexcp.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31960>
2024-11-05 07:58:38 +00:00
Samuel Pitoiset
81f4670ed6 radv,aco: dump all SGPRs from the trap handler
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31960>
2024-11-05 07:58:38 +00:00
Samuel Pitoiset
45d56d9395 radv: set missing shader info values for the trap handler
This fixes an assert in radv_precompute_registers_pgm() on GFX11
because it was considered a vertex shader.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31962>
2024-11-05 07:11:30 +00:00
Marek Olšák
755fb7a262 amd: move Tonga and Iceland TC-compat HTILE workarounds to ac_gpu_info.c
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31910>
2024-11-04 19:45:54 +00:00
Georg Lehmann
a7f6294f90 radv: use nir_opt_frag_coord_to_pixel_coord
Foz-DB Navi21:
Totals from 1648 (2.08% of 79395) affected shaders:
MaxWaves: 44918 -> 44950 (+0.07%); split: +0.09%, -0.02%
Instrs: 1004193 -> 1001179 (-0.30%); split: -0.33%, +0.03%
CodeSize: 5486412 -> 5486592 (+0.00%); split: -0.08%, +0.09%
VGPRs: 56664 -> 56552 (-0.20%); split: -0.93%, +0.73%
Latency: 15430894 -> 15435320 (+0.03%); split: -0.12%, +0.15%
InvThroughput: 3097789 -> 3092861 (-0.16%); split: -0.20%, +0.04%
VClause: 18757 -> 18793 (+0.19%); split: -0.13%, +0.32%
SClause: 34475 -> 34495 (+0.06%); split: -0.11%, +0.17%
Copies: 66195 -> 66150 (-0.07%); split: -0.88%, +0.81%
Branches: 23035 -> 23033 (-0.01%)
PreVGPRs: 42235 -> 41724 (-1.21%); split: -1.32%, +0.11%
VALU: 709730 -> 706662 (-0.43%); split: -0.47%, +0.04%
SALU: 111731 -> 111722 (-0.01%); split: -0.02%, +0.01%
VMEM: 25988 -> 25987 (-0.00%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31864>
2024-11-04 12:34:31 +00:00
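The percentages in the Foz-DB report above are relative changes between the before and after totals. A minimal sketch of that arithmetic, using the Instrs and PreVGPRs rows from the commit message. The helper name `relative_change` is made up for illustration and is not part of any Mesa tooling.

```python
def relative_change(before: int, after: int) -> float:
    """Return the change from before to after as a percentage of before."""
    return (after - before) / before * 100.0

# Totals copied from the Foz-DB Navi21 report in the commit message.
instrs = relative_change(1004193, 1001179)
pre_vgprs = relative_change(42235, 41724)

print(f"Instrs: {instrs:+.2f}%")      # Instrs: -0.30%
print(f"PreVGPRs: {pre_vgprs:+.2f}%")  # PreVGPRs: -1.21%
```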
Georg Lehmann
a58d2b59e9 aco: implement load_pixel_coord
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31864>
2024-11-04 12:34:30 +00:00
Georg Lehmann
42d5cb62bb ac/llvm: implement load_pixel_coord
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31864>
2024-11-04 12:34:30 +00:00
Georg Lehmann
a2a9e93e72 radv: add support for load_pixel_coord
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31864>
2024-11-04 12:34:30 +00:00
Samuel Pitoiset
1fa0fe1e0c aco: add support for the trap handler shader on GFX9-GFX10.3
This has been tested on navi21.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31926>
2024-11-04 10:48:52 +00:00
Samuel Pitoiset
281eb14df8 aco: fix reading registers from the trap handler shader
It should read 32-bit values, otherwise some MSBs are 0 and some
information is missing.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31926>
2024-11-04 10:48:52 +00:00
Samuel Pitoiset
f7636b611a radv: add a struct that describes the trap handler layout
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31934>
2024-11-01 15:40:25 +00:00
Samuel Pitoiset
49682fc0cb radv,aco: save SQ_WAVE_GPR_ALLOC from the trap handler
This would be used to dump SGPRs.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31934>
2024-11-01 15:40:25 +00:00
Samuel Pitoiset
31fc3199dd radv: fix dumping the faulty shader detected by the trap handler on GFX9+
The most significant bits need to be cleared.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31925>
2024-11-01 15:01:35 +00:00
Samuel Pitoiset
7b4da7f736 radv: only emit the TBA/TMA registers on GFX8
On GFX9+, these registers are privileged and the kernel needs to
configure them.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31925>
2024-11-01 15:01:35 +00:00
Samuel Pitoiset
930395c5e4 radv: check for has_trap_handler_support instead of asserting
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31925>
2024-11-01 15:01:35 +00:00
Samuel Pitoiset
e27ba67d33 ac: add ac_gpu_info::has_trap_handler_support
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31925>
2024-11-01 15:01:35 +00:00
Samuel Pitoiset
b23cc8c1d3 radv: add missing L2 non-coherent image case for mipmaps with DCC/HTILE on GFX11
According to PAL, an image with DCC/HTILE and mipmaps isn't coherent
with L2 when the mip level is in the metadata mip-tail region.

This fix isn't super optimal because the driver should rely on the
subresource range to determine if the mip level is in the mip-tail,
but it's easier to backport. Upcoming commits will optimize that.

Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11939
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31920>
2024-11-01 14:36:55 +00:00
David Rosca
c9ade8c3b5 radeonsi/vcn: Enable VCN4 AV1 encode WA
Cc: mesa-stable
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31889>
2024-11-01 14:05:04 +00:00
Samuel Pitoiset
01f329ec82 radv/ci: skip dEQP-VK.api.command_buffers.many_indirect_disps_on_secondary
It can also hang randomly on VanGogh, let's skip it by default for now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31922>
2024-10-31 11:44:12 +00:00
Samuel Pitoiset
77e59eefc1 radv: add an option to configure the trap handler exceptions
This introduces RADV_TRAP_HANDLER_EXCP to configure the various
shader exceptions.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31902>
2024-10-31 06:58:15 +00:00
Samuel Pitoiset
6b5a0f57ba radv: fix configuring the memory violation exception for the compute stage
The compute stage has two EXCP_EN fields and the memory violation bit
is in EXCP_EN_MSB. Confirmed by writing a small test on GFX8.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31902>
2024-10-31 06:58:14 +00:00
Timur Kristóf
96b95c8427 radv: Flush L2 cache for non-L2-coherent images in EndCommandBuffer.
This fixes a CTS hang on Hawaii.

We previously only did a CB/DB flush,
but that doesn't include an L2 cache flush.
Also fix the comment that said this is for GFX9+.

Fixes: 7c62f6fa01
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31906>
2024-10-30 17:46:50 +00:00
Samuel Pitoiset
7015e22cb6 ac/nir: cull triangles/lines when all W positions are zero/NaN
It looks like the fixed-func hardware is very slow to cull primitives
with zero pos.w but shader based culling helps a lot.

This fixes a massive performance gap with the FSR2 demo compared to
AMDGPU-PRO, +228% on RDNA2.

Based on my investigation, AMDGPU-PRO seems to always cull these
primitives. Note that disabling NGG culling with AMDGPU-PRO reports the
same performance as RADV without that fix. Also note that the FSR2
sample doesn't specify any cull mode (i.e. VK_CULL_MODE_NONE is used),
so this is the only reason PRO was culling more than RADV.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7260
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31891>
2024-10-30 17:09:37 +00:00
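The culling rule described in the commit message above can be illustrated outside of NIR. The following is a sketch only, not the ac/nir implementation: the function name and the list-of-floats representation are assumptions made for illustration. A primitive is rejected when every vertex has a W position that is zero or NaN.

```python
import math

def all_w_zero_or_nan(w_positions):
    """True when every vertex W is zero or NaN, i.e. the primitive can be culled."""
    return all(w == 0.0 or math.isnan(w) for w in w_positions)

# One vertex has a valid W, so the triangle must be kept.
print(all_w_zero_or_nan([0.0, float("nan"), 1.0]))   # False
# Every W is zero (including negative zero) or NaN, so the triangle is culled.
print(all_w_zero_or_nan([0.0, float("nan"), -0.0]))  # True
```

The same predicate applies to lines, which simply have two W values instead of three.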
Samuel Pitoiset
fc0545e6a7 radv: fix wrong index in radv_skip_graphics_pipeline_compile()
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12089
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31901>
2024-10-30 11:25:59 +00:00
Daniel Schürmann
62715984f8 aco/README: add descriptions of recently added passes
... and less recent ones.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31888>
2024-10-30 09:23:54 +00:00
Daniel Schürmann
21ceeb22ed aco: move jump threading optimization into separate pass
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31888>
2024-10-30 09:23:54 +00:00
Daniel Schürmann
87a3c08df1 aco/ssa_elimination: remove some redundant checks during jump threading
Since phis have already been lowered to parallelcopies by this point,
there is no need to cross-check.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31888>
2024-10-30 09:23:54 +00:00
Daniel Schürmann
a6c38f706d aco/ssa_elimination: perform jump threading after parallelcopy insertion
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31888>
2024-10-30 09:23:54 +00:00
Samuel Pitoiset
4459a1d210 radv: resize the SPM bo when it's too small
This used to abort (see the previous commit) when the hardware wasn't
able to sample all SPM counters because the BO was too small. The SPM
BO can now be resized like the SQTT BO.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31883>
2024-10-29 18:33:17 +00:00
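The behaviour change described above (grow the buffer and retry rather than abort) follows a common pattern. A hedged sketch of that pattern is shown below. The function name, the `needed` parameter and the doubling growth policy are hypothetical and do not come from RADV, which may size the new BO differently.

```python
def capture_with_resize(needed: int, initial_size: int, max_size: int) -> int:
    """Grow a buffer until a capture of `needed` bytes fits; return the final size.

    Raises RuntimeError if the buffer would have to exceed max_size.
    """
    size = initial_size
    while size < needed:  # the capture did not fit in the current buffer
        if size * 2 > max_size:
            raise RuntimeError("buffer cannot grow any further")
        size *= 2  # assumed growth policy: double the size and retry
    return size

print(capture_with_resize(needed=5000, initial_size=1024, max_size=1 << 20))  # 8192
```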
Samuel Pitoiset
e14511f77d ac/spm: do not abort when the SPM BO is too small
It needs to be resized instead, like the SQTT BO.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31883>
2024-10-29 18:33:17 +00:00
Marek Olšák
4f096b994d ac/nir,radeonsi: use load_cull_line_viewport_xy_scale_and_offset_amd
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31865>
2024-10-29 16:47:44 +00:00
Marek Olšák
0f39d44f1b ac/nir,radeonsi: use load_cull_small_line_precision_amd
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31865>
2024-10-29 16:47:44 +00:00
Marek Olšák
10c6f87adb ac/nir,radeonsi: use load_cull_small_lines_enabled_amd
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31865>
2024-10-29 16:47:44 +00:00
Marek Olšák
ee452129c6 nir: add cull_triangles_, cull_lines_ prefixes to viewport_xy_scale_and_offset
for radeonsi

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31865>
2024-10-29 16:47:44 +00:00
Marek Olšák
2227f5be9d nir: rename load_cull_small_primitive_precision -> triangle, add line_precision
for radeonsi

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31865>
2024-10-29 16:47:44 +00:00
Marek Olšák
0914e0d02f nir: rename load_cull_small_primitives -> triangles, add load_cull_small_lines
for radeonsi

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31865>
2024-10-29 16:47:44 +00:00
Lu Yao
0442a6c292 ac/radeonsi: compute htile for tile mode RADEON_SURF_MODE_1D on GFX6-8
Computing 'htile_size/meta_size' is allowed for RADEON_SURF_MODE_1D when
RADEON_SURF_TC_COMPATIBLE_HTILE isn't set.
Not computing it causes performance degradation in some scenarios.

Fixes: d4d9ec55c5 ("radeonsi: implement TC-compatible HTILE")
Signed-off-by: Lu Yao <yaolu@kylinos.cn>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31617>
2024-10-29 16:23:51 +00:00
Georg Lehmann
938f5ec7ce radv: use nir_opt_fragdepth
Cyberpunk 2077 writes unmodified depth.

Foz-DB Navi21:
Totals from 28 (0.04% of 79395) affected shaders:
Instrs: 6484 -> 6448 (-0.56%)
CodeSize: 36016 -> 35784 (-0.64%)
Latency: 58517 -> 58400 (-0.20%)
InvThroughput: 7719 -> 7717 (-0.03%)
Branches: 129 -> 119 (-7.75%)
PreVGPRs: 394 -> 372 (-5.58%)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31874>
2024-10-29 15:15:24 +00:00
Georg Lehmann
695d2414cd nir,radv: optimize shared atomic offsets
Foz-DB Navi21:
Totals from 87 (0.11% of 79395) affected shaders:
Instrs: 140877 -> 140873 (-0.00%)
CodeSize: 747760 -> 747164 (-0.08%); split: -0.09%, +0.01%
Latency: 4528171 -> 4528162 (-0.00%)
InvThroughput: 826358 -> 826349 (-0.00%)
Copies: 10888 -> 10884 (-0.04%)
VALU: 84634 -> 84630 (-0.00%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31080>
2024-10-29 09:31:08 +00:00