fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-18 05:08:06 +02:00

Author	SHA1	Message	Date
David Rosca	c9ade8c3b5	radeonsi/vcn: Enable VCN4 AV1 encode WA Cc: mesa-stable Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31889>	2024-11-01 14:05:04 +00:00
Samuel Pitoiset	7015e22cb6	ac/nir: cull triangles/lines when all W positions are zero/NaN It looks like the fixed-func hardware is very slow to cull primitives with zero pos.w but shader based culling helps a lot. This fixes a massive performance gap with the FSR2 demo compared to AMDGPU-PRO, +228% on RDNA2. Based on my investigation, AMDGPU-PRO seems to always cull these primitives. Note that disabling NGG culling with AMDGPU-PRO reports the same performance as RADV without that fix. Also note that the FSR2 sample doesn't specify any cull mode (ie. VK_CULL_MODE_NONE is used), so this is the only reason PRO was culling more than RADV. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7260 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31891>	2024-10-30 17:09:37 +00:00
Samuel Pitoiset	e14511f77d	ac/spm: do not abort when the SPM BO is too small It needs to be resized instead, like the SQTT BO. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31883>	2024-10-29 18:33:17 +00:00
Marek Olšák	4f096b994d	ac/nir,radeonsi: use load_cull_line_viewport_xy_scale_and_offset_amd Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31865>	2024-10-29 16:47:44 +00:00
Marek Olšák	0f39d44f1b	ac/nir,radeonsi: use load_cull_small_line_precision_amd Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31865>	2024-10-29 16:47:44 +00:00
Marek Olšák	10c6f87adb	ac/nir,radeonsi: use load_cull_small_lines_enabled_amd Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31865>	2024-10-29 16:47:44 +00:00
Marek Olšák	ee452129c6	nir: add cull_triangles_, cull_lines_ prefixes to viewport_xy_scale_and_offset for radeonsi Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31865>	2024-10-29 16:47:44 +00:00
Marek Olšák	2227f5be9d	nir: rename load_cull_small_primitive_precision -> triangle, add line_precision for radeonsi Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31865>	2024-10-29 16:47:44 +00:00
Marek Olšák	0914e0d02f	nir: rename load_cull_small_primitives -> triangles, add load_cull_small_lines for radeonsi Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31865>	2024-10-29 16:47:44 +00:00
Lu Yao	0442a6c292	ac/radeonsi: compute htile for tile mode RADEON_SURF_MODE_1D on GFX6-8 Computing 'htile_size/meta_size' is allowed for RADEON_SURF_MODE_1D when RADEON_SURF_TC_COMPATIBLE_HTILE isn't set. Not computing it causes performance degradation in some scenarios. Fixes: `d4d9ec55c5` ("radeonsi: implement TC-compatible HTILE") Signed-off-by: Lu Yao <yaolu@kylinos.cn> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31617>	2024-10-29 16:23:51 +00:00
Samuel Pitoiset	f7652de1f1	Revert "ac/surface: add RADEON_SURF_VIEW_3D_AS_2D_ARRAY for GFX9+" This reverts commit `dc5ef90547`. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31869>	2024-10-28 12:47:38 +00:00
Samuel Pitoiset	aa19bf3d93	amd/descriptors: set fmask_tile_swizzle for TC-compat CMASK images on GFX8 This is required. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31797>	2024-10-28 08:21:12 +01:00
Samuel Pitoiset	927a17f30a	amd: do not emit PA_SU_PRIM_FILTER_CNTL in the common GFX preamble RADV needs to adjust this register for user sample locations because it seems possible to have a sample on the -8 coordinate. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31815>	2024-10-25 07:41:22 +00:00
Daniel Schürmann	87cb42f953	treewide: don't lower to LCSSA before calling nir_divergence_analysis() Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30787>	2024-10-24 10:06:17 +00:00
Daniel Schürmann	8d1abd4996	treewide: use nir_src_is_divergent() rather than checking the divergence of the SSA Without LCSSA, divergence between src and def might differ. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30787>	2024-10-24 10:06:17 +00:00
Daniel Schürmann	c8348139fd	nir: change signature of nir_src_is_divergent() Now, it takes nir_src * instead of nir_src. Also move the implementation to nir_divergence_analysis.c. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30787>	2024-10-24 10:06:17 +00:00
Marek Olšák	45d8cd037a	ac/nir: rewrite ac_nir_lower_ps epilog to fix dual src blending with mono PS Unigine Heaven with AMD_DEBUG=mono has incorrect rendering on gfx11 because it doesn't set nir_io_semantics::dual_source_blend_index for the second output, resulting in garbage asm. Instead of trying to find out what's wrong, I decided to rewrite this to make it the same as the LLVM IR path. It simplifies the code and fixes Unigine Heaven with AMD_DEBUG=mono. Reviewed-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31669>	2024-10-21 12:06:14 +00:00
Pierre-Eric Pelloux-Prayer	5607c7ee49	ac/surface: fix determination of gfx12_enable_dcc For surfaces without a modifier, the surf_size check wasn't necessary, but it was also invalid since surf_size is set later (in gfx12_compute_miptree). Since it's not required anyway, drop this check. Fixes: `060d5dacfd` ("ac: add gfx12 DCC shared code") Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31683>	2024-10-18 14:04:04 +02:00
Georg Lehmann	cba575f4df	nir: always emit ddx intrinsics Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31014>	2024-10-17 09:50:19 +00:00
Marek Olšák	02923e237d	nir: add hole_size parameter into the vectorize callback It will be used to allow merging loads with a hole between them. Reviewed-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>	2024-10-15 05:50:24 +00:00
David Heidelberg	d14d3c5bdd	amd: Pass addrlib cpp args to the tests The declaration and definition used by tests otherwise differs from addrlib. Found by LTO -Werror=lto-type-mismatch. Fixes: `1d69c0419b` ("amd/addrlib: prevent defining regparm differently") Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: David Heidelberg <david@ixit.cz> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31613>	2024-10-14 16:52:31 +00:00
David Rosca	1e1f078099	radeonsi/vcn: Add support for VCN5 AV1 compound Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31520>	2024-10-10 09:02:21 +00:00
David Rosca	8b2f0fb574	radeonsi/vcn: Support raw packed headers for AV1 Same as H264/HEVC, we still write sequence header ourselves and slice header is sent to FW, everything else gets copied directly to output bitstream buffer. Fixes generating correct output with libva-utils/av1encode. Also fixes temporal delimiter insertion, it's no longer forced on every frame, but instead it lets application handle it. Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31520>	2024-10-10 09:02:21 +00:00
David Rosca	813812b925	radeonsi/vcn: Switch to app DPB management for AV1 Also move the common part of the frame header into shared function. Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31520>	2024-10-10 09:02:20 +00:00
Samuel Pitoiset	dc5ef90547	ac/surface: add RADEON_SURF_VIEW_3D_AS_2D_ARRAY for GFX9+ Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31385>	2024-10-01 08:33:51 +00:00
Marek Olšák	246051ebc6	ac/gpu_info: print 32bpp modifiers Reviewed-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31187>	2024-09-27 19:21:55 +00:00
Marek Olšák	89db355cc4	ac/llvm: use LLVM processor gfx942 for GFX940 when it's available Reviewed-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31187>	2024-09-27 19:21:55 +00:00
Marek Olšák	163222abd0	ac/nir: set .image_dim and .image_array for all opcodes for consistency Reviewed-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31187>	2024-09-27 19:21:55 +00:00
Marek Olšák	14b576e023	ac: make sure VEGA20 and MI200 version ranges don't overlap with other chips Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31187>	2024-09-27 19:21:55 +00:00
Georg Lehmann	151cd9c92b	ac/lower_ngg: use is_subgroup_invocation_lt_amd offset Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31184>	2024-09-26 14:29:14 +00:00
David Rosca	1459193b99	ac: Add VCN IB parser Reviewed-by: Leo Liu <leo.liu@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31275>	2024-09-23 19:25:08 +00:00
David Rosca	72ae8e25a8	ac: Add remaining VCN encode defines Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31113>	2024-09-20 06:58:29 +00:00
David Rosca	aed89d28d3	ac: Add ac_vcn_init_enc_cmds Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31113>	2024-09-20 06:58:29 +00:00
David Rosca	8ecad47695	ac: Fix typo RENCDOE -> RENCODE Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31113>	2024-09-20 06:58:29 +00:00
David Rosca	d6cf36b4d2	radeonsi/vcn: Add rc_per_pic_ex encode command This makes it a bit cleaner as VCN5 goes back to using base rc_per_pic. Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31113>	2024-09-20 06:58:29 +00:00
Georg Lehmann	2789cee0c0	amd/nir: add ac_nir_opt_shared_append Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31075>	2024-09-19 16:21:47 +00:00
Marek Olšák	0d8fe2d03b	ac/nir/meta: tune clear/copy_buffer performance for gfx6-10.3 Finally, old GPUs have optimal clear/copy_buffer performance, but only the top dGPU of each generation gets the best behavior. Other dGPUs might need slightly different conditions. APUs likely need very different conditions. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31082>	2024-09-17 15:19:32 -04:00
Ganesh Belgur Ramachandra	62592674e0	amd: fix incorrect PIPE_INTERLEAVE_BYTES size for CDNA chips The expected PIPE_INTERLEAVE_BYTES size is ADDR_PIPEINTERLEAVE_256B on gfx940 (or other CDNA based chips). Since CDNA based chips like gfx940 don't support image opcodes, it gets a gibberish value from the kernel. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30891>	2024-09-16 09:31:49 +00:00
Marek Olšák	1537b9355a	ac,radeonsi: update comments related to the L2 cache, use "L2", not "TC" "GL2" is also OK. "TC-compatible" is also OK. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30869>	2024-09-07 01:51:23 +00:00
Marek Olšák	1b94137039	ac/nir/meta: move the "skip compute if no DCC image stores" condition to common Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30869>	2024-09-07 01:51:23 +00:00
Marek Olšák	5250128c6a	ac: fix WAVES_PER_SH value for gfx12 not a serious issue because we only use it for PRIME without SDMA IIRC Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30869>	2024-09-07 01:51:23 +00:00
Timur Kristóf	79df320463	ac/nir: Move varying cost functions from radeonsi to common code. This code will be shared between RADV and RadeonSI. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28676>	2024-09-06 09:16:21 +00:00
Timur Kristóf	4d5bc893b4	ac/nir/tess: Remove no_inputs_in_lds. When there are no VS outputs, we expect that the drivers set the LS-HS vertex stride to zero, which will produce the same result as no_inputs_in_lds did. Remove the unnecessary code path from the output lowering. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30962>	2024-09-05 19:54:29 +00:00
Marek Olšák	52c41f25de	ac/nir/tess: don't allocate LDS for HS inputs that are passed via VGPRs Right now we don't allocate LDS for HS inputs when all HS inputs are passed via VGPRs. This changes it to skip allocating exactly the HS inputs passed via VGPRs by reducing the inputs_read mask to remove holes. radeonsi changes to the LDS allocation will be in a different MR. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30962>	2024-09-05 19:54:29 +00:00
Qiang Yu	588a65f29a	ac: do not lower some ops in nir_lower_packing AMD does not implement nir_op_pack_32_4x8_split, others are implemented, so don't lower them. Fixes: `0f937426cc` ("radeonsi: lower subgroup ops after wave size is known") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11781 Signed-off-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30885>	2024-08-30 05:46:51 +00:00
Yinjie Yao	7f1c0fbe61	radeonsi/vcn: Rename transform_skip_disabled and remove hardcoded value for VCN5 This fixes the HEVC encode corruption caused by a mismatch between the PPS header and the IB setting; the fix only applies to VCN5. Rename from transform_skip_dicarded to transform_skip_disabled. Signed-off-by: Yinjie Yao <yinjie.yao@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30930>	2024-08-30 01:17:22 +00:00
Samuel Pitoiset	2fda0db66f	ac,radeonsi,radv: add common GFX preambles RADV and RadeonSI have a few differences. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30789>	2024-08-27 14:14:57 +00:00
Samuel Pitoiset	80e8e18cc6	ac: add ac_gfx103_get_cu_mask_ps() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30789>	2024-08-27 14:14:57 +00:00
Benjamin Cheng	95a980b61f	radv/video: add event support for VCN4 This was the main missing piece for passing vulkan video CTS as the video firmwares couldn't do proper vulkan events. With new enough firmware this is now possible. Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30837>	2024-08-26 22:19:09 +00:00
Qiang Yu	58e412014a	ac,radv,radeonsi: stop using quad vote any/all when llvm ClustedAnd with bool argument and cluster_size==4 will be lowered to quad_vote_all. So does ALU nir_iand/ior op with bool src. OpenGL and Vulkan subgroup clustered_and tests with bool argument fail when using LLVM. It seems LLVM has bug when quad vote bool is in complex control flow. So stop using it for now. Signed-off-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30610>	2024-08-26 10:46:15 +08:00

2822 commits