It looks like the fixed-func hardware is very slow to cull primitives
with zero pos.w but shader based culling helps a lot.
This fixes a massive performance gap with the FSR2 demo compared to
AMDGPU-PRO, +228% on RDNA2.
Based on my investigation, AMDGPU-PRO seems to always cull these
primitives. Note that disabling NGG culling with AMDGPU-PRO reports the
same performance as RADV without that fix. Also note that the FSR2
sample doesn't specify any cull mode (ie. VK_CULL_MODE_NONE is used),
so this is the only reason PRO was culling more than RADV.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7260
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31891>
Computing 'htile_size/meta_size' is allowed for RADEON_SURF_MODE_1D when
RADEON_SURF_TC_COMPATIBLE_HTILE isn't set.
Lacking of computing causes performance degradation in some scenarios.
Fixes: d4d9ec55c5 ("radeonsi: implement TC-compatible HTILE")
Signed-off-by: Lu Yao <yaolu@kylinos.cn>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31617>
RADV needs to adjust this register for user sample locations because
it seems possible to have a sample on the -8 coordinate.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31815>
Unigine Heaven with AMD_DEBUG=mono has incorrect rendering on gfx11
because it doesn't set nir_io_semantics::dual_source_blend_index for
the second output, resulting in garbage asm.
Instead of trying to find out what's wrong, I decided to rewrite this
to make it the same as the LLVM IR path. It simplifies the code and fixes
Unigine Heaven with AMD_DEBUG=mono.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31669>
For surfaces without a modifier, the surf_size check wasn't
necessary, but it was also invalid since surf_size is set later
(in gfx12_compute_miptree).
Since it's not required anyway, drop this check.
Fixes: 060d5dacfd ("ac: add gfx12 DCC shared code")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31683>
It will be used to allow merging loads with a hole between them.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>
The declaration and definition used by tests otherwise differs from
addrlib.
Found by LTO -Werror=lto-type-mismatch.
Fixes: 1d69c0419b ("amd/addrlib: prevent defining regparm differently")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: David Heidelberg <david@ixit.cz>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31613>
Same as H264/HEVC, we still write sequence header ourselves
and slice header is sent to FW, everything else gets copied
directly to output bitstream buffer.
Fixes generating correct output with libva-utils/av1encode.
Also fixes temporal delimiter insertion, it's no longer forced
on every frame, but instead it lets application handle it.
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31520>
Finally, old GPUs have optimal clear/copy_buffer performance, but only
the top dGPU of each generation gets the best behavior.
Other dGPUs might need slightly different conditions.
APUs likely need very different conditions.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31082>
The expected PIPE_INTERLEAVE_BYTES size is ADDR_PIPEINTERLEAVE_256B on
gfx940 (or other CDNA based chips). Since CDNA based chips like gfx940
doesn't support image opcodes, it gets gibberish value from the kernel.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30891>
This code will be shared between RADV and RadeonSI.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28676>
When there are no VS outputs, we expect that the drivers set
the LS-HS vertex stride to zero, which will produce the
same result as no_inputs_in_lds did.
Remove the unnecessary code path from the output lowering.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30962>
Right now we don't allocate LDS for HS inputs when all HS inputs are passed
via VGPRs.
This changes it to skip allocating exactly the HS inputs passed via VGPRs
by reducing the inputs_read mask to remove holes.
radeonsi changes to the LDS allocation will be in a different MR.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30962>
This fix the HEVC encode corruption caused by mismatch between PPS
header and IB setting, the fix only apply for VCN5.
Rename from transform_skip_dicarded to transform_skip_disabled.
Signed-off-by: Yinjie Yao <yinjie.yao@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30930>
This was the main missing piece for passing vulkan video CTS
as the video firmwares couldn't do proper vulkan events.
With new enough firmware this is now possible.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30837>
ClustedAnd with bool argument and cluster_size==4 will be lowered
to quad_vote_all. So does ALU nir_iand/ior op with bool src.
OpenGL and Vulkan subgroup clustered_and tests with bool argument
fail when using LLVM. It seems LLVM has bug when quad vote bool
is in complex control flow. So stop using it for now.
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30610>