Port from radeonsi.
Besides vertex position based primitive culling, clipdist
attribute can also be used to cull a primitive. Normally
it's used by fixed-pipeline, but when NGG we can treate it
as a culling condition to filter out invisible primitive
before fixed-pipeline.
There are two kinds of clipdist:
1. user define a clip plane explicitly by glClipPlane(),
fixed-pipeline calculate with vertex position to get
clipdist, then cull. This is the legacy way.
2. Now GLSL define gl_ClipDistance/gl_CullDiatance so that
user can calculate clipdist in any way he like.
This implementation support both way.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17651>
Port from radeonsi.
Cull primitive after GS thread and before final vertex/primitive
export. GS culling is like VS/TES culling which read out saved
vertex positions of a primitive from LDS then call the primitive
culling algorithm to check whether it's visiable or not, only
passed primitives will be exported.
Unlike the VS/TES culling that read vertex index of a primitive
from VGPRs as shader args, GS will set a primitive complete flag
for each last vertex of a primitive in LDS, so that vertex thread
know the previous 1/2/3 vertex can form a primitive and do primitive
culling.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17651>
radeonsi has different driver_location and io location.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17651>
radeonsi does not have io nir variables, so need to save output
bit size when lower store_output intrinsic.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17651>
driver_location and io location are different for radeonsi,
and radeonsi llvm rely on the correct driver_location to
index output variables.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17651>
Simplify: 64bit IO has been lowered by nir_lower_io with
nir_lower_io_lower_64bit_to_32, so no need to handle in the
ngg lower.
Fix: we need to increase io_sem.location by base_offset for
correct gs_output_info.
radeonsi has different driver_location and io location, so
also change the output variable index to io location.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17651>
Make accept_func optional, and return accpect result for caller
react when primitive is rejected.
This is for GS culling.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17651>
As explained in the previous commit, GFX9+ has issues with addressing
mipmaps in block-compressed images. In the case of copy commands, we fix
this by doing an extra copy for the missing blocks.
For GFX10, the mipmap layout in memory allows us to do better than that. We
can change the base level of the descriptor to one level bigger than the
requested level and adjust the extent and address to match. This is done by
ComputeNonBlockCompressedView in addrlib. Thus on GFX10 we can skip the
fixup copy workaround, and this will also fix cases outside of explicit
copy commands.
Signed-off-by: John Brooks <john@fastquake.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17970>
GFX9+ hardware has an issue where mipmap degradations are calculated
incorrectly due to using divide-by-two integer math and certain mipmap
sizes lose blocks.
This issue has been documented before, and we ported a workaround from
AMDVLK to increase the extent that is programmed into the descriptor, so
that the hardware arrives at the correct result. However, this is
insufficient as we cannot safely increase the extent beyond the physical
extent of the image in memory. If we can't increase it enough, the image
will still be missing blocks.
But there is still hope. In cases where RADV is responsible for copying to
or from an image (such as vkCmdCopyBufferToImage/vkCmdCopyImageToBuffer),
we can perform a second copy of the blocks that the hardware excluded so
that the resulting image is complete. This is another workaround from
AMDVLK.
This fixes corrupted textures in Halo: The Master Chief Collection.
v2: Add RADV_CMD_FLAG_INV_L2 | RADV_CMD_FLAG_INV_VCACHE to flush_bits
just in case (Samuel Pitoiset)
Closes: #3347
Signed-off-by: John Brooks <john@fastquake.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17970>
cull_small_primitive will now allow the caller to pass an
SSA def that it will use to determine if the primitive was
initially rejected.
This allows ACO to remove an s_branch instruction from every
NGG culling shader.
Fossil DB stats on Navi 21:
Totals from 60918 (45.16% of 134906) affected shaders:
CodeSize: 160086644 -> 159355824 (-0.46%); split: -0.46%, +0.00%
Instrs: 30477916 -> 30356092 (-0.40%); split: -0.40%, +0.00%
Latency: 139587915 -> 139611487 (+0.02%); split: -0.00%, +0.02%
InvThroughput: 21184261 -> 21184346 (+0.00%)
Copies: 2762930 -> 2702024 (-2.20%); split: -2.20%, +0.00%
Branches: 1236970 -> 1176052 (-4.92%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17919>
Now they all return whether the primitive was rejected.
No Fossil DB changes.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17870>
For primitives which are rejected based on only W and face, this
will reduce the number of executed branches.
Fossil DB stats on Navi 21:
Totals from 60918 (45.16% of 134906) affected shaders:
CodeSize: 160330564 -> 160086644 (-0.15%)
Instrs: 30477385 -> 30477916 (+0.00%); split: -0.00%, +0.00%
Latency: 139802763 -> 139587915 (-0.15%); split: -0.15%, +0.00%
InvThroughput: 21198444 -> 21184261 (-0.07%); split: -0.07%, +0.00%
SClause: 749811 -> 749810 (-0.00%)
Copies: 2701482 -> 2762930 (+2.27%); split: -0.00%, +2.28%
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17870>
The previous code checked all_w_positive in the if condition.
Instead, always execute the bbox culling code and include
all_w_positive at the end.
We assume checking in the if is not beneficial because it's
very unlikely that there is no primitive in a wave whose W are
not all positive.
This allows moving other things to the condition
in the next commit.
Fossil DB stats on Navi 21:
Totals from 60918 (45.16% of 134906) affected shaders:
CodeSize: 160574204 -> 160330564 (-0.15%); split: -0.15%, +0.00%
Instrs: 30538297 -> 30477385 (-0.20%); split: -0.20%, +0.00%
Latency: 139810902 -> 139802763 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 21198449 -> 21198444 (-0.00%); split: -0.00%, +0.00%
SClause: 749810 -> 749811 (+0.00%)
Copies: 2701474 -> 2701482 (+0.00%); split: -0.00%, +0.00%
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17870>
The LLVM backend keeps track of 16-bit output variables and it will
miscompile shaders when these outputs aren't the correct bitsize.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17706>
Previously, it was in a divergent branch, therefore
it could hang the GPU when a workgroup had a primitive-only wave.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17581>