As explained in the previous commit, GFX9+ has issues with addressing
mipmaps in block-compressed images. In the case of copy commands, we fix
this by doing an extra copy for the missing blocks.
For GFX10, the mipmap layout in memory allows us to do better than that. We
can change the base level of the descriptor to one level bigger than the
requested level and adjust the extent and address to match. This is done by
ComputeNonBlockCompressedView in addrlib. Thus on GFX10 we can skip the
fixup copy workaround, and this will also fix cases outside of explicit
copy commands.
Signed-off-by: John Brooks <john@fastquake.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17970>
GFX9+ hardware has an issue where mipmap degradations are calculated
incorrectly due to using divide-by-two integer math and certain mipmap
sizes lose blocks.
This issue has been documented before, and we ported a workaround from
AMDVLK to increase the extent that is programmed into the descriptor, so
that the hardware arrives at the correct result. However, this is
insufficient as we cannot safely increase the extent beyond the physical
extent of the image in memory. If we can't increase it enough, the image
will still be missing blocks.
But there is still hope. In cases where RADV is responsible for copying to
or from an image (such as vkCmdCopyBufferToImage/vkCmdCopyImageToBuffer),
we can perform a second copy of the blocks that the hardware excluded so
that the resulting image is complete. This is another workaround from
AMDVLK.
This fixes corrupted textures in Halo: The Master Chief Collection.
v2: Add RADV_CMD_FLAG_INV_L2 | RADV_CMD_FLAG_INV_VCACHE to flush_bits
just in case (Samuel Pitoiset)
Closes: #3347
Signed-off-by: John Brooks <john@fastquake.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17970>
cull_small_primitive will now allow the caller to pass an
SSA def that it will use to determine if the primitive was
initially rejected.
This allows ACO to remove an s_branch instruction from every
NGG culling shader.
Fossil DB stats on Navi 21:
Totals from 60918 (45.16% of 134906) affected shaders:
CodeSize: 160086644 -> 159355824 (-0.46%); split: -0.46%, +0.00%
Instrs: 30477916 -> 30356092 (-0.40%); split: -0.40%, +0.00%
Latency: 139587915 -> 139611487 (+0.02%); split: -0.00%, +0.02%
InvThroughput: 21184261 -> 21184346 (+0.00%)
Copies: 2762930 -> 2702024 (-2.20%); split: -2.20%, +0.00%
Branches: 1236970 -> 1176052 (-4.92%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17919>
Now they all return whether the primitive was rejected.
No Fossil DB changes.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17870>
For primitives which are rejected based on only W and face, this
will reduce the number of executed branches.
Fossil DB stats on Navi 21:
Totals from 60918 (45.16% of 134906) affected shaders:
CodeSize: 160330564 -> 160086644 (-0.15%)
Instrs: 30477385 -> 30477916 (+0.00%); split: -0.00%, +0.00%
Latency: 139802763 -> 139587915 (-0.15%); split: -0.15%, +0.00%
InvThroughput: 21198444 -> 21184261 (-0.07%); split: -0.07%, +0.00%
SClause: 749811 -> 749810 (-0.00%)
Copies: 2701482 -> 2762930 (+2.27%); split: -0.00%, +2.28%
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17870>
The previous code checked all_w_positive in the if condition.
Instead, always execute the bbox culling code and include
all_w_positive at the end.
We assume checking in the if is not beneficial because it's
very unlikely that there is no primitive in a wave whose W are
not all positive.
This allows moving other things to the condition
in the next commit.
Fossil DB stats on Navi 21:
Totals from 60918 (45.16% of 134906) affected shaders:
CodeSize: 160574204 -> 160330564 (-0.15%); split: -0.15%, +0.00%
Instrs: 30538297 -> 30477385 (-0.20%); split: -0.20%, +0.00%
Latency: 139810902 -> 139802763 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 21198449 -> 21198444 (-0.00%); split: -0.00%, +0.00%
SClause: 749810 -> 749811 (+0.00%)
Copies: 2701474 -> 2701482 (+0.00%); split: -0.00%, +0.00%
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17870>
The LLVM backend keeps track of 16-bit output variables and it will
miscompile shaders when these outputs aren't the correct bitsize.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17706>
Previously, it was in a divergent branch, therefore
it could hang the GPU when a workgroup had a primitive-only wave.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17581>
There's no dependency between them.
This can simplify the compiler backend translation by
always storing prim id before vertex export, which also
benefits the LLVM backend in latter changes.
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17581>
We are going to reuse them outside of radv_pipeline.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16531>
When culling enabled, it will use LDS space, which overlap with
the prim id export.
Fixes: e97f0463a8 ("ac/nir: Implement NGG deferred attribute culling in NIR.")
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17593>
These set the pass and make sure we don't have multiple submissions
at the same time touching the perf counters/pass at the same time.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16879>
Change NGG GS emit vertex code to emit combined shared stores,
also change the export vertex code to emit combined shared loads.
This results in more optimal code generation, ie. fewer LDS
instructions are generated.
GS vertices are stored using an odd stride to minimize the chance
of bank conflicts, which means that unfortunately
we still can't use an alignment higher than 4 here,
so the best we can get are some ds_read2_b32 instructions.
Fossil DB stats on Navi 21 (formerly Sienna Cichlid):
Totals from 135 (0.10% of 128653) affected shaders:
VGPRs: 6416 -> 6512 (+1.50%)
CodeSize: 529436 -> 503792 (-4.84%)
MaxWaves: 2952 -> 2924 (-0.95%)
Instrs: 93384 -> 90176 (-3.44%)
Latency: 290283 -> 293611 (+1.15%); split: -0.36%, +1.50%
InvThroughput: 81218 -> 82598 (+1.70%)
Copies: 6603 -> 6606 (+0.05%)
PreVGPRs: 5037 -> 5076 (+0.77%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11425>