nir_cf_reinsert() can re-create the block, invalidating dominance
metadata.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9808>
Fixes the following building error:
external/mesa/src/amd/compiler/aco_interface.cpp:155: error: undefined reference to 'aco::optimize_postRA(aco::Program*)'
Fixes: 0e4747d3fb ("aco: Introduce a new, post-RA optimizer.")
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11177>
Fixes the following building error:
external/mesa/src/amd/common/ac_nir_lower_ngg.c:27:10: fatal error: 'u_math.h' file not found
^~~~~~~~~~
1 error generated.
Fixes: 3d589b8b46 ("ac: Add new NIR pass to lower NGG VS/TES.")
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11177>
Fixes the following building errors:
external/mesa/src/amd/vulkan/radv_shader.c:868: error: undefined reference to 'ac_nir_lower_ngg_gs'
external/mesa/src/amd/vulkan/radv_shader.c:851: error: undefined reference to 'ac_nir_lower_ngg_nogs'
external/mesa/src/amd/compiler/aco_interface.cpp:155: error: undefined reference to 'aco::optimize_postRA(aco::Program*)'
Fixes: 3d589b8b46 ("ac: Add new NIR pass to lower NGG VS/TES.")
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11177>
Instead of passing two different structs to ac_dump_rgp_capture().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11156>
RGP captures can contain both SQTT and SPM data. While we are at it,
move it to ac_rgp.h and adjust a message.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11156>
Performance counters will be used by RADV for VK_KHR_performance_query
and also for adding SPM support.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11140>
Instead of just reporting the failed statements, print where they
originated. This is useful for tests which have a number of similar
checks.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10898>
accessing this variable repeatedly like this is a contended hotpath somehow,
so instead just create a const for reference
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11124>
this is an extreme hotpath, so having a single calculation in a const
variable is slightly better for compiler microoptimizing
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11124>
So we don't need to provision aarch64 servers, which are these days
rarer than x8_64.
In the switch to the new runner tags, switch to one which contains the
device type, so we can dimension the runner jobs taking into account the
number of DUTs available.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11108>
To match latest RGP spec. Captures generated by RADV still work
with latest RGP (v1.10).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11063>
A simple post-RA optimization which takes advantage of the
s_cbranch_vccz and s_cbranch_vccnz instructions.
It works on the following pattern:
vcc = v_cmp ...
scc = s_and vcc, exec
p_cbranch scc
The result looks like this:
vcc = v_cmp ...
p_cbranch vcc
Fossil DB results on Sienna Cichlid:
Totals from 4814 (3.21% of 149839) affected shaders:
CodeSize: 15371176 -> 15345964 (-0.16%)
Instrs: 3028557 -> 3022254 (-0.21%)
Latency: 21872753 -> 21823476 (-0.23%); split: -0.23%, +0.00%
InvThroughput: 4470282 -> 4468691 (-0.04%); split: -0.04%, +0.00%
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7779>
This commit adds the skeleton of a new ACO post-RA optimizer,
which is intended to be a simple pass called after RA, and
is meant to do code changes which can only be done
after RA.
It is currently empty, the actual optimizations will be added
in their own commits. It only has a DCE pass, which deletes
some dead code generated by the spiller.
Fossil DB results on Sienna Cichlid:
Totals from 375 (0.25% of 149839) affected shaders:
CodeSize: 2933056 -> 2907192 (-0.88%)
Instrs: 534154 -> 530706 (-0.65%)
Latency: 12088064 -> 12084907 (-0.03%); split: -0.03%, +0.00%
InvThroughput: 4433454 -> 4432421 (-0.02%); split: -0.02%, +0.00%
Copies: 81649 -> 78203 (-4.22%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7779>
DFSM has never been enabled by default because it was slower.
RadeonSI is also dropping support for this because they discovered
that's actually not efficient in practice.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10968>
ACO is the default compiler for almost a year from now, so it should
be fine to replace RADV/ACO by just RADV. LLVM is still added
when RADV_DEBUG=llvm is used for convenience.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10972>
For fragment shaders that only contain a discard, the exec mask has
to be zero'd and everything discarded.
It seems unnecessary to emit an export here because if the FS has no
exports, the compiler already emits a null export at the end.
Fixes incorrect hair rendering in Detroit: Become Human.
fossil-db (Sienna Cichlid):
Totals from 3 (0.00% of 149839) affected shaders:
CodeSize: 2896 -> 2872 (-0.83%)
Instrs: 556 -> 553 (-0.54%)
Latency: 29266 -> 29214 (-0.18%)
InvThroughput: 3374 -> 3372 (-0.06%)
Cc: 21.1 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10955>
This has no effects because radv_image_has_CB_metadata() still
accounts for DCC which is incorrect. This should be changed.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10667>
Just having DCC enabled on the base level doesn't mean we are
using compressed rendering.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10667>
With DCC and mipmaps, some mips can't be compressed and it makes
sense to check this here.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10667>
Instead of storing the vertex mask per wave into LDS and then computing
the prefix sum, store 8-bit bitcounts (vertex counts) of the vertex masks
into LDS. This allows us to compute the sum using v_sad_u8, which computes
a sum of 4 i8vec4 components in one instruction.
Each i8vec4 of vertex counts is loaded in parallel threads (one dword
per thread) instead of all being loaded in thread 0, and readlane copies
them to SGPRs instead of readfirstlane.
LDS is no longer initialized before culling. Instead, the counts for
inactive waves are masked with AND later.
Incorrect old comments are also fixed.
This change removes 80 bytes from the code size, and it allows increasing
the workgroup size from 128 to 256. (which is the main motivation for this)
Now changing the workgroup size with wave64 has no effect on the code size.
Switching to wave32 with 8 waves even generates slightly smaller code than
wave64 with 4 waves.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10813>