sanitize_cf_list can in fact invalidate the dominance metadata,
which we need to use eg. nir_unsigned_upper_bound.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10740>
Late export is theoretically better if used with LATE_ALLOC,
but in practice, the early export has an advantage of
lower register usage, therefore more concurrent waves.
The idea of this commit is that "small" shaders benefit from early
primitive export more, due to being able to launch much more waves.
Let's consider a NIR shader "small" when it has only 1 block.
This yields both better performance, and better stats, than always
using late export.
Fossil DB on Sienna:
Totals from 12807 (8.76% of 146265) affected shaders:
VGPRs: 609128 -> 620216 (+1.82%); split: -0.01%, +1.83%
SpillSGPRs: 1458 -> 1538 (+5.49%)
CodeSize: 37028204 -> 37019320 (-0.02%); split: -0.17%, +0.14%
MaxWaves: 282902 -> 278516 (-1.55%)
Instrs: 7163142 -> 7162925 (-0.00%); split: -0.18%, +0.18%
VClause: 169285 -> 169547 (+0.15%); split: -1.15%, +1.30%
SClause: 267373 -> 267151 (-0.08%); split: -0.24%, +0.16%
Copies: 446442 -> 444567 (-0.42%); split: -2.68%, +2.26%
Branches: 156245 -> 156195 (-0.03%); split: -0.30%, +0.26%
PreSGPRs: 434701 -> 447396 (+2.92%)
PreVGPRs: 527783 -> 540527 (+2.41%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10106>
This allows to force the VRS rates via RADV_FORCE_VRS, the supported
values are 2x2, 1x2 and 2x1. This supports the primitive shading rate
mode for non GUI elements.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7794>
It used to be that this intrinsic was never created and texture
instructions were always used.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: 50881d59e6 ("compiler/spirv: fix image sample queries")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9686>
Keep track of the current loop depth in Program and set the depth inside
Program::insert_block() instead of repeating it every time we insert one.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8994>
The calculate_tess_lds_size function already returns the size in blocks
of the encoding granule, but we forgot to adjust config->lds_size.
This variable is not used to actually set the LDS size used for TCS,
but by ACO to make scheduling decisions.
Fossil DB stats on Sienna Cichlid:
Please note that the +3729.43% is NOT a regression.
The real LDS size used didn't change, it was just reported incorrectly.
Totals from 1342 (0.96% of 139391) affected shaders:
VGPRs: 60880 -> 80240 (+31.80%); split: -0.05%, +31.85%
CodeSize: 3378456 -> 3381224 (+0.08%); split: -0.23%, +0.31%
LDS: 687104 -> 26312192 (+3729.43%)
MaxWaves: 29794 -> 23962 (-19.57%)
Instrs: 644194 -> 644610 (+0.06%); split: -0.32%, +0.39%
Cycles: 2675068 -> 2676804 (+0.06%); split: -0.31%, +0.38%
VMEM: 428840 -> 517418 (+20.66%); split: +22.53%, -1.88%
SMEM: 91831 -> 88587 (-3.53%); split: +5.70%, -9.23%
VClause: 22740 -> 19384 (-14.76%); split: -16.18%, +1.42%
SClause: 19116 -> 18373 (-3.89%); split: -4.34%, +0.46%
Copies: 66662 -> 63448 (-4.82%); split: -5.55%, +0.73%
Fixes: cf89bdb9ba "radv: align the LDS size in calculate_tess_lds_size()"
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9098>
This patch delays the calculation of GPR limits in order to
precisely incorporate extra registers (VCC etc.) and shared VGPRs.
Additionally, the allocation granularity is used to set the config.
This has some effect on the reported SGPR stats.
Totals (Navi10):
SGPRs: 6971787 -> 17753642 (+154.65%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8921>
When there are no param exports in an NGG (or legacy VS) shader,
the NO_PC_EXPORT=1 is set by RADV, which means PS waves can launch
before the current stage finishes.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7868>
Without it, FragCoord.z will have the value of one of the fine pixels
instead of the center of the coarse pixel.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7837>
If the workgroup_size variable is lower than the actual workgroup size,
that means it's possible that ACO won't emit some s_barrier instructions
when in fact it should. This can possibly cause a GPU hang.
This is just for the sake of general correctness, currently this
can't cause a real problem because the maximum vertex count is always
greater than (or equal to) the primitive count in GS, and already
takes into account the number of GS invocations.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7232>
This is to make sure we don't compile a shader which doesn't
fit the available LDS space.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7232>
Now that we have Program::temp_rc, we can replace it with the first
temporary id allocated for NIR's ssa defs.
No fossil-db changes on Navi.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7067>
The return value of this texture intrinsic should be a NIR 1-bit bool.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7236>
This has several advantages:
- it generates roughly the same NIR for both compiler backends
(this might help for debugging purposes)
- it might allow to move around some NIR pass to improve compile time
- it might help for RadeonSI support
- it improves fossils-db stats for RADV/LLVM (this shouldn't matter
much but it's a win for free)
fossil-db (Navi/LLVM):
Totals from 80732 (59.18% of 136420) affected shaders:
SGPRs: 5390036 -> 5382843 (-0.13%); split: -3.38%, +3.24%
VGPRs: 3910932 -> 3890320 (-0.53%); split: -2.38%, +1.85%
SpillSGPRs: 319212 -> 283149 (-11.30%); split: -17.69%, +6.39%
SpillVGPRs: 14668 -> 14324 (-2.35%); split: -7.53%, +5.18%
CodeSize: 265360860 -> 267572132 (+0.83%); split: -0.47%, +1.30%
Scratch: 5338112 -> 6134784 (+14.92%); split: -2.65%, +17.57%
MaxWaves: 1077230 -> 1086902 (+0.90%); split: +2.79%, -1.90%
No fossils-db changes on RADV/ACO.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7077>
cs.block_size is copied from cs.local_size during the shader info pass.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7061>
Significantly improves performance of a Control compute shader. Also seems
to increase FPS at the very start of the game by ~9% (RX 580, 1080p,
medium settings, no MSAA).
fossil-db (Navi):
Totals from 315 (0.23% of 135946) affected shaders:
SGPRs: 18296 -> 18336 (+0.22%); split: -0.26%, +0.48%
VGPRs: 11856 -> 11844 (-0.10%); split: -0.81%, +0.71%
CodeSize: 2233800 -> 2457508 (+10.01%)
MaxWaves: 4506 -> 4497 (-0.20%); split: +0.04%, -0.24%
Instrs: 438766 -> 486215 (+10.81%); split: -0.00%, +10.81%
Cycles: 7880180 -> 8963340 (+13.75%); split: -0.00%, +13.75%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6558>
For NGG GS, we need to store the following in LDS:
1. The ESGS ring, similarly to legacy ESGS.
2. Emitted vertices from the GS threads.
3. Temporary space used by the workgroup scan.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
Make it possible for ACO to recognize when to use HW NGG GS.
Also add a few notes about the various GS stages in the comments.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
Make the NGG VS/TES code easier to follow, give better names to
some functions and make ngg_nogs_early_prim_export a variable.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
Lowering IO for VS, TCS, TES and GS still have to be done for LLVM.
No fossils-db change on NAVI10.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6897>