995 commits

Author SHA1 Message Date
Samuel Pitoiset
4e2fe34aa9 aco: fix determining if LOD is zero for nir_texop_txf/nir_texop_txs
txf/txs expect the LOD to be a 32-bit unsigned integer, while other
texture operations expect a float.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3668
Fixes: 93c8ebfa78 ("aco: Initial commit of independent AMD compiler")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7256>
2020-10-22 11:30:43 +00:00
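The distinction described in commit 4e2fe34aa9 can be illustrated with a small sketch. This is not the ACO code: the function name, the string op names and the raw-bits interface are assumptions made for illustration. It only shows why the same 32-bit constant has to be interpreted differently depending on the texture op.

```python
import struct

# Hypothetical op names mirroring the NIR texture ops named in the commit.
INTEGER_LOD_OPS = {"txf", "txs"}


def lod_is_zero(texop: str, raw_bits: int) -> bool:
    """Return True if a constant LOD, given as raw 32-bit contents, is zero.

    Assumption: txf/txs carry the LOD as an unsigned integer, every other
    op carries it as an IEEE-754 single-precision float.
    """
    raw_bits &= 0xFFFFFFFF
    if texop in INTEGER_LOD_OPS:
        # Integer LOD: only the all-zero bit pattern is zero.
        return raw_bits == 0
    # Float LOD: reinterpret the bits, so that -0.0 also counts as zero.
    (as_float,) = struct.unpack("<f", struct.pack("<I", raw_bits))
    return as_float == 0.0
```

For example, the bit pattern 0x80000000 is -0.0 when read as a float but 2147483648 when read as an unsigned integer, so a check written for one interpretation gives the wrong answer for the other.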
Samuel Pitoiset
eb6877d3af radv,aco: fix use of texop_samples_identical in the resolve meta path
The return value of this texture intrinsic should be a NIR 1-bit bool.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7236>
2020-10-21 13:06:53 +02:00
Tony Wasserka
fd038132de aco/isel: Miscellaneous cleanups using the new Stage API
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7094>
2020-10-21 09:49:38 +00:00
Tony Wasserka
34bc9477de aco: Clean up symbol names and comments related to NGG
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7094>
2020-10-21 09:49:38 +00:00
Tony Wasserka
86c227c10c aco: Use strong typing to model SW<->HW stage mappings
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7094>
2020-10-21 09:49:38 +00:00
Bas Nieuwenhuizen
76421667ec aco: Add VK_KHR_shader_terminate_invocation support.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7226>
2020-10-20 22:53:08 +00:00
Samuel Pitoiset
4ca1030774 radv: move all NIR pass outside of ACO
This has several advantages:
- it generates roughly the same NIR for both compiler backends
  (this might help for debugging purposes)
- it might allow moving some NIR passes around to improve compile time
- it might help for RadeonSI support
- it improves fossil-db stats for RADV/LLVM (this shouldn't matter
  much but it's a win for free)

fossil-db (Navi/LLVM):
Totals from 80732 (59.18% of 136420) affected shaders:
SGPRs: 5390036 -> 5382843 (-0.13%); split: -3.38%, +3.24%
VGPRs: 3910932 -> 3890320 (-0.53%); split: -2.38%, +1.85%
SpillSGPRs: 319212 -> 283149 (-11.30%); split: -17.69%, +6.39%
SpillVGPRs: 14668 -> 14324 (-2.35%); split: -7.53%, +5.18%
CodeSize: 265360860 -> 267572132 (+0.83%); split: -0.47%, +1.30%
Scratch: 5338112 -> 6134784 (+14.92%); split: -2.65%, +17.57%
MaxWaves: 1077230 -> 1086902 (+0.90%); split: +2.79%, -1.90%

No fossil-db changes on RADV/ACO.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7077>
2020-10-20 10:21:39 +00:00
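The total percentages in the fossil-db report of commit 4ca1030774 follow directly from the before/after values. The sketch below recomputes them. The helper names are made up for this example, and the "split" figures are not covered because they cannot be derived from the totals alone.

```python
def pct_change(before: int, after: int) -> str:
    """Relative change from before to after, formatted like the report."""
    return f"{(after - before) / before * 100:+.2f}%"


def affected_share(affected: int, total: int) -> str:
    """Share of affected shaders among all shaders, as a percentage."""
    return f"{affected / total * 100:.2f}%"


# Values copied from the report in the commit message.
report = {
    "SGPRs": (5390036, 5382843),
    "VGPRs": (3910932, 3890320),
    "SpillSGPRs": (319212, 283149),
    "CodeSize": (265360860, 267572132),
    "Scratch": (5338112, 6134784),
    "MaxWaves": (1077230, 1086902),
}

for name, (before, after) in report.items():
    print(f"{name}: {before} -> {after} ({pct_change(before, after)})")
print("affected:", affected_share(80732, 136420))
```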
Timur Kristóf
d8435c1628 aco/ngg: Add assertion to make sure we always know the vertex count.
Just a sanity check to avoid hangs caused by missing this
in the future.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7213>
2020-10-20 07:11:29 +00:00
James Park
af8d488ea5 util,ac,aco,radv: Cross-platform memstream API
POSIX memstream is not available on Windows.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7143>
2020-10-19 03:37:42 -07:00
Rhys Perry
fdb65b8b23 aco: add missing SCC clobber in get_buffer_size
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: fcd6d83245 ("aco: fix imageSize()/textureSize() with large buffers on GFX8")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7162>
2020-10-15 21:11:45 +00:00
Rhys Perry
d75d12f507 aco: don't use v_pack_b32_f16 if 16-bit input denormals are flushed
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7111>
2020-10-15 11:33:42 +00:00
Rhys Perry
d4b3e869ee aco: propagate literals into sub-dword pseudo instructions on GFX9+
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7111>
2020-10-15 11:33:42 +00:00
Rhys Perry
1a652244e4 aco: implement 16-bit literals
We can copy any value into a 16-bit subregister with a 3 dword
v_pack_b32_f16 on GFX10 or a v_and_b32+v_or_b32 on GFX9.

Because the generated code can depend on the register assignment and to
improve constant propagation, Builder::copy creates a p_create_vector in
the case of sub-dword literals.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7111>
2020-10-15 11:33:42 +00:00
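The v_and_b32+v_or_b32 sequence mentioned in commit 1a652244e4 amounts to a mask-and-merge on a 32-bit register. The sketch below models only that arithmetic effect. The function name and the `high` parameter are hypothetical, and it says nothing about how ACO actually selects or encodes the instructions.

```python
def insert_half(reg: int, value16: int, high: bool) -> int:
    """Write a 16-bit value into one half of a 32-bit register value.

    Models an AND that clears the target half followed by an OR that
    merges in the new bits; the other half is preserved.
    """
    shift = 16 if high else 0
    keep_mask = 0xFFFFFFFF ^ (0xFFFF << shift)    # bits to preserve
    cleared = reg & keep_mask                     # the "and" step
    return cleared | ((value16 & 0xFFFF) << shift)  # the "or" step
```

Writing 0x1234 into the high half of 0xAAAABBBB leaves the low half untouched, which is the property a sub-dword copy needs.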
Tony Wasserka
d5a72319d6 aco/isel: Remove now unused VS-related code from create_null_export
Also replaced a hardcoded constant with the appropriate register macro.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7102>
2020-10-14 16:22:51 +00:00
Tony Wasserka
c22c702f35 aco/isel: Remove some dead code
exported_pos was always initialized to true (due to the is_pos argument
of the first export_vs_varying call being true), so none of this code has
any effect.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7102>
2020-10-14 16:22:51 +00:00
Tony Wasserka
bf51b11c04 aco/isel: Always export position data from VS/NGG
AMD ISA docs explicitly require this for VS, and this likely extends to
NGG too.

Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3615
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7102>
2020-10-14 16:22:51 +00:00
Daniel Schürmann
f29c81f863 aco: use VOP2 for v_cvt_pkrtz_f16_f32 if possible
This patch also does a slight rework of export_fs_mrt_color()
to avoid setting of enabled channels which are not used.

Totals from 52404 (38.38% of 136546) affected shaders (NAVI):
SGPRs: 3097443 -> 3097435 (-0.00%)
CodeSize: 189151600 -> 188546200 (-0.32%)
Instrs: 36445061 -> 36445104 (+0.00%); split: -0.00%, +0.00%
Cycles: 1739388020 -> 1739388192 (+0.00%); split: -0.00%, +0.00%
VMEM: 21071501 -> 21071665 (+0.00%); split: +0.00%, -0.00%
SMEM: 3470983 -> 3470982 (-0.00%); split: +0.00%, -0.00%
PreSGPRs: 2058965 -> 2058962 (-0.00%)
PreVGPRs: 1860294 -> 1860295 (+0.00%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777>
2020-10-14 15:31:38 +00:00
Daniel Schürmann
7240edec2a aco: use VOP2 version of v_cvt_pkrtz_f16_f32 on GFX_6_7_10
Totals from 767 (0.56% of 136546) affected shaders (NAVI):
CodeSize: 2862208 -> 2850036 (-0.43%)
Instrs: 561572 -> 561574 (+0.00%)
Cycles: 6455420 -> 6455428 (+0.00%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777>
2020-10-14 15:31:38 +00:00
Daniel Schürmann
2f125908b3 radv,aco: lower_pack_half_2x16
This patch also optimizes pack_half_2x16(a, 0.0).

Totals from 1949 (1.43% of 136546) affected shaders (RAVEN):
SGPRs: 83376 -> 83336 (-0.05%)
CodeSize: 3532144 -> 3512352 (-0.56%)
Instrs: 660746 -> 660682 (-0.01%); split: -0.01%, +0.00%
Cycles: 6780716 -> 6780472 (-0.00%); split: -0.00%, +0.00%
VMEM: 990886 -> 990883 (-0.00%); split: +0.00%, -0.00%
SMEM: 150506 -> 150538 (+0.02%); split: +0.05%, -0.03%
SClause: 30595 -> 30594 (-0.00%); split: -0.01%, +0.00%
Copies: 40801 -> 40729 (-0.18%)
PreSGPRs: 52335 -> 52341 (+0.01%); split: -0.03%, +0.04%
PreVGPRs: 45104 -> 45097 (-0.02%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777>
2020-10-14 15:31:38 +00:00
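Why pack_half_2x16(a, 0.0) is worth special-casing, as commit 2f125908b3 does, can be seen from a reference model of the operation. This is a sketch using Python's half-float packing, not the ACO lowering. It assumes the first operand lands in the low 16 bits, and it uses round-to-nearest conversion, which may differ from the hardware rounding mode for values that are not exactly representable.

```python
import struct


def f16_bits(x: float) -> int:
    """Bit pattern of x converted to IEEE-754 half precision."""
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    return bits


def pack_half_2x16(a: float, b: float) -> int:
    """Reference model: a in the low half, b in the high half."""
    return f16_bits(a) | (f16_bits(b) << 16)
```

With b = 0.0 the high half is all zeros, so the result is just the half-precision conversion of a, zero-extended to 32 bits.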
Daniel Schürmann
dae1e6f756 aco: use v_cvt_pkrtz_f16_f32 for pack_half_2x16
Apparently, we forgot to remove some debug code.
This patch also fixes the round mode check to consider
the destination bit width.

Totals from 2218 (1.62% of 136546) affected shaders (RAVEN):
SGPRs: 100848 -> 100280 (-0.56%)
VGPRs: 68536 -> 66044 (-3.64%); split: -3.68%, +0.05%
CodeSize: 4882296 -> 4837220 (-0.92%); split: -0.94%, +0.01%
MaxWaves: 18990 -> 19019 (+0.15%); split: +0.19%, -0.04%
Instrs: 938150 -> 930388 (-0.83%); split: -0.83%, +0.00%
Cycles: 8699824 -> 8667648 (-0.37%); split: -0.38%, +0.01%
VMEM: 1144502 -> 1059680 (-7.41%); split: +0.06%, -7.48%
SMEM: 170076 -> 167999 (-1.22%); split: +0.22%, -1.44%
VClause: 18428 -> 18422 (-0.03%)
SClause: 41375 -> 41353 (-0.05%); split: -0.06%, +0.00%
Copies: 60008 -> 60054 (+0.08%); split: -0.31%, +0.39%
PreVGPRs: 56163 -> 56142 (-0.04%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777>
2020-10-14 15:31:38 +00:00
Daniel Schürmann
9185b7c069 aco: add validation rules for p_split_vector
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777>
2020-10-14 15:31:38 +00:00
Daniel Schürmann
aec872cda0 aco: use p_split_vector for nir_op_unpack_half_*
This enables the use of SDWA if possible

Totals from 9933 (7.27% of 136546) affected shaders (RAVEN):
VGPRs: 731764 -> 731772 (+0.00%); split: -0.00%, +0.00%
CodeSize: 90944852 -> 90671472 (-0.30%); split: -0.30%, +0.00%
Instrs: 17881885 -> 17867831 (-0.08%); split: -0.08%, +0.00%
Cycles: 1597904072 -> 1597771260 (-0.01%); split: -0.01%, +0.00%
VMEM: 1702328 -> 1697383 (-0.29%); split: +0.13%, -0.42%
SMEM: 659583 -> 659049 (-0.08%); split: +0.01%, -0.09%
VClause: 318024 -> 318025 (+0.00%); split: -0.00%, +0.00%
SClause: 631670 -> 631707 (+0.01%); split: -0.01%, +0.01%
Copies: 1504107 -> 1504626 (+0.03%); split: -0.01%, +0.04%
PreVGPRs: 683153 -> 683180 (+0.00%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777>
2020-10-14 15:31:38 +00:00
Daniel Schürmann
a38a497b86 aco: use p_create_vector for nir_op_pack_half_2x16
This enables the use of SDWA if possible

Totals from 2218 (1.62% of 136546) affected shaders (RAVEN):
VGPRs: 68508 -> 68516 (+0.01%)
CodeSize: 4897024 -> 4881068 (-0.33%); split: -0.33%, +0.00%
MaxWaves: 18992 -> 18990 (-0.01%)
Instrs: 946942 -> 939161 (-0.82%); split: -0.82%, +0.00%
Cycles: 8737668 -> 8705704 (-0.37%); split: -0.37%, +0.00%
VMEM: 1155362 -> 1145245 (-0.88%); split: +0.00%, -0.88%
SMEM: 170435 -> 170165 (-0.16%); split: +0.01%, -0.16%
VClause: 18426 -> 18425 (-0.01%)
SClause: 41376 -> 41375 (-0.00%)
Copies: 59813 -> 59787 (-0.04%); split: -0.15%, +0.10%
PreVGPRs: 56126 -> 56136 (+0.02%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777>
2020-10-14 15:31:38 +00:00
Daniel Schürmann
3c2abd7116 aco: expand create_vector more carefully w.r.t. subdword operands
No pipelinedb changes.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777>
2020-10-14 15:31:38 +00:00
Daniel Schürmann
d887eb141b aco: propagate SGPRs into VOP1 instructions early.
This helps DCE. We should reconsider our optimization order
or maybe do the dead code analysis twice

Totals from 106 (0.08% of 136546) affected shaders (RAVEN):
SGPRs: 7184 -> 7152 (-0.45%)
CodeSize: 736912 -> 736052 (-0.12%)
Instrs: 145739 -> 145509 (-0.16%)
Cycles: 2085344 -> 2084268 (-0.05%)
VMEM: 14819 -> 14807 (-0.08%)
SMEM: 7109 -> 7100 (-0.13%); split: +0.04%, -0.17%
SClause: 5383 -> 5385 (+0.04%)
Copies: 13290 -> 13189 (-0.76%)
PreSGPRs: 5265 -> 5221 (-0.84%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777>
2020-10-14 15:31:38 +00:00
Samuel Pitoiset
20d73a9049 aco: adjust an assertion about the wavesize in emit_gfx10_wave64_bpermute()
This gets rid of one more use of radv_shader_info.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7061>
2020-10-14 15:09:34 +00:00
Samuel Pitoiset
112e66fa09 aco: compute the CS workgroup size from the shader NIR info
cs.block_size is copied from cs.local_size during the shader info pass.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7061>
2020-10-14 15:09:34 +00:00
Samuel Pitoiset
e3e8d13ada radv: move compiler statistics to ACO
They are really specific to ACO.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7061>
2020-10-14 15:09:34 +00:00
Samuel Pitoiset
97afb2a0a9 aco: remove unused radv_shader.h includes
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7061>
2020-10-14 15:09:34 +00:00
Samuel Pitoiset
408195ec53 aco: remove useless occurences of radv_nir_compiler_options
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7061>
2020-10-14 15:09:34 +00:00
Samuel Pitoiset
8a6f60fc6b aco: remove stub lower_wqm() prototype
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7061>
2020-10-14 15:09:34 +00:00
Rhys Perry
c122315702 aco: fix get_ssbo_size with a vgpr resource
The result of load_vulkan_descriptor is passed directly to get_ssbo_size.
This caused convert_pointer_to_64_bit() to skip creating a
v_readfirstlane_b32 if it was necessary.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 05b6612b4e ('radv: do not lower UBO/SSBO access to offsets')
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3628
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7095>
2020-10-13 14:20:28 +00:00
Rhys Perry
bcf7a70008 aco: use nir_opt_uniform_atomics
Significantly improves performance of a Control compute shader. Also seems
to increase FPS at the very start of the game by ~9% (RX 580, 1080p,
medium settings, no MSAA).

fossil-db (Navi):
Totals from 315 (0.23% of 135946) affected shaders:
SGPRs: 18296 -> 18336 (+0.22%); split: -0.26%, +0.48%
VGPRs: 11856 -> 11844 (-0.10%); split: -0.81%, +0.71%
CodeSize: 2233800 -> 2457508 (+10.01%)
MaxWaves: 4506 -> 4497 (-0.20%); split: +0.04%, -0.24%
Instrs: 438766 -> 486215 (+10.81%); split: -0.00%, +10.81%
Cycles: 7880180 -> 8963340 (+13.75%); split: -0.00%, +13.75%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6558>
2020-10-13 12:47:21 +00:00
Rhys Perry
e1120f274f nir: move divergence analysis options to nir_shader_compiler_options
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6558>
2020-10-13 12:47:21 +00:00
Rhys Perry
bb5c0ba0d2 aco: implement last_invocation
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6558>
2020-10-13 12:47:21 +00:00
Rhys Perry
36da9c4aa2 aco: implement elect
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6558>
2020-10-13 12:47:20 +00:00
Rhys Perry
bf77f539ee aco: optimize more uniform reductions/scans
Uniform atomic optimization will create these.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6558>
2020-10-13 12:47:20 +00:00
Samuel Pitoiset
b9ca4923d6 aco: implement missing nir_op_unpack_half_2x16_split_{x,y}_flush_to_zero
SPIRV->NIR emits nir_op_unpack_half_2x16_flush_to_zero instead of
nir_op_unpack_half_2x16 if the shader enables denorm flush to zero
for 16-bit floating point.

This doesn't fix anything known and CTS doesn't have tests.

Fixes: 56d9bcdded ("radv: enable more float_controls features")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6939>
2020-10-13 08:35:22 +02:00
Samuel Pitoiset
b0829c6af7 radv: replace RADV_ALPHA_ADJUST by AC_FETCH_FORMAT
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7065>
2020-10-12 13:13:40 +00:00
Timur Kristóf
5ae3656890 aco/ngg: Calculate workgroup size of NGG shaders.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:15 +02:00
Timur Kristóf
61280bb4b6 aco/ngg: Allocate NGG GS space early for const vertex/primitive counts.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:15 +02:00
Timur Kristóf
e8a0409d01 aco/ngg: Use more efficient LDS layout to help reduce bank conflicts.
The LLVM backend has a trick which helps reduce LDS bank conflicts
by swizzling the LDS address where each vertex is emitted.
This commit implements the same thing for ACO.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:15 +02:00
Timur Kristóf
dd73719856 aco/ngg: Add shader query support to NGG GS.
In each GS thread, we calculate the number of "real" primitives that
were emitted (points, lines, triangles, not strips). Then we
accumulate the number of "real" primitives emitted by the
entire threadgroup in GDS.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:15 +02:00
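The counting of "real" primitives described in commit dd73719856 can be sketched as follows. This is not the ACO implementation: the function names and string primitive names are hypothetical, and the strip decomposition is the usual API rule (a strip of n vertices yields n - 1 lines or n - 2 triangles). The final sum stands in for the accumulation in GDS.

```python
# Vertices needed to form the first primitive of each output type.
VERTS_PER_PRIM = {"points": 1, "line_strip": 2, "triangle_strip": 3}


def real_primitives_in_strip(output_prim: str, strip_len: int) -> int:
    """Number of points/lines/triangles produced by one strip."""
    return max(strip_len - (VERTS_PER_PRIM[output_prim] - 1), 0)


def thread_primitive_count(output_prim: str, strips: list) -> int:
    """Primitives emitted by one GS thread, given its strip lengths."""
    return sum(real_primitives_in_strip(output_prim, n) for n in strips)


def workgroup_primitive_count(output_prim: str, per_thread_strips: list) -> int:
    """Total over all threads; models the accumulated query value."""
    return sum(thread_primitive_count(output_prim, s) for s in per_thread_strips)
```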
Timur Kristóf
df62c8fbea aco/ngg: Place workgroup barrier outside control flow for NGG GS.
Merged shaders have a workgroup barrier which makes sure that
the first half is completed in every wave before the 2nd half
is started.

This barrier is located in divergent control flow, so that waves
that don't have any invocations in the 2nd half can finish as early
as possible. This is problematic for NGG GS because it has more
workgroup barriers after the 2nd half.

So, for NGG GS we need to put the barrier outside
control flow because otherwise the waves that have 0 GS threads
won't be able to wait for the waves which have non-zero GS threads.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:15 +02:00
Timur Kristóf
1129575d5e aco/ngg: Implement NGG GS output.
We store emitted GS vertices in LDS.
Then, at the end of the shader, the emitted vertices are compacted
and each thread loads a single vertex from LDS in order to export
a primitive as needed, and the vertex attributes.

This is done because there is an impedance mismatch between how API GS
and the NGG HW work. API GS can emit an arbitrary number of vertices
and primitives in each thread, but NGG HW can only export one vertex
per thread.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:15 +02:00
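The compaction step described in commit 1129575d5e can be modelled sequentially. The sketch below is a simplification, not the shader code: a flat Python list stands in for LDS, the function name is hypothetical, and the per-thread offsets are computed with a plain running sum.

```python
def compact_emitted_vertices(emitted_per_thread: list) -> list:
    """Pack per-thread vertex lists into one dense array.

    emitted_per_thread[i] holds the vertices emitted by GS thread i,
    which may be empty or contain several entries. In the result,
    export thread k would load and export element k.
    """
    total = sum(len(verts) for verts in emitted_per_thread)
    lds = [None] * total  # stand-in for the compacted LDS region
    offset = 0            # exclusive running sum of vertex counts
    for verts in emitted_per_thread:
        for j, vertex in enumerate(verts):
            lds[offset + j] = vertex
        offset += len(verts)
    return lds
```

After compaction every occupied slot maps to exactly one export thread, which matches the one-vertex-per-thread limit of the hardware.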
Timur Kristóf
62b5012ec3 aco/ngg: Implement workgroup reduce / exclusive scan for NGG GS.
This function calculates two things at once:

1. The total number of vertices emitted by the threadgroup.
2. Exclusive scan of emitted vertex count across the threadgroup.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:15 +02:00
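The two results listed in commit 62b5012ec3 can be computed in one pass. The sketch below is a sequential model with a hypothetical function name. The actual shader computes the same values in parallel across the threadgroup, which this example does not attempt to reproduce.

```python
def reduce_and_exclusive_scan(vertex_counts: list) -> tuple:
    """Return (total, exclusive_scan) for per-thread vertex counts.

    exclusive_scan[i] is the sum of the counts of all threads before i,
    so it does not include thread i's own count.
    """
    scan = []
    running = 0
    for count in vertex_counts:
        scan.append(running)  # value before adding this thread
        running += count
    return running, scan
```

For counts [3, 0, 2, 1] the total is 6 and the exclusive scan is [0, 3, 3, 5]. Each scan entry is the position at which that thread's first vertex would land in a dense array.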
Timur Kristóf
c29e288fb5 aco/ngg: Create LDS layout for NGG GS.
For NGG GS, we need to store the following in LDS:

1. The ESGS ring, similarly to legacy ESGS.
2. Emitted vertices from the GS threads.
3. Temporary space used by the workgroup scan.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:15 +02:00
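One way to picture the three LDS regions listed in commit c29e288fb5 is as consecutive, aligned ranges. The sketch below is only an assumption about how such a layout could be computed: the region order follows the list in the commit message, while the names, the sizes and the 16-byte alignment are invented for this example.

```python
from typing import NamedTuple


class LdsLayout(NamedTuple):
    esgs_ring_offset: int
    emitted_vertices_offset: int
    scan_scratch_offset: int
    total_size: int


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def make_lds_layout(esgs_ring_size: int, emitted_vertices_size: int,
                    scan_scratch_size: int, alignment: int = 16) -> LdsLayout:
    """Place the three regions back to back (hypothetical ordering)."""
    esgs = 0
    emitted = align_up(esgs + esgs_ring_size, alignment)
    scratch = align_up(emitted + emitted_vertices_size, alignment)
    return LdsLayout(esgs, emitted, scratch, scratch + scan_scratch_size)
```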
Timur Kristóf
2680329fb7 aco/ngg: Setup NGG GS.
Make it possible for ACO to recognize when to use HW NGG GS.
Also add a few notes about the various GS stages in the comments.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:15 +02:00
Timur Kristóf
9c3d8404de aco/ngg: Allow NGG GS to create VS exports.
NGG GS needs to use the same instructions to export vertex
attributes at the end.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:14 +02:00
Timur Kristóf
b67878f328 aco/ngg: Allow NGG GS to load per-vertex GS inputs.
They work the same way as in legacy GS, so we can reuse that.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:14 +02:00