fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 18:00:13 +01:00

Author	SHA1	Message	Date
Timur Kristóf	c52ebbcea4	aco: Introduce vgpr_limit to keep track of available VGPRs. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-28 23:52:50 +00:00
Timur Kristóf	d59f702e26	aco: Implement subgroup shuffle in GFX10 wave64 mode. Previously subgroup shuffle was implemented using the bpermute instruction, which only works across half-waves, so by itself it is not suitable for implementing subgroup shuffle when the shader is running in wave64 mode. This commit adds a trick using shared VGPRs that still allows subgroup shuffle to be implemented relatively efficiently in this mode. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-28 23:52:50 +00:00
Rhys Perry	3865448012	aco: Fix reductions on GFX10. Fixes p_reduce (all cluster sizes), p_inclusive_scan and p_exclusive_scan with all reduction operations. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-28 23:52:50 +00:00
Rhys Perry	fc04a2fc31	aco: take LDS into account when calculating num_waves. pipeline-db (Vega): SGPRS: 344 -> 344 (0.00 %); VGPRS: 424 -> 524 (23.58 %); Spilled SGPRs: 84 -> 80 (-4.76 %); Spilled VGPRs: 0 -> 0 (0.00 %); Private memory VGPRs: 0 -> 0 (0.00 %); Scratch size: 0 -> 0 (0.00 %) dwords per thread; Code Size: 52812 -> 52484 (-0.62 %) bytes; LDS: 135 -> 135 (0.00 %) blocks; Max Waves: 56 -> 53 (-5.36 %). v2: consider WGP, rework to be clearer and apply the "maximum 16 workgroups per CU" limit properly; v2: use "SIMD" instead of "EU"; v2: fix spiller by introducing "Program::max_waves"; v2: rename "lds_size" to "lds_limit"; v3: make max_waves actually independent of register usage; v3: fix issue where max_waves was way too high; v3: use DIV_ROUND_UP(a, b) instead of max(a / b, 1); v3: rename "workgroups_per_cu" to "workgroups_per_cu_wgp"; v4: fix typo from "workgroups_per_cu" rename. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v3)	2019-10-23 19:11:21 +01:00
Rhys Perry	08d510010b	aco: increase accuracy of SGPR limits. SGPRs are allocated in groups of 16 on GFX8/GFX9. GFX10 allocates a fixed number of SGPRs and has 106 addressable SGPRs. pipeline-db (Vega): SGPRS: 5912 -> 6232 (5.41 %); VGPRS: 1772 -> 1780 (0.45 %); Spilled SGPRs: 0 -> 0 (0.00 %); Spilled VGPRs: 0 -> 0 (0.00 %); Private memory VGPRs: 0 -> 0 (0.00 %); Scratch size: 0 -> 0 (0.00 %) dwords per thread; Code Size: 88228 -> 87904 (-0.37 %) bytes; LDS: 0 -> 0 (0.00 %) blocks; Max Waves: 559 -> 571 (2.15 %). pipeline-db (Navi): SGPRS: 341256 -> 363384 (6.48 %); VGPRS: 171536 -> 170960 (-0.34 %); Spilled SGPRs: 832 -> 581 (-30.17 %); Spilled VGPRs: 0 -> 0 (0.00 %); Private memory VGPRs: 0 -> 0 (0.00 %); Scratch size: 0 -> 0 (0.00 %) dwords per thread; Code Size: 14207332 -> 14190872 (-0.12 %) bytes; LDS: 33 -> 33 (0.00 %) blocks; Max Waves: 18072 -> 18251 (0.99 %). v2: unconditionally count vcc as an extra sgpr on GFX10+; v3: pass SGPRs rounded to 8. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-23 19:11:21 +01:00
Rhys Perry	f6f15859de	aco: small stage corrections. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-22 18:52:29 +00:00
Rhys Perry	c24cd97515	aco: Assemble opsel in VOP3 instructions. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-10 09:57:53 +02:00
Timur Kristóf	c0df15e645	aco: Support GFX10 MTBUF in aco_assembler. Also remove img_format from aco_ir, since it can be calculated from dfmt and nfmt, so only the assembler needs to deal with it. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-10 09:57:53 +02:00
Timur Kristóf	fd1d947457	aco: Add missing GFX10 specific fields and some README notes. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-10 09:57:52 +02:00
Timur Kristóf	a01d796de4	aco: Set +wavefrontsize64 for LLVM disassembler in GFX10 wave64 mode. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-10 09:57:52 +02:00
Rhys Perry	b711e62e61	aco: set loop_info::has_discard for demotes. We need the loop header phis for the outer exec masks. Needed for dEQP-VK.glsl.demote.dynamic_loop_texture. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-09-27 10:57:03 +01:00
Rhys Perry	06ea3325c3	aco: CSE readlane/readfirstlane/permute/reduce with the same exec mask. v2: rename pass_temp to pass_flags; v2: also CSE reductions; v3: add ds_swizzle_b32 support; v3: check gds/offset0/offset1 fields. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-09-26 13:19:51 +01:00
Daniel Schürmann	93c8ebfa78	aco: Initial commit of independent AMD compiler. ACO (short for AMD Compiler) is a new compiler backend with the goal of replacing LLVM for Radeon hardware in the RADV driver. ACO currently supports only VS, PS and CS on VI and Vega. Some optimizations are missing because of unmerged NIR changes, which may decrease performance. Full commit history can be found at https://github.com/daniel-schuermann/mesa/commits/backend Co-authored-by: Daniel Schürmann <daniel@schuermann.dev> Co-authored-by: Rhys Perry <pendingchaos02@gmail.com> Co-authored-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Co-authored-by: Connor Abbott <cwabbott0@gmail.com> Co-authored-by: Michael Schellenberger Costa <mschellenbergercosta@googlemail.com> Co-authored-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-09-19 12:10:00 +02:00
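
Two of the commits above describe small pieces of arithmetic: fc04a2fc31 switches to DIV_ROUND_UP(a, b) instead of max(a / b, 1), and 08d510010b notes that SGPRs are allocated in groups of 16 on GFX8/GFX9. The sketch below is not taken from the ACO sources. It assumes DIV_ROUND_UP is an ordinary ceiling division, and the helper names div_round_up, max_div_or_one and align_up are hypothetical, chosen only for illustration. The commit log does not say which quantities a and b stand for, so the inputs are arbitrary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Ceiling division for unsigned values (assumed meaning of DIV_ROUND_UP).
// Hypothetical helper, not the macro from the Mesa tree.
constexpr uint32_t div_round_up(uint32_t a, uint32_t b)
{
   return (a + b - 1) / b;
}

// The older form named in the v3 note: truncating division clamped to at least 1.
constexpr uint32_t max_div_or_one(uint32_t a, uint32_t b)
{
   return std::max<uint32_t>(a / b, 1);
}

// Round a register count up to a multiple of the allocation granule,
// e.g. 16 for SGPRs on GFX8/GFX9 according to commit 08d510010b.
constexpr uint32_t align_up(uint32_t value, uint32_t granule)
{
   return div_round_up(value, granule) * granule;
}
```

The two division forms agree when a is an exact multiple of b, but differ otherwise: for a = 5 and b = 4, ceiling division gives 2 while the clamped form gives 1. With a granule of 16, a request for 17 registers is rounded up to 32.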


113 commits