fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-20 22:30:12 +01:00

Author	SHA1	Message	Date
Connor Abbott	bb78f9b4e4	aco: Use common argument handling Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-25 14:17:51 +01:00
Connor Abbott	e7f4cadd02	radv: Replace supports_spill with explict_scratch_args The former was always true and hence dead code. We will want to explicitly declare the ring offset register with ACO, but we also want to declare the scratch offset too, and we can't try to disable it since ACO also supports spilling and the determination of whether spilling has to happen occurs well after setting up registers. So replace supports_spill with something that will actually be used for ACO. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-11-25 14:17:51 +01:00
Connor Abbott	4d6676d78a	aco: Make num_workgroups and local_invocation_ids one argument each To match the LLVM argument setup code. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-25 14:17:51 +01:00
Connor Abbott	a7f1c63442	aco: Split vector arguments at the beginning Due to how LLVM works we have to make some of the FS inputs become vectors, and therefore have to split them early so that they don't take up extra register pressure due to how RA currently works. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-25 14:17:51 +01:00
Connor Abbott	680b086db1	aco: Constify radv_nir_compiler_options in isel It's already const for everything else. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-25 14:17:51 +01:00
Rhys Perry	df645fa369	aco: implement VK_KHR_shader_float_controls This actually supports more of the extension than the LLVM backend but we can't enable it because ACO doesn't work with all stages yet. With more of it enabled, some CTS tests fail because our 64-bit sqrt is very imprecise. I can't find any precision requirements for it anywhere, so I'm thinking it might be a CTS issue. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-15 17:36:21 +00:00
Timur Kristóf	8995c0b30a	aco: Treat all booleans as per-lane. Previously, instruction selection had two kinds of booleans: 1. divergent which was per-lane and stored in s2 (VCC size) 2. uniform which was stored in s1 Additionally, uniform booleans were made per-lane when they resulted from operations which were supported only by the VALU. To decide which type was used, we relied on the destination size, which was not reliable due to the per-lane uniform bools, but it mostly works on wave64. However, in wave32 mode (where VCC is also s1) this approach makes it impossible to keep track of which boolean is uniform and which is divergent. This commit makes all booleans per-lane. The resulting excess code size will be taken care of by the optimizer. v2 (by Daniel Schürmann): - Better names for some functions - Use s_andn2_b64 with exec for nir_op_inot - Simplify code due to using s_and_b64 in bool_to_scalar_condition v3 (by Timur Kristóf): - Fix several subgroups regressions Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-14 17:27:11 +01:00
Rhys Perry	76544f632d	radv: adjust loop unrolling heuristics for int64 In particular, increase the cost of 64-bit integer division. Fixes huge shaders with dEQP-VK.spirv_assembly.type.scalar.i64.mod_geom , with ACO used for GS this creates shaders requiring a branch with >32767 dword offset. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-11-07 23:29:12 +00:00
Daniel Schürmann	a47e232ccd	aco: workaround Tonga/Iceland hardware bug The workaround got accidentally moved to the wrong place Fixes: `08d510010b` aco: increase accuracy of SGPR limits Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-11-07 09:19:50 +01:00
Samuel Pitoiset	d3f9957de4	radv: determine shaders wavesize at pipeline level Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-11-06 09:20:34 +01:00
Daniel Schürmann	c79972b604	aco: always set scratch_offset in startpgm This patch also moves private_segment_buffer and scratch_offset to Program to easily access it. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-10-30 19:48:33 +00:00
Timur Kristóf	c52ebbcea4	aco: Introduce vgpr_limit to keep track of available VGPRs. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-28 23:52:50 +00:00
Rhys Perry	fc04a2fc31	aco: take LDS into account when calculating num_waves pipeline-db (Vega): SGPRS: 344 -> 344 (0.00 %) VGPRS: 424 -> 524 (23.58 %) Spilled SGPRs: 84 -> 80 (-4.76 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 52812 -> 52484 (-0.62 %) bytes LDS: 135 -> 135 (0.00 %) blocks Max Waves: 56 -> 53 (-5.36 %) v2: consider WGP, rework to be clearer and apply the "maximum 16 workgroups per CU" limit properly v2: use "SIMD" instead of "EU" v2: fix spiller by introducing "Program::max_waves" v2: rename "lds_size" to "lds_limit" v3: make max_waves actually independent of register usage v3: fix issue where max_waves was way too high v3: use DIV_ROUND_UP(a, b) instead of max(a / b, 1) v3: rename "workgroups_per_cu" to "workgroups_per_cu_wgp" v4: fix typo from "workgroups_per_cu" rename Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v3)	2019-10-23 19:11:21 +01:00
Rhys Perry	08d510010b	aco: increase accuracy of SGPR limits SGPRs are allocated in groups of 16 on GFX8/GFX9. GFX10 allocates a fixed number of SGPRs and has 106 addressable SGPRs. pipeline-db (Vega): SGPRS: 5912 -> 6232 (5.41 %) VGPRS: 1772 -> 1780 (0.45 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 88228 -> 87904 (-0.37 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 559 -> 571 (2.15 %) pipeline-db (Navi): SGPRS: 341256 -> 363384 (6.48 %) VGPRS: 171536 -> 170960 (-0.34 %) Spilled SGPRs: 832 -> 581 (-30.17 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 14207332 -> 14190872 (-0.12 %) bytes LDS: 33 -> 33 (0.00 %) blocks Max Waves: 18072 -> 18251 (0.99 %) v2: unconditionally count vcc as an extra sgpr on GFX10+ v3: pass SGPRs rounded to 8 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-23 19:11:21 +01:00
Rhys Perry	f6f15859de	aco: small stage corrections Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-22 18:52:29 +00:00
Rhys Perry	73184e51d1	aco: run opt_algebraic in a loop Totals from affected shaders: SGPRS: 13920 -> 13656 (-1.90 %) VGPRS: 12972 -> 12960 (-0.09 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 1005680 -> 1000648 (-0.50 %) bytes LDS: 91 -> 91 (0.00 %) blocks Max Waves: 688 -> 688 (0.00 %) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-21 19:18:30 +00:00
Rhys Perry	132ae89b19	aco: use nir_lower_idiv_precise v7: rename _nv50/_llvm to _fast/_precise Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-21 18:49:46 +00:00
Rhys Perry	8b98d0954e	nir/lower_idiv: add new llvm-based path v2: make variable names snake_case v2: minor cleanups in emit_udiv() v2: fix Panfrost build failure v3: use an enum instead of a boolean flag in nir_lower_idiv()'s signature v4: remove nir_op_urcp v5: drop nv50 path v5: rebase v6: add back nv50 path v6: add comment for nir_lower_idiv_path enum v7: rename _nv50/_llvm to _fast/_precise v8: fix etnaviv build failure Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-21 18:49:46 +00:00
Rhys Perry	0c3fe323b6	aco: implement divergent vulkan_resource_index Fixes the UBO/SSBO dEQP-VK.descriptor_indexing.* tests v2: remove bld.copy() usage Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-11 14:26:58 +00:00
Timur Kristóf	d729d8f1dc	aco: Add extra assertion for number of FS input VGPRs. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-10 09:57:53 +02:00
Timur Kristóf	0be1dd8564	aco: Fix VS input VGPRs on GFX10. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-10 09:57:53 +02:00
Timur Kristóf	a01d796de4	aco: Set +wavefrontsize64 for LLVM disassembler in GFX10 wave64 mode. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-10 09:57:52 +02:00
Rhys Perry	3f6e91a8d8	aco: enable nir_opt_sink SGPRS: 880272 -> 838936 (-4.70 %) VGPRS: 705316 -> 680988 (-3.45 %) Spilled SGPRs: 1032 -> 832 (-19.38 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 252 -> 252 (0.00 %) dwords per thread Code Size: 55150788 -> 55172436 (0.04 %) bytes LDS: 451 -> 451 (0.00 %) blocks Max Waves: 66178 -> 68706 (3.82 %) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-09 17:55:25 +00:00
Rhys Perry	a87b0f5141	radv/aco,aco: set lower_fmod This simplifies ACO and allows the lowered code to be optimized (in particular, constant folded). Totals from affected shaders: SGPRS: 1776 -> 1776 (0.00 %) VGPRS: 1436 -> 1436 (0.00 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 203452 -> 203564 (0.06 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 103 -> 103 (0.00 %) At least some of the code size increase seems to be from literals being applied to instructions as a result of constant folding. v2: remove fmod/frem handling in init_context() Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-04 14:00:46 +00:00
Daniel Schürmann	1d29895e5b	aco: call nir_opt_algebraic_late() exhaustively 57559 shaders in 28980 tests Totals: SGPRS: 2963407 -> 2959935 (-0.12 %) VGPRS: 2014812 -> 2016328 (0.08 %) Spilled SGPRs: 1077 -> 1077 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 10348 -> 10348 (0.00 %) dwords per thread Code Size: 114545436 -> 114498084 (-0.04 %) bytes LDS: 933 -> 933 (0.00 %) blocks Max Waves: 375997 -> 375866 (-0.03 %) Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-09-30 09:44:10 +00:00
Mauro Rossi	c24ad565ae	android: aco: fix undefined template 'std::__1::array' build errors Fixes a few building errors similar to the following: In file included from external/mesa/src/amd/compiler/aco_instruction_selection.cpp:26: In file included from external/libcxx/include/algorithm:639: external/libcxx/include/utility:321:9: error: implicit instantiation of undefined template 'std::__1::array<aco::Temp, 4>' _T2 second; ^ Fixes: `93c8ebf` ("aco: Initial commit of independent AMD compiler") Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>	2019-09-28 15:56:23 +02:00
Rhys Perry	b125dc4839	aco: implement 64-bit ineg We currently lower them, but nir_opt_algebraic() can add new ones because lower_sub=true. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-09-25 15:27:48 +00:00
Rhys Perry	641eac953c	aco: run nir_lower_int64() before nir_lower_idiv() nir_lower_idiv() asserts on 64-bit integers. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-09-25 15:27:48 +00:00
Daniel Schürmann	93c8ebfa78	aco: Initial commit of independent AMD compiler ACO (short for AMD Compiler) is a new compiler backend with the goal to replace LLVM for Radeon hardware for the RADV driver. ACO currently supports only VS, PS and CS on VI and Vega. There are some optimizations missing because of unmerged NIR changes which may decrease performance. Full commit history can be found at https://github.com/daniel-schuermann/mesa/commits/backend Co-authored-by: Daniel Schürmann <daniel@schuermann.dev> Co-authored-by: Rhys Perry <pendingchaos02@gmail.com> Co-authored-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Co-authored-by: Connor Abbott <cwabbott0@gmail.com> Co-authored-by: Michael Schellenberger Costa <mschellenbergercosta@googlemail.com> Co-authored-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-09-19 12:10:00 +02:00

29 commits
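
Note on the figures quoted above: the pipeline-db statistics in several messages (for example "VGPRS: 424 -> 524 (23.58 %)") appear to be relative changes against the old value, and the v3 notes on fc04a2fc31 mention using DIV_ROUND_UP(a, b) instead of max(a / b, 1). The Python sketch below illustrates both calculations. It is not code from Mesa: the function names `pct_change` and `div_round_up` are hypothetical, treating a zero baseline as 0.00 % is an assumption based on the "0 -> 0 (0.00 %)" rows, and the rounding formula is the conventional ceiling-division idiom for non-negative integers.

```python
def pct_change(before: int, after: int) -> float:
    """Relative change in percent, measured against the old value.

    Assumption: a zero baseline is reported as 0.0, matching the
    "0 -> 0 (0.00 %)" rows in the quoted statistics.
    """
    if before == 0:
        return 0.0
    return (after - before) * 100.0 / before


def div_round_up(a: int, b: int) -> int:
    """Ceiling division for non-negative a and positive b."""
    return (a + b - 1) // b


def floor_at_least_one(a: int, b: int) -> int:
    """The older form named in the commit note: max(a / b, 1) with integer division."""
    return max(a // b, 1)


if __name__ == "__main__":
    # Rows taken from the fc04a2fc31 message (Vega).
    for name, old, new in [("VGPRS", 424, 524),
                           ("Spilled SGPRs", 84, 80),
                           ("Max Waves", 56, 53)]:
        print(f"{name}: {old} -> {new} ({pct_change(old, new):.2f} %)")

    # The two rounding forms disagree whenever a is not a multiple of b
    # and a > b: one rounds up, the other rounds down.
    print(div_round_up(5, 2), floor_at_least_one(5, 2))
```

The two rounding helpers agree when `a` is an exact multiple of `b`, and differ otherwise once `a` exceeds `b`, which is one plausible reason for a change like this to alter a computed limit.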