fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 20:10:14 +01:00

Author	SHA1	Message	Date
Daniel Schürmann	7f962a9362	aco: guarantee that Temp fits in 4 bytes Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4130>	2020-04-09 15:08:57 +00:00
Timur Kristóf	1436c0b8e0	aco/ngg: Add new stage for hw_ngg_gs. This is needed to distinguish between NGG and legacy. Otherwise, vertex_geometry_gs and ngg_vertex_geometry_gs have the same value, which we want to avoid. Also, there is no such thing as ngg_vertex_tess_control_hs. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3576>	2020-04-07 11:29:35 +00:00
Rhys Perry	20a4b1461b	aco: zero-initialize Temp Fixes dEQP-VK.transform_feedback.* crashes from accesses garbage temporaries in emit_extract_vector(). Fixes: `85521061` ("aco: prepare helper functions for subdword handling") Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4463> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4463>	2020-04-06 19:15:19 +00:00
Daniel Schürmann	f01bf51a2b	aco: refactor regClass setup for subdword VGPRs Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4002>	2020-04-03 23:13:15 +01:00
Rhys Perry	c4223fa512	aco: add emission support for register-allocated sdwa sels Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4002>	2020-04-03 23:13:15 +01:00
Daniel Schürmann	8acb384471	aco: add sub-dword regclasses Co-authored-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4002>	2020-04-03 23:13:15 +01:00
Rhys Perry	b84d59af50	aco: add SDWA_instruction Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4002>	2020-04-03 23:13:15 +01:00
Daniel Schürmann	00312f3c95	aco: add comparison operators for PhysReg Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4002>	2020-04-03 23:13:15 +01:00
Rhys Perry	34424b81df	aco: make PhysReg in units of bytes Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4002>	2020-04-03 23:13:15 +01:00
Rhys Perry	507956ed04	aco: add vmem/smem score statistic This isn't perfect (for example, changes might not be too meaningful when comparing shaders with different control flow) but it should be useful for evaluating scheduler changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2965>	2020-04-03 12:12:08 +00:00
Rhys Perry	b1544352c0	aco: add various compiler statistics Adds these statistics: - hash of code and constant data - number of instructions - number of copies from pseudo-instructions - number of branches - estimate of cycles spent not waiting in s_waitcnt - number of vmem/smem "clauses" - sgpr/vgpr usage before scheduling Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2965>	2020-04-03 12:12:08 +00:00
Samuel Pitoiset	2f424c83e0	aco: only break SMEM clauses if XNACK is enabled (mostly APUs) According to LLVM, it seems only required for APUs like RAVEN, but we still ensure that SMEM stores are in their own clause. pipeline-db (VEGA10): Totals from affected shaders: SGPRS: 1775364 -> 1775364 (0.00 %) VGPRS: 1287176 -> 1287176 (0.00 %) Spilled SGPRs: 725 -> 725 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Code Size: 65386620 -> 65107460 (-0.43 %) bytes Max Waves: 287099 -> 287099 (0.00 %) pipeline-db (POLARIS10): Totals from affected shaders: SGPRS: 1797743 -> 1797743 (0.00 %) VGPRS: 1271108 -> 1271108 (0.00 %) Spilled SGPRs: 730 -> 730 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Code Size: 64046244 -> 63782324 (-0.41 %) bytes Max Waves: 254875 -> 254875 (0.00 %) This only affects GFX6-GFX9 chips because the compiler uses a different pass for GFX10. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4349> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4349>	2020-04-01 17:50:31 +00:00
Timur Kristóf	0f35b3795d	aco: Fix workgroup size calculation. Clear the workgroup size for all supported shader stages. Also, unify the workgroup size calculation accross various places. As a result, insert_waitcnt can use the proper workgroup size which means that some waits can be dropped from tessellation shaders. Also, in cases where the previous calculation was wrong, we now insert s_barrier instructions. Totals from affected shaders (GFX10): Code Size: 340116 -> 338484 (-0.48 %) bytes Fixes: `a8d15ab6da` Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>	2020-03-30 13:09:08 +00:00
Rhys Perry	43918c9a7f	aco: implement 64-bit VGPR constant copies in handle_operands() 64-bit VGPR constant copies can happen because of 64-bit constant copy propagation. Since this optimization is beneficial and more annoying to deal with in the optimizer, I've implemented 64-bit VGPR constant copies in handle_operands(). This also sets copy_operation::size correctly for 64-bit constant copies. Cc: 20.0 <mesa-stable@lists.freedesktop.org> Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4260> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4260>	2020-03-24 11:28:55 +00:00
Rhys Perry	638cbc21a1	aco: handle when ACO adds new continue edges Usually a loop ends with a uniform continue. If it doesn't and we end up adding our own continue edges (because of continue_or_break or divergent breaks at the end), we have to add extra operands to the loop header phis. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3658>	2020-03-23 15:55:12 +00:00
Rhys Perry	ee9e0d1eca	aco: set late kill for v_interp_p1_f32 for some APUs Apparently needed for Stoney Ridge, Kabini and Mullins APUs. gfx702 also has 16-bank LDS and https://llvm.org/docs/AMDGPUUsage.html lists some dGPUs under there. Those GPUs seem to be Hawaii actually (gfx701) and we don't seem to have gotten any interpolation related bugs reported with them so far. The late kill flag was tested by running pipeline-db with ACO_DEBUG=validatera while setting late kill for SMEM buffer loads, emit_vop2_instruction() and texture instructions. I also tested with just setting the flag for v_interp_p1_f32. As far as I know, the only other thing we have to consider for 16-bank LDS is something to do with 16-bit interpolation. We don't do that yet. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3914> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3914>	2020-03-16 16:09:02 +00:00
Rhys Perry	1872759f55	aco: add a late kill flag Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3914>	2020-03-16 16:09:02 +00:00
Rhys Perry	c51348bd9b	aco: move some register demand helpers into aco_live_var_analysis.cpp Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3914>	2020-03-16 16:09:02 +00:00
Rhys Perry	928ac97875	aco: add helpers for ensuring correct ordering while scheduling Pipeline-db changes in 721 shaders. Totals from affected shaders: SGPRS: 42336 -> 42656 (0.76 %) VGPRS: 38368 -> 38636 (0.70 %) Spilled SGPRs: 11967 -> 11967 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 5268088 -> 5269840 (0.03 %) bytes LDS: 1069 -> 1069 (0.00 %) blocks Max Waves: 4473 -> 4447 (-0.58 %) SMEM score: 41155.00 -> 41826.00 (1.63 %) VMEM score: 146339.00 -> 147471.00 (0.77 %) SMEM clauses: 24434 -> 24535 (0.41 %) VMEM clauses: 16637 -> 16592 (-0.27 %) Instructions: 996037 -> 996388 (0.04 %) Cycles: 76476112 -> 75281416 (-1.56 %) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3776>	2020-03-13 14:04:50 +00:00
Timur Kristóf	9016711273	aco: Setup correct HW stages when tessellation is used. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3964>	2020-03-11 08:34:10 +00:00
Rhys Perry	7f1b537304	aco: add new NOP insertion pass for GFX6-9 This new pass is more similar to the GFX10 pass and should be able to handle control flow better. No pipeline-db changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4004>	2020-03-05 19:37:24 +00:00
Daniel Schürmann	71440ba0f5	aco: reorder VMEM operands in ACO IR For all VMEM instructions, the resource constant is now in operands[0]. For MIMG instructions, the sampler shares operands[1] with write data in case this instruction writes memory. Moving the VADDR to be the last operand for MIMG is the first step to support Navi NSA encoding. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3602>	2020-01-29 18:45:23 +00:00
Rhys Perry	f8f7712666	aco: implement GS copy shaders v5: rebase on float_controls changes v7: rebase after shader args MR and load/store vectorizer MR Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2421>	2020-01-24 13:35:07 +00:00
Rhys Perry	e192e268de	aco: explicitly mark end blocks for exports For GS copy shaders, whether we want to do exports is conditional. By explicitly marking the end blocks, we can mark an IF's then branch as an export block and ensure that's where the assembler inserts null exports. v6: only fixup exports in the end block, like before v8: simplify some code Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2421>	2020-01-24 13:35:07 +00:00
Rhys Perry	40bb81c9dd	radv/aco,aco: implement GS on GFX9+ v2: implement GFX10 v3: rebase v7: rebase after shader args MR v8: fix gs_vtx_offset usage on GFX9/GFX10 v8: use unreachable() instead of printing intrinsic v8: rename output_state to ge_output_state v8: fix formatting around nir_foreach_variable() v8: rename some helpers in the scheduler v8: rename p_memory_barrier_all to p_memory_barrier_common v8: fix assertion comparing ctx.stage against vertex_geometry_gs Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2421>	2020-01-24 13:35:07 +00:00
Samuel Pitoiset	9e2fde84fc	aco: add new addr64 bit to MUBUF instructions on GFX6-GFX7 According to the different ISA docs (and to LLVM), this bit seems to only exists on GFX6-GFX7. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3432>	2020-01-20 16:24:55 +00:00
Timur Kristóf	d962bbd895	aco: Implement 64-bit constant propagation. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2020-01-14 21:21:06 +01:00
Rhys Perry	69bed1c918	aco: don't DCE atomics with return values We don't create atomics with definitions if they are not used in NIR, but our own DCE can remove the uses if an export turns out to be null. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Fixes: `93c8ebfa78` ('aco: Initial commit of independent AMD compiler') Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>	2020-01-13 13:26:43 +00:00
Daniel Schürmann	8b7a42d6d0	aco: compact aco::span<T> to use uint16_t offset and size instead of pointer and size_t. This reduces the size of the Instruction base class from 40 bytes to 16 bytes. No pipelinedb changes. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3332> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3332>	2020-01-10 17:49:18 +00:00
Daniel Schürmann	ffb4790279	aco: compact various Instruction classes No pipelinedb changes. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3332>	2020-01-10 17:49:18 +00:00
Rhys Perry	b5c9688516	aco: limit register usage for large work groups Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2020-01-10 12:10:37 +00:00
Eric Engestrom	51569e525a	amd: fix empty-body issues Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Fixes: `8d43e2b2de` ("meson: add -Werror=empty-body to disallow `if(x);`") Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>	2019-12-27 22:09:00 +00:00
Rhys Perry	afe1a8ff5b	aco: fix vgpr alloc granule with wave32 We still need to increase the number of physical vgprs Totals from affected shaders: SGPRS: 671976 -> 675288 (0.49 %) VGPRS: 550112 -> 562596 (2.27 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Code Size: 27621660 -> 27606532 (-0.05 %) bytes Max Waves: 81083 -> 87833 (8.32 %) Instructions: 5391560 -> 5389031 (-0.05 %) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-12-21 12:38:42 +01:00
Daniel Schürmann	da7ff58835	aco: make 1/2*PI a literal constant on SI/CI Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Daniel Schürmann	0d42e4d7a0	aco: Initial GFX7 Support Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Timur Kristóf	e0bcefc3a0	aco/wave32: Use lane mask regclass for exec/vcc. Currently all usages of exec and vcc are hardcoded to use s2 regclass. This commit makes it possible to use s1 in wave32 mode and s2 in wave64 mode. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-12-04 10:36:01 +00:00
Rhys Perry	389ee819c0	aco: improve FLAT/GLOBAL scheduling Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-29 17:46:02 +00:00
Rhys Perry	cc742562c1	aco: don't enable store_global for helper invocations Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-29 17:46:02 +00:00
Rhys Perry	37843e454e	aco: allow constant offsets for global/scratch instructions on GFX10 I don't think the bug applies for global/scratch instructions and load_barycentric_at_sample selection expects this feature to work. Fixes various dEQP-VK.pipeline.multisample_interpolation.* tests on GFX10. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>	2019-11-26 14:39:27 +00:00
Connor Abbott	bb78f9b4e4	aco: Use common argument handling Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-25 14:17:51 +01:00
Connor Abbott	680b086db1	aco: Constify radv_nir_compiler_options in isel It's already const for everything else. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-25 14:17:51 +01:00
Rhys Perry	df645fa369	aco: implement VK_KHR_shader_float_controls This actually supports more of the extension than the LLVM backend but we can't enable it because ACO doesn't work with all stages yet. With more of it enabled, some CTS tests fail because our 64-bit sqrt is very imprecise. I can't find any precision requirements for it anywhere, so I'm thinking it might be a CTS issue. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-15 17:36:21 +00:00
Daniel Schürmann	8657eede8a	aco: check if SALU instructions are predeceeded by exec when calculating WQM needs Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-11-14 17:27:10 +01:00
Rhys Perry	78e3ea9a0f	aco: add Instruction::usesModifiers() and add more checks in the optimizer No pipeline-db changes. v2: use early-exit for VOP3 Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v1)	2019-11-08 00:14:06 +00:00
Daniel Schürmann	c79972b604	aco: always set scratch_offset in startpgm This patch also moves private_segment_buffer and scratch_offset to Program to easily access it. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-10-30 19:48:33 +00:00
Timur Kristóf	c52ebbcea4	aco: Introduce vgpr_limit to keep track of available VGPRs. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-28 23:52:50 +00:00
Timur Kristóf	d59f702e26	aco: Implement subgroup shuffle in GFX10 wave64 mode. Previously subgroup shuffle was implemented using the bpermute instruction, which only works accross half-waves, so by itself it's not suitable for implementing subgroup shuffle when the shader is running in wave64 mode. This commit adds a trick using shared VGPRs that allows to implement subgroup shuffle still relatively effectively in this mode. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-28 23:52:50 +00:00
Rhys Perry	3865448012	aco: Fix reductions on GFX10. Fixes p_reduce (all cluster sizes), p_inclusive_scan and p_exclusive_scan with all reduction operations. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-28 23:52:50 +00:00
Rhys Perry	fc04a2fc31	aco: take LDS into account when calculating num_waves pipeline-db (Vega): SGPRS: 344 -> 344 (0.00 %) VGPRS: 424 -> 524 (23.58 %) Spilled SGPRs: 84 -> 80 (-4.76 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 52812 -> 52484 (-0.62 %) bytes LDS: 135 -> 135 (0.00 %) blocks Max Waves: 56 -> 53 (-5.36 %) v2: consider WGP, rework to be clearer and apply the "maximum 16 workgroups per CU" limit properly v2: use "SIMD" instead of "EU" v2: fix spiller by introducing "Program::max_waves" v2: rename "lds_size" to "lds_limit" v3: make max_waves actually independant of register usage v3: fix issue where max_waves was way too high v3: use DIV_ROUND_UP(a, b) instead of max(a / b, 1) v3: rename "workgroups_per_cu" to "workgroups_per_cu_wgp" v4: fix typo from "workgroups_per_cu" rename Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v3)	2019-10-23 19:11:21 +01:00
Rhys Perry	08d510010b	aco: increase accuracy of SGPR limits SGPRs are allocated in groups of 16 on GFX8/GFX9. GFX10 allocates a fixed number of SGPRs and has 106 addressable SGPRs. pipeline-db (Vega): SGPRS: 5912 -> 6232 (5.41 %) VGPRS: 1772 -> 1780 (0.45 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 88228 -> 87904 (-0.37 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 559 -> 571 (2.15 %) piepline-db (Navi): SGPRS: 341256 -> 363384 (6.48 %) VGPRS: 171536 -> 170960 (-0.34 %) Spilled SGPRs: 832 -> 581 (-30.17 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 14207332 -> 14190872 (-0.12 %) bytes LDS: 33 -> 33 (0.00 %) blocks Max Waves: 18072 -> 18251 (0.99 %) v2: unconditionally count vcc as an extra sgpr on GFX10+ v3: pass SGPRs rounded to 8 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-23 19:11:21 +01:00

... 3 4 5 6 7

308 commits