fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 18:00:13 +01:00

Author	SHA1	Message	Date
Rhys Perry	3d6f67950d	aco: improve 8/16-bit constants fossil-db (Navi, fp16 enabled): Totals from 1 (0.00% of 127638) affected shaders: CodeSize: 4540 -> 4388 (-3.35%) Instrs: 861 -> 830 (-3.60%) Cycles: 3444 -> 3320 (-3.60%) VMEM: 489 -> 465 (-4.91%) SMEM: 107 -> 110 (+2.80%) SClause: 31 -> 30 (-3.23%) Copies: 58 -> 54 (-6.90%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5245>	2020-06-15 18:24:22 +00:00
Daniel Schürmann	db957f9135	aco: optimize packing of 16bit subdword registers on GFX6/7 Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5226>	2020-06-09 21:25:38 +00:00
Daniel Schürmann	2a51840c52	aco: skip partial copies on first iteration when lowering to hw Helps some Detroit : Become Human shaders. Totals from affected shaders: (VEGA) Code Size: 47693912 -> 47670212 (-0.05 %) bytes Instructions: 9183788 -> 9177863 (-0.06 %) Copies: 910052 -> 904127 (-0.65 %) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5226>	2020-06-09 21:25:38 +00:00
Daniel Schürmann	1d6f667193	aco: coalesce copies more aggressively when lowering to hw Helps some Detroit : Become Human shaders. Totals from affected shaders: (VEGA) Code Size: 9880420 -> 9879088 (-0.01 %) bytes Instructions: 1918553 -> 1918220 (-0.02 %) Copies: 177783 -> 177450 (-0.19 %) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5226>	2020-06-09 21:25:38 +00:00
Daniel Schürmann	b21d2d9a9f	aco: add and use scratch SGPR to lower subdword p_create_vector on GFX6/7 This is needed to lower some corner cases correctly, in case the same operand occurs multiple times: e.g. v0 = p_create_vector(v0[0:8], v0[0:8], v0[0:8], v0[0:8]) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5226>	2020-06-09 21:25:38 +00:00
Daniel Schürmann	9e8e12ea6d	aco: adjust GFX6 subdword lowering workarounds for 8bit Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5226>	2020-06-09 21:25:38 +00:00
Daniel Schürmann	b083581010	aco: Workarounds subdword lowering on GFX6/7 As there are no SDWA instructions, we need to take care not to overwrite the upper bits of other copy_operation's operands. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5226>	2020-06-09 21:25:38 +00:00
Daniel Schürmann	942e3c40c3	aco: use full-register instructions to implement subdword packing on GFX6/7 On GFX6/7, there are no SDWA instructions. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5226>	2020-06-09 21:25:38 +00:00
Daniel Schürmann	3f03db848d	aco: simplify statistics collection for copies Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5226>	2020-06-09 21:25:38 +00:00
Samuel Pitoiset	e1523b34c2	aco: fix sign-extend 8-bit subgroup operations on GFX6-GFX7 SDWA is GFX8+. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5327>	2020-06-05 16:04:05 +02:00
Samuel Pitoiset	ee4bc13de2	aco: use v_bfe_u32 for unsigned reductions sign-extension on GFX6-GFX7 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5327>	2020-06-05 16:04:03 +02:00
Samuel Pitoiset	77f08982af	aco: sign-extend input/identity for 16-bit subgroup ops on GFX6-GFX7 16-bit subgroup ops are implemented with 32-bit instructions on GFX6-GFX7. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5227>	2020-06-03 19:48:43 +02:00
Samuel Pitoiset	6b08d269bf	aco: implement 16-bit reduce operations on GFX6-GFX7 No fp16 on GFX6-GFX7. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5227>	2020-06-03 19:48:37 +02:00
Oschowa	663e8cb4e6	aco: Use correct reference type in for-range-loop. Fixes a clang warning. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5228>	2020-06-02 21:31:17 +00:00
Timur Kristóf	045c9ffa7d	aco: Implement subgroup shuffle on GFX6-7. GFX6 and GFX7 don't have the ds_bpermute (or permute) instruction, but we would like to support subgroup shuffle on these old GPUs. So we introduce a new pseudio instruction which will be lowered to an "unrolled loop" that emulates bpermute on GFX6 and GFX7 using readlane instructions, while also respecting the exec mask thanks to v_cmpx. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5223>	2020-06-02 21:12:12 +00:00
Timur Kristóf	14a5021aff	aco/gfx10: Refactor of GFX10 wave64 bpermute. The emulated GFX10 wave64 bpermute no longer needs a linear_vgpr, so we don't consider it a reduction anymore. Additionally, the code is slightly reorganized in preparation for the GFX6 emulated bpermute. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5223>	2020-06-02 21:12:12 +00:00
Samuel Pitoiset	e22567089c	aco: sign-extend input/indentity for 32-bit reduce ops on GFX10 Because some 16-bit instructions are already VOP3 on GFX10, we use the 32-bit variants to remove the temporary VGPR and to use DDP with the arithmetic instructions. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5148>	2020-05-29 11:20:58 +00:00
Samuel Pitoiset	83dcd1690b	aco: allow gfx10_wave64_bpermute with 8-bit/16-bit input Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5148>	2020-05-29 11:20:58 +00:00
Samuel Pitoiset	2e0ea9bcca	aco: implement 8-bit/16-bit reductions on GFX10 Some 16-bit instructions are VOP3 on GFX10 and we have to emit a 32-bit DPP mov followed by the ALU instruction. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5148>	2020-05-29 11:20:58 +00:00
Samuel Pitoiset	2d4493ee11	aco: sign-extend the input and identity for 8-bit subgroup operations Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4494>	2020-05-21 15:06:48 +00:00
Samuel Pitoiset	86e2b03e3f	aco: implement 8-bit/16-bit reductions Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4494>	2020-05-21 15:06:48 +00:00
Rhys Perry	cdfede7336	aco: split operations that use a swap's definition Instead of relying it's read being entirely within the swap's definition. No shader-db changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4950>	2020-05-14 18:36:33 +00:00
Rhys Perry	09c584caeb	aco: split self-intersecting copies instead of swapping Example situation: v1 = {v0.hi, v1.lo} v0.hi = v1.hi The 4-byte copy's definition is completely used, but swapping it makes no sense. We have to split it to generate correct code: swap(v0.hi, v1.lo) swap(v0.hi, v1.hi) Found in dEQP-VK.spirv_assembly.type.vec3.i16.constant_composite_vert Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>	2020-04-28 23:16:55 +00:00
Rhys Perry	e4383b5c7f	aco: decrease the uses of other copy operations after splitting/removing For copies like v[7:8] = v[8:9], what currently happens is: - do_copy() will skip the second dword - the uses of the second dword will be reduced to 0 - the copy operation will be removed from the map and v8 will never be set to v9. So just decrease the uses of other operations after splitting or removing the current operation, so: "v8 = v9" will be split off, it's uses reduced and then the new copy will be done in the next iteration. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4686>	2020-04-23 11:39:23 +00:00
Rhys Perry	f13049f48a	aco: implement 64-bit sgpr swaps In our pipeline-db, helps almost exclusively Detroit: Become Human. Totals from 6726 (5.36% of 125503) affected shaders: CodeSize: 74680952 -> 74102228 (-0.77%) Instrs: 14551507 -> 14406001 (-1.00%) Cycles: 1748272436 -> 1690173104 (-3.32%) VMEM: 964671 -> 964058 (-0.06%) Copies: 1993312 -> 1847806 (-7.30%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4469>	2020-04-22 13:25:17 +00:00
Rhys Perry	2ab45f41e0	aco: implement sub-dword swaps Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4469>	2020-04-22 13:25:17 +00:00
Rhys Perry	8fc24f9a45	aco: fix copy statistic for 64-bit vgpr constant copy The statistic is in units of instructions. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4469>	2020-04-22 13:25:17 +00:00
Daniel Schürmann	8f1712ba2f	aco: lower subdword shuffles correctly. Note that subdword swaps are not yet implemented Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4002>	2020-04-03 23:13:15 +01:00
Daniel Schürmann	9f779a2518	aco: small refactoring of shuffle code lowering Uses now bytes instead of 32bit size Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4002>	2020-04-03 23:13:15 +01:00
Rhys Perry	34424b81df	aco: make PhysReg in units of bytes Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4002>	2020-04-03 23:13:15 +01:00
Rhys Perry	b1544352c0	aco: add various compiler statistics Adds these statistics: - hash of code and constant data - number of instructions - number of copies from pseudo-instructions - number of branches - estimate of cycles spent not waiting in s_waitcnt - number of vmem/smem "clauses" - sgpr/vgpr usage before scheduling Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2965>	2020-04-03 12:12:08 +00:00
Timur Kristóf	e0dff5fd86	aco: Create null exports in instruction selection instead of assembler. This allows the passes after isel to assume that the exports are always correct, and also allows to schedule these null exports later. Additionally, it ensures that the correct exec mask is used for these exports. Totals from affected shaders (GFX10): SGPRS: 84224 -> 84344 (0.14 %) VGPRS: 23088 -> 23076 (-0.05 %) Code Size: 882892 -> 894368 (1.30 %) bytes Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>	2020-03-30 13:09:08 +00:00
Rhys Perry	43918c9a7f	aco: implement 64-bit VGPR constant copies in handle_operands() 64-bit VGPR constant copies can happen because of 64-bit constant copy propagation. Since this optimization is beneficial and more annoying to deal with in the optimizer, I've implemented 64-bit VGPR constant copies in handle_operands(). This also sets copy_operation::size correctly for 64-bit constant copies. Cc: 20.0 <mesa-stable@lists.freedesktop.org> Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4260> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4260>	2020-03-24 11:28:55 +00:00
Rhys Perry	21ba2bc595	aco: remove dead code in handle_operands() Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4260>	2020-03-24 11:28:55 +00:00
Timur Kristóf	d962bbd895	aco: Implement 64-bit constant propagation. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2020-01-14 21:21:06 +01:00
Daniel Schürmann	ffb4790279	aco: compact various Instruction classes No pipelinedb changes. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3332>	2020-01-10 17:49:18 +00:00
Daniel Schürmann	746165e540	aco: implement exclusive scan for SI/CI Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Daniel Schürmann	7ae227effd	aco: implement inclusive_scan for SI/CI Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Daniel Schürmann	f895a8b1df	aco: implement (clustered) reductions for SI/CI Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Daniel Schürmann	9254fb4fc7	aco: don't use a scalar temporary for reductions on GFX10 This patch also adds the scalar temporary for scans on SI/CI Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Daniel Schürmann	6a586a6006	aco: split read/writelane opcode into VOP2/VOP3 version for SI/CI Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>	2019-12-07 11:23:11 +01:00
Timur Kristóf	637c5a1dd9	aco/wave32: Fix reductions. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-12-04 10:36:01 +00:00
Timur Kristóf	e0bcefc3a0	aco/wave32: Use lane mask regclass for exec/vcc. Currently all usages of exec and vcc are hardcoded to use s2 regclass. This commit makes it possible to use s1 in wave32 mode and s2 in wave64 mode. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-12-04 10:36:01 +00:00
Rhys Perry	56c06c79fc	aco: implement 64-bit integer reductions The multiplication reduction is larger than it could be, but it should be easier to implement this way. No failures with dEQP-VK.subgroups.int64 except those caused by LLVM being used for other stages. v2: don't call setFixed() for v_add carry-out, since setHint sets physReg v3: add and use emit_vadd32() helper v4: use num_opcodes instead of last_opcode Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v3)	2019-11-19 18:58:04 +00:00
Rhys Perry	33277bd66e	aco: refactor reduction lowering helpers Should make 64-bit integer reductions easier to implement. v4: use num_opcodes instead of last_opcode Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v3)	2019-11-19 18:56:21 +00:00
Rhys Perry	df645fa369	aco: implement VK_KHR_shader_float_controls This actually supports more of the extension than the LLVM backend but we can't enable it because ACO doesn't work with all stages yet. With more of it enabled, some CTS tests fail because our 64-bit sqrt is very imprecise. I can't find any precision requirements for it anywhere, so I'm thinking it might be a CTS issue. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-15 17:36:21 +00:00
Rhys Perry	3204e83768	aco: use DPP instead of exec modification when lowering GFX10 shuffles Seems we can use DPP's row_mask field to get an effect similar to modifying exec. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-12 17:21:38 +00:00
Timur Kristóf	c52ebbcea4	aco: Introduce vgpr_limit to keep track of available VGPRs. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-28 23:52:50 +00:00
Timur Kristóf	d59f702e26	aco: Implement subgroup shuffle in GFX10 wave64 mode. Previously subgroup shuffle was implemented using the bpermute instruction, which only works accross half-waves, so by itself it's not suitable for implementing subgroup shuffle when the shader is running in wave64 mode. This commit adds a trick using shared VGPRs that allows to implement subgroup shuffle still relatively effectively in this mode. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-28 23:52:50 +00:00
Rhys Perry	c2eebfe3ea	aco: Remove dead code in reduction lowering. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-28 23:52:50 +00:00

1 2

53 commits