fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 05:00:09 +01:00

Author	SHA1	Message	Date
Rhys Perry	b1544352c0	aco: add various compiler statistics Adds these statistics: - hash of code and constant data - number of instructions - number of copies from pseudo-instructions - number of branches - estimate of cycles spent not waiting in s_waitcnt - number of vmem/smem "clauses" - sgpr/vgpr usage before scheduling Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2965>	2020-04-03 12:12:08 +00:00
Rhys Perry	ad2703653f	radv: add code for exposing compiler statistics Statistics will be added to ACO in later commits. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2965>	2020-04-03 12:12:08 +00:00
Samuel Pitoiset	c953292630	aco: always optimize v_mad to v_madak in presence of literals v_mad and v_madak are both 64-bit instructions, so it doesn't increase code size to always apply a 32-bit literal instead of using v_mad and a sgpr which contains that literal. Found with some Youngblood shaders but help some other games. vkpipeline-db (VEGA10): Totals from affected shaders: SGPRS: 46168 -> 46016 (-0.33 %) VGPRS: 45576 -> 45564 (-0.03 %) Code Size: 5187208 -> 5179584 (-0.15 %) bytes Max Waves: 3297 -> 3297 (0.00 %) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4410> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4410>	2020-04-03 07:30:49 +00:00
Samuel Pitoiset	2f424c83e0	aco: only break SMEM clauses if XNACK is enabled (mostly APUs) According to LLVM, it seems only required for APUs like RAVEN, but we still ensure that SMEM stores are in their own clause. pipeline-db (VEGA10): Totals from affected shaders: SGPRS: 1775364 -> 1775364 (0.00 %) VGPRS: 1287176 -> 1287176 (0.00 %) Spilled SGPRs: 725 -> 725 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Code Size: 65386620 -> 65107460 (-0.43 %) bytes Max Waves: 287099 -> 287099 (0.00 %) pipeline-db (POLARIS10): Totals from affected shaders: SGPRS: 1797743 -> 1797743 (0.00 %) VGPRS: 1271108 -> 1271108 (0.00 %) Spilled SGPRs: 730 -> 730 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Code Size: 64046244 -> 63782324 (-0.41 %) bytes Max Waves: 254875 -> 254875 (0.00 %) This only affects GFX6-GFX9 chips because the compiler uses a different pass for GFX10. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4349> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4349>	2020-04-01 17:50:31 +00:00
Rhys Perry	4a909068ad	aco: look at p_{extract,split}_vector's definitions in pred_by_exec_mask() Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4333> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4333>	2020-03-30 17:34:46 +00:00
Jason Ekstrand	16a80ff18a	aco: Implement b2b32 and b2b1 The implementations here just clone i2b32 and i2b1. This means that b2b32 doesn't technically generate true NIR 0/-1 booleans but it should be fine as it's only ever generated for shared variable writes which will always be consumed by something which will then run it through an i2b again. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>	2020-03-30 15:46:19 +00:00
Timur Kristóf	0f847b18bc	aco: Don't store LS VS outputs to LDS when TCS doesn't need them. Totals: Code Size: 254764624 -> 254745104 (-0.01 %) bytes Totals from affected shaders: VGPRS: 12132 -> 12112 (-0.16 %) Code Size: 573364 -> 553844 (-3.40 %) bytes Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>	2020-03-30 13:09:08 +00:00
Timur Kristóf	798dd98d6e	aco: When LS and HS invocations are the same, pass LS outputs in temps. We know that in this case, the LS and HS invocations are working on the exact same vertex, so it's safe to skip the LDS. Totals: VGPRS: 3960744 -> 3961844 (0.03 %) Code Size: 254824300 -> 254764624 (-0.02 %) bytes Max Waves: 1053748 -> 1053574 (-0.02 %) Totals from affected shaders: VGPRS: 26152 -> 27252 (4.21 %) Code Size: 1496600 -> 1436924 (-3.99 %) bytes Max Waves: 4860 -> 4686 (-3.58 %) Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>	2020-03-30 13:09:08 +00:00
Timur Kristóf	0a91c086b8	aco: Extract store_output_to_temps into a separate function. Will be used by LS output stores. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>	2020-03-30 13:09:08 +00:00
Timur Kristóf	0f35b3795d	aco: Fix workgroup size calculation. Clear the workgroup size for all supported shader stages. Also, unify the workgroup size calculation accross various places. As a result, insert_waitcnt can use the proper workgroup size which means that some waits can be dropped from tessellation shaders. Also, in cases where the previous calculation was wrong, we now insert s_barrier instructions. Totals from affected shaders (GFX10): Code Size: 340116 -> 338484 (-0.48 %) bytes Fixes: `a8d15ab6da` Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>	2020-03-30 13:09:08 +00:00
Timur Kristóf	99ad62ff27	aco: Extract setup_tcs_info to a separate function. Will be required by the workgroup size calculation. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>	2020-03-30 13:09:08 +00:00
Timur Kristóf	0ad65f2c55	aco: Zero-fill undefined elements in create_vec_from_array. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>	2020-03-30 13:09:08 +00:00
Timur Kristóf	50634ad4a0	aco: Change isel inputs/outputs to a flat array. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>	2020-03-30 13:09:08 +00:00
Timur Kristóf	e4a1b246a4	aco: Treat outputs of the previous stage as inputs of the next stage. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>	2020-03-30 13:09:08 +00:00
Timur Kristóf	e7d733fdab	aco: Use more optimal sequence at the beginning of merged shaders. It can be further optimized in the future, but the new sequence already has a few advantages: * Uses fewer instructions * Uses even fewer instructions in wave32 mode * Doesn't use the VALU at all Totals from affected shaders (GFX10): VGPRS: 43504 -> 43496 (-0.02 %) Code Size: 2436000 -> 2423688 (-0.51 %) bytes Max Waves: 8704 -> 8705 (0.01 %) Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>	2020-03-30 13:09:08 +00:00
Timur Kristóf	17c779ab9e	aco: Skip 2nd read of merged wave info when TCS in/out vertices are equal. When TCS has an equal number of input and output, it means that the number of VS and TCS invocations (LS and HS) are the same; and that the HS invocations operate on the same vertices as the LS. When this is the case, this commit removes the else-if between the merged VS and TCS halves, making it possible to schedule and optimize the code accross the two halves. Totals: SGPRS: 5577367 -> 5581735 (0.08 %) VGPRS: 3958592 -> 3960752 (0.05 %) Code Size: 254867144 -> 254838244 (-0.01 %) bytes Max Waves: 1053887 -> 1053747 (-0.01 %) Totals from affected shaders: SGPRS: 29032 -> 33400 (15.05 %) VGPRS: 35664 -> 37824 (6.06 %) Code Size: 1979028 -> 1950128 (-1.46 %) bytes Max Waves: 7310 -> 7170 (-1.92 %) Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>	2020-03-30 13:09:08 +00:00
Timur Kristóf	4ec48440a0	aco: Allow combining LDS loads when loading tess factors. Previously the tess factors were loaded individually, but now they can be loaded using a single LDS load instruction. Note that the inner and outer tess factors are not yet combined. Totals (GFX10): Code Size: 254896008 -> 254879212 (-0.01 %) bytes Totals from affected shaders (GFX10): Code Size: 2028352 -> 2011556 (-0.83 %) bytes Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>	2020-03-30 13:09:08 +00:00
Timur Kristóf	ace3833293	aco: Allow combining TCS output VMEM stores. Some copypasta may have stuck in the code. This was left on false by mistake. Totals (GFX10): Code Size: 254939248 -> 254896008 (-0.02 %) bytes Totals from affected shaders (GFX10): VGPRS: 16196 -> 16212 (0.10 %) Code Size: 1126332 -> 1083092 (-3.84 %) bytes Max Waves: 2336 -> 2334 (-0.09 %) Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>	2020-03-30 13:09:08 +00:00
Timur Kristóf	e2b1d749b1	aco: Fix handling of tess factors. There is no need to check whether they are written using indirect indices, because all tess factors should be written to VMEM only at the end of the shader. No pipeline db changes. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>	2020-03-30 13:09:08 +00:00
Timur Kristóf	d3f6adcaed	aco: Extract tcs_driver_location_matches_api_mask to separate function. Also clear up should_write_tcs_output_to_lds a little bit. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>	2020-03-30 13:09:08 +00:00
Timur Kristóf	e0dff5fd86	aco: Create null exports in instruction selection instead of assembler. This allows the passes after isel to assume that the exports are always correct, and also allows to schedule these null exports later. Additionally, it ensures that the correct exec mask is used for these exports. Totals from affected shaders (GFX10): SGPRS: 84224 -> 84344 (0.14 %) VGPRS: 23088 -> 23076 (-0.05 %) Code Size: 882892 -> 894368 (1.30 %) bytes Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>	2020-03-30 13:09:08 +00:00
Eric Engestrom	79af30768d	meson: inline `inc_common` Let's make it clear what includes are being added everywhere, so that they can be cleaned up. Signed-off-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4360>	2020-03-28 21:36:54 +01:00
Rhys Perry	43918c9a7f	aco: implement 64-bit VGPR constant copies in handle_operands() 64-bit VGPR constant copies can happen because of 64-bit constant copy propagation. Since this optimization is beneficial and more annoying to deal with in the optimizer, I've implemented 64-bit VGPR constant copies in handle_operands(). This also sets copy_operation::size correctly for 64-bit constant copies. Cc: 20.0 <mesa-stable@lists.freedesktop.org> Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4260> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4260>	2020-03-24 11:28:55 +00:00
Rhys Perry	21ba2bc595	aco: remove dead code in handle_operands() Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4260>	2020-03-24 11:28:55 +00:00
Rhys Perry	17c7f4e30e	aco: fix boolean undef regclass Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4285> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4285>	2020-03-23 19:43:09 +00:00
Rhys Perry	9d56ed199b	aco: emit IR in IF's merge block instead if the other side ends in a jump Fixes NIR such as: if (divergent) { a = sgpr() } else { break; } use(a) Previously we would have emitted: if (divergent) { a = sgpr() } if (!divergent) { break; } use(a) But "a" isn't available at it's use. Now we emit: if (divergent) { } if (!divergent) { break; } a = sgpr() use(a) pipeline-db (Navi): Totals from affected shaders: SGPRS: 1936 -> 1936 (0.00 %) VGPRS: 1264 -> 1264 (0.00 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 159408 -> 159152 (-0.16 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 81 -> 81 (0.00 %) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> CC: <mesa-stable@lists.freedesktop.org> Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2557 Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3658> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3658>	2020-03-23 15:55:12 +00:00
Rhys Perry	8d8c864beb	aco: improve check for unreachable loop continue blocks The old code would have previously caught: loop { ... break } when it was meant to just catch: loop { if (...) break else break } Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3658>	2020-03-23 15:55:12 +00:00
Rhys Perry	46e94fd854	aco: skip NIR in unreachable merge blocks NIR removes most of this but undef instructions for loop header phis can remain. These were harmless because ACO would DCE them itself. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3658>	2020-03-23 15:55:12 +00:00
Rhys Perry	638cbc21a1	aco: handle when ACO adds new continue edges Usually a loop ends with a uniform continue. If it doesn't and we end up adding our own continue edges (because of continue_or_break or divergent breaks at the end), we have to add extra operands to the loop header phis. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3658>	2020-03-23 15:55:12 +00:00
Rhys Perry	f2c4878de9	aco: handle missing second predecessors at merge block phis Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3658>	2020-03-23 15:55:12 +00:00
Rhys Perry	f1a2e1df78	aco: set has_divergent_branch for discards in loops Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3658>	2020-03-23 15:55:12 +00:00
Rhys Perry	2d14a8f237	aco: fix operand order for LS VGPR init bug workaround Fixes: `a952bf3946` ('aco: Fix LS VGPR init bug on affected hardware.') Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4201> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4201>	2020-03-16 19:34:32 +00:00
Rhys Perry	ded7a8bb46	aco: fix instruction encoding for LS VGPR init bug workaround Fixes: `a952bf3946` ('aco: Fix LS VGPR init bug on affected hardware.') Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4201>	2020-03-16 19:34:32 +00:00
Rhys Perry	ee9e0d1eca	aco: set late kill for v_interp_p1_f32 for some APUs Apparently needed for Stoney Ridge, Kabini and Mullins APUs. gfx702 also has 16-bank LDS and https://llvm.org/docs/AMDGPUUsage.html lists some dGPUs under there. Those GPUs seem to be Hawaii actually (gfx701) and we don't seem to have gotten any interpolation related bugs reported with them so far. The late kill flag was tested by running pipeline-db with ACO_DEBUG=validatera while setting late kill for SMEM buffer loads, emit_vop2_instruction() and texture instructions. I also tested with just setting the flag for v_interp_p1_f32. As far as I know, the only other thing we have to consider for 16-bank LDS is something to do with 16-bit interpolation. We don't do that yet. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3914> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3914>	2020-03-16 16:09:02 +00:00
Rhys Perry	1872759f55	aco: add a late kill flag Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3914>	2020-03-16 16:09:02 +00:00
Rhys Perry	c51348bd9b	aco: move some register demand helpers into aco_live_var_analysis.cpp Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3914>	2020-03-16 16:09:02 +00:00
Rhys Perry	625d8705f0	aco: don't stop scheduling at exports This allows us to move v_cvt_pkrtz_f16_f32 instructions upwards, improving schedules and (somewhat unintentionally) moving the exports slightly closer together. Totals from affected shaders: SGPRS: `1030224` -> 1030248 (0.00 %) VGPRS: 794080 -> 794392 (0.04 %) Spilled SGPRs: 127117 -> 127117 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 89028152 -> 89032312 (0.00 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 65252 -> 65219 (-0.05 %) SMEM score: 843808.00 -> 843918.00 (0.01 %) VMEM score: 5331687.00 -> 5397802.00 (1.24 %) SMEM clauses: 567659 -> 567655 (-0.00 %) VMEM clauses: 290715 -> 290716 (0.00 %) Instructions: 17143219 -> 17144259 (0.01 %) Cycles: 1098442808 -> 1098446968 (0.00 %) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3776> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3776>	2020-03-13 14:04:50 +00:00
Rhys Perry	6b4c31f814	aco: allow barriers to be skipped during scheduling Much better scheduling apparently in 160 shaders Totals from affected shaders: SGPRS: 6272 -> 6344 (1.15 %) VGPRS: 4832 -> 4844 (0.25 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 467192 -> 467428 (0.05 %) bytes LDS: 459 -> 459 (0.00 %) blocks Max Waves: 1407 -> 1409 (0.14 %) SMEM score: 9309.00 -> 11216.00 (20.49 %) VMEM score: 26679.00 -> 33652.00 (26.14 %) SMEM clauses: 1817 -> 1776 (-2.26 %) VMEM clauses: 2286 -> 2288 (0.09 %) Instructions: 86537 -> 86596 (0.07 %) Cycles: 676260 -> 676568 (0.05 %) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3776>	2020-03-13 14:04:50 +00:00
Rhys Perry	928ac97875	aco: add helpers for ensuring correct ordering while scheduling Pipeline-db changes in 721 shaders. Totals from affected shaders: SGPRS: 42336 -> 42656 (0.76 %) VGPRS: 38368 -> 38636 (0.70 %) Spilled SGPRs: 11967 -> 11967 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 5268088 -> 5269840 (0.03 %) bytes LDS: 1069 -> 1069 (0.00 %) blocks Max Waves: 4473 -> 4447 (-0.58 %) SMEM score: 41155.00 -> 41826.00 (1.63 %) VMEM score: 146339.00 -> 147471.00 (0.77 %) SMEM clauses: 24434 -> 24535 (0.41 %) VMEM clauses: 16637 -> 16592 (-0.27 %) Instructions: 996037 -> 996388 (0.04 %) Cycles: 76476112 -> 75281416 (-1.56 %) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3776>	2020-03-13 14:04:50 +00:00
Rhys Perry	2cd760847a	aco: add helpers for moving instructions for scheduling No pipeline-db changes Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3776>	2020-03-13 14:04:50 +00:00
Rhys Perry	85d05b3fd7	aco: fix uninitialized data error in waitcnt pass Shouldn't create any incorrect waitcnts but may create suboptimial waitcnts in rare cases. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4133> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4133>	2020-03-12 11:46:56 +00:00
Timur Kristóf	61f2e8d9bb	aco: Don't store TCS outputs to LDS when we're sure that none are read. This allows us not to write an output to LDS, even if it has an indirect offset. No pipeline DB changes. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3964> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3964>	2020-03-11 08:34:11 +00:00
Timur Kristóf	9b36d8c23a	aco: Only write TCS outputs to LDS when they are read by the TCS. Note that tess factors are always read at the end of the shader, so those are still always saved to LDS. Totals from affected shaders: VGPRS: 25244 -> 25164 (-0.32 %) Code Size: 1768268 -> 1690804 (-4.38 %) bytes Max Waves: 4947 -> 4953 (0.12 %) Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3964>	2020-03-11 08:34:11 +00:00
Timur Kristóf	4dcca26945	aco: Store tess factors in VMEM only at the end of the shader. This optimizes out several superfluous stores of the tess factors, especially if the shader wrote those outputs multiple times. Pipeline DB changes on GFX10: Totals from affected shaders: SGPRS: 30384 -> 29536 (-2.79 %) Code Size: 2260720 -> 2214484 (-2.05 %) bytes Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3964>	2020-03-11 08:34:11 +00:00
Timur Kristóf	8c3ab49c6b	aco: Don't generate an if when the first part of a merged HS or GS is empty. In some cases (eg. in a few tessellation CTS tests) the VS part of a merged HS is completely empty. Let's not generate a divergent if in these cases. (LLVM also doesn't do it.) No pipeline DB changes, only affects the CTS. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3964>	2020-03-11 08:34:11 +00:00
Timur Kristóf	cec6a856e5	aco: Enable running TES as ES, including merged TES+GS. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3964>	2020-03-11 08:34:11 +00:00
Timur Kristóf	926bdfae7d	aco: Implement loading TES inputs. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3964>	2020-03-11 08:34:11 +00:00
Timur Kristóf	ec56a7093c	aco: Enable streamout when TES runs on the HW VS stage. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3964>	2020-03-11 08:34:11 +00:00
Timur Kristóf	6047e51430	aco: Store TES outputs when TES runs on the HW VS stage. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3964>	2020-03-11 08:34:11 +00:00
Timur Kristóf	1d9d1cbce9	aco: Use TES output info when TES runs on the VS stage. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3964>	2020-03-11 08:34:11 +00:00

... 7 8 9 10 11 ...

845 commits