From 7994534fe9ecd5352fc9661297facbecf736a6ec Mon Sep 17 00:00:00 2001 From: Ian Romanick Date: Wed, 6 Nov 2024 12:07:15 -0800 Subject: [PATCH] brw/cse: Don't eliminate instructions that write flags MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit With other changes in my tree, I observed this code from dEQP-VK.subgroups.vote.compute.subgroupallequal_float have the second cmp.z removed. undef(8) %69:UD cmp.z.f0.0(8) %69:F, %37:F, %57+0.0<0>:F mov(1) v58+0.0:D, 0d NoMask group0 (+f0.0) mov(1) v58+0.0:D, -1d NoMask group0 cmp.nz.f0.0(8) null:D, v58+0.0<0>:D, 0d ... undef(8) %72:UD cmp.z.f0.0(8) %72:F, %37:F, %57+0.0<0>:F mov(1) v63+0.0:D, 0d NoMask group0 (+f0.0) mov(1) v63+0.0:D, -1d NoMask group0 This was also fixed by running dead-code elimination before CSE. That seems more like avoiding the problem than fixing it, though. I believe this affects shader-db results because leaving the second CMP in the shader can give more opportunities for cmod propagation. Reviewed-by: Kenneth Graunke Fixes: 234c45c929e ("intel/brw: Write a new global CSE pass that works on defs") shader-db: All Intel platforms had similar results. (Lunar Lake shown) total cycles in shared programs: 922097690 -> 922260862 (0.02%) cycles in affected programs: 3178926 -> 3342098 (5.13%) helped: 130 HURT: 88 helped stats (abs) min: 2 max: 2194 x̄: 296.71 x̃: 16 helped stats (rel) min: <.01% max: 16.56% x̄: 1.86% x̃: 0.18% HURT stats (abs) min: 4 max: 11992 x̄: 2292.55 x̃: 47 HURT stats (rel) min: 0.04% max: 57.32% x̄: 11.82% x̃: 0.61% 95% mean confidence interval for cycles value: 320.36 1176.63 95% mean confidence interval for cycles %-change: 1.59% 5.73% Cycles are HURT. LOST: 2 GAINED: 1 fossil-db: Lunar Lake, Meteor Lake, Tiger Lake had similar results. (Lunar Lake shown) Totals: Instrs: 142022960 -> 142022928 (-0.00%); split: -0.00%, +0.00% Cycle count: 21995242782 -> 21995384040 (+0.00%); split: -0.00%, +0.00% Max live registers: 48013385 -> 48013343 (-0.00%) Totals from 507 (0.09% of 551441) affected shaders: Instrs: 886191 -> 886159 (-0.00%); split: -0.01%, +0.01% Cycle count: 69302492 -> 69443750 (+0.20%); split: -0.66%, +0.86% Max live registers: 94413 -> 94371 (-0.04%) DG2 Totals: Instrs: 152856370 -> 152856093 (-0.00%); split: -0.00%, +0.00% Cycle count: 17237159885 -> 17236804052 (-0.00%); split: -0.00%, +0.00% Fill count: 150673 -> 150631 (-0.03%) Max live registers: 31871520 -> 31871476 (-0.00%) Totals from 506 (0.08% of 633197) affected shaders: Instrs: 831795 -> 831518 (-0.03%); split: -0.04%, +0.01% Cycle count: 55578509 -> 55222676 (-0.64%); split: -1.38%, +0.74% Fill count: 2779 -> 2737 (-1.51%) Max live registers: 51383 -> 51339 (-0.09%) Ice Lake and Skylake had similar results. (Ice Lake shown) Totals: Instrs: 152017826 -> 152017793 (-0.00%); split: -0.00%, +0.00% Cycle count: 15180773451 -> 15180761166 (-0.00%); split: -0.00%, +0.00% Fill count: 106610 -> 106614 (+0.00%) Max live registers: 32195006 -> 32194966 (-0.00%) Totals from 411 (0.06% of 637268) affected shaders: Instrs: 705935 -> 705902 (-0.00%); split: -0.01%, +0.01% Cycle count: 47830019 -> 47817734 (-0.03%); split: -0.05%, +0.02% Fill count: 2865 -> 2869 (+0.14%) Max live registers: 42883 -> 42843 (-0.09%) (cherry picked from commit 9aba731d0354d456a8aab53e3a1ceef94a0d7f99) Part-of: --- .pick_status.json | 2 +- src/intel/compiler/brw_fs_cse.cpp | 13 +++++++++++++ 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/.pick_status.json b/.pick_status.json index b19eba3982d..8ebf8aa9c85 100644 --- a/.pick_status.json +++ b/.pick_status.json @@ -1274,7 +1274,7 @@ "description": "brw/cse: Don't eliminate instructions that write flags", "nominated": true, "nomination_type": 2, - "resolution": 0, + "resolution": 1, "main_sha": null, "because_sha": "234c45c929e0341f1d0c2a51a587c4ce7e4bba52", "notes": null diff --git a/src/intel/compiler/brw_fs_cse.cpp b/src/intel/compiler/brw_fs_cse.cpp index c5838e38919..e1c0ab6b0af 100644 --- a/src/intel/compiler/brw_fs_cse.cpp +++ b/src/intel/compiler/brw_fs_cse.cpp @@ -475,6 +475,19 @@ brw_fs_opt_cse_defs(fs_visitor &s) assert(ops_must_match); } + /* Some later instruction could depend on the flags written by + * this instruction. It can only be removed if the previous + * instruction that write the flags is identical. + */ + if (inst->flags_written(devinfo)) { + bool ignored; + + if (last_flag_write == NULL || + !instructions_match(last_flag_write, inst, &ignored)) { + continue; + } + } + progress = true; need_remaps = true; remap_table[inst->dst.nr] =