intel/fs: Don't include sync.nop in instruction count statistics

With the advent of software scoreboarding, we emit sync instructions
in various places to synchronize the execution pipelines.  This results
in assembly being littered with a bunch of sync.nop instructions.  That
means that when you reorder anything in the program, the scoreboarding
changes, and the number of sync.nops can vary wildly - even if the code
isn't really materially better or worse.  This makes it hard to use
tools like shader-db or fossil-db on post-Icelake platforms.

For now, exclude sync.nops from the instruction count statistic.  One
day we may want to consider improving the software scoreboarding pass
to emit fewer redundant sync.nop instructions, at which point tracking
this as a separate stat might be useful.  For now though, it's simply
cluttering and confusing our results.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27701>
Kenneth Graunke 2024-02-13 00:29:29 -08:00 committed by Marge Bot
parent 83d1241cf5
commit 1497f4e0c2

@@ -1635,7 +1635,7 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width,
 int start_offset = p->next_insn_offset;
-int loop_count = 0, send_count = 0, nop_count = 0;
+int loop_count = 0, send_count = 0, nop_count = 0, sync_nop_count = 0;
 bool is_accum_used = false;
 struct disasm_info *disasm_info = disasm_initialize(p->isa, cfg);
@@ -1800,6 +1800,10 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width,
 case BRW_OPCODE_SYNC:
 assert(src[0].file == BRW_IMMEDIATE_VALUE);
 brw_SYNC(p, tgl_sync_function(src[0].ud));
+if (tgl_sync_function(src[0].ud) == TGL_SYNC_NOP)
+   ++sync_nop_count;
 break;
 case BRW_OPCODE_MOV:
 brw_MOV(p, dst, src[0]);
@@ -2478,7 +2482,8 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width,
 "Promoted %u constants, "
 "compacted %d to %d bytes.\n",
 _mesa_shader_stage_to_abbrev(stage),
-dispatch_width, before_size / 16 - nop_count,
+dispatch_width,
+before_size / 16 - nop_count - sync_nop_count,
 loop_count, perf.latency,
 shader_stats.spill_count,
 shader_stats.fill_count,
@@ -2490,7 +2495,7 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width,
 stats->dispatch_width = dispatch_width;
 stats->max_polygons = max_polygons;
 stats->max_dispatch_width = dispatch_width;
-stats->instructions = before_size / 16 - nop_count;
+stats->instructions = before_size / 16 - nop_count - sync_nop_count;
 stats->sends = send_count;
 stats->loops = loop_count;
 stats->cycles = perf.latency;
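
The accounting in this change can be illustrated with a small standalone sketch. It is not the Mesa code: the `Opcode`, `SyncFunction`, and `Instruction` types and the `reported_instruction_count` helper are hypothetical stand-ins, and the only details taken from the diff are the 16-byte divisor and the subtraction of both `nop_count` and `sync_nop_count`.

```cpp
#include <vector>

// Hypothetical stand-ins for the generator's opcode and sync-function enums.
enum class Opcode { Mov, Add, Nop, Sync };
enum class SyncFunction { None, Nop, Bar };

struct Instruction {
    Opcode opcode;
    SyncFunction sync_fn;  // only meaningful when opcode == Opcode::Sync
};

// The diff divides the pre-compaction size by 16, i.e. it assumes
// 16 bytes per uncompacted instruction.
constexpr int kInsnBytes = 16;

// Returns the instruction count as it would be reported after this change:
// plain NOPs and sync.nop instructions are both excluded, while other
// sync functions (e.g. a barrier sync) still count.
int reported_instruction_count(const std::vector<Instruction>& program) {
    int nop_count = 0;
    int sync_nop_count = 0;

    for (const Instruction& insn : program) {
        if (insn.opcode == Opcode::Nop) {
            ++nop_count;
        } else if (insn.opcode == Opcode::Sync &&
                   insn.sync_fn == SyncFunction::Nop) {
            ++sync_nop_count;
        }
    }

    // Size of the program before compaction, in bytes.
    const int before_size = static_cast<int>(program.size()) * kInsnBytes;
    return before_size / kInsnBytes - nop_count - sync_nop_count;
}
```

In this sketch, a program consisting of a MOV, a sync.nop, an ADD, a NOP, and a barrier sync would report three instructions. Inserting or removing sync.nops leaves that number unchanged, which is the stability that shader-db and fossil-db comparisons need.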