mesa/src/intel/compiler
Francisco Jerez 531a34c7dd intel/brw/xe3+: Select scheduler heuristic with best trade-off between register pressure and latency.
The current register allocation loop attempts to use a sequence of
pre-RA scheduling heuristics until register allocation is successful.
The sequence of scheduling heuristics is expected to be increasingly
aggressive at reducing the register pressure of the program (at a
performance cost), so that the instruction ordering chosen gives the
lowest latency achievable with the register space available.

Unfortunately that approach doesn't consistently give the best
performance on xe3+, since on recent platforms a schedule with higher
latency may actually give better performance if its lower register
pressure allows the use of fewer VRT register blocks, which in turn
lets the EU run more threads in parallel.

This means that on xe3+ the scheduling mode with highest performance
is fundamentally dependent on the specific scenario (in particular
where the program sits on the thread count vs. register use curve, and
how effective the scheduler heuristics are at reducing latency for
each additional block of GRFs used), so it isn't possible to construct
a fixed sequence of the existing heuristics guaranteed to be ordered
by decreasing performance.  To find the scheduling heuristic with the
best performance we have to run several of them prior to register
allocation and do some arithmetic to account for the effect on
parallelism of the register pressure estimated in each case.
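
The arithmetic described above can be illustrated with a minimal sketch.
This is not the code from this commit: the struct, the function names
and the constants (block size, register pool size, thread limit) are
all hypothetical and chosen only to make the trade-off visible.  It
assumes performance is proportional to the number of threads that fit
on an EU divided by the latency of one thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-heuristic estimate (not Mesa's actual data structure).
struct ScheduleEstimate {
    unsigned latency;       // estimated cycles for one thread to finish
    unsigned grf_pressure;  // estimated peak number of live GRFs
};

// Illustrative constants only; real values depend on the platform.
constexpr unsigned kBlockSize  = 32;    // GRFs per allocation block
constexpr unsigned kPoolSize   = 1024;  // GRFs shared by the threads of an EU
constexpr unsigned kMaxThreads = 10;    // upper bound on threads per EU

// Threads that fit on one EU when each needs `grf_pressure` registers,
// rounded up to a whole number of blocks.
unsigned threads_for_pressure(unsigned grf_pressure)
{
    const unsigned blocks =
        std::max(1u, (grf_pressure + kBlockSize - 1) / kBlockSize);
    return std::min(kMaxThreads, kPoolSize / (blocks * kBlockSize));
}

// Index of the estimate with the highest threads / latency ratio.
// Ties keep the earlier entry.
std::size_t pick_best_schedule(const std::vector<ScheduleEstimate> &estimates)
{
    assert(!estimates.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < estimates.size(); i++) {
        const ScheduleEstimate &a = estimates[i];
        const ScheduleEstimate &b = estimates[best];
        // threads_a / latency_a > threads_b / latency_b, cross-multiplied
        // to avoid division.
        const unsigned long long lhs =
            1ull * threads_for_pressure(a.grf_pressure) * b.latency;
        const unsigned long long rhs =
            1ull * threads_for_pressure(b.grf_pressure) * a.latency;
        if (lhs > rhs)
            best = i;
    }
    return best;
}
```

With these made-up numbers, a schedule with latency 110 and pressure 96
(10 threads) beats one with latency 100 and pressure 128 (8 threads),
which is the situation a fixed ordering of heuristics cannot capture.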

This sounds costly, but it is similar to the approach
brw_allocate_registers() already takes when it is unable to allocate
without spills and has to decide which scheduling heuristic minimizes
the number of spills.  In cases where that happens on xe3+, the
scheduling runs introduced here don't add to the scheduling runs done
to find the heuristic with minimum register pressure: we attempt to
determine the heuristic with lowest pressure and the one with best
performance in the same loop, and then use one or the other depending
on whether register allocation succeeds without spills.
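
A minimal sketch of the "same loop" idea follows, under the assumption
that each scheduling run can be summarized by a pressure estimate and a
performance score.  The types, the function names and the try_allocate
callback are hypothetical and do not reflect the actual
brw_allocate_registers() implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical summary of one pre-RA scheduling run.
struct Candidate {
    unsigned pressure;    // estimated register pressure
    double   throughput;  // estimated performance score, higher is better
};

struct Selection {
    std::size_t best_perf;     // candidate with the highest throughput
    std::size_t min_pressure;  // candidate with the lowest pressure
};

// A single pass over the candidates records both winners, so no extra
// scheduling runs are needed for the spill fallback.
Selection select_candidates(const std::vector<Candidate> &c)
{
    assert(!c.empty());
    Selection s{0, 0};
    for (std::size_t i = 1; i < c.size(); i++) {
        if (c[i].throughput > c[s.best_perf].throughput)
            s.best_perf = i;
        if (c[i].pressure < c[s.min_pressure].pressure)
            s.min_pressure = i;
    }
    return s;
}

// try_allocate is a hypothetical callback returning true when register
// allocation succeeds without spills for the given candidate index.
std::size_t choose_schedule(const std::vector<Candidate> &c,
                            const std::function<bool(std::size_t)> &try_allocate)
{
    const Selection s = select_candidates(c);
    return try_allocate(s.best_perf) ? s.best_perf : s.min_pressure;
}
```

The best-performing candidate is tried first, and the lowest-pressure
one is only used when allocation without spills fails.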

Significantly improves performance on PTL of the following Traci test
cases (4 iterations, 5% significance):

Nba2K23-trace-dx11-2160p-ultra:                     4.48% ±0.38%
Fortnite-trace-dx11-2160p-epix:                     1.61% ±0.28%
Superposition-trace-dx11-2160p-extreme:             1.37% ±0.26%
PubG-trace-dx11-1440p-ultra:                        1.15% ±0.29%
GtaV-trace-dx11-2160p-ultra:                        0.80% ±0.24%
CitiesSkylines2-trace-dx11-1440p-high:              0.68% ±0.19%
SpaceEngineers-trace-dx11-2160p-high:               0.65% ±0.34%

The compile time of shader-db increases significantly, by 3.7%, after
this commit (15 iterations, 5% significance); the compile time of
fossil-db doesn't change significantly in my setup.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36618>
2025-09-10 02:15:57 +00:00
..
elk brw: add support for separate tessellation shader compilation 2025-09-05 07:46:17 +00:00
tests intel/compiler tests: fix path-to-string conversion 2025-06-23 08:26:29 +00:00
brw_analysis.cpp intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility 2025-07-31 20:23:02 +00:00
brw_analysis.h intel/brw/xe3+: Model trade-off between parallelism and GRF use in performance analysis. 2025-09-10 02:15:56 +00:00
brw_analysis_def.cpp brw: consider LOAD_PAYLOAD fully defined 2025-07-30 07:57:19 +00:00
brw_analysis_liveness.cpp intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility 2025-07-31 20:23:02 +00:00
brw_analysis_performance.cpp intel/brw: Allow using performance analysis pass pre-register allocation. 2025-09-10 02:15:57 +00:00
brw_asm.c brw: Add FILE * parameter to dump_assembly 2025-09-09 10:40:42 -07:00
brw_asm.h brw: Fix size in assembler when compacting 2025-03-03 20:43:56 +00:00
brw_asm_internal.h brw: Rework label tracking in assembler 2025-03-06 17:06:20 -08:00
brw_asm_tool.c intel/compiler tests: fix variable type for getopt_long() return value 2025-06-23 08:26:29 +00:00
brw_builder.cpp brw: Add brw_builder::uniform() 2025-04-04 23:07:21 +00:00
brw_builder.h brw: fix broadcast opcode 2025-08-28 00:23:44 +03:00
brw_cfg.cpp intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility 2025-07-31 20:23:02 +00:00
brw_cfg.h intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility 2025-07-31 20:23:02 +00:00
brw_compile_bs.cpp intel/brw: Take shader in the brw_generator::generate_code() parameters 2025-08-28 00:06:20 +00:00
brw_compile_cs.cpp intel/brw: Take shader in the brw_generator::generate_code() parameters 2025-08-28 00:06:20 +00:00
brw_compile_fs.cpp intel/brw: Take shader in the brw_generator::generate_code() parameters 2025-08-28 00:06:20 +00:00
brw_compile_gs.cpp anv/brw/iris: move VS VUE computation to backend 2025-09-05 07:46:16 +00:00
brw_compile_mesh.cpp intel/brw: Take shader in the brw_generator::generate_code() parameters 2025-08-28 00:06:20 +00:00
brw_compile_tcs.cpp brw: add support for separate tessellation shader compilation 2025-09-05 07:46:17 +00:00
brw_compile_tes.cpp brw: add support for separate tessellation shader compilation 2025-09-05 07:46:17 +00:00
brw_compile_vs.cpp anv/brw/iris: move VS VUE computation to backend 2025-09-05 07:46:16 +00:00
brw_compiler.c all: rename gl_shader_stage to mesa_shader_stage 2025-08-06 10:28:40 +08:00
brw_compiler.h brw: add support for separate tessellation shader compilation 2025-09-05 07:46:17 +00:00
brw_debug_recompile.c all: rename gl_shader_stage to mesa_shader_stage 2025-08-06 10:28:40 +08:00
brw_device_sha1_gen_c.py
brw_disasm.c intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility 2025-07-31 20:23:02 +00:00
brw_disasm.h intel/brw: support for dumping shader line numbers 2025-04-08 19:39:53 +00:00
brw_disasm_info.cpp brw: Add FILE * parameter to dump_assembly 2025-09-09 10:40:42 -07:00
brw_disasm_info.h brw: Add FILE * parameter to dump_assembly 2025-09-09 10:40:42 -07:00
brw_disasm_tool.c
brw_eu.c build: avoid redefining unreachable() which is standard in C23 2025-07-31 17:49:42 +00:00
brw_eu.h build: avoid redefining unreachable() which is standard in C23 2025-07-31 17:49:42 +00:00
brw_eu_compact.c brw: Avoid invalid access when compacting out-of-bounds JIP/UIP 2025-08-20 00:54:41 +00:00
brw_eu_defines.h Revert "brw: move texture offset packing to NIR" 2025-08-29 06:29:14 +00:00
brw_eu_emit.c build: avoid redefining unreachable() which is standard in C23 2025-07-31 17:49:42 +00:00
brw_eu_inst.h brw: Add BRW_TYPE_BF for bfloat16 2025-03-25 05:23:37 +00:00
brw_eu_validate.c build: avoid redefining unreachable() which is standard in C23 2025-07-31 17:49:42 +00:00
brw_from_nir.cpp brw: add support for separate tessellation shader compilation 2025-09-05 07:46:17 +00:00
brw_generator.cpp brw: Add FILE * parameter to dump_assembly 2025-09-09 10:40:42 -07:00
brw_generator.h intel/brw: Take shader in the brw_generator::generate_code() parameters 2025-08-28 00:06:20 +00:00
brw_gram.y brw: Add EU assembler support for bfloat16 2025-03-25 05:23:37 +00:00
brw_inst.cpp brw: fix broadcast opcode 2025-08-28 00:23:44 +03:00
brw_inst.h brw: workaround broken indirect RT messages on Gfx11 2025-08-20 15:01:50 +00:00
brw_isa_info.h
brw_kernel.c intel: Update all NIR_PASS_V to NIR_PASS 2025-07-14 19:25:52 +00:00
brw_kernel.h intel: rework CL pre-compile 2025-01-25 03:28:07 +00:00
brw_lex.l brw: Add EU assembler support for bfloat16 2025-03-25 05:23:37 +00:00
brw_list.h intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility 2025-07-31 20:23:02 +00:00
brw_load_reg.cpp brw: Add and use brw_reg_is_arf to test for a specific ARF 2025-07-24 23:08:07 +00:00
brw_lower.cpp brw: Enumerate SHADER_OPCODE_SEND sources and standardize how many 2025-08-08 22:12:08 +00:00
brw_lower_dpas.cpp brw: Simplify brw_builder "insert before inst" constructor 2025-03-06 23:33:38 +00:00
brw_lower_integer_multiplication.cpp brw: Remove bblock_t parameters from various passes 2025-03-06 23:33:38 +00:00
brw_lower_logical_sends.cpp Revert "brw: move texture offset packing to NIR" 2025-08-29 06:29:14 +00:00
brw_lower_pack.cpp build: avoid redefining unreachable() which is standard in C23 2025-07-31 17:49:42 +00:00
brw_lower_regioning.cpp brw: Rename is_send_from_grf to is_send, replace other is_send() helper 2025-08-08 22:12:05 +00:00
brw_lower_scoreboard.cpp brw: Rename is_send_from_grf to is_send, replace other is_send() helper 2025-08-08 22:12:05 +00:00
brw_lower_simd_width.cpp brw: Use a builder to track position in lower_simd 2025-07-19 17:49:48 +00:00
brw_lower_subgroup_ops.cpp brw: Strategically place flags initialization to help cmod prop 2025-08-28 22:08:20 +00:00
brw_nir.c brw: add support for separate tessellation shader compilation 2025-09-05 07:46:17 +00:00
brw_nir.h brw: add support for separate tessellation shader compilation 2025-09-05 07:46:17 +00:00
brw_nir_analyze_ubo_ranges.c
brw_nir_lower_alpha_to_coverage.c nir: rename nir_lower_io_to_temporaries -> nir_lower_io_vars_to_temporaries 2025-06-26 18:20:54 +00:00
brw_nir_lower_cooperative_matrix.c build: avoid redefining unreachable() which is standard in C23 2025-07-31 17:49:42 +00:00
brw_nir_lower_cs_intrinsics.c all: rename gl_shader_stage_uses_workgroup to mesa_shader_stage_uses_workgroup 2025-08-06 10:28:41 +08:00
brw_nir_lower_fs_barycentrics.c treewide: simplify nir_def_rewrite_uses_after 2025-08-01 15:34:24 +00:00
brw_nir_lower_fsign.py
brw_nir_lower_immediate_offsets.c treewide: use nir_def_as_* 2025-08-01 15:34:24 +00:00
brw_nir_lower_intersection_shader.c nir: make nir_block::predecessors & dom_frontier sets non-malloc'd 2025-08-21 06:13:48 +00:00
brw_nir_lower_ray_queries.c intel/compiler: Fix ray geometry index 2025-08-19 09:32:55 +00:00
brw_nir_lower_rt_intrinsics.c intel/compiler: Fix ray geometry index 2025-08-19 09:32:55 +00:00
brw_nir_lower_rt_intrinsics_pre_trace.c nir: Add a faster lowest common ancestor algorithm 2025-09-08 23:03:13 +00:00
brw_nir_lower_sample_index_in_coord.c intel/compiler: Lower sample index into coord for MSRT messages 2025-03-07 23:06:14 +00:00
brw_nir_lower_shader_calls.c nir: make nir_block::predecessors & dom_frontier sets non-malloc'd 2025-08-21 06:13:48 +00:00
brw_nir_lower_storage_image.c build: avoid redefining unreachable() which is standard in C23 2025-07-31 17:49:42 +00:00
brw_nir_lower_texel_address.c build: avoid redefining unreachable() which is standard in C23 2025-07-31 17:49:42 +00:00
brw_nir_lower_texture.c Revert "brw: move texture offset packing to NIR" 2025-08-29 06:29:14 +00:00
brw_nir_opt_fsat.c nir: convert nir_instr_worklist to init/fini semantics w/out allocation 2025-08-21 06:13:49 +00:00
brw_nir_rt.c all: rename gl_shader_stage to mesa_shader_stage 2025-08-06 10:28:40 +08:00
brw_nir_rt.h intel: Update all NIR_PASS_V to NIR_PASS 2025-07-14 19:25:52 +00:00
brw_nir_rt_builder.h intel/rt: Update BVH instance leaf load for Xe3+ 2025-04-21 20:10:45 +00:00
brw_nir_trig_workarounds.py
brw_nir_wa_18019110168.c treewide: use nir_def_as_* 2025-08-01 15:34:24 +00:00
brw_opt.cpp brw: Do cmod prop again after brw_lower_subgroup_ops 2025-08-28 22:08:20 +00:00
brw_opt_address_reg_load.cpp brw: Fix checking sources of wrong instruction in opt_address_reg_load 2025-08-27 22:50:23 +00:00
brw_opt_algebraic.cpp brw: Fix folding case for MAD instruction with all immediates 2025-08-21 17:19:18 +00:00
brw_opt_bank_conflicts.cpp util: crib SWAP macro from freedreno 2025-07-21 11:42:18 +00:00
brw_opt_cmod_propagation.cpp build: avoid redefining unreachable() which is standard in C23 2025-07-31 17:49:42 +00:00
brw_opt_combine_constants.cpp intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility 2025-07-31 20:23:02 +00:00
brw_opt_copy_propagation.cpp brw: Enumerate SHADER_OPCODE_SEND sources and standardize how many 2025-08-08 22:12:08 +00:00
brw_opt_cse.cpp brw: Stop using is_send_from_grf() in CSE pass 2025-08-08 22:12:05 +00:00
brw_opt_dead_code_eliminate.cpp intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility 2025-07-31 20:23:02 +00:00
brw_opt_register_coalesce.cpp brw: enable opt_register_coalesce to work with multiple EOT blocks 2025-08-20 15:01:50 +00:00
brw_opt_saturate_propagation.cpp brw: Clean up saturate propagation after non-defs version removal 2025-04-09 19:06:48 +00:00
brw_opt_txf_combiner.cpp brw: Add more specific brw_builder helpers 2025-07-19 17:49:47 +00:00
brw_opt_virtual_grfs.cpp brw: Don't assert about MAX_VGRF_SIZE in brw_opt_split_virtual_grfs() 2025-04-11 20:34:51 +00:00
brw_packed_float.c
brw_prim.h
brw_print.cpp intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility 2025-07-31 20:23:02 +00:00
brw_private.h intel/debug: shader dump filter 2025-05-23 19:57:02 +00:00
brw_reg.cpp build: avoid redefining unreachable() which is standard in C23 2025-07-31 17:49:42 +00:00
brw_reg.h build: avoid redefining unreachable() which is standard in C23 2025-07-31 17:49:42 +00:00
brw_reg_allocate.cpp intel/brw/xe3+: Override P value of GRF register classes to increase thread parallelism. 2025-09-10 02:15:55 +00:00
brw_reg_type.c brw: Add BRW_TYPE_BF for bfloat16 2025-03-25 05:23:37 +00:00
brw_reg_type.h brw: Add BRW_TYPE_BF for bfloat16 2025-03-25 05:23:37 +00:00
brw_rt.h
brw_schedule_instructions.cpp intel/brw/xe3+: Define BRW_SCHEDULE_PRE_LATENCY scheduling mode. 2025-09-10 02:15:55 +00:00
brw_shader.cpp intel/brw/xe3+: Select scheduler heuristic with best trade-off between register pressure and latency. 2025-09-10 02:15:57 +00:00
brw_shader.h intel/brw/xe3+: Define BRW_SCHEDULE_PRE_LATENCY scheduling mode. 2025-09-10 02:15:55 +00:00
brw_simd_selection.cpp build: avoid redefining unreachable() which is standard in C23 2025-07-31 17:49:42 +00:00
brw_spirv.c nir: add nir_vectorize_cb callback parameter to nir_lower_phis_to_scalar() 2025-07-08 15:33:59 +00:00
brw_thread_payload.cpp all: rename gl_shader_stage_is_compute to mesa_shader_stage_is_compute 2025-08-06 10:28:41 +08:00
brw_thread_payload.h intel/brw: Rename fs_visitor to brw_shader 2025-02-11 09:13:28 +00:00
brw_validate.cpp brw: Run validation as soon as we have the CFG around 2025-09-03 20:42:05 +00:00
brw_vue_map.c brw: add support for separate tessellation shader compilation 2025-09-05 07:46:17 +00:00
brw_workaround.cpp brw: Rename is_send_from_grf to is_send, replace other is_send() helper 2025-08-08 22:12:05 +00:00
intel_gfx_ver_enum.h build: avoid redefining unreachable() which is standard in C23 2025-07-31 17:49:42 +00:00
intel_nir.c intel/compiler: Use nir_split_conversions() 2025-04-07 17:45:21 -05:00
intel_nir.h brw: add support for separate tessellation shader compilation 2025-09-05 07:46:17 +00:00
intel_nir_blockify_uniform_loads.c treewide: simplify nir_def_rewrite_uses_after 2025-08-01 15:34:24 +00:00
intel_nir_clamp_image_1d_2d_array_sizes.c treewide: simplify nir_def_rewrite_uses_after 2025-08-01 15:34:24 +00:00
intel_nir_clamp_per_vertex_loads.c brw: add support for separate tessellation shader compilation 2025-09-05 07:46:17 +00:00
intel_nir_lower_non_uniform_barycentric_at_sample.c treewide: use nir_def_as_* 2025-08-01 15:34:24 +00:00
intel_nir_lower_non_uniform_resource_intel.c
intel_nir_lower_printf.c nir: drop printf_base_identifier 2025-02-05 20:33:15 +00:00
intel_nir_lower_shading_rate_output.c treewide: simplify nir_def_rewrite_uses_after 2025-08-01 15:34:24 +00:00
intel_nir_lower_sparse.c treewide: simplify nir_def_rewrite_uses_after 2025-08-01 15:34:24 +00:00
intel_nir_opt_peephole_ffma.c treewide: use nir_def_as_* 2025-08-01 15:34:24 +00:00
intel_nir_opt_peephole_imul32x16.c
intel_nir_tcs_workarounds.c nir: make nir_block::predecessors & dom_frontier sets non-malloc'd 2025-08-21 06:13:48 +00:00
intel_shader_enums.h brw: add support for separate tessellation shader compilation 2025-09-05 07:46:17 +00:00
meson.build brw: replace lower_fs_msaa with nir_inline_sysval 2025-08-03 21:27:47 +00:00
test_eu_compact.cpp build: avoid redefining unreachable() which is standard in C23 2025-07-31 17:49:42 +00:00
test_eu_validate.cpp brw: Add FILE * parameter to dump_assembly 2025-09-09 10:40:42 -07:00
test_helpers.cpp brw: Simplify the test code for brw passes 2025-03-13 17:43:17 +00:00
test_helpers.h brw: Add brw_shader_params 2025-08-28 00:06:18 +00:00
test_insert_load_reg.cpp brw: Add passes to generate and lower load_reg 2025-04-04 06:45:02 +00:00
test_lower_scoreboard.cpp brw: Enumerate SHADER_OPCODE_SEND sources and standardize how many 2025-08-08 22:12:08 +00:00
test_opt_algebraic.cpp brw: Fix folding case for MAD instruction with all immediates 2025-08-21 17:19:18 +00:00
test_opt_cmod_propagation.cpp brw/cmod: Don't propagate from CMP to possible Inf + (-Inf) 2025-04-28 19:44:23 +00:00
test_opt_combine_constants.cpp brw: Add brw_builder::uniform() 2025-04-04 23:07:21 +00:00
test_opt_copy_propagation.cpp brw: Simplify the test code for brw passes 2025-03-13 17:43:17 +00:00
test_opt_cse.cpp brw: Simplify the test code for brw passes 2025-03-13 17:43:17 +00:00
test_opt_register_coalesce.cpp brw: don't generate invalid instructions 2025-06-04 06:08:26 +00:00
test_opt_saturate_propagation.cpp brw/sat: Eliminate non-defs saturate propagation 2025-04-04 06:45:02 +00:00
test_simd_selection.cpp intel: Switch uint64_t intel_debug to a bitset 2025-04-22 23:09:26 +00:00
test_vf_float_conversions.cpp