mesa/src/intel/compiler at 531a34c7ddf216749ac7245604280f002fbec104 - fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-21 13:18:09 +02:00

Francisco Jerez 531a34c7dd intel/brw/xe3+: Select scheduler heuristic with best trade-off between register pressure and latency.

The current register allocation loop attempts to use a sequence of pre-RA scheduling heuristics until register allocation is successful. The sequence of scheduling heuristics is expected to be increasingly aggressive at reducing the register pressure of the program (at a performance cost), so that the instruction ordering chosen gives the lowest latency achievable with the register space available.

Unfortunately that approach doesn't consistently give the best performance on xe3+, since on recent platforms a schedule with higher latency may actually give better performance if its lower register pressure allows the use of a lower number of VRT register blocks which allows the EU to run more threads in parallel. This means that on xe3+ the scheduling mode with highest performance is fundamentally dependent on the specific scenario (in particular where in the thread count-register use curve the program is at, and how effective the scheduler heuristics are at reducing latency for each additional block of GRFs used), so it isn't possible to construct a fixed sequence of the existing heuristics guaranteed to be ordered by decreasing performance.

In order to find the scheduling heuristic with better performance we have to run multiple of them prior to register allocation and do some arithmetic to account for the effect on parallelism of the register pressure estimated in each case, in order to decide which heuristic will give the best performance.

This sounds costly but it is similar to the approach taken by brw_allocate_registers() when unable to allocate without spills in order to decide which scheduling heuristic to use in order to minimize the number of spills. In cases where that happens on xe3+ the scheduling runs introduced here don't add to the scheduling runs done to find the heuristic with minimum register pressure, we attempt to determine the heuristic with lowest pressure and best performance in the same loop, and then use one or the other depending on whether register allocation succeeds without spills.

Significantly improves performance on PTL of the following Traci test cases (4 iterations, 5% significance):

  Nba2K23-trace-dx11-2160p-ultra:           4.48% ±0.38%
  Fortnite-trace-dx11-2160p-epix:           1.61% ±0.28%
  Superposition-trace-dx11-2160p-extreme:   1.37% ±0.26%
  PubG-trace-dx11-1440p-ultra:              1.15% ±0.29%
  GtaV-trace-dx11-2160p-ultra:              0.80% ±0.24%
  CitiesSkylines2-trace-dx11-1440p-high:    0.68% ±0.19%
  SpaceEngineers-trace-dx11-2160p-high:     0.65% ±0.34%

The compile-time cost of shader-db increases significantly by 3.7% after this commit (15 iterations, 5% significance), the compile-time of fossil-db doesn't change significantly in my setup.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36618>		2025-09-10 02:15:57 +00:00
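The selection described in the commit message can be illustrated with a small standalone sketch. This is not the code from brw_shader.cpp: the types `Candidate` and `Choice`, the functions `threads_for_pressure()` and `pick_heuristic()`, and every constant (block size, register budget, thread cap, GRF limit) are hypothetical placeholders chosen only to make the arithmetic visible. The sketch assumes each trial scheduling run yields an estimated register pressure and an estimated single-thread latency, and that throughput is roughly proportional to threads in flight divided by latency.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical result of one trial scheduling run (not a Mesa type).
struct Candidate {
    std::string name;       // label of the scheduling heuristic
    unsigned grf_pressure;  // estimated peak register pressure, in GRFs
    double latency;         // estimated latency of one thread, in cycles
};

// Placeholder constants, NOT hardware data: registers are assumed to be
// handed out in blocks, and the number of threads that fit is assumed to be
// bounded by a fixed per-EU register budget and a fixed thread cap.
constexpr unsigned kBlock = 32;
constexpr unsigned kBudget = 1024;
constexpr unsigned kMaxThreads = 10;
constexpr unsigned kMaxGrfs = 256;  // above this the program would spill

// Threads that could run in parallel for a given register pressure.
unsigned threads_for_pressure(unsigned grfs)
{
    // Round the pressure up to a whole number of blocks (at least one).
    unsigned alloc = std::max(kBlock, (grfs + kBlock - 1) / kBlock * kBlock);
    return std::min(kMaxThreads, kBudget / alloc);
}

// Indices into the candidate list; -1 means "none found".
struct Choice {
    int best_perf;     // highest estimated throughput without spilling
    int min_pressure;  // lowest register pressure, the fallback for spills
};

// Track both winners in a single pass over the trial runs, so the search
// for the fastest heuristic adds no runs on top of the search for the
// heuristic with minimum pressure.
Choice pick_heuristic(const std::vector<Candidate> &cands)
{
    Choice r{-1, -1};
    double best = 0.0;
    for (size_t i = 0; i < cands.size(); i++) {
        const Candidate &c = cands[i];
        assert(c.latency > 0.0);
        if (r.min_pressure < 0 ||
            c.grf_pressure < cands[r.min_pressure].grf_pressure)
            r.min_pressure = static_cast<int>(i);
        if (c.grf_pressure > kMaxGrfs)
            continue;  // would not allocate without spills
        // Throughput estimate: threads in flight divided by latency.
        double perf = threads_for_pressure(c.grf_pressure) / c.latency;
        if (perf > best) {
            best = perf;
            r.best_perf = static_cast<int>(i);
        }
    }
    return r;
}
```

With made-up inputs such as {150 GRFs, 1000 cycles}, {120 GRFs, 1100 cycles} and {90 GRFs, 1500 cycles}, the second candidate wins under this model even though it does not have the lowest latency, because its lower pressure lets more threads run in parallel. The third candidate is remembered as the minimum-pressure fallback, to be used if register allocation fails without spills.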
..
elk	brw: add support for separate tessellation shader compilation	2025-09-05 07:46:17 +00:00
tests	intel/compiler tests: fix path-to-string conversion	2025-06-23 08:26:29 +00:00
brw_analysis.cpp	intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility	2025-07-31 20:23:02 +00:00
brw_analysis.h	intel/brw/xe3+: Model trade-off between parallelism and GRF use in performance analysis.	2025-09-10 02:15:56 +00:00
brw_analysis_def.cpp	brw: consider LOAD_PAYLOAD fully defined	2025-07-30 07:57:19 +00:00
brw_analysis_liveness.cpp	intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility	2025-07-31 20:23:02 +00:00
brw_analysis_performance.cpp	intel/brw: Allow using performance analysis pass pre-register allocation.	2025-09-10 02:15:57 +00:00
brw_asm.c	brw: Add `FILE *` parameter to dump_assembly	2025-09-09 10:40:42 -07:00
brw_asm.h	brw: Fix size in assembler when compacting	2025-03-03 20:43:56 +00:00
brw_asm_internal.h	brw: Rework label tracking in assembler	2025-03-06 17:06:20 -08:00
brw_asm_tool.c	intel/compiler tests: fix variable type for getopt_long() return value	2025-06-23 08:26:29 +00:00
brw_builder.cpp	brw: Add brw_builder::uniform()	2025-04-04 23:07:21 +00:00
brw_builder.h	brw: fix broadcast opcode	2025-08-28 00:23:44 +03:00
brw_cfg.cpp	intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility	2025-07-31 20:23:02 +00:00
brw_cfg.h	intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility	2025-07-31 20:23:02 +00:00
brw_compile_bs.cpp	intel/brw: Take shader in the brw_generator::generate_code() parameters	2025-08-28 00:06:20 +00:00
brw_compile_cs.cpp	intel/brw: Take shader in the brw_generator::generate_code() parameters	2025-08-28 00:06:20 +00:00
brw_compile_fs.cpp	intel/brw: Take shader in the brw_generator::generate_code() parameters	2025-08-28 00:06:20 +00:00
brw_compile_gs.cpp	anv/brw/iris: move VS VUE computation to backend	2025-09-05 07:46:16 +00:00
brw_compile_mesh.cpp	intel/brw: Take shader in the brw_generator::generate_code() parameters	2025-08-28 00:06:20 +00:00
brw_compile_tcs.cpp	brw: add support for separate tessellation shader compilation	2025-09-05 07:46:17 +00:00
brw_compile_tes.cpp	brw: add support for separate tessellation shader compilation	2025-09-05 07:46:17 +00:00
brw_compile_vs.cpp	anv/brw/iris: move VS VUE computation to backend	2025-09-05 07:46:16 +00:00
brw_compiler.c	all: rename gl_shader_stage to mesa_shader_stage	2025-08-06 10:28:40 +08:00
brw_compiler.h	brw: add support for separate tessellation shader compilation	2025-09-05 07:46:17 +00:00
brw_debug_recompile.c	all: rename gl_shader_stage to mesa_shader_stage	2025-08-06 10:28:40 +08:00
brw_device_sha1_gen_c.py
brw_disasm.c	intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility	2025-07-31 20:23:02 +00:00
brw_disasm.h	intel/brw: support for dumping shader line numbers	2025-04-08 19:39:53 +00:00
brw_disasm_info.cpp	brw: Add `FILE *` parameter to dump_assembly	2025-09-09 10:40:42 -07:00
brw_disasm_info.h	brw: Add `FILE *` parameter to dump_assembly	2025-09-09 10:40:42 -07:00
brw_disasm_tool.c
brw_eu.c	build: avoid redefining unreachable() which is standard in C23	2025-07-31 17:49:42 +00:00
brw_eu.h	build: avoid redefining unreachable() which is standard in C23	2025-07-31 17:49:42 +00:00
brw_eu_compact.c	brw: Avoid invalid access when compacting out-of-bounds JIP/UIP	2025-08-20 00:54:41 +00:00
brw_eu_defines.h	Revert "brw: move texture offset packing to NIR"	2025-08-29 06:29:14 +00:00
brw_eu_emit.c	build: avoid redefining unreachable() which is standard in C23	2025-07-31 17:49:42 +00:00
brw_eu_inst.h	brw: Add BRW_TYPE_BF for bfloat16	2025-03-25 05:23:37 +00:00
brw_eu_validate.c	build: avoid redefining unreachable() which is standard in C23	2025-07-31 17:49:42 +00:00
brw_from_nir.cpp	brw: add support for separate tessellation shader compilation	2025-09-05 07:46:17 +00:00
brw_generator.cpp	brw: Add `FILE *` parameter to dump_assembly	2025-09-09 10:40:42 -07:00
brw_generator.h	intel/brw: Take shader in the brw_generator::generate_code() parameters	2025-08-28 00:06:20 +00:00
brw_gram.y	brw: Add EU assembler support for bfloat16	2025-03-25 05:23:37 +00:00
brw_inst.cpp	brw: fix broadcast opcode	2025-08-28 00:23:44 +03:00
brw_inst.h	brw: workaround broken indirect RT messages on Gfx11	2025-08-20 15:01:50 +00:00
brw_isa_info.h
brw_kernel.c	intel: Update all NIR_PASS_V to NIR_PASS	2025-07-14 19:25:52 +00:00
brw_kernel.h	intel: rework CL pre-compile	2025-01-25 03:28:07 +00:00
brw_lex.l	brw: Add EU assembler support for bfloat16	2025-03-25 05:23:37 +00:00
brw_list.h	intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility	2025-07-31 20:23:02 +00:00
brw_load_reg.cpp	brw: Add and use brw_reg_is_arf to test for a specific ARF	2025-07-24 23:08:07 +00:00
brw_lower.cpp	brw: Enumerate SHADER_OPCODE_SEND sources and standardize how many	2025-08-08 22:12:08 +00:00
brw_lower_dpas.cpp	brw: Simplify brw_builder "insert before inst" constructor	2025-03-06 23:33:38 +00:00
brw_lower_integer_multiplication.cpp	brw: Remove bblock_t parameters from various passes	2025-03-06 23:33:38 +00:00
brw_lower_logical_sends.cpp	Revert "brw: move texture offset packing to NIR"	2025-08-29 06:29:14 +00:00
brw_lower_pack.cpp	build: avoid redefining unreachable() which is standard in C23	2025-07-31 17:49:42 +00:00
brw_lower_regioning.cpp	brw: Rename is_send_from_grf to is_send, replace other is_send() helper	2025-08-08 22:12:05 +00:00
brw_lower_scoreboard.cpp	brw: Rename is_send_from_grf to is_send, replace other is_send() helper	2025-08-08 22:12:05 +00:00
brw_lower_simd_width.cpp	brw: Use a builder to track position in lower_simd	2025-07-19 17:49:48 +00:00
brw_lower_subgroup_ops.cpp	brw: Strategically place flags initialization to help cmod prop	2025-08-28 22:08:20 +00:00
brw_nir.c	brw: add support for separate tessellation shader compilation	2025-09-05 07:46:17 +00:00
brw_nir.h	brw: add support for separate tessellation shader compilation	2025-09-05 07:46:17 +00:00
brw_nir_analyze_ubo_ranges.c
brw_nir_lower_alpha_to_coverage.c	nir: rename nir_lower_io_to_temporaries -> nir_lower_io_vars_to_temporaries	2025-06-26 18:20:54 +00:00
brw_nir_lower_cooperative_matrix.c	build: avoid redefining unreachable() which is standard in C23	2025-07-31 17:49:42 +00:00
brw_nir_lower_cs_intrinsics.c	all: rename gl_shader_stage_uses_workgroup to mesa_shader_stage_uses_workgroup	2025-08-06 10:28:41 +08:00
brw_nir_lower_fs_barycentrics.c	treewide: simplify nir_def_rewrite_uses_after	2025-08-01 15:34:24 +00:00
brw_nir_lower_fsign.py
brw_nir_lower_immediate_offsets.c	treewide: use nir_def_as_*	2025-08-01 15:34:24 +00:00
brw_nir_lower_intersection_shader.c	nir: make nir_block::predecessors & dom_frontier sets non-malloc'd	2025-08-21 06:13:48 +00:00
brw_nir_lower_ray_queries.c	intel/compiler: Fix ray geometry index	2025-08-19 09:32:55 +00:00
brw_nir_lower_rt_intrinsics.c	intel/compiler: Fix ray geometry index	2025-08-19 09:32:55 +00:00
brw_nir_lower_rt_intrinsics_pre_trace.c	nir: Add a faster lowest common ancestor algorithm	2025-09-08 23:03:13 +00:00
brw_nir_lower_sample_index_in_coord.c	intel/compiler: Lower sample index into coord for MSRT messages	2025-03-07 23:06:14 +00:00
brw_nir_lower_shader_calls.c	nir: make nir_block::predecessors & dom_frontier sets non-malloc'd	2025-08-21 06:13:48 +00:00
brw_nir_lower_storage_image.c	build: avoid redefining unreachable() which is standard in C23	2025-07-31 17:49:42 +00:00
brw_nir_lower_texel_address.c	build: avoid redefining unreachable() which is standard in C23	2025-07-31 17:49:42 +00:00
brw_nir_lower_texture.c	Revert "brw: move texture offset packing to NIR"	2025-08-29 06:29:14 +00:00
brw_nir_opt_fsat.c	nir: convert nir_instr_worklist to init/fini semantics w/out allocation	2025-08-21 06:13:49 +00:00
brw_nir_rt.c	all: rename gl_shader_stage to mesa_shader_stage	2025-08-06 10:28:40 +08:00
brw_nir_rt.h	intel: Update all NIR_PASS_V to NIR_PASS	2025-07-14 19:25:52 +00:00
brw_nir_rt_builder.h	intel/rt: Update BVH instance leaf load for Xe3+	2025-04-21 20:10:45 +00:00
brw_nir_trig_workarounds.py
brw_nir_wa_18019110168.c	treewide: use nir_def_as_*	2025-08-01 15:34:24 +00:00
brw_opt.cpp	brw: Do cmod prop again after brw_lower_subgroup_ops	2025-08-28 22:08:20 +00:00
brw_opt_address_reg_load.cpp	brw: Fix checking sources of wrong instruction in opt_address_reg_load	2025-08-27 22:50:23 +00:00
brw_opt_algebraic.cpp	brw: Fix folding case for MAD instruction with all immediates	2025-08-21 17:19:18 +00:00
brw_opt_bank_conflicts.cpp	util: crib SWAP macro from freedreno	2025-07-21 11:42:18 +00:00
brw_opt_cmod_propagation.cpp	build: avoid redefining unreachable() which is standard in C23	2025-07-31 17:49:42 +00:00
brw_opt_combine_constants.cpp	intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility	2025-07-31 20:23:02 +00:00
brw_opt_copy_propagation.cpp	brw: Enumerate SHADER_OPCODE_SEND sources and standardize how many	2025-08-08 22:12:08 +00:00
brw_opt_cse.cpp	brw: Stop using is_send_from_grf() in CSE pass	2025-08-08 22:12:05 +00:00
brw_opt_dead_code_eliminate.cpp	intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility	2025-07-31 20:23:02 +00:00
brw_opt_register_coalesce.cpp	brw: enable opt_register_coalesce to work with multiple EOT blocks	2025-08-20 15:01:50 +00:00
brw_opt_saturate_propagation.cpp	brw: Clean up saturate propagation after non-defs version removal	2025-04-09 19:06:48 +00:00
brw_opt_txf_combiner.cpp	brw: Add more specific brw_builder helpers	2025-07-19 17:49:47 +00:00
brw_opt_virtual_grfs.cpp	brw: Don't assert about MAX_VGRF_SIZE in brw_opt_split_virtual_grfs()	2025-04-11 20:34:51 +00:00
brw_packed_float.c
brw_prim.h
brw_print.cpp	intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility	2025-07-31 20:23:02 +00:00
brw_private.h	intel/debug: shader dump filter	2025-05-23 19:57:02 +00:00
brw_reg.cpp	build: avoid redefining unreachable() which is standard in C23	2025-07-31 17:49:42 +00:00
brw_reg.h	build: avoid redefining unreachable() which is standard in C23	2025-07-31 17:49:42 +00:00
brw_reg_allocate.cpp	intel/brw/xe3+: Override P value of GRF register classes to increase thread parallelism.	2025-09-10 02:15:55 +00:00
brw_reg_type.c	brw: Add BRW_TYPE_BF for bfloat16	2025-03-25 05:23:37 +00:00
brw_reg_type.h	brw: Add BRW_TYPE_BF for bfloat16	2025-03-25 05:23:37 +00:00
brw_rt.h
brw_schedule_instructions.cpp	intel/brw/xe3+: Define BRW_SCHEDULE_PRE_LATENCY scheduling mode.	2025-09-10 02:15:55 +00:00
brw_shader.cpp	intel/brw/xe3+: Select scheduler heuristic with best trade-off between register pressure and latency.	2025-09-10 02:15:57 +00:00
brw_shader.h	intel/brw/xe3+: Define BRW_SCHEDULE_PRE_LATENCY scheduling mode.	2025-09-10 02:15:55 +00:00
brw_simd_selection.cpp	build: avoid redefining unreachable() which is standard in C23	2025-07-31 17:49:42 +00:00
brw_spirv.c	nir: add nir_vectorize_cb callback parameter to nir_lower_phis_to_scalar()	2025-07-08 15:33:59 +00:00
brw_thread_payload.cpp	all: rename gl_shader_stage_is_compute to mesa_shader_stage_is_compute	2025-08-06 10:28:41 +08:00
brw_thread_payload.h	intel/brw: Rename fs_visitor to brw_shader	2025-02-11 09:13:28 +00:00
brw_validate.cpp	brw: Run validation as soon as we have the CFG around	2025-09-03 20:42:05 +00:00
brw_vue_map.c	brw: add support for separate tessellation shader compilation	2025-09-05 07:46:17 +00:00
brw_workaround.cpp	brw: Rename is_send_from_grf to is_send, replace other is_send() helper	2025-08-08 22:12:05 +00:00
intel_gfx_ver_enum.h	build: avoid redefining unreachable() which is standard in C23	2025-07-31 17:49:42 +00:00
intel_nir.c	intel/compiler: Use nir_split_conversions()	2025-04-07 17:45:21 -05:00
intel_nir.h	brw: add support for separate tessellation shader compilation	2025-09-05 07:46:17 +00:00
intel_nir_blockify_uniform_loads.c	treewide: simplify nir_def_rewrite_uses_after	2025-08-01 15:34:24 +00:00
intel_nir_clamp_image_1d_2d_array_sizes.c	treewide: simplify nir_def_rewrite_uses_after	2025-08-01 15:34:24 +00:00
intel_nir_clamp_per_vertex_loads.c	brw: add support for separate tessellation shader compilation	2025-09-05 07:46:17 +00:00
intel_nir_lower_non_uniform_barycentric_at_sample.c	treewide: use nir_def_as_*	2025-08-01 15:34:24 +00:00
intel_nir_lower_non_uniform_resource_intel.c
intel_nir_lower_printf.c	nir: drop printf_base_identifier	2025-02-05 20:33:15 +00:00
intel_nir_lower_shading_rate_output.c	treewide: simplify nir_def_rewrite_uses_after	2025-08-01 15:34:24 +00:00
intel_nir_lower_sparse.c	treewide: simplify nir_def_rewrite_uses_after	2025-08-01 15:34:24 +00:00
intel_nir_opt_peephole_ffma.c	treewide: use nir_def_as_*	2025-08-01 15:34:24 +00:00
intel_nir_opt_peephole_imul32x16.c
intel_nir_tcs_workarounds.c	nir: make nir_block::predecessors & dom_frontier sets non-malloc'd	2025-08-21 06:13:48 +00:00
intel_shader_enums.h	brw: add support for separate tessellation shader compilation	2025-09-05 07:46:17 +00:00
meson.build	brw: replace lower_fs_msaa with nir_inline_sysval	2025-08-03 21:27:47 +00:00
test_eu_compact.cpp	build: avoid redefining unreachable() which is standard in C23	2025-07-31 17:49:42 +00:00
test_eu_validate.cpp	brw: Add `FILE *` parameter to dump_assembly	2025-09-09 10:40:42 -07:00
test_helpers.cpp	brw: Simplify the test code for brw passes	2025-03-13 17:43:17 +00:00
test_helpers.h	brw: Add brw_shader_params	2025-08-28 00:06:18 +00:00
test_insert_load_reg.cpp	brw: Add passes to generate and lower load_reg	2025-04-04 06:45:02 +00:00
test_lower_scoreboard.cpp	brw: Enumerate SHADER_OPCODE_SEND sources and standardize how many	2025-08-08 22:12:08 +00:00
test_opt_algebraic.cpp	brw: Fix folding case for MAD instruction with all immediates	2025-08-21 17:19:18 +00:00
test_opt_cmod_propagation.cpp	brw/cmod: Don't propagate from CMP to possible Inf + (-Inf)	2025-04-28 19:44:23 +00:00
test_opt_combine_constants.cpp	brw: Add brw_builder::uniform()	2025-04-04 23:07:21 +00:00
test_opt_copy_propagation.cpp	brw: Simplify the test code for brw passes	2025-03-13 17:43:17 +00:00
test_opt_cse.cpp	brw: Simplify the test code for brw passes	2025-03-13 17:43:17 +00:00
test_opt_register_coalesce.cpp	brw: don't generate invalid instructions	2025-06-04 06:08:26 +00:00
test_opt_saturate_propagation.cpp	brw/sat: Eliminate non-defs saturate propagation	2025-04-04 06:45:02 +00:00
test_simd_selection.cpp	intel: Switch uint64_t intel_debug to a bitset	2025-04-22 23:09:26 +00:00
test_vf_float_conversions.cpp