mesa/src/intel/compiler
Francisco Jerez 1272ff5ed1 intel/brw/xehp+: Adjust performance model weights of LSC atomic ops.
The LSC implements several optimizations for atomic operations on
memory addresses that are uniform across all lanes, in which case their
cost is approximately O(1) instead of O(exec_size).  Even cases where
memory offsets are non-uniform but packed in a cacheline appear to
have a cost that is non-linear in the number of lanes.
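The three address patterns described above (uniform, packed in one cacheline, scattered) can be made concrete with a small C sketch. This is a toy classifier, not how the LSC hardware or the brw performance model works: the 64-byte cacheline size, the `toy_addr_pattern` enum, and `classify_lane_addresses()` are hypothetical names chosen only for illustration. A compiler cannot run such a check on runtime addresses; it only shows what the hardware sees per message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for illustration only: 64-byte cachelines. */
#define TOY_CACHELINE_BYTES 64u

/* Hypothetical classification of the per-lane addresses of one atomic. */
enum toy_addr_pattern {
   TOY_ADDR_UNIFORM,          /* all lanes hit the same address          */
   TOY_ADDR_SINGLE_CACHELINE, /* distinct addresses within one cacheline */
   TOY_ADDR_SCATTERED,        /* addresses span more than one cacheline  */
};

static enum toy_addr_pattern
classify_lane_addresses(const uint64_t *addr, size_t exec_size)
{
   assert(exec_size >= 1);

   bool uniform = true;
   bool same_line = true;
   const uint64_t line0 = addr[0] / TOY_CACHELINE_BYTES;

   /* Compare every lane against lane 0, both by address and by line. */
   for (size_t i = 1; i < exec_size; i++) {
      if (addr[i] != addr[0])
         uniform = false;
      if (addr[i] / TOY_CACHELINE_BYTES != line0)
         same_line = false;
   }

   if (uniform)
      return TOY_ADDR_UNIFORM;
   return same_line ? TOY_ADDR_SINGLE_CACHELINE : TOY_ADDR_SCATTERED;
}
```

For example, eight lanes that all use 0x1000 classify as uniform, lanes at 0x1000 + 4*i fall within one 64-byte line, and lanes at 0x1000 + 256*i are scattered. The commit message only states that the first case costs roughly O(1) and that the second appears non-linear in lane count; it gives no per-case figures.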

To approximate this behavior more closely, model its back-end cost as
roughly 1300 cycles instead of the previous 400 * exec_size/8.  This
fixes some cases where we incorrectly predicted that the SIMD32 shader
would be bound by the throughput of LSC atomic operations, even though
the observed per-lane cost of those operations was significantly lower
in SIMD32 mode, so SIMD32 would actually have had the best performance.
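The arithmetic behind this change can be checked with a minimal C sketch that encodes the two cost formulas quoted above. The function names are hypothetical and this is not the code in brw_analysis_performance.cpp; it only reproduces the numbers: 400, 800 and 1600 cycles for SIMD8/16/32 under the old model versus a flat 1300 under the new one.

```c
#include <assert.h>

/* Hypothetical helpers mirroring the two formulas in the commit message. */

/* Old model: back-end cost grows linearly with the execution size. */
static unsigned
lsc_atomic_cost_old(unsigned exec_size)
{
   assert(exec_size >= 1 && exec_size <= 32);
   return 400 * exec_size / 8;
}

/* New model: roughly constant back-end cost, independent of SIMD width. */
static unsigned
lsc_atomic_cost_new(unsigned exec_size)
{
   assert(exec_size >= 1 && exec_size <= 32);
   (void)exec_size; /* only used by the assert */
   return 1300;
}

/* Modeled cycles per lane, to compare SIMD widths against each other. */
static double
cost_per_lane(unsigned cost, unsigned exec_size)
{
   return (double)cost / exec_size;
}
```

Under the old model the per-lane cost is 50 cycles at every width, so doubling the SIMD width always doubles the modeled atomic cost. Under the constant model the per-lane cost drops from 162.5 cycles in SIMD8 to about 40.6 cycles in SIMD32, in line with the observation that LSC atomics are cheaper per lane in SIMD32 mode. Note that the constant model charges more than the old one at SIMD8 and SIMD16 and less only at SIMD32.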

Clearly this is still a rough approximation, and it might be possible
to obtain a more accurate result by plumbing divergence analysis data
all the way down to codegen.  However, the goal of the performance
analysis pass isn't to provide an exact prediction of the performance
of a shader (that's not really possible in general via static analysis
without solving the halting problem), but to provide a good enough
approximation at a low cost.  The constant approximation seems to be
strictly better in practice than the one we were using before: there
appear to be no regressions from this change, and
ShadowTombRaider-trace-dx11-2160p-ultra shows 5.7% better performance
on PTL with a subsequent commit that re-enables the use of the static
analysis-based SIMD32 heuristic on xe3+.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36618>
2025-09-10 02:15:56 +00:00
elk brw: add support for separate tessellation shader compilation 2025-09-05 07:46:17 +00:00
tests intel/compiler tests: fix path-to-string conversion 2025-06-23 08:26:29 +00:00
brw_analysis.cpp intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility 2025-07-31 20:23:02 +00:00
brw_analysis.h intel/brw/xe3+: Model trade-off between parallelism and GRF use in performance analysis. 2025-09-10 02:15:56 +00:00
brw_analysis_def.cpp brw: consider LOAD_PAYLOAD fully defined 2025-07-30 07:57:19 +00:00
brw_analysis_liveness.cpp intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility 2025-07-31 20:23:02 +00:00
brw_analysis_performance.cpp intel/brw/xehp+: Adjust performance model weights of LSC atomic ops. 2025-09-10 02:15:56 +00:00
brw_asm.c brw: Add FILE * parameter to dump_assembly 2025-09-09 10:40:42 -07:00
brw_asm.h brw: Fix size in assembler when compacting 2025-03-03 20:43:56 +00:00
brw_asm_internal.h brw: Rework label tracking in assembler 2025-03-06 17:06:20 -08:00
brw_asm_tool.c intel/compiler tests: fix variable type for getopt_long() return value 2025-06-23 08:26:29 +00:00
brw_builder.cpp brw: Add brw_builder::uniform() 2025-04-04 23:07:21 +00:00
brw_builder.h brw: fix broadcast opcode 2025-08-28 00:23:44 +03:00
brw_cfg.cpp intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility 2025-07-31 20:23:02 +00:00
brw_cfg.h intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility 2025-07-31 20:23:02 +00:00
brw_compile_bs.cpp intel/brw: Take shader in the brw_generator::generate_code() parameters 2025-08-28 00:06:20 +00:00
brw_compile_cs.cpp intel/brw: Take shader in the brw_generator::generate_code() parameters 2025-08-28 00:06:20 +00:00
brw_compile_fs.cpp intel/brw: Take shader in the brw_generator::generate_code() parameters 2025-08-28 00:06:20 +00:00
brw_compile_gs.cpp anv/brw/iris: move VS VUE computation to backend 2025-09-05 07:46:16 +00:00
brw_compile_mesh.cpp intel/brw: Take shader in the brw_generator::generate_code() parameters 2025-08-28 00:06:20 +00:00
brw_compile_tcs.cpp brw: add support for separate tessellation shader compilation 2025-09-05 07:46:17 +00:00
brw_compile_tes.cpp brw: add support for separate tessellation shader compilation 2025-09-05 07:46:17 +00:00
brw_compile_vs.cpp anv/brw/iris: move VS VUE computation to backend 2025-09-05 07:46:16 +00:00
brw_compiler.c all: rename gl_shader_stage to mesa_shader_stage 2025-08-06 10:28:40 +08:00
brw_compiler.h brw: add support for separate tessellation shader compilation 2025-09-05 07:46:17 +00:00
brw_debug_recompile.c all: rename gl_shader_stage to mesa_shader_stage 2025-08-06 10:28:40 +08:00
brw_device_sha1_gen_c.py intel/compiler: drop unused ray-tracing fields from cache hash 2024-03-22 00:01:28 +00:00
brw_disasm.c intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility 2025-07-31 20:23:02 +00:00
brw_disasm.h intel/brw: support for dumping shader line numbers 2025-04-08 19:39:53 +00:00
brw_disasm_info.cpp brw: Add FILE * parameter to dump_assembly 2025-09-09 10:40:42 -07:00
brw_disasm_info.h brw: Add FILE * parameter to dump_assembly 2025-09-09 10:40:42 -07:00
brw_disasm_tool.c intel/brw: Remove Gfx8- code from disassembler 2024-02-28 05:45:38 +00:00
brw_eu.c build: avoid redefining unreachable() which is standard in C23 2025-07-31 17:49:42 +00:00
brw_eu.h build: avoid redefining unreachable() which is standard in C23 2025-07-31 17:49:42 +00:00
brw_eu_compact.c brw: Avoid invalid access when compacting out-of-bounds JIP/UIP 2025-08-20 00:54:41 +00:00
brw_eu_defines.h Revert "brw: move texture offset packing to NIR" 2025-08-29 06:29:14 +00:00
brw_eu_emit.c build: avoid redefining unreachable() which is standard in C23 2025-07-31 17:49:42 +00:00
brw_eu_inst.h brw: Add BRW_TYPE_BF for bfloat16 2025-03-25 05:23:37 +00:00
brw_eu_validate.c build: avoid redefining unreachable() which is standard in C23 2025-07-31 17:49:42 +00:00
brw_from_nir.cpp brw: add support for separate tessellation shader compilation 2025-09-05 07:46:17 +00:00
brw_generator.cpp brw: Add FILE * parameter to dump_assembly 2025-09-09 10:40:42 -07:00
brw_generator.h intel/brw: Take shader in the brw_generator::generate_code() parameters 2025-08-28 00:06:20 +00:00
brw_gram.y brw: Add EU assembler support for bfloat16 2025-03-25 05:23:37 +00:00
brw_inst.cpp brw: fix broadcast opcode 2025-08-28 00:23:44 +03:00
brw_inst.h brw: workaround broken indirect RT messages on Gfx11 2025-08-20 15:01:50 +00:00
brw_isa_info.h intel/compiler: Use #pragma once instead of header guards 2024-12-11 19:47:44 +00:00
brw_kernel.c intel: Update all NIR_PASS_V to NIR_PASS 2025-07-14 19:25:52 +00:00
brw_kernel.h intel: rework CL pre-compile 2025-01-25 03:28:07 +00:00
brw_lex.l brw: Add EU assembler support for bfloat16 2025-03-25 05:23:37 +00:00
brw_list.h intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility 2025-07-31 20:23:02 +00:00
brw_load_reg.cpp brw: Add and use brw_reg_is_arf to test for a specific ARF 2025-07-24 23:08:07 +00:00
brw_lower.cpp brw: Enumerate SHADER_OPCODE_SEND sources and standardize how many 2025-08-08 22:12:08 +00:00
brw_lower_dpas.cpp brw: Simplify brw_builder "insert before inst" constructor 2025-03-06 23:33:38 +00:00
brw_lower_integer_multiplication.cpp brw: Remove bblock_t parameters from various passes 2025-03-06 23:33:38 +00:00
brw_lower_logical_sends.cpp Revert "brw: move texture offset packing to NIR" 2025-08-29 06:29:14 +00:00
brw_lower_pack.cpp build: avoid redefining unreachable() which is standard in C23 2025-07-31 17:49:42 +00:00
brw_lower_regioning.cpp brw: Rename is_send_from_grf to is_send, replace other is_send() helper 2025-08-08 22:12:05 +00:00
brw_lower_scoreboard.cpp brw: Rename is_send_from_grf to is_send, replace other is_send() helper 2025-08-08 22:12:05 +00:00
brw_lower_simd_width.cpp brw: Use a builder to track position in lower_simd 2025-07-19 17:49:48 +00:00
brw_lower_subgroup_ops.cpp brw: Strategically place flags initialization to help cmod prop 2025-08-28 22:08:20 +00:00
brw_nir.c brw: add support for separate tessellation shader compilation 2025-09-05 07:46:17 +00:00
brw_nir.h brw: add support for separate tessellation shader compilation 2025-09-05 07:46:17 +00:00
brw_nir_analyze_ubo_ranges.c intel/compiler: take reg_unit size into account with ubo ranges 2025-01-07 21:38:06 +00:00
brw_nir_lower_alpha_to_coverage.c nir: rename nir_lower_io_to_temporaries -> nir_lower_io_vars_to_temporaries 2025-06-26 18:20:54 +00:00
brw_nir_lower_cooperative_matrix.c build: avoid redefining unreachable() which is standard in C23 2025-07-31 17:49:42 +00:00
brw_nir_lower_cs_intrinsics.c all: rename gl_shader_stage_uses_workgroup to mesa_shader_stage_uses_workgroup 2025-08-06 10:28:41 +08:00
brw_nir_lower_fs_barycentrics.c treewide: simplify nir_def_rewrite_uses_after 2025-08-01 15:34:24 +00:00
brw_nir_lower_fsign.py intel/brw: Use range analysis to optimize fsign 2024-05-14 01:28:21 +00:00
brw_nir_lower_immediate_offsets.c treewide: use nir_def_as_* 2025-08-01 15:34:24 +00:00
brw_nir_lower_intersection_shader.c nir: make nir_block::predecessors & dom_frontier sets non-malloc'd 2025-08-21 06:13:48 +00:00
brw_nir_lower_ray_queries.c intel/compiler: Fix ray geometry index 2025-08-19 09:32:55 +00:00
brw_nir_lower_rt_intrinsics.c intel/compiler: Fix ray geometry index 2025-08-19 09:32:55 +00:00
brw_nir_lower_rt_intrinsics_pre_trace.c nir: Add a faster lowest common ancestor algorithm 2025-09-08 23:03:13 +00:00
brw_nir_lower_sample_index_in_coord.c intel/compiler: Lower sample index into coord for MSRT messages 2025-03-07 23:06:14 +00:00
brw_nir_lower_shader_calls.c nir: make nir_block::predecessors & dom_frontier sets non-malloc'd 2025-08-21 06:13:48 +00:00
brw_nir_lower_storage_image.c build: avoid redefining unreachable() which is standard in C23 2025-07-31 17:49:42 +00:00
brw_nir_lower_texel_address.c build: avoid redefining unreachable() which is standard in C23 2025-07-31 17:49:42 +00:00
brw_nir_lower_texture.c Revert "brw: move texture offset packing to NIR" 2025-08-29 06:29:14 +00:00
brw_nir_opt_fsat.c nir: convert nir_instr_worklist to init/fini semantics w/out allocation 2025-08-21 06:13:49 +00:00
brw_nir_rt.c all: rename gl_shader_stage to mesa_shader_stage 2025-08-06 10:28:40 +08:00
brw_nir_rt.h intel: Update all NIR_PASS_V to NIR_PASS 2025-07-14 19:25:52 +00:00
brw_nir_rt_builder.h intel/rt: Update BVH instance leaf load for Xe3+ 2025-04-21 20:10:45 +00:00
brw_nir_trig_workarounds.py
brw_nir_wa_18019110168.c treewide: use nir_def_as_* 2025-08-01 15:34:24 +00:00
brw_opt.cpp brw: Do cmod prop again after brw_lower_subgroup_ops 2025-08-28 22:08:20 +00:00
brw_opt_address_reg_load.cpp brw: Fix checking sources of wrong instruction in opt_address_reg_load 2025-08-27 22:50:23 +00:00
brw_opt_algebraic.cpp brw: Fix folding case for MAD instruction with all immediates 2025-08-21 17:19:18 +00:00
brw_opt_bank_conflicts.cpp util: crib SWAP macro from freedreno 2025-07-21 11:42:18 +00:00
brw_opt_cmod_propagation.cpp build: avoid redefining unreachable() which is standard in C23 2025-07-31 17:49:42 +00:00
brw_opt_combine_constants.cpp intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility 2025-07-31 20:23:02 +00:00
brw_opt_copy_propagation.cpp brw: Enumerate SHADER_OPCODE_SEND sources and standardize how many 2025-08-08 22:12:08 +00:00
brw_opt_cse.cpp brw: Stop using is_send_from_grf() in CSE pass 2025-08-08 22:12:05 +00:00
brw_opt_dead_code_eliminate.cpp intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility 2025-07-31 20:23:02 +00:00
brw_opt_register_coalesce.cpp brw: enable opt_register_coalesce to work with multiple EOT blocks 2025-08-20 15:01:50 +00:00
brw_opt_saturate_propagation.cpp brw: Clean up saturate propagation after non-defs version removal 2025-04-09 19:06:48 +00:00
brw_opt_txf_combiner.cpp brw: Add more specific brw_builder helpers 2025-07-19 17:49:47 +00:00
brw_opt_virtual_grfs.cpp brw: Don't assert about MAX_VGRF_SIZE in brw_opt_split_virtual_grfs() 2025-04-11 20:34:51 +00:00
brw_packed_float.c
brw_prim.h intel/compiler: Use #pragma once instead of header guards 2024-12-11 19:47:44 +00:00
brw_print.cpp intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility 2025-07-31 20:23:02 +00:00
brw_private.h intel/debug: shader dump filter 2025-05-23 19:57:02 +00:00
brw_reg.cpp build: avoid redefining unreachable() which is standard in C23 2025-07-31 17:49:42 +00:00
brw_reg.h build: avoid redefining unreachable() which is standard in C23 2025-07-31 17:49:42 +00:00
brw_reg_allocate.cpp intel/brw/xe3+: Override P value of GRF register classes to increase thread parallelism. 2025-09-10 02:15:55 +00:00
brw_reg_type.c brw: Add BRW_TYPE_BF for bfloat16 2025-03-25 05:23:37 +00:00
brw_reg_type.h brw: Add BRW_TYPE_BF for bfloat16 2025-03-25 05:23:37 +00:00
brw_rt.h intel/compiler: Use #pragma once instead of header guards 2024-12-11 19:47:44 +00:00
brw_schedule_instructions.cpp intel/brw/xe3+: Define BRW_SCHEDULE_PRE_LATENCY scheduling mode. 2025-09-10 02:15:55 +00:00
brw_shader.cpp intel/brw/xe3+: Define BRW_SCHEDULE_PRE_LATENCY scheduling mode. 2025-09-10 02:15:55 +00:00
brw_shader.h intel/brw/xe3+: Define BRW_SCHEDULE_PRE_LATENCY scheduling mode. 2025-09-10 02:15:55 +00:00
brw_simd_selection.cpp build: avoid redefining unreachable() which is standard in C23 2025-07-31 17:49:42 +00:00
brw_spirv.c nir: add nir_vectorize_cb callback parameter to nir_lower_phis_to_scalar() 2025-07-08 15:33:59 +00:00
brw_thread_payload.cpp all: rename gl_shader_stage_is_compute to mesa_shader_stage_is_compute 2025-08-06 10:28:41 +08:00
brw_thread_payload.h intel/brw: Rename fs_visitor to brw_shader 2025-02-11 09:13:28 +00:00
brw_validate.cpp brw: Run validation as soon as we have the CFG around 2025-09-03 20:42:05 +00:00
brw_vue_map.c brw: add support for separate tessellation shader compilation 2025-09-05 07:46:17 +00:00
brw_workaround.cpp brw: Rename is_send_from_grf to is_send, replace other is_send() helper 2025-08-08 22:12:05 +00:00
intel_gfx_ver_enum.h build: avoid redefining unreachable() which is standard in C23 2025-07-31 17:49:42 +00:00
intel_nir.c intel/compiler: Use nir_split_conversions() 2025-04-07 17:45:21 -05:00
intel_nir.h brw: add support for separate tessellation shader compilation 2025-09-05 07:46:17 +00:00
intel_nir_blockify_uniform_loads.c treewide: simplify nir_def_rewrite_uses_after 2025-08-01 15:34:24 +00:00
intel_nir_clamp_image_1d_2d_array_sizes.c treewide: simplify nir_def_rewrite_uses_after 2025-08-01 15:34:24 +00:00
intel_nir_clamp_per_vertex_loads.c brw: add support for separate tessellation shader compilation 2025-09-05 07:46:17 +00:00
intel_nir_lower_non_uniform_barycentric_at_sample.c treewide: use nir_def_as_* 2025-08-01 15:34:24 +00:00
intel_nir_lower_non_uniform_resource_intel.c treewide: use nir_metadata_control_flow 2024-06-17 16:28:14 -04:00
intel_nir_lower_printf.c nir: drop printf_base_identifier 2025-02-05 20:33:15 +00:00
intel_nir_lower_shading_rate_output.c treewide: simplify nir_def_rewrite_uses_after 2025-08-01 15:34:24 +00:00
intel_nir_lower_sparse.c treewide: simplify nir_def_rewrite_uses_after 2025-08-01 15:34:24 +00:00
intel_nir_opt_peephole_ffma.c treewide: use nir_def_as_* 2025-08-01 15:34:24 +00:00
intel_nir_opt_peephole_imul32x16.c treewide: use nir_metadata_control_flow 2024-06-17 16:28:14 -04:00
intel_nir_tcs_workarounds.c nir: make nir_block::predecessors & dom_frontier sets non-malloc'd 2025-08-21 06:13:48 +00:00
intel_shader_enums.h brw: add support for separate tessellation shader compilation 2025-09-05 07:46:17 +00:00
meson.build brw: replace lower_fs_msaa with nir_inline_sysval 2025-08-03 21:27:47 +00:00
test_eu_compact.cpp build: avoid redefining unreachable() which is standard in C23 2025-07-31 17:49:42 +00:00
test_eu_validate.cpp brw: Add FILE * parameter to dump_assembly 2025-09-09 10:40:42 -07:00
test_helpers.cpp brw: Simplify the test code for brw passes 2025-03-13 17:43:17 +00:00
test_helpers.h brw: Add brw_shader_params 2025-08-28 00:06:18 +00:00
test_insert_load_reg.cpp brw: Add passes to generate and lower load_reg 2025-04-04 06:45:02 +00:00
test_lower_scoreboard.cpp brw: Enumerate SHADER_OPCODE_SEND sources and standardize how many 2025-08-08 22:12:08 +00:00
test_opt_algebraic.cpp brw: Fix folding case for MAD instruction with all immediates 2025-08-21 17:19:18 +00:00
test_opt_cmod_propagation.cpp brw/cmod: Don't propagate from CMP to possible Inf + (-Inf) 2025-04-28 19:44:23 +00:00
test_opt_combine_constants.cpp brw: Add brw_builder::uniform() 2025-04-04 23:07:21 +00:00
test_opt_copy_propagation.cpp brw: Simplify the test code for brw passes 2025-03-13 17:43:17 +00:00
test_opt_cse.cpp brw: Simplify the test code for brw passes 2025-03-13 17:43:17 +00:00
test_opt_register_coalesce.cpp brw: don't generate invalid instructions 2025-06-04 06:08:26 +00:00
test_opt_saturate_propagation.cpp brw/sat: Eliminate non-defs saturate propagation 2025-04-04 06:45:02 +00:00
test_simd_selection.cpp intel: Switch uint64_t intel_debug to a bitset 2025-04-22 23:09:26 +00:00
test_vf_float_conversions.cpp