mesa/src/intel/compiler
Kenneth Graunke 589b03d02f intel/fs: Opportunistically split SEND message payloads
While we've taken advantage of split-sends in select situations, there
are many other cases (such as sampler messages, framebuffer writes, and
URB writes) that have never received that treatment, and continued to
use monolithic send payloads.

This commit introduces a new optimization pass which detects SEND
messages with a single payload, finds an adjacent LOAD_PAYLOAD that
produces that payload, splits it two, and updates the SEND to use both
of the new smaller payloads.

In places where we manually used split SENDS, we rely on underlying
knowledge of the message to determine a natural split point.  For
example, header and data, or address and value.

In this pass, we instead infer a natural split point by looking at the
source registers.  Often times, consecutive LOAD_PAYLOAD sources may
already be grouped together in a contiguous block, such as a texture
coordinate.  Then, there is another bit of data, such as a LOD, that
may come from elsewhere.  We look for the point where the source list
switches VGRFs, and split it there.  (If there is a message header, we
choose to split there, as it will naturally come from elsewhere.)

This not only reduces the payload sizes, alleviating register pressure,
but it means that we may be able to eliminate some payload construction
altogether, if we have a contiguous block already and some extra data
being tacked on to one side or the other.

shader-db results for Icelake are:

   total instructions in shared programs: 19602513 -> 19369255 (-1.19%)
   instructions in affected programs: 6085404 -> 5852146 (-3.83%)
   helped: 23650 / HURT: 15
   helped stats (abs) min: 1 max: 1344 x̄: 9.87 x̃: 3
   helped stats (rel) min: 0.03% max: 35.71% x̄: 3.78% x̃: 2.15%
   HURT stats (abs)   min: 1 max: 44 x̄: 7.20 x̃: 2
   HURT stats (rel)   min: 1.04% max: 20.00% x̄: 4.13% x̃: 2.00%
   95% mean confidence interval for instructions value: -10.16 -9.55
   95% mean confidence interval for instructions %-change: -3.84% -3.72%
   Instructions are helped.

   total cycles in shared programs: 848180368 -> 842208063 (-0.70%)
   cycles in affected programs: 599931746 -> 593959441 (-1.00%)
   helped: 22114 / HURT: 13053
   helped stats (abs) min: 1 max: 482486 x̄: 580.94 x̃: 22
   helped stats (rel) min: <.01% max: 78.92% x̄: 4.76% x̃: 0.75%
   HURT stats (abs)   min: 1 max: 94022 x̄: 526.67 x̃: 22
   HURT stats (rel)   min: <.01% max: 188.99% x̄: 4.52% x̃: 0.61%
   95% mean confidence interval for cycles value: -222.87 -116.79
   95% mean confidence interval for cycles %-change: -1.44% -1.20%
   Cycles are helped.

   total spills in shared programs: 8387 -> 6569 (-21.68%)
   spills in affected programs: 5110 -> 3292 (-35.58%)
   helped: 359 / HURT: 3

   total fills in shared programs: 11833 -> 8218 (-30.55%)
   fills in affected programs: 8635 -> 5020 (-41.86%)
   helped: 358 / HURT: 3

   LOST:   1 SIMD16 shader, 659 SIMD32 shaders
   GAINED: 65 SIMD16 shaders, 959 SIMD32 shaders

   Total CPU time (seconds): 1505.48 -> 1474.08 (-2.09%)

Examining these results: the few shaders where spills/fills increased
were already spilling significantly, and were only slightly hurt.  The
applications affected were also helped in countless other shaders, and
other shaders stopped spilling altogether or had 50% reductions.  Many
SIMD16 shaders were gained, and overall we gain more SIMD32, though many
close to the register pressure line go back and forth.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17018>
2022-07-01 02:05:45 +00:00
..
brw_cfg.cpp intel/fs: Add physical fall-through CFG edge for unconditional BREAK instruction. 2021-12-21 00:43:29 +00:00
brw_cfg.h intel/compiler: Add cfg_t::adjust_block_ips() method 2021-07-14 09:56:59 -07:00
brw_clip.h
brw_clip_line.c intel/compiler: Split 3DPRIM_* defines out to a separate header. 2022-06-30 23:46:35 +00:00
brw_clip_point.c
brw_clip_tri.c intel/compiler: Split 3DPRIM_* defines out to a separate header. 2022-06-30 23:46:35 +00:00
brw_clip_unfilled.c intel/compiler: Split 3DPRIM_* defines out to a separate header. 2022-06-30 23:46:35 +00:00
brw_clip_util.c intel: move away from booleans to identify platforms 2021-11-08 16:48:06 +00:00
brw_compile_clip.c intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_compile_ff_gs.c intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_compile_sf.c intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_compiler.c intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_compiler.h intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_dead_control_flow.cpp intel/compiler: Pass detailed dependency classes to invalidate_analysis() 2020-03-06 10:20:39 -08:00
brw_dead_control_flow.h
brw_debug_recompile.c intel/compiler: Stop including src/mesa/main/config.h 2022-06-30 23:46:35 +00:00
brw_disasm.c intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_disasm_info.c intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_disasm_info.h intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_eu.c intel/compiler: Convert brw_eu.cpp back to brw_eu.c 2022-06-30 23:46:35 +00:00
brw_eu.h intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_eu_compact.c intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_eu_defines.h intel/compiler: Split 3DPRIM_* defines out to a separate header. 2022-06-30 23:46:35 +00:00
brw_eu_emit.c intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_eu_util.c intel: Rename genx keyword to gfxx in source files 2021-04-02 18:33:07 +00:00
brw_eu_validate.c intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_fs.cpp intel/fs: Opportunistically split SEND message payloads 2022-07-01 02:05:45 +00:00
brw_fs.h intel/fs: Opportunistically split SEND message payloads 2022-07-01 02:05:45 +00:00
brw_fs_bank_conflicts.cpp intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_fs_builder.h intel/compiler: Fix instruction size written calculation 2021-11-22 21:27:30 -08:00
brw_fs_cmod_propagation.cpp intel: fix typos found by codespell 2022-06-27 10:20:55 +00:00
brw_fs_combine_constants.cpp intel/compiler: Fix missing break in switch 2021-07-22 23:38:04 +00:00
brw_fs_copy_propagation.cpp intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_fs_cse.cpp intel/compiler: Implement nir_intrinsic_last_invocation 2022-03-26 00:28:19 +00:00
brw_fs_dead_code_eliminate.cpp intel/fs: sel.cond writes the flags on Gfx4 and Gfx5 2021-08-11 13:09:20 -07:00
brw_fs_generator.cpp intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_fs_live_variables.cpp intel/fs: sel.cond writes the flags on Gfx4 and Gfx5 2021-08-11 13:09:20 -07:00
brw_fs_live_variables.h intel: Rename gen_device prefix to intel_device 2021-04-20 20:06:33 +00:00
brw_fs_lower_pack.cpp intel/compiler: Pass detailed dependency classes to invalidate_analysis() 2020-03-06 10:20:39 -08:00
brw_fs_lower_regioning.cpp intel/compiler: make CLUSTER_BROADCAST always deal with integers 2022-02-16 21:36:42 +00:00
brw_fs_nir.cpp intel: fix typos found by codespell 2022-06-27 10:20:55 +00:00
brw_fs_reg_allocate.cpp intel/compiler: Handle split-sends in EOT high-register pinning case 2022-07-01 02:05:45 +00:00
brw_fs_register_coalesce.cpp intel/compiler: Update block IPs once in register_coalesce 2021-07-14 09:57:04 -07:00
brw_fs_saturate_propagation.cpp intel/compiler/fs: Switch liveness analysis to IR analysis framework 2020-03-06 10:20:57 -08:00
brw_fs_scoreboard.cpp intel/compiler: call ordered_unit() only once at update_inst_scoreboard() 2022-06-02 23:04:39 +00:00
brw_fs_sel_peephole.cpp intel/fs: sel.cond writes the flags on Gfx4 and Gfx5 2021-08-11 13:09:20 -07:00
brw_fs_validate.cpp
brw_fs_visitor.cpp intel/compiler: Move spill/fill tracking to the register allocator 2022-05-25 06:56:01 +00:00
brw_gfx_ver_enum.h intel/compiler: Fix brw_gfx_ver_enum.h to be a proper header file 2022-06-30 23:46:35 +00:00
brw_inst.h intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_interpolation_map.c intel: Rename genx keyword to gfxx in source files 2021-04-02 18:33:07 +00:00
brw_ir.h intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_ir_allocator.h
brw_ir_analysis.h intel/compiler: use C++ template instead of preprocessor 2020-11-03 10:42:29 +00:00
brw_ir_fs.h intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_ir_performance.cpp intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_ir_performance.h intel/ir: Import shader performance analysis pass. 2020-04-28 23:01:03 -07:00
brw_ir_vec4.h intel: fix typos found by codespell 2022-06-27 10:20:55 +00:00
brw_isa_info.h intel/compiler: Remove use of thread_local for opcode tables 2022-06-30 23:46:35 +00:00
brw_kernel.c nir: add a nir_remove_non_entrypoints helper 2022-05-10 03:37:44 +00:00
brw_kernel.h intel/compiler: fix singleton pointer coverity warning 2022-04-19 12:36:10 +03:00
brw_mesh.cpp intel/compiler: adjust task payload offsets as late as possible 2022-06-27 14:14:41 +00:00
brw_nir.c intel/compiler: vectorize task payload loads/stores 2022-06-20 17:38:20 +00:00
brw_nir.h driconf: Add a limit_trig_input_range option 2022-05-13 06:47:53 +00:00
brw_nir_analyze_boolean_resolves.c
brw_nir_analyze_ubo_ranges.c intel/nir,i965: Move HW generation check for UBO pushing to i965 2021-06-03 05:12:33 +00:00
brw_nir_attribute_workarounds.c intel/compiler: use nir_shader_instructions_pass in brw_nir_apply_attribute_workarounds 2021-08-11 11:23:30 +00:00
brw_nir_clamp_image_1d_2d_array_sizes.c intel/compiler: use nir_shader_instructions_pass in brw_nir_clamp_image_1d_2d_array_sizes 2021-10-05 10:02:54 +00:00
brw_nir_lower_alpha_to_coverage.c intel/nir: Clean up lower_alpha_to_coverage a bit 2020-08-29 16:41:05 +00:00
brw_nir_lower_conversions.c intel/compiler: use nir_shader_instructions_pass in brw_nir_lower_conversions 2021-10-05 10:02:54 +00:00
brw_nir_lower_cs_intrinsics.c intel/compiler: Lower Task/Mesh local_invocation_{id,index} 2021-12-04 00:41:46 +00:00
brw_nir_lower_intersection_shader.c intel/compiler: invalidate all metadata in brw_nir_lower_intersection_shader 2022-04-19 11:43:55 +00:00
brw_nir_lower_mem_access_bit_sizes.c intel/compiler: Use nir_var_mem_task_payload 2022-03-25 23:29:19 +00:00
brw_nir_lower_ray_queries.c intel/fs: tidy up lower of ray queries 2022-04-19 12:56:06 +00:00
brw_nir_lower_rt_intrinsics.c intel/compiler: extract brw_nir_load_global_const out of rt code 2021-12-04 00:41:46 +00:00
brw_nir_lower_scoped_barriers.c intel/compiler: use nir_shader_instructions_pass in brw_nir_lower_scoped_barriers 2021-10-05 10:02:54 +00:00
brw_nir_lower_shader_calls.c intel: fix typos found by codespell 2022-06-27 10:20:55 +00:00
brw_nir_lower_shading_rate_output.c intel: fix typos found by codespell 2022-06-27 10:20:55 +00:00
brw_nir_lower_storage_image.c nir/builder: Add a nir_trim_vector helper 2022-05-11 14:47:33 +00:00
brw_nir_opt_peephole_ffma.c intel: fix typos found by codespell 2022-06-27 10:20:55 +00:00
brw_nir_rt.c intel: Use nir_test_mask instead of i2b(iand) 2022-06-30 18:00:32 +00:00
brw_nir_rt.h intel/fs: lower ray query intrinsics 2022-02-08 12:55:25 +00:00
brw_nir_rt_builder.h intel/fs: add a note on possible optimization of root node address 2022-04-13 11:24:49 +00:00
brw_nir_tcs_workarounds.c intel/compiler: use nir_metadata_none instead of its value 2021-10-05 10:02:54 +00:00
brw_nir_trig_workarounds.py driconf: Add a limit_trig_input_range option 2022-05-13 06:47:53 +00:00
brw_packed_float.c
brw_predicated_break.cpp intel/compiler: Don't predicate a WHILE if there is a CONT 2021-12-08 14:56:32 -08:00
brw_prim.h intel/compiler: Split 3DPRIM_* defines out to a separate header. 2022-06-30 23:46:35 +00:00
brw_private.h intel/compiler: Use SIMD selection helpers for variable workgroup size 2021-10-26 17:49:09 +00:00
brw_reg.h intel/fs: Take into account region strides during SIMD lowering decision of SHUFFLE. 2022-01-25 22:40:44 +00:00
brw_reg_type.c intel: Rename gen_device prefix to intel_device 2021-04-20 20:06:33 +00:00
brw_reg_type.h intel/compiler: Move type_is_unsigned_int to brw_reg_type.h 2021-08-30 14:00:14 -07:00
brw_rt.h intel/fs: lower ray query intrinsics 2022-02-08 12:55:25 +00:00
brw_schedule_instructions.cpp intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_shader.cpp intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_shader.h intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_simd_selection.c intel/compiler: Don't use SIMD larger than needed for workgroup 2021-10-26 17:49:09 +00:00
brw_vec4.cpp intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_vec4.h intel/compiler: remove gfx6 gather wa from backend. 2021-12-22 21:37:55 +00:00
brw_vec4_builder.h intel: Rename Genx keyword to Gfxx 2021-04-02 18:33:07 +00:00
brw_vec4_cmod_propagation.cpp intel/vec4: sel.cond writes the flags on Gfx4 and Gfx5 2021-08-11 13:09:32 -07:00
brw_vec4_copy_propagation.cpp intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_vec4_cse.cpp intel/vec4: sel.cond writes the flags on Gfx4 and Gfx5 2021-08-11 13:09:32 -07:00
brw_vec4_dead_code_eliminate.cpp intel/vec4: sel.cond writes the flags on Gfx4 and Gfx5 2021-08-11 13:09:32 -07:00
brw_vec4_generator.cpp intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_vec4_gs_nir.cpp nir: Add ability to count emitted GS primitives. 2020-10-09 15:26:14 +02:00
brw_vec4_gs_visitor.cpp intel/compiler: Split 3DPRIM_* defines out to a separate header. 2022-06-30 23:46:35 +00:00
brw_vec4_gs_visitor.h intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
brw_vec4_live_variables.cpp intel/vec4: sel.cond writes the flags on Gfx4 and Gfx5 2021-08-11 13:09:32 -07:00
brw_vec4_live_variables.h intel/vec4: sel.cond writes the flags on Gfx4 and Gfx5 2021-08-11 13:09:32 -07:00
brw_vec4_nir.cpp intel: fix typos found by codespell 2022-06-27 10:20:55 +00:00
brw_vec4_reg_allocate.cpp intel/vec4: Use ra_alloc_contig_reg_class() to reduce RA overhead. 2021-06-04 19:08:57 +00:00
brw_vec4_surface_builder.cpp intel: move away from booleans to identify platforms 2021-11-08 16:48:06 +00:00
brw_vec4_surface_builder.h
brw_vec4_tcs.cpp intel/fs: fix total_scratch computation 2022-03-02 13:13:03 +00:00
brw_vec4_tcs.h intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
brw_vec4_tes.cpp intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
brw_vec4_tes.h intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
brw_vec4_visitor.cpp intel/vec4: Inline emit_texture and move helpers to brw_vec4_nir.cpp 2021-12-16 00:09:45 -08:00
brw_vec4_vs.h intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
brw_vec4_vs_visitor.cpp intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
brw_vue_map.c intel/compiler: add primitive rate output support 2022-02-02 17:09:46 +00:00
brw_wm_iz.cpp intel: Rename genx keyword to gfxx in source files 2021-04-02 18:33:07 +00:00
gfx6_gs_visitor.cpp intel/compiler: Split 3DPRIM_* defines out to a separate header. 2022-06-30 23:46:35 +00:00
gfx6_gs_visitor.h intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
intel_clc.c intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
meson.build intel/compiler: Convert brw_eu.cpp back to brw_eu.c 2022-06-30 23:46:35 +00:00
test_eu_compact.cpp intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
test_eu_validate.cpp intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
test_fs_cmod_propagation.cpp intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
test_fs_copy_propagation.cpp intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
test_fs_saturate_propagation.cpp intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
test_fs_scoreboard.cpp intel/fs/xehp: Add unit test for handling of RaR deps across multiple pipelines. 2022-01-25 22:40:44 +00:00
test_simd_selection.cpp intel/compiler: Initialize SIMDSelectionTest member error. 2021-11-03 04:22:35 +00:00
test_vec4_cmod_propagation.cpp intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
test_vec4_copy_propagation.cpp intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
test_vec4_dead_code_eliminate.cpp intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
test_vec4_register_coalesce.cpp intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
test_vf_float_conversions.cpp