mesa/src/intel/compiler
Lionel Landwerlin 04777171e0 intel/fs: try to rematerialize surface computation code
This helps a lot with accessing surface handles in control flow. Our
resource_intel intrinsic has a non_uniform flag; when it is set, we
cannot apply this optimization. But in uniform cases, this is just a
massive win. We drop all kinds of pipeline stalls due to
find_live_channel. We also reduce register pressure by doing the
surface handle computation in a single GRF (instead of 2 or 4).
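One way to read the "single GRF (instead of 2 or 4)" figure: a 32-bit surface handle that is replicated in every channel needs SIMD width × 4 bytes. Assuming 32-byte GRFs (the register size on the Intel GPUs this code mostly targeted at the time; this is an assumption, not stated in the commit), that is 2 GRFs at SIMD16 and 4 at SIMD32, while a uniform handle needs only one. The sketch below just does that arithmetic. Both function names are hypothetical and are not Mesa code.

```c
#include <assert.h>

/* Hypothetical helper for illustration only (not a Mesa function):
 * returns how many GRFs a value occupies when every channel holds its
 * own copy. Assumes 32-byte GRFs. */
static unsigned
grfs_for_per_channel_value(unsigned simd_width, unsigned bytes_per_channel)
{
   const unsigned grf_size = 32; /* assumed GRF size in bytes */

   assert(simd_width == 8 || simd_width == 16 || simd_width == 32);

   /* Round the total byte count up to whole registers. */
   return (simd_width * bytes_per_channel + grf_size - 1) / grf_size;
}

/* Hypothetical helper: a uniform value (identical in all channels)
 * fits in a single GRF regardless of dispatch width. */
static unsigned
grfs_for_uniform_value(void)
{
   return 1;
}
```

For a 32-bit handle, `grfs_for_per_channel_value(16, 4)` gives 2 and `grfs_for_per_channel_value(32, 4)` gives 4, compared with 1 for the uniform case.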

There are some regressions in max dispatch width, but I think those
only affect SIMD32 and are due to the current heuristic disabling it
after a throughput comparison with SIMD16. We know this heuristic is
not perfect; it should probably be updated in a separate change.

Here are some stats (all titles seem to have similar gains):

 PERCENTAGE DELTAS    Shaders   Instrs    Cycles  Subgroup size Send messages Spill count Fill count Scratch Memory Size Max live registers Max dispatch width
 red_dead_redemption2 5860     -36.80%    -5.67%      +0.77%        +0.06%      -81.26%     -79.16%        -70.62%             -8.63%             -6.93%
 ---------------------------------------------------------------------------------------------------------------------------------------------------------------
 All affected         4716     -37.29%    -5.67%      +0.95%        +0.07%      -81.26%     -79.16%        -70.62%             -9.15%             -8.47%
 ---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Total                5860     -36.80%    -5.67%      +0.77%        +0.06%      -81.26%     -79.16%        -70.62%             -8.63%             -6.93%

 PERCENTAGE DELTAS          Shaders   Instrs    Cycles  Subgroup size Send messages Spill count Fill count Scratch Memory Size Max live registers Max dispatch width
 rise_of_the_tomb_raider_g2 12010    -37.19%   -22.12%      +0.01%        +0.00%      -99.01%     -99.14%        -98.65%             -7.62%             -4.96%
 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 All affected               11732    -37.27%   -22.14%      +0.01%        +0.00%      -99.01%     -99.14%        -98.65%             -7.67%             -5.11%
 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Total                      12010    -37.19%   -22.12%      +0.01%        +0.00%      -99.01%     -99.14%        -98.65%             -7.62%             -4.96%

 PERCENTAGE DELTAS    Shaders   Instrs    Cycles  Spill count Fill count Scratch Memory Size Max live registers Max dispatch width
 total_war_warhammer2 462      -27.45%   -12.42%    -82.35%     -88.46%        -66.67%             -5.52%             -5.62%
 -----------------------------------------------------------------------------------------------------------------------------------
 All affected         335      -28.31%   -12.77%    -82.35%     -88.46%        -66.67%             -6.25%             -7.24%
 -----------------------------------------------------------------------------------------------------------------------------------
 Total                462      -27.45%   -12.42%    -82.35%     -88.46%        -66.67%             -5.52%             -5.62%

 PERCENTAGE DELTAS Shaders   Instrs    Cycles  Subgroup size Send messages Spill count Fill count Scratch Memory Size Max live registers Max dispatch width
 witcher_3_dxvk_g2 1049     -36.94%   -57.82%      +0.06%        +0.01%      -98.52%     -97.29%        -98.10%             -7.81%             -1.00%
 ------------------------------------------------------------------------------------------------------------------------------------------------------------
 All affected      693      -41.93%   -58.45%      +0.09%        +0.01%      -98.52%     -97.29%        -98.10%             -10.25%            -1.33%
 ------------------------------------------------------------------------------------------------------------------------------------------------------------
 Total             1049     -36.94%   -57.82%      +0.06%        +0.01%      -98.52%     -97.29%        -98.10%             -7.81%             -1.00%

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:37 +00:00
brw_cfg.cpp intel/fs: Add physical fall-through CFG edge for unconditional BREAK instruction. 2021-12-21 00:43:29 +00:00
brw_cfg.h intel/compiler: Add cfg_t::adjust_block_ips() method 2021-07-14 09:56:59 -07:00
brw_clip.h
brw_clip_line.c intel/compiler: Split 3DPRIM_* defines out to a separate header. 2022-06-30 23:46:35 +00:00
brw_clip_point.c
brw_clip_tri.c intel/compiler: Split 3DPRIM_* defines out to a separate header. 2022-06-30 23:46:35 +00:00
brw_clip_unfilled.c intel/compiler: Split 3DPRIM_* defines out to a separate header. 2022-06-30 23:46:35 +00:00
brw_clip_util.c intel: move away from booleans to identify platforms 2021-11-08 16:48:06 +00:00
brw_compile_clip.c intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_compile_ff_gs.c intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_compile_sf.c intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_compiler.c intel/compiler: Fix 64-bit ufind_msb, find_lsb, and bit_count 2023-05-19 22:44:37 +00:00
brw_compiler.h intel/fs: enable bindless sampler state offsets 2023-05-30 06:36:37 +00:00
brw_dead_control_flow.cpp
brw_dead_control_flow.h
brw_debug_recompile.c intel/compiler: Delete sampler key handling for planar format stuff 2022-12-09 10:18:25 +00:00
brw_disasm.c intel/fs: enable extended bindless surface offset 2023-05-30 06:36:37 +00:00
brw_disasm_info.c intel/eu: Handle compaction when inserting validation errors 2022-07-28 21:31:45 +00:00
brw_disasm_info.h intel/eu: Handle compaction when inserting validation errors 2022-07-28 21:31:45 +00:00
brw_eu.c intel/compiler: export brw_num_sources_from_inst 2022-12-10 03:59:19 +00:00
brw_eu.h intel/fs: enable extended bindless surface offset 2023-05-30 06:36:37 +00:00
brw_eu_compact.c intel/compiler: don't allocate compaction arrays on the stack 2022-10-28 07:10:58 +00:00
brw_eu_defines.h intel/fs: enable get_buffer_size on bindless heap 2023-05-30 06:36:37 +00:00
brw_eu_emit.c intel/fs: enable extended bindless surface offset 2023-05-30 06:36:37 +00:00
brw_eu_util.c
brw_eu_validate.c intel/eu/validate: Check predication and cmod for SEL, CMP, and CMPN 2023-01-09 19:15:19 +00:00
brw_fs.cpp intel/fs: enable UBO accesses through bindless heap 2023-05-30 06:36:37 +00:00
brw_fs.h intel/fs: try to rematerialize surface computation code 2023-05-30 06:36:37 +00:00
brw_fs_bank_conflicts.cpp intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_fs_builder.h intel/fs: keep track of new resource_intel information 2023-05-30 06:36:37 +00:00
brw_fs_cmod_propagation.cpp intel/fs: avoid cmod optimization on instruction with different write_mask 2023-01-24 07:35:42 +00:00
brw_fs_combine_constants.cpp intel/fs: Rework the loop of opt_combine_constants that collects constants 2023-04-03 21:50:06 +00:00
brw_fs_copy_propagation.cpp intel/fs: Use specialized version of regions_overlap in opt_copy_propagation 2023-04-06 19:07:50 +00:00
brw_fs_cse.cpp intel/compiler: Implement nir_intrinsic_last_invocation 2022-03-26 00:28:19 +00:00
brw_fs_dead_code_eliminate.cpp intel/compiler: Eliminate SHADER_OPCODE_UNTYPED_ATOMIC_FLOAT 2023-01-19 08:42:22 +00:00
brw_fs_generator.cpp intel/fs: enable extended bindless surface offset 2023-05-30 06:36:37 +00:00
brw_fs_live_variables.cpp intel/fs: White space fixes 2023-04-06 19:07:50 +00:00
brw_fs_live_variables.h
brw_fs_lower_pack.cpp intel/fs: Move packHalf2x16 handling to lower_pack() 2023-03-09 23:26:17 +00:00
brw_fs_lower_regioning.cpp intel/compiler/gfx12.5+: Lower 64-bit cluster_broadcast with 32-bit ops 2023-04-20 11:41:10 -07:00
brw_fs_nir.cpp intel/fs: try to rematerialize surface computation code 2023-05-30 06:36:37 +00:00
brw_fs_reg_allocate.cpp intel/fs: put scratch surface in the surface state heap 2022-11-19 14:58:58 +00:00
brw_fs_register_coalesce.cpp intel/fs: Fix register coalesce in presence of force_writemask_all copy source writes. 2023-03-17 03:05:24 -07:00
brw_fs_saturate_propagation.cpp brw: fix saturate propagation region overlap range 2022-12-09 00:39:05 +00:00
brw_fs_scoreboard.cpp Revert "intel/fs: Fix inferred_sync_pipe for F16TO32 opcodes" 2023-03-09 23:26:17 +00:00
brw_fs_sel_peephole.cpp intel/fs: sel.cond writes the flags on Gfx4 and Gfx5 2021-08-11 13:09:20 -07:00
brw_fs_thread_payload.cpp intel/fs: make tcs input_vertices dynamic 2023-05-24 18:32:07 +00:00
brw_fs_validate.cpp intel/fs: add MOV source count validation 2023-03-14 10:38:50 +00:00
brw_fs_visitor.cpp intel/fs: try to rematerialize surface computation code 2023-05-30 06:36:37 +00:00
brw_gfx_ver_enum.h intel/compiler: Fix brw_gfx_ver_enum.h to be a proper header file 2022-06-30 23:46:35 +00:00
brw_inst.h intel/fs: enable extended bindless surface offset 2023-05-30 06:36:37 +00:00
brw_interpolation_map.c
brw_ir.h intel/fs: enable extended bindless surface offset 2023-05-30 06:36:37 +00:00
brw_ir_allocator.h
brw_ir_analysis.h
brw_ir_fs.h intel/compiler: Micro optimize regions_overlap 2023-04-06 19:07:50 +00:00
brw_ir_performance.cpp intel/compiler: Use SHADER_OPCODE_SEND for PI messages 2023-02-06 09:12:17 +00:00
brw_ir_performance.h
brw_ir_vec4.h intel: fix typos found by codespell 2022-06-27 10:20:55 +00:00
brw_isa_info.h intel/compiler: Remove use of thread_local for opcode tables 2022-06-30 23:46:35 +00:00
brw_kernel.c nir: Drop unused argument from nir_ssa_dest_init_for_type 2023-05-17 23:46:16 +00:00
brw_kernel.h intel/compiler: fix singleton pointer coverity warning 2022-04-19 12:36:10 +03:00
brw_lower_logical_sends.cpp intel/fs: enable bindless sampler state offsets 2023-05-30 06:36:37 +00:00
brw_mesh.cpp intel: drop unused is_scalar function parameter in brw_nir_apply_key 2023-05-18 15:46:06 +02:00
brw_nir.c nir: use more nir_fmul_imm 2023-05-25 06:59:24 +00:00
brw_nir.h intel/fs: teach ubo range analysis pass about resource_intel 2023-05-30 06:36:37 +00:00
brw_nir_analyze_boolean_resolves.c intel: Drop some author comments and update Faith's name 2023-03-26 00:16:25 +00:00
brw_nir_analyze_ubo_ranges.c intel/fs: teach ubo range analysis pass about resource_intel 2023-05-30 06:36:37 +00:00
brw_nir_attribute_workarounds.c nir: use more nir_fmul_imm 2023-05-25 06:59:24 +00:00
brw_nir_blockify_uniform_loads.c intel/fs: optimize uniform SSBO & shared loads 2023-04-05 12:32:56 +00:00
brw_nir_clamp_image_1d_2d_array_sizes.c intel/compiler: use nir_shader_instructions_pass in brw_nir_clamp_image_1d_2d_array_sizes 2021-10-05 10:02:54 +00:00
brw_nir_clamp_per_vertex_loads.c intel/fs: make tcs input_vertices dynamic 2023-05-24 18:32:07 +00:00
brw_nir_lower_alpha_to_coverage.c intel/fs: make alpha_to_coverage a tristate 2023-02-06 09:12:18 +00:00
brw_nir_lower_conversions.c intel/fs: Use nir_type_convert instead of nir_type_conversion_op 2022-12-14 06:23:21 +00:00
brw_nir_lower_cs_intrinsics.c intel/compiler: optimize away local_inv_index and local_inv_id if workgroup size is 1 2022-12-13 13:00:49 +00:00
brw_nir_lower_intersection_shader.c intel/rt: Handle halts in any-hit shaders properly 2022-08-05 11:51:31 +00:00
brw_nir_lower_non_uniform_resource_intel.c intel/fs: add a pass to move resource_intel closer to user 2023-05-30 06:36:37 +00:00
brw_nir_lower_ray_queries.c nir: Make rq_load committed src an index 2023-05-14 17:28:40 +00:00
brw_nir_lower_rt_intrinsics.c intel/nir/rt: wire position fetch intrinsic 2023-05-04 11:25:41 +00:00
brw_nir_lower_shader_calls.c intel: Fixes -Werror,-Wbitwise-instead-of-logical for clang-15 in brw_nir_lower_shader_calls.c 2022-11-17 23:17:40 +00:00
brw_nir_lower_shading_rate_output.c intel: fix typos found by codespell 2022-06-27 10:20:55 +00:00
brw_nir_lower_storage_image.c nir: Drop unused name from nir_ssa_dest_init 2023-05-17 23:46:16 +00:00
brw_nir_opt_peephole_ffma.c nir: Drop unused name from nir_ssa_dest_init 2023-05-17 23:46:16 +00:00
brw_nir_opt_peephole_imul32x16.c nir: Drop unused name from nir_ssa_dest_init 2023-05-17 23:46:16 +00:00
brw_nir_rt.c intel: infer scalar'ness locally for brw_postprocess_nir 2023-05-18 15:46:06 +02:00
brw_nir_rt.h anv: support VK_PIPELINE_CREATE_RAY_TRACING_SKIP_* 2022-10-20 00:03:55 +00:00
brw_nir_rt_builder.h intel/nir/rt: wire position fetch intrinsic 2023-05-04 11:25:41 +00:00
brw_nir_tcs_workarounds.c intel/compiler: use nir_metadata_none instead of its value 2021-10-05 10:02:54 +00:00
brw_nir_trig_workarounds.py driconf: Add a limit_trig_input_range option 2022-05-13 06:47:53 +00:00
brw_packed_float.c
brw_predicated_break.cpp intel/compiler: Don't predicate a WHILE if there is a CONT 2021-12-08 14:56:32 -08:00
brw_prim.h intel/compiler: Split 3DPRIM_* defines out to a separate header. 2022-06-30 23:46:35 +00:00
brw_private.h intel/compiler: Use SIMD selection helpers in compile_single_bs() 2022-11-15 04:55:18 +00:00
brw_reg.h intel/compiler: Add a few more brw_ud* helpers 2022-09-13 01:44:24 +00:00
brw_reg_type.c
brw_reg_type.h intel/compiler: Move type_is_unsigned_int to brw_reg_type.h 2021-08-30 14:00:14 -07:00
brw_rt.h intel/rt: Fix L3 bank performance bottlenecks due to SW stack stride alignment. 2023-02-26 11:48:33 -08:00
brw_schedule_instructions.cpp intel/fs: reuse descriptor helper 2023-05-30 06:36:36 +00:00
brw_shader.cpp intel: drop unused is_scalar function parameter in brw_nir_apply_key 2023-05-18 15:46:06 +02:00
brw_shader.h intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_simd_selection.cpp intel/compiler: fine-grained control of dispatch widths 2023-01-27 11:00:41 +00:00
brw_vec4.cpp intel: drop unused is_scalar function parameter in brw_nir_apply_key 2023-05-18 15:46:06 +02:00
brw_vec4.h i965/vec4: Implement uclz in the vec4 backend 2023-03-17 09:01:18 +00:00
brw_vec4_builder.h
brw_vec4_cmod_propagation.cpp intel: fixes -Werror,-Wunused-but-set-variable for clang-15 2022-11-17 23:17:40 +00:00
brw_vec4_copy_propagation.cpp intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_vec4_cse.cpp intel/compiler: Rename vec4 state URB opcodes to have VEC4_ prefix 2022-07-08 19:45:34 +00:00
brw_vec4_dead_code_eliminate.cpp intel/vec4: sel.cond writes the flags on Gfx4 and Gfx5 2021-08-11 13:09:32 -07:00
brw_vec4_generator.cpp intel/compiler: report max dispatch width statistic 2023-03-21 11:53:04 +00:00
brw_vec4_gs_nir.cpp intel/compiler: Use named NIR intrinsic const index accessors 2022-08-16 05:44:30 +00:00
brw_vec4_gs_visitor.cpp intel: drop unused is_scalar function parameter in brw_nir_apply_key 2023-05-18 15:46:06 +02:00
brw_vec4_gs_visitor.h intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
brw_vec4_live_variables.cpp intel/vec4: sel.cond writes the flags on Gfx4 and Gfx5 2021-08-11 13:09:32 -07:00
brw_vec4_live_variables.h intel/vec4: sel.cond writes the flags on Gfx4 and Gfx5 2021-08-11 13:09:32 -07:00
brw_vec4_nir.cpp intel: switch over to unified atomics 2023-05-15 16:32:21 +00:00
brw_vec4_reg_allocate.cpp intel/compiler: Don't create vec4 reg-set for gen8+ 2022-07-14 17:49:01 +00:00
brw_vec4_surface_builder.cpp intel: move away from booleans to identify platforms 2021-11-08 16:48:06 +00:00
brw_vec4_surface_builder.h
brw_vec4_tcs.cpp intel/fs: make tcs input_vertices dynamic 2023-05-24 18:32:07 +00:00
brw_vec4_tcs.h intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
brw_vec4_tes.cpp intel/compiler: Rename vec4 state URB opcodes to have VEC4_ prefix 2022-07-08 19:45:34 +00:00
brw_vec4_tes.h intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
brw_vec4_visitor.cpp intel/vec4: force exec_all on float control instruction 2023-04-14 10:54:01 +00:00
brw_vec4_vs.h intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
brw_vec4_vs_visitor.cpp intel/compiler: Rename vec4 state URB opcodes to have VEC4_ prefix 2022-07-08 19:45:34 +00:00
brw_vue_map.c intel/compiler: Store the number of position slots in the VUE map 2022-08-31 02:00:18 +00:00
gfx6_gs_visitor.cpp intel/compiler: Rename vec4 state URB opcodes to have VEC4_ prefix 2022-07-08 19:45:34 +00:00
gfx6_gs_visitor.h intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
intel_clc.c intel/compiler: Fix dynarray usage in intel_clc 2022-11-14 19:15:05 +00:00
meson.build intel/fs: add a pass to move resource_intel closer to user 2023-05-30 06:36:37 +00:00
test_eu_compact.cpp intel/compiler: Fixes [-Wdeprecated-declarations] in test_eu_compact.cpp 2022-08-23 15:19:16 +00:00
test_eu_validate.cpp intel/eu/validate: Validate integer multiplication source size restrictions 2022-11-09 21:34:26 +00:00
test_fs_cmod_propagation.cpp intel/fs: report max register pressure in shader stats 2023-03-08 13:37:07 +00:00
test_fs_copy_propagation.cpp intel/fs: Don't copy propagate from saturate to sel 2023-03-29 23:48:19 +00:00
test_fs_saturate_propagation.cpp intel/fs: report max register pressure in shader stats 2023-03-08 13:37:07 +00:00
test_fs_scoreboard.cpp intel/fs: report max register pressure in shader stats 2023-03-08 13:37:07 +00:00
test_simd_selection.cpp intel/compiler: fine-grained control of dispatch widths 2023-01-27 11:00:41 +00:00
test_vec4_cmod_propagation.cpp intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
test_vec4_copy_propagation.cpp intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
test_vec4_dead_code_eliminate.cpp intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
test_vec4_register_coalesce.cpp intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
test_vf_float_conversions.cpp