mesa/src/intel/compiler at d2c0147228f11d0eb637cb286d7839b2200303c0 - fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 20:10:14 +01:00


Lionel Landwerlin 04777171e0 intel/fs: try to rematerialize surface computation code

This helps a lot with accessing surface handles in control flow.

Our resource_intel intrinsic has a non_uniform flag; when it is set, we cannot apply this optimization. But in uniform cases, this is a massive win: we drop all kinds of pipeline stalls due to find_live_channel, and we reduce register pressure by doing the surface handle computation in a single GRF (instead of 2 or 4).

There are some regressions in max dispatch width, but I think those are only on SIMD32 and are due to the current heuristic disabling it after a throughput comparison with SIMD16. We know this heuristic is not perfect; it should probably be updated in another change.

Here are some stats (all titles seem to have similar gains):

PERCENTAGE DELTAS           Shaders  Instrs   Cycles   Subgroup size  Send messages  Spill count  Fill count  Scratch Memory Size  Max live registers  Max dispatch width
red_dead_redemption2        5860     -36.80%  -5.67%   +0.77%         +0.06%         -81.26%      -79.16%     -70.62%              -8.63%              -6.93%
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
All affected                4716     -37.29%  -5.67%   +0.95%         +0.07%         -81.26%      -79.16%     -70.62%              -9.15%              -8.47%
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                       5860     -36.80%  -5.67%   +0.77%         +0.06%         -81.26%      -79.16%     -70.62%              -8.63%              -6.93%

PERCENTAGE DELTAS           Shaders  Instrs   Cycles   Subgroup size  Send messages  Spill count  Fill count  Scratch Memory Size  Max live registers  Max dispatch width
rise_of_the_tomb_raider_g2  12010    -37.19%  -22.12%  +0.01%         +0.00%         -99.01%      -99.14%     -98.65%              -7.62%              -4.96%
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
All affected                11732    -37.27%  -22.14%  +0.01%         +0.00%         -99.01%      -99.14%     -98.65%              -7.67%              -5.11%
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                       12010    -37.19%  -22.12%  +0.01%         +0.00%         -99.01%      -99.14%     -98.65%              -7.62%              -4.96%

PERCENTAGE DELTAS           Shaders  Instrs   Cycles   Spill count  Fill count  Scratch Memory Size  Max live registers  Max dispatch width
total_war_warhammer2        462      -27.45%  -12.42%  -82.35%      -88.46%     -66.67%              -5.52%              -5.62%
-------------------------------------------------------------------------------------------------------------------------------------------
All affected                335      -28.31%  -12.77%  -82.35%      -88.46%     -66.67%              -6.25%              -7.24%
-------------------------------------------------------------------------------------------------------------------------------------------
Total                       462      -27.45%  -12.42%  -82.35%      -88.46%     -66.67%              -5.52%              -5.62%

PERCENTAGE DELTAS           Shaders  Instrs   Cycles   Subgroup size  Send messages  Spill count  Fill count  Scratch Memory Size  Max live registers  Max dispatch width
witcher_3_dxvk_g2           1049     -36.94%  -57.82%  +0.06%         +0.01%         -98.52%      -97.29%     -98.10%              -7.81%              -1.00%
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
All affected                693      -41.93%  -58.45%  +0.09%         +0.01%         -98.52%      -97.29%     -98.10%              -10.25%             -1.33%
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                       1049     -36.94%  -57.82%  +0.06%         +0.01%         -98.52%      -97.29%     -98.10%              -7.81%              -1.00%

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:37 +00:00
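To make the rematerialization idea in the commit message concrete, here is a minimal C sketch on a hypothetical toy IR. The `toy_*` names, the instruction set and the evaluator are assumptions for illustration only; this is not Mesa's `fs_inst`-based implementation. A value whose whole definition chain is uniform is recomputed next to its user instead of being kept live across control flow, while a chain that touches a non-uniform source (standing in for a `non_uniform` resource index) is rejected. The footprint helper assumes 32-byte GRFs and 32-bit values, which is consistent with "a single GRF (instead of 2 or 4)" for SIMD16 and SIMD32.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy IR for illustration only; NOT Mesa's fs_inst/fs_reg.
 * Each instruction defines one value; operands refer to earlier
 * instructions by index (-1 = unused). */
enum toy_op { TOY_UNIFORM, TOY_IMM, TOY_ADD, TOY_SHL, TOY_NONUNIFORM };

struct toy_inst {
   enum toy_op op;
   int src[2];
   unsigned imm; /* immediate for TOY_IMM, uniform slot for TOY_UNIFORM */
};

/* A value may be recomputed at its use site only if its whole def chain is
 * uniform: constants, uniforms and ALU ops over them. Anything derived from
 * a non-uniform source is rejected. */
static bool toy_can_remat(const struct toy_inst *prog, int v)
{
   const struct toy_inst *i = &prog[v];
   switch (i->op) {
   case TOY_UNIFORM:
   case TOY_IMM:
      return true;
   case TOY_ADD:
   case TOY_SHL:
      return toy_can_remat(prog, i->src[0]) && toy_can_remat(prog, i->src[1]);
   default:
      return false;
   }
}

/* Re-emit the def chain of v at the end of out[] (the "use site"), so the
 * original value no longer has to stay live across control flow.
 * Returns the index of the recomputed value in out[]. */
static int toy_remat(const struct toy_inst *prog, int v,
                     struct toy_inst *out, int *n, int cap)
{
   struct toy_inst copy = prog[v];
   for (int s = 0; s < 2; s++) {
      if (copy.src[s] >= 0)
         copy.src[s] = toy_remat(prog, copy.src[s], out, n, cap);
   }
   assert(*n < cap);
   out[*n] = copy;
   return (*n)++;
}

/* Evaluate value v of a toy program given the uniform slots. */
static unsigned toy_eval(const struct toy_inst *prog, int v,
                         const unsigned *uniforms)
{
   const struct toy_inst *i = &prog[v];
   switch (i->op) {
   case TOY_UNIFORM: return uniforms[i->imm];
   case TOY_IMM:     return i->imm;
   case TOY_ADD:     return toy_eval(prog, i->src[0], uniforms) +
                            toy_eval(prog, i->src[1], uniforms);
   case TOY_SHL:     return toy_eval(prog, i->src[0], uniforms) <<
                            toy_eval(prog, i->src[1], uniforms);
   default:          return 0; /* non-uniform values are not modeled */
   }
}

/* GRFs needed for a 32-bit value, assuming 32-byte GRFs: a uniform value
 * fits in one register, a per-channel one needs simd_width * 4 / 32. */
static unsigned toy_grf_footprint(unsigned simd_width, bool uniform)
{
   return uniform ? 1 : (simd_width * 4 + 31) / 32;
}
```

Because `toy_remat` re-emits the chain as a tree, shared subexpressions get duplicated. That is only a good trade when the chain is short and cheap, which is presumably why this kind of optimization targets simple surface-handle arithmetic rather than arbitrary code.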
brw_cfg.cpp	intel/fs: Add physical fall-through CFG edge for unconditional BREAK instruction.	2021-12-21 00:43:29 +00:00
brw_cfg.h	intel/compiler: Add cfg_t::adjust_block_ips() method	2021-07-14 09:56:59 -07:00
brw_clip.h
brw_clip_line.c	intel/compiler: Split 3DPRIM_* defines out to a separate header.	2022-06-30 23:46:35 +00:00
brw_clip_point.c
brw_clip_tri.c	intel/compiler: Split 3DPRIM_* defines out to a separate header.	2022-06-30 23:46:35 +00:00
brw_clip_unfilled.c	intel/compiler: Split 3DPRIM_* defines out to a separate header.	2022-06-30 23:46:35 +00:00
brw_clip_util.c	intel: move away from booleans to identify platforms	2021-11-08 16:48:06 +00:00
brw_compile_clip.c	intel/compiler: Introduce a new brw_isa_info structure	2022-06-30 23:46:35 +00:00
brw_compile_ff_gs.c	intel/compiler: Introduce a new brw_isa_info structure	2022-06-30 23:46:35 +00:00
brw_compile_sf.c	intel/compiler: Introduce a new brw_isa_info structure	2022-06-30 23:46:35 +00:00
brw_compiler.c	intel/compiler: Fix 64-bit ufind_msb, find_lsb, and bit_count	2023-05-19 22:44:37 +00:00
brw_compiler.h	intel/fs: enable bindless sampler state offsets	2023-05-30 06:36:37 +00:00
brw_dead_control_flow.cpp
brw_dead_control_flow.h
brw_debug_recompile.c	intel/compiler: Delete sampler key handling for planar format stuff	2022-12-09 10:18:25 +00:00
brw_disasm.c	intel/fs: enable extended bindless surface offset	2023-05-30 06:36:37 +00:00
brw_disasm_info.c	intel/eu: Handle compaction when inserting validation errors	2022-07-28 21:31:45 +00:00
brw_disasm_info.h	intel/eu: Handle compaction when inserting validation errors	2022-07-28 21:31:45 +00:00
brw_eu.c	intel/compiler: export brw_num_sources_from_inst	2022-12-10 03:59:19 +00:00
brw_eu.h	intel/fs: enable extended bindless surface offset	2023-05-30 06:36:37 +00:00
brw_eu_compact.c	intel/compiler: don't allocate compaction arrays on the stack	2022-10-28 07:10:58 +00:00
brw_eu_defines.h	intel/fs: enable get_buffer_size on bindless heap	2023-05-30 06:36:37 +00:00
brw_eu_emit.c	intel/fs: enable extended bindless surface offset	2023-05-30 06:36:37 +00:00
brw_eu_util.c
brw_eu_validate.c	intel/eu/validate: Check predication and cmod for SEL, CMP, and CMPN	2023-01-09 19:15:19 +00:00
brw_fs.cpp	intel/fs: enable UBO accesses through bindless heap	2023-05-30 06:36:37 +00:00
brw_fs.h	intel/fs: try to rematerialize surface computation code	2023-05-30 06:36:37 +00:00
brw_fs_bank_conflicts.cpp	intel/compiler: Introduce a new brw_isa_info structure	2022-06-30 23:46:35 +00:00
brw_fs_builder.h	intel/fs: keep track of new resource_intel information	2023-05-30 06:36:37 +00:00
brw_fs_cmod_propagation.cpp	intel/fs: avoid cmod optimization on instruction with different write_mask	2023-01-24 07:35:42 +00:00
brw_fs_combine_constants.cpp	intel/fs: Rework the loop of opt_combine_constants that collects constants	2023-04-03 21:50:06 +00:00
brw_fs_copy_propagation.cpp	intel/fs: Use specialized version of regions_overlap in opt_copy_propagation	2023-04-06 19:07:50 +00:00
brw_fs_cse.cpp	intel/compiler: Implement nir_intrinsic_last_invocation	2022-03-26 00:28:19 +00:00
brw_fs_dead_code_eliminate.cpp	intel/compiler: Eliminate SHADER_OPCODE_UNTYPED_ATOMIC_FLOAT	2023-01-19 08:42:22 +00:00
brw_fs_generator.cpp	intel/fs: enable extended bindless surface offset	2023-05-30 06:36:37 +00:00
brw_fs_live_variables.cpp	intel/fs: White space fixes	2023-04-06 19:07:50 +00:00
brw_fs_live_variables.h
brw_fs_lower_pack.cpp	intel/fs: Move packHalf2x16 handling to lower_pack()	2023-03-09 23:26:17 +00:00
brw_fs_lower_regioning.cpp	intel/compiler/gfx12.5+: Lower 64-bit cluster_broadcast with 32-bit ops	2023-04-20 11:41:10 -07:00
brw_fs_nir.cpp	intel/fs: try to rematerialize surface computation code	2023-05-30 06:36:37 +00:00
brw_fs_reg_allocate.cpp	intel/fs: put scratch surface in the surface state heap	2022-11-19 14:58:58 +00:00
brw_fs_register_coalesce.cpp	intel/fs: Fix register coalesce in presence of force_writemask_all copy source writes.	2023-03-17 03:05:24 -07:00
brw_fs_saturate_propagation.cpp	brw: fix saturate propagation region overlap range	2022-12-09 00:39:05 +00:00
brw_fs_scoreboard.cpp	Revert "intel/fs: Fix inferred_sync_pipe for F16TO32 opcodes"	2023-03-09 23:26:17 +00:00
brw_fs_sel_peephole.cpp	intel/fs: sel.cond writes the flags on Gfx4 and Gfx5	2021-08-11 13:09:20 -07:00
brw_fs_thread_payload.cpp	intel/fs: make tcs input_vertices dynamic	2023-05-24 18:32:07 +00:00
brw_fs_validate.cpp	intel/fs: add MOV source count validation	2023-03-14 10:38:50 +00:00
brw_fs_visitor.cpp	intel/fs: try to rematerialize surface computation code	2023-05-30 06:36:37 +00:00
brw_gfx_ver_enum.h	intel/compiler: Fix brw_gfx_ver_enum.h to be a proper header file	2022-06-30 23:46:35 +00:00
brw_inst.h	intel/fs: enable extended bindless surface offset	2023-05-30 06:36:37 +00:00
brw_interpolation_map.c
brw_ir.h	intel/fs: enable extended bindless surface offset	2023-05-30 06:36:37 +00:00
brw_ir_allocator.h
brw_ir_analysis.h
brw_ir_fs.h	intel/compiler: Micro optimize regions_overlap	2023-04-06 19:07:50 +00:00
brw_ir_performance.cpp	intel/compiler: Use SHADER_OPCODE_SEND for PI messages	2023-02-06 09:12:17 +00:00
brw_ir_performance.h
brw_ir_vec4.h	intel: fix typos found by codespell	2022-06-27 10:20:55 +00:00
brw_isa_info.h	intel/compiler: Remove use of thread_local for opcode tables	2022-06-30 23:46:35 +00:00
brw_kernel.c	nir: Drop unused argument from nir_ssa_dest_init_for_type	2023-05-17 23:46:16 +00:00
brw_kernel.h	intel/compiler: fix singleton pointer coverity warning	2022-04-19 12:36:10 +03:00
brw_lower_logical_sends.cpp	intel/fs: enable bindless sampler state offsets	2023-05-30 06:36:37 +00:00
brw_mesh.cpp	intel: drop unused is_scalar function parameter in brw_nir_apply_key	2023-05-18 15:46:06 +02:00
brw_nir.c	nir: use more nir_fmul_imm	2023-05-25 06:59:24 +00:00
brw_nir.h	intel/fs: teach ubo range analysis pass about resource_intel	2023-05-30 06:36:37 +00:00
brw_nir_analyze_boolean_resolves.c	intel: Drop some author comments and update Faith's name	2023-03-26 00:16:25 +00:00
brw_nir_analyze_ubo_ranges.c	intel/fs: teach ubo range analysis pass about resource_intel	2023-05-30 06:36:37 +00:00
brw_nir_attribute_workarounds.c	nir: use more nir_fmul_imm	2023-05-25 06:59:24 +00:00
brw_nir_blockify_uniform_loads.c	intel/fs: optimize uniform SSBO & shared loads	2023-04-05 12:32:56 +00:00
brw_nir_clamp_image_1d_2d_array_sizes.c	intel/compiler: use nir_shader_instructions_pass in brw_nir_clamp_image_1d_2d_array_sizes	2021-10-05 10:02:54 +00:00
brw_nir_clamp_per_vertex_loads.c	intel/fs: make tcs input_vertices dynamic	2023-05-24 18:32:07 +00:00
brw_nir_lower_alpha_to_coverage.c	intel/fs: make alpha_to_coverage a tristate	2023-02-06 09:12:18 +00:00
brw_nir_lower_conversions.c	intel/fs: Use nir_type_convert instead of nir_type_conversion_op	2022-12-14 06:23:21 +00:00
brw_nir_lower_cs_intrinsics.c	intel/compiler: optimize away local_inv_index and local_inv_id if workgroup size is 1	2022-12-13 13:00:49 +00:00
brw_nir_lower_intersection_shader.c	intel/rt: Handle halts in any-hit shaders properly	2022-08-05 11:51:31 +00:00
brw_nir_lower_non_uniform_resource_intel.c	intel/fs: add a pass to move resource_intel closer to user	2023-05-30 06:36:37 +00:00
brw_nir_lower_ray_queries.c	nir: Make rq_load committed src an index	2023-05-14 17:28:40 +00:00
brw_nir_lower_rt_intrinsics.c	intel/nir/rt: wire position fetch intrinsic	2023-05-04 11:25:41 +00:00
brw_nir_lower_shader_calls.c	intel: Fixes -Werror,-Wbitwise-instead-of-logical for clang-15 in brw_nir_lower_shader_calls.c	2022-11-17 23:17:40 +00:00
brw_nir_lower_shading_rate_output.c	intel: fix typos found by codespell	2022-06-27 10:20:55 +00:00
brw_nir_lower_storage_image.c	nir: Drop unused name from nir_ssa_dest_init	2023-05-17 23:46:16 +00:00
brw_nir_opt_peephole_ffma.c	nir: Drop unused name from nir_ssa_dest_init	2023-05-17 23:46:16 +00:00
brw_nir_opt_peephole_imul32x16.c	nir: Drop unused name from nir_ssa_dest_init	2023-05-17 23:46:16 +00:00
brw_nir_rt.c	intel: infer scalar'ness locally for brw_postprocess_nir	2023-05-18 15:46:06 +02:00
brw_nir_rt.h	anv: support VK_PIPELINE_CREATE_RAY_TRACING_SKIP_*	2022-10-20 00:03:55 +00:00
brw_nir_rt_builder.h	intel/nir/rt: wire position fetch intrinsic	2023-05-04 11:25:41 +00:00
brw_nir_tcs_workarounds.c	intel/compiler: use nir_metadata_none instead of its value	2021-10-05 10:02:54 +00:00
brw_nir_trig_workarounds.py	driconf: Add a limit_trig_input_range option	2022-05-13 06:47:53 +00:00
brw_packed_float.c
brw_predicated_break.cpp	intel/compiler: Don't predicate a WHILE if there is a CONT	2021-12-08 14:56:32 -08:00
brw_prim.h	intel/compiler: Split 3DPRIM_* defines out to a separate header.	2022-06-30 23:46:35 +00:00
brw_private.h	intel/compiler: Use SIMD selection helpers in compile_single_bs()	2022-11-15 04:55:18 +00:00
brw_reg.h	intel/compiler: Add a few more brw_ud* helpers	2022-09-13 01:44:24 +00:00
brw_reg_type.c
brw_reg_type.h	intel/compiler: Move type_is_unsigned_int to brw_reg_type.h	2021-08-30 14:00:14 -07:00
brw_rt.h	intel/rt: Fix L3 bank performance bottlenecks due to SW stack stride alignment.	2023-02-26 11:48:33 -08:00
brw_schedule_instructions.cpp	intel/fs: reuse descriptor helper	2023-05-30 06:36:36 +00:00
brw_shader.cpp	intel: drop unused is_scalar function parameter in brw_nir_apply_key	2023-05-18 15:46:06 +02:00
brw_shader.h	intel/compiler: Introduce a new brw_isa_info structure	2022-06-30 23:46:35 +00:00
brw_simd_selection.cpp	intel/compiler: fine-grained control of dispatch widths	2023-01-27 11:00:41 +00:00
brw_vec4.cpp	intel: drop unused is_scalar function parameter in brw_nir_apply_key	2023-05-18 15:46:06 +02:00
brw_vec4.h	i965/vec4: Implement uclz in the vec4 backend	2023-03-17 09:01:18 +00:00
brw_vec4_builder.h
brw_vec4_cmod_propagation.cpp	intel: fixes -Werror,-Wunused-but-set-variable for clang-15	2022-11-17 23:17:40 +00:00
brw_vec4_copy_propagation.cpp	intel/compiler: Introduce a new brw_isa_info structure	2022-06-30 23:46:35 +00:00
brw_vec4_cse.cpp	intel/compiler: Rename vec4 state URB opcodes to have VEC4_ prefix	2022-07-08 19:45:34 +00:00
brw_vec4_dead_code_eliminate.cpp	intel/vec4: sel.cond writes the flags on Gfx4 and Gfx5	2021-08-11 13:09:32 -07:00
brw_vec4_generator.cpp	intel/compiler: report max dispatch width statistic	2023-03-21 11:53:04 +00:00
brw_vec4_gs_nir.cpp	intel/compiler: Use named NIR intrinsic const index accessors	2022-08-16 05:44:30 +00:00
brw_vec4_gs_visitor.cpp	intel: drop unused is_scalar function parameter in brw_nir_apply_key	2023-05-18 15:46:06 +02:00
brw_vec4_gs_visitor.h	intel/fs,vec4: Drop support for shader time	2021-12-10 21:20:47 +00:00
brw_vec4_live_variables.cpp	intel/vec4: sel.cond writes the flags on Gfx4 and Gfx5	2021-08-11 13:09:32 -07:00
brw_vec4_live_variables.h	intel/vec4: sel.cond writes the flags on Gfx4 and Gfx5	2021-08-11 13:09:32 -07:00
brw_vec4_nir.cpp	intel: switch over to unified atomics	2023-05-15 16:32:21 +00:00
brw_vec4_reg_allocate.cpp	intel/compiler: Don't create vec4 reg-set for gen8+	2022-07-14 17:49:01 +00:00
brw_vec4_surface_builder.cpp	intel: move away from booleans to identify platforms	2021-11-08 16:48:06 +00:00
brw_vec4_surface_builder.h
brw_vec4_tcs.cpp	intel/fs: make tcs input_vertices dynamic	2023-05-24 18:32:07 +00:00
brw_vec4_tcs.h	intel/fs,vec4: Drop support for shader time	2021-12-10 21:20:47 +00:00
brw_vec4_tes.cpp	intel/compiler: Rename vec4 state URB opcodes to have VEC4_ prefix	2022-07-08 19:45:34 +00:00
brw_vec4_tes.h	intel/fs,vec4: Drop support for shader time	2021-12-10 21:20:47 +00:00
brw_vec4_visitor.cpp	intel/vec4: force exec_all on float control instruction	2023-04-14 10:54:01 +00:00
brw_vec4_vs.h	intel/fs,vec4: Drop support for shader time	2021-12-10 21:20:47 +00:00
brw_vec4_vs_visitor.cpp	intel/compiler: Rename vec4 state URB opcodes to have VEC4_ prefix	2022-07-08 19:45:34 +00:00
brw_vue_map.c	intel/compiler: Store the number of position slots in the VUE map	2022-08-31 02:00:18 +00:00
gfx6_gs_visitor.cpp	intel/compiler: Rename vec4 state URB opcodes to have VEC4_ prefix	2022-07-08 19:45:34 +00:00
gfx6_gs_visitor.h	intel/fs,vec4: Drop support for shader time	2021-12-10 21:20:47 +00:00
intel_clc.c	intel/compiler: Fix dynarray usage in intel_clc	2022-11-14 19:15:05 +00:00
meson.build	intel/fs: add a pass to move resource_intel closer to user	2023-05-30 06:36:37 +00:00
test_eu_compact.cpp	intel/compiler: Fixes [-Wdeprecated-declarations] in test_eu_compact.cpp	2022-08-23 15:19:16 +00:00
test_eu_validate.cpp	intel/eu/validate: Validate integer multiplication source size restrictions	2022-11-09 21:34:26 +00:00
test_fs_cmod_propagation.cpp	intel/fs: report max register pressure in shader stats	2023-03-08 13:37:07 +00:00
test_fs_copy_propagation.cpp	intel/fs: Don't copy propagate from saturate to sel	2023-03-29 23:48:19 +00:00
test_fs_saturate_propagation.cpp	intel/fs: report max register pressure in shader stats	2023-03-08 13:37:07 +00:00
test_fs_scoreboard.cpp	intel/fs: report max register pressure in shader stats	2023-03-08 13:37:07 +00:00
test_simd_selection.cpp	intel/compiler: fine-grained control of dispatch widths	2023-01-27 11:00:41 +00:00
test_vec4_cmod_propagation.cpp	intel/fs,vec4: Drop support for shader time	2021-12-10 21:20:47 +00:00
test_vec4_copy_propagation.cpp	intel/fs,vec4: Drop support for shader time	2021-12-10 21:20:47 +00:00
test_vec4_dead_code_eliminate.cpp	intel/fs,vec4: Drop support for shader time	2021-12-10 21:20:47 +00:00
test_vec4_register_coalesce.cpp	intel/fs,vec4: Drop support for shader time	2021-12-10 21:20:47 +00:00
test_vf_float_conversions.cpp