mesa/src/intel/compiler
Francisco Jerez 4420251947 intel/rt: Fix L3 bank performance bottlenecks due to SW stack stride alignment.
Power-of-two SW stack sizes are prone to causing collisions in the
hashing function used by the L3 to map memory addresses to banks,
which can cause stack accesses from most DSSes to bottleneck on a
single L3 bank.  Fix it by padding the SW stack stride by a single
cacheline if it was a power of two.  This has been reported by Felix
DeGrood to improve Quake2 RTX performance by ~30% on DG2-512 in
combination with other RT patches Lionel Landwerlin has been working
on.
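
As a rough illustration of the padding rule described above (not the actual code from brw_rt.h), the following sketch bumps a stack stride by one cacheline when it is an exact power of two. The 64-byte cacheline size and the names `CACHELINE_BYTES`, `is_power_of_two` and `pad_sw_stack_stride` are assumptions made for this example only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed cacheline size in bytes; hypothetical constant for this sketch. */
#define CACHELINE_BYTES 64u

/* True if v has exactly one bit set (zero is not a power of two). */
static bool is_power_of_two(uint32_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Hypothetical helper: return the stride unchanged unless it is a power
 * of two, in which case add one cacheline so that consecutive stacks no
 * longer start at power-of-two-aligned offsets from each other.
 */
static uint32_t pad_sw_stack_stride(uint32_t stride_bytes)
{
    if (!is_power_of_two(stride_bytes))
        return stride_bytes;

    /* Guard against wrap-around for very large strides. */
    assert(stride_bytes <= UINT32_MAX - CACHELINE_BYTES);
    return stride_bytes + CACHELINE_BYTES;
}
```

Under these assumptions a 1024-byte stride would become 1088 bytes, while a 960-byte stride would be left untouched. The cost is one extra cacheline of memory per stack in the power-of-two case.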

Many thanks to Felix DeGrood for doing much of the legwork and
providing several iterations of Q2RTX performance counter dumps which
eventually prompted me to consider the hash collision theory and
motivated this patch, and for providing additional performance counter
dumps confirming that there is no longer an appreciable imbalance in
traffic across L3 banks after this change.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21461>
2023-02-26 11:48:33 -08:00
brw_cfg.cpp intel/fs: Add physical fall-through CFG edge for unconditional BREAK instruction. 2021-12-21 00:43:29 +00:00
brw_cfg.h intel/compiler: Add cfg_t::adjust_block_ips() method 2021-07-14 09:56:59 -07:00
brw_clip.h
brw_clip_line.c intel/compiler: Split 3DPRIM_* defines out to a separate header. 2022-06-30 23:46:35 +00:00
brw_clip_point.c
brw_clip_tri.c intel/compiler: Split 3DPRIM_* defines out to a separate header. 2022-06-30 23:46:35 +00:00
brw_clip_unfilled.c intel/compiler: Split 3DPRIM_* defines out to a separate header. 2022-06-30 23:46:35 +00:00
brw_clip_util.c intel: move away from booleans to identify platforms 2021-11-08 16:48:06 +00:00
brw_compile_clip.c intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_compile_ff_gs.c intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_compile_sf.c intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_compiler.c intel/compiler: fine-grained control of dispatch widths 2023-01-27 11:00:41 +00:00
brw_compiler.h intel/compiler/mesh: use U888X packed index format 2023-02-10 21:03:33 +00:00
brw_dead_control_flow.cpp
brw_dead_control_flow.h
brw_debug_recompile.c intel/compiler: Delete sampler key handling for planar format stuff 2022-12-09 10:18:25 +00:00
brw_disasm.c intel/disasm/gfx12+: Fix print out of non-existing condmod field with 64-bit immediate. 2023-01-19 06:14:03 +00:00
brw_disasm_info.c intel/eu: Handle compaction when inserting validation errors 2022-07-28 21:31:45 +00:00
brw_disasm_info.h intel/eu: Handle compaction when inserting validation errors 2022-07-28 21:31:45 +00:00
brw_eu.c intel/compiler: export brw_num_sources_from_inst 2022-12-10 03:59:19 +00:00
brw_eu.h intel/compiler: Use SHADER_OPCODE_SEND for PI messages 2023-02-06 09:12:17 +00:00
brw_eu_compact.c intel/compiler: don't allocate compaction arrays on the stack 2022-10-28 07:10:58 +00:00
brw_eu_defines.h intel/fs: drop FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GFX7 2023-01-26 11:26:53 +00:00
brw_eu_emit.c intel/eu/gfx8-9: Fix execution with all channels disabled due to HW bug #220160235. 2023-02-07 21:37:12 +00:00
brw_eu_util.c intel: Rename genx keyword to gfxx in source files 2021-04-02 18:33:07 +00:00
brw_eu_validate.c intel/eu/validate: Check predication and cmod for SEL, CMP, and CMPN 2023-01-09 19:15:19 +00:00
brw_fs.cpp intel/fs: make alpha_to_coverage a tristate 2023-02-06 09:12:18 +00:00
brw_fs.h intel/compiler: remove unused field from fs_thread_payload 2023-02-23 08:04:24 +00:00
brw_fs_bank_conflicts.cpp intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_fs_builder.h intel/fs: reduce liveness of variables in lowering passes 2022-10-27 21:05:00 +00:00
brw_fs_cmod_propagation.cpp intel/fs: avoid cmod optimization on instruction with different write_mask 2023-01-24 07:35:42 +00:00
brw_fs_combine_constants.cpp intel/compiler: Fix missing break in switch 2021-07-22 23:38:04 +00:00
brw_fs_copy_propagation.cpp intel/fs: Fix src and dst types of LOAD_PAYLOAD ACP entries during copy propagation. 2023-01-25 22:22:12 +00:00
brw_fs_cse.cpp intel/compiler: Implement nir_intrinsic_last_invocation 2022-03-26 00:28:19 +00:00
brw_fs_dead_code_eliminate.cpp intel/compiler: Eliminate SHADER_OPCODE_UNTYPED_ATOMIC_FLOAT 2023-01-19 08:42:22 +00:00
brw_fs_generator.cpp intel/compiler: Use SHADER_OPCODE_SEND for PI messages 2023-02-06 09:12:17 +00:00
brw_fs_live_variables.cpp intel/fs: sel.cond writes the flags on Gfx4 and Gfx5 2021-08-11 13:09:20 -07:00
brw_fs_live_variables.h intel: Rename gen_device prefix to intel_device 2021-04-20 20:06:33 +00:00
brw_fs_lower_pack.cpp intel/fs: reduce liveness of variables in lowering passes 2022-10-27 21:05:00 +00:00
brw_fs_lower_regioning.cpp intel: add devinfo->has_64bit_float_via_math_pipe 2022-12-10 03:59:19 +00:00
brw_fs_nir.cpp nir: add assertions that loops don't have a Continue Construct 2023-02-21 10:41:11 +00:00
brw_fs_reg_allocate.cpp intel/fs: put scratch surface in the surface state heap 2022-11-19 14:58:58 +00:00
brw_fs_register_coalesce.cpp intel/compiler: Update block IPs once in register_coalesce 2021-07-14 09:57:04 -07:00
brw_fs_saturate_propagation.cpp brw: fix saturate propagation region overlap range 2022-12-09 00:39:05 +00:00
brw_fs_scoreboard.cpp intel/fs/gfx12+: Drop redundant handling of SHADER_OPCODE_BROADCAST in exec pipe inference. 2023-01-19 06:14:03 +00:00
brw_fs_sel_peephole.cpp intel/fs: sel.cond writes the flags on Gfx4 and Gfx5 2021-08-11 13:09:20 -07:00
brw_fs_thread_payload.cpp intel/compiler: remove unused field from fs_thread_payload 2023-02-23 08:04:24 +00:00
brw_fs_validate.cpp intel/fs/validate: Assert SEND [extended] descriptors are uniform 2023-02-06 09:12:18 +00:00
brw_fs_visitor.cpp intel/fs: Rework dynamic coarse handling 2023-02-06 09:12:18 +00:00
brw_gfx_ver_enum.h intel/compiler: Fix brw_gfx_ver_enum.h to be a proper header file 2022-06-30 23:46:35 +00:00
brw_inst.h intel/eu/gfx12+: Implement decoding of 64-bit immediates. 2023-01-19 06:14:03 +00:00
brw_interpolation_map.c intel: Rename genx keyword to gfxx in source files 2021-04-02 18:33:07 +00:00
brw_ir.h intel/fs: switch register allocation spilling to use LSC on Gfx12.5+ 2022-08-24 17:51:40 +00:00
brw_ir_allocator.h
brw_ir_analysis.h
brw_ir_fs.h intel: add devinfo->has_64bit_float_via_math_pipe 2022-12-10 03:59:19 +00:00
brw_ir_performance.cpp intel/compiler: Use SHADER_OPCODE_SEND for PI messages 2023-02-06 09:12:17 +00:00
brw_ir_performance.h
brw_ir_vec4.h intel: fix typos found by codespell 2022-06-27 10:20:55 +00:00
brw_isa_info.h intel/compiler: Remove use of thread_local for opcode tables 2022-06-30 23:46:35 +00:00
brw_kernel.c intel/fs: make Wa_1806565034 conditional to non robust access 2022-12-13 18:05:19 +00:00
brw_kernel.h intel/compiler: fix singleton pointer coverity warning 2022-04-19 12:36:10 +03:00
brw_lower_logical_sends.cpp intel/fs: Rework dynamic coarse handling 2023-02-06 09:12:18 +00:00
brw_mesh.cpp intel/compiler/mesh: follow the type of offset variable 2023-02-21 11:10:24 +00:00
brw_nir.c intel: Use common helpers for TCS passthrough shaders 2023-02-20 03:54:24 +00:00
brw_nir.h anv: use shader_info->var_copies_lowered 2023-02-06 22:11:34 +00:00
brw_nir_analyze_boolean_resolves.c
brw_nir_analyze_ubo_ranges.c intel/compiler: Fix missing tie-breaker in brw_nir_analyze_ubo_ranges() ordering code 2022-11-14 19:41:35 +00:00
brw_nir_attribute_workarounds.c intel/compiler: Use named NIR intrinsic const index accessors 2022-08-16 05:44:30 +00:00
brw_nir_clamp_image_1d_2d_array_sizes.c intel/compiler: use nir_shader_instructions_pass in brw_nir_clamp_image_1d_2d_array_sizes 2021-10-05 10:02:54 +00:00
brw_nir_clamp_per_vertex_loads.c intel/fs: clamp per vertex input accesses to patchControlPoints 2022-12-07 08:16:03 +00:00
brw_nir_lower_alpha_to_coverage.c intel/fs: make alpha_to_coverage a tristate 2023-02-06 09:12:18 +00:00
brw_nir_lower_conversions.c intel/fs: Use nir_type_convert instead of nir_type_conversion_op 2022-12-14 06:23:21 +00:00
brw_nir_lower_cs_intrinsics.c intel/compiler: optimize away local_inv_index and local_inv_id if workgroup size is 1 2022-12-13 13:00:49 +00:00
brw_nir_lower_intersection_shader.c intel/rt: Handle halts in any-hit shaders properly 2022-08-05 11:51:31 +00:00
brw_nir_lower_ray_queries.c nir: make ray query load values visible in NIR prints 2022-11-10 14:40:08 +02:00
brw_nir_lower_rt_intrinsics.c intel/nir/rt: fixup primitive id 2022-12-12 10:16:21 +02:00
brw_nir_lower_scoped_barriers.c intel/compiler: use nir_shader_instructions_pass in brw_nir_lower_scoped_barriers 2021-10-05 10:02:54 +00:00
brw_nir_lower_shader_calls.c intel: Fixes -Werror,-Wbitwise-instead-of-logical for clang-15 in brw_nir_lower_shader_calls.c 2022-11-17 23:17:40 +00:00
brw_nir_lower_shading_rate_output.c intel: fix typos found by codespell 2022-06-27 10:20:55 +00:00
brw_nir_lower_storage_image.c intel/compiler: use lower_image_samples_to_one 2023-02-01 19:52:49 +00:00
brw_nir_opt_peephole_ffma.c Revert "nir: Drop the unused instr arg for src/dest copy functions." 2022-08-30 18:21:44 +00:00
brw_nir_opt_peephole_imul32x16.c intel/compiler: Fix signed integer range analysis of imax and imin 2022-11-09 21:34:26 +00:00
brw_nir_rt.c anv: use shader_info->var_copies_lowered 2023-02-06 22:11:34 +00:00
brw_nir_rt.h anv: support VK_PIPELINE_CREATE_RAY_TRACING_SKIP_* 2022-10-20 00:03:55 +00:00
brw_nir_rt_builder.h intel/nir/rt: fixup primitive id 2022-12-12 10:16:21 +02:00
brw_nir_tcs_workarounds.c intel/compiler: use nir_metadata_none instead of its value 2021-10-05 10:02:54 +00:00
brw_nir_trig_workarounds.py driconf: Add a limit_trig_input_range option 2022-05-13 06:47:53 +00:00
brw_packed_float.c
brw_predicated_break.cpp intel/compiler: Don't predicate a WHILE if there is a CONT 2021-12-08 14:56:32 -08:00
brw_prim.h intel/compiler: Split 3DPRIM_* defines out to a separate header. 2022-06-30 23:46:35 +00:00
brw_private.h intel/compiler: Use SIMD selection helpers in compile_single_bs() 2022-11-15 04:55:18 +00:00
brw_reg.h intel/compiler: Add a few more brw_ud* helpers 2022-09-13 01:44:24 +00:00
brw_reg_type.c intel: Rename gen_device prefix to intel_device 2021-04-20 20:06:33 +00:00
brw_reg_type.h intel/compiler: Move type_is_unsigned_int to brw_reg_type.h 2021-08-30 14:00:14 -07:00
brw_rt.h intel/rt: Fix L3 bank performance bottlenecks due to SW stack stride alignment. 2023-02-26 11:48:33 -08:00
brw_schedule_instructions.cpp intel/compiler: Use SHADER_OPCODE_SEND for PI messages 2023-02-06 09:12:17 +00:00
brw_shader.cpp intel/fs: drop FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GFX7 2023-01-26 11:26:53 +00:00
brw_shader.h intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_simd_selection.cpp intel/compiler: fine-grained control of dispatch widths 2023-01-27 11:00:41 +00:00
brw_vec4.cpp intel/vec4: Don't optimize multiply by 1.0 away 2023-02-10 16:34:01 +00:00
brw_vec4.h intel/vec4: Set the rounding mode 2023-02-10 16:34:00 +00:00
brw_vec4_builder.h intel: Rename Genx keyword to Gfxx 2021-04-02 18:33:07 +00:00
brw_vec4_cmod_propagation.cpp intel: fixes -Werror,-Wunused-but-set-variable for clang-15 2022-11-17 23:17:40 +00:00
brw_vec4_copy_propagation.cpp intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_vec4_cse.cpp intel/compiler: Rename vec4 state URB opcodes to have VEC4_ prefix 2022-07-08 19:45:34 +00:00
brw_vec4_dead_code_eliminate.cpp intel/vec4: sel.cond writes the flags on Gfx4 and Gfx5 2021-08-11 13:09:32 -07:00
brw_vec4_generator.cpp intel/vec4: Set the rounding mode 2023-02-10 16:34:00 +00:00
brw_vec4_gs_nir.cpp intel/compiler: Use named NIR intrinsic const index accessors 2022-08-16 05:44:30 +00:00
brw_vec4_gs_visitor.cpp intel/compiler: Use FS thread payload only for FS 2022-09-13 01:44:24 +00:00
brw_vec4_gs_visitor.h intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
brw_vec4_live_variables.cpp intel/vec4: sel.cond writes the flags on Gfx4 and Gfx5 2021-08-11 13:09:32 -07:00
brw_vec4_live_variables.h intel/vec4: sel.cond writes the flags on Gfx4 and Gfx5 2021-08-11 13:09:32 -07:00
brw_vec4_nir.cpp nir: add assertions that loops don't have a Continue Construct 2023-02-21 10:41:11 +00:00
brw_vec4_reg_allocate.cpp intel/compiler: Don't create vec4 reg-set for gen8+ 2022-07-14 17:49:01 +00:00
brw_vec4_surface_builder.cpp intel: move away from booleans to identify platforms 2021-11-08 16:48:06 +00:00
brw_vec4_surface_builder.h
brw_vec4_tcs.cpp intel/fs: clamp per vertex input accesses to patchControlPoints 2022-12-07 08:16:03 +00:00
brw_vec4_tcs.h intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
brw_vec4_tes.cpp intel/compiler: Rename vec4 state URB opcodes to have VEC4_ prefix 2022-07-08 19:45:34 +00:00
brw_vec4_tes.h intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
brw_vec4_visitor.cpp intel/vec4: Set the rounding mode 2023-02-10 16:34:00 +00:00
brw_vec4_vs.h intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
brw_vec4_vs_visitor.cpp intel/compiler: Rename vec4 state URB opcodes to have VEC4_ prefix 2022-07-08 19:45:34 +00:00
brw_vue_map.c intel/compiler: Store the number of position slots in the VUE map 2022-08-31 02:00:18 +00:00
gfx6_gs_visitor.cpp intel/compiler: Rename vec4 state URB opcodes to have VEC4_ prefix 2022-07-08 19:45:34 +00:00
gfx6_gs_visitor.h intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
intel_clc.c intel/compiler: Fix dynarray usage in intel_clc 2022-11-14 19:15:05 +00:00
meson.build intel/nir: Use nir_lower_mem_access_bit_sizes() 2023-02-17 00:55:54 +00:00
test_eu_compact.cpp intel/compiler: Fixes [-Wdeprecated-declarations] in test_eu_compact.cpp 2022-08-23 15:19:16 +00:00
test_eu_validate.cpp intel/eu/validate: Validate integer multiplication source size restrictions 2022-11-09 21:34:26 +00:00
test_fs_cmod_propagation.cpp intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
test_fs_copy_propagation.cpp intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
test_fs_saturate_propagation.cpp intel/fs: add a saturation propagation test 2022-12-09 00:39:05 +00:00
test_fs_scoreboard.cpp intel/fs/xehp: Add unit test for handling of RaR deps across multiple pipelines. 2022-01-25 22:40:44 +00:00
test_simd_selection.cpp intel/compiler: fine-grained control of dispatch widths 2023-01-27 11:00:41 +00:00
test_vec4_cmod_propagation.cpp intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
test_vec4_copy_propagation.cpp intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
test_vec4_dead_code_eliminate.cpp intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
test_vec4_register_coalesce.cpp intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
test_vf_float_conversions.cpp