mesa/src/intel/compiler
Kenneth Graunke abba55382f intel/compiler: Vectorize gl_TessLevelInner/Outer[] writes
Setting the NIR options takes care of iris thanks to the common st/mesa
linking code, and updating brw_nir_link_shaders should handle anv.

The main effort here is updating remap_tess_levels, which needs to
handle vector stores, writemasking, and swizzling.  Unfortunately,
we also need to continue handling the existing single-component
access because it's used for TES inputs, which we don't vectorize.
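
A minimal sketch of what handling a writemasked vector store with a component remap can look like. This is not the code in remap_tess_levels: the helper name scatter_masked_store, its parameters, and the remap table are hypothetical and chosen only for illustration.

```c
#include <assert.h>

/* Hypothetical helper for illustration only; this is not Mesa's
 * remap_tess_levels.  It scatters the enabled components of a vector
 * store into a 4-component destination slot.
 *
 *   src        up to 4 source components
 *   writemask  bit i set means src[i] is written
 *   remap      remap[i] is the destination component for src[i],
 *              or -1 if that component is unused and should be dropped
 *
 * Returns the writemask expressed in destination components.
 */
static unsigned
scatter_masked_store(float dst[4], const float src[4], unsigned num_comps,
                     unsigned writemask, const int remap[4])
{
   unsigned dst_mask = 0;

   assert(num_comps <= 4);

   for (unsigned i = 0; i < num_comps; i++) {
      /* Skip components the store does not write or the layout drops. */
      if (!(writemask & (1u << i)) || remap[i] < 0)
         continue;

      assert(remap[i] < 4);
      dst[remap[i]] = src[i];
      dst_mask |= 1u << remap[i];
   }

   return dst_mask;
}
```

With the reversed table {3, 2, 1, 0} (an arbitrary choice for this sketch), a store of components x and z (writemask 0x5) lands in destination components w and y, and the function returns 0xA.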

We could try to vectorize TES inputs too, but they're all pushed
anyway, so it wouldn't buy us much other than deleting this code.
Also, we have opt_combine_stores, but no equivalent pass for loads.

One limitation of using nir_vectorize_tess_levels is that it works
on variables, and so isn't able to combine outer/inner writes that
happen to live in the same vec4 slot (for triangle domains).  That
said, it's still better than before.

For writes, we allow the intrinsics to supply up to the full size
of the variable (vec4 for outer, vec2 for inner) even if the domain
only requires a subset of those components (e.g. triangles need only 3).
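
As a rough sketch of the rule above, a full-size write can be trimmed to the components the domain actually uses. The enum and function names here are hypothetical and are not taken from the Mesa sources. The per-domain component counts follow the usual tessellation rules (quads: 4 outer, 2 inner; triangles: 3 outer, 1 inner; isolines: 2 outer, 0 inner).

```c
#include <assert.h>

/* Hypothetical names for illustration; not the identifiers Mesa uses. */
enum tess_domain { DOMAIN_QUADS, DOMAIN_TRIANGLES, DOMAIN_ISOLINES };

/* Number of gl_TessLevelOuter[] components the domain reads. */
static unsigned
outer_comps(enum tess_domain d)
{
   switch (d) {
   case DOMAIN_QUADS:     return 4;
   case DOMAIN_TRIANGLES: return 3;
   case DOMAIN_ISOLINES:  return 2;
   }
   return 0;
}

/* Number of gl_TessLevelInner[] components the domain reads. */
static unsigned
inner_comps(enum tess_domain d)
{
   switch (d) {
   case DOMAIN_QUADS:     return 2;
   case DOMAIN_TRIANGLES: return 1;
   case DOMAIN_ISOLINES:  return 0;
   }
   return 0;
}

/* Keep only the writemask bits for components the domain uses. */
static unsigned
trim_writemask(unsigned writemask, unsigned used_comps)
{
   assert(used_comps <= 4);
   return writemask & ((1u << used_comps) - 1u);
}
```

Under these assumptions, a full vec4 outer write (writemask 0xf) in a triangle domain trims to 0x7, and any inner write in an isoline domain trims to 0.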

shader-db results on Icelake:

   total instructions in shared programs: 19605070 -> 19602284 (-0.01%)
   instructions in affected programs: 65338 -> 62552 (-4.26%)
   helped: 271 / HURT: 0
   helped stats (abs) min: 6 max: 24 x̄: 10.28 x̃: 12
   helped stats (rel) min: 1.30% max: 18.18% x̄: 5.80% x̃: 7.59%
   95% mean confidence interval for instructions value: -10.71 -9.85
   95% mean confidence interval for instructions %-change: -6.17% -5.43%
   Instructions are helped.

   total cycles in shared programs: 851854659 -> 851820320 (<.01%)
   cycles in affected programs: 618749 -> 584410 (-5.55%)
   helped: 271 / HURT: 0
   helped stats (abs) min: 69 max: 540 x̄: 126.71 x̃: 108
   helped stats (rel) min: 2.57% max: 37.97% x̄: 6.17% x̃: 5.06%
   95% mean confidence interval for cycles value: -135.89 -117.54
   95% mean confidence interval for cycles %-change: -6.72% -5.63%
   Cycles are helped.

   total sends in shared programs: 1025285 -> 1024355 (-0.09%)
   sends in affected programs: 6454 -> 5524 (-14.41%)
   helped: 271 / HURT: 0
   helped stats (abs) min: 2 max: 8 x̄: 3.43 x̃: 4
   helped stats (rel) min: 5.71% max: 25.00% x̄: 14.98% x̃: 17.39%
   95% mean confidence interval for sends value: -3.57 -3.29
   95% mean confidence interval for sends %-change: -15.42% -14.54%
   Sends are helped.
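
The percentages in parentheses above are plain relative changes between the before and after totals. A small sketch for re-deriving them, where percent_change is a hypothetical helper and not part of shader-db:

```c
#include <assert.h>

/* Relative change in percent between two totals.
 * Hypothetical helper for checking the figures; not part of shader-db. */
static double
percent_change(unsigned long before, unsigned long after)
{
   assert(before != 0);
   return ((double)after - (double)before) / (double)before * 100.0;
}
```

For instance, percent_change(6454, 5524) is about -14.41, matching the sends figure for affected programs.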

According to Felix DeGrood, this results in a 10% improvement in
the draw call time for certain draw calls from Strange Brigade.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17944>
2022-09-27 18:17:56 -07:00
brw_cfg.cpp intel/fs: Add physical fall-through CFG edge for unconditional BREAK instruction. 2021-12-21 00:43:29 +00:00
brw_cfg.h intel/compiler: Add cfg_t::adjust_block_ips() method 2021-07-14 09:56:59 -07:00
brw_clip.h
brw_clip_line.c intel/compiler: Split 3DPRIM_* defines out to a separate header. 2022-06-30 23:46:35 +00:00
brw_clip_point.c
brw_clip_tri.c intel/compiler: Split 3DPRIM_* defines out to a separate header. 2022-06-30 23:46:35 +00:00
brw_clip_unfilled.c intel/compiler: Split 3DPRIM_* defines out to a separate header. 2022-06-30 23:46:35 +00:00
brw_clip_util.c intel: move away from booleans to identify platforms 2021-11-08 16:48:06 +00:00
brw_compile_clip.c intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_compile_ff_gs.c intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_compile_sf.c intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_compiler.c intel/compiler: Vectorize gl_TessLevelInner/Outer[] writes 2022-09-27 18:17:56 -07:00
brw_compiler.h intel/compiler: Store the number of position slots in the VUE map 2022-08-31 02:00:18 +00:00
brw_dead_control_flow.cpp
brw_dead_control_flow.h
brw_debug_recompile.c intel/compiler: Stop including src/mesa/main/config.h 2022-06-30 23:46:35 +00:00
brw_disasm.c intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_disasm_info.c intel/eu: Handle compaction when inserting validation errors 2022-07-28 21:31:45 +00:00
brw_disasm_info.h intel/eu: Handle compaction when inserting validation errors 2022-07-28 21:31:45 +00:00
brw_eu.c intel/compiler: Convert brw_eu.cpp back to brw_eu.c 2022-06-30 23:46:35 +00:00
brw_eu.h intel/fs: switch register allocation spilling to use LSC on Gfx12.5+ 2022-08-24 17:51:40 +00:00
brw_eu_compact.c intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_eu_defines.h intel/fs: remove unused opcode 2022-08-24 17:51:40 +00:00
brw_eu_emit.c intel/fs: switch register allocation spilling to use LSC on Gfx12.5+ 2022-08-24 17:51:40 +00:00
brw_eu_util.c intel: Rename genx keyword to gfxx in source files 2021-04-02 18:33:07 +00:00
brw_eu_validate.c intel/fs: fixup SEND validation check on overlapping src0/src1 2022-08-24 17:51:40 +00:00
brw_fs.cpp intel/compiler: Use an existing URB write to end TCS threads when viable 2022-09-27 18:17:42 -07:00
brw_fs.h intel/compiler: Use an existing URB write to end TCS threads when viable 2022-09-27 18:17:42 -07:00
brw_fs_bank_conflicts.cpp intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_fs_builder.h intel/compiler: Fix instruction size written calculation 2021-11-22 21:27:30 -08:00
brw_fs_cmod_propagation.cpp intel: fix typos found by codespell 2022-06-27 10:20:55 +00:00
brw_fs_combine_constants.cpp intel/compiler: Fix missing break in switch 2021-07-22 23:38:04 +00:00
brw_fs_copy_propagation.cpp intel/compiler: Avoid copy propagating large registers into EOT messages 2022-07-07 20:20:01 +00:00
brw_fs_cse.cpp intel/compiler: Implement nir_intrinsic_last_invocation 2022-03-26 00:28:19 +00:00
brw_fs_dead_code_eliminate.cpp intel/fs: sel.cond writes the flags on Gfx4 and Gfx5 2021-08-11 13:09:20 -07:00
brw_fs_generator.cpp intel/fs: switch register allocation spilling to use LSC on Gfx12.5+ 2022-08-24 17:51:40 +00:00
brw_fs_live_variables.cpp intel/fs: sel.cond writes the flags on Gfx4 and Gfx5 2021-08-11 13:09:20 -07:00
brw_fs_live_variables.h intel: Rename gen_device prefix to intel_device 2021-04-20 20:06:33 +00:00
brw_fs_lower_pack.cpp
brw_fs_lower_regioning.cpp intel/compiler/fs: Fix compilation of shaders with SHADER_OPCODE_SHUFFLE of float64 type 2022-09-14 19:32:43 +00:00
brw_fs_nir.cpp intel/compiler/fs: Use DF to load constants when has_64bit_int is not supported 2022-09-14 19:32:43 +00:00
brw_fs_reg_allocate.cpp intel/fs: switch register allocation spilling to use LSC on Gfx12.5+ 2022-08-24 17:51:40 +00:00
brw_fs_register_coalesce.cpp intel/compiler: Update block IPs once in register_coalesce 2021-07-14 09:57:04 -07:00
brw_fs_saturate_propagation.cpp
brw_fs_scoreboard.cpp intel/compiler: Don't set SBID on EOT send messages 2022-07-09 05:26:25 +00:00
brw_fs_sel_peephole.cpp intel/fs: sel.cond writes the flags on Gfx4 and Gfx5 2021-08-11 13:09:20 -07:00
brw_fs_thread_payload.cpp intel/compiler: Use brw_ud* helpers in thread payload code 2022-09-13 01:44:24 +00:00
brw_fs_validate.cpp intel/compiler: Print more details when fs_visitor::validate() fails 2022-08-22 18:58:55 +00:00
brw_fs_visitor.cpp intel/compiler: Make thread_payload struct abstract 2022-09-13 01:44:24 +00:00
brw_gfx_ver_enum.h intel/compiler: Fix brw_gfx_ver_enum.h to be a proper header file 2022-06-30 23:46:35 +00:00
brw_inst.h intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_interpolation_map.c intel: Rename genx keyword to gfxx in source files 2021-04-02 18:33:07 +00:00
brw_ir.h intel/fs: switch register allocation spilling to use LSC on Gfx12.5+ 2022-08-24 17:51:40 +00:00
brw_ir_allocator.h
brw_ir_analysis.h
brw_ir_fs.h intel/compiler: Make component() work for FIXED_GRF/ARF 2022-08-23 19:52:38 +00:00
brw_ir_performance.cpp intel/fs: Remove non-_LOGICAL URB messages 2022-07-08 19:45:34 +00:00
brw_ir_performance.h
brw_ir_vec4.h intel: fix typos found by codespell 2022-06-27 10:20:55 +00:00
brw_isa_info.h intel/compiler: Remove use of thread_local for opcode tables 2022-06-30 23:46:35 +00:00
brw_kernel.c intel,anv,iris,crocus: Drop subgroup size from the shader key 2022-07-08 22:47:22 +00:00
brw_kernel.h intel/compiler: fix singleton pointer coverity warning 2022-04-19 12:36:10 +03:00
brw_lower_logical_sends.cpp intel/fs: fixup a64 messages 2022-09-23 08:29:17 +00:00
brw_mesh.cpp intel/compiler/task: use shared memory for small task payload loads & stores 2022-09-21 09:16:20 +00:00
brw_nir.c intel/compiler: Vectorize gl_TessLevelInner/Outer[] writes 2022-09-27 18:17:56 -07:00
brw_nir.h driconf: Add a limit_trig_input_range option 2022-05-13 06:47:53 +00:00
brw_nir_analyze_boolean_resolves.c
brw_nir_analyze_ubo_ranges.c intel/nir,i965: Move HW generation check for UBO pushing to i965 2021-06-03 05:12:33 +00:00
brw_nir_attribute_workarounds.c intel/compiler: Use named NIR intrinsic const index accessors 2022-08-16 05:44:30 +00:00
brw_nir_clamp_image_1d_2d_array_sizes.c intel/compiler: use nir_shader_instructions_pass in brw_nir_clamp_image_1d_2d_array_sizes 2021-10-05 10:02:54 +00:00
brw_nir_lower_alpha_to_coverage.c
brw_nir_lower_conversions.c intel/compiler: use nir_shader_instructions_pass in brw_nir_lower_conversions 2021-10-05 10:02:54 +00:00
brw_nir_lower_cs_intrinsics.c intel/compiler: Lower Task/Mesh local_invocation_{id,index} 2021-12-04 00:41:46 +00:00
brw_nir_lower_intersection_shader.c intel/rt: Handle halts in any-hit shaders properly 2022-08-05 11:51:31 +00:00
brw_nir_lower_mem_access_bit_sizes.c intel/compiler: add support for 8/16 bits task payload loads 2022-09-21 09:16:20 +00:00
brw_nir_lower_ray_queries.c intel/nir/rt: store ray query state in scratch 2022-09-23 08:29:17 +00:00
brw_nir_lower_rt_intrinsics.c intel/compiler: extract brw_nir_load_global_const out of rt code 2021-12-04 00:41:46 +00:00
brw_nir_lower_scoped_barriers.c intel/compiler: use nir_shader_instructions_pass in brw_nir_lower_scoped_barriers 2021-10-05 10:02:54 +00:00
brw_nir_lower_shader_calls.c intel/nir/rt: store ray query state in scratch 2022-09-23 08:29:17 +00:00
brw_nir_lower_shading_rate_output.c intel: fix typos found by codespell 2022-06-27 10:20:55 +00:00
brw_nir_lower_storage_image.c nir/builder: Add a nir_trim_vector helper 2022-05-11 14:47:33 +00:00
brw_nir_opt_peephole_ffma.c Revert "nir: Drop the unused instr arg for src/dest copy functions." 2022-08-30 18:21:44 +00:00
brw_nir_rt.c intel: Use nir_test_mask instead of i2b(iand) 2022-06-30 18:00:32 +00:00
brw_nir_rt.h intel/fs: lower ray query intrinsics 2022-02-08 12:55:25 +00:00
brw_nir_rt_builder.h intel/nir/rt: fixup alignment of memcpy iterations 2022-09-23 08:29:17 +00:00
brw_nir_tcs_workarounds.c intel/compiler: use nir_metadata_none instead of its value 2021-10-05 10:02:54 +00:00
brw_nir_trig_workarounds.py driconf: Add a limit_trig_input_range option 2022-05-13 06:47:53 +00:00
brw_packed_float.c
brw_predicated_break.cpp intel/compiler: Don't predicate a WHILE if there is a CONT 2021-12-08 14:56:32 -08:00
brw_prim.h intel/compiler: Split 3DPRIM_* defines out to a separate header. 2022-06-30 23:46:35 +00:00
brw_private.h intel,anv,iris,crocus: Drop subgroup size from the shader key 2022-07-08 22:47:22 +00:00
brw_reg.h intel/compiler: Add a few more brw_ud* helpers 2022-09-13 01:44:24 +00:00
brw_reg_type.c intel: Rename gen_device prefix to intel_device 2021-04-20 20:06:33 +00:00
brw_reg_type.h intel/compiler: Move type_is_unsigned_int to brw_reg_type.h 2021-08-30 14:00:14 -07:00
brw_rt.h intel/fs: lower ray query intrinsics 2022-02-08 12:55:25 +00:00
brw_schedule_instructions.cpp intel/fs: Lower URB messages to SEND 2022-07-08 19:45:34 +00:00
brw_shader.cpp intel/compiler: Use FS thread payload only for FS 2022-09-13 01:44:24 +00:00
brw_shader.h intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_simd_selection.c intel/fs: fixup simd selection with shader calls 2022-08-05 11:51:31 +00:00
brw_vec4.cpp intel/compiler: Use FS thread payload only for FS 2022-09-13 01:44:24 +00:00
brw_vec4.h intel/compiler: remove gfx6 gather wa from backend. 2021-12-22 21:37:55 +00:00
brw_vec4_builder.h intel: Rename Genx keyword to Gfxx 2021-04-02 18:33:07 +00:00
brw_vec4_cmod_propagation.cpp intel/vec4: sel.cond writes the flags on Gfx4 and Gfx5 2021-08-11 13:09:32 -07:00
brw_vec4_copy_propagation.cpp intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
brw_vec4_cse.cpp intel/compiler: Rename vec4 state URB opcodes to have VEC4_ prefix 2022-07-08 19:45:34 +00:00
brw_vec4_dead_code_eliminate.cpp intel/vec4: sel.cond writes the flags on Gfx4 and Gfx5 2021-08-11 13:09:32 -07:00
brw_vec4_generator.cpp intel/compiler: Delete unused Gfx8+ code in brw_find_live_channel() 2022-08-02 08:41:43 +00:00
brw_vec4_gs_nir.cpp intel/compiler: Use named NIR intrinsic const index accessors 2022-08-16 05:44:30 +00:00
brw_vec4_gs_visitor.cpp intel/compiler: Use FS thread payload only for FS 2022-09-13 01:44:24 +00:00
brw_vec4_gs_visitor.h intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
brw_vec4_live_variables.cpp intel/vec4: sel.cond writes the flags on Gfx4 and Gfx5 2021-08-11 13:09:32 -07:00
brw_vec4_live_variables.h intel/vec4: sel.cond writes the flags on Gfx4 and Gfx5 2021-08-11 13:09:32 -07:00
brw_vec4_nir.cpp gallium,glsl: Delete PIPE_CAP_VERTEXID_NOBASE and lower_vertex_id. 2022-08-31 22:57:03 +00:00
brw_vec4_reg_allocate.cpp intel/compiler: Don't create vec4 reg-set for gen8+ 2022-07-14 17:49:01 +00:00
brw_vec4_surface_builder.cpp intel: move away from booleans to identify platforms 2021-11-08 16:48:06 +00:00
brw_vec4_surface_builder.h
brw_vec4_tcs.cpp intel/compiler: Use FS thread payload only for FS 2022-09-13 01:44:24 +00:00
brw_vec4_tcs.h intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
brw_vec4_tes.cpp intel/compiler: Rename vec4 state URB opcodes to have VEC4_ prefix 2022-07-08 19:45:34 +00:00
brw_vec4_tes.h intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
brw_vec4_visitor.cpp intel/vec4: Inline emit_texture and move helpers to brw_vec4_nir.cpp 2021-12-16 00:09:45 -08:00
brw_vec4_vs.h intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
brw_vec4_vs_visitor.cpp intel/compiler: Rename vec4 state URB opcodes to have VEC4_ prefix 2022-07-08 19:45:34 +00:00
brw_vue_map.c intel/compiler: Store the number of position slots in the VUE map 2022-08-31 02:00:18 +00:00
gfx6_gs_visitor.cpp intel/compiler: Rename vec4 state URB opcodes to have VEC4_ prefix 2022-07-08 19:45:34 +00:00
gfx6_gs_visitor.h intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
intel_clc.c intel/compiler: Introduce a new brw_isa_info structure 2022-06-30 23:46:35 +00:00
meson.build intel/compiler: Use FS thread payload only for FS 2022-09-13 01:44:24 +00:00
test_eu_compact.cpp intel/compiler: Fixes [-Wdeprecated-declarations] in test_eu_compact.cpp 2022-08-23 15:19:16 +00:00
test_eu_validate.cpp intel/compiler: Fixes [-Wdeprecated-declarations] in test_eu_validate.cpp 2022-08-23 15:19:16 +00:00
test_fs_cmod_propagation.cpp intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
test_fs_copy_propagation.cpp intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
test_fs_saturate_propagation.cpp intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
test_fs_scoreboard.cpp intel/fs/xehp: Add unit test for handling of RaR deps across multiple pipelines. 2022-01-25 22:40:44 +00:00
test_simd_selection.cpp intel/compiler: Initialize SIMDSelectionTest member error. 2021-11-03 04:22:35 +00:00
test_vec4_cmod_propagation.cpp intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
test_vec4_copy_propagation.cpp intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
test_vec4_dead_code_eliminate.cpp intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
test_vec4_register_coalesce.cpp intel/fs,vec4: Drop support for shader time 2021-12-10 21:20:47 +00:00
test_vf_float_conversions.cpp