mesa/src/intel/compiler
Jason Ekstrand 5abac85177 intel/fs: Rework scratch handling on Gen9+
The current scratch mechanism uses an MRF hack where we reserve a few
GRF registers to treat as the MRF and we collect the data into that
MRF region before doing a scratch write.  We also use that region for
the header for scratch reads.

This commit changes things and gets rid of the MRF hack.  Instead, we
reserve a single register (which RA is free to pick) for the scratch
header and use split sends for scratch writes to avoid having to do
the copy.  This should provide RA with more freedom in the presence of
spilling as well as avoid some unnecessary data moves.  In the future, the
new GEN9_SCRATCH_HEADER opcode gives us a place where we can do our own
per-thread scratch base address calculations rather than depending on
the scratch base address that gets pushed into g0.  Having an opcode for
this lets us do it once at the top of the shader rather than repeating
it at every read/write.

One other noticeable difference is the use of SHADER_OPCODE_SEND.  We
can get away with this thanks to the fact that we're now using a set to
track which instructions are generated by spills and don't rely on the
opcodes to find spill/fill instructions.  This allows us to avoid adding
more virtual opcodes and let the normal code paths handle things like
scoreboard dependencies between header setup and the SEND.  It also
means that post-RA scheduling may be able to space out the header setup
MOV and the SEND for better latency hiding.

Shader-db results on Skylake:

    total spills in shared programs: 12137 -> 10604 (-12.63%)
    spills in affected programs: 6685 -> 5152 (-22.93%)
    helped: 274
    HURT: 2

    total fills in shared programs: 13065 -> 11515 (-11.86%)
    fills in affected programs: 9007 -> 7457 (-17.21%)
    helped: 275
    HURT: 1

Shader-db results on Ice Lake:

    total spills in shared programs: 12482 -> 10953 (-12.25%)
    spills in affected programs: 6586 -> 5057 (-23.22%)
    helped: 275
    HURT: 0

    total fills in shared programs: 12819 -> 11234 (-12.36%)
    fills in affected programs: 7867 -> 6282 (-20.15%)
    helped: 274
    HURT: 0

Shader-db results on Tiger Lake:

    total spills in shared programs: 11689 -> 10233 (-12.46%)
    spills in affected programs: 4740 -> 3284 (-30.72%)
    helped: 259
    HURT: 0

    total fills in shared programs: 10840 -> 9443 (-12.89%)
    fills in affected programs: 6244 -> 4847 (-22.37%)
    helped: 259
    HURT: 0

Fossil-db results on Ice Lake:

    Spills in all programs: 245249 -> 201633 (-17.8%)
    Fills in all programs: 366066 -> 314368 (-14.1%)

More practically, this seems to give about a 0.5-1% perf boost in
Witcher 3 (DXVK) and Shadow of the Tomb Raider (Vulkan native).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>
2020-10-13 21:59:27 +00:00
brw_cfg.cpp intel/ir: Remove scheduling-based cycle count estimates. 2020-04-28 23:01:27 -07:00
brw_cfg.h intel/ir: Remove scheduling-based cycle count estimates. 2020-04-28 23:01:27 -07:00
brw_clip.h
brw_clip_line.c
brw_clip_point.c
brw_clip_tri.c
brw_clip_unfilled.c
brw_clip_util.c
brw_compile_clip.c intel: drop likely/unlikely around INTEL_DEBUG 2020-10-06 18:43:07 +00:00
brw_compile_sf.c intel: drop likely/unlikely around INTEL_DEBUG 2020-10-06 18:43:07 +00:00
brw_compiler.c intel/fs: Add an option to use dataport messages for UBOs 2020-10-08 01:17:06 -05:00
brw_compiler.h intel/fs: Add an option to use dataport messages for UBOs 2020-10-08 01:17:06 -05:00
brw_dead_control_flow.cpp intel/compiler: Pass detailed dependency classes to invalidate_analysis() 2020-03-06 10:20:39 -08:00
brw_dead_control_flow.h
brw_debug_recompile.c intel/compiler: Add a "base class" for program keys 2019-07-10 19:35:55 +00:00
brw_disasm.c intel/disasm: Label support in shader disassembly for UIP/JIP 2020-09-02 10:33:29 +00:00
brw_disasm_info.c intel/disasm: Label support in shader disassembly for UIP/JIP 2020-09-02 10:33:29 +00:00
brw_disasm_info.h intel/disasm: Label support in shader disassembly for UIP/JIP 2020-09-02 10:33:29 +00:00
brw_eu.cpp intel/eu: Add a mechanism for emitting relocatable constant MOVs 2020-09-02 19:48:44 +00:00
brw_eu.h intel/eu: Add a mechanism for emitting relocatable constant MOVs 2020-09-02 19:48:44 +00:00
brw_eu_compact.c intel: drop likely/unlikely around INTEL_DEBUG 2020-10-06 18:43:07 +00:00
brw_eu_defines.h intel/fs: Add a SCRATCH_HEADER opcode 2020-10-13 21:59:27 +00:00
brw_eu_emit.c intel/compiler: Silence unused parameter warning in brw_surface_payload_size 2020-09-28 11:43:04 -07:00
brw_eu_util.c
brw_eu_validate.c remove final imports.h and imports.c bits 2020-04-21 11:09:04 -07:00
brw_fs.cpp intel/fs: Add a SCRATCH_HEADER opcode 2020-10-13 21:59:27 +00:00
brw_fs.h intel/fs: Add a SCRATCH_HEADER opcode 2020-10-13 21:59:27 +00:00
brw_fs_bank_conflicts.cpp intel/fs: Replace fs_visitor::bank_conflict_cycles() with stand-alone function. 2020-04-28 23:00:29 -07:00
brw_fs_builder.h intel/fs: Rename half() helpers to quarter(), allow index up to 3. 2020-04-28 23:00:29 -07:00
brw_fs_cmod_propagation.cpp intel/compiler: don't propagate cmp to add if add is saturated 2020-07-11 00:25:48 +00:00
brw_fs_combine_constants.cpp intel/compiler: Move idom tree calculation and related logic into analysis object 2020-03-06 10:21:03 -08:00
brw_fs_copy_propagation.cpp intel/fs: Don't copy-propagate stride=0 sources into ddx/ddy 2020-09-02 20:31:32 +00:00
brw_fs_cse.cpp intel/compiler/fs: Switch liveness analysis to IR analysis framework 2020-03-06 10:20:57 -08:00
brw_fs_dead_code_eliminate.cpp intel/compiler/fs: Switch liveness analysis to IR analysis framework 2020-03-06 10:20:57 -08:00
brw_fs_generator.cpp intel/fs: Add a SCRATCH_HEADER opcode 2020-10-13 21:59:27 +00:00
brw_fs_live_variables.cpp intel/compiler: Silence unused parameter warning in fs_live_variables::setup_one_read 2020-04-17 08:21:40 -07:00
brw_fs_live_variables.h intel/compiler: Silence unused parameter warning in fs_live_variables::setup_one_read 2020-04-17 08:21:40 -07:00
brw_fs_lower_pack.cpp intel/compiler: Pass detailed dependency classes to invalidate_analysis() 2020-03-06 10:20:39 -08:00
brw_fs_lower_regioning.cpp intel/fs: Assert if lower_source_modifiers converts 32x16 to 32x32 multiplication 2020-08-10 13:29:56 -07:00
brw_fs_nir.cpp nir: Add ability to count emitted GS primitives. 2020-10-09 15:26:14 +02:00
brw_fs_reg_allocate.cpp intel/fs: Rework scratch handling on Gen9+ 2020-10-13 21:59:27 +00:00
brw_fs_register_coalesce.cpp intel/fs: Don't delete coalesced MOVs if they have a cmod 2020-04-29 16:45:51 +00:00
brw_fs_saturate_propagation.cpp intel/compiler/fs: Switch liveness analysis to IR analysis framework 2020-03-06 10:20:57 -08:00
brw_fs_scoreboard.cpp intel/fs/swsb: SCHEDULING_FENCE only emits SYNC_NOP 2020-09-20 14:43:40 +00:00
brw_fs_sel_peephole.cpp intel/compiler: Don't create 64-bit src1 immediates in opt_peephole_sel 2020-04-23 00:53:14 +00:00
brw_fs_validate.cpp
brw_fs_visitor.cpp intel/compiler: initialize remaining fields of various classes 2020-09-10 12:16:58 +00:00
brw_gen_enum.h intel/compiler: Extract GEN_* macros into separate file 2020-01-22 00:19:20 +00:00
brw_inst.h intel/compiler: Fix array bounds warning on GCC 10. 2020-01-22 08:35:18 +01:00
brw_interpolation_map.c intel/compiler: mark debug constant as const 2020-09-02 15:08:01 +00:00
brw_ir.h intel/ir: Add missing initialization of backend_reg::offset during construction. 2020-04-28 23:00:29 -07:00
brw_ir_allocator.h
brw_ir_analysis.h intel/compiler: Define more detailed analysis dependency classes 2020-03-06 10:20:37 -08:00
brw_ir_fs.h intel/fs: Replace fs_visitor::bank_conflict_cycles() with stand-alone function. 2020-04-28 23:00:29 -07:00
brw_ir_performance.cpp intel/fs: Add a SCRATCH_HEADER opcode 2020-10-13 21:59:27 +00:00
brw_ir_performance.h intel/ir: Import shader performance analysis pass. 2020-04-28 23:01:03 -07:00
brw_ir_vec4.h intel/vec4: Fix constness of vec4_instruction::reads_flag() and ::writes_flag(). 2020-04-28 23:00:29 -07:00
brw_nir.c radv/aco,nir/lower_subgroups: don't lower elect 2020-10-13 12:47:20 +00:00
brw_nir.h intel/nir: Rewrite the guts of lower_alpha_to_coverage 2020-08-29 16:41:05 +00:00
brw_nir_analyze_boolean_resolves.c
brw_nir_analyze_ubo_ranges.c intel/compiler: Do not qsort zero sized array 2020-02-19 12:07:24 +02:00
brw_nir_attribute_workarounds.c
brw_nir_clamp_image_1d_2d_array_sizes.c intel/compiler: fixup Gen12 workaround for array sizes 2020-09-21 21:20:09 +00:00
brw_nir_lower_alpha_to_coverage.c intel/nir: Clean up lower_alpha_to_coverage a bit 2020-08-29 16:41:05 +00:00
brw_nir_lower_conversions.c intel/nir: Call nir_metadata_preserve on !progress 2020-06-11 05:08:12 +00:00
brw_nir_lower_cs_intrinsics.c intel/nir: Lower load_num_work_groups to 32-bit if needed 2020-09-02 20:38:22 +00:00
brw_nir_lower_image_load_store.c nir: Add an LOD parameter to image_*_size 2020-08-20 20:48:10 +00:00
brw_nir_lower_mem_access_bit_sizes.c intel/nir: Lower load_global_constant in lower_mem_access_bit_sizes 2020-10-08 03:56:01 +00:00
brw_nir_lower_scoped_barriers.c nir: Call nir_metadata_preserve on !progress 2020-06-11 05:08:12 +00:00
brw_nir_opt_peephole_ffma.c intel/nir: Call nir_metadata_preserve on !progress 2020-06-11 05:08:12 +00:00
brw_nir_tcs_workarounds.c intel/nir: Use nir control flow helpers 2020-09-30 15:47:51 +00:00
brw_nir_trig_workarounds.py intel/nir: do not apply the fsin and fcos trig workarounds for consts 2019-09-17 23:39:18 +03:00
brw_packed_float.c intel/compiler: Cast to target type before shifting left 2019-10-24 16:19:23 +02:00
brw_predicated_break.cpp intel/compiler: Pass detailed dependency classes to invalidate_analysis() 2020-03-06 10:20:39 -08:00
brw_reg.h intel/fs: Emit HALT for discard on Gen4-5 2020-05-30 06:21:15 +00:00
brw_reg_type.c intel/compiler: Handle invalid inputs to brw_reg_type_to_*() 2020-01-22 00:19:21 +00:00
brw_reg_type.h intel/compiler: Add a INVALID_{,HW_}REG_TYPE macros 2020-01-22 00:19:20 +00:00
brw_schedule_instructions.cpp intel/fs: Rework scratch handling on Gen9+ 2020-10-13 21:59:27 +00:00
brw_shader.cpp intel/fs: Add a SCRATCH_HEADER opcode 2020-10-13 21:59:27 +00:00
brw_shader.h intel/compiler: Expose brw_texture_offset to C 2020-06-23 17:43:53 +00:00
brw_vec4.cpp intel: drop likely/unlikely around INTEL_DEBUG 2020-10-06 18:43:07 +00:00
brw_vec4.h intel/vec4: Remove all support for Gen8+ [v2] 2020-09-28 11:43:10 -07:00
brw_vec4_builder.h intel/vec4: Remove inline lowering of LRP 2020-09-28 11:43:10 -07:00
brw_vec4_cmod_propagation.cpp intel/compiler: Pass detailed dependency classes to invalidate_analysis() 2020-03-06 10:20:39 -08:00
brw_vec4_copy_propagation.cpp intel/vec4: Remove all support for Gen8+ [v2] 2020-09-28 11:43:10 -07:00
brw_vec4_cse.cpp i965/vec4: Ignore swizzle of VGRF for use by var_range_end() 2020-05-20 20:19:18 +00:00
brw_vec4_dead_code_eliminate.cpp intel/compiler/vec4: Switch liveness analysis to IR analysis framework 2020-03-06 10:20:59 -08:00
brw_vec4_generator.cpp intel/vec4: Remove everything related to VS_OPCODE_SET_SIMD4X2_HEADER_GEN9 2020-09-28 11:43:10 -07:00
brw_vec4_gs_nir.cpp nir: Add ability to count emitted GS primitives. 2020-10-09 15:26:14 +02:00
brw_vec4_gs_visitor.cpp nir: Add ability to count primitives per stream. 2020-10-09 15:26:14 +02:00
brw_vec4_gs_visitor.h
brw_vec4_live_variables.cpp intel/compiler: Drop invalidate_live_intervals() 2020-03-06 10:21:01 -08:00
brw_vec4_live_variables.h intel/compiler/vec4: Switch liveness analysis to IR analysis framework 2020-03-06 10:20:59 -08:00
brw_vec4_nir.cpp intel/vec4: Remove all support for Gen8+ [v2] 2020-09-28 11:43:10 -07:00
brw_vec4_reg_allocate.cpp intel/compiler/vec4: Switch liveness analysis to IR analysis framework 2020-03-06 10:20:59 -08:00
brw_vec4_surface_builder.cpp intel/vec4: Remove all support for Gen8+ [v2] 2020-09-28 11:43:10 -07:00
brw_vec4_surface_builder.h
brw_vec4_tcs.cpp intel: drop likely/unlikely around INTEL_DEBUG 2020-10-06 18:43:07 +00:00
brw_vec4_tcs.h intel/compiler: Silence unused parameter warnings in vec4_tcs_visitor 2020-04-17 08:21:37 -07:00
brw_vec4_tes.cpp intel/vec4: Drop all of the 64-bit varying code 2019-07-31 18:14:09 -05:00
brw_vec4_tes.h
brw_vec4_visitor.cpp intel/vec4: Remove leftover code from Gen8+ removal. 2020-10-03 03:53:46 +00:00
brw_vec4_vs.h i965: Use NIR to lower legacy userclipping. 2019-07-24 18:00:13 +00:00
brw_vec4_vs_visitor.cpp i965: Use NIR to lower legacy userclipping. 2019-07-24 18:00:13 +00:00
brw_vue_map.c intel/fs: Allow multiple slots for position 2020-04-07 17:16:09 +00:00
brw_wm_iz.cpp intel/fs: Move more prog_data setup into populate_wm_prog_data 2020-06-23 17:43:53 +00:00
gen6_gs_visitor.cpp
gen6_gs_visitor.h
meson.build intel/compiler: Extract control barriers from scoped barriers 2020-06-03 07:39:52 +00:00
test_eu_compact.cpp intel/compiler: Get rid of the global compaction table pointers 2020-09-02 19:48:44 +00:00
test_eu_validate.cpp intel/disasm: Label support in shader disassembly for UIP/JIP 2020-09-02 10:33:29 +00:00
test_fs_cmod_propagation.cpp intel/compiler: don't propagate cmp to add if add is saturated 2020-07-11 00:25:48 +00:00
test_fs_copy_propagation.cpp intel/compiler: Pass backend_shader * to cfg_t() 2020-03-09 04:44:12 +00:00
test_fs_saturate_propagation.cpp intel/compiler: Pass backend_shader * to cfg_t() 2020-03-09 04:44:12 +00:00
test_fs_scoreboard.cpp intel/compiler: Pass backend_shader * to cfg_t() 2020-03-09 04:44:12 +00:00
test_vec4_cmod_propagation.cpp
test_vec4_copy_propagation.cpp intel/compiler/test: use TEST_DEBUG env var consistently 2020-09-02 15:08:01 +00:00
test_vec4_dead_code_eliminate.cpp intel/compiler/test: use TEST_DEBUG env var consistently 2020-09-02 15:08:01 +00:00
test_vec4_register_coalesce.cpp intel/compiler/test: use TEST_DEBUG env var consistently 2020-09-02 15:08:01 +00:00
test_vf_float_conversions.cpp