mesa/src/intel/compiler
Caio Oliveira 2e2b83f72d intel/brw: Use CSE for LOAD_SUBGROUP_INVOCATION
Instead of emitting a single one at the top and referring to it,
emit the virtual instruction as needed and let CSE do its job.
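
As a rough illustration of the idea (not the code in brw_fs_cse.cpp), the sketch below runs a local CSE over a toy straight-line IR. `ToyInst` and `toy_cse` are hypothetical names invented for this example; the real pass works on the backend IR and its def analysis.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Toy IR for illustration only; this is not Mesa's fs_inst.
// Every instruction writes a fresh value `dst` and reads up to two
// earlier values (-1 means "no source").
struct ToyInst {
    std::string op;
    int dst;
    int src0;
    int src1;
};

// Local CSE over a straight-line list: an instruction whose opcode and
// (already remapped) sources match an earlier one is dropped, and later
// readers are redirected to the surviving value.
std::vector<ToyInst> toy_cse(const std::vector<ToyInst> &in)
{
    std::map<std::tuple<std::string, int, int>, int> seen; // expression -> value
    std::map<int, int> remap;                              // dropped dst -> survivor
    std::vector<ToyInst> out;

    auto resolve = [&remap](int v) {
        auto it = remap.find(v);
        return it == remap.end() ? v : it->second;
    };

    for (ToyInst inst : in) {
        assert(inst.dst >= 0);
        inst.src0 = resolve(inst.src0);
        inst.src1 = resolve(inst.src1);

        auto key = std::make_tuple(inst.op, inst.src0, inst.src1);
        auto it = seen.find(key);
        if (it != seen.end()) {
            remap[inst.dst] = it->second; // duplicate: reuse the earlier value
            continue;
        }
        seen.emplace(key, inst.dst);
        out.push_back(inst);
    }
    return out;
}
```

With two `LOAD_SUBGROUP_INVOCATION` entries in the input, only the first survives and every later reader is rewritten to use its value, which is the effect the commit relies on.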

Since load_subgroup_invocation can now appear somewhere other than the
start of the shader, use UNDEF in all cases to ensure that the liveness
of the destination doesn't extend to the first partial write done here
(previously UNDEF was used only for SIMD > 8).
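
A deliberately simplified model of why UNDEF matters, assuming a conservative liveness analysis in which only a full definition starts a live range and partial writes do not. `ToyOp` and `toy_live_start` are hypothetical names for this sketch and do not correspond to the actual liveness code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One register, one straight-line program. Toy model only.
enum class ToyOp { Other, Undef, PartialWrite, Read };

// Returns the index at which the register becomes live, or -1 if it is
// never read. Assumption of this sketch: walking backwards from the last
// read, only a full definition (modelled by Undef) stops the walk.
int toy_live_start(const std::vector<ToyOp> &prog)
{
    int last_read = -1;
    for (std::size_t i = 0; i < prog.size(); ++i)
        if (prog[i] == ToyOp::Read)
            last_read = static_cast<int>(i);
    if (last_read < 0)
        return -1; // never read, so never live in this model

    for (int i = last_read; i >= 0; --i)
        if (prog[i] == ToyOp::Undef)
            return i; // full definition found: liveness starts here
    return 0; // only partial writes: conservatively live from the start
}

// Usage sketch: the same sequence without and with a leading UNDEF.
void toy_live_start_demo()
{
    using O = ToyOp;
    std::vector<ToyOp> without = {O::Other, O::Other,
                                  O::PartialWrite, O::PartialWrite, O::Read};
    std::vector<ToyOp> with = {O::Other, O::Other, O::Undef,
                               O::PartialWrite, O::PartialWrite, O::Read};
    assert(toy_live_start(without) == 0); // live from the top of the shader
    assert(toy_live_start(with) == 2);    // live only from the UNDEF onwards
}
```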

Note that this option was considered in the past (6132992cdb) but
dismissed at the time.  The
difference now is that the lowering of the virtual instruction happens
earlier than the scheduling.

The motivation for this change is to allow passes other than the NIR
conversion to use this value.  The alternative of storing a `brw_reg` in
the shader (instead of NIR state) gets complicated by passes like
compact_vgrfs, that move VGRFs around (and update the instructions).
That pass, and possibly others, would then have to keep the stored brw_reg up to date.
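
To make the complication concrete, here is a toy renumbering pass in the spirit of compact_vgrfs. `ToyVgrfInst` and `toy_compact_vgrfs` are hypothetical and heavily simplified; the point is only that a register number cached outside the instruction stream goes stale unless the pass also knows to remap it.

```cpp
#include <cassert>
#include <vector>

// Toy instruction: one destination and one source VGRF number
// (-1 means "unused"). Not Mesa's data structures.
struct ToyVgrfInst {
    int dst;
    int src;
};

// Renumber the VGRFs referenced by `insts` into a dense 0..n-1 range,
// patch the instructions in place, and return the old -> new table
// (-1 for VGRFs that were unused and therefore dropped).
std::vector<int> toy_compact_vgrfs(std::vector<ToyVgrfInst> &insts, int num_vgrfs)
{
    std::vector<bool> used(num_vgrfs, false);
    auto mark = [&](int reg) {
        if (reg < 0)
            return;
        assert(reg < num_vgrfs);
        used[reg] = true;
    };
    for (const ToyVgrfInst &inst : insts) {
        mark(inst.dst);
        mark(inst.src);
    }

    // Assign new numbers in ascending order of the old ones.
    std::vector<int> remap(num_vgrfs, -1);
    int next = 0;
    for (int i = 0; i < num_vgrfs; ++i)
        if (used[i])
            remap[i] = next++;

    // Only the instruction stream is updated here.
    for (ToyVgrfInst &inst : insts) {
        if (inst.dst >= 0)
            inst.dst = remap[inst.dst];
        if (inst.src >= 0)
            inst.src = remap[inst.src];
    }
    return remap;
}
```

If a shader-level field had cached VGRF 5 before this ran on instructions using VGRFs 1, 5 and 7, the instructions would now refer to that register as 1 while the cached copy would still say 5. Emitting the virtual instruction on demand avoids keeping any such copy.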

Fossil-db numbers, TGL

```
*** Shaders only in 'after' results are ignored:
steam-native/shadow_of_the_tomb_raider/c683ea5067ee157d/fs.32/0, steam-native/shadow_of_the_tomb_raider/f4df450c3cef40b4/fs.32/0, steam-native/shadow_of_the_tomb_raider/94b708fb8e3d9597/fs.32/0, steam-native/shadow_of_the_tomb_raider/19d44c328edabd30/fs.32/0, steam-native/shadow_of_the_tomb_raider/8a7dcbd5a74a19bf/fs.32/0, and 366 more
from 4 apps: steam-dxvk/alan_wake, steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-native/shadow_of_the_tomb_raider

*** Shaders only in 'before' results are ignored:
steam-dxvk/octopath_traveler/aaa3d10acb726906/fs.32/0, steam-dxvk/batman_arkham_origins/e6872ae23569c35f/fs.32/0, steam-dxvk/octopath_traveler/fd33a99fa5c271a8/fs.32/0, steam-dxvk/octopath_traveler/9a077cdc16f24520/fs.32/0, steam-dxvk/batman_arkham_city_goty/fac7b438ad52f622/fs.32/0, and 12 more
from 4 apps: steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-dxvk/octopath_traveler, steam-native/shadow_of_the_tomb_raider

Totals:
Instrs: 149752381 -> 149751337 (-0.00%); split: -0.00%, +0.00%
Cycle count: 11553609349 -> 11549970294 (-0.03%); split: -0.06%, +0.03%
Spill count: 42763 -> 42764 (+0.00%); split: -0.01%, +0.01%
Fill count: 75650 -> 75651 (+0.00%); split: -0.00%, +0.01%
Max live registers: 31725096 -> 31671792 (-0.17%)
Max dispatch width: 5546008 -> 5551672 (+0.10%); split: +0.11%, -0.00%

Totals from 52574 (8.34% of 630441) affected shaders:
Instrs: 9535159 -> 9534115 (-0.01%); split: -0.03%, +0.02%
Cycle count: 1006627109 -> 1002988054 (-0.36%); split: -0.65%, +0.29%
Spill count: 11588 -> 11589 (+0.01%); split: -0.03%, +0.03%
Fill count: 21057 -> 21058 (+0.00%); split: -0.01%, +0.02%
Max live registers: 1992493 -> 1939189 (-2.68%)
Max dispatch width: 559696 -> 565360 (+1.01%); split: +1.06%, -0.05%
```

and DG2

```
*** Shaders only in 'after' results are ignored:
steam-native/shadow_of_the_tomb_raider/1f95a9d3db21df85/fs.32/0, steam-native/shadow_of_the_tomb_raider/56b87c4a46613a2a/fs.32/0, steam-native/shadow_of_the_tomb_raider/a74b4137f85dbbd3/fs.32/0, steam-native/shadow_of_the_tomb_raider/e07e38d3f48e8402/fs.32/0, steam-native/shadow_of_the_tomb_raider/206336789c48996c/fs.32/0, and 268 more
from 4 apps: steam-dxvk/alan_wake, steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-native/shadow_of_the_tomb_raider

*** Shaders only in 'before' results are ignored:
steam-native/shadow_of_the_tomb_raider/0420d7c3a2ea99ec/fs.32/0, steam-native/shadow_of_the_tomb_raider/2ff39f8bf7d24abb/fs.32/0, steam-native/shadow_of_the_tomb_raider/92d7be2824bd9659/fs.32/0, steam-native/shadow_of_the_tomb_raider/f09ca6d2ecf18015/fs.32/0, steam-native/shadow_of_the_tomb_raider/490f8ffd59e52949/fs.32/0, and 205 more
from 3 apps: steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-native/shadow_of_the_tomb_raider

Totals:
Instrs: 151597619 -> 151599914 (+0.00%); split: -0.00%, +0.00%
Subgroup size: 7699776 -> 7699784 (+0.00%)
Cycle count: 12738501989 -> 12739841170 (+0.01%); split: -0.01%, +0.02%
Spill count: 61283 -> 61274 (-0.01%)
Fill count: 119886 -> 119849 (-0.03%)
Max live registers: 31810432 -> 31758920 (-0.16%)
Max dispatch width: 5540128 -> 5541136 (+0.02%); split: +0.08%, -0.06%

Totals from 49286 (7.81% of 631231) affected shaders:
Instrs: 8607753 -> 8610048 (+0.03%); split: -0.01%, +0.04%
Subgroup size: 857752 -> 857760 (+0.00%)
Cycle count: 305939495 -> 307278676 (+0.44%); split: -0.28%, +0.72%
Spill count: 6339 -> 6330 (-0.14%)
Fill count: 12571 -> 12534 (-0.29%)
Max live registers: 1788346 -> 1736834 (-2.88%)
Max dispatch width: 510920 -> 511928 (+0.20%); split: +0.85%, -0.66%
```

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30489>
2024-08-08 18:20:49 +00:00
..
elk nir/opt_uniform_atomics: add fs atomics predicated? flag 2024-08-06 11:48:17 -04:00
tests intel/brw: Remove assembler tests for Gfx8- 2024-02-24 02:10:56 +00:00
brw_asm.c intel/brw: Split off assembler logic into library 2024-07-12 19:34:23 +00:00
brw_asm.h intel/brw: Split off assembler logic into library 2024-07-12 19:34:23 +00:00
brw_asm_internal.h intel/brw: Split off assembler logic into library 2024-07-12 19:34:23 +00:00
brw_asm_tool.c intel/brw: Split off assembler logic into library 2024-07-12 19:34:23 +00:00
brw_cfg.cpp intel/brw: Move calculate_cfg out of fs_visitor 2024-07-25 15:37:13 +00:00
brw_cfg.h intel/brw: Add a idom_tree::dominates(a, b) helper. 2024-06-08 02:18:56 -07:00
brw_compile_bs.cpp intel/brw: Move calculate_cfg out of fs_visitor 2024-07-25 15:37:13 +00:00
brw_compile_cs.cpp intel/brw: Move calculate_cfg out of fs_visitor 2024-07-25 15:37:13 +00:00
brw_compile_fs.cpp intel/brw: Move calculate_cfg out of fs_visitor 2024-07-25 15:37:13 +00:00
brw_compile_gs.cpp intel/brw: Move calculate_cfg out of fs_visitor 2024-07-25 15:37:13 +00:00
brw_compile_mesh.cpp intel/brw: Move calculate_cfg out of fs_visitor 2024-07-25 15:37:13 +00:00
brw_compile_tcs.cpp intel/brw: Move calculate_cfg out of fs_visitor 2024-07-25 15:37:13 +00:00
brw_compile_tes.cpp intel/brw: Move calculate_cfg out of fs_visitor 2024-07-25 15:37:13 +00:00
brw_compile_vs.cpp intel/brw: Move calculate_cfg out of fs_visitor 2024-07-25 15:37:13 +00:00
brw_compiler.c intel: Let compiler set indirect_ubos_use_sampler 2024-07-31 19:26:20 +00:00
brw_compiler.h intel/brw: Fix undefined shift by 64 of uint64_t in brw_compute_first_urb_slot_required 2024-07-26 17:17:15 -07:00
brw_debug_recompile.c intel/brw: Simplify @file annotations 2024-07-22 22:48:03 +00:00
brw_def_analysis.cpp intel/brw: Track the number of uses of each def in def_analysis 2024-06-18 09:02:25 +00:00
brw_device_sha1_gen_c.py intel/compiler: drop unused ray-tracing fields from cache hash 2024-03-22 00:01:28 +00:00
brw_disasm.c intel/disasm: Fix cache load/store disassembly for URB messages 2024-05-09 19:45:18 +00:00
brw_disasm.h intel/compiler: Merge intel_disasm.[ch] into corresponding brw files 2024-02-15 09:26:46 +00:00
brw_disasm_info.cpp intel/brw: Use fs_inst in disasm_annotate() 2024-02-29 21:14:13 -08:00
brw_disasm_info.h intel/brw: Use fs_inst in disasm_annotate() 2024-02-29 21:14:13 -08:00
brw_disasm_tool.c intel/brw: Remove Gfx8- code from disassembler 2024-02-28 05:45:38 +00:00
brw_eu.c intel/brw: Delete SAD2 and SADA2 opcodes 2024-06-10 16:47:50 -07:00
brw_eu.h intel/nir: add reloc delta to load_reloc_const_intel intrinsic 2024-05-15 13:13:38 +00:00
brw_eu_compact.c intel/brw: Fix undefined left shift of negative value in update_uip_jip 2024-07-26 17:17:53 -07:00
brw_eu_defines.h intel/brw: Make gl_SubgroupInvocation lane index loading SSA 2024-06-18 09:02:25 +00:00
brw_eu_emit.c intel/fs/gfx20+: Fix surface state address on extended descriptors for NIR scratch intrinsics. 2024-06-21 01:49:43 +00:00
brw_eu_validate.c intel/brw/validate: Convert access mask to be grf based 2024-08-02 22:18:51 +00:00
brw_fs.cpp intel/brw: Only force g0's liveness to be the whole program if spilling 2024-08-01 16:37:34 -07:00
brw_fs.h intel/brw: Replace predicated break optimization with a simple peephole 2024-08-05 19:17:55 -07:00
brw_fs_bank_conflicts.cpp intel/brw: Simplify @file annotations 2024-07-22 22:48:03 +00:00
brw_fs_builder.h intel/brw: Use CSE for LOAD_SUBGROUP_INVOCATION 2024-08-08 18:20:49 +00:00
brw_fs_cmod_propagation.cpp intel/brw: Simplify @file annotations 2024-07-22 22:48:03 +00:00
brw_fs_combine_constants.cpp intel/brw: Move calculate_cfg out of fs_visitor 2024-07-25 15:37:13 +00:00
brw_fs_copy_propagation.cpp intel/brw: Simplify @file annotations 2024-07-22 22:48:03 +00:00
brw_fs_cse.cpp intel/brw: Use CSE for LOAD_SUBGROUP_INVOCATION 2024-08-08 18:20:49 +00:00
brw_fs_dead_code_eliminate.cpp intel/brw: Simplify @file annotations 2024-07-22 22:48:03 +00:00
brw_fs_generator.cpp intel/brw: Record that SHADER_OPCODE_SCRATCH_HEADER uses g0 2024-08-01 16:37:31 -07:00
brw_fs_live_variables.cpp intel/brw: Simplify @file annotations 2024-07-22 22:48:03 +00:00
brw_fs_live_variables.h intel/brw: Replace uses of fs_reg with brw_reg 2024-07-03 02:53:19 +00:00
brw_fs_lower.cpp intel/brw: Use CSE for LOAD_SUBGROUP_INVOCATION 2024-08-08 18:20:49 +00:00
brw_fs_lower_dpas.cpp intel/brw: Replace uses of fs_reg with brw_reg 2024-07-03 02:53:19 +00:00
brw_fs_lower_integer_multiplication.cpp intel/brw: Replace uses of fs_reg with brw_reg 2024-07-03 02:53:19 +00:00
brw_fs_lower_pack.cpp intel/brw: Replace uses of fs_reg with brw_reg 2024-07-03 02:53:19 +00:00
brw_fs_lower_regioning.cpp intel/brw: Disallow scalar byte to float conversions on DG2+ 2024-07-18 18:51:35 +00:00
brw_fs_lower_simd_width.cpp intel/brw: Replace uses of fs_reg with brw_reg 2024-07-03 02:53:19 +00:00
brw_fs_nir.cpp intel/brw: Use CSE for LOAD_SUBGROUP_INVOCATION 2024-08-08 18:20:49 +00:00
brw_fs_opt.cpp intel/brw: Replace predicated break optimization with a simple peephole 2024-08-05 19:17:55 -07:00
brw_fs_opt_algebraic.cpp intel/brw: Rename fs_reg_* helpers to brw_reg_* 2024-07-03 02:53:19 +00:00
brw_fs_opt_virtual_grfs.cpp intel/brw: allocate large table in the heap instead of the stack 2024-07-03 12:10:28 +00:00
brw_fs_reg_allocate.cpp intel/brw: Only force g0's liveness to be the whole program if spilling 2024-08-01 16:37:34 -07:00
brw_fs_register_coalesce.cpp intel/brw: Simplify @file annotations 2024-07-22 22:48:03 +00:00
brw_fs_saturate_propagation.cpp intel/brw: Simplify @file annotations 2024-07-22 22:48:03 +00:00
brw_fs_scoreboard.cpp intel/brw: Simplify @file annotations 2024-07-22 22:48:03 +00:00
brw_fs_thread_payload.cpp intel/brw: Replace uses of fs_reg with brw_reg 2024-07-03 02:53:19 +00:00
brw_fs_validate.cpp intel/brw: Move out of fs_visitor and rename print instructions 2024-07-25 15:37:13 +00:00
brw_fs_visitor.cpp intel/brw: Move interp_reg and per_primitive_reg out of fs_visitor 2024-07-25 15:37:13 +00:00
brw_fs_workaround.cpp intel/brw: Replace uses of fs_reg with brw_reg 2024-07-03 02:53:19 +00:00
brw_gram.y intel/brw: Split off assembler logic into library 2024-07-12 19:34:23 +00:00
brw_inst.h intel/brw: Simplify @file annotations 2024-07-22 22:48:03 +00:00
brw_ir.h intel/brw: Fold backend_reg into fs_reg 2024-03-01 17:52:09 +00:00
brw_ir_allocator.h
brw_ir_analysis.h
brw_ir_fs.h intel/brw: Move brw_reg helpers into brw_reg.h 2024-07-03 02:53:19 +00:00
brw_ir_performance.cpp intel/brw: Replace uses of fs_reg with brw_reg 2024-07-03 02:53:19 +00:00
brw_ir_performance.h intel/brw: Fold backend_shader into fs_visitor 2024-02-29 19:28:05 +00:00
brw_isa_info.h
brw_kernel.c intel-clc: missing printf lowering 2024-08-06 17:55:18 +00:00
brw_kernel.h intel-clc: Use correct set of nir_options when building for Gfx8 2024-02-24 00:24:32 +00:00
brw_lex.l intel/brw: Split off assembler logic into library 2024-07-12 19:34:23 +00:00
brw_lower_logical_sends.cpp intel/brw: Record that SHADER_OPCODE_SCRATCH_HEADER uses g0 2024-08-01 16:37:31 -07:00
brw_nir.c nir/opt_uniform_atomics: add fs atomics predicated? flag 2024-08-06 11:48:17 -04:00
brw_nir.h intel/nir: add printf lowering 2024-05-15 13:13:38 +00:00
brw_nir_analyze_ubo_ranges.c intel/brw/xe2: Update brw_nir_analyze_ubo_ranges to account for 512b physical registers 2024-04-01 00:00:03 +00:00
brw_nir_lower_alpha_to_coverage.c treewide: use nir_metadata_control_flow 2024-06-17 16:28:14 -04:00
brw_nir_lower_cooperative_matrix.c intel/brw/xe2+: Allow vec16 for cooperative matrix 2024-06-25 14:17:47 -07:00
brw_nir_lower_cs_intrinsics.c intel/brw: Reorganize lowering of LocalID/Index to handle Mesh/Task 2024-06-28 16:30:38 +00:00
brw_nir_lower_fsign.py intel/brw: Use range analysis to optimize fsign 2024-05-14 01:28:21 +00:00
brw_nir_lower_intersection_shader.c intel/rt: fix terminateOnFirstHit handling 2024-08-05 21:43:36 +00:00
brw_nir_lower_ray_queries.c intel/nir: only consider ray query variables in lowering 2024-02-24 12:56:30 +00:00
brw_nir_lower_rt_intrinsics.c treewide: use nir_def_replace sometimes 2024-06-21 15:36:56 +00:00
brw_nir_lower_shader_calls.c treewide: use nir_metadata_control_flow 2024-06-17 16:28:14 -04:00
brw_nir_lower_storage_image.c intel/brw: Remove Gfx8- code from lower storage image pass 2024-02-28 05:45:38 +00:00
brw_nir_rt.c treewide: use nir_def_replace sometimes 2024-06-21 15:36:56 +00:00
brw_nir_rt.h
brw_nir_rt_builder.h
brw_nir_trig_workarounds.py
brw_packed_float.c
brw_prim.h
brw_print.cpp intel/brw: Move out of fs_visitor and rename print instructions 2024-07-25 15:37:13 +00:00
brw_private.h intel/brw: fix subgroup size of geometry stages for lnl+ 2024-05-14 23:13:37 +00:00
brw_reg.h intel/brw: Fix undefined left shift of large UW value in brw_imm_uw 2024-07-26 17:17:56 -07:00
brw_reg_type.c intel/brw: Rename brw_reg_type_to_hw_type to brw_type_encode 2024-04-25 11:41:48 +00:00
brw_reg_type.h intel/brw: Make a helper for finding the largest of two types 2024-04-29 07:51:45 +00:00
brw_rt.h
brw_schedule_instructions.cpp intel/brw: Only force g0's liveness to be the whole program if spilling 2024-08-01 16:37:34 -07:00
brw_shader.cpp intel/brw: Move remaining compile stages to their own files 2024-07-25 15:37:13 +00:00
brw_simd_selection.cpp intel/brw: fix subgroup size of geometry stages for lnl+ 2024-05-14 23:13:37 +00:00
brw_vue_map.c intel/brw: Simplify @file annotations 2024-07-22 22:48:03 +00:00
intel_clc.c intel/clc: Free disk_cache 2024-07-24 20:46:28 +00:00
intel_gfx_ver_enum.h intel/compiler: Rename brw_gfx_ver_enum.h to intel_gfx_ver_enum.h 2024-02-16 22:35:05 +00:00
intel_nir.c intel/compiler: Rename the passes and files related to intel_nir.h 2024-02-16 22:35:05 +00:00
intel_nir.h intel/nir: add printf lowering 2024-05-15 13:13:38 +00:00
intel_nir_blockify_uniform_loads.c brw: blockify load_global_const_block_intel 2024-06-21 08:29:44 +00:00
intel_nir_clamp_image_1d_2d_array_sizes.c treewide: use nir_metadata_control_flow 2024-06-17 16:28:14 -04:00
intel_nir_clamp_per_vertex_loads.c treewide: use nir_metadata_control_flow 2024-06-17 16:28:14 -04:00
intel_nir_lower_conversions.c intel/nir: Don't needlessly split u2f16 for nir_type_uint32 2024-07-11 02:37:05 -07:00
intel_nir_lower_non_uniform_barycentric_at_sample.c intel/compiler: Ensure load_barycentric_at_sample and load_interpolated_input remain together 2024-04-04 23:42:27 +00:00
intel_nir_lower_non_uniform_resource_intel.c treewide: use nir_metadata_control_flow 2024-06-17 16:28:14 -04:00
intel_nir_lower_printf.c treewide: use nir_metadata_control_flow 2024-06-17 16:28:14 -04:00
intel_nir_lower_shading_rate_output.c treewide: use nir_metadata_control_flow 2024-06-17 16:28:14 -04:00
intel_nir_lower_sparse.c treewide: use nir_metadata_control_flow 2024-06-17 16:28:14 -04:00
intel_nir_lower_texture.c intel/compiler: Pack texture LOD and offset to a single 32-bit value 2024-02-27 00:22:46 +00:00
intel_nir_opt_peephole_ffma.c treewide: use nir_metadata_control_flow 2024-06-17 16:28:14 -04:00
intel_nir_opt_peephole_imul32x16.c treewide: use nir_metadata_control_flow 2024-06-17 16:28:14 -04:00
intel_nir_tcs_workarounds.c intel/nir: Set src_type on TCS quads workaround store_output 2024-05-02 13:58:21 -07:00
intel_shader_enums.h intel/compiler: Use "intel" prefix for walk_order enum 2024-02-21 00:38:35 +00:00
meson.build intel/brw: Replace predicated break optimization with a simple peephole 2024-08-05 19:17:55 -07:00
test_eu_compact.cpp intel/brw: Stop using long BRW_REGISTER_TYPE enum names 2024-04-25 11:41:48 +00:00
test_eu_validate.cpp intel/brw: Rename brw_reg_type_to_hw_type to brw_type_encode 2024-04-25 11:41:48 +00:00
test_fs_cmod_propagation.cpp intel/brw: Move calculate_cfg out of fs_visitor 2024-07-25 15:37:13 +00:00
test_fs_combine_constants.cpp intel/brw: Move calculate_cfg out of fs_visitor 2024-07-25 15:37:13 +00:00
test_fs_copy_propagation.cpp intel/brw: Move calculate_cfg out of fs_visitor 2024-07-25 15:37:13 +00:00
test_fs_cse.cpp intel/brw: Move calculate_cfg out of fs_visitor 2024-07-25 15:37:13 +00:00
test_fs_saturate_propagation.cpp intel/brw: Move calculate_cfg out of fs_visitor 2024-07-25 15:37:13 +00:00
test_fs_scoreboard.cpp intel/brw: Move calculate_cfg out of fs_visitor 2024-07-25 15:37:13 +00:00
test_simd_selection.cpp intel: Remove brw_ prefix from process debug function 2024-02-16 22:35:05 +00:00
test_vf_float_conversions.cpp