Mirror of https://gitlab.freedesktop.org/mesa/mesa.git (synced 2026-05-20 19:58:19 +02:00)
When the register allocator decides to spill a value, all reads of that value are filled. This can result in cases where the same value is filled many times in a single block. In those cases, the result of an earlier fill may still be available when a later fill occurs. This optimization replaces the later fill with a move from the result of the earlier fill.

v2: Use FIXED_GRF for register overlap tests. Since this is after register allocation, the VGRF values will not tell the whole truth.

v3: Use brw_transform_inst. Suggested by Caio. Add brw_scratch_inst::offset instead of storing it as a source. Suggested by Lionel.

v4: An intervening spill to the same location also invalidates the value. 🤦

v5: Don't eliminate a fill if its destination partially overlaps the preceding fill destination. Fixes failures in cooperative matrix CTS.

shader-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)

total instructions in shared programs: 17249903 -> 17249653 (<.01%)
instructions in affected programs: 35550 -> 35300 (-0.70%)
helped: 20 / HURT: 0

total cycles in shared programs: 893092398 -> 893101836 (<.01%)
cycles in affected programs: 2501720 -> 2511158 (0.38%)
helped: 6 / HURT: 14

total fills in shared programs: 1901 -> 1776 (-6.58%)
fills in affected programs: 1757 -> 1632 (-7.11%)
helped: 20 / HURT: 0

fossil-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)

Totals:
Instrs: 929949528 -> 926770338 (-0.34%)
Cycle count: 105126671329 -> 104851299099 (-0.26%); split: -0.28%, +0.02%
Fill count: 6520785 -> 5021518 (-22.99%)

Totals from 54281 (2.69% of 2018922) affected shaders:
Instrs: 239616289 -> 236437099 (-1.33%)
Cycle count: 22051883404 -> 21776511174 (-1.25%); split: -1.33%, +0.08%
Fill count: 6406295 -> 4907028 (-23.40%)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>
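The idea in the commit message can be illustrated with a small, self-contained sketch. This is not the Mesa implementation: `Inst`, `Op`, `Avail`, `overlaps`, and `eliminate_fills` are hypothetical names invented for illustration, registers are modeled as plain integers standing in for fixed GRF numbers, and scratch offsets are assumed to be counted in register-sized slots. The sketch walks one block, remembers which registers still hold a filled value, rewrites a later fill of the same scratch range into a move, and applies the invalidation rules from v4 (an intervening spill) and v5 (partially overlapping destinations).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified model of a post-RA instruction (not Mesa's brw_inst).
enum class Op { Fill, Spill, Mov, Other };

struct Inst {
    Op op;
    int dst;     // first fixed register written, or -1 if none
    int src;     // first register read, or -1 if none
    int regs;    // number of consecutive registers (or scratch slots) touched
    int offset;  // scratch slot; meaningful for Fill and Spill only
};

// A scratch range whose contents are known to still live in registers.
struct Avail { int offset; int reg; int regs; };

// True if [a, a+an) and [b, b+bn) share at least one element.
static bool overlaps(int a, int an, int b, int bn) {
    return a < b + bn && b < a + an;
}

// Rewrites redundant fills in one block into moves; returns how many changed.
int eliminate_fills(std::vector<Inst> &block) {
    std::vector<Avail> avail;
    int replaced = 0;

    for (Inst &inst : block) {
        assert(inst.regs > 0);
        const bool is_fill = inst.op == Op::Fill;

        if (is_fill) {
            for (const Avail &a : avail) {
                if (a.offset != inst.offset || a.regs != inst.regs)
                    continue;
                // v5 rule: skip when the destinations overlap only partially.
                if (a.reg != inst.dst &&
                    overlaps(a.reg, a.regs, inst.dst, inst.regs))
                    continue;
                inst.op = Op::Mov;   // reuse the earlier fill's result
                inst.src = a.reg;
                ++replaced;
                break;
            }
        }

        // v4 rule: a spill overwrites scratch, so cached copies are stale.
        if (inst.op == Op::Spill) {
            avail.erase(std::remove_if(avail.begin(), avail.end(),
                [&](const Avail &a) {
                    return overlaps(a.offset, a.regs, inst.offset, inst.regs);
                }), avail.end());
        }

        // Any register write clobbers cached values living in those registers.
        if (inst.dst >= 0) {
            avail.erase(std::remove_if(avail.begin(), avail.end(),
                [&](const Avail &a) {
                    return overlaps(a.reg, a.regs, inst.dst, inst.regs);
                }), avail.end());
        }

        // The destination of a fill (kept or rewritten) now holds the value.
        if (is_fill)
            avail.push_back({inst.offset, inst.dst, inst.regs});
    }
    return replaced;
}
```

In this sketch a fill whose destination exactly matches the earlier fill's destination becomes a self-move; whether the real pass removes such an instruction outright is not stated in the commit message. The sketch also does not track the source register of a spill as holding the spilled value, which is a simplification.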
| Name |
|---|
| .. |
| brw |
| elk |
| brw_device_sha1_gen_c.py |
| brw_list.h |
| intel_gfx_ver_enum.h |
| intel_nir.c |
| intel_nir.h |
| intel_nir_blockify_uniform_loads.c |
| intel_nir_clamp_image_1d_2d_array_sizes.c |
| intel_nir_clamp_per_vertex_loads.c |
| intel_nir_lower_non_uniform_barycentric_at_sample.c |
| intel_nir_lower_non_uniform_resource_intel.c |
| intel_nir_lower_printf.c |
| intel_nir_lower_shading_rate_output.c |
| intel_nir_lower_sparse.c |
| intel_nir_opt_peephole_ffma.c |
| intel_nir_opt_peephole_imul32x16.c |
| intel_nir_tcs_workarounds.c |
| intel_prim.h |
| intel_shader_enums.h |
| meson.build |