mirror of
https://gitlab.freedesktop.org/mesa/mesa.git
synced 2026-05-05 11:48:06 +02:00
brw: Tune vectorizer conditions to allow overfetching with holes
Notably, our convergent block loads were already overfetching - we rounded up to block sizes of 8, 16, 32, or 64 (LSC-only). But we did so in the backend, rather than NIR.

With recent changes, nir_opt_load_store_vectorizer allows holes of up to 28 bytes (7 components at 4 bytes each). This allows us to detect cases where we did a convergent block load for 1 component (but loaded a whole vec8), then another load for the next vec8, and combine them into a single V16 load. Single component loads aren't the most common, but convergent loads of a vec2 in one group and a vec3 in another are quite common, and it makes no sense to do V8+V8 loads instead of V16.

For non-block loads, we allow a max hole of 4 bytes. This allows the common case of XYZ_ + XYZ_ loads (where the last component is unread) to combine into a single larger load.

fossil-db results on Lunarlake:

   Totals:
   Instrs: 146692608 -> 146246432 (-0.30%); split: -0.33%, +0.02%
   Subgroup size: 11100528 -> 11100512 (-0.00%)
   Send messages: 7003425 -> 6862529 (-2.01%); split: -2.01%, +0.00%
   Cycle count: 22396273274 -> 22523048654 (+0.57%); split: -1.08%, +1.64%
   Spill count: 67671 -> 67594 (-0.11%); split: -1.59%, +1.48%
   Fill count: 128999 -> 130223 (+0.95%); split: -1.73%, +2.68%
   Scratch Memory Size: 5986304 -> 6042624 (+0.94%); split: -1.40%, +2.34%
   Max live registers: 48898858 -> 48881655 (-0.04%); split: -0.05%, +0.01%
   Non SSA regs after NIR: 172397792 -> 167577380 (-2.80%); split: -2.80%, +0.00%

   Totals from 451003 (80.87% of 557667) affected shaders:
   Instrs: 134111754 -> 133665578 (-0.33%); split: -0.36%, +0.03%
   Subgroup size: 9039104 -> 9039088 (-0.00%)
   Send messages: 6127775 -> 5986879 (-2.30%); split: -2.30%, +0.00%
   Cycle count: 20306336726 -> 20433112106 (+0.62%); split: -1.19%, +1.81%
   Spill count: 56230 -> 56153 (-0.14%); split: -1.92%, +1.78%
   Fill count: 112920 -> 114144 (+1.08%); split: -1.97%, +3.06%
   Scratch Memory Size: 3769344 -> 3825664 (+1.49%); split: -2.23%, +3.72%
   Max live registers: 43750259 -> 43733056 (-0.04%); split: -0.05%, +0.01%
   Non SSA regs after NIR: 158449343 -> 153628931 (-3.04%); split: -3.04%, +0.00%

In particular, sends get cut by 20.85% for Borderlands 3 DX12, 13.82% on Cyberpunk 2077, 10.75% on Strange Brigade, and 10.20% on Red Dead Redemption 2. Yet, spill/fills remain about the same.

fossil-db results on Alchemist are similar though not quite as good.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>
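The backend rounding that the commit message describes can be pictured with a small standalone C sketch. This is not Mesa code: `round_up_block_components` and the `has_lsc` flag are hypothetical names, and the size list comes only from the message (8, 16, 32, with 64 read here as requiring LSC hardware).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of Mesa: round a component count up to the
 * next block size named in the commit message. The 64-component size is
 * assumed to be available only when has_lsc is true.
 * Returns 0 when the request fits no listed block size. */
static unsigned
round_up_block_components(unsigned num_components, bool has_lsc)
{
   static const unsigned sizes[] = { 8, 16, 32, 64 };
   const unsigned count = has_lsc ? 4 : 3;

   assert(num_components > 0);

   for (unsigned i = 0; i < count; i++) {
      if (num_components <= sizes[i])
         return sizes[i];
   }

   return 0;
}
```

Under these assumptions a convergent vec2 and a convergent vec3 each round up to 8 components on their own, which is the V8+V8 pattern the message mentions. A single merged load spanning 11 components (2 read, 6 skipped, 3 read) would instead round up once, to 16.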
parent f88eb48ff2
commit 6fd10a6620
1 changed file with 7 additions and 4 deletions
@@ -1421,7 +1421,7 @@ brw_nir_should_vectorize_mem(unsigned align_mul, unsigned align_offset,
     * those back into 32-bit ones anyway and UBO loads aren't split in NIR so
     * we don't want to make a mess for the back-end.
     */
-   if (bit_size > 32 || hole_size || !nir_num_components_valid(num_components))
+   if (bit_size > 32)
       return false;
 
    if (low->intrinsic == nir_intrinsic_load_ubo_uniform_block_intel ||
@@ -1429,14 +1429,14 @@ brw_nir_should_vectorize_mem(unsigned align_mul, unsigned align_offset,
        low->intrinsic == nir_intrinsic_load_shared_uniform_block_intel ||
        low->intrinsic == nir_intrinsic_load_global_constant_uniform_block_intel) {
       if (num_components > 4) {
-         if (!util_is_power_of_two_nonzero(num_components))
-            return false;
-
          if (bit_size != 32)
             return false;
 
          if (num_components > 32)
             return false;
+
+         if (hole_size > 4 * (8 - low->num_components))
+            return false;
       }
    } else {
       /* We can handle at most a vec4 right now. Anything bigger would get
@@ -1444,6 +1444,9 @@ brw_nir_should_vectorize_mem(unsigned align_mul, unsigned align_offset,
        */
       if (num_components > 4)
          return false;
+
+      if (hole_size > 4)
+         return false;
    }
 
 
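The two hole limits in the diff can be checked in isolation with a standalone C sketch. It models only the `hole_size` conditions; the bit-size, component-count, and any alignment checks in the real function are left out. `hole_is_acceptable` and its parameters are hypothetical stand-ins for values the real callback reads from its arguments (`low_components` plays the role of `low->num_components`, and `is_uniform_block_load` stands for the `*_uniform_block_intel` intrinsic test).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the hole checks only, not the Mesa function.
 * hole_size is in bytes, as in the diff. */
static bool
hole_is_acceptable(bool is_uniform_block_load, unsigned num_components,
                   unsigned low_components, unsigned hole_size)
{
   assert(low_components > 0);

   if (is_uniform_block_load) {
      /* In the diff the hole limit sits inside the num_components > 4
       * branch, so smaller block loads are not restricted by it here. */
      if (num_components > 4 && hole_size > 4 * (8 - low_components))
         return false;
      return true;
   }

   /* Non-block loads: at most one 4-byte hole. */
   return hole_size <= 4;
}
```

With `low_components` set to 1 the block-load limit is 4 * 7 = 28 bytes, matching the figure in the commit message. A vec2 followed by a vec3 that starts 32 bytes later leaves a 24-byte hole, which is exactly the limit for a two-component low load (4 * 6), so the sketch accepts it.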