Mirror of https://gitlab.freedesktop.org/mesa/mesa.git, synced 2025-12-27 08:20:12 +01:00
intel/brw/xehp+: Adjust performance model weights of LSC atomic ops.
The LSC implements several optimizations for atomic operations on memory addresses that are uniform across all lanes, in which case the cost of the operation is approximately O(1) instead of O(exec_size). Even cases where memory offsets are non-uniform but packed in a cacheline appear to have a cost that is non-linear in the number of lanes.

To model this behavior more closely, approximate the back-end cost of these operations as roughly 1300 cycles instead of the previous 400 * exec_size/8. This fixes some cases where we were incorrectly predicting that the SIMD32 shader would be bound by the throughput of LSC atomic operations, even though the observed cost per lane of the LSC operations was significantly lower in SIMD32 mode, so SIMD32 would in fact have the best performance.

Clearly this is still a rough approximation, and it might be possible to obtain a more accurate result by plumbing divergence analysis data all the way down to codegen. However, the goal of the performance analysis pass isn't to provide an exact prediction of the performance of a shader (that's not really possible in general via static analysis without solving the halting problem), but to provide a good enough approximation at a low cost. The constant approximation seems to be strictly better in practice than the approximation we were using before: there appear to be no regressions from this change, and ShadowTombRaider-trace-dx11-2160p-ultra shows 5.7% better performance on PTL with a subsequent commit that re-enables the use of the static analysis-based SIMD32 heuristic on xe3+.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36618>
parent 6eea9659db
commit 1272ff5ed1
1 changed file with 1 addition and 1 deletion
@@ -683,7 +683,7 @@ namespace {
          case LSC_OP_ATOMIC_OR:
          case LSC_OP_ATOMIC_XOR:
             return calculate_desc(info, EU_UNIT_DP_DC, 2, 0, 0,
-                                  30 /* XXX */, 400 /* XXX */,
+                                  1300 /* XXX */, 0 /* XXX */,
                                   10 /* XXX */, 100 /* XXX */, 0, 0,
                                   0, 400 /* XXX */);
          default:
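
The effect of the new weights on the SIMD width prediction can be illustrated with a small standalone sketch. It is not Mesa code: the helper names below are hypothetical, and it only restates the two back-end cost approximations quoted in the commit message (400 * exec_size/8 before, roughly 1300 cycles after), ignoring every other term passed to calculate_desc.

```cpp
#include <cstdio>

// Hypothetical helpers, not part of Mesa. They only restate the two
// back-end cost approximations described in the commit message.

// Previous approximation: cost grows linearly with the execution size.
constexpr double old_backend_cycles(unsigned exec_size)
{
   return 400.0 * exec_size / 8.0;
}

// New approximation: cost is treated as independent of the execution size.
constexpr double new_backend_cycles(unsigned /* exec_size */)
{
   return 1300.0;
}

// Cost attributed to each lane under a given total cost.
constexpr double cycles_per_lane(double cycles, unsigned exec_size)
{
   return cycles / exec_size;
}

// Print both approximations side by side for the usual SIMD widths.
void print_comparison()
{
   const unsigned sizes[] = { 8, 16, 32 };
   for (unsigned exec_size : sizes) {
      const double o = old_backend_cycles(exec_size);
      const double n = new_backend_cycles(exec_size);
      std::printf("SIMD%-2u  old: %6.1f cycles (%6.2f/lane)  "
                  "new: %6.1f cycles (%6.2f/lane)\n",
                  exec_size, o, cycles_per_lane(o, exec_size),
                  n, cycles_per_lane(n, exec_size));
   }
}
```

Calling print_comparison() from a main function shows the difference: under the old approximation the cost per lane is the same at every width, so wider dispatch never looks cheaper for atomics, whereas under the constant approximation the cost per lane drops as the width grows, which matches the commit message's observation that SIMD32 had the lowest observed cost per lane.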