mesa/src
Francisco Jerez 1272ff5ed1 intel/brw/xehp+: Adjust performance model weights of LSC atomic ops.
The LSC implements several optimizations for atomic operations on
memory addresses that are uniform across all lanes, in which case the
cost is approximately O(1) instead of O(exec_size).  Even cases where
memory offsets are non-uniform but packed in a cacheline appear to
have a cost that is non-linear with the number of lanes.

In order to model this behavior more closely, approximate the
back-end cost as roughly 1300 cycles instead of the previous 400 *
exec_size/8.  This fixes some cases where we were incorrectly
predicting that the SIMD32 shader would be bound by the throughput of
LSC atomic operations, even though the observed cost per lane of the
LSC operations was significantly lower in SIMD32 mode, so SIMD32
would in fact have had the best performance.
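
As a rough illustration of the change described above, the two cost
estimates can be compared side by side.  This is only a sketch in
Python, not the actual brw performance-model code: the function names
are hypothetical, and integer division is an assumption about how
"400 * exec_size/8" is evaluated.

```python
# Hypothetical helpers; these names do not exist in Mesa.

def old_lsc_atomic_cost(exec_size: int) -> int:
    """Previous estimate: back-end cost scales linearly with SIMD width."""
    # Assumption: integer arithmetic, as in the "400 * exec_size/8" formula.
    return 400 * exec_size // 8


def new_lsc_atomic_cost(exec_size: int) -> int:
    """New estimate: roughly constant cost, independent of SIMD width."""
    return 1300


def per_lane(cost: int, exec_size: int) -> float:
    """Estimated cost per lane, in cycles."""
    return cost / exec_size


if __name__ == "__main__":
    for exec_size in (8, 16, 32):
        old = old_lsc_atomic_cost(exec_size)
        new = new_lsc_atomic_cost(exec_size)
        print(f"SIMD{exec_size}: old={old} ({per_lane(old, exec_size)}/lane), "
              f"new={new} ({per_lane(new, exec_size)}/lane)")
```

Under the old formula the estimated cost per lane is the same at every
SIMD width (400, 800 and 1600 cycles for SIMD8, SIMD16 and SIMD32), so
widening the shader never looked cheaper for atomics.  With the
constant estimate the per-lane cost drops as the width grows, which
matches the observation that SIMD32 had the lowest cost per lane.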

Clearly this is still a rough approximation, and it might be possible
to obtain a more accurate result by plumbing divergence analysis data
all the way down to codegen.  However, the goal of the performance
analysis pass isn't to provide an exact prediction of the performance
of a shader (that's not really possible in general via static analysis
without solving the halting problem), but to provide a good enough
approximation at a low cost.  The constant approximation seems to
be strictly better in practice than the approximation we were using
before: there appear to be no regressions from this change, and
ShadowTombRaider-trace-dx11-2160p-ultra shows 5.7% better performance
on PTL with a subsequent commit that re-enables the use of the static
analysis-based SIMD32 heuristic on xe3+.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36618>
2025-09-10 02:15:56 +00:00
amd aco/tests: add barrier-to-waitcnt tests 2025-09-09 12:34:40 +00:00
android_stub
asahi nir: remove subgroup size related nir_shader_compiler_options members 2025-09-09 11:09:22 +00:00
broadcom v3dv: Fix stencil clear values for only stencil clears 2025-09-08 17:57:33 +00:00
c11
compiler nir/lower_subgroups: remove lower_fp64 option 2025-09-09 11:09:22 +00:00
drm-shim drm-shim: fix with asan 2025-09-03 11:47:00 +00:00
egl egl,glx,X11: Handle case when PlatformDisplay is EGL_DEFAULT_DISPLAY 2025-09-04 14:35:53 +00:00
etnaviv clang-format: Update the .clang-format files to conformance clang-format json-schema 2025-09-09 07:04:55 +00:00
freedreno clang-format: Move ForEachMacros into src/.clang-format for freedreno 2025-09-09 07:04:55 +00:00
gallium zink: eliminate buffer refcounting to improve performance 2025-09-09 20:47:38 +00:00
gbm egl,glx: allow OpenGL with old libx11, but disable glthread if it's unsafe 2025-08-21 02:05:26 +00:00
getopt
gfxstream gfxstream: guest: don't use transitional LFS64 API 2025-09-02 16:45:20 +00:00
glx glx/kopper: don't call glFlush from swapbuffers 2025-08-22 00:42:28 +00:00
gtest
imagination clang-format: Update the .clang-format files to conformance clang-format json-schema 2025-09-09 07:04:55 +00:00
imgui
intel intel/brw/xehp+: Adjust performance model weights of LSC atomic ops. 2025-09-10 02:15:56 +00:00
loader loader: Don't fall back to nouveau GL without zink 2025-08-26 23:36:46 +00:00
mesa gallium: add pipe_context::resource_release to eliminate buffer refcounting 2025-09-09 20:47:38 +00:00
microsoft vulkan: Drop the driver_internal from vk_image_view_init/create() 2025-09-05 23:34:14 +00:00
nouveau nvk/ci: document fixed tests 2025-09-07 22:25:58 +02:00
panfrost clang-format: Update the .clang-format files to conformance clang-format json-schema 2025-09-09 07:04:55 +00:00
tool clang-format: Update the .clang-format files to conformance clang-format json-schema 2025-09-09 07:04:55 +00:00
util util/ra: Allow driver to override class P value. 2025-09-10 02:15:55 +00:00
virtio clang-format: Update the .clang-format files to conformance clang-format json-schema 2025-09-09 07:04:55 +00:00
vulkan vulkan: remove incorrect assert 2025-09-09 13:34:05 +00:00
x11 meson: add missing x11 dependency on libloader_x11 2025-08-08 21:45:59 +00:00
.clang-format clang-format: Move ForEachMacros into src/.clang-format for freedreno 2025-09-09 07:04:55 +00:00
meson.build mesa: remove inc_mapi 2025-08-06 20:35:26 +00:00