mesa/src
Francisco Jerez 4d73988f6f intel/ir/gen12+: Work around FS performance regressions due to SIMD32 discard divergence.
This avoids some performance regressions on Gen12 platforms caused by
SIMD32 fragment shaders reported in titles like Dota2, TF2, Xonotic,
and GFXBench5 Car Chase and Aztec Ruins.

The most obvious pattern in the regressing shaders I identified among
these workloads is that they all had non-uniform discard statements,
which are handled rather optimistically by the current IR analysis
pass: unlike other control flow instructions, discards currently carry
no penalty on the SIMD32 variant of the shader in the form of
differing branching weights to account for the greater likelihood of
divergence of a SIMD32 shader.

Simply changing that by giving the same treatment to discard
statements as we give to other branching instructions seemed to hurt
more than it helped on platforms earlier than Gen12, since it reversed
most of the improvement obtained from SIMD32 fragment shaders in
Manhattan for no measurable benefit in other workloads (Manhattan has
a handful of shaders with statically non-uniform discard statements
which actually perform better in SIMD32 mode due to their approximate
dynamic uniformity).  For that reason this change is applied to Gen12+
platforms only.
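
As a rough sketch of the gating described above (this is not the code
from the patch: the function name and the penalty value are
hypothetical, chosen only to illustrate the idea), the discard weight
could be made to depend on both the hardware generation and the
dispatch width, so that only SIMD32 variants on Gen12+ are penalized:

```cpp
// Illustrative sketch only. `discard_penalty` and the 2.0 factor are
// assumptions made for this example, not values taken from Mesa.
//
// Returns a multiplier applied to the estimated cost of code that
// executes after a non-uniform discard.
double discard_penalty(int gen, unsigned dispatch_width)
{
    // Assumed penalty for the higher divergence risk of wide dispatch.
    const double simd32_penalty = 2.0;

    // Pre-Gen12 platforms keep the optimistic treatment, since
    // penalizing discards there reversed most of the SIMD32 gains in
    // Manhattan.
    if (gen < 12)
        return 1.0;

    // On Gen12+ only the SIMD32 variant is penalized.
    return dispatch_width > 16 ? simd32_penalty : 1.0;
}
```

With a weight like this, a SIMD32 variant on Gen12+ has to beat the
SIMD16 variant by more than the penalty factor before the analysis
pass would prefer it, while older platforms are left unaffected.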

I've been running a number of tests trying to understand the
difference in behavior between Gen12 and earlier platforms, and most
of the evidence I've gathered seems to point at EU fusion being the
culprit: Unlike previous generations, on Gen12 EUs are arranged in
pairs which execute instructions in lockstep, giving an effective warp
size of 64 threads in SIMD32 mode, which seems to significantly
increase the likelihood of control flow divergence in some of the
affected shaders.
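
To give a feel for why the effective width matters, here is a toy
model (an assumption for illustration, not a measurement): suppose
every channel discards independently with the same probability p.
Real discards are spatially correlated, so this overstates divergence,
but it shows the trend. A group of N channels diverges unless all of
them discard or none of them do:

```cpp
#include <cmath>

// Toy model: probability that a group of `lanes` channels diverges at
// a discard, assuming each channel discards independently with
// probability `p`. The independence assumption and the function name
// are hypothetical and only serve this example.
double divergence_probability(double p, unsigned lanes)
{
    const double all_discard = std::pow(p, lanes);
    const double none_discard = std::pow(1.0 - p, lanes);
    return 1.0 - all_discard - none_discard;
}
```

Under this model, for any p strictly between 0 and 1,
divergence_probability(p, 64) is larger than
divergence_probability(p, 32), which is consistent with two fused EUs
running in lockstep being more exposed to divergence than a single
SIMD32 thread.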

Fixes: 188a3659ae "intel/ir: Import shader performance analysis pass."
Reported-by: Caleb Callaway <caleb.callaway@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5910>
2020-07-23 01:40:06 +00:00
amd radeonsi: enable preemption if the kernel enabled it 2020-07-22 12:08:33 -04:00
broadcom nir: Add a face_sysval argument to nir_lower_two_sided_color 2020-07-17 14:50:26 +00:00
compiler nir/lower_io: Add support for global scratch addressing 2020-07-22 23:43:35 +00:00
drm-shim meson: use gnu_symbol_visibility argument 2020-06-01 18:59:18 +00:00
egl egl/dri2: try to bind old context if bindContext failed 2020-07-21 18:42:03 +00:00
etnaviv etnaviv: replace all dup() with os_dupfd_cloexec() 2020-06-18 02:09:56 +00:00
freedreno turnip: disable tiling for NV12/IYUV formats 2020-07-21 20:08:07 +00:00
gallium softpipe: Enable PIPE_CAP_TGSI_ANY_REG_AS_ADDRESS; 2020-07-23 00:24:26 +00:00
gbm gbm: document that gbm_bo_map exposes a linear view 2020-06-03 10:09:52 +00:00
getopt
glx glx: Fix build and warnings with -Dglx=dri -Dglx-direct=false 2020-07-23 01:23:12 +00:00
gtest gtest: Update to 1.10.0 2020-04-20 11:57:11 +00:00
hgl scons: Prune out unnecessary targets. 2020-03-30 13:38:01 +00:00
imgui meson: drop intel_ prefix on imgui_core 2019-12-10 15:16:02 +00:00
intel intel/ir/gen12+: Work around FS performance regressions due to SIMD32 discard divergence. 2020-07-23 01:40:06 +00:00
loader Revert "loader/dri3: Check for window destruction in dri3_wait_for_event_locked" 2020-07-03 09:55:50 +00:00
mapi glx: Fix build and warnings with -Dglx=dri -Dglx-direct=false 2020-07-23 01:23:12 +00:00
mesa mesa/program: fix shadow property for samplers 2020-07-22 12:51:51 +00:00
panfrost pan/mdg: Use the blend RT for blend shader framebuffer fetches 2020-07-20 14:15:49 +00:00
util driconf: allowlist/denylist 2020-07-16 21:56:08 +00:00
vulkan meson: Add mising git_sha1.h dependency. 2020-07-22 00:02:26 +00:00
meson.build meson: use gnu_symbol_visibility argument 2020-06-01 18:59:18 +00:00
SConscript driconf: drop now unused translation facility 2020-06-22 21:50:12 +00:00