mesa/src/intel
Francisco Jerez d4015bcb38 intel/fs: Fix copy propagation dataflow analysis in presence of force_writemask_all ACP overwrites.
This fixes the behavior of copy propagation in cases where either the
source or destination of an ACP is overwritten elsewhere in the
program by a force_writemask_all instruction, which could cause the
overwrite to be executed for an inactive channel under non-uniform
control flow, causing the current per-channel dataflow propagation to
give incorrect results.  This has been reported in cases like:

> while (true) {
>    x = imageSize(img);
>    if (non_uniform_condition()) {
>       y = x;
>       break;
>    }
> }
> use(y);

Currently the copy propagation pass would propagate copy 'y = x' into
'use(y)', which is invalid since in the example above imageSize() is
implemented as a force_writemask_all SEND message, whose result is
broadcast to all channels, so when a given channel executes 'y = x'
and breaks out of the loop, another divergent channel can execute a
subsequent iteration of the loop overwriting 'x' with a different
value, hence replacing 'y' with 'x' at 'use(y)' changes the behavior
of the program.

This patch extends the global dataflow analysis algorithm to determine
whether there is any control flow path from a given copy to an
overwrite of its source or destination which has force_writemask_all
behavior inconsistent with the copy, and in such case prevents copy
propagation for that ACP entry at any point of the program which can
be reached from the overwrite, even if the copy is statically
re-executed along all such control flow paths (as in the example
above), since the execution of the overwrite for a given channel i may
corrupt other channels j!=i inactive for the subsequently re-executed
copy.

Note that a simpler solution has been attempted which fully shuts down
copy propagation if such a force_writemask_all ACP overwrite is
present /anywhere/ in the program regardless of its location in the
control flow graph, however that led to large shader-db regressions in
some programs from shader-db (like a CS from Car Chase which would
emit 53% more instructions).  With this solution the only handful of
shaders that suffer instruction count regressions seem to be getting
misoptimized right now (e.g. some compute shaders from Deus Ex
Mankind).  This solution doesn't seem to affect the run-time of
shader-db significantly, it's less than 1% higher with the fix
applied.

Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21351>
2023-03-17 03:05:20 -07:00
..
blorp intel/blorp: Set array_len for 3D images properly 2023-03-04 06:12:46 +00:00
ci ci/iris: Add skips for slow tests on APL. 2023-03-15 08:15:37 +00:00
common intel/common: Implement the Xe functions for intel_gem 2023-03-07 15:41:36 +00:00
compiler intel/fs: Fix copy propagation dataflow analysis in presence of force_writemask_all ACP overwrites. 2023-03-17 03:05:20 -07:00
dev intel/dev: Enable MTL PCI ids 2023-03-13 10:17:51 +00:00
ds iris: trace frames with u_trace 2023-03-10 00:36:41 +00:00
genxml anv/video: fix chroma qp to be a integer value. 2023-03-14 07:32:00 +00:00
isl meson: inline gtest_test_protocol now that it's always 'gtest' 2023-03-10 07:20:29 +00:00
nullhw-layer vulkan/layers: Use PUBLIC instead of VK_LAYER_EXPORT 2023-02-17 03:42:34 +00:00
perf intel/perf: Hide extended metrics by default 2023-03-11 05:05:06 +00:00
tools intel/dev: create a helper dependency for libintel_dev 2023-03-02 00:01:27 +00:00
vulkan anv: implement TES distribution mode WA 22012785325 2023-03-16 14:42:53 +00:00
vulkan_hasvk intel: use generated workaround helpers for Wa_1409600907 2023-03-09 22:56:51 +00:00
meson.build blorp: add dependency on idep_intel_dev 2023-03-03 13:04:23 +00:00