mesa/src/intel
Francisco Jerez b54b67e067 intel/fs: Switch to standard vector layout for barycentrics at optimization time.
This involves permuting the registers of barycentric vectors to have
the standard X[0-n] Y[0-n] layout at NIR translation time.
Barycentrics are converted to the format expected by the PLN
instruction in the lower_barycentrics() pass run after the
optimization loop.

Main reason is correctness of SIMD32 fragment shaders.  The
shuffle_from_pln_layout() and shuffle_to_pln_layout() helpers used
during NIR translation are busted for SIMD32.  This leads to serious
corruption at present with INTEL_DEBUG=do32, especially on Gen11+
where these helpers are hit more frequently due to the lack of a
hardware PLN instruction.

Of course one could have chosen to fix those helpers instead, but
there is another far more subtle issue that was reported during review
of the SIMD32 fragment shader codegen changes: The SIMD splitting pass
currently handles SIMD32 barycentric vectors as if they had the
standard X[0-n] Y[0-n] layout, even though they are interleaved for
the PLN instruction, which causes incorrect execution masks to be
applied to the MOVs unzipping barycentric vectors in cases where a
LINTERP instruction occurs under non-uniform control flow.

I'm not aware of any conformance regressions due to the latter issue
at present, but for our peace of mind let's move the conversion to the
PLN layout into the lower_barycentrics() pass run after
lower_simd_width().

This leads to the following shader-db improvements (including SIMD32
shaders) in combination with the previous back-end preparation changes
-- Without them (especially the copy propagation changes) this would
lead to a massive number of regressions.  On ICL:

   total instructions in shared programs: 20662316 -> 20466903 (-0.95%)
   instructions in affected programs: 10538474 -> 10343061 (-1.85%)
   helped: 68775
   HURT: 6

   total spills in shared programs: 8938 -> 8748 (-2.13%)
   spills in affected programs: 376 -> 186 (-50.53%)
   helped: 9
   HURT: 5

   total fills in shared programs: 8965 -> 8663 (-3.37%)
   fills in affected programs: 965 -> 663 (-31.30%)
   helped: 9
   HURT: 6

   LOST:   146
   GAINED: 43

On SKL:

   total instructions in shared programs: 18725867 -> 18614912 (-0.59%)
   instructions in affected programs: 3876590 -> 3765635 (-2.86%)
   helped: 27492
   HURT: 2

   LOST:   191
   GAINED: 417

On SNB:

   total instructions in shared programs: 14573613 -> 13980646 (-4.07%)
   instructions in affected programs: 5199074 -> 4606107 (-11.41%)
   helped: 29998
   HURT: 0

   LOST:   21
   GAINED: 30

Results are somewhat less impressive but still significant without
SIMD32 fragment shaders enabled.  On ICL:

   total instructions in shared programs: 16148728 -> 16061659 (-0.54%)
   instructions in affected programs: 6114788 -> 6027719 (-1.42%)
   helped: 42046
   HURT: 6

   total spills in shared programs: 8218 -> 8028 (-2.31%)
   spills in affected programs: 376 -> 186 (-50.53%)
   helped: 9
   HURT: 5

   total fills in shared programs: 8953 -> 8651 (-3.37%)
   fills in affected programs: 965 -> 663 (-31.30%)
   helped: 9
   HURT: 6

   LOST:   0
   GAINED: 3

On SKL:

   total instructions in shared programs: 14927994 -> 14926738 (-0.01%)
   instructions in affected programs: 168850 -> 167594 (-0.74%)
   helped: 711
   HURT: 2

On SNB:

   total instructions in shared programs: 10770538 -> 10734403 (-0.34%)
   instructions in affected programs: 2702172 -> 2666037 (-1.34%)
   helped: 17818
   HURT: 0

All of the hurt shaders are either spilling slightly more or emitting
additional NOP instructions due to the SIMD16 POW workaround for
Gen8-9 combined with differences in scheduling.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-01-17 13:23:12 -08:00
..
blorp intel/blorp: Fill out all the dwords of MI_ATOMIC 2020-01-13 21:48:00 +00:00
common intel: add mi_builder_test for gen12 2019-12-11 15:38:19 +00:00
compiler intel/fs: Switch to standard vector layout for barycentrics at optimization time. 2020-01-17 13:23:12 -08:00
dev intel: Use similar brand strings to the Windows drivers 2020-01-13 19:42:35 -08:00
genxml genxml: add new Gen11+ PIPE_CONTROL field 2020-01-16 11:48:04 +02:00
isl intel/isl: Add MOCS settings to isl_device. 2019-11-12 20:41:52 +00:00
perf intel/perf: report query split for mdapi 2020-01-16 15:29:40 +02:00
tools intel/dump_gpu: handle context create extended ioctl 2019-10-30 21:58:31 +02:00
vulkan anv: enable VK_KHR_swapchain_mutable_format 2020-01-17 18:27:29 +00:00
Android.blorp.mk intel: android: remove libdrm_intel requirement 2017-03-30 19:07:23 +01:00
Android.common.mk android: static link with libexpat with Android O+ 2019-03-25 10:11:57 +02:00
Android.compiler.mk android: fix build issues with brw_nir_trig_workarounds.c 2017-10-04 07:39:05 +03:00
Android.dev.mk drm-uapi: use local files, not system libdrm 2019-02-14 11:20:00 +00:00
Android.genxml.mk intel/genxml: generate pack files for gen12 on android builds 2019-08-28 13:38:33 -07:00
Android.isl.mk intel/isl: build android libmesa_isl for gen12 2019-08-28 13:38:33 -07:00
Android.mk i965: extract performance query metrics 2019-04-17 14:10:42 +01:00
Android.perf.mk i965: extract performance query metrics 2019-04-17 14:10:42 +01:00
Android.vulkan.mk anv: implement VK_INTEL_performance_query 2019-10-23 05:41:15 +00:00
Makefile.perf.am i965: extract performance query metrics 2019-04-17 14:10:42 +01:00
Makefile.sources anv: Rework push constant handling 2019-11-18 18:35:14 +00:00
meson.build meson: only build imgui when needed 2019-11-25 07:51:56 +00:00