mesa/src/intel
Jason Ekstrand 5abac85177 intel/fs: Rework scratch handling on Gen9+
The current scratch mechanism uses an MRF hack where we reserve a few
GRF registers to treat like the MRF and we collect the data into that
MRF region before doing a scratch write.  We also use that region for
the header for scratch reads.

This commit gets rid of the MRF hack.  Instead, we reserve a single
register (which RA is free to pick) for the scratch header and use
split sends for scratch writes to avoid having to do the copy.  This
should provide RA with more freedom in the presence of spilling as
well as avoid some unnecessary data moves.  In the future, the
new GEN9_SCRATCH_HEADER opcode gives us a place where we can do our own
per-thread scratch base address calculations rather than depending on
the scratch base address that gets pushed into g0.  Having an opcode for
this lets us do it once at the top of the shader rather than repeating
it at every read/write.

One other noticeable difference is the use of SHADER_OPCODE_SEND.  We
can get away with this thanks to the fact that we're now using a set to
track which instructions are generated by spills and don't rely on the
opcodes to find spill/fill instructions.  This allows us to avoid adding
more virtual opcodes and let the normal code paths handle things like
scoreboard dependencies between header setup and the SEND.  It also
means that post-RA scheduling may be able to space out the header setup
MOV and the SEND for better latency hiding.
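
As a rough illustration of the set-based tracking described above, here is
a minimal Python sketch. It is not the Mesa code: `Inst`,
`emit_scratch_send` and `count_spill_insts` are hypothetical names made up
for this example, and the real compiler is written in C++. It only shows
why set membership, rather than the opcode, can identify spill/fill
instructions once they share SHADER_OPCODE_SEND with ordinary sends.

```python
class Inst:
    """Hypothetical stand-in for a backend instruction (not a Mesa type)."""

    def __init__(self, opcode, note=""):
        self.opcode = opcode
        self.note = note


def emit_scratch_send(program, spill_insts, note):
    """Append a generic SEND and record it as spill-generated.

    Plain Python objects hash by identity, so the set remembers this
    exact instruction even though its opcode is not unique.
    """
    inst = Inst("SHADER_OPCODE_SEND", note)
    program.append(inst)
    spill_insts.add(inst)
    return inst


def count_spill_insts(program, spill_insts):
    """Count spill/fill instructions by set membership, not by opcode."""
    return sum(1 for inst in program if inst in spill_insts)


program = []
spill_insts = set()

# An ordinary SEND (assumed here to be some non-scratch message).
program.append(Inst("SHADER_OPCODE_SEND", "ordinary message"))
# A SEND emitted on behalf of a fill.
emit_scratch_send(program, spill_insts, "scratch fill")

print(count_spill_insts(program, spill_insts))
```

Both instructions carry the same opcode, so an opcode check could not tell
them apart; only the one recorded in the set is counted.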

Shader-db results on Skylake:

    total spills in shared programs: 12137 -> 10604 (-12.63%)
    spills in affected programs: 6685 -> 5152 (-22.93%)
    helped: 274
    HURT: 2

    total fills in shared programs: 13065 -> 11515 (-11.86%)
    fills in affected programs: 9007 -> 7457 (-17.21%)
    helped: 275
    HURT: 1

Shader-db results on Ice Lake:

    total spills in shared programs: 12482 -> 10953 (-12.25%)
    spills in affected programs: 6586 -> 5057 (-23.22%)
    helped: 275
    HURT: 0

    total fills in shared programs: 12819 -> 11234 (-12.36%)
    fills in affected programs: 7867 -> 6282 (-20.15%)
    helped: 274
    HURT: 0

Shader-db results on Tiger Lake:

    total spills in shared programs: 11689 -> 10233 (-12.46%)
    spills in affected programs: 4740 -> 3284 (-30.72%)
    helped: 259
    HURT: 0

    total fills in shared programs: 10840 -> 9443 (-12.89%)
    fills in affected programs: 6244 -> 4847 (-22.37%)
    helped: 259
    HURT: 0

Fossil-db results on Ice Lake:

    Spills in all programs: 245249 -> 201633 (-17.8%)
    Fills in all programs: 366066 -> 314368 (-14.1%)
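
The percentages in the results above are consistent with a plain relative
change, (after - before) / before. The short Python sketch below
recomputes a few of them from the counts listed above; `pct_change` is a
helper name chosen for this example and is not taken from shader-db or
fossil-db.

```python
def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# Counts copied from the Skylake shader-db and Ice Lake fossil-db results.
skl_total_spills = pct_change(12137, 10604)
skl_affected_spills = pct_change(6685, 5152)
icl_fossil_spills = pct_change(245249, 201633)
icl_fossil_fills = pct_change(366066, 314368)

print(f"{skl_total_spills:.2f}% {skl_affected_spills:.2f}%")
print(f"{icl_fossil_spills:.1f}% {icl_fossil_fills:.1f}%")
```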

More practically, this seems to give about a 0.5-1% perf boost in
Witcher 3 (DXVK) and Shadow of the Tomb Raider (Vulkan native).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>
2020-10-13 21:59:27 +00:00
blorp intel/blorp: Conditionally clear full surface depth and stencil 2020-10-01 16:23:10 +00:00
common intel/batch_decoder: Don't clame vec4 vs/gs/tcs shaders on Gen11+ 2020-10-13 21:59:27 +00:00
compiler intel/fs: Rework scratch handling on Gen9+ 2020-10-13 21:59:27 +00:00
dev intel/dev: fix 32bit build issue 2020-10-08 05:42:31 +00:00
genxml intel/genxml: replace gen_sort_tags.py MIT licence with SPDX equivalent 2020-06-13 01:16:17 +00:00
isl intel/isl: Add YUV format info for the aux-map 2020-09-09 20:02:03 +00:00
perf intel: drop likely/unlikely around INTEL_DEBUG 2020-10-06 18:43:07 +00:00
tools intel: Add support for i945g to intel_stub_gpu. 2020-09-29 19:53:22 +00:00
vulkan radv,anv: use CLOCK_MONOTONIC_FAST when CLOCK_MONOTONIC_RAW is undefined 2020-10-09 09:49:20 +00:00
Android.blorp.mk
Android.common.mk intel: split driver/device UUID generators 2020-10-07 11:11:23 +03:00
Android.compiler.mk
Android.dev.mk intel: add identifier for debug purposes 2020-05-20 15:58:22 +00:00
Android.genxml.mk intel/genxml: generate pack files for gen12 on android builds 2019-08-28 13:38:33 -07:00
Android.isl.mk isl: Fix the android build. 2020-02-05 21:31:40 -08:00
Android.mk i965: extract performance query metrics 2019-04-17 14:10:42 +01:00
Android.perf.mk i965: extract performance query metrics 2019-04-17 14:10:42 +01:00
Android.vulkan.mk anv/android: setup gralloc1 usage from gralloc0 usage manually 2020-01-28 14:46:25 +02:00
Makefile.perf.am i965: extract performance query metrics 2019-04-17 14:10:42 +01:00
Makefile.sources intel: split driver/device UUID generators 2020-10-07 11:11:23 +03:00
meson.build meson: only build imgui when needed 2019-11-25 07:51:56 +00:00