Commit graph

4965 commits

Author SHA1 Message Date
Eric Engestrom
b2dac806f8 intel: add mi_builder_test for gen12
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-11 15:38:19 +00:00
Kenneth Graunke
69d7782b15 intel/decoder: Make get_state_size take a full 64-bit address and a base
i965 wants to use an offset from a base because everything is in a
single buffer whose address may be relocated, and all base addresses
are set to the start of that buffer.

iris wants to use a full 64-bit address, because state lives in separate
buffers which may be in the shader, surface, and dynamic memory zones,
where addresses grow downward from the top of a 4GB zone,  So it's very
possible for a 32-bit offset to exist relative to multiple bases,
leading to the wrong state size.
2019-12-10 19:10:49 -08:00
Kenneth Graunke
0f2f561a10 anv: Enable Gen11 Color/Z write merging optimization
TCCNTLREG contains additional L3 cache write merging optimizations.

The default value on my system appears to be:
- URB Partial Write Merging (bit 0)
- L3 Data Partial Write Merging (bit 2)
- TC Disable (bit 3)

Windows drivers appear to set bit 1 as well to enable "Color/Z Partial
Write Merging".  This should solve an issue we were seeing where MRT
benchmarks were using substantially more bandwidth than they ought.
However, we have not observed it to cause measurable FPS gains.

It is unclear whether we should be setting bit 0 or bit 3, so for now
we leave those at the hardware default value.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-12-10 16:19:46 -08:00
Kenneth Graunke
0b74f85870 intel/genxml: Add a partial TCCNTLREG definition
TCCNTLREG contains additional cache programming settings.  In
particular, there are several write combining controls we'd like to use.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-12-10 16:19:33 -08:00
Jason Ekstrand
41691ac016 ANV: Stop advertising smoothLines support on gen10+
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
2019-12-10 20:13:56 +00:00
Lionel Landwerlin
5fdea9f401 anv: fix incorrect VMA alignment for CCS main surfaces
Maybe finer way of dealing with this requirement would be to increase
the number of pdevice->memory.types[] to add a category for special
alignment cases.

Meanwhile this fixes the problem of CCS surface alignment and it's
probably not going to cause issues given the size of our address
space.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 6af8a4acc4 ("anv: Add aux-map translation for gen12+")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-12-10 16:06:54 +00:00
Lionel Landwerlin
dcfe1903c3 anv: fix missing gen12 handling
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 181be14d43 ("anv: Build for gen12")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-12-10 16:06:54 +00:00
Anuj Phogat
1a32fbd48c intel: Add pci-ids for Jasper Lake
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-09 12:22:57 -08:00
Anuj Phogat
11fdd5f52c intel: Add device info for 1x4x6 Jasper Lake
Also removing the FIXME comments after matching the numbers with
updated documentation.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-09 12:22:56 -08:00
Jason Ekstrand
0f60aa4037 anv: Re-emit all compute state on pipeline switch
It's a very odd case to hit in the real world.  However, there are some
CTS tests which switch back and forth between dispatch and clear without
changing the pipeline.

Fixes: bc612536eb "anv: Emit a dummy MEDIA_VFE_STATE before switching..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-12-07 04:03:35 +00:00
Jason Ekstrand
bce1c3c668 anv: Re-capture all batch and state buffers
When we moved from allocating BOs directly to using the BO cache, we
lost the EXEC_OBJECT_CAPTURE flag on all our state buffers.

Fixes: 3119b96bdf "anv: Allocate block pool BOs from the cache"
Fixes: ee77938733 "anv: Allocate batch and fence buffers from..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-12-07 04:03:35 +00:00
Jason Ekstrand
865ffe4e02 anv: Return VK_ERROR_OUT_OF_DEVICE_MEMORY for too-large buffers
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-06 22:32:05 +00:00
Jason Ekstrand
f9a3d9738b anv: Use BO fences/semaphores for AcquireNextImage
Instead of doing a dummy submit on the command buffer for the fence or a
dummy semaphore and trusting in implicit sync, this commit moves us to
take advantage of implicit sync and just use the WSI image BO as the
fence.  Both semaphores and fences require a tiny bit of extra plumbing
to do this but the result is that we can get rid of a bunch of the extra
synchronization we're doing today.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-06 19:58:07 +00:00
Jason Ekstrand
ecc119a96e anv: Add a fence_reset_reset_temporary helper
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-06 19:58:07 +00:00
Jason Ekstrand
ccb7d606f1 anv: Use submit-time implicit sync instead of allocate-time
In 83b943cc2f, we started making all VkDeviceMemory BOs resident all
the time.  One unfortunate side-effect of this is that every
vkQueueSubmit sets EXEC_OBJECT_WRITE on every WSI memory object which
means that X server or Wayland compositor, instead of waiting on the
last vkQueueSubmit to actually write the buffer, now waits on the last
vkQueueSubmit to from that driver instance relative to whenever the
compositor's GL driver instance calls execbuf.  This potentially leads
to a lot of extra synchronization that we didn't intend to have.

Instead, this commit makes it so that we leave WSI memory objects with
EXEC_OBJECT_ASYNC most of the time and only unset EXEC_OBJECT_ASYNC and
set EXEC_OBJECT_WRITE in the dummy execbuf that we do as part of
vkQueuePresent.  This should hopefully result in tighter integration
with the compositor, lower latency, and better performance.

Testing with DOOM 2016, this seems to reduce latency by at least a frame
if not two and makes the game much more responsive.  Testing was,
however, subjective, so we don't have any hard data on that.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-06 19:58:07 +00:00
Jason Ekstrand
6ebf677cfd anv: Always add in EXEC_OBJECT_WRITE when specified in extra_flags
Otherwise, we're trusting in the execbuf_add_bo which sets
EXEC_OBJECT_WRITE to to always be the first one that gets called.  This
is likely true for fences but it seems somewhat fragile.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-06 19:58:07 +00:00
Jason Ekstrand
1b6991ba1d anv: Implement VK_KHR_buffer_device_address
The primary difference between the KHR and EXT versions of the extension
is that the KHR provides the address at AllocateMemory time for replay
so we can replay it safely without moving to a sparse address model.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:59:10 -06:00
Jason Ekstrand
4428cd9127 anv: Use a pNext loop in AllocateMemory
This function has a lot of possible extensions and some of them we can
easily handle on-the-fly so it's easier to just have a loop than to find
each structure manually.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:59:10 -06:00
Jason Ekstrand
a8e59b3708 anv: Add allocator support for client-visible addresses
When a BO is flagged as having a client visible address, we put it in
its own heap.  We also support the client explicitly specifying an
address in said heap.  If an address collision happens, we return false
from anv_vma_alloc which turns into a VK_ERROR_OUT_OF_DEVICE_MEMORY.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:59:10 -06:00
Jason Ekstrand
03450e9cfc anv: Add an explicit_address parameter to anv_device_alloc_bo
We already have a mechanism for specifying that we want a fixed address
provided by the driver internals.  We're about to let the client start
specifying addresses in some very special scenarios as well so we want
to pass this through to the allocation function.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:59:10 -06:00
Jason Ekstrand
597fdb9e21 anv: Stop advertising two heaps just for the VF cache WA
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:59:10 -06:00
Jason Ekstrand
b47bc0202a anv: Set up VMA heaps independently from memory heaps
Our VMA allocations are really independent from the memory heaps we
expose via the API.  The only thing that really matters is the GTT size
so we can make the high heap the right size.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:59:10 -06:00
Jason Ekstrand
1037b52cf4 anv: Stop tracking VMA allocations
util_vma_heap_alloc will already return 0 if it doesn't have enough
space.  The only thing the vma_*_available tracking was doing was
preventing us from allocating too much on any given heap.  Now that
we're tracking that in the heap itself, we can drop these.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:59:10 -06:00
Jason Ekstrand
a4e3d8f0db anv: Disallow allocating above heap sizes
We're already tracking the amount of memory used in each heap.  This
commit just makes us start rejecting memory allocations if the heap
would grow too large.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:59:10 -06:00
Jason Ekstrand
0a36fafa95 anv: Don't leak when set_tiling fails
Fixes: a44744e01d "anv: Require a dedicated allocation for..."
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:59:10 -06:00
Jason Ekstrand
46af0ecc1d anv: Use PIPE_CONTROL flushes to implement the gen8 VF cache WA
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:59:10 -06:00
Jason Ekstrand
1b5cb92b62 anv: Apply cache flushes after setting index/draw VBs
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:59:10 -06:00
Jason Ekstrand
7ce39a55c1 anv: Always invalidate the VF cache in BeginCommandBuffer
I think the reason why we only do this for primaries is that we didn't
expect to have blorp calls in secondaries.  However, you are allowed to
have a full render pass in a secondary command buffer so resolves and
clears can end up in there.  We should just always invalidate.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:59:10 -06:00
Jason Ekstrand
a500a6b7f1 blorp: Pass the VB size to the VF cache workaround
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:59:10 -06:00
Jason Ekstrand
c142a40a92 anv: Add a has_softpin boolean
This separates "has" from "use" which will make the next commit a bit
cleaner.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:59:10 -06:00
Jason Ekstrand
0bba88081b anv: Drop bo_flags from anv_bo_pool
In ee77938733, we started using the BO cache for anv_bo_pool and
stopped using the bo_flags parameter.  However, we never dropped it from
the struct or the init function.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:58:14 -06:00
Tapani Pälli
32ebd4207a intel/compiler: force simd8 when dual src blending on gen8
Patch introduces option to force simd8 and uses it as a workaround for
dual source blending issues seen with skqp (skia testsuite) on gen8.

Fixes following Piglit test on gen8 platforms:
   arb_blend_func_extended-dual-src-blending-issue-1917

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1917
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
c: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 09:42:50 +02:00
Tapani Pälli
f6004bac1f intel/compiler: add newline to limit_dispatch_width message
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 08:13:58 +02:00
Ian Romanick
c9acf0739f anv: Fix error message format string
See also 246261f0ad

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
CID: 1455892
Fixes: 246261f0ad ("anv: prepare the driver for delayed submissions")
2019-12-04 15:34:03 -08:00
Ian Romanick
668635abd2 intel/compiler: Fix 'comparison is always true' warning
Without looking at the assembly or something, I'm not sure what the
compiler does here.  The brw_reg_type enum is marked packed, so I'm
guess that it gets represented as a uint8_t.  That's the only reason I
could think that comparing with -1 would be always true.

This patch adds the same cast that exists in brw_hw_type_to_reg_type.
It might be better to add a #define outside the enum for
BRW_REGISTER_TYPE_INVALID as (enum brw_reg_type)-1.

src/intel/compiler/brw_eu_compact.c: In function ‘has_immediate’:
src/intel/compiler/brw_eu_compact.c:1515:20: warning: comparison is always true due to limited range of data type [-Wtype-limits]
 1515 |       return *type != -1;
      |                    ^~
src/intel/compiler/brw_eu_compact.c:1518:20: warning: comparison is always true due to limited range of data type [-Wtype-limits]
 1518 |       return *type != -1;
      |                    ^~

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
CID: 1455194
Fixes: 12d3b11908 ("intel/compiler: Add instruction compaction support on Gen12")
Cc: @mattst88
2019-12-04 15:34:03 -08:00
Rafael Antognolli
d3e339364f anv: Use 3DSTATE_CONSTANT_ALL when possible.
Use this new instruction introduced in Gen12. The instruction itself is
smaller, and it also allows us to emit a single instruction to all
stages that have the same push constant buffers (e.g. when they don't
have constant buffers).

There's one restriction to use this instruction, though: the length
field is only 5 bits long, so we need to check whether we can use it,
and fallback to the old 3DSTATE_CONSTANT_XS if that field is >= 32.

v2:
 - Rebased on top of the lasted changes from Jason.
 - Added review suggestions by Caio.
 - Removed struct push_bos and merged some code into
 anv_nir_compute_push_layout().

v3:
 - Remove code churn due to gen8+ workaround in
 anv_nir_compute_push_layout(). This code has been removed in an earlier
 commit, and implemented in cmd_buffer_emit_push_constant().

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-12-04 20:48:25 +00:00
Rafael Antognolli
7d5da53d27 anv: Move code for emitting push constants into its own function.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-12-04 20:48:25 +00:00
Rafael Antognolli
67d2cb3e93 anv: Add get_push_range_address() helper.
Add a helper function to get the push range address. Once we have a
separate function for emitting gen12 push constants, we can use this
helper and avoid duplicating code.

v3: Do not add range->start to the address in gen7 (Caio).
v4: Do not drop range->start from gen7 (Caio, Jason).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-12-04 20:48:25 +00:00
Rafael Antognolli
c0225a728e anv: Move gen8+ push constant packet workaround.
Store push_ranges in ascending order, and only "shift" them to the end
of the array during state packet emission.

We don't need this workaround with the new 3DSTATE_CONSTANT_ALL packet.
So instead of applying the workaround here just for GEN < 12 (which
requires and extra loop through all the ranges to figure out if we
should shift them or not), we simply move the whole logic to the state
emission code. At that point, in a later commit, we are already looping
through all of the ranges anyway to check which packet we will be using,
so we might as well implement the workaround there, where it is going to
be used.

v3: Move gen8+ workaround to the state emission code (Caio).
v4: Add explanation of why we moved the workaroudn (Caio).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-12-04 20:48:25 +00:00
Rafael Antognolli
9db044792f intel/blorp: Use 3DSTATE_CONSTANT_ALL to setup push constants.
In blorp, all the push constants are disabled, so we only need to emit a
single 3DSTATE_CONSTANT_ALL with the bitmask for stage update
appropriately set.

v2: Update comment (Caio).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-12-04 20:48:25 +00:00
Rafael Antognolli
8983622995 intel/aubinator: Decode 3DSTATE_CONSTANT_ALL.
Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-04 20:48:25 +00:00
Rafael Antognolli
2d127614a2 intel/genxml: Add 3DSTATE_CONSTANT_ALL packet.
Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-12-04 20:48:25 +00:00
Lionel Landwerlin
ddacd3d43b intel/perf: fix improper pointer access
This expression was unused by the macro, probably why it didn't
register in the compilation.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-12-04 09:21:15 +00:00
Lionel Landwerlin
8c0b058263 intel/perf: simplify the processing of OA reports
This is a more accurate description of what happens in processing the
OA reports.

Previously we only had a somewhat difficult to parse state machine
tracking the context ID.

What we really only need to do to decide if the delta between 2
reports (r0 & r1) should be accumulated in the query result is :

   * whether the r0 is tagged with the context ID relevant to us

   * if r0 is not tagged with our context ID and r1 is: does r0 have a
     invalid context id? If not then we're in a case where i915 has
     resubmitted the same context for execution through the execlist
     submission port

v2: Update comment (Ken)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-12-04 09:21:15 +00:00
Lionel Landwerlin
b364e920bf intel/perf: take into account that reports read can be fairly old
If we read the OA reports late enough after the query happens, we can
get a timestamp in the report that is significantly in the past
compared to the start timestamp of the query. The current code must
deal with the wraparound of the timestamp value (every ~6 minute). So
consider that if the difference is greater than half that wraparound
period, we're probably dealing with an old report and make the caller
aware it should read more reports when they're available.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-12-04 09:21:15 +00:00
Lionel Landwerlin
9d0a5c817c intel/perf: set read buffer len to 0 to identify empty buffer
We always add an empty buffer in the list when creating the query.
Let's set the len appropriately so that we can recognize it when we
read OA reports up to the end of a query.

We were using an 0 timestamp value associated with the empty buffer
and incorrectly assuming this was a valid value. In turn that led to
not reading enough reports and resulted in deltas added to our counter
values which should have been discarded because those would be flagged
for a different context.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-12-04 09:21:15 +00:00
Lionel Landwerlin
acea59dbf8 intel/perf: fix invalid hw_id in query results
Accumulation happens between 2 reports, it can be between a start/end
report from another context. So only consider updating the hw_id of
the results when it's not already valid and that we have a valid value
to put in there.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 41b54b5faf ("i965: move OA accumulation code to intel/perf")
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-12-04 09:21:15 +00:00
Jason Ekstrand
178a2946c0 anv: Respect the always_flush_cache driconf option
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-03 17:10:51 -06:00
Jason Ekstrand
b1f37688ba anv: Set up SBE_SWIZ properly for gl_Viewport
gl_Viewport is also in the VUE header so we need to whack the read
offset to 0 and emit a default (no overrides) SBE_SWIZ entry in that
case as well.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-03 16:20:50 +00:00
Ian Romanick
d15344c0f5 intel/compiler: Increase nir_opt_peephole_select threshold
I tried 2, 4, 6, 8, and 10.  8 seemed to be the sweet spot across all
Intel platforms.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 14736141 -> 14661140 (-0.51%)
instructions in affected programs: 2272413 -> 2197412 (-3.30%)
helped: 8416
HURT: 140
helped stats (abs) min: 1 max: 1152 x̄: 8.99 x̃: 6
helped stats (rel) min: 0.13% max: 42.55% x̄: 4.15% x̃: 3.20%
HURT stats (abs)   min: 1 max: 140 x̄: 4.73 x̃: 1
HURT stats (rel)   min: 0.03% max: 3.44% x̄: 0.87% x̃: 0.60%
95% mean confidence interval for instructions value: -9.36 -8.17
95% mean confidence interval for instructions %-change: -4.14% -3.99%
Instructions are helped.

total cycles in shared programs: 231560416 -> 228585416 (-1.28%)
cycles in affected programs: 126536021 -> 123561021 (-2.35%)
helped: 7092
HURT: 1898
helped stats (abs) min: 1 max: 419320 x̄: 519.02 x̃: 159
helped stats (rel) min: <.01% max: 77.25% x̄: 13.52% x̃: 11.77%
HURT stats (abs)   min: 1 max: 14518 x̄: 371.91 x̃: 36
HURT stats (rel)   min: <.01% max: 103.23% x̄: 5.92% x̃: 2.55%
95% mean confidence interval for cycles value: -514.34 -147.50
95% mean confidence interval for cycles %-change: -9.69% -9.14%
Cycles are helped.

total spills in shared programs: 5763 -> 5848 (1.47%)
spills in affected programs: 1797 -> 1882 (4.73%)
helped: 13
HURT: 13

total fills in shared programs: 17163 -> 16931 (-1.35%)
fills in affected programs: 7214 -> 6982 (-3.22%)
helped: 22
HURT: 19

total sends in shared programs: 730410 -> 730246 (-0.02%)
sends in affected programs: 2705 -> 2541 (-6.06%)
helped: 114
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 1.44 x̃: 1
helped stats (rel) min: 0.60% max: 20.00% x̄: 7.26% x̃: 5.88%
95% mean confidence interval for sends value: -1.55 -1.33
95% mean confidence interval for sends %-change: -7.90% -6.62%
Sends are helped.

LOST:   4
GAINED: 0

Sandy Bridge
total instructions in shared programs: 10760511 -> 10724637 (-0.33%)
instructions in affected programs: 961305 -> 925431 (-3.73%)
helped: 3734
HURT: 110
helped stats (abs) min: 1 max: 151 x̄: 9.66 x̃: 8
helped stats (rel) min: 0.14% max: 41.21% x̄: 4.93% x̃: 3.95%
HURT stats (abs)   min: 1 max: 20 x̄: 1.68 x̃: 1
HURT stats (rel)   min: 0.12% max: 5.41% x̄: 0.88% x̃: 0.52%
95% mean confidence interval for instructions value: -9.76 -8.91
95% mean confidence interval for instructions %-change: -4.90% -4.63%
Instructions are helped.

total cycles in shared programs: 153359411 -> 152991077 (-0.24%)
cycles in affected programs: 11615401 -> 11247067 (-3.17%)
helped: 2725
HURT: 1138
helped stats (abs) min: 1 max: 2844 x̄: 164.27 x̃: 80
helped stats (rel) min: 0.02% max: 48.60% x̄: 7.47% x̃: 3.91%
HURT stats (abs)   min: 1 max: 4351 x̄: 69.69 x̃: 25
HURT stats (rel)   min: 0.02% max: 40.00% x̄: 3.39% x̃: 1.47%
95% mean confidence interval for cycles value: -103.18 -87.52
95% mean confidence interval for cycles %-change: -4.57% -3.97%
Cycles are helped.

total sends in shared programs: 584038 -> 583855 (-0.03%)
sends in affected programs: 3512 -> 3329 (-5.21%)
helped: 157
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.17 x̃: 1
helped stats (rel) min: 2.38% max: 25.00% x̄: 6.52% x̃: 6.06%
95% mean confidence interval for sends value: -1.26 -1.07
95% mean confidence interval for sends %-change: -7.17% -5.87%
Sends are helped.

LOST:   23
GAINED: 0

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8122617 -> 8111592 (-0.14%)
instructions in affected programs: 380503 -> 369478 (-2.90%)
helped: 912
HURT: 86
helped stats (abs) min: 1 max: 129 x̄: 12.19 x̃: 9
helped stats (rel) min: 0.30% max: 39.21% x̄: 3.69% x̃: 2.57%
HURT stats (abs)   min: 1 max: 2 x̄: 1.05 x̃: 1
HURT stats (rel)   min: 0.12% max: 3.64% x̄: 0.54% x̃: 0.36%
95% mean confidence interval for instructions value: -12.00 -10.10
95% mean confidence interval for instructions %-change: -3.56% -3.10%
Instructions are helped.

total cycles in shared programs: 188509780 -> 188534398 (0.01%)
cycles in affected programs: 7211542 -> 7236160 (0.34%)
helped: 859
HURT: 132
helped stats (abs) min: 2 max: 690 x̄: 46.59 x̃: 16
helped stats (rel) min: 0.01% max: 26.76% x̄: 1.53% x̃: 0.33%
HURT stats (abs)   min: 2 max: 1592 x̄: 489.67 x̃: 618
HURT stats (rel)   min: 0.03% max: 185.92% x̄: 23.35% x̃: 6.26%
95% mean confidence interval for cycles value: 9.58 40.10
95% mean confidence interval for cycles %-change: 0.65% 2.93%
Cycles are HURT.
2019-12-02 16:46:20 -08:00