Commit graph

4960 commits

Author SHA1 Message Date
Lionel Landwerlin
5fdea9f401 anv: fix incorrect VMA alignment for CCS main surfaces
Maybe finer way of dealing with this requirement would be to increase
the number of pdevice->memory.types[] to add a category for special
alignment cases.

Meanwhile this fixes the problem of CCS surface alignment and it's
probably not going to cause issues given the size of our address
space.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 6af8a4acc4 ("anv: Add aux-map translation for gen12+")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-12-10 16:06:54 +00:00
Lionel Landwerlin
dcfe1903c3 anv: fix missing gen12 handling
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 181be14d43 ("anv: Build for gen12")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-12-10 16:06:54 +00:00
Anuj Phogat
1a32fbd48c intel: Add pci-ids for Jasper Lake
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-09 12:22:57 -08:00
Anuj Phogat
11fdd5f52c intel: Add device info for 1x4x6 Jasper Lake
Also removing the FIXME comments after matching the numbers with
updated documentation.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-09 12:22:56 -08:00
Jason Ekstrand
0f60aa4037 anv: Re-emit all compute state on pipeline switch
It's a very odd case to hit in the real world.  However, there are some
CTS tests which switch back and forth between dispatch and clear without
changing the pipeline.

Fixes: bc612536eb "anv: Emit a dummy MEDIA_VFE_STATE before switching..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-12-07 04:03:35 +00:00
Jason Ekstrand
bce1c3c668 anv: Re-capture all batch and state buffers
When we moved from allocating BOs directly to using the BO cache, we
lost the EXEC_OBJECT_CAPTURE flag on all our state buffers.

Fixes: 3119b96bdf "anv: Allocate block pool BOs from the cache"
Fixes: ee77938733 "anv: Allocate batch and fence buffers from..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-12-07 04:03:35 +00:00
Jason Ekstrand
865ffe4e02 anv: Return VK_ERROR_OUT_OF_DEVICE_MEMORY for too-large buffers
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-06 22:32:05 +00:00
Jason Ekstrand
f9a3d9738b anv: Use BO fences/semaphores for AcquireNextImage
Instead of doing a dummy submit on the command buffer for the fence or a
dummy semaphore and trusting in implicit sync, this commit moves us to
take advantage of implicit sync and just use the WSI image BO as the
fence.  Both semaphores and fences require a tiny bit of extra plumbing
to do this but the result is that we can get rid of a bunch of the extra
synchronization we're doing today.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-06 19:58:07 +00:00
Jason Ekstrand
ecc119a96e anv: Add a fence_reset_reset_temporary helper
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-06 19:58:07 +00:00
Jason Ekstrand
ccb7d606f1 anv: Use submit-time implicit sync instead of allocate-time
In 83b943cc2f, we started making all VkDeviceMemory BOs resident all
the time.  One unfortunate side-effect of this is that every
vkQueueSubmit sets EXEC_OBJECT_WRITE on every WSI memory object which
means that X server or Wayland compositor, instead of waiting on the
last vkQueueSubmit to actually write the buffer, now waits on the last
vkQueueSubmit to from that driver instance relative to whenever the
compositor's GL driver instance calls execbuf.  This potentially leads
to a lot of extra synchronization that we didn't intend to have.

Instead, this commit makes it so that we leave WSI memory objects with
EXEC_OBJECT_ASYNC most of the time and only unset EXEC_OBJECT_ASYNC and
set EXEC_OBJECT_WRITE in the dummy execbuf that we do as part of
vkQueuePresent.  This should hopefully result in tighter integration
with the compositor, lower latency, and better performance.

Testing with DOOM 2016, this seems to reduce latency by at least a frame
if not two and makes the game much more responsive.  Testing was,
however, subjective, so we don't have any hard data on that.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-06 19:58:07 +00:00
Jason Ekstrand
6ebf677cfd anv: Always add in EXEC_OBJECT_WRITE when specified in extra_flags
Otherwise, we're trusting in the execbuf_add_bo which sets
EXEC_OBJECT_WRITE to to always be the first one that gets called.  This
is likely true for fences but it seems somewhat fragile.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-06 19:58:07 +00:00
Jason Ekstrand
1b6991ba1d anv: Implement VK_KHR_buffer_device_address
The primary difference between the KHR and EXT versions of the extension
is that the KHR provides the address at AllocateMemory time for replay
so we can replay it safely without moving to a sparse address model.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:59:10 -06:00
Jason Ekstrand
4428cd9127 anv: Use a pNext loop in AllocateMemory
This function has a lot of possible extensions and some of them we can
easily handle on-the-fly so it's easier to just have a loop than to find
each structure manually.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:59:10 -06:00
Jason Ekstrand
a8e59b3708 anv: Add allocator support for client-visible addresses
When a BO is flagged as having a client visible address, we put it in
its own heap.  We also support the client explicitly specifying an
address in said heap.  If an address collision happens, we return false
from anv_vma_alloc which turns into a VK_ERROR_OUT_OF_DEVICE_MEMORY.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:59:10 -06:00
Jason Ekstrand
03450e9cfc anv: Add an explicit_address parameter to anv_device_alloc_bo
We already have a mechanism for specifying that we want a fixed address
provided by the driver internals.  We're about to let the client start
specifying addresses in some very special scenarios as well so we want
to pass this through to the allocation function.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:59:10 -06:00
Jason Ekstrand
597fdb9e21 anv: Stop advertising two heaps just for the VF cache WA
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:59:10 -06:00
Jason Ekstrand
b47bc0202a anv: Set up VMA heaps independently from memory heaps
Our VMA allocations are really independent from the memory heaps we
expose via the API.  The only thing that really matters is the GTT size
so we can make the high heap the right size.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:59:10 -06:00
Jason Ekstrand
1037b52cf4 anv: Stop tracking VMA allocations
util_vma_heap_alloc will already return 0 if it doesn't have enough
space.  The only thing the vma_*_available tracking was doing was
preventing us from allocating too much on any given heap.  Now that
we're tracking that in the heap itself, we can drop these.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:59:10 -06:00
Jason Ekstrand
a4e3d8f0db anv: Disallow allocating above heap sizes
We're already tracking the amount of memory used in each heap.  This
commit just makes us start rejecting memory allocations if the heap
would grow too large.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:59:10 -06:00
Jason Ekstrand
0a36fafa95 anv: Don't leak when set_tiling fails
Fixes: a44744e01d "anv: Require a dedicated allocation for..."
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:59:10 -06:00
Jason Ekstrand
46af0ecc1d anv: Use PIPE_CONTROL flushes to implement the gen8 VF cache WA
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:59:10 -06:00
Jason Ekstrand
1b5cb92b62 anv: Apply cache flushes after setting index/draw VBs
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:59:10 -06:00
Jason Ekstrand
7ce39a55c1 anv: Always invalidate the VF cache in BeginCommandBuffer
I think the reason why we only do this for primaries is that we didn't
expect to have blorp calls in secondaries.  However, you are allowed to
have a full render pass in a secondary command buffer so resolves and
clears can end up in there.  We should just always invalidate.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:59:10 -06:00
Jason Ekstrand
a500a6b7f1 blorp: Pass the VB size to the VF cache workaround
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:59:10 -06:00
Jason Ekstrand
c142a40a92 anv: Add a has_softpin boolean
This separates "has" from "use" which will make the next commit a bit
cleaner.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:59:10 -06:00
Jason Ekstrand
0bba88081b anv: Drop bo_flags from anv_bo_pool
In ee77938733, we started using the BO cache for anv_bo_pool and
stopped using the bo_flags parameter.  However, we never dropped it from
the struct or the init function.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:58:14 -06:00
Tapani Pälli
32ebd4207a intel/compiler: force simd8 when dual src blending on gen8
Patch introduces option to force simd8 and uses it as a workaround for
dual source blending issues seen with skqp (skia testsuite) on gen8.

Fixes following Piglit test on gen8 platforms:
   arb_blend_func_extended-dual-src-blending-issue-1917

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1917
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
c: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 09:42:50 +02:00
Tapani Pälli
f6004bac1f intel/compiler: add newline to limit_dispatch_width message
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 08:13:58 +02:00
Ian Romanick
c9acf0739f anv: Fix error message format string
See also 246261f0ad

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
CID: 1455892
Fixes: 246261f0ad ("anv: prepare the driver for delayed submissions")
2019-12-04 15:34:03 -08:00
Ian Romanick
668635abd2 intel/compiler: Fix 'comparison is always true' warning
Without looking at the assembly or something, I'm not sure what the
compiler does here.  The brw_reg_type enum is marked packed, so I'm
guess that it gets represented as a uint8_t.  That's the only reason I
could think that comparing with -1 would be always true.

This patch adds the same cast that exists in brw_hw_type_to_reg_type.
It might be better to add a #define outside the enum for
BRW_REGISTER_TYPE_INVALID as (enum brw_reg_type)-1.

src/intel/compiler/brw_eu_compact.c: In function ‘has_immediate’:
src/intel/compiler/brw_eu_compact.c:1515:20: warning: comparison is always true due to limited range of data type [-Wtype-limits]
 1515 |       return *type != -1;
      |                    ^~
src/intel/compiler/brw_eu_compact.c:1518:20: warning: comparison is always true due to limited range of data type [-Wtype-limits]
 1518 |       return *type != -1;
      |                    ^~

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
CID: 1455194
Fixes: 12d3b11908 ("intel/compiler: Add instruction compaction support on Gen12")
Cc: @mattst88
2019-12-04 15:34:03 -08:00
Rafael Antognolli
d3e339364f anv: Use 3DSTATE_CONSTANT_ALL when possible.
Use this new instruction introduced in Gen12. The instruction itself is
smaller, and it also allows us to emit a single instruction to all
stages that have the same push constant buffers (e.g. when they don't
have constant buffers).

There's one restriction to use this instruction, though: the length
field is only 5 bits long, so we need to check whether we can use it,
and fallback to the old 3DSTATE_CONSTANT_XS if that field is >= 32.

v2:
 - Rebased on top of the lasted changes from Jason.
 - Added review suggestions by Caio.
 - Removed struct push_bos and merged some code into
 anv_nir_compute_push_layout().

v3:
 - Remove code churn due to gen8+ workaround in
 anv_nir_compute_push_layout(). This code has been removed in an earlier
 commit, and implemented in cmd_buffer_emit_push_constant().

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-12-04 20:48:25 +00:00
Rafael Antognolli
7d5da53d27 anv: Move code for emitting push constants into its own function.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-12-04 20:48:25 +00:00
Rafael Antognolli
67d2cb3e93 anv: Add get_push_range_address() helper.
Add a helper function to get the push range address. Once we have a
separate function for emitting gen12 push constants, we can use this
helper and avoid duplicating code.

v3: Do not add range->start to the address in gen7 (Caio).
v4: Do not drop range->start from gen7 (Caio, Jason).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-12-04 20:48:25 +00:00
Rafael Antognolli
c0225a728e anv: Move gen8+ push constant packet workaround.
Store push_ranges in ascending order, and only "shift" them to the end
of the array during state packet emission.

We don't need this workaround with the new 3DSTATE_CONSTANT_ALL packet.
So instead of applying the workaround here just for GEN < 12 (which
requires and extra loop through all the ranges to figure out if we
should shift them or not), we simply move the whole logic to the state
emission code. At that point, in a later commit, we are already looping
through all of the ranges anyway to check which packet we will be using,
so we might as well implement the workaround there, where it is going to
be used.

v3: Move gen8+ workaround to the state emission code (Caio).
v4: Add explanation of why we moved the workaroudn (Caio).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-12-04 20:48:25 +00:00
Rafael Antognolli
9db044792f intel/blorp: Use 3DSTATE_CONSTANT_ALL to setup push constants.
In blorp, all the push constants are disabled, so we only need to emit a
single 3DSTATE_CONSTANT_ALL with the bitmask for stage update
appropriately set.

v2: Update comment (Caio).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-12-04 20:48:25 +00:00
Rafael Antognolli
8983622995 intel/aubinator: Decode 3DSTATE_CONSTANT_ALL.
Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-04 20:48:25 +00:00
Rafael Antognolli
2d127614a2 intel/genxml: Add 3DSTATE_CONSTANT_ALL packet.
Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-12-04 20:48:25 +00:00
Lionel Landwerlin
ddacd3d43b intel/perf: fix improper pointer access
This expression was unused by the macro, probably why it didn't
register in the compilation.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-12-04 09:21:15 +00:00
Lionel Landwerlin
8c0b058263 intel/perf: simplify the processing of OA reports
This is a more accurate description of what happens in processing the
OA reports.

Previously we only had a somewhat difficult to parse state machine
tracking the context ID.

What we really only need to do to decide if the delta between 2
reports (r0 & r1) should be accumulated in the query result is :

   * whether the r0 is tagged with the context ID relevant to us

   * if r0 is not tagged with our context ID and r1 is: does r0 have a
     invalid context id? If not then we're in a case where i915 has
     resubmitted the same context for execution through the execlist
     submission port

v2: Update comment (Ken)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-12-04 09:21:15 +00:00
Lionel Landwerlin
b364e920bf intel/perf: take into account that reports read can be fairly old
If we read the OA reports late enough after the query happens, we can
get a timestamp in the report that is significantly in the past
compared to the start timestamp of the query. The current code must
deal with the wraparound of the timestamp value (every ~6 minute). So
consider that if the difference is greater than half that wraparound
period, we're probably dealing with an old report and make the caller
aware it should read more reports when they're available.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-12-04 09:21:15 +00:00
Lionel Landwerlin
9d0a5c817c intel/perf: set read buffer len to 0 to identify empty buffer
We always add an empty buffer in the list when creating the query.
Let's set the len appropriately so that we can recognize it when we
read OA reports up to the end of a query.

We were using an 0 timestamp value associated with the empty buffer
and incorrectly assuming this was a valid value. In turn that led to
not reading enough reports and resulted in deltas added to our counter
values which should have been discarded because those would be flagged
for a different context.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-12-04 09:21:15 +00:00
Lionel Landwerlin
acea59dbf8 intel/perf: fix invalid hw_id in query results
Accumulation happens between 2 reports, it can be between a start/end
report from another context. So only consider updating the hw_id of
the results when it's not already valid and that we have a valid value
to put in there.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 41b54b5faf ("i965: move OA accumulation code to intel/perf")
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-12-04 09:21:15 +00:00
Jason Ekstrand
178a2946c0 anv: Respect the always_flush_cache driconf option
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-03 17:10:51 -06:00
Jason Ekstrand
b1f37688ba anv: Set up SBE_SWIZ properly for gl_Viewport
gl_Viewport is also in the VUE header so we need to whack the read
offset to 0 and emit a default (no overrides) SBE_SWIZ entry in that
case as well.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-03 16:20:50 +00:00
Ian Romanick
d15344c0f5 intel/compiler: Increase nir_opt_peephole_select threshold
I tried 2, 4, 6, 8, and 10.  8 seemed to be the sweet spot across all
Intel platforms.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 14736141 -> 14661140 (-0.51%)
instructions in affected programs: 2272413 -> 2197412 (-3.30%)
helped: 8416
HURT: 140
helped stats (abs) min: 1 max: 1152 x̄: 8.99 x̃: 6
helped stats (rel) min: 0.13% max: 42.55% x̄: 4.15% x̃: 3.20%
HURT stats (abs)   min: 1 max: 140 x̄: 4.73 x̃: 1
HURT stats (rel)   min: 0.03% max: 3.44% x̄: 0.87% x̃: 0.60%
95% mean confidence interval for instructions value: -9.36 -8.17
95% mean confidence interval for instructions %-change: -4.14% -3.99%
Instructions are helped.

total cycles in shared programs: 231560416 -> 228585416 (-1.28%)
cycles in affected programs: 126536021 -> 123561021 (-2.35%)
helped: 7092
HURT: 1898
helped stats (abs) min: 1 max: 419320 x̄: 519.02 x̃: 159
helped stats (rel) min: <.01% max: 77.25% x̄: 13.52% x̃: 11.77%
HURT stats (abs)   min: 1 max: 14518 x̄: 371.91 x̃: 36
HURT stats (rel)   min: <.01% max: 103.23% x̄: 5.92% x̃: 2.55%
95% mean confidence interval for cycles value: -514.34 -147.50
95% mean confidence interval for cycles %-change: -9.69% -9.14%
Cycles are helped.

total spills in shared programs: 5763 -> 5848 (1.47%)
spills in affected programs: 1797 -> 1882 (4.73%)
helped: 13
HURT: 13

total fills in shared programs: 17163 -> 16931 (-1.35%)
fills in affected programs: 7214 -> 6982 (-3.22%)
helped: 22
HURT: 19

total sends in shared programs: 730410 -> 730246 (-0.02%)
sends in affected programs: 2705 -> 2541 (-6.06%)
helped: 114
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 1.44 x̃: 1
helped stats (rel) min: 0.60% max: 20.00% x̄: 7.26% x̃: 5.88%
95% mean confidence interval for sends value: -1.55 -1.33
95% mean confidence interval for sends %-change: -7.90% -6.62%
Sends are helped.

LOST:   4
GAINED: 0

Sandy Bridge
total instructions in shared programs: 10760511 -> 10724637 (-0.33%)
instructions in affected programs: 961305 -> 925431 (-3.73%)
helped: 3734
HURT: 110
helped stats (abs) min: 1 max: 151 x̄: 9.66 x̃: 8
helped stats (rel) min: 0.14% max: 41.21% x̄: 4.93% x̃: 3.95%
HURT stats (abs)   min: 1 max: 20 x̄: 1.68 x̃: 1
HURT stats (rel)   min: 0.12% max: 5.41% x̄: 0.88% x̃: 0.52%
95% mean confidence interval for instructions value: -9.76 -8.91
95% mean confidence interval for instructions %-change: -4.90% -4.63%
Instructions are helped.

total cycles in shared programs: 153359411 -> 152991077 (-0.24%)
cycles in affected programs: 11615401 -> 11247067 (-3.17%)
helped: 2725
HURT: 1138
helped stats (abs) min: 1 max: 2844 x̄: 164.27 x̃: 80
helped stats (rel) min: 0.02% max: 48.60% x̄: 7.47% x̃: 3.91%
HURT stats (abs)   min: 1 max: 4351 x̄: 69.69 x̃: 25
HURT stats (rel)   min: 0.02% max: 40.00% x̄: 3.39% x̃: 1.47%
95% mean confidence interval for cycles value: -103.18 -87.52
95% mean confidence interval for cycles %-change: -4.57% -3.97%
Cycles are helped.

total sends in shared programs: 584038 -> 583855 (-0.03%)
sends in affected programs: 3512 -> 3329 (-5.21%)
helped: 157
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.17 x̃: 1
helped stats (rel) min: 2.38% max: 25.00% x̄: 6.52% x̃: 6.06%
95% mean confidence interval for sends value: -1.26 -1.07
95% mean confidence interval for sends %-change: -7.17% -5.87%
Sends are helped.

LOST:   23
GAINED: 0

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8122617 -> 8111592 (-0.14%)
instructions in affected programs: 380503 -> 369478 (-2.90%)
helped: 912
HURT: 86
helped stats (abs) min: 1 max: 129 x̄: 12.19 x̃: 9
helped stats (rel) min: 0.30% max: 39.21% x̄: 3.69% x̃: 2.57%
HURT stats (abs)   min: 1 max: 2 x̄: 1.05 x̃: 1
HURT stats (rel)   min: 0.12% max: 3.64% x̄: 0.54% x̃: 0.36%
95% mean confidence interval for instructions value: -12.00 -10.10
95% mean confidence interval for instructions %-change: -3.56% -3.10%
Instructions are helped.

total cycles in shared programs: 188509780 -> 188534398 (0.01%)
cycles in affected programs: 7211542 -> 7236160 (0.34%)
helped: 859
HURT: 132
helped stats (abs) min: 2 max: 690 x̄: 46.59 x̃: 16
helped stats (rel) min: 0.01% max: 26.76% x̄: 1.53% x̃: 0.33%
HURT stats (abs)   min: 2 max: 1592 x̄: 489.67 x̃: 618
HURT stats (rel)   min: 0.03% max: 185.92% x̄: 23.35% x̃: 6.26%
95% mean confidence interval for cycles value: 9.58 40.10
95% mean confidence interval for cycles %-change: 0.65% 2.93%
Cycles are HURT.
2019-12-02 16:46:20 -08:00
Jason Ekstrand
a8965c076b anv: Push constants are relative to dynamic state on IVB
Fixes: aecde2351 "anv: Pre-compute push ranges for graphics pipelines"
Closes: #2136
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-11-26 22:15:54 +00:00
Jason Ekstrand
854859fefa anv/entrypoints: Better handle promoted extensions
In the case of promoted extensions we can end up with an entrypoint that
we support being an alias of an entrypoint we do not support.  For
instance, if an extension gets promoted from EXT to KHR, the EXT entry-
points may be aliases of the KHR ones.  We want to leave everything as
EXT until we get around to advertising the KHR so that we don't break
things when we update the XML and headers.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-11-26 02:48:42 +00:00
Samuel Pitoiset
d6db858771 meson: only build imgui when needed
Only required for Intel tools or the Vulkan overlay layer.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-11-25 07:51:56 +00:00
Ian Romanick
e51eda99df intel/fs: Disable conditional discard optimization on Gen4 and Gen5
The CMP instruction on Gen4 and Gen5 generates one bit (the LSB) of
valid data and 31 bits of junk.  Results of comparisons that are used as
Boolean values need to have a fixup applied to generate the proper 0/~0
values.

Calling fs_visitor::nir_emit_alu with need_dest=false prevents the fixup
code from being generated.  This results in a sequence like:

        cmp.l.f0.0(16)  g8<1>F          g14<8,8,1>F     0x0F  /* 0F */
        ...
        cmp.l.f0.0(16)  g4<1>F          g6<8,8,1>F      0x0F  /* 0F */
(+f0.1) or.z.f0.1(16) null<1>UD g4<8,8,1>UD     g8<8,8,1>UD

instead of

        cmp.l.f0.0(16)  g8<1>F          g14<8,8,1>F     0x0F  /* 0F */
        ...
        cmp.l.f0.0(16)  g4<1>F          g6<8,8,1>F      0x0F  /* 0F */
        or(16) g4<1>UD g4<8,8,1>UD     g8<8,8,1>UD
(+f0.1) and.z.f0.1(16) null<1>UD g4<8,8,1>UD     1UD

I examined a couple of the shaders hurt by this change, and ALL of them
would have been affected by this bug. :(

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1836
Fixes: 0ba9497e66 ("intel/fs: Improve discard_if code generation")

Iron Lake
total instructions in shared programs: 8122757 -> 8122957 (<.01%)
instructions in affected programs: 8307 -> 8507 (2.41%)
helped: 0
HURT: 100
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.84% max: 6.67% x̄: 2.81% x̃: 2.76%
95% mean confidence interval for instructions value: 2.00 2.00
95% mean confidence interval for instructions %-change: 2.58% 3.03%
Instructions are HURT.

total cycles in shared programs: 188510100 -> 188510376 (<.01%)
cycles in affected programs: 76018 -> 76294 (0.36%)
helped: 0
HURT: 55
HURT stats (abs)   min: 2 max: 12 x̄: 5.02 x̃: 4
HURT stats (rel)   min: 0.07% max: 3.75% x̄: 0.86% x̃: 0.56%
95% mean confidence interval for cycles value: 4.33 5.71
95% mean confidence interval for cycles %-change: 0.60% 1.12%
Cycles are HURT.

GM45
total instructions in shared programs: 4994403 -> 4994503 (<.01%)
instructions in affected programs: 4212 -> 4312 (2.37%)
helped: 0
HURT: 50
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.84% max: 6.25% x̄: 2.76% x̃: 2.72%
95% mean confidence interval for instructions value: 2.00 2.00
95% mean confidence interval for instructions %-change: 2.45% 3.07%
Instructions are HURT.

total cycles in shared programs: 128928750 -> 128928982 (<.01%)
cycles in affected programs: 67442 -> 67674 (0.34%)
helped: 0
HURT: 47
HURT stats (abs)   min: 2 max: 12 x̄: 4.94 x̃: 4
HURT stats (rel)   min: 0.09% max: 3.75% x̄: 0.75% x̃: 0.53%
95% mean confidence interval for cycles value: 4.19 5.68
95% mean confidence interval for cycles %-change: 0.50% 1.00%
Cycles are HURT.
2019-11-21 16:40:50 -08:00
Jason Ekstrand
2fca325ea6 Revert "i965/fs: Merge CMP and SEL into CSEL on Gen8+"
This reverts commit 52c7df1643.  The pass,
while clearly useful for some shaders, has at least three bugs that I
was able to find fairly quickly:

 1. It doesn't work for type-converting MOVs because f > 0 is not the
    same as f2i(f) > 0

 2. CSEL is a 3src instruction and only supports one source type; it
    doesn't take this into account and tries to create instructions
    which do a F compare and a D select.  This is especially nasty to
    debug because you don't see that in the dumped assembly because we
    don't properly assert that types are the same in codegen.

 3. While you can handle 2, in theory, by reinterpreting types, you
    can't do that in the presence of source modifiers.  This pass
    doesn't even attempt to detect that.

Those are just the ones I found with the one almost trival shader I was
debugging.  There very likely may be more and.  Best thing to do for now
is just shut it off until someone has the time to figure out how to do
this properly and write tests to ensure it's correct.

Fixes: 3cb085e6d61a "i965/fs: Merge CMP and SEL into CSEL on Gen8+"
Reviewed-by: Brian Paul <brianp@vmware.com>
2019-11-20 20:47:32 +00:00