Commit graph

1093 commits

Author SHA1 Message Date
Jason Ekstrand
dcb6a68bb4 anv: Move shader hashing to anv_pipeline
Shader hashing is very closely related to shader compilation.  Putting
them right next to each other in anv_pipeline makes it easier to verify
that we're actually hashing everything we need to be hashing.  The only
real change (other than the order of hashing) is that we now hash in the
shader stage.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-03 11:25:46 -07:00
Jason Ekstrand
d6b8106eea anv/pass: Store the per-subpass view mask
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-03 11:25:46 -07:00
Jason Ekstrand
e997f548de anv: Add the KHX_multiview boilerplate
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-03 11:25:46 -07:00
Jason Ekstrand
0bed97006f anv/nir: Delete the apply_dynamic_offsets prototype
That pass hasn't existed since dd4db84640
but the prototype stuck around for no reason.

Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-05-03 11:25:46 -07:00
Jason Ekstrand
6ef1bd4fa5 anv/tests: Create a dummy instance as well as device
This fixes crashes caused by 35e626bd0e
which made us start referencing the instance in the allocators.  With
this commit, the tests now happily pass again.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100877
Tested-by: Vinson Lee <vlee@freedesktop.org>
2017-05-01 17:06:40 -07:00
Chad Versace
85ca563b58 anv: Drop 'x11' prefix from non-X11 WSI funcs
Drop it from x11_anv_wsi_image_create and x11_anv_wsi_image_free. The
functions are used by Wayland WSI too.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2017-04-28 08:54:45 -07:00
Jason Ekstrand
ebd1bd6998 anv: Alphabetize KHR extensions
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2017-04-28 07:41:03 -07:00
Jason Ekstrand
032861693e anv: Move queues, events, and semaphores to their own file
Things are about to get more complicated, especially as far as
semaphores are concerned.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-04-27 20:08:46 -07:00
Jason Ekstrand
9bd1f03487 anv: Implement VK_KHX_external_memory_fd
This commit just exposes the memory handle type.  There's interesting we
need to do here for images.  So long as the user doesn't set any crazy
environment variables such as INTEL_DEBUG=nohiz, all of the compression
formats etc. should "just work" at least for opaque handle types.

v2 (chadv):
  - Rebase.
  - Fix vkGetPhysicalDeviceImageFormatProperties2KHR when
    handleType == 0.
  - Move handleType-independency comments out of handleType-switch, in
    vkGetPhysicalDeviceExternalBufferPropertiesKHX.  Reduces diff in
    future dma_buf patches.

Co-authored-with: Chad Versace <chadversary@chromium.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-04-27 20:08:46 -07:00
Jason Ekstrand
818b857914 anv: Use the BO cache for DeviceMemory allocations
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-04-27 20:08:46 -07:00
Jason Ekstrand
494d6f65a7 anv/allocator: Add a BO cache
This cache allows us to easily ensure that we have a unique anv_bo for
each gem handle.  We'll need this in order to support multiple-import of
memory objects and semaphores.

v2 (Jason Ekstrand):
 - Reject BO imports if the size doesn't match the prime fd size as
   reported by lseek().

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-04-27 20:08:46 -07:00
Jason Ekstrand
5d25ac6a4b anv: Implement VK_KHX_external_memory
This is the trivial implementation that just exposes the extension
string but exposes zero external handle types.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-04-27 20:08:46 -07:00
Chad Versace
354ca7a1d4 anv: Implement VK_KHX_external_memory_capabilities
This is a complete but trivial implementation. It's trivial becasue We
support no external memory capabilities yet.  Most of the real work in
this commit is in reworking the UUIDs advertised by the driver.

v2 (chadv):
  - Fix chain traversal in vkGetPhysicalDeviceImageFormatProperties2KHR.
    Extract VkPhysicalDeviceExternalImageFormatInfoKHX from the chain of
    input structs, not the chain of output structs.
  - In vkGetPhysicalDeviceImageFormatProperties2KHR, iterate over the
    input chain and the output chain separately. Reduces diff in future
    dma_buf patches.

Co-authored-with: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-27 20:08:46 -07:00
Jason Ekstrand
d4d9258b61 anv/physical_device: Rename uuid to pipeline_cache_uuid
We're about to have more UUIDs for different things so this one really
needs to be properly labeled.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-04-27 20:08:46 -07:00
Jason Ekstrand
02767cb4ff anv: Refactor device_get_cache_uuid into physical_device_init_uuids
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-04-27 20:08:46 -07:00
Jason Ekstrand
35e626bd0e anv: Set EXEC_OBJECT_ASYNC when available
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-04-27 20:08:46 -07:00
Jason Ekstrand
bd3a9813b9 anv/cmd_buffer: Use the device allocator for QueueSubmit
The command is really operating on a Queue not a command buffer and the
nearest object to that with an allocator is VkDevice.

Reviewed-by: Chad Versace <chadversary@chromium.org>
Cc: "17.0 17.1" <mesa-dev@lists.freedesktop.org>
2017-04-27 20:08:46 -07:00
Jason Ekstrand
c43b4bc85e anv: Don't place scratch buffers above the 32-bit boundary
This fixes rendering corruptions in DOOM.  Hopefully, it will also make
Jenkins a bit more stable as we've been seeing some random failures and
GPU hangs ever since turning on 48bit.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100620
Fixes: 651ec926fc "anv: Add support for 48-bit addresses"
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
2017-04-27 02:04:57 -07:00
Rafael Antognolli
9670124e31 genxml: Make BLEND_STATE command support variable length array.
We need to emit BLEND_STATE, which size is 1 + 2 * nr_draw_buffers
dwords (on gen8+), but the BLEND_STATE struct length is always 17. By
marking it size 1, which is actually the size of the struct minus the
BLEND_STATE_ENTRY's, we can emit a BLEND_STATE of variable number of
entries.

For gen6 and gen7 we set length to 0, since it only contains
BLEND_STATE_ENTRY's, and no other data.

With this change, we also change the code for blorp and anv to emit only
the needed BLEND_STATE_ENTRY's, instead of always emitting 16 dwords on
gen6-7 and 17 dwords on gen8+.

v2:
   - Use designated initializers on blorp and remove 0 from
   initialization (Jason)
   - Default entries to disabled on Vulkan (Jason)
   - Rebase code.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-24 15:14:10 -07:00
Nanley Chery
b1066f7365 anv/pass: Delete anv_pass::subpass_attachments
This field has no users.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-24 13:30:50 -07:00
Jason Ekstrand
1e21d4227e anv/query: Use genxml for MI_MATH
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed by: Iago Toral Quiroga <itoral@igalia.com>
2017-04-20 15:24:06 -07:00
Nanley Chery
d9d793696b anv/cmd_buffer: Disable CCS on BDW input attachments
The description under RENDER_SURFACE_STATE::RedClearColor says,

   For Sampling Engine Multisampled Surfaces and Render Targets:
    Specifies the clear value for the red channel.
   For Other Surfaces:
    This field is ignored.

This means that the sampler on BDW doesn't support CCS.

Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-17 16:47:38 -07:00
Lionel Landwerlin
d71efbe5f2 anv: blorp: flush memory after copy
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-04-17 14:45:57 -07:00
Jason Ekstrand
d2d6cf6c83 anv: Add the pci_id into the shader cache UUID
This prevents a user from using a cache created on one hardware
generation on a different one.  Of course, with Intel hardware, this
requires moving their drive from one machine to another but it's still
possible and we should prevent it.

Reviewed-by: Chad Versace <chadversary@chromium.org>
Cc: mesa-stable@lists.freedesktop.org
2017-04-14 17:41:07 -07:00
Jason Ekstrand
220974b38d anv/blorp: Properly handle VK_ATTACHMENT_UNUSED
The Vulkan driver was originally written under the assumption that
VK_ATTACHMENT_UNUSED was basically just for depth-stencil attachments.
However, the way things fell together, VK_ATTACHMENT_UNUSED can be used
anywhere in the subpass description.  The blorp-based clear and resolve
code has a bunch of places where we walk lists of attachments and we
weren't handling VK_ATTACHMENT_UNUSED everywhere.  This commit should
fix all of them.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
2017-04-14 14:20:42 -07:00
Jason Ekstrand
21d2ca72d8 anv/cmd_buffer: Use the null surface state for ATTACHMENT_UNUSED
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
2017-04-14 14:20:42 -07:00
Jason Ekstrand
02eca8b6f8 anv/cmd_buffer: Always set up a null surface state
We're about to start requiring it in yet another case and calculating
exactly when one is needed is starting to get prohibitively expensive.
A single surface state doesn't take up that much space so we may as well
create one all the time.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
2017-04-14 14:20:42 -07:00
Jason Ekstrand
e1f6fb8021 anv/cmd_buffer: Flush the VF cache at the top of all primaries
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-04-14 13:35:02 -07:00
Jason Ekstrand
939337e49f anv/blorp: Flush the texture cache in UpdateBuffer
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-04-14 13:35:02 -07:00
Jason Ekstrand
475bab0330 anv: Limit VkDeviceMemory objects to 2GB
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-04-14 13:35:02 -07:00
Matt Turner
ab18578b03 anv: Only define wsi_cbs when VK_USE_PLATFORM_WAYLAND_KHR defined 2017-04-12 11:00:39 -07:00
Juan A. Suarez Romero
8d7a82ae32 anv: remove needless VALGRIND_MAKE_MEM_DEFINED
This is already invoked in the following VG_NOACCESS_READ() call.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-11 17:21:57 +02:00
Jason Ekstrand
f93dc5beee anv: Use ISL for emitting depth/stencil/hiz
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-10 07:57:21 -07:00
Jason Ekstrand
bc68aa42bd anv: Use subpass dependencies for flushes
Instead of figuring it all out ourselves, just use the information given
to us by the client.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-07 19:24:14 -07:00
Jason Ekstrand
e5bbf8be36 anv/pass: Record required pipe flushes
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-07 19:24:14 -07:00
Jason Ekstrand
0039d0cf27 anv/pass: Use anv_multialloc for allocating the anv_pass
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-07 19:24:14 -07:00
Jason Ekstrand
415633a722 anv/descriptor_set: Use anv_multialloc for descriptor set layouts
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-07 19:24:14 -07:00
Jason Ekstrand
e5c29b8c27 anv: Add a helper for doing mass allocations
We tend to try to reduce the number of allocation calls the Vulkan
driver uses by doing a single allocation whenever possible for a data
structure.  While this has certain downsides (usually code complexity),
it does mean error handling and cleanup is much easier.  This commit
adds a nice little helper struct for getting rid of some of that
complexity.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-07 19:24:14 -07:00
Jason Ekstrand
82695c32b6 anv: Add helpers for converting access flags to pipe bits
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-07 19:24:14 -07:00
Jason Ekstrand
4e17b59f6c anv/query: Use snooping on !LLC platforms
Commit b2c97bc789 which made us start
using a busy-wait for individual query results also messed up cache
flushing on !LLC platforms.  For one thing, I forgot the mfence after
the clflush so memory access wasn't properly getting fenced.  More
importantly, however, was that we were clflushing the whole query range
and then waiting for individual queries and then trying to read the
results without clflushing again.  Getting the clflushing both correct
and efficient is very subtle and painful.  Instead, let's side-step the
problem by just snooping.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-07 12:17:20 -07:00
Emil Velikov
5318d1ff94 anv: provide anv_gem_busy() stub for the tests
Otherwise linking way fail.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100600
Fixes: f195d40eca ("anv/device: Add a helper for querying whether a BO is busy")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
2017-04-07 19:45:58 +01:00
Samuel Iglesias Gonsálvez
1c934bc71b anv/blorp: sample input attachments with resolves on BDW
On Broadwell we still need to do a resolve between the subpass
that writes and the subpass that reads when there is a
self-dependency because HW could not see fast-clears and works
on the render cache as if there was regular non-fast-clear surface.

Fixes 16 tests on BDW:

dEQP-VK.renderpass.formats.*.input.clear.store.self_dep*

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-07 07:49:43 +02:00
Jason Ekstrand
b2c97bc789 anv/query: Busy-wait for available query entries
Before, we were just looking at whether or not the user wanted us to
wait and waiting on the BO.  Some clients, such as the Serious engine,
use a single query pool for hundreds of individual query results where
the writes for those queries may be split across several command
buffers.  In this scenario, the individual query we're looking for may
become available long before the BO is idle so waiting on the query pool
BO to be finished is wasteful. This commit makes us instead busy-loop on
each query until it's available.

This significantly reduces pipeline bubbles and improves performance of
The Talos Principle on medium settings (where the GPU isn't overloaded
with drawing) by around 20% on my SkyLake gt4.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Eero Tamminen <eero.t.tamminen@intel.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
2017-04-05 21:17:11 -07:00
Jason Ekstrand
f195d40eca anv/device: Add a helper for querying whether a BO is busy
This is a bit more efficient than using GEM_WAIT with a timeout of 0.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-05 21:17:11 -07:00
Emil Velikov
a6840efc09 anv: provide required gem stubs for the tests
Introduce stubs to anv_gem_stub.c that match the anv_gem.c ones.
Otherwise we may get link-time errors, when building the tests.

v2: Introduce all the missing stubs at once.

Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Vinson Lee <vlee@freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100574
Fixes: c964f0e485 ("anv: Query the kernel for reset status")
Fixes: 651ec926fc ("anv: Add support for 48-bit addresses")
Fixes: 060a6434ec ("anv: Advertise larger heap sizes")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
---
I've intentionally kept the order the same identical to the anv_gem.c.
This way we can easily grep & diff in the future ;-)
2017-04-05 17:54:38 +01:00
Jason Ekstrand
060a6434ec anv: Advertise larger heap sizes
Instead of just advertising the aperture size, we do something more
intelligent.  On systems with a full 48-bit PPGTT, we can address 100%
of the available system RAM from the GPU.  In order to keep clients from
burning 100% of your available RAM for graphics resources, we have a
nice little heuristic (which has received exactly zero tuning) to keep
things under a reasonable level of control.

Reviewed-by: Kristian H. Kristensen <krh@bitplanet.net>
2017-04-04 18:33:52 -07:00
Jason Ekstrand
651ec926fc anv: Add support for 48-bit addresses
This commit adds support for using the full 48-bit address space on
Broadwell and newer hardware.  Thanks to certain limitations, not all
objects can be placed above the 32-bit boundary.  In particular, general
and state base address need to live within 32 bits.  (See also
Wa32bitGeneralStateOffset and Wa32bitInstructionBaseOffset.)  In order
to handle this, we add a supports_48bit_address field to anv_bo and only
set EXEC_OBJECT_SUPPORTS_48B_ADDRESS if that bit is set.  We set the bit
for all client-allocated memory objects but leave it false for
driver-allocated objects.  While this is more conservative than needed,
all driver allocations should easily fit in the first 32 bits of address
space and keeps things simple because we don't have to think about
whether or not any given one of our allocation data structures will be
used in a 48-bit-unsafe way.

Reviewed-by: Kristian H. Kristensen <krh@bitplanet.net>
2017-04-04 18:33:52 -07:00
Jason Ekstrand
439da38d18 anv: Replace anv_bo::is_winsys_bo with a uint32_t flags
Reviewed-by: Kristian H. Kristensen <krh@bitplanet.net>
2017-04-04 18:33:52 -07:00
Jason Ekstrand
5d1ba2cb04 anv/blorp: Align vertex buffers to 64B
This fixes issues seen when adding support for full 48-bit addresses.
The 48-bit addresses themselves have nothing to do with it other than
that it caused the kernel to place buffers slightly differently so they
interacted differently with the caches.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-04-04 18:33:52 -07:00
Jason Ekstrand
c964f0e485 anv: Query the kernel for reset status
When a client causes a GPU hang (or experiences issues due to a hang in
another client) we want to let it know as soon as possible.  In
particular, if it submits work with a fence and calls vkWaitForFences or
vkQueueQaitIdle and it returns VK_SUCCESS, then the client should be
able to trust the results of that rendering.  In order to provide this
guarantee, we have to ask the kernel for context status in a few key
locations.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-04 18:33:52 -07:00