Commit graph

949 commits

Author SHA1 Message Date
Jason Ekstrand
dd4db84640 anv: Use on-the-fly surface states for dynamic buffer descriptors
We have a performance problem with dynamic buffer descriptors.  Because
we are currently implementing them by pushing an offset into the shader
and adding that offset onto the already existing offset for the UBO/SSBO
operation, all UBO/SSBO operations on dynamic descriptors are indirect.
The back-end compiler implements indirect pull constant loads using what
basically amounts to a texelFetch instruction.  For pull constant loads
with constant offsets, however, we use an oword block read message which
goes through the constant cache and reads a whole cache line at a time.
Because of these two things, direct pull constant loads are much faster
than indirect pull constant loads.  Because all loads from dynamically
bound buffers are indirect, the user takes a substantial performance
penalty when using this "performance" feature.

There are two potential solutions I have seen for this problem.  The
alternate solution is to continue pushing offsets into the shader but
wire things up in the back-end compiler so that we use the oword block
read messages anyway.  The only reason we can do this because we know a
priori that the dynamic offsets are uniform and 16-byte aligned.
Unfortunately, thanks to the 16-byte alignment requirement of the oword
messages, we can't do some general "if the indirect offset is uniform,
use an oword message" sort of thing.

This solution, however, is recommended for a few of reasons:

 1. Surface states are relatively cheap.  We've been using on-the-fly
    surface state setup for some time in GL and it works well.  Also,
    dynamic offsets with on-the-fly surface state should still be
    cheaper than allocating new descriptor sets every time you want to
    change a buffer offset which is really the only requirement of the
    dynamic offsets feature.

 2. This requires substantially less compiler plumbing.  Not only can we
    delete the entire apply_dynamic_offsets pass but we can also avoid
    having to add architecture for passing dynamic offsets to the back-
    end compiler in such a way that it can continue using oword messages.

 3. We get robust buffer access range-checking for free.  Because the
    offset and range are baked into the surface state, we no longer need
    to pass ranges around and do bounds-checking in the shader.

 4. Once we finally get UBO pushing implemented, it will be much easier
    to handle pushing chunks of dynamic descriptors if the compiler
    remains blissfully unaware of dynamic descriptors.

This commit improves performance of The Talos Principle on ULTRA
settings by around 50% and brings it nicely into line with OpenGL
performance.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-13 07:58:00 -07:00
Jason Ekstrand
6b644e571e anv: Stall before fast-clear operations
During initial CCS bring-up, I discovered that you have to do a full CS
stall prior to doing a CCS resolve as well as afterwards.  It appears
that the same is needed for fast-clears as well.  This fixes rendering
corruptions on The Talos Principle on Sky Lake GT4.  The issue hasn't
been demonstrated on any other hardware however, given that this appears
to be a "too many things in the pipe" problem, having it be easier to
reproduce on a system with more EUs makes sense.  The issues with
resolves is demonstrable on a GT3 or GT2 so this is probably also a
problem on all GTs.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-03-13 07:57:03 -07:00
Jason Ekstrand
5e44ef4a76 anv: Accurately advertise dynamic descriptor limits
The number of dynamic descriptors is limited by both the number of
descriptors and the total number of dynamic things.  Because there isn't
a single "maximum dynamic things" limit, we need to divide by two so
that they can create the maximum of both UBOs and SSBOs.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
2017-03-13 07:57:03 -07:00
Jason Ekstrand
d36b463817 anv: Add a helper for working with VK_WHOLE_SIZE for buffers
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
2017-03-13 07:57:03 -07:00
Jason Ekstrand
ee8044fd33 intel/vulkan: Get rid of recursive make
v2 [Emil Velikov]
 - Various fixes and initial stab at the Android build.
 - Keep the generation rules/EXTRA_DIST outside the conditional

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:35 +00:00
Jason Ekstrand
700bebb958 i965: Move the back-end compiler to src/intel/compiler
Mostly a dummy git mv with a couple of noticable parts:
 - With the earlier header cleanups, nothing in src/intel depends
files from src/mesa/drivers/dri/i965/
 - Both Autoconf and Android builds are addressed. Thanks to Mauro and
Tapani for the fixups in the latter
 - brw_util.[ch] is not really compiler specific, so it's moved to i965.

v2:
 - move brw_eu_defines.h instead of brw_defines.h
 - remove no-longer applicable includes
 - add missing vulkan/ prefix in the Android build (thanks Tapani)

v3:
 - don't list brw_defines.h in src/intel/Makefile.sources (Jason)
 - rebase on top of the oa patches

[Emil Velikov: commit message, various small fixes througout]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:34 +00:00
Jason Ekstrand
e042f5fcbc anv: Stop including brw_context.h
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:33 +00:00
Jason Ekstrand
12f348bc98 vulkan/wsi: Generate wayland protocol headers separately from EGL
Previously, we were depending on EGL for generating the headers and
providing the protocol symbols. However, since neither Vulkan driver
actually wants to link against EGL, this is kind of pointless. It also
creates a weird build dependency.

v2 [Jason]
 - Add missing wsi/ prefix, MKDIR_GEN

v3 [Emil Velikov]
 - include BUILT_SOURCES/generation rules outside of conditional

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:33 +00:00
Jason Ekstrand
4ea9bbe1f6 anv/wsi: Don't include wayland headers
Unused and we'll rework the way wayland-drm-client-protocol.h is
generated with later commit.

v2 [Emil]
 - Also remove wayland-client.h

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:30 +00:00
Tapani Pälli
db5f9c3177 anv: change BLOCK_POOL_MEMFD_SIZE to exactly 2GB
This is what comment above definition says and change fixes issue with
32bit build where BLOCK_POOL_MEMFD_SIZE is used as ftruncate parameter
and constant currently gets converted from 4294967296 to 0.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-08 07:57:55 +02:00
Jason Ekstrand
0421813588 anv: Make the framebuffer-renderpass format assert non-fatal
This should let Dota 2 run on debug builds though it will spew errors
like mad.  Hopefully, Valve will get this fixed sooner rather than
later.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-07 15:22:16 -08:00
Jason Ekstrand
33301d949f anv: Drop the anv_validate block helper
Over the course of driver development, we've come up with a number of
different schemes for adding giant blocks of asserts inside the driver.
This one is only being used once in anv_pipeline.c and the way it's
being used actually generates compiler warnings in release builds.  This
commit drops the anv_validate macro and just puts the contents of the
one validation function in side of a "#ifdef DEBUG" guard.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-07 15:22:16 -08:00
Jason Ekstrand
a316d8f406 anv: Get rid of the stub() macros
Except for a few unimplemented things on gen7, we don't really have
stubs anymore so we should drop this.  This commit replaces the few gen7
stub() calls with explicitly labeled finishme's and makes the sparse
binding stuff silently no-op or return a FEATURE_NOT_PRESENT error.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-07 15:22:16 -08:00
Jason Ekstrand
1488d079cb anv: Remove a pointless finishme
We've been supporting multiple shaders per module for some time now.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-07 15:22:16 -08:00
Jason Ekstrand
1a43792783 anv: Convert the HiZ finishme's to perf_warn
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-07 15:22:16 -08:00
Jason Ekstrand
201fc83df7 anv: Add a performance warning helper
This acts identically to anv_finishme except that it only dumps out
these nice log messages if you run with INTEL_DEBUG=perf.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-07 15:22:16 -08:00
Jason Ekstrand
b3135c3cf3 anv: Advertise shaderInt64 on Broadwell and above
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-03-03 13:59:29 -08:00
Nanley Chery
d7d64f1091 anv/image: Allow HiZ on input attachment-capable depth/stencil images
While an input attachment may only take on one of those two layouts,
other depth/stencil attachments that use the same image may have
HiZ-enabled layouts. Improves the average frame rate on a release
candidate of a proprietary Vulkan benchmark by 9.94% over 3 runs on my
SKL GT4.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery
76b8cc2a1c anv/cmd_buffer: Centralize automatic layout transitions
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery
0a72b5f3cb anv/cmd_buffer: Add attachment transitioning functions
This is needed to transition input attachments.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery
9950774f8b anv/blorp: Encapsulate subpass id querying
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery
c78a959bcf anv/cmd_buffer: Enable render pass awareness
v2: Update cmd_state_reset (Jason Ekstrand)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery
c0223d052b anv/pass: Store subpass attachment reference list
We'll loop through this array when performing automatic layout
transitions.

v2: Adjust formatting of an assignment (Jason Ekstrand)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery
8f6a17c8e7 anv/pass: Fix size of anv_render_pass:subpass_attachments
Don't allocate space for resolve attachments if the subpass has none.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery
608d17b80e anv: Store the user's VkAttachmentReference
We will be using the image layout. Store the full struct directly from
the user.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery
6326f0f4be anv/cmd_buffer: Remove extra resolve for certain depth buffers
Due to recent commits, the sampler now bypasses the auxiliary HiZ buffer
when reading from a depth image subresource that is in the general
layout. Remove this unneeded resolve.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery
ea744912b3 anv/cmd_buffer: Conditionally choose the sampled image surface state
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery
5408d3fd05 anv/descriptor_set: Store aux usage of sampled image descriptors
v2: Rebase onto latest changes
v3: Account for NULL image_view in aux_usage assignment

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery
efc2222323 anv/image: Create an additional surface state for sampling
This will be used to sample a depth input attachment without having to
pass through the HiZ buffer.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:54 -08:00
Nanley Chery
f3621f4e71 anv/image: Simplify setup of HiZ sampler surface state
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:54 -08:00
Nanley Chery
258af3a856 anv/image: Remove extra dependency on HiZ-specific variable
surf_usage is only useful to image views that may use HiZ buffers.
Storage image views don't use HiZ buffers.

v2: Update commit message and add an assertion.

Fixes: 055ff2ec52 ("anv: Replace anv_image_has_hiz() with ISL_AUX_USAGE_HIZ")
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:54 -08:00
Nanley Chery
54d29ee65f anv: Update the HiZ sampling helper
Validate the inputs, verify that this image has a depth
buffer, use gen_device_info instead of

v2:
- Add parenthesis (Jason Ekstrand)
- Make parameters const
- Use gen_device_info instead of gen
- Pass aspect to missed function in transition_depth_buffer

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:54 -08:00
Nanley Chery
172747a963 anv/cmd_buffer: Replace layout_to_hiz_usage()
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:54 -08:00
Nanley Chery
425e33bcdb anv/image: Add anv_layout_to_aux_usage()
This function supersedes layout_to_hiz_usage().

v2:
- Don't find the optimal buffer for layout transitions (Jason Ekstrand).
- Pass the devinfo instead of the gen (Jason Ekstrand)
- Update the function documentation.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:54 -08:00
Nanley Chery
178f9e5f29 anv/pass: Avoid accessing attachment array out of bounds
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:54 -08:00
Lionel Landwerlin
af5f13e58c anv: add VK_KHR_descriptor_update_template support
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 10:34:06 +00:00
Lionel Landwerlin
9f60ed98e5 anv: add VK_KHR_push_descriptor support
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 10:34:06 +00:00
Lionel Landwerlin
12dee851a3 anv: descriptor: make descriptor writing take a stream allocator
This allows us to allocate surface states from the command buffer when
pushing descriptor sets rather than allocating them through a
descriptor set pool.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 10:34:06 +00:00
Lionel Landwerlin
194fa58285 anv: descriptors: extract writing of descriptors elements
This will be reused later on.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 10:34:06 +00:00
Lionel Landwerlin
c2d199adec anv: make layout size computation helper available across compilation units
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 10:34:06 +00:00
Lionel Landwerlin
c83e33e6ee anv: move buffer_view declaration
We will need this declaration closer for readability later.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 10:34:06 +00:00
Iago Toral Quiroga
7ad692d8e2 anv: do not subtract the base layer to compute depth in 3DSTATE_DEPTH_BUFFER
According to the PRM description of the Depth field:

  "This field specifies the total number of levels for a volume texture
   or the number of array elements allowed to be accessed starting at the
   Minimum Array Element for arrayed surfaces"

However, ISL defines array_len as the length of the range
[base_array_layer, base_array_layer + array_len], so it already represents
a value relative to the base array layer like the hardware expects.

v2: Depth is defined as a U11-1 field, so subtract 1 from
    the actual value (Jason)

This fixes a number of new CTS tests that would crash otherwise:
dEQP-VK.pipeline.render_to_image.*

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 09:04:03 +01:00
Jason Ekstrand
e647c4fbd9 util/build-id: Return a pointer rather than copying the data
We're about to use the build-id as the starting point for another SHA1
hash in the Intel Vulkan driver, and returning a pointer is far more
convenient.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-03-01 15:31:44 -08:00
Jason Ekstrand
e3d33a23e6 anv: Properly handle destroying NULL devices and instances
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "17.0 13.0" <mesa-dev@lists.freedesktop.org>
2017-03-01 15:31:44 -08:00
Mauro Rossi
3f2cb699cf android: vulkan: add support for libmesa_vulkan_util
The following changes are implemented:

Add src/vulkan/Android.mk to build libmesa_vulkan_util
Android.mk: add src/vulkan to SUBDIR to build new module
intel/vulkan: fix libmesa_vulkan_util,vk_enum_to_str.h dependencies
Add -o OUTPUT_PATH option in src/vulkan/util/gen_enum_to_str.py script
Use -o OUTPUT_PATH option in automake generation rules for vk_enum_to_str.{c,h}

Fixes: e9dcb17 "vulkan/util: Add generator for enum_to_str functions"
Fixes: 8e03250 "vulkan: Combine wsi and util makefiles"
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

[Emil Velikov]
 - Move parser within main()
 - Use --outdir instead of -o
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-28 01:24:41 +01:00
Emil Velikov
3935690d58 automake: anv: add missing include $(top_srcdir)/src/vulkan/util
Otherwise we'll fail to find the header and `make distcheck` will bail.

Fixes: e9dcb17962 ("vulkan/util: Add generator for enum_to_str functions")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-28 14:08:17 +00:00
Jason Ekstrand
76c8327e6e anv: Bump advertised version to 1.0.42
We've been following the spec changes.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-02-27 09:44:46 -08:00
Dave Airlie
f695735ed6 vulkan/wsi/radv: add initial prime support (v1.1)
This is a complete rewrite of my previous rfc patches.

This adds the ability to present to a different GPU that rendering
using a driver side operation that can copy from the tiled to
linear shared image.

This does prime support completely in the swapchain present code,
and each queue has a precreated command buffer for each image
and for the each queue family. This means presenting should work
on graphics and compute queues and transfer in the future.

v1.1: initialise needs_linear_copy in swapchain.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-27 05:42:16 +10:00
Emil Velikov
ab6fa871ef anv: automake: add TODO to the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
2017-02-24 17:36:59 +00:00
Jason Ekstrand
261092f7d4 anv: Enable MSAA compression
This just enables basic MSAA compression (no fast clears) for all
multisampled surfaces.  This improves the framerate of the Sascha
"multisampling" demo by 76% on my Sky Lake laptop.  Running Talos on
medium settings with 8x MSAA, this improves the framerate in the
benchmark by 80%.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-02-23 12:10:42 -08:00