2307 commits

Author SHA1 Message Date
Lionel Landwerlin
da2d67fc3b anv: gem-stubs: return a valid fd for anv_gem_userptr()
Fixes invalid close(-1) in the unit tests.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-09-25 22:02:51 +03:00
Kenneth Graunke
b9e93db208 intel: Increase Gen11 compute shader scratch IDs to 64.
From the MEDIA_VFE_STATE docs:

   "Starting with this configuration, the Maximum Number of Threads must
    be set to (#EU * 8) for GPGPU dispatches.

    Although there are only 7 threads per EU in the configuration, the
    FFTID is calculated as if there are 8 threads per EU, which in turn
    requires a larger amount of Scratch Space to be allocated by the
    driver."

It's pretty clear that we need to increase this for scratch address
calculations, because the FFTID has a certain bit-pattern.  The quote
above seems to indicate that we should increase the actual thread count
programmed in MEDIA_VFE_STATE as well, but we think the intention is to
only bump the scratch space.
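
The sizing rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the driver's code: the helper names `scratch_thread_count` and `scratch_total_size` and their parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not Mesa's API: number of per-thread scratch
 * slots to reserve for compute.  The thread count programmed into the
 * hardware is left alone; only the scratch sizing grows. */
static uint32_t
scratch_thread_count(uint32_t eu_count, uint32_t hw_threads_per_eu, int gen)
{
   assert(eu_count > 0 && hw_threads_per_eu > 0);

   /* On Gen11 the FFTID is computed as if each EU had 8 threads, even
    * though only 7 exist, so scratch is sized for 8 threads per EU. */
   uint32_t threads_per_eu = (gen == 11) ? 8 : hw_threads_per_eu;
   return eu_count * threads_per_eu;
}

/* Total scratch bytes for a given per-thread scratch size. */
static uint64_t
scratch_total_size(uint32_t eu_count, uint32_t hw_threads_per_eu, int gen,
                   uint32_t per_thread_bytes)
{
   return (uint64_t)scratch_thread_count(eu_count, hw_threads_per_eu, gen) *
          per_thread_bytes;
}
```

With 7 hardware threads per EU, a Gen11 part with 64 EUs would reserve 512 slots here rather than 448.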

Fixes GPU hangs in Bioshock Infinite and Synmark's CSDof on Icelake 8x8.

Fixes: 5ac804bd9a ("intel: Add a preliminary device for Ice Lake")
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-09-23 16:59:40 -07:00
Kenneth Graunke
50c0dd8621 Revert "intel/gen11+: Enable Hardware filtering of Semi-Pipelined State in WM"
This reverts commit 729de1488f.

It turns out that, although the register is in the logical context,
it isn't whitelisted, so we can't actually write it from userspace
batch buffers.  The write just becomes a noop, which is why we saw
no performance changes.

I manually whitelisted it, and still observed no performance gains, but
it did regress KHR-GL46.texture_cube_map_array.color_depth_attachments
on the iris driver.  So we might need to fix something before enabling
this.  To prevent it randomly getting turned on should the kernel ever
whitelist this register, we revert the patch for now.
2019-09-23 16:31:23 -07:00
Jason Ekstrand
7d861ab812 anv: Advertise VK_KHR_shader_subgroup_extended_types
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-09-20 18:02:15 +00:00
Eric Engestrom
3c1a24de07 anv: implement ICD interface v4
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-09-20 08:31:58 +00:00
Eric Engestrom
19db95e78e anv: split instance dispatch table
This effectively breaks the instance dispatch table in 2 with entry
points using a physical device as first argument getting their own
dispatch table.

As a result, we now have to check both the instance and the physical
device dispatch tables, where previously only the instance dispatch
table was checked.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-09-20 08:31:58 +00:00
Jason Ekstrand
0c4e89ad5b Move blob from compiler/ to util/
There's nothing whatsoever compiler-specific about it other than that's
currently where it's used.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-09-19 19:56:22 +00:00
Arcady Goldmints-Orlov
5ec5fecc26 anv: fix descriptor limits on gen8
Later generations support bindless for samplers, images, and buffers and
thus per-stage descriptors are not limited by the binding table size.
However, gen8 doesn't support bindless images and thus needs to report a
lower per-stage limit so that all combinations of descriptors that fit
within the advertised limits are reported as supported by
vkGetDescriptorSetLayoutSupport.

Fixes test dEQP-VK.api.maintenance3_check.descriptor_set
Fixes: 79fb0d27f3 ("anv: Implement SSBOs bindings with GPU addresses in the descriptor BO")

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-09-19 09:10:40 -05:00
Samuel Iglesias Gonsálvez
f5dd6dfe01 anv: enable VK_KHR_shader_float_controls and SPV_KHR_float_controls
This adds support for
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FLOAT_CONTROLS_PROPERTIES_KHR and
enables the Vulkan and SPIR-V extensions.

Also, notice that this includes the updates applied to the
VkPhysicalDeviceFloatControlsPropertiesKHR structure in the extension
VK_KHR_shader_float_controls v4 and Vulkan 1.1.116.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-09-17 23:39:19 +03:00
Lionel Landwerlin
0616b7ac90 vulkan: add vk_x11_strict_image_count option
This option strictly allocates the minImageCount given by the
application at swapchain creation.

This works around applications that do not deal with the fact that the
implementation allocates more images than the minimum specified.
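
The effect of the option can be sketched roughly as below. This is a simplified illustration, not the WSI code: the function `swapchain_image_count` and the `preferred_count` parameter (the number of images the implementation would otherwise like to allocate) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: pick how many swapchain images to allocate.
 * With the strict option set, the application's minImageCount is used
 * exactly; otherwise the implementation may allocate more. */
static uint32_t
swapchain_image_count(uint32_t min_image_count, uint32_t preferred_count,
                      bool strict_image_count)
{
   assert(min_image_count > 0);

   if (strict_image_count)
      return min_image_count;

   /* Never go below what the application asked for. */
   return preferred_count > min_image_count ? preferred_count
                                            : min_image_count;
}
```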

v2: Add values in default drirc (Bas)

v3: specify engine name/version (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111522
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
2019-09-15 15:37:02 +03:00
Lionel Landwerlin
04dc6074cf driconfig: add a new engine name/version parameter
Vulkan applications can register with the following structure:

typedef struct VkApplicationInfo {
    VkStructureType    sType;
    const void*        pNext;
    const char*        pApplicationName;
    uint32_t           applicationVersion;
    const char*        pEngineName;
    uint32_t           engineVersion;
    uint32_t           apiVersion;
} VkApplicationInfo;

This enables the Vulkan implementations to apply workarounds based off
matching this description.

Here we add a new parameter for matching the driconfig options with
the following:

    <device driver="anv">
        <application engine_name_match="MyOwnEngine.*" engine_versions="10:12,40:42">
            <option name="blaaah" value="true" />
        </application>
    </device>
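
The matching implied by the two attributes above could look something like the sketch below. It is not the driconfig parser itself: the helper names are hypothetical, and treating each `lo:hi` range as inclusive is an assumption. Only POSIX `regcomp`/`regexec` and standard C are used.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: true if `version` falls in any "lo:hi" range of
 * a comma-separated list such as "10:12,40:42" (bounds assumed to be
 * inclusive). */
static bool
version_in_ranges(const char *ranges, uint32_t version)
{
   assert(ranges != NULL);

   const char *p = ranges;
   while (*p) {
      unsigned lo, hi;
      int consumed = 0;

      if (sscanf(p, "%u:%u%n", &lo, &hi, &consumed) != 2)
         return false;               /* malformed list */
      if (version >= lo && version <= hi)
         return true;

      p += consumed;
      if (*p == ',')
         p++;                        /* move on to the next range */
      else if (*p != '\0')
         return false;               /* unexpected separator */
   }
   return false;
}

/* Hypothetical helper: true if `engine_name` matches the regular
 * expression `pattern`. */
static bool
engine_name_matches(const char *pattern, const char *engine_name)
{
   regex_t re;

   if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
      return false;

   /* regexec() returns 0 on a match and REG_NOMATCH when nothing
    * matched; anything non-zero is treated as "no match" here. */
   bool match = regexec(&re, engine_name, 0, NULL, 0) == 0;
   regfree(&re);
   return match;
}
```

An application whose `pEngineName` is "MyOwnEngine2" and whose `engineVersion` is 11 would satisfy both checks for the example entry.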

v2: switch engine name match to use regexps

v3: Verify that the regexec returns REG_NOMATCH for match failure (Eric)

v4: Add missing bit that went to the following commit (Eric)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
2019-09-15 15:37:02 +03:00
Anuj Phogat
729de1488f intel/gen11+: Enable Hardware filtering of Semi-Pipelined State in WM
Initial benchmarking didn't show any performance benefits. But it might eventually.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-09-11 11:29:37 -07:00
Jason Ekstrand
34541be7b0 intel/blorp: Use wide formats for nicely aligned stencil clears
In the case where the stencil clear is nicely aligned, we can clear
stencil much more efficiently by mapping it as a wide format (say
RGBA32_UINT) and blasting out the stencil clear value with a repclear.
On Unigine Heaven, this makes one stencil clear go from non-trivial to
unnoticeable when looking at per-draw timings.

In order for this change to work properly, ANV needs to do a bit more
flushing around depth and stencil clears.  i965 and iris already have
the cache tracking logic to handle this so no changes are required
there.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-09-06 23:35:09 +00:00
Eric Engestrom
037b5b567f anv: add support for vk_x11_override_min_image_count
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-09-06 23:16:05 +01:00
Eric Engestrom
4dcb1fff19 anv: add support for driconf
No option is supported yet, this is just the boilerplate.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-09-06 23:16:05 +01:00
Jordan Justen
9790cfcefa anv,iris: L3ALLOC register replaces L3CNTLREG for gen12
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-09-06 13:11:25 -07:00
Jason Ekstrand
3b1a7e5333 anv: Bump maxComputeWorkgroupSize
Fixes: 9a129510f5 "anv: Bump maxComputeWorkgroupInvocations"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111552
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-09-06 18:26:55 +00:00
Kenneth Graunke
0d0ae16e8f intel: Stop redirecting state cache to command streamer cache section
This bit redirects the state cache from the unified/RO sections of the
L3 cache to the "CS command buffer" section of the cache, which would
be set up via TCCNTLREG.  The documentation says:

   "Additionaly, this redirection should be enabled only if there is a
    non-zero allocation for the CS command buffer section."

We don't allocate any cache to the CS command buffer section, so
enabling this redirection effectively disabled the state cache.
The Windows driver only sets up that section when using POSH, which
we do not currently use.  So, leave it unallocated and disable the
redirection to get a functional state cache again.

Improves performance in Civilization VI by 18%, Manhattan 3.0 by 6%,
and Car Chase by 2%.
2019-09-06 10:57:55 -07:00
Eric Engestrom
7abf65aedc anv: fix format string in error message
Fixes: 9775894f10 ("anv: Move size check from anv_bo_cache_import() to caller (v2)")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-09-04 00:13:20 +01:00
Jordan Justen
181be14d43 anv: Build for gen12
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-28 13:38:34 -07:00
Jason Ekstrand
f58e0405b6 intel/fs: Drop the gl_program from fs_visitor
It's not used by anything anymore now that so much lowering has been
moved into NIR.  Sadly, we still need one in brw_compile_gs() for
geometry shaders on Sandy Bridge.  Short of a lot of pointless work,
that one's probably not going away.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-25 01:02:52 -05:00
Rafael Antognolli
2b7ba9f239 anv: Only re-emit non-dynamic state that has changed.
On commit f6e7de41d7, we started emitting 3DSTATE_LINE_STIPPLE as part
of the non-dynamic state. That gets re-emitted every time we bind a new
VkPipeline. But that instruction is non-pipelined, and it caused a perf
regression of about 9-10% on Dota2.

This commit makes anv_dynamic_state_copy() return a mask with only the
state that has changed when copying it. 3DSTATE_LINE_STIPPLE won't be
emitted anymore unless it has changed, fixing the problem above.

v2: Improve commit message and add documentation about skipped checks
(Jason)

Fixes: f6e7de41d7 ("anv: Implement VK_EXT_line_rasterization")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-23 15:55:18 -07:00
Jason Ekstrand
951cf94521 nir: Add explicit signs to image min/max intrinsics
This better matches all the other atomic intrinsics such as those for
SSBOs and shared variables where the sign is part of the intrinsic
opcode.  Both generators (GLSL and SPIR-V) know the sign from the type
of the image variable or handle.  In SPIR-V, signed min/max are separate
opcodes from unsigned.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-21 17:19:55 +00:00
Arcady Goldmints-Orlov
3835535537 anv: inline uniform blocks don't count toward descriptor set limits
In a descriptor set, inline uniform blocks don't use up any bindings.
However, the presence of any inline uniform blocks does require the
use of the descriptor buffer, which takes up one binding.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-20 16:48:45 +00:00
Rafael Antognolli
ceeaf93c8e anv: Properly initialize device->slice_hash.
When subslices_delta == 0 and we take the early return,
device->slice_hash is not initialized on GEN11. It then causes a
segfault when going through anv_DestroyDevice, if compiled with
valgrind.

Fixes: 7bc022b4bb ("anv/gen11: Emit SLICE_HASH_TABLE when pipes are unbalanced.")

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-15 09:42:48 -07:00
Rafael Antognolli
7bc022b4bb anv/gen11: Emit SLICE_HASH_TABLE when pipes are unbalanced.
If the pixel pipes have a different number of subslices, emit a slice
hashing table that will ensure proper workload distribution.

v2: Don't need to set the mask - it's mbo (Ken).
2019-08-12 16:19:08 -07:00
Jason Ekstrand
d787a2d05e anv: Implement VK_KHR_pipeline_executable_properties
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Jason Ekstrand
67cb55ad11 anv: Add a ralloc context to anv_pipeline
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Jason Ekstrand
fec4bdff40 anv: Force a full re-compile when CAPTURE_INTERNAL_REPRESENTATION_TEXT is set
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Jason Ekstrand
651fbbf9b8 anv/pipeline: Split setting up per-stage keys into its own loop
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Jason Ekstrand
78f3dfb4a2 anv: Record shader compile stats in the pipeline cache
We're going to want these to be available regardless of caching.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Jason Ekstrand
2af380d20f anv/pipeline: Stash generated code in the pipeline stage
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Jason Ekstrand
134607760a intel/compiler: Fill a compiler statistics struct
This commit is all annoying plumbing work which just adds support for a
new brw_compile_stats struct.  This struct provides a binary driver
readable form of the same statistics we dump out to stderr when
INTEL_DEBUG is set with a shader stage.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Francisco Jerez
c2fe7a0fb8 anv/gen9: Optimize slice and subslice load balancing behavior.
See "i965/gen9: Optimize slice and subslice load balancing behavior."
for the rationale.  According to Jason, improves Aztec Ruins
performance by 2.7%.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)

v2: Undo CPU performance micro-optimization done in i965 and iris due
    to lack of data justifying it on anv.  Use
    cmd_buffer_apply_pipe_flushes wrapper instead of emitting pipe
    control command directly.  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-12 14:40:21 -07:00
Jason Ekstrand
14c96a6300 anv: Implement VK_EXT_subgroup_size_control version 2
The version bump adds a proper features struct.

Fixes: d10de25309 "anv: Implement VK_EXT_subgroup_size_control"
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-12 14:56:33 +00:00
Eric Engestrom
e4aa0fc63a anv: add missing break
Fixes: f6e7de41d7 ("anv: Implement VK_EXT_line_rasterization")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-09 23:34:31 +01:00
Lionel Landwerlin
cefb4341b7 anv: drop unused code
We stopped using this when we moved to Jason's mi_builder.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-09 17:01:38 +03:00
Tapani Pälli
5e38db0c47 anv/android: disable shared presentable image support explicitly
Android 9 loader conditionally advertises VK_KHR_shared_presentable_image
extension based on this property and it looks like it does not
initialize the struct before query.

Pragmas are added to ignore warnings with Android specific structure
types in the same manner as commit 8d386e6eef did.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-09 08:53:54 +03:00
Greg V
7b520dc74f anv: add MAP_POPULATE fallback define for portability
FreeBSD does not have MAP_POPULATE
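
A fallback define of this kind typically looks like the sketch below. The helper `map_anon_populated` is hypothetical and only there to show the flag in use; defining the flag to 0 is the usual approach and is assumed here.

```c
#define _DEFAULT_SOURCE   /* expose MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* MAP_POPULATE is a Linux-specific hint.  Where it is missing, define
 * it to 0 so that OR-ing it into the flags has no effect. */
#ifndef MAP_POPULATE
#define MAP_POPULATE 0
#endif

/* Hypothetical helper: map `size` bytes of anonymous memory, asking the
 * kernel to pre-fault the pages where that is supported.  Returns NULL
 * on failure. */
static void *
map_anon_populated(size_t size)
{
   assert(size > 0);

   void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
   return p == MAP_FAILED ? NULL : p;
}
```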

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-08 21:44:33 +01:00
Greg V
2be3f16600 anv: remove unused Linux-specific include
Fixes: 4201cc2dd3 ("anv: Implement VK_KHX_external_semaphore_fd")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-08 21:44:33 +01:00
Rhys Perry
c52c54a746 anv,i965,iris: deduplicate setting of total_shared
v5: add patch

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-08 12:10:39 -05:00
Rhys Perry
024a46a407 anv: use derefs for shared memory access
vkpipeline-db for my Skylake GPU:
total instructions in shared programs: 8847602 -> 8847896 (<.01%)
instructions in affected programs: 10165 -> 10459 (2.89%)
helped: 8
HURT: 2

total cycles in shared programs: 1606273555 -> 1606251634 (<.01%)
cycles in affected programs: 2201803 -> 2179882 (-1.00%)
helped: 7
HURT: 3

The shaders with more instructions are the result of a loop over a shared
array in Three Kingdoms being unrolled (creating a lot of nested ifs). Not
sure if that's good or bad.

One of the shaders with worse cycles is only worse by 0.04% and the other
two are the shaders with loops unrolled.

v2: add patch
v4: don't set spirv_options.shared_addr_format
v4: move comment concerning the shared address format used and NULL
v4: add vkpipeline-db results
v5: rename to nir_lower_vars_to_explicit_types
v5: move setting of total_shared to outside brw_compile_cs
v6: set shared_addr_format
v6: formatting changes

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> (v5)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-08 12:10:39 -05:00
Tapani Pälli
aba57b11ee anv: support GetSwapchainGrallocUsage2ANDROID for Android
New function supports gralloc1 usage flags that get set separately
for producer and consumer. As we still need to support old method too,
let's share common code and use android_convertGralloc0To1Usage helper.
Bump the VK_ANDROID_native_buffer version to indicate support for the
new call.

Changes were tested on Android Celadon P with Basemark GPU and various
Sascha Willems Vulkan demos.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-08 05:08:01 +00:00
Greg V
c0376a1234 util: add anon_file.h for all memfd/temp file usage
Move the Weston os_create_anonymous_file code from egl/wayland into util,
add support for Linux memfd and FreeBSD SHM_ANON,
use that code in anv/aubinator instead of explicit memfd calls for portability.

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-07 22:57:55 +00:00
Bas Nieuwenhuizen
5a26f528cb meson,i965: Link with android deps when building for android.
The DBG macro in brw_blorp.c ends up calling an android log function:

error: undefined reference to '__android_log_print'

v2: On suggestion from Lionel, hang the Android dependency onto a new
    libintel_common dependency.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-07 15:34:46 +02:00
Jason Ekstrand
bc612536eb anv: Emit a dummy MEDIA_VFE_STATE before switching from GPGPU to 3D
There is an object-level preemption workaround which requires this.
However, even without object-level preemption, we seem to have issues
with geometry flickering when 3D and compute are combined in the same
batch and this appears to fix it.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109630
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111267
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-06 05:46:28 +00:00
Jason Ekstrand
f6e7de41d7 anv: Implement VK_EXT_line_rasterization
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-06 02:05:28 +00:00
Jason Ekstrand
abf9e10488 anv: Use dirty bits for dynamic state tracking
Previously, we assumed that the dirty bit was always 1 << VK_DYNAMIC_*
and this assumption is about to be false.  Extensions which define new
VK_DYNAMIC_* enums won't be nice and tightly packed which this really
requires.  Instead, add functions to do the conversions and rework the
bits a bit.
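
A conversion function of the kind described might look like this sketch. The enum and function names are hypothetical stand-ins; the large value mirrors how extension-defined enums sit far outside the small core range, which is what makes `1 << value` unusable.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for dynamic state enum values: core values are small and
 * contiguous, extension values are not. */
enum dyn_state {
   DYN_STATE_VIEWPORT         = 0,
   DYN_STATE_SCISSOR          = 1,
   DYN_STATE_LINE_WIDTH       = 2,
   DYN_STATE_LINE_STIPPLE_EXT = 1000259000, /* cannot be used as a shift */
};

/* Tightly packed dirty bits, assigned independently of the enum values. */
enum dyn_dirty_bit {
   DYN_DIRTY_VIEWPORT     = 1u << 0,
   DYN_DIRTY_SCISSOR      = 1u << 1,
   DYN_DIRTY_LINE_WIDTH   = 1u << 2,
   DYN_DIRTY_LINE_STIPPLE = 1u << 3,
};

/* Hypothetical conversion: map a dynamic state enum to its dirty bit.
 * Returns 0 for states this sketch does not know about. */
static uint32_t
dirty_bit_for_dyn_state(enum dyn_state state)
{
   switch (state) {
   case DYN_STATE_VIEWPORT:         return DYN_DIRTY_VIEWPORT;
   case DYN_STATE_SCISSOR:          return DYN_DIRTY_SCISSOR;
   case DYN_STATE_LINE_WIDTH:       return DYN_DIRTY_LINE_WIDTH;
   case DYN_STATE_LINE_STIPPLE_EXT: return DYN_DIRTY_LINE_STIPPLE;
   default:                         return 0;
   }
}
```

An explicit switch keeps the dirty mask within 32 bits no matter how sparse the enum values become.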

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-06 02:05:28 +00:00
Jason Ekstrand
aa13f75f01 anv: Advertise the right line width range on gen9 and CHV
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-06 02:05:28 +00:00
Eric Engestrom
d2d85b950d meson: replace libmesa_util with idep_mesautil
This automates the include_directories and dependencies tracking so that
all users of libmesa_util don't need to add them manually.

Next commit will remove the ones that were only added for that reason.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Eric Anholt <eric@anholt.net>
Tested-by: Vinson Lee <vlee@freedesktop.org>
2019-08-03 00:08:37 +00:00