We tend to try to reduce the number of allocation calls the Vulkan
driver uses by doing a single allocation whenever possible for a data
structure. While this has certain downsides (usually code complexity),
it does mean error handling and cleanup is much easier. This commit
adds a nice little helper struct for getting rid of some of that
complexity.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Commit b2c97bc789 which made us start
using a busy-wait for individual query results also messed up cache
flushing on !LLC platforms. For one thing, I forgot the mfence after
the clflush so memory access wasn't properly getting fenced. More
importantly, however, was that we were clflushing the whole query range
and then waiting for individual queries and then trying to read the
results without clflushing again. Getting the clflushing both correct
and efficient is very subtle and painful. Instead, let's side-step the
problem by just snooping.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Otherwise linking way fail.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100600
Fixes: f195d40eca ("anv/device: Add a helper for querying whether a BO is busy")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
On Broadwell we still need to do a resolve between the subpass
that writes and the subpass that reads when there is a
self-dependency because HW could not see fast-clears and works
on the render cache as if there was regular non-fast-clear surface.
Fixes 16 tests on BDW:
dEQP-VK.renderpass.formats.*.input.clear.store.self_dep*
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Before, we were just looking at whether or not the user wanted us to
wait and waiting on the BO. Some clients, such as the Serious engine,
use a single query pool for hundreds of individual query results where
the writes for those queries may be split across several command
buffers. In this scenario, the individual query we're looking for may
become available long before the BO is idle so waiting on the query pool
BO to be finished is wasteful. This commit makes us instead busy-loop on
each query until it's available.
This significantly reduces pipeline bubbles and improves performance of
The Talos Principle on medium settings (where the GPU isn't overloaded
with drawing) by around 20% on my SkyLake gt4.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Eero Tamminen <eero.t.tamminen@intel.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Introduce stubs to anv_gem_stub.c that match the anv_gem.c ones.
Otherwise we may get link-time errors, when building the tests.
v2: Introduce all the missing stubs at once.
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Vinson Lee <vlee@freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100574
Fixes: c964f0e485 ("anv: Query the kernel for reset status")
Fixes: 651ec926fc ("anv: Add support for 48-bit addresses")
Fixes: 060a6434ec ("anv: Advertise larger heap sizes")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
---
I've intentionally kept the order the same identical to the anv_gem.c.
This way we can easily grep & diff in the future ;-)
Instead of just advertising the aperture size, we do something more
intelligent. On systems with a full 48-bit PPGTT, we can address 100%
of the available system RAM from the GPU. In order to keep clients from
burning 100% of your available RAM for graphics resources, we have a
nice little heuristic (which has received exactly zero tuning) to keep
things under a reasonable level of control.
Reviewed-by: Kristian H. Kristensen <krh@bitplanet.net>
This commit adds support for using the full 48-bit address space on
Broadwell and newer hardware. Thanks to certain limitations, not all
objects can be placed above the 32-bit boundary. In particular, general
and state base address need to live within 32 bits. (See also
Wa32bitGeneralStateOffset and Wa32bitInstructionBaseOffset.) In order
to handle this, we add a supports_48bit_address field to anv_bo and only
set EXEC_OBJECT_SUPPORTS_48B_ADDRESS if that bit is set. We set the bit
for all client-allocated memory objects but leave it false for
driver-allocated objects. While this is more conservative than needed,
all driver allocations should easily fit in the first 32 bits of address
space and keeps things simple because we don't have to think about
whether or not any given one of our allocation data structures will be
used in a 48-bit-unsafe way.
Reviewed-by: Kristian H. Kristensen <krh@bitplanet.net>
This fixes issues seen when adding support for full 48-bit addresses.
The 48-bit addresses themselves have nothing to do with it other than
that it caused the kernel to place buffers slightly differently so they
interacted differently with the caches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
When a client causes a GPU hang (or experiences issues due to a hang in
another client) we want to let it know as soon as possible. In
particular, if it submits work with a fence and calls vkWaitForFences or
vkQueueQaitIdle and it returns VK_SUCCESS, then the client should be
able to trust the results of that rendering. In order to provide this
guarantee, we have to ask the kernel for context status in a few key
locations.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's possible that the device could have been lost while we were
waiting. We should let the user know if this has happened.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When the shader does not set one of these values, they are supposed to
get a default value of 0. We have hardware bits in 3DSTATE_CLIP for
this but haven't been setting them. This fixes the intermittent failure
of dEQP-VK.geometry.layered.3d.render_to_default_layer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Acked-by: Dave Airlie <airlied@redhat.com>
This allows us to run 32bit Vulkan apps on Android, ftruncate
call would fail on 2GB (max size being 2GB - 1).
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
While having the _3d and _gpgpu versions is nice, there's no reason why
we need to have duplicated logic for tracking the current pipeline.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The programming note that says we need to do this still exists in the
SkyLake PRM and, from looking at the bspec, seems like it may apply to
all hardware generations SNB+. Unfortunately, this isn't particularly
clear cut since there is also language in the bspec that says you can
skip the flushing and stall to get better throughput. Experimentation
with the "Car Chase" benchmark in GL seems to indicate that some form of
flushing is still needed. This commit makes us do the full set of
flushes regardless of hardware generation. We can always reduce the
flushing later.
Reported-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
A bunch of code was indented in such a way that it looked like it went
with the if statement above but it definitely didn't.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
This fixes rendering issues in the Vulkan port of skia on some hardware.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
All callers of isl_surf_init() that set 'min_row_pitch' wanted to
request an *exact* row pitch, as evidenced by nearby asserts, but isl
lacked API for doing so. Now that isl has an API for that, update the
code to use it.
v2: Assert that isl_surf_init() succeeds because the callers assume
it. [for jekstrand]
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> (v1)
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
We should use anv_get_layerCount() to access layerCount of VkImageSub-
resourceRange in anv_CmdClearColorImage and anv_CmdClearDepthStencil-
Image, which handles the VK_REMAINING_ARRAY_LAYERS (~0) case.
Test: Sample multithreadcmdbuf from LunarG can run without crash
Signed-off-by: Xu Randy <randy.xu@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
A resolve is not needed on Skylake in this case. We were forcing
a resolve because we set the input_aux_usage to ISL_AUX_USAGE_NONE.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
We don't need to make the caller (CmdCopyQueryPoolResults) aware of the
problem since compute_query_result() only emits state. The caller is also
expected to hit OOM in this scenario right after calling this function, but
it is already handling it safely.
Fixes:
dEQP-VK.api.out_of_host_memory.cmd_copy_query_pool_results
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We need to know if sample shading has been requested during shader
compilation since that affects the way fragment coordinates are
computed.
Notice that the semantics of fragment coordinates only depend on
whether sample shading has been requested, not on whether more
than one sample will actually be produced (that is,
minSampleShading and rasterizationSamples do not affect this
behavior).
Because this setting affects the code we generate for the shader, we also
need to include it in the WM prog key. Notice we don't need to alter the
OpenGL code because it doesn't ever use this behavior, so they key's
value is always false (the default).
Fixes:
dEQP-VK.glsl.builtin_var.fragcoord_msaa.*
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
According to section 14.6 of the Vulkan specification:
"When sample shading is enabled, the x and y components of FragCoord
reflect the location of the sample corresponding to the shader
invocation."
So add a boolean parameter to the lowering pass to select this behavior
when we need it.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
If we know the device has been lost we should return this error code for
any command that can report it before we attempt to do anything with the
device.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The Vulkan specs say:
"A logical device may become lost because of hardware errors, execution
timeouts, power management events and/or platform-specific events. This
may cause pending and future command execution to fail and cause hardware
resources to be corrupted. When this happens, certain commands will
return VK_ERROR_DEVICE_LOST (see Error Codes for a list of such commands).
After any such event, the logical device is considered lost. It is not
possible to reset the logical device to a non-lost state, however the lost
state is specific to a logical device (VkDevice), and the corresponding
physical device (VkPhysicalDevice) may be otherwise unaffected. In some
cases, the physical device may also be lost, and attempting to create a
new logical device will fail, returning VK_ERROR_DEVICE_LOST."
This means that we need to track if a logical device has been lost so we can
have the commands referenced by the spec return VK_ERROR_DEVICE_LOST
immediately.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
So that we don't have to do things like rolling back address relocations in
case that we ran into OOM after computing them, etc
Also, make sure that if the queue submission comes with a fence, we set it up
correctly so it behaves according to the spec after returning
VK_ERROR_DEVICE_LOST.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
It's written in C rather than pure python and is strictly faster, the
only reason not to use it that it's classes cannot be subclassed.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
This has the potential to mask errors, since Element.get works like
dict.get, returning None if the element isn't found. I think the reason
that Element.get was used is that vulkan has one extension that isn't
really an extension, and thus is missing the 'protect' field.
This patch changes the behavior slightly by replacing get with explicit
lookup in the Element.attrib dictionary, and using xpath to only iterate
over extensions with a "protect" attribute.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Instead of using an if and a check, use dict.get, which does the same
thing, but more succinctly.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
This produces the header and the code in one command, saving the need to
call the same script twice, which parses the same XML file.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>