This gets rid of one piece of ugliness with the way ISL handles surface
emitting surface states. I've never liked that hand-rolled table but it
was the best we had at the time.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The helper block is extremely general. It takes an string property name
and an object that supports three methods: has_prop, iter_prop, and
get_prop. This way we can easily generalize it to emit more different
types of getter functions.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We tend to try to reduce the number of allocation calls the Vulkan
driver uses by doing a single allocation whenever possible for a data
structure. While this has certain downsides (usually code complexity),
it does mean error handling and cleanup is much easier. This commit
adds a nice little helper struct for getting rid of some of that
complexity.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Commit b2c97bc789 which made us start
using a busy-wait for individual query results also messed up cache
flushing on !LLC platforms. For one thing, I forgot the mfence after
the clflush so memory access wasn't properly getting fenced. More
importantly, however, was that we were clflushing the whole query range
and then waiting for individual queries and then trying to read the
results without clflushing again. Getting the clflushing both correct
and efficient is very subtle and painful. Instead, let's side-step the
problem by just snooping.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Otherwise linking way fail.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100600
Fixes: f195d40eca ("anv/device: Add a helper for querying whether a BO is busy")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
On Broadwell we still need to do a resolve between the subpass
that writes and the subpass that reads when there is a
self-dependency because HW could not see fast-clears and works
on the render cache as if there was regular non-fast-clear surface.
Fixes 16 tests on BDW:
dEQP-VK.renderpass.formats.*.input.clear.store.self_dep*
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Decoding with aubinator encountered a command of 0xffffffff. With the
previous code, it caused aubinator to jump 255 + 2 dwords to start
decoding again.
Instead we can attempt to detect the known instruction formats. If the
format is not recognized, then we can advance just 1 dword.
v2:
* Update aubinator_error_decode
* Actually convert the length variable returned into a *signed* integer
in aubinator.c, intel_batchbuffer.c and aubinator_error_decode.c.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The call to gen_print_group should provide a pointer to the beginning
of the the structure data, not the start of the batch data.
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Before, we were just looking at whether or not the user wanted us to
wait and waiting on the BO. Some clients, such as the Serious engine,
use a single query pool for hundreds of individual query results where
the writes for those queries may be split across several command
buffers. In this scenario, the individual query we're looking for may
become available long before the BO is idle so waiting on the query pool
BO to be finished is wasteful. This commit makes us instead busy-loop on
each query until it's available.
This significantly reduces pipeline bubbles and improves performance of
The Talos Principle on medium settings (where the GPU isn't overloaded
with drawing) by around 20% on my SkyLake gt4.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Eero Tamminen <eero.t.tamminen@intel.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Introduce stubs to anv_gem_stub.c that match the anv_gem.c ones.
Otherwise we may get link-time errors, when building the tests.
v2: Introduce all the missing stubs at once.
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Vinson Lee <vlee@freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100574
Fixes: c964f0e485 ("anv: Query the kernel for reset status")
Fixes: 651ec926fc ("anv: Add support for 48-bit addresses")
Fixes: 060a6434ec ("anv: Advertise larger heap sizes")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
---
I've intentionally kept the order the same identical to the anv_gem.c.
This way we can easily grep & diff in the future ;-)
Instead of just advertising the aperture size, we do something more
intelligent. On systems with a full 48-bit PPGTT, we can address 100%
of the available system RAM from the GPU. In order to keep clients from
burning 100% of your available RAM for graphics resources, we have a
nice little heuristic (which has received exactly zero tuning) to keep
things under a reasonable level of control.
Reviewed-by: Kristian H. Kristensen <krh@bitplanet.net>
This commit adds support for using the full 48-bit address space on
Broadwell and newer hardware. Thanks to certain limitations, not all
objects can be placed above the 32-bit boundary. In particular, general
and state base address need to live within 32 bits. (See also
Wa32bitGeneralStateOffset and Wa32bitInstructionBaseOffset.) In order
to handle this, we add a supports_48bit_address field to anv_bo and only
set EXEC_OBJECT_SUPPORTS_48B_ADDRESS if that bit is set. We set the bit
for all client-allocated memory objects but leave it false for
driver-allocated objects. While this is more conservative than needed,
all driver allocations should easily fit in the first 32 bits of address
space and keeps things simple because we don't have to think about
whether or not any given one of our allocation data structures will be
used in a 48-bit-unsafe way.
Reviewed-by: Kristian H. Kristensen <krh@bitplanet.net>
This fixes issues seen when adding support for full 48-bit addresses.
The 48-bit addresses themselves have nothing to do with it other than
that it caused the kernel to place buffers slightly differently so they
interacted differently with the caches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
When a client causes a GPU hang (or experiences issues due to a hang in
another client) we want to let it know as soon as possible. In
particular, if it submits work with a fence and calls vkWaitForFences or
vkQueueQaitIdle and it returns VK_SUCCESS, then the client should be
able to trust the results of that rendering. In order to provide this
guarantee, we have to ask the kernel for context status in a few key
locations.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's possible that the device could have been lost while we were
waiting. We should let the user know if this has happened.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When the shader does not set one of these values, they are supposed to
get a default value of 0. We have hardware bits in 3DSTATE_CLIP for
this but haven't been setting them. This fixes the intermittent failure
of dEQP-VK.geometry.layered.3d.render_to_default_layer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
We already provide a default LOD for textureQueryLevels and texture() on
non-fragment stages. However, there are more cases where one is needed
such as textureSize(gsampler2DMS*) in SPIR-V. Instead of trying to list
out all of the cases one at a time, just provide the default for all TXS
and TXL operations. This fixes a shader validation error in the new
Sascha deferredmultisampling demo which uses textureSize(gsampler2DMS).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100391
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Adding the actual table from the docs makes it clearer exactly what the
restrictions are. In particular, it becomes clear that compressed
textures ignore the alignment parameters in RENDER_SURFACE_STATE.
Reviewed-by: Chad Versace <chadversary@chromium.org>
This is pretty much the same tool as what i-g-t has, only with a more
fancy decoding of the instructions/registers. It also doesn't support
anything before gen4.
v2 (from Matt): Drop authors
Remove undefined automake variable
v3: Fix incorrect offsets for dword > 1 (Jordan)
v4: Fix decompression error with large blobs (Jordan)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Recent changes in Makefile.sources merged the aubinator files in
a unique list of generated files and genxml/genX_xml.h is now needed
to avoid the following building error:
ninja: error: '.../genxml/genX_xml.h', needed by '.../genxml/genX_xml.h',
missing and no known rule to make it
build/core/ninja.mk:148: recipe for target 'ninja_wrapper' failed
Fixes: 0f83c05 "intel: genxml: compress all gen files into one"
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Some packets like 3DSTATE_VF_STATISTICS, 3DSTATE_DRAWING_RECTANGLE,
3DPRIMITIVE, PIPELINE_SELECT, etc... have configurable fields in
dword0, we probably want to print those.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2: use Emil's recommendation
change rule to closer to genxml/genX_bits.h
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This allows us to run 32bit Vulkan apps on Android, ftruncate
call would fail on 2GB (max size being 2GB - 1).
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is to fix following compile error with libmesa_isl:
mesa/src/intel/isl/isl.c:28:10: fatal error: 'genxml/genX_bits.h' file not found
Fixes: f0eaf38 ("genxml: New generated header genX_bits.h (v6)")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emli Velikov <emil.velikov@collabora.com>