This commit switches ANV to using the new common physical device and
instance base structs as well as the new dispatch framework. This
should make code sharing between Vulkan drivers much easier.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8676>
This should make future platform enabling a good bit easier and more
reliable than having 3 of them.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8676>
v2 (Jason Ekstrand):
- Separate the anv_gem interface from anv_queue internals
- Rework on top of the new anv_queue_family stuff
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8667>
v2 (Jason Ekstrand):
- Don't take an anv_physical_device in anv_gem_get_engine_info()
- Return the engine info from anv_gem_get_engine_info()
- Free the engine info in anv_physical_device_destroy()
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8667>
This may vary based on the newer kernel engines based contexts.
v2 (Jason Ekstrand):
- Initialize anv_queue::exec_flags in anv_queue_init
- Don't conflate this with refactors to get_reset_stats
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8667>
This is modeled on anv_memory_type and anv_memory_heap which we already
use for managing memory types. Each anv_queue_family contains some data
which is returned by vkGetPhysicalDeviceQueueFamilyProperties() verbatim
as well as some internal book-keeping bits. An array of queue families
along with a count is stored in the physical device. Each anv_queue
then contains a pointer to the anv_queue_family to which it belongs.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8667>
I don't know if this is a typo or an artifact of ancient versions of the
Vulkan API. In any case, it's wrong.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8667>
The VkPhysicalDeviceSampleLocationPropertiesEXT struct is filled with
implementation dependent values and according to the table from the
Vulkan Specification section [36.1. Limit Requirements]:
pname | max | min
pname:sampleLocationSampleCounts |- |ename:VK_SAMPLE_COUNT_4_BIT
pname:maxSampleLocationGridSize |- |(1, 1)
pname:sampleLocationCoordinateRange|(0.0, 0.9375)|(0.0, 0.9375)
pname:sampleLocationSubPixelBits |- |4
pname:variableSampleLocations | true |implementation dependent
The hardware only supports setting the same sample location for all the
pixels, so we only support 1x1 grids.
Also, variableSampleLocations is set to true because we can set sample
locations per draw.
Implement the vkGetPhysicalDeviceMultisamplePropertiesEXT according to
the Vulkan Specification section [36.2. Additional Multisampling
Capabilities].
v2: 1- Replaced false with VK_FALSE for consistency. (Sagar Ghuge)
2- Used the isl_device_sample_count to take the number of samples
per platform to avoid extra checks. (Sagar Ghuge)
v3: 1- Replaced VK_FALSE with false as Jason has sent a patch to replace
VK_FALSE with false in other places. (Jason Ekstrand)
2- Removed unecessary defines and set the grid size to 1 (Jason Ekstrand)
v4: Fix properties reporting in GetPhysicalDeviceProperties2, not
GetPhysicalDeviceFeatures2 (Lionel)
Use same alignment as other functions (Lionel)
Report variableSampleLocations=true (Lionel)
v5: Don't overwrite the pNext in GetPhysicalDeviceMultisamplerPropertiesEXT
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> (v3)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1887>
On Gen11, they took away our hardware int64 support. We have lowering
for all of it in NIR except for subgroup ops. Now that all the subgroup
ops are implemented, we can enable the feature.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7329>
This isn't actually capable of deferring anything; it just trivially
returns success.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7735>
We are limited to 64 threads per dispatched group, regardless of what
num_cs_threads claims, so advertise that limit correctly.
Fixes (on TGL and up):
dEQP-VK.subgroups.size_control.compute.required_subgroup_size_min
and other *.required_subgroup_size_min tests.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7453>
Now that we have the HDC, using the data cache for UBO pulls seems to
help things quite a bit:
GTA V DXVK 104.0%
Talos Principle GL 102.8%
Rise of Tomb Raider VK 102.8%
Dark Souls 3 DXVK 101.4%
Witcher3 DXVK 101.3%
Bioshock Infinite GL 100.5%
Doom 2016 VK 97.7%
Doom is a bit of a loss but it helps enough other stuff, it's probably
worth the hit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7230>
On Gen12+, we can enable additional caches in certain usage situations.
This routes that decision making to a central place in ISL, based on
surface usage flags, and updates both drivers to use it. (i965 doesn't
need to change because it doesn't support Gen12.)
We continue handling the "external" decision via an anv_mocs() wrapper
for now, since we store that flag in anv_bo, which isl doesn't know
about. (We could introduce an ISL_SURF_USAGE_EXTERNAL, but I'm not
actually sure that would be cleaner.)
This patch should not have any functional nor performance effects, as
we continue selecting the exact same MOCS values for now.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7104>
It's a bit hard to exactly map our implementation to the limits
described by Vulkan. The bindless surface handle in the extended
message descriptors is 20 bits and it's an index into the table of
RENDER_SURFACE_STATE structs that starts at bindless surface base
address. This means that we can have at must 1M surface states
allocated at any given time. Since most image views take two
descriptors, this means we have a limit of about 500K image views.
However, since we allocate surface states at vkCreateImageView time,
this means our limit is actually something on the order of 500K image
views allocated at any time. The actual limit describe by Vulkan, on
the other hand, is a limit of how many you can have in a descriptor set.
Assuming anyone using 1M descriptors will be using the same image view
twice a bunch of times (or a bunch of null descriptors), we can safely
advertise a larger limit. 1M is what's required by D3D12, so let's
advertise that.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3335
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7180>
v2: Re-wrap lines in meson.build. Suggested by Jason.
v3: Also update Makefile.sources and Android build files. Noticed by
Lionel.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6899>
This reverts commit bcfec61d1e.
The previous patch fixed the underlying issue that the above commit was
actually working around. It turns out that the previously observed
performance regression was due to invalid aux-map entries for
multi-layer HiZ+CCS buffers.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7046>
On Gen7, the data cache is pretty terrible so we'd rather avoid it
there. On Gen8+, it should be fine and is less likely to conflict with
texturing so we should get less cache thrashing there.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3932>
Use the same generators as used in anv driver so both Vulkan and OpenGL
drivers can share the same external memory objects.
v2: removed extra parameter from function gen_uuid_compute_device_id
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Signed-off-by: Eleni Maria Stea <estea@igalia.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Rohan Garg <rohan.garg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7025>
We need Vulkan and GL to produce the same UUIDs. So move the generator
from ANV to a common code that can be shared by ANV and Iris driver.
v2: fix android build (Tapani)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Rohan Garg <rohan.garg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7025>
It's included in declaration of INTEL_DEBUG.
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6732>
On Fallout4, enabling HIZ_CCS_WT compression for D16_UNORM format
regress the performance by 2%, in order to avoid that disable
compression via driconf option.
The experiment showed that, running Fallout4 with HIZ performs better
than HIZ_CCS and HIZ_CCS_WT. Reason behind that is the benchmark uses
the depth pass with D16_UNORM surfaces format which fills the L3 cache
and next pass doesn't make use of it where we end up clearing cache.
v2:
- Don't add conditional check in isl (Nanley, Jason)
- Move disable_d16unorm_compression flag to instance (Lionel)
- Use plane_format.isl_format (Nanley)
v3:
- Add more descriptive comment (Marcin Ślusarz)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6734>
Things work a little different on Gen7 than they do on Gen8+. In
particular, SOBufferEnable lives in 3DSTATE_STREAMOUT but BufferPitch
lives in 3DSTATE_SO_BUFFER. This leaves us having to marshal data
around a bit more than we did on Gen8. Still, it's not too bad.
Normally, I don't spend much time on Gen7 but XFB just became a hard
requirement for DXVK so it stopped working for all our Haswell users.
Let's get them happily playing their games again. 😸
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3532
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6997>
Now that we're not trying to evade preprocessor macro expansion in
preprocessor string concatenation, we can use plain old bools in option
setup.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6916>
We can generate the XML if anybody actually queries it, but this reduces
the amount of work in driver setup and means that we'll be able to support
driconf option queries on Android without libexpat.
This updates the driconf interface struct version for i965, i915, and
radeon to use the new getXml entrypoint to call the on-demand xml
generation. Note that our loaders (egl, glx) implement the v2 function
interface and don't use .xml when that's set, and the X server doesn't use
this interface at all.
XML generation tested on iris and i965 using adriconf
Acked-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6916>
I'm bringing up freedreno Vulkan on an Android phone, and my pains are
exactly what Chad said when working on Intel's vulkan for Android in
aa716db0f6 ("intel: Add simple logging façade for Android (v2)"):
On Android, stdio goes to /dev/null. On Android, remote gdb is even
more painful than the usual remote gdb. On Android, nothing works like
you expect and debugging is hell. I need logging.
This patch introduces a small, simple logging API that can easily wrap
Android's API. On non-Android platforms, this logger does nothing
fancy. It follows the time-honored Unix tradition of spewing
everything to stderr with minimal fuss.
My goal here is not perfection. My goal is to make a minimal, clean API,
that people hate merely a little instead of a lot, and that's good
enough to let me bring up Android Vulkan. And it needs to be fast,
which means it must be small. No one wants to their game to miss frames
while aiming a flaming bow into the jaws of an angry robot t-rex, and
thus become t-rex breakfast, because some fool had too much fun desiging
a bloated, ideal logging API.
Compared to trusty fprintf, _mesa_log[ewi]() is actually usable on
Android. Compared to os_log_message(), this has different error levels
and supports format arguments.
The only code change in the move is wrapping flockfile/funlockfile in
!DETECT_OS_WINDOWS, since mingw32 doesn't have it. Windows likely wants
different logging code, anyway.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6806>
This doesn't really do anything for us today. One day, I suppose we
could use it to do something with wide loads with non-uniform offsets.
The big reason to do this is to get better testing to make sure that NIR
doesn't blow up on the deref paths.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6472>
Only advertise VK_TIME_DOMAIN_CLOCK_MONOTONIC_RAW_EXT if CLOCK_MONOTONIC_RAW
is defined. Fixes the build on OpenBSD which has CLOCK_MONOTONIC but not
CLOCK_MONOTONIC_RAW.
Fixes: 67a2c1493c ("vulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v5]")
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6517>
Replace local get_available_system_memory() function with
os_get_available_system_memory().
Fixes: b80930a6fe ("anv: add support for VK_EXT_memory_budget")
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6517>
This implements timeline semaphores using a new type of dma-fence
stored into drm-syncobjs. We use a thread to implement delayed
submissions.
v2: Drop cloning of temporary semaphores and just transfer their ownership (Jason)
Drain queue when dealing with binary semaphore
Ensure we don't submit to the thread as long as we don't need to
v3: Use __u64 not uintptr_t for kernel pointers
Fix commented code for INTEL_DEBUG=bat
Set DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES in timeline fence execbuf extension
Add new anv_queue_set_lost()
Drop multi queue stuff meant for the fake multi queue patch
Rework temporary syncobj handling
Don't use syncobj when not available (DeviceWaitIdle/CreateDevice)
Use ANV_MULTIALLOC
And a few more tweaks...
v4: Drop drained condition helper (Lionel)
Fix missing EXEC_OBJECT_WRITE on BOs we want to wait on (Jason)
v5: Add missing device->lost_reported in _anv_device_report_lost (Lionel)
Fix missing free on submit->simple_bo (Lionel)
Don't drop setting the device in lost state on QueueSubmit error (Jason)
Store submit->fence_bos as an array of uintptr_t (Jason)
v6: condition device->has_thread_submit to i915 & core DRM support (Jason)
v7: Fix submit->in_fence leakage on error (Jason)
Keep dummy semaphore with no thread submission (Jason)
v8: Move ownership of submit->out_fence to submit (Jason)
v9: Don't forget to read the VkFence's syncobj binary payload (Lionel)
v10: Take the mutex lock on anv_gem_close() (Jason/Lionel)
v11: Fix void* -> u64 cast on 32bit (Lionel)
v12: Rebase after BO backed timeline semaphore (Lionel)
v13: Fix missing snippets lost after rebase (Lionel)
v14: Drop update_binary usage (Lionel)
v15: Use ANV_MULTIALLOC (Lionel)
v16: Fix some realloc issues (Ivan)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v8)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2901>
Needed for dealing with the new DRM timeline syncobj ioctls.
v2: Make use of the anv_gem_get_drm_cap() parameter... (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2901>
This adds applicationName + version through like engineName.
Rationale: A game (World War Z) includes the store name in the
executable name, so has multiple executable names.
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6120>