snprintf() guarantees that it will not write more chars than allowed,
and that the string will be null-terminated, without the need to fill
the whole thing with zeroes to begin with.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
anv_GetPhysicalDeviceSurfaceSupportKHR will already return success for
this, but anv_GetPhysicalDevice{Xcb,Xlib}PresentationSupportKHR do not.
Apps which check for presentation support via the latter (all Feral
Vulkan games at least) will therefore fail.
This allows me to render on an Intel GPU and present to a display
connected to an AMD card (tested HD 530 + Vega 64).
v2: Rebase on current master.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This patch exposes support for the following two extensions:
* VK_GOOGLE_decorate_string
* VK_GOOGLE_hlsl_functionality1
There's nothing for the driver to do; it's all handled in spirv_to_nir.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107971
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Instead of having weak references to the anv functions and separate
trampoline functions with their own dispatch table, just make the
trampoline functions weak. This gets rid of a dispatch table and
potentially lets the compiler delete the unused weak function. The
end result is a reduction in the .text section of 5.7K and a reduction
in the .data section of 1.4K.
Before:
text data bss dec hex filename
3190329 282232 8960 3481521 351fb1 _install/lib64/libvulkan_intel.so
After:
text data bss dec hex filename
3184548 280792 8960 3474300 35037c _install/lib64/libvulkan_intel.so
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This lets us avoid passing the DRM fd around all over the place and gets
us closer to layer utopia.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
We don't need weak references to instance entrypoints because we never
have more than one of each so we don't need the NULL fall-back. This
also helps us avoid forgetting things because we now get link errors for
missing instance entrypoints.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This got missed during 1.1 enabling because it was defined as an
interaction between device groups and WSI and it wasn't obvious it was
in the delta.
The idea behind it is that it's supposed to provide a hint to the
application in a multi-GPU setup to indicate which regions of the screen
are being scanned out by which GPU so a multi-device split-screen
rendering application can render each part of the screen on the GPU that
will be presenting it and avoid extra bus traffic between GPUs. On a
single-GPU setup or one which doesn't support this present mode, we need
to do something. We choose to return the window size (or a max-size
rect) if the compositor, X server, or crtc is associated with the given
physical device and zero rectangles otherwise.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We already have wsi_device and we know the instance allocator at
wsi_device_init time so there's no need to pass it into the physical
device queries. This also fixes a memory allocation domain bug that can
occur if CreateSwapchain gets called prior to any queries (not likely)
in which case the cached connection gets allocated off the device
instead of the instance.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Offers three clocks, device, clock monotonic and clock monotonic
raw. Could use some kernel support to reduce the deviation between
clock values.
v2:
Ensure deviation is at least as big as the GPU time interval.
v3:
Set device->lost when returning DEVICE_LOST.
Use MAX2 and DIV_ROUND_UP instead of open coding these.
Delete spurious TIMESTAMP in radv version.
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
v4:
Add anv_gem_reg_read to anv_gem_stubs.c
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
v5:
Adjust maxDeviation computation to max(sampled_clock_period) +
sample_interval.
Suggested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
On Skylake enabling of ForceThreadDispatchEnable causes gpu-hang.
-v2: enabling of ForceThreadDispatchEnable is only for gen8, for
gen9 and higher reverted enabling of PixelShaderHasUAV.
-v3 (Jason Ekstrand): Rework the comments a bit.
CC: Jason Ekstrand <jason.ekstrand@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107941
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107760
Fixes: 79270d2140 (anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV)
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Even though the Intel GPU are always at the same PCI location, all the
info we need is already provided by libdrm. Let's be future proof.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
There's no reason why we need generate trampoline functions for instance
functions or carry N copies of the instance dispatch table around for
every hardware generation. Splitting the tables and being more
conservative shaves about 34K off .text and about 4K off .data when
built with clang.
Before splitting dispatch tables:
text data bss dec hex filename
3224305 286216 8960 3519481 35b3f9 _install/lib64/libvulkan_intel.so
After splitting dispatch tables:
text data bss dec hex filename
3190325 282232 8960 3481517 351fad _install/lib64/libvulkan_intel.so
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This is basically a port of commit,
3ade766684
("i965: Disable 3DSTATE_WM_HZ_OP fields.")
The BDW+ docs describe how to use the 3DSTATE_WM_HZ_OP instruction in
the section titled, "Optimized Depth Buffer Clear and/or Stencil Buffer
Clear." It mentions that the packet overrides GPU state for the clear
operation and needs to be reset to 0s to clear the overrides. Depending
on the kernel, we may not get a context with the GPU state for this
packet zeroed. Do it ourselves just in case.
Prevents a number of GPU hangs when running crucible on ICL. I tried to
get the exact number of hangs that occurs without this patch, but was
unsuccessful. The test machine became unresponsive before completing the
full run.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ref: 263b584d5e "i965/skl: Emit extra zeros in STATE_BASE_ADDRESS on Skylake."
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Not going to matter, but be consistent.
Found by coverity
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes: caf41c78c (anv/allocator: Support softpin in the BO cache)
Previously, we just went ahead and emitted MI_BATCH_BUFFER_START as
normal. If we are near enough to the end, this can cause us to start a
new BO just for the MI_BATCH_BUFFER_START which messes up chaining. We
always reserve enough space at the end for an MI_BATCH_BUFFER_START so
we can just increment cmd_buffer->batch.end prior to emitting the
command.
Fixes: a0b133286a "anv/batch_chain: Simplify secondary batch return..."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107926
Tested-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
On Broadwell and above, we have to use different MOCS settings to allow
the kernel to take over and disable caching when needed for external
buffers. On Broadwell, this is especially important because the kernel
can't disable eLLC so we have to do it in userspace. We very badly
don't want to do that on everything so we need separate MOCS for
external and internal BOs.
In order to do this, we add an anv-specific BO flag for "external" and
use that to distinguish between buffers which may be shared with other
processes and/or display and those which are entirely internal. That,
together with an anv_mocs_for_bo helper lets us choose the right MOCS
settings for each BO use.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99507
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This is the minimum value according to the spec.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Let's just be explicit that VK_NV_shading_rate_image is not supported.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Fixes: 6ee1709170 "vulkan: Update the XML and headers to 1.1.86"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
I was about to make the claim to someone that every field in isl_surf
is either an enum or has explicit units. Then I looked at isl_surf and
discovered this claim was wrong. We should fix that. This commit does
a few refactors:
* Add _B suffixes to some struct fields
* Add _B to some variables and parameters
* Rename row_pitch_tiles -> row_pitch_tl
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This was added as part of 1.1 but it's very hard to track exactly what
extension added it. In any case, we should implement it.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Dave Airlie <Airlied@redhat.com>
The only thing that matters is the size since we never specify any
offsets in terms of blocks.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instead of passing around BOs and offsets, use addresses which are anv's
GPU equivalent of pointers.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Each query slot is a uint64_t and we were only zeroing half of it.
Fixes: 7ec6e4e689 "anv/query: implement multiview interactions"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instead of computing an index at the end which we hope maps to the
number of things written, just count the number of things as we go.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The offsets now come from the anv_address, these references were not
updated and using the old variable.
Fixes: e1ab834557 "anv/memcpy: Use addresses instead of bo+offset"
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
This shouldn't matter as we'll never write OOB anyway but we may as well
get it right. It's supposed to be in dwords - 1.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
The Vulkan 1.1.81 spec says:
"It is legal for offset.x + extent.width or offset.y + extent.height
to exceed the dimensions of the framebuffer - the scissor test still
applies as defined above. Rasterization does not produce fragments
outside of the framebuffer, so such fragments never have the scissor
test performed on them."
Elsewhere, the Vulkan 1.1.81 spec says:
"The application must ensure (using scissor if necessary) that all
rendering is contained within the render area, otherwise the pixels
outside of the render area become undefined and shader side effects
may occur for fragments outside the render area. The render area
must be contained within the framebuffer dimensions."
Unfortunately, there's some room for interpretation here as to what the
consequences are of having the render area set to exactly the
framebuffer dimensions and having a scissor that is larger than the
framebuffer. Given that GL and other APIs provide automatic clipping to
the framebuffer, it makes sense that applications would assume that
Vulkan does this as well. It costs us very little to play it safe and
just clamp client-provided scissors to the framebuffer dimensions.
Fortunately, the user is required to provide us with at least one
scissor so we don't need to handle the case where they don't.
Fixes: fb2a5ceb32 "anv: Emit DRAWING_RECTANGLE once at driver..."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I have no idea if I'm correct about what's going wrong or if this is the
correct fix. However, in my multiple weeks of banging my head on this
hang, a VUE reference counting bug seems to match all the symptoms and
it definitely fixes the hang.
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107280
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Some of the bits of VERTEX_BUFFER_STATE such as access type, instance
data step rate, and pitch come from the pipeline.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
and _mesa_bitcount_64 with util_bitcount_64. This fixes a build problem
in nir for platforms that don't have popcount or popcountll, such as
32bit msvc.
v2: - Fix additional uses of _mesa_bitcount added after this was
originally written
Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Seems in case of 32-bit library, usage of msse2 makes
some stack corruption or incorrect instructions.
Usage with mstackrealign fixes that case.
v2: Fixed meson.
v3: Definition of c_sse2_args moved on the top (L.Landwerlin).
Added mstackrealign for Android's mks where msee4.1 is used.
v4: Added for Vulkan also.
v5: Commit message correction.
CC: <mesa-stable@lists.freedesktop.org>
Fixes: 6b05c080f2 (i965: Compile with -msse3)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107779
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The brw_vs_prog_data::double_inputs_read field comes directly from
shader_info::double_inputs which may contain inputs which are not
actually read. Instead of using it directly, AND it with inputs_read
which is only things which are read. Otherwise, we may end up
subtracting too many elements when computing elem_count.
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103241
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>