Commit 582bf4d9 turned on write-combining for most (all?) memory
allocations. This caused a fairly large performance drop in some of
our VMware tests (application traces, such as Windows Metro Paint).
This patch adds a third memory type configuration: DEVICE_LOCAL,
HOST_VISIBLE, HOST_COHERENT. This is uncached. Then, in
anv_AllocateMemory() we only use write-combining for this uncached
type. This memory type is found in the Intel Windows Vulkan driver.
And according to
https://asawicki.info/news_1740_vulkan_memory_types_on_pc_and_how_to_use_them
uncached memory correlates to write-combined memory.
This fixes our performance regression (and actually produced the
fastest ever results for our test suite).
Signed-off-by: Brian Paul <brianp@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20770>
Handle lookup (for example PRIME_FD_TO_HANDLE) must be synchronized with
GEM_CLOSE, otherwise re-import can race with bo_del path, resulting in
the handle of the newly (re)imported BO getting closed. Now that the
finalize step has been decoupled, fixing this is mostly just deleting
code.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20918>
The complexity around batching up handle closing is simply to allow the
virtgpu to back up ccmd's to the host (because virtio/virtgpu is pretty
inefficient when it comes to lots of small msgs to the host, and it is
common that when we are deleting BOs, we delete a lot of them at the
same time. But that will make the locking fix in the next commit
impossible (without nested locks). So let's flip this around and do the
step that virtgpu wants to batch up first, before we get into closing
GEM handles, etc.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20918>
In prep for the next patch, where locking is swapped around to cover the
whole bo_del() path, decouple handling of the recycle-to-BO-cache path.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20918>
When importing from a GEM name or dmabuf fd, we can race with the final
unref of the same BO, in which case we can get a hit in the handle
table for an fd_bo that another thread is about to free(). Detect and
handle this case.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20918>
This was a merge conflict from the Win32 WSI DXGI swapchain changes.
I missed moving a new line of code that was added when rearranging
things for using the common helpers.
Fixes: cfa260cd ("dzn: Use common physical device list/enumeration helpers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20944>
The given screen is already freed by the caller in case a NULL-pointer is
returned by the implementation.
Cc: mesa-stable
Signed-off-by: Corentin Noël <corentin.noel@collabora.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20933>
VKD3D-Proton DXBC f32 to f16 conversion implements a float conversion using PackHalf2x16.
Because the spec does not specify a rounding mode, it emits a sequence to ensure
D3D-like behaviour for infinity.
When we know the current backend has pack_half_2x16_rtz_split,
we can eliminate the extra sequence.
Fossil DB stats on GFX11:
Totals from 835 (0.62% of 134913) affected shaders:
VGPRs: 49368 -> 49224 (-0.29%)
CodeSize: 5341956 -> 5124564 (-4.07%)
Instrs: 1024062 -> 987041 (-3.62%)
Latency: 6530956 -> 6465120 (-1.01%); split: -1.01%, +0.00%
InvThroughput: 908189 -> 870253 (-4.18%)
VClause: 18704 -> 18702 (-0.01%); split: -0.02%, +0.01%
SClause: 33406 -> 33284 (-0.37%); split: -0.38%, +0.01%
Copies: 67440 -> 65992 (-2.15%); split: -2.15%, +0.00%
Branches: 18498 -> 18465 (-0.18%)
PreSGPRs: 38409 -> 38331 (-0.20%)
PreVGPRs: 44089 -> 43834 (-0.58%)
Note, some fossils are from before this pattern was added to VKD3D-Proton,
so the above may not reflect real-world impact.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15838>
Constant folding always uses RTNE for pack_half_2x16_split, but some
backends implement it with RTZ.
Lowering to RTZ when available ensures that the behaviour will be
consistent between constant folding and the backend.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15838>
Same as pack_half_2x16_rtz_split, but always uses RTZ mode.
Note that pack_half_2x16 rounding mode is unspecified.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15838>
We can lower FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD into other more
generic sends and drop this internal opcode.
The idea behind this change is to allow bindless surfaces to be used
for UBO pulls and why it's interesting to be able to reuse
setup_surface_descriptors(). But that will come in a later change.
No shader-db changes on TGL & DG2.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20416>
It is not an exhaustive list but it helps by reducing the bulk of
"Failed to create waffle_context for OpenGL [34].x" errors in the logs
by thousands of occurrences and those are probably not going to be
needed.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: David Heidelberg <david.heidelberg@collabora.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20909>
The code emitted by lp_build_fpstate_set to reset the FP state could be
jumped over when the write mask was zero, leading to denormals not being
flushed to zero.
Spotted by Roland Scheidegger.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20901>
If a shader stage is already imported from a library it should be
properly ignored.
Fixes recent CTS dEQP-VK.pipeline.fast_linked_library.misc.unused_shader_stages*.
Fixes: c8765c5244 ("radv: ignore shader stages that don't need to be imported with GPL")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20899>
It's legal to create a library with FRAGMENT_OUTPUT_INTERFACE and with
all CB states as dynamic, in this case the PS epilog should be dynamic.
This fixes a bunch of regressions while running Zink/RADV CTS with
RADV_PERFTEST=gpl.
Zink is the final boss.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20882>
For the common case where we're emitting packet we don't need to
update the cl_out pointer and then store the result in cl->next,
we can directly update cl->next.
This shows a small improvement in vkoverhead's scores for basic
draw tests.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20897>
Fix: Acquire/release should have one valid access/sync and one set
to none.
Workaround: D3D doesn't like simultaneous access resources leaving
COMMON layout, nor does it like setting UAV/RTV access bits for the
COMMON layout.
Use UNDEFINED -> UNDEFINED layout transitions, where the access bits
just aren't validated.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20919>
Float controls are emitted as function attributes on the entrypoint.
These function attributes are not the standard build-in LLVM kind, but
are strings, which the DXIL backend didn't know how to emit. So, this
change adds string attribute support and uses it for fp32 ftz/preserve.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20919>
Allow the compiler to assume that the shader always has full subgroups,
meaning that the initial EXEC mask is -1 in all waves (all lanes enabled).
This assumption is incorrect for ray tracing and internal (meta) shaders
because they can use unaligned dispatch.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20670>
This will make sure the internal field is set to true for internal
shaders which are initialized outside of radv_device_init_meta.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20670>
The shifting was off-by-one compared to how it is done in the kernel. Also, excess_length needs to be casted to uint64_t to prevent zeroing everything except the 5 LSBs.
Fixes: abf3bcd6 ("radv: Add RMV resource tracking")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20820>