Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34623>
A significant CPU performance bottleneck in mesa GL is refcounting atomics:
even with the current pinning attempts, eliminating them can yield huge
performance gains (easily verified by running drawoverhead with return false at the top of pipe_reference_described()).
This is a proof of concept for removing refcounts from gallium objects,
namely sampler views and resources. Sampler views were smaller in scope,
so I started there. This MR alone is not expected to noticeably affect
performance, though if applied to all drivers/frontends,
it would enable a bunch of code deletion for crazy samplerview
refcounting hacks currently used to try circumventing the existing overhead.
Co-authored-by: Karol Herbst <kherbst@redhat.com
Co-authored-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33813>
There probably is a better way to provide this information to the
gallium driver, but this allows the driver to apply conversions as
needed when writing input tensors and reading back output tensors.
Reviewed-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31979>
Operations other than tensor addition will also need to be able to
handle multiple inputs, and a variable number of them.
And for testing individual operations, we also need to support models
with multiple inputs.
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32105>
This removes the take_ownership parameter and defines the behavior as if
take_ownership was always true, which is the fast path. This way, we don't
have to have 2 codepaths in set_vertex_buffers of every driver.
The old behavior is optionally available through util_set_vertex_buffers.
It also documents a new constraint that count in set_vertex_buffers must be
equal to the number of vertex buffers referenced by vertex elements or 0.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27492>
Teflon is a Gallium frontend that TensorFlow Lite can load to delegate
the execution of operations in a neural network model.
See docs for more.
Acked-by: Christian Gmeiner <cgmeiner@igalia.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25714>
We've copied the signature too many times already. This will also be used
more.
It intentionally deviates from the name by not including the "_vbo" part.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26619>
This patch removes start_slot from set_vertex_buffers() as suggested in
https://gitlab.freedesktop.org/mesa/mesa/-/issues/8142
compilation testing:
all gallium drivers, nine frontend compilation has been tested.
d3d10umd compilation has not been tested
driver, frontend testing:
only llvmpipe and radeonsi driver was tested running game
only the nine frontend changes are complex. All other changes are easy.
nine front end was using start slot and also using multi context.
nine frontend code changes:
In update_vertex_elements() and update_vertex_buffers(), the vertex
buffers or streams are ordered removing the holes. In update_vertex_elements()
the vertex_buffer_index is updated for pipe driver to match the ordered list.
v2: remove start_slot usage code from Marek (Marek Olšák)
v3: nine stream number holes mask code from Axel (Axel Davy)
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> (except nine, which is Ab)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22436>
This will be required by drivers supporting multiple subgroup sizes with
a given CSO to properly implement OpenCL subgroups.
Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22893>
And pipe/p_compiler.h are removed as it not used any more
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: David Heidelberg <david.heidelberg@collabora.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23577>
16K * 16K * 16bpp = 4G, which overflows int32, so layer_stride needs to
have 64 bits. More generally, any "byte_stride * height" computation
can overflow int32.
Use (u)intptr_t in some gallium and st/mesa places where we do CPU access.
Use uint64_t otherwise.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23389>
Drivers that don't use compile queues can use a common implementation. Drivers
that do can tail call into the common implementation.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18351>
pipe_context shouldn't have functions that return values because this
prevent multithreading.
Move this hook to pipe_screen.
Fixes: 606e42027e ("gallium: add hook on getting canonical format")
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16078>
On swizzled copies canonical formats are used to reduce the formats to a
simpler subset.
Nevertheless, it is possible that some of the canonical formats defined
in Gallium are actually not supported by the drivers themselves.
This provides a driver-defined hook that can be used to provide an
alternative canonical format in case the canonical one defined by
Gallium is not supported by the driver.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15693>
Currently this just has wait, but in order to get the right answer
for vulkan partial, lavapipe/llvmpipe need to pass a partial flag
through here in the future.
This just changes the API so that's possible.
v2: use an enum (zmike)
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15009>
Allow drivers to register a callback for when a shader program is linked.
Signed-off-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13674>
The main motivation is to improve the score of viewperf13/snx.
This new interface is designed to be optimal for display lists as implemented
by the vbo module. It has much lower CPU overhead in the frontend, threaded
context, and the driver.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13050>
We would like draw-only display lists to have immutable draw info and
this is the only GL non-draw state in pipe_draw_info (not counting
view_mask).
It also allows removing some code from draw_vbo for tessellation.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12351>
The u_resource_vtbl indirection is going to be removed.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10659>
the only case in which this is nonzero is if a multidraw gets split by the frontend,
i.e., mesa core, and in all other cases it can be ignored. the value can also be ignored
for all indirect draws, though it seems many (most?) gallium drivers are not aware of this
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10166>
this moves index_bias into the multidraw struct, enabling draws where the value
changes to be merged; the draw_info struct member is renamed and moved to the end
of the struct for tc use
u_vbuf still has some checks to split draws if index_bias changes, maybe
this can be removed at some point?
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10166>
This new surface attribute can be supplied by the client to indicate
a list of modifiers that the driver can choose from for buffer
allocation. This is useful to make sure the buffers allocated via libva
are compatible with the intended usage (e.g. can be scanned out via KMS
or can be imported to EGL).
Introduce a new Gallium pipe_context.create_video_buffer_with_modifiers
hook that drivers can implement if they are modifiers-aware. Add a
modifiers argument to vlVaHandleSurfaceAllocate so that the
user-supplied list of modifiers can be passed down from vaCreateSurfaces
to the Gallium driver.
Signed-off-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10237>
v2 (Karol): Fix declaration of pointers argument
v3 (Karol): Move flags into function interface as bools
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6401>
There are a few places (mainly u_threaded_context) that do:
set_vertex_buffers(...);
for (i = 0; i < count; i++)
pipe_resource_reference(&buffers[i].resource.buffer, NULL);
set_vertex_buffers increments the reference counts while the loop
decrements them.
This commit eliminates those reference count changes by adding a parameter
into set_vertex_buffers that tells the callee to accept all buffers
without incrementing the reference counts.
AMD Zen benefits from this because it has slow atomics if they come from
different CCXs.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8298>
Instead of calling this functions again to unbind trailing slots,
extend it to do it when binding. This reduces CPU overhead.
A lot of drivers ignore "start" and always unbind all slots after "count".
Such drivers don't need any changes here.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8298>
Instead of calling this function again to unbind trailing slots,
extend it to do it when images are being set. This reduces CPU overhead.
Only st/mesa benefits.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8298>
Instead of calling this functions again to unbind trailing slots,
extend it to do it as part of the call that sets vertex buffers.
This reduces CPU overhead. Only st/mesa benefits from this.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8298>
We often do this:
pipe->set_constant_buffer(pipe, shader, slot, &cb);
pipe_resource_reference(&cb->buffer, NULL);
That results in atomic increment in set_constant_buffer followed by
atomic decrement after set_constant_buffer. This new interface
eliminates those atomics.
For the case above, this should be used instead:
pipe->set_constant_buffer(pipe, shader, slot, true, &cb);
cb->buffer = NULL; // if cb is not a local variable, else do nothing
AMD Zen benefits from this. The perf improvement is ~3% for Viewperf13/Catia.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8298>
The vulkan cond rendering hook is quite different than the
traditional gallium one so add a new interface for it.
This just conditionalises rendering on the memory location.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8182>
To remove PIPE_CAP checking in the common code.
It's better if drivers lower multi draws even if the hardware doesn't
support it beause the multi draw loop can be moved deeper into the driver
to remove more overhead.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7679>
This is needed to implement the vulkan transform feedback pause
resume functionality
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7981>
Essentially rename multi_draw to draw_vbo and remove start and count
from pipe_draw_info.
This is only an interface change. It doesn't add multi draw support
anywhere.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7441>