We don't want to export suballocated resources to external consumers,
for a variety of reasons. First of all, it would be exporting random
other pieces of memory which we may not want those external consumers
to have access to. Secondly, external clients wouldn't be aware of
what buffers are packed together and busy-tracking implications there.
Nor should they be. And those are just the obvious reasons.
When we allocate a resource with the PIPE_BIND_SHARED flag, indicating
that it's going to be used externally, we avoid suballocation.
However, there are times when the client may suddenly decide to export
a texture or buffer, without any prior warning. Since we had no idea
this buffer would be exported, we suballocated it. Unfortunately, this
means we need to transition it to a dedicated allocation on the fly, by
allocating a new buffer and copying the contents over.
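As a rough illustration of that on-the-fly transition, here is a minimal CPU-side sketch. Everything in it is hypothetical: struct export_buf and export_buf_make_dedicated are made-up names, and a plain memcpy stands in for the GPU copy the driver would actually issue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a buffer: either a slice of a shared slab
 * (suballocated) or a dedicated allocation that owns its storage. */
struct export_buf {
   uint8_t *storage;   /* base of the backing storage */
   size_t offset;      /* byte offset of this buffer within storage */
   size_t size;        /* size of this buffer in bytes */
   bool suballocated;  /* true if storage is shared with other buffers */
};

/* Move a suballocated buffer into its own storage so that exporting it
 * cannot expose its neighbours in the slab.  Returns false if the new
 * allocation fails, in which case buf is left unchanged. */
static bool
export_buf_make_dedicated(struct export_buf *buf)
{
   assert(buf != NULL);

   if (!buf->suballocated)
      return true; /* already dedicated, nothing to do */

   uint8_t *dedicated = malloc(buf->size);
   if (dedicated == NULL)
      return false;

   /* CPU memcpy stands in for the GPU copy a real driver would use. */
   memcpy(dedicated, buf->storage + buf->offset, buf->size);

   /* The slab stays alive; only this buffer stops pointing into it. */
   buf->storage = dedicated;
   buf->offset = 0;
   buf->suballocated = false;
   return true;
}
```

The caller owns the new storage afterwards; the sketch deliberately ignores reference counting and synchronization with in-flight work.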
Making things worse, this often happens in DRI hooks that don't have an
associated context (which we need to, say, run BLORP commands). We have
to create a temporary context for this purpose, perform our blit, then
destroy it. The radeonsi driver uses a permanent auxiliary context
stored in the screen for this purpose, but we can't do that because it
causes circular reference counting. radeonsi doesn't do the reference
counting that we do, but also doesn't use u_transfer_helper, so they
get lucky in avoiding stale resource->screen pointers. Other drivers
don't create an auxiliary context, so they avoid this problem for now.
For auxiliary data, rather than copying it over bit-for-bit, we simply
copy over the underlying data using iris_copy_region (GPU memcpy), and
take whatever the resulting aux state is from that operation. Assuming
the copy operation compresses, the result will be compressed.
v2: Stop using a screen->aux_context and just invent one on the fly to
avoid circular reference counting issues.
Acked-by: Paulo Zanoni <paulo.r.zanoni@intel.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12623>
This code assumed that batch->exec_bos[i] matched validation_list[i],
which won't be true once we start suballocating BOs. This patch changes
it to print the full exec_bos[i] list instead of the validation list,
as that has the logical list of objects, names, addresses, placement,
whether they are suballocated, and so on.
It may be useful to look at the actual validation list as well; I'm not
sure how common that is. We may want to add additional debug prints in
the future.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12623>
We don't want to suballocate some buffers, such as ones that we know
we're intending to export to other clients, or ones with special
semantics (such as the workaround BO not having proper synchronization).
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12623>
The vast majority of cases need no handling at all, as they simply
read bo->address or similar fields, which works in either case.
Some other cases simply need to unwrap to look at the underlying BO.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12623>
With suballocation, our batch BO list may have multiple BOs that are
suballocated from the same GEM object. We still need to track each of
those buffers for cross-batch write tracking, cache tracking, and busy
tracking. However, we only want to include underlying GEM objects in
the actual validation list building. The validation list entry should
have EXEC_OBJECT_WRITE if any of the BOs are marked as writable.
We use a temporary array to map GEM handles to validation list entries
so we can quickly see if we've already emitted one and update the
EXEC_OBJECT_WRITE flag as needed.
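The handle-to-entry mapping described above could look roughly like the sketch below. It is not the driver code: the struct names, the fixed VLIST_MAX_HANDLES bound, and the VLIST_EXEC_OBJECT_WRITE bit are placeholders chosen for illustration, and each BO is assumed to already carry the handle of its underlying GEM object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VLIST_EXEC_OBJECT_WRITE (1u << 2) /* illustrative flag bit */
#define VLIST_MAX_HANDLES 64              /* illustrative upper bound */

/* One logical BO in the batch; several may share a GEM handle. */
struct vlist_bo {
   uint32_t gem_handle; /* handle of the underlying GEM object */
   bool writable;       /* does this batch write the BO? */
};

/* One entry in the validation list handed to the kernel. */
struct vlist_entry {
   uint32_t handle;
   uint32_t flags;
};

/* Emit one validation entry per distinct GEM handle and OR in the write
 * flag if any BO sharing that handle is writable.  Returns the number
 * of entries written to out, which must hold at least bo_count items. */
static int
vlist_build(const struct vlist_bo *bos, int bo_count,
            struct vlist_entry *out)
{
   int index_for_handle[VLIST_MAX_HANDLES]; /* temporary handle -> index map */
   int count = 0;

   for (int h = 0; h < VLIST_MAX_HANDLES; h++)
      index_for_handle[h] = -1;

   for (int i = 0; i < bo_count; i++) {
      uint32_t handle = bos[i].gem_handle;
      assert(handle < VLIST_MAX_HANDLES);

      int idx = index_for_handle[handle];
      if (idx < 0) {
         /* First time we see this GEM object: emit a new entry. */
         idx = count++;
         index_for_handle[handle] = idx;
         out[idx].handle = handle;
         out[idx].flags = 0;
      }

      if (bos[i].writable)
         out[idx].flags |= VLIST_EXEC_OBJECT_WRITE;
   }

   return count;
}
```

With four BOs backed by two GEM objects, this produces two entries, and an entry is marked writable as soon as one of its BOs is.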
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12623>
We would like to start performing slab allocation of resources, where
multiple resources can be backed by a single GEM object.
Originally, I had thought to move busy tracking, cache domain tracking,
and so on into resources themselves, instead of having them at the BO
level. Multiple resources would point at the same BO with an offset.
Unfortunately, this meant adjusting the batch BO pinning code to take
resources rather than BOs. That cascades into needing iris_address
for genxml packing to store resources, not BOs. Which means that places
which use raw BOs would need to start creating resources instead.
Except some places, like aux BO handling, really don't make sense as
pipe resources and really would rather use raw BOs. So iris_address
would need to store both, which complicates the genxml field. And,
having a BO and resource means that every place in the code needs to
handle that offset correctly. It sounds simple, but is a giant mess.
Instead, we take a different route: adjust iris_bo itself, so that BOs
are either backed by a GEM object (as is the case today), or backed
by another underlying BO. "Real" BOs have bo->gem_handle != 0. "Slab
allocated" or "fake" or "wrapper" BOs have bo->gem_handle == 0. We move
fields into a union based on these cases. amdgpu takes this approach.
This sounds complex at first glance---in theory, every place that
interacts with BOs might need to handle the wrapper BO special case.
But in practice, they don't. For suballocated BOs, we can set the
wrapper's address field to the underlying BO's address plus any offset,
at which point it looks like any other BO. Most other properties are
easily queried; the main code that needs updating is execbuf handling
and bufmgr internals.
For now, we simply move the fields. Any code that accesses either
bo->real.* or bo->gem_handle will need updating in future patches to
actually handle the slab-allocated case.
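To make the real/wrapper split concrete, here is a small sketch of the idea. It is an assumption-laden illustration, not the actual iris_bo layout: struct wrap_bo, its fields other than gem_handle and address, and the helper names are all invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative BO: gem_handle != 0 means "real", 0 means "wrapper". */
struct wrap_bo {
   uint32_t gem_handle;
   uint64_t address;   /* GPU address, meaningful for both kinds */
   uint64_t size;
   union {
      struct {
         uint32_t kflags;      /* placeholder for real-only fields */
      } real;
      struct {
         struct wrap_bo *real; /* backing BO this wrapper is carved from */
      } slab;
   };
};

static bool
wrap_bo_is_real(const struct wrap_bo *bo)
{
   return bo->gem_handle != 0;
}

/* Return the BO that owns the GEM object, whichever kind bo is. */
static struct wrap_bo *
wrap_bo_unwrap(struct wrap_bo *bo)
{
   return wrap_bo_is_real(bo) ? bo : bo->slab.real;
}

/* Set up a wrapper covering [offset, offset + size) of a real BO.  Its
 * address is the backing address plus the offset, so code that only
 * reads bo->address works unchanged. */
static void
wrap_bo_init_wrapper(struct wrap_bo *wrapper, struct wrap_bo *backing,
                     uint64_t offset, uint64_t size)
{
   assert(wrap_bo_is_real(backing));
   assert(offset + size <= backing->size);

   wrapper->gem_handle = 0;
   wrapper->address = backing->address + offset;
   wrapper->size = size;
   wrapper->slab.real = backing;
}
```

Only code that needs the GEM handle (execbuf, bufmgr internals) would call something like wrap_bo_unwrap; everything else can keep treating the wrapper as an ordinary BO.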
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12623>
gbm_{bo,surface}_create_with_modifiers doesn't allow callers to
pass usage flags. Assume USE_SCANOUT since this is what most
callers want.
Bump the GBM ABI version so that other backends can discover when
the usage flags can be used.
Signed-off-by: Simon Ser <contact@emersion.fr>
Reviewed-by: James Jones <jajones@nvidia.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3197>
gbm_{bo,surface}_create_with_modifiers is missing the usage flags. Add a new
function which lets library users specify them.
Signed-off-by: Simon Ser <contact@emersion.fr>
Reviewed-by: James Jones <jajones@nvidia.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3197>
We were sometimes using "usage", sometimes using "flags". Let's
just use "flags" everywhere (since the enum is named gbm_bo_flags).
Signed-off-by: Simon Ser <contact@emersion.fr>
Reviewed-by: James Jones <jajones@nvidia.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3197>
==767499== Conditional jump or move depends on uninitialised value(s)
==767499== at 0xEB8E3ED: x11_image_finish (wsi_common_x11.c:1539)
==767499== by 0xEB8E768: x11_swapchain_destroy (wsi_common_x11.c:1640)
==767499== by 0xEB8A8FA: wsi_common_destroy_swapchain (wsi_common.c:505)
==767499== by 0xDE30BA1: anv_DestroySwapchainKHR (anv_wsi.c:242)
==767499== by 0x6817A21: copper_displaytarget_destroy (zink_copper.c:192)
==767499== by 0x6882BE6: zink_destroy_resource_object (zink_resource.c:95)
==767499== by 0x6882447: zink_resource_object_reference (zink_resource.h:198)
==767499== by 0x6882D33: zink_resource_destroy (zink_resource.c:123)
==767499== by 0x688AC97: pipe_resource_destroy (u_inlines.h:145)
==767499== by 0x688AD2E: pipe_resource_reference (u_inlines.h:162)
==767499== by 0x688BE1E: zink_destroy_surface (zink_surface.c:319)
==767499== by 0x688AE0A: zink_surface_reference (zink_surface.h:102)
==767499== by 0x688BE6D: zink_surface_destroy (zink_surface.c:328)
==767499== by 0x67F9CA2: pipe_surface_release (u_inlines.h:134)
==767499== by 0x67FB8AD: zink_context_destroy (zink_context.c:92)
==767499== by 0x5D47B65: st_destroy_context_priv (st_context.c:475)
==767499== by 0x5D49AF2: st_destroy_context (st_context.c:1193)
==767499== by 0x5D5C90F: st_context_destroy (st_manager.c:816)
==767499== by 0x5CC1FC9: dri_destroy_context (dri_context.c:248)
==767499== by 0x658DD63: driDestroyContext (dri_util.c:535)
==767499== by 0x5A30166: drisw_destroy_context (drisw_glx.c:417)
==767499== by 0x5A32484: glXDestroyContext (glxcmds.c:515)
==767499== by 0x5315AEB: glXDestroyContext (libglx.c:332)
==767499== by 0x4AA8E7D: glXDestroyContext (g_libglglxwrapper.c:384)
==767499== by 0x4D5A3F0: ??? (in /usr/lib64/libwaffle-1.so.0.6.1)
==767499== by 0x499DDD5: piglit_wfl_framework_teardown (piglit_wfl_framework.c:638)
==767499== by 0x499E4C5: piglit_winsys_framework_teardown (piglit_winsys_framework.c:238)
==767499== by 0x499F50C: destroy (piglit_x11_framework.c:212)
==767499== by 0x498C535: destroy (piglit-framework-gl.c:210)
==767499== by 0x4F48AF6: __run_exit_handlers (in /usr/lib64/libc-2.33.so)
==767499== by 0x4F48C9F: exit (in /usr/lib64/libc-2.33.so)
==767499== by 0x4AEFD71: piglit_report_result (piglit-util.c:245)
==767499== by 0x499F2CA: process_next_event (piglit_x11_framework.c:139)
==767499== by 0x499F365: enter_event_loop (piglit_x11_framework.c:153)
==767499== by 0x499DF88: run_test (piglit_winsys_framework.c:88)
==767499== by 0x498C5EF: piglit_gl_test_run (piglit-framework-gl.c:229)
==767499== by 0x4022B4: main (primitive-restart.c:45)
==767499== Uninitialised value was created by a heap allocation
==767499== at 0x484086F: malloc (vg_replace_malloc.c:380)
==767499== by 0xE964E85: vk_default_alloc (vk_alloc.c:26)
==767499== by 0xEB8B24B: vk_alloc (vk_alloc.h:43)
==767499== by 0xEB8EAF9: x11_surface_create_swapchain (wsi_common_x11.c:1723)
==767499== by 0xEB8A82A: wsi_common_create_swapchain (wsi_common.c:476)
==767499== by 0xDE30B47: anv_CreateSwapchainKHR (anv_wsi.c:225)
==767499== by 0xE96134F: vk_tramp_CreateSwapchainKHR (vk_dispatch_table.c:6592)
==767499== by 0xD7B88F0: ??? (in /usr/lib64/libvulkan.so.1.2.162)
==767499== by 0x6817796: copper_CreateSwapchain (zink_copper.c:123)
==767499== by 0x6817960: copper_displaytarget_create (zink_copper.c:170)
==767499== by 0x6884C65: resource_create (zink_resource.c:780)
==767499== by 0x6884EC5: zink_resource_create_drawable (zink_resource.c:829)
==767499== by 0x5CC0FE3: copper_allocate_textures (copper.c:199)
==767499== by 0x5CC28C2: dri_st_framebuffer_validate (dri_drawable.c:82)
==767499== by 0x5D5B69A: st_framebuffer_validate (st_manager.c:222)
==767499== by 0x5D5D32D: st_api_make_current (st_manager.c:1102)
==767499== by 0x5CC220B: dri_make_current (dri_context.c:306)
==767499== by 0x658DE23: driBindContext (dri_util.c:588)
==767499== by 0x5A3022A: drisw_bind_context (drisw_glx.c:435)
==767499== by 0x5A36CC2: MakeContextCurrent (glxcurrent.c:220)
==767499== by 0x5A36DF9: glXMakeCurrent (glxcurrent.c:253)
==767499== by 0x531849C: InternalMakeCurrentVendor (libglx.c:875)
==767499== by 0x53185C3: InternalMakeCurrentDispatch (libglx.c:930)
==767499== by 0x5318DE5: CommonMakeCurrent (libglx.c:1074)
==767499== by 0x5318ED5: glXMakeCurrent (libglx.c:1119)
==767499== by 0x4AA9CFA: glXMakeCurrent (g_libglglxwrapper.c:930)
==767499== by 0x4D5AA36: ??? (in /usr/lib64/libwaffle-1.so.0.6.1)
==767499== by 0x4D5E16E: waffle_make_current (in /usr/lib64/libwaffle-1.so.0.6.1)
==767499== by 0x499C8CD: wfl_checked_make_current (piglit-util-waffle.h:115)
==767499== by 0x499DA04: make_context_current_singlepass (piglit_wfl_framework.c:488)
==767499== by 0x499DC43: make_context_current (piglit_wfl_framework.c:565)
==767499== by 0x499DD88: piglit_wfl_framework_init (piglit_wfl_framework.c:628)
==767499== by 0x499E3FC: piglit_winsys_framework_init (piglit_winsys_framework.c:209)
==767499== by 0x499F581: piglit_x11_framework_create (piglit_x11_framework.c:229)
==767499== by 0x499E361: piglit_winsys_framework_factory (piglit_winsys_framework.c:175)
==767499== by 0x498CA60: piglit_gl_framework_factory (piglit_gl_framework.c:53)
==767499== by 0x498C587: piglit_gl_test_run (piglit-framework-gl.c:221)
==767499== by 0x4022B4: main (primitive-restart.c:45)
Fixes: b5c390c113 ("vulkan/wsi: add support for detecting mit-shm pixmaps.")
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13124>
If we're going to have to bind them as separate planes with colorspace
conversion for sampling on the frontend, then we need to report that
they're only for external-image samplers, otherwise the lowering won't be
applied.
Fixes: 4e3a7dcf ("gallium: enable EGL_EXT_image_dma_buf_import_modifiers unconditionally")
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13038>
Latest VCN FW is capable of parsing the VP9 uncompressed header.
Removing the parsing from gallium.
Signed-off-by: Veerabadhran Gopalakrishnan <veerabadhran.gopalakrishnan@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13106>
emit_slice_hashing_state gets called once per queue, meaning
device->slice_hash can get allocated multiple times. This can be
reproduced by setting the env-var ANV_QUEUE_OVERRIDE=gc=2.
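One way to avoid the repeated allocation is to guard it on the shared device state, as in the sketch below. This is only an illustration of the pattern; struct hash_device, its fields, and the table size are hypothetical and do not mirror the anv structures.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical device state shared by every queue. */
struct hash_device {
   uint32_t *slice_hash; /* NULL until the first queue sets it up */
   int alloc_count;      /* instrumentation for this sketch only */
};

/* Called once per queue.  The shared table is allocated (and would be
 * packed) only on the first call; later calls reuse it. */
static void
hash_device_emit_slice_hashing(struct hash_device *device)
{
   if (device->slice_hash == NULL) {
      device->slice_hash = calloc(16, sizeof(uint32_t));
      assert(device->slice_hash != NULL);
      device->alloc_count++;
   }

   /* Per-queue command emission referencing device->slice_hash would
    * go here. */
}
```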
Reworks:
* Only pack the struct once (s-b Lionel, Jason)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13114>
This morning saw at least 3 pipelines fail from a530 not responding and
not coming back after we tried to power cycle, and airlied was having some
similar trouble yesterday. Until I can figure out what's going wrong
(python script failing to set up serial after retry? relays worn out?
Power supply failing?), just disable the boards.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13123>
Reworks:
* Set BLORP_BATCH_USE_COMPUTE in anv_blorp_batch_init (s-b Jason)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11564>
Reworks:
* Let blorp_clear handle DEBUG_BLOCS
* Old subject was: "Use compute blorp for vkCmdFillBuffer with
INTEL_DEBUG=blocs"
* Old subject was: "anv/blorp: Support params.cs_prog_data being set"
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11564>
Reworks:
* Use BLORP_BATCH_USE_COMPUTE flag rather than compute param to
blorp_copy (s-b Jason)
* Squash "intel/blorp: Set shader_pipeline for compute"
* Squash "intel/blorp: Add blorp_copy_supports_compute function"
* Squash "intel: Support compute for image/buffer copy if INTEL_DEBUG=blocs
is set"
* Squash "intel/blorp: Support compute for some blit operations"
* Use nir_image_store (s-b Jason)
* Use nir_push_if (s-b Jason)
* Require gfx12 for ccs in blorp_copy_supports_compute (s-b Jason)
* Add nir_pop_if (s-b Ken)
* Fix aux_usage check on gfx12 blorp_copy_supports_compute (s-b Ken)
* Use blorp_set_cs_dims (s-b Jason)
* Use dim=2d with array=true for nir_image_store (s-b Jason, Francisco)
* Restructure gen checks in blorp_copy_supports_compute (s-b Ken)
* Use nir_load_global_invocation_id (s-b Jason)
* Fix inefficient calculation of store_pos (s-b Jason)
* Use bounds_if being NULL/non-NULL for nir_pop_if (s-b Jason)
* discard => bounds (s-b Ken)
* Re-add ISL_AUX_USAGE_CCS_E in *_supports_compute (s-b Sagar)
* Skip duplicated in_bounds calculation (s-b Jason)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11564>
Based on the blorp_params, blorp_get_cs_local_y returns a recommended
local_y size for a compute shader.
blorp_set_cs_dims sets the compute program dims based on a given
local_y size.
Reworks:
* Add blorp_set_cs_dims (s-b Jason)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11564>
Reworks:
* Don't pack params, just memcpy param struct (s-b Jason)
* Old subject: "intel/blorp: Emit compute program if
params.cs_prog_data is set"
* Various cleanups of push-const size/alignment (s-b Jason)
* Fix subslice count by moving to devinfo (s-b Ken)
* Simplify cw.InterfaceDescriptor code (s-b Ken)
* Drop some comments from i965 (s-b Ken)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11564>
Render will use blorp_setup_binding_table and blorp_emit_btp, but
compute will only use blorp_setup_binding_table.
Rework:
* Use blorp_setup_binding_table, blorp_emit_btp (s-b Jason)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11564>
The compute path will use blorp_emit_sampler_state, whereas the render
path will use blorp_emit_sampler_state_ps.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11564>
Make INTEL_DEBUG=blorp dump the blorp compute shaders instead of using
the general INTEL_DEBUG=cs, which is now reserved for actual compute
programs.
Ref: 05933fb0f7 ("intel/compiler: Use INTEL_DEBUG=blorp to dump blorp shaders")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11564>
If the host supports explicit context initialization, try it.
If no capabilities associated with virgl are present, return
an error.
Reviewed-by: Anthoine Bourgeois <anthoine.bourgeois@gmail.com>
Tested-by: Anthoine Bourgeois <anthoine.bourgeois@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7712>
This change allows creating contexts depending on a set of
context parameters. The meaning of each of the parameters
is listed below:
1) VIRTGPU_CONTEXT_PARAM_CAPSET_ID
This determines the type of a context based on the capability set
ID. For example, the current capsets:
VIRTIO_GPU_CAPSET_VIRGL
VIRTIO_GPU_CAPSET_VIRGL2
define a Gallium, TGSI based "virgl" context. We only need 1 capset
ID per context type, though virgl has two due to a bug that has since
been fixed.
The use case is the "gfxstream" rendering library and "venus"
renderer.
gfxstream doesn't do Gallium/TGSI translation and mostly relies on
auto-generated API streaming. Certain users prefer gfxstream over
virgl for GLES on GLES emulation. {gfxstream vk}/{venus} are also
required for Vulkan emulation.
The goal is for guest userspace to choose the optimal context type
depending on the situation/hardware.
2) VIRTGPU_CONTEXT_PARAM_NUM_RINGS
This tells the number of independent command rings that the context
will use. This value may be zero and is inferred to be zero if
VIRTGPU_CONTEXT_PARAM_NUM_RINGS is not passed in. This preserves backwards
compatibility for virgl, which has one big giant command ring for all
commands.
The maximum number of rings is 32. In practice, multi-queue or
multi-ring submission is used for powerful dGPUs and virtio-gpu
may not be the best option in that case (see PCI passthrough or
rendernode forwarding).
3) VIRTGPU_CONTEXT_PARAM_POLL_RING_IDX_MASK
This is a mask of ring indices for which the DRM fd is pollable.
For example, if VIRTGPU_CONTEXT_PARAM_NUM_RINGS is 2, then the mask
may be:
[ring idx] | [1 << ring_idx] | final mask
-------------------------------------------
    0      |        1        |     1
    1      |        2        |     3
The "Sommelier" guest Wayland proxy uses this to poll for events
from the host compositor.
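The mask in the table above is just the OR of (1 << ring_idx) over the pollable rings. A minimal sketch follows; the function names are made up, and the validity rule (the mask may only name rings below the ring count) is an assumption for illustration rather than something stated here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define POLL_SKETCH_MAX_RINGS 32 /* matches the 32-ring limit above */

/* OR together (1 << ring_idx) for every ring that should be pollable. */
static uint32_t
poll_sketch_mask(const uint32_t *ring_indices, int count)
{
   uint32_t mask = 0;

   for (int i = 0; i < count; i++) {
      assert(ring_indices[i] < POLL_SKETCH_MAX_RINGS);
      mask |= UINT32_C(1) << ring_indices[i];
   }

   return mask;
}

/* Assumed sanity check: the ring count is within the limit and the
 * mask only names rings that exist. */
static bool
poll_sketch_mask_valid(uint32_t num_rings, uint32_t mask)
{
   if (num_rings > POLL_SKETCH_MAX_RINGS)
      return false;

   uint32_t allowed = num_rings == POLL_SKETCH_MAX_RINGS
                         ? UINT32_MAX
                         : (UINT32_C(1) << num_rings) - 1;

   return (mask & ~allowed) == 0;
}
```

For two rings, polling both gives mask 3, and a mask of 4 would be rejected by the assumed check because ring 2 does not exist.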
Reviewed-by: Anthoine Bourgeois <anthoine.bourgeois@gmail.com>
Tested-by: Anthoine Bourgeois <anthoine.bourgeois@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7712>
The gallium driver makes use of blend shaders, but panvk takes a
slightly different approach. Vulkan drivers are passed the blend
operation at pipeline creation time, which means they know it when
compiling the fragment shader and can lower the blend operation
directly in the fragment shader itself. Doing that simplifies the
pipeline creation since we don't have to deal with blend shaders
anymore.
This might come at a cost for translation layers like Zink though,
since it requires re-compiling the fragment shader every time the
blend operation changes, which we do anyway, since we don't have
a pipeline cache yet. Let's keep things simple for now and revise
things if/when we end up having performance issues.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Suggested-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>