With EGL_EXT_image_dma_buf_import, user can import dma_buf
with offset.
This is also used by AOSP GLConsumer::updateTexImage
with HAL_PIXEL_FORMAT_YV12 buffer which store YUV planes in
the same buffer with offset. Render sample from it using
GL_OES_EGL_image_external. This should fix some video
display problem when using MediaCodec soft decoding which
generates HAL_PIXEL_FORMAT_YV12 buffer and render it on
screen.
Test program:
https://github.com/yuq/gfx/tree/master/yuv2rgb/dma-buf
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4362>
Fragment shader can write depth and stencil if we set necessary flags
in RSW. In addition to that we need to use special format for Z24S8.
Original format is apparently Z24X8 since we can't sample stencil in GLES2.
This new format also seems to use several components for storing depth
since we saw r != g != b when sampling with this format.
[vasily: - initialize clear->depth to 0xffffff if we reload depth, just
like blob does. Reloading doesn't work otherwise
- use single bitmap for reload type]
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4197>
Generating PLB PP stream is expensive. PLB PP stream content depends on
damage, and if damage consists of several rects it's impossible to come
up with a simple key.
Simplify damage to a single bounding box so we have a simple key
and cache PLB PP stream. Cache size is limited to 0.1% of system RAM and
once limit is reached least recently used entries are dropped.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3834>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3834>
This information is preserved across draws and needed
when task submission.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3755>
lima_submit is created by lima_submit_get() in draw/clear functions
and freed after submit to kernel.
There is a hash map to find the same lima_submit for color/depth
buffer combination if user switch framebuffer w/o flush the command
then switch back again.
v2:
rename lima_flush_submit to lima_flush_submit_accessing_bo.
v3:
delay flush access submit to lima_update_submit_wb when really know
if this submit will write to the target buffer.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3755>
There's a lot going on here (it's a ton of commits squashed together
since otherwise this would be impossible to review...)
1. We have a fast path for linear->tiled for whole (aligned) tiles, but we
have to use a slow path for unaligned accesses. We can get a pretty
major win for partial updates by using this slow path simply on the
borders of the update region, and then hit the fast path for the
tile-aligned interior. This does require some shuffling.
2. Mark the LUTs constant, which allows the compiler to inline them,
which pairs well with loop unrolling (eliminating the memory accesses
and just becoming some immediates.. which are not as immediate on
aarch64 as I'd like..)
3. Add fast path for bpp1/2/8/16. These use the same algorithm and we
have native types for them, so may as well get the fast path.
4. Drop generic path for bpp != 1/2/8/16, since these formats are
generally awful and there's no way to tile them efficienctly and
honestly there's not a good reason too either. Lima doesn't support any
of these formats; Panfrost can make the opinionated choice to make them
linear.
5. Specialize the unaligned routines. They don't have to be fully
generic, they just can't assume alignment. So now they should be nearly
as fast as the aligned versions (which get some extra tricks to be even
faster but the difference might be neglible on some workloads).
6. Specialize also for the size of the tile, to allow 4x4 tiling as well
as 16x16 tiling. This allows compressed textures to be efficiently tiled
with the same routines (so we add support for tiling ASTC/ETC textures
while we're at it)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Vasily Khoruzhick <anarsoul@gmail.com> #lima on Mali400
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3414>
There's an implicit dependence on Gallium here that will add more
complexity than needed when testing/optimizing out of driver as well as
potentially Vulkanizing. We don't need a full pipe_box, just the x/y/w/h
properties directly.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Vasily Khoruzhick <anarsoul@gmail.com> #lima on Mali400
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3414>
Add debug flag to disable tiling. Note that it prevents lima from creating
tiled buffers, but it's still able to import them if modifier is specified
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Use linear layout for shared buffers if modifier is not specified
and use linear layout when importing buffers with invalid modifier.
Fixes: 01a451b04d ("lima: handle DRM_FORMAT_MOD_INVALID in resource_from_handle()")
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Assume that resource is tiled if we get DRM_FORMAT_MOD_INVALID
in resource_from_handle() and we don't have RO.
Fixes: 8c12f4e5f2 ("lima: enable tiling")
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Now that we have tiled format modifier merged into linux we can enable tiling.
That should improve overall performance and also workaround broken mipmapping
for linear textures since now we prefer tiled textures.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
To make PIPE_FORMATs usable from non-gallium parts of Mesa, I want to
move their helpers out of gallium. Since u_format used
util_copy_rect(), I moved that in there, too.
I've put it in a separate directory in util/ because it's a big chunk
of related code, and it's not clear to me whether we might want it as
a separate library from libmesa_util at some point.
Closes: #1905
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Allocating BOs is expensive, so we should avoid doing that by caching
freed BOs.
BO cache is modelled after one in v3d driver and works as follows:
- in lima_bo_create() check if we have matching BO in cache and return
it if there's one, allocate new BO otherwise.
- in lima_bo_unreference() (renamed from lima_bo_free()): put BO in
cache instead of freeing it and remove all stale BOs from cache
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Some time weston set full damage region. It is
more effient to use the cached pp stream instead
of dynamically create one.
Reviewed-and-Tested-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
This extension set a damage region for each
buffer swap which can be used to reduce buffer
reload cost by only feed damage region's tile
buffer address for PP.
Reviewed-and-Tested-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
This is a relatively minimal change to adjust all the gallium interfaces
to use bool instead of boolean. I tried to avoid making unrelated
changes inside of drivers to flip boolean -> bool to reduce the risk of
regressions (the compiler will much more easily allow "dirty" values
inside a char-based boolean than a C99 _Bool).
This has been build-tested on amd64 with:
Gallium drivers: nouveau r300 r600 radeonsi freedreno swrast etnaviv v3d
vc4 i915 svga virgl swr panfrost iris lima kmsro
Gallium st: mesa xa xvmc xvmc vdpau va
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This will allow both drivers to share this code. Both drivers
build-tested with meson. Android build not tested.
v2: Change naming from tiling->shared, in case Lima and Panfrost can
share more in the future. Fix Android build system.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-and-tested-by: Qiang Yu <yuq825@gmail.com>
Buffer needs to be reloaded every time unless explicit clear() was
called.
Fixes rendering issues with wayland compositors.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Hardware supports writing back Z/S buffers and sampling from them,
so add support for that.
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Icenowy Zheng <icenowy@aosc.io>
As we have already prepared for using util_blitter, use it to implement
lima_blit.
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Qiang Yu <yuq825@gmail.com>