Use the DRM interface to h/w accelerate composition on image surfaces.
The purpose of the backend is simply to explore what such a hardware
interface might look like and what benefits we might expect. The
use case that might justify writing such custom backends is embedded
devices running a DRM compositor like Wayland, which would, for example,
allow one to write applications that seamlessly integrate accelerated,
dynamic, high quality 2D graphics using Cairo with advanced interaction
(e.g. smooth animations in the UI) driven by a clutter framework...
In this first step we introduce the fundamental wrapping of GEM for intel
and radeon chipsets, and, for comparison, gallium. There is no acceleration
yet: all we do is use buffer objects (that is, use the kernel memory manager) to
allocate images and simply use the fallback mechanism. This provides a
suitable base to start writing chip specific drivers.
Handling clip as part of the surface state, as opposed to being part of
the operation state, is cumbersome and a hindrance to providing true proxy
surface support. For example, the clip must be copied from the surface
onto the fallback image, but this was forgotten, causing undue hassle in
each backend. Another example is the contortion the meta surface
endures to ensure the clip is correctly recorded. By contrast, passing the
clip along with the operation is quite simple and enables us to write
generic handlers for providing surface wrappers. (And in the future, we
should be able to write more esoteric wrappers, e.g. automatic 2x FSAA,
trivially.)
In brief, instead of the surface automatically applying the clip before
calling the backend, the backend can call into a generic helper to apply
clipping. For raster surfaces, clip regions are handled automatically as
part of the composite interface. For vector surfaces, a clip helper is
introduced to replay the clip and call back into an intersect_clip_path() function
as necessary.
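
As a rough illustration of the idea (not the cairo implementation), the
sketch below passes the clip along with each operation and lets a proxy
surface forward both. ImageSurface, OffsetWrapper, intersect() and the
rectangle-only clip are hypothetical simplifications made up for this
example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the overlap of two rectangles, or None if they are disjoint."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1 = min(a[0] + a[2], b[0] + b[2])
    y1 = min(a[1] + a[3], b[1] + b[3])
    if x1 <= x0 or y1 <= y0:
        return None
    return (x0, y0, x1 - x0, y1 - y0)


@dataclass
class ImageSurface:
    """Hypothetical raster backend: records the area each fill touches."""
    width: int
    height: int
    log: List[Rect] = field(default_factory=list)

    def fill(self, rect: Rect, clip: Optional[Rect] = None) -> None:
        # The clip arrives with the operation; the backend applies it via
        # the shared helper instead of reading it from surface state.
        area = intersect(rect, (0, 0, self.width, self.height))
        if area is not None and clip is not None:
            area = intersect(area, clip)
        if area is not None:
            self.log.append(area)


@dataclass
class OffsetWrapper:
    """Hypothetical proxy surface: translates, then forwards op and clip."""
    target: ImageSurface
    dx: int
    dy: int

    def fill(self, rect: Rect, clip: Optional[Rect] = None) -> None:
        def move(r: Rect) -> Rect:
            return (r[0] + self.dx, r[1] + self.dy, r[2], r[3])

        self.target.fill(move(rect), None if clip is None else move(clip))


image = ImageSurface(100, 100)
proxy = OffsetWrapper(image, 10, 10)
proxy.fill((0, 0, 50, 50), clip=(20, 20, 100, 100))
```

Because the wrapper never has to copy clip state onto its target, it stays
a few lines long, which is the property the change is after.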
Whilst this is not primarily a performance-related change (the change
should just move the computation of the clip from the moment it is applied
by the user to the moment it is required by the backend), it is important
to track any potential regression:
ppc:
Speedups
========
image-rgba evolution-20090607-0 1026085.22 0.18% -> 672972.07 0.77%: 1.52x speedup
▌
image-rgba evolution-20090618-0 680579.98 0.12% -> 573237.66 0.16%: 1.19x speedup
▎
image-rgba swfdec-fill-rate-4xaa-0 460296.92 0.36% -> 407464.63 0.42%: 1.13x speedup
▏
image-rgba swfdec-fill-rate-2xaa-0 128431.95 0.47% -> 115051.86 0.42%: 1.12x speedup
▏
Slowdowns
=========
image-rgba firefox-periodic-table-0 56837.61 0.78% -> 66055.17 3.20%: 1.09x slowdown
▏
The span renderer uses ARB_vertex_buffer_object, which was incorporated into
the core as part of OpenGL 1.5. We failed to check for the required version
during initialisation, and to my surprise the i915 can only support OpenGL
1.4 as it lacks ARB_occlusion_query. So just use the ARB extension instead,
which is present on i915.
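
A minimal sketch of that decision, assuming the GL_VERSION and
GL_EXTENSIONS strings have already been queried. The function names are
hypothetical, and the fallback to the core entry points when the extension
string is absent is an assumption of this sketch, not something the change
itself does.

```python
def parse_gl_version(version_string):
    """Parse the leading "major.minor" of a GL_VERSION string."""
    number = version_string.split()[0]
    major, minor = number.split(".")[:2]
    return int(major), int(minor)


def vbo_entry_points(version_string, extensions):
    """Choose which buffer-object entry points to use, or None."""
    # Prefer the extension: a driver can expose it without being able
    # to advertise the full OpenGL 1.5 core.
    if "GL_ARB_vertex_buffer_object" in extensions.split():
        return "ARB"   # e.g. glBindBufferARB
    # Assumption of this sketch: fall back to the core functions.
    if parse_gl_version(version_string) >= (1, 5):
        return "core"  # e.g. glBindBuffer
    return None
```

The version and extension strings passed in are whatever the driver
reports; a 1.4 driver that lists the extension still gets the "ARB" path.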
After a run, it can be useful to reprint the results, so add
cairo-perf-print to perform that task.
For the future, I'd like to move the performance suite over to the
git/perf style of single, multi-function binary.
The sequence of operations that I typically perform is:
./cairo-perf-trace -r -v -i 6 > `git describe`.`hostname`.perf
./cairo-perf-diff-files REVA REVB
./cairo-perf-print REVA
./cairo-perf-compare-backends REVA
which misses the caching available with cairo-perf-diff. 'make html' is
almost what I want, but still too prescriptive. However, that does need to
be addressed for continuous performance monitoring.
Along the perf lines, that sequence of operations becomes:
./cairo-perf record -i 6
./cairo-perf report
./cairo-perf report REVA REVB
./cairo-perf report --backends="image,xlib,gl" REVA REVB
./cairo-perf report --html REVA REVB
Also, we want to think about installing the cairo-perf binary, so we want
to differentiate the case where it is run inside a git checkout.
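
To make the proposed command layout concrete, here is a hedged sketch of
the option parsing only, written with Python's argparse. The --iterations
long option, the default values and the in_git_checkout() helper are
assumptions of this sketch; nothing here is an existing cairo-perf
interface.

```python
import argparse
import os


def build_parser():
    """Single binary, several functions: record and report."""
    parser = argparse.ArgumentParser(prog="cairo-perf")
    commands = parser.add_subparsers(dest="command", required=True)

    record = commands.add_parser("record", help="run the traces")
    record.add_argument("-i", "--iterations", type=int, default=1)

    report = commands.add_parser("report", help="print or compare results")
    report.add_argument("--backends", default="",
                        help="comma-separated list, e.g. image,xlib,gl")
    report.add_argument("--html", action="store_true")
    report.add_argument("revisions", nargs="*")
    return parser


def in_git_checkout(path="."):
    """Walk up from path looking for a .git entry."""
    path = os.path.abspath(path)
    while True:
        if os.path.exists(os.path.join(path, ".git")):
            return True
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            return False
        path = parent


args = build_parser().parse_args(
    ["report", "--backends=image,xlib,gl", "REVA", "REVB"])
```

An installed binary could use a check like in_git_checkout() to decide
whether revision names can be resolved at all.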
The loop between texture_setup() and clone_similar() should be
impossible, since every compositing backend should know how to clone an
image surface. cairo-gl is no longer an exception and so this code can
safely be removed.
When creating the trapezoid mask, avoid having to allocate a temporary
array to hold the converted pixman trapezoids by instead rasterizing each
trapezoid separately into the mask.
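
The shape of that loop can be sketched as follows. This is only an
illustration: the point-sampling rasterizer stands in for the real pixman
call, and the 24.8 fixed-point tuple layout (top, bottom, left x at top,
left x at bottom, right x at top, right x at bottom) is an assumption made
for the example.

```python
def to_pixels(fixed):
    """Convert an (assumed) 24.8 fixed-point value to a float."""
    return fixed / 256.0


def rasterize_one(mask, top, bottom, left, right):
    """Add one trapezoid to the mask by sampling pixel centres.

    left and right are (x_at_top, x_at_bottom) pairs.
    """
    for y, row in enumerate(mask):
        cy = y + 0.5
        if not (top <= cy < bottom):
            continue
        t = (cy - top) / (bottom - top)
        x0 = left[0] + t * (left[1] - left[0])
        x1 = right[0] + t * (right[1] - right[0])
        for x in range(len(row)):
            if x0 <= x + 0.5 < x1:
                row[x] = min(255, row[x] + 255)


def fill_mask(width, height, traps):
    """Build the mask without holding a converted copy of traps."""
    mask = [[0] * width for _ in range(height)]
    for trap in traps:
        # Convert just this trapezoid and rasterize it straight away,
        # so no temporary array of converted trapezoids is needed.
        top, bottom, l0, l1, r0, r1 = (to_pixels(v) for v in trap)
        rasterize_one(mask, top, bottom, (l0, l1), (r0, r1))
    return mask
```

The trade-off is one rasterizer call per trapezoid instead of one call for
the whole batch, in exchange for never allocating the temporary array.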
In order to make the initial context current we need a Drawable that
matches the context. In general, the RootWindow may not match the desired
context so we need to query the context and construct an appropriate
Drawable.
In view of sharing traces between multiple builders, add some system-wide
directories to the search path. This should be refined to a single
canonical location before release.
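
One way such a search could behave is sketched below. The find_traces()
helper, the ".trace" suffix filter and the rule that earlier directories
shadow later ones are assumptions of this sketch; the actual directories
are deliberately left to the caller.

```python
import os


def find_traces(search_path):
    """Collect *.trace files; the first directory to offer a name wins."""
    found = {}
    for directory in search_path:
        if not os.path.isdir(directory):
            continue  # system-wide directories may simply not exist
        for name in sorted(os.listdir(directory)):
            if name.endswith(".trace"):
                found.setdefault(name, os.path.join(directory, name))
    return found
```

Listing the local directory first keeps a builder's own traces ahead of
the shared copies.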
After looking at backend specific images, check against the base image
reference. This is useful for fallback surfaces like xlib-fallback, which
should look closer to the image backend than the xlib backend.
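
The lookup order can be sketched as below. The file-naming scheme and both
helper functions are hypothetical, and image comparison is reduced to byte
equality purely for illustration.

```python
def reference_candidates(test_name, backend):
    """Backend-specific reference first, then the base image reference."""
    return [f"{test_name}.{backend}.ref.png", f"{test_name}.ref.png"]


def first_matching_reference(output, test_name, backend, references):
    """Return the name of the first reference equal to output, or None.

    references maps a file name to its (already decoded) image data.
    """
    for name in reference_candidates(test_name, backend):
        if references.get(name) == output:
            return name
    return None
```

With this order, a fallback surface that has no reference of its own is
still accepted when its output matches the base image reference.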
Based on the work by Øyvind Kolås and Pierre Tardy -- many thanks to
Pierre for pushing this backend for inclusion as well as testing and
reviewing my initial patch. And many more thanks to pippin for writing the
backend in the first place!
Hacked and chopped by myself into a suitable basis for a backend. Quite a
few issues remain open, but it would seem to be ready for testing on suitable
hardware.