Use the DRM interface to h/w accelerate composition on image surfaces.
The purpose of the backend is simply to explore what such a hardware
interface might look like and what benefits we might expect. The
use case that might justify writing such custom backends is embedded
devices running a DRM compositor like Wayland, which would, for example,
allow one to write applications that seamlessly integrate accelerated,
dynamic, high quality 2D graphics using Cairo with advanced interaction
(e.g. smooth animations in the UI) driven by a clutter framework...
In this first step we introduce the fundamental wrapping of GEM for intel
and radeon chipsets, and, for comparison, gallium. There is no acceleration
yet: all we do is use buffer objects (that is, use the kernel memory
manager) to allocate images and simply use the fallback mechanism. This provides a
suitable base to start writing chip specific drivers.
Handling clip as part of the surface state, as opposed to being part of
the operation state, is cumbersome and a hindrance to providing true proxy
surface support. For example, the clip must be copied from the surface
onto the fallback image, but this was forgotten, causing undue hassle in
each backend. Another example is the contortion the meta surface
endures to ensure the clip is correctly recorded. By contrast, passing the
clip along with the operation is quite simple and enables us to write
generic handlers for providing surface wrappers. (And in the future, we
should be able to write more esoteric wrappers, e.g. automatic 2x FSAA,
trivially.)
In brief, instead of the surface automatically applying the clip before
calling the backend, the backend can call into a generic helper to apply
clipping. For raster surfaces, clip regions are handled automatically as
part of the composite interface. For vector surfaces, a clip helper is
introduced to replay the clip and call back into an intersect_clip_path()
function as necessary.
Whilst this is not primarily a performance related change (the change
should just move the computation of the clip from the moment it is applied
by the user to the moment it is required by the backend), it is important
to track any potential regression:
ppc:
Speedups
========
image-rgba evolution-20090607-0 1026085.22 0.18% -> 672972.07 0.77%: 1.52x speedup
▌
image-rgba evolution-20090618-0 680579.98 0.12% -> 573237.66 0.16%: 1.19x speedup
▎
image-rgba swfdec-fill-rate-4xaa-0 460296.92 0.36% -> 407464.63 0.42%: 1.13x speedup
▏
image-rgba swfdec-fill-rate-2xaa-0 128431.95 0.47% -> 115051.86 0.42%: 1.12x speedup
▏
Slowdowns
=========
image-rgba firefox-periodic-table-0 56837.61 0.78% -> 66055.17 3.20%: 1.09x slowdown
▏
After a run, it can be useful to reprint the results, so add
cairo-perf-print to perform that task.
For the future, I'd like to move the performance suite over to the
git/perf style of single, multi-function binary.
The sequence of operations that I typically perform is:
./cairo-perf-trace -r -v -i 6 > `git describe`.`hostname`.perf
./cairo-perf-diff-files REVA REVB
./cairo-perf-print REVA
./cairo-perf-compare-backends REVA
which misses the caching available with cairo-perf-diff. 'make html' is
almost what I want, but still too prescriptive. However, that does need to
be addressed for continuous performance monitoring.
Along the perf lines, that sequence of operations becomes:
./cairo-perf record -i 6
./cairo-perf report
./cairo-perf report REVA REVB
./cairo-perf report --backends="image,xlib,gl" REVA REVB
./cairo-perf report --html REVA REVB
Also we want to think about installing the cairo-perf binary. So we want
to differentiate when run inside a git checkout.
In view of sharing traces between multiple builders, add some system wide
directories to the search path. This should be refined to a single
canonical location before release.
The meta-surface is a vital tool to record a trace of drawing commands
in-memory. As such it is used throughout cairo.
The value of such a surface is immediately obvious and should be
applicable for many applications. The first such case is by
cairo-test-trace which wants to record the entire graph of drawing commands
that affect a surface in the event of a failure.
I have an idea to categorise traces within their own subdirectories and so
for convenience added path walking to cairo-perf-trace. Principally this
should allow for forests of symlinks of all types.
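A minimal sketch of such path walking, assuming traces are identified by a
".trace" suffix. The names walk_traces(), trace_func_t and count_trace() are
hypothetical and are not the functions used by cairo-perf-trace. Because
stat() follows symlinks, links to files and to directories are both handled;
the sketch does not guard against symlink cycles.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical callback type: invoked once for every trace found. */
typedef void (*trace_func_t) (const char *path, void *closure);

/* Recursively visit dirname, calling func for each "*.trace" entry.
 * Returns 0 on success, -1 if any directory could not be opened. */
static int
walk_traces (const char *dirname, trace_func_t func, void *closure)
{
    DIR *dir;
    struct dirent *de;
    int status = 0;

    assert (dirname != NULL && func != NULL);

    dir = opendir (dirname);
    if (dir == NULL)
        return -1;

    while ((de = readdir (dir)) != NULL) {
        char path[4096];
        struct stat st;
        const char *dot;
        int len;

        /* Skip ".", ".." and hidden entries. */
        if (de->d_name[0] == '.')
            continue;

        len = snprintf (path, sizeof path, "%s/%s", dirname, de->d_name);
        if (len < 0 || (size_t) len >= sizeof path)
            continue; /* too long for this sketch */

        /* stat(), unlike lstat(), resolves symlinks to their target. */
        if (stat (path, &st) != 0)
            continue;

        if (S_ISDIR (st.st_mode)) {
            if (walk_traces (path, func, closure) != 0)
                status = -1;
        } else {
            dot = strrchr (de->d_name, '.');
            if (dot != NULL && strcmp (dot, ".trace") == 0)
                func (path, closure);
        }
    }

    closedir (dir);
    return status;
}

/* Example callback: count the traces that were found. */
static void
count_trace (const char *path, void *closure)
{
    (void) path;
    ++*(int *) closure;
}
```

Passing count_trace with a pointer to an int counter reports how many traces
live beneath a directory; a real runner would replay each path instead.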
The build system has a singular failure whereby if a backend disappears
between one compile and the next, automake will fail to reconstruct the
Makefiles - resulting in a broken build. Attempt to fix this by removing
the build dir and recloning, which should work for any corrupt caches but
obviously will fail again at a true build failure.
As cairo-perf-diff will execute the current cairo-perf against historical
revisions, any newly introduced API must be protected in order to compile
on old versions.
Written by Vladimir Vukicevic to enable integration with Qt embedded
devices, this backend allows cairo code to target QPainter, and use
it as a source for other cairo backends.
This imports the sources from mozilla-central:
http://mxr.mozilla.org/mozilla-central/find?text=&kind=text&string=cairo-qpainter
renames them from cairo-qpainter to cairo-qt, and integrates the patch
by Oleg Romashin:
https://bugs.freedesktop.org/attachment.cgi?id=18953
And then attempts to restore 'make check' to full functionality.
However:
- C++ does not play well with the PLT symbol hiding, and leaks into the
global namespace. 'make check' fails at check-plt.sh
- Qt embeds a GUI into QApplication which it requires to construct any
QPainter drawable, i.e. used by the boilerplate to create a cairo-qt
surface, and this leaks fonts (cairo-ft-fonts no less) causing assertion
failures that all cairo objects are accounted for upon destruction.
[Updated by Chris Wilson]
Acked-by: Jeff Muizelaar <jeff@infidigm.net>
Acked-by: Carl Worth <cworth@cworth.org>
Using a null surface is a convenient method to measure the overhead of the
performance testing framework, so export it, albeit only as a test-surface,
so that it is available in development builds without polluting
distributed libraries.
It seems adding the explicit dependencies to encourage it to rebuild
components from other parts of the source tree removed the automagic
dependency of libcairoperf.la. So add it to the list. Maybe this is not
the correct solution, but it works again for now.
Gah, I presumed that the ':' separated options that required arguments
from stand-alone options. I was wrong. The ':' indicates that the
preceding option takes an argument. So add it back to -i.
Read names of traces to exclude from a file specified using -x on the
commandline, i.e.
$ ./cairo-perf-trace -x cairo-traces/tiny.exclude
This is a convenient method for me to exclude certain traces for
particular machines. For example tiny cannot run
firefox-36-20090609.trace as that has a greater working set than the
available RAM on tiny.
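A rough sketch of how such an exclude file might be read, assuming one trace
name per line, with blank lines ignored and '#' starting a comment. The
comment syntax, struct exclude_list, read_excludes() and is_excluded() are
all assumptions for illustration and may differ from what cairo-perf-trace
actually does.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the excluded trace names. */
struct exclude_list {
    char **names;
    size_t count;
};

/* Append every name found in file to list.
 * Returns 0 on success, -1 if memory runs out. */
static int
read_excludes (FILE *file, struct exclude_list *list)
{
    char line[4096];

    assert (file != NULL && list != NULL);

    while (fgets (line, sizeof line, file) != NULL) {
        char *s = line, *end, *copy, **names;
        size_t len;

        /* Cut the line at a comment marker or at the newline. */
        line[strcspn (line, "#\r\n")] = '\0';

        /* Trim leading and trailing blanks. */
        while (*s == ' ' || *s == '\t')
            s++;
        end = s + strlen (s);
        while (end > s && (end[-1] == ' ' || end[-1] == '\t'))
            *--end = '\0';
        if (*s == '\0')
            continue;

        len = strlen (s) + 1;
        copy = malloc (len);
        if (copy == NULL)
            return -1;
        memcpy (copy, s, len);

        names = realloc (list->names, (list->count + 1) * sizeof *names);
        if (names == NULL) {
            free (copy);
            return -1;
        }
        list->names = names;
        list->names[list->count++] = copy;
    }

    return 0;
}

/* Return 1 if name appears in the list, 0 otherwise. */
static int
is_excluded (const struct exclude_list *list, const char *name)
{
    size_t i;

    for (i = 0; i < list->count; i++)
        if (strcmp (list->names[i], name) == 0)
            return 1;
    return 0;
}
```

The benchmark loop would then skip any trace for which is_excluded() returns
non-zero before attempting to replay it.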
Promote the information on how to use cairo-perf-trace and include it
immediately after the details on cairo-perf. This should make it much
clearer how to replay the traces, and the difference between the two
benchmarks.
Rather than complicating cairo-perf to extend it to perform both micro-
and macro-benchmarks, simply run the two binaries in succession during
make perf.
For bonus points, consider whether we should hook cairo-perf-trace into
cairo-perf-diff.
When using fonts, circular references are established between the holdover
font caches and the interpreter, which need manual intervention via
cairo_script_interpreter_finish() to break.
Waiting for a long running benchmark can be very annoying, especially if
you just want a rough-and-ready result. So hook into SIGINT and stop the
current benchmark (after the end of the iteration) on the first ^C. A
second ^C within the same iteration will kill the program as before.
To save typing when creating macro-benchmarks, --profile disables
mark-dirty and caller-info and compresses the trace using LZMA. Not for
computers short on memory!
Use 'cairo-perf -v -r' to have both the summary output along with the raw
values. This gives a progress report whilst benchmarking, very reassuring
with long running tests.
There are synchronisation issues with similar surfaces (as only the
original target surface is synced) which interferes with making
performance comparisons. (There still may be some value should you be aware
of the limitations...)