Use "mild outliers" method to remove exceptional speed-ups and slow-downs
from the graph, so that the majority of information is not lost by the
scaling. Add the timing labels to the bars so that the true factor is
always presented.
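For reference, "mild outliers" here presumably means the usual Tukey
fences: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are dropped from the
scaling. A minimal sketch, with illustrative names rather than the actual
cairo-perf-chart internals:

#include <stdlib.h>

static int
compare_double (const void *a, const void *b)
{
    double da = *(const double *) a, db = *(const double *) b;
    return (da > db) - (da < db);
}

/* Crude quartile pick, purely for illustration: compute the fences
 * outside which a value counts as a "mild outlier". */
static void
mild_outlier_fences (double *v, int n, double *lo, double *hi)
{
    double q1, q3, iqr;

    qsort (v, n, sizeof (double), compare_double);
    q1 = v[n / 4];
    q3 = v[(3 * n) / 4];
    iqr = q3 - q1;

    *lo = q1 - 1.5 * iqr;
    *hi = q3 + 1.5 * iqr;
}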
Originally written by Vladimir Vukicevic to investigate using Skia for
Mozilla, it provides a nice integration with a rather interesting code
base. By hooking Skia underneath Cairo it allows us to directly compare
code paths... which is interesting.
[updated by Chris Wilson]
In order to handle 'cairo-perf-trace benchmark', we need to perform the
can_run? test on the directory name as opposed to the individual trace
names. Make it so.
I'd disabled this to look at cairo-qt performance, then forgot about it.
Be clean, clean up globals -- this should fix the huge performance loss
when running, in series, multiple backends that need separate font caches.
These traces run for much longer than the original synthetic benchmarks
and seek to replicate 'real-world' applications, so the warning that the
xserver and cairo-perf are not bound to any cpu is no longer warranted.
cairo-perf-chart takes multiple runs (currently it is limited to
prefiltered data sets) and pretty-prints a chart showing performance
improvements/regressions (in either ASCII or HTML), along with a
cairo-perf-chart.png.
Use cairo_stroke() to perform the equivalent of
spiral-rect-(pix|non)align-evenodd-fill. A useful comparison of stroking
versus filling, as we can assume the composition costs are similar.
Oops, we were accumulating paths during each spiral iteration and so the
tests were getting slower and slower and slower...
[And fix a couple of other instances of path accumulation.]
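A minimal sketch of the class of bug being fixed here (a hypothetical
loop, not the actual spiral test): unless the context's path is cleared
between iterations, every pass re-fills an ever-growing path.

#include <cairo.h>

/* Hypothetical timing loop: cairo_fill_preserve() keeps the path alive,
 * so without cairo_new_path() each iteration appends to, and re-fills,
 * all of the previously built subpaths. */
static void
run_iterations (cairo_t *cr, int loops)
{
    while (loops--) {
        cairo_new_path (cr);   /* the fix: start from an empty path */
        cairo_rectangle (cr, 10, 10, 100, 100);  /* stand-in for the spiral */
        cairo_fill_preserve (cr);
    }
    cairo_new_path (cr);
}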
Enabling 'FAST CLIP' appears to trigger an infinite loop, so disable it.
Enabling 'FAST FILL' has limited effect on performance, so disable it
whilst the basic Qt surface is improved.
By ensuring that tests take longer than a couple of seconds we eliminate
systematic errors in our measurements. However, we also effectively
eliminate the synchronisation overhead. To compensate, we attempt to
estimate the overhead by reporting the difference between a single
instance and the minimum averaged instance.
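In other words, if t_single is the time measured for a lone iteration and
t_i are the averaged per-iteration times of the repeated runs, the
reported overhead is roughly t_single - min_i(t_i). A tiny illustrative
helper (the name is hypothetical, not the cairo-perf code):

/* Estimate the per-iteration synchronisation overhead as the gap between
 * one isolated iteration and the best averaged run. */
static double
estimate_sync_overhead (double single_run, const double *avg_runs, int n)
{
    double min_avg = avg_runs[0];
    int i;

    for (i = 1; i < n; i++)
        if (avg_runs[i] < min_avg)
            min_avg = avg_runs[i];

    return single_run - min_avg;
}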
As the change and ranking are based on min_ticks, and as this can
sometimes deviate wildly from median_ticks, include min_ticks in the
output.
In particular it helps to explain cases like:
xlib-rgba rectangles_similar-rgba-mag_source-512 10.13 88.41% -> 5.77 0.19%: 1.50x slowdown
which becomes
xlib-rgba rectangles_similar-rgba-mag_source-512 3.83 (10.13 88.41%) -> 5.75 (5.77 0.19%): 1.50x slowdown
(Considering the poor standard deviation on the initial measurement, this
is more likely a sampling error than a true regression.)
Move the large slowdowns to the end. This has two pleasing effects:
1. There is symmetry between large speedups at the top, and large
slowdowns at the bottom, with long bars -> short bars -> long bars.
2. After a cairo-perf-diff run the largest slowdowns are immediately
visible on the console. What better way to flag performance
regressions?
Add a CAIRO_PERF_OUTPUT environment variable to cause cairo-perf to first
generate an output image in order to manually check that the test is
functioning correctly. This needs to be automated, so that we have
absolute confidence that the performance tests are not broken - but isn't
that the role of the test suite? If we were ever to publish cairo-perf
results, I would want some means of verification that the test-suite had
first been passed.
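A minimal sketch of what such a check might look like in a test harness;
the helper name, the filename scheme and the exact semantics of the
variable are assumptions, not the actual cairo-perf code:

#include <stdio.h>
#include <stdlib.h>
#include <cairo.h>

/* If CAIRO_PERF_OUTPUT is set, dump the surface the test just drew to a
 * PNG so that it can be inspected by hand. */
static void
maybe_write_output (cairo_surface_t *surface, const char *test_name)
{
    char filename[256];

    if (getenv ("CAIRO_PERF_OUTPUT") == NULL)
        return;

    snprintf (filename, sizeof (filename), "%s.out.png", test_name);
    cairo_surface_write_to_png (surface, filename);
}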
The warning is repeated in the error message if we fail to find any
traces, and now that we search a path it is likely that some elements do
not exist. Thus we annoy the user with irrelevant, non-fatal warnings.
Still looking for suggestions for the most appropriate home for the
system-wide cairo-traces dir...
The original stroke only contains a single subpath. Self-intersection
removal particularly affects strokes with multiple curved segments, so add
a path that encompasses both straight edges and rounded corners.
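For illustration, a plain rounded rectangle has the kind of shape meant
here (an assumption about the shape, not the literal test path): straight
edges joined by quarter-circle corners, which is then stroked rather than
filled.

#include <math.h>
#include <cairo.h>

/* Straight edges joined by quarter-circle arcs, so the stroker has to
 * handle both line and curve segments. */
static void
rounded_rect (cairo_t *cr, double x, double y, double w, double h, double r)
{
    cairo_new_sub_path (cr);
    cairo_arc (cr, x + w - r, y + r,     r, -M_PI / 2, 0);
    cairo_arc (cr, x + w - r, y + h - r, r, 0, M_PI / 2);
    cairo_arc (cr, x + r,     y + h - r, r, M_PI / 2, M_PI);
    cairo_arc (cr, x + r,     y + r,     r, M_PI, 3 * M_PI / 2);
    cairo_close_path (cr);
}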
Use the DRM interface to h/w accelerate composition on image surfaces.
The purpose of the backend is simply to explore what such a hardware
interface might look like and what benefits we might expect. The
use cases that might justify writing such custom backends are embedded
devices running a drm compositor like wayland - which would, for example,
allow one to write applications that seamlessly integrate accelerated,
dynamic, high-quality 2D graphics using Cairo, with advanced interaction
(e.g. smooth animations in the UI) driven by a clutter framework...
In this first step we introduce the fundamental wrapping of GEM for intel
and radeon chipsets, and, for comparison, gallium. No acceleration; all
we do is use buffer objects (that is, use the kernel memory manager) to
allocate images and simply use the fallback mechanism. This provides a
suitable base to start writing chip specific drivers.
Handling clip as part of the surface state, as opposed to being part of
the operation state, is cumbersome and a hindrance to providing true proxy
surface support. For example, the clip must be copied from the surface
onto the fallback image, but this was forgotten, causing undue hassle in
each backend. Another example is the contortion the meta surface
endures to ensure the clip is correctly recorded. By contrast passing the
clip along with the operation is quite simple and enables us to write
generic handlers for providing surface wrappers. (And in the future, we
should be able to write more esoteric wrappers, e.g. automatic 2x FSAA,
trivially.)
In brief, instead of the surface automatically applying the clip before
calling the backend, the backend can call into a generic helper to apply
clipping. For raster surfaces, clip regions are handled automatically as
part of the composite interface. For vector surfaces, a clip helper is
introduced to replay and callback into an intersect_clip_path() function
as necessary.
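As a rough sketch of the shape of this interface (every name below is
hypothetical, not cairo's actual internal API): the clip travels with the
operation, and a vector backend hands it to a generic helper that replays
the recorded clip paths through a callback.

/* Purely illustrative; none of these names are cairo's internals. */
typedef struct {
    const void *path;       /* stand-in for a recorded clip path */
    int         fill_rule;
} clip_path_entry_t;

typedef struct {
    clip_path_entry_t *entries;
    int                num_entries;
} clip_t;

typedef int (*intersect_clip_path_func_t) (void *surface,
                                           const clip_path_entry_t *entry);

/* Generic helper: walk the recorded clip paths and call back into the
 * backend for each one, as a vector surface would need. */
static int
clip_replay (const clip_t *clip, void *surface,
             intersect_clip_path_func_t intersect_clip_path)
{
    int i, status = 0;

    for (i = 0; i < clip->num_entries && status == 0; i++)
        status = intersect_clip_path (surface, &clip->entries[i]);

    return status;
}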
Whilst this is not primarily a performance-related change (the change
should just move the computation of the clip from the moment it is applied
by the user to the moment it is required by the backend), it is important
to track any potential regression:
ppc:
Speedups
========
image-rgba evolution-20090607-0 1026085.22 0.18% -> 672972.07 0.77%: 1.52x speedup
▌
image-rgba evolution-20090618-0 680579.98 0.12% -> 573237.66 0.16%: 1.19x speedup
▎
image-rgba swfdec-fill-rate-4xaa-0 460296.92 0.36% -> 407464.63 0.42%: 1.13x speedup
▏
image-rgba swfdec-fill-rate-2xaa-0 128431.95 0.47% -> 115051.86 0.42%: 1.12x speedup
▏
Slowdowns
=========
image-rgba firefox-periodic-table-0 56837.61 0.78% -> 66055.17 3.20%: 1.09x slowdown
▏
After a run, it can be useful to reprint the results, so add
cairo-perf-print to perform that task.
For the future, I'd like to move the performance suite over to the
git/perf style of single, multi-function binary.
The sequence of operations that I typically perform is:
./cairo-perf-trace -r -v -i 6 > `git describe`.`hostname`.perf
./cairo-perf-diff-files REVA REVB
./cairo-perf-print REVA
./cairo-perf-compare-backends REVA
which misses the caching available with cairo-perf-diff. 'make html' is
almost what I want, but still too prescriptive. However, that does need to
be addressed for continuous performance monitoring.
Along the perf lines, that sequence of operations becomes:
./cairo-perf record -i 6
./cairo-perf report
./cairo-perf report REVA REVB
./cairo-perf report --backends="image,xlib,gl" REVA REVB
./cairo-perf report --html REVA REVB
Also, we want to think about installing the cairo-perf binary, so we need
to differentiate when it is run inside a git checkout.
In view of sharing traces between multiple builders, add some system-wide
directories to the search path. This should be refined to a single
canonical location before release.
The meta-surface is a vital tool for recording a trace of drawing commands
in memory. As such it is used throughout cairo.
The value of such a surface is immediately obvious and should be
applicable to many applications. The first such case is cairo-test-trace,
which wants to record the entire graph of drawing commands that affect a
surface in the event of a failure.
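For context, this same idea later surfaced as the public recording-surface
API; a minimal example of recording drawing commands in memory and
replaying them onto a real target (using that later public API purely for
illustration - the meta-surface itself is internal):

#include <cairo.h>

/* Record a batch of drawing commands, then replay them onto 'target' by
 * using the recording as a source pattern. */
static void
record_and_replay (cairo_surface_t *target)
{
    cairo_surface_t *recording;
    cairo_t *cr;

    recording = cairo_recording_surface_create (CAIRO_CONTENT_COLOR_ALPHA,
                                                NULL);

    cr = cairo_create (recording);
    cairo_set_source_rgb (cr, 1, 0, 0);
    cairo_paint (cr);
    cairo_destroy (cr);

    cr = cairo_create (target);
    cairo_set_source_surface (cr, recording, 0, 0);
    cairo_paint (cr);
    cairo_destroy (cr);

    cairo_surface_destroy (recording);
}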
I have an idea to categorise traces within their own subdirectories, and
so for convenience I've added path walking to cairo-perf-trace.
Principally this should allow for forests of symlinks of all types.
The build system has a singular failure whereby, if a backend disappears
between one compile and the next, automake will fail to reconstruct the
Makefiles - resulting in a broken build. Attempt to fix this by removing
the build dir and recloning, which should work for any corrupt caches but
obviously will fail again at a true build failure.
As cairo-perf-diff will execute the current cairo-perf against historical
revisions, any introduced api must be protected in order to compile
against old versions.
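For example, any test or boilerplate code that calls newly added API has
to be guarded by the version that introduced it, so that it still builds
against the historical revisions; a minimal sketch (the 1.9.2 cut-off and
the operator chosen are illustrative):

#include <cairo.h>

/* cairo-perf is compiled against whichever historical revision
 * cairo-perf-diff has checked out, so guard new entry points. */
static void
set_operator_compat (cairo_t *cr)
{
#if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 9, 2)
    cairo_set_operator (cr, CAIRO_OPERATOR_MULTIPLY);
#else
    cairo_set_operator (cr, CAIRO_OPERATOR_OVER);
#endif
}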
Written by Vladimir Vukicevic to enable integration with Qt embedded
devices, this backend allows cairo code to target QPainter, and use
it as a source for other cairo backends.
This imports the sources from mozilla-central:
http://mxr.mozilla.org/mozilla-central/find?text=&kind=text&string=cairo-qpainter
renames them from cairo-qpainter to cairo-qt, and integrates the patch
by Oleg Romashin:
https://bugs.freedesktop.org/attachment.cgi?id=18953
And then attempts to restore 'make check' to full functionality.
However:
- C++ does not play well with the PLT symbol hiding, and leaks into the
global namespace. 'make check' fails at check-plt.sh
- Qt embeds a GUI into QApplication, which it requires in order to
construct any QPainter drawable (i.e. as used by the boilerplate to create
a cairo-qt surface), and this leaks fonts (cairo-ft-fonts no less),
causing failures of the assertion that all cairo objects are accounted
for upon destruction.
[Updated by Chris Wilson]
Acked-by: Jeff Muizelaar <jeff@infidigm.net>
Acked-by: Carl Worth <cworth@cworth.org>