In order to catch infinite loops whilst replaying and converting vector
surfaces to images (via external renderers) we need to also install
alarms around the calls to finish() and get_image().
The stroker code is liable to wedge when passed
dash patterns which don't advance the dash offset
due to limited precision arithmetic. This test
attempts to hit all the places in the stroker where
that can happen.
Reported on the cairo mailing list by Hans Breuer:
http://lists.cairographics.org/archives/cairo/2009-June/017506.html
As pointed out by Andrea, and now tested by test/degenerate-curve-to, a
curve-to that begins and ends on the same point may extend further due to
its control points. It can not be simply replaced with a degenerate
line-to. In order to do so we will need more extensive degeneracy
checking, ala _cairo_spline_init().
Andrea Canciani (ranma42) pointed out a second bug in the curve-to as
line-to optimisation, that is a curve starting and finishing on the same
point is not necessarily degenerate. This test case exercises 5 different
curves that start and end on the same point.
Looks like ghostscript has a similar buggy optimization like we
just fixed in cairo. I'm just waiting on a new bugzilla account
from bugs.ghostscript.com after which I plan to report the bug
there.
This test anticipates a future optimization, (already pushed
upstream but not pulled yet), with a buggy implementation
of replacing curve_to with line_to.
In a discussion on IRC, attention was drawn to a dubious comment in
_cairo_xlib_show_glyphs() - the precise details of which have passed
out of the collective memory.
In bed2701, I removed the explicit finish of the paginated's target
surface, since as a wrapper it did not explicitly own the surface and so
should not be calling finish(). However, we do need to propagate errors
from the backing surface, such as PDF, which will only be detected during
the implicit finish in the final destroy. So check to see it we hold the
last reference to the target (and so our destroy will trigger the implicit
finish) and call the finish explicitly and check the error status
afterwards.
If we know the CPU can read pointers atomically, then we can simply peek
into the cached_xrender_formats to see if we already have a match, before
taking the mutex. (Acquiring the mutex here is a minor nuisance that
appears on the callgrind profiles.)
In order to efficient store small images, we need to pack them into a
large texture. The rtree handles allocation of small rectangles out of a
much larger whole. As well as tracking free rectangles, it can also be
used to note which parts of the texture are 'pinned' -- that is have
operations currently pending and so can not be modified until that batch
of operations have been flushed. When the rtree is full, i.e. there is no
single free rectangle to accommodate the allocation request, it will
randomly evict an unpinned block large enough to fit the request. The
block may comprise just a single glyph, or a subtree of many glyphs. This
may not be the best strategy, but it is an effective start.
The fallback for degenerate splines is to treat them as a line-to, so if
the spline is straight, we can just replace it with a simple line-to by
treating as degenerate.
After diverting the pointers to accommodate lazy decompressing of the
source, the bytecode pointer was left pointing to the original location
that had already been freed - thus passing an invalid block to FreeType
and unsurprisingly then, blowing up.
_cairo_pdf_surface_emit_to_unicode_stream() was reserving glyph 0 for
the .notdef glyph (as required by TrueType/CFF/Type1 fallback
fonts). However Type 3 fonts do not reserve glyph 0 for .notdef and
need glyph 0 to be included in the toUnicode stream.
http://lists.cairographics.org/archives/cairo/2009-July/017731.html
On slow machines the call to pixman_fill_sse2() on similar surfaces that
we know are already zeroed takes a significant amount of time [12.77% of
the profile for a firefox trace, cf to just 3% of the profile is spent
inside memset].
Rather than solve why the pixman_fill_sse2() is so slow, simply skip the
redundant clears.
Optimistically check to see if there is any outstanding work before
checking under the mutex. We don't care if we occasionally do not run the
queue this time due to contention, since we will most likely check again
very shortly or clean up with the display.
user-font-rescale copied unitialized values from the widths array into
the desired array. Although these corresponded to unused glyphs and so
were never used during the rendering, the values may have been illegal
causing FPE as they were copied.
Use the DRM interface to h/w accelerate composition on image surfaces.
The purpose of the backend is simply to explore what such a hardware
interface might look like and what benefits we might expect. The
use case that might justify writing such custom backends are embedded
devices running a drm compositor like wayland - which would, for example,
allow one to write applications that seamlessly integrated accelerated,
dynamic, high quality 2D graphics using Cairo with advanced interaction
(e.g. smooth animations in the UI) driven by a clutter framework...
In this first step we introduce the fundamental wrapping of GEM for intel
and radeon chipsets, and, for comparison, gallium. No acceleration, all
we do is use buffer objects (that is use the kernel memory manager) to
allocate images and simply use the fallback mechanism. This provides a
suitable base to start writing chip specific drivers.
Handling clip as part of the surface state, as opposed to being part of
the operation state, is cumbersome and a hindrance to providing true proxy
surface support. For example, the clip must be copied from the surface
onto the fallback image, but this was forgotten causing undue hassle in
each backend. Another example is the contortion the meta surface
endures to ensure the clip is correctly recorded. By contrast passing the
clip along with the operation is quite simple and enables us to write
generic handlers for providing surface wrappers. (And in the future, we
should be able to write more esoteric wrappers, e.g. automatic 2x FSAA,
trivially.)
In brief, instead of the surface automatically applying the clip before
calling the backend, the backend can call into a generic helper to apply
clipping. For raster surfaces, clip regions are handled automatically as
part of the composite interface. For vector surfaces, a clip helper is
introduced to replay and callback into an intersect_clip_path() function
as necessary.
Whilst this is not primarily a performance related change (the change
should just move the computation of the clip from the moment it is applied
by the user to the moment it is required by the backend), it is important
to track any potential regression:
ppc:
Speedups
========
image-rgba evolution-20090607-0 1026085.22 0.18% -> 672972.07 0.77%: 1.52x speedup
▌
image-rgba evolution-20090618-0 680579.98 0.12% -> 573237.66 0.16%: 1.19x speedup
▎
image-rgba swfdec-fill-rate-4xaa-0 460296.92 0.36% -> 407464.63 0.42%: 1.13x speedup
▏
image-rgba swfdec-fill-rate-2xaa-0 128431.95 0.47% -> 115051.86 0.42%: 1.12x speedup
▏
Slowdowns
=========
image-rgba firefox-periodic-table-0 56837.61 0.78% -> 66055.17 3.20%: 1.09x slowdown
▏
The span renderer uses ARB_vertex_buffer_object which was included into
the core as part of OpenGL 1.5. We failed to check for the required version
during initialisation, and to my surprise the i915 can only support OpenGL
1.4 as it lacks ARB_occlusion_query. So just use the ARB extension instead
which is present on i915.
After a run, it can be useful to reprint the results, so add
cairo-perf-print to perform that task.
For the future, I'd like to move the performance suite over to the
git/perf style of single, multi-function binary.
The sequence of operations that I typically do are:
./cairo-perf-trace -r -v -i 6 > `git describe`.`hostname`.perf
./cairo-perf-diff-files REVA REVB
./cairo-perf-print REVA
./cairo-perf-compare-backends REVA
which misses the caching available with cairo-perf-diff. 'make html' is
almost what I want, but still too prescriptive. However, that does need to
be addressed for continuous performance monitoring.
Along the perf lines, those sequence of operations become:
./cairo-perf record -i 6
./cairo-perf report
./cairo-perf report REVA REVB
./cairo-perf report --backends="image,xlib,gl" REVA REVB
./cairo-perf report --html REVA REVB
Also we want to think about installing the cairo-perf binary. So we want
to differentiate when run inside a git checkout.