We can offload the creation of gradients to servers that support RENDER 0.10
and later. This greatly reduces the amount of traffic we need to send over
our display connection, as the gradient patterns are much smaller than the
full image. Even if the server falls back to using pixman, performance
should be improved by the reduced transport overhead. Furthermore, this is a
prerequisite for enabling hardware-accelerated gradients with the xlib backend.
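For illustration, a minimal sketch of what the server-side path amounts to
with the XRender API - the stop positions and colours here are made up and
not taken from cairo's actual code paths:

    #include <X11/Xlib.h>
    #include <X11/extensions/Xrender.h>

    /* Build a two-stop linear gradient Picture on the server instead of
     * rasterising the gradient locally and uploading the full image. */
    static Picture
    create_linear_gradient (Display *dpy)
    {
	XLinearGradient grad;
	XFixed stops[2];
	XRenderColor colors[2];

	grad.p1.x = XDoubleToFixed (0.0);
	grad.p1.y = XDoubleToFixed (0.0);
	grad.p2.x = XDoubleToFixed (0.0);
	grad.p2.y = XDoubleToFixed (512.0);

	stops[0] = XDoubleToFixed (0.0);
	stops[1] = XDoubleToFixed (1.0);

	colors[0].red = 0xffff; colors[0].green = 0; colors[0].blue = 0;
	colors[0].alpha = 0xffff;
	colors[1].red = 0; colors[1].green = 0; colors[1].blue = 0xffff;
	colors[1].alpha = 0xffff;

	/* Requires RENDER 0.10 or later on the server. */
	return XRenderCreateLinearGradient (dpy, &grad, stops, colors, 2);
    }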
Running cairo-perf-trace on tiny, Celeron/i915:
before: firefox-20090601 211.585
after: firefox-20090601 270.939
and on tiger, CoreDuo/nvidia:
before: firefox-20090601 70.143
after: firefox-20090601 87.326
where linear gradients are used extensively throughout the GTK+ theme.
Not quite the result I was expecting!
In particular, looking at tiny:
xlib-rgba paint-with-alpha_linear-rgba_over-512 47.11 (47.16 0.05%) -> 123.42 (123.72 0.13%): 2.62x slowdown
█▋
xlib-rgba paint-with-alpha_linear3-rgba_over-512 47.27 (47.32 0.04%) -> 123.78 (124.04 0.13%): 2.62x slowdown
█▋
xlib-rgba paint-with-alpha_linear-rgb_over-512 47.19 (47.21 0.02%) -> 123.37 (123.70 0.13%): 2.61x slowdown
█▋
xlib-rgba paint-with-alpha_linear3-rgb_over-512 47.30 (47.31 0.04%) -> 123.52 (123.62 0.09%): 2.61x slowdown
█▋
xlib-rgba paint_linear3-rgb_over-512 47.29 (47.32 0.05%) -> 118.95 (119.60 0.29%): 2.52x slowdown
█▌
xlib-rgba paint_linear-rgba_over-512 47.14 (47.17 0.06%) -> 116.76 (117.06 0.16%): 2.48x slowdown
█▌
xlib-rgba paint_linear3-rgba_over-512 47.32 (47.34 0.04%) -> 116.85 (116.98 0.05%): 2.47x slowdown
█▌
xlib-rgba paint_linear-rgb_over-512 47.15 (47.19 0.03%) -> 114.08 (114.55 0.20%): 2.42x slowdown
█▍
xlib-rgba paint-with-alpha_radial-rgb_over-512 117.25 (119.43 1.21%) -> 194.36 (194.73 0.09%): 1.66x slowdown
▋
xlib-rgba paint-with-alpha_radial-rgba_over-512 117.22 (117.26 0.02%) -> 193.81 (194.17 0.11%): 1.65x slowdown
▋
xlib-rgba paint_radial-rgba_over-512 117.23 (117.26 0.02%) -> 186.35 (186.41 0.03%): 1.59x slowdown
▋
xlib-rgba paint_radial-rgb_over-512 117.23 (117.27 0.02%) -> 184.14 (184.62 1.51%): 1.57x slowdown
▋
Before 1.10, we may choose to disable server-side gradients for the
current crop of Xorg servers, similar to the extended repeat modes.
[Updated by Chris Wilson. All bugs are his.]
The extended repeat modes were only introduced in RENDER 0.10, so disable
them if the server reports an earlier version. This is in addition to
disabling the repeat modes if we know (guess!) the server to have a buggy
implementation.
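As a sketch, the version gate this implies boils down to something like the
following (the helper name is illustrative, the cutoff follows the text
above):

    #include <X11/Xlib.h>
    #include <X11/extensions/Xrender.h>

    /* Extended repeat modes (and server-side gradients) need RENDER 0.10+. */
    static Bool
    render_has_extended_repeat (Display *dpy)
    {
	int major = 0, minor = 0;

	if (! XRenderQueryVersion (dpy, &major, &minor))
	    return False;

	return major > 0 || (major == 0 && minor >= 10);
    }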
As the change and ranking are based on min_ticks, and as this can
sometimes deviate wildly from median_ticks, include min_ticks in the
output.
In particular it helps to explain cases like:
xlib-rgba rectangles_similar-rgba-mag_source-512 10.13 88.41% -> 5.77 0.19%: 1.50x slowdown
which becomes
xlib-rgba rectangles_similar-rgba-mag_source-512 3.83 (10.13 88.41%) -> 5.75 (5.77 0.19%): 1.50x slowdown
(Considering the poor standard deviation on the initial measurement, this
is more likely a sampling error than a true regression.)
Move the large slowdowns to the end. This has two pleasing effects:
1. There is symmetry between large speedups at the top, and large
slowdowns at the bottom, with long bars -> short bars -> long bars.
2. After a cairo-perf-diff run the largest slowdowns are immediately
visible on the console. What better way to flag performance
regressions?
Add a CAIRO_PERF_OUTPUT environment variable to cause cairo-perf to first
generate an output image, so that we can manually check that the test is
functioning correctly. This needs to be automated so that we have
absolute confidence that the performance tests are not broken - but isn't
that the role of the test suite? If we were ever to publish cairo-perf
results, I would want some means of verifying that the test suite had
first been passed.
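A minimal sketch of what such a check might look like, assuming an image
surface and that cairo's PNG support is built in (the output filename is
illustrative):

    #include <stdlib.h>
    #include <cairo.h>

    /* If CAIRO_PERF_OUTPUT is set, dump the test's result so it can be
     * inspected by eye before trusting the timings. */
    static void
    maybe_write_output (cairo_surface_t *surface)
    {
	if (getenv ("CAIRO_PERF_OUTPUT") != NULL)
	    cairo_surface_write_to_png (surface, "cairo-perf-output.png");
    }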
The warning is repeated in the error message if we fail to find any
traces, and now that we search a path it is likely that some of its
elements do not exist, so we would annoy the user with irrelevant,
non-fatal warnings.
Still looking for suggestions for the most appropriate home for the
system-wide cairo-traces dir...
The original stroke only contains a single subpath. Self-intersection
removal particularly affects strokes with multiple curved segments, so add
a path that encompasses both straight edges and rounded corners.
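A sketch of the kind of path the test adds, mixing straight edges with arcs
so the stroker sees both (the coordinates are made up):

    #include <math.h>
    #include <cairo.h>

    /* A rounded rectangle: four straight edges joined by four arcs,
     * giving the stroker both line and curved segments. */
    static void
    rounded_rect (cairo_t *cr, double x, double y, double w, double h, double r)
    {
	cairo_new_sub_path (cr);
	cairo_arc (cr, x + w - r, y + r,     r, -M_PI / 2, 0);
	cairo_arc (cr, x + w - r, y + h - r, r, 0, M_PI / 2);
	cairo_arc (cr, x + r,     y + h - r, r, M_PI / 2, M_PI);
	cairo_arc (cr, x + r,     y + r,     r, M_PI, 3 * M_PI / 2);
	cairo_close_path (cr);
    }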
Andrea Canciani (ranma42) found another instance of my broken 'degenerate'
curve-to as line-to optimisation. All I can say is that when I do something
wrong, at least I'm consistent!
This test case highlights the bug in the rel-curve-to path.
In order to catch infinite loops whilst replaying and converting vector
surfaces to images (via external renderers), we also need to install
alarms around the calls to finish() and get_image().
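As a rough sketch of the idea, assuming the handler simply gives up (the
real handling in the test harness is more involved):

    #include <signal.h>
    #include <unistd.h>
    #include <cairo.h>

    static void
    alarm_handler (int sig)
    {
	(void) sig;
	_exit (1);	/* give up on a wedged external renderer */
    }

    /* Finish the surface under a watchdog so an infinite loop in the
     * external renderer cannot hang the whole run. */
    static void
    finish_with_timeout (cairo_surface_t *surface, unsigned int seconds)
    {
	signal (SIGALRM, alarm_handler);
	alarm (seconds);
	cairo_surface_finish (surface);
	alarm (0);
    }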
The stroker code is liable to wedge when passed dash patterns which don't
advance the dash offset due to limited-precision arithmetic. This test
attempts to hit all the places in the stroker where that can happen.
Reported on the cairo mailing list by Hans Breuer:
http://lists.cairographics.org/archives/cairo/2009-June/017506.html
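For illustration, a sketch of the kind of dash pattern that can trip this
up - dash lengths so small that stepping through them may not advance the
offset in limited-precision arithmetic (the values are made up):

    #include <cairo.h>

    /* Tiny on/off lengths can round to a zero advance in the stroker's
     * fixed-point arithmetic. */
    static void
    degenerate_dash (cairo_t *cr)
    {
	const double dashes[] = { 1e-6, 1e-6 };

	cairo_set_dash (cr, dashes, 2, 0.0);
	cairo_move_to (cr, 0, 0);
	cairo_line_to (cr, 100, 100);
	cairo_stroke (cr);
    }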
As pointed out by Andrea, and now tested by test/degenerate-curve-to, a
curve-to that begins and ends on the same point may extend further due to
its control points. It cannot simply be replaced with a degenerate
line-to. In order to do so we would need more extensive degeneracy
checking, a la _cairo_spline_init().
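A sketch of the stricter check implied here - a curve collapses to a point
only when its control points coincide with its endpoints as well (the point
type and helper names are illustrative, not cairo's internals):

    typedef struct { double x, y; } point_t;

    static int
    points_equal (const point_t *a, const point_t *b)
    {
	return a->x == b->x && a->y == b->y;
    }

    /* Matching endpoints alone are not enough: the control points can
     * still pull the curve away from them. */
    static int
    curve_is_degenerate (const point_t *p0, const point_t *p1,
			 const point_t *p2, const point_t *p3)
    {
	return points_equal (p0, p1) &&
	       points_equal (p0, p2) &&
	       points_equal (p0, p3);
    }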
Andrea Canciani (ranma42) pointed out a second bug in the curve-to as
line-to optimisation, namely that a curve starting and finishing on the
same point is not necessarily degenerate. This test case exercises 5
different curves that start and end on the same point.
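For illustration, a sketch of one such curve: the endpoints coincide, yet
the control points give it real extent (the coordinates are made up, not
the test's actual values):

    #include <cairo.h>

    /* Starts and ends at (50, 50) but bulges out to the control points,
     * so treating it as a degenerate line-to would lose the loop. */
    static void
    closed_loop_curve (cairo_t *cr)
    {
	cairo_move_to  (cr, 50, 50);
	cairo_curve_to (cr, 150, 0, 150, 100, 50, 50);
	cairo_stroke   (cr);
    }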
Looks like ghostscript has a buggy optimization similar to the one we
just fixed in cairo. I'm just waiting on a new bugzilla account
from bugs.ghostscript.com, after which I plan to report the bug
there.
This test anticipates a future optimization (already pushed
upstream but not yet pulled) that has a buggy implementation
of replacing curve_to with line_to.
In a discussion on IRC, attention was drawn to a dubious comment in
_cairo_xlib_show_glyphs() - the precise details of which have passed
out of the collective memory.
In bed2701, I removed the explicit finish of the paginated's target
surface, since as a wrapper it did not explicitly own the surface and so
should not be calling finish(). However, we do need to propagate errors
from the backing surface, such as PDF, which will only be detected during
the implicit finish in the final destroy. So check to see if we hold the
last reference to the target (and so our destroy will trigger the implicit
finish), call finish() explicitly and check the error status
afterwards.
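A sketch of the check described, using cairo's public reference-count
accessor as a stand-in for the internal reference machinery:

    #include <cairo.h>

    /* If we hold the last reference, our destroy would trigger the
     * implicit finish too late to report errors, so finish explicitly
     * and pick up the status first. */
    static cairo_status_t
    finish_and_destroy_target (cairo_surface_t *target)
    {
	cairo_status_t status = CAIRO_STATUS_SUCCESS;

	if (cairo_surface_get_reference_count (target) == 1) {
	    cairo_surface_finish (target);
	    status = cairo_surface_status (target);
	}

	cairo_surface_destroy (target);
	return status;
    }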
If we know the CPU can read pointers atomically, then we can simply peek
into the cached_xrender_formats to see if we already have a match, before
taking the mutex. (Acquiring the mutex here is a minor nuisance that
appears on the callgrind profiles.)
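A sketch of the lock-avoiding fast path, assuming pointer loads are atomic
on the target CPU; a single-entry cache stands in here for the real
cached_xrender_formats array, and the names are illustrative:

    #include <pthread.h>
    #include <X11/Xlib.h>
    #include <X11/extensions/Xrender.h>

    static pthread_mutex_t cache_mutex = PTHREAD_MUTEX_INITIALIZER;
    static XRenderPictFormat *cached_format;

    static XRenderPictFormat *
    get_cached_format (Display *dpy, int std_format)
    {
	/* Fast path: a plain read, relying on atomic pointer loads. */
	XRenderPictFormat *format = cached_format;
	if (format != NULL)
	    return format;

	/* Slow path: take the mutex, re-check, then fill the cache. */
	pthread_mutex_lock (&cache_mutex);
	format = cached_format;
	if (format == NULL) {
	    format = XRenderFindStandardFormat (dpy, std_format);
	    cached_format = format;
	}
	pthread_mutex_unlock (&cache_mutex);

	return format;
    }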
In order to efficiently store small images, we need to pack them into a
large texture. The rtree handles allocation of small rectangles out of a
much larger whole. As well as tracking free rectangles, it can also be
used to note which parts of the texture are 'pinned' -- that is, they have
operations currently pending and so cannot be modified until that batch
of operations has been flushed. When the rtree is full, i.e. there is no
single free rectangle to accommodate the allocation request, it will
randomly evict an unpinned block large enough to fit the request. The
block may comprise just a single glyph, or a subtree of many glyphs. This
may not be the best strategy, but it is an effective start.
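A rough sketch of the structure this describes; the field names and the
eviction helper are illustrative, not the actual implementation:

    typedef struct rtree_node {
	struct rtree_node *children[4]; /* all NULL for a leaf */
	int x, y, width, height;
	int pinned;	/* part of an unflushed batch: must not be evicted */
    } rtree_node_t;

    /* A node (or subtree) may be evicted only if neither it nor any of
     * its children is pinned by a pending batch of operations. */
    static int
    rtree_node_is_evictable (const rtree_node_t *node)
    {
	int i;

	if (node->pinned)
	    return 0;

	for (i = 0; i < 4 && node->children[i] != NULL; i++)
	    if (! rtree_node_is_evictable (node->children[i]))
		return 0;

	return 1;
    }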
The fallback for degenerate splines is to treat them as a line-to, so if
the spline is straight, we can just replace it with a simple line-to by
treating it as degenerate.
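A sketch of the straightness test this implies - both control points must
be collinear with the endpoints (the types are illustrative; on cairo's
fixed-point coordinates the comparison can be exact, whereas doubles would
want a small tolerance):

    typedef struct { double x, y; } point_t;

    /* Twice the signed area of triangle (a, b, c); zero iff collinear. */
    static double
    cross (const point_t *a, const point_t *b, const point_t *c)
    {
	return (b->x - a->x) * (c->y - a->y) - (b->y - a->y) * (c->x - a->x);
    }

    /* A cubic is straight when both control points lie on the line
     * through its endpoints; it can then be emitted as a line-to. */
    static int
    spline_is_straight (const point_t *p0, const point_t *p1,
			const point_t *p2, const point_t *p3)
    {
	return cross (p0, p3, p1) == 0.0 && cross (p0, p3, p2) == 0.0;
    }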
After diverting the pointers to accommodate lazy decompression of the
source, the bytecode pointer was left pointing to the original location,
which had already been freed - thus passing an invalid block to FreeType
and, unsurprisingly, blowing up.
_cairo_pdf_surface_emit_to_unicode_stream() was reserving glyph 0 for
the .notdef glyph (as required by TrueType/CFF/Type1 fallback
fonts). However Type 3 fonts do not reserve glyph 0 for .notdef and
need glyph 0 to be included in the toUnicode stream.
http://lists.cairographics.org/archives/cairo/2009-July/017731.html
On slow machines the call to pixman_fill_sse2() on similar surfaces that
we know are already zeroed takes a significant amount of time [12.77% of
the profile for a firefox trace, compared to just 3% of the profile spent
inside memset].
Rather than work out why pixman_fill_sse2() is so slow, simply skip the
redundant clears.
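A sketch of the shortcut, assuming a flag on the surface records whether
its contents are already known to be zero (the flag and type names are
illustrative, and plain memset stands in for the pixman fill):

    #include <string.h>
    #include <stddef.h>

    typedef struct {
	unsigned char *data;
	size_t stride, height;
	int is_clear;	/* contents already known to be all zeroes */
    } image_t;

    /* Skip the (surprisingly expensive) fill when the surface is
     * freshly created and therefore already zeroed. */
    static void
    clear_image (image_t *image)
    {
	if (image->is_clear)
	    return;

	memset (image->data, 0, image->stride * image->height);
	image->is_clear = 1;
    }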
Optimistically check to see if there is any outstanding work before
checking under the mutex. We don't care if we occasionally fail to run the
queue this time due to contention, since we will most likely check again
very shortly, or clean up along with the display.
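As a sketch, the fast path amounts to a racy peek at the queue head before
taking the lock (the types and names are illustrative, not the xlib
backend's own):

    #include <pthread.h>
    #include <stddef.h>

    typedef struct work_item {
	struct work_item *next;
	void (*fn) (void *closure);
	void *closure;
    } work_item_t;

    static pthread_mutex_t queue_mutex = PTHREAD_MUTEX_INITIALIZER;
    static work_item_t *queue_head;

    static void
    process_work_queue (void)
    {
	work_item_t *item;

	/* Optimistic, unlocked peek: if it misses work due to a race,
	 * the next call (or the display cleanup) will pick it up. */
	if (queue_head == NULL)
	    return;

	pthread_mutex_lock (&queue_mutex);
	item = queue_head;
	queue_head = NULL;
	pthread_mutex_unlock (&queue_mutex);

	while (item != NULL) {
	    work_item_t *next = item->next;
	    item->fn (item->closure);
	    item = next;
	}
    }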
user-font-rescale copied uninitialized values from the widths array into
the desired array. Although these corresponded to unused glyphs and so
were never used during the rendering, the values may have been illegal,
causing an FPE as they were copied.