Behdad pointed out that fprintf() returns a value so that we could simply
use the comma operator to return the correct value instead of the
expression-block gcc-ism.
As the associated is now explicitly the font-face used to create the font
by the user, whereas what we require is the current implementation
(quartz) font.
If a subsequent PATH_OP is just a continuation of the previous line, i.e.
it has the same gradient, then just replace the end-point of the previous
line with the new point rather than adding a new operation. Surprisingly
this occurs in the wild, but the main motivation is a future optimisation
to reduce the number of intersections during stroke-to-path.
Use hypot() instead of open-coding sqrt(x*x + y*y). In theory, the
compiler could emit highly efficient code. In practice it's slower, but
more likely to be more accurate -- but the difference over a bare sqrt()
is likely not to be perceptible.
Bug 23067 -- using clear drawing operator crashes printing
http://bugs.freedesktop.org/show_bug.cgi?id=23067
Here we were hitting an assert within the paginated surface after creating
a zero sized fallback image, due to the paginated surface being created
with an x fallback resolution of 0 dpi (by
_gtk_printer_create_cairo_surface(), gtk/gtkprinter.c:924).
Avoid the bug by guarding against bad input to
cairo_surface_set_fallback_resolution() which also allows us to identity
the invalid caller.
A problem that does not present itself whilst using spans to intermediate
masks is that the tor-span-convertor did not emit the empty rows. When
compositing directly using spans with an unbounded operator this was
causing rendering artefacts, for overlapping-glyphs and the gl backend.
Prior to introduction of the buggy members to the surface, we obviously
cannot set them. However, the boilerplate code is meant to compile against
older revisions of the library so we need to check for the existence prior
to use.
We can offload creation of gradients to server that support RENDER 0.10
and later. This greatly reduces the amount of traffic we need to send over
our display connection as the gradient patterns are much smaller than the
full image. Even if the server fallbacks to using pixman, performance
should be improved by the reduced transport overhead. Furthermore this is a
requisite to enable hardware accelerated gradients with the xlib backend.
Running cairo-perf-trace on tiny, Celeron/i915:
before: firefox-20090601 211.585
after: firefox-20090601 270.939
and on tiger, CoreDuo/nvidia:
before: firefox-20090601 70.143
after: firefox-20090601 87.326
where linear gradients are used extensively throughout the GTK+ theme.
Not quite the result I was expecting!
In particular, looking at tiny:
xlib-rgba paint-with-alpha_linear-rgba_over-512 47.11 (47.16 0.05%) -> 123.42 (123.72 0.13%): 2.62x slowdown
█▋
xlib-rgba paint-with-alpha_linear3-rgba_over-512 47.27 (47.32 0.04%) -> 123.78 (124.04 0.13%): 2.62x slowdown
█▋
xlib-rgba paint-with-alpha_linear-rgb_over-512 47.19 (47.21 0.02%) -> 123.37 (123.70 0.13%): 2.61x slowdown
█▋
xlib-rgba paint-with-alpha_linear3-rgb_over-512 47.30 (47.31 0.04%) -> 123.52 (123.62 0.09%): 2.61x slowdown
█▋
xlib-rgba paint_linear3-rgb_over-512 47.29 (47.32 0.05%) -> 118.95 (119.60 0.29%): 2.52x slowdown
█▌
xlib-rgba paint_linear-rgba_over-512 47.14 (47.17 0.06%) -> 116.76 (117.06 0.16%): 2.48x slowdown
█▌
xlib-rgba paint_linear3-rgba_over-512 47.32 (47.34 0.04%) -> 116.85 (116.98 0.05%): 2.47x slowdown
█▌
xlib-rgba paint_linear-rgb_over-512 47.15 (47.19 0.03%) -> 114.08 (114.55 0.20%): 2.42x slowdown
█▍
xlib-rgba paint-with-alpha_radial-rgb_over-512 117.25 (119.43 1.21%) -> 194.36 (194.73 0.09%): 1.66x slowdown
▋
xlib-rgba paint-with-alpha_radial-rgba_over-512 117.22 (117.26 0.02%) -> 193.81 (194.17 0.11%): 1.65x slowdown
▋
xlib-rgba paint_radial-rgba_over-512 117.23 (117.26 0.02%) -> 186.35 (186.41 0.03%): 1.59x slowdown
▋
xlib-rgba paint_radial-rgb_over-512 117.23 (117.27 0.02%) -> 184.14 (184.62 1.51%): 1.57x slowdown
▋
Before 1.10, we may choose to disable server-side gradients for the
current crop of Xorg servers, similar to the extended repeat modes.
[Updated by Chris Wilson. All bugs are his.]
The extended repeat modes were only introduced in RENDER 0.10, so disable
them if the server reports an earlier version. This is in addition to
disabling the repeat modes if we know (guess!) the server to have a buggy
implementation.
There are many common scenarios, mostly involving overlapping glyphs,
for which to guarantee correct rendering we have to composite the glyphs
via an explicit mask. That is instead of just blending the glyphs on to
the destination, we have to add the glyphs to a mask, and then composite
that mask+source with the destination.
By performing the check on whether the operator is supported prior to
acquiring the context, we do not need to handle the error part way
through the context setup. This makes the code much cleaner, and save
some work for the unsupported cases.
As pointed out by Andrea, and now tested by test/degenerate-curve-to, a
curve-to that begins and ends on the same point may extend further due to
its control points. It can not be simply replaced with a degenerate
line-to. In order to do so we will need more extensive degeneracy
checking, ala _cairo_spline_init().
This is a major performance improvement for GL even on non-TNL hardware
(i915), as we get to emit one giant DrawArrays and make GL drivers love us.
Now we're actually faster than not having the glyph cache.
Before glyph cache (HP 945GM):
[ 0] gl firefox-20090601 238.103 238.195 0.35% 5/6
After:
[ 0] gl firefox-20090601 68.181 76.735 5.46% 6/6
In a discussion on IRC, attention was drawn to a dubious comment in
_cairo_xlib_show_glyphs() - the precise details of which have passed
out of the collective memory.
In bed2701, I removed the explicit finish of the paginated's target
surface, since as a wrapper it did not explicitly own the surface and so
should not be calling finish(). However, we do need to propagate errors
from the backing surface, such as PDF, which will only be detected during
the implicit finish in the final destroy. So check to see it we hold the
last reference to the target (and so our destroy will trigger the implicit
finish) and call the finish explicitly and check the error status
afterwards.
If we know the CPU can read pointers atomically, then we can simply peek
into the cached_xrender_formats to see if we already have a match, before
taking the mutex. (Acquiring the mutex here is a minor nuisance that
appears on the callgrind profiles.)
In order to efficient store small images, we need to pack them into a
large texture. The rtree handles allocation of small rectangles out of a
much larger whole. As well as tracking free rectangles, it can also be
used to note which parts of the texture are 'pinned' -- that is have
operations currently pending and so can not be modified until that batch
of operations have been flushed. When the rtree is full, i.e. there is no
single free rectangle to accommodate the allocation request, it will
randomly evict an unpinned block large enough to fit the request. The
block may comprise just a single glyph, or a subtree of many glyphs. This
may not be the best strategy, but it is an effective start.