Kerning is quite frequent, that is to apply a horizontal but no vertical
offset to a glyph. For instance by discarding the vertical coordinate
where it remains the same and only encoding the horizontal offset we
reduce the file size by ~12.5% when tracing poppler.
For the simple cases where the clip is an unaligned box (or boxes), apply
the clip directly to the geometry and avoid having to use an intermediate
clip-mask.
Add an interface to spans that accepts trapezoids. This allows backends
that have an efficient span-line interface but lack efficient handling
of boxes (partly due to the current poor compositor interface) to
redirect composite_trapezoids() to composite_polygon() and the
span-renderer.
First perform a simple geometric clip to catch the majority of cases where
an unaligned clip has been set outside the operation extents that can be
discarded without having to use an image surface.
This causes a dramatic increase of over 13x for the poppler-bug-12266
trace and little impact elsewhere for more sensible clippers.
We can ensure that we always produce a clip region when possible by using
the rectilinear tessellator to convert complex, device-aligned polygons to
regions. Prior to using the tessellator, we relied on pixman's region code
which could only handle a union of rectangles.
For the frequent cases where we know in advance that we are dealing with a
rectilinear path, but can not use the simple region code, implement a
variant of the Bentley-Ottmann tessellator. The advantages here are that
edge comparison is very simple (we only have vertical edges) and there are
no intersection, though possible overlaps. The idea is the same, maintain
a y-x sorted queue of start/stop events that demarcate traps and sweep
through the active edges at each event, looking for completed traps.
The motivation for this was noticing a performance regression in
box-fill-outline with the self-intersection work:
1.9.2 to HEAD^: 3.66x slowdown
HEAD^ to HEAD: 5.38x speedup
1.9.2 to HEAD: 1.57x speedup
The cause of which was choosing to use spans instead of the region handling
code, as the complex polygon was no longer being tessellated.
The active edge list is typically short, and the skiplist adds significant
overhead that far outweigh the benefit of the O(n lg n) sort. Instead we
track the position of the last insertion edge, knowing that the start
events are lexicographically sorted, and begin a linear search from there.
We refactor the surface fallbacks to convert full strokes and fills to the
intermediate polygon representation (as opposed to before where we
returned the trapezoidal representation). This allow greater flexibility
to choose how then to rasterize the polygon. Where possible we use the
local spans rasteriser for its increased performance, but still have the
option to use the tessellator instead (for example, with the current
Render protocol which does not yet have a polygon image).
In order to accommodate this, the spans interface is tweaked to accept
whole polygons instead of a path and the tessellator is tweaked for speed.
Performance Impact
==================
...
Still measuring, expecting some severe regressions.
...
In order to handle 'cairo-perf-trace benchmark', we need to perform the
can_run? test on the directory name as opposed to the individual trace
names. Make it so.
I'd disabled this to look at cairo-qt performance, then forgot about it.
Be clean, cleanup globals -- this should fix the huge performance loss
when running in series multiple backends that need separate font caches.
These traces run for much longer than the original synthetic benchmarks
and seek to replicate 'real-world' applications, so the warning that the
xserver and cairo-perf are not bound to any cpu is false.
cairo-perf-chart takes multiple runs (currently it is limited to
prefiltered data sets) and pretty-prints a chart showing performace
improvements/regressions (in either ASCII or HTML) along with a
cairo-perf-chart.png
Use cairo_stroke() to perform the equivalent of
spiral-rect-(pix|non)align-evenodd-fill. A useful comparison of stroking
versus filling, as we can assume the composition costs are similar.
Oops we were accumulating paths during each spiral iteration and so the
tests were getting slower and slower and slower...
[And fix a couple of other instances of path accumulation.]
A common mistake is to forget to pass the foreground mode to
cairo-test-suite when launching it under the debugger, resulting in the
debugger not attaching to the children and missing the error you were
trying to capture. Under linux, we can inspect the path to our parent's
executable and if that looks like gdb, we assume it is and disable forking
of traces.
Add a command line option to the test suite to cause it to exit after the
first failure. The purpose of this is for integration into 'git bisect run',
where the failing test is unknown and we are looking for any failure. For
example, for use in a regression script to find commits in the midst of as
series that need a refresh of a reference image (or fixing!).
Cairo should include the contents of subwindows when using a Window as a
source but will clip to subwindows when using a Window as a destination.
This can be set using the GC's subwindow_mode.
XCopyArea and XFillRectangle can however only use one GC for both source
and destination. Cairo's mode is set to (the default) ClipByChildren.
This means that copying from a Window is broken, so we only allow the
optimization when we know that the source is a Pixmap.
The performance impact of this change has not been tested. It should be
small, as the code will use XRender otherwise.
If it turns out to be a bigger impact, the optimizations could be
improved by doing a two-step copy process:
1) Copy to an intermediate Pixmap with IncludeInferiors
2) Copy to the destination with ClipByChildren
(potentially omitting one one of the steps if source or destination are
known to be Pixmaps).
references:
commit 0c5d28a4e5https://bugs.freedesktop.org/show_bug.cgi?id=12996