For the simple cases where the clip is an unaligned box (or boxes), apply
the clip directly to the geometry and avoid having to use an intermediate
clip-mask.
First perform a simple geometric clip to catch the majority of cases where
an unaligned clip has been set outside the operation extents that can be
discarded without having to use an image surface.
This causes a dramatic increase of over 13x for the poppler-bug-12266
trace and little impact elsewhere for more sensible clippers.
For the frequent cases where we know in advance that we are dealing with a
rectilinear path, but can not use the simple region code, implement a
variant of the Bentley-Ottmann tessellator. The advantages here are that
edge comparison is very simple (we only have vertical edges) and there are
no intersection, though possible overlaps. The idea is the same, maintain
a y-x sorted queue of start/stop events that demarcate traps and sweep
through the active edges at each event, looking for completed traps.
The motivation for this was noticing a performance regression in
box-fill-outline with the self-intersection work:
1.9.2 to HEAD^: 3.66x slowdown
HEAD^ to HEAD: 5.38x speedup
1.9.2 to HEAD: 1.57x speedup
The cause of which was choosing to use spans instead of the region handling
code, as the complex polygon was no longer being tessellated.
We refactor the surface fallbacks to convert full strokes and fills to the
intermediate polygon representation (as opposed to before where we
returned the trapezoidal representation). This allow greater flexibility
to choose how then to rasterize the polygon. Where possible we use the
local spans rasteriser for its increased performance, but still have the
option to use the tessellator instead (for example, with the current
Render protocol which does not yet have a polygon image).
In order to accommodate this, the spans interface is tweaked to accept
whole polygons instead of a path and the tessellator is tweaked for speed.
Performance Impact
==================
...
Still measuring, expecting some severe regressions.
...
Handling clip as part of the surface state, as opposed to being part of
the operation state, is cumbersome and a hindrance to providing true proxy
surface support. For example, the clip must be copied from the surface
onto the fallback image, but this was forgotten causing undue hassle in
each backend. Another example is the contortion the meta surface
endures to ensure the clip is correctly recorded. By contrast passing the
clip along with the operation is quite simple and enables us to write
generic handlers for providing surface wrappers. (And in the future, we
should be able to write more esoteric wrappers, e.g. automatic 2x FSAA,
trivially.)
In brief, instead of the surface automatically applying the clip before
calling the backend, the backend can call into a generic helper to apply
clipping. For raster surfaces, clip regions are handled automatically as
part of the composite interface. For vector surfaces, a clip helper is
introduced to replay and callback into an intersect_clip_path() function
as necessary.
Whilst this is not primarily a performance related change (the change
should just move the computation of the clip from the moment it is applied
by the user to the moment it is required by the backend), it is important
to track any potential regression:
ppc:
Speedups
========
image-rgba evolution-20090607-0 1026085.22 0.18% -> 672972.07 0.77%: 1.52x speedup
▌
image-rgba evolution-20090618-0 680579.98 0.12% -> 573237.66 0.16%: 1.19x speedup
▎
image-rgba swfdec-fill-rate-4xaa-0 460296.92 0.36% -> 407464.63 0.42%: 1.13x speedup
▏
image-rgba swfdec-fill-rate-2xaa-0 128431.95 0.47% -> 115051.86 0.42%: 1.12x speedup
▏
Slowdowns
=========
image-rgba firefox-periodic-table-0 56837.61 0.78% -> 66055.17 3.20%: 1.09x slowdown
▏
Provide a mechanism for backends to attach and remove snapshots. This can
be used by backends to provide a cache for _cairo_surface_clone_similar(),
or by the meta-surfaces to only emit a single pattern for each unique
snapshot.
In order to prevent stale data being returned upon a snapshot operation,
if the surface is modified (via the 5 high level operations, and on
notification of external modification) we break the association with any
current snapshot of the surface and thus preserve the current data for
their use.
Make the treatment of replacing the NULL source pattern with WHITE
consistent. As it is a solid pattern, we can skip _cairo_pattern_fini()
and so make the code more readable, and consistent along the error paths.
Damian Frank noted
[http://lists.cairographics.org/archives/cairo/2009-May/017095.html]
a performance problem with an older XServer with an
unaccelerated composite - similar problems will be seen with non-XRender
servers which will trigger extraneous fallbacks. The problem he found was
that painting an ARGB32 image onto an RGB24 destination window (using
SOURCE) was going via the RENDER protocol and not core. He was able to
demonstrate that this could be worked around by declaring the pixel data as
an RGB24 image. The issue is that the image is uploaded into a temporary
pixmap of matching depth (i.e. 32 bit for ARGB32 and 24 bit for RGB23
data), however the core protocol can only blit between Drawables of
matching depth - so without the work-around the Drawables are mismatched
and we either need to use RENDER or fallback.
This patch adds a content mask to _cairo_surface_clone_similar() to
provide the extra bit of information to the backends for when it is
possible for them to drop channels from the clone. This is used by the
xlib backend to only create a 24 bit source when blitting to a Window.
Currently the surface snapshotting attempts to clone the source using a
new surface of identical format. This will raise an error if the source is
an unusual xserver, for example one operating at 16bpp. The solution to
this is to create the surface using the content type rather than the
format (as elsewhere within the code base). However, we also wish to
preserve FORMAT_A1 (which is lost if we only choose the format based on
_cairo_format_from_content) as the various backends may be able to
trivially special case such bitmaps.
Specifically,
cairo_region_union_rect -> cairo_region_union_rectangle
cairo_region_create_rect -> cairo_region_create_rectangle
Also delete cairo_region_clear() which is not that useful.
Usually, rectangles are more useful than boxes, so regions should only
expose rectangles in their public API.
Specifically,
_cairo_region_num_boxes becomes _cairo_region_num_rectangles
_cairo_region_get_box becomes _cairo_region_get_rectangle
Remove the cairo_box_int_t type
Move the mime-data into its own array so that it cannot be confused with
user-data and we do not need to hard-code the copy list during
snapshotting. The copy-on-snapshotting code becomes far simpler and will
accommodate all future mime-types.
Keeping mime-data separate from user-data is important due to the
principle of least surprise - the API is different and so it would be
surprising if you queried for user-data and were returned an opaque
mime-data pointer, and vice versa. (Note this should have been prevented
by using interned strings, but conceptually it is cleaner to make the
separation.) Also it aides in trimming the user data arrays which are
linearly searched.
Based on the original patch by Adrian Johnson:
http://cgit.freedesktop.org/~ajohnson/cairo/commit/?h=metadata&id=37e607cc777523ad12a2d214708d79ecbca5b380
As a fun itch to scratch, I've been fixing incorrect uses of the
contraction "it's" in comments within the mozilla source tree (tracked
in https://bugzilla.mozilla.org/show_bug.cgi?id=458167 ), and I ran
across 6 instances of this typo in mozilla's snapshot of cairo.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Rename approximate_extents() to approximate_clip_extents() so that it is
consistent with the fill and stroke variants and clearer under what
circumstances you may wish to use it.
With Behdad's analytical analysis of the spline bbox, tolerance is now
redundant for the path extents and the approximate bounds, so remove it
from the functions parameters.
Based on feedback from Jeff Muizelaar, there is a case for a very quick
and dirty extents approximation based solely on the curve control points
(for example when computing the clip intersect rectangle of a path) and
by moving the stroke extension into a core function we can clean up the
interface for all users, and centralise the logic of approximating the
stroke extents.
When analysing the stroke extents, we need the original fixed-point
extents so that we do not incur an OBO when we round-to-integer a second
time. We also need a more accurate estimate than simply using the control
points of the curve, so pass in tolerance and decompose until someone
discovers a cheaper algorithm to determine the precise aligned bounding
box of a bezier curve.
When querying the intersection of a rectangle with the clip region, the
result only depends upon the region extents so we do not need to perform
an expensive region-region intersection computation.
This speeds up the mask generation step in cairo_fill() for the image
surface by up to 10x in especially favourable cases.
image-rgba twin-800 7757.80 0.20% -> 749.41 0.29%: 10.36x speedup
image-rgba spiral-diag-pixalign-nonzero-fill-512 15.16 0.44% -> 3.45 8.80%: 5.54x speedup
More typical simple non-rectilinear geometries are sped up by 30-50%.
This patch does not affect any stroking operations or any fill
operations of pixel aligned rectilinear geometries; those are still
rendered using trapezoids.
Instead of doing a full-copy of the mime data (which can be 10K-100K,
or even larger) just copy a reference to the original mime to the
snapshot surface (as suggested by Behdad).
Use the surface user-data array allow to store an arbitrary set of
alternate image representations keyed by an interned string (which
ensures that it has a unique key in the user-visible namespace).
Update the API to mirror that of cairo_surface_set_user_data() [i.e.
return a status indicator] and switch internal users of the mime-data to
the public functions.
Only copy the pattern if we need to modify it, e.g. preserve a copy in a
snapshot or a soft-mask, or to modify the matrix. Otherwise we can
continue to use the original pattern and mark it as const in order to
generate compiler warnings if we do attempt to write to it.
Adrian Johnson discovered cases where we mistakenly compared the result
of unsigned arithmetic where we need signed quantities. Look for similar
cases in the users of cairo_rectangle_int_t.
After discussing the scaled font locking with Behdad, it transpired that it
is not sufficient for a font to be locked for the lifetime of a scaled glyph,
but that the scaled font's glyph cache must be frozen for the glyph'
lifetime. If the cache is not frozen, then there is a possibility that the
glyph may be evicted before the reference goes out of scope i.e. the glyph
becomes invalid whilst we are trying to use it.
Since the freezing of the cache is the stronger barrier, we remove the
locking/unlocking of the mutex from the backends and instead move the
mutex acquisition into the freeze/thaw routines. Then update the rule on
acquiring glyphs to enforce that the cache is frozen and review the usage
of freeze/thaw by all the backends to ensure that the cache is frozen for
the lifetime of the glyph.
A little bit of sleep and reflection suggested that the use of
device_offset_[xy] was confusing and clone_offset_[xy] more consistent
with the function naming.
If the operator is unbounded, then its area of effect extends beyond
the definition of the mask by the trapezoids and so we must always perform
the image composition.
Fixes test/operator*.
Previously the rule for clone_similar() was that the returned surface
had exactly the same size as the original, but only the contents within
the region of interest needed to be copied. This caused failures for very
large images in the xlib-backend (see test/large-source).
The obvious solution to allow cloning only the region of interest seemed
to be to simply set the device offset on the cloned surface. However, this
fails as a) nothing respects the device offset on the surface at that
layer in the compositing stack and b) possibly returning references to the
original source surface provides further confusion by mixing in another
source of device offset.
The second method was to add extra out parameters so that the
device offset could be returned separately and, for example, mixed into
the pattern matrix. Not as elegant, a couple of extra warts to the
interface, but it works - one less XFAIL...
Use the utility functions _cairo_box_from_rectangle and
_cairo_box_round_to_rectangle() instead of open-coding. Simultaneously
tweak the whitespace so that all users of traps look similar.
Avoid tessellating the path if we know that the target extents is zero.
Besides the rare occurrence when everything is clipped out, a zero-sized
surface is often intended as a no-op surface for benchmarking.
Instead of allocating the union of all possible pattern types, just
allocate the specific pattern as used by the function in order to trim
the stack space consumption and flag potential misuse.