Revamp clipping in preparation for the removal of the low-level interface
and promote backend to use the higher levels. The principle here is that
the higher level interface gives the backend more scope for choosing
better performing primitives.
By preallocating in our data segment a couple of solid patterns for the
stock colours, it becomes more convenient when using those in surface
operations, such as when clearing.
This is a more useful definition that is able to individually track the
rectangles that compose the composite operation. This will be used by
the specialist compositors as a means to perform the common extents
determination for an operation.
As a simple step to ensure that we do not inadvertently modify (or at least
generate compiler warns if we try) user data, mark the incoming style
and matrices as constant.
Previously target device offsets were applied in cairo-surface.c which
could cause bugs when paths were taken as fallbacks, as for example
pointed out by ade55037ff and quick-fixed
by 79190d8985. The quick-fix is now
unnecessary and was removed.
Gah, that was a horrible mistake. It was a flawed hack to create Pixmaps
of the correct depth when cloning patterns for blitting to the xlib
backend. However, it had the nasty side-effect of discarding alpha when
targeting Window surfaces. The correct solution is to simply correct the
Pixmap of the desired depth and render a matching pattern onto the
surface - i.e. a reversal the current acquire -> clone. See the
forthcoming revised xcb backend on how I should have done it originally.
The goal is to create a similar surface with an identical format to
maximise performance in the subsequent blit, e.g. the xlib backend could
make the similar surface with an identical depth and so use the core
protocol, or the image surface could indeed make an identical copy so
that pixman only has to do a fast memcpy. As there is no direct method
to specify such a clone, we ask the backend for a similar surface of
identical content, and trust that the semantics are clear enough for the
intent to obvious.
_cairo_surface_fallback_paint() attempts to avoid a clipped operation if
we can convert the paint into a fill of the clipmask. However by calling
_cairo_surface_fill() we incur a double application of device offset to
the source, triggering various failures.
Company spotted this and managed to extract an excellent minimal test
case, test/clip-device-offset. This commit fixes that failure.
When combining a clip-mask with a subsurface, as when used to combine with
the composite mask, we need to pass the destination surface offset to the
clip so that the paths can be corrected for the new surface.
Fixes test/implicit-close
By forgetting the implicit-close when checking for rectilinear paths, we
tried to feed the triangle (and other diagclose) into the specialised
rectilinear tesselators which completely mishandled that final edge.
Always use spans, even for unaligned boxes. In the future (given a new
interface) we may want to emit the common unaligned box code more
efficient than a per-scanline computation -- but for now simply avoid the
requirements to write a temporary CPU buffer.
Under simple, yet common, conditions using a bounded operator and painting
with a single complex clip we can reduce the strength of that operation to
a fill. In effect this removes the need for a temporary mask for some
backends (GL, drm, xlib).
Add an even simpler sweep-line tessellator for rectangular trapezoids (as
produced by the rectilinear stoker and box filler).
This is so simple it even outperforms pixman's region validation code for the
purposes of path-to-region conversion.
For the simple cases where the clip is an unaligned box (or boxes), apply
the clip directly to the geometry and avoid having to use an intermediate
clip-mask.
First perform a simple geometric clip to catch the majority of cases where
an unaligned clip has been set outside the operation extents that can be
discarded without having to use an image surface.
This causes a dramatic increase of over 13x for the poppler-bug-12266
trace and little impact elsewhere for more sensible clippers.
For the frequent cases where we know in advance that we are dealing with a
rectilinear path, but can not use the simple region code, implement a
variant of the Bentley-Ottmann tessellator. The advantages here are that
edge comparison is very simple (we only have vertical edges) and there are
no intersection, though possible overlaps. The idea is the same, maintain
a y-x sorted queue of start/stop events that demarcate traps and sweep
through the active edges at each event, looking for completed traps.
The motivation for this was noticing a performance regression in
box-fill-outline with the self-intersection work:
1.9.2 to HEAD^: 3.66x slowdown
HEAD^ to HEAD: 5.38x speedup
1.9.2 to HEAD: 1.57x speedup
The cause of which was choosing to use spans instead of the region handling
code, as the complex polygon was no longer being tessellated.
We refactor the surface fallbacks to convert full strokes and fills to the
intermediate polygon representation (as opposed to before where we
returned the trapezoidal representation). This allow greater flexibility
to choose how then to rasterize the polygon. Where possible we use the
local spans rasteriser for its increased performance, but still have the
option to use the tessellator instead (for example, with the current
Render protocol which does not yet have a polygon image).
In order to accommodate this, the spans interface is tweaked to accept
whole polygons instead of a path and the tessellator is tweaked for speed.
Performance Impact
==================
...
Still measuring, expecting some severe regressions.
...
Handling clip as part of the surface state, as opposed to being part of
the operation state, is cumbersome and a hindrance to providing true proxy
surface support. For example, the clip must be copied from the surface
onto the fallback image, but this was forgotten causing undue hassle in
each backend. Another example is the contortion the meta surface
endures to ensure the clip is correctly recorded. By contrast passing the
clip along with the operation is quite simple and enables us to write
generic handlers for providing surface wrappers. (And in the future, we
should be able to write more esoteric wrappers, e.g. automatic 2x FSAA,
trivially.)
In brief, instead of the surface automatically applying the clip before
calling the backend, the backend can call into a generic helper to apply
clipping. For raster surfaces, clip regions are handled automatically as
part of the composite interface. For vector surfaces, a clip helper is
introduced to replay and callback into an intersect_clip_path() function
as necessary.
Whilst this is not primarily a performance related change (the change
should just move the computation of the clip from the moment it is applied
by the user to the moment it is required by the backend), it is important
to track any potential regression:
ppc:
Speedups
========
image-rgba evolution-20090607-0 1026085.22 0.18% -> 672972.07 0.77%: 1.52x speedup
▌
image-rgba evolution-20090618-0 680579.98 0.12% -> 573237.66 0.16%: 1.19x speedup
▎
image-rgba swfdec-fill-rate-4xaa-0 460296.92 0.36% -> 407464.63 0.42%: 1.13x speedup
▏
image-rgba swfdec-fill-rate-2xaa-0 128431.95 0.47% -> 115051.86 0.42%: 1.12x speedup
▏
Slowdowns
=========
image-rgba firefox-periodic-table-0 56837.61 0.78% -> 66055.17 3.20%: 1.09x slowdown
▏
Provide a mechanism for backends to attach and remove snapshots. This can
be used by backends to provide a cache for _cairo_surface_clone_similar(),
or by the meta-surfaces to only emit a single pattern for each unique
snapshot.
In order to prevent stale data being returned upon a snapshot operation,
if the surface is modified (via the 5 high level operations, and on
notification of external modification) we break the association with any
current snapshot of the surface and thus preserve the current data for
their use.
Make the treatment of replacing the NULL source pattern with WHITE
consistent. As it is a solid pattern, we can skip _cairo_pattern_fini()
and so make the code more readable, and consistent along the error paths.
Damian Frank noted
[http://lists.cairographics.org/archives/cairo/2009-May/017095.html]
a performance problem with an older XServer with an
unaccelerated composite - similar problems will be seen with non-XRender
servers which will trigger extraneous fallbacks. The problem he found was
that painting an ARGB32 image onto an RGB24 destination window (using
SOURCE) was going via the RENDER protocol and not core. He was able to
demonstrate that this could be worked around by declaring the pixel data as
an RGB24 image. The issue is that the image is uploaded into a temporary
pixmap of matching depth (i.e. 32 bit for ARGB32 and 24 bit for RGB23
data), however the core protocol can only blit between Drawables of
matching depth - so without the work-around the Drawables are mismatched
and we either need to use RENDER or fallback.
This patch adds a content mask to _cairo_surface_clone_similar() to
provide the extra bit of information to the backends for when it is
possible for them to drop channels from the clone. This is used by the
xlib backend to only create a 24 bit source when blitting to a Window.
Currently the surface snapshotting attempts to clone the source using a
new surface of identical format. This will raise an error if the source is
an unusual xserver, for example one operating at 16bpp. The solution to
this is to create the surface using the content type rather than the
format (as elsewhere within the code base). However, we also wish to
preserve FORMAT_A1 (which is lost if we only choose the format based on
_cairo_format_from_content) as the various backends may be able to
trivially special case such bitmaps.
Specifically,
cairo_region_union_rect -> cairo_region_union_rectangle
cairo_region_create_rect -> cairo_region_create_rectangle
Also delete cairo_region_clear() which is not that useful.
Usually, rectangles are more useful than boxes, so regions should only
expose rectangles in their public API.
Specifically,
_cairo_region_num_boxes becomes _cairo_region_num_rectangles
_cairo_region_get_box becomes _cairo_region_get_rectangle
Remove the cairo_box_int_t type
Move the mime-data into its own array so that it cannot be confused with
user-data and we do not need to hard-code the copy list during
snapshotting. The copy-on-snapshotting code becomes far simpler and will
accommodate all future mime-types.
Keeping mime-data separate from user-data is important due to the
principle of least surprise - the API is different and so it would be
surprising if you queried for user-data and were returned an opaque
mime-data pointer, and vice versa. (Note this should have been prevented
by using interned strings, but conceptually it is cleaner to make the
separation.) Also it aides in trimming the user data arrays which are
linearly searched.
Based on the original patch by Adrian Johnson:
http://cgit.freedesktop.org/~ajohnson/cairo/commit/?h=metadata&id=37e607cc777523ad12a2d214708d79ecbca5b380
As a fun itch to scratch, I've been fixing incorrect uses of the
contraction "it's" in comments within the mozilla source tree (tracked
in https://bugzilla.mozilla.org/show_bug.cgi?id=458167 ), and I ran
across 6 instances of this typo in mozilla's snapshot of cairo.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Rename approximate_extents() to approximate_clip_extents() so that it is
consistent with the fill and stroke variants and clearer under what
circumstances you may wish to use it.
With Behdad's analytical analysis of the spline bbox, tolerance is now
redundant for the path extents and the approximate bounds, so remove it
from the functions parameters.
Based on feedback from Jeff Muizelaar, there is a case for a very quick
and dirty extents approximation based solely on the curve control points
(for example when computing the clip intersect rectangle of a path) and
by moving the stroke extension into a core function we can clean up the
interface for all users, and centralise the logic of approximating the
stroke extents.
When analysing the stroke extents, we need the original fixed-point
extents so that we do not incur an OBO when we round-to-integer a second
time. We also need a more accurate estimate than simply using the control
points of the curve, so pass in tolerance and decompose until someone
discovers a cheaper algorithm to determine the precise aligned bounding
box of a bezier curve.
When querying the intersection of a rectangle with the clip region, the
result only depends upon the region extents so we do not need to perform
an expensive region-region intersection computation.
This speeds up the mask generation step in cairo_fill() for the image
surface by up to 10x in especially favourable cases.
image-rgba twin-800 7757.80 0.20% -> 749.41 0.29%: 10.36x speedup
image-rgba spiral-diag-pixalign-nonzero-fill-512 15.16 0.44% -> 3.45 8.80%: 5.54x speedup
More typical simple non-rectilinear geometries are sped up by 30-50%.
This patch does not affect any stroking operations or any fill
operations of pixel aligned rectilinear geometries; those are still
rendered using trapezoids.