Rename approximate_extents() to approximate_clip_extents() so that it is
consistent with the fill and stroke variants and clearer under what
circumstances you may wish to use it.
With Behdad's analytical computation of the spline bbox, tolerance is now
redundant for the path extents and the approximate bounds, so remove it
from the functions' parameters.
Based on feedback from Jeff Muizelaar, there is a case for a very quick
and dirty extents approximation based solely on the curve control points
(for example when computing the clip intersect rectangle of a path) and
by moving the stroke extension into a core function we can clean up the
interface for all users, and centralise the logic of approximating the
stroke extents.
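A minimal sketch of the control-point approximation, assuming a cubic
Bezier given as four points. The names point_t, box_t and
curve_approximate_extents are hypothetical and are not cairo's internal
API. It relies only on the fact that a Bezier curve lies within the
convex hull of its control points, so the box is never too small but may
be larger than the true extents.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not cairo's own structures. */
typedef struct { double x, y; } point_t;
typedef struct { double x1, y1, x2, y2; } box_t;

/* Grow the box so that it contains the point. */
static void
box_add_point (box_t *box, const point_t *p)
{
    if (p->x < box->x1) box->x1 = p->x;
    if (p->x > box->x2) box->x2 = p->x;
    if (p->y < box->y1) box->y1 = p->y;
    if (p->y > box->y2) box->y2 = p->y;
}

/* Quick, conservative extents of a cubic Bezier: the bounding box of
 * its four control points. No tolerance and no decomposition needed. */
static box_t
curve_approximate_extents (const point_t ctrl[4])
{
    box_t box;
    size_t i;

    assert (ctrl != NULL);

    box.x1 = box.x2 = ctrl[0].x;
    box.y1 = box.y2 = ctrl[0].y;
    for (i = 1; i < 4; i++)
	box_add_point (&box, &ctrl[i]);

    return box;
}
```

For a stroke, a caller would presumably pad such a box by an amount
derived from the line width; that step is left out here.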
When analysing the stroke extents, we need the original fixed-point
extents so that we do not incur an off-by-one error when we round to
integer a second time. We also need a more accurate estimate than simply
using the control points of the curve, so pass in tolerance and decompose
until someone discovers a cheaper algorithm to determine the precise
axis-aligned bounding box of a bezier curve.
When querying the intersection of a rectangle with the clip region, the
result only depends upon the region extents so we do not need to perform
an expensive region-region intersection computation.
This speeds up the mask generation step in cairo_fill() for the image
surface by up to 10x in especially favourable cases.
image-rgba twin-800 7757.80 0.20% -> 749.41 0.29%: 10.36x speedup
image-rgba spiral-diag-pixalign-nonzero-fill-512 15.16 0.44% -> 3.45 8.80%: 5.54x speedup
More typical simple non-rectilinear geometries are sped up by 30-50%.
This patch does not affect any stroking operations or any fill
operations of pixel aligned rectilinear geometries; those are still
rendered using trapezoids.
Instead of doing a full-copy of the mime data (which can be 10K-100K,
or even larger) just copy a reference to the original mime to the
snapshot surface (as suggested by Behdad).
Use the surface user-data array to store an arbitrary set of
alternate image representations keyed by an interned string (which
ensures that it has a unique key in the user-visible namespace).
Update the API to mirror that of cairo_surface_set_user_data() [i.e.
return a status indicator] and switch internal users of the mime-data to
the public functions.
Only copy the pattern if we need to modify it, e.g. preserve a copy in a
snapshot or a soft-mask, or to modify the matrix. Otherwise we can
continue to use the original pattern and mark it as const in order to
generate compiler warnings if we do attempt to write to it.
Adrian Johnson discovered cases where we mistakenly compared the result
of unsigned arithmetic where we need signed quantities. Look for similar
cases in the users of cairo_rectangle_int_t.
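To illustrate the class of bug, here is a small sketch assuming a
rectangle with a signed origin and an unsigned size. The type rect_t and
both function names are hypothetical. The first function spells out the
conversions the compiler applies implicitly when an int is mixed with an
unsigned int; the second performs the arithmetic on signed values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical rectangle: signed origin, unsigned size. */
typedef struct {
    int x, y;
    unsigned int width, height;
} rect_t;

/* What "r->x + r->width > limit" really computes: every operand is
 * converted to unsigned int, so a negative right edge wraps around to
 * a huge positive value. */
static int
right_edge_exceeds_unsigned (const rect_t *r, int limit)
{
    assert (r != NULL);
    return (unsigned int) r->x + r->width > (unsigned int) limit;
}

/* Intended comparison: convert the width to a signed quantity first
 * (assumes the width fits in an int). */
static int
right_edge_exceeds_signed (const rect_t *r, int limit)
{
    assert (r != NULL);
    return r->x + (int) r->width > limit;
}
```

With x = -10 and width = 5 the right edge is -5, yet the unsigned form
reports that it exceeds a limit of 0, whereas the signed form does not.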
After discussing the scaled font locking with Behdad, it transpired that it
is not sufficient for a font to be locked for the lifetime of a scaled glyph,
but that the scaled font's glyph cache must be frozen for the glyph's
lifetime. If the cache is not frozen, then there is a possibility that the
glyph may be evicted before the reference goes out of scope i.e. the glyph
becomes invalid whilst we are trying to use it.
Since the freezing of the cache is the stronger barrier, we remove the
locking/unlocking of the mutex from the backends and instead move the
mutex acquisition into the freeze/thaw routines. Then update the rule on
acquiring glyphs to enforce that the cache is frozen and review the usage
of freeze/thaw by all the backends to ensure that the cache is frozen for
the lifetime of the glyph.
A little bit of sleep and reflection suggested that the use of
device_offset_[xy] was confusing and clone_offset_[xy] more consistent
with the function naming.
If the operator is unbounded, then its area of effect extends beyond
the definition of the mask by the trapezoids and so we must always perform
the image composition.
Fixes test/operator*.
Previously the rule for clone_similar() was that the returned surface
had exactly the same size as the original, but only the contents within
the region of interest needed to be copied. This caused failures for very
large images in the xlib-backend (see test/large-source).
The obvious solution to allow cloning only the region of interest seemed
to be to simply set the device offset on the cloned surface. However, this
fails as a) nothing respects the device offset on the surface at that
layer in the compositing stack and b) possibly returning references to the
original source surface provides further confusion by mixing in another
source of device offset.
The second method was to add extra out parameters so that the
device offset could be returned separately and, for example, mixed into
the pattern matrix. Not as elegant, a couple of extra warts to the
interface, but it works - one less XFAIL...
Use the utility functions _cairo_box_from_rectangle() and
_cairo_box_round_to_rectangle() instead of open-coding. Simultaneously
tweak the whitespace so that all users of traps look similar.
Avoid tessellating the path if we know that the target extents are empty.
Besides the rare occurrence when everything is clipped out, a zero-sized
surface is often intended as a no-op surface for benchmarking.
Instead of allocating the union of all possible pattern types, just
allocate the specific pattern as used by the function in order to trim
the stack space consumption and flag potential misuse.
cairo_rectangle_int16_t was being used in a number of places instead
of cairo_rectangle_int_t, which led to memory corruption when cairo was
using a fixed point format with a bigger space than 16.16 (such as 24.8).
&image_extra was being passed instead of image_extra to release; the
bug only manifested itself when the particular backend did something
with image_extra.
Every time we assign or return a hard-coded error status wrap that value
with a call to _cairo_error(). So the idiom becomes:
status = _cairo_error (CAIRO_STATUS_NO_MEMORY);
or
return _cairo_error (CAIRO_STATUS_INVALID_DASH);
This ensures that a breakpoint placed on _cairo_error() will trigger
immediately cairo detects the error.
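The idea of funnelling every hard-coded error through one function can
be sketched as below. This is only an illustration: status_t,
report_error and check_dashes are hypothetical stand-ins and do not
reproduce cairo's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes standing in for cairo_status_t. */
typedef enum {
    STATUS_SUCCESS,
    STATUS_NO_MEMORY,
    STATUS_INVALID_DASH
} status_t;

/* Single choke point for errors: a breakpoint placed here fires at the
 * moment an error is first raised, not where it is later noticed. It is
 * deliberately not static so that it keeps a symbol to break on. */
status_t
report_error (status_t status)
{
    assert (status != STATUS_SUCCESS);
    return status;
}

/* Example user: reject negative dash lengths. */
static status_t
check_dashes (const double *dashes, int num_dashes)
{
    int i;

    for (i = 0; i < num_dashes; i++) {
	if (dashes[i] < 0.0)
	    return report_error (STATUS_INVALID_DASH);
    }

    return STATUS_SUCCESS;
}
```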
This reverts commit 919bea6dbb.
Sadly as Behdad points out some backends do modify the glyph array and,
for example cairo-xlib-surface, hide this from the compiler with some
evil casts.
Skip the memory duplication of the incoming glyphs if we do not need
to transform them into the backend coordinate system.
As a consequence we need to constify the glyphs passed to the backend
functions.
Introduce cairo_gradient_stop_t, and remove pixman dependency
for core pattern types. Perform conversion from cairo types
to pixman types as necessary in fallback code.
This patch introduces three macros: _cairo_malloc_ab,
_cairo_malloc_abc, _cairo_malloc_ab_plus_c and replaces various calls
to malloc(a*b), malloc(a*b*c), and malloc(a*b+c) with them. The macros
return NULL if int overflow would occur during the allocation. See
CODING_STYLE for more information.
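A sketch of what such overflow-checked allocation could look like,
written as functions over size_t rather than as macros. The names
checked_malloc_ab and checked_malloc_ab_plus_c are hypothetical and the
bodies are an assumption about the approach, not a copy of the cairo
macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* malloc (a * b), or NULL if the product would overflow size_t. */
static void *
checked_malloc_ab (size_t a, size_t b)
{
    if (b != 0 && a > SIZE_MAX / b)
	return NULL;

    return malloc (a * b);
}

/* malloc (a * b + c), or NULL if either step would overflow size_t. */
static void *
checked_malloc_ab_plus_c (size_t a, size_t b, size_t c)
{
    if (b != 0 && a > SIZE_MAX / b)
	return NULL;
    if (c > SIZE_MAX - a * b)
	return NULL;

    return malloc (a * b + c);
}

/* Usage sketch: an absurd request fails cleanly instead of wrapping
 * around to a small allocation. */
static void
checked_malloc_demo (void)
{
    void *p = checked_malloc_ab (16, sizeof (int));

    assert (checked_malloc_ab (SIZE_MAX, 2) == NULL);
    assert (checked_malloc_ab_plus_c (SIZE_MAX, 1, 1) == NULL);

    free (p);
}
```

The caller still has to handle a NULL return, exactly as with a plain
malloc() failure.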
It's quite simple to add a new _cairo_traps_limit call which installs
a box into the cairo_traps_t structure. Then at the time of
_cairo_traps_add we can discard any trapezoid that is wholly outside
the box and also clip any trapezoid that is partially outside the box.
We take advantage of this for both cairo_stroke and cairo_fill (when
cairo is computing the trapezoids in cairo-surface-fallback.c). Note
that we explicitly do not do any clipping for cairo_stroke_extents,
cairo_fill_extents, cairo_in_stroke, or cairo_in_fill which are
defined to ignore clipping.
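The limiting step can be sketched roughly as follows. The types and the
functions traps_init, traps_limit and traps_add are hypothetical
simplifications: a trapezoid is reduced here to its top, bottom and
horizontal extent, the slanted edges are not modelled, and only the
vertical span is clipped.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TRAPS 64

/* Hypothetical simplified types; not cairo_traps_t. */
typedef struct { double x1, y1, x2, y2; } box_t;
typedef struct { double top, bottom, left, right; } trap_t;

typedef struct {
    int has_limits;
    box_t limits;
    int num_traps;
    trap_t traps[MAX_TRAPS];
} traps_t;

static void
traps_init (traps_t *traps)
{
    traps->has_limits = 0;
    traps->num_traps = 0;
}

/* Install the box that all subsequently added trapezoids are limited to. */
static void
traps_limit (traps_t *traps, const box_t *limits)
{
    traps->has_limits = 1;
    traps->limits = *limits;
}

/* Returns 1 if the trapezoid was stored, 0 if it was discarded. */
static int
traps_add (traps_t *traps, trap_t trap)
{
    assert (trap.top <= trap.bottom);

    if (traps->has_limits) {
	const box_t *l = &traps->limits;

	/* Wholly outside the box: discard. */
	if (trap.bottom <= l->y1 || trap.top >= l->y2 ||
	    trap.right <= l->x1 || trap.left >= l->x2)
	    return 0;

	/* Partially outside: clip the vertical span to the box. */
	if (trap.top < l->y1) trap.top = l->y1;
	if (trap.bottom > l->y2) trap.bottom = l->y2;
    }

    if (traps->num_traps == MAX_TRAPS)
	return 0;

    traps->traps[traps->num_traps++] = trap;
    return 1;
}
```

Leaving has_limits unset corresponds to the extents and hit-testing
paths, which must not be clipped.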
As seen by the long-lines perf case, this fix successfully works
around the bug in the X server where it creates overly large masks for
partially-outside-the-destination-surface trapezoids:
xlib-rgba long-lines-uncropped-100 545.84 -> 5.83: 93.09x speedup
██████████████████████████████████████████████
xlib-rgb long-lines-uncropped-100 554.74 -> 8.10: 69.04x speedup
██████████████████████████████████
This allows for the surface acquired from the pattern to have the
same content. In particular, in a case such as cairo_paint_with_alpha
we can now acquire an A8 mask surface instead of an ARGB32 mask
surface which can be rendered much more efficiently. This results
in a 4x speedup when using the OVER operator with the recently
added paint-with-alpha test:
Speedups
========
image-rgb paint-with-alpha_image_rgb_over-256 2.25 -> 0.60: 4.45x speedup
███▌
It does slow down the same test when using the SOURCE operator, but
I don't think we care. Performing SOURCE with a mask is already a very
slow operation, (hitting compositeGeneral), so the slowdown here is
likely from having to convert from A8 back to ARGB32 before the
generalized compositing. So if someone cares about this slowdown,
(though SOURCE with cairo_paint_with_alpha doesn't seem extremely
useful), they will probably be motivated enough to contribute a
customized compositing function to replace compositeGeneral in which
case this slowdown should go away:
image-rgba paint-with-alpha_image_rgb_source-256 3.84 -> 8.86%: 1.94x slowdown
█
Most of the time pixman_region_init is called without any extents, and
followed by a pixman_region_union_rect, used to initialize
rectangular regions. pixman_region_union_rect is not that cheap, but
the sequence is called quite often. So it should be worth introducing
a specialized and fast function for this sequence.
This introduces pixman_region_init_rect. This new function makes
_cairo_region_init_from_rectangle obsolete.
Also removes the extent argument from pixman_region_init as it was
called with NULL most of the time. A pixman_region_init_with_extents
is added for the general case.