For the simple cases where the clip is an unaligned box (or boxes), apply
the clip directly to the geometry and avoid having to use an intermediate
clip-mask.
For the frequent cases where we know in advance that we are dealing with a
rectilinear path, but can not use the simple region code, implement a
variant of the Bentley-Ottmann tessellator. The advantages here are that
edge comparison is very simple (we only have vertical edges) and there are
no intersection, though possible overlaps. The idea is the same, maintain
a y-x sorted queue of start/stop events that demarcate traps and sweep
through the active edges at each event, looking for completed traps.
The motivation for this was noticing a performance regression in
box-fill-outline with the self-intersection work:
1.9.2 to HEAD^: 3.66x slowdown
HEAD^ to HEAD: 5.38x speedup
1.9.2 to HEAD: 1.57x speedup
The cause of which was choosing to use spans instead of the region handling
code, as the complex polygon was no longer being tessellated.
We refactor the surface fallbacks to convert full strokes and fills to the
intermediate polygon representation (as opposed to before where we
returned the trapezoidal representation). This allow greater flexibility
to choose how then to rasterize the polygon. Where possible we use the
local spans rasteriser for its increased performance, but still have the
option to use the tessellator instead (for example, with the current
Render protocol which does not yet have a polygon image).
In order to accommodate this, the spans interface is tweaked to accept
whole polygons instead of a path and the tessellator is tweaked for speed.
Performance Impact
==================
...
Still measuring, expecting some severe regressions.
...
Handling clip as part of the surface state, as opposed to being part of
the operation state, is cumbersome and a hindrance to providing true proxy
surface support. For example, the clip must be copied from the surface
onto the fallback image, but this was forgotten causing undue hassle in
each backend. Another example is the contortion the meta surface
endures to ensure the clip is correctly recorded. By contrast passing the
clip along with the operation is quite simple and enables us to write
generic handlers for providing surface wrappers. (And in the future, we
should be able to write more esoteric wrappers, e.g. automatic 2x FSAA,
trivially.)
In brief, instead of the surface automatically applying the clip before
calling the backend, the backend can call into a generic helper to apply
clipping. For raster surfaces, clip regions are handled automatically as
part of the composite interface. For vector surfaces, a clip helper is
introduced to replay and callback into an intersect_clip_path() function
as necessary.
Whilst this is not primarily a performance related change (the change
should just move the computation of the clip from the moment it is applied
by the user to the moment it is required by the backend), it is important
to track any potential regression:
ppc:
Speedups
========
image-rgba evolution-20090607-0 1026085.22 0.18% -> 672972.07 0.77%: 1.52x speedup
▌
image-rgba evolution-20090618-0 680579.98 0.12% -> 573237.66 0.16%: 1.19x speedup
▎
image-rgba swfdec-fill-rate-4xaa-0 460296.92 0.36% -> 407464.63 0.42%: 1.13x speedup
▏
image-rgba swfdec-fill-rate-2xaa-0 128431.95 0.47% -> 115051.86 0.42%: 1.12x speedup
▏
Slowdowns
=========
image-rgba firefox-periodic-table-0 56837.61 0.78% -> 66055.17 3.20%: 1.09x slowdown
▏
cairo_region_union_rectangle() is linear in the number of rectangles
in the region. There is no way to make it significantly faster without
losing the ability to return errors synchronously, so a
cairo_region_create_rectangles() is needed to avoid a large
performance regression.
Ensure that no assumptions are made that a small allocation will succeed
by manually injecting faults when we may be simply allocating from an
embedded memory pool.
The main advantage in manual fault injection is improved code coverage -
from within the test suite most allocations are handled by the embedded
memory pools.
Specifically,
cairo_region_union_rect -> cairo_region_union_rectangle
cairo_region_create_rect -> cairo_region_create_rectangle
Also delete cairo_region_clear() which is not that useful.
Usually, rectangles are more useful than boxes, so regions should only
expose rectangles in their public API.
Specifically,
_cairo_region_num_boxes becomes _cairo_region_num_rectangles
_cairo_region_get_box becomes _cairo_region_get_rectangle
Remove the cairo_box_int_t type
The convex_quad tessellator (and possibly even the more general polygon
tessellator) will generate empty trapezoids when given a
rectangle which can be trivially discarded before inserting into traps.
If either point lies on the limit and the other outside, adjust the line
to be parallel to the boundary. This adjusts the previous test where both
points needed to be entirely outside.
When reporting the extents of the traps, constrain them to the imposed
limits. This is relied upon in the analysis-surface which sets the
limits on the traps (based on clip/surface extents) and then reports the
extents of the traps as the region of the operation.
Avoid the overhead of sorting the rectangle in
_cairo_traps_tessellate_convex_quad() by constructing the trap directly
from the line segment. This also has secondary effects in only passing
the non-degenerate trap to _cairo_traps_add_trap().
For rectilinear Hilbert curves this makes the rectilinear stoker over 4x
faster.
The rectilinear stroke finalized the cairo_traps_t passed to it - which
was then subsequently used without re-initialized. So instead of
finalizing the structure, just remove any traps that we may have added
(leaving the limits and memory intact).
==3429== Conditional jump or move depends on uninitialised value(s)
==3429== at 0x4E3FB0F: _cairo_box_round_to_rectangle (cairo-fixed-private.h:196)
==3429== by 0x4E34B29: _cairo_clip_intersect_to_rectangle (cairo-clip.c:162)
==3429== by 0x4E31943: cairo_push_group_with_content (cairo.c:495)
==3429== by 0x403044: draw (clip-zero.c:48)
==3429== by 0x404221: cairo_test_expecting (cairo-test.c:377)
==3429== by 0x64701C3: (below main) (libc-start.c:222)
Caused by setting extents->p2.y to zero twice.
Previously, _cairo_traps_extents () returned the extents
p1={INT_MAX, INT_MAX} p2={INT_MIN, INT_MIN} for an empty traps leading
to integer overflow when computing the width using p2-p1 and causing
further overflows within libpixman. [For example this caused the
allocation of massive regions with test/mask and the PS backend.]
In http://bugs.freedesktop.org/show_bug.cgi?id=13699, Pavel Vozenilek
reports a duplicate define for computing the appropriate length for an
on-stack array. The macro in question, and a few other places, was
performing CAIRO_STACK_BUF_SIZE/sizeof(stack[0]) so we can simplify
them slightly by using a common macro.
Simply return the error status from the traps_grow() function rather
than having an assignment in the return function and then immediately
another assignment of the error to the status member at its callsite.
Every time we assign or return a hard-coded error status wrap that value
with a call to _cairo_error(). So the idiom becomes:
status = _cairo_error (CAIRO_STATUS_NO_MEMORY);
or
return _cairo_error (CAIRO_STATUS_INVALID_DASH);
This ensures that a breakpoint placed on _cairo_error() will trigger
immediately cairo detects the error.
This patch introduces three macros: _cairo_malloc_ab,
_cairo_malloc_abc, _cairo_malloc_ab_plus_c and replaces various calls
to malloc(a*b), malloc(a*b*c), and malloc(a*b+c) with them. The macros
return NULL if int overflow would occur during the allocation. See
CODING_STYLE for more information.
It's quite simple to add a new _cairo_traps_limit call which installs
a box into the cairo_traps_t structure. Then at the time of
_cairo_traps_add we can discard any trapezoid that is wholly outside
the box and also clip any trapezoid that is partially outside the box.
We take advantage of this for both cairo_stroke and cairo_fill, (when
cairo is computing the trapezoids in cairo-surface-fallback.c). Note
that we explicitly do not do any clipping for cairo_stroke_extents,
cairo_fill_extents, cairo_in_stroke, or cairo_in_fill which are
defined to ignore clipping.
As seen by the long-lines perf case, this fix successfully works
around the bug in the X server where it creates overly large masks for
partially-outside-the-destination-surface trapezoids:
xlib-rgba long-lines-uncropped-100 545.84 -> 5.83: 93.09x speedup
██████████████████████████████████████████████
xlib-rgb long-lines-uncropped-100 554.74 -> 8.10: 69.04x speedup
██████████████████████████████████
Add a _cairo_traps_status function and use it instead of adding
error checks to callers of _cairo_traps_add_trap and
_cairo_traps_add_trap_from_points, (both of which are now given
a void return type).