During the copy, allocation of the gradient may fail, so callers need
to check whether the pattern was returned in an error state. No callers
did so. In order to force all callers to check the error status, a
status return was added to _cairo_pattern_init_copy(). The early
error checking may appear redundant for an object with an embedded
structure; however, it does fix an error where an uninitialised pattern
was being used:
==1922== Process terminating with default action of signal 11 (SIGSEGV)
==1922== Access not within mapped region at address 0x55555555
==1922== at 0x402CF6F: _cairo_array_index (cairo-array.c:208)
==1922== by 0x402D4F3: _cairo_user_data_array_fini (cairo-array.c:370)
==1922== by 0x4046464: _cairo_pattern_fini (cairo-pattern.c:188)
==1922== by 0x404992A: _cairo_meta_surface_paint (cairo-meta-surface.c:266)
==1922== by 0x403FCE0: _cairo_surface_paint (cairo-surface.c:1331)
==1922== by 0x405CB5E: _test_meta_surface_paint (test-meta-surface.c:195)
==1922== by 0x403FCE0: _cairo_surface_paint (cairo-surface.c:1331)
==1922== by 0x4032A60: _cairo_gstate_paint (cairo-gstate.c:822)
==1922== by 0x402B2D1: cairo_paint (cairo.c:1879)
==1922== by 0x804A4F7: draw (radial-gradient.c:73)
==1922== by 0x804AFA4: cairo_test_expecting (cairo-test.c:326)
==1922== by 0x804A57C: main (radial-gradient.c:109)
==1922== Injected fault at:
==1922== at 0x4020EA5: malloc (vg_replace_malloc.c:207)
==1922== by 0x404475C: _cairo_pattern_init_copy (cairo-pattern.c:136)
==1922== by 0x403F779: _cairo_surface_copy_pattern_for_destination (cairo-surface.c:2153)
==1922== by 0x403FCC1: _cairo_surface_paint (cairo-surface.c:1328)
==1922== by 0x405CB5E: _test_meta_surface_paint (test-meta-surface.c:195)
==1922== by 0x403FCE0: _cairo_surface_paint (cairo-surface.c:1331)
==1922== by 0x4032A60: _cairo_gstate_paint (cairo-gstate.c:822)
==1922== by 0x402B2D1: cairo_paint (cairo.c:1879)
==1922== by 0x804A4F7: draw (radial-gradient.c:73)
==1922== by 0x804AFA4: cairo_test_expecting (cairo-test.c:326)
==1922== by 0x804A57C: main (radial-gradient.c:109)
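The status-return convention described above can be sketched as follows. This is a minimal illustration, not cairo's code: the names sketch_status_t, sketch_gradient_t, sketch_gradient_init_copy() and sketch_gradient_fini() are hypothetical. The point is that a failed copy leaves the destination in a state that is safe to finalize, and the caller learns about the failure from the return value.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical status type for this sketch. */
typedef enum {
    SKETCH_STATUS_SUCCESS,
    SKETCH_STATUS_NO_MEMORY
} sketch_status_t;

/* Hypothetical gradient holding a heap-allocated array of stop offsets. */
typedef struct {
    double *stops;
    size_t n_stops;
} sketch_gradient_t;

/* Copy src into dst. On failure dst is left empty, so a later fini is
 * harmless, and the failure is reported through the return value. */
sketch_status_t
sketch_gradient_init_copy (sketch_gradient_t *dst, const sketch_gradient_t *src)
{
    dst->stops = NULL;
    dst->n_stops = 0;

    if (src->n_stops == 0)
        return SKETCH_STATUS_SUCCESS;

    dst->stops = malloc (src->n_stops * sizeof (double));
    if (dst->stops == NULL)
        return SKETCH_STATUS_NO_MEMORY;

    memcpy (dst->stops, src->stops, src->n_stops * sizeof (double));
    dst->n_stops = src->n_stops;
    return SKETCH_STATUS_SUCCESS;
}

/* Release the copy; safe to call after a failed init_copy. */
void
sketch_gradient_fini (sketch_gradient_t *gradient)
{
    free (gradient->stops);
    gradient->stops = NULL;
    gradient->n_stops = 0;
}
```

A caller would check the returned status and bail out before touching the copy, rather than discovering the failure later in fini.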
Simply return without writing to potentially read-only members of an
invalid pattern rather than assert. This is cleaner than tracking down
all the error paths that may call into cairo_pattern_transform()...
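A minimal sketch of the "return early instead of asserting" approach, assuming a hypothetical pattern type with an embedded status field (sketch_pattern_t and sketch_pattern_scale() are illustrative names, not cairo's):

```c
/* Hypothetical status and pattern types for this sketch. */
typedef enum {
    SKETCH_PATTERN_OK,
    SKETCH_PATTERN_ERROR
} sketch_pattern_status_t;

typedef struct {
    sketch_pattern_status_t status;
    double scale;
} sketch_pattern_t;

void
sketch_pattern_scale (sketch_pattern_t *pattern, double factor)
{
    /* An error pattern may be a shared object in read-only memory,
     * so return without writing to it. */
    if (pattern->status != SKETCH_PATTERN_OK)
        return;

    pattern->scale *= factor;
}
```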
_cairo_surface_create_similar_solid() may return an image surface,
should the backend not support the required content or should it
encounter an error whilst creating the surface. In those circumstances
we choose not to cache the fallback surface.
Original work by Jorn Baayen <jorn@openedhand.com>,
2715f20981
We use a small cache of size 16 for surfaces created for solid patterns.
This mainly helps with the X backends where we don't have to create a
pattern for every operation, so we save a lot on X traffic. Xft uses a
similar cache, so cairo's text rendering traffic with the xlib backend
now completely matches that of Xft.
The cache uses a static index variable, which itself acts like a cache of
size 1, remembering the most recently used solid pattern. So repeated
lookups for the same pattern hit immediately. If that fails, the cache is
searched linearly, and if that fails too, a new surface is created and a
random member of the cache is evicted.
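The lookup order described above (last-used index, then linear search, then random eviction) can be sketched as below. All names are hypothetical, an integer id stands in for the cached surface, and filling free slots before evicting is an assumption of this sketch rather than something the commit states. The sketch also ignores locking.

```c
#include <stdlib.h>

#define SKETCH_CACHE_SIZE 16

typedef struct {
    unsigned int color;   /* key: packed colour value */
    int surface_id;       /* stand-in for a cached surface */
} sketch_entry_t;

static sketch_entry_t sketch_cache[SKETCH_CACHE_SIZE];
static int sketch_cache_count;
static int sketch_last_index;          /* behaves like a cache of size 1 */
static int sketch_next_surface_id = 1;

int
sketch_cache_lookup (unsigned int color)
{
    int i;

    /* 1. Check the most recently used slot first. */
    if (sketch_last_index < sketch_cache_count &&
        sketch_cache[sketch_last_index].color == color)
        return sketch_cache[sketch_last_index].surface_id;

    /* 2. Fall back to a linear search of the whole cache. */
    for (i = 0; i < sketch_cache_count; i++) {
        if (sketch_cache[i].color == color) {
            sketch_last_index = i;
            return sketch_cache[i].surface_id;
        }
    }

    /* 3. Miss: take a free slot if one exists (assumption), otherwise
     * evict a randomly chosen entry. */
    if (sketch_cache_count < SKETCH_CACHE_SIZE)
        i = sketch_cache_count++;
    else
        i = rand () % SKETCH_CACHE_SIZE;

    sketch_cache[i].color = color;
    sketch_cache[i].surface_id = sketch_next_surface_id++;
    sketch_last_index = i;
    return sketch_cache[i].surface_id;
}
```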
A cached surface can only be reused if it is similar to the destination.
In order to check for similar surfaces a new test is introduced for the
backends to determine that the cached surface is as would be returned by
a _create_similar() call for the destination and content.
As surfaces are in general complex encapsulation of graphics state we
only return unshared cached surfaces and reset them (to clear any error
conditions and graphics state). In practice this makes little difference
to the efficacy of the cache during various benchmarks. However, in order
to transparently share solid surfaces it would be possible to implement a
COW scheme.
Cache hit rates: (hit same index + hit in cache) / lookups
cairo-perf: (42346 + 28480) / 159600 = 44.38%
gtk-theme-torturer: (3023 + 3502) / 6528 = 99.95%
gtk-perf: (8270 + 3190) / 21504 = 53.29%
This translates into a reduction of about 25% of the XRENDER traffic during
cairo-perf.
This allows for the surface acquired from the pattern to have the
same content. In particular, in a case such as cairo_paint_with_alpha
we can now acquire an A8 mask surface instead of an ARGB32 mask
surface which can be rendered much more efficiently. This results
in a 4x speedup when using the OVER operator with the recently
added paint-with-alpha test:
Speedups
========
image-rgb paint-with-alpha_image_rgb_over-256 2.25 -> 0.60: 4.45x speedup
███▌
It does slow down the same test when using the SOURCE operator, but
I don't think we care. Performing SOURCE with a mask is already a very
slow operation, (hitting compositeGeneral), so the slowdown here is
likely from having to convert from A8 back to ARGB32 before the
generalized compositing. So if someone cares about this slowdown,
(though SOURCE with cairo_paint_with_alpha doesn't seem extremely
useful), they will probably be motivated enough to contribute a
customized compositing function to replace compositeGeneral in which
case this slowdown should go away:
image-rgba paint-with-alpha_image_rgb_source-256 3.84 -> 8.86%: 1.94x slowdown
█
_cairo_surface_create_similar_solid() creates a fresh pattern to wrap
color, however sometimes the caller already has that pattern available.
In those circumstances we can pass the pattern as well as the color and
avoid the extra allocation.
For opaque surfaces the backends may use simpler code paths - for
example, the xlib backend may be able to use the Core protocol rather
than Render. So we only generate a surface with an alpha component if
the color is not opaque.
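The content selection can be sketched roughly as follows. The enum and function names are hypothetical, and treating an alpha of exactly 1.0 as opaque is an assumption of this sketch.

```c
/* Hypothetical content kinds for this sketch. */
typedef enum {
    SKETCH_CONTENT_COLOR,        /* colour only, no alpha channel */
    SKETCH_CONTENT_COLOR_ALPHA   /* colour plus alpha */
} sketch_content_t;

sketch_content_t
sketch_content_for_alpha (double alpha)
{
    /* Assumption: alpha lies in [0, 1] and 1.0 means fully opaque. */
    return alpha >= 1.0 ? SKETCH_CONTENT_COLOR
                        : SKETCH_CONTENT_COLOR_ALPHA;
}
```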
The so-attributed-to-X-server bug was that cairo maps the drawing
region to the pattern space, rounds the box, and uploads only that
part of the source surface to the X server. Well, this only works for
NEAREST filter as any more sophisticated filter needs to sneak a peek
at the neighboring pixels around the edges too.
The right fix involves taking into account the filter used, and the
pattern matrix, but for most cases, a single pixel should be enough.
Not sure about scaling down...
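As an illustration of the one-pixel workaround, the sketch below grows the region to upload by a single pixel on each side unless the filter is NEAREST. The types and names are hypothetical, and clamping the padded region to the surface bounds is an assumption of this sketch.

```c
/* Hypothetical integer rectangle: origin plus size. */
typedef struct {
    int x, y, width, height;
} sketch_upload_rect_t;

typedef enum {
    SKETCH_FILTER_NEAREST,
    SKETCH_FILTER_BILINEAR
} sketch_filter_t;

sketch_upload_rect_t
sketch_pad_for_filter (sketch_upload_rect_t r, sketch_filter_t filter,
                       int surface_width, int surface_height)
{
    /* NEAREST reads only the addressed pixel; other filters also
     * sample neighbours, so include one extra pixel on every side. */
    int pad = (filter == SKETCH_FILTER_NEAREST) ? 0 : 1;
    int x1 = r.x - pad;
    int y1 = r.y - pad;
    int x2 = r.x + r.width + pad;
    int y2 = r.y + r.height + pad;
    sketch_upload_rect_t out;

    /* Keep the padded region inside the source surface (assumption). */
    if (x1 < 0) x1 = 0;
    if (y1 < 0) y1 = 0;
    if (x2 > surface_width) x2 = surface_width;
    if (y2 > surface_height) y2 = surface_height;

    out.x = x1;
    out.y = y1;
    out.width = x2 - x1;
    out.height = y2 - y1;
    return out;
}
```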
Anyway, this is just a workaround to get 1.4.4 out of the door. I'll
commit a proper fix soon.
Confusion had been introduced as to who provided the fixup after
the malloc failed, which resulted in a NULL dereference whilst checking for
an erroneous pattern in _cairo_pattern_create_in_error.
Previously, the convention was that static ones started with cairo_, but
were renamed to start with _cairo_ when they were needed from other files and
became cairo_private instead of static...
This is error-prone indeed, and two symbols were already violating the
convention. Now all nil objects start with _cairo_.
Frequently cairo_set_source_rgb[a]() is used to replace the current
solid-pattern source with a new one of a different colour. The current
pattern is very likely to be unshared and unmodified and so it is likely
just to be immediately freed [or rather simply moved to the recently freed
cache]. However, as the last active pattern it is likely to be cache-warm and
suitable to satisfy the forthcoming allocation. So by setting the current
pattern to 'none' we can move the pattern to the freed list before we
create the new pattern and hopefully immediately reuse it.
Unfortunately one cannot cache live patterns and return a fresh reference
instead of creating new ones as patterns can be modified by the user and
so cannot be transparently shared between different users. However,
solid colour allocation is still a frequent operation, so we maintain a
small cache of recently freed patterns to reduce the malloc pressure.
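A cache of recently freed objects can be sketched as a small stack of pointers that the allocator consults before calling malloc(). The names and the pool size below are hypothetical, and the sketch is not thread-safe.

```c
#include <stdlib.h>

#define SKETCH_POOL_SIZE 4

/* Hypothetical solid pattern: just an RGBA colour. */
typedef struct {
    double red, green, blue, alpha;
} sketch_solid_t;

static sketch_solid_t *sketch_pool[SKETCH_POOL_SIZE];
static int sketch_pool_count;

sketch_solid_t *
sketch_solid_create (double red, double green, double blue, double alpha)
{
    sketch_solid_t *solid;

    /* Prefer a recently freed object; it is likely still cache-warm. */
    if (sketch_pool_count > 0)
        solid = sketch_pool[--sketch_pool_count];
    else
        solid = malloc (sizeof (sketch_solid_t));

    if (solid == NULL)
        return NULL;

    solid->red = red;
    solid->green = green;
    solid->blue = blue;
    solid->alpha = alpha;
    return solid;
}

void
sketch_solid_destroy (sketch_solid_t *solid)
{
    if (solid == NULL)
        return;

    /* Park the object for reuse; only free it when the pool is full. */
    if (sketch_pool_count < SKETCH_POOL_SIZE)
        sketch_pool[sketch_pool_count++] = solid;
    else
        free (solid);
}
```

Destroying the old pattern before creating its replacement is what lets the new allocation pick up the block that was just released.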
We use a small cache of size 16 for patterns created from solid colors,
e.g. cairo_set_source_rgb(). This helps with toolkits that draw many
widgets using the same colour scheme.
The cache uses a static index variable, which itself acts like a cache
of size 1, remembering the most recently used colour. So repeated
lookups for the same colour hit immediately. If that fails, the cache
is searched linearly, and if that fails too, a new pattern is created
and a random member of the cache is evicted.
All mutex declarations have been moved to cairo-mutex-list.h.
This should avoid breaking less frequently tested backends
when mutexes are introduced or when existing mutexes are renamed.
Instead of initializing mutexes on library startup, mutexes are
lazily initialized within the few current entry points by calling
CAIRO_MUTEX_INITIALIZE(). Currently only the OS/2 backend takes
care of releasing global mutexes. Therefore there is no counterpart
of that macro for finalizing all global mutexes yet, but
as cairo-backend-os2.c shows, such a function would be quite
easy to implement.
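One way to keep every mutex in a single list and initialize them lazily is an X-macro, sketched below. This is an assumption about the general technique, not a copy of cairo-mutex-list.h: the mutex type is a stand-in, and the names sketch_mutex_t, SKETCH_MUTEX_LIST and sketch_mutex_initialize() are hypothetical. The unsynchronized "ready" flag is also a simplification; real code would need a platform-specific way to make the first initialization safe.

```c
/* Stand-in for a platform mutex type (assumption). */
typedef struct {
    int initialized;
} sketch_mutex_t;

/* The single list naming every global mutex. Adding or renaming a
 * mutex only requires touching this list. */
#define SKETCH_MUTEX_LIST(X) \
    X (font_map)             \
    X (pattern_cache)

/* Expand the list once to define the mutex objects. */
#define SKETCH_DEFINE_MUTEX(name) static sketch_mutex_t sketch_##name##_mutex;
SKETCH_MUTEX_LIST (SKETCH_DEFINE_MUTEX)
#undef SKETCH_DEFINE_MUTEX

static int sketch_mutexes_ready;

/* Expand the list again to initialize every mutex, at most once. */
void
sketch_mutex_initialize (void)
{
    if (sketch_mutexes_ready)
        return;

#define SKETCH_INIT_MUTEX(name) sketch_##name##_mutex.initialized = 1;
    SKETCH_MUTEX_LIST (SKETCH_INIT_MUTEX)
#undef SKETCH_INIT_MUTEX

    sketch_mutexes_ready = 1;
}

/* A library entry point calls the initializer before using any mutex. */
int
sketch_entry_point (void)
{
    sketch_mutex_initialize ();
    return sketch_font_map_mutex.initialized &&
           sketch_pattern_cache_mutex.initialized;
}
```

A finalizer could be generated from the same list with a third expansion.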
This reverts the following commits:
2715f2098167e3b3c53b
See this thread for an analysis of the problems it caused:
http://lists.freedesktop.org/archives/cairo/2007-February/009825.html
In short, a single cache for all backends doesn't work, as one thread
using any backend can cause an unused xlib pattern to be evicted from
the cache, and trigger an xlib call while the display is being used
from another thread. Xlib is not prepared for this.
This was needed for SVG backend because it does not implement clone_similar.
However, I'm worried about possible infinite recursion here. Not sure what
to do.
We do this through a hack: we make
_cairo_pattern_acquire_surface return a surface that has four
copies of the original surface painted such that this image can
be simply repeated to get the effect of reflecting the original
surface.
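The four-copy trick can be illustrated on a plain byte image: build a tile twice as wide and twice as tall whose quadrants are the original and its mirror images, so that repeating the tile gives the reflected result. The function name and the one-byte-per-pixel layout are assumptions of this sketch.

```c
/* Fill dst (2*width by 2*height, one byte per pixel, row-major) with
 * the source image and its horizontal, vertical and diagonal mirrors.
 * Tiling dst with plain repetition then reproduces reflection of src. */
void
sketch_build_reflect_tile (const unsigned char *src, int width, int height,
                           unsigned char *dst)
{
    int x, y;

    for (y = 0; y < 2 * height; y++) {
        /* Rows in the lower half read the source bottom-up. */
        int sy = (y < height) ? y : 2 * height - 1 - y;

        for (x = 0; x < 2 * width; x++) {
            /* Columns in the right half read the source right-to-left. */
            int sx = (x < width) ? x : 2 * width - 1 - x;

            dst[y * 2 * width + x] = src[sy * width + sx];
        }
    }
}
```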
This fixes the formerly XFAIL test extend-reflect.
Previous commit broke cairo_surface_finish, since it was checking for
ref_count == CAIRO_REF_COUNT_INVALID and bailing. But, that condition
was reached from destroy, so finish was bailing out early.
user_data setters/getters were added to public refcounted objects
that were missing them (cairo_t, pattern, scaled_font). Also,
a refcount getter (cairo_*_get_reference_count) was added to all
public refcounted objects.
We use a small cache of size 16 for surfaces created for solid patterns.
This mainly helps with the X backends where we don't have to create a
pattern for every operation, so we save a lot on X traffic. Xft uses a
similar cache, so cairo's text rendering traffic with the xlib backend
now completely matches that of Xft.
The cache uses a static index variable, which itself acts like a cache of
size 1, remembering the most recently used solid pattern. So repeated
lookups for the same pattern hit immediately. If that fails, the cache is
searched linearly, and if that fails too, a new surface is created and a
random member of the cache is evicted.
Only surfaces that are "compatible" are used. The definition of compatible
is backend specific. For the xlib backend, it means that the two surfaces
are allocated on the same display. Compatibility checks are
provided for all backends where it makes sense.
Rotation and other transformations would cause extents to be
computed which were outside the bounds of the surface to be
cloned, (and for non-repeating patterns). Now we simply
restrict the computed extents to the surface extents.
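Restricting the computed extents to the surface amounts to a rectangle intersection. A minimal sketch, with hypothetical names and integer coordinates:

```c
/* Hypothetical integer extents: origin plus size. */
typedef struct {
    int x, y, width, height;
} sketch_extents_t;

/* Return the overlap of a and b, or a zero-sized rectangle when the
 * two do not overlap at all. */
sketch_extents_t
sketch_extents_intersect (sketch_extents_t a, sketch_extents_t b)
{
    int x1 = (a.x > b.x) ? a.x : b.x;
    int y1 = (a.y > b.y) ? a.y : b.y;
    int ax2 = a.x + a.width, bx2 = b.x + b.width;
    int ay2 = a.y + a.height, by2 = b.y + b.height;
    int x2 = (ax2 < bx2) ? ax2 : bx2;
    int y2 = (ay2 < by2) ? ay2 : by2;
    sketch_extents_t out;

    /* Collapse to empty instead of producing negative sizes. */
    if (x2 < x1) x2 = x1;
    if (y2 < y1) y2 = y1;

    out.x = x1;
    out.y = y1;
    out.width = x2 - x1;
    out.height = y2 - y1;
    return out;
}
```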
This fixes the xlib failure of the recent rotate-image-surface-paint
test, (the apparently distinct ps failure remains).
It turns out that all of the callers want a box anyway, so this
simplifies the code in addition to being more honest to the name.
(For those new to the convention, a "box" is an (x1,y1),(x2,y2)
pair while a "rectangle" is an (x,y),(width,height) pair.)
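The two conventions convert into each other as sketched below. The type names are hypothetical and plain integers are used for simplicity.

```c
/* A "rectangle": origin plus size. */
typedef struct {
    int x, y, width, height;
} sketch_rectangle_t;

/* A "box": two opposite corners. */
typedef struct {
    int x1, y1, x2, y2;
} sketch_box_t;

sketch_box_t
sketch_box_from_rectangle (sketch_rectangle_t r)
{
    sketch_box_t box;

    box.x1 = r.x;
    box.y1 = r.y;
    box.x2 = r.x + r.width;
    box.y2 = r.y + r.height;
    return box;
}

sketch_rectangle_t
sketch_rectangle_from_box (sketch_box_t b)
{
    sketch_rectangle_t rect;

    rect.x = b.x1;
    rect.y = b.y1;
    rect.width = b.x2 - b.x1;
    rect.height = b.y2 - b.y1;
    return rect;
}
```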
This broke with the clone_similar optimization in
8d7a02ed58. The optimization added an
interest rectangle to clone_similar, but with a repeating source
pattern, the interest rectangle might not intersect the extents of the
surface at all.
The test suite caught this with the trap-clip case.
The fix here is to clone the entire surface if the pattern has an
extend mode of REPEAT.
This broke with the clone_similar optimization in
8d7a02ed58
The optimization added an interest rectangle to clone_similar,
but the acquire_surface path was neglecting to transform its
rectangle by the pattern matrix.
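Transforming a rectangle by an affine matrix generally means transforming its four corners and taking their bounding box. A sketch under that assumption, with hypothetical type names and a matrix laid out like cairo_matrix_t (xx, yx, xy, yy, x0, y0):

```c
/* Affine matrix: x' = xx*x + xy*y + x0, y' = yx*x + yy*y + y0. */
typedef struct {
    double xx, yx, xy, yy, x0, y0;
} sketch_matrix_t;

/* Axis-aligned box given by two opposite corners. */
typedef struct {
    double x1, y1, x2, y2;
} sketch_dbox_t;

sketch_dbox_t
sketch_transform_bounding_box (const sketch_matrix_t *m, sketch_dbox_t b)
{
    double xs[4], ys[4];
    sketch_dbox_t out = { 0.0, 0.0, 0.0, 0.0 };
    int i;

    /* The four corners of the input box. */
    xs[0] = b.x1; ys[0] = b.y1;
    xs[1] = b.x2; ys[1] = b.y1;
    xs[2] = b.x1; ys[2] = b.y2;
    xs[3] = b.x2; ys[3] = b.y2;

    for (i = 0; i < 4; i++) {
        double x = m->xx * xs[i] + m->xy * ys[i] + m->x0;
        double y = m->yx * xs[i] + m->yy * ys[i] + m->y0;

        if (i == 0) {
            /* Seed the bounds with the first transformed corner. */
            out.x1 = out.x2 = x;
            out.y1 = out.y2 = y;
            continue;
        }

        /* Grow the bounds to cover every further corner. */
        if (x < out.x1) out.x1 = x;
        if (x > out.x2) out.x2 = x;
        if (y < out.y1) out.y1 = y;
        if (y > out.y2) out.y2 = y;
    }

    return out;
}
```

With a rotation the result is larger than the original rectangle, which is why an untransformed interest rectangle can miss pixels that are actually needed.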
The test suite did catch this, but apparently we were too
distracted by the performance improvements to notice. Only
backends other than image that implemented clone_similar
would be affected by the bug, (which meant I only saw xlib
failures in my testing).
This fixes bug #8711
This fixes a huge performance bug (entire image was being pushed to X
server in order to copy a tiny piece of it). I see up to 50x improvement
from subimage_copy (which was designed to expose this problem) but also
a 5x improvement in some text performance cases.
xlib-rgba subimage_copy-512 3.93 2.46% -> 0.07 2.71%: 52.91x faster
███████████████████████████████████████████████████▉
xlib-rgb subimage_copy-512 4.03 1.97% -> 0.09 2.61%: 44.74x faster
███████████████████████████████████████████▊
xlib-rgba subimage_copy-256 1.02 2.25% -> 0.07 0.56%: 14.42x faster
█████████████▍
xlib-rgba text_image_rgb_over-256 63.21 1.53% -> 11.87 2.17%: 5.33x faster
████▍
xlib-rgba text_image_rgba_over-256 62.31 0.72% -> 11.87 2.82%: 5.25x faster
████▎
xlib-rgba text_image_rgba_source-256 67.97 0.85% -> 16.48 2.23%: 4.13x faster
███▏
xlib-rgba text_image_rgb_source-256 68.82 0.55% -> 16.93 2.10%: 4.07x faster
███▏
xlib-rgba subimage_copy-128 0.19 1.72% -> 0.06 0.85%: 3.10x faster
██▏