Due to the nature of the reference counting, an X resource may be
destroyed later than anticipated and possibly from a different thread
than the original context. This becomes an issue for applications that
carefully manage their single X connection from a single thread without
locking, and are then surprised when cairo triggers X traffic while
performing work for a different part of the application in another thread.
Previously, we stored the per-display attributes inside a special
screen=NULL _cairo_xlib_screen_info_t. Now we keep track of known X
displays and store the screen information beneath the display structure
alongside the per-display hooks.
_cairo_pattern_acquire_surfaces() may substitute an image surface for
either the source or the mask if the backend does not support creating
similar scratch surfaces or if an error occurs during creation. For
composition we require xlib surfaces, so we must trigger the
fallback path if this happens.
If we create the Pixmap whilst constructing a similar xlib surface, then
it is our responsibility to free the Pixmap should we fail to allocate the
surface.
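The error path can be sketched as follows. create_pixmap() and free_pixmap() are hypothetical stand-ins for XCreatePixmap()/XFreePixmap() that just count live pixmaps, and simulate_alloc_failure forces the surface allocation to fail; none of this is cairo's actual code.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for XCreatePixmap/XFreePixmap that count live pixmaps. */
static int live_pixmaps;
static unsigned long create_pixmap (void) { return (unsigned long) ++live_pixmaps; }
static void free_pixmap (unsigned long pixmap) { (void) pixmap; live_pixmaps--; }

typedef struct { unsigned long drawable; } surface_t;  /* hypothetical */

static surface_t *create_similar (int simulate_alloc_failure)
{
    unsigned long pixmap = create_pixmap ();
    surface_t *surface = simulate_alloc_failure ? NULL : malloc (sizeof *surface);

    if (surface == NULL) {
        /* We created the Pixmap, so we must release it on failure. */
        free_pixmap (pixmap);
        return NULL;
    }
    surface->drawable = pixmap;
    return surface;
}
```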
This allows the surface acquired from the pattern to have the same
content as the pattern. In particular, in a case such as cairo_paint_with_alpha
we can now acquire an A8 mask surface instead of an ARGB32 mask
surface which can be rendered much more efficiently. This results
in a 4x speedup when using the OVER operator with the recently
added paint-with-alpha test:
Speedups
========
image-rgb paint-with-alpha_image_rgb_over-256 2.25 -> 0.60: 4.45x speedup
███▌
It does slow down the same test when using the SOURCE operator, but
I don't think we care. Performing SOURCE with a mask is already a very
slow operation (it hits compositeGeneral), so the slowdown here is
likely from having to convert from A8 back to ARGB32 before the
generalized compositing. So if someone cares about this slowdown
(though SOURCE with cairo_paint_with_alpha doesn't seem extremely
useful), they will probably be motivated enough to contribute a
customized compositing function to replace compositeGeneral, in which
case this slowdown should go away:
image-rgba paint-with-alpha_image_rgb_source-256 3.84 -> 8.86%: 1.94x slowdown
█
Toolkits like GTK+ almost always set a simple rectangular clip mask before
any cairo operation, so avoid the allocation for this simple case by
embedding a small number of XRectangles into the surface structure.
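A minimal sketch of the embedded-array technique, assuming a hypothetical rect_t in place of XRectangle (so it compiles without Xlib) and an arbitrary embedded capacity of 4; cairo's real field names and size may differ. Small clip lists use the inline storage, and only larger ones touch the heap.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for XRectangle. */
typedef struct { short x, y; unsigned short width, height; } rect_t;

#define N_EMBEDDED_RECTS 4  /* assumed size, not taken from cairo */

typedef struct {
    rect_t  embedded[N_EMBEDDED_RECTS]; /* covers the common single-rect clip */
    rect_t *clip_rects;                 /* points at embedded[] or the heap */
    int     num_clip_rects;
} surface_clip_t;

static void clip_init (surface_clip_t *clip)
{
    clip->clip_rects = clip->embedded;
    clip->num_clip_rects = 0;
}

static void clip_fini (surface_clip_t *clip)
{
    if (clip->clip_rects != clip->embedded)
        free (clip->clip_rects);
    clip_init (clip);
}

/* Returns 0 on success, -1 on allocation failure (old clip left intact). */
static int clip_set (surface_clip_t *clip, const rect_t *rects, int n)
{
    rect_t *dst = clip->embedded;

    if (n > N_EMBEDDED_RECTS) {
        dst = malloc ((size_t) n * sizeof (rect_t));
        if (dst == NULL)
            return -1;
    }
    if (clip->clip_rects != clip->embedded)
        free (clip->clip_rects);        /* drop any previous heap array */
    if (n > 0)
        memcpy (dst, rects, (size_t) n * sizeof (rect_t));
    clip->clip_rects = dst;
    clip->num_clip_rects = n;
    return 0;
}
```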
Due to the interaction between multiple threads showing glyphs and
asynchronous CloseDisplays, it is possible for a font to maintain a
cairo_xlib_screen_info_t beyond the CloseDisplay. The simple solution
is to add a reference count in order to track the lifetime of the
cairo_xlib_screen_info_t correctly.
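The reference-counting idea can be sketched with a hypothetical, heavily simplified screen_info_t. This sketch is not thread-safe: a real implementation would protect the count with a mutex or atomic operations. The display holds one reference and each font holds another, so the record survives CloseDisplay for as long as a font still uses it.

```c
#include <stdlib.h>

/* Hypothetical, simplified screen-info record. */
typedef struct {
    int ref_count;
} screen_info_t;

static screen_info_t *screen_info_create (void)
{
    screen_info_t *info = calloc (1, sizeof *info);
    if (info != NULL)
        info->ref_count = 1;
    return info;
}

static screen_info_t *screen_info_reference (screen_info_t *info)
{
    if (info != NULL)
        info->ref_count++;
    return info;
}

/* Returns 1 if this call freed the record, 0 otherwise. */
static int screen_info_destroy (screen_info_t *info)
{
    if (info == NULL || --info->ref_count > 0)
        return 0;
    free (info);
    return 1;
}
```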
These were found during a cairo_static pass on an alternative branch.
A critical one in particular was setting the "have added glyph" flag to
TRUE even if _cairo_xlib_surface_add_glyph() fails. This can cause an
application crash due to a RenderBadGlyph error later when the scaled
font is cleaned up and we attempt to remove the glyph.
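The corrected ordering is small but easy to get wrong, so here is a sketch. upload_glyph() is a hypothetical stand-in for _cairo_xlib_surface_add_glyph() whose fail argument simulates an error, and glyph_private_t is likewise hypothetical. The flag is set only after the upload succeeds, so a later cleanup never asks the server to remove a glyph it never received.

```c
#include <stdbool.h>

typedef enum { STATUS_SUCCESS = 0, STATUS_NO_MEMORY } status_t;

/* Hypothetical stand-in for _cairo_xlib_surface_add_glyph(). */
static status_t upload_glyph (bool fail)
{
    return fail ? STATUS_NO_MEMORY : STATUS_SUCCESS;
}

typedef struct { bool have_added_glyph; } glyph_private_t;  /* hypothetical */

static status_t ensure_glyph (glyph_private_t *priv, bool fail)
{
    status_t status;

    if (priv->have_added_glyph)
        return STATUS_SUCCESS;

    status = upload_glyph (fail);
    if (status != STATUS_SUCCESS)
        return status;          /* leave the flag FALSE on failure */

    priv->have_added_glyph = true;
    return STATUS_SUCCESS;
}
```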
_cairo_pattern_release_surface() asserts that it is passed a pattern
surface. This is itself bad, as it breaks the symmetry with
_cairo_pattern_acquire_surface() under error conditions; for now, however,
reorder the cleanup to avoid triggering this assertion.
Detect when a substitute image surface is returned for a solid pattern,
and avoid mixed image/xlib composite operations. This can happen, for example,
if there is a resource allocation failure while creating a similar surface.
The original test for 'if (surface->visual)' dates back to a very old
assumption that if the xlib surface was created with an XRenderFormat,
then the surface->visual field would be set to NULL. This assumption
was broken years ago with the following commit:
0c05b23b31
This fixes the crash reported here:
BadMatch when running gnome-terminal with the murrine-0.51 gtk engine
https://bugs.freedesktop.org/show_bug.cgi?id=10250
Use the new hook functions to register a callback for xlib to clear
the private glyph data when the display is closed. In order to do this
we need to reset the glyph cache inside the generic scaled font as well.
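A close-display hook list can be sketched as below. display_t, display_add_close_hook(), display_close() and reset_glyph_cache() are all hypothetical names chosen for illustration, not cairo's API. Each registered callback runs once when the display closes, which is where the private glyph data and the scaled font's glyph cache would be reset.

```c
#include <stdlib.h>

typedef void (*close_hook_func_t) (void *data);

typedef struct close_hook {
    close_hook_func_t  func;
    void              *data;
    struct close_hook *next;
} close_hook_t;

typedef struct { close_hook_t *hooks; } display_t;  /* hypothetical */

/* Returns 1 on success, 0 on allocation failure. */
static int display_add_close_hook (display_t *display,
                                   close_hook_func_t func, void *data)
{
    close_hook_t *hook = malloc (sizeof *hook);
    if (hook == NULL)
        return 0;
    hook->func = func;
    hook->data = data;
    hook->next = display->hooks;
    display->hooks = hook;
    return 1;
}

/* Runs and frees every hook; called when the X display is closed. */
static void display_close (display_t *display)
{
    while (display->hooks != NULL) {
        close_hook_t *hook = display->hooks;
        display->hooks = hook->next;
        hook->func (hook->data);
        free (hook);
    }
}

/* Example hook: pretend to drop a cache by zeroing a glyph count. */
static void reset_glyph_cache (void *data)
{
    *(int *) data = 0;
}
```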
This reverts the following commits:
2715f20981
67e3b3c53b
See this thread for an analysis of the problems it caused:
http://lists.freedesktop.org/archives/cairo/2007-February/009825.html
In short, a single cache for all backends doesn't work, as one thread
using any backend can cause an unused xlib pattern to be evicted from
the cache, and trigger an xlib call while the display is being used
from another thread. Xlib is not prepared for this.
Two drawables can be used together in an X or Render operation only if they
share the same screen. Previously we were only checking for the same display
in is_compatible. Check for the same screen now.
We use a small cache of size 16 for surfaces created for solid patterns.
This mainly helps with the X backends where we don't have to create a
pattern for every operation, so we save a lot on X traffic. Xft uses a
similar cache, so cairo's text rendering traffic with the xlib backend
now completely matches that of Xft.
The cache uses a static index variable, which itself acts like a cache of
size 1, remembering the most recently used solid pattern. So repeated
lookups for the same pattern hit immediately. If that fails, the cache is
searched linearly, and if that fails too, a new surface is created and a
random member of the cache is evicted.
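The lookup order above (last-used slot, then linear search, then create and evict) can be sketched like this. Integer IDs stand in for the display (the "compatible" test) and for the cached surface, and the sketch fills empty slots before evicting at random; all of those choices, and every name, are assumptions for illustration. A real cache would also destroy the evicted surface.

```c
#include <stdlib.h>

#define SOLID_CACHE_SIZE 16

typedef struct { double r, g, b, a; } color_t;

/* Hypothetical cache entry. */
typedef struct {
    color_t color;
    int     display_id;   /* stand-in for the "compatible" check */
    int     surface_id;   /* stand-in for the cached surface */
    int     valid;
} solid_entry_t;

static solid_entry_t cache[SOLID_CACHE_SIZE];
static int last_index;              /* acts like a cache of size 1 */
static int next_surface_id = 1;

static int entry_matches (const solid_entry_t *e, color_t c, int display_id)
{
    return e->valid && e->display_id == display_id &&
           e->color.r == c.r && e->color.g == c.g &&
           e->color.b == c.b && e->color.a == c.a;
}

static int solid_cache_lookup (color_t c, int display_id)
{
    int i;

    /* 1. Most recently used entry. */
    if (entry_matches (&cache[last_index], c, display_id))
        return cache[last_index].surface_id;

    /* 2. Linear search. */
    for (i = 0; i < SOLID_CACHE_SIZE; i++) {
        if (entry_matches (&cache[i], c, display_id)) {
            last_index = i;
            return cache[i].surface_id;
        }
    }

    /* 3. Miss: use an empty slot if any, else evict a random one. */
    for (i = 0; i < SOLID_CACHE_SIZE && cache[i].valid; i++)
        ;
    if (i == SOLID_CACHE_SIZE)
        i = rand () % SOLID_CACHE_SIZE;

    cache[i].color = c;
    cache[i].display_id = display_id;
    cache[i].surface_id = next_surface_id++;
    cache[i].valid = 1;
    last_index = i;
    return cache[i].surface_id;
}
```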
Only surfaces that are "compatible" are used. The definition of compatible
is backend specific. For the xlib backend, it means that the two surfaces
are allocated on the same display. Compatibility implementations are
provided for all backends where it makes sense.
The old implementation was a very naive one that generated one XRender
glyph element per glyph; that is, it positioned each glyph individually.
This was raised here:
http://lists.freedesktop.org/archives/cairo/2006-December/008835.html
The new implementation is a loose rewrite of the Xft logic, that is,
it compresses glyphs with "natural" advances into elements, but with various
optimizations and improvements.
In short, it works like this: the code loops over the glyphs, skipping those
that are not wanted and computing each glyph's offset from the "current
position". Whenever a glyph has a non-zero offset from the current position,
a new element is started. All of these are used to compute the request size
in the Render protocol. Whenever the request size may exceed the maximum
request size, and at the end, the glyphs are flushed. For this to work, we now
set non-zero glyph advances when sending glyphs to the server.
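The element-splitting rule can be sketched in isolation. The glyph_t and advance_t types are hypothetical, positions are rounded to whole device pixels, and the request-size flushing described above is left out to keep the sketch short. A glyph that lands exactly where the previous glyph's advance left the pen joins the current element; any other glyph starts a new one.

```c
/* Hypothetical glyph with a device-space position. */
typedef struct {
    unsigned long index;
    double x, y;
} glyph_t;

/* Hypothetical per-glyph advance, in whole device pixels. */
typedef struct { int x_advance, y_advance; } advance_t;

static int round_to_int (double v)
{
    return (int) (v < 0 ? v - 0.5 : v + 0.5);
}

/* Counts how many glyph elements a run of n glyphs needs. */
static int count_glyph_elements (const glyph_t *glyphs,
                                 const advance_t *adv, int n)
{
    int i, elements = 0;
    int pen_x = 0, pen_y = 0;   /* the "current position" */

    for (i = 0; i < n; i++) {
        int x = round_to_int (glyphs[i].x);
        int y = round_to_int (glyphs[i].y);

        /* Non-zero offset from the current position: start a new element. */
        if (i == 0 || x != pen_x || y != pen_y)
            elements++;

        /* The server moves the pen by the glyph's advance, so a glyph at
         * its natural position needs no explicit offset. */
        pen_x = x + adv[i].x_advance;
        pen_y = y + adv[i].y_advance;
    }
    return elements;
}
```

For example, four glyphs with 10-pixel advances at x = 0, 10, 20 and 50 need two elements: the first three share one, and the gap before the fourth forces a second.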
Notable optimizations and improvements include:
- Reusing the input glyph array (with double glyph positions) as a working
array to compute glyph offsets.
- Reusing the input glyph array as the output glyph-index array to be passed
to XRender.
- Marking glyphs to be skipped as such, avoiding a copy of the glyph array,
which is what the old code was doing.
- Skipping glyphs with "out-of-range" positions, that is, those with positions
that would cause an overflow in XRender's glyph offset calculations.
On my Fedora desktop (a Pentium 4) and on a Nokia 770, it shows a 6% speedup
on the timetext test.
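The "mark instead of copy" and "skip out-of-range" items can be combined in one small sketch. It assumes that element offsets are limited to signed 16-bit values, and it marks skipped glyphs with a hypothetical sentinel index (GLYPH_SKIP); cairo's actual range test and marking scheme may differ.

```c
#include <limits.h>

typedef struct { unsigned long index; double x, y; } glyph_t;  /* hypothetical */

#define GLYPH_SKIP ((unsigned long) -1)  /* hypothetical "skip me" marker */

/* Assumption: positions must fit in a signed 16-bit offset. */
static int fits_int16 (double v)
{
    return v >= SHRT_MIN && v <= SHRT_MAX;
}

/* Marks out-of-range glyphs in place rather than copying the array;
 * returns the number of glyphs still to be drawn. */
static int mark_out_of_range_glyphs (glyph_t *glyphs, int n)
{
    int i, kept = 0;

    for (i = 0; i < n; i++) {
        if (fits_int16 (glyphs[i].x) && fits_int16 (glyphs[i].y))
            kept++;
        else
            glyphs[i].index = GLYPH_SKIP;
    }
    return kept;
}
```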
The rule is: cairo_glyph_t* is always passed as const for measurement
purposes. This was not reflected in our public API previously. Fixed.
Showing glyphs used to have cairo_glyph_t* always as const. With this
change, it is only const on cairo_t and cairo_gstate_t operations.
cairo_surface_t, cairo_scaled_font_t, and individual backends receive
cairo_glyph_t* as non-const. The desired semantics is that they may modify
the contents of the array as long as they do not return
CAIRO_STATUS_UNSUPPORTED. This makes it possible to avoid copying the glyph
array again and again, and edit it in-place. Backends are in fact free to use
the array as a generic buffer as they see fit.
This fixes a huge performance bug (the entire image was being pushed to
the X server in order to copy a tiny piece of it). I see up to 50x improvement
from subimage_copy (which was designed to expose this problem) but also
a 5x improvement in some text performance cases.
xlib-rgba subimage_copy-512 3.93 2.46% -> 0.07 2.71%: 52.91x faster
███████████████████████████████████████████████████▉
xlib-rgb subimage_copy-512 4.03 1.97% -> 0.09 2.61%: 44.74x faster
███████████████████████████████████████████▊
xlib-rgba subimage_copy-256 1.02 2.25% -> 0.07 0.56%: 14.42x faster
█████████████▍
xlib-rgba text_image_rgb_over-256 63.21 1.53% -> 11.87 2.17%: 5.33x faster
████▍
xlib-rgba text_image_rgba_over-256 62.31 0.72% -> 11.87 2.82%: 5.25x faster
████▎
xlib-rgba text_image_rgba_source-256 67.97 0.85% -> 16.48 2.23%: 4.13x faster
███▏
xlib-rgba text_image_rgb_source-256 68.82 0.55% -> 16.93 2.10%: 4.07x faster
███▏
xlib-rgba subimage_copy-128 0.19 1.72% -> 0.06 0.85%: 3.10x faster
██▏
Basically, it's evil to write a loop like:
while ((c -= 4) > 0) {
...
}
For one, it doesn't work if c is unsigned. And when c is signed, if for
some reason c is about -MAXINT, then c -= 4 will overflow and the loop will
not work as expected.
It's much safer (and more gcc warning friendly) to rewrite it as:
unsigned int c;
while (c >= 4) {
...
c -= 4;
}
Behdad chased this bug down when looking into bug #7593. This
bug is what finally motivated us to figure out how to get -Wextra
(for the "always true" comparisons of unsigned variables against
negative values).
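A complete, runnable version of the safe loop shape might look like the following. sum_bytes() is a hypothetical function that consumes a buffer four bytes at a time. Because the loop tests c >= 4 before subtracting, c can never wrap around, and the 0 to 3 leftover bytes are handled by a separate tail loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: sum a byte buffer, four bytes per iteration. */
static uint32_t sum_bytes (const uint8_t *p, size_t c)
{
    uint32_t sum = 0;

    while (c >= 4) {        /* test before subtracting: no unsigned wrap */
        sum += p[0] + p[1] + p[2] + p[3];
        p += 4;
        c -= 4;
    }
    while (c > 0) {         /* 0..3 leftover bytes */
        sum += *p++;
        c--;
    }
    return sum;
}
```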
This approach to fixing the bug is valid since there is code in pixman
for rendering to BGR images (which is why cairo 1.0 worked with BGR X
servers, for example). But since we don't want to advertise additional
image formats, we implement this through a new cairo_internal_format_t.
This is rather fragile, since we don't want to leak any internal formats,
nor do we ever want an internal format to be used where a real
format is expected (and trigger a CAIRO_FORMAT_VALID assertion failure).
More comments than code are added here to help compensate for the
fragility and to give some guidance in fixing this mess in a better way
in the future.
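One way to keep internal formats separate is sketched below, under the assumption that internal values are placed far above the public range so that a range check in the spirit of CAIRO_FORMAT_VALID rejects them. The enum names, the 0x1000 base, and format_to_public() are illustrative, not cairo's definitions.

```c
/* Public formats (shape mirrors cairo_format_t; values assumed). */
typedef enum {
    FORMAT_ARGB32,
    FORMAT_RGB24,
    FORMAT_A8,
    FORMAT_A1
} format_t;

/* Internal-only formats, deliberately far above the public range. */
typedef enum {
    INTERNAL_FORMAT_ABGR = 0x1000,
    INTERNAL_FORMAT_BGR
} internal_format_t;

/* Range check in the spirit of CAIRO_FORMAT_VALID. */
#define FORMAT_VALID(f) ((int) (f) >= FORMAT_ARGB32 && (int) (f) <= FORMAT_A1)

/* Any path that reports a format to the user should go through a guard
 * like this, so an internal value never leaks; returns -1 for internal. */
static int format_to_public (int f)
{
    return FORMAT_VALID (f) ? f : -1;
}
```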
There appears to be a bug in some X servers which is triggered by
rendering 1-bit glyphs with zero size via the functions
XRenderAddGlyphs and XRenderCompositeText8 (and likely its variants).
We avoid this bug by making a copy of the glyphs array which does not
include any of the size-zero glyphs so that the X server never sees them.
This is an attempt to fix the following bug:
http://bugzilla.gnome.org/show_bug.cgi?id=332266
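The workaround amounts to a filtered copy, sketched here with hypothetical types: glyph_extents_t is a parallel array of per-glyph sizes, and copy_nonempty_glyphs() is an illustrative name. Zero-width or zero-height glyphs are dropped from the copy, so the X server never sees them.

```c
#include <stdlib.h>

typedef struct { unsigned long index; double x, y; } glyph_t;  /* hypothetical */
typedef struct { int width, height; } glyph_extents_t;          /* hypothetical */

/* Copies every glyph except zero-size ones into a new array stored in *out
 * (caller frees it). Returns the number copied, or -1 on allocation failure. */
static int copy_nonempty_glyphs (const glyph_t *glyphs,
                                 const glyph_extents_t *extents,
                                 int n, glyph_t **out)
{
    int i, kept = 0;
    glyph_t *copy = malloc ((size_t) (n > 0 ? n : 1) * sizeof (glyph_t));

    if (copy == NULL)
        return -1;
    for (i = 0; i < n; i++) {
        if (extents[i].width == 0 || extents[i].height == 0)
            continue;           /* the X server never sees this glyph */
        copy[kept++] = glyphs[i];
    }
    *out = copy;
    return kept;
}
```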
With the recent rewrite of the device-offset code, which pushed things
from the gstate to the surface layer, the 16-bit limitations on coordinates,
which previously applied only to device space, have lately been applying to
user space as well. This commit moves the device_transform back up above the
conversion from floating-point to fixed-point values so that once again the
limitation only applies to device space.
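Why the order matters can be shown with 16.16 fixed-point values, whose integer part only covers roughly -32768 to 32767. This sketch simplifies the device transform to an x offset, and fixed_from_double() and user_to_device_fixed() are hypothetical helpers; the conversion truncates, whereas real code may round. A user coordinate of 40000 cannot be converted directly, but it fits once a device offset of -39000 has been applied in floating point.

```c
#include <stdint.h>

typedef int32_t fixed_t;  /* 16.16 fixed point */

/* Stores the 16.16 value and returns 1 if v is representable, else 0. */
static int fixed_from_double (double v, fixed_t *out)
{
    if (v < -32768.0 || v >= 32768.0)
        return 0;
    *out = (fixed_t) (v * 65536.0);   /* truncates; real code may round */
    return 1;
}

/* Apply the device offset in floating point first, then convert, so only
 * device-space values are subject to the 16-bit limit. */
static int user_to_device_fixed (double user_x, double device_offset_x,
                                 fixed_t *out)
{
    return fixed_from_double (user_x + device_offset_x, out);
}
```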
When accessing the underlying drawable etc of an xlib surface, it is
also helpful to be able to get the width and height without a server
round trip. This patch provides those functions.