Within our code base we carried a few hacks to utilize the component
alpha capabilities of pixman, whilst not supporting the concept for our
own masks. Thus we were setting it upon the pixman_image_t that we
passed around through code that was blissfully unaware and indeed the
component-alpha property was forgotten (e.g. upgrading glyph masks).
The real issue is that without explicit support that a pattern carries
subpixel masking information, that information is lost when using that
pattern with composite. Again we can look at the example of compositing
a sub-pixel glyph mask onto a remote xlib surface for further failure.
Gah, that was a horrible mistake. It was a flawed hack to create Pixmaps
of the correct depth when cloning patterns for blitting to the xlib
backend. However, it had the nasty side-effect of discarding alpha when
targeting Window surfaces. The correct solution is to simply correct the
Pixmap of the desired depth and render a matching pattern onto the
surface - i.e. a reversal the current acquire -> clone. See the
forthcoming revised xcb backend on how I should have done it originally.
Gah! I had believed that the dst extents and the clip were correct to
enable unbounded fixup for the unbounded trapezoids. I was wrong, so I
need to requery the trapezoid extents. As this information is already
known, I should update the interface to pass along all relevant
information.
Project lines that exceed the 16.16 limits onto the XFixedLine, as we know
that top/bottom must fit within the confines of the surface and so will be
less than 16 bits.
Sadly this is a run-on patch that also does:
1. Make FillTiled the default for new GCs.
2. Stores extend mode as opposed to repeat, and thereby cleaning up some
inconsistent code.
3. Remove the special casing for unbounded trapezoids, as it is redundant
with the polygon clipping.
4. Tidy the logic for deciding when to use the core protocol
(_categorize_composite_operation)
Implement cheap calculation of glyph extents to see whether we can discard
the clip region. This is effective around 50% of the time for firefox (and
makes the xtrace so much neater).
Creating an widthxheight solid picture for using with
RenderCompositeTrapezoids defeats the optimization with the xserver that
checks for a solid alpha pattern. The checks it performs are for
CONTENT_ALPHA, Repeat, 1x1 and value == 0xff.
The issue Joonas was trying to solve was the unwanted inclusion of
the inlines via cairo-freelist-private.h. Unwittingly he included
cairoint.h from cairo-xlib-private.h instead, a far more heinous crime as
that causes the boilerplate to try to use the hidden, private symbols.
Instead we resolve this issue by making the cairo_xlib_display_t structure
private to cairo-xlib-display.c and provide functions to manipulate the
abstract data type. Whilst in the vicinity, we rename
cairo_xlib_screen_info_t to cairo_xlib_screen_t for consistency and
cleanliness.
This patch revises xlib so that it doesn't depend on having recent
Xrender headers to build. In particular, some definitions were added
to the private xrender header file, and an ifdef render version check
CAIRO_SURFACE_RENDER_SUPPORTS_OPERATOR was changed to a run-time
check using CAIRO_SURFACE_RENDER_HAS_PDF_OPERATORS.
Cairo should include the contents of subwindows when using a Window as a
source but will clip to subwindows when using a Window as a destination.
This can be set using the GC's subwindow_mode.
XCopyArea and XFillRectangle can however only use one GC for both source
and destination. Cairo's mode is set to (the default) ClipByChildren.
This means that copying from a Window is broken, so we only allow the
optimization when we know that the source is a Pixmap.
The performance impact of this change has not been tested. It should be
small, as the code will use XRender otherwise.
If it turns out to be a bigger impact, the optimizations could be
improved by doing a two-step copy process:
1) Copy to an intermediate Pixmap with IncludeInferiors
2) Copy to the destination with ClipByChildren
(potentially omitting one one of the steps if source or destination are
known to be Pixmaps).
references:
commit 0c5d28a4e5https://bugs.freedesktop.org/show_bug.cgi?id=12996
Behdad pointed out that fprintf() returns a value so that we could simply
use the comma operator to return the correct value instead of the
expression-block gcc-ism.
We can offload creation of gradients to server that support RENDER 0.10
and later. This greatly reduces the amount of traffic we need to send over
our display connection as the gradient patterns are much smaller than the
full image. Even if the server fallbacks to using pixman, performance
should be improved by the reduced transport overhead. Furthermore this is a
requisite to enable hardware accelerated gradients with the xlib backend.
Running cairo-perf-trace on tiny, Celeron/i915:
before: firefox-20090601 211.585
after: firefox-20090601 270.939
and on tiger, CoreDuo/nvidia:
before: firefox-20090601 70.143
after: firefox-20090601 87.326
where linear gradients are used extensively throughout the GTK+ theme.
Not quite the result I was expecting!
In particular, looking at tiny:
xlib-rgba paint-with-alpha_linear-rgba_over-512 47.11 (47.16 0.05%) -> 123.42 (123.72 0.13%): 2.62x slowdown
█▋
xlib-rgba paint-with-alpha_linear3-rgba_over-512 47.27 (47.32 0.04%) -> 123.78 (124.04 0.13%): 2.62x slowdown
█▋
xlib-rgba paint-with-alpha_linear-rgb_over-512 47.19 (47.21 0.02%) -> 123.37 (123.70 0.13%): 2.61x slowdown
█▋
xlib-rgba paint-with-alpha_linear3-rgb_over-512 47.30 (47.31 0.04%) -> 123.52 (123.62 0.09%): 2.61x slowdown
█▋
xlib-rgba paint_linear3-rgb_over-512 47.29 (47.32 0.05%) -> 118.95 (119.60 0.29%): 2.52x slowdown
█▌
xlib-rgba paint_linear-rgba_over-512 47.14 (47.17 0.06%) -> 116.76 (117.06 0.16%): 2.48x slowdown
█▌
xlib-rgba paint_linear3-rgba_over-512 47.32 (47.34 0.04%) -> 116.85 (116.98 0.05%): 2.47x slowdown
█▌
xlib-rgba paint_linear-rgb_over-512 47.15 (47.19 0.03%) -> 114.08 (114.55 0.20%): 2.42x slowdown
█▍
xlib-rgba paint-with-alpha_radial-rgb_over-512 117.25 (119.43 1.21%) -> 194.36 (194.73 0.09%): 1.66x slowdown
▋
xlib-rgba paint-with-alpha_radial-rgba_over-512 117.22 (117.26 0.02%) -> 193.81 (194.17 0.11%): 1.65x slowdown
▋
xlib-rgba paint_radial-rgba_over-512 117.23 (117.26 0.02%) -> 186.35 (186.41 0.03%): 1.59x slowdown
▋
xlib-rgba paint_radial-rgb_over-512 117.23 (117.27 0.02%) -> 184.14 (184.62 1.51%): 1.57x slowdown
▋
Before 1.10, we may choose to disable server-side gradients for the
current crop of Xorg servers, similar to the extended repeat modes.
[Updated by Chris Wilson. All bugs are his.]
The extended repeat modes were only introduced in RENDER 0.10, so disable
them if the server reports an earlier version. This is in addition to
disabling the repeat modes if we know (guess!) the server to have a buggy
implementation.
In a discussion on IRC, attention was drawn to a dubious comment in
_cairo_xlib_show_glyphs() - the precise details of which have passed
out of the collective memory.
Handling clip as part of the surface state, as opposed to being part of
the operation state, is cumbersome and a hindrance to providing true proxy
surface support. For example, the clip must be copied from the surface
onto the fallback image, but this was forgotten causing undue hassle in
each backend. Another example is the contortion the meta surface
endures to ensure the clip is correctly recorded. By contrast passing the
clip along with the operation is quite simple and enables us to write
generic handlers for providing surface wrappers. (And in the future, we
should be able to write more esoteric wrappers, e.g. automatic 2x FSAA,
trivially.)
In brief, instead of the surface automatically applying the clip before
calling the backend, the backend can call into a generic helper to apply
clipping. For raster surfaces, clip regions are handled automatically as
part of the composite interface. For vector surfaces, a clip helper is
introduced to replay and callback into an intersect_clip_path() function
as necessary.
Whilst this is not primarily a performance related change (the change
should just move the computation of the clip from the moment it is applied
by the user to the moment it is required by the backend), it is important
to track any potential regression:
ppc:
Speedups
========
image-rgba evolution-20090607-0 1026085.22 0.18% -> 672972.07 0.77%: 1.52x speedup
▌
image-rgba evolution-20090618-0 680579.98 0.12% -> 573237.66 0.16%: 1.19x speedup
▎
image-rgba swfdec-fill-rate-4xaa-0 460296.92 0.36% -> 407464.63 0.42%: 1.13x speedup
▏
image-rgba swfdec-fill-rate-2xaa-0 128431.95 0.47% -> 115051.86 0.42%: 1.12x speedup
▏
Slowdowns
=========
image-rgba firefox-periodic-table-0 56837.61 0.78% -> 66055.17 3.20%: 1.09x slowdown
▏
Before attempting to even set the attributes on the source Picture, we
ensure that it exists. So remove the redundant safe-guards to do nothing
if it doesn't exist.
We always query an xrender_format for a Visual upon surface creation, so
checking again in create_similar() is redundant. (It also interferes with
disabling XRender...)
Shrink the overall size of the per-screen GC cache, but allow multiple GCs
per depth, as it quite common to need up to two temporary GCs along some
drawing paths. Decrease the number of GCs we obtain in total by returning
clean (i.e. a GC without a clip set) back to the screen pool after use.
Compensate for the increased number of put/get by performing the query
using atomic operations where available. So overall we see a dramatic
reduction on the numbers of XCreateGC and XFreeGC, of even greater benefit
for RENDER-less servers.
Written by Vladimir Vukicevic to enable integration with Qt embedded
devices, this backend allows cairo code to target QPainter, and use
it as a source for other cairo backends.
This imports the sources from mozilla-central:
http://mxr.mozilla.org/mozilla-central/find?text=&kind=text&string=cairo-qpainter
renames them from cairo-qpainter to cairo-qt, and integrates the patch
by Oleg Romashin:
https://bugs.freedesktop.org/attachment.cgi?id=18953
And then attempts to restore 'make check' to full functionality.
However:
- C++ does not play well with the PLT symbol hiding, and leaks into the
global namespace. 'make check' fails at check-plt.sh
- Qt embeds a GUI into QApplication which it requires to construct any
QPainter drawable, i.e. used by the boilerplate to create a cairo-qt
surface, and this leaks fonts (cairo-ft-fonts no less) causing assertion
failures that all cairo objects are accounted for upon destruction.
[Updated by Chris Wilson]
Acked-by: Jeff Muizelaar <jeff@infidigm.net>
Acked-by: Carl Worth <cworth@cworth.org>
Most drivers and the X server used to have incorrect RepeatPad/RepeatReflect
implementations, forcing cairo to fall back to client-side software rendering,
which is painfully slow due to pixmaps being transfered over the wire. These
issues are mostly fixed in the drivers (with the exception of radeonhd, whose
developers didn't respond) and the RepeatPad software fallback is implemented
correctly as of pixman-0.15.0, so this patch will hand off composite operations
with EXTEND_PAD/EXTEND_REFLECT source patterns to XRender.
There is no way to detect whether the X server or the drivers use a
broken Render implementation, we make a guess based on the server
version: It's probably safe to assume that 1.7 X servers will use
fixed drivers and a recent enough version of pixman.
This patch adds more implementation of the snapshot method. For
surface types where acquire_source_image is already making a copy
of the bits, doing another one as is the case for the fallback
implementation is a waste.
Allow the caller to choose whether or not various conversions take place.
The first flag is used to disable the expansion of reflected patterns into a
repeating surface.
Damian Frank noted
[http://lists.cairographics.org/archives/cairo/2009-May/017095.html]
a performance problem with an older XServer with an
unaccelerated composite - similar problems will be seen with non-XRender
servers which will trigger extraneous fallbacks. The problem he found was
that painting an ARGB32 image onto an RGB24 destination window (using
SOURCE) was going via the RENDER protocol and not core. He was able to
demonstrate that this could be worked around by declaring the pixel data as
an RGB24 image. The issue is that the image is uploaded into a temporary
pixmap of matching depth (i.e. 32 bit for ARGB32 and 24 bit for RGB23
data), however the core protocol can only blit between Drawables of
matching depth - so without the work-around the Drawables are mismatched
and we either need to use RENDER or fallback.
This patch adds a content mask to _cairo_surface_clone_similar() to
provide the extra bit of information to the backends for when it is
possible for them to drop channels from the clone. This is used by the
xlib backend to only create a 24 bit source when blitting to a Window.
Simply request a surface with a similar content to the source image when
uploading pixel data. Failing to do so prevents using a 16-bit (or
otherwise non-standard pixman image format) window as a source - in fact
it will trigger an infinite recursion.
Eliminate the extremely short-lived and oft unnecessary heap allocation
of the region by first checking to see whether the clip exceeds the
surface bounds and only then intersect the clip with a local
stack-allocated region.
Specifically,
cairo_region_union_rect -> cairo_region_union_rectangle
cairo_region_create_rect -> cairo_region_create_rectangle
Also delete cairo_region_clear() which is not that useful.
Usually, rectangles are more useful than boxes, so regions should only
expose rectangles in their public API.
Specifically,
_cairo_region_num_boxes becomes _cairo_region_num_rectangles
_cairo_region_get_box becomes _cairo_region_get_rectangle
Remove the cairo_box_int_t type
The _cairo_region_get_boxes() interface was difficult to use and often
caused unnecessary memory allocation. With _cairo_region_get_box() it
is possible to access the boxes of a region without allocating a big
temporary array.
Adds an error code replacing CAIRO_STATUS_NO_MEMORY in one case where it
is not really appropriate. CAIRO_STATUS_INVALID_SIZE is used by several
backends that do not support image sizes beyond 2^15 pixels on each side.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we are dithering on the Xlib backend we can not simply repaint the
surface used for a solid pattern and must recreate it from scratch.
However, for ordinary XRender usage we do not want to have to pay that
price - so query the backend to see if we can reuse the surface.
A surface will have the chance to use span rendering at cairo_fill()
time by creating a renderer for a specific combination of
pattern/dst/op before the path is scan converted. The protocol is to
first call check_span_renderer() to see if the surface wants to render
with spans and then later call create_span_renderer() to create the
renderer for real once the extents of the path are known.
No backends have an implementation yet.