In order to reuse the original image as the pixman pattern, then the
entire operation must be wholly contained within the extents of the
image (including subsurfaces) and be reducible to an untransformed
REPEAT_NONE.
I updated the Free Software Foundation address using the following script.
for i in $(git grep Temple | cut -d: -f1 )
do
sed -e 's/59 Temple Place[, -]* Suite 330, Boston, MA *02111-1307[, ]* USA/51 Franklin Street, Suite 500, Boston, MA 02110-1335, USA/' -i "$i"
done
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=21356
Stop the callers from guessing the origin of the clip surface by
reporting it explicitly! This enables the clip to bypass any rectangles
overlaid on top of the clip surface, which is common when the backends
limit the clip to the extents of the operation -- but irrelevant to the
actual content of the clip mask
In the event of an empty bounded rectangle, the computation of the
unbounded - bounded rectangles leads to negative areas, integer overflow
and death.
[And similarly for the derived surfaces.]
We were exposing the actual value of CAIRO_FORMAT_INVALID
through API functions already, so it makes sense to just
go ahead and put it in the cairo_format_t enum.
The image surface tries to convert surface pattern's extend
modes to EXTEND_NONE, if it can, when converting a cairo_pattern_t
to a pixman_image_t. The check was not taking into account the
transformation matrix on the pattern, so it was possible to
trick it into using EXTEND_NONE by downscaling the source
pattern enough. This patch changes the optimization to only
take if the pattern has no transformation.
Fixes surface-pattern-scale-down-extend-{pad,reflect,repeat}
failures in the test suite for the image backend.
Split into a general cairo_image_surface_coerce() that coerces to one of
the 3 supported formats (ARGB32, RGB24, A8) based on content and the
more general cairo_image_surface_coerce_to_format() that coerces to a
specified format.
When HAS_ATOMIC_OPS is not defined, cairo-image-surface.c does not
compile because _pixman_white_image calls _pixman_image_for_solid
which gets defined only later in the code.
Write a dedicated compositor for pixman so that we avoid the
middle-layer syndrome of surface-fallback. The major upshot of this
rewrite is that the image surface is now several times quicker for glyph
compositing, which dramatically improves performance for text rendering
by firefox and friends. It also uses a couple of the new scan
convertors, such as the rectangular scan converter for rectilinear
paths.
Speedups
========
image-rgba firefox-talos-gfx-0 342050.17 (342155.88 0.02%) -> 69412.44 (69702.90 0.21%): 4.93x speedup
███▉
image-rgba vim-0 97518.13 (97696.23 1.21%) -> 30712.63 (31238.65 0.85%): 3.18x speedup
██▏
image-rgba evolution-0 69927.77 (110261.08 19.84%) -> 24430.05 (25368.85 1.89%): 2.86x speedup
█▉
image-rgba poppler-0 41452.61 (41547.03 2.51%) -> 21195.52 (21656.85 1.08%): 1.96x speedup
█
image-rgba firefox-planet-gnome-0 217512.61 (217636.80 0.06%) -> 123341.02 (123641.94 0.12%): 1.76x speedup
▊
image-rgba swfdec-youtube-0 41302.71 (41373.60 0.11%) -> 31343.93 (31488.87 0.23%): 1.32x speedup
▍
image-rgba swfdec-giant-steps-0 20699.54 (20739.52 0.10%) -> 17360.19 (17375.51 0.04%): 1.19x speedup
▎
image-rgba gvim-0 167837.47 (168027.68 0.51%) -> 151105.94 (151635.85 0.18%): 1.11x speedup
▏
image-rgba firefox-talos-svg-0 375273.43 (388250.94 1.60%) -> 356846.09 (370370.08 1.86%): 1.05x speedup
This is a more useful definition that is able to individually track the
rectangles that compose the composite operation. This will be used by
the specialist compositors as a means to perform the common extents
determination for an operation.
It is quite common amongst our geometry to have rows of repeated span
data, for example a rounded rectangle will have repeating data between
the top and bottom rounded corners. By passing the repeat length to the
renderers, they may be able to use that information more efficiently,
and the scan converters can avoid recomputing the same span data.
The device is a generic method for accessing the underlying interface
with the native graphics subsystem, typically the X connection or
perhaps the GL context. By exposing a cairo_device_t on a surface and
its various methods we enable finer control over interoperability with
external interactions of the device by applications. The use case in
mind is, for example, a multi-threaded gstreamer which needs to serialise
its own direct access to the device along with Cairo's across many
threads.
Secondly, the cairo_device_t is a unifying API for the mismash of
backend specific methods for controlling creation of surfaces with
explicit devices and a convenient hook for debugging and introspection.
The principal components of the API are the memory management of:
cairo_device_reference(),
cairo_device_finish() and
cairo_device_destroy();
along with a pair of routines for serialising interaction:
cairo_device_acquire() and
cairo_device_release()
and a method to flush any outstanding accesses:
cairo_device_flush().
The device for a particular surface may be retrieved using:
cairo_surface_get_device().
The device returned is owned by the surface.
Within our code base we carried a few hacks to utilize the component
alpha capabilities of pixman, whilst not supporting the concept for our
own masks. Thus we were setting it upon the pixman_image_t that we
passed around through code that was blissfully unaware and indeed the
component-alpha property was forgotten (e.g. upgrading glyph masks).
The real issue is that without explicit support that a pattern carries
subpixel masking information, that information is lost when using that
pattern with composite. Again we can look at the example of compositing
a sub-pixel glyph mask onto a remote xlib surface for further failure.
A nasty surprise whilst profiling is that performing redundant clear
operations is extremely painful. We can mitigate this somewhat by
tracking the cleared state of surfaces and skipping repeated attempts to
clear a surface.
Gah, that was a horrible mistake. It was a flawed hack to create Pixmaps
of the correct depth when cloning patterns for blitting to the xlib
backend. However, it had the nasty side-effect of discarding alpha when
targeting Window surfaces. The correct solution is to simply correct the
Pixmap of the desired depth and render a matching pattern onto the
surface - i.e. a reversal the current acquire -> clone. See the
forthcoming revised xcb backend on how I should have done it originally.
Honour the incoming surface format when we are asked to create a similar
surface with identical content. The goal of
cairo_surface_create_similar() is to create an intermediate with similar
characteristics to the original that can be used in place of the
original and be quick to copy to the original. Matching the format for
the same content, ensures that the blits between the two need only be a
memcpy.
Benjamin Otte pointed out the error of my ways that a clear on a
cairo_image_surface_create_for_data() was not working. This is because I
modified the image surface to skip clears when it knows the target data
has been cleared. This flag must be reset when the user interacts with
the surface, such as providing the initial surface data.
The image surface code doesn't reliably work on images larger than
32767 in width or height. This patch makes the image surface
constructors fail by returning a surface in the CAIRO_STATUS_INVALID_SIZE
state when given negative or too large dimensions so that client code
gets a prompt and correct error rather than flaky rendering on large
images.
Always use spans, even for unaligned boxes. In the future (given a new
interface) we may want to emit the common unaligned box code more
efficient than a per-scanline computation -- but for now simply avoid the
requirements to write a temporary CPU buffer.
On slow machines the call to pixman_fill_sse2() on similar surfaces that
we know are already zeroed takes a significant amount of time [12.77% of
the profile for a firefox trace, cf to just 3% of the profile is spent
inside memset].
Rather than solve why the pixman_fill_sse2() is so slow, simply skip the
redundant clears.
Use the DRM interface to h/w accelerate composition on image surfaces.
The purpose of the backend is simply to explore what such a hardware
interface might look like and what benefits we might expect. The
use case that might justify writing such custom backends are embedded
devices running a drm compositor like wayland - which would, for example,
allow one to write applications that seamlessly integrated accelerated,
dynamic, high quality 2D graphics using Cairo with advanced interaction
(e.g. smooth animations in the UI) driven by a clutter framework...
In this first step we introduce the fundamental wrapping of GEM for intel
and radeon chipsets, and, for comparison, gallium. No acceleration, all
we do is use buffer objects (that is use the kernel memory manager) to
allocate images and simply use the fallback mechanism. This provides a
suitable base to start writing chip specific drivers.
Handling clip as part of the surface state, as opposed to being part of
the operation state, is cumbersome and a hindrance to providing true proxy
surface support. For example, the clip must be copied from the surface
onto the fallback image, but this was forgotten causing undue hassle in
each backend. Another example is the contortion the meta surface
endures to ensure the clip is correctly recorded. By contrast passing the
clip along with the operation is quite simple and enables us to write
generic handlers for providing surface wrappers. (And in the future, we
should be able to write more esoteric wrappers, e.g. automatic 2x FSAA,
trivially.)
In brief, instead of the surface automatically applying the clip before
calling the backend, the backend can call into a generic helper to apply
clipping. For raster surfaces, clip regions are handled automatically as
part of the composite interface. For vector surfaces, a clip helper is
introduced to replay and callback into an intersect_clip_path() function
as necessary.
Whilst this is not primarily a performance related change (the change
should just move the computation of the clip from the moment it is applied
by the user to the moment it is required by the backend), it is important
to track any potential regression:
ppc:
Speedups
========
image-rgba evolution-20090607-0 1026085.22 0.18% -> 672972.07 0.77%: 1.52x speedup
▌
image-rgba evolution-20090618-0 680579.98 0.12% -> 573237.66 0.16%: 1.19x speedup
▎
image-rgba swfdec-fill-rate-4xaa-0 460296.92 0.36% -> 407464.63 0.42%: 1.13x speedup
▏
image-rgba swfdec-fill-rate-2xaa-0 128431.95 0.47% -> 115051.86 0.42%: 1.12x speedup
▏
Slowdowns
=========
image-rgba firefox-periodic-table-0 56837.61 0.78% -> 66055.17 3.20%: 1.09x slowdown
▏
Allow the caller to choose whether or not various conversions take place.
The first flag is used to disable the expansion of reflected patterns into a
repeating surface.
By simply iterating over the array cairo_trapezoid_t, converting each one
separately to a pixman_trapezoid_t and rasterizing each one individually
we can avoid the common heap allocation. pixman performs exactly the same
iteration internally so there is no efficiency loss.
Damian Frank noted
[http://lists.cairographics.org/archives/cairo/2009-May/017095.html]
a performance problem with an older XServer with an
unaccelerated composite - similar problems will be seen with non-XRender
servers which will trigger extraneous fallbacks. The problem he found was
that painting an ARGB32 image onto an RGB24 destination window (using
SOURCE) was going via the RENDER protocol and not core. He was able to
demonstrate that this could be worked around by declaring the pixel data as
an RGB24 image. The issue is that the image is uploaded into a temporary
pixmap of matching depth (i.e. 32 bit for ARGB32 and 24 bit for RGB23
data), however the core protocol can only blit between Drawables of
matching depth - so without the work-around the Drawables are mismatched
and we either need to use RENDER or fallback.
This patch adds a content mask to _cairo_surface_clone_similar() to
provide the extra bit of information to the backends for when it is
possible for them to drop channels from the clone. This is used by the
xlib backend to only create a 24 bit source when blitting to a Window.