Commit 8d67186cb2 caches whether the device
transform is identity on context creation. However, the api is quite lax
and allows the user to modify the device transform *after* he has
started to use the surface in a context, as apparently WebKit does.
Since this is not the only instance where we may need to invalidate
caches if the user modifies state, introduce a simple mechanism for
hooking into notifications of property changes.
Fixes test/clip-device-offset.
This allows designing a cleaner interface for cairo_composite_t as there
will not be static functions that get called outside of the "published"
interfaces.
Trying to build xcb on a system without SHM wrapped by xcb. The right
answer would be to build libxcb-shm. The quick answer is to compile out
shm support.
Still an experimental backend, it's now a little too late to stabilise
for 1.10, but this should represent a major step forward in its feature
set and an attempt to catch up with all the bug fixes that have been
performed on xlib. Notably not tested yet (and expected to be broken)
are mixed-endian connections and low bitdepth servers (the dithering
support has not been copied over for instance). However, it seems robust
enough for daily use...
Of particular note in this update is that the xcb surface is now capable
of subverting the xlib surface through the ./configure --enable-xlib-xcb
option. This replaces the xlib surface with a proxy that forwards all
operations to an equivalent xcb surface whilst preserving the cairo-xlib
API that is required for compatibility with the existing applications,
for instance GTK+ and Mozilla. Also you can experiment with enabling a
DRM bypass, though you need to be extremely foolhardy to do so.
As proof-of-principle add the nearly working demonstrations of using DRM
to render directly with the GPU bypassing both RENDER and GL for
performance whilst preserving high quality rendering.
The basis behind developing these chip specific backends is that this is
the idealised interface that we desire for this chips, and so a target
for cairo-gl as we continue to develop both it and our GL stack.
Note that this backends do not yet fully pass the test suite, so only
use if you are brave and willing to help develop them further.
This is a highly specialised scan converter for the relatively common
case of where the input geometry is known to be a series of rectangles.
Generally not device aligned (or else we would most likely have chosen
an even higher performance path that does not require a coverage mask),
this optimised converter can simply compute the analytical coverage by
utilising a special case Bentley-Ottmann intersection finder.
This variant uses the Bentley-Ottmann algorithm to only maintain the
active edge list upon edge events and so can efficiently skip areas
where no change occurs. This means that it can be much quicker than the
Tor algorithm (which is still used to compute the coverages from the
active edges) for geometries consisting of long straight lines with few
intersections. However due to the computational overhead of the
Bentley-Ottmann event processing, for dense curvy paths, simply updating
the active edge list in sync with computing the coverages is a win. Due
to advantageous adaptive step size, the scan converter can be run at a
much higher subsampling with little extra overhead compared with Tor,
currently it uses a 256x256 subsampling grid to avoid any impedance
mismatch with path precision.
Given the current status of implementations, this scan converter [botor]
is likely to be advantage where detecting large regions of unchanged
span data will result in improved performance, for instance the drm
backends which convert the scan data into rectangles.
Currently we use cairo_traps_t to also pass around arrays of boxes. This
is woefully inefficient in terms of storage, but also means that we
repeatedly have to verify that the traps are a set of boxes. By
explicitly passing around a cairo_boxes_t we avoid the semantic loss.
This will be heavily used in pending commits.
This is a more useful definition that is able to individually track the
rectangles that compose the composite operation. This will be used by
the specialist compositors as a means to perform the common extents
determination for an operation.
The device is a generic method for accessing the underlying interface
with the native graphics subsystem, typically the X connection or
perhaps the GL context. By exposing a cairo_device_t on a surface and
its various methods we enable finer control over interoperability with
external interactions of the device by applications. The use case in
mind is, for example, a multi-threaded gstreamer which needs to serialise
its own direct access to the device along with Cairo's across many
threads.
Secondly, the cairo_device_t is a unifying API for the mismash of
backend specific methods for controlling creation of surfaces with
explicit devices and a convenient hook for debugging and introspection.
The principal components of the API are the memory management of:
cairo_device_reference(),
cairo_device_finish() and
cairo_device_destroy();
along with a pair of routines for serialising interaction:
cairo_device_acquire() and
cairo_device_release()
and a method to flush any outstanding accesses:
cairo_device_flush().
The device for a particular surface may be retrieved using:
cairo_surface_get_device().
The device returned is owned by the surface.
The first iteration of COW snapshotting always made an initial copy when
the snapshot was requested (and reused that copy until the surface was
modified). However, in a few circumstances we can avoid even that copy
so long as the surface is still alive and unmodified between the
snapshotting and its use. In order to do so, we need a new proxy surface
that can automatically perform the copy if the target should disappear
prior to use.
A subsurface is a region of another surface that may be used either to
restrict the writable area of a context or the readable extents of a
source. Whilst writing, access to the exterior of the subsurface is
prevented via clipping and when used as a source reads from the exterior
of the subsurface are governed via the extend mechanism of the pattern.
This is a simplified version of the wrapping surface where the target
surface is just a subsurface onto which we wish to draw the current
operation. In particular this is useful for the subsurface API as well
as fallbacks.
Ultimately, we want all of our paths to use shaders when they are
exposed -- it brings us closer to GL 3.0 compatibility and it should
reduce the work that GL drivers have to do per operation to compute
the required hardware state.
This appears to be the simplest mechanism to build libglew at the moment -
should a system copy be unavailable. Fortunately libglew is now distributed
under a permissive licence.
If you want to pass 'make -C src check' you have to use the system copy,
or spend quite a bit of time cairo-fying libglew.
Originally written by Vladimir Vukicevic to investigate using Skia for
Mozilla, it provides a nice integration with a rather interesting code
base. By hooking Skia underneath Cairo it allows us to directly compare
code paths... which is interesting.
[updated by Chris Wilson]
A very simple surface that produces a hierarchical DAG in a simple XML
format. It is intended to be used whilst debugging, for example with the
automatic regression finding tools of cairo-sphinx, and with test suites
that just want to verify that their code made a particular Cairo call.
Add a new surface type that multiplies it input onto several output
surfaces. The only limitation is that it requires a master surface that is
used whenever we need to query surface options, such as font options and
extents.
Add an even simpler sweep-line tessellator for rectangular trapezoids (as
produced by the rectilinear stoker and box filler).
This is so simple it even outperforms pixman's region validation code for the
purposes of path-to-region conversion.
For the frequent cases where we know in advance that we are dealing with a
rectilinear path, but can not use the simple region code, implement a
variant of the Bentley-Ottmann tessellator. The advantages here are that
edge comparison is very simple (we only have vertical edges) and there are
no intersection, though possible overlaps. The idea is the same, maintain
a y-x sorted queue of start/stop events that demarcate traps and sweep
through the active edges at each event, looking for completed traps.
The motivation for this was noticing a performance regression in
box-fill-outline with the self-intersection work:
1.9.2 to HEAD^: 3.66x slowdown
HEAD^ to HEAD: 5.38x speedup
1.9.2 to HEAD: 1.57x speedup
The cause of which was choosing to use spans instead of the region handling
code, as the complex polygon was no longer being tessellated.
In order to efficient store small images, we need to pack them into a
large texture. The rtree handles allocation of small rectangles out of a
much larger whole. As well as tracking free rectangles, it can also be
used to note which parts of the texture are 'pinned' -- that is have
operations currently pending and so can not be modified until that batch
of operations have been flushed. When the rtree is full, i.e. there is no
single free rectangle to accommodate the allocation request, it will
randomly evict an unpinned block large enough to fit the request. The
block may comprise just a single glyph, or a subtree of many glyphs. This
may not be the best strategy, but it is an effective start.
Use the DRM interface to h/w accelerate composition on image surfaces.
The purpose of the backend is simply to explore what such a hardware
interface might look like and what benefits we might expect. The
use case that might justify writing such custom backends are embedded
devices running a drm compositor like wayland - which would, for example,
allow one to write applications that seamlessly integrated accelerated,
dynamic, high quality 2D graphics using Cairo with advanced interaction
(e.g. smooth animations in the UI) driven by a clutter framework...
In this first step we introduce the fundamental wrapping of GEM for intel
and radeon chipsets, and, for comparison, gallium. No acceleration, all
we do is use buffer objects (that is use the kernel memory manager) to
allocate images and simply use the fallback mechanism. This provides a
suitable base to start writing chip specific drivers.
Handling clip as part of the surface state, as opposed to being part of
the operation state, is cumbersome and a hindrance to providing true proxy
surface support. For example, the clip must be copied from the surface
onto the fallback image, but this was forgotten causing undue hassle in
each backend. Another example is the contortion the meta surface
endures to ensure the clip is correctly recorded. By contrast passing the
clip along with the operation is quite simple and enables us to write
generic handlers for providing surface wrappers. (And in the future, we
should be able to write more esoteric wrappers, e.g. automatic 2x FSAA,
trivially.)
In brief, instead of the surface automatically applying the clip before
calling the backend, the backend can call into a generic helper to apply
clipping. For raster surfaces, clip regions are handled automatically as
part of the composite interface. For vector surfaces, a clip helper is
introduced to replay and callback into an intersect_clip_path() function
as necessary.
Whilst this is not primarily a performance related change (the change
should just move the computation of the clip from the moment it is applied
by the user to the moment it is required by the backend), it is important
to track any potential regression:
ppc:
Speedups
========
image-rgba evolution-20090607-0 1026085.22 0.18% -> 672972.07 0.77%: 1.52x speedup
▌
image-rgba evolution-20090618-0 680579.98 0.12% -> 573237.66 0.16%: 1.19x speedup
▎
image-rgba swfdec-fill-rate-4xaa-0 460296.92 0.36% -> 407464.63 0.42%: 1.13x speedup
▏
image-rgba swfdec-fill-rate-2xaa-0 128431.95 0.47% -> 115051.86 0.42%: 1.12x speedup
▏
Slowdowns
=========
image-rgba firefox-periodic-table-0 56837.61 0.78% -> 66055.17 3.20%: 1.09x slowdown
▏
Based on the work by Øyvind Kolås and Pierre Tardy -- many thanks to
Pierre for pushing this backend for inclusion as well as testing and
reviewing my initial patch. And many more thanks to pippin for writing the
backend in the first place!
Hacked and chopped by myself into a suitable basis for a backend. Quite a
few issues remain open, but would seem to be ready for testing on suitable
hardware.
Remove some redundant defining of surfaces and contexts and of setting
defaults. In order to reduce the number of defines, we need to operate on
the operand stack more frequently - though in practice those operations
are quite rare.
Written by Vladimir Vukicevic to enable integration with Qt embedded
devices, this backend allows cairo code to target QPainter, and use
it as a source for other cairo backends.
This imports the sources from mozilla-central:
http://mxr.mozilla.org/mozilla-central/find?text=&kind=text&string=cairo-qpainter
renames them from cairo-qpainter to cairo-qt, and integrates the patch
by Oleg Romashin:
https://bugs.freedesktop.org/attachment.cgi?id=18953
And then attempts to restore 'make check' to full functionality.
However:
- C++ does not play well with the PLT symbol hiding, and leaks into the
global namespace. 'make check' fails at check-plt.sh
- Qt embeds a GUI into QApplication which it requires to construct any
QPainter drawable, i.e. used by the boilerplate to create a cairo-qt
surface, and this leaks fonts (cairo-ft-fonts no less) causing assertion
failures that all cairo objects are accounted for upon destruction.
[Updated by Chris Wilson]
Acked-by: Jeff Muizelaar <jeff@infidigm.net>
Acked-by: Carl Worth <cworth@cworth.org>
Using a null surface is a convenient method to measure the overhead of the
performance testing framework, so export it although as a test-surface so
that it will only be available in development builds and not pollute
distributed libraries.
Felt like pulling the latest stuff, since I branched back in February.
Conflicts:
build/configure.ac.features
src/cairo.h
util/cairo-script/csi-replay.c
Add a variation of test-fallback-surface that forces the use of a 16-bit
pixman format code instead of the standard 32-bit types. This creates an
image surface akin to the fallbacks used with low bit-depth xservers.
Imports a new polygon scan converter implementation from the
repository at
http://cgit.freedesktop.org/~joonas/glitter-paths/
Glitter paths is a stand alone polygon rasteriser derived from David
Turner's reimplementation of Tor Anderssons's 15x17 supersampling
rasteriser from the Apparition graphics library. The main new feature
in this implementation is cheaply choosing per-scan line between doing
fully analytical coverage computation for an entire row at a time
vs. using a supersampling approach.