When HAS_ATOMIC_OPS is not defined, cairo-image-surface.c does not
compile because _pixman_white_image calls _pixman_image_for_solid,
which is only defined later in the file.
The low-level surface composite interface will disappear in the near
future and results in much uglier code than calling the high-level
interface - so use the high-level interface when flattening images into
the page background.
On my Core2, the library version of lround() is faster than our
hand-rolled non-floating-point implementation. So only enable our code
if we are trying to minimise the number of floating point operations --
even then, it would be worth investigating the library performance first.
[Just a reminder that optimisation choices will change over time as our
hardware and software evolves.]
This is still an experimental backend, and it is now a little too late
to stabilise it for 1.10, but this update should represent a major step
forward in its feature set and an attempt to catch up with all the bug
fixes that have been applied to xlib. Notably not yet tested (and
expected to be broken) are mixed-endian connections and low bit-depth
servers (the dithering support has not been copied over, for instance).
However, it seems robust enough for daily use...
Of particular note in this update is that the xcb surface is now capable
of subverting the xlib surface through the ./configure --enable-xlib-xcb
option. This replaces the xlib surface with a proxy that forwards all
operations to an equivalent xcb surface whilst preserving the cairo-xlib
API required for compatibility with existing applications, for instance
GTK+ and Mozilla. You can also experiment with enabling a DRM bypass,
though you need to be extremely foolhardy to do so.
As a proof of principle, add the nearly working demonstrations of using
DRM to render directly with the GPU, bypassing both RENDER and GL, for
performance whilst preserving high quality rendering.
The basis for developing these chip-specific backends is that this is
the idealised interface that we desire for these chips, and so a target
for cairo-gl as we continue to develop both it and our GL stack.
Note that these backends do not yet fully pass the test suite, so only
use them if you are brave and willing to help develop them further.
Write a dedicated compositor for pixman so that we avoid the
middle-layer syndrome of surface-fallback. The major upshot of this
rewrite is that the image surface is now several times quicker for glyph
compositing, which dramatically improves text rendering performance for
firefox and friends. It also uses a couple of the new scan converters,
such as the rectangular scan converter for rectilinear paths.
Speedups
========
image-rgba firefox-talos-gfx-0 342050.17 (342155.88 0.02%) -> 69412.44 (69702.90 0.21%): 4.93x speedup
███▉
image-rgba vim-0 97518.13 (97696.23 1.21%) -> 30712.63 (31238.65 0.85%): 3.18x speedup
██▏
image-rgba evolution-0 69927.77 (110261.08 19.84%) -> 24430.05 (25368.85 1.89%): 2.86x speedup
█▉
image-rgba poppler-0 41452.61 (41547.03 2.51%) -> 21195.52 (21656.85 1.08%): 1.96x speedup
█
image-rgba firefox-planet-gnome-0 217512.61 (217636.80 0.06%) -> 123341.02 (123641.94 0.12%): 1.76x speedup
▊
image-rgba swfdec-youtube-0 41302.71 (41373.60 0.11%) -> 31343.93 (31488.87 0.23%): 1.32x speedup
▍
image-rgba swfdec-giant-steps-0 20699.54 (20739.52 0.10%) -> 17360.19 (17375.51 0.04%): 1.19x speedup
▎
image-rgba gvim-0 167837.47 (168027.68 0.51%) -> 151105.94 (151635.85 0.18%): 1.11x speedup
▏
image-rgba firefox-talos-svg-0 375273.43 (388250.94 1.60%) -> 356846.09 (370370.08 1.86%): 1.05x speedup
Discard a redundant clear, as the image surface is guaranteed to return
a cleared surface that meets the pixman/xlib alignment requirements,
and, more importantly, set the ComponentAlpha flag on the generated
pixman image as appropriate.
Revamp clipping in preparation for the removal of the low-level interface
and promote the backends to use the higher-level one. The principle here
is that the higher-level interface gives the backend more scope for
choosing better-performing primitives.
Frequently we only need the coarse path bounds, so avoid walking over
the list of points once more by cheaply tracking the extents during
construction.
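As a rough illustration (hypothetical names, not the actual cairo path
code), the running extents can be updated as each point is appended, so
a later bounds query needs no second pass:

    #include <limits.h>

    typedef struct { int x, y; } point_t;

    typedef struct {
        point_t min, max;       /* running bounds of every point added */
        int has_points;
        /* ... the point list itself ... */
    } path_builder_t;

    static void
    path_builder_init (path_builder_t *pb)
    {
        pb->has_points = 0;
        pb->min.x = pb->min.y = INT_MAX;
        pb->max.x = pb->max.y = INT_MIN;
    }

    static void
    path_builder_add_point (path_builder_t *pb, point_t p)
    {
        /* append p to the point list, then fold it into the extents */
        if (p.x < pb->min.x) pb->min.x = p.x;
        if (p.y < pb->min.y) pb->min.y = p.y;
        if (p.x > pb->max.x) pb->max.x = p.x;
        if (p.y > pb->max.y) pb->max.y = p.y;
        pb->has_points = 1;
    }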
By preallocating a couple of solid patterns for the stock colours in our
data segment, it becomes more convenient to use them in surface
operations, such as clearing.
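A rough sketch of the idea (illustrative names only, not the actual
cairo internals): the stock colours live in static storage, so callers
simply borrow a pointer instead of allocating a fresh solid pattern for
every clear:

    typedef struct { double r, g, b, a; } solid_pattern_t;

    static const solid_pattern_t stock_clear = { 0, 0, 0, 0 };
    static const solid_pattern_t stock_black = { 0, 0, 0, 1 };
    static const solid_pattern_t stock_white = { 1, 1, 1, 1 };

    /* callers borrow the static pattern rather than creating one */
    static const solid_pattern_t *
    stock_white_pattern (void)
    {
        return &stock_white;
    }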
An issue that we currently have is a pessimistic false-positive rate
when determining whether glyphs within a string overlap. By using the
tight bounds, the overlap detection is arguably less accurate if one
presumes pixel-aligned opacity masks, but we make the trade-off for
performance.
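A minimal sketch of the check, assuming a hypothetical structure holding
a glyph's tight ink extents in device space - two glyphs are flagged as
overlapping only when those boxes actually intersect:

    typedef struct {
        double x1, y1, x2, y2;  /* tight ink extents of a glyph */
    } glyph_box_t;

    static int
    glyph_boxes_overlap (const glyph_box_t *a, const glyph_box_t *b)
    {
        return a->x1 < b->x2 && b->x1 < a->x2 &&
               a->y1 < b->y2 && b->y1 < a->y2;
    }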
Having added a specialised scan converter on the premise that it should
be better at handling rounded rectangles, ensure that they are indeed
rendered correctly.
Enable origin tracking by default for make check-valgrind. This is
slower and consumes more memory than regular valgrind, but the
additional information provided about the source of the uninitialised
data is often invaluable.
This is a highly specialised scan converter for the relatively common
case where the input geometry is known to be a series of rectangles.
The rectangles are generally not device-aligned (or else we would most
likely have chosen an even higher-performance path that does not require
a coverage mask), so this optimised converter can simply compute the
analytical coverage by utilising a special-case Bentley-Ottmann
intersection finder.
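For flavour, here is a naive sketch of the per-pixel arithmetic only -
the real converter batches many rectangles through a special-case sweep,
but the analytical coverage of one unaligned rectangle over one pixel is
simply the clipped area:

    static double
    pixel_coverage (double rx1, double ry1, double rx2, double ry2,
                    int px, int py)
    {
        /* clip the rectangle to the unit pixel at (px, py) */
        double x1 = rx1 > px     ? rx1 : px;
        double y1 = ry1 > py     ? ry1 : py;
        double x2 = rx2 < px + 1 ? rx2 : px + 1;
        double y2 = ry2 < py + 1 ? ry2 : py + 1;

        if (x2 <= x1 || y2 <= y1)
            return 0.0;                 /* no overlap with this pixel */

        return (x2 - x1) * (y2 - y1);   /* fraction of the pixel covered */
    }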
This variant uses the Bentley-Ottmann algorithm to maintain the
active edge list only upon edge events and so can efficiently skip areas
where no change occurs. This means that it can be much quicker than the
Tor algorithm (which is still used to compute the coverages from the
active edges) for geometries consisting of long straight lines with few
intersections. However, due to the computational overhead of the
Bentley-Ottmann event processing, for dense curvy paths simply updating
the active edge list in sync with computing the coverages is a win. Due
to its advantageous adaptive step size, the scan converter can be run at
a much higher subsampling resolution with little extra overhead compared
with Tor; currently it uses a 256x256 subsampling grid to avoid any
impedance mismatch with the path precision.
Given the current status of the implementations, this scan converter
[botor] is likely to be advantageous where detecting large regions of
unchanged span data will result in improved performance, for instance in
the drm backends which convert the scan data into rectangles.
Currently we use cairo_traps_t to also pass around arrays of boxes. This
is woefully inefficient in terms of storage, and it also means that we
repeatedly have to verify that the traps are a set of boxes. By
explicitly passing around a cairo_boxes_t we avoid that semantic loss.
This will be heavily used in pending commits.
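A hypothetical sketch of such a container (field names are illustrative,
not the actual cairo_boxes_t layout): a chunked, growable array of
device-space boxes that carries the "these really are boxes" guarantee
in its type:

    typedef struct { double x1, y1, x2, y2; } box_t;

    typedef struct box_chunk {
        struct box_chunk *next;
        box_t *boxes;
        int count, capacity;
    } box_chunk_t;

    typedef struct {
        box_chunk_t first;      /* embedded chunk, avoids an allocation
                                   for the common small case */
        box_chunk_t *tail;
        int num_boxes;
    } boxes_t;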
This is a more useful definition that is able to track individually the
rectangles that compose the composite operation. It will be used by
the specialist compositors as a means of performing the common extents
determination for an operation.
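Roughly, and with purely hypothetical names, such a definition might
look like the following: the destination, source and mask extents are
recorded separately and their intersection gives the bounded area
actually touched by the operation:

    typedef struct { int x, y, width, height; } rect_t;

    typedef struct {
        rect_t destination;   /* clip/destination extents */
        rect_t source;        /* extents of the source pattern */
        rect_t mask;          /* extents of the mask, if any */
        rect_t bounded;       /* intersection of the above */
        int is_bounded;       /* whether the operator can touch pixels
                                 outside 'bounded' */
    } composite_rectangles_t;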
It is quite common amongst our geometry to have rows of repeated span
data; for example, a rounded rectangle has repeating data between the
top and bottom rounded corners. By passing the repeat length to the
renderers, they may be able to use that information more efficiently,
and the scan converters can avoid recomputing the same span data.
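As a sketch of what passing the repeat length could mean at the
interface level (hypothetical types, not the actual span renderer API),
the converter hands the renderer one row of spans plus the number of
identical rows that follow:

    typedef struct {
        int x;                    /* start of the span */
        unsigned char coverage;   /* coverage until the next span */
    } span_t;

    /* render 'num_spans' spans for row y, then repeat the same spans
     * unchanged for the next 'height - 1' rows */
    typedef int
    (*render_rows_func_t) (void *renderer,
                           int y, int height,
                           const span_t *spans, unsigned num_spans);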
Add a gl-window boilerplate target to exercise using GL to render to a
visible Drawable -- for instance, a window has a different coordinate
system to a framebuffer...
The device is a generic method for accessing the underlying interface
with the native graphics subsystem, typically the X connection or
perhaps the GL context. By exposing a cairo_device_t on a surface, along
with its various methods, we enable finer control over interoperability
when applications interact with the device directly. The use case in
mind is, for example, a multi-threaded gstreamer which needs to serialise
its own direct access to the device along with Cairo's across many
threads.
Secondly, the cairo_device_t is a unifying API for the mishmash of
backend-specific methods for controlling the creation of surfaces with
explicit devices, and a convenient hook for debugging and introspection.
The principal components of the API are the memory management of:
cairo_device_reference(),
cairo_device_finish() and
cairo_device_destroy();
along with a pair of routines for serialising interaction:
cairo_device_acquire() and
cairo_device_release();
and a method to flush any outstanding accesses:
cairo_device_flush().
The device for a particular surface may be retrieved using:
cairo_surface_get_device().
The device returned is owned by the surface.
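A small usage sketch of the API described above (error handling elided;
the surface is assumed to have been created elsewhere): an application
thread flushes cairo's pending work and then holds the device lock
around its own direct use of the underlying connection or context.

    #include <cairo.h>

    static void
    touch_device_directly (cairo_surface_t *surface)
    {
        cairo_device_t *device = cairo_surface_get_device (surface);

        if (device == NULL)
            return;   /* e.g. a plain image surface has no device */

        /* let cairo submit any outstanding operations first */
        cairo_device_flush (device);

        if (cairo_device_acquire (device) == CAIRO_STATUS_SUCCESS) {
            /* ... safe to use the X connection / GL context here ... */
            cairo_device_release (device);
        }
    }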
Implement a recursive mutex, which will be needed for cairo_device_t.
In particular, of our supported platforms only pthreads gives us a
non-recursive mutex by default (to my knowledge) - both win32 critical
sections and mutexes on Quartz are recursive.
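For reference, a minimal sketch of setting up a recursive mutex with
plain POSIX threads (variable and function names are illustrative):

    #include <pthread.h>

    static pthread_mutex_t device_mutex;

    static void
    device_mutex_init (void)
    {
        pthread_mutexattr_t attr;

        pthread_mutexattr_init (&attr);
        pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_RECURSIVE);
        pthread_mutex_init (&device_mutex, &attr);
        pthread_mutexattr_destroy (&attr);
    }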
We were using _GNU_SOURCE throughout the codebase, so simply define it
once during configure. This is the easiest method to enable recursive
mutexes using pthreads, as required in a pending patch.