When compiling we can depend on whatever version of cairo we need, but
we should be wary of checking for runtime compatibility when building
standalone.
We were exposing the actual value of CAIRO_FORMAT_INVALID
through API functions already, so it makes sense to just
go ahead and put it in the cairo_format_t enum.
Should fix:
Bug 26509 - Cairo fails to compile without mmap
http://bugs.freedesktop.org/show_bug.cgi?id=26509
As reported by Hib Eris, Cairo files to compile under a mingw32
cross-compiler as we use a structure only defined if HAVE_MMAP
unconditionally.
By implicitly reference the target of the context instead, i.e.
this reduces the use of:
/target get (example.png) write-to-png pop
as a common idiom where the context is kept on the stack and the surface
forgotten.
Real applications that control their Drawable externally to Cairo are
'disadvantaged' by cairo-perf-trace when it creates a similar surface
for each new instance of the same Drawable. The difficulty in
maintaining one perf surface for every application surface is that the
traces do not track lifetimes for the application surfaces, so we would
just accumulate stale surfaces. The surface cache takes a different
approach and returns the same surface for each active Drawable, and
maintains a hold-over of the MRU 16 surfaces. This achieves 60-80% hit
rate with firefox, which is probably as good as can be expected.
Obviously for double-buffered applications we only every draw to freshly
created surfaces (and Gtk+ bypasses cairo to do the final copy -- the
ideal application would just use a push-group for double buffering, in
which case we would capture and replay the entire expose event).
To enable use of the surface cache whilst replaying use -c:
./cairo-perf-trace -c firefox-talos-gfx
In order to get a baseline for win32 performance testing, always create a
font so that the trace can be replayed. Not ideal, but I feel this the
pragmatic solution for judging the performance differentials before I can
work out a better solution for loading typ42 fonts.
In order to enable replay of traces on machines that do not use FreeType
as the native font system, we need to convert a type42 font into something
similar. Currently the fallback is just to select a font with the same
name - this ignores weight and slant, and many other details.
A very simple surface that produces a hierarchical DAG in a simple XML
format. It is intended to be used whilst debugging, for example with the
automatic regression finding tools of cairo-sphinx, and with test suites
that just want to verify that their code made a particular Cairo call.
cairo_script_context_t is an encapsulation object for interfacing with the
output - multiple surfaces can share the same context, meaning that they
write to the same destination file/stream.
Kerning is quite frequent, that is to apply a horizontal but no vertical
offset to a glyph. For instance by discarding the vertical coordinate
where it remains the same and only encoding the horizontal offset we
reduce the file size by ~12.5% when tracing poppler.
Infrequently, but, for example, handling glyph strings, we require the
string to be nul terminated. (Otherwise an error occurs, which was
previously compounded by a drastic leak.)
If we fail to add the glyph cache (presumably because the font is in
error) do not leak the allocation. As this occurs for every single glyph
string, the leak can grow very quickly and mask the original bug.
Currently the replay creates a fresh surface for every new surface. Whilst
using it to view replays (such as with --xlib) this is often not what is
desired so add a mode (compile-time only for now) to use similar surfaces
and blits to the front buffer instead.
After diverting the pointers to accommodate lazy decompressing of the
source, the bytecode pointer was left pointing to the original location
that had already been freed - thus passing an invalid block to FreeType
and unsurprisingly then, blowing up.
Handling clip as part of the surface state, as opposed to being part of
the operation state, is cumbersome and a hindrance to providing true proxy
surface support. For example, the clip must be copied from the surface
onto the fallback image, but this was forgotten causing undue hassle in
each backend. Another example is the contortion the meta surface
endures to ensure the clip is correctly recorded. By contrast passing the
clip along with the operation is quite simple and enables us to write
generic handlers for providing surface wrappers. (And in the future, we
should be able to write more esoteric wrappers, e.g. automatic 2x FSAA,
trivially.)
In brief, instead of the surface automatically applying the clip before
calling the backend, the backend can call into a generic helper to apply
clipping. For raster surfaces, clip regions are handled automatically as
part of the composite interface. For vector surfaces, a clip helper is
introduced to replay and callback into an intersect_clip_path() function
as necessary.
Whilst this is not primarily a performance related change (the change
should just move the computation of the clip from the moment it is applied
by the user to the moment it is required by the backend), it is important
to track any potential regression:
ppc:
Speedups
========
image-rgba evolution-20090607-0 1026085.22 0.18% -> 672972.07 0.77%: 1.52x speedup
▌
image-rgba evolution-20090618-0 680579.98 0.12% -> 573237.66 0.16%: 1.19x speedup
▎
image-rgba swfdec-fill-rate-4xaa-0 460296.92 0.36% -> 407464.63 0.42%: 1.13x speedup
▏
image-rgba swfdec-fill-rate-2xaa-0 128431.95 0.47% -> 115051.86 0.42%: 1.12x speedup
▏
Slowdowns
=========
image-rgba firefox-periodic-table-0 56837.61 0.78% -> 66055.17 3.20%: 1.09x slowdown
▏
Hook into the scanner to write out binary version of the tokenized
objects -- note we bind executable names (i.e. check to see if is an
operator and substitute the name with an operator -- this breaks
overloading of operators by scripts).
By converting scripts to a binary form, they are more compact and
execute faster:
firefox-world-map.trace 526850146 bytes
bound.trace 275187755 bytes
[ # ] backend test min(s) median(s) stddev. count
[ 0] null bound 34.481 34.741 0.68% 3/3
[ 1] null firefox-world-map 89.635 89.716 0.19% 3/3
[ 0] drm bound 79.304 79.350 0.61% 3/3
[ 1] drm firefox-world-map 135.380 135.475 0.58% 3/3
[ 0] image bound 95.819 96.258 2.85% 3/3
[ 1] image firefox-world-map 156.889 156.935 1.36% 3/3
[ 0] xlib bound 539.130 550.220 1.40% 3/3
[ 1] xlib firefox-world-map 596.244 613.487 1.74% 3/3
This trace has a lot of complex paths and the use of binary floating point
reduces the file size by about 50%, with a commensurate reduction in scan
time and significant reduction in operator lookup overhead. Note that this
test is still IO/CPU bound on my i915 with its pitifully slow flash...