In order to get a baseline for win32 performance testing, always create a
font so that the trace can be replayed. Not ideal, but I feel this the
pragmatic solution for judging the performance differentials before I can
work out a better solution for loading typ42 fonts.
In order to enable replay of traces on machines that do not use FreeType
as the native font system, we need to convert a type42 font into something
similar. Currently the fallback is just to select a font with the same
name - this ignores weight and slant, and many other details.
If the tracer encounters an unknown enum value it
ought not to crash. Theis patch replaces the idiom
of looking up a name for an enumerated value directly
from a table by a switch statement. As a bonus we get
warnings from the compiler when the enums are updated
in cairo.
If the tracer's object stack underflows we want to
know about is as soon as possible. This patch adds
checks against the stack overflowing and aborts the
program with an object stack dump if it does.
Support non-Linux systems which don't have a /proc/self/cmdline
by transferring the application name given to cairo-trace via
an environment variable CAIRO_TRACE_PROG_NAME.
A very simple surface that produces a hierarchical DAG in a simple XML
format. It is intended to be used whilst debugging, for example with the
automatic regression finding tools of cairo-sphinx, and with test suites
that just want to verify that their code made a particular Cairo call.
sphinx is an alternate version of the current cairo-test-trace. It's
purpose is to replay a live application (which may just be a replay of a
trace) against a backend and its reference. The improvement over the
original cairo-test-trace is that the reference backend may be from an
older version of cairo.
This is a simple variation on cairo-trace that wraps records the last 16
contexts by wrapping the target surface inside a tee surface, along with a
meta/recording surface. Then on receipt of a SIGUSR1, those last 16
contexts are played via a script-surface into /tmp/fdr.trace.
Mostly proof-of-concept, it seems to be causing a number of rendering
glitches whilst testing with firefox -- otherwise, it seems to works.
cairo_script_context_t is an encapsulation object for interfacing with the
output - multiple surfaces can share the same context, meaning that they
write to the same destination file/stream.
Larry Ewing hit a bug in cairo-trace whereby it tried to create a similar
surface referencing an undefined object. This fix checks whether the
object has yet to be defined, and if not issues an index in order to copy
the appropriate operand from the stack.
Kerning is quite frequent, that is to apply a horizontal but no vertical
offset to a glyph. For instance by discarding the vertical coordinate
where it remains the same and only encoding the horizontal offset we
reduce the file size by ~12.5% when tracing poppler.
We refactor the surface fallbacks to convert full strokes and fills to the
intermediate polygon representation (as opposed to before where we
returned the trapezoidal representation). This allow greater flexibility
to choose how then to rasterize the polygon. Where possible we use the
local spans rasteriser for its increased performance, but still have the
option to use the tessellator instead (for example, with the current
Render protocol which does not yet have a polygon image).
In order to accommodate this, the spans interface is tweaked to accept
whole polygons instead of a path and the tessellator is tweaked for speed.
Performance Impact
==================
...
Still measuring, expecting some severe regressions.
...
Infrequently, but, for example, handling glyph strings, we require the
string to be nul terminated. (Otherwise an error occurs, which was
previously compounded by a drastic leak.)
If we fail to add the glyph cache (presumably because the font is in
error) do not leak the allocation. As this occurs for every single glyph
string, the leak can grow very quickly and mask the original bug.
Currently the replay creates a fresh surface for every new surface. Whilst
using it to view replays (such as with --xlib) this is often not what is
desired so add a mode (compile-time only for now) to use similar surfaces
and blits to the front buffer instead.
font expects the dictionary to be constructed on the stack for its use, so
close it. (I missed the closing '>>' when switching between dictionary
constructors.)
After diverting the pointers to accommodate lazy decompressing of the
source, the bytecode pointer was left pointing to the original location
that had already been freed - thus passing an invalid block to FreeType
and unsurprisingly then, blowing up.
Handling clip as part of the surface state, as opposed to being part of
the operation state, is cumbersome and a hindrance to providing true proxy
surface support. For example, the clip must be copied from the surface
onto the fallback image, but this was forgotten causing undue hassle in
each backend. Another example is the contortion the meta surface
endures to ensure the clip is correctly recorded. By contrast passing the
clip along with the operation is quite simple and enables us to write
generic handlers for providing surface wrappers. (And in the future, we
should be able to write more esoteric wrappers, e.g. automatic 2x FSAA,
trivially.)
In brief, instead of the surface automatically applying the clip before
calling the backend, the backend can call into a generic helper to apply
clipping. For raster surfaces, clip regions are handled automatically as
part of the composite interface. For vector surfaces, a clip helper is
introduced to replay and callback into an intersect_clip_path() function
as necessary.
Whilst this is not primarily a performance related change (the change
should just move the computation of the clip from the moment it is applied
by the user to the moment it is required by the backend), it is important
to track any potential regression:
ppc:
Speedups
========
image-rgba evolution-20090607-0 1026085.22 0.18% -> 672972.07 0.77%: 1.52x speedup
▌
image-rgba evolution-20090618-0 680579.98 0.12% -> 573237.66 0.16%: 1.19x speedup
▎
image-rgba swfdec-fill-rate-4xaa-0 460296.92 0.36% -> 407464.63 0.42%: 1.13x speedup
▏
image-rgba swfdec-fill-rate-2xaa-0 128431.95 0.47% -> 115051.86 0.42%: 1.12x speedup
▏
Slowdowns
=========
image-rgba firefox-periodic-table-0 56837.61 0.78% -> 66055.17 3.20%: 1.09x slowdown
▏
Based on the work by Øyvind Kolås and Pierre Tardy -- many thanks to
Pierre for pushing this backend for inclusion as well as testing and
reviewing my initial patch. And many more thanks to pippin for writing the
backend in the first place!
Hacked and chopped by myself into a suitable basis for a backend. Quite a
few issues remain open, but would seem to be ready for testing on suitable
hardware.