If the tracer encounters an unknown enum value it
ought not to crash. Theis patch replaces the idiom
of looking up a name for an enumerated value directly
from a table by a switch statement. As a bonus we get
warnings from the compiler when the enums are updated
in cairo.
If the tracer's object stack underflows we want to
know about is as soon as possible. This patch adds
checks against the stack overflowing and aborts the
program with an object stack dump if it does.
Support non-Linux systems which don't have a /proc/self/cmdline
by transferring the application name given to cairo-trace via
an environment variable CAIRO_TRACE_PROG_NAME.
cairo_script_context_t is an encapsulation object for interfacing with the
output - multiple surfaces can share the same context, meaning that they
write to the same destination file/stream.
Larry Ewing hit a bug in cairo-trace whereby it tried to create a similar
surface referencing an undefined object. This fix checks whether the
object has yet to be defined, and if not issues an index in order to copy
the appropriate operand from the stack.
font expects the dictionary to be constructed on the stack for its use, so
close it. (I missed the closing '>>' when switching between dictionary
constructors.)
Handling clip as part of the surface state, as opposed to being part of
the operation state, is cumbersome and a hindrance to providing true proxy
surface support. For example, the clip must be copied from the surface
onto the fallback image, but this was forgotten causing undue hassle in
each backend. Another example is the contortion the meta surface
endures to ensure the clip is correctly recorded. By contrast passing the
clip along with the operation is quite simple and enables us to write
generic handlers for providing surface wrappers. (And in the future, we
should be able to write more esoteric wrappers, e.g. automatic 2x FSAA,
trivially.)
In brief, instead of the surface automatically applying the clip before
calling the backend, the backend can call into a generic helper to apply
clipping. For raster surfaces, clip regions are handled automatically as
part of the composite interface. For vector surfaces, a clip helper is
introduced to replay and callback into an intersect_clip_path() function
as necessary.
Whilst this is not primarily a performance related change (the change
should just move the computation of the clip from the moment it is applied
by the user to the moment it is required by the backend), it is important
to track any potential regression:
ppc:
Speedups
========
image-rgba evolution-20090607-0 1026085.22 0.18% -> 672972.07 0.77%: 1.52x speedup
▌
image-rgba evolution-20090618-0 680579.98 0.12% -> 573237.66 0.16%: 1.19x speedup
▎
image-rgba swfdec-fill-rate-4xaa-0 460296.92 0.36% -> 407464.63 0.42%: 1.13x speedup
▏
image-rgba swfdec-fill-rate-2xaa-0 128431.95 0.47% -> 115051.86 0.42%: 1.12x speedup
▏
Slowdowns
=========
image-rgba firefox-periodic-table-0 56837.61 0.78% -> 66055.17 3.20%: 1.09x slowdown
▏
Based on the work by Øyvind Kolås and Pierre Tardy -- many thanks to
Pierre for pushing this backend for inclusion as well as testing and
reviewing my initial patch. And many more thanks to pippin for writing the
backend in the first place!
Hacked and chopped by myself into a suitable basis for a backend. Quite a
few issues remain open, but would seem to be ready for testing on suitable
hardware.
It is easier on the eye to use
'1 index set-source exch pop'
rather than
'dup /p0 exch def p0 set-source /p0 undef'
(as patterns are expected to be temporary so we strive to avoid naming
them).
After opening a specific file or fd for ourselves, reset the
CAIRO_TRACE_FD to point to an invalid fd in order to prevent any child
processes (who inherit our environment) from attempting to trace cairo
calls. If we allow them to continue, then the two traces will intermix
and be unreplayable.
Embed the pixels for images less than 32*32 as this catches most icons
which are frequently uploaded, but is still an unlikely size for a
destination image surface.
Using a null surface is a convenient method to measure the overhead of the
performance testing framework, so export it although as a test-surface so
that it will only be available in development builds and not pollute
distributed libraries.
python lazily loads libcairo.so and so it is not available via RTLD_NEXT,
and we need to dlopen cairo ourselves. Similarly the linker is not able to
resolve any naked function references and so we need to ensure that all of
our own calls into the library are wrapped with DLCALL.
Objects like cairo_scaled_font_t may return a reference to a previously
defined scaled-font instead of creating a new token each time. This caused
cairo-trace to overflow its operand stack by pushing a new instance of the
old token every time. Modify the tracer such that a font token can only
appear once on the stack -- for font-faces we remove the old operand and
for scaled-fonts we simply pop, chosen to reflect expected usage.
Applications like firefox have a very conservative approach and mark
surfaces dirty before every render. As we record the image data every
time, firefox traces can grow very large with redundant data - so allow
the user to disable mark dirty tracing.
In order to have locale-independent output of decimal values, we need to
manually transform such numbers into strings. As this is a solved problem
for cairo, we adopt _cairo_output_stream_printf() and in particular the
_cairo_dtostr() routine for our own printf processing.
Reparsing the dwarf info for every lookup is very slow, so cache the
symbol lookups. This initial implementation is unbounded in the simple
belief that the actual number of unique lookups during a program's
lifetime should be fairly small. (Extending to a bounded MRU list is left
as an exercise for the reader.)