Oops we were accumulating paths during each spiral iteration and so the
tests were getting slower and slower and slower...
[And fix a couple of other instances of path accumulation.]
A common mistake is to forget to pass the foreground mode to
cairo-test-suite when launching it under the debugger, resulting in the
debugger not attaching to the children and missing the error you were
trying to capture. Under linux, we can inspect the path to our parent's
executable and if that looks like gdb, we assume it is and disable forking
of traces.
Add a command line option to the test suite to cause it to exit after the
first failure. The purpose of this is for integration into 'git bisect run',
where the failing test is unknown and we are looking for any failure. For
example, for use in a regression script to find commits in the midst of as
series that need a refresh of a reference image (or fixing!).
Cairo should include the contents of subwindows when using a Window as a
source but will clip to subwindows when using a Window as a destination.
This can be set using the GC's subwindow_mode.
XCopyArea and XFillRectangle can however only use one GC for both source
and destination. Cairo's mode is set to (the default) ClipByChildren.
This means that copying from a Window is broken, so we only allow the
optimization when we know that the source is a Pixmap.
The performance impact of this change has not been tested. It should be
small, as the code will use XRender otherwise.
If it turns out to be a bigger impact, the optimizations could be
improved by doing a two-step copy process:
1) Copy to an intermediate Pixmap with IncludeInferiors
2) Copy to the destination with ClipByChildren
(potentially omitting one one of the steps if source or destination are
known to be Pixmaps).
references:
commit 0c5d28a4e5https://bugs.freedesktop.org/show_bug.cgi?id=12996
Enabling 'FAST CLIP' appears to trigger an infinite loop so disable.
Enabling 'FAST FILL' has limited effect on performance, so disable whilst
the basic QT surface is improved.
Behdad pointed out that fprintf() returns a value so that we could simply
use the comma operator to return the correct value instead of the
expression-block gcc-ism.
As the associated is now explicitly the font-face used to create the font
by the user, whereas what we require is the current implementation
(quartz) font.
Infrequently, but, for example, handling glyph strings, we require the
string to be nul terminated. (Otherwise an error occurs, which was
previously compounded by a drastic leak.)
If we fail to add the glyph cache (presumably because the font is in
error) do not leak the allocation. As this occurs for every single glyph
string, the leak can grow very quickly and mask the original bug.
Currently the replay creates a fresh surface for every new surface. Whilst
using it to view replays (such as with --xlib) this is often not what is
desired so add a mode (compile-time only for now) to use similar surfaces
and blits to the front buffer instead.
font expects the dictionary to be constructed on the stack for its use, so
close it. (I missed the closing '>>' when switching between dictionary
constructors.)
Workaround for my arm toolchain which succeeds in linking the configure
program, only to complain when linking a program (such as cairo-perf)
against libcairo.so. Annoying.
If a subsequent PATH_OP is just a continuation of the previous line, i.e.
it has the same gradient, then just replace the end-point of the previous
line with the new point rather than adding a new operation. Surprisingly
this occurs in the wild, but the main motivation is a future optimisation
to reduce the number of intersections during stroke-to-path.
Use hypot() instead of open-coding sqrt(x*x + y*y). In theory, the
compiler could emit highly efficient code. In practice it's slower, but
more likely to be more accurate -- but the difference over a bare sqrt()
is likely not to be perceptible.
By ensuring that tests take longer than a couple of seconds we eliminate
systematic errors in our measurements. However, we also effectively
eliminate the synchronisation overhead. To compensate, we attempt to
estimate the overhead by reporting the difference between a single
instance and the minimum averaged instance.
Bug 23067 -- using clear drawing operator crashes printing
http://bugs.freedesktop.org/show_bug.cgi?id=23067
Here we were hitting an assert within the paginated surface after creating
a zero sized fallback image, due to the paginated surface being created
with an x fallback resolution of 0 dpi (by
_gtk_printer_create_cairo_surface(), gtk/gtkprinter.c:924).
Avoid the bug by guarding against bad input to
cairo_surface_set_fallback_resolution() which also allows us to identity
the invalid caller.
Remove the intermediate C program that was a nuisance whilst
cross-compiling and replace it with a simple shell script that is just a
combination of cat + sed.
A problem that does not present itself whilst using spans to intermediate
masks is that the tor-span-convertor did not emit the empty rows. When
compositing directly using spans with an unbounded operator this was
causing rendering artefacts, for overlapping-glyphs and the gl backend.
Prior to introduction of the buggy members to the surface, we obviously
cannot set them. However, the boilerplate code is meant to compile against
older revisions of the library so we need to check for the existence prior
to use.
We can offload creation of gradients to server that support RENDER 0.10
and later. This greatly reduces the amount of traffic we need to send over
our display connection as the gradient patterns are much smaller than the
full image. Even if the server fallbacks to using pixman, performance
should be improved by the reduced transport overhead. Furthermore this is a
requisite to enable hardware accelerated gradients with the xlib backend.
Running cairo-perf-trace on tiny, Celeron/i915:
before: firefox-20090601 211.585
after: firefox-20090601 270.939
and on tiger, CoreDuo/nvidia:
before: firefox-20090601 70.143
after: firefox-20090601 87.326
where linear gradients are used extensively throughout the GTK+ theme.
Not quite the result I was expecting!
In particular, looking at tiny:
xlib-rgba paint-with-alpha_linear-rgba_over-512 47.11 (47.16 0.05%) -> 123.42 (123.72 0.13%): 2.62x slowdown
█▋
xlib-rgba paint-with-alpha_linear3-rgba_over-512 47.27 (47.32 0.04%) -> 123.78 (124.04 0.13%): 2.62x slowdown
█▋
xlib-rgba paint-with-alpha_linear-rgb_over-512 47.19 (47.21 0.02%) -> 123.37 (123.70 0.13%): 2.61x slowdown
█▋
xlib-rgba paint-with-alpha_linear3-rgb_over-512 47.30 (47.31 0.04%) -> 123.52 (123.62 0.09%): 2.61x slowdown
█▋
xlib-rgba paint_linear3-rgb_over-512 47.29 (47.32 0.05%) -> 118.95 (119.60 0.29%): 2.52x slowdown
█▌
xlib-rgba paint_linear-rgba_over-512 47.14 (47.17 0.06%) -> 116.76 (117.06 0.16%): 2.48x slowdown
█▌
xlib-rgba paint_linear3-rgba_over-512 47.32 (47.34 0.04%) -> 116.85 (116.98 0.05%): 2.47x slowdown
█▌
xlib-rgba paint_linear-rgb_over-512 47.15 (47.19 0.03%) -> 114.08 (114.55 0.20%): 2.42x slowdown
█▍
xlib-rgba paint-with-alpha_radial-rgb_over-512 117.25 (119.43 1.21%) -> 194.36 (194.73 0.09%): 1.66x slowdown
▋
xlib-rgba paint-with-alpha_radial-rgba_over-512 117.22 (117.26 0.02%) -> 193.81 (194.17 0.11%): 1.65x slowdown
▋
xlib-rgba paint_radial-rgba_over-512 117.23 (117.26 0.02%) -> 186.35 (186.41 0.03%): 1.59x slowdown
▋
xlib-rgba paint_radial-rgb_over-512 117.23 (117.27 0.02%) -> 184.14 (184.62 1.51%): 1.57x slowdown
▋
Before 1.10, we may choose to disable server-side gradients for the
current crop of Xorg servers, similar to the extended repeat modes.
[Updated by Chris Wilson. All bugs are his.]
The extended repeat modes were only introduced in RENDER 0.10, so disable
them if the server reports an earlier version. This is in addition to
disabling the repeat modes if we know (guess!) the server to have a buggy
implementation.
As the change and ranking is based on the min_ticks, and as this can
sometimes deviate wildly from median_ticks, include min_ticks in the
output.
In particular it helps to explain cases like:
xlib-rgba rectangles_similar-rgba-mag_source-512 10.13 88.41% -> 5.77 0.19%: 1.50x slowdown
which becomes
xlib-rgba rectangles_similar-rgba-mag_source-512 3.83 (10.13 88.41%) -> 5.75 (5.77 0.19%): 1.50x slowdown
(Considering the poor standard deviation on the initial measurement, this
is more likely a sampling error than a true regression.)
More the large slowdowns to the end. This has two pleasing effects:
1. There is symmetry between large speedups at the top, and large
slowdowns at the bottom, with long bars -> short bars -> long bars.
2. After a cairo-perf-diff run the largest slowdowns are immediately
visible on the console. What better way to flag performance
regressions?