These macros are standard in src's cairoint.h and test's cairo-test.h
internal header files, so for consistency do the same thing with perf's
cairo-perf.h.
Reviewed-by: Uli Schlachter <psychon@znc.in>
If you call ./cairo-perf-print --histogram results.txt, it will then
print a histogram of the results, one per test. Ideally, you should see
a skewed distribution (with a negative skew representing that most results
run in optimal time), but random sampling errors (scheduling,
throttling, general inefficiency etc) will push it more towards a normal
distribution.
For example,
| x |
| x xx |
| x xx |
| x xx |
| xxxx |
| xxxx x |
| x xxxxxx |
| x xxxxxx |
| xxxxxxxxx |
| xxxxxxxxx |
| xxxxxxxxx |
| xxxxxxxxxxxx |
| xxxxxxxxxxxx |
| xxxxxxxxxxxx |
| xxxxxxxxxxxxxx |
|x xxxxxxxxxxxxxx |
|x x xxxxxxxxxxxxxxx |
|x x xxxxxxxxxxxxxxx |
|x x xxxxxxxxxxxxxxxxx |
|xxx x xxxxxxxxxxxxxxxxxxx |
|xxx xxxxxxxxxxxxxxxxxxxxxxxxx |
|xxxxxx xxxx x x x x xxx xx xxxxx xxx x xxx x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|
.------------------------------------------------------------------------------.
xlib firefox-fishtank 8298.44 1.53% (829/946)
Starts off reasonably, but quickly deteriorates as the integrated CPU/GPU
overheats and is forced to throttle.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Having spent the last dev cycle looking at how we could specialize the
compositors for various backends, we once again look for the
commonalities in order to reduce the duplication. In part this is
motivated by the idea that spans is a good interface for both the
existent GL backend and pixman, and so they deserve a dedicated
compositor. xcb/xlib target an identical rendering system and so they
should be using the same compositor, and it should be possible to run
that same compositor locally against pixman to generate reference tests.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
P.S. This brings massive upheaval (read breakage) I've tried delaying in
order to fix as many things as possible but now this one patch does far,
far, far too much. Apologies in advance for breaking your favourite
backend, but trust me in that the end result will be much better. :)
We can use the elapsed time of the indiividual operations to profile the
synchronous throughput of a trace and eliminate all replay overhead. At
the cost of running the trace synchronously of course.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
A variant of many-strokes tries to answer the question of how much
overhead is there in stroking, i.e. if we fill an equivalent path to a
set of strokes, do we see an equivalence in performance?
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
An intersection variant to exercise the stroker with many, many lines. A
silly benchmark, but a popular one in the wild.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
A benchmark to test the speed of hash tables when inserting and
removing a huge number of elements.
Although originally hash tables were assumed not to get many
deletions, in practice they are now being used as caches in multiple
places. This means that they often have a fixed number of live
elements and an element is evicted whenever a new element is inserted
(this happens explicitly for cairo_cache_t objects, but also, for
example, in scaled_font_map + holdovers). This access pattern is very
inefficient with the current implementation.
A benchmark to test how close we get to reducing paint+clip to an ordinary
fill, and to check correctness.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Benjamin just demonstrated this funky trick for generating pixel
outlines, and as no good deed should go unpunished, I've added his code
to the perf suite.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Currently we print the backend description before every time, which is
overly verbose. As the information doesn't^Wshouldn't change, simply
print it before running the first test of each target.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Provide a hook for the test to be able to compute the number of ops per
second. For instance, the glyphs test uses it to report the number of
kiloglyph per second Cairo is able to render.
A new -f option to cairo-perf reverts to a fast run
mode for quick performance overviews. The number of
milliseconds each iteration of a test is run for can
be overriden using the new CAIRO_PERF_ITERATION_MS
environment variable. The default remains 2000 ms/iter.
In order to handle 'cairo-perf-trace benchmark', we need to perform the
can_run? test on the directory name as opposed to the individual trace
names. Make it so.
By ensuring that tests take longer than a couple of seconds we eliminate
systematic errors in our measurements. However, we also effectively
eliminate the synchronisation overhead. To compensate, we attempt to
estimate the overhead by reporting the difference between a single
instance and the minimum averaged instance.
I have an idea to categorise traces within their own subdirectories and so
for convenience added path walking to cairo-perf-trace. Principally this
should allow for forests of symlinks of all types.
Read names of traces to exclude from a file specified using -x on the
commandline, i.e.
$ ./cairo-perf-trace -x cairo-traces/tiny.exclude
This is a convenient method for me to exclude certain traces for
particular machines. For example tiny cannot run
firefox-36-20090609.trace as that has a greater working set than the
available RAM on tiny.
Use 'cairo-perf -v -r' to have both the summary output along with the raw
values. This gives a progress report whilst benchmarking, very reassuring
with long running tests.
Use the new API Behdad exposed in 1.8 to precompute a glyph string using
Cairo and then benchmark cairo_show_glyphs(). This is then equivalent to
the text benchmark but without the extra step of converting to glyphs on
every call to cairo_show_text() i.e. it shows the underlying glyph
rendering performance.
These tests look at the differences in code paths
hit by filling paths that are rectilinear (or not) and
pixel aligned (or not) with the even-odd and non-zero
fill rules. The paths are not simple, so they don't
hit the special case quad/triangle tessellator.
We don't have one just for this purpose. The only other
path with many intersections that gets actually rendered is zrusin-another,
but that might be sped up in the future (say by identifying
collinearities up front or something like that.)
Behdad wants to include the feature with 1.10, so we enable it as early as
possible in 1.9 dev cycle to generate as much feedback as possible.
The first change is to use "<cairo>" as being a name unlikely to clash
with any real font names.
This reverts commits:
a824d284be,
2922336855,
e0046aaf41,
f534bd549e.
This performance test relied on the recently-removed ability
to select the internal twin-based font family with a name of
"cairo".
Presumably, we'll want to bring this performance case back when
some other means of requesting that font face is added.
Generate a cairo-perf-diff graph for a series of commits in order to be
able to identify significant commits. Still very crude, but minimally
functional.
Add a new test case to Cairo for checking the performance of Cairo's
equivalent to GDK's gdk_pixbuf_composite_color() operation. That is an
operation that happens to be extremely useful when viewing or editing
transparent images so I think it is important that it is as fast as
possible.
Add the performance test case to compare the speed of filling a rounded
rectangle (one with camphered corners) as opposed to an ordinary
rectangle. Since the majority of the pixels are identical, ideally the two
cases would take similar times (modulo the additional overhead in the more
complex path).