Instead of just discarding the worst 15% of the results, we now
do IQR-based detection and elimination of outliers. Also, instead
of reporting mean times we now report minimum and median times.
We still do compute the mean and standard deviation for the
detection of when results seem stable for early bailout. And we
do still report the standard deviation.
A statistician might complain that it looks funny to report the
median and the standard deviation together, (instead of median
and average absolute deviation from the median, say), but I liked
the standard deviation since it is always larger, so it might
ensure better separatation if we use it to determine when two
sets of results are sufficiently different to be interesting.
This test is really just for hammering the double to fixed-point conversion
(in _cairo_fixed_from_double) that happens as doubles from API calls gets
translated into internal cairo fixed-point numbers.
This makes the entire performance test suite about 10 times faster
on my system. And I don't think that the results are significantly
worse, (many tests are stable after only 5 iterations while some
still run to 100 iterations without reaching our stability criteria).
This will finally allow us to very easily add lots of other
tests that will similarly involve iterating over the various
sources and operators of interest.
The motivation here is to have the cairo_t context available
to the perf funcs before they call into cairo_perf_run, (so
that they can do one-time setup of source etc. for several
runs).
This changes the perf test output format to be a little more human friendly,
reporting times in ms instead of seconds. It also adds a test number
that could be used in the future for specifying an explicit test to run
(test number, target surface, test name, and size uniquiely identify
a test).
Also adds a few paint tests.
We do this by adding a new cairo_perf_timer_set_finalize function and
in the case of the xlib backend passing a callback to that function
that does a 1x1 XGetImage.
Add a new cairo_boilerplate_mode_t so that the boilerplate targets can
do slightly different things if being tested for correctness vs. being
run for performance.
- add a yield () function that's called before every test. It reduced the std
dev slightly for me
- fix double comparisons to not just compare the integer part
1. Remove all the alarm/signal code, which just isn't doing what we want for some reason.
Instead, for now we'll simply run for a fixed number of iterations, (perhaps we
can tune that per test later).
2. Before computing mean and stdandard deviation of runs, sort them all and discard the
top and bottom 20% of the values.
Now the standard deviation for the paint test is generally 2% or less.