Based on the work by Øyvind Kolås and Pierre Tardy -- many thanks to
Pierre for pushing this backend for inclusion as well as testing and
reviewing my initial patch. And many more thanks to pippin for writing the
backend in the first place!
Hacked and chopped by myself into a suitable basis for a backend. Quite a
few issues remain open, but would seem to be ready for testing on suitable
hardware.
Instead of tagging the sources, which is insensitive to changes, track the
known failure modes by recording the current fail as an xfail.png
reference. (We also introduce a new.png to track a fresh error, so that
they are not lost in the noise of the old XFAILs and hopefully do not
cause everyone to fret).
As we have removed the XFAIL tagging we find, surprise surprise, that some
tests are now working -- so review all the reference images (as also some
.ref.png now should be .xfail.png).
Note: I've only checked image,pdf,ps,svg. The test surfaces report some
failures that probably need to addressed in source. I've not correct the
changes for win32 and quartz. Nor fixed up the experimental backends.
Enforce that each test must render within 60 seconds or be considered to
have hit an infinite loop and be reported as a CRASH. The timeout value is
adjustable via CAIRO_TEST_TIMEOUT -- a value of 0 will disable.
Test case for:
Bug 22441 -- Unexpected shift with push_group and pop_group
https://bugs.freedesktop.org/show_bug.cgi?id=22441
This is a test that demonstrates the error in the pdf backend when using
groups on surfaces with non-integer sizes. In order to create such a
surface, we need to update the boilerplate to use doubles instead of
integers when specifying the surface size.
Using a null surface is a convenient method to measure the overhead of the
performance testing framework, so export it although as a test-surface so
that it will only be available in development builds and not pollute
distributed libraries.
We frequently use '-' within the test name or format name and so we
encounter confusion as '-' is also used as the field separator. At times
this has caused a new test to break an old test because the new test would
match one of the old test's target specific reference images. So switch
everything over to use '.' between fields (test name, target, format,
subtest, etc.).
Avoid calling libtool to link every single test case, by building just one
binary from all the sources.
This binary is then given the task of choosing tests to run (based on user
selection and individual test requirement), forking each test into its own
process and accumulating the results.
Add the core support to cairo-test for running the test-suite under a
malloc fault injector. This commit contains the adjustments to
cairo_test_run() to repeat the test if it detects a failure due to fault
injection and complains if it detects unreported faults or memory leaks.
From bug 18010 (https://bugs.freedesktop.org/show_bug.cgi?id=18010),
in order to make flockfile() available we need to set _POSIX_C_SOURCE and
according to the man page, the appropriate feature check is
_POSIX_THREAD_SAFE_FUNCTIONS.
For this we extend the boilerplate get_image() routines to extract a
single page out of a paginated document and then proceed to manually
check each page of the fallback-resolution test.
(Well that's the theory, in practice SVG doesn't support multiple pages
and so we just generate a new surface for each resolution. But the
infrastructure is in place so that we can automate other tests,
e.g. test/multi-pages.)
If the vector surface matches the output from last time, then the
rasterisation is skipped - but we need to write the expected OUTPUT
filename to the log so that the image is referenced from index.html.
The reasoning behind the -25 testing is that we want to ensure
that cairo provides translation invariance. However, for
many vector backends we use external rasterizers that don't
necessarily provide that translation invariance.
So this testing makes a bunch of failures appear that we don't
really care about, (and we don't even have a mechanism to turn
them off with custom reference images). For the release, I'm
just turning this off.
After the release, I plan to turn this back on, and then we could
fix this by ensuring that the vector output itself is unaffected
by a device offset, or by moving away from external rasterizers,
(see Chris's micro-gs work to test PostScript with cairo-based
rasterization).
Delete the results of previous runs if the reference images are more
recent.
There's still potential error if the conversion utility or its required
libraries are modified...
Be explicit about handling cached FAIL images, instead of relying on the
sequences of failed matches as the files are an external resource and we
can not guarantee their individual accessibility.
Note this also changes the filename, so you may want to run:
$ find -name '*-last.*' -print | xargs rm
after this checkout.
Compare the current output against a previous run to determine if there
has been any change since last time, and only run through imagediff if
there has been. For the vector surfaces, we can check the vector output
first and potentially skip the rasterisation. On my machine this reduces
the time for a second run from 6 minutes to 2m30s. As most of the time,
most test output will remain unchanged, so this seems to be a big win. On
unix systems, hard linking is used to reduce the amount of storage space
required - others will see about a three-fold increase in the amount of
disk used. The directory continues to be a stress test for file selectors.
In order to reduce the changes between runs, the current time is no longer
written to the PNG files (justified by that it only exists as a debugging
aid) and the boilerplate tweaks the PS surface so that the creation date
is fixed. To fully realise the benefits here, we need to strip the
creation time from all the reference images...
The biggest problem with using the caches is that different runs of the
test suite can go through different code paths, introducing potential
Heisenbergs. If you suspect that caching is interfering with the test
results, use 'make -C test clean-caches check'.
Having included some extra details in the test output PNG filename, we
need to pass the extra information to
cairo_ref_name_for_test_target_format() in order to find the match.
As Behdad suggested, we can dramatically speed up the test suite by
short-circuiting the write to a png file, only to then immediately read it
back in. So for the raster based surfaces, we avoid the round-trip through
libpng by implementing a new boilerplate method to directly extract the image
buffer from the test result. A secondary speedup is achieved by caching the
most recent reference image.
Construct the test name to pass to the boilerplate creation routines such
that it uniquely identifies the test in terms of test, target, content and
pass (similar, offset, thread). This allows the vector targets to create
output different output files for each test, whereas before, later tests
would overwrite existing files making debugging more difficult.
In order to run under memfault, the framework is first extended to handle
running concurrent tests - i.e. multi-threading. (Not that this is a
requirement for memfault, instead it shares a common goal of storing
per-test data). To that end all the global data is moved into a per-test
context and the targets are adjusted to avoid overlap on shared, global
resources (such as output files and frame buffers). In order to preserve
the simplicity of the standard draw routines, the context is not passed
explicitly as a parameter to the routines, but is instead attached to the
cairo_t via the user_data.
For the masochist, to enable the tests to be run across multiple threads
simply set the environment variable CAIRO_TEST_NUM_THREADS to the desired
number.
In the long run, we can hope the need for memfault (runtime testing of
error paths) will be mitigated by static analysis. A promising candidate
for this task would appear to be http://hal.cs.berkeley.edu/cil/.
Some platforms and applications enable floating point exceptions, so it
seems sensible to run the test-suite with the most serious exceptions
enabled - divide by zero, invalid result and overflow.
Convert the boilerplate specific flattened content value to the ordinary
CAIRO_CONTENT_COLOR_ALPHA for use with cairo_push_group_with_content() -
otherwise cairo rightfully flags an error and the test harness decides
that the similar surface is not available.
This reverts commit 07122e64fa.
The extra testing did find a pdf bug, and that should be fixed,
but the extra maintenance burden of running another iteration
of all tests does not seem justfied at all---particularly since
it looks like dozens of new reference images would be needed
for the svg backend.
Also, the new "failures" of the image backend with this new
testing look like a misunderstanding of exactly what the new
testing is actually drawing.