But note we can only do the exchange if they do indeed match and
there are no other references (the objects are only on the stack).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Try using the lighter-weight LZO decompressor in an effort to speed up
replays (at the cost of making the bound traces slightly larger).
Presuming that with the slight increase in file size (from -1% to +10%),
the file data remains in the readahead buffer cache, replays see a
performance improvement of between 5-10%.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The idiom (and expectation) for surface operators is that they leave the
surface on the stack for the next operation. Also we need to hold onto a
surface reference for objects put onto the stack, yet for the
map-to-image return we did not own one.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we fail to resolve a particular pattern, try removing a few features
from the pattern and see if we can resolve that fallback and continue on
with the trace with a close approximation.
This then behaves much as if the pattern had requested a specific
font that was not available on the system and so was substituted.
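
The fallback described above can be sketched roughly as follows. This is
a minimal Python illustration, not the interpreter's C code: the pattern
is modelled as a plain dict, and resolve_with_fallback(), the resolve
callback and the list of droppable features are all hypothetical names
chosen for the example.

```python
def resolve_with_fallback(pattern, resolve,
                          droppable=("slant", "weight", "style")):
    """Try resolve(pattern); on failure drop one feature at a time and retry.

    `resolve` is a hypothetical callback returning None on failure.
    `droppable` is an assumed ordering, least important feature first.
    """
    candidate = dict(pattern)          # never modify the caller's pattern
    result = resolve(candidate)
    for key in droppable:
        if result is not None:
            break                      # resolved, stop simplifying
        if key in candidate:
            del candidate[key]         # remove one feature and try again
            result = resolve(candidate)
    return result, candidate
```

The second return value records which reduced pattern finally resolved,
so a caller could log how close the approximation is.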
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Having spent the last dev cycle looking at how we could specialize the
compositors for various backends, we once again look for the
commonalities in order to reduce the duplication. In part this is
motivated by the idea that spans is a good interface for both the
existent GL backend and pixman, and so they deserve a dedicated
compositor. xcb/xlib target an identical rendering system and so they
should be using the same compositor, and it should be possible to run
that same compositor locally against pixman to generate reference tests.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
P.S. This brings massive upheaval (read breakage). I've tried delaying it
in order to fix as many things as possible, but now this one patch does
far, far, far too much. Apologies in advance for breaking your favourite
backend, but trust me in that the end result will be much better. :)
The existing API only described the method to be used for performing
rasterisation and, unlike other APIs, provided no opportunity for the
user to give a hint as to how to trade off quality against speed. So in
order to not be overly prescriptive, we extend the NONE/GRAY/SUBPIXEL
methods with FAST/GOOD/BEST hints and leave the backend to decide how
best to achieve those goals.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We clear past the end of the row so that we don't trigger valgrind
warnings by leaving harmless uninitialised bits inside the input image.
However, for RGB24 the input rowlen is 3*width, whereas we write 4*width
of data, so we need to take account of that and ensure we clear beyond
the end of the written data, not the read data.
Fixes reading of RGB24 input.
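
To illustrate the off-by-one-pixel-size issue, here is a small Python
sketch (not the actual C code). expand_rgb24_row() is a hypothetical
helper, the 0xAA fill stands in for uninitialised memory, and the B,G,R,x
byte order is an assumption made only for the example.

```python
def expand_rgb24_row(src_row, width, stride):
    """Expand packed 3-byte pixels into 4-byte pixels in a `stride`-byte row."""
    assert len(src_row) >= 3 * width and stride >= 4 * width
    row = bytearray(b"\xAA" * stride)      # stand-in for uninitialised memory
    for x in range(width):
        r, g, b = src_row[3 * x:3 * x + 3]
        # Assumed little-endian xRGB layout: B, G, R, then an unused byte.
        row[4 * x:4 * x + 4] = bytes((b, g, r, 0))
    # Clear from the end of the *written* data (4*width bytes),
    # not from the end of the *read* data (3*width bytes).
    written = 4 * width
    row[written:] = bytes(stride - written)
    return bytes(row)
```

Clearing from 3*width instead would wipe the last quarter of the pixels
that were just written, which is the kind of corruption the fix avoids.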
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
A common requirement is the fast upload of pixel data. In order to
allocate the most appropriate image buffer, we need knowledge of the
destination. The most obvious example is that we could use a
shared-memory region for the image to avoid the transfer cost of
uploading the pixels to the X server. Similarly, gl, win32, quartz...
The other side of the equation is that for manual modification of a
remote surface, it would be more efficient if we can create a similar
image to reduce the transfer costs. This strategy is already followed
for the destination fallbacks and this merely exposes the same
capability for the application fallbacks.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
During replay we want to handle recording surfaces specially, and not
redirect the creation of those to the target surface. This is similar to
the need to keep image surfaces as images during replay.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Restructure the Makefiles in .sources, .am and .win32 to enable
building cairo-script-interpreter on Win32.
Some minor changes are needed to compile on MSVC:
- include stdint.h to define INT_MAX-like macros
- redefine "inline"
- avoid deprecated functions (snprintf, replaced by _snprintf)
- define _USE_MATH_DEFINES so that math.h defines M_PI, M_SQRT2 and
M_LN2
This is a common format used by framebuffers to drive 10bpc displays
and is often hardware accelerated by XRender with underlying support
from pixman's x2r10g10b10 format (which provides coercion paths for
fallbacks).
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This is consistent with the naming of most cairo types/functions
(example: cairo_foo_surface_*).
The substitution in the code has been performed using:
sed -i 's/cairo_pattern_mesh_/cairo_mesh_pattern_/' <files>
The reuse hit rate is very small, and most images are quickly
distinguished in the first few bytes... Though perhaps not for video as
in the swfdec-youtube case...
After a renewed discussion, it was pointed out that the API in Cairo was
not restrictive and by using doubles we would be consistent with the rest
of the API. Thus prompting the name change to
cairo_surface_create_for_rectangle()
similar to cairo_rectangle().
And document the public API.
I updated the Free Software Foundation address using the following script.
for i in $(git grep Temple | cut -d: -f1 )
do
sed -e 's/59 Temple Place[, -]* Suite 330, Boston, MA *02111-1307[, ]* USA/51 Franklin Street, Suite 500, Boston, MA 02110-1335, USA/' -i "$i"
done
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=21356
When compiling we can depend on whatever version of cairo we need, but
we should be wary of checking for runtime compatibility when building
standalone.
We were exposing the actual value of CAIRO_FORMAT_INVALID
through API functions already, so it makes sense to just
go ahead and put it in the cairo_format_t enum.
Should fix:
Bug 26509 - Cairo fails to compile without mmap
http://bugs.freedesktop.org/show_bug.cgi?id=26509
As reported by Hib Eris, Cairo fails to compile under a mingw32
cross-compiler as we use a structure only defined if HAVE_MMAP
unconditionally.
Implicitly reference the target of the context instead, i.e.
this reduces the use of:
/target get (example.png) write-to-png pop
as a common idiom where the context is kept on the stack and the surface
forgotten.
Real applications that control their Drawable externally to Cairo are
'disadvantaged' by cairo-perf-trace when it creates a similar surface
for each new instance of the same Drawable. The difficulty in
maintaining one perf surface for every application surface is that the
traces do not track lifetimes for the application surfaces, so we would
just accumulate stale surfaces. The surface cache takes a different
approach and returns the same surface for each active Drawable, and
maintains a hold-over of the MRU 16 surfaces. This achieves 60-80% hit
rate with firefox, which is probably as good as can be expected.
Obviously for double-buffered applications we only ever draw to freshly
created surfaces (and Gtk+ bypasses cairo to do the final copy -- the
ideal application would just use a push-group for double buffering, in
which case we would capture and replay the entire expose event).
To enable use of the surface cache whilst replaying use -c:
./cairo-perf-trace -c firefox-talos-gfx
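
One plausible reading of the cache policy, as a Python sketch rather than
the cairo-perf-trace implementation: SurfaceCache, its create callback
and the explicit release() call are hypothetical, and only the hold-over
size of 16 comes from the description above.

```python
from collections import OrderedDict


class SurfaceCache:
    """Same surface per active drawable, plus a bounded hold-over list."""

    def __init__(self, create, holdover=16):
        self.create = create           # hypothetical surface factory
        self.holdover = holdover
        self.active = {}               # drawable -> surface currently in use
        self.recent = OrderedDict()    # released surfaces, oldest first
        self.hits = self.misses = 0

    def get(self, drawable):
        if drawable in self.active:
            self.hits += 1
        elif drawable in self.recent:
            # Revive a held-over surface instead of creating a new one.
            self.active[drawable] = self.recent.pop(drawable)
            self.hits += 1
        else:
            self.active[drawable] = self.create(drawable)
            self.misses += 1
        return self.active[drawable]

    def release(self, drawable):
        surface = self.active.pop(drawable, None)
        if surface is None:
            return
        self.recent[drawable] = surface
        while len(self.recent) > self.holdover:
            self.recent.popitem(last=False)   # evict the oldest hold-over
```

Bounding the hold-over list is what keeps stale surfaces from
accumulating when lifetimes are not tracked.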
In order to get a baseline for win32 performance testing, always create a
font so that the trace can be replayed. Not ideal, but I feel this is the
pragmatic solution for judging the performance differentials before I can
work out a better solution for loading type42 fonts.
In order to enable replay of traces on machines that do not use FreeType
as the native font system, we need to convert a type42 font into something
similar. Currently the fallback is just to select a font with the same
name - this ignores weight and slant, and many other details.
Kerning, that is applying a horizontal but no vertical offset to a
glyph, is quite frequent. By discarding the vertical coordinate where it
remains the same and only encoding the horizontal offset, we reduce the
file size by ~12.5% when tracing poppler.
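
The idea can be sketched in Python as below. This is not the trace file
format itself: encode_glyphs() and decode_glyphs() are hypothetical
helpers, and glyphs are modelled as (index, x, y) tuples purely for
illustration.

```python
def encode_glyphs(glyphs):
    """Emit y only when it differs from the previous glyph's y."""
    out = []
    last_y = None
    for index, x, y in glyphs:
        if y == last_y:
            out.append((index, x))       # horizontal offset only
        else:
            out.append((index, x, y))    # full position, new baseline
            last_y = y
    return out


def decode_glyphs(encoded):
    """Inverse of encode_glyphs(): reuse the last seen y when omitted."""
    out = []
    y = None
    for item in encoded:
        if len(item) == 3:
            index, x, y = item
        else:
            index, x = item
        out.append((index, x, y))
    return out
```

For a run of text on one baseline only the first glyph carries a y value,
which is where the saving comes from.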
If we fail to add the glyph cache (presumably because the font is in
error) do not leak the allocation. As this occurs for every single glyph
string, the leak can grow very quickly and mask the original bug.
After diverting the pointers to accommodate lazy decompressing of the
source, the bytecode pointer was left pointing to the original location
that had already been freed - thus passing an invalid block to FreeType
and unsurprisingly then, blowing up.
Hook into the scanner to write out a binary version of the tokenized
objects -- note we bind executable names (i.e. check to see if it is an
operator and substitute the name with an operator -- this breaks
overloading of operators by scripts).
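
A toy Python sketch of what binding means here. The operator table, the
tag bytes and bind() are all invented for the example and do not reflect
the real binary encoding.

```python
import struct

# Hypothetical operator table; the real interpreter has many more.
OPERATORS = ["pop", "get", "write-to-png"]
OPCODE = {name: i for i, name in enumerate(OPERATORS)}


def bind(tokens):
    """Serialise tokens, replacing known operator names with opcodes."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, float):
            out += b"f" + struct.pack("<f", tok)      # 4-byte binary float
        elif tok in OPCODE:
            out += b"o" + bytes([OPCODE[tok]])        # bound operator
        else:
            raw = tok.encode("utf-8")                 # unbound name (< 256 bytes)
            out += b"n" + bytes([len(raw)]) + raw
    return bytes(out)
```

Once "pop" has been written as an opcode, a script that later redefines
pop no longer affects that token, which is the overloading caveat noted
above.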
By converting scripts to a binary form, they are more compact and
execute faster:
firefox-world-map.trace 526850146 bytes
bound.trace 275187755 bytes
[ # ]  backend               test   min(s) median(s) stddev. count
[ 0]      null              bound   34.481    34.741   0.68%   3/3
[ 1]      null  firefox-world-map   89.635    89.716   0.19%   3/3
[ 0]       drm              bound   79.304    79.350   0.61%   3/3
[ 1]       drm  firefox-world-map  135.380   135.475   0.58%   3/3
[ 0]     image              bound   95.819    96.258   2.85%   3/3
[ 1]     image  firefox-world-map  156.889   156.935   1.36%   3/3
[ 0]      xlib              bound  539.130   550.220   1.40%   3/3
[ 1]      xlib  firefox-world-map  596.244   613.487   1.74%   3/3
This trace has a lot of complex paths and the use of binary floating point
reduces the file size by about 50%, with a commensurate reduction in scan
time and significant reduction in operator lookup overhead. Note that this
test is still IO/CPU bound on my i915 with its pitifully slow flash...
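
For reference, the ratios implied by the figures above can be recomputed
with a few lines of Python. The numbers are copied from this message; the
variable names are just for the example.

```python
# File sizes in bytes, from the listing above.
text_size = 526850146      # firefox-world-map.trace
bound_size = 275187755     # bound.trace

# Fraction of the original size saved by the binary form.
size_reduction = 1 - bound_size / text_size

# Minimum replay times in seconds: (bound, firefox-world-map).
min_times = {
    "null": (34.481, 89.635),
    "drm": (79.304, 135.380),
    "image": (95.819, 156.889),
    "xlib": (539.130, 596.244),
}

# Speed-up of the bound trace over the text trace, per backend.
speedup = {name: text / bound for name, (bound, text) in min_times.items()}

print(f"size reduction: {size_reduction:.1%}")
for name, ratio in speedup.items():
    print(f"{name:>5}: {ratio:.2f}x")
```

The absolute saving is a similar 55-60s on every backend, so the relative
gain shrinks as the backend's own rendering time grows.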