As the easiest approach to making another snapshot that depends only
upon a stable pixman, make the new dependency a compile-time option.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
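A compile-time option of this kind usually comes down to a preprocessor guard around the new code path. The sketch below is minimal and uses a hypothetical `HAVE_PIXMAN_GLYPHS` macro, which the build system would define only when the newer pixman is found. The macro and function names are illustrative and are not cairo's actual identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* HAVE_PIXMAN_GLYPHS is a hypothetical configure-time macro: the build
 * system would define it only when the installed pixman provides the new
 * glyph API. When it is absent, the code still builds against a stable
 * pixman and keeps using the older path. */
#ifdef HAVE_PIXMAN_GLYPHS
static const bool use_pixman_glyph_cache = true;
#else
static const bool use_pixman_glyph_cache = false;
#endif

/* Pick a glyph strategy in one place so callers need no #ifdef of their
 * own: 1 = hand the whole glyph run to pixman, 0 = composite per glyph. */
static int
composite_glyphs_strategy (void)
{
    return use_pixman_glyph_cache ? 1 : 0;
}
```

Keeping the guard in a single helper means the rest of the code tests an ordinary value and compiles identically with or without the new dependency.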
Jose Dapena Paz reported an assertion failure caused by an uninitialised
status value being returned. The function also failed to free its allocations.
Based on a patch by Jose Dapena Paz <jdapena@igalia.com>.
Reported-by: Jose Dapena Paz <jdapena@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=51104
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
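A common fix for both problems, an unassigned status being returned and allocations leaking on exit, is to initialise the status up front and send every exit through a single cleanup label. The sketch below shows that pattern. `status_t` and `compute_table()` are hypothetical stand-ins and are not the function this patch touched.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { STATUS_SUCCESS = 0, STATUS_NO_MEMORY, STATUS_INVALID } status_t;

/* Hypothetical function that needs two temporary buffers to fill
 * out[0..n) with i * i. Every path after the first allocation goes
 * through `cleanup`, so both buffers are always freed and the
 * returned status is always initialised. */
static status_t
compute_table (int n, long *out)
{
    status_t status = STATUS_SUCCESS;   /* never return an unset value */
    int *indices = NULL;
    long *scratch = NULL;

    if (n <= 0 || out == NULL)
        return STATUS_INVALID;          /* nothing allocated yet */

    indices = malloc ((size_t) n * sizeof *indices);
    scratch = malloc ((size_t) n * sizeof *scratch);
    if (indices == NULL || scratch == NULL) {
        status = STATUS_NO_MEMORY;
        goto cleanup;
    }

    for (int i = 0; i < n; i++) {
        indices[i] = i;
        scratch[i] = (long) indices[i] * indices[i];
    }
    for (int i = 0; i < n; i++)
        out[i] = scratch[i];

cleanup:
    free (scratch);   /* free(NULL) is a no-op, so partial failure is safe */
    free (indices);
    return status;
}
```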
This new pixman API allows glyphs to be cached and composited in one
go, which reduces overhead compared to individual calls to
pixman_image_composite_region32().
Notes:
- There is an explicit call to _cairo_image_scaled_glyph_fini(). This
could instead be done with a private, but I chose not to do that
since we don't need to store any actual data; we only need
notification when the glyph dies.
- The slowdown in poppler-reseau is real and stable across runs. I'm
not too concerned about it because this benchmark is only one run
and so it is dominated by glyph cache setup costs and FreeType
rasterizing.
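The batched model is easier to see in a self-contained toy than through pixman itself. In the sketch below, glyph images are cached by a (font, glyph) key pair, and a whole positioned run is added into an a8 mask in a single call. pixman's real entry points are `pixman_glyph_cache_lookup()`, `pixman_glyph_cache_insert()`, `pixman_glyph_cache_remove()` and `pixman_composite_glyphs()`. The types and functions below are hypothetical, have fixed-size glyphs and no eviction, and only mirror the shape of that API. `cache_remove()` plays a role similar to the explicit `_cairo_image_scaled_glyph_fini()` call in the notes: it stores no data and only reacts when a glyph dies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 64
#define GLYPH_W 4
#define GLYPH_H 4

/* A rasterised glyph: a fixed-size a8 coverage bitmap (toy assumption). */
typedef struct { uint8_t coverage[GLYPH_H][GLYPH_W]; } glyph_image_t;

/* One cache entry keyed by (font, glyph); image == NULL marks a free slot. */
typedef struct {
    const void *font_key;
    const void *glyph_key;
    const glyph_image_t *image;
} cache_entry_t;

typedef struct { cache_entry_t slots[CACHE_SLOTS]; } glyph_cache_t;

/* A glyph placed at (x, y) in a run, analogous to pixman_glyph_t. */
typedef struct { int x, y; const glyph_image_t *glyph; } positioned_glyph_t;

static const glyph_image_t *
cache_lookup (const glyph_cache_t *c, const void *font, const void *glyph)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (c->slots[i].image && c->slots[i].font_key == font &&
            c->slots[i].glyph_key == glyph)
            return c->slots[i].image;
    return NULL;
}

static int
cache_insert (glyph_cache_t *c, const void *font, const void *glyph,
              const glyph_image_t *image)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (c->slots[i].image == NULL) {
            c->slots[i] = (cache_entry_t) { font, glyph, image };
            return 1;
        }
    }
    return 0;   /* full; a real cache would evict */
}

/* Death notification: drop the entry, nothing else to store or free. */
static void
cache_remove (glyph_cache_t *c, const void *font, const void *glyph)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (c->slots[i].image && c->slots[i].font_key == font &&
            c->slots[i].glyph_key == glyph)
            c->slots[i].image = NULL;
}

/* Add a whole run into an a8 mask in one call (saturating ADD, clipped). */
static void
composite_glyph_run (uint8_t *mask, int mask_w, int mask_h,
                     const positioned_glyph_t *glyphs, int n_glyphs)
{
    for (int g = 0; g < n_glyphs; g++)
        for (int y = 0; y < GLYPH_H; y++)
            for (int x = 0; x < GLYPH_W; x++) {
                int mx = glyphs[g].x + x, my = glyphs[g].y + y;
                if (mx < 0 || my < 0 || mx >= mask_w || my >= mask_h)
                    continue;
                unsigned sum = mask[my * mask_w + mx] +
                               glyphs[g].glyph->coverage[y][x];
                mask[my * mask_w + mx] = sum > 255 ? 255 : (uint8_t) sum;
            }
}
```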
Performance results, image backend:
Speedups
firefox-talos-gfx 5571.55 -> 4265.57: 1.31x speedup
gnome-terminal-vim 1875.82 -> 1715.14: 1.09x speedup
evolution 1128.24 -> 1047.68: 1.08x speedup
xfce4-terminal-a1 1364.38 -> 1277.48: 1.07x speedup
Slowdowns
poppler-reseau 374.42 -> 394.29: 1.05x slowdown
Performance results, image16 backend:
Speedups
firefox-talos-gfx 5387.25 -> 4065.39: 1.33x speedup
gnome-terminal-vim 2116.66 -> 1962.79: 1.08x speedup
evolution 987.50 -> 924.27: 1.07x speedup
xfce4-terminal-a1 1856.85 -> 1748.25: 1.06x speedup
gvim 1484.07 -> 1398.75: 1.06x speedup
Slowdowns
poppler-reseau 371.37 -> 393.99: 1.06x slowdown
Also bump pixman requirement to 0.27.1.
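Assuming the figures are per-trace run times where lower is better, a speedup is old/new and a slowdown is new/old. For example, firefox-talos-gfx gives 5571.55 / 4265.57, about 1.31, and poppler-reseau gives 394.29 / 374.42, about 1.05. The small helper below reproduces those ratios. `perf_change()` is a hypothetical name and is not part of cairo's perf tools.

```c
#include <assert.h>

/* ratio >= 1 always; `faster` says whether the new time is the lower one. */
typedef struct { double ratio; int faster; } perf_change_t;

static perf_change_t
perf_change (double old_time, double new_time)
{
    perf_change_t c;
    assert (old_time > 0 && new_time > 0);
    if (new_time <= old_time) {
        c.ratio = old_time / new_time;   /* reported as "N.NNx speedup" */
        c.faster = 1;
    } else {
        c.ratio = new_time / old_time;   /* reported as "N.NNx slowdown" */
        c.faster = 0;
    }
    return c;
}
```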
In addition to fixing a bug related to expanding a8 glyphs into
a8r8g8b8, commit 7d8d98b91c also added an optimization where, if the
first glyph had format a8r8g8b8, the mask was created in this format
from the beginning instead of later being converted from a8 to
a8r8g8b8.
However, the optimization had two bugs in it:
(1) The computed stride was 3 * width, not 4 * width, and
(2) In the case where the mask was allocated on the stack, it was
allocated as PIXMAN_a8 and not a8r8g8b8.
This commit fixes both bugs.
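Both bugs come down to deriving the buffer layout from a single format value. The stride must be computed from that format's bytes per pixel, and the same format must be used to create the image wrapping the buffer. In the minimal sketch below, the enum and helpers are hypothetical. The rounding to a multiple of 4 bytes reflects pixman's requirement that `pixman_image_create_bits()` receives a stride in whole `uint32_t` units.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two mask formats involved. */
typedef enum { MASK_A8, MASK_A8R8G8B8 } mask_format_t;

static int
bytes_per_pixel (mask_format_t format)
{
    return format == MASK_A8R8G8B8 ? 4 : 1;
}

/* Row stride in bytes, rounded up to a multiple of 4. For a8r8g8b8 this
 * is 4 * width (the bug computed 3 * width). */
static int
mask_stride (mask_format_t format, int width)
{
    int bytes = width * bytes_per_pixel (format);
    return (bytes + 3) & ~3;
}

/* Size of the buffer to allocate, on the stack or the heap. The image
 * wrapping it must be created with this same format, not PIXMAN_a8. */
static int
mask_size (mask_format_t format, int width, int height)
{
    return mask_stride (format, width) * height;
}
```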
We need to scale the channels of the glyph into the destination (and
indeed expand a8 into a8r8g8b8) when adding into the mask. Normally we
have matching formats for the glyph surfaces and the temporary mask,
for which we can continue to take the faster path.
Reported-by: Søren Sandmann <sandmann@cs.au.dk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
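A plain ADD of an a8 image into an a8r8g8b8 mask touches only the alpha channel, because pixman reads a8 pixels as alpha with zero colour. On the mismatched-format path, the glyph's coverage therefore has to reach every channel. One way to get pixman to do this is to ADD a white source through the glyph used as a mask. The sketch below writes out the equivalent per-pixel arithmetic with hypothetical helpers, assuming a component-alpha mask where an a8 glyph should cover A, R, G and B equally.

```c
#include <assert.h>
#include <stdint.h>

/* Saturating per-channel add of two a8r8g8b8 pixels (the ADD operator). */
static uint32_t
add_argb (uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t sum = ((a >> shift) & 0xff) + ((b >> shift) & 0xff);
        result |= (sum > 0xff ? 0xffu : sum) << shift;
    }
    return result;
}

/* Replicate an a8 coverage value into all four channels. */
static uint32_t
expand_a8 (uint8_t coverage)
{
    return (uint32_t) coverage * 0x01010101u;
}

/* Slow path: add a row of a8 glyph coverage into an a8r8g8b8 mask row.
 * When glyph and mask formats match, a direct ADD suffices instead. */
static void
add_a8_row_to_argb (uint32_t *mask, const uint8_t *glyph, int width)
{
    for (int x = 0; x < width; x++)
        mask[x] = add_argb (mask[x], expand_a8 (glyph[x]));
}
```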
Currently we construct a mask for the entire line and try to process it
in one call to pixman (two without the LERP operator). An alternative
approach is to split the row into separate composite operations for the
clear (which we can skip), fully opaque and partial spans.
As the source operator is typically mostly opaque or clear, this is a
good win as we are able to utilise more fast paths. In the worst case,
it degrades to the old method of constructing a whole mask for a row.
It may reduce performance when there are lots of spans to process,
though (this is where the pixman spans interface should help). However,
such geometry is rare and typically handled elsewhere.
The existing code also had a bug where it cleared the destination for
clear regions of the mask outside of the spans.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
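The per-row split can be sketched as follows. Spans are assumed to be half-open, with each entry's coverage running up to the next entry's x, similar to cairo's `cairo_half_open_span_t`. The types and `split_row()` are hypothetical. They record the operations instead of calling pixman, so a real implementation would issue one composite per recorded op.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open span: covers [x, next span's x) with the given coverage. */
typedef struct { int32_t x; uint8_t coverage; } span_t;

typedef enum { OP_OPAQUE, OP_PARTIAL } op_kind_t;

typedef struct {
    op_kind_t kind;
    int32_t x, width;
    uint8_t coverage;
} composite_op_t;

/* Split one row of spans into separate composite operations:
 *   coverage 0:   skipped, the destination is left untouched
 *   coverage 255: composite the source directly, no mask needed
 *   otherwise:    composite through a coverage mask
 * Returns the number of ops written (at most num_spans - 1). */
static int
split_row (const span_t *spans, int num_spans, composite_op_t *ops)
{
    int n = 0;
    for (int i = 0; i + 1 < num_spans; i++) {
        int32_t width = spans[i + 1].x - spans[i].x;
        if (spans[i].coverage == 0 || width <= 0)
            continue;
        ops[n].kind = spans[i].coverage == 255 ? OP_OPAQUE : OP_PARTIAL;
        ops[n].x = spans[i].x;
        ops[n].width = width;
        ops[n].coverage = spans[i].coverage;
        n++;
    }
    return n;
}
```

Merging adjacent partial spans into a single masked op would bound the worst case at roughly one mask per row, as the message describes.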
In theory, this should be more cache efficient and allow us to trim the
operation to the width of the row, shaving a few texel fetches. The cost is
that we cause pixman to evaluate the composite operation per-row. This
should only be a temporary solution until we can do something better
through pixman...
On an i5-2520m (YMMV):
firefox-fishtank 64585.38 -> 56823.41: 1.14x speedup
swfdec-fill-rate 1383.24 -> 1665.88: 1.20x slowdown
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
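The row-by-row strategy is sketched below, assuming spans arrive per row in half-open form. `composite_rows()` and `row_op_t` are hypothetical. They record only the per-row rectangles, each trimmed to the covered extent, that would be passed to pixman as one-pixel-high composites. Rows with no coverage produce no call at all. The trade-off is one composite call per row instead of one per bounding box.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { int32_t x; uint8_t coverage; } span_t;

/* Hypothetical record of one height-1 composite call. */
typedef struct { int x, y, width; } row_op_t;

static int
composite_rows (const span_t *const *rows, const int *num_spans,
                int y0, int height, row_op_t *ops)
{
    int n = 0;
    for (int r = 0; r < height; r++) {
        const span_t *s = rows[r];
        bool covered = false;
        int first = 0, last = 0;
        /* Trim to [first covered x, end of last covered span). */
        for (int i = 0; i + 1 < num_spans[r]; i++) {
            if (s[i].coverage == 0)
                continue;
            if (!covered) {
                first = s[i].x;
                covered = true;
            }
            last = s[i + 1].x;
        }
        if (!covered)
            continue;   /* empty row: nothing to composite */
        ops[n].x = first;
        ops[n].y = y0 + r;
        ops[n].width = last - first;
        n++;
    }
    return n;
}
```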
Reducing the number of passes has the usual change in the antialiasing
side-effects, as well as the boon of being faster (and theoretically more
accurate through reduced loss of dynamic range).
On an i5-2520m:
swfdec-giant-steps-full 3240.43 -> 2651.36: 1.22x speedup
grads-heat-map 166.84 -> 136.79: 1.22x speedup
swfdec-giant-steps 940.19 -> 796.24: 1.18x speedup
ocitysmap 953.51 -> 831.96: 1.15x speedup
webkit-canvas-alpha 13924.01 -> 13115.70: 1.06x speedup
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We suffer from the large overhead of calling pixman_image_composite32
per-span, but even with that overhead it is a net win, with the usual
caveat about cache efficiency and function call overhead.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The actual span rasterisers may be able to specialise if they know that
the spans will be pixel aligned.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
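One way a renderer could use such a hint is to select a specialised row function at initialisation. The types and functions in the sketch below are hypothetical. With pixel-aligned geometry every span is either fully covered or empty, so the aligned path can treat each covered span as a plain box fill and never needs to build a mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { int32_t x; uint8_t coverage; } span_t;

typedef struct renderer renderer_t;
struct renderer {
    /* Chosen once at init from the alignment hint. */
    void (*render_row) (renderer_t *r, const span_t *spans, int num_spans);
    int fills;    /* spans handled as plain box fills */
    int blends;   /* spans handled through a coverage mask */
};

/* Aligned path: coverage is all-or-nothing, so each covered span is a box. */
static void
render_row_aligned (renderer_t *r, const span_t *spans, int num_spans)
{
    for (int i = 0; i + 1 < num_spans; i++) {
        if (spans[i].coverage == 0)
            continue;
        assert (spans[i].coverage == 255);   /* the caller's promise */
        r->fills++;
    }
}

/* General path: must tell opaque spans from partially covered ones. */
static void
render_row_general (renderer_t *r, const span_t *spans, int num_spans)
{
    for (int i = 0; i + 1 < num_spans; i++) {
        if (spans[i].coverage == 0)
            continue;
        if (spans[i].coverage == 255)
            r->fills++;
        else
            r->blends++;
    }
}

static void
renderer_init (renderer_t *r, bool pixel_aligned)
{
    r->render_row = pixel_aligned ? render_row_aligned : render_row_general;
    r->fills = 0;
    r->blends = 0;
}
```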
We were calling the antialias close function from the unantialiased
paths: a function that operates on a completely different structure from
the one passed in.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
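One way to make this class of mistake structurally impossible is for each renderer to install its own close hook. Generic code then always dispatches to the function that matches the structure it holds. The sketch below uses hypothetical types (`aa_renderer_t`, `mono_renderer_t`) and is not the code this patch changed.

```c
#include <assert.h>
#include <stddef.h>

typedef struct renderer renderer_t;
struct renderer {
    /* Each renderer installs its own hook at init time. */
    void (*finish) (renderer_t *self);
};

typedef struct {
    renderer_t base;     /* first member, so the casts below are valid */
    int cells_flushed;   /* hypothetical antialiased state */
} aa_renderer_t;

typedef struct {
    renderer_t base;
    int boxes_flushed;   /* hypothetical non-antialiased state */
} mono_renderer_t;

static void
aa_finish (renderer_t *self)
{
    ((aa_renderer_t *) self)->cells_flushed = 1;
}

static void
mono_finish (renderer_t *self)
{
    ((mono_renderer_t *) self)->boxes_flushed = 1;
}

static void
aa_renderer_init (aa_renderer_t *r)
{
    r->base.finish = aa_finish;
    r->cells_flushed = 0;
}

static void
mono_renderer_init (mono_renderer_t *r)
{
    r->base.finish = mono_finish;
    r->boxes_flushed = 0;
}

/* Generic close: never needs to know which kind of renderer it has. */
static void
renderer_close (renderer_t *r)
{
    assert (r->finish != NULL);
    r->finish (r);
}
```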
Having spent the last dev cycle looking at how we could specialize the
compositors for various backends, we once again look for the
commonalities in order to reduce the duplication. In part this is
motivated by the idea that spans are a good interface for both the
existing GL backend and pixman, and so deserve a dedicated
compositor. xcb/xlib target an identical rendering system and so they
should be using the same compositor, and it should be possible to run
that same compositor locally against pixman to generate reference tests.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
P.S. This brings massive upheaval (read: breakage). I've tried delaying
it in order to fix as many things as possible, but now this one patch
does far, far, far too much. Apologies in advance for breaking your
favourite backend, but trust me that the end result will be much better. :)
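The sharing idea can be sketched by writing the compositor once against a small table of target primitives. In the sketch below, `backend_ops_t`, `compositor_fill()` and both backends are hypothetical. One backend rasterises into local memory, standing in for running the compositor against pixman to produce reference output. The other only records requests, standing in for an xcb/xlib connection.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { int x, y, width, height; } box_t;

/* Primitives a target must provide; compositor logic uses only these. */
typedef struct {
    void (*fill_box) (void *target, const box_t *box, uint8_t value);
} backend_ops_t;

/* Shared compositor: clips boxes to the target and emits fills. */
static void
compositor_fill (const backend_ops_t *ops, void *target, int tw, int th,
                 const box_t *boxes, int n, uint8_t value)
{
    for (int i = 0; i < n; i++) {
        const box_t *b = &boxes[i];
        int x1 = b->x < 0 ? 0 : b->x;
        int y1 = b->y < 0 ? 0 : b->y;
        int x2 = b->x + b->width > tw ? tw : b->x + b->width;
        int y2 = b->y + b->height > th ? th : b->y + b->height;
        if (x2 <= x1 || y2 <= y1)
            continue;   /* fully clipped */
        box_t clipped = { x1, y1, x2 - x1, y2 - y1 };
        ops->fill_box (target, &clipped, value);
    }
}

/* Local reference backend: an 8-bit buffer in memory. */
typedef struct { uint8_t *pixels; int stride; } memory_target_t;

static void
memory_fill_box (void *target, const box_t *box, uint8_t value)
{
    memory_target_t *t = target;
    for (int y = box->y; y < box->y + box->height; y++)
        memset (t->pixels + y * t->stride + box->x, value, (size_t) box->width);
}

/* Remote stand-in: counts requests as if sending them to a server. */
typedef struct { int requests; int pixels; } request_log_t;

static void
log_fill_box (void *target, const box_t *box, uint8_t value)
{
    request_log_t *calls = target;
    (void) value;
    calls->requests++;
    calls->pixels += box->width * box->height;
}
```

Because both targets are driven by the same `compositor_fill()`, the local output can serve as a reference for what the remote target was asked to draw.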