In the fairly common case that both the ctm and the device
transforms are identity, the function-call overhead of the matrix
multiplication on the point overwhelmingly dominates.
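A minimal sketch of the kind of shortcut this describes, assuming the identity check is cached whenever the matrices change. The types and names here (matrix_t, state_t, user_to_backend) are hypothetical stand-ins, not Cairo's internals; only the field layout mirrors cairo_matrix_t.

```c
#include <stdbool.h>

/* Hypothetical affine matrix with the same field layout as cairo_matrix_t. */
typedef struct {
    double xx, yx;
    double xy, yy;
    double x0, y0;
} matrix_t;

/* Hypothetical state: both transforms plus a cached identity flag. */
typedef struct {
    matrix_t ctm;
    matrix_t device;
    bool is_identity;
} state_t;

static bool
matrix_is_identity (const matrix_t *m)
{
    return m->xx == 1.0 && m->yx == 0.0 &&
           m->xy == 0.0 && m->yy == 1.0 &&
           m->x0 == 0.0 && m->y0 == 0.0;
}

static void
matrix_transform_point (const matrix_t *m, double *x, double *y)
{
    double nx = m->xx * *x + m->xy * *y + m->x0;
    double ny = m->yx * *x + m->yy * *y + m->y0;

    *x = nx;
    *y = ny;
}

/* Recompute the flag once, whenever either matrix is modified. */
static void
state_update (state_t *s)
{
    s->is_identity = matrix_is_identity (&s->ctm) &&
                     matrix_is_identity (&s->device);
}

/* Per-point path: skip both function calls when nothing would change. */
static void
user_to_backend (const state_t *s, double *x, double *y)
{
    if (s->is_identity)
        return;

    matrix_transform_point (&s->ctm, x, y);
    matrix_transform_point (&s->device, x, y);
}
```

The flag moves the cost of the check out of the per-point path, so the common identity case pays for a single branch instead of two multiplications.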
Hmm, red over red makes the test pointless. The test still remains of
highly dubious quality, as it primarily serves as a rendercheck more
than a test of Cairo. The best hope for this test is that it inspires a
better one.
And incorporate the notes made by Joonas.
This patch serves two purposes. First, it factors out the heavy part
of the cairo_scaled_font_text_to_glyphs() routine, thus allowing GCC
to better optimize the cache cleanup loop. Keeping the look-up table
indices in a separate array speeds up array initialization even further.
Second, this patch introduces a shortcut for the case when the string
to be rendered consists of a single character. In this case, caching is
not necessary at all.
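The split described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch itself: CACHE_SIZE, lookup_glyph_index() and the three text_to_glyphs* helpers are hypothetical, and the backend lookup is replaced by a counting stub.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_SIZE 64 /* hypothetical size of the per-call look-up table */

/* Stub for the expensive backend lookup; counts how often it is called. */
static unsigned long lookup_count;

static unsigned long
lookup_glyph_index (uint32_t ucs4)
{
    lookup_count++;
    return ucs4 + 1000;
}

/* Single character: no cache to set up or clean, just look it up. */
static void
text_to_glyphs_single (uint32_t ch, unsigned long *glyph)
{
    *glyph = lookup_glyph_index (ch);
}

/* Multiple characters: keep a small direct-mapped cache for this call. */
static void
text_to_glyphs_multiple (const uint32_t *text, size_t n, unsigned long *glyphs)
{
    uint32_t cached_ucs4[CACHE_SIZE];
    unsigned long cached_index[CACHE_SIZE];
    unsigned char valid[CACHE_SIZE] = { 0 };
    size_t i;

    for (i = 0; i < n; i++) {
        size_t slot = text[i] % CACHE_SIZE;

        if (!valid[slot] || cached_ucs4[slot] != text[i]) {
            cached_ucs4[slot] = text[i];
            cached_index[slot] = lookup_glyph_index (text[i]);
            valid[slot] = 1;
        }
        glyphs[i] = cached_index[slot];
    }
}

/* Thin dispatcher: the heavy part lives in its own function. */
static void
text_to_glyphs (const uint32_t *text, size_t n, unsigned long *glyphs)
{
    if (n == 1)
        text_to_glyphs_single (text[0], glyphs);
    else
        text_to_glyphs_multiple (text, n, glyphs);
}
```

Keeping the dispatcher small leaves the compiler free to inline the single-character path while optimizing the cache loop on its own.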
We have a benchmark that uses Cairo to render a large number of random
strings consisting of printable ASCII characters. Below are OProfile
results collected while running this benchmark. It is easy to see that
the heavy part becomes noticeably lighter.
Before:
Profiling through timer interrupt
samples  %        app name               symbol name
198755   13.5580  libcairo.so.2.10907.0  cairo_scaled_font_text_to_glyphs
88580    6.0424   libcairo.so.2.10907.0  _cairo_scaled_glyph_lookup
81127    5.5340   libcairo.so.2.10907.0  _cairo_hash_table_lookup
68186    4.6513   libcairo.so.2.10907.0  cairo_scaled_font_glyph_extents
47145    3.2160   libcairo.so.2.10907.0  _composite_glyphs_via_mask
46327    3.1602   libcairo.so.2.10907.0  _cairo_scaled_font_glyph_device_extents
44817    3.0572   libcairo.so.2.10907.0  _composite_glyphs
40431    2.7580   libcairo.so.2.10907.0  .plt
After (note that cairo_scaled_font_text_to_glyphs_internal_single() was inlined):
Profiling through timer interrupt
samples  %        app name               symbol name
107264   7.6406   libcairo.so.2.10907.0  cairo_scaled_font_text_to_glyphs_internal_multiple
87888    6.2604   libcairo.so.2.10907.0  _cairo_scaled_glyph_lookup
79011    5.6281   libcairo.so.2.10907.0  _cairo_hash_table_lookup
71723    5.1090   libcairo.so.2.10907.0  cairo_scaled_font_glyph_extents
48084    3.4251   libcairo.so.2.10907.0  _composite_glyphs_via_mask
46636    3.3220   libcairo.so.2.10907.0  _cairo_scaled_font_glyph_device_extents
44740    3.1869   libcairo.so.2.10907.0  _composite_glyphs
42472    3.0254   libc-2.8.so            _int_malloc
39194    2.7919   libcairo.so.2.10907.0  _cairo_gstate_transform_glyphs_to_backend
38614    2.7506   libcairo.so.2.10907.0  .plt
37063    2.6401   libcairo.so.2.10907.0  _cairo_ft_ucs4_to_index
36856    2.6253   libc-2.8.so            random
36376    2.5911   libcairo.so.2.10907.0  _cairo_scaled_glyphs_equal
34545    2.4607   libcairo.so.2.10907.0  cairo_matrix_transform_point
31690    2.2573   libc-2.8.so            malloc
29395    2.0939   libcairo.so.2.10907.0  _cairo_matrix_is_identity
26142    1.8621   libcairo.so.2.10907.0  _cairo_utf8_to_ucs4
24406    1.7385   libc-2.8.so            free
24059    1.7138   libcairo.so.2.10907.0  cairo_scaled_font_text_to_glyphs
[ickle: slightly amended for stylistic consistency.]
Add tests for degenerate linear gradients (with start point equal
to the end point), degenerate radial gradients (start radius and end
radius equal to zero, same start and end circle) and gradients (both
linear and radial) with just a single stop.
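The degenerate cases listed above can be written down as simple predicates. This is only a sketch of the conditions as the message states them; the structs and function names are hypothetical and are not how Cairo classifies gradients internally.

```c
#include <stdbool.h>

/* Hypothetical gradient descriptions, for illustration only. */
typedef struct {
    double x0, y0; /* start point */
    double x1, y1; /* end point */
} linear_t;

typedef struct {
    double cx0, cy0, r0; /* start circle */
    double cx1, cy1, r1; /* end circle */
} radial_t;

/* Linear: start point equal to the end point. */
static bool
linear_is_degenerate (const linear_t *g)
{
    return g->x0 == g->x1 && g->y0 == g->y1;
}

/* Radial: both radii zero, or start and end circle identical. */
static bool
radial_is_degenerate (const radial_t *g)
{
    if (g->r0 == 0.0 && g->r1 == 0.0)
        return true;

    return g->cx0 == g->cx1 && g->cy0 == g->cy1 && g->r0 == g->r1;
}
```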
It's very simple, as clipped polygons or ANTIALIAS_NONE still return
UNSUPPORTED. Also, no optimizations are done, so even pixel-aligned
rectangles use the full span rendering.
Still, there are no performance regressions in the benchmark traces and
firefox-talos-svg and swfdec-giant-steps both got ~15% faster.
The function computed the composite rectangles wrong and was only used
in a gl fallback anyway. So instead of trying to fix it, just remove it
and make sure gl doesn't fall back.
For firefox-planet-gnome, a gradient gets rendered 19135 times using
only 10 different gradients. So we get a 100% hit rate in the cache.
Unfortunately, texture upload is not the biggest problem of this test,
as the performance increase is only moderate - at least on i965:
34.3s => 33.5s
1) call _cairo_gl_composite_flush() or cairo_surface_flush() where
needed
2) destroy texture operands when necessary
3) get rid of _cairo_gl_composite_end()
With this patch, vertices are not flushed immediately anymore, but only
when needed or when a new set of vertices is emitted that requires an
incompatible setup. This improves performance a lot in particular for
text. (gnome-terminal-vim gets 10x faster)
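The deferred flushing described above might look roughly like this. It is a sketch under assumptions: batch_t, its integer "setup" id and MAX_VERTICES are hypothetical, and the actual draw call is replaced by a counter.

```c
#include <stddef.h>

#define MAX_VERTICES 256 /* hypothetical buffer capacity */

typedef struct {
    int setup;            /* hypothetical id of the current shader/texture setup */
    size_t n_vertices;    /* vertices queued but not yet drawn */
    float vertices[MAX_VERTICES * 2];
    unsigned flush_count; /* number of draw calls issued so far */
} batch_t;

/* Submit the queued vertices; a real backend would issue a draw call here. */
static void
batch_flush (batch_t *b)
{
    if (b->n_vertices == 0)
        return;

    b->flush_count++;
    b->n_vertices = 0;
}

/* Queue a vertex, flushing only if the setup changes or the buffer is full. */
static void
batch_emit (batch_t *b, int setup, float x, float y)
{
    if (setup != b->setup || b->n_vertices == MAX_VERTICES)
        batch_flush (b);

    b->setup = setup;
    b->vertices[2 * b->n_vertices] = x;
    b->vertices[2 * b->n_vertices + 1] = y;
    b->n_vertices++;
}
```

With text, many consecutive glyphs share the same setup, so they collapse into a single draw call instead of one per glyph.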
1) store the current operator. This will be useful later to check if the
operator changed.
2) pass the context instead of the destination as first argument. The
destination is known to be the current target.
...and use it for image uploads. This makes sure that the texture units
used for SOURCE and MASK do not get clobbered when images get uploaded to
textures.
With the current code, the surface will never be flushed as the flush
function checks if the surface is finished, and if so, doesn't call the
vfunc. Ooops.
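The ordering problem can be reproduced in a few lines. The surface_t type and the functions below are hypothetical stand-ins; surface_finish() shows one possible ordering that avoids the problem and is not necessarily the fix applied in the commit.

```c
#include <stdbool.h>

/* Hypothetical surface with a backend flush vfunc. */
typedef struct surface surface_t;
struct surface {
    bool finished;
    void (*flush) (surface_t *surface);
    int flush_calls; /* counts how often the vfunc actually ran */
};

/* Public flush: refuses to touch a surface that is already finished. */
static void
surface_flush (surface_t *s)
{
    if (s->finished)
        return;

    if (s->flush)
        s->flush (s);
}

/* Problematic ordering: the flag is set first, so the flush is a no-op. */
static void
surface_finish_buggy (surface_t *s)
{
    s->finished = true;
    surface_flush (s);
}

/* One possible ordering: flush while the surface is still usable. */
static void
surface_finish (surface_t *s)
{
    if (s->finished)
        return;

    surface_flush (s);
    s->finished = true;
}

/* Example vfunc that only records that it was called. */
static void
count_flush (surface_t *s)
{
    s->flush_calls++;
}
```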