Commit graph

5288 commits

Author SHA1 Message Date
Dmitri Vorobiev
4a0bd91ff7 scaled-font: fine-tune caching
This patch implements the ideas outlined by Behdad Esfahbod in the following
mailing list message:

http://lists.cairographics.org/archives/cairo/2010-June/020065.html

Specifically, two things have been adjusted. First, the size of the look-up
table was reduced to 64. Second, the cache codepath is now bypassed for strings
shorter than 16 characters, not only for one-character strings. This allowed
us to reduce the LUT initialization overhead while still retaining the
advantage of caching for common-case string sizes.
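
As a rough illustration of the two adjustments, here is a minimal Python sketch
of a small direct-mapped look-up table with a bypass for short strings. It is
not the cairo code: `text_to_glyph_indices`, `lookup`, and the `EMPTY` sentinel
are hypothetical names, and indexing the table by code point modulo the table
size is an assumption of the sketch.

```python
LUT_SIZE = 64           # look-up table size chosen in this patch
CACHING_THRESHOLD = 16  # strings shorter than this bypass the table
EMPTY = -1              # sentinel: no code point stored in this slot


def text_to_glyph_indices(text, lookup):
    """Map each character of `text` to a glyph index via `lookup`.

    `lookup` is a hypothetical stand-in for the expensive per-character
    glyph look-up (code point -> glyph index).
    """
    if len(text) < CACHING_THRESHOLD:
        # Short string: setting up the table costs more than it saves.
        return [lookup(ord(ch)) for ch in text]

    # Keys live in their own array so initialization is one cheap fill.
    lut_keys = [EMPTY] * LUT_SIZE
    lut_values = [0] * LUT_SIZE
    glyphs = []
    for ch in text:
        code = ord(ch)
        slot = code % LUT_SIZE
        if lut_keys[slot] != code:
            # Miss or collision: do the real look-up and overwrite the slot.
            lut_keys[slot] = code
            lut_values[slot] = lookup(code)
        glyphs.append(lut_values[slot])
    return glyphs
```

A smaller table means fewer slots to reset per call, at the price of more
collisions; the benchmark below is what settled the trade-off at 64.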

We have experimented with different LUT sizes, and a size of 64 turned out
to be the fastest, at least for our language-neutral benchmark, which
generated random strings of printable ASCII characters.

Below is a table presenting benchmark results for different values of LUT
size:

===============================================================================
 Benchmark		| [1]	| [2]	| [3]	| [4]	| [5]	| [6]	| [7]
===============================================================================
8px text, 1 chars	| 0.41	| 0.41	| 0	| 0.41	| 0	| 0.41	| 0
8px text, 10 chars	| 2.13	| 2.21	| 3.76	| 2.19	| 2.82	| 2.09	| -1.88
8px text, 20 chars	| 2.97	| 3.04	| 2.36	| 3.01	| 1.35	| 2.98	| 0.34
12px text, 1 chars	| 0.94	| 0.94	| 0	| 0.95	| 1.06	| 0.94	| 0
12px text, 10 chars	| 4.73	| 4.89	| 3.38	| 4.9	| 3.59	| 4.82	| 1.9
12px text, 20 chars	| 6.32	| 6.42	| 1.58	| 6.46	| 2.22	| 6.32	| 0
16px text, 1 chars	| 1.75	| 1.76	| 0.57	| 1.77	| 1.14	| 1.76	| 0.57
16px text, 10 chars	| 8.13	| 8.45	| 3.94	| 8.43	| 3.69	| 8.44	| 3.81
16px text, 20 chars	| 10.41	| 10.69	| 2.69	| 10.64	| 2.21	| 10.65	| 2.31
24px text, 1 chars	| 3.3	| 3.3	| 0	| 3.32	| 0.61	| 3.3	| 0
24px text, 10 chars	| 14.68	| 14.97	| 1.98	| 14.97	| 1.98	| 14.87	| 1.29
24px text, 20 chars	| 17.93	| 18.01	| 0.45	| 18.06	| 0.73	| 17.81	| -0.67
96px text, 1 chars	| 23.65	| 23.38	| -1.14	| 23.74	| 0.38	| 23.65	| 0
96px text, 5 chars	| 50.52	| 51.34	| 1.62	| 51.48	| 1.9	| 51.41	| 1.76
96px text, 10 chars	| 57.5	| 58.11	| 1.06	| 58.27	| 1.34	| 58.04	| 0.94
===============================================================================

[1]: Git head, Mpix/s
[2]: {GLYPH_LUT_SIZE = 32, CACHING_THRESHOLD = 16}
[3]: Gain of {32, 16} w.r.t. Git head
[4]: {GLYPH_LUT_SIZE = 64, CACHING_THRESHOLD = 16}
[5]: Gain of {64, 16} w.r.t. Git head
[6]: {GLYPH_LUT_SIZE = 128, CACHING_THRESHOLD = 16}
[7]: Gain of {128, 16} w.r.t. Git head
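
The gain columns are consistent with a relative change in percent against the
Git head throughput, rounded to two decimals. The helper below is a hypothetical
sketch for re-deriving them, not part of the benchmark itself.

```python
def gain_percent(head_mpix_s, patched_mpix_s):
    """Relative throughput change versus Git head, in percent."""
    return round(100.0 * (patched_mpix_s - head_mpix_s) / head_mpix_s, 2)


# "8px text, 10 chars": column [1] is 2.13, column [2] is 2.21
print(gain_percent(2.13, 2.21))
```

For that row the result matches column [3] of the table.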

The benchmark itself can be found in this mailing list message:

http://lists.cairographics.org/archives/cairo/2010-June/020064.html
2010-06-14 15:33:51 +01:00
Zoxc
505a0456d2 gl: Added WGL context and surface. 2010-06-14 12:46:26 +01:00
Zoxc
fd6c38b9e0 win32: Fixed compile errors in Windows backend. 2010-06-14 12:44:20 +01:00
Chris Wilson
8737bc8b17 gl: start returning the failure status after an invalid GL op. 2010-06-12 16:49:46 +01:00
Chris Wilson
9b7cc7641b cairo: Create error objects for cairo_t
Perform an early check for error status and prevent creation of a full
object. This means that we do not pass down error objects to the
initialisation routines and so can survive without paranoia inside the
library. It also brings consistency in that, like the other
constructors, no object is created in error and we can skip the
cairo_destroy() if we choose (and we don't waste one of the precious
zero-alloc context slots).
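
The early check can be pictured with a minimal Python sketch. All names here
(`Target`, `Context`, `create_context`, the status constants) are hypothetical,
and building the shared error objects lazily is a simplification made for the
sketch rather than a description of the library.

```python
STATUS_SUCCESS = 0
STATUS_NO_MEMORY = 1  # hypothetical error code for this sketch


class Target:
    """Hypothetical stand-in for a surface that may already be in error."""
    def __init__(self, status=STATUS_SUCCESS):
        self.status = status


class Context:
    def __init__(self, status, target=None):
        self.status = status
        self.target = target


# One shared, inert context per error status, never initialised any further.
_error_contexts = {}


def create_context(target):
    if target.status != STATUS_SUCCESS:
        # Early out: no full object is built and no init routine sees the error.
        if target.status not in _error_contexts:
            _error_contexts[target.status] = Context(target.status)
        return _error_contexts[target.status]
    return Context(STATUS_SUCCESS, target)
```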

Fixes crash in test/a8-mask introduced by 1a544361e8.
2010-06-12 10:41:09 +01:00
Chris Wilson
9b6617a3b3 image: Apply component alpha to composite masks.
If the pattern requires component alpha, then we must take a
copy of the image and enable component alpha for pixman.

Fixes test/text-antialias-subpixel on xlib-fallback -- i.e. we will
finally render subpixel antialiased text on ancient XServers.
2010-06-11 22:04:14 +01:00
Chris Wilson
a049889c64 pattern: Remove incorrect optimisations from _cairo_pattern_acquire_surface()
Safe reduction of patterns is performed in gstate, so not only are the
extra checks in _cairo_pattern_acquire_surface redundant, they are also
unsafe. Simply remove them.

Fixes test/radial-gradient-extend [xlib-fallback]
2010-06-11 21:26:26 +01:00
Chris Wilson
00bc1d1578 pattern: Remove extraordinary _cairo_pattern_fini_snapshot().
Miraculously the circular references from self-copy have disappeared and
the forced finish within _cairo_pattern_fini_snapshot() is now quite
explosive. By replacing it with an ordinary _cairo_pattern_fini() the
crash from test/smask-image-mask disappears and valgrind remains
happy.

Fixes test/smask-image-mask and similar.
2010-06-11 21:08:06 +01:00
Chris Wilson
1a544361e8 gstate: Update cached matrix state after device transform changes on the target
Commit 8d67186cb2 caches whether the device
transform is identity on context creation. However, the api is quite lax
and allows the user to modify the device transform *after* he has
started to use the surface in a context, as apparently WebKit does.
Since this is not the only instance where we may need to invalidate
caches if the user modifies state, introduce a simple mechanism for
hooking into notifications of property changes.
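
A minimal Python sketch of such a notification hook follows. The class and
method names are hypothetical, and using the device offset as the only piece of
transform state is a simplification for the sketch.

```python
class Surface:
    def __init__(self):
        self.device_offset = (0.0, 0.0)
        self._observers = []  # callbacks run whenever the transform changes

    def add_transform_observer(self, callback):
        self._observers.append(callback)

    def set_device_offset(self, x, y):
        self.device_offset = (x, y)
        # Notify everyone who cached something derived from the transform.
        for callback in self._observers:
            callback(self)


class GState:
    def __init__(self, target):
        self.target = target
        self._refresh(target)                         # cache at creation
        target.add_transform_observer(self._refresh)  # and stay in sync

    def _refresh(self, surface):
        self.device_is_identity = surface.device_offset == (0.0, 0.0)
```

With the hook in place, a later `set_device_offset()` on the target updates the
cached flag instead of leaving it stale.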

Fixes test/clip-device-offset.
2010-06-11 16:08:17 +01:00
Chris Wilson
4e4724d48c gl: make check insists "cairoint.h" is first. 2010-06-11 12:19:56 +01:00
Chris Wilson
4edbcf1b1d color: Mark _cairo_color_get_content() as private. 2010-06-11 12:17:19 +01:00
Chris Wilson
edb73b6dcf xlib: Adjust trapezoid precision based on antialias.
Render supports two modes of precision when rendering trapezoids.
Precise specifies point sampling on a 15x17 grid, à la pixman. Imprecise
allows the driver more freedom in the methods used, which may be more
amenable to acceleration. Choose to use the imprecise mode by default,
but still allow users to force the more rigidly specified precision by
changing the antialias mode.
2010-06-11 11:16:42 +01:00
Chris Wilson
290749bdb5 polygon: Reorder conditionals based on likelihood.
The vast majority of edges will be unclipped, so process those first.
2010-06-11 10:59:17 +01:00
Karl Tomlinson
55037bfb24 xlib: Find matching Visual for XRenderFormat
Not only is this useful for users to know which Visual matches any
particular Cairo surface, it should also close a few obscure bugs of not
converting images correctly on upload.

Fixes:

  Bug 28492 - cairo_xlib_surface_create_with_xrender_format does not
              create visual for resulting surface
  https://bugs.freedesktop.org/show_bug.cgi?id=28492

  Mozilla Bug 567065 - Try to create offscreen Xlib surface from existing
                       visual if possible
  https://bugzilla.mozilla.org/show_bug.cgi?id=567065

  Mozilla Bug 445250 - cairo_draw_with_xlib should provide a non-NULL visual
                       to callback
  https://bugzilla.mozilla.org/show_bug.cgi?id=445250

Reported-by: Oleg Romashin <romaxa@gmail.com>
2010-06-11 10:42:15 +01:00
Chris Wilson
eeafeebd2e path: Exponentially grow buffer based on populated points and ops.
Instead of simply doubling the buffer size every time we overflow a point
or an op, enlarge the buffer to fit twice the number of used points and
ops.  We expect paths to be fairly consistent in the mix of operations,
and this allows the buffer size to tune itself to actual usage and reduce
wastage.
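
The sizing rule can be sketched in Python as below. The chained-buffer layout,
the class names, and the starting capacities are assumptions made for the
sketch; only the "twice what was actually used" rule comes from the text above.

```python
class PathBuf:
    def __init__(self, op_capacity, point_capacity):
        self.op_capacity = op_capacity
        self.point_capacity = point_capacity
        self.ops = []
        self.points = []

    def has_room(self, num_points):
        return (len(self.ops) + 1 <= self.op_capacity and
                len(self.points) + num_points <= self.point_capacity)


class Path:
    def __init__(self, op_capacity=4, point_capacity=8):
        self.bufs = [PathBuf(op_capacity, point_capacity)]

    def add(self, op, points):
        tail = self.bufs[-1]
        if not tail.has_room(len(points)):
            # Size the next buffer from what was used, not from the old
            # capacity, so the op/point ratio follows the actual path.
            tail = PathBuf(max(2 * len(tail.ops), 1),
                           max(2 * len(tail.points), len(points)))
            self.bufs.append(tail)
        tail.ops.append(op)
        tail.points.extend(points)
```

For a curve-heavy path (three points per op) the point capacity grows much
faster than the op capacity, which is the self-tuning the message describes.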
2010-06-11 09:06:20 +01:00
Andrea Canciani
836f616659 gl: support single stop gradients 2010-06-10 16:07:42 +02:00
Andrea Canciani
d17e2c5e23 ps: support single stop gradients 2010-06-10 16:07:42 +02:00
Andrea Canciani
eb7fc35115 pdf: support single stop gradients 2010-06-10 16:07:42 +02:00
Andrea Canciani
e2660a0eac pattern: improve single stop gradients handling
None-extended single stop gradients are now explicitly made clear.
2010-06-10 16:07:42 +02:00
Andrea Canciani
a0f8cfe646 pattern: improve degenerate gradients handling
Degenerate radial gradients are now considered clear.
2010-06-10 16:07:42 +02:00
Andrea Canciani
bccd89b417 gstate: correct optimizations
Gradients were previously hand-optimized (without properly checking
for extend modes). By properly using _cairo_pattern functions we
avoid code duplication and bugs.

Fixes linear-gradient-extend, radial-gradient-extend.
2010-06-10 16:07:42 +02:00
Andrea Canciani
06c6207ad4 pattern: add gradient_is_solid function
It contains in a single place the logic needed to check if a gradient
pattern is solid (within a specified region).
2010-06-10 16:07:42 +02:00
Andrea Canciani
561625ee3b pattern: improve clear/opaque check functions
_cairo_pattern_is_opaque was missing some checks about the extend type.
Conversely _cairo_pattern_is_clear was being too strict about gradients.
2010-06-10 16:07:42 +02:00
Andrea Canciani
baaf312e04 pattern: remove content field from solid patterns
The content field in solid patterns had ill-defined semantics (or no
semantics at all), thus it can be removed.
2010-06-10 16:07:41 +02:00
Andrea Canciani
7461947eb1 surface: remove content argument from is_similar
The content argument was basically unused.

Xlib change extracted from ickle's wip/compositor branch.
2010-06-10 16:07:41 +02:00
Chris Wilson
8d67186cb2 gstate: Track whether the combination of ctm * device is identity.
In the fairly common condition that both the ctm and the device
transforms are identity, the function overhead of calling the matrix
multiplication on the point overwhelmingly dominates.
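
The idea can be sketched in Python as follows. `TransformState` and the helper
functions are hypothetical names; matrices are written as
(xx, yx, xy, yy, x0, y0) tuples, following the field order of cairo_matrix_t.

```python
IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def multiply(a, b):
    """Combined matrix that applies `a` first, then `b`."""
    axx, ayx, axy, ayy, ax0, ay0 = a
    bxx, byx, bxy, byy, bx0, by0 = b
    return (axx * bxx + ayx * bxy, axx * byx + ayx * byy,
            axy * bxx + ayy * bxy, axy * byx + ayy * byy,
            ax0 * bxx + ay0 * bxy + bx0, ax0 * byx + ay0 * byy + by0)


def transform_point(m, x, y):
    xx, yx, xy, yy, x0, y0 = m
    return (xx * x + xy * y + x0, yx * x + yy * y + y0)


class TransformState:
    def __init__(self, ctm, device):
        self.matrix = multiply(ctm, device)
        # Computed once here rather than tested for every point.
        self.is_identity = self.matrix == IDENTITY

    def user_to_backend(self, x, y):
        if self.is_identity:
            return (x, y)  # common case: skip the matrix call entirely
        return transform_point(self.matrix, x, y)
```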
2010-06-10 13:13:12 +01:00
Dmitri Vorobiev
5cb764850f scaled-font: optimize cairo_scaled_font_text_to_glyphs()
This patch serves two purposes. First, it factors out the heavy part
of the cairo_scaled_font_text_to_glyphs() routine thus allowing GCC
to better optimize the cache cleanup loop. Keeping the look-up table
indices in a separate array speeds up array initialization even further.

Second, this patch introduces a shortcut for the case when the string
to be rendered consists of a single character. In this case, caching is
not necessary at all.

We have a benchmark that uses Cairo to render a large number of random
strings consisting of printable ASCII characters. Below are Oprofile
results collected while running this benchmark. It is easy to see that
the heavy part becomes noticeably lighter.

Before:

Profiling through timer interrupt
samples  %        app name                 symbol name
198755   13.5580  libcairo.so.2.10907.0    cairo_scaled_font_text_to_glyphs
88580     6.0424  libcairo.so.2.10907.0    _cairo_scaled_glyph_lookup
81127     5.5340  libcairo.so.2.10907.0    _cairo_hash_table_lookup
68186     4.6513  libcairo.so.2.10907.0    cairo_scaled_font_glyph_extents
47145     3.2160  libcairo.so.2.10907.0    _composite_glyphs_via_mask
46327     3.1602  libcairo.so.2.10907.0    _cairo_scaled_font_glyph_device_extents
44817     3.0572  libcairo.so.2.10907.0    _composite_glyphs
40431     2.7580  libcairo.so.2.10907.0    .plt

After (note that cairo_scaled_font_text_to_glyphs_internal_single() was inlined):

Profiling through timer interrupt
samples  %        app name                 symbol name
107264    7.6406  libcairo.so.2.10907.0    cairo_scaled_font_text_to_glyphs_internal_multiple
87888     6.2604  libcairo.so.2.10907.0    _cairo_scaled_glyph_lookup
79011     5.6281  libcairo.so.2.10907.0    _cairo_hash_table_lookup
71723     5.1090  libcairo.so.2.10907.0    cairo_scaled_font_glyph_extents
48084     3.4251  libcairo.so.2.10907.0    _composite_glyphs_via_mask
46636     3.3220  libcairo.so.2.10907.0    _cairo_scaled_font_glyph_device_extents
44740     3.1869  libcairo.so.2.10907.0    _composite_glyphs
42472     3.0254  libc-2.8.so              _int_malloc
39194     2.7919  libcairo.so.2.10907.0    _cairo_gstate_transform_glyphs_to_backend
38614     2.7506  libcairo.so.2.10907.0    .plt
37063     2.6401  libcairo.so.2.10907.0    _cairo_ft_ucs4_to_index
36856     2.6253  libc-2.8.so              random
36376     2.5911  libcairo.so.2.10907.0    _cairo_scaled_glyphs_equal
34545     2.4607  libcairo.so.2.10907.0    cairo_matrix_transform_point
31690     2.2573  libc-2.8.so              malloc
29395     2.0939  libcairo.so.2.10907.0    _cairo_matrix_is_identity
26142     1.8621  libcairo.so.2.10907.0    _cairo_utf8_to_ucs4
24406     1.7385  libc-2.8.so              free
24059     1.7138  libcairo.so.2.10907.0    cairo_scaled_font_text_to_glyphs

[ickle: slightly amended for stylistic consistency.]
2010-06-10 12:05:41 +01:00
Andrea Canciani
c43399fa68 gl: fix compilation on MacOS X
MacOS X uses different defines to avoid multiple inclusion of GL
header files. Adding them to glew.h fixes the compilation when GL is
enabled.
2010-06-09 17:53:09 +02:00
Chris Wilson
6eb5f859f1 bo: And disable DEBUG_TRAPS again.
Meh. I'm going back to bed. Thanks Joonas for catching this.
2010-06-09 10:40:32 +01:00
Chris Wilson
56c081bdc6 bo: Fix debugging for changes in internal traps api. 2010-06-09 10:33:01 +01:00
Benjamin Otte
a946d39555 gl: Add support for clip regions to the span renderer
Clip surface support is still missing, but I suppose that'd need a tiny
bit more code...
2010-06-08 22:23:12 +02:00
Benjamin Otte
f61b3f25af gl: Add an assertion that we always have a texture
When painting, the sources must be textures and not windows, and we did
that wrong previously. This assertion makes sure that never happens
again.
2010-06-08 22:23:12 +02:00
Benjamin Otte
c6c9a24a1d gl: Use CAIRO_COLOR_BLACK
... instead of creating black on our own - and wrong, too.
2010-06-08 22:23:12 +02:00
Benjamin Otte
19bc6793d1 gl: Only clone texture surfaces
Using non-texture surfaces as source or mask will fail, so we need to
fall back.
Caught by the subsurface-modify-child test.
2010-06-08 22:23:12 +02:00
Benjamin Otte
44483d843e gl: Fix argument order
oops...
2010-06-08 22:23:12 +02:00
Benjamin Otte
7d8359721b gl: Fix span renderer doing bad stuff for CLEAR and SOURCE
SOURCE will fall back now, CLEAR is identical to DEST_OUT with white.
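
The equivalence can be checked with a small Python sketch on premultiplied
RGBA pixels. It assumes that CLEAR under a span with coverage m leaves
dst * (1 - m), and uses the Porter-Duff rule that DEST_OUT leaves
dst * (1 - source alpha); the function names are hypothetical.

```python
def clear_with_coverage(dst, coverage):
    """CLEAR limited to a span: fade dst towards transparent by coverage."""
    return tuple(c * (1.0 - coverage) for c in dst)


def dest_out_with_coverage(dst, src_alpha, coverage):
    """DEST_OUT where the source alpha is first scaled by span coverage."""
    return tuple(c * (1.0 - src_alpha * coverage) for c in dst)


pixel = (0.2, 0.4, 0.1, 0.5)  # premultiplied r, g, b, a
white_alpha = 1.0             # opaque white source
assert (clear_with_coverage(pixel, 0.25)
        == dest_out_with_coverage(pixel, white_alpha, 0.25))
```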
2010-06-07 16:46:46 +02:00
Benjamin Otte
ef8fd1355e gl: Fix span renderer for unbounded spans
The span renderer used to not output rects for the top and bottom rows
when they didn't contain any spans.
2010-06-07 15:03:37 +02:00
Benjamin Otte
1d11af083f gl: Add a simple spans renderer for stroke/fill
It's very simple as clipped polygons or ANTIALIAS_NONE still return
UNSUPPORTED. Also, no optimizations are done, so even pixel-aligned
rectangles use the full span rendering.

Still, there are no performance regressions in the benchmark traces and
firefox-talos-svg and swfdec-giant-steps both got ~15% faster.
2010-06-07 13:37:49 +02:00
Benjamin Otte
550335efed Remove _cairo_surface_composite_trapezoids_as_polygon()
The function computed the composite rectangles wrong and was only used
in a gl fallback anyway. So instead of trying to fix it, just remove it
and make sure gl doesn't fall back.
2010-06-07 13:37:49 +02:00
Benjamin Otte
1e003fce8f gl: Fix vertex size changes not causing updates of the operands
Check vertex size stays identical when setting up vertices.
2010-06-07 13:37:49 +02:00
Benjamin Otte
39143400dd gl: Add a gradient texture cache
For firefox-planet-gnome, 19135 times a gradient gets rendered using
only 10 different gradients. So we get a 100% hit rate in the cache.
Unfortunately, texture upload is not the biggest problem of this test,
as the performance increase is only moderate - at least on i965:
34.3s => 33.5s
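
A minimal Python sketch of such a cache, keyed on the gradient's colour stops.
The class, the `upload` callback, and the absence of any eviction policy are
assumptions of the sketch, not a description of the gl backend.

```python
class GradientTextureCache:
    def __init__(self, upload):
        self._upload = upload  # hypothetical: builds a texture from stops
        self._textures = {}
        self.hits = 0
        self.misses = 0

    def get(self, stops):
        key = tuple(stops)  # stops must be hashable, e.g. tuples of floats
        if key in self._textures:
            self.hits += 1
        else:
            # First sighting of this gradient: pay for the upload once.
            self.misses += 1
            self._textures[key] = self._upload(key)
        return self._textures[key]

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Under these assumptions, 19135 requests over 10 distinct gradients give 10
misses and 19125 hits, a hit rate of about 99.95%, which rounds to the 100%
quoted above.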
2010-06-07 13:37:49 +02:00
Benjamin Otte
932ab2641e device: flush before setting finished
Otherwise APIs critical for flushing - in particular acquiring the
device - do not work.
2010-06-07 13:37:49 +02:00
Benjamin Otte
35e219d08f gl: Make gradient textures a separate object
This is necessary so we can do proper refcounting and don't delete the
gradient texture prematurely.
2010-06-07 13:37:49 +02:00
Benjamin Otte
9c17a477d2 gl: Use the generic functions for filter/extend in gradients 2010-06-07 13:37:49 +02:00
Benjamin Otte
df93802765 gl: Create separate functions for setting extend and filter 2010-06-07 13:37:49 +02:00
Benjamin Otte
10e71806d2 gl: Switch to deferred rendering
1) call _cairo_gl_composite_flush() or cairo_surface_flush() where
   needed
2) Destroy texture operands when necessary
3) get rid of _cairo_gl_composite_end()

With this patch, vertices are not flushed immediately anymore, but only
when needed or when a new set of vertices is emitted that requires an
incompatible setup. This improves performance a lot in particular for
text. (gnome-terminal-vim gets 10x faster)
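
The batching behaviour can be modelled in a few lines of Python.
`DeferredRenderer`, `emit`, and the string used as a "setup" are hypothetical;
the sketch only shows when a flush happens, not how vertices reach GL.

```python
class DeferredRenderer:
    def __init__(self):
        self.setup = None     # state the pending vertices were emitted under
        self.pending = []     # vertices not yet sent to the GPU
        self.draw_calls = []  # one (setup, vertex count) entry per flush

    def emit(self, setup, vertices):
        if self.pending and setup != self.setup:
            # Incompatible setup: the queued batch has to go out first.
            self.flush()
        self.setup = setup
        self.pending.extend(vertices)

    def flush(self):
        if self.pending:
            self.draw_calls.append((self.setup, len(self.pending)))
            self.pending = []


renderer = DeferredRenderer()
for _ in range(100):
    renderer.emit("glyphs", [(0, 0)] * 4)   # 100 glyph quads, same setup
renderer.emit("solid-fill", [(0, 0)] * 4)   # setup change forces a flush
renderer.flush()                            # explicit flush, e.g. on readback
```

In this model the hundred glyph quads go out as a single draw call, which is
the kind of saving that benefits text-heavy traces.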
2010-06-07 13:37:48 +02:00
Benjamin Otte
f2f79ca1b3 gl: Make using shaders an explicit argument
This is preparation for a followup patch
2010-06-07 13:37:48 +02:00
Benjamin Otte
19c1d8316e gl: Special case blend mode for CAIRO_CONTENT_COLOR
This ensures that alpha stays at 1 for RGB in all cases.
2010-06-07 13:37:48 +02:00
Benjamin Otte
1f249064cc gl: rework _cairo_gl_set_operator()
1) store the current operator. This will be useful later to check if the
   operator changed.
2) pass the context instead of the destination as first argument. The
   destination is known to be the current target.
2010-06-07 13:37:48 +02:00
Benjamin Otte
f66500d8b0 gl: Only resetup textures if we need to 2010-06-07 13:37:48 +02:00