Speedups
========
firefox-paintball 59462.09 -> 40928.76: 1.45x speedup
firefox-fishtank 43687.33 -> 34627.78: 1.26x speedup
firefox-tron 52526.00 -> 45754.73: 1.15x speedup
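Each speedup figure is the old number divided by the new one. This assumes, as the word "speedup" suggests, that these are timings where lower is better. A trivial C check reproduces the three ratios:

```c
#include <assert.h>

/* Speedup of an optimisation: old time divided by new time.
 * A value above 1.0 means the new code is faster. */
static double speedup(double before, double after)
{
    assert(after > 0.0);
    return before / after;
}
```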
However, in order to avoid a regression with firefox-talos-svg, we need to
prevent splitting up the scanline when using a gradient source.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We were open-coding the functionality of map-to-image inside the source
creation routines, so refactor them to actually use map-to-image instead.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Be stricter about when we mark the pixmap as active, so that we only
wait for the actual XCopyArea involving the pixmap to complete.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Try using the lighter-weight LZO decompressor in an effort to speed up
replays (at the cost of making the bound traces slightly larger).
Presuming that, despite the slight increase in file size (from -1% to
+10%), the file data remains in the readahead buffer cache, replays see a
performance improvement of between 5% and 10%.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
When clearing a GL surface, set is_clear to true, and when mapping to an
image, handle is_clear like surfaces without modification. Additionally,
explicitly clear surfaces created via cairo_surface_create_similar.
Writing to the stencil buffer can be expensive, so when using the
stencil buffer for clipping only clear the clip extent. When using the
stencil buffer to prevent overlapping rendering during stroking, only
clear the approximate stroke extents.
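A minimal sketch of the extent computation, using a hypothetical rect_t and helper names rather than cairo's own internals. The stroke's path bounding box is padded by half the line width and clipped to the surface, and only that rectangle's stencil bits would be cleared. In a GL backend such a rectangle would typically be applied with glScissor() before glClear(GL_STENCIL_BUFFER_BIT). The same intersection serves for clearing only the clip extent.

```c
#include <assert.h>

/* Hypothetical integer rectangle, in device pixels. */
typedef struct { int x, y, width, height; } rect_t;

/* Intersect two rectangles; returns a 0x0 rect when they are disjoint. */
static rect_t rect_intersect(rect_t a, rect_t b)
{
    int x1 = a.x > b.x ? a.x : b.x;
    int y1 = a.y > b.y ? a.y : b.y;
    int x2 = (a.x + a.width < b.x + b.width) ? a.x + a.width : b.x + b.width;
    int y2 = (a.y + a.height < b.y + b.height) ? a.y + a.height : b.y + b.height;
    rect_t r = { x1, y1, 0, 0 };
    if (x2 > x1 && y2 > y1) {
        r.width = x2 - x1;
        r.height = y2 - y1;
    }
    return r;
}

/* Approximate stroke extents: grow the path bounding box by half the line
 * width (rounded up) on every side. A real implementation must also allow
 * for line joins, such as miters, that can reach further than this. */
static rect_t approximate_stroke_extents(rect_t path_bbox, int line_width)
{
    int pad = (line_width + 1) / 2;
    rect_t r = { path_bbox.x - pad, path_bbox.y - pad,
                 path_bbox.width + 2 * pad, path_bbox.height + 2 * pad };
    return r;
}

/* The region whose stencil bits need clearing: the approximate stroke
 * extents clipped to the surface, rather than the whole surface. */
static rect_t stencil_clear_region(rect_t surface, rect_t path_bbox, int line_width)
{
    return rect_intersect(surface, approximate_stroke_extents(path_bbox, line_width));
}
```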
s/CAIRO_GOBJECT_TYPE_HNT_METRICS/CAIRO_GOBJECT_TYPE_HINT_METRICS/
However, as we have already released the broken headers, we need to
preserve that mistake in case applications are already using it. Since it
is just a #define, there is little cost associated with carrying both
the incorrect spelling and the corrected define.
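The compatibility pattern amounts to two macros expanding to the same expression. Below is a self-contained sketch. fake_gtype_t and fake_hint_metrics_get_type() are hypothetical stand-ins for GType and the real type-registration function, so that it compiles without GObject.

```c
#include <assert.h>

/* Hypothetical stand-ins so the sketch builds without GObject. */
typedef unsigned long fake_gtype_t;
static fake_gtype_t fake_hint_metrics_get_type(void) { return 42; }

/* Corrected spelling. */
#define CAIRO_GOBJECT_TYPE_HINT_METRICS fake_hint_metrics_get_type ()
/* Misspelled name kept for applications built against the released
 * headers; it expands to the same expression, so it costs nothing. */
#define CAIRO_GOBJECT_TYPE_HNT_METRICS CAIRO_GOBJECT_TYPE_HINT_METRICS
```

Defining the old name in terms of the new one, rather than repeating the expansion, is one way to keep the two from drifting apart. This sketch does not claim the real header is written that way.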
Whilst it cannot handle self-intersecting strokes (which include the
antialias region of neighbouring lines and joins), it is about 3x
faster to use than the more robust algorithm. As some backends delegate
the rendering, the quality may still be preserved and so they should be
responsible for choosing the appropriate method for generation of the
stroke geometry.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In theory this should just save a single copy; however, PutImage will
break up requests into a series of scanline requests, which is less
efficient than the single-shot transfer provided by ShmPutImage.
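Here is a rough count of why PutImage ends up as many requests, under stated assumptions. Each request carries whole rows. The fixed part of a core PutImage request is 24 bytes. The maximum request size is passed in; it is at most 262140 bytes for the core protocol without BIG-REQUESTS. put_image_request_count() is a hypothetical helper, not an xcb or cairo function.

```c
#include <assert.h>
#include <stddef.h>

/* Fixed-size part of an X11 core PutImage request, in bytes. */
#define PUT_IMAGE_HEADER_BYTES 24u

/* Hypothetical helper: number of PutImage requests needed to upload
 * `height` rows of `stride` bytes when one request may be at most
 * `max_request_bytes` long and each request carries whole rows. */
static size_t put_image_request_count(size_t height, size_t stride,
                                      size_t max_request_bytes)
{
    assert(stride > 0);
    assert(max_request_bytes >= PUT_IMAGE_HEADER_BYTES + stride);
    size_t rows_per_request = (max_request_bytes - PUT_IMAGE_HEADER_BYTES) / stride;
    return (height + rows_per_request - 1) / rows_per_request;
}
```

For example, a 1920x1080 ARGB32 image (stride 7680 bytes) needs put_image_request_count(1080, 7680, 262140) == 32 requests, whereas ShmPutImage transfers it with one.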
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In order to overload the emitters in future to provide specialised
routines for the common types of operands, begin by switching the
current users over to a vfunc interface.
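Below is a sketch of what a vfunc-based emitter interface can look like. All the types and names (operand_t, emitter_t, emit_vertex_solid, ...) are hypothetical. The solid-colour specialisation illustrates the kind of routine such an interface is meant to allow later, not something this commit adds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operand: a solid colour or a texture source. */
typedef struct {
    int is_solid;
    float color[4];
} operand_t;

/* Callers only invoke emitter->emit_vertex(), so specialised emitters
 * can be slotted in later without touching the callers. */
typedef struct {
    size_t (*emit_vertex)(const operand_t *op, float x, float y, float *out);
} emitter_t;

/* Generic emitter: branches on the operand type for every vertex. */
static size_t emit_vertex_generic(const operand_t *op, float x, float y, float *out)
{
    size_t n = 0;
    out[n++] = x;
    out[n++] = y;
    if (op->is_solid) {
        for (int i = 0; i < 4; i++)
            out[n++] = op->color[i];
    } else {
        /* Hypothetical identity texture mapping: texcoords = position. */
        out[n++] = x;
        out[n++] = y;
    }
    return n;
}

/* Specialised emitter for the solid case: no per-vertex branch. */
static size_t emit_vertex_solid(const operand_t *op, float x, float y, float *out)
{
    out[0] = x;
    out[1] = y;
    for (int i = 0; i < 4; i++)
        out[2 + i] = op->color[i];
    return 6;
}

static const emitter_t generic_emitter = { emit_vertex_generic };
static const emitter_t solid_emitter = { emit_vertex_solid };

/* Choose the emitter once per operation rather than once per vertex. */
static const emitter_t *choose_emitter(const operand_t *op)
{
    return op->is_solid ? &solid_emitter : &generic_emitter;
}
```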
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Our lazy event mechanism is sufficient for tracking when to reuse shm
memory, so the completion events are not necessary for the
ShmPutImage/ShmGetImage paths.
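Below is a sketch of the lazy approach. It assumes that each shm segment records the sequence number of the last request that used it. It also assumes that the caller knows the latest sequence number the server has processed; in Xlib, LastKnownRequestProcessed() reports such a value. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record of a shared-memory segment: the sequence number of
 * the last X request that read from or wrote to it. */
typedef struct {
    uint32_t last_request;
} shm_segment_t;

/* True once request `seq` has been processed, given the latest sequence
 * number known to be processed. The unsigned subtraction keeps this
 * correct across 32-bit wraparound (assuming the usual two's-complement
 * conversion to int32_t). */
static int seqno_passed(uint32_t last_processed, uint32_t seq)
{
    return (int32_t)(last_processed - seq) >= 0;
}

/* A segment may be reused once every request touching it is done. No
 * per-request completion event is needed, only the sequence numbers that
 * replies and events already carry. */
static int shm_segment_is_idle(const shm_segment_t *shm, uint32_t last_processed)
{
    return seqno_passed(last_processed, shm->last_request);
}
```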
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Say you were asking cairo for a font at 200px. For bitmap-only fonts,
cairo was finding the closest strike size and using it. If the strike
was at 20px, well, that's what you were getting. We now scale that 20px
strike by a factor of 10 to get the correct size rendering.
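The selection and scaling can be expressed as a hypothetical pair of helpers; this is not cairo's FreeType code. Pick the strike closest to the requested pixel size, then scale by requested / strike.

```c
#include <assert.h>
#include <stddef.h>

static double distance(double a, double b) { return a > b ? a - b : b - a; }

/* Pick the available bitmap strike (pixel size) closest to the request. */
static double closest_strike(const double *strikes, size_t n, double requested)
{
    assert(n > 0);
    double best = strikes[0];
    for (size_t i = 1; i < n; i++)
        if (distance(strikes[i], requested) < distance(best, requested))
            best = strikes[i];
    return best;
}

/* Scale factor applied to the strike so glyphs come out at the requested
 * size, e.g. a 20px strike requested at 200px gives 10. */
static double strike_scale(double strike, double requested)
{
    return requested / strike;
}
```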
Note that by itself this patch doesn't change much on the Linux desktop.
The reason is that the size you are interested in (e.g. 200px) is lost by
fontconfig. When you request a font at 200px, fontconfig returns a font
pattern that says 20px, and so the next layers think you want a font at
20px. To address that, one also needs a piece of fontconfig config that
puts the 200 back into the pixelsize. Something like this:
<match target="font">
  <test name="scalable" mode="eq">
    <bool>false</bool>
  </test>
  <edit name="pixelsize" mode="assign">
    <times>
      <name>size</name>
      <name>dpi</name>
      <double>0.0138888888888</double> <!--1/72.-->
    </times>
  </edit>
</match>
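The <times> expression in that rule computes pixelsize = size * dpi / 72, which converts the point size back to pixels. The same arithmetic in C serves as a quick sanity check (the function name is hypothetical):

```c
#include <assert.h>

/* pixelsize = size (points) * dpi * (1/72), mirroring the fontconfig rule;
 * 0.0138888888888 is the truncated value of 1/72 used there. */
static double pixelsize_from_points(double size_pt, double dpi)
{
    return size_pt * dpi * 0.0138888888888;
}
```

For instance, a 150pt request at 96 dpi gives approximately 200px; the result is not exact because 1/72 is truncated.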
I'm going to try to upstream this config so it will be enabled by
default. The config can be a bit smarter. For example, if
metricshinting is enabled and the size difference is small, we may as
well not scale.
The nice thing about this is that the configuration of whether and when
to scale bitmaps will be done in fontconfig, not cairo / Qt / ... code.
Despite subclassing image surfaces, we never called down to the image
surface destructor, so we leaked a pixman_image_t every time.
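In C-style subclassing, the subclass destructor should chain up to the parent's destructor. Here is a minimal sketch with hypothetical types, where a malloc'd buffer plays the role of the leaked pixman_image_t:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical base "image surface" owning a pixel buffer (standing in
 * for the pixman_image_t), and a subclass embedding it as first member. */
typedef struct {
    void *pixels;
} image_surface_t;

typedef struct {
    image_surface_t base;   /* first member, so the subclass "is an" image */
    void *backend_private;
} sub_surface_t;

static void image_surface_finish(image_surface_t *image)
{
    free(image->pixels);
    image->pixels = NULL;
}

/* Release the subclass's own state, then chain up to the parent
 * destructor; omitting the final call leaks base.pixels. */
static void sub_surface_finish(sub_surface_t *surface)
{
    free(surface->backend_private);
    surface->backend_private = NULL;
    image_surface_finish(&surface->base);
}
```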
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=882976
Signed-off-by: Adam Jackson <ajax@redhat.com>
This teaches the xcb backend how to split up a PutImage request for a subimage
into multiple requests. The backend already does the same for "normal" PutImage
where it doesn't have to assemble the image from various rows.
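When the source rows are not contiguous (a subimage of a larger image, or a non-natural stride), each request's block of rows has to be gathered into a packed buffer first. Below is a sketch of that gathering step with a hypothetical helper. It assumes row_bytes already satisfies the server's scanline padding, commonly 4 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Gather rows [first_row, first_row + n_rows) of a subimage that starts
 * src_x_bytes into each source row, where source rows are src_stride bytes
 * apart, into dst packed at row_bytes per row. Returns bytes written. */
static size_t pack_subimage_rows(unsigned char *dst, const unsigned char *src,
                                 size_t src_stride, size_t src_x_bytes,
                                 size_t row_bytes, size_t first_row, size_t n_rows)
{
    assert(src_x_bytes + row_bytes <= src_stride);
    for (size_t y = 0; y < n_rows; y++)
        memcpy(dst + y * row_bytes,
               src + (first_row + y) * src_stride + src_x_bytes,
               row_bytes);
    return row_bytes * n_rows;
}
```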
Signed-off-by: Uli Schlachter <psychon@znc.in>
This creates an image surface with a non-natural stride and paints it to a
similar surface.
In the xcb backend, this causes a call to _cairo_xcb_connection_put_subimage()
which tries to send a huge PutImage request. As a result, xcb kills the X11
connection.
Signed-off-by: Uli Schlachter <psychon@znc.in>
The old code uses an uninitialized variable for the extents of the group that is
created. This patch makes it use an unbounded recording surface instead.
This has the implicit assumption that everything that is unbounded smells like a
recording surface. Let's see when this assumption breaks. :-)
http://lists.cairographics.org/archives/cairo/2012-October/023585.html
Signed-off-by: Uli Schlachter <psychon@znc.in>
The boilerplate code makes sure that our tests didn't cause any X11 errors or
X11 events, because those might confuse API users.
However, when the keyboard layout changes, every connection gets a MappingNotify
event. This means that the test and performance test suites failed when the
keyboard layout was changed while they were running.
Fix this by ignoring MappingNotifies.
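The filtering can be sketched as follows. The codes are from the X11 core protocol: 0 is an error, 34 is MappingNotify, and the 0x80 bit marks SendEvent events. The function itself is hypothetical and works on bare response-type bytes rather than real xcb_generic_event_t structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* X11 core protocol event code for MappingNotify. */
#define EVENT_MAPPING_NOTIFY 34

/* Count responses that should fail the test: every error and every event
 * except MappingNotify, which the server sends to all clients when the
 * keyboard mapping changes. */
static size_t count_unexpected(const uint8_t *response_types, size_t n)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t type = response_types[i] & 0x7f;   /* drop the SendEvent bit */
        if (type == EVENT_MAPPING_NOTIFY)
            continue;
        bad++;   /* an error (type 0) or any other event */
    }
    return bad;
}
```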
Reported by Arthur Huillet on IRC.
Signed-off-by: Uli Schlachter <psychon@znc.in>
Some OpenGLES2 drivers support downloading BGRA data. On little-endian
systems, BGRA with GL_UNSIGNED_BYTE is equivalent to the typical
cairo_image_surface_t format, so this can avoid CPU bit swizzling for
operations that involve images.
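The formats line up for the following reason. An ARGB32 pixel is a native-endian 32-bit word 0xAARRGGBB, so on a little-endian host its bytes sit in memory as B, G, R, A. That is the layout GL_BGRA with GL_UNSIGNED_BYTE describes. Below is a small check of that byte order; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return 1 on a little-endian host. */
static int host_is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Store one ARGB32 pixel (0xAARRGGBB in a native-endian 32-bit word) and
 * return its bytes in memory order. On little-endian hosts this yields
 * B, G, R, A. */
static void argb32_memory_bytes(uint32_t pixel, unsigned char out[4])
{
    memcpy(out, &pixel, 4);
}
```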
Instead of allocating a depth/stencil buffer for all surfaces, share a
common buffer that's the size of the largest surface. This reduces
video memory usage when there are many GL surfaces.
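Below is a sketch of the sharing policy, with hypothetical bookkeeping only and no GL calls. The shared buffer grows to the largest width and height requested so far, so it is at least as large as any surface. It is reallocated only when a surface does not fit. In GL, growing would mean re-specifying the renderbuffer storage, e.g. with glRenderbufferStorage().

```c
#include <assert.h>

/* Hypothetical bookkeeping for one depth/stencil buffer shared by all GL
 * surfaces. It only ever grows, so every surface seen so far fits. */
typedef struct {
    int width, height;
    int reallocations;   /* times the storage had to be recreated */
} shared_depth_stencil_t;

static void shared_depth_stencil_reserve(shared_depth_stencil_t *ds, int width, int height)
{
    if (width <= ds->width && height <= ds->height)
        return;                 /* already large enough: reuse as-is */
    if (width > ds->width)
        ds->width = width;
    if (height > ds->height)
        ds->height = height;
    ds->reallocations++;        /* in GL: resize the shared renderbuffer */
}
```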
This is important because there are places in the code where msaa_active
is used to decide whether or not to complete an operation with
multisampling.
As the random selection of a gradient can possibly destroy the currently
active gradient, we need to flush the context in order to flush any
pending references to the texture before it is deleted.
Instead of asserting that the caller passed in a chunk-aligned base
pointer, just perform the fixup whilst initialising the mempool. This
means that the caller (xcb!) cannot assume that mempool->base is the
same base pointer it passed in, and so needs to store its own pointer
separately for use in computing SHM offsets.
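Below is a sketch of the fixup, assuming a pool that stores only its aligned base. mempool_t, mempool_init() and shm_offset() here are hypothetical, not cairo's actual mempool API. The caller keeps the original segment pointer for computing offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool: base is aligned up to a multiple of chunk_size and
 * may therefore differ from the pointer the caller passed in. */
typedef struct {
    char *base;
    size_t size;        /* usable bytes from base */
    size_t chunk_size;
} mempool_t;

/* Align the caller's pointer up to the next chunk boundary instead of
 * asserting that it is already aligned. Returns 0 on success. */
static int mempool_init(mempool_t *pool, char *shm_base, size_t size, size_t chunk_size)
{
    assert(chunk_size > 0);
    uintptr_t addr = (uintptr_t)shm_base;
    uintptr_t aligned = (addr + chunk_size - 1) / chunk_size * chunk_size;
    size_t skipped = (size_t)(aligned - addr);
    if (skipped >= size)
        return -1;      /* nothing usable left after alignment */
    pool->base = shm_base + skipped;
    pool->size = size - skipped;
    pool->chunk_size = chunk_size;
    return 0;
}

/* SHM offsets are relative to the start of the segment, so they must be
 * computed from the caller's own base, not from pool->base. */
static size_t shm_offset(const char *shm_base, const char *ptr)
{
    return (size_t)(ptr - shm_base);
}
```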
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The idiom (and expectation) for surface operators is that they leave the
surface on the stack for the next operation. Also, we need to hold a
surface reference for objects put onto the stack, yet we did not own one
for the map-to-image return value.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Adjust the stack manipulation to avoid moving an unknown surface to
the dictionary.
Reported-by: Dongyeon Kim <dy5.kim@samsung.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>