The fixed pool allocation on the stack is very fast and works very well
for rendering glyphs smaller than 40 pixels. Larger glyphs have to be
split and rendered piecewise, which is slower. This commit introduces
dynamic pool allocation for larger glyphs. Complex large glyphs are
now rendered about 2x faster.
* src/smooth/ftgrays.c (gray_convert_glyph): Use simpler banding scheme
in case of rendering emergency.
(gray_raster_render): Allocate larger pools dynamically.
* include/freetype/config/ftoption.h: Explain the render pool size.
* devel/ftoption.h: Ditto.
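
The allocation strategy described in this entry can be sketched roughly as
follows. This is a minimal illustration, not the code in `ftgrays.c`: the
names `render_with_pool` and `STACK_POOL_CELLS`, the cell layout, and the
pool size are all hypothetical. It only shows a fixed stack buffer serving
small jobs and a heap buffer taking over for larger ones.

```c
#include <stddef.h>
#include <stdlib.h>

#define STACK_POOL_CELLS  256   /* hypothetical size, not FreeType's */

typedef struct Cell_
{
  int  x, cover, area;

} Cell;

/* Return 0 if the stack pool was used, 1 if a heap pool was used, */
/* and -1 if the heap allocation failed.                           */
static int
render_with_pool( size_t  cells_needed )
{
  Cell    stack_pool[STACK_POOL_CELLS];
  Cell*   pool    = stack_pool;
  int     on_heap = 0;
  size_t  i;

  if ( cells_needed > STACK_POOL_CELLS )
  {
    /* too large for the fixed pool: allocate dynamically */
    pool = (Cell*)malloc( cells_needed * sizeof ( Cell ) );
    if ( !pool )
      return -1;
    on_heap = 1;
  }

  /* stand-in for the actual rasterization work */
  for ( i = 0; i < cells_needed; i++ )
    pool[i].x = pool[i].cover = pool[i].area = 0;

  if ( on_heap )
    free( pool );

  return on_heap;
}
```
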
Applying an LCD filter to spans rather than the entire image improves
the performance of ClearType-like rendering by about 40% at 32 ppem
and much more at larger sizes. Small rounding differences are expected.
* src/smooth/ftsmooth.c (ft_smooth_raster_lcd, ft_smooth_lcd_spans,
ft_smooth_raster_lcdv, ft_smooth_lcdv_spans, TOrigin): Implement it.
* include/freetype/internal/ftobjs.h (FT_LibraryRec): lcd_filter_func gone.
* src/base/ftlcdfil.c (ft_lcd_filter_fir): Removed.
(ft_lcd_padding): Use padding sufficient for any 5-tap filter.
(FT_Library_SetLcdFilterWeights, FT_Library_SetLcdFilter): Updated.
* docs/CHANGES: Updated.
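
Filtering per span rather than per image could look roughly like the sketch
below. The function `filter_span`, the row layout, and the weights are
assumptions made for illustration (the weights are simply a symmetric 5-tap
set summing to 256); this is not the code in `ftsmooth.c`. Values are
accumulated in 8.8 fixed point, so a caller would shift right by 8 to obtain
the final coverage.

```c
#include <assert.h>

/* Illustrative 5-tap weights summing to 256. */
static const unsigned  weights[5] = { 8, 77, 86, 77, 8 };

/* Add one span of constant `coverage` (0..255), starting at        */
/* subpixel `x` with length `len`, into `row`.  Each subpixel is    */
/* spread over itself and two neighbours on each side, so the row   */
/* needs two subpixels of padding on both sides of the span.        */
static void
filter_span( unsigned short*  row,
             int              width,
             int              x,
             int              len,
             unsigned         coverage )
{
  int  i, k;

  assert( x >= 2 && x + len + 2 <= width );

  for ( i = x; i < x + len; i++ )
    for ( k = -2; k <= 2; k++ )
      row[i + k] += (unsigned short)( coverage * weights[k + 2] );
}
```

Because only the subpixels touched by a span are processed, empty parts of
the bitmap cost nothing, which is consistent with the larger gains reported
at larger sizes.
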
This removes the internal face property that sets the filtering weights.
The global filtering algorithms and weights are now optimized to work
well under all conditions.
* include/freetype/internal/ftobjs.h (FT_Face_InternalRec): Do it.
* include/freetype/freetype.h (FT_Face_Properties): Revised docs.
* include/freetype/ftparams.h (FT_PARAM_TAG_LCD_FILTER_WEIGHTS): Ditto.
* src/base/ftlcdfil.c (ft_lcd_padding): Updated.
* src/base/ftobjs.c (ft_open_face_internal, FT_Face_Properties): Ditto.
* src/smooth/ftsmooth.c (ft_smooth_render): Ditto.
* docs/CHANGES: Updated.
This is a better fix for #1384, which is rather about signed overflow.
* include/freetype/ftimage.h (FT_Span): Use unsigned position.
* src/smooth/ftgrays.c (gray_sweep_direct): Sync with FT_Span.
* src/smooth/ftsmooth.c (ft_smooth_render): Remove redundant shift.
* src/base/ftobjs.c (ft_glyphslot_preset_bitmap): Readjust limits.
To support WASM targets with slow or unsupported setjmp and longjmp,
we eliminate these calls in favor of an error propagation model.
When gray_set_cell is out of cells, it raises an exception which is
later handled in gray_convert_glyph_inner.
This is a less invasive alternative to !385.
* src/smooth/ftgrays.c (gray_set_cell): Raise the overflow exception
and redirect all work to `cell_null`.
(gray_move_to, gray_line_to, gray_conic_to, gray_cubic_to): Return the
exception.
(gray_convert_glyph, gray_convert_glyph_inner): Handle the exception.
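
The error propagation model can be sketched as follows. All names here
(`Worker`, `set_cell`, `line_to`, `convert`, `ERR_OVERFLOW`) are hypothetical
stand-ins, and the logic is much simpler than the real rasterizer; the sketch
only shows how a return code passed up the call chain replaces `longjmp`.

```c
#define ERR_OK        0
#define ERR_OVERFLOW  1   /* hypothetical code for an exhausted pool */

typedef struct Worker_
{
  int  cells_used;
  int  cells_max;

} Worker;

/* Record a cell; report overflow through the return value */
/* instead of calling longjmp().                           */
static int
set_cell( Worker*  w )
{
  if ( w->cells_used >= w->cells_max )
    return ERR_OVERFLOW;

  w->cells_used++;
  return ERR_OK;
}

/* Each drawing step forwards the error to its caller. */
static int
line_to( Worker*  w,
         int      ncells )
{
  int  i, error;

  for ( i = 0; i < ncells; i++ )
  {
    error = set_cell( w );
    if ( error )
      return error;
  }

  return ERR_OK;
}

/* The top level decides how to react, for example by retrying */
/* with a smaller band.                                        */
static int
convert( Worker*  w,
         int      ncells )
{
  w->cells_used = 0;
  return line_to( w, ncells );
}
```
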
This doubles the number of allowed points, see
https://github.com/harfbuzz/harfbuzz/issues/4752
Although it is hardly practical to use more than 32767 points,
other font engines seem to support it.
* docs/CHANGES: Announce it.
* include/freetype/ftimage.h (FT_Outline): Do it and update limits.
* src/*: Update `FT_Outline` users.
With horizontal bisections, the smallest section is a whole single
scanline. Almost horizontal lines or other complex scanlines can
easily overflow the rendering pool. Switching to vertical bisections
splits the scanlines and should rule out the overflows. Fixes #1269.
* src/smooth/ftgrays.c (gray_convert_glyph): Bisect vertically.
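
A rough sketch of vertical bisection follows. It assumes a toy cost model in
which every column of a band needs `cells_per_column` cells; the function
`render_band` and that model are hypothetical and only illustrate why
splitting the x range can always shrink a band until it fits the pool.

```c
#include <assert.h>

/* Return the number of pieces a band [xmin, xmax) is split into */
/* so that each piece fits into a pool of `pool_cells` cells.    */
static int
render_band( int  xmin,
             int  xmax,
             int  cells_per_column,
             int  pool_cells )
{
  int  xmid;

  /* a single column must fit, otherwise bisection cannot help */
  assert( xmin < xmax && cells_per_column <= pool_cells );

  if ( ( xmax - xmin ) * cells_per_column <= pool_cells )
    return 1;   /* the band fits: render it in one go */

  /* overflow: cut the band along a vertical line and retry */
  xmid = xmin + ( xmax - xmin ) / 2;

  return render_band( xmin, xmid, cells_per_column, pool_cells ) +
         render_band( xmid, xmax, cells_per_column, pool_cells );
}
```
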
As a result of 7b308a29dd, the regular 64-bit execution is now faster
than SSE2. The rendering speed of script fonts at 64 ppem or larger is
improved by about 3% without SSE2. See !314 for the testing results.
* src/smooth/ftgrays.c (gray_render_conic)[FT_INT64]: Remove SSE2 code.
* src/base/ftoutln.c (FT_Outline_Reverse, FT_Outline_EmboldenXY,
FT_Outline_Get_Orientation): Set the first and last indexes together.
(FT_Outline_Decompose): Ditto and check them more stringently.
* src/smooth/ftgrays.c (FT_Outline_Decompose)[STANDALONE_]: Ditto.
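
Setting both indexes in the same place can be sketched like this. The
function `count_contours` and its parameters are hypothetical; the array
mirrors the idea of a per-contour list of last point indexes, and the checks
shown are only an example of stricter validation.

```c
/* Walk the contours of an outline-like structure.  `contours[n]` */
/* holds the index of the last point of contour `n`.  Return the  */
/* number of contours, or -1 if the indexes are inconsistent.     */
static int
count_contours( const unsigned short*  contours,
                int                    n_contours,
                int                    n_points )
{
  int  n;
  int  first;
  int  last = -1;

  for ( n = 0; n < n_contours; n++ )
  {
    /* both ends of the contour are derived together */
    first = last + 1;
    last  = contours[n];

    if ( last < first || last >= n_points )
      return -1;
  }

  return n_contours;
}
```
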
Modern compilers increasingly insist on explicit fall-through annotations
in `switch` statements.
* include/freetype/internal/compiler-macros.h (FALL_THROUGH): Define.
* src/*: Use it instead of `/* fall through */` comments.
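
A fall-through macro of this kind might be defined along the following
lines. The name `MY_FALL_THROUGH` and the compiler check are assumptions for
illustration; the actual definition in `compiler-macros.h` may test for
different compilers and attributes.

```c
/* Hypothetical stand-in for a fall-through annotation macro. */
#if defined( __GNUC__ ) && __GNUC__ >= 7
#  define MY_FALL_THROUGH  __attribute__(( fallthrough ))
#else
#  define MY_FALL_THROUGH  ( (void)0 )
#endif

/* Count how many cases are executed when entering at `n`. */
static int
count_steps( int  n )
{
  int  steps = 0;

  switch ( n )
  {
  case 3:
    steps++;
    MY_FALL_THROUGH;
  case 2:
    steps++;
    MY_FALL_THROUGH;
  case 1:
    steps++;
    break;
  default:
    break;
  }

  return steps;
}
```
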
Fixes #1164 by using a volatile variable around `setjmp`. It is hard to
say how this fixes crashes related to certain link-time optimizations.
This does not decrease the rendering performance.
* src/smooth/ftgrays.c (gray_convert_glyph_inner): Use volatile `error`.
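
For background, the C standard leaves non-volatile automatic variables that
are modified between `setjmp` and `longjmp` with indeterminate values after
the jump. The sketch below shows the general pattern with hypothetical names
(`convert_inner`, `fail`); it does not reproduce the logic of
`gray_convert_glyph_inner`.

```c
#include <setjmp.h>

static jmp_buf  jump_buffer;

static void
fail( void )
{
  longjmp( jump_buffer, 1 );
}

static int
convert_inner( int  should_fail )
{
  /* modified between setjmp() and longjmp(), hence volatile */
  volatile int  error = 0;

  if ( setjmp( jump_buffer ) == 0 )
  {
    error = 1;       /* assume failure until the work completes */
    if ( should_fail )
      fail();
    error = 0;
  }

  return error;
}
```
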
We really have to use double casts to avoid issues with C's and C++'s
signedness propagation rules in implicit casts.
Reported as
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=41178
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=41182
* include/freetype/config/public-macros.h (FT_STATIC_CAST,
FT_REINTERPRET_CAST): Modify macro to take two arguments.
Update all callers.
(FT_STATIC_BYTE_CAST): New macro.
* include/freetype/freetype.h (FT_ENC_TAG): Use `FT_STATIC_BYTE_CAST`.
* include/freetype/ftimage.h (FT_IMAGE_TAG): Ditto.
* include/freetype/fttypes.h (FT_MAKE_TAG): Ditto.
Use `FT_Tag` for casting.
* src/ftraster/ftmisc.h (FT_MAKE_TAG): Removed, no longer needed.
(FT_STATIC_BYTE_CAST): New macro.
* src/smooth/ftgrays.c (FT_STATIC_CAST): Replace with...
(FT_STATIC_BYTE_CAST): ... this.
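
The need for a double cast can be illustrated with the sketch below. The
macros `BYTE_CAST` and `MAKE_TAG` are hypothetical models of the idea, not
the definitions in `public-macros.h`. Casting a negative `signed char`
directly to a wider unsigned type sign-extends it; going through
`unsigned char` first keeps only the byte value.

```c
/* Hypothetical byte cast: narrow to `unsigned char` first, */
/* then widen to the target type.                           */
#define BYTE_CAST( type, x )  ( (type)(unsigned char)( x ) )

/* Build a four-byte tag from character arguments. */
#define MAKE_TAG( a, b, c, d )                        \
          ( ( BYTE_CAST( unsigned long, a ) << 24 ) | \
            ( BYTE_CAST( unsigned long, b ) << 16 ) | \
            ( BYTE_CAST( unsigned long, c ) <<  8 ) | \
              BYTE_CAST( unsigned long, d )         )
```

With `signed char c = -23`, the expression `(unsigned long)c` yields a huge
value with all upper bits set, whereas `BYTE_CAST( unsigned long, c )` yields
0xE9.
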
Many FreeType clients use C++. However, `g++ -Wold-style-cast` warns about
macros with C-style casts even in system header files; this also affects
directories included with `-isystem`. While this could be seen as a problem
with g++, the problem is more a philosophical one: Over time, C and C++
have diverged more and more, and some features of C are no longer the 'right'
solution in C++.
* include/freetype/config/public-macros.h (FT_STATIC_CAST,
FT_REINTERPRET_CAST): New macros.
* include/freetype/freetype.h (FT_ENC_TAG, FT_LOAD_TARGET_,
FT_LOAD_TARGET_MODE): Use `FT_STATIC_CAST`.
Correctly handle negative 'signed char' input.
* include/freetype/ftimage.h (FT_IMAGE_TAG): Ditto.
* include/freetype/fttypes.h (FT_MAKE_TAG, FT_BOOL): Ditto.
* include/freetype/ftmodapi.h (FT_FACE_DRIVER_NAME): Use
`FT_REINTERPRET_CAST`.
* src/smooth/ftgrays.c (FT_STATIC_CAST)[STANDALONE_]: New macro.
[!STANDALONE_]: Include `FT_CONFIG_CONFIG_H`.
Fixes #1116.
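
A cast macro that satisfies both languages might be shaped like this. The
name `MY_STATIC_CAST` and the helper `truncate_to_int` are hypothetical; the
sketch only shows the general approach of selecting the cast syntax based on
`__cplusplus`.

```c
/* Hypothetical cast macro: C++-style cast when compiled as C++, */
/* plain C cast otherwise.                                       */
#ifdef __cplusplus
#  define MY_STATIC_CAST( type, var )  static_cast<type>( var )
#else
#  define MY_STATIC_CAST( type, var )  (type)( var )
#endif

/* Example use inside a header-style helper. */
static int
truncate_to_int( double  v )
{
  return MY_STATIC_CAST( int, v );
}
```
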
* src/smooth/ftgrays.c (FT_UDIVPREP, FT_UDIV): Reduce shift.
Smaller shifts that keep the division operands of FT_UDIVPREP within
32 bits result in slightly faster divisions, which is noticeable in
the overall performance. The loss of precision is tolerable until the
divisors (the components dx and dy) approach 32 - PIXEL_BITS bits. With
PIXEL_BITS = 8, this corresponds to about 65,000 pixels, a bitmap size
that we refuse to render anyway.
Using `ftbench -p -s60 -t5 -bc timesi.ttf`,
Before: 8.52 us/op
After: 8.32 us/op
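
The idea of dividing by multiplying with a precomputed reciprocal and a
reduced shift can be sketched as follows. The functions `recip_prepare` and
`recip_divide` are hypothetical and are not the `FT_UDIVPREP` and `FT_UDIV`
macros themselves. The result is an approximation: for dividends below
2^24 it equals the exact quotient or is one less.

```c
#include <assert.h>
#include <stdint.h>

#define PIXEL_BITS  8
#define SHIFT       ( 32 - PIXEL_BITS )

/* Precompute floor( ( 2^SHIFT - 1 ) / b ) with a 32-bit division. */
static uint32_t
recip_prepare( uint32_t  b )
{
  assert( b != 0 );

  return ( UINT32_MAX >> PIXEL_BITS ) / b;
}

/* Approximate a / b using the reciprocal `r` from recip_prepare(). */
/* The product is widened to 64 bits here for safety.               */
static uint32_t
recip_divide( uint32_t  a,
              uint32_t  r )
{
  return (uint32_t)( ( (uint64_t)a * r ) >> SHIFT );
}
```

The reciprocal is computed once per line segment and then reused for many
cheap multiplications, which is where the saving would come from.
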
MSVC does not set `__SSE2__`. Instead one must check whether `_M_IX86_FP` is
defined and greater than or equal to 2.
* src/smooth/ftgrays.c (FT_SSE2): New macro.
Use it where appropriate.
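
The detection described here might be written roughly as below. The macro
name `HAVE_SSE2` and the helper `have_sse2` are hypothetical, and the real
`FT_SSE2` definition may check additional platform macros.

```c
/* GCC and Clang define `__SSE2__`; MSVC on 32-bit x86 instead */
/* sets `_M_IX86_FP` to 2 or more when SSE2 is enabled.        */
#if defined( __SSE2__ ) ||                          \
    ( defined( _M_IX86_FP ) && _M_IX86_FP >= 2 )
#  define HAVE_SSE2  1
#else
#  define HAVE_SSE2  0
#endif

static int
have_sse2( void )
{
  return HAVE_SSE2;
}
```
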
Put the null cell at the end of the pool and store it explicitly so that
we can use it as both the limit and the dumpster.
* src/smooth/ftgrays.c (gray_TWorker): Store the last `cell_null` and
remove unnecessary fields.
(NULL_CELL_PTR, CELL_IS_NULL): Remove in favor of explicit `cell_null`.
(gray_dump_cells, gray_set_cell, gray_sweep{,_direct}): Update callers.
(gray_convert_glyph_inner): Trace remaining cells (oh well).
(gray_convert_glyph): Set up `cell_null` and slightly improve the pool
management.
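
Using the last pool entry as both the limit and the dumpster can be sketched
like this. The structure and the functions `worker_init`, `new_cell`, and
`add_cover` are simplified, hypothetical stand-ins for the real worker in
`ftgrays.c`. Once the pool is exhausted, the current cell points at the null
cell, so later writes remain harmless.

```c
typedef struct Cell_
{
  int  x;
  int  cover;

} Cell;

typedef struct Worker_
{
  Cell*  cell;        /* cell currently being accumulated     */
  Cell*  cell_free;   /* next unused cell in the pool         */
  Cell*  cell_null;   /* last pool entry: limit and dumpster  */

} Worker;

static void
worker_init( Worker*  w,
             Cell*    pool,
             int      n )
{
  w->cell_free        = pool;
  w->cell_null        = pool + n - 1;
  w->cell_null->x     = 0;
  w->cell_null->cover = 0;
  w->cell             = w->cell_null;
}

/* Start a new cell; return 1 if the pool is exhausted. */
static int
new_cell( Worker*  w,
          int      x )
{
  if ( w->cell_free >= w->cell_null )
  {
    w->cell = w->cell_null;   /* redirect all work to the dumpster */
    return 1;
  }

  w->cell        = w->cell_free++;
  w->cell->x     = x;
  w->cell->cover = 0;

  return 0;
}

/* Always safe: after exhaustion this writes to the null cell. */
static void
add_cover( Worker*  w,
           int      c )
{
  w->cell->cover += c;
}
```
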