Change _cairo_lround to correctly handle edge cases previously missed

A nice side effect of this new approach is that the valid input range
was expanded back to (INT_MIN, INT_MAX]. No performance regressions observed.
Also included is documentation about the internal mysteries of _cairo_lround,
as previously promised.
This commit is contained in:
Dan Amelang 2006-12-09 18:54:47 -08:00
parent cc75159587
commit fea60c7283

View file

@ -3199,7 +3199,7 @@ _cairo_restrict_value (double *value, double min, double max)
/* This function is identical to the C99 function lround(), except that it
* performs arithmetic rounding (instead of away-from-zero rounding) and
* has a valid input range of [INT_MIN / 4, INT_MAX / 4] instead of
* has a valid input range of (INT_MIN, INT_MAX] instead of
* [INT_MIN, INT_MAX]. It is much faster on both x86 and FPU-less systems
* than other commonly used methods for rounding (lround, round, rint, lrint
* or float (d + 0.5)).
@ -3216,41 +3216,194 @@ _cairo_restrict_value (double *value, double min, double max)
* This function doesn't perform any floating-point calculations, and thus
* avoids this penalty.
*/
/* XXX needs inline comments explaining the internal magic
*/
int
_cairo_lround (double d)
{
uint32_t top, shift_amount, output;
union {
uint32_t ui32[2];
double d;
uint64_t ui64;
uint32_t ui32[2];
} u;
uint32_t exponent, most_significant_word, least_significant_word;
int32_t integer_result;
u.d = d;
#ifdef FLOAT_WORDS_BIGENDIAN
most_significant_word = u.ui32[0];
least_significant_word = u.ui32[1];
#else
most_significant_word = u.ui32[1];
least_significant_word = u.ui32[0];
/* If the integer word order doesn't match the float word order, we swap
* the words of the input double. This is needed because we will be
* treating the whole double as a 64-bit unsigned integer. Notice that we
* use WORDS_BIGENDIAN to detect the integer word order, which isn't
* exactly correct because WORDS_BIGENDIAN refers to byte order, not word
* order. Thus, we are making the assumption that the byte order is the
* same as the integer word order which, on the modern machines that we
* care about, is OK.
*/
#if ( defined(FLOAT_WORDS_BIGENDIAN) && !defined(WORDS_BIGENDIAN)) || \
(!defined(FLOAT_WORDS_BIGENDIAN) && defined(WORDS_BIGENDIAN))
{
uint32_t temp = u.ui32[0];
u.ui32[0] = u.ui32[1];
u.ui32[1] = temp;
}
#endif
exponent = 1052 - ((most_significant_word >> 20) & 0x7FF);
integer_result = ((most_significant_word & 0xFFFFF) | 0x100000) << 10;
integer_result |= (least_significant_word >> 22);
#ifdef WORDS_BIGENDIAN
#define MSW (0) /* Most Significant Word */
#define LSW (1) /* Least Significant Word */
#else
#define MSW (1)
#define LSW (0)
#endif
if (most_significant_word & 0x80000000)
integer_result = -integer_result;
/* By shifting the most significant word of the input double to the
* right 20 places, we get the very "top" of the double where the exponent
* and sign bit lie.
*/
top = u.ui32[MSW] >> 20;
integer_result >>= exponent;
/* Here, we calculate how much we have to shift the mantissa to normalize
* it to an integer value. We extract the exponent "top" by masking out the
* sign bit, then we calculate the shift amount by subtracting the exponent
* from the bias. Notice that the correct bias for 64-bit doubles is
* actually 1075, but we use 1053 instead for two reasons:
*
* 1) To perform rounding later on, we will first need the target
* value in a 31.1 fixed-point format. Thus, the bias needs to be one
* less: (1075 - 1: 1074).
*
* 2) To avoid shifting the mantissa as a full 64-bit integer (which is
* costly on certain architectures), we break the shift into two parts.
* First, the upper and lower parts of the mantissa are shifted
* individually by a constant amount that all valid inputs will require
* at the very least. This amount is chosen to be 21, because this will
* allow the two parts of the mantissa to later be combined into a
* single 32-bit representation, on which the remainder of the shift
* will be performed. Thus, we decrease the bias by an additional 21:
* (1074 - 21: 1053).
*/
shift_amount = 1053 - (top & 0x7FF);
if (exponent > 30)
integer_result = 0;
/* We are done with the exponent portion in "top", so here we shift it off
* the end.
*/
top >>= 11;
integer_result = (integer_result + 1) >> 1;
/* Before we perform any operations on the mantissa, we need to OR in
* the implicit 1 at the top (see the IEEE-754 spec). We needn't mask
* off the sign bit nor the exponent bits because these higher bits won't
* make a bit of difference in the rest of our calculations.
*/
u.ui32[MSW] |= 0x100000;
return integer_result;
/* If the input double is negative, we have to decrease the mantissa
* by a hair. This is an important part of performing arithmetic rounding,
* as negative numbers must round towards positive infinity in the
* halfwase case of -x.5. Since "top" contains only the sign bit at this
* point, we can just decrease the mantissa by the value of "top".
*/
u.ui64 -= top;
/* By decrementing "top", we create a bitmask with a value of either
* 0x0 (if the input was negative) or 0xFFFFFFFF (if the input was positive
* and thus the unsigned subtraction underflowed) that we'll use later.
*/
top--;
/* Here, we shift the mantissa by the constant value as described above.
* We can emulate a 64-bit shift right by 21 through shifting the top 32
* bits left 11 places and ORing in the bottom 32 bits shifted 21 places
* to the right. Both parts of the mantissa are now packed into a single
* 32-bit integer. Although we severely truncate the lower part in the
* process, we still have enough significant bits to perform the conversion
* without error (for all valid inputs).
*/
output = (u.ui32[MSW] << 11) | (u.ui32[LSW] >> 21);
/* Next, we perform the shift that converts the X.Y fixed-point number
* currently found in "output" to the desired 31.1 fixed-point format
* needed for the following rounding step. It is important to consider
* all possible values for "shift_amount" at this point:
*
* - {shift_amount < 0} Since shift_amount is an unsigned integer, it
* really can't have a value less than zero. But, if the shift_amount
* calculation above caused underflow (which would happen with
* input > INT_MAX or input <= INT_MIN) then shift_amount will now be
* a very large number, and so this shift will result in complete
* garbage. But that's OK, as the input was out of our range, so our
* output is undefined.
*
* - {shift_amount > 31} If the magnitude of the input was very small
* (i.e. |input| << 1.0), shift_amount will have a value greater than
* 31. Thus, this shift will also result in garbage. After performing
* the shift, we will zero-out "output" if this is the case.
*
* - {0 <= shift_amount < 32} In this case, the shift will properly convert
* the mantissa into a 31.1 fixed-point number.
*/
output >>= shift_amount;
/* This is where we perform rounding with the 31.1 fixed-point number.
* Since what we're after is arithmetic rounding, we simply add the single
* fractional bit into the integer part of "output", and just keep the
* integer part.
*/
output = (output >> 1) + (output & 1);
/* Here, we zero-out the result if the magnitude if the input was very small
* (as explained in the section above). Notice that all input out of the
* valid range is also caught by this condition, which means we produce 0
* for all invalid input, which is a nice side effect.
*
* The most straightforward way to do this would be:
*
* if (shift_amount > 31)
* output = 0;
*
* But we can use a little trick to avoid the potential branch. The
* expression (shift_amount > 31) will be either 1 or 0, which when
* decremented will be either 0x0 or 0xFFFFFFFF (unsigned underflow),
* which can be used to conditionally mask away all the bits in "output"
* (in the 0x0 case), effectively zeroing it out. Certain, compilers would
* have done this for us automatically.
*/
output &= ((shift_amount > 31) - 1);
/* If the input double was a negative number, then we have to negate our
* output. The most straightforward way to do this would be:
*
* if (!top)
* output = -output;
*
* as "top" at this point is either 0x0 (if the input was negative) or
* 0xFFFFFFFF (if the input was positive). But, we can use a trick to
* avoid the branch. Observe that the following snippet of code has the
* same effect as the reference snippet above:
*
* if (!top)
* output = 0 - output;
* else
* output = output - 0;
*
* Armed with the bitmask found in "top", we can condense the two statements
* into the following:
*
* output = (output & top) - (output & ~top);
*
* where, in the case that the input double was negative, "top" will be 0,
* and the statement will be equivalent to:
*
* output = (0) - (output);
*
* and if the input double was positive, "top" will be 0xFFFFFFFF, and the
* statement will be equivalent to:
*
* output = (output) - (0);
*
* Which, as pointed out earlier, is equivalent to the original reference
* snippet.
*/
output = (output & top) - (output & ~top);
return output;
#undef MSW
#undef LSW
}