Concept related to precision of float and double
Why is the precision of float up to 6 digits after the decimal point, and the precision of double up to 15 digits after the decimal point? Can anyone give a mathematical explanation of it?
Solution 1:[1]
Saying the precision of float or double is some number of decimal digits is sloppy terminology. float and double are commonly implemented using the IEEE-754 binary32 and binary64 formats, which use significands of 24 and 53 bits, respectively. (The significand is the fraction portion of a floating-point representation. It is accompanied by a sign bit and an exponent.) The precisions of these formats are 24 bits and 53 bits.
The precision is sufficient that, for float, any decimal numeral with six significant digits can be converted to float accurately enough that conversion back to a six-digit decimal numeral produces the same number. (All conversions discussed herein use round-to-nearest.) Six is the limit; there are some seven-digit decimal numerals that do not survive a round-trip conversion. For double, the limit is 15 significant digits.
Here is how we find the limits. A 24-bit significand can represent $2^{24}$ values. That is 16,777,216, so it is certainly capable of representing all seven-digit decimal numerals from 0 to 9,999,999. However, if we want to represent some seven-digit decimal numeral $a.bcdefg \cdot 10^n$, the last decimal numeral in this interval, $9.999999 \cdot 10^n$, has some float representation $f \cdot 2^e$. Because the powers of 10 are irregular relative to the powers of two, that $f$ might be small, meaning that we are not representing the numbers from $0.000000 \cdot 10^n$ to $9.999999 \cdot 10^n$ with significands entirely in a single sequence 0 to 16,777,215. (I am glossing over some details here; numbers which would have significands under $2^{22}$ are rescaled to another exponent in the normal range. But being in a finer exponent range gives more precision, so that is not an issue.) Rather, there was a rollover at some point, and the exponent increased. So we lose one bit of precision. So the number of decimal digits we can guarantee survive a round-trip conversion to a floating-point format with $p$ digits in base $b$ is $\lfloor (p-1) \cdot \log_{10} b \rfloor$. (That formula is given in C 2018 5.2.4.2.2 12.) For float, $p = 24$ and $b = 2$ give $\lfloor 23 \cdot 0.30103 \rfloor = \lfloor 6.92 \rfloor = 6$; for double, $p = 53$ gives $\lfloor 52 \cdot 0.30103 \rfloor = \lfloor 15.65 \rfloor = 15$.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |