C++ why max of 64bit double has 308 digits?

In my environment (Win10 64bit, VC++2019 and 32bit project), sizeof(double) is 8 bytes, the max value should be 1.84e19. But std::numeric_limits<double>::max() is around 1.79e308.

Why are they so different?


Wikipedia has more detailed information: Double-precision floating-point format

The bits are laid out as follows: [figure: 1 sign bit, 11 exponent bits, 52 fraction bits]

The real value assumed by a given 64-bit double-precision datum with a given biased exponent e and a 52-bit fraction is:

(-1)^sign * (1.b51 b50 ... b0)_2 * 2^(e - 1023)

So if you google 1.999*2^1023 (1.999 stands in for the largest significand, which is just under 2, and 1023 is the largest exponent), it will give you about 1.796794e+308.

c++


Solution 1:[1]

Floating point types work differently than integer types.

Integer types directly map their min-max range of values onto all the bit patterns from all 0s to all 1s. So they can directly represent a value, if it's in range.

Floating point types are essentially 2 numbers: fraction (mantissa) and exponent.

So the number 1,000,000 could be represented by fraction 1 and exponent 6 (1 * 10^6). A double uses base 2 rather than base 10, but the idea is the same.

So a floating point number can represent a massive range of numbers (the range is limited by the exponent range) with variable accuracy (the accuracy is limited by the width of the fraction). As numbers get larger, the limited number of fraction digits means the gap between adjacent representable values grows, so small increments are represented less and less accurately. E.g. for a double, once you reach 2^53 (about 9.0e15), adjacent values are 2 apart, so adding 1 gives back the same number.

As per IEEE 754, a double has 1 sign bit, 11 bits of exponent and 52 bits of fraction.

So when you pair a 52-bit fraction with an 11-bit exponent, you can reach quite a large number. But that doesn't mean every number in that smallest-to-largest range can be accurately represented, unlike with an integer type.

Solution 2:[2]

C++ why max of 64bit double has 308 digits?

In my environment (Win10 64bit

Because your system uses hardware that conforms to the IEEE 754 specification (as does most hardware). That document specifies the largest finite representable value of 64-bit binary floating point to be just under 2^1024, namely (2 - 2^-52) * 2^1023, which is a bit less than 1.8e308.


sizeof(double) is 8 bytes, the max value should be 1.84e19

1.84e19 is about 2^64: the number of distinct values that 8 bytes can encode, and roughly the largest value of a 64-bit unsigned integer. Are you assuming that double is an unsigned integer type? It is not, so your assumption is misplaced.

Solution 3:[3]

  1. Largest double (DBL_MAX) is 2^971 * (2^53 - 1). (309 digits)
  2. Next below it is 2^971 * (2^53 - 2).
  3. Next below that is 2^971 * (2^53 - 3).

...

  1. Eventually you reach 2^971 * (2^53 - 2^52) = 2^1023. (308 digits)
  2. Next below that is 2^970 * (2^53 - 1).

std::nextafter(x, y) returns the next representable floating-point value following x in the direction of y, so each step down in the lists above corresponds to a call such as nextafter(x, 0.0).

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1
Solution 2    Community
Solution 3    Michel