Floating Point Representation of Numbers

Numerical inaccuracy in mathematical computations stems from the need to represent real numbers in floating point format.

In floating point format, a fixed number of bits of storage are used to hold the sign, mantissa, and exponent of a real number, usually in base 2.

In IEEE 754 standard floating point, a double precision number occupies 64 bits, regardless of the word size of the computer, and consists of
  • 1 bit for the sign,
  • 11 bits for the exponent, and
  • 52 bits for the mantissa (normalized numbers also carry an implicit leading 1 bit, giving 53 bits of precision).
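
The three fields listed above can be inspected directly. The following is a minimal sketch, assuming Python's standard struct module; the helper name double_fields is hypothetical and is used here only for illustration.

```python
import struct


def double_fields(x):
    """Split a Python float (an IEEE 754 double) into its three bit fields.

    double_fields is a hypothetical helper name used for illustration.
    """
    # Pack as a big-endian double, then reinterpret the same 8 bytes
    # as an unsigned 64-bit integer so the bits can be masked out.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                  # 1 bit
    exponent = (bits >> 52) & 0x7FF    # 11 bits, stored with a bias of 1023
    mantissa = bits & ((1 << 52) - 1)  # 52 stored fraction bits
    return sign, exponent, mantissa


sign, exponent, mantissa = double_fields(-6.5)
print(sign, exponent - 1023, format(mantissa, "052b"))
```

Here -6.5 equals -1.625 × 2^2, so the sign bit is 1, the unbiased exponent is 2, and the stored mantissa begins with the bits 101 (the fraction 0.625), with the leading 1 left implicit.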

Some numbers, called machine numbers, can be exactly represented in floating point. Others can't and must be rounded to the nearest machine number, introducing a small error.
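
As a rough illustration of the distinction, the sketch below uses Python's fractions module to compare a rational number with the double it is rounded to. The function names is_machine_number and rounding_error are hypothetical, and the sketch assumes the value lies within the range of a double.

```python
from fractions import Fraction


def is_machine_number(value):
    """Return True if the rational `value` is exactly representable as a double.

    Hypothetical helper; assumes `value` is within the range of a double.
    """
    # float(value) rounds to the nearest double; Fraction(float) converts
    # that double back to an exact rational without further rounding.
    return Fraction(float(value)) == value


def rounding_error(value):
    """Exact difference between the nearest double and `value`."""
    return Fraction(float(value)) - value


print(is_machine_number(Fraction(1, 2)))   # 1/2 is a power of two
print(is_machine_number(Fraction(1, 10)))  # 1/10 has no finite binary expansion
print(float(rounding_error(Fraction(1, 10))))
```

For 1/10 the error is nonzero but tiny: with round-to-nearest its magnitude is at most 2^-53 times the magnitude of the number being stored.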