- Floating point representation
- Machine epsilon and rounding error
- Error propagation and estimation
- Comparing two floats
- Minimizing the error
- Modern computer architectures such as RISC-V, ARM, A64, and x86 use the IEEE 754 standard for floating-point representation.
- IEEE 754 uses scientific notation, in which every number is represented as (−1)^sign × mantissa × base^exponent, where the normalized mantissa is at least 1 and smaller than the base.
- For modern architectures the base is 2.
- Several floating-point formats exist, differing in the number of bits: 32, 64, and 128 bits (an 80-bit format also exists). A 32-bit floating-point number is laid out as 1 sign bit, 8 exponent bits, and 23 mantissa bits. Because the number is binary, the mantissa of a normalized number always starts with 1, so that leading bit is not stored (the exceptions are zero and subnormal numbers). This gives the mantissa one extra bit of precision (24 in total).
- IEEE 754 cannot represent every real number. Like integers, floats form a grid on the real axis, but unlike integers this grid is not equidistant: numbers are densest around zero and become sparser as the exponent grows.
- IEEE 754 supports special values such as signed zeros, infinities, NaNs, and subnormal numbers.
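
The 32-bit layout and the uneven grid described above can be inspected directly. The following is a minimal Python sketch, not a reference implementation: the helper names `decompose_float32` and `float32_value` are made up for illustration, and `math.ulp` requires Python 3.9 or newer. Note that Python's own `float` is a 64-bit number, so the grid-spacing loop shows the 64-bit grid, while the decomposition first rounds the value to 32 bits.

```python
import math
import struct


def decompose_float32(x):
    """Split x, rounded to a 32-bit float, into (sign, exponent, fraction) bit fields."""
    # Pack as a big-endian float32, then reinterpret the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, stored with a bias of 127
    fraction = bits & 0x7FFFFF        # 23 bits, without the implicit leading 1
    return sign, exponent, fraction


def float32_value(sign, exponent, fraction):
    """Rebuild the value of a normal (not zero, subnormal, inf, or NaN) number."""
    mantissa = 1 + fraction / 2**23   # restore the implicit leading 1
    return (-1) ** sign * mantissa * 2.0 ** (exponent - 127)


print(decompose_float32(1.0))            # (0, 127, 0)
print(decompose_float32(-0.75))          # (1, 126, 4194304), i.e. -1.5 * 2**-1
print(float32_value(1, 126, 4194304))    # -0.75

# Special values use the all-zeros and all-ones exponent fields.
print(decompose_float32(-0.0))           # (1, 0, 0): negative zero
print(decompose_float32(float("inf")))   # (0, 255, 0): infinity

# Grid spacing of 64-bit floats: the gap to the next float grows with magnitude.
for x in (1.0, 1024.0, 1e16):
    print(x, math.ulp(x))
```

Around 1e16 the gap between neighbouring 64-bit floats is already 2.0, so not every integer of that size is representable.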