floating point

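A floating-point number is encoded in three parts:

1. sign bit (s)
2. exponent bits (e), which encode the exponent E
3. fraction bits (f), which encode the significand M

The value V is then:

$$V = (-1)^s \times M \times 2^E$$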
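
As a minimal sketch of how the three parts sit inside one bit pattern, the Python below splits an integer into its sign bit, exponent field, and fraction field. The function name `split_fields` and the parameters `k` (number of exponent bits) and `n` (number of fraction bits) are assumptions chosen for illustration, not part of any standard API.

```python
def split_fields(bits: int, k: int, n: int):
    """Split a (1 + k + n)-bit pattern into (sign bit, exponent field, fraction field)."""
    frac = bits & ((1 << n) - 1)          # lowest n bits
    exp = (bits >> n) & ((1 << k) - 1)    # next k bits
    sign = (bits >> (n + k)) & 1          # top bit
    return sign, exp, frac


# 8-bit pattern 1 0111 001 with k = 4 exponent bits and n = 3 fraction bits
print(split_fields(0b1_0111_001, k=4, n=3))  # (1, 7, 1)
```

The fields come back as raw integers; turning them into E and M depends on whether the exponent field is all 0s, all 1s, or neither.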

Depending on the exponent field e, a finite value is encoded in one of two ways (an all-1s exponent field is reserved for infinity and NaN):

• normalized
• denormalized

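### normalized values

Used when e is neither all 0s nor all 1s.

$E = e - bias$
$bias = 2^{k-1} - 1$, where k is the number of exponent bits
$M = 1.f_{n-1} \dots f_1 f_0$, where n is the number of fraction bits and the leading 1 is implicit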
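
The normalized formulas can be sketched in Python as follows. The function name `decode_normalized` is hypothetical, the sign is assumed to be 0, and `fractions.Fraction` is used so the results are exact rather than rounded.

```python
from fractions import Fraction


def decode_normalized(e: int, f: int, k: int, n: int) -> Fraction:
    """Value of a non-negative normalized number from its exponent field e and fraction field f."""
    if e == 0 or e == (1 << k) - 1:
        raise ValueError("e must be neither all 0s nor all 1s")
    bias = (1 << (k - 1)) - 1        # bias = 2^(k-1) - 1
    E = e - bias
    M = 1 + Fraction(f, 1 << n)      # implicit leading 1: M = 1.f
    return M * Fraction(2) ** E


print(decode_normalized(7, 0, k=4, n=3))    # 1
print(decode_normalized(14, 7, k=4, n=3))   # 240
print(decode_normalized(1, 0, k=4, n=3))    # 1/64
```

With k = 4 and n = 3 the bias is 7, so e = 7 gives E = 0 and the value 1, while e = 1 gives E = -6 and the value 8/512 = 1/64.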

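### denormalized values

Used when e is all 0s.

$E = 1 - bias$
$M = 0.f_{n-1} \dots f_1 f_0$, with no implicit leading 1

Using $1 - bias$ rather than $-bias$ keeps the spacing even between the largest denormalized value and the smallest normalized value.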
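
A minimal sketch of the all-0s case above, again with exact fractions. The function name `decode_denormalized` is hypothetical and the sign is assumed to be 0.

```python
from fractions import Fraction


def decode_denormalized(f: int, k: int, n: int) -> Fraction:
    """Value of a non-negative number whose exponent field is all 0s."""
    bias = (1 << (k - 1)) - 1
    E = 1 - bias                  # fixed exponent, not 0 - bias
    M = Fraction(f, 1 << n)       # no leading 1: M = 0.f
    return M * Fraction(2) ** E


print(decode_denormalized(0, k=4, n=3))  # 0
print(decode_denormalized(1, k=4, n=3))  # 1/512
print(decode_denormalized(7, k=4, n=3))  # 7/512
```

For k = 4 and n = 3 the results step by 1/512 from 0 up to 7/512.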

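### special values

When e is all 1s, the pattern does not encode a finite number:

• if f = 0, it is $\infty$ ($+\infty$ or $-\infty$, according to the sign bit)
• otherwise it is NaN (Not a Number)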
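
The all-1s case can be observed on a real format. The sketch below assumes IEEE 754 single precision (8 exponent bits, 23 fraction bits) rather than a small teaching format, and uses Python's `struct` module to reinterpret a 32-bit integer as a float. The helper name `bits_to_float32` is made up for this example.

```python
import math
import struct


def bits_to_float32(bits: int) -> float:
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 single-precision float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


pos_inf = bits_to_float32(0x7F800000)   # sign 0, e all 1s, f = 0
neg_inf = bits_to_float32(0xFF800000)   # sign 1, e all 1s, f = 0
nan = bits_to_float32(0x7FC00000)       # e all 1s, f != 0

print(math.isinf(pos_inf), math.isinf(neg_inf), math.isnan(nan))  # True True True
```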

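The following example is an 8-bit floating-point format with k = 4 exponent bits, n = 3 fraction bits, and bias = 7. Because the non-negative values increase in the same order as their bit patterns, two numbers can be compared as if their bit patterns were unsigned integers.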

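| Description | Bit representation | e | E | f | M | V |
|---|---|---|---|---|---|---|
| Zero | 0 0000 000 | 0 | -6 | 0 | 0 | 0 |
| Smallest denormalization | 0 0000 001 | 0 | -6 | 1/8 | 1/8 | 1/512 |
| | 0 0000 010 | 0 | -6 | 2/8 | 2/8 | 2/512 |
| | 0 0000 011 | 0 | -6 | 3/8 | 3/8 | 3/512 |
| | … | | | | | |
| | 0 0000 110 | 0 | -6 | 6/8 | 6/8 | 6/512 |
| Largest denormalization | 0 0000 111 | 0 | -6 | 7/8 | 7/8 | 7/512 |
| Smallest normalization | 0 0001 000 | 1 | -6 | 0 | 8/8 | 8/512 |
| | 0 0001 001 | 1 | -6 | 1/8 | 9/8 | 9/512 |
| | … | | | | | |
| | 0 0110 110 | 6 | -1 | 6/8 | 14/8 | 14/16 |
| | 0 0110 111 | 6 | -1 | 7/8 | 15/8 | 15/16 |
| One | 0 0111 000 | 7 | 0 | 0 | 8/8 | 1 |
| | 0 0111 001 | 7 | 0 | 1/8 | 9/8 | 9/8 |
| | 0 0111 010 | 7 | 0 | 2/8 | 10/8 | 10/8 |
| | … | | | | | |
| | 0 1110 110 | 14 | 7 | 6/8 | 14/8 | 224 |
| Largest norm. | 0 1110 111 | 14 | 7 | 7/8 | 15/8 | 240 |
| Infinity | 0 1111 000 | - | - | - | - | $\infty$ |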
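
The ordering property of this 8-bit format can be checked by brute force. The following is a sketch under the stated parameters (k = 4, n = 3, bias = 7); the names `decode`, `finite`, and `is_sorted` are chosen for this example only, and only non-negative patterns are considered.

```python
from fractions import Fraction

K, N = 4, 3                      # exponent and fraction widths of the 8-bit example
BIAS = (1 << (K - 1)) - 1        # 7


def decode(bits: int):
    """Decode a non-negative 8-bit pattern; returns a Fraction, or None for infinity/NaN."""
    f = bits & ((1 << N) - 1)
    e = (bits >> N) & ((1 << K) - 1)
    if e == (1 << K) - 1:
        return None                                                  # all 1s: infinity or NaN
    if e == 0:
        return Fraction(f, 1 << N) * Fraction(2) ** (1 - BIAS)       # denormalized
    return (1 + Fraction(f, 1 << N)) * Fraction(2) ** (e - BIAS)     # normalized


# Every pattern from zero up to, but not including, +infinity (0 1111 000)
finite = [decode(b) for b in range(0b0_1111_000)]
is_sorted = all(a < b for a, b in zip(finite, finite[1:]))

print(is_sorted)      # True
print(finite[-1])     # 240
```

Each successive bit pattern decodes to a strictly larger value, which is why comparing the patterns as unsigned integers gives the same answer as comparing the values.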
page revision: 5, last edited: 03 Oct 2008 06:51