| Author: | Wojciech Muła |
|---|---|
| Added on: | 2013-12-27 |

This short article shows how a normalized floating-point value can be safely converted to an integer value without the assistance of an FPU or SSE. Only basic bitwise and arithmetic operations are used. In the worst case the following operations are performed:

- 6 comparisons,
- 2 subtractions,
- 1 and,
- 1 or,
- 2 shifts (one with variable, one with constant amount).

The floating-point value is calculated as (−1)^{sign} ⋅ (1 + *fraction*) ⋅ 2^{exponent}. The fraction part is in the range [0, 1). For 32-bit
values **sign** has 1 bit, **exponent** has 8 bits, **fraction** has
23 bits, and **bias** has the value 127; **exponent + bias** is stored as
an unsigned number.

The layout of the binary word:

    +-+--------+-----------------------+
    |S|exp+bias|       fraction        |
    +-+--------+-----------------------+
    31 30    23 22                    0
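As a minimal sketch of this layout, the three fields can be pulled out of a raw 32-bit word with shifts and masks only. The struct and function names are hypothetical (they are not taken from the article's sample program), and the word is assumed to hold an IEEE 754 single-precision value.

```c
#include <stdint.h>

/* Hypothetical container for the three fields of a 32-bit float word. */
typedef struct {
    uint32_t sign;      /* bit 31: 0 for positive, 1 for negative */
    int32_t  exponent;  /* bits 30..23 with the bias (127) removed */
    uint32_t fraction;  /* bits 22..0, without the implicit leading 1 */
} float32_fields;

/* Split a raw word into its fields using shifts, masks and one subtraction. */
static float32_fields float32_decode(uint32_t word)
{
    float32_fields f;
    f.sign     = word >> 31;
    f.exponent = (int32_t)((word >> 23) & 0xffu) - 127;
    f.fraction = word & 0x007fffffu;
    return f;
}
```

For example, 10.5 is stored as `0x41280000`, which decodes to sign 0, exponent 3 and fraction bits `0x280000`, that is 0.3125 ⋅ 2^{23}.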

Let us clear the **exponent + bias** and **sign** fields and restore the implicit integer 1 at the 24th bit (bit 23):

    +-+--------+-----------------------+
    |0|00000001|xxxxxxxxxxxxxxxxxxxxxxx|
    +-+--------+-----------------------+
    31 30    23 22                    0
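This step needs one `and` and one `or`. A minimal sketch follows; the function name is hypothetical and the mask constants assume the 32-bit layout described above.

```c
#include <stdint.h>

/* Keep the 23 fraction bits (this clears sign and exponent),
   then set bit 23 to restore the implicit leading 1. */
static uint32_t float32_significand(uint32_t word)
{
    return (word & 0x007fffffu) | 0x00800000u;
}
```

For the word `0x41280000` (10.5) this yields `0x00a80000`, which is 11010048 = 1.3125 ⋅ 2^{23}.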

The value of this 32-bit word, treated as an unsigned integer, is (1 + *fraction*) ⋅ 2^{23}. To calculate the result, the word has to be shifted left
or right depending on the value and sign of `shift := exponent - 23`;
only a few cases have to be considered:

- If **shift** is negative, the word must be shifted right. The number of significant bits is 24, so if *shift* ≤ −24 the result is always zero.
- If **shift** is positive, the word must be shifted left. Since the destination is a 32-bit signed value, the maximum shift is 31 − 24 = 7 bits; when *shift* is greater than 7, overflow occurs.
- If −24 < *shift* < 7, the number can be safely shifted. When *shift* = 7 the result has exactly 31 significant bits, so a range check is required: the maximum magnitude is 2^{31} − 1 for positive numbers (sign = 0) and 2^{31} for negative numbers.
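The cases above can be combined into one routine. The following is a sketch under stated assumptions, not the article's sample program: the names `int32_result` and `float32_to_int32` are hypothetical, the input is the raw 32-bit word of an IEEE 754 single-precision value, and the result is truncated toward zero. It follows the case split literally, so every shift greater than 7 is reported as overflow; that includes −2^{31} itself, which would need a shift of 8. Zero and denormal words fall into the "result is zero" case, and infinities and NaNs end up in the overflow case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: ok is false when the value does not fit. */
typedef struct {
    bool    ok;
    int32_t value;
} int32_result;

static int32_result float32_to_int32(uint32_t word)
{
    int32_result r = { false, 0 };

    const bool     negative = (word >> 31) != 0;
    const int32_t  exponent = (int32_t)((word >> 23) & 0xffu) - 127;
    const int32_t  shift    = exponent - 23;
    const uint32_t mantissa = (word & 0x007fffffu) | 0x00800000u;

    uint32_t magnitude;
    if (shift <= -24) {
        magnitude = 0;                          /* all 24 bits shifted out */
    } else if (shift < 0) {
        magnitude = mantissa >> (uint32_t)(-shift);
    } else if (shift <= 7) {
        magnitude = mantissa << (uint32_t)shift;
    } else {
        return r;                               /* overflow */
    }

    /* Range check. With a 24-bit significand and shift <= 7 the magnitude
       stays below 2^31, so this is only a guard mirroring the text. */
    if (magnitude > 0x7fffffffu) {
        return r;
    }

    r.ok    = true;
    r.value = negative ? -(int32_t)magnitude : (int32_t)magnitude;
    return r;
}
```

With this sketch `0x41280000` (10.5) converts to 10, `0xc1280000` (−10.5) to −10, and `0x4effffff` (the largest float below 2^{31}) to 2147483520, while `0x4f000000` (2^{31}) is rejected.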

A sample program is available.