Manostaxx
The text that follows is owned by the site above referred.
Here is only a small part of the article, for more please follow the link
SOURCE: https://www3.ntu.edu.sg/home/ehchua/programming/java/datarepresentation.html
In computers, floating-point numbers are represented in scientific notation with a fraction (F) and an exponent (E) in radix 2, in the form F×2^E. Both E and F can be positive or negative. Modern computers adopt the IEEE 754 standard for representing floating-point numbers. Two representation schemes are in common use: 32-bit single-precision and 64-bit double-precision.
In 32-bit single-precision floating-point representation:
- The most significant bit is the sign bit (S), with 0 for positive numbers and 1 for negative numbers.
- The following 8 bits represent the exponent (E).
- The remaining 23 bits represent the fraction (F).
Normalized Form
Let’s illustrate with an example. Suppose that the 32-bit pattern is 1 1000 0001 011 0000 0000 0000 0000 0000, with:
S = 1
E = 1000 0001
F = 011 0000 0000 0000 0000 0000
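The field split above can be reproduced with a few shifts and masks. The following is a minimal Python sketch; the helper name split_fields is hypothetical and chosen only for illustration.

```python
# Parse the example pattern exactly as written in the text (spaces removed).
PATTERN = "1 1000 0001 011 0000 0000 0000 0000 0000"
bits = int(PATTERN.replace(" ", ""), 2)

def split_fields(bits):
    """Split a 32-bit pattern into (S, E, F) as unsigned integers.

    Hypothetical helper for illustration, assuming the single-precision
    layout: 1 sign bit, 8 exponent bits, 23 fraction bits.
    """
    s = (bits >> 31) & 0x1       # most significant bit
    e = (bits >> 23) & 0xFF      # next 8 bits
    f = bits & 0x7FFFFF          # remaining 23 bits
    return s, e, f

s, e, f = split_fields(bits)
print(s, format(e, "08b"), format(f, "023b"))
```

Printing the fields in binary with fixed widths makes it easy to compare them against S, E and F as listed in the example.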
In the normalized form, the actual fraction is normalized with an implicit leading 1, in the form 1.F. In this example, the actual fraction is 1.011 0000 0000 0000 0000 0000 = 1 + 1×2^-2 + 1×2^-3 = 1.375D (the suffix D denotes a decimal value).
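The sum of negative powers of 2 can be checked mechanically. This is a small sketch, assuming the fraction bits are given as a text string; the function name actual_fraction is hypothetical.

```python
F_BITS = "011 0000 0000 0000 0000 0000".replace(" ", "")

def actual_fraction(f_bits):
    """Return 1.F for a normalized number (hypothetical helper).

    The i-th fraction bit (counting from 1) carries the weight 2^-i.
    """
    value = 1.0  # the implicit leading 1
    for i, bit in enumerate(f_bits, start=1):
        if bit == "1":
            value += 2.0 ** -i
    return value

print(actual_fraction(F_BITS))  # prints 1.375
```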
The sign bit represents the sign of the number, with S=0 for a positive and S=1 for a negative number. In this example, S=1, so this is a negative number, i.e., -1.375D.
In normalized form, the actual exponent is E-127 (so-called excess-127 or bias-127). This is because we need to represent both positive and negative exponents. With an 8-bit E ranging from 0 to 255, the excess-127 scheme can provide actual exponents from -127 to 128. (IEEE 754 reserves the extreme values E=0 and E=255 for special cases, so normalized numbers use actual exponents from -126 to 127.) In this example, E-127 = 129-127 = 2D.
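The bias arithmetic is short enough to verify directly. The sketch below assumes the 8 exponent bits are given as a text string; BIAS and actual_exponent are hypothetical names used only here.

```python
BIAS = 127  # excess-127, as used for the 8-bit single-precision exponent

def actual_exponent(e_bits):
    """Convert the stored exponent bits to the actual exponent (hypothetical helper)."""
    stored = int(e_bits.replace(" ", ""), 2)
    return stored - BIAS

print(actual_exponent("1000 0001"))   # prints 2
print(actual_exponent("0000 0000"))   # prints -127, lower end of the raw range
print(actual_exponent("1111 1111"))   # prints 128, upper end of the raw range
```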
Hence, the number represented is -1.375×2^2 = -5.5D.
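As a cross-check, the whole decoding can be compared against the machine's own interpretation of the same 4 bytes. This is a sketch only: decode_normalized is a hypothetical helper that handles just the normalized case described here, and the comparison relies on Python's standard struct module.

```python
import struct

def decode_normalized(pattern):
    """Decode a 32-bit pattern by hand (hypothetical helper).

    Assumes a normalized number, i.e. a stored exponent between 1 and 254.
    """
    bits = int(pattern.replace(" ", ""), 2)
    s = bits >> 31
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    fraction = 1 + f / 2**23              # implicit leading 1, giving 1.F
    return (-1) ** s * fraction * 2.0 ** (e - 127)

pattern = "1 1000 0001 011 0000 0000 0000 0000 0000"
manual = decode_normalized(pattern)

# Let the struct module interpret the same bits as a big-endian 32-bit float.
raw = int(pattern.replace(" ", ""), 2).to_bytes(4, "big")
native = struct.unpack(">f", raw)[0]

print(manual, native)  # prints -5.5 -5.5
```

Both values agree, which suggests the manual steps (sign, 1.F, E-127) match how the hardware format is interpreted for this pattern.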