Number Representation: 32-bit Single-Precision Floating-Point Numbers

Manostaxx

The text that follows is owned by the site referred to above.

Only a small part of the article is reproduced here; for more, please follow the link.

SOURCE: https://www3.ntu.edu.sg/home/ehchua/programming/java/datarepresentation.html

In computers, floating-point numbers are represented in scientific notation as a fraction (F) and an exponent (E) with a radix of 2, in the form F×2^E. Both E and F can be positive or negative. Modern computers adopt the IEEE 754 standard for representing floating-point numbers. Its two commonly used representation schemes are 32-bit single-precision and 64-bit double-precision.

In 32-bit single-precision floating-point representation:

  • The most significant bit is the sign bit (S), with 0 for positive numbers and 1 for negative numbers.
  • The following 8 bits represent the exponent (E).
  • The remaining 23 bits represent the fraction (F).
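
The field layout above can be sketched in Python with shifts and masks. This is a minimal illustration, not part of the original article: the helper names split_fields and float_to_bits are hypothetical, and the standard struct module is used only to obtain the bit pattern of a value.

```python
import struct

def float_to_bits(value):
    """Return the single-precision bit pattern of value as a 32-bit integer."""
    # Pack as a big-endian 32-bit float, then reinterpret the bytes as an unsigned int.
    return struct.unpack(">I", struct.pack(">f", value))[0]

def split_fields(bits):
    """Split a 32-bit pattern into its (S, E, F) fields (hypothetical helper)."""
    sign = (bits >> 31) & 0x1        # most significant bit
    exponent = (bits >> 23) & 0xFF   # following 8 bits
    fraction = bits & 0x7FFFFF       # remaining 23 bits
    return sign, exponent, fraction

s, e, f = split_fields(float_to_bits(1.0))
print(s, format(e, "08b"), format(f, "023b"))
# prints: 0 01111111 00000000000000000000000
```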

Normalized Form

Let’s illustrate with an example. Suppose that the 32-bit pattern is 1 1000 0001 011 0000 0000 0000 0000 0000, with:

  • S = 1
  • E = 1000 0001
  • F = 011 0000 0000 0000 0000 0000

In the normalized form, the actual fraction is normalized with an implicit leading 1 in the form of 1.F. In this example, the actual fraction is 1.011 0000 0000 0000 0000 0000 = 1 + 1×2^-2 + 1×2^-3 = 1.375D.
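
The sum above can be checked with a short Python sketch that adds 2^-k for every set bit of F, starting from the implicit leading 1. The function name significand is an assumption made for this illustration.

```python
def significand(fraction_bits):
    """Evaluate 1.F for a string of fraction bits (spaces are ignored)."""
    bits = fraction_bits.replace(" ", "")
    value = 1.0  # implicit leading 1 of the normalized form
    for position, bit in enumerate(bits, start=1):
        if bit == "1":
            value += 2.0 ** -position  # the k-th fraction bit weighs 2^-k
    return value

print(significand("011 0000 0000 0000 0000 0000"))  # prints 1.375
```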

The sign bit represents the sign of the number, with S=0 for positive and S=1 for negative numbers. In this example, S=1, so the number is negative, i.e., -1.375D.

In normalized form, the actual exponent is E-127 (so-called excess-127 or bias-127). This is because we need to represent both positive and negative exponents. With an 8-bit E ranging from 0 to 255, the excess-127 scheme could provide actual exponents of -127 to 128. However, E=0 and E=255 are reserved for special values, so normalized numbers use E from 1 to 254, giving actual exponents of -126 to 127. In this example, E-127=129-127=2D.
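
As a small sketch of the bias arithmetic, the following Python snippet subtracts 127 from the stored exponent. The names BIAS and actual_exponent are hypothetical. The last two lines assume the IEEE 754 convention that the stored values 0 and 255 are reserved, so 1 and 254 are the smallest and largest stored exponents of a normalized number.

```python
BIAS = 127  # excess-127 bias for single precision

def actual_exponent(exponent_bits):
    """Convert the 8 stored exponent bits (as a string) to the actual exponent."""
    stored = int(exponent_bits.replace(" ", ""), 2)
    return stored - BIAS

print(actual_exponent("1000 0001"))  # prints 2
print(actual_exponent("0000 0001"))  # prints -126
print(actual_exponent("1111 1110"))  # prints 127
```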

Hence, the number represented is -1.375×2^2=-5.5D.
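
The whole calculation can be cross-checked in Python. The sketch below is an illustration under stated assumptions: decode_normalized is a hypothetical helper that applies (-1)^S × 1.F × 2^(E-127) and rejects the reserved exponent values, and the result is compared with what the standard struct module returns for the same four bytes.

```python
import struct

def decode_normalized(pattern):
    """Decode a normalized single-precision bit string using the S, E, F formula."""
    bits = int(pattern.replace(" ", ""), 2)
    sign = (bits >> 31) & 0x1
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent in (0, 255):
        # These stored exponents are not normalized numbers; this sketch skips them.
        raise ValueError("not a normalized number")
    significand = 1 + fraction / 2**23  # 1.F
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)

pattern = "1 1000 0001 011 0000 0000 0000 0000 0000"
value = decode_normalized(pattern)
print(value)  # prints -5.5

# Reinterpret the same 32 bits as a float and compare.
raw = int(pattern.replace(" ", ""), 2).to_bytes(4, "big")
reference = struct.unpack(">f", raw)[0]
print(value == reference)  # prints True
```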
