The IEEE standard specifies how computations are to be performed on floating-point values for the following operations:
These operations are specified in a machine-independent manner, giving CPU designers the flexibility to implement them as efficiently as possible while remaining compliant with the standard. During operations, the IEEE standard requires the maintenance of two guard digits and a sticky bit for intermediate values. The guard digits extend the precision of the intermediate value by two bits, and the sticky bit indicates whether any of the bits beyond the second guard digit is nonzero.
In [link], we have five bits of normal precision, two guard digits, and a sticky bit. Guard bits simply operate as normal bits, as if the significand were two bits wider (for example, 25 bits instead of the 23 stored in single precision). Guard bits participate in rounding as the extended operands are added. The sticky bit is set to 1 if any of the bits beyond the guard bits is nonzero in either operand. If you are somewhat hardware-inclined and you think about it for a moment, you will soon come up with a way to properly maintain the sticky bit without ever computing the full “infinite precision sum.” You just have to keep track of whether a nonzero bit is lost as things get shifted around. Once the extended sum is computed, it is rounded so that the value stored in memory is the closest possible value to the extended sum, including the guard digits. [link] shows all eight possible values of the two guard digits and the sticky bit, along with the resulting stored value and an explanation of why.
Extended Sum | Stored Value | Why |
1.0100 000 | 1.0100 | Truncated based on guard digits |
1.0100 001 | 1.0100 | Truncated based on guard digits |
1.0100 010 | 1.0100 | Rounded down based on guard digits |
1.0100 011 | 1.0100 | Rounded down based on guard digits |
1.0100 100 | 1.0100 | Rounded down based on sticky bit |
1.0100 101 | 1.0101 | Rounded up based on sticky bit |
1.0100 110 | 1.0101 | Rounded up based on guard digits |
1.0100 111 | 1.0101 | Rounded up based on guard digits |
The first priority is to check the guard digits. Never forget that the sticky bit is just a hint, not a real digit. So if we can make a decision without looking at the sticky bit, that is good. The only decision we are making is to round the last storable bit up or down. When that stored value is retrieved for the next computation, its guard digits are set to zeros. It is sometimes helpful to think of the stored value as having the guard digits, but set to zero.
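The decision procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `round_extended` and the string format (stored bits, a space, then two guard bits and the sticky bit) are made up for this example, the tie case uses the default IEEE rule of rounding to the nearest even value, and a carry out of the top bit is not renormalized.

```python
def round_extended(bits):
    """Round an extended sum such as '1.0100 101' to its stored value.

    The text before the space holds the storable bits; the three bits
    after it are two guard digits followed by the sticky bit.
    (Hypothetical helper for illustration only.)
    """
    stored, extra = bits.split()
    guard, sticky = extra[:2], extra[2]
    significand = int(stored.replace(".", ""), 2)  # storable bits as an integer

    if guard in ("00", "01"):
        round_up = False   # below halfway: the guard digits decide
    elif guard == "11":
        round_up = True    # above halfway: the guard digits decide
    else:
        # guard == "10": exactly halfway unless the sticky bit is set.
        # A true tie goes to the even neighbor (assumed default rounding mode).
        round_up = sticky == "1" or (significand & 1) == 1

    if round_up:
        significand += 1   # carry-out renormalization is omitted in this sketch

    digits = format(significand, "0{}b".format(len(stored) - 1))
    return digits[0] + "." + digits[1:]


for extra in ("000", "001", "010", "011", "100", "101", "110", "111"):
    print("1.0100", extra, "->", round_extended("1.0100 " + extra))
```

Running the loop reproduces the stored values in the table: the sticky bit is consulted only when the guard digits read 10.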
Together, the two guard digits and the sticky bit in the IEEE format ensure that operations yield the same rounding as if the intermediate result had been computed using unlimited precision and then rounded to fit within the precision of the final computed value.
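To make that claim concrete, here is a minimal sketch that compares the two approaches on small integers. It is not how any particular CPU does it; the function names are hypothetical, values are plain Python integers standing in for significands, and ties are assumed to round to even. `shift_right_sticky` shows one way to maintain the sticky bit while bits are shifted out, `round_exact` looks at every discarded bit, and `round_guarded` looks only at two guard bits and the sticky bit.

```python
def shift_right_sticky(value, shift):
    """Shift right by `shift` bits; sticky is 1 if any dropped bit was 1."""
    dropped = value & ((1 << shift) - 1)
    return value >> shift, int(dropped != 0)


def round_exact(value, shift):
    """Round value / 2**shift to nearest, ties to even, using every bit."""
    q, r = divmod(value, 1 << shift)
    half = 1 << (shift - 1)
    if r > half or (r == half and q & 1):
        q += 1
    return q


def round_guarded(value, shift):
    """Same rounding, but using only two guard bits and a sticky bit.

    Assumes shift >= 2 so that there is room for both guard bits.
    """
    kept, sticky = shift_right_sticky(value, shift - 2)  # keep two guard bits
    q, guard = kept >> 2, kept & 0b11
    if guard == 0b11 or (guard == 0b10 and (sticky or q & 1)):
        q += 1
    return q


# Exhaustively compare both methods over a small range of inputs.
agree = all(
    round_exact(v, s) == round_guarded(v, s)
    for s in range(2, 9)
    for v in range(1 << 12)
)
print(agree)
```

The comparison prints `True` for this range: no matter how many bits are discarded, the two guard bits and the sticky bit carry enough information to round exactly as the full-precision value would.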
At this point, you might be asking, “Why do I care about these minutiae?” At some level, unless you are a hardware designer, you don’t care. But when you examine details like this, you can be assured of one thing: the designers of the IEEE floating-point standard looked at the details very carefully. The goal was to produce the most accurate possible floating-point standard within the constraints of a fixed-length 32- or 64-bit format. Because they did such a good job, it’s one less thing you have to worry about. Besides, this stuff makes great exam questions.