An Introduction to Floating Point Arithmetic by Example

Pat Quillen

21 January 2010

Floating Point Arithmetic by Example – p.1/15

Example

What is the value of

1 − 3 ∗ (4/3 − 1)

according to MATLAB?

2.220446049250313e-016

Why? Essentially because 4/3 cannot be represented exactly by a binary number with finitely many terms.
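The same experiment can be reproduced outside MATLAB. The following is a minimal sketch in Python, assuming that Python floats are IEEE 754 doubles (true on mainstream platforms), the same format MATLAB uses by default. The variable names are illustrative only.

```python
from fractions import Fraction

# Assumption: Python floats are IEEE 754 doubles, like MATLAB's default type.
result = 1 - 3 * (4 / 3 - 1)
print(result)  # 2.220446049250313e-16

# Fraction(x) recovers the exact rational value stored in a float,
# so this compares the stored 4/3 against the true 4/3.
stored = Fraction(4 / 3)
exact = Fraction(4, 3)
print(stored == exact)  # False: the stored value is not exactly 4/3
print(exact - stored)   # the exact (positive) representation error
```

The last line shows that the stored value falls short of 4/3 by a tiny positive amount, which is what the final subtraction exposes.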

Example (continued)

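Notice that

$$\frac{4}{3} = \frac{1}{\frac{3}{4}} = \frac{1}{1 - \frac{1}{4}} = \sum_{k=0}^{\infty} \left(\frac{1}{4}\right)^{k}$$

That is,

$$\frac{4}{3} = 1 + \frac{1}{2^2} + \frac{1}{2^4} + \frac{1}{2^6} + \cdots$$

or, in binary,

$$\frac{4}{3} = 1.010101010101\cdots$$

which, again, is not exactly representable by finitely many terms.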
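The repeating pattern can be checked mechanically. The sketch below uses exact rational arithmetic to generate binary digits by repeated doubling. The helper name `binary_expansion` is hypothetical and handles non-negative inputs only.

```python
from fractions import Fraction


def binary_expansion(x: Fraction, digits: int) -> str:
    """Return the first `digits` binary places of a non-negative rational."""
    whole = x.numerator // x.denominator
    frac = x - whole
    bits = []
    for _ in range(digits):
        frac *= 2            # shift the next binary digit into the units place
        bit = int(frac >= 1)
        bits.append(str(bit))
        frac -= bit          # drop the digit just emitted
    return f"{whole:b}." + "".join(bits)


print(binary_expansion(Fraction(4, 3), 12))  # 1.010101010101
print(binary_expansion(Fraction(3, 4), 12))  # 0.110000000000 (terminates)
```

Because the remainder for 4/3 cycles through the same values forever, the digits never terminate, whereas 3/4 ends after two places.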

Floating Point Representation

In binary computers, most floating point numbers are represented as

$$(-1)^{s} \, 2^{e} \, (1 + f)$$

where

s is represented by one bit (called the sign bit).

e is the exponent.

f is the mantissa.

For double precision numbers, e is an eleven bit number and f is a fifty-two bit number.
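As an illustration of the 1 + 11 + 52 bit layout, the following sketch splits the 64 bits of a double into its three raw fields. It assumes Python floats are IEEE 754 doubles. The helper `split_fields` is a hypothetical name, and the exponent field is returned exactly as stored, without further interpretation.

```python
import struct


def split_fields(x: float):
    """Return (sign bit, 11-bit exponent field, 52-bit mantissa field)."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    s = bits >> 63                          # top bit
    exponent_field = (bits >> 52) & 0x7FF   # next 11 bits
    mantissa_field = bits & ((1 << 52) - 1) # low 52 bits
    return s, exponent_field, mantissa_field


for x in (1.0, -2.0, 0.1):
    s, e_field, f_field = split_fields(x)
    print(x, s, f"{e_field:011b}", f"{f_field:052b}")
```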

Floating Point Exponent

As the exponent is stored in 11 bits, its stored value $e_b$ can range from 0 to $2^{11} - 1 = 2047$.

Negative exponents are represented by biasing e when stored.

The double precision bias is $2^{10} - 1 = 1023$. Thus, $-1023 \le e \le 1024$.

The extreme values e = −1023 (stored as $e_b = 0$) and e = 1024 (stored as $e_b = 2047$) are special: the first encodes zero and subnormal numbers, the second Inf and NaN. So $-1022 \le e \le 1023$ is the valid range of the exponent.
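The biasing can be observed directly. This sketch, which assumes Python floats are IEEE 754 doubles, reads the stored exponent of a few values and subtracts the bias. `stored_exponent` and `BIAS` are illustrative names.

```python
import math
import struct

BIAS = 2 ** 10 - 1  # 1023


def stored_exponent(x: float) -> int:
    """Return the 11-bit exponent field exactly as stored."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return (bits >> 52) & 0x7FF


# Smallest and largest normal exponents, then the two reserved patterns.
for x in (1.0, 2.0 ** -1022, 2.0 ** 1023, 0.0, math.inf, math.nan):
    e_b = stored_exponent(x)
    print(x, e_b, e_b - BIAS)
```

The normal values use stored exponents 1 through 2046, while 0 and 2047 appear only for zero and the non-finite values.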

Floating Point Mantissa

f limits the precision of the floating point number.

0 ≤ f < 1

The format $2^{e}(1 + f)$ provides an implicitly stored 1, so doubles actually have 53 bits of precision.

$2^{52} f$ is an integer ⇒ gaps between successive doubles.

For example, all integers up to $2^{53}$ are exactly representable as floating point numbers, but $2^{53} + 1$ is not.
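The gaps are easy to see numerically. The sketch below assumes Python 3.9 or later (for `math.ulp`) and IEEE 754 doubles.

```python
import math

big = 2.0 ** 53

print(big + 1 == big)  # True: 2^53 + 1 is not a double and rounds back to 2^53
print(big - 1 == big)  # False: 2^53 - 1 is exactly representable
print(math.ulp(1.0))   # 2.220446049250313e-16, the gap just above 1
print(math.ulp(big))   # 2.0, the gap just above 2^53
```

The spacing doubles with every power of two, so above $2^{53}$ only even integers are representable.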

Examples

The number 1 is represented as

$$(-1)^{0} \, 2^{0} \, (1 + 0).$$

That is, s = 0, e = 0, f = 0. Adding the bias (1023), the biased value of e is $e_b = 1023$.

You can use format hex in MATLAB to see the bit pattern of the floating point number in hexadecimal. The first three hex digits (12 bits) represent the sign bit and the biased exponent, and the remaining 13 hex digits (52 bits) represent the mantissa.
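For readers without MATLAB, a comparable hexadecimal view can be sketched in Python. The helper name `format_hex` is hypothetical (chosen to echo the MATLAB command) and assumes IEEE 754 doubles.

```python
import struct


def format_hex(x: float) -> str:
    """Return the 16 hex digits of a double, most significant byte first."""
    return struct.pack(">d", x).hex()


pattern = format_hex(1.0)
print(pattern)      # 3ff0000000000000
print(pattern[:3])  # 3ff: sign bit and biased exponent
print(pattern[3:])  # 0000000000000: the 52-bit mantissa
```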

Examples

In the case of the number 1, s = 0 and $e_b$ = 01111111111 in binary, so the first three hex digits are 0011 1111 1111 = 3ff. Thus 1 is represented by

3ff0000000000000

For 4/3, f = 0.01010101···0101 in binary, or 55···5 in hex. As with 1, 4/3 has e = 0, and so it has representation

3ff5555555555555

which is just slightly smaller than 4/3.

The real number 0.1 has e = −4 and f = 0.10011001···10011010 in binary, and thus has representation

3fb999999999999a

which is just slightly larger than 0.1.
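Both the bit patterns and the direction of the rounding can be checked with exact rational arithmetic. This is a sketch assuming IEEE 754 doubles. `format_hex` is a hypothetical helper, not a library function.

```python
import struct
from fractions import Fraction


def format_hex(x: float) -> str:
    """Return the 16 hex digits of a double, most significant byte first."""
    return struct.pack(">d", x).hex()


# Pair each float with the exact rational it is meant to approximate.
for value, exact in ((4 / 3, Fraction(4, 3)), (0.1, Fraction(1, 10))):
    stored = Fraction(value)  # exact value held by the double
    direction = "below" if stored < exact else "above"
    print(format_hex(value), direction, float(abs(stored - exact)))
```

The trailing a in the pattern for 0.1 is the visible effect of rounding up in the last place, while the pattern for 4/3 is simply truncated.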

Round-off

Since fl(4/3) ≠ 4/3 (where fl(x) stands for “the floating point representation of x”), we see the behavior

1 − 3 ∗ (4/3 − 1) ≠ 0.

All of the operations except the division are performed without error, and the special value

$$\epsilon = 2^{-52}$$

is the result.

$\epsilon$ is referred to as machine epsilon (some authors call it the unit roundoff, although that name is often reserved for $\epsilon/2$), and it is the distance between 1 and the next larger floating point number.
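The claim that only the division rounds can be verified step by step with exact rationals. The sketch assumes IEEE 754 doubles and Python 3.9 or later (for `math.nextafter`). The variable names are illustrative.

```python
import math
import sys
from fractions import Fraction

x = 4 / 3   # the only operation that rounds
d = x - 1
t = 3 * d
r = 1 - t

# Each check compares a float result with the same step done exactly.
print(Fraction(x) == Fraction(4, 3))     # False: the division rounded
print(Fraction(d) == Fraction(x) - 1)    # True: subtraction was exact
print(Fraction(t) == 3 * Fraction(d))    # True: multiplication was exact
print(Fraction(r) == 1 - Fraction(t))    # True: final subtraction was exact

eps = sys.float_info.epsilon
print(r == eps)                          # True
print(math.nextafter(1.0, 2.0) - 1.0 == eps)  # True: gap from 1 to the next double
```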

Example

A very common example of propagation of round-off comes in the form of

0.1 + 0.1 + 0.1

Specifically, is the above expression equal to 0.3?

No! As a matter of fact, MATLAB will tell you that 0.3 is represented by

3fd3333333333333

while 0.1 + 0.1 + 0.1 is represented by

3fd3333333333334

The difference in the last place is due to accumulation of the difference between 0.1 and fl(0.1).
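The comparison can be reproduced as follows. This is a sketch assuming IEEE 754 doubles. `format_hex` is a hypothetical helper, and the tolerance-based comparison with `math.isclose` is a common workaround rather than something the slides prescribe.

```python
import math
import struct


def format_hex(x: float) -> str:
    """Return the 16 hex digits of a double, most significant byte first."""
    return struct.pack(">d", x).hex()


total = 0.1 + 0.1 + 0.1

print(total == 0.3)              # False
print(format_hex(0.3))           # 3fd3333333333333
print(format_hex(total))         # 3fd3333333333334
print(total - 0.3)               # 5.551115123125783e-17, one unit in the last place
print(math.isclose(total, 0.3))  # True: equal within a relative tolerance
```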

Deadly Consequences

Numerical Disasters

1991: Patriot Missile misses Scud!

1996: Ariane Rocket explodes!

Swamping

Due to finiteness of precision, floating point addition can suffer swamping. Suppose we have two floating point numbers $a = 10^{5}$ and $b = 10^{-12}$. The quantity c = a + b is equal to a, since a and b differ by many orders of magnitude: b is less than half the gap between a and its neighboring doubles (about $1.5 \times 10^{-11}$).

To rectify the effects of swamping, one may compute in increasing order of magnitude. For example, try these in MATLAB:

eps/2 + 1 − eps/2

eps/2 − eps/2 + 1

Note: It is frequently infeasible to do this!
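The two orderings can be tried in Python as well. This sketch assumes IEEE 754 doubles. `sys.float_info.epsilon` plays the role of MATLAB's eps. The call to `math.fsum` is an addition not mentioned in the slides: it returns a correctly rounded sum regardless of ordering.

```python
import math
import sys

a, b = 1e5, 1e-12
print(a + b == a)  # True: b is swamped by a

eps = sys.float_info.epsilon
large_first = eps / 2 + 1 - eps / 2  # eps/2 is lost when added to 1
small_first = eps / 2 - eps / 2 + 1  # small terms cancel before meeting 1

print(large_first)  # 0.9999999999999999
print(small_first)  # 1.0

# fsum tracks the lost low-order bits, so ordering no longer matters.
print(math.fsum([eps / 2, 1.0, -eps / 2]))  # 1.0
```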

Cancellation

A phenomenon not dissimilar from swamping is cancellation, which occurs when a number is subtracted from another number of roughly the same magnitude.

For example, for values of x very near 0, the expression

$$\sqrt{x + 1} - 1$$

suffers cancellation, as 1 swamps x in the computation of x + 1, and the subsequent subtraction results in 0.

Cancellation

To get around the effects of cancellation, one may rewrite the computation in an equivalent form that avoids the cancellation altogether. For example, computing with

$$\sqrt{x + 1} - 1 = \frac{x}{\sqrt{x + 1} + 1}$$

avoids the cancellation for values of x near zero. Now, barring underflow, the only value of x that results in a zero output is 0 itself.

Note: Not all cancellation can be avoided, and not all cancellation is bad!
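A side-by-side comparison of the two forms makes the difference concrete. This is a minimal sketch assuming IEEE 754 doubles. The function names `naive` and `stable` are illustrative.

```python
import math


def naive(x: float) -> float:
    """sqrt(x + 1) - 1 computed directly; cancels for x near 0."""
    return math.sqrt(x + 1) - 1


def stable(x: float) -> float:
    """Algebraically equivalent form with no subtraction of close values."""
    return x / (math.sqrt(x + 1) + 1)


for x in (1e-8, 1e-12, 1e-17):
    print(x, naive(x), stable(x))
```

For x = 1e-17 the naive form returns exactly 0, because x + 1 already rounds to 1, while the rewritten form returns x/2, which matches the true value to full precision.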

Resources

What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg. Available here.

Numerical Analysis, 8th ed. by Richard L. Burden and J. Douglas Faires.

Numerical Computing with MATLAB by Cleve Moler.

Accuracy and Stability of Numerical Algorithms by Nicholas J. Higham.

Technical Note regarding Floating Point Arithmetic. Available here.
