
Test Driven Development of Scientific Models

Tom Clune

Software Systems Support Office, Earth Science Division

NASA Goddard Space Flight Center

June 5, 2012

Tom Clune (SSSO) TDD June 5, 2012 1 / 38

Outline

1 Motivations

2 Testing

3 Testing Frameworks

4 Test-Driven Development

5 What about scientific/technical software?

Tom Clune (SSSO) TDD - Motivations June 5, 2012 2 / 38

The development cycle and productivity

Extend | Fix | Port

Compiles? → Executes? → Looks ok? → Correct?

Conventional software verification for modeling is slow.


Some observations

Risk grows with the magnitude of an implementation step

The magnitude of an implementation step grows with the cost of verification/validation

Conclusion: Optimize productivity by reducing the cost of verification!


Trusting the Science

Climate modeling has grown to be of extreme socioeconomic importance:

  - Adaptation/mitigation strategies easily exceed $100 trillion
  - Implications are politically sensitive/divisive
  - Scientific integrity is crucial

Software management and testing have not kept pace:

  - Strong validation against data, but ...
  - Validation is a blunt tool for isolating issues in coupled systems
  - Validation cannot detect certain types of software defects:
    - Those that are only exercised in rare/future regimes
    - Those that change results below the detection threshold


Testing


Test Harness - work in safety

Collection of tests that constrain system

Detects unintended changes

Localizes defects

Improves developer confidence

Decreases risk from change


Do you write legacy code?

“The main thing that distinguishes legacy code from non-legacy code is tests, or rather a lack of tests.”

Michael Feathers, Working Effectively with Legacy Code

Lack of tests leads to fear of introducing subtle bugs and/or changing things inadvertently.

Programming on a tightrope

This is also a barrier to involving pure software engineers in the development of our models.


Excuses, excuses ...

It takes too much time to write tests

It is too difficult to maintain tests

It takes too long to run the tests

It is not my job

“Correct” behavior is unknown

James Sugrue, http://java.dzone.com/articles/unit-test-excuses

Numeric/scientific code cannot be tested, because ...


Just what is a test anyway?

Tests can exist in many forms

Conditional termination:

    IF (PA(I,J)+PTOP .GT. 1200.) &
         call stop_model('ADVECM: Pressure diagnostic error', 11)

Diagnostic print statement:

    print *, 'loss of mass = ', deltaMass

Visualization of output: [Figure: MATLAB plots in three panels, Temp1, Temp2, and Difference]
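A diagnostic print only becomes a test once someone reads its output and judges it. The sketch below encodes that judgment so a program can make it. It assumes a hypothetical array of masses before and after a step and a tolerance chosen by the developer. The module and function names are illustrative and are not taken from any model.

```fortran
module mass_check_mod
  implicit none
  private
  public :: relative_mass_change, mass_is_conserved

  integer, parameter, public :: dp = kind(1.0d0)

contains

  ! Hypothetical helper: relative change in total mass between two states.
  ! Assumes the total mass before the step is nonzero.
  pure function relative_mass_change(mass_before, mass_after) result(rel)
    real(dp), intent(in) :: mass_before(:), mass_after(:)
    real(dp) :: rel
    rel = abs(sum(mass_after) - sum(mass_before)) / abs(sum(mass_before))
  end function relative_mass_change

  ! Hypothetical replacement for "print *, 'loss of mass = ', deltaMass":
  ! returns pass/fail instead of a number for a human to inspect.
  pure function mass_is_conserved(mass_before, mass_after, tol) result(ok)
    real(dp), intent(in) :: mass_before(:), mass_after(:)
    real(dp), intent(in) :: tol
    logical :: ok
    ok = relative_mass_change(mass_before, mass_after) <= tol
  end function mass_is_conserved

end module mass_check_mod
```

The tolerance becomes an explicit, reviewable statement of what "conserved" means. It replaces a number that someone has to eyeball in a log.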


Analogy with Scientific Method?

Scientists ought to like TDD:

Objective reality → Requirements

Constraints: theory and data → Constraints: existing tests

Formulate hypothesis → Select a feature

Design experiment → Write a test

Run experiment → Run tests

Refine hypothesis → Refine implementation

http://agile2003.agilealliance.org/files/P6Paper.pdf


Properties of good tests

Isolating
  - Test failure indicates location in source code

Orthogonal
  - Each defect results in failure of a small number of tests

Complete
  - Each bit of functionality covered by at least one test

Independent
  - No side effects
  - Test order does not matter
  - Corollary: cannot terminate execution

Frugal
  - Run quickly
  - Small memory, etc.

Automated and repeatable

Clear intent
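The "cannot terminate execution" corollary bites on code like the ADVECM check above. A test cannot exercise the failure path of a routine that calls stop_model, because doing so halts the entire test run. One common workaround is to have the routine return a status code and leave the decision to stop with the caller. The sketch below shows this with hypothetical names (check_pressure, PRESSURE_OK, PRESSURE_TOO_HIGH). The 1200 threshold and the value 11 mirror the original example.

```fortran
module pressure_check_mod
  implicit none
  private
  public :: check_pressure, PRESSURE_OK, PRESSURE_TOO_HIGH

  integer, parameter :: PRESSURE_OK = 0
  integer, parameter :: PRESSURE_TOO_HIGH = 11  ! mirrors the stop_model code
  real, parameter :: P_MAX = 1200.              ! threshold from the ADVECM check

contains

  ! Hypothetical routine: reports, rather than acts on, out-of-range pressure.
  ! Model code may still call stop_model when rc /= PRESSURE_OK;
  ! a unit test can instead assert on rc and keep running.
  subroutine check_pressure(pa, ptop, rc)
    real, intent(in) :: pa(:,:)
    real, intent(in) :: ptop
    integer, intent(out) :: rc
    if (any(pa + ptop > P_MAX)) then
      rc = PRESSURE_TOO_HIGH
    else
      rc = PRESSURE_OK
    end if
  end subroutine check_pressure

end module pressure_check_mod
```

With this structure, a test can cover both the passing and the failing branch. The run continues after the failing case, so the other tests in the suite stay independent of it.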


Anatomy of a Software Test Procedure

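    subroutine testTrajectory()   ! s = (1/2) a t**2
       real :: a, t, s
       a = 2.; t = 3.              ! set up inputs
       s = trajectory(a, t)        ! invoke code under test
       call assertEqual(9., s)     ! verify: (1/2)(2)(3**2) = 9
       ! teardown: no op
    end subroutine testTrajectory

Or, more compactly, combining invocation and verification:

    call assertEqual(9., trajectory(2., 3.))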
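Below is a self-contained version of the test procedure above. It assumes that trajectory implements s = (1/2) a t². The assertEqual here is a small hand-written, hypothetical stand-in for a framework assertion, and real frameworks supply their own. This one records a failure instead of stopping, and it accepts an optional tolerance.

```fortran
! Code under test: distance from rest under constant acceleration.
module trajectory_mod
  implicit none
contains
  pure real function trajectory(a, t)
    real, intent(in) :: a, t
    trajectory = 0.5 * a * t**2
  end function trajectory
end module trajectory_mod

! Hypothetical minimal assertion: counts failures, never stops the program.
module tiny_assert_mod
  implicit none
  integer :: num_failures = 0
contains
  subroutine assertEqual(expected, found, tolerance)
    real, intent(in) :: expected, found
    real, intent(in), optional :: tolerance
    real :: tol
    tol = 0.                           ! exact comparison unless told otherwise
    if (present(tolerance)) tol = tolerance
    if (abs(found - expected) > tol) then
      num_failures = num_failures + 1
      print *, 'assertEqual failed: expected', expected, 'found', found
    end if
  end subroutine assertEqual
end module tiny_assert_mod

! The four phases of a test procedure: setup, invoke, verify, teardown.
module trajectory_tests_mod
  use trajectory_mod
  use tiny_assert_mod
  implicit none
contains
  subroutine testTrajectory()
    real :: a, t, s
    a = 2.; t = 3.                     ! setup
    s = trajectory(a, t)               ! invoke
    call assertEqual(9., s)            ! verify: (1/2)(2)(3**2) = 9
    ! teardown: nothing to release
  end subroutine testTrajectory
end module trajectory_tests_mod
```

An exact comparison is safe in this case because 0.5 × 2 × 9 is exactly representable. For most numerical results, pass a tolerance.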


Testing Frameworks

Provide infrastructure to radically simplify:
  - Creating test routines (test cases)
  - Running collections of tests (test suites)
  - Summarizing results

Key feature is a collection of assert methods
  - Used to express expected results

    call assertEqual(120, factorial(5))

Generally specific to programming language (xUnit):
  - Java (JUnit)
  - Python (pyUnit)
  - C++ (cxxUnit, cppUnit)
  - Fortran (FRUIT, FUNIT, pFUnit)
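The three jobs listed above can be sketched in a few dozen lines: creating test cases, running them as a suite, and summarizing the results. This is a deliberately tiny, hypothetical stand-in, not the API of JUnit, pFUnit, or any other framework listed. It consists of an integer assertEqual that flags the current test instead of stopping, a run_test driver, and a summarize function.

```fortran
! Code under test. Default integers overflow past 12!.
module factorial_mod
  implicit none
contains
  pure integer function factorial(n)
    integer, intent(in) :: n
    integer :: i
    factorial = 1
    do i = 2, n
      factorial = factorial * i
    end do
  end function factorial
end module factorial_mod

! Hypothetical mini-framework: assertion, test runner, summary.
module mini_suite_mod
  implicit none
  private
  public :: test_proc, assertEqual, run_test, summarize

  abstract interface
    subroutine test_proc()
    end subroutine test_proc
  end interface

  integer :: tests_run = 0, tests_failed = 0
  logical :: current_failed = .false.

contains

  ! Marks the current test as failed; never stops the program.
  subroutine assertEqual(expected, found)
    integer, intent(in) :: expected, found
    if (expected /= found) then
      current_failed = .true.
      print '(a,i0,a,i0)', '  expected ', expected, ' but found ', found
    end if
  end subroutine assertEqual

  ! Runs one test case; counts it as failed if any assertion failed.
  subroutine run_test(name, test)
    character(*), intent(in) :: name
    procedure(test_proc) :: test
    current_failed = .false.
    call test()
    tests_run = tests_run + 1
    if (current_failed) then
      tests_failed = tests_failed + 1
      print '(2a)', 'FAILED: ', name
    end if
  end subroutine run_test

  ! Prints a one-line summary and returns the number of failed tests.
  integer function summarize()
    print '(i0,a,i0,a)', tests_run, ' tests run, ', tests_failed, ' failed'
    summarize = tests_failed
  end function summarize

end module mini_suite_mod

! Test cases: each one is a subroutine with no arguments.
module factorial_tests_mod
  use factorial_mod
  use mini_suite_mod
  implicit none
contains
  subroutine test_factorial_5()
    call assertEqual(120, factorial(5))
  end subroutine test_factorial_5

  subroutine test_factorial_0()
    call assertEqual(1, factorial(0))
  end subroutine test_factorial_0
end module factorial_tests_mod
```

A driver program then needs only three things: `use factorial_tests_mod`, one `call run_test(...)` per test case, and a check on the value returned by `summarize()`.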


GUI - JUnit in Eclipse


(Somewhat) New Paradigm: TDD

Old paradigm:

Tests written by separate team (black box testing)

Tests written after implementation

Consequences:

Testing schedule compressed for release

Defects detected late in development ($$)

New paradigm:

Developers write the tests (white box testing)

Tests written before production code

Enabled by emergence of strong unit testing frameworks


The TDD cycle
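The TDD cycle is commonly summarized as red, green, refactor. First, write a test that fails. Next, write the simplest code that makes it pass. Finally, clean up the code while the tests guard its behavior. The sketch below shows the end state of one pass for a hypothetical linear-interpolation routine, lerp. The routine, the tests, and the step labels in the comments are illustrative only.

```fortran
module interp_mod
  implicit none
contains
  ! Step 2 (green): simplest code that satisfies the tests below.
  ! Step 3 (refactor): weight computed once and named.
  ! Assumes x1 /= x0.
  pure real function lerp(x0, y0, x1, y1, x)
    real, intent(in) :: x0, y0, x1, y1, x
    real :: w
    w = (x - x0) / (x1 - x0)
    lerp = (1. - w) * y0 + w * y1
  end function lerp
end module interp_mod

module interp_tests_mod
  use interp_mod
  implicit none
contains
  ! Step 1 (red): written before lerp existed, so at first it could not compile.
  logical function test_midpoint()
    test_midpoint = abs(lerp(0., 10., 2., 20., 1.) - 15.) <= 1.e-6
  end function test_midpoint

  ! Added on a later pass: endpoints must be reproduced exactly.
  logical function test_endpoints()
    test_endpoints = lerp(0., 10., 2., 20., 0.) == 10. .and. &
                     lerp(0., 10., 2., 20., 2.) == 20.
  end function test_endpoints
end module interp_tests_mod
```

The form (1 - w)*y0 + w*y1 returns y0 and y1 exactly at w = 0 and w = 1. The algebraically equivalent y0 + w*(y1 - y0) is not guaranteed to return y1 exactly in floating point, for example when |y0| is much larger than |y1|. That is why an exact endpoint comparison is reasonable for this form.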


Benefits of TDD

High reliability

Excellent test coverage

Always “ready-to-ship”

Tests act as maintainable documentation
  - Test shows a real use-case scenario
  - Test is maintained through the TDD process

Less time spent debugging

Reduced stress / improved confidence

Productivity

Predictable schedule

Porting

Quality implementation?


Unique challenges of numerical software

Difficult to estimate error
  - Roundoff
  - Truncation

Insufficient analytic cases

Irreducible complexity
  - Test would require the same redundant logic
  - Appeals to vanity?

Stability/Nonlinearity
  - Problems that occur only after long integrations
  - More generally, emergent properties of coupled systems

General mitigation strategy:

Fine-grained implementation (each routine does just one thing)

Test layers in isolation
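Truncation error is often hard to bound with a fixed tolerance. One alternative is to test how the error scales with resolution instead. The sketch below applies a second-order trapezoid rule to the integral of sin x from 0 to π, whose exact value is 2. Halving the step should cut the error by a factor close to 4, so the test asserts on that ratio rather than on a hand-picked absolute error. The routine names and the [3.9, 4.1] acceptance band are illustrative assumptions.

```fortran
module trapezoid_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Composite trapezoid rule for sin(x) on [a, b] with n equal intervals.
  pure real(dp) function trapezoid_sin(a, b, n)
    real(dp), intent(in) :: a, b
    integer, intent(in) :: n
    real(dp) :: h
    integer :: k
    h = (b - a) / n
    trapezoid_sin = 0.5_dp * (sin(a) + sin(b))
    do k = 1, n - 1
      trapezoid_sin = trapezoid_sin + sin(a + k * h)
    end do
    trapezoid_sin = h * trapezoid_sin
  end function trapezoid_sin
end module trapezoid_mod

module convergence_tests_mod
  use trapezoid_mod
  implicit none
contains
  ! Second-order method: error(n) / error(2n) should approach 2**2 = 4.
  logical function test_second_order_convergence()
    real(dp), parameter :: pi = acos(-1.0_dp)
    real(dp), parameter :: exact = 2.0_dp
    real(dp) :: err_coarse, err_fine, ratio
    err_coarse = abs(trapezoid_sin(0.0_dp, pi, 16) - exact)
    err_fine   = abs(trapezoid_sin(0.0_dp, pi, 32) - exact)
    ratio = err_coarse / err_fine
    test_second_order_convergence = ratio > 3.9_dp .and. ratio < 4.1_dp
  end function test_second_order_convergence
end module convergence_tests_mod
```

Because the assertion checks the observed order of accuracy, it holds across resolutions. It would also catch a defect that silently degrades the scheme to first order, since the ratio would then drop toward 2.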

Numerical Tolerance

For testing numerical results, a good estimate for the tolerance is necessary:

If too low, then the test fails for uninteresting reasons.

If too high, then the test has no teeth.

Unfortunately ...

Error estimates are seldom available for complex algorithms

Best case: usually an asymptotic form with an unknown leading coefficient!
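Suppose only the asymptotic form error ≈ C·h^p is known. A test can still have teeth if it checks the ratio of errors at two resolutions, because the unknown coefficient C cancels. This is a minimal Python sketch, assuming a hypothetical composite trapezoidal rule `trapezoid` (expected order p = 2) applied to sin on [0, π], where the exact integral is 2.

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n equal panels."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def observed_order(f, a, b, exact, n):
    """Estimate p from error(h) / error(h/2) ~ 2**p; the unknown C cancels."""
    e_coarse = abs(trapezoid(f, a, b, n) - exact)
    e_fine = abs(trapezoid(f, a, b, 2 * n) - exact)
    return math.log2(e_coarse / e_fine)

def test_trapezoid_is_second_order():
    p = observed_order(math.sin, 0.0, math.pi, 2.0, 16)
    # The tolerance is on the order, not on the raw error. It is loose enough
    # to absorb higher-order terms but tight enough to catch a drop to p = 1.
    assert abs(p - 2.0) < 0.05

test_trapezoid_is_second_order()
```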

Numerical tolerance (cont’d)

Sources of roundoff
  1 Ordinary arithmetic: machine epsilon (not a concern)
  2 Nonlinearity: especially small denominators
  3 Composition and iteration

Mitigation
  - Tailored synthetic inputs: eliminate/minimize roundoff from nonlinearity
  - Test layers in isolation: circumvent growth from composition
  - Put iteration logic in a separate layer: circumvent growth from iteration

Conclusion: Decomposition and synthetic inputs yield testing tolerances that are of the same order as machine epsilon.
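The following minimal Python sketch shows tailored synthetic inputs. It assumes a hypothetical nonlinear kernel `harmonic_mean(a, b) = 2ab / (a + b)`. The inputs are chosen so that every intermediate value is exactly representable in binary floating point. The expected answer is therefore known exactly, and the tolerance can be a small multiple of machine epsilon. The factor of 4 is an arbitrary illustrative choice.

```python
import sys

EPS = sys.float_info.epsilon  # machine epsilon for IEEE double

def harmonic_mean(a, b):
    """Hypothetical nonlinear kernel; the division is roundoff-sensitive when a + b is small."""
    return 2.0 * a * b / (a + b)

def assert_close(actual, expected, ulps=4):
    """Tolerance: a small multiple of machine epsilon, scaled by max(|expected|, 1)."""
    tol = ulps * EPS * max(abs(expected), 1.0)
    assert abs(actual - expected) <= tol, (actual, expected, tol)

def test_harmonic_mean_synthetic():
    # 2 * 1 * 3 / (1 + 3) = 6 / 4 = 1.5: every step is exact in binary.
    assert_close(harmonic_mean(1.0, 3.0), 1.5)
    # Equal inputs: the harmonic mean must return the input itself.
    assert_close(harmonic_mean(0.25, 0.25), 0.25)

test_harmonic_mean_synthetic()
```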

Test layers in isolation

Example: Procedure that does too much

    ...
    a = <complex expression>
    b = <complex expression>
    c = <complex expression>
    return a + sqrt(b/c)

Same capability, but split into two decoupled levels

    ...
    a = f1(...)
    b = f2(...)
    c = f3(...)
    return g(a, b, c)

The higher-level test ensures proper coupling, but does not check the fully expanded arithmetic.
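The two-level split above can be tested layer by layer. This is a minimal Python sketch. The bodies of `f1`, `f2` and `f3` are arbitrary stand-ins for the slide's `<complex expression>` pieces, and `compute` is a hypothetical name for the higher-level procedure. `g` is tested with synthetic inputs. `compute` is tested with stubs, so its test checks only the coupling.

```python
import math

# Arbitrary stand-in bodies for the slide's <complex expression> pieces.
def f1(x):
    return x * x

def f2(x):
    return 2.0 * x

def f3(x):
    return x + 1.0

def g(a, b, c):
    """Lower level: combine already-computed terms as a + sqrt(b / c)."""
    return a + math.sqrt(b / c)

def compute(x, f1=f1, f2=f2, f3=f3, g=g):
    """Higher level: only wires the pieces together."""
    return g(f1(x), f2(x), f3(x))

def test_g_synthetic():
    # b / c = 4 and sqrt(4) = 2 are exact, so equality is safe: 1 + 2 = 3.
    assert g(1.0, 8.0, 2.0) == 3.0

def test_compute_coupling():
    # Stubs return tagged values, so the test checks wiring, not arithmetic.
    calls = []
    def fake_g(a, b, c):
        calls.append((a, b, c))
        return "result"
    out = compute(5.0,
                  f1=lambda x: ("f1", x),
                  f2=lambda x: ("f2", x),
                  f3=lambda x: ("f3", x),
                  g=fake_g)
    assert out == "result"
    assert calls == [(("f1", 5.0), ("f2", 5.0), ("f3", 5.0))]

test_g_synthetic()
test_compute_coupling()
```

Passing the pieces in as parameters is one way to make the stubs possible. In Fortran the same effect usually needs procedure arguments or type-bound procedures.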

Test layers in isolation (cont’d)

Consider the main loop of a climate model:

Do test

Proper # of iterations

Pieces called in correct order

Passing of data between components

Do NOT test

Calculations inside components

Much easier to do in practice with objects than with procedures.
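The "Do test" list above can be checked without running any physics. This is a minimal Python sketch under stated assumptions: `run_model` and the `step(state)` component interface are hypothetical, not an interface from the talk. A recording test double stands in for each real component. The test then checks the iteration count, the call order, and the data handed from one component to the next.

```python
class Component:
    """Hypothetical component interface: a single step(state) method."""
    def step(self, state):
        raise NotImplementedError

def run_model(components, state, n_steps):
    """Main loop: call every component in order, n_steps times."""
    for _ in range(n_steps):
        for component in components:
            state = component.step(state)
    return state

class RecordingComponent(Component):
    """Test double: records each call and tags the state instead of doing physics."""
    def __init__(self, name, log):
        self.name = name
        self.log = log

    def step(self, state):
        self.log.append((self.name, state))
        return state + [self.name]  # new list, so logged states stay unchanged

def test_main_loop():
    log = []
    atmos = RecordingComponent("atmosphere", log)
    ocean = RecordingComponent("ocean", log)
    final = run_model([atmos, ocean], [], n_steps=2)
    # Proper number of iterations, and pieces called in the correct order.
    assert [name for name, _ in log] == ["atmosphere", "ocean"] * 2
    # The ocean receives exactly what the atmosphere returned.
    assert log[1][1] == ["atmosphere"]
    assert final == ["atmosphere", "ocean", "atmosphere", "ocean"]

test_main_loop()
```

Swapping in the test double is the step that objects make easy. With plain procedures the main loop would usually call the real physics routines directly.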

TDD and lack of analytic results

Complex algorithms often yield few if any analytic solutions

And yet we attempt software implementations. How can this be?

Difficulty generally arises from composition and iteration

Mitigation:
  - Test algorithmic steps in isolation
  - Tailor synthetic inputs to yield “obvious” results for each step
  - Use integration tests to verify that steps are composed correctly

But still use high-level analytic solutions as tests whenever possible

Consider Newton’s three-body problem: no general analytic solution

Test generation of pairwise forces

Test time integration (e.g., RK4)

Use special cases that have solutions as additional tests
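The first two three-body tests can be sketched in Python as follows. `G = 1` is a nondimensional assumption for testing, and `pairwise_force`, `rk4_step` and `integrate` are hypothetical names. The force routine is tested on synthetic positions with exact answers and against Newton's third law. The classical RK4 integrator is tested separately on dy/dt = y, checking its observed order rather than a raw error.

```python
import math

G = 1.0  # nondimensional gravitational constant (assumption for testing)

def pairwise_force(m1, m2, r1, r2):
    """Gravitational force on body 1 due to body 2; positions are 2-D tuples."""
    dx, dy = r2[0] - r1[0], r2[1] - r1[1]
    d = math.hypot(dx, dy)
    mag = G * m1 * m2 / (d * d)
    return (mag * dx / d, mag * dy / d)

def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step for dy/dt = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def integrate(f, y0, t_end, n):
    """Advance y0 from t = 0 to t_end in n equal RK4 steps."""
    y, h = y0, t_end / n
    for i in range(n):
        y = rk4_step(f, i * h, y, h)
    return y

def test_force_synthetic():
    # Unit masses two units apart: |F| = 1/4, directed along +x.
    assert pairwise_force(1.0, 1.0, (0.0, 0.0), (2.0, 0.0)) == (0.25, 0.0)
    # Newton's third law: equal and opposite forces.
    f12 = pairwise_force(1.0, 2.0, (0.0, 0.0), (3.0, 4.0))
    f21 = pairwise_force(2.0, 1.0, (3.0, 4.0), (0.0, 0.0))
    assert f12 == (-f21[0], -f21[1])

def test_rk4_fourth_order():
    def growth(t, y):
        return y  # exact solution at t = 1 is e
    e1 = abs(integrate(growth, 1.0, 1.0, 20) - math.e)
    e2 = abs(integrate(growth, 1.0, 1.0, 40) - math.e)
    assert abs(math.log2(e1 / e2) - 4.0) < 0.1

test_force_synthetic()
test_rk4_fourth_order()
```

Special cases with known solutions, such as a circular two-body orbit, would be further integration tests built on these same pieces.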

Irreducible complexity

“Aren’t my tests as complex as the implementation?”
“Aren’t my tests doing redundant calculations (tautological)?”

Short answer: No

Long answer: Well, they shouldn’t be ...
  - Unit tests use tailored inputs; the implementation handles the generic case
  - Model layers are tested in isolation
  - Tests are decoupled: low complexity
  - Actual model couples layers: huge complexity
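Here is how a tailored input avoids a tautological test. This is a minimal Python sketch with a hypothetical area-weighted mean, `weighted_mean`. The implementation handles arbitrary values and weights. The tests never re-derive that formula. Each one picks an input whose answer is obvious for a different reason.

```python
def weighted_mean(values, weights):
    """Generic case: area-weighted mean over arbitrary cells (hypothetical)."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

def test_constant_field():
    # A constant field has that constant as its mean, whatever the weights.
    weights = [0.5, 1.0, 2.0, 4.0]
    assert weighted_mean([3.0] * 4, weights) == 3.0

def test_single_nonzero_weight():
    # Only one cell counts, so the mean must equal that cell's value.
    assert weighted_mean([7.0, -1.0, 2.0], [0.0, 1.0, 0.0]) == -1.0

test_constant_field()
test_single_nonzero_weight()
```

Both test inputs are dyadic, so exact equality is safe here. A generic field would need a tolerance.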

Long integration and emergent properties

TDD generally does not directly address such issues

If a long integration gets incorrect results, one of the following holds:
  1 Individual steps have defects: add tests
  2 Integration has a defect: add tests
  3 Component steps lack necessary accuracy: need tests and an improved algorithm
  4 Insufficient physical fidelity: genuine science challenge

At the very least, TDD can reduce the frequency at which long integrations are needed/performed

TDD and performance

TDD emphasizes small fine-grained implementations

Such implementations are often sub-optimal in terms of performance

Optimized implementations typically fuse multiple operations

Solution: bootstrapping
  - Use initial TDD solution as unit test for optimized implementation
  - Maintain both implementations
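Bootstrapping can be sketched in Python. The fine-grained routines (`ratio`, `sqrt_all`, `add`, composed in `reference`) are hypothetical and stand in for the original TDD implementation. They serve as the oracle for a fused single-pass version, `fused`. Python gains no speed from fusing. Only the testing pattern carries over to compiled code, where fusion, vectorization or FMA can change the last bits of the result. That is why the comparison uses a tolerance.

```python
import math
import random

# Reference: fine-grained TDD implementation, one operation per routine.
def ratio(b, c):
    return [bi / ci for bi, ci in zip(b, c)]

def sqrt_all(x):
    return [math.sqrt(xi) for xi in x]

def add(a, x):
    return [ai + xi for ai, xi in zip(a, x)]

def reference(a, b, c):
    return add(a, sqrt_all(ratio(b, c)))

# Optimized: the same operations fused into a single pass over the data.
def fused(a, b, c):
    out = [0.0] * len(a)
    for i in range(len(a)):
        out[i] = a[i] + math.sqrt(b[i] / c[i])
    return out

def test_fused_matches_reference():
    rng = random.Random(42)  # fixed seed keeps the test reproducible
    n = 1000
    a = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    b = [rng.uniform(0.0, 10.0) for _ in range(n)]
    c = [rng.uniform(0.5, 2.0) for _ in range(n)]
    for r, f in zip(reference(a, b, c), fused(a, b, c)):
        assert math.isclose(r, f, rel_tol=1e-12, abs_tol=1e-12)

test_fused_matches_reference()
```

Both versions stay in the code base. The reference version lives on as the test oracle, which is what "maintain both implementations" costs and buys.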

TDD and the legacy burden

TDD was created for developing new code, and does not directly speak to maintaining legacy code.

Adding new functionality
  - Avoid wedging new logic directly into an existing large procedure
  - Use TDD to develop a separate facility for the new computation
  - Just call the new procedure from the large legacy procedure

Refactoring
  - Use unit tests to constrain existing behavior
  - Very difficult for large procedures
  - Try to find small pieces to pull out into new procedures
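Both approaches above can be sketched in Python. `legacy_update_before` is a hypothetical stand-in for a much larger legacy routine. First, a characterization test pins its current outputs; these are recorded behavior, not a claim that the values are physically right. Then one small piece, `clip_negative`, is pulled out into its own unit-tested procedure. The refactored routine, `legacy_update_after`, calls it, and the pinned outputs must not change.

```python
def legacy_update_before(temps, dt):
    """Hypothetical legacy routine (stands in for a much larger procedure)."""
    out = []
    for t in temps:
        if t < 0.0:
            t = 0.0
        out.append(t + dt * 0.5 * t)
    return out

def clip_negative(t):
    """Small piece pulled out into its own procedure and unit tested alone."""
    return max(t, 0.0)

def legacy_update_after(temps, dt):
    """Same routine after refactoring: it now calls the extracted piece."""
    out = []
    for t in temps:
        t = clip_negative(t)
        out.append(t + dt * 0.5 * t)  # arithmetic kept identical on purpose
    return out

def test_characterization():
    # Inputs exercise both branches; expected values come from the old routine.
    temps, dt = [-1.0, 0.0, 2.0, 4.0], 0.5
    expected = legacy_update_before(temps, dt)
    assert expected == [0.0, 0.0, 2.5, 5.0]
    assert legacy_update_after(temps, dt) == expected

def test_clip_negative():
    assert clip_negative(-3.0) == 0.0
    assert clip_negative(1.5) == 1.5

test_characterization()
test_clip_negative()
```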

References

pFUnit: http://sourceforge.net/projects/pfunit/

Tutorial materials
  - https://modelingguru.nasa.gov/docs/DOC-1982
  - https://modelingguru.nasa.gov/docs/DOC-1983
  - https://modelingguru.nasa.gov/docs/DOC-1984

TDD Blog: https://modelingguru.nasa.gov/blogs/modelingwithtdd

Test-Driven Development: By Example - Kent Beck

Müller and Padberg, “About the Return on Investment of Test-Driven Development,” http://www.ipd.uka.de/mitarbeiter/muellerm/publications/edser03.pdf

Refactoring: Improving the Design of Existing Code - Martin Fowler

JUnit: http://junit.sourceforge.net/