Two Best Practices for Scientific Computing
Version Control Systems & Automated Code Testing
David Love
Software Interest Group, University of Arizona
February 18, 2013
How This Talk Happened
Applied alumnus Carlos Chiquete posted this paper on Facebook [a]:
Best Practices for Scientific Computing (arXiv)
All of the practices were great, but I’d never encountered many of them
[a] Justifying every second ever wasted on Facebook
David Love VCS & Testing February 18, 2013 1 / 40
Software Carpentry
Lead author Greg Wilson founded a group called Software Carpentry
They have many videos documenting best practices for scientific computing
A 2-day boot camp ($20) will be held April 4-5, 2013, teaching many of these techniques.
1 Version Control System: Git
  Basics
  Branching
  Remote Repositories
2 Unit (and other) Testing
  Assertions
  Unit Testing with xUnit
Git
What is a Version Control System?
Version Control Systems are pieces of software designed to:
Maintain a complete history of the state of a project
  Works especially well with program code, LaTeX files, and anything else you can read in a text editor
  Other file types aren’t stored as efficiently
Allow for different versions (branches) to exist concurrently and independently
  Provides tools to integrate changes from different branches together
Allow for much simpler collaboration with others
How It Works
Version Control Systems maintain a database of document versions, called a repository
Users “check out” files from the repository, change them, then “commit” those changes to the repository
The VCS checks whether two editors (or branches) have edited the same lines, notes the conflict, and makes you resolve it
Greatly reduces the chance that editors will overwrite each other accidentally
  Changes will not get lost
  Repository determines the latest version
Best Use of Version Control
Best Practices for Scientific Computing
“In practice, everything that has been created manually should be put in version control, including programs, original field observations, and the source files for papers. Automated output and intermediate files can be regenerated at need. Binary files (e.g., images and audio clips) may be stored in version control, but it is often more sensible to use an archiving system for them, and store the metadata describing their contents in version control instead.”
Types of VCSs
There are two basic types of VCSs:
Centralized: Maintains the repository on a centralized server. Clients only check out specific versions of files.
  CVS (Concurrent Versions System)
  SVN (Subversion) (Software Carpentry teaches SVN)
Decentralized: Keeps a copy of the entire repository on every system. Any client could (potentially) act as a server.
  Git (I will demonstrate Git)
  Mercurial
Why Git?
A very popular distributed VCS
Does not require setting up a separate location to store the database
  This makes being a single user easier
Supported on most popular code hosting services
  Google Code, SourceForge, GitHub [1][2], Bitbucket
git svn uses Git locally but works with a Subversion server
Free & Open Source
Why Git is Better than X
[1] Free student account
[2] github:windows, github:mac, github:mobile
Git Resources
1 Pro Git (used for this talk)
2 Version Control By Example
3 Top 10 Git Tutorials for Beginners
4 O’Reilly Webcast: Git in One Hour
5 Git + LaTeX Workflow: the highest-rated answer to this Stack Overflow question is very good.
Git Basics
Basic Configuration
Git stores your name and email and attaches them to your contributions
git config --global user.name "David Love"
git config --global user.email [email protected]
Name your favorite editor
git config --global core.editor vim
Select a diff & merge tool
git config --global merge.tool meld
The --global flag stores the settings in your home directory, where they apply to all of your Git repositories. Without it, the settings are stored in the current repository only.
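As a rough illustration of the two configuration scopes, the sketch below sets a name globally and then overrides it in one repository. The throwaway HOME directory and the names "Global Name" and "Local Name" are assumptions made for the demo, so that no real settings are touched.

```shell
set -e
# Assumption for the demo: point HOME at a throwaway directory so the
# real ~/.gitconfig is never modified.
HOME=$(mktemp -d)
export HOME
unset XDG_CONFIG_HOME

git config --global user.name "Global Name"   # written to $HOME/.gitconfig

repo=$(mktemp -d)                             # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Local Name"             # written to this repository's .git/config

git config --global user.name    # prints: Global Name
git config user.name             # prints: Local Name (the repository setting wins)
```

Inside the repository the local value takes precedence. Any other repository would still see the global one.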
Merge Tools
Open Source
1 Diffuse
2 Emerge (emacs)
3 gvimdiff (gvim)
4 KDiff3
5 Meld
6 tkdiff
7 TortoiseMerge
8 xxdiff
Free Commercial Software
1 opendiff (OS X developer tool)
2 P4Merge
Pay Software
1 Araxis Merge
2 Beyond Compare
3 ECMerge
GitHub for Windows & Mac provide their own merge tool
Creating a Git Repository
To create a new repository:
1 Move to the directory with your files
2 git init
To clone an existing repository:
Use git command clone
Format: git clone <url> [<directory>]
URLs can use the protocols git, http(s), and ssh:
git clone git://github.com/schacon/grit.git
git clone http://github.com/schacon/grit.git
git clone ssh://[email protected]:31415/$HOME/test.git
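Both ways of getting a repository can be tried without a network. This is a minimal sketch that assumes a scratch directory and a hypothetical file notes.txt. It clones by local path, which Git also accepts in place of a URL.

```shell
set -e
work=$(mktemp -d)                 # hypothetical scratch area
mkdir "$work/project"
cd "$work/project"

# Create a new repository in the directory that holds your files.
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.name "Example User"       # placeholder identity for the demo
git config user.email "user@example.com"
echo "hello" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Clone an existing repository: git clone <url> [<directory>]
cd "$work"
git clone -q "$work/project" project-copy
cd project-copy
git log --pretty=oneline          # the clone carries the full history
```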
The File Status Lifecycle in Git
Pro Git Image 2-1
Command git status lists:
Untracked files
Modified but unstaged files
Staged but uncommitted files
Moving within the lifecycle:
Stage files with git add <file>
Commit with git commit
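The lifecycle can be followed one state at a time with the short form of git status. This is a sketch in a scratch repository. The file name model.txt and the identity are made up for the demo.

```shell
set -e
repo=$(mktemp -d)                         # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo "x = 1" > model.txt
git status --short        # ?? model.txt   (untracked)
git add model.txt
git status --short        # A  model.txt   (staged, not yet committed)
git commit -q -m "Add model"

echo "x = 2" > model.txt
git status --short        #  M model.txt   (modified, not staged)
git add model.txt
git status --short        # M  model.txt   (staged)
git commit -q -m "Update x"
git status --short        # no output: nothing left to commit
```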
Committing Changes to the Repository
When you commit changes to the repository, Git asks for a commit message
Git opens your favorite editor, and gives a (commented out) default message
Now, type a short message describing what you changed during this commit
Structuring Commits
Best practice: structure your editing so each commit is a logically separate idea
Commit Information
Once committed, Git gives a message like
[master b05ca11] Commit message
1 file changed, 3 insertions(+), 2 deletions(-)
master             Branch name
b05ca11            SHA-1 hash key (abbreviated)
Commit message     Your commit message
1 file changed     Number of files changed
3 insertions(+)    Number of lines inserted
2 deletions(-)     Number of lines deleted
Git identifies each commit by a 40-digit hexadecimal SHA-1 hash
Git tracks lines of code. Editing a line = 1 insertion & 1 deletion
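Both facts can be checked directly. The sketch below assumes a scratch repository and a hypothetical three-line file data.txt. It edits one line, then looks at the change statistics and at the full and abbreviated hash.

```shell
set -e
repo=$(mktemp -d)                         # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

printf 'alpha\nbeta\ngamma\n' > data.txt
git add data.txt
git commit -q -m "Add data"

# Edit exactly one line of the file.
printf 'alpha\nBETA\ngamma\n' > data.txt
git diff --shortstat      # 1 file changed, 1 insertion(+), 1 deletion(-)
git commit -q -a -m "Capitalize beta"

full=$(git rev-parse HEAD)            # full hash
short=$(git rev-parse --short HEAD)   # abbreviated form, as in the commit summary
echo "$full has ${#full} characters; abbreviated: $short"
```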
Committing all changes
git commit without add
git commit -a skips the git add step by staging and committing every modified tracked file. Untracked files are not included.
Viewing Changes
git diff        Prints the differences between the modified working copy and the staged version (or the most recent commit, if nothing is staged)
git difftool    Uses the merge tool to highlight the differences
--cached        Modifies either command to show differences between the staged file and the most recent committed version
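The difference between the two comparisons shows up when a change is staged. This is a small sketch with a hypothetical file f.txt. The --name-only option is used only to keep the output short.

```shell
set -e
repo=$(mktemp -d)                         # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo "v1" > f.txt
git add f.txt
git commit -q -m "Add f"

echo "v2" > f.txt
git diff --name-only            # f.txt: the working copy differs from what is staged
git diff --cached --name-only   # no output: nothing is staged yet

git add f.txt
git diff --name-only            # no output: working copy and staged version agree
git diff --cached --name-only   # f.txt: the staged version differs from the last commit
```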
Viewing the Commit History
Commit Log
git log shows the commit history in reverse chronological order.
Default information:
Commit hash
Author
Date & time committed
Commit message
Options:
-<number> Latest <number> entries, e.g., git log -4
--pretty=oneline Abbreviates to one line of output
--since= Look at commits since some time, e.g., yesterday, 1.week, "2 months", 2013/02/01, 02/01/2013
--until= Look at commits until some time
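A few of these options in action, as a sketch on a scratch repository with three made-up commits:

```shell
set -e
repo=$(mktemp -d)                         # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

for i in 1 2 3; do
  echo "$i" > counter.txt
  git add counter.txt
  git commit -q -m "Set counter to $i"
done

git log -2 --pretty=oneline              # the two most recent commits, newest first
git log --since=1.week --pretty=oneline  # everything committed within the last week
```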
Undoing Changes
Changing Your Last Commit
You can replace your previous commit with an amended one using git commit --amend.
Unstaging a Staged File
A file can be unstaged with git reset HEAD <file>
Unmodifying a Modified File
You can discard uncommitted modifications to a file with git checkout -- <file>
git status lists the latter two commands when appropriate.
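The three undo commands can be sketched as follows, assuming a scratch repository and a hypothetical file paper.txt. Note that git checkout -- <file> throws the edit away for good.

```shell
set -e
repo=$(mktemp -d)                         # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo "draft" > paper.txt
git add paper.txt
git commit -q -m "Add papr"               # typo in the message

# Changing your last commit: replace it with a corrected one.
git commit -q --amend -m "Add paper"

# Unstaging a staged file.
echo "rough idea" >> paper.txt
git add paper.txt
git reset -q HEAD paper.txt               # paper.txt is modified but no longer staged

# Unmodifying a modified file.
git checkout -- paper.txt                 # the uncommitted edit is discarded
git status --short                        # no output: back to the committed state
```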
Git Branching
What is a Branch?
In Git and other VCSs, a branch behaves like an independent copy of the working directory
Changes in one branch will not affect any other branch
Different branches can be “checked out” of the repository
Branches can be merged to combine their contents
Branches are simpler in Git than in most other VCSs
Basic Branch Commands in Git
The basic branch operations:
List branches                git branch
Create branch                git branch <branch name>
Check out branch             git checkout <branch name>
Merge into current branch    git merge <branch name>
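The first three operations can be sketched in a scratch repository. The branch name testing and the file names are made up for the demo. The point is that a commit on one branch leaves the other untouched.

```shell
set -e
repo=$(mktemp -d)                         # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo "stable" > main.txt
git add main.txt
git commit -q -m "Initial commit"

git branch testing            # create the branch
git branch                    # list branches; the current one is marked with *
git checkout -q testing       # check it out
echo "experiment" > exp.txt
git add exp.txt
git commit -q -m "Try an experiment"

git checkout -q master
ls                            # only main.txt: the experiment lives on testing
```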
To see how branches work, we’ll look at how Git stores data.
Data Storage in Git
Pro Git Image 1-5: Git stores data as a series of snapshots. Only when files A, B, or C change does Git store a new snapshot.
Pro Git Image 3-2: Data Git stores about a commit, including the hash, the author, the commit message, etc. Horizontal arrows are pointers to the previous commit.
Branching in Git, Conceptually
Pro Git Image 3-3: An abbreviated commit history marked by SHA-1 hashes
Pro Git Image 3-4: After git branch testing
Pro Git Image 3-5: The HEAD pointer keeps track of the current branch
Pro Git Image 3-6: After git checkout testing
Pro Git Image 3-7: Made some changes, then committed to the current branch (testing)
Pro Git Image 3-8: After git checkout master
Pro Git Image 3-9: Made further changes, then committed them to master
Branch Merging, Conceptually
Pro Git Image 3-10: A small commit history
You want to fix issue #53.
Next: create a branch for that purpose

Pro Git Image 3-11: After git branch iss53
git branch iss53
Next: check out iss53, make a change, and commit it

Pro Git Image 3-12: Committed change on iss53
You stumble upon a bug that needs to be fixed immediately.
Go back to master so your partial work on iss53 doesn’t get integrated too early.
Commands to execute:
  git checkout master
  git checkout -b hotfix (creates and immediately checks out branch hotfix)
Make a commit to fix the bug.

Pro Git Image 3-13: Branch hotfix, created to fix a bug
After testing your work, you want to add the bug fix to master
Next: merge hotfix into master

Pro Git Image 3-14: Merging changes into master
To merge hotfix into master:
  git checkout master
  git merge hotfix
Git responds with a message that includes “Fast forward”
Meaning: Git simply moved the master label forward along the commit history
Next: delete branch hotfix, since it is no longer needed
Next: go back to working on iss53
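The hotfix steps can be replayed in a scratch repository. This is a sketch only: the file names are invented and the iss53 branch is left out to keep it short. Because master has no commits of its own since hotfix was created, the merge is a fast-forward and no merge commit is made.

```shell
set -e
repo=$(mktemp -d)                         # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo "release" > app.txt
git add app.txt
git commit -q -m "Initial commit"

git checkout -q -b hotfix                 # create and check out hotfix in one step
echo "patched" > fix.txt
git add fix.txt
git commit -q -m "Fix urgent bug"

git checkout -q master
git merge hotfix                          # Git reports a fast-forward
git branch -d hotfix                      # safe to delete: its work is now in master
```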
Pro Git Image 3-15: Delete branch hotfix
Delete branch hotfix with:
  git branch -d hotfix
Make another commit to iss53

Pro Git Image 3-15: Want to merge again
Want to merge iss53 into master
But master can’t just move up the commit history
Git will do a three-way merge

Pro Git Image 3-16: The three-way merge
git merge iss53 (run from master)
Git analyzes the changes applied to the common ancestor by master and iss53
If master and iss53 made changes to the same lines, Git notes a conflict that must be resolved manually
Git surrounds conflicts with standard conflict resolution markers:
  Code between <<<<<<< and ======= is the code from HEAD (master)
  Code between ======= and >>>>>>> is the code from the merging branch (iss53)
Run git mergetool to use your merge tool to resolve the conflict
Git creates some files to help you merge the conflicts successfully:
  file.local     from the current branch (master)
  file.base      from the common ancestor
  file.remote    from the merging branch (iss53)

Pro Git Image 3-17: Merge commit at end of three-way merge
Git creates a merge commit once the conflicts are resolved (or if there are no conflicts)
Note: after resolving a conflict, you must then run git commit to generate the merge commit.
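A three-way merge with a conflict can be provoked on purpose. In this sketch the file params.txt and its contents are invented, and the conflict is resolved by rewriting the file by hand rather than with git mergetool.

```shell
set -e
repo=$(mktemp -d)                         # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo "tolerance = 1e-3" > params.txt
git add params.txt
git commit -q -m "Add params"

git checkout -q -b iss53
echo "tolerance = 1e-6" > params.txt
git commit -q -a -m "Tighten tolerance"

git checkout -q master
echo "tolerance = 1e-4" > params.txt
git commit -q -a -m "Loosen tolerance slightly"

# Both branches changed the same line, so the merge stops with a conflict.
git merge iss53 || echo "merge stopped on a conflict"
cat params.txt                            # the conflict markers are now in the file
markers=$(grep -c '^<<<<<<< ' params.txt)

# Resolve by hand (here: keep the iss53 value), stage, and commit.
echo "tolerance = 1e-6" > params.txt
git add params.txt
git commit -q -m "Merge branch 'iss53'"
```

The final git commit is what records the merge commit with both parents.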
Comparing Branches
git difftool <branch> shows differences between the current branch and <branch> using the merge tool
Double Dot Notation
For branches A and B, A..B selects all commits in the history of B since splitting from A
git log A..B gives all commit messages in B since splitting from A
Triple Dot Notation
A...B selects commits on both branches since splitting
git log A...B gives all commit messages in either A or B since the common ancestor
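The two notations can be compared on a tiny history. This sketch assumes branches literally named A and B and uses empty commits (--allow-empty) only to keep the setup short.

```shell
set -e
repo=$(mktemp -d)                         # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

git commit -q --allow-empty -m "base"     # common ancestor
git branch A
git branch B

git checkout -q A
git commit -q --allow-empty -m "A1"
git checkout -q B
git commit -q --allow-empty -m "B1"
git commit -q --allow-empty -m "B2"

git log --pretty=oneline A..B     # B2 and B1: commits on B since it split from A
git log --pretty=oneline A...B    # A1, B1, and B2: commits on either side of the split
```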
Git Remote Repositories
Remote Repositories
Git can connect to remote repositories over networks to collaborate with others
origin repository
When you clone from a remote source, the remote repository is automatically added to your local repository and named origin
git remote           List remote repositories
git remote -v        List remote repositories with more information
git remote add       Add a new remote repository
git remote rename    Rename a remote repository
git remote remove    Remove a remote repository
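These commands can be tried offline by letting a bare repository on disk stand in for the server. The directory names and the remote names backup and mirror below are assumptions for the sketch.

```shell
set -e
work=$(mktemp -d)                          # hypothetical scratch area
cd "$work"
git init -q --bare server.git              # plays the role of the remote server
git clone -q "$work/server.git" laptop 2>/dev/null   # cloning an empty repository only warns
cd laptop

git remote                                 # origin (added automatically by clone)
git remote -v                              # origin plus its fetch and push locations

git remote add backup "$work/server.git"   # add a second remote
git remote rename backup mirror            # rename it
git remote remove mirror                   # and remove it again
git remote                                 # origin
```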
Remote Branches
Remote repositories have their own branches that you can examine and merge with
Remote Branch Names
Remote branches have names <repository>/<branch>, e.g., origin/master
git branch -r Show remote branches
git branch -a Show all branches (local and remote)
Getting Updates from a Remote Repository
Two options to get data from a remote repository:
git fetch origin    Updates your remote-tracking branches (e.g., origin/master) from origin. Does not change any local branches.
git pull origin     Fetches from origin, then tries to merge the changes into your current local branch.
  You will have to resolve any conflicts
  After resolving a conflict, run git commit to generate the merge commit
Adding Updates to a Remote Repository
One command updates a remote branch with your local copy:
git push origin master    Update the master branch on origin with your local copy of master
If no one has pushed changes to origin since your last pull, the push will go through.
If someone else has pushed to origin, Git will reject your push.
You must first merge those changes into your local repository before pushing the new code:
1 Use git pull to merge the changes into your copy
  1 git mergetool to resolve any conflicts
  2 git commit to generate the merge commit
2 git push to update the remote repository
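The rejected push and its fix can be reproduced locally with a bare repository standing in for origin. The two users, Alice and Bob, and all file names are invented for this sketch. Their changes touch different files, so the pull merges without conflicts.

```shell
set -e
work=$(mktemp -d)                          # hypothetical scratch area
cd "$work"
git init -q --bare server.git              # stands in for the shared server
git --git-dir=server.git symbolic-ref HEAD refs/heads/master

# Alice creates the first commit and publishes it.
git clone -q server.git alice 2>/dev/null
cd "$work/alice"
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"
git config user.email "alice@example.com"
echo "shared" > README.txt
git add README.txt
git commit -q -m "Initial commit"
git push -q origin master

# Bob clones, commits, and pushes before Alice does.
cd "$work"
git clone -q server.git bob
cd "$work/bob"
git config user.name "Bob"
git config user.email "bob@example.com"
echo "bob" > bob.txt
git add bob.txt
git commit -q -m "Bob's change"
git push -q origin master

# Alice commits without pulling first, so her push is rejected.
cd "$work/alice"
echo "alice" > alice.txt
git add alice.txt
git commit -q -m "Alice's change"
if git push -q origin master 2>/dev/null; then rejected=no; else rejected=yes; fi
echo "push rejected: $rejected"            # push rejected: yes

# Merge Bob's work into her copy, then push again.
git pull -q --no-rebase --no-edit origin master
git push -q origin master
```

The --no-rebase and --no-edit options only make the pull non-interactive here. A plain git pull behaves the same way when it is configured to merge.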
Workflow with Remote Git
1 Pull changes when you start a work session
  1 Read the logs of changes made
2 Create local branches to make your changes
3 Once they are correct, merge your local changes back together
4 Push the changes back to the server
  1 If rejected, pull to merge changes
  2 Resolve conflicts and commit, if necessary
  3 Push changes back to the server
Sync in GitHub: Windows and Mac
GitHub Sync
GitHub’s GUI for Windows and Mac has a “sync” button that automatically deals with push and pull commands
Testing Assertions
Assertions
Assertion
An assertion is a statement that something is true at a particular point in a program. If the statement is false, the program will halt immediately.
Assertions can be used to ensure that:
1 Inputs are valid
2 Program or function outputs are consistent
3 Theoretical properties of the algorithm are satisfied
Example: Assertions in Matlab
My code has a lower bound zLower that should be monotonically nondecreasing as the algorithm progresses
It is updated with zLower = c*x
I use an assertion to ensure that an update never decreases zLower
Code example:
assert( c*x >= zLower, 'Decrease in zLower' )
zLower = c*x;
Runtime Testing
Best Practices for Scientific Computing
“Assertions can make up a sizable fraction of the code in well-written applications, just as tools for calibrating scientific instruments can make up a sizable fraction of the equipment in a lab.”
If something goes wrong, the code halts immediately, greatly simplifying debugging
Best Practices for Scientific Computing
“Assertions are executable documentation, i.e., they explain the program as well as checking its behavior. This makes them more useful in many cases than comments since the reader can be sure that they are accurate and up to date.”
Testing Unit Testing with xUnit
Automated Testing
Best Practices for Scientific Computing
“[R]egression testing is the practice of running pre-existing tests after changes to the code in order to make sure that it hasn’t regressed, i.e., that things which were working haven’t been broken.”
The next line of defense is Automated Testing:
Unit Test Tests a single unit of a program, e.g., a function or method
Integration Test Tests that units work correctly when put together
Kinds of Test Cases
Oracles: Anything that tells you how a program should be working
  1 Closed form solutions to special cases
  2 Simple/small cases of the problem
  3 Older versions of the code
  4 Slow, simple algorithm to test complicated, fast algorithm
  5 High level implementation to test lower level code (e.g., MATLAB to C++)
Bugs: Write a test that triggers a fixed bug to prevent it from reappearing
MATLAB xUnit Test Framework
xUnit is a framework for writing unit tests
It has been implemented for almost any language you can think of
MATLAB xUnit Test Framework
Wikipedia’s List of Unit Testing Frameworks
Building tests with xUnit
xUnit tests have the same basic structure:
input = ...
expectedOutput = ...
realOutput = YourCode( input );
assertEqual( expectedOutput, realOutput );
Define the input and expected output (perhaps for multiple cases)
Run your code for each input value
Compare your expectation with what happened
xUnit Assertions
assertEqual(A,B)                  A and B are equal
assertElementsAlmostEqual         Elements of floating point matrices A and B are within some (absolute or relative) tolerance
assertVectorsAlmostEqual          norm(A-B) is within some (absolute or relative) tolerance of zero
assertTrue, assertFalse           Check Boolean values
assertFilesEqual                  Checks that files are the same
assertExceptionThrown             Checks that a specific exception was thrown
Running tests with xUnit
With MATLAB xUnit Test Framework:
Write your tests in their own directory
Write each test case as an M-file function that returns no output arguments
The function name should start or end with test or Test
Go to the test directory
Run all tests with runtests
Run a specific test with runtests TestName
Test Driven Development
Test Driven Development
Broadly speaking, TDD is the practice of writing the test cases for new software before the software is written.
Benefits:
Helps to clarify the purpose of the program before coding begins
Tends to create more modular and extensible code
Helps ensure tests are actually written!
Possible drawbacks:
May include poorly written tests
May create false confidence
No clear evidence that TDD improves productivity
Thanks for listening!
Questions?