Python by contract: a zero-defect approach

Applied Cleanroom software development

John W. Shipman
2012-02-29 15:20

Abstract

Describes how the Cleanroom software development methodology can be applied to programs in the Python language.

This publication is available in Web form [1] and also as a PDF document [2]. Please forward any comments to [email protected].

Table of Contents

1. Origins of the Cleanroom methodology
    1.1. Are we serious about attaining zero defects?
2. The contract-based approach to program construction
    2.1. Skills you will need
3. Stepwise refinement
    3.1. The HIPO model: nested black boxes
    3.2. Design factoring and separation of concerns
4. Cleanroom overview
5. Intended function notation
    5.1. Simple intended functions
    5.2. Preconditions: The other side of the contract
    5.3. Compound intended functions
    5.4. Useful metaphors: identity and “anything”
    5.5. Special forms for intended functions
    5.6. Intended function examples
    5.7. The “let” convention
    5.8. Specification functions
6. Proof rules and the stepwise refinement process
    6.1. The sequence rule
    6.2. The alternation rule
    6.3. The definite iteration rule
    6.4. The while loop rule
7. Standards for the review of intended functions
8. Trace tables
    8.1. Trace table for sequence
    8.2. Trace table for alternation
    8.3. Trace table for definite iteration
    8.4. Trace table for while loops

[1] http://www.nmt.edu/~shipman/soft/pyract/
[2] http://www.nmt.edu/~shipman/soft/pyract/pyract.pdf

Python by contract: a zero-defect approach (Zoological Data Processing)


    8.5. Insuring case coverage
9. Additional principles for object-oriented programming
10. The peer review process
    10.1. Guidelines for reviewers
    10.2. Peer review guidelines for the author
11. Testing
    11.1. Treasure your mistakes, don't bury them!

1. Origins of the Cleanroom methodology

The idea of zero-defect [3] development addresses quality issues by seeking to prevent the initial introduction of defects into a design, rather than trying to find and repair them later.

Cleanroom software engineering [4] is a zero-defect methodology developed by IBM Federal Systems Division for use in the project that developed onboard software for the Space Shuttle.

The author learned the method from Dr. Allan M. Stavely, whose book Toward Zero-defect Programming [5] describes the method in general. This work describes how the author has applied this methodology to the construction of programs in the Python programming language [6].

If you want more effective programmers, you will discover that they should not waste their time debugging, they should not introduce the bugs to start with.

— Edsger J. Dijkstra, Comm. ACM 15(10), Oct. 1972: pp. 859–866.

The term “cleanroom” is an analogy to the cleanrooms [7] used in integrated circuit fabrication: it is better to write the code without defects than to try to find and remove them later.

1.1. Are we serious about attaining zero defects?

Zero defects is a goal. Where humans are involved, though, we can hope only to come close. Our defect rates will never be zero. We would, however, like them to be asymptotic to zero.

2. The contract-based approach to program construction

In the humorous lexicon The Computer Contradictionary, we find this definition:

Interface: An arbitrary line of demarcation set up in order to apportion the blame for malfunctions.

— Stan Kelly-Bootle, The Computer Contradictionary, MIT Press, 2nd Ed., 1995, ISBN 978-0262611121.

Interfaces occur at two levels in a software design.

• The user interface is the external surface of the product. It defines how the product interacts with the rest of the world.

• Assuming the kind of modular design that has been standard practice in the software world for many decades, every connection between two parts of a program is also an interface.

[3] http://en.wikipedia.org/wiki/Zero_defects
[4] http://en.wikipedia.org/wiki/Cleanroom_Software_Engineering
[5] http://www.pearsonhighered.com/educator/product/Toward-Zero-Defect-Programming/9780201385953.page
[6] http://www.python.org
[7] http://en.wikipedia.org/wiki/Cleanroom


Every interface is, in effect, a contract that divides responsibility between the provider of a service and the user of that service.

This contract says:

If the user agrees to certain requirements, the service guarantees to function correctly.

To apply this approach to software construction, we must describe each interface, and the semantics of the service it provides, rigorously enough that both are unambiguously defined.

Our ideal here is scrutable connectivity to first principles.

• By scrutable, we mean that the design must be clearly presented. A design that is not well-documented, or one that contains more complexity than is necessary, becomes inscrutable.

Scrutability is especially important in critical applications, such as railroad signaling and the administration of medical radiation. Where lives are involved, the code must stand up under external scrutiny. The lone wolf programmer, the one person who knows how the code works, is unacceptable.

...programs must be written for people to read, and only incidentally for machines to execute.

— Harold Abelson and Gerald Sussman, Structure and interpretation of computer programs, MIT Press, 1996, p. xvii, ISBN 0-262-01153-0.

• The first principles to which we connect a program must be mathematical: we must be able to reason about program correctness in a mathematical way.

2.1. Skills you will need

• This method isn't for novice programmers. Novices are inclined to treat the formal methods as “unnecessary extra work.” It is necessary to have a certain minimal amount of experience with real-world software, tools that must survive ignorant or malicious users, to understand that getting the design right in the first place really is the shortest path to a robust solution.

• The methods require a reasonable familiarity with discrete math. Here's the catalog description of Math 221 at New Mexico Tech, Formal Logic and Discrete Mathematics, which is a prerequisite for the Zero-defect Software Design course:

Analytical reasoning and critical thinking skills. Induction and recursion. Mathematical proofs. Propositional calculus and predicate calculus. Discrete and combinatorial mathematics: sets, functions, relations, trees, graphs, permutations, and combinations.

• Clear, concise writing is central to the design process. The author starts each new design by writing a draft specification. However, the coding process is no less a writing task, and requires careful work in naming and describing algorithms and data structures at every step of the way.

Besides a mathematical inclination, an exceptionally good mastery of one's native tongue is the most vital asset of a competent programmer.

— Edsger J. Dijkstra, Selected writings on computing: a personal perspective, Springer-Verlag, 1982, pp. 129-131, ISBN 0-387-90652-5.


3. Stepwise refinement

Cleanroom is, among other things, a stepwise refinement approach. For any nontrivial system, you will divide the logic of the program as a whole into smaller and smaller pieces, writing intended functions to describe the semantics of each piece, until every piece of the finished system has a well-defined interface and clearly implements its intended function.

3.1. The HIPO model: nested black boxes

The venerable HIPO [8] model of software design is a way of subdividing large problems into smaller ones.

It is useful to model a system as a black box [9]. In this model, the purpose of a system is to transform a set of one or more inputs into one or more outputs.

This particular concept came from circuit design. For example, the half-adder [10] is a standard building block for digital logic. It has two inputs A and B, and two outputs called sum and carry. We might draw the black box like this:

[Figure: a black box labeled “half-adder” with inputs A and B and outputs sum and carry.]

If we are building a larger system out of these, all we need to know is the truth tables for the outputs: sum ≡ A ^ B and carry ≡ A & B, where “^” is Boolean exclusive-or and “&” is Boolean and.
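As a minimal sketch, the half-adder black box can be modeled as a Python function whose callers need only its truth table. The function name `half_adder` and the use of the integers 0 and 1 for logic levels are assumptions made for illustration.

```python
def half_adder(a, b):
    """Model the half-adder black box (hypothetical helper).

    Inputs a and b are each 0 or 1.
    Returns (sum, carry), where sum = a ^ b and carry = a & b.
    """
    return a ^ b, a & b


# Enumerate the truth table: one row per combination of inputs.
truth_table = {(a, b): half_adder(a, b) for a in (0, 1) for b in (0, 1)}

for (a, b), (total, carry) in sorted(truth_table.items()):
    print(f"A={a} B={b} -> sum={total} carry={carry}")
```

A caller that builds a larger circuit out of `half_adder` relies only on this table, not on how the outputs are computed inside the box.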

The black-box view is sometimes called the IPO model for Input, Processing, and Output. The outputs of this black box are a function of the inputs, and the P stands for the processing done inside the box to convert the inputs to the outputs.

The H in HIPO stands for “hierarchical”: we divide large, complex functional blocks into networks of smaller functional blocks. The final design is a hierarchy of nested black boxes; the system as a whole is the outermost box, and the design consists of subdividing the functionality in each box until we reach simple, single pieces.

Here is the next level of stepwise refinement of our half-adder circuit.

[8] http://en.wikipedia.org/wiki/HIPO
[9] http://en.wikipedia.org/wiki/Black_box
[10] http://en.wikipedia.org/wiki/Half-adder


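[Figure: the half-adder box refined into two inner boxes. Inputs A and B feed both an XOR gate, whose output is sum, and an AND gate, whose output is carry.]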

Here we see how the overall inputs (A and B) are routed inside the top-level box to smaller functional pieces, in this case, an XOR gate and an AND gate.
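The same refinement step can be sketched in Python: the outer box keeps its interface while its body is expressed in terms of two inner boxes. The function names here are hypothetical, and the final loop checks the refinement against ordinary one-bit addition rather than against any notation from this document.

```python
def xor_gate(a, b):
    """Inner box: Boolean exclusive-or of two bits."""
    return a ^ b


def and_gate(a, b):
    """Inner box: Boolean and of two bits."""
    return a & b


def half_adder(a, b):
    """Outer box: route inputs A and B to the two inner boxes."""
    return xor_gate(a, b), and_gate(a, b)


# The refinement must preserve the outer box's contract for every input:
# adding two bits gives a sum bit of (a + b) % 2 and a carry of (a + b) // 2.
for a in (0, 1):
    for b in (0, 1):
        assert half_adder(a, b) == ((a + b) % 2, (a + b) // 2)
```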

In a circuit design, we could continue this another level, dividing up the AND gate into smaller pieces (transistors).

So it is with software design by stepwise refinement. You start by describing the entire system in terms of the desired outputs, the inputs required to compute those outputs, and the processing that converts inputs into outputs.

In this abstraction, we take a very broad view of what constitutes an input or output. Inputs might include data files, keystrokes on a keyboard, values from a database, messages from the Internet, or anything else that we can operate on. Outputs might include data files, images on a screen, sounds coming from a speaker, or changes to a database.

Note

Principles of good cohesion (what goes on inside each piece) and coupling (the connections between the pieces) are outside the scope of this essay. For an overview, see Wikipedia [11].

3.2. Design factoring and separation of concerns

Before we dive into the details of the Cleanroom process, here is an example of two important software design concepts: separation of concerns and factoring.

A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away.

— Antoine de Saint Exupéry [12]

I conclude that there are two ways of constructing a software design: One is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies.

[11] http://en.wikipedia.org/wiki/Cohesion_(computer_science)
[12] http://en.wikiquote.org/wiki/Antoine_de_Saint_Exup%C3%A9ry


— C. A. R. Hoare, Comm. ACM 24(2), Feb. 1981: p. 81

The idea behind separation of concerns is that each piece of a program has certain functional requirements to worry about, and those worries should be localized to that particular piece.

Let's consider a specific application: a ticketing agency. The purpose is to connect patrons with tickets to some performance.

When a promoter books an event in a hall, the ticketing agency must know exactly how many seats are available at each price level. It is the ticket agency's job to keep track of which seats have been sold, and to whom.

We might write a Python class called TicketAgency to implement this functionality. When we start designing the class, we can specify three different areas of concern in the implementation.

1. The TicketAgency class's concern is matching up patrons with seats.

2. We might design another class called Venue whose concern it is to keep track of the number and arrangement of seats: how many in each price class, which ones are on the aisle, and so forth.

3. Another class called PatronData is concerned with patrons: how to contact them, and the specific needs of each patron.

The way that separation of concerns works in this application is that information about seats resides in the Venue class; information about patrons resides in the PatronData class; and the TicketAgency class is concerned only about the connections between seats and patrons.

The concept of separation of concerns is related to the concept of single-sourcing: there must be exactly one definition of each entity, and others use that single definition. Having two different lists of the seats in a venue inside your application is a sign of trouble: if the lists disagree, how do we know which one is right?

In practice, when we apply the idea of separation of concerns, it becomes clear where in our design each functional piece should live. In the example, information about a seat's price and location is local to the Venue class; information about a patron's phone number is local to the PatronData class; and only the TicketAgency class cares which patron is associated with which seat.

As we push this design farther down into the details, we may find that the Venue class is a container for instances of a Seat class, each of which describes just one seat. This process of dividing separate concerns into smaller sub-concerns is sometimes called factoring. The Venue class's proper concern is storing, searching and retrieving Seat instances.

Similarly, when we start writing the PatronData class, we might factor it so that a PatronData instance is a container for Patron instances. The proper concern of a Patron instance is information about one ticket buyer; the proper concern of the PatronData class is organizing all its contained patrons.

For example, suppose that one specific seat is damaged by a falling meteor. To insure that that seat does not get sold, the Venue class must have a way to find and delete that seat from its set of available candidates; that is in its proper area of concern.
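The factoring described above might be sketched as follows. Only the five class names come from the text; every method name, attribute, and the sample data are assumptions chosen for illustration, not the author's design.

```python
class Seat:
    """Describes just one seat."""

    def __init__(self, seat_id, price, on_aisle=False):
        self.seat_id = seat_id
        self.price = price
        self.on_aisle = on_aisle


class Venue:
    """Container for Seat instances: storing, searching, retrieving."""

    def __init__(self):
        self._seats = {}            # seat_id -> Seat

    def add_seat(self, seat):
        self._seats[seat.seat_id] = seat

    def remove_seat(self, seat_id):
        """Take a seat out of the available set (e.g. meteor damage)."""
        del self._seats[seat_id]

    def find_seat(self, seat_id):
        return self._seats.get(seat_id)


class Patron:
    """Information about one ticket buyer."""

    def __init__(self, name, phone):
        self.name = name
        self.phone = phone


class PatronData:
    """Container that organizes Patron instances."""

    def __init__(self):
        self._patrons = {}          # name -> Patron

    def add_patron(self, patron):
        self._patrons[patron.name] = patron

    def find_patron(self, name):
        return self._patrons.get(name)


class TicketAgency:
    """Knows only which patron holds which seat."""

    def __init__(self, venue, patron_data):
        self.venue = venue
        self.patron_data = patron_data
        self._sold = {}             # seat_id -> patron name

    def sell(self, seat_id, patron_name):
        if self.venue.find_seat(seat_id) is None:
            raise KeyError(f"no such seat: {seat_id}")
        if self.patron_data.find_patron(patron_name) is None:
            raise KeyError(f"no such patron: {patron_name}")
        if seat_id in self._sold:
            raise ValueError(f"seat already sold: {seat_id}")
        self._sold[seat_id] = patron_name

    def holder_of(self, seat_id):
        return self._sold.get(seat_id)


venue = Venue()
venue.add_seat(Seat("A1", 40.0, on_aisle=True))
venue.add_seat(Seat("A2", 40.0))
patrons = PatronData()
patrons.add_patron(Patron("Pat", "555-0100"))
agency = TicketAgency(venue, patrons)
agency.sell("A1", "Pat")
venue.remove_seat("A2")             # the meteor strike is Venue's concern
```

Note that `TicketAgency` never stores a price or a phone number; it asks `Venue` and `PatronData`, which keeps each fact single-sourced.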

4. Cleanroom overview

Development must start from a firm written specification. The Cleanroom method is intended to insure that the code conforms to the specification.

You will start by describing the interface to a module (program or piece of a program) using intended function notation. Then you will write the code in such a way that it clearly implements the intended function.


We use the term prime, short for primary program refinement, to describe each of the pieces, the black boxes of our design.

Here is the flow of the design and verification of one prime.

1. Write the intended function for the prime.

2. Write the code that implements that intended function.

Sometimes the prime will be a single block of code with no control flow. However, more generally, the code will require some control logic. Only four control structures are permitted, and each is associated with a procedure for proving its correctness.

a. Sequence: Do A, then do B.

b. Alternation: If A is true, then do B, otherwise do C.

c. While loop: While A is true, do B.

d. Definite iteration: For every x in S, do f(x).

For each of these control structures, you will write an intended function for the pieces (such as the body of a while loop) and write the code for that piece to conform to that intended function.

3. The designer may self-review the code by various methods, such as trace tables that show the state at various locations in the code.

4. The code is verified by peer review. Unless every member of the review panel is completely convinced that the code matches the intended function, the code is sent back for rework.

Notice that the peer review is much more stringent than classical design reviews. Your author has been through quite a few of those while working in industry. In the classical process, whoever wrote the code presents it to peers, who strive valiantly to stay awake. After droning on for a few hundred lines, the presenter says, “Any questions?” If you ask too many questions, especially as lunchtime approaches, you may find your coworkers glaring at you.

No, in a Cleanroom peer review, the code must be obvious or it gets sent back for rework. This has the advantage of preventing unnecessary cleverness. In the author's experience, cleverness is seldom really required.

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.

— Brian Kernighan, professor at Princeton University, one of the inventors of Unix, and co-author of The C Programming Language.

More people looking at the code will catch more mistakes, and earlier in the process. Anyone who has done much proofreading will tell you that authors have the same blind spots during proofreading that they had while writing, so a different viewpoint helps find and clean up defects.

The methodology also motivates you to divide things into more, smaller pieces. Monster procedures are hard to verify. Peer review of a small, well-defined module doesn't take very long.
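The four control structures permitted in step 2 of the flow above map directly onto Python statements. The sketch below is only an illustration: the function `summarize`, its parameters, and its data are hypothetical, and the comments name the control structures rather than giving formal intended functions.

```python
def summarize(readings, limit):
    """Hypothetical example exercising each permitted control structure.

    Assumes readings is a list of integers and limit >= 0, so that the
    while loop below is guaranteed to terminate.
    """
    # Sequence: do A, then do B.
    total = 0
    largest = None

    # Definite iteration: for every x in S, do f(x).
    for x in readings:
        total += x
        # Alternation: if A is true, then do B, otherwise do C.
        if largest is None or x > largest:
            largest = x
        else:
            pass  # nothing to do in the other case

    # While loop: while A is true, do B.
    halvings = 0
    while total > limit:
        total //= 2
        halvings += 1

    return total, largest, halvings


print(summarize([3, 9, 4], 5))
```

Each structure is small enough that a reviewer can check it in isolation, which is the point of restricting the design to these four forms.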

• Section 5, “Intended function notation” (p. 8)

• Section 6, “Proof rules and the stepwise refinement process” (p. 19)

• Section 8, “Trace tables” (p. 26)

• Section 10, “The peer review process” (p. 34)

• Section 11, “Testing” (p. 36)


5. Intended function notation

Every prime (piece of logic) is accompanied by a description of what it does using intended function notation. Typically the intended function accompanies the source code as a comment, and is placed inside [ square brackets ]. This description is essential to all steps of the method, and completely describes the interface to that prime.

Formally, an intended function describes a piece of code as a set of simultaneous changes to the state of one or more state items. State items include:

• The values of variables.

• Attributes of an object, such as the text color of a button on a graphical user interface.

• Files: not just their contents, but the current position if open, the states of buffers, and every other stateful aspect of file handling.

• External devices and entities such as display screens, servers, TCP/IP sockets, freight trains, medical radiation machines, and so forth.

We break our discussion of intended function construction into three parts.

• Section 5.1, “Simple intended functions” (p. 8): When the prime always does the same thing.

• Section 5.2, “Preconditions: The other side of the contract” (p. 11).

• Section 5.3, “Compound intended functions” (p. 12): When the prime may do different things in different cases.

5.1. Simple intended functions

By simple intended function we mean a set of state changes that are made in every case. If the prime does different things in different cases, see Section 5.3, “Compound intended functions” (p. 12).

Here is a small example showing an intended function for two lines of code. The intended function lives in a comment preceding the code, and is placed in [ square brackets ] to distinguish it from other kinds of comments.

# [ x, y := 0, y+1 ]
x = 0
y += 1

This form of the intended function follows Dr. Stavely's book [13]. On the left of the “:=” symbol is a list of state items whose state will be changed by the code. On the right is a list of the new values for each state item, in the same order.

Important

It is critical to write your intended function so that conceptually all state changes happen in parallel. This is especially important when one of the state items on the left side of the “:=” also appears on the right side.

If a state item that appears on the left side of a “:=” also appears on the right side of the same intended function, its value in expressions on right-hand sides always refers to its old value, before the execution of that block.
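The "old value" rule can be checked mechanically by saving the state before a block runs. The sketch below does this for the example above and then for a second, hypothetical intended function, [ a, b := b, a + b ], chosen because a naive line-by-line translation gets it wrong. The variable names and starting values are assumptions.

```python
# State before the block runs.
x, y = 7, 41
old_x, old_y = x, y

# [ x, y := 0, y+1 ]
x = 0
y += 1

# Every right-hand side refers to the old state.
assert (x, y) == (0, old_y + 1)

# [ a, b := b, a + b ]  read in parallel: b gets old a + old b.
a, b = 2, 3
old_a, old_b = a, b
a, b = b, a + b          # tuple assignment evaluates the right side first
assert (a, b) == (old_b, old_a + old_b)

# A naive sequential translation uses the NEW value of a2 and does not
# match the intended function.
a2, b2 = 2, 3
a2 = b2
b2 = a2 + b2             # uses new a2 (3), giving 6 instead of 5
assert (a2, b2) == (3, 6)
```

Python's tuple assignment happens to match the parallel reading; a sequence of single assignments matches it only when no right-hand side depends on an item assigned earlier in the block.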

[13] http://www.pearsonhighered.com/educator/product/Toward-Zero-Defect-Programming/9780201385953.page


Consider the simple task of exchanging the values of two items x and y. In Python, this statement works just fine:

x, y = y, x

However, let's assume for the moment that the person who had to write the code came from a C background and assumes you need a temporary variable to exchange two values. Here is a first attempt at the intended function and code.

# [ x, y := y, x ]
temp = x
x = y
y = temp

When you read the intended function above, the two assignments happen conceptually in parallel, so we could read it as “x gets the old value of y, and y gets the old value of x.”

However, the above example violates one of the cardinal Cleanroom rules:

The intended function must match the code.

Here is the corrected intended function and code:

# [ x, y, temp := y, x, x ]
temp = x
x = y
y = temp

The code affects three state items, not two: after execution, the value of temp is the old value of x. Unless you add the line “del temp”, that value stays around, and it must be accounted for during verification.
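One way to see which state items a block really touches is to run it against a dictionary of variables and compare the result with the starting state. The helper `changed_items` below is hypothetical and is not part of the Cleanroom method: it tests a single starting state rather than proving anything, and it does not report names that the block deletes.

```python
def changed_items(code, before):
    """Run code against a copy of the state in `before` and return the
    names whose values were created or changed."""
    after = dict(before)
    exec(code, {}, after)           # assignments land in the `after` dict
    return {name: value for name, value in after.items()
            if name not in before or before[name] != value}


swap = "temp = x\nx = y\ny = temp\n"
print(changed_items(swap, {"x": 1, "y": 2}))
```

With the starting state x = 1, y = 2, the result includes `temp` alongside `x` and `y`; appending the line `del temp` to the block leaves only `x` and `y`, matching the two-item intended function.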

Life is relatively simple when we're just dealing with the values of variables. The fun begins when we consider input and output, error recovery, and the other baggage of living in a world where the user may be clueless or downright evil.

Let's consider the classic hello-world program. Here is a complete script:

#!/usr/bin/env python
# [ sys.stdout := sys.stdout + (a cheery greeting) ]
print "Welcome to the Batley Townswomen's Guild!"

Although we haven't explicitly imported sys here, the destination for the print statement is the same as sys.stdout, so the notation is clear to anyone who knows Python. The “+” operator here means string concatenation. We might read the intended function as: “Whatever was in standard output before, it's still there, but our cheery greeting has been added to the end of it.”
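The concatenation reading can be tried out by capturing standard output in memory. This sketch uses Python 3 (the `print()` function and `contextlib.redirect_stdout`), unlike the Python 2 script above; the names `fake_stdout`, `old`, and `new` are assumptions.

```python
import contextlib
import io

GREETING = "Welcome to the Batley Townswomen's Guild!"

fake_stdout = io.StringIO()
with contextlib.redirect_stdout(fake_stdout):
    print("earlier output")
    old = fake_stdout.getvalue()    # whatever was in standard output before
    # [ sys.stdout := sys.stdout + (a cheery greeting) ]
    print(GREETING)
    new = fake_stdout.getvalue()

# The old contents are still there, with the greeting added at the end.
assert new == old + GREETING + "\n"
```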

Here's an example of an input operation.

# [ inFile, rest := inFile advanced to end of file,
#                   all remaining data from inFile ]
rest = inFile.read()
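Both state changes named in this intended function can be observed with an in-memory file. In the sketch below, `io.StringIO` stands in for a real file, and the sample text is made up.

```python
import io

inFile = io.StringIO("line one\nline two\nline three\n")
inFile.readline()                 # consume the first line beforehand

# [ inFile, rest := inFile advanced to end of file,
#                   all remaining data from inFile ]
rest = inFile.read()

assert rest == "line two\nline three\n"   # only the remaining data
assert inFile.read() == ""                # inFile is now at end of file
```

The second assertion is the reason inFile appears on the left of the “:=”: reading changes the file's position, which is part of its state.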

There's a lot more to programming than simple, single operators. When your author was learning this method, he tried to express everything in an algebraic way, but this can make the code harder to read when the application is not primarily an algebra problem.

Writing intended functions is a writing task more than it is a programming task, and your descriptions of the new value of a state item are prose, not code. Here are some examples of intended functions that are quite clear as English.

9Python by contract: a zero-defect approachZoological Data Processing

shoppingList := a set of the unique ingredients from all
                the recipes in recipeTable
horse_list := horse_list with new_horse added
scrollBar := scrollBar positioned one page further down
             but no further than the bottom of the page
runButton := runButton grayed out and unresponsive to the mouse
locomotive := locomotive with emergency braking applied
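To show how such prose descriptions pair with code, here is a minimal sketch of the first two examples. The structure of `recipeTable` (a dictionary mapping recipe names to ingredient lists) and all the sample data are assumptions; the text does not say how these items are represented.

```python
# Hypothetical data: recipe name -> list of ingredient names.
recipeTable = {
    "pancakes": ["flour", "eggs", "milk"],
    "omelette": ["eggs", "butter"],
}

# [ shoppingList := a set of the unique ingredients from all
#                   the recipes in recipeTable ]
shoppingList = {ingredient
                for ingredients in recipeTable.values()
                for ingredient in ingredients}

horse_list = ["Dobbin"]
new_horse = "Trigger"

# [ horse_list := horse_list with new_horse added ]
horse_list.append(new_horse)
```

In Python 3 the comprehension's loop variables do not leak into the enclosing scope, so `shoppingList` is the only state item the first block changes.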

You may find that the larger pieces of your design are easier to describe in English, and that the intended functions look more like math as you get closer to the smaller pieces of your design, but this is only a generalization. Use any combination of mathematical notation and English that you think best expresses what the program does. This ability improves with practice, especially when you practice peer verification: in that situation what counts is clarity, clarity that is perceived by the people who didn't write the code.

As with any kind of writing, if you find yourself stuck for a good way to phrase things, try explaining its purpose out loud, to a friend, or to a broom if no one is around. Then write down what you said. It may not be perfect, but it gives you something to work with. The author has found that this method tends to damp down his tendency toward overblown rhetoric (somewhat, at least).

The author deviates from Stavely's [14] intended function notation in a few unimportant ways.

1. The hello-world intended function shown earlier could be written in a way that mimics Python's “augmented assignment” convention. The formulation below means the same thing, and can be read the same way: “Whatever was in sys.stdout before, it's still there, but now it has our cheery greeting tacked on.”

# [ sys.stdout +:= (a cheery greeting) ]

2. Often when an intended function names several state items, the author will break up the intended function into multiple lines, each one with a “:=” separating the state item from its new value. It is important to remember that all the state changes happen conceptually in parallel, and the order of these lines does not matter.

For example, in this intended function:

# [ year, month, day, fracDay := year part of date,
#     month part of date, day part of date, fractional day
#     from date as a float ]

you have to look carefully to see which state item goes with which new state. The author would break this up so the associations are clear:

# [ year := year part of date
#   month := month part of date
#   day := day part of date
#   fracDay := fractional day from date as a float ]

Still, during verification, all of the changes displayed on multiple lines are considered to happen simultaneously. The order in which you happen to arrange multiple state changes in one simple intended function is not and cannot be important.

3. Programs do a number of things that do not really fit the model of a state item being changed: they terminate; they raise exceptions; functions return values or just return. For the handling of these cases, see Section 5.5, “Special forms for intended functions” (p. 14).
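The parallel reading described in item 2 has a loose analogue in Python's tuple assignment, where every right-hand side is evaluated before any name is rebound. The sketch below is only an illustration: the helper name split_date and the use of the datetime module are assumptions, not part of the original design.

```python
from datetime import datetime

def split_date(date):
    """[ date is a datetime.datetime ->
           return (year, month, day, fracDay) taken from date ]"""
    # [ year := year part of date
    #   month := month part of date
    #   day := day part of date
    #   frac_day := fraction of the day elapsed, as a float ]
    # All four right-hand sides are evaluated before any assignment,
    # so the order of the entries does not affect the result.
    year, month, day, frac_day = (
        date.year,
        date.month,
        date.day,
        (date.hour * 3600 + date.minute * 60 + date.second) / 86400.0,
    )
    return year, month, day, frac_day

# A swap makes the "simultaneous" evaluation visible.
a, b = 1, 2
a, b = b, a
```

Writing the swap as two separate statements (a = b, then b = a) would not give the same result, which is why the intended function has to be read as one simultaneous change.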

14 http://www.pearsonhighered.com/educator/product/Toward-Zero-Defect-Programming/9780201385953.page


All the intended functions above are what we call simple intended functions: the code always makes the same set of state changes. Before we talk about compound intended functions, for code that may do different things in different cases, first read Section 5.2, “Preconditions: The other side of the contract” (p. 11).

5.2. Preconditions: The other side of the contract

So far we have talked about how intended functions describe the semantics of a prime: what the code does. However, your code may not be able to guarantee all the conditions necessary for successful operation. This means that you may specify preconditions in order to shift the burden of certain conditions onto the caller.

Let's take for example a function sqrt(x) that computes the square root. Here's our first attempt:

# [ y := square root of x ]
y = math.sqrt(x)

However, what if the caller passes a negative argument? Worse yet, what if the caller passes a string argument? We're expecting x to be a number!

The way to handle this is to specify one or more preconditions. A precondition is a statement that must be true for the code to work properly. Here's our revised intended function for sqrt(x).

# [ x is a nonnegative number ->
#     y := square root of x ]

Another name for this convention is partial specification. We define what our function does if x is a nonnegative number, but we don't define what it does for any other cases. The precondition precedes the rest of the intended function, followed by “->”.

Another way to read this intended function is as a contract:

If the caller ensures that all preconditions are true, the prime guarantees to make the specified changes in state items.

Recall Stan Kelly-Bootle's definition of an interface in Section 2, “The contract-based approach to program construction” (p. 2)? With our improved intended function, we can apportion the blame for malfunctions quite clearly. If the caller supplied a nonnegative number and didn't get the right result, it's the fault of the sqrt function. If the caller didn't supply a nonnegative number, it's the caller's fault if the result is wrong.
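The contract itself does not oblige the function to check its precondition. Still, during development it can be handy to make a violation visible, so that the blame lands on the caller right away. A minimal sketch, assuming an assert statement is an acceptable debugging aid; the name contract_sqrt is hypothetical:

```python
import math

def contract_sqrt(x):
    """[ x is a nonnegative number ->
           return the square root of x ]"""
    # Optional debugging aid, not part of the contract: if this fails,
    # the caller broke the precondition and the fault is the caller's.
    assert isinstance(x, (int, float)) and x >= 0, "precondition violated"
    return math.sqrt(x)

print(contract_sqrt(9))   # 3.0
```

If the assertion passes and the result is still wrong, the fault lies inside contract_sqrt.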

Here's an example of multiple preconditions: an intended function for the atan2(y,x) function in the standard Python math module.

# [ (y is a number) and (x is a number) ->
#     return the angle that a vector from the origin to
#     (x,y) makes relative to the x-axis, in radians, in
#     the range [-pi, pi] ]

What about functions and methods with optional arguments? Here's an example intended function to describe the s.rjust(n[, fill]) method of the Python str type.

# [ (s is a string) and (n is a positive int) and
#   (fill is a string) ->
#     return s, left-padded to length n with fill (defaulting
#     to spaces) ]


There are other ways to write this intended function. For example, you could write the function signature as “s.rjust(n, fill=' ')”, in which case the intended function wouldn't even have to mention the default value process because it is built into the Python semantics for function calls: if you leave out the second argument to .rjust(), it's the same as if you passed a space for that argument.
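To make the point about default arguments concrete, here is a small stand-alone sketch. The free function rjust below is a hypothetical stand-in written for illustration, not the library's implementation; it assumes fill is a single character.

```python
def rjust(s, n, fill=' '):
    """[ (s is a string) and (n is a positive int) and
         (fill is a one-character string) ->
           return s, left-padded to length n with fill ]"""
    # A negative repeat count yields '', so s comes back unchanged
    # when it is already n characters or longer.
    return fill * (n - len(s)) + s

print(repr(rjust('ab', 5)))        # '   ab'
print(repr(rjust('ab', 5, '*')))   # '***ab'
```

Because the default lives in the signature, the intended function does not need a separate clause for the missing-argument case.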

5.3. Compound intended functions

When the semantics of a block of code are different depending on runtime conditions, we use a compound intended function. This is basically an if-then-else structure on top of a set of simple intended functions.

Formally:

simple intended function
A list of state changes that happen simultaneously. Each state change is an ordered pair (v_i, e_i) where v_i is some state item that changes in the prime, and e_i is a description of the new value.

compound intended function
A compound intended function is a set of simple intended functions S0, S1, …, and a set of one or more conditions C0, C1, … that select one of the intended functions.

Conceptually, the execution of a prime with a compound intended function proceeds in two steps:

1. The conditions are examined to see which simple intended function applies in this case.

2. The state changes for the selected simple intended function occur simultaneously.

Here is the general form as it might appear in the code:

# [ if C0 ->
#     S0
#   else if C1 ->
#     S1
#   else if ...
#     ...
#   else ->
#     Sn ]

Each Ci is a description of some condition, and each Si is a simple intended function.

The relationship between the conditions and the simple intended functions can be expressed as a truth table. Here is an abstract example:

# [ if C0 ->
#     S0
#   else if C1 ->
#     S1
#   else ->
#     S2 ]

Here is a presentation of that case structure as a truth table.

C0   C1   Case
T    T    S0
T    F    S0
F    T    S1
F    F    S2

Note that two lines of the above truth table have the same simple intended function. Using the convention from digital logic that “X” means “don't care”, the above truth table can be reduced to three rows:

C0   C1   Case
T    X    S0
F    T    S1
F    F    S2
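The full truth table can also be produced mechanically by enumerating every combination of the conditions. The following sketch assumes two Boolean conditions and uses the hypothetical helper name select_case; it mirrors the abstract if / else if / else structure shown earlier.

```python
from itertools import product

def select_case(c0, c1):
    """Return the name of the simple intended function selected by
    'if C0 -> S0, else if C1 -> S1, else -> S2'."""
    if c0:
        return "S0"
    elif c1:
        return "S1"
    else:
        return "S2"

# Map every (C0, C1) combination to the case it selects.
truth_table = {(c0, c1): select_case(c0, c1)
               for c0, c1 in product([True, False], repeat=2)}

for (c0, c1), case in truth_table.items():
    print(c0, c1, case)
```

Both rows with C0 true map to S0, which is what the “don't care” entry in the reduced table expresses.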

Here is a concrete example: an intended function for a call to Python's str.isdigit() method.

# [ if (s has at least one character) and
#      (all of s's characters are digits) ->
#     digitCheck := True
#   else ->
#     digitCheck := False ]
digitCheck = s.isdigit()

In more complex cases, if you prefer, you may nest “if” and “else” clauses. However, be sure that you have covered all possible cases. Any case that is not covered is undefined, but this is not good practice; better practice is to use preconditions to ensure that all cases are covered. A truth table is a good way to be sure you have considered all the edge cases.
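One informal way to look for uncovered cases is to write the compound intended function as a small model and compare it with the real code on a handful of edge cases. The sketch below does this for the isdigit() example; the function name digit_check_model and the choice of samples are assumptions, and the model leans on isdigit() for single characters, so it only tests the "at least one character" and "all characters" structure.

```python
def digit_check_model(s):
    """Model of the compound intended function for s.isdigit()."""
    if len(s) >= 1 and all(c.isdigit() for c in s):
        return True
    else:
        return False

# Hand-picked edge cases: empty, one digit, all digits, mixed, leading space.
samples = ["", "7", "2012", "20x12", " 42"]
mismatches = [s for s in samples if digit_check_model(s) != s.isdigit()]
print(mismatches)   # []
```

A check like this supplements peer verification and does not replace it; it can only show disagreement on the samples that were tried.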

5.4. Useful metaphors: identity and “anything”

Certain metaphors are useful in constructing compound intended functions.

I
The identity transform. Used as the consequence of an “if” or “else” clause when there are no state changes in that case.

Here's an example:

# To survive in a cubicle farm, keep your head below the top of
# the partition. (-hp- folklore ca. 1974)
# [ if height > wall.maxHeight ->
#     height := wall.maxHeight
#   else ->
#     I ]
height = min(height, wall.maxHeight)

The intended function says that if height <= wall.maxHeight, nothing changes.

... := (anything)
If you specify that the new value of a state item is “(anything)”, you are saying that the value of that state item is unreliable or unimportant.

Here's an example. Suppose that your script is supposed to update a file named bio-file, but in some error conditions, it messes up the contents of the file. We might write the intended function for the script in this way:


# [ if new-data file is valid ->
#     file bio-file := file bio-file updated using new-data
#     sys.stderr +:= message indicating success
#   else ->
#     file bio-file := (anything)
#     sys.stderr +:= message indicating failure ]
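Returning to the identity transform I: the cubicle example can be rewritten as a small function whose two cases are easy to exercise. This is only a sketch; the name clamp_height and the plain numeric arguments are assumptions made so the example runs without a wall object.

```python
def clamp_height(height, max_height):
    """[ if height > max_height ->
           return max_height
         else ->
           return height ]"""
    # min() covers both cases: when height <= max_height the value
    # passes through unchanged, which is the identity case.
    return min(height, max_height)

print(clamp_height(5, 10))    # 5
print(clamp_height(12, 10))   # 10
```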

5.5. Special forms for intended functions

Some program semantics don't fit the model of changes to a state item. Here are some exceptions and suggestions as to how to write their intended functions.

Program termination
If a program terminates, that's a rather important operation! Here's an example. Quite often, the first thing a script will do is to process and check the command line arguments. If they aren't valid, there is no point in continuing execution. Here is an example of an intended function for that process.

# [ if the command line arguments are valid ->
#     args := an argparse.Namespace instance representing
#             the values of those arguments
#   else ->
#     sys.stderr +:= (usage message) + (error message)
#     stop execution ]

The last line above has no “:=” in it, but it's pretty important. Still, the idea of parallel execution of all the state changes holds: conceptually, our usage and error messages get sent to sys.stderr at exactly the same time as our program terminates.

Function return
Use a line of the form “return E” where E is a description of the return value. For a Python function that falls off the bottom, or whose return does not state a value, use “return None”.

Exceptions
Raising an exception is similar to stopping execution, in that it interrupts the control flow. Use the verb “raise”. Any other lines of the intended function are conceptually executed simultaneously with the raising of the exception. Here's an example.

# [ if x >= 0 ->
#     return the square root of x
#   else ->
#     sys.stderr +:= (error message)
#     raise ValueError ]

Generators
Use the verb “generate” for functions, or “yield” for yield statements. Here is an example:

def updown(n):
    ''' [ n is a nonnegative integer ->
            generate the sequence 0, 1, 2, ..., n, n-1, n-2, ..., 0 ]
    '''
    #-- 1
    # [ generate 0, 1, 2, ..., n-1 ]
    for k in range(n):
        yield k

    #-- 2
    # [ generate n, n-1, n-2, ..., 0 ]
    for k in range(n, -1, -1):
        yield k
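As a quick sanity check on the “generate” form, the generator can be run for small n and its output compared with the sequence named in its intended function. The block repeats the updown definition so that it stands alone; the particular values of n are arbitrary choices.

```python
def updown(n):
    """[ n is a nonnegative integer ->
           generate the sequence 0, 1, 2, ..., n, n-1, n-2, ..., 0 ]"""
    # [ generate 0, 1, 2, ..., n-1 ]
    for k in range(n):
        yield k
    # [ generate n, n-1, n-2, ..., 0 ]
    for k in range(n, -1, -1):
        yield k

print(list(updown(3)))   # [0, 1, 2, 3, 2, 1, 0]
print(list(updown(0)))   # [0]
```

The n = 0 case is worth including because the first loop generates nothing there, and the result must still be the single value 0.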

5.5.1. Terminal special forms

In the case of all the special forms described in Section 5.5, “Special forms for intended functions” (p. 14), except for the “generate” and “yield” forms, execution does not fall through to the code that follows that prime.

• If the prime is a function return, execution continues in the caller after the point where the function was called.

• If the prime terminates execution, nothing executes after that prime.

• If the prime raises an exception, execution may pass to a try-except construct at some higher level, or, if there is no handler for that exception, execution terminates.

We call such special forms terminal special forms because they may make the following code unreachable. This will be an issue during construction of trace tables.

5.6. Intended function examples

To illustrate some typical intended functions, here are some examples from Python and its libraries. In each example, the wording before the code fragment is taken from the official Python documentation.

any(I)
Return True if any element of the iterable I is true. If the iterable is empty, return False.

# [ i is an iterable ->
#     if there exists any element k in i such that bool(k) is True ->
#       test := True
#     else ->
#       test := False ]
test = any(i)

The precondition stipulates that the argument must be an iterable. This intended function covers the case that i is empty, because if i is empty, there does not exist any element k such that bool(k) is true.

all(I)
Return True if all elements of the iterable are true (or if the iterable is empty).

# [ i is an iterable ->
#     if (i is empty) or (bool(k) is true for all k in i) ->
#       test := True
#     else ->
#       test := False ]
test = all(i)

Here we see a compound condition: there are two True cases, one case where the iterable is empty, and another case where the iterable is nonempty and all its members are true.


s.islower()
Return True if all cased characters in s are lowercase and there is at least one cased character in s, False otherwise.

# [ s is a str or unicode value ->
#     if (s contains at least one cased character) and
#        (all cased characters in s are lowercase) ->
#       test := True
#     else ->
#       test := False ]
test = s.islower()

str.ljust(width[, fillchar])
Return the string left justified in a string of length width. Padding is done using the specified fillchar (default is a space). The original string is returned if width is less than len(s).

# [ (s is a str or unicode) and (width is an integer) and
#   (fillchar is a one-character str or unicode, defaulting to
#   one space in the same type as s) ->
#     if width < len(s) ->
#       a := s
#     else ->
#       a := s + (width-len(s) copies of fillchar) ]
a = s.ljust(width, fillchar)

There are three parts to the precondition. The first two constrain the types of the s and width arguments. The third part constrains both the type and the length of the fillchar argument, and also specifies the default value.

list.reverse()
Reverses the elements of a list in place.

# [ L is a list ->
#     L := L with its elements reversed
#     x := None ]
x = L.reverse()

The list.reverse() method doesn't return an explicit result, so by the Python convention, if you assign the value of this expression to a variable, that variable is set to None.

list.pop([index])
Remove and return the item at index (default, the last item). Raises IndexError if the list is empty or index is out of range.

# [ (L is a list) and (i is an int, defaulting to -1) ->
#     if -len(L) <= i < len(L) ->
#       k := L[i]
#       L := L with element [i] removed
#     else -> raise IndexError ]
k = L.pop(i)

The two preconditions constrain the types and values of the arguments, and specify the default value of i. Note that this method modifies two state items: it returns a value and it also modifies the original list. The else clause illustrates the special form for cases that raise an exception.


This intended function was written under the assumption that the reader is familiar with the Python convention regarding negative indices: L[-1] is the last element of L, L[-2] is the penultimate element, and so on; in general, for negative k, L[k] is equivalent to L[len(L)+k].

If we were writing the intended function for readers who might not be familiar with that convention, we could be more explicit:

# [ (L is a list) and (i is an int, defaulting to -1) ->
#     if 0 <= i < len(L) ->
#       k := L[i]
#       L := L with element [i] removed
#     else if -len(L) <= i < 0 ->
#       k := L[len(L)+i]
#       L := L with element [len(L)+i] removed
#     else -> raise IndexError ]
k = L.pop(i)
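The explicit form can be turned into a small model and compared against list.pop() over a range of indices, including out-of-range ones. This is a sketch under stated assumptions: the names pop_model and agrees_with_builtin are hypothetical, and the model returns the new list instead of mutating its argument. Negative indices are bounded below by -len(L), matching the range given in the first version of the intended function.

```python
def pop_model(L, i=-1):
    """Model of the explicit intended function for L.pop(i).

    Returns (k, new_L) instead of mutating L; raises IndexError
    when i is outside the range -len(L) <= i < len(L).
    """
    if 0 <= i < len(L):
        pos = i
    elif -len(L) <= i < 0:
        pos = len(L) + i
    else:
        raise IndexError("pop index out of range")
    return L[pos], L[:pos] + L[pos + 1:]

def agrees_with_builtin(L, i):
    """True if pop_model and list.pop behave the same way for (L, i)."""
    actual = list(L)
    try:
        # pop() runs first, so the second tuple item is the mutated list.
        expected = (actual.pop(i), actual)
    except IndexError:
        expected = IndexError
    try:
        got = pop_model(list(L), i)
    except IndexError:
        got = IndexError
    return got == expected

print(all(agrees_with_builtin([10, 20, 30], i) for i in range(-5, 6)))   # True
```

Indices such as -4 for a three-element list are the interesting ones: the model must raise IndexError there, not wrap around.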

5.7. The “let” convention

Sometimes we find that the same phrasing occurs at several positions within an intended function. To reduce repetition, and also to bring out the fact that we are talking about the same value, you may wish to define a sort of temporary shorthand name for a phrase in a construct with this general form:

# [ let
#     name1 = phrase1
#     name2 = phrase2
#   in ->
#     simple or compound intended function ]

For example, suppose a prime appends the contents of one named file to another, and their names in the code are inFileName and outFileName. Here is its intended function:

# [ file named by outFileName := (file named by outFileName) +
#       (contents of file named by inFileName) ]

Now suppose we complicate the design by stating that the input file is the one named by inFileName, but defaulting to standard input if inFileName is None; and the output file is either the one named by outFileName, or the standard output stream if outFileName is None. The intended function has now become quite unwieldy.

Using the let convention, we can write it by defining two shorthand names for the effective input and output files.

# [ let
#     in-file == the file named by inFileName, defaulting to sys.stdin
#                if inFileName is None
#     out-file == the file named by outFileName, defaulting to sys.stdout
#                 if outFileName is None
#   in ->
#     out-file := out-file + (contents of in-file) ]
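For readers who want to see what a prime satisfying that intended function might look like, here is one possible sketch. The function name append_file, the text-mode file handling, and the cleanup policy are all assumptions; the original text gives only the intended function, not an implementation.

```python
import sys

def append_file(inFileName=None, outFileName=None):
    """[ let
           in-file == the file named by inFileName, defaulting to
                      sys.stdin if inFileName is None
           out-file == the file named by outFileName, defaulting to
                       sys.stdout if outFileName is None
         in ->
           out-file := out-file + (contents of in-file) ]"""
    # in-file and out-file become ordinary local names in the code.
    in_file = sys.stdin if inFileName is None else open(inFileName)
    out_file = sys.stdout if outFileName is None else open(outFileName, "a")
    try:
        out_file.write(in_file.read())
    finally:
        # Close only the files this function opened itself.
        if in_file is not sys.stdin:
            in_file.close()
        if out_file is not sys.stdout:
            out_file.close()
```

The two shorthand names in the let clause map directly onto two local variables, which is one reason the convention tends to make later verification easier to follow.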


5.8. Specification functions

In Section 5.7, “The “let” convention” (p. 17) we showed how shorthand names for lengthy phrases can make intended functions shorter or clearer.

This sort of shorthanding can also be useful when the same phrase occurs in different intended functions of the same program. Following Stavely [15], the author calls these shorthand notations specification functions.

Because they are purely for notation purposes, and not part of the actual code, the author prefers to define each specification function in a comment, generally all together and alphabetized for easy lookup.

Here is an example. Python's built-in sorted() function returns a list containing the elements of an iterable in some specified ordering. The prototype looks like this:

sorted(sequence[, cmp=None[, key=None[, reverse=None]]])

sequence
The sequence to be sorted, any Python iterable.

cmp
If provided, this is a function that defines the ordering. It must accept two arguments and return a negative number if the first argument is to be considered less than the second; zero, if they are to be considered equal; or a positive number if the first is greater.

key
If provided, this is a function that defines the key to be used in sorting (the default is to use the elements themselves). It takes one argument, whose type is the type of the elements of the sequence, and returns a value that may be of a different type, which we will call the key domain.

If both cmp and key functions are provided, the cmp() function must operate on values in the key domain.

If no key function is provided, the cmp() function operates on the type of the elements of sequence.

reverse
If this argument is true, the result is in the reverse order of the ordering specified by the other arguments.

An honest intended function for the sorted() function must describe, among other things, two interfaces: the interface to the function passed to the cmp argument, and the interface to the function passed as the key argument.

To solve this problem, the author would define two specification functions describing these two interfaces. The author also uses a specific typographic convention for their names: each is made of two or more words connected with hyphens. This clearly distinguishes them from Python names, since they are never used in actual code anyway.

#================================================================
# Specification functions
#----------------------------------------------------------------
# comparator-function ==
#   a function whose signature is
#     f(x, y)
#   and whose intended function is
#     [ if x < y ->
#         return a negative number
#       else if x == y ->
#         return 0
#       else ->
#         return a positive number ]
#----------------------------------------------------------------
# key-function ==
#   a function of one argument that returns a single value
#----------------------------------------------------------------

15 http://www.pearsonhighered.com/educator/product/Toward-Zero-Defect-Programming/9780201385953.page

Then we can write the intended function for sorted() like this.

[ (sequence is an iterable) and
  (cmp is a comparator-function, defaulting to the built-in
   Python cmp() function) and
  (key is a key-function, defaulting to the identity function) ->
    if bool(reverse) ->
      return a list containing the elements of sequence,
      ordered using cmp(key(x),key(y)) for any pair x,y,
      in descending order
    else ->
      return a list containing the elements of sequence,
      ordered using cmp(key(x),key(y)) for any pair x,y,
      in ascending order ]

As always, consider your audience when writing intended functions. The above example presupposes that the reader understands a little about how generic sort functions work, and that the ordering they produce is ultimately defined by a mechanism to compare two arbitrary values.

You may also use specification functions to define shorthand names as discussed in Section 5.7, “The “let” convention” (p. 17).

You may also use parameterized specification functions. These use standard functional notation. For example:
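The prototype above is the Python 2 one. In Python 3, sorted() no longer accepts a cmp argument, but functools.cmp_to_key() can adapt a comparator-function into a key. The sketch below shows one way to reproduce the “cmp(key(x),key(y))” ordering under that assumption; the names compare and sorted_with are hypothetical.

```python
from functools import cmp_to_key

def compare(a, b):
    """A comparator-function: negative if a < b, zero if a == b,
    positive if a > b."""
    return (a > b) - (a < b)

def sorted_with(sequence, cmp=compare, key=lambda v: v, reverse=False):
    """Return a list of the elements of sequence, ordered using
    cmp(key(x), key(y)), descending if reverse is true."""
    wrap = cmp_to_key(cmp)
    # Each element is first mapped into the key domain, then wrapped
    # so that comparisons are delegated to the comparator-function.
    return sorted(sequence, key=lambda v: wrap(key(v)), reverse=bool(reverse))

words = ["pear", "fig", "banana", "kiwi"]
print(sorted_with(words, key=len))                 # ['fig', 'pear', 'kiwi', 'banana']
print(sorted_with(words, key=len, reverse=True))   # ['banana', 'pear', 'kiwi', 'fig']
```

Note that the comparator only ever sees values in the key domain (here, string lengths), as the description of the key argument requires.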

#--
# ref-key(ref) == ref.word + "|" + ref.suffix + "|" + ref.prefix
#--
#   The key value used to order one reference to a keyword, where
#   ref is an instance of the KwicRef class.
#--

This is taken from kwic.py: A Python module to generate a Key Word In Context (KWIC) index [16]. The ref-key specification function describes how instances of a certain class are ordered according to the concatenation of three attributes of the instance, separated by vertical bars.
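A parameterized specification function often corresponds to a small real function in the code. The sketch below shows how ref-key might look as a sort key. The KwicRef defined here is a hypothetical stand-in with only the three attributes named in the specification function; the actual class in kwic.py may differ, and the sample data is invented.

```python
from collections import namedtuple

# Hypothetical stand-in for the KwicRef class; only the three
# attributes used by ref-key are modeled.
KwicRef = namedtuple("KwicRef", ["word", "prefix", "suffix"])

def ref_key(ref):
    """ref-key(ref) == ref.word + "|" + ref.suffix + "|" + ref.prefix"""
    return ref.word + "|" + ref.suffix + "|" + ref.prefix

refs = [
    KwicRef(word="python", prefix="learning", suffix="today"),
    KwicRef(word="contract", prefix="by", suffix="design"),
    KwicRef(word="python", prefix="by", suffix="contract"),
]
ordered = sorted(refs, key=ref_key)
print([ref_key(r) for r in ordered])
```

References to the same word end up adjacent, ordered by suffix and then by prefix, which is the ordering the specification function describes.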

6. Proof rules and the stepwise refinement process

The required techniques of effective reasoning are pretty formal, but as long as programming is done by people that don't master them, the software crisis will remain with us and will be considered an incurable disease. And you know what incurable diseases do: they invite the quacks and charlatans in, who in this case take the form of Software Engineering gurus.

— Edsger J. Dijkstra, Answers to questions from students of Software Engineering17

16 http://www.nmt.edu/tcc/help/lang/python/examples/kwic/
17 http://www.cs.utexas.edu/users/EWD/ewd13xx/EWD1305.PDF


Once you have written the intended function for your entire script or other prime, use stepwise refinement to write the code.

If the implementation of the prime is trivial (a line or three of code), then you simply write the code. At that point you can proceed to apply the rules given in Section 7, “Standards for the review of intended functions” (p. 25), construct trace tables (Section 8, “Trace tables” (p. 26)), and conduct peer reviews of the prime.

However, for any nontrivial application, you must figure out how to break the logic into smaller pieces. The basic idea of stepwise refinement is to divide each large piece into, say, two to seven smaller pieces, and continue subdividing until each piece is small and well-described. Every single interface between a larger piece and the smaller pieces that it uses must be completely defined. Our intended functions are the interface definitions that hold the entire design together.

Each time you subdivide your design into smaller pieces, you will also write the intended functions for each of the smaller pieces.

In order to be sure that we can reason mathematically about this process, the way you subdivide a prime into smaller pieces must fit one of these four standard patterns. For each pattern, there is a related proof rule that must be used during peer review to satisfy all present that the implementation is correct. Each proof rule describes how to compare the intended function for each of the smaller pieces of the prime against the overall intended function for the prime as a whole.

• Section 6.1, “The sequence rule” (p. 20): do A, then do B.

• Section 6.2, “The alternation rule” (p. 21): if condition A is true, then do B, otherwise do C.

• Section 6.3, “The definite iteration rule” (p. 22): for every member of some sequence S, do A.

• Section 6.4, “The while loop rule” (p. 23): while condition A is true, do B.

6.1. The sequence rule

When a prime A is subdivided into a sequence of two smaller primes B and C, the proof rule reads like this.

Doing the intended function for B, followed by doing the intended function for C, must have the same net effect as doing the intended function for A.

Here is an example: the computation of the standard deviation of a tuple data containing float values. This is a sequence of three primes.

def stdDev(data):
    '''Return the standard deviation of a data set.

      [ data is an iterable containing at least two floats ->
          return the standard deviation of data ]
    '''
    #-- 1
    # [ sumData := sum of data
    #   sumSq := sum of squares of data
    #   n := length of data ]
    sumData = sum(data)
    sumSq = sum([x**2
                 for x in data])
    n = len(data)

    #-- 2
    # [ variance := (n*sumSq - (square of sumData))/(n*(n-1)) ]
    variance = (n*sumSq - sumData**2)/(n*(n-1))

    #-- 3
    # [ result := square root of variance ]
    result = variance**0.5

    #-- 4
    return result

What value will this function return? To find out, we'll take it one prime at a time. After prime 1, these state items have changed:

sumData    $\sum_i \mathrm{data}_i$
sumSq      $\sum_i \mathrm{data}_i^2$
n          len(data)

After prime 2, to find the value of variance, we substitute the expressions for sumData, sumSq, and n into the expression on the right-hand side to get this value (writing $n$ for len(data)):

variance   $\left[ n \sum_i \mathrm{data}_i^2 - \left( \sum_i \mathrm{data}_i \right)^2 \right] / \left[ n (n-1) \right]$

After prime 3, we substitute that value of variance into the right-hand side to get the final value:

result     $\sqrt{ \left[ n \sum_i \mathrm{data}_i^2 - \left( \sum_i \mathrm{data}_i \right)^2 \right] / \left[ n (n-1) \right] }$

The final prime returns this value. In order to convince ourselves that the returned value is correct, we can compare the value of result above with the formula for the standard deviation in a statistics reference.
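Besides comparing formulas on paper, the comparison with a reference can be spot-checked numerically. The sketch below assumes the textbook computational formula for the sample variance, (n·Σx² − (Σx)²) / (n(n−1)), and uses the standard library's statistics.stdev() as the reference. The function name std_dev_by_sums and the sample data are assumptions made for illustration.

```python
import statistics

def std_dev_by_sums(data):
    """[ data is a sequence containing at least two floats ->
           return the sample standard deviation of data ]"""
    # Prime 1: gather the sums and the count.
    n = len(data)
    sum_data = sum(data)
    sum_sq = sum(x**2 for x in data)
    # Prime 2: textbook computational formula for the sample variance.
    variance = (n * sum_sq - sum_data**2) / (n * (n - 1))
    # Prime 3: the standard deviation is its square root.
    return variance ** 0.5

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
difference = abs(std_dev_by_sums(sample) - statistics.stdev(sample))
print(difference < 1e-9)   # True
```

A numerical check like this cannot prove the sequence rule holds, but a disagreement would show at once that one of the primes does not match its intended function.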

6.2. The alternation rule

This rule is for subdividing some prime A into an if construct with this general form:

# [ A ]
if B:
    # [ C ]
    ...
else:
    # [ D ]
    ...

The proof rule is:

Doing the intended function C when B is true, or doing the intended function D when B is false, must have the same net effect as doing the intended function for A.

Here's a brief example.

def sqrt(x):
    '''Return the square root of x.

      [ x is a number ->
          if x is nonnegative ->
            return the square root of x
          else -> raise ValueError ]
    '''
    #-- 1
    if x >= 0:
        #-- 1.1
        # [ return x to the one-half power ]
        return x**0.5
    else:
        #-- 1.2
        # [ raise ValueError ]
        raise ValueError("You can't take the square root of a "
                         "negative number like {0:g}.".format(x))

6.3. The definite iteration rule

This rule is for performing some operation on every element of a Python iterable (sequence value), typically in a for statement with this general form.

# [ A ]
for e in S:
    # [ B ]
    ...

The proof rule is in two parts.

1. If S is empty, does doing nothing have the same net effect as intended function A?

2. If S is nonempty, does doing intended function B to the first element of S, followed by doing intended function A to the remaining elements of S, have the same net effect as intended function A on the entire sequence S?

Here's an example. This function implements the logic of Python's str.join() method, the equivalent of “connector.join(pieceList)”.

def joiner(connector, pieceList):
    '''Join the pieces of pieceList with the connector.

      [ (connector is a string) and
        (pieceList is a list of strings) ->
          return a string containing the elements of (pieceList)
          separated by (connector) ]
    '''
    #-- 1
    prefix = ''
    result = ''

    #-- 2
    # [ if pieceList is empty ->
    #     I
    #   else ->
    #     result := result + prefix + (elements of (pieceList),
    #               separated by (connector))
    #     prefix := connector ]
    for piece in pieceList:
        #-- 2 body
        # [ result := result + prefix + piece
        #   prefix := connector ]
        result = result + prefix + piece
        prefix = connector

    #-- 3
    return result

Let's apply the two proof rules to this example.

Firstly, suppose pieceList is an empty sequence. In that case, the for loop never executes, so the return value is the value of result set in prime #1, the empty string. The overall intended function specifies that the connector is inserted only between elements; since there are no elements, the empty string is the correct result value.

In the case that pieceList is not empty, we apply the second proof rule. After applying the #2 body intended function, result is set to the first element of pieceList and prefix is set to the value of connector.

We then apply the intended function for the entirety of prime #2. After that, the states of our two variables are:

• result is the first element of pieceList, followed by the value of prefix (which is now a copy of connector), followed by the remaining elements of pieceList separated by connector.

• prefix is set to a copy of connector.

So in the case that pieceList is nonempty, the net changes are exactly what the intended function for the joiner function specifies.

Note that primes #1 and #3 have no intended functions. In these cases, the code itself serves as the intended function. This is acceptable only when the meaning is quite clear. For example, the intended function for prime #1 would be:

#-- 1
# [ prefix := an empty string
#   result := an empty string ]
prefix = ''
result = ''

Assuming that the reader knows that '' is the empty string in Python, the comment becomes pretty much redundant.
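The two parts of the definite iteration rule suggest which cases are worth trying by hand: the empty sequence, a single element, and longer sequences. A small sketch comparing joiner with str.join() on those cases follows. The block repeats the function so that it stands alone, and the list of cases is an arbitrary choice.

```python
def joiner(connector, pieceList):
    """[ (connector is a string) and (pieceList is a list of strings) ->
           return the elements of pieceList separated by connector ]"""
    prefix = ''
    result = ''
    for piece in pieceList:
        # [ result := result + prefix + piece
        #   prefix := connector ]
        result = result + prefix + piece
        prefix = connector
    return result

# Empty (proof rule part 1), then one, two, and three elements (part 2).
cases = [[], ["a"], ["a", "b"], ["a", "b", "c"]]
results = [joiner(", ", c) for c in cases]
print(results)   # ['', 'a', 'a, b', 'a, b, c']
```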

6.4. The while loop rule

Designing with while loops is more involved than the other three proof rules. Such loops are treacherous because they can become infinite loops. Hardly ever is that the desired functioning of your program.

The general form:

# [ A ]
while C:
    # [ X ]
    ...

There are three parts to the proof rule for while loops.

1. You must prove that the loop terminates.

2. When the condition C is false, you must prove that doing nothing accomplishes the intended function A.

3. When the condition C is true, you must prove that doing intended function X, followed by doing intended function A, accomplishes the overall intended function A.

Here is an example that implements the same function as the joiner function in Section 6.3, “The definite iteration rule” (p. 22). Note that we use the Python slice operator “[:]” to make a copy of the pieceList argument. If we used pieceList instead of the copy inside the while loop, the overall intended function for joiner() would have to specify that the value of pieceList becomes empty.

def joiner2(connector, pieceList):
    '''Join the pieces of pieceList with the connector.

      [ (connector is a string) and
        (pieceList is a list of strings) ->
          return a string containing the elements of (pieceList)
          separated by (connector) ]
    '''
    #-- 1
    # [ prefix := an empty string
    #   result := an empty string
    #   L := a copy of pieceList ]
    prefix = ''
    result = ''
    L = pieceList[:]

    #-- 2
    # [ if L is empty ->
    #     I
    #   else ->
    #     result := result + prefix + (elements of L separated
    #               by copies of connector)
    #     prefix := connector
    #     L := empty ]
    while len(L) > 0:
        #-- 2 body
        # [ result := result + prefix + (first element of L)
        #   prefix := connector
        #   L := L minus its first element ]

        #-- 2.1
        # [ result := result + prefix + (first element of L) ]
        result += prefix + L[0]

        #-- 2.2
        # [ prefix := connector
        #   L := L minus its first element ]
        prefix = connector
        del L[0]

    #-- 3
    return result

To prove that the while loop implements the correct semantics, we first prove that the loop terminates. Because each pass through the loop deletes one element of L, and because the loop runs until L is empty, it is clear that the loop cannot run forever.

The next step is to prove that the while loop does the right thing when L is empty. This also is quite clear: if L is empty, then len(L) is not greater than zero, so the loop never executes. This satisfies the special identity intended function “I”.

The final step is to prove the true case, when L is not empty.

1. First we execute the intended function for the body of the loop. This accomplishes the following state changes:

result   result+prefix+L[0]
prefix   connector
L        L[1:]

2. Then we execute the intended function for the entire while loop. Assuming there is at least one moreelement remaining in L, the new states of the three variables are:

result+prefix+L[0]+connector+(elements of L[1:] separated by con-nector)

result

connectorprefix

emptyL

If L contained only one element initially, the value of result in the above table becomes “res-ult+prefix+L[0]”, which is the desired final state in that case.

Generally you may find yourself preferring definite iterations to indefinite (while) iterations, because they are easier to prove correct.
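The three-part argument above can also be spot-checked by running the loop with assertions. The following is a minimal sketch rather than part of the method itself: the argument order joiner2(connector, pieceList) is an assumption (the signature is defined earlier in the document), and the snapshot variables old_result, old_prefix and old_L are hypothetical names introduced only for the checks.

```python
def joiner2(connector, pieceList):
    '''Assumed signature; the body follows primes [1] through [3].'''
    #-- 1
    prefix = ''
    result = ''
    L = pieceList[:]

    #-- 2
    while len(L) > 0:
        # Snapshot the state on entry so the [2 body] function can be checked.
        old_result, old_prefix, old_L = result, prefix, L[:]

        #-- 2.1
        result += prefix + L[0]
        #-- 2.2
        prefix = connector
        del L[0]

        # [2 body]: each state item changed exactly as the intended
        # function says, and nothing else.
        assert result == old_result + old_prefix + old_L[0]
        assert prefix == connector
        assert L == old_L[1:]
        # Termination: L is one element shorter after every pass.
        assert len(L) == len(old_L) - 1

    #-- 3
    return result
```

Assertions like these check only the inputs you happen to run; they supplement the proof and do not replace it.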

7. Standards for the review of intended functions

A properly constructed intended function must describe exactly everything the code is to do, no more, and no less.

Additionally, there are a number of tests of correctness that you (as the author of the code) and reviewers must apply to your intended functions.

• Every side effect of the code must be accounted for. If you change the value of a global variable, or send something over the Internet, you must declare it in your intended function.

• There must be a balance between inputs and outputs between the intended function for a prime, and its breakout into smaller primes. In intended functions, any state item that appears on a right-hand side is an input, and any state item that appears on a left-hand side is an output.

  This means that if some state item is referenced in the right-hand side of the overall prime's intended function, there must be a reference to that state item in one of the pieces.

  Conversely, if one of the pieces of a prime refers to some state item, then either that state item must be entirely local to the larger prime, or it must also be an input to the larger prime.

  Also, a state item that appears on the left-hand side of the overall intended function must appear on the left-hand side of one of the pieces. That is, a net output of the overall prime must appear as an output of at least one of its components.

• All routes through the code must be covered; that is, consider all the possible cases.

  If you don't want to deal with some case, you can formulate a precondition to make it the caller's responsibility to deal with that case.

  For example, if you are writing a square root function, you have your choice of whose responsibility it is to deal with negative arguments. If you make it a precondition that the argument is nonnegative, it's the caller's fault if they pass you a negative argument.

    def square_root(x):
        '''Compute the square root of x.

          [ x is a nonnegative number ->
              return the square root of x ]
        '''
        ...

The other alternative is to state what your code will do in that case.

    def square_root(x):
        '''Compute the square root of x.

          [ if x is a nonnegative number ->
              return the square root of x
            else -> raise ValueError ]
        '''
        ...

We'll have more to say about case coverage in Section 8, “Trace tables” (p. 26).
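To make the two contract styles concrete, here is a hedged sketch with both bodies filled in. The names square_root_strict and square_root_total are hypothetical, chosen only so that both versions can live in one file; math.sqrt from the standard library does the actual computation.

```python
import math

def square_root_strict(x):
    '''Compute the square root of x.

      [ x is a nonnegative number ->
          return the square root of x ]
    '''
    # The precondition makes negative arguments the caller's problem:
    # the contract promises nothing at all for x < 0.
    return math.sqrt(x)

def square_root_total(x):
    '''Compute the square root of x.

      [ if x is a nonnegative number ->
          return the square root of x
        else -> raise ValueError ]
    '''
    # Here the negative case is part of the contract, so it is handled
    # explicitly instead of relying on what math.sqrt happens to do.
    if x < 0:
        raise ValueError("negative argument: %r" % (x,))
    return math.sqrt(x)
```

The second form has two paths through the code, so it needs two trace tables; the first form has one path but pushes a proof obligation onto every caller.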

8. Trace tables

Constructing a trace table can be a useful way to self-verify your code. Trace tables are also useful in peer review.

The idea behind a trace table is to show all the state changes that occur during each possible path through a prime. This is another reason to keep the pieces of your design small: large, complex pieces have more possible execution paths.

To build a trace table, examine your intended functions and note the set of state items that appear on the left sides of the “:=”. Each row of the trace table corresponds to one state item. Each column represents a point in time. Each cell of the table, then, shows the value of a state item at a given point in one path through the prime.

Here is a very simple example. First, the code, a sequence of two primes. We number the primes for convenience in referring to them in the trace table.

    #-- 1
    # [ ally := integers from 1 to n inclusive
    #   gumCount := gumCount + 1 ]
    ally = range(1, n+1)
    gumCount += 1

    #-- 2
    # [ somey := even elements of ally ]
    somey = [ k
              for k in ally
              if k%2 == 0 ]

Three state items appear on the left side of “:=” parts: ally, gumCount, and somey. So our trace table will have three rows. The table will have two columns: one column for the state after prime [1] and another column for the state after prime [2].

Filling in the trace table for this example is straightforward. We examine the intended function for each prime. For each state item that changes, we enter the new state in the column for that prime. Describing the new state is another writing task: you may describe it algebraically, or in prose, or in any mixture of the two.

Here is the trace table after prime 1.

    State item   After [1]
    ----------   ---------
    ally         ints in [1,n]
    gumCount     gumCount+1

Our description of the new state of ally is more modern math than Python: we use the notation [1,n] to mean a closed interval that uses the endpoints, and we use the term “int” to mean integers, figuring that anyone who reviews this code had better know Python at least that well. Know your audience: write your state descriptions so that your peer reviewers can understand them, given that they know the language and they are going to be looking at the code at the same time.

The new state for gumCount depends on the previous state, which is outside the scope of this code sequence. The expression “gumCount+1” means, whatever gumCount had in it when we started, it is one larger now.

Moving on to prime [2], we'll need to add a new row to the table for the new state item (somey) that changes. Here is the final table.

    State item   After [1]       After [2]
    ----------   ---------       ---------
    ally         ints in [1,n]   ints in [1,n]
    gumCount     gumCount+1      gumCount+1
    somey                        even ints in [1,n]
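A trace table is a paper exercise, but its columns can be mirrored in code as a sanity check. The sketch below wraps the two primes in a hypothetical function run_primes that returns a snapshot of the state after each prime; the function name and the sample values n=6 and gumCount=10 are assumptions made for illustration. It uses list(range(...)) so that ally is a real list under both Python 2 and Python 3.

```python
def run_primes(n, gumCount):
    '''Run primes [1] and [2]; return the state after each one.'''
    #-- 1
    # [ ally := integers from 1 to n inclusive
    #   gumCount := gumCount + 1 ]
    ally = list(range(1, n+1))
    gumCount += 1
    after1 = {'ally': list(ally), 'gumCount': gumCount}

    #-- 2
    # [ somey := even elements of ally ]
    somey = [k for k in ally if k % 2 == 0]
    after2 = {'ally': list(ally), 'gumCount': gumCount, 'somey': somey}

    return after1, after2

# Each dictionary corresponds to one column of the trace table.
after1, after2 = run_primes(6, 10)
```

Comparing after1 with after2 also shows what the table implies without saying it: prime [2] leaves ally and gumCount unchanged.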

For each of the four basic branch structures, there is a corresponding rule for constructing the trace table.

• Section 8.1, “Trace table for sequence” (p. 28).

• Section 8.2, “Trace table for alternation” (p. 28).

• Section 8.3, “Trace table for definite iteration” (p. 30).

• Section 8.4, “Trace table for while loops” (p. 31).


8.1. Trace table for sequence

The construction of the trace table for a sequence of two primes, A followed by B, is straightforward, as demonstrated in Section 8, “Trace tables” (p. 26).

There is one additional consideration. The second prime, B, must be reachable. If prime A contains one of the forms described in Section 5.5.1, “Terminal special forms” (p. 15), then clearly prime B can never execute. This is a red flag for peer reviewers.

8.2. Trace table for alternation

When your code uses the alternation branch structure, you will have to do two trace tables, one for each path through the code. Here is the general form:

    # [ if C ->
    #     A
    #   else ->
    #     B ]

If a prime contains such an alternation, here is how you construct the trace tables.

1. Construct the trace table that shows the state changes for intended function A. Within this trace table, you may assume that condition C is true.

   Compare the final value of each state item with the desired final state in the overall intended function for the containing prime.

2. Construct the trace table that shows the state changes for intended function B. Within this trace table, you may assume that condition C is false. Again, compare the final value of each state item with the value it should have according to the overall intended function.

Here is a real-world example: a function to find the real roots of a quadratic equation of the form $ax^2 + bx + c = 0$. We perform this calculation in two steps.

1. First we compute the discriminant, $d = b^2 - 4ac$.

2. If $d < 0$, there are no real roots.

   If $d$ is zero, there is only one root, $-b/(2a)$.

   Otherwise there are two real roots, $(-b - \sqrt{d})/(2a)$ and $(-b + \sqrt{d})/(2a)$.

Here is the code. The overall function is a sequence in which the second part is a three-part alternation, so there are three paths through the code.

    def quadRoots(a, b, c):
        '''Find all real roots of the equation a*x**2 + b*x + c = 0.

          [ a, b, and c are numbers and a is nonzero ->
              return a list of the real roots of a*x**2 + b*x + c = 0 ]
        '''
        #-- 1
        d = b*b - 4.0*a*c

        #-- 2
        # [ if d < 0 ->
        #     return a new, empty list
        #   else if d == 0 ->
        #     return [-b/(2*a)]
        #   else ->
        #     return [(-b-sqrt(d))/(2*a), (-b+sqrt(d))/(2*a)] ]
        if d < 0:
            return []
        elif d == 0:
            return [-b/(2.0*a)]
        else:
            rootD = d**0.5
            twoA = 2.0*a
            return [(-b-rootD)/twoA, (-b+rootD)/twoA]

Construction of the trace table starts with prime [1].

    State item   After [1]
    ----------   ---------
    d            b**2 - 4.0*a*c

Of the three paths through the code, we'll start with the case where d is negative.

    State item   After [1]        After [2]
    ----------   ---------        ---------
    d            b**2 - 4.0*a*c   b**2 - 4.0*a*c
    Result                        []

When there are no real roots, the desired function result is an empty list, so this trace table checks.

Here's the second case, where the discriminant is zero.

    State item   After [1]   After [2]
    ----------   ---------   ---------
    d            0           0
    Result                   [-b/(2.0*a)]

Again, we check the final states in the above table against the overall intended function. Because in this case the discriminant is zero, a list containing -b/(2*a) is the correct value.

Here's the trace table for the case where the discriminant d is positive.

    State item   After [1]        After [2]
    ----------   ---------        ---------
    d            b**2 - 4.0*a*c   b**2 - 4.0*a*c
    Result                        [ (-b-sqrt(b**2-4.0*a*c))/(2.0*a), (-b+sqrt(b**2-4.0*a*c))/(2.0*a) ]

This being the general case, the overall intended function requires that the return value be a list containing the two real roots: the entry in the “Result” row matches this.
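Each of the three trace tables corresponds to one concrete input, so the paths can be exercised directly. The sketch below repeats quadRoots from the listing above and adds a hypothetical helper, residual, that substitutes a root back into the polynomial; the three sample equations are chosen here for illustration and have exactly representable roots.

```python
def quadRoots(a, b, c):
    '''Find all real roots of a*x**2 + b*x + c = 0 (a nonzero).'''
    d = b*b - 4.0*a*c
    if d < 0:
        return []
    elif d == 0:
        return [-b/(2.0*a)]
    else:
        rootD = d**0.5
        twoA = 2.0*a
        return [(-b-rootD)/twoA, (-b+rootD)/twoA]

def residual(a, b, c, x):
    '''Hypothetical helper: value of the polynomial at x.'''
    return a*x**2 + b*x + c

# One input per path through prime [2].
noRoots  = quadRoots(1, 0, 1)     # x**2 + 1:        d < 0
oneRoot  = quadRoots(1, -2, 1)    # x**2 - 2*x + 1:  d == 0
twoRoots = quadRoots(1, -3, 2)    # x**2 - 3*x + 2:  d > 0
```

One caution that the trace tables do not capture: with floating-point coefficients, the test d == 0 can fail for an equation whose discriminant is mathematically zero, so the middle path is reliably taken only when d is computed exactly.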

The fun begins when you have two or more alternations in a row. Suppose your code is structured like this:

    #-- 1
    # [ if C1 ->
    #     A1
    #   else ->
    #     B1 ]
    ...

    #-- 2
    # [ if C2 ->
    #     A2
    #   else ->
    #     B2 ]
    ...

In the above structure, there are four paths through the code: A1 → A2, B1 → A2, A1 → B2, and B1 → B2.

Similarly, if your prime is broken into a sequence of three alternations, there are eight paths through the code, so you must construct eight trace tables. Again, by keeping your primes small and simple, you reduce the amount of work required to build trace tables.
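The doubling of paths with each added alternation is easy to see by enumerating them. This is a small sketch using itertools.product from the standard library; the function name all_paths and the branch labels are illustrative only.

```python
from itertools import product

def all_paths(alternations):
    '''Return every path through a sequence of two-way alternations.

    alternations is a list of (true_branch, false_branch) label pairs,
    in the order the alternations appear in the code.
    '''
    # One choice per alternation; the Cartesian product is every path.
    return [' -> '.join(choice) for choice in product(*alternations)]

twoAlts = all_paths([('A1', 'B1'), ('A2', 'B2')])
threeAlts = all_paths([('A1', 'B1'), ('A2', 'B2'), ('A3', 'B3')])
```

Here twoAlts holds the four paths listed above and threeAlts holds eight; in general a sequence of k independent two-way alternations yields 2**k paths, which is the practical argument for small primes.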

8.3. Trace table for definite iteration

Here is the general form of a definite iteration.

    # [ A ]
    for e in S:
        # [ B ]
        ...

Recalling the proof rule in Section 6.3, “The definite iteration rule” (p. 22), the proof proceeds in two steps. The first step is to determine whether doing nothing accomplishes intended function A when sequence S is empty. This is a straightforward application of reasoning.

For the second step, a two-column trace table can help you reason that the code is correct. We can assume that sequence S is nonempty.

1. The first column shows the value of all the state items affected after intended function B is executed for the first element of S.

2. The second column shows the value of all state items affected after intended function A is executed for the remaining elements of S.

For our example, we will use the joiner function defined in Section 6.3, “The definite iteration rule” (p. 22). We'll need to build two trace tables: one for the case where the pieceList argument is an empty sequence, and one for the general case.

Here's the trace table for the empty case.

    State item   [1]   [2]   [3]
    ----------   ---   ---   ---
    prefix       ''    ''    ''
    result       ''    ''    ''
    Result                   ''

The function result is the empty string, which is the correct result when pieceList is empty.

Now for the general case. Here we can assume that pieceList has at least one element. The first column shows the state after prime [1]. The second column shows the state after executing the [2 body] intended function with piece set to the first element of pieceList. The third column shows the state after executing the [2] intended function on the rest of pieceList.

    State item   [1]   [2 body]       [2]
    ----------   ---   --------       ---
    prefix       ''    connector      connector
    result       ''    pieceList[0]   pieceList[0]+connector+(elements of pieceList[1:] separated by connector)

Our test for correctness is to compare “elements of (pieceList) separated by (connector)” against “pieceList[0]+connector+(elements of pieceList[1:] separated by connector)”: they agree.

8.4. Trace table for while loops

Here is the general form of a while construct:

    # [ A ]
    while C:
        # [ X ]
        ...

The trace table work you use to verify while loops is quite similar to the method described in Section 8.3, “Trace table for definite iteration” (p. 30). Reviewing the proof rule discussed in Section 6.4, “The while loop rule” (p. 23), recall there are three parts. There is no need for trace tables in proving termination, or in proving the trivial case where the condition is initially false.

For the third part, where you prove that doing the loop body's intended function X followed by doing A achieves the state changes in A, constructing a trace table can help you reason about the correctness of the loop.

For example, we'll use the joiner2 function defined in Section 6.4, “The while loop rule” (p. 23).

In this trace table, there are three columns. The first column shows the state after prime [1]. The second column shows the state after executing intended function [2 body] once. The third column shows the state after executing [2]. The correctness test is that the final values of the state items must match the overall intended function for joiner2.

    State item   [1]                 [2 body]                [2]
    ----------   ---                 --------                ---
    prefix       ''                  connector               connector
    result       ''                  pieceList[0]            pieceList[0]+connector+(elements of pieceList[1:] separated by connector)
    L            copy of pieceList   copy of pieceList[1:]   empty

The correctness test is to compare the returned value, “pieceList[0]+connector+(elements of pieceList[1:] separated by connector)”, with the desired function value, “elements of (pieceList) separated by (connector)”: they match.

8.5. Insuring case coverage

Consider the case of a prime that is a sequence of three alternations:

    #-- 1
    # [ if C1 -> T1
    #   else -> F1 ]
    ...

    #-- 2
    # [ if C2 -> T2
    #   else -> F2 ]
    ...

    #-- 3
    # [ if C3 -> T3
    #   else -> F3 ]
    ...

As discussed in Section 8.2, “Trace table for alternation” (p. 28), in general there are eight different paths through this prime. However, in practice, quite often some of the cases contain terminal special forms like function returns or exceptions, which reduces the number of cases overall. In the above example, for instance, if F1 stops execution and F2 also stops execution, there are only four paths through the code: F1; T1 → F2; T1 → T2 → F3; and T1 → T2 → T3.
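The pruning effect of terminal special forms can be worked out mechanically. The following sketch is one possible way to do it, under the assumption that each alternation is described by a pair of branch labels and that the terminal branches are given as a set of labels; reachable_paths is a hypothetical name.

```python
def reachable_paths(alternations, terminal):
    '''List the paths through a sequence of two-way alternations.

    alternations: list of (true_label, false_label) pairs, in code order.
    terminal: set of labels whose branch stops execution (return, raise).
    '''
    live = [[]]        # paths that are still executing
    finished = []      # paths that ended at a terminal branch
    for branches in alternations:
        extended = []
        for path in live:
            for label in branches:
                newPath = path + [label]
                if label in terminal:
                    finished.append(newPath)   # nothing after this runs
                else:
                    extended.append(newPath)
        live = extended
    return finished + live

ALTS = [('T1', 'F1'), ('T2', 'F2'), ('T3', 'F3')]
pruned = reachable_paths(ALTS, {'F1', 'F2'})
unpruned = reachable_paths(ALTS, set())
```

With F1 and F2 terminal, pruned contains the four paths named above; with no terminal branches, unpruned contains all eight.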

In the general case, how can we be sure that we have a trace table for each possible path through the code? In this section, we describe a way to insure this.

We will start with a rather generic intended function for an entire script that reads some input file and produces a PDF-format report on that file under control of some command line options.

We assume that this code is connected to a written specification that details all the critical definitions: the command line options and what they do, the format of the input file, and the appearance of the output report. In accordance with the principles discussed in Section 3.2, “Design factoring and separation of concerns” (p. 5), these details are not the concern of the main program logic. All we care about at this level is the overall sequence of the three major steps: processing the command line options, reading the input file, and writing the output file.

    #!/usr/bin/env python
    # [ if (the command line options are valid) and
    #   (the input file specified by those options is readable and
    #   valid) ->
    #     sys.stdout +:= a PDF report on that file using those options
    #   else ->
    #     sys.stdout +:= (anything)
    #     sys.stderr +:= an error message ]

The refinement of this overall intended function follows. We assume the use of two classes defined elsewhere: an instance of class Args represents the command line options, and an instance of class Report represents the input file and the report to be produced from it.

    #-- 1
    # [ if the command line options are valid ->
    #     args := an Args instance representing those options
    #   else ->
    #     sys.stderr +:= an error message
    #     stop execution ]
    args = Args()

    #-- 2
    # [ if the input file specified by (args) is readable and valid ->
    #     report := a Report instance representing that file
    #               processed using (args)
    #   else ->
    #     sys.stderr +:= an error message
    #     stop execution ]
    report = Report(args)

    #-- 3
    # [ sys.stdout +:= (report) formatted as a PDF ]
    report.pdf(sys.stdout)

Two conditions control the paths through this code. We'll call them C0 and C1:

    C0: the command line options are valid
    C1: the input file is readable and valid

Here is a standard truth table for the possible combinations of C0 and C1. In this table, “X” means “don't care”:

    C0   C1   Case
    --   --   ----
    F    X    (A)
    T    F    (B)
    T    T    (C)

In case (A), the command line options aren't valid, so we don't care about the state of the input file. We want only one state change in this case: an error message appended to sys.stderr.

    State item   [1]
    ----------   ---
    sys.stderr   an error message

In case (B), the command line options are valid, but there's some problem with the input file.

    State item   [1]                                              [2]
    ----------   ---                                              ---
    args         an Args instance representing the command line
    sys.stderr                                                    error message

Case (C) is the successful case where we get all the way to prime [3].

    State item   [1]                                              [2]                                       [3]
    ----------   ---                                              ---                                       ---
    args         an Args instance representing the command line
    report                                                        Report instance representing input file
    sys.stdout                                                                                              PDF representing Report instance etc.

Of the lines in the trace table above, only the last one is external to the prime, and it matches the line in the overall intended function “sys.stdout +:= a PDF report on that file using those options”.

To be sure that you have covered all routes through the code and have a trace table for each route, build a truth table as above, and make sure the truth table covers all the cases. You could even write a program to verify that your truth table is valid; this project we will leave to the interested reader. The test is that every possible combination of the states of the conditions matches exactly one line of the truth table.
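For readers who want to take up that project, here is one minimal sketch of such a checker. The representation is an assumption made for this example: each row of the truth table is a string with one character per condition, 'T', 'F', or 'X' for “don't care”, and the function name check_truth_table is hypothetical.

```python
from itertools import product

def check_truth_table(rows, nConditions):
    '''Return the combinations NOT matched by exactly one row.

    rows: list of strings over 'T', 'F', 'X', one character per condition.
    The result is a list of (combination, matching row indexes) pairs;
    an empty list means the truth table is valid.
    '''
    problems = []
    for combo in product('TF', repeat=nConditions):
        # A row matches when every position is 'X' or agrees with combo.
        matches = [i for i, row in enumerate(rows)
                   if all(r == 'X' or r == c for r, c in zip(row, combo))]
        if len(matches) != 1:
            problems.append((''.join(combo), matches))
    return problems

# Cases (A), (B), (C) from the table above; columns are C0, C1.
TABLE = ['FX', 'TF', 'TT']
gaps = check_truth_table(TABLE, 2)

# A deliberately incomplete table: case (B) has been left out.
missingB = check_truth_table(['FX', 'TT'], 2)
```

An entry with an empty match list is an uncovered path; an entry with two or more matches means two rows overlap, so one path would be traced twice under different assumptions.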

9. Additional principles for object-oriented programming

To verify the correctness of a Python class, we use invariants.

An invariant is a condition that must be true if an instance is to be considered in a valid state. Typically you will declare invariants on attributes of the instance.

For example, let's suppose you are implementing a binary tree data structure. Suppose further that your program needs to know how many nodes are in the tree, and you want to write the class's .__len__() special method to return that count.

It is quite simple to write a recursive algorithm that counts the nodes, but such an approach might be time-consuming for a tree with millions of nodes.

If the performance of the .__len__() method is impacting the whole system's performance, one way to optimize it is to maintain a count of the nodes as a private attribute of the class. The constructor sets this count to zero; a method that adds a node to the tree adds one to the count; and a method that deletes a node from the tree subtracts one from the count.

However, for correct functioning, we must be sure that anything that adds or removes nodes also increases or decreases the node count so that the current count is always accurate. So we declare that the count attribute has an invariant: its value is always equal to the number of nodes in the tree.

Here, then, are the rules for class invariants.

1. The constructor must establish the invariant. In our example case, the .__init__() method sets the node count to zero.

2. For every method in the class, you must prove that, assuming the invariant is true on entry to the method, it is also true on exit from every path through the method.

   In our example, if there is only one method that adds a node, and only one method that removes a node, you need only prove that the add method increments the node count and the remove method decrements the node count. It is not necessary to prove this for any method that does not change either the count attribute or the number of nodes in the tree.
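The two rules can be illustrated with a small binary search tree. This is a sketch under stated assumptions, not the author's implementation: the class names Node and Tree, the attribute _count, and the checking method invariantHolds are all hypothetical, and only an add method is shown (a remove method would need the same treatment on each of its paths).

```python
class Node(object):
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

class Tree(object):
    '''Binary search tree of distinct keys.

      Invariant: self._count == number of nodes reachable from self._root
    '''
    def __init__(self):
        # Rule 1: the constructor establishes the invariant.
        self._root = None
        self._count = 0

    def __len__(self):
        # Changes neither the nodes nor the count, so nothing to prove.
        return self._count

    def add(self, key):
        '''[ if key is in self -> I
             else -> self := self with a new node for key added ]'''
        if self._root is None:
            self._root = Node(key)
            self._count += 1          # path 1: one node added, count + 1
            return
        node = self._root
        while True:
            if key == node.key:
                return                # path 2: duplicate, nothing changes
            side = 'left' if key < node.key else 'right'
            child = getattr(node, side)
            if child is None:
                setattr(node, side, Node(key))
                self._count += 1      # path 3: one node added, count + 1
                return
            node = child

    def _slowCount(self, node):
        '''The recursive count that the _count attribute replaces.'''
        if node is None:
            return 0
        return 1 + self._slowCount(node.left) + self._slowCount(node.right)

    def invariantHolds(self):
        return self._count == self._slowCount(self._root)
```

The add method has three exits, and the comments mark why the invariant survives each one; invariantHolds is there only for testing, since calling it routinely would defeat the optimization.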

10. The peer review process

Whenever possible, get others to look at your code. They may perceive blind spots and invalid assumptions that you had when you wrote the code. They can also help you catch simple problems like unbalanced parentheses.

A good peer review panel should consist of one or a few people who meet these qualifications.

• They understand the Cleanroom process and how intended functions work.

• They are sufficiently competent in the implementation language to determine if the code matches the intended function.

Here is an outline of the peer review process for one prime.

1. Find a suitable meeting place. You will want enough seating and tables so the review panel can comfortably review every detail of the code without time pressure (avoid running peer reviews just before lunch or quitting time!).

2. Make the code available to all the reviewers. A projector is a good tool, provided that everyone can read it. The legibility test is strict: everyone should be able to tell a period from a speck of dirt, because computers care about periods and commas.

3. Review the overall intended function and the intended functions of its component pieces for clarity. If the meaning of an intended function is not completely clear to every reviewer, the author must rework it and try again later.

4. The author uses the proof rules for the four standard branch constructs to argue that the code is correct. During this phase, ignore the actual source code: stick to comparing the prime's overall intended function to the intended functions of its components.

   Again, the test is very strict: unless every peer reviewer is convinced that the decomposition of the prime into smaller pieces is correct, the prime is sent back for rework.

5. Provided that the decomposition of the prime into smaller primes is correct, the review panel then checks each piece's intended function against the code that implements it. This is why reviewers must know the language well.

   If the code calls any methods or functions that are also part of the overall project, the intended functions for those primes must be available during the review. The panel will check that both the intended functions and the actual calling sequences match their definitions.

   What about calls to Python language primitives and modules? Here we must treat these facilities as trusted components. Although they are out of our direct control, it is necessary to assume that the underlying components are correct. This is admittedly a potentially major limitation with the method, but short of implementing our own languages and libraries, we may not have any other reasonable choices.

   If the review process catches minor errors such as unbalanced parentheses, there is no reason to send the prime back for rework; the author can correct them on the spot and proceed.

However, once again the test for correctness is quite strict. Unless it is completely obvious to all reviewers that the code is correct, it is sent back for rework.

If one of the reviewers has a better idea for the general approach, data structures, or algorithms, try to achieve consensus on whether it is worthwhile to abandon the current approach. Consider the real-world constraints (time, budget, floor space, staffing). If there are two or more promising approaches to a design problem, and there are no strong reasons to choose one of them, whoever writes the code gets to pick the approach.

Once a prime passes review, place a comment in the source code specifying the date and time when it was reviewed, and the full names of all the reviewers. If defects appear later, we can inform the reviewers about anything missed, so they can improve their reviewing skills.

If a prime is changed after passing review, it must be reviewed again. Furthermore, any other certified component that uses the changed prime must also be reviewed again. For smallish projects, this process can be managed by informal means.

However, for Cleanroom to be effective in a large project, we would really like some software tools that keep track of dependencies, and can insure that all affected modules are re-verified. Such software tools would timestamp the definition of each prime, track the verification state of each prime, and be able to compute a list of modules to be re-verified.
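The core of such a tool, computing what must be re-verified after a change, is a graph traversal. The sketch below is only an illustration of that one piece: the function name primes_to_reverify, the dictionary representation of the “uses” relation, and the prime names in the example are all hypothetical. It also makes a conservative assumption that goes beyond the rule stated above, following the “uses” relation transitively rather than stopping at direct users.

```python
def primes_to_reverify(uses, changed):
    '''Return the set of primes needing review after one prime changes.

    uses: dict mapping each prime's name to the set of primes it uses.
    changed: name of the prime that was modified.
    '''
    # Invert "uses" into "is used by".
    usedBy = {}
    for caller, callees in uses.items():
        for callee in callees:
            usedBy.setdefault(callee, set()).add(caller)

    # Walk outward from the changed prime through everything that uses it.
    stale = set()
    pending = [changed]
    while pending:
        name = pending.pop()
        if name in stale:
            continue
        stale.add(name)
        pending.extend(usedBy.get(name, ()))
    return stale

# Hypothetical project structure, for illustration only.
USES = {
    'main':      {'Args', 'Report'},
    'Report':    {'parseLine'},
    'Args':      set(),
    'parseLine': set(),
}
afterParseLineChange = primes_to_reverify(USES, 'parseLine')
```

A real tool would also need the timestamps and verification states mentioned above; this sketch covers only the dependency computation.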

10.1. Guidelines for reviewers

• Be respectful and courteous. Avoid terms like “brain-dead” or “stupid” or “asinine;” engineering practice is hard enough without involving people's egos and emotions.

  A better approach is to use a phrasing like “In step 2, did you mean…” or “Would it be clearer…” or “Maybe we could reduce the number of steps if….”

• Be positive. Emphasize why a change you suggest is better, not why the current code is worse.

• Pay attention. Try to see both the big picture and the tiny details and everything in between.

  If the big picture isn't clear, ask the author for more explanation of the context. Most primes can be reviewed without very much reference to outside entities, except when the code calls or uses outside definitions, functions or methods.

• If the code uses a language feature that you don't recognize, you are within your rights to ask for an informal explanation of the feature, or a quick visit to the Web page where the feature is defined or explained.

10.2. Peer review guidelines for the author

• Be thick-skinned. Your reviewers may be discourteous, but don't take it personally. Remember that this process is about getting the code right, not about stroking your ego. Sometimes, especially with startup companies or death march projects [18], there just isn't time for the social niceties.

  Conversely, if you're hiring people for a startup or other critical situation, don't hire prima donnas, no matter how brilliant they are. Look for “very good and a steady producer with a solid track record.”

• Be courteous to your review panel whether they deserve it or not.

11. Testing

Program testing can be used to show the presence of bugs, but never to show their absence!

— Edsger J. Dijkstra, Comm. ACM 15(10), Oct. 1972: pp. 859–866.

If you aren't convinced the code is correct before you start testing, what could possibly convince you?

—Donald E. Grimes (pers. comm.)

Once the entire design has been peer-reviewed (or self-reviewed with trace tables if peers are unavailable), testing proceeds by these rules.

• No compilation or testing of any kind during design. No unit testing.

• Build a suite of regression tests that test as many overall features of the system as you can think of. The idea is to use your regression suite for initial testing; later, as the design is modified for additional features, you can use the tests to make sure that each new change did not break any existing features.

  A good regression suite will print, at the very end, some kind of summary such as “Number of tests failing: 2”. Thus, even if the suite produces a lot of output, you can look at the end of the report to see if any defects were detected.

• Build a list of all defects discovered since the first compilation. Break this list up into two or three categories:

1. Pure logic defects: you did the wrong thing. Example: a “<=” where there should have been a “<”.

2. Syntax defects: anything flagged by the language processor.

18 http://en.wikipedia.org/wiki/Death_march_(project_management)


  3. For languages such as Python that do their type-checking at execution time, there is a third category: defects that would have been detected in a statically typed language.

  An example of such a list is the one for the author's nomcompile [19] project.
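The end-of-report summary recommended for the regression suite takes only a few lines to produce. Here is a minimal sketch that assumes each test is a named, zero-argument callable returning True on success; the function run_suite and the sample tests are hypothetical and are not taken from the author's projects.

```python
def run_suite(tests):
    '''Run each (name, callable) pair; return (failure count, report lines).'''
    failures = 0
    lines = []
    for name, test in tests:
        try:
            passed = bool(test())
        except Exception:
            passed = False      # an unexpected exception counts as a failure
        if not passed:
            failures += 1
        lines.append('%-4s %s' % ('ok' if passed else 'FAIL', name))
    # The summary always comes last, so it is easy to find in long output.
    lines.append('Number of tests failing: %d' % failures)
    return failures, lines

# Hypothetical feature tests, for illustration only.
SAMPLE_TESTS = [
    ('join three pieces',  lambda: ', '.join(['a', 'b', 'c']) == 'a, b, c'),
    ('join empty list',    lambda: '-'.join([]) == ''),
    ('deliberate failure', lambda: 1 + 1 == 3),
]

failures, report = run_suite(SAMPLE_TESTS)
print('\n'.join(report))
```

Each failing line in such a report is a candidate entry for the defect list described above, filed under whichever category it belongs to.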

We can never know how many defects remain in a system. However, there is one empirical test that can increase (or decrease) our confidence. If the frequency of defect discovery decreases sharply after the initial testing phase, it suggests that the defect discovery rate is asymptotic to the x-axis. Conversely, if the rate of discovery of defects decreases in a more linear fashion, it suggests that there are many more defects to be discovered before the defect count approaches zero.

11.1. Treasure your mistakes, don't bury them!

Good judgment comes with experience, but experience comes from bad judgment.

—Attributed to Mulla Nasreddin

When you find a defect, there are two responses. The first is to fix the defect.

If you aspire to mastery, your second response is to look at why the defect occurred. Many arise from sheer carelessness. However, it is worthwhile to give some thought to how you might modify your approach to prevent this sort of defect from happening again.

In particular, if you use a particular trick or pattern and find a defect in it, make sure you look at all the other places you have used that same trick or pattern. You may find that they are broken as well.

19 http://www.nmt.edu/~shipman/xnomo/nomcompile/defects.html
