Date post: 06-Aug-2020
Page 1: Iterateesokmij.org/ftp//Haskell/Iteratee/describe.pdf · data. Lazy IO facilitates compositional input processors like parser combinators. Lazy IO is so irresistible that it was added

Iteratees

Oleg Kiselyov


Abstract. Iteratee IO is a style of incremental input processing with precise resource control. The style encourages building input processors from a user-extensible set of primitives by chaining, layering, pairing and other modes of composition. The programmer is still able, where needed, to precisely control look-ahead, the allocation of buffers, file descriptors and other resources. The style is especially suitable for processing communication streams, large amounts of data, and data that have undergone several levels of encoding such as pickling, compression, chunking and framing. It has been used for programming high-performance (HTTP) servers and web frameworks, in computational linguistics and in financial trading. We exposit programming with iteratees, contrasting them with Lazy IO and the Handle-based, stdio-like IO. We relate them to online parser combinators. We introduce a simple implementation as free monads, which lets us formally reason with iteratees. As an example, we validate several equational laws and use them to optimize iteratee programs. The simple implementation helps understand existing implementations of iteratees and derive new ones.

“We should have some ways of coupling programs like garden hose – screw in another segment when it becomes necessary to massage data in another way. This is the way of IO also.” M. D. McIlroy. October 11, 1964.

1 Introduction

Iteratee IO is a style of compositional incremental input processing with precise resource control. As such it is conducive to handling large amounts of data and to programming long-running servers. Iteratee IO has been proven in practice: it is employed in several commercially deployed web frameworks (e.g., [2]) and has been used in financial trading applications [9] and natural language processing. The good performance of Iteratee IO is seen in several benchmarks, web-related (including SNAP) and others [6, 10]. Performance, compositionality and a high level of abstraction have attracted attention. As of May 2011, there are three main implementations of Iteratee IO on Hackage:¹ iteratee-0.8.3, enumerator-0.4.10 and the extensive iterIO, as well as several variations. Iteratee IO lends itself to efficient, online parser combinator libraries similar to [1, 13]. First introduced in Haskell [5], Iteratee IO has since been ported to F#,² Scala and other languages.

¹ http://hackage.haskell.org
² https://github.com/fsharp/fsharpx


The goal of Iteratee IO is to overcome the drawbacks of Lazy and Handle-based IO and to combine their strong features. Lazy IO, an instance of memory-mapped IO, is an elegant abstraction that effectively eliminates IO, giving programmers the impression that the entire file is available in memory and may be accessed as an ordinary string. There is no longer any need to read explicitly, let alone worry about buffer allocations and underflows. The abstraction of a file as an in-memory string comes without guilt: behind the scenes, the operating or run-time system accesses the file efficiently, reading it on demand and sharing the read data. Lazy IO facilitates compositional input processors like parser combinators.

Lazy IO is so irresistible that it was added to Haskell despite the reservations of its inventors and the failure to develop good techniques for reasoning about its correctness [7, Sec 10.5]. However benign, reading is an observable side effect, whose occurrence may have to be correlated with other side effects. Such correlations are crucial when performing IO over communication pipes, which is typical of web servers.³ As Launchbury and Peyton Jones feared, Lazy IO indeed “gives rise to a very subtle class of programming errors”. We have seen deadlocks; mishandling of IO errors; running out of file descriptors and similar scarce resources; unpredictable, volatile and sometimes unbearably excessive use of memory. We illustrate the splendors and miseries of Lazy IO in §2.

Handle-based IO is the stdio-style IO familiar from C. It is ‘strict’: IO operations must be explicitly requested. Therefore, it affords precise control of resources and the detection of all IO errors. However, it is very low-level: every read operation is painfully explicit. Handle-based IO hides the buffering, providing the abstraction of a stream of characters. The abstraction does not extend to a stream of other data types, and does not support stream embeddings. The programmer must be constantly aware of the current file position, which makes it tortuous to process layered streams or to combine parsers that process the same stream in parallel. §2 illustrates these problems as well.

One wishes for a set of abstractions that free programmers from thinking about IO, and yet provide facilities to control buffering, look-ahead, locking, etc. at those moments where it matters. One wishes to derive these abstractions and optimize them by algebraic transformations based on equational laws.

Iteratee IO is an approach to this ideal, amalgamating ideas going back to the IO of Haskell 1.3; Kernel-Prolog’s iterator objects [15], which uniformly represent files, in-memory collections and processes; the resumption monad, surveyed in [3]; and the generators of Alphard [12] (which now live on as Java streams and as generators in modern languages). Like Handle IO, Iteratee IO gives error handling, precise control over important operations and resource allocations, incremental processing and high performance. Like Lazy IO, it gives high-level abstractions, encapsulating input processing layers that can be nested and composed sequentially or in parallel. Iteratee IO turns out to offer reasoning principles that let us derive implementations and optimize them.

Although all implementations of Iteratee IO follow the same principles, there are many variations based on historical accidents, handling of buffering, and levels of generality. There are even more tutorials, varying in generality and points of view [6, 8, 16–18]. The desire for a standpoint from which to grasp the idea of iteratees, to assess and derive their variations and to reason with them is the motivation for the present paper.

³ POSIX memory-mapped IO, mmap, does not work with communication pipes either.

The contributions and the structure of the paper. We argue that the essence of Iteratee IO is captured by two inter-related views: stream processing network and incremental online parser combinators. Based on the final co-algebra model of stream processors, for the first time we formulate algebraic laws, which let us derive and simplify iteratee parser combinators.

§2 introduces Iteratee IO, using an Iteratee library to write a progression of examples abstracted from web server programming and computational linguistics. We use the examples to contrast Iteratee IO with Lazy and Handle-based IO and to give them informal semantics.

§3 defines the semantics formally, as an interpretation of the data type denoting iteratees. The semantics lets us view iteratees as parsers. The rich algebraic structure of the iteratee data type (final co-algebra and free monad) gives rise to algebraic laws, which let us build and reason about iteratee programs compositionally. Appendix B of the full paper⁴ details optimizing iteratee parser combinators using the equational laws.

Appendix A proves the equational laws in the more general setting of effectful iteratees, in which input processing is accompanied by effects in some monad. Buffering and look-ahead are two particular examples of such an effect. This insight clarifies the implementation of buffering in iteratee libraries, which so far has been the most confusing feature.

More material about iteratees, including demonstrations, tutorials and references to iteratee libraries, is available online at http://okmij.org/ftp/Streams.html. The annotated source code for all the examples in the paper can be found at http://okmij.org/ftp/Haskell/Iteratee/, which is the base URL for all code files referenced in this paper.

2 Programming with Iteratees

This section introduces programming with iteratees through a series of progressively more complex examples. We stress compositionality: assembling input processors from previously written or library components. We appeal to the intuitions of the more familiar Lazy IO and Handle IO when explaining iteratees. Therefore, the examples are also written in Lazy IO and, when feasible, Handle IO. The contrast lets us see the advantages of Lazy and Handle IO that Iteratee IO inherits, and the drawbacks it is designed to overcome. (In particular, we shall see Lazy IO’s unexpected, huge memory consumption and its wasting of scarce resources like file descriptors.)

The examples revolve around reading potentially very large text and counting specific words and whitespace. The final example, abstracted from interactive systems, tests orchestration: reading a communication pipe up to a terminator but not a byte further. The complete code for all examples, with tests and the extra details left out of the paper, is available online (IterDemo1.hs). The code uses the IterateeM library available from the same site. Figure 1 lists the interface of the library fragment used in this section, pointing out related functions from the Haskell standard library. A different series of illustrative examples (counting lines and words, and searching for the first or all occurrences of a word, that is, implementing wc and grep) is given in IterDemo.hs. The latter set of examples illustrates error handling and the encapsulation of state.

⁴ http://okmij.org/ftp/Haskell/Iteratee/describe.pdf

    type Iteratee el m a     -- a processor of the stream of els
                             -- in a monad m yielding the result of type a

    instance Monad m => Monad (Iteratee el m)
    instance MonadTrans (Iteratee el)

    getchar :: Monad m => Iteratee el m (Maybe el)    -- cf. IO.getChar, List.head
    count_i :: Monad m => Iteratee el m Int           -- cf. List.length

    run :: Monad m => Iteratee el m a -> m a          -- extract Iteratee's result

    -- A producer of the stream of els in a monad m
    type Enumerator el m a = Iteratee el m a -> m (Iteratee el m a)
    enum_file :: FilePath -> Enumerator Char IO a     -- Enumerator of a file

    -- A transformer of the stream of elo to the stream of eli
    -- (a producer of the stream eli and a consumer of the stream elo)
    type Enumeratee elo eli m a =
        Iteratee eli m a -> Iteratee elo m (Iteratee eli m a)

    en_filter  :: Monad m => (el -> Bool) -> Enumeratee el el m a
    take       :: Monad m => Int -> Enumeratee el el m a       -- cf. List.take
    enum_words :: Monad m => Enumeratee Char String m a        -- cf. List.words

    -- Kleisli (monadic function) composition: composing enumerators
    (>>>) :: Monad m => (a -> m b) -> (b -> m c) -> (a -> m c)

    -- Connecting producers with transformers (cf. (=<<))
    infixr 1 .|    -- right-associative
    (.|) :: Monad m =>
        (Iteratee el m a -> w) -> Iteratee el m (Iteratee el' m a) -> w

    -- Parallel composition of iteratees (cf. List.zip)
    en_pair :: Monad m =>
        Iteratee el m a -> Iteratee el m b -> Iteratee el m (a,b)

Fig. 1. The interface of the IterateeM library fragment

We start lightly, with counting whitespace characters. The Lazy IO code pattern-matches on the ordinary string (cf. more elegant code later):

    import Data.Char (isSpace)

    countWS_lazy :: String -> Int
    countWS_lazy ""                  = 0
    countWS_lazy (c:str) | isSpace c = 1 + countWS_lazy str
    countWS_lazy (_:str)             = countWS_lazy str

There are no appearances of any IO operations, which is the main appeal of Lazy IO. The IO is banished to the highest-level code, which opens a file and “reads it all” into a string, to hand over to countWS_lazy.

    run_countWSL fname = readFile fname >>= print . countWS_lazy

The run-time system reads the file on demand, so that the counting runs in constant and small memory.

The Handle IO code is in the style of stdio, familiar to C programmers. It, too, “pattern-matches” on the input stream.

    import Control.Exception (bracket, fromException, throw, try)
    import Data.Char (isSpace)
    import System.IO
    import System.IO.Error (isEOFError)

    countWS_handle :: Handle -> IO Int
    countWS_handle h = loop 0
     where
      loop n = try (hGetChar h) >>= check n
      check n (Right c) = loop (if isSpace c then n + 1 else n)
      check n (Left e) | Just ioe <- fromException e,
                         isEOFError ioe = return n      -- EOF: report the count
      check _ (Left e) = throw e                        -- any other error: re-throw

    run_countWSH fname =
      bracket (openFile fname ReadMode) hClose $ \h ->
        countWS_handle h >>= print

It, too, runs in small and constant memory. Error handling stands out. We now differentiate EOF (end-of-file) from other IO errors, which is impossible with Lazy IO. Also unlike Lazy IO, the code is explicit about closing the file, ensuring that the file is closed (IO errors or not) before run_countWSH returns.

The Iteratee IO code below should look quite similar to the earlier examples. The intuition of pattern-matching on the stream still applies; the stream, however, is implicit. Since there is no explicit ‘handle’, errors like reading from an already closed handle become impossible.

    countWS_iter :: Monad m => Iteratee Char m Int
    countWS_iter = loop 0            -- tail-recursive
     where loop n = getchar >>= check n
           check n Nothing  = return n
           check n (Just c) = loop (if isSpace c then n + 1 else n)

We have written an iteratee: it reads the stream of characters and produces an Int. Polymorphism over the monad m tells us that the iteratee (like countWS_lazy) is pure. The library iteratee getchar⁵ (see Figure 1) is quite like try (hGetChar h) in the Handle IO code: getchar reads a character from the stream and returns it; the result Nothing signifies EOF. The Iteratee library takes care of detecting and propagating other IO errors. Iteratee el m is a monad, letting us write iteratee code in the standard monadic style (again compare with the Handle IO code above).

⁵ Although some Iteratee libraries indeed provide something like getchar, we implement it ourselves, in IterDemo1.hs. §3 explains the idea of the implementation and gives a simpler version, called oneL in Figure 2.

This is how we count whitespace in a file:

    run_countWSI fname = print =<< run =<< enum_file fname countWS_iter

Here, enum_file is an enumerator: it enumerates a file, passing file data to an iteratee. The precise enumerator/iteratee interaction is described in §3. A user of an Iteratee library should find the stdin intuition sufficient: the iteratee getchar is quite like Prelude.getChar, which reads a character from the buffer containing the current chunk of the standard input. If the buffer is empty, the OS is requested to fill it. Our enum_file opens the file on the ‘standard stream’ and plays the OS for the iteratee, reading a chunk when the iteratee asks to fill its buffer. When the file is exhausted, or when the iteratee stops asking for more data, the iteratee, encapsulating the resulting state, is returned. The resulting iteratee cannot hold any references to the file: in fact, an iteratee cannot know whether its ‘stdin’ data come from a file or another source. Therefore, enum_file’s closing the file upon return is safe.⁶

The function run tells the iteratee that the stream is finished and extracts its result, the integer counter in our case. (App. C gives another, plumbing intuition for Iteratee IO.) As with Lazy IO, the file is read incrementally and on demand. As with Handle IO, the file is closed when enum_file returns, IO errors or not. Explicit bracketing is not needed. Like Handle IO (and unlike Lazy IO), iteratees support precise error handling and accounting of scarce resources like file descriptors. Unlike Handle IO, the boring details are hidden away.

Lazy IO permits a far more elegant solution: a one-liner, using the standard Prelude functions on lists:

    countWS'_lazy :: String -> Int
    countWS'_lazy = length . filter isSpace

We filter out the characters other than whitespace, and count the remainder. As expected for pure Haskell, filtering and counting are done incrementally and lazily. The intermediate list is never fully constructed. Iteratee IO matches the algorithm and the elegance:

    countWS'_iter :: Monad m => Iteratee Char m Int
    countWS'_iter = id .| (en_filter isSpace) count_i

We use three new library primitives (Figure 1). The iteratee count_i, like length, returns the length of the stream. The enumeratee en_filter is a stream transformer. The type Enumeratee elo eli m a almost fits the pattern of Enumerator eli m a: indeed, the enumeratee is an enumerator for the inner stream (of eli-type elements), taking data from the outer stream. That is, Enumeratee elo eli m a converts a stream of elos into a stream of elis. The conversion is not necessarily in lock-step, as is the case for en_filter: although the outer and the inner streams have elements of the same type, the inner stream generally has fewer elements. Although conceptually a ‘stream’ is an (infinitely) long sequence of elements, at any given time only a single, small chunk of the stream is present in memory. Our en_filter requests a chunk from the outer stream and creates a filtered chunk, to pass to an iteratee. All stream conversion is strictly incremental. There is not even a chance of producing a large intermediate data structure, and no need to trust laziness or GHC fusion rules.

⁶ In that respect, enum_file is similar to Scheme’s with-input-from-file.

The combinator (.|), akin to run, ‘runs’ the enumeratee, that is, terminates the inner stream and extracts the result of the inner iteratee, passing it to the consumer, the left argument of (.|). (There are cases, not described in the paper, where the inner stream should be left unterminated so it can be passed to another enumeratee: e.g., processing of HTTP chunk-encoded streams.) To count whitespace in a file, we write enum_file fname countWS'_iter. The equational law f (g .| h) ≡ (f . g) .| h gives enum_file fname .| en_filter isSpace count_i, which resembles a Unix pipeline.
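The enumeratee idea can be tried out on a much simpler iteratee type than the library's. The following is a minimal sketch, not the IterateeM implementation: the type I processes one character at a time (no chunks, no monad m), and the names filterI, countAll, runI, eval and countWS are assumptions made for this illustration. Here filterI stands in for en_filter, and fmap runI stands in for what (.|) does in the text: it terminates the inner stream and extracts the inner result.

```haskell
import Data.Char (isSpace)

type LChar = Maybe Char                 -- Nothing stands for end of stream

-- A simplified iteratee: finished, or waiting for one character.
data I a = Done a | GetC (LChar -> I a)

instance Functor I where
  fmap f (Done x) = Done (f x)
  fmap f (GetC k) = GetC (fmap f . k)

-- Count all elements of the stream (a stand-in for count_i).
countAll :: I Int
countAll = loop 0
  where
    loop n = GetC (check n)
    check n Nothing  = Done n
    check n (Just _) = loop (n + 1)

-- A stand-in for en_filter: pass to the inner iteratee only the
-- characters of the outer stream that satisfy p.
filterI :: (Char -> Bool) -> I a -> I (I a)
filterI _ i@(Done _) = Done i
filterI p i@(GetC k) = GetC step
  where
    step Nothing = Done i               -- outer EOF: inner stream left unterminated
    step (Just c)
      | p c       = filterI p (k (Just c))
      | otherwise = filterI p i         -- drop c; the inner iteratee never sees it

-- Terminate a stream: send EOF and extract the result.
runI :: I a -> a
runI (Done x) = x
runI (GetC k) = runI (k Nothing)

-- Feed a string to an iteratee, then EOF.
eval :: String -> I a -> a
eval (c : t) (GetC k) = eval t (k (Just c))
eval _       i        = runI i

-- Whitespace counter assembled from the two components.
countWS :: I Int
countWS = fmap runI (filterI isSpace countAll)
```

In this sketch the filtered stream is never materialised: each accepted character is handed to the inner iteratee as soon as it arrives, which is the character-level counterpart of the chunk-by-chunk conversion described above.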

We modify the example to count the occurrences of the word “the” (assuming the input is text with words of bounded size). The Lazy IO code is most straightforward, relying on Prelude.words to parse the input string into a list of words. We filter out words other than “the” and count the remainder. Thanks to Haskell’s laziness, the whole operation runs in constant space.

    countTHE_lazy :: String -> Int
    countTHE_lazy = length . filter (== "the") . words

The Handle IO code for this example is complex. Not only should we search for “the”, we also have to make sure that the characters before and after it (if they exist) are whitespace. On top of that, we have to deal with errors and EOF. The simplest solution is to explicitly write the Finite State Machine recognizer; see IterDemo1.hs for details. The result is too big to put in the paper, reminding us that Handle IO is really low-level. An abstraction is direly needed, for example, in the form of a lexer generator, or Iteratee IO.

Here is the Iteratee IO code (recall that (.|) is right-associative):

    countTHE_iter :: Monad m => Iteratee Char m Int
    countTHE_iter = id .| enum_words .| en_filter (== "the") count_i

It is quite like the Lazy IO code, converting a character stream to a word stream and then to the filtered stream, which is then counted.

Let us extend the example to count the word “the” within a sequence of files, as if they were concatenated. We shall count “the” even if it is split between two files. The Lazy IO code re-uses the previously written counting function countTHE_lazy, which now receives a string that is the concatenation of all the files’ contents.

    run_manyTHEL fnames =
      mapM readFile fnames >>= print . countTHE_lazy . concat

The code is elegant; the processing is incremental, reading only one file at a time. Alas, we have to open all files first! The action readFile, which opens a file and prepares it for lazy reading, is performed in sequence on all files, prior to counting. Since counting is pure, we cannot execute IO actions like readFile in the counting code. Therefore, we need as many file descriptors as there are files in the fnames list. If the list is long, we may run out of file descriptors. Ideally, however, we need only one file descriptor, opening and closing it as we go. We get the first intimation of Lazy IO’s resource mismanagement, with no interface to correct it.

The Handle IO code to count in multiple files is even more complex than the single-file counter. We do not show the code for lack of space.

The Iteratee IO code again looks quite like the Lazy IO code, re-using the previously written iteratee countTHE_iter. Kleisli (monadic function) composition (>>>) builds an enumerator from two others, effectively sending to an iteratee first the chunks of the first stream and then the chunks of the second. In short, composing enumerators concatenates their sources. We elaborate on that property and state it formally in §3.

    run_manyTHEI fnames = print =<< run =<<
      foldr1 (>>>) (map enum_file fnames) countTHE_iter

(One can use the regular foldr, keeping in mind that the unit of (>>>) is return.) Unlike Lazy IO, only a single file descriptor is used during the whole counting; only one file is open at any given time.
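The claim that composing enumerators concatenates their sources can be checked on a simplified model. This is a sketch under stated assumptions, not the IterateeM code: the iteratee type I is a character-at-a-time type without chunks, Control.Monad's (>=>) stands in for the Kleisli composition of Figure 1, in-memory strings stand in for files, and the names enumStr, enumAll, collect, runI and joined are made up for the illustration.

```haskell
import Control.Monad ((>=>))
import Data.Functor.Identity (runIdentity)

type LChar = Maybe Char
data I a = Done a | GetC (LChar -> I a)

-- An enumerator feeds a source to an iteratee and returns the iteratee.
type Enumerator m a = I a -> m (I a)

-- Enumerate a string: a stand-in for enum_file with an in-memory source.
enumStr :: Monad m => String -> Enumerator m a
enumStr (c : t) (GetC k) = enumStr t (k (Just c))
enumStr _       i        = return i     -- source exhausted or iteratee done

-- Compose the enumerators of all sources; return is the unit of (>=>),
-- so the empty list of sources is handled too.
enumAll :: Monad m => [String] -> Enumerator m a
enumAll = foldr (>=>) return . map enumStr

-- Collect the whole stream into a string.
collect :: I String
collect = loop ""
  where
    loop acc = GetC (check acc)
    check acc Nothing  = Done (reverse acc)
    check acc (Just c) = loop (c : acc)

-- Send EOF and extract the result.
runI :: I a -> a
runI (Done x) = x
runI (GetC k) = runI (k Nothing)

-- "the" is split between the first two sources, yet arrives intact.
joined :: String
joined = runI (runIdentity (enumAll ["th", "e e", "nd"] collect))
```

No source sends EOF on its own in this sketch; only runI does, after the last enumerator has returned. That is what lets a word straddle the boundary between two sources.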

As a test of compositionality, we combine the two counting operations and count “the” and whitespace together. The Lazy IO code is elegantly straightforward.

    run_countPairL fname = do
      str <- readFile fname
      print (countWS_lazy str, countTHE_lazy str)

Here we run into one of Lazy IO’s pitfalls: the counting is no longer incremental. The string str is shared by the two counters, so the part already traversed by the first counter cannot be reclaimed until the second counter has traversed it too. The whole file is loaded into memory. For applications processing large files or long streams, Lazy IO is too unreliable for use in production.

The Iteratee code also re-uses the previously written counters. It pairs them, relying on en_pair, the parallel composition of iteratees.

    run_countPairI fname =
      print =<< run =<< enum_file fname (countWS_iter `en_pair` countTHE_iter)

One may think of en_pair as ‘splitting’ (or duplicating) the stream. In reality, en_pair does no copying or buffering: it receives a chunk and passes it to its two argument iteratees. If both iteratees want more data, a new chunk is requested. Unlike Lazy IO, the processing remains incremental and in constant memory: as we read a block from the file, we send the block to the two iteratees.
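Parallel composition without buffering can be illustrated on a simplified model. The sketch below is not the library's en_pair: it uses a character-at-a-time iteratee type I instead of chunks, and the names feed, pairI, countIf, eval and spacesAndTs are hypothetical. One further assumption is labeled in the code: the pair keeps requesting input until both components have finished.

```haskell
import Data.Char (isSpace)

type LChar = Maybe Char
data I a = Done a | GetC (LChar -> I a)

-- Pass one stream element to an iteratee; a finished iteratee ignores it.
feed :: LChar -> I a -> I a
feed c (GetC k) = k c
feed _ done     = done

-- A stand-in for en_pair: every element goes to both iteratees, unbuffered.
-- Assumption of this sketch: input is requested until both have finished.
pairI :: I a -> I b -> I (a, b)
pairI (Done a) (Done b) = Done (a, b)
pairI i        j        = GetC (\c -> pairI (feed c i) (feed c j))

-- Count the characters satisfying a predicate.
countIf :: (Char -> Bool) -> I Int
countIf p = loop 0
  where
    loop n = GetC (check n)
    check n Nothing  = Done n
    check n (Just c) = loop (if p c then n + 1 else n)

-- Feed a string to an iteratee, then EOF.
eval :: String -> I a -> a
eval (c : t) (GetC k) = eval t (k (Just c))
eval _       (Done x) = x
eval ""      (GetC k) = eval "" (k Nothing)

-- One pass over the input serves both counters.
spacesAndTs :: (Int, Int)
spacesAndTs = eval "to the top" (pairI (countIf isSpace) (countIf (== 't')))
```

Each character is consumed by both counters before the next one is requested, so nothing of the input needs to be retained between steps.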

Our final example demonstrates early termination, prior to EOF. We modify the previous “the” and whitespace counter to count only within a prefix of the stream of size at most N. This example is abstracted from reading HTTP request content with an explicitly specified Content-Length. We should not attempt to read even a single byte after N, since a web client expects the reply first, before it sends the next request. If we attempt to read ahead after N bytes, deadlock ensues. The Lazy IO code uses Prelude.take to lazily obtain the prefix of the file content, which is processed as before.

    run_ntermL n fname = do
      str0 <- readFile fname
      let str = Prelude.take n str0
      print (countWS_lazy str, countTHE_lazy str)

As in the previous example, this counting does not run in constant memory. There are bigger problems. First, since we generally stop reading before EOF, the run-time system will not close the file descriptor. It will be closed when the corresponding finalizer is run, which may happen very late. Leaking file descriptors puts us in danger of running out of them, which indeed happens in practice when using Lazy IO in programs that process lots of files. Most serious is the real danger of a deadlock. The run-time system may speculatively read ahead, at any time and for any reason. The programmer has no way whatsoever to control this read-ahead or even be informed about it. Deadlock does routinely happen in practice when using Lazy IO for interactive services.⁷

Lazy IO was designed to give the impression that IO is not even happening. When dealing with communication pipes and request-response servers, even reading is an observable effect. The precise control of reading actions is crucial. Lazy IO becomes the wrong abstraction.

The Iteratee IO code, like the earlier Lazy IO code, differs from the previous run_countPairI in one change: take, from IterateeM rather than Prelude.

    run_ntermI n fname =
      print =<< run =<< enum_file fname .|
        IterateeM.take n (countWS_iter `en_pair` countTHE_iter)

Like en_filter, the enumeratee take substreams its outer stream, namely, takes a prefix of size at most n. As soon as take n gets its n elements, it stops asking for more data, prompting its enumerator, enum_file, to close the file. The Iteratee code is just as concise as the Lazy IO code; both are quite alike. Since IO is now done strictly, the iteratee code gives full control over file opening, closing and reading. (IterDemo1.hs has another early termination example, reading the file up to the first occurrence of a given string.)
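That a prefix-taking enumeratee reads "not a byte further" can be observed on a simplified model. The following is a sketch, not IterateeM.take: the iteratee type I is a character-at-a-time type, and the names takeI, feedStr, countAll, runI and prefixCount are assumptions of the illustration. The feeder feedStr returns the part of the source that was never requested, so the early stop is visible in the result.

```haskell
type LChar = Maybe Char
data I a = Done a | GetC (LChar -> I a)

-- Count all elements of the stream.
countAll :: I Int
countAll = loop 0
  where
    loop n = GetC (check n)
    check n Nothing  = Done n
    check n (Just _) = loop (n + 1)

-- A stand-in for take: pass at most n outer elements to the inner iteratee,
-- then stop asking for data.
takeI :: Int -> I a -> I (I a)
takeI n i@(GetC k)
  | n > 0 = GetC step
  where
    step Nothing  = Done i                      -- outer stream ended early
    step (Just c) = takeI (n - 1) (k (Just c))
takeI _ i = Done i                              -- quota used up, or inner done

-- Feed characters only while the iteratee asks; return what was not read.
feedStr :: String -> I a -> (I a, String)
feedStr (c : t) (GetC k) = feedStr t (k (Just c))
feedStr rest    i        = (i, rest)

-- Send EOF and extract the result.
runI :: I a -> a
runI (Done x) = x
runI (GetC k) = runI (k Nothing)

-- Count within the first 5 characters; the remainder stays unread.
prefixCount :: (Int, String)
prefixCount = (runI (runI outer), unread)
  where (outer, unread) = feedStr "hello world" (takeI 5 countAll)
```

In the sketch, the unread remainder plays the role of the bytes that a web client has not sent yet: the feeder never touches them because takeI has already returned Done.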

We have seen that both Lazy IO and Iteratee IO allow assembling the whole program from independent building blocks. Both IO styles permit incremental processing, reading file data on demand. Because Iteratee IO is not lazy, the Iteratee library can ensure timely deallocation of resources, precise IO error handling and precise control of reading actions. Iteratee IO, unlike Lazy IO, guarantees incremental processing.

3 Enumerators and the semantics of iteratees

This section outlines the conceptual design of iteratees, viewing iteratees and enumerators as communicating sequential processes. Iteratee processes are modeled as a data type; enumerators become interpreters, thus defining the semantics of iteratees as parsers of an enumerator’s source. We show the compositional construction of iteratee-parsers and elucidate the algebraic laws that help design iteratee parser combinators and simplify iteratee programs. The accompanying code with derivations and several examples is available in IterDeriv.hs.

⁷ http://www.haskell.org/pipermail/haskell-cafe/2008-August/046532.html

Our running example is reading lines from the standard input until the empty line, returning them in a list. The example is part of the common task of reading HTTP or e-mail headers. A line is a maximal sequence of non-newline characters. First, we write the example in the familiar C style, with getChar, or rather its non-exceptional version getchar0, which, like the one in the C standard library, returns the current character or EOF.

    type LChar = Maybe Char        -- lifted character
    getchar0 :: IO LChar

The function to read one line is later used to read all lines up to the empty line:

    getline0 :: IO String
    getline0 = loop ""
     where loop acc = getchar0 >>= check acc
           check acc (Just c) | c /= '\n' = loop (c:acc)
           check acc _                    = return (reverse acc)

    getlines0 :: IO [String]
    getlines0 = loop []
     where loop acc = getline0 >>= check acc
           check acc "" = return (reverse acc)
           check acc l  = loop (l:acc)

We may view getline0 and getlines0 as processes receiving lifted characters on a dedicated channel, stdin, and terminating with a value (a line or a list of lines). The simplest model represents such processes as a data type with a variant for each process operation: finished, or inputting a character (see [4] for a good explanation of such modeling). We will call these processes iteratees.⁸

    data I a = Done a
             | GetC (LChar -> I a)

Here is the data type model of the line reader

    getline :: I String
    getline = loop ""
     where loop acc = GetC (check acc)
           check acc (Just c) | c /= '\n' = loop (c:acc)
           check acc _                    = Done (reverse acc)

which looks almost identical to getline0. However, Done and GetC merely represent process operations. Terms like getlines hence do not “do” anything; they have to be interpreted so as to run the corresponding process. Our interpreter uses a given finite string as the source of characters to send to the process.

⁸ For the origin of the name, see http://okmij.org/ftp/Scheme/enumerators-callcc.html.

    eval :: String -> I a -> a
    eval ""      (GetC k) = eval "" $ k Nothing
    eval (c : t) (GetC k) = eval t  $ k (Just c)
    eval str     (Done x) = x

When the string is exhausted, the process is sent EOF (that is, Nothing). Hopefully the process then finishes and we can return the produced result. One may view eval str i as a Unix pipeline cat str | i.
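The data type model and its interpreter can be tried out directly. The following is a minimal self-contained sketch that repeats the definitions of I, getline and eval from the text; the names sample1 and sample2 and their input strings are made up for illustration.

```haskell
type LChar = Maybe Char            -- Nothing stands for EOF

data I a = Done a                  -- the process has finished with a result
         | GetC (LChar -> I a)     -- the process awaits one lifted character

-- The line reader: accumulate characters up to a newline or EOF.
getline :: I String
getline = loop ""
  where
    loop acc = GetC (check acc)
    check acc (Just c) | c /= '\n' = loop (c : acc)
    check acc _                    = Done (reverse acc)

-- Feed the characters of the string, then EOF, until the process finishes.
eval :: String -> I a -> a
eval ""      (GetC k) = eval "" (k Nothing)
eval (c : t) (GetC k) = eval t  (k (Just c))
eval _       (Done x) = x

-- Hypothetical sample inputs, for illustration only.
sample1, sample2 :: String
sample1 = eval "abc\ndef" getline   -- the reader finishes at the newline
sample2 = eval "abc"      getline   -- the reader finishes at EOF
```

Both samples evaluate to "abc": in the first the rest of the string is simply not consumed, in the second the interpreter has to send Nothing before the process finishes.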

The data type I a has a rich structure. I a is a final co-algebra of the functor T(X) = A + X^LChar, which helps us prove algebraic laws of iteratees below. The data type represents finitely branching trees with finite and infinite branches.⁹ The interpreter eval s traces the path s in the tree. Last but not least, I a is a free monad (for a good explanation and references, see [14]):

    instance Monad I where
        return = Done

        Done x >>= f = f x
        GetC k >>= f = GetC (k >>> f)

(see Figure 1 for Kleisli composition (>>>)). Therefore, we may build iteratee processes by chaining simpler ones with the monadic (>>=) operation. For example, we chain getline to build the process model of the reader of lines; the result looks identical to getlines0:

    getlines :: I [String]
    getlines = loop []
      where loop acc = getline >>= check acc
            check acc "" = return (reverse acc)
            check acc l  = loop (l : acc)
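Current GHC versions (7.10 and later) accept a Monad instance only together with Functor and Applicative instances. The sketch below is one way to make the listings above compile today; the two extra instances, derived from the monad operations with liftM and ap, and the sample name headers are assumptions of this sketch, not part of the original code. Kleisli composition is inlined as a lambda so that the block does not depend on Figure 1.

```haskell
import Control.Monad (ap, liftM)

type LChar = Maybe Char

data I a = Done a | GetC (LChar -> I a)

-- Extra instances required by modern GHC (an addition of this sketch).
instance Functor I where
  fmap = liftM

instance Applicative I where
  pure  = Done
  (<*>) = ap

instance Monad I where
  Done x >>= f = f x
  GetC k >>= f = GetC (\c -> k c >>= f)   -- the same as GetC (k >>> f)

getline :: I String
getline = loop ""
  where
    loop acc = GetC (check acc)
    check acc (Just c) | c /= '\n' = loop (c : acc)
    check acc _                    = Done (reverse acc)

-- Read lines up to the first empty line.
getlines :: I [String]
getlines = loop []
  where
    loop acc = getline >>= check acc
    check acc "" = return (reverse acc)
    check acc l  = loop (l : acc)

eval :: String -> I a -> a
eval ""      (GetC k) = eval "" (k Nothing)
eval (c : t) (GetC k) = eval t  (k (Just c))
eval _       (Done x) = x

-- Hypothetical header block followed by a body, for illustration only.
headers :: [String]
headers = eval "Host: a\nAccept: b\n\nbody" getlines
```

Here headers is the list of the two lines before the empty line; the body after it is never delivered to the iteratee.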

The next step is simple but momentous: we factor out eval into two interpreters, separating out the sending of data from the sending of EOF. The first factor feeds the characters, until there are no more data or the iteratee process is finished. The resulting iteratee is returned. The second interpreter tells the iteratee that there are no more data, and extracts its result.

    en_str :: String -> I a -> I a
    en_str ""      i        = i
    en_str (c : t) (GetC k) = en_str t $ k (Just c)
    en_str _       (Done x) = Done x

    run :: I a -> a
    run (GetC k) = run $ k Nothing
    run (Done x) = x

⁹ Recall that Haskell data types are co-inductive, letting us construct infinite terms, such as getline.

Clearly, eval str ≡ run . en_str str. We call en_str an enumerator, and its argument str a source. The function run is analogous to eof of the UUparsing library [13]. Enumerators and run of the IterateeM library, Figure 1, look more general than en_str and run above since IterateeM lets iteratees and enumerators perform side effects in a monad m when obtaining and processing input data. See App. A for such a generalization.

The point of factoring out eval is obtaining interpreters that can be functionally composed. Furthermore,

Equational Law 1 (Composition)

    en_str (s1 ++ s2) ≡ en_str s2 . en_str s1

In words: the composition of enumerators corresponds to the concatenation of their sources. The law holds for more general effectful enumerators and iteratees, see App. A for the formulation and proof. If we overlook the last clause of en_str’s definition, en_str is an instance of foldr (which inspired the names ‘iteratee’ and ‘enumerator’). The law of composition therefore is hardly surprising.
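Both the factoring eval str ≡ run . en_str str and the law of composition can be spot-checked on samples. The sketch below does so for getline and one made-up input string; since iteratees contain functions and cannot be compared directly, the two sides are compared after applying run. The names factoringHolds and compositionHolds are hypothetical, and a check on samples is of course not a proof.

```haskell
type LChar = Maybe Char

data I a = Done a | GetC (LChar -> I a)

getline :: I String
getline = loop ""
  where
    loop acc = GetC (check acc)
    check acc (Just c) | c /= '\n' = loop (c : acc)
    check acc _                    = Done (reverse acc)

eval :: String -> I a -> a
eval ""      (GetC k) = eval "" (k Nothing)
eval (c : t) (GetC k) = eval t  (k (Just c))
eval _       (Done x) = x

-- The enumerator: feed the characters, return the resulting iteratee.
en_str :: String -> I a -> I a
en_str ""      i        = i
en_str (c : t) (GetC k) = en_str t (k (Just c))
en_str _       (Done x) = Done x

-- Send EOF and extract the result.
run :: I a -> a
run (GetC k) = run (k Nothing)
run (Done x) = x

-- eval str ≡ run . en_str str, on one sample string.
factoringHolds :: Bool
factoringHolds = eval s getline == (run . en_str s) getline
  where s = "ab\ncd"

-- Law 1 on every split s1 ++ s2 of the sample string.
compositionHolds :: Bool
compositionHolds = and
  [ run (en_str (s1 ++ s2) getline) == run ((en_str s2 . en_str s1) getline)
  | n <- [0 .. length s], let (s1, s2) = splitAt n s ]
  where s = "ab\ncd"
```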

Another law of en_str illustrates the compositionality of the iteratee semantics and lets us view iteratees as parsers and build parser combinator libraries. An iteratee is a value of the data type I a, which per se has “no semantics”. The interpreter en_str gives a semantics to iteratees, as a function from finite strings to either Done v or GetC k. When the result of en_str s i is Done v, we say that the iteratee i has recognized the string s, parsing it to the value v. It follows from the law of composition that if i has recognized the string s, i recognizes s ++ s2 for any s2. We say that i properly recognizes the string s if i recognizes s but not any proper prefix of s.

Equational Law 2 (Chaining) If iteratee i properly recognizes s1, then

    en_str (s1 ++ s2) (i >>= f) ≡ en_str s1 i >>= en_str s2 . f

The proof of the general version of this law is given in App. A.

The law of chaining tells us how to build a recognizer for a string from the recognizers of the string’s prefix and suffix, thus defining the meaning of the sequential iteratee composition (>>=). To represent choice we need a parallel composition: the left-biased alternation combinator.
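The side condition of the law of chaining matters. The sketch below checks the law for getline on a string that is properly recognized ("ab\n") and on one that is recognized but not properly ("ab\nc"). The continuation pairWithNext, the helper names and the sample strings are made up for this illustration; the Functor and Applicative instances are additions needed by current GHC.

```haskell
import Control.Monad (ap, liftM)

type LChar = Maybe Char

data I a = Done a | GetC (LChar -> I a)

instance Functor I where
  fmap = liftM

instance Applicative I where
  pure  = Done
  (<*>) = ap

instance Monad I where
  Done x >>= f = f x
  GetC k >>= f = GetC (\c -> k c >>= f)

getline :: I String
getline = loop ""
  where
    loop acc = GetC (check acc)
    check acc (Just c) | c /= '\n' = loop (c : acc)
    check acc _                    = Done (reverse acc)

en_str :: String -> I a -> I a
en_str ""      i        = i
en_str (c : t) (GetC k) = en_str t (k (Just c))
en_str _       (Done x) = Done x

run :: I a -> a
run (GetC k) = run (k Nothing)
run (Done x) = x

-- Hypothetical continuation: read one more line and pair it with the first.
pairWithNext :: String -> I (String, String)
pairWithNext l = fmap (\l2 -> (l, l2)) getline

-- The two sides of Law 2, observed through run.
lhs, rhs :: String -> String -> (String, String)
lhs s1 s2 = run (en_str (s1 ++ s2) (getline >>= pairWithNext))
rhs s1 s2 = run (en_str s1 getline >>= en_str s2 . pairWithNext)

properCase, improperCase :: Bool
properCase   = lhs "ab\n"  "cd\n" == rhs "ab\n"  "cd\n"  -- "ab\n" is properly recognized
improperCase = lhs "ab\nc" "d\n"  == rhs "ab\nc" "d\n"   -- "ab\nc" is recognized, not properly
```

In the improper case the right-hand side feeds the trailing 'c' to an iteratee that is already Done, so the character is dropped and the second line comes out as "d" rather than "cd"; the two sides then differ.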

    (<|) :: I a -> I a -> I a
    Done x  <| _       = Done x
    _       <| Done x  = Done x
    GetC k1 <| GetC k2 = GetC (\c -> k1 c <| k2 c)

The parser i1 <| i2 recognizes whatever the first finishing parser recognizes; in the event of a tie, the result of i1 is preferred. Whereas the left and right unit of (>>=) is Done, the left and right unit of <| is failure, Figure 2, which keeps requesting input even after receiving EOF. It is a “diverging iteratee”: run failure diverges. (In the IterateeM library, run reports an error if an iteratee asks for data after receiving EOF.)

Equational Law 3 (Zero) The failure is the left zero of bind

    failure >>= f ≡ failure

Equational Law 4 (Right distributivity)

    i >>= \x -> (k1 x <| k2 x) ≡ (i >>= k1) <| (i >>= k2)

The law is similar to the law L10 in the parallel parser combinator library [1]. Since <| commits to whatever a parser recognizes first, the left distributivity does not hold:

    (i1 <| i2) >>= k ≢ (i1 >>= k) <| (i2 >>= k)
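A small counterexample makes the failure of left distributivity concrete. In the sketch below the alternation combinator is spelled <| in ASCII; the iteratees i1 and i2, the continuation k and the observer isDone are made up for the illustration. Because running failure diverges, the two sides are distinguished only by whether they are already Done.

```haskell
import Control.Monad (ap, liftM)

type LChar = Maybe Char

data I a = Done a | GetC (LChar -> I a)

instance Functor I where
  fmap = liftM

instance Applicative I where
  pure  = Done
  (<*>) = ap

instance Monad I where
  Done x >>= f = f x
  GetC k >>= f = GetC (\c -> k c >>= f)

-- The diverging iteratee: keeps asking for input forever.
failure :: I a
failure = GetC (const failure)

-- Left-biased alternation.
(<|) :: I a -> I a -> I a
Done x  <| _       = Done x
_       <| Done x  = Done x
GetC k1 <| GetC k2 = GetC (\c -> k1 c <| k2 c)

isDone :: I a -> Bool
isDone (Done _) = True
isDone (GetC _) = False

-- Hypothetical iteratees: both finish immediately, with different values.
i1, i2 :: I Int
i1 = Done 1
i2 = Done 2

-- Hypothetical continuation that rejects the value produced by i1.
k :: Int -> I String
k n = if n == 1 then failure else Done "ok"

lhsDone, rhsDone :: Bool
lhsDone = isDone ((i1 <| i2) >>= k)          -- the alternation commits to i1, then k 1 fails
rhsDone = isDone ((i1 >>= k) <| (i2 >>= k))  -- the failing branch loses to the branch of i2
```

The left-hand side has already committed to the result of i1 when k rejects it, so it is failure; on the right-hand side the alternation is decided after k, so the second branch wins.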

Primitive parsers

    failure :: I a                   -- the parser of nothing
    failure = GetC (const failure)

    empty :: a -> I a                -- the parser of the empty string
    empty v = Done v

    oneL :: I LChar                  -- the parser of one lifted character
    oneL = GetC Done

Parser combinators: chaining and alternation

    (>>=) :: I a -> (a -> I b) -> I b
    (<|)  :: I a -> I a -> I a

Derived parsers

    -- the parser of a one-character string
    one :: I Char
    one = oneL >>= maybe failure return

    -- the parser of a character satisfying the given predicate
    pSat :: (LChar -> Bool) -> I LChar
    pSat pred = oneL >>= \c -> if pred c then return c else failure

Fig. 2. Parser combinator library for simple iteratees

We thus arrive at the simple parser combinator library, Figure 2, which lets us derive the parsers getline and getlines that we previously built by intuition. First, we use the library to write the line reader in the ‘obviously correct’ way, expressing our definition of a line:

    pGetline :: I String
    pGetline = nl <| liftM2 (:) one pGetline
      where nl = do
              pSat (\c -> c == Just '\n' || c == Nothing)
              return ""

It is quite inefficient: if the current character is not newline, nl turns into failure, which does nothing but keeps receiving characters and discarding them. Such wasteful operations should be eliminated. Noting that pSat and one start with oneL that can be factored out by the right distributivity, and applying the laws of failure gives us

    pGetline' :: I String
    pGetline' = oneL >>= check
      where check (Just '\n') = return ""
            check Nothing     = return ""
            check (Just c)    = liftM (c :) pGetline'

(See App. B for the complete equational derivation.) This iteratee is a non-tail-recursive version of getline that we wrote ad hoc earlier with explicit process constructors GetC and Done. (The tail-recursive conversion through an accumulator is standard.) The correctness of getlines follows from the law of chaining.
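The claim that the ‘obviously correct’ pGetline and the optimized pGetline' behave alike can be spot-checked on a few inputs. The sketch below assembles the parts of Figure 2 it needs into one module; the alternation is spelled <| in ASCII, the Functor and Applicative instances are additions required by current GHC, and the list of sample inputs and the name agree are made up for the illustration.

```haskell
import Control.Monad (ap, liftM, liftM2)

type LChar = Maybe Char

data I a = Done a | GetC (LChar -> I a)

instance Functor I where
  fmap = liftM

instance Applicative I where
  pure  = Done
  (<*>) = ap

instance Monad I where
  Done x >>= f = f x
  GetC k >>= f = GetC (\c -> k c >>= f)

failure :: I a
failure = GetC (const failure)

oneL :: I LChar
oneL = GetC Done

one :: I Char
one = oneL >>= maybe failure return

pSat :: (LChar -> Bool) -> I LChar
pSat p = oneL >>= \c -> if p c then return c else failure

(<|) :: I a -> I a -> I a
Done x  <| _       = Done x
_       <| Done x  = Done x
GetC k1 <| GetC k2 = GetC (\c -> k1 c <| k2 c)

-- The line reader written from the definition of a line.
pGetline :: I String
pGetline = nl <| liftM2 (:) one pGetline
  where nl = pSat (\c -> c == Just '\n' || c == Nothing) >> return ""

-- The optimized line reader.
pGetline' :: I String
pGetline' = oneL >>= check
  where
    check (Just '\n') = return ""
    check Nothing     = return ""
    check (Just c)    = liftM (c :) pGetline'

eval :: String -> I a -> a
eval ""      (GetC k) = eval "" (k Nothing)
eval (c : t) (GetC k) = eval t  (k (Just c))
eval _       (Done x) = x

-- Agreement on a few hypothetical inputs (a spot check, not a proof).
agree :: Bool
agree = and [ eval s pGetline == eval s pGetline' | s <- ["", "\n", "ab\ncd", "abc"] ]
```

Running pGetline is safe here even though it contains failure branches: at every step the alternation still holds a branch that can finish, so eval terminates on finite input.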

The parser combinators in Figure 2 are efficient: the input stream is consumed character-by-character and is never backtracked. These parser combinators are somewhat similar to camlp4 parsers [11] in structuring a parser as a team of concurrent simple recursive-descent ‘stream parsers’. The library of Figure 2 races the stream parsers in parallel until one succeeds. Camlp4 orders stream parsers by precedence and lets a higher-precedence parser run for as long as it can. Camlp4 relies on look-ahead, which iteratee parser combinators in this section do not have (although it could be emulated in the continuation-passing style).

Look-ahead, or a fixed put-back, is an ‘effect’, to be expressed with effectful iteratees, see App. A. Effectful iteratees also permit better error reporting. App. A thus lays out the way towards implementing the interface in Figure 1 and running our illustrative examples, §2.

4 Conclusions

We have introduced Iteratee IO, a compositional incremental input processing style with precise resource control. Like Lazy IO, it provides high abstraction, composability, combinator libraries, and on-demand IO. Because Iteratee IO is not lazy, the Iteratee library can ensure timely deallocation of resources, precise IO error handling, and strict control of reading actions. Incremental processing can now be guaranteed. Iteratee IO is therefore particularly suitable for programming long-running servers and processing large amounts of data. Compared to Handle IO, Iteratee IO is much higher level.

We have presented a view of iteratees as processes, represented by final co-algebras and free monads. The view shows how to reason with iteratees and implement them, motivating the basic design of iteratees and explaining their compositions. The theory of effectful iteratees clarifies the vexing issues of buffering and look-ahead. The iteratee libraries have many other features such as error reporting, restartable exceptions, random IO, and merging of several streams. The iteratee-as-process view helps in understanding these advanced parts, too.

The capabilities and applications of Iteratee IO are still being discovered. For example, it was recently shown¹⁰ that monadic regions and iteratees easily combine; therefore it is possible after all to write an exception-safe iterFile (an iteratee that writes the stream data to a file), ensuring that the output file is always closed.

The theory of effectful iteratees hints at the possibility of reasoning about computations with arbitrary IO effects (involving communication pipes, locking, shared memory, etc.), being very specific, at times, about the allocation of resources and the precise sequencing of operating system calls. We could derive observational equivalences of IO programs by extending equivalences of simple sample programs asserted by the programmer.

Even though Lazy IO compromises equational reasoning, it was introduced because Haskell was perceived, by its creators, as not expressive enough for incremental high-level IO: “We fear that there may be no absolutely secure system – that is, which one guarantees the Church-Rosser property – which is also expressive enough to describe the programs which systems programmers (at least) want to write...” [7, Sec 10.5]. The pessimism turns out to be unwarranted. We can write high-level programs with incremental IO and precise resource control, in safe Haskell.

Acknowledgments

I am very thankful to John W. Lato, Paulo Tanimoto, Johan Tibell and Alistair Bayley for extensive insightful discussions. Many helpful comments and suggestions from Gregory Collins, Jason Dagit, Nicolas Frisby, David Mazieres, Chung-chieh Shan, Wren Ng Thornton and anonymous reviewers are gratefully acknowledged.

¹⁰ http://www.haskell.org/pipermail/haskell-cafe/2012-January/098704.html

Bibliography

[1] Claessen, Koen. 2004. Parallel parsing processes. Journal of Functional Programming 14(6):741–757.

[2] Collins, Gregory David. 2011. snap-server: An iteratee-based http server library. http://snapframework.com/docs/latest/snap-server/index.html.

[3] Harrison, William L. 2006. The essence of multitasking. In 11th int. conf. on algebraic methodology and software technology (AMAST 2006), 158–172.

[4] Hinze, Ralf. 2001. Deriving backtracking monad transformers. In ICFP, 186–197. ACM Press.

[5] Kiselyov, Oleg. 2008. Incremental multi-level input processing with left-fold enumerator: predictable, high-performance, safe, and elegant. ACM SIGPLAN 2008 Developer Tracks on Functional Programming (DEFUN 2008).

[6] Lato, John W. 2010. Iteratee: Teaching an old fold new tricks. In The monad.reader, ed. Brent Yorgey, vol. 16, 19–35.

[7] Launchbury, John, and Simon L. Peyton Jones. 1995. State in Haskell. Lisp and Symbolic Computation 8(4):293–341.

[8] Millikin, John. 2010. Understanding iteratees. http://john-millikin.com/articles/understanding-iteratees/.

[9] Parker, Conrad. 2011. Iteratees at Tsuru Capital. http://blog.kfish.org/2011/09/iteratees-at-tsuru.html.

[10] Quick, Kevin. 2011. Fun with the ST monad. http://www.haskell.org/pipermail/haskell-cafe/2011-February/089689.html.

[11] de Rauglaudre, Daniel. 2003. Camlp4 - Reference Manual, version 3.07. http://caml.inria.fr/pub/docs/manual-camlp4/.

[12] Shaw, Mary, William A. Wulf, and Ralph L. London. 1977. Abstraction and verification in Alphard: defining and specifying iteration and generators. Communications of the ACM 20(8):553–564.

[13] Swierstra, S. Doaitse. 2008. Combinator parsing: A short tutorial. In LerNet ALFA Summer School, vol. 5520 of LNCS, 252–300. Springer.

[14] Swierstra, Wouter. 2008. Data types a la carte. Journal of Functional Programming 18(4):423–436.

[15] Tarau, Paul. 2000. Fluents: A refactoring of Prolog for uniform reflection and interoperation with external objects. In Computational Logic: First international conference, ed. John Lloyd. LNCS 1861.

[16] Thornton, Wren Ng. 2011. Fun with the ST monad. http://www.haskell.org/pipermail/haskell-cafe/2011-February/089687.html.

[17] Yamamoto, Kazu. 2011. A tutorial on the enumerator library. http://www.mew.org/~kazu/proj/enumerator/.

[18] Yang, Edward Z. 2012. Why iteratees are hard to understand. http://blog.ezyang.com/2012/01/why-iteratees-are-hard-to-understand/.


A Effective iteratees

The data received by an iteratee may come from a file or a network. To get a chunk of that data, an enumerator has to perform IO. An iteratee may also need effects, e.g., to write a log, report exceptions, rewind the input stream. The theory of effectful iteratees developed in this section applies to all these cases. The theory extends the final-coalgebra representation of iteratee processes introduced in §3. We generalize the equational laws stated in §3 and prove them. The full details are in the accompanying source code IterDerivM.hs.

As in §3, we view iteratees as processes, modeled by a final co-algebra of their operations: terminated or requesting a character. After receiving a character, the iteratee may now incur an effect, in an arbitrary monad m:

    data I m a = Done a
               | GetC (LChar -> m (I m a))

The model of the line reader process below looks identical to that of getline in §3. Indeed, this line reader has no effects besides requesting a character.

    getline :: Monad m => I m String
    getline = loop ""
      where
        loop acc = GetC (check acc)
        check acc (Just c) | c /= '\n' = return (loop (c : acc))
        check acc _                    = return (Done (reverse acc))

Let us introduce an effect, of emitting a string:

    class Monad m => PutS m where
        putS :: String -> m ()

    instance PutS IO where
        putS = putStrLn

so that we may model a line reader that writes the debugging trace of each received character:

    getlineT :: (PutS m, Monad m) => I m String
    getlineT = loop ""
      where
        loop acc = GetC (trace acc)
        trace acc c = putS ("got " ++ show c) >> check acc c
        check acc (Just c) | c /= '\n' = return (loop (c : acc))
        check acc _                    = return (Done (reverse acc))

The interpreters of the iteratee term representation, enumerators and run, are defined as before, §3. The presence of the “call-by-value application” (=<<) reveals that the evaluation order now matters.

    en_str :: Monad m => String -> I m a -> m (I m a)
    en_str ""      i        = return i
    en_str (c : t) (GetC k) = en_str t =<< k (Just c)
    en_str _       i@Done{} = return i

    run :: Monad m => I m a -> m a
    run (Done x) = return x
    run (GetC k) = run =<< k Nothing

As in §3, monadic operations let us compose iteratee processes: I m is a monad, a free monad.

    instance Monad m => Monad (I m) where
        return = Done

        Done x >>= f = f x
        GetC k >>= f = GetC (k >>> (return . (>>= f)))

Somewhat surprisingly (since monads do not generally compose), the composition of m and I m is also a monad, with the following bind operation

    type IM m a = m (I m a)

    bind :: Monad m => IM m a -> (a -> IM m b) -> IM m b
    bind m f = m >>= check
      where
        check (Done x) = f x
        check (GetC k) = return (GetC (\c -> bind (k c) f))

We hence combine the logging line reader getlineT to read several lines until the empty line:

    getlinesT :: (PutS m, Monad m) => I m [String]
    getlinesT = loop []
      where
        loop acc = getlineT >>= check acc
        check acc "" = return (reverse acc)
        check acc l  = loop (l : acc)

Here is the complete example of reading lines from a given string, printing each character as it is being processed.

    t111 = print =<< run =<< en_str "abd\nxxx\nf" getlinesT
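The pieces of this appendix fit together into a small runnable module. The sketch below makes three assumptions that are not in the original: Functor and Applicative instances (required by current GHC), Kleisli composition inlined as a lambda, and a second PutS instance for the pair monad ((,) [String]) from base, which collects the trace in a list and so lets the effects be inspected without IO. The name traced and its input are made up.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Control.Monad (ap, liftM)

type LChar = Maybe Char

data I m a = Done a | GetC (LChar -> m (I m a))

instance Monad m => Functor (I m) where
  fmap = liftM

instance Monad m => Applicative (I m) where
  pure  = Done
  (<*>) = ap

instance Monad m => Monad (I m) where
  Done x >>= f = f x
  GetC k >>= f = GetC (\c -> k c >>= \i -> return (i >>= f))

class Monad m => PutS m where
  putS :: String -> m ()

instance PutS IO where
  putS = putStrLn

-- Assumption of this sketch: a pure stand-in for IO that accumulates the log.
instance PutS ((,) [String]) where
  putS s = ([s], ())

getlineT :: PutS m => I m String
getlineT = loop ""
  where
    loop acc = GetC (trace acc)
    trace acc c = putS ("got " ++ show c) >> check acc c
    check acc (Just c) | c /= '\n' = return (loop (c : acc))
    check acc _                    = return (Done (reverse acc))

getlinesT :: PutS m => I m [String]
getlinesT = loop []
  where
    loop acc = getlineT >>= check acc
    check acc "" = return (reverse acc)
    check acc l  = loop (l : acc)

en_str :: Monad m => String -> I m a -> m (I m a)
en_str ""      i        = return i
en_str (c : t) (GetC k) = en_str t =<< k (Just c)
en_str _       i@Done{} = return i

run :: Monad m => I m a -> m a
run (Done x) = return x
run (GetC k) = run =<< k Nothing

-- The example of the text, tracing to standard output.
t111 :: IO ()
t111 = print =<< run =<< en_str "abd\nxxx\nf" getlinesT

-- The same pipeline in the pure logging monad: (trace, result).
traced :: ([String], [String])
traced = run =<< en_str "a\n\n" getlinesT
```

For the made-up input "a\n\n" the trace has one entry per character delivered, and the result is the single line before the empty one.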

The equational laws of iteratees and enumerators, §3, generalize to the effectful case.

Equational Law 5 (Effectful Composition)

    en_str (s1 ++ s2) ≡ en_str s1 >>> en_str s2

Here s1 must be a finite string; s2 is arbitrary. The law reads just like the original law of composition, this time in terms of Kleisli composition. Let us prove it.

If we pass the Done x iteratee to the enumerators on both sides of the equality, the results are clearly equal: en_str s (Done x) ≡ return (Done x) regardless of s. The other case is the enumerators applied to a GetC k iteratee, which we prove by induction on s1. In the base case, the empty s1, en_str "" is return, which is the unit of Kleisli composition. The inductive case:

       en_str ((c : s1') ++ s2) (GetC k)
    ≡  -- property of list append
       en_str (c : (s1' ++ s2)) (GetC k)
    ≡  -- second clause of en_str
       en_str (s1' ++ s2) =<< k (Just c)
    ≡  -- inductive hypothesis
       (en_str s1' >>> en_str s2) =<< k (Just c)
    ≡  -- definition of (>>>)
       k (Just c) >>= (\x -> en_str s1' x >>= en_str s2)
    ≡  -- associativity of bind
       (k (Just c) >>= en_str s1') >>= en_str s2
    ≡  -- second clause of en_str
       en_str (c : s1') (GetC k) >>= en_str s2
    ≡  -- definition of (>>>)
       (en_str (c : s1') >>> en_str s2) (GetC k)
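The law of effectful composition can also be spot-checked, effects included. The sketch below uses the pair monad ((,) [String]) from base as a pure stand-in for IO, so that the accumulated trace is part of the compared value; that instance, the synonym Logged, the helper observe and the sample string are assumptions of the sketch. A check on samples complements the proof above, it does not replace it.

```haskell
{-# LANGUAGE FlexibleInstances #-}

type LChar = Maybe Char

data I m a = Done a | GetC (LChar -> m (I m a))

class Monad m => PutS m where
  putS :: String -> m ()

-- Assumption of this sketch: effects are recorded as a list of log entries.
instance PutS ((,) [String]) where
  putS s = ([s], ())

type Logged = (,) [String]

getlineT :: PutS m => I m String
getlineT = loop ""
  where
    loop acc = GetC (trace acc)
    trace acc c = putS ("got " ++ show c) >> check acc c
    check acc (Just c) | c /= '\n' = return (loop (c : acc))
    check acc _                    = return (Done (reverse acc))

en_str :: Monad m => String -> I m a -> m (I m a)
en_str ""      i        = return i
en_str (c : t) (GetC k) = en_str t =<< k (Just c)
en_str _       i@Done{} = return i

run :: Monad m => I m a -> m a
run (Done x) = return x
run (GetC k) = run =<< k Nothing

-- Kleisli composition, as used in the statement of the law.
(>>>) :: Monad m => (a -> m b) -> (b -> m c) -> a -> m c
f >>> g = \x -> f x >>= g

-- Iteratees cannot be compared directly; finish them with run and
-- compare the complete log together with the final result.
observe :: Logged (I Logged String) -> ([String], String)
observe m = m >>= run

-- Law 5 on every split s1 ++ s2 of a sample string.
law5Holds :: Bool
law5Holds = and
  [ observe (en_str (s1 ++ s2) getlineT)
      == observe ((en_str s1 >>> en_str s2) getlineT)
  | n <- [0 .. length s], let (s1, s2) = splitAt n s ]
  where s = "ab\ncd"
```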

The law of chaining of §3 becomes:

Equational Law 6 (Effectful Chaining)

1. If the iteratee i properly recognizes s1, then

    en_str (s1 ++ s2) (i >>= f) ≡ en_str s1 i `bind` en_str s2 . f

2. If the iteratee i does not recognize s, then

    en_str s (i >>= f) ≡ en_str s i >>= (return . (>>= f))

The proof of part 1 is by induction on s1, which must be finite by the definition of proper recognition. In the base case, the iteratee i properly recognizing the empty string s1 is Done x and so en_str s1 i is return (Done x), which is the left unit of bind. In the inductive case, the iteratee i properly recognizes the string c:s1'. Therefore, i must have the form GetC k where k (Just c) is an action that must yield an iteratee i' properly recognizing s1'. We calculate:

       en_str ((c : s1') ++ s2) (GetC k >>= f)
    ≡  -- property of list append
       en_str (c : (s1' ++ s2)) (GetC k >>= f)
    ≡  -- definition of bind of (I m)
       en_str (c : (s1' ++ s2)) (GetC (k >>> (return . (>>= f))))
    ≡  -- second clause of en_str definition
       en_str (s1' ++ s2) =<< (k >>> (return . (>>= f))) (Just c)
    ≡  -- definition of (>>>)
       en_str (s1' ++ s2) =<< (k (Just c) >>= (return . (>>= f)))
    ≡  -- rearrangements
       (k (Just c) >>= (return . (>>= f))) >>= en_str (s1' ++ s2)
    ≡  -- associativity of bind
       k (Just c) >>= (\i' -> return (i' >>= f) >>= en_str (s1' ++ s2))
    ≡  -- left unit law
       k (Just c) >>= (\i' -> en_str (s1' ++ s2) (i' >>= f))
    ≡  -- inductive hypothesis: i' does properly recognize s1'
       k (Just c) >>= (\i' -> en_str s1' i' `bind` en_str s2 . f)
    ≡  -- a property of bind: m `bind` f ≡ m >>= (\x -> return x `bind` f)
       k (Just c) >>= (\i' ->
           en_str s1' i' >>= (\x -> return x `bind` en_str s2 . f))
    ≡  -- associativity
       (k (Just c) >>= en_str s1') >>=
           (\x -> return x `bind` en_str s2 . f)
    ≡  -- second clause of en_str definition
       en_str (c : s1') (GetC k) >>= (\x -> return x `bind` en_str s2 . f)
    ≡  -- the same property of bind
       en_str (c : s1') (GetC k) `bind` en_str s2 . f

The proof of part 2 is analogous, see IterDerivM.hs for the complete derivation. Since the proofs relied only on monad laws, the laws of effectful composition and chaining hold for any effect whatsoever.

The divergent failure iteratee now reads

    failure :: Monad m => I m a
    failure = GetC (const (return failure))

The law Zero of §3 remains the same for effectful iteratees: failure >>= f ≡ failure. The proof is a trivial bisimulation.

The left-biased alternation of effectful iteratees has the form

    (<|) :: Monad m => I m a -> I m a -> I m a
    i@Done{} <| _        = i
    _        <| i@Done{} = i
    GetC k1  <| GetC k2  = GetC (\c -> liftM2 (<|) (k1 c) (k2 c))

To state the right-distributivity law, we need a definition: An iteratee i is idempotent if

    en_str s i >>= \x -> return (x, x)  ≡
    en_str s i >>= \x -> en_str s i >>= \y -> return (x, y)

for any finite string s. The right distributivity law has the same form as given in §3, with the side-condition that i must be an idempotent iteratee.
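The idempotence condition separates iteratees whose effects can be harmlessly duplicated from those whose effects cannot. The sketch below compares only the effect component of the two sides of the definition, using the pair monad ((,) [String]) from base as a pure logging monad; that instance and the names Logged, effectsOnce, effectsTwice and sameEffects are assumptions of the sketch. Equal effects are necessary for idempotence, not sufficient.

```haskell
{-# LANGUAGE FlexibleInstances #-}

type LChar = Maybe Char

data I m a = Done a | GetC (LChar -> m (I m a))

class Monad m => PutS m where
  putS :: String -> m ()

-- Assumption of this sketch: effects are recorded as a list of log entries.
instance PutS ((,) [String]) where
  putS s = ([s], ())

type Logged = (,) [String]

-- The effect-free line reader.
getline :: Monad m => I m String
getline = loop ""
  where
    loop acc = GetC (check acc)
    check acc (Just c) | c /= '\n' = return (loop (c : acc))
    check acc _                    = return (Done (reverse acc))

-- The line reader that logs every received character.
getlineT :: PutS m => I m String
getlineT = loop ""
  where
    loop acc = GetC (trace acc)
    trace acc c = putS ("got " ++ show c) >> check acc c
    check acc (Just c) | c /= '\n' = return (loop (c : acc))
    check acc _                    = return (Done (reverse acc))

en_str :: Monad m => String -> I m a -> m (I m a)
en_str ""      i        = return i
en_str (c : t) (GetC k) = en_str t =<< k (Just c)
en_str _       i@Done{} = return i

-- Effects of the two sides of the idempotence equation.
effectsOnce, effectsTwice :: I Logged a -> String -> [String]
effectsOnce  i s = fst (en_str s i >>= \x -> return (x, x))
effectsTwice i s = fst (en_str s i >>= \x -> en_str s i >>= \y -> return (x, y))

sameEffects :: I Logged a -> String -> Bool
sameEffects i s = effectsOnce i s == effectsTwice i s

plainOK, tracingOK :: Bool
plainOK   = sameEffects getline  "ab"   -- no effects on either side
tracingOK = sameEffects getlineT "ab"   -- the trace is emitted twice on one side
```

With this observation getline passes the check, while getlineT fails it: enumerating the same string into it twice doubles the trace, so right distributivity cannot be expected to hold for it.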

The proof is by bi-similarity. We define the relation R on iteratees as a set of all pairs (iA, iB) where

    iA = i >>= \x -> (k1 x <| k2 x)
    iB = (i >>= k1) <| (i >>= k2)

and we consider all observations of related iteratees. Here we only show the case when i is of the form GetC k for some k, in which case iA is GetC kA and iB is GetC kB for some kA and kB. The full proof is in IterDerivM.hs. If we feed iA a character c, we observe

       case GetC k >>= \x -> (k1 x <| k2 x) of GetC kA -> kA c
    ≡  -- definition of (>>=)
       (k >>> (return . (>>= \x -> (k1 x <| k2 x)))) c
    ≡
       k c >>= \i' ->
       return (i' >>= \x -> (k1 x <| k2 x))

For iteratee iB, we have

       case (GetC k >>= k1) <| (GetC k >>= k2) of GetC kB -> kB c
    ≡  -- definitions
       case GetC (k >>> (return . (>>= k1))) <|
            GetC (k >>> (return . (>>= k2)))
       of GetC kB -> kB c
    ≡
       liftM2 (<|)
         (k c >>= \ix -> return (ix >>= k1))
         (k c >>= \iy -> return (iy >>= k2))
    ≡  -- definition of liftM2
       (k c >>= \ix -> return (ix >>= k1)) >>= \i1 ->
       (k c >>= \iy -> return (iy >>= k2)) >>= \i2 ->
       return (i1 <| i2)
    ≡  -- associativity, unit laws
       k c >>= \ix -> k c >>= \iy ->
       return ((ix >>= k1) <| (iy >>= k2))
    ≡  -- monad unit law
       k c >>= \ix -> k c >>= \iy -> return (ix, iy) >>=
       \(ix, iy) -> return ((ix >>= k1) <| (iy >>= k2))
    ≡  -- idempotence
       k c >>= \ix -> return (ix, ix) >>=
       \(ix, iy) -> return ((ix >>= k1) <| (iy >>= k2))
    ≡  -- monad laws
       k c >>= \i' ->
       return ((i' >>= k1) <| (i' >>= k2))

Thus feeding c to the related iA and iB incurs the same effect (associated with k c) and produces the iteratees that are also related by R.

Treating look-ahead as an effect, the file IterDerivM.hs generalizes the parser combinators of Figure 2 for look-ahead. Buffering, the processing of input by chunks rather than by individual characters, can be handled similarly.


B Deriving the optimal version of pGetline

This section shows the detailed steps in the conversion of the “obviously correct” but grossly inefficient line reader pGetline, §3, to the efficient version pGetline'. The conversion relies on the equational laws of iteratees.

Our starting point is pGetline written using the iteratee parsing combinators from Figure 2:

    pGetline :: I String
    pGetline = nl <| liftM2 (:) one pGetline
      where nl = do
              pSat (\c -> c == Just '\n' || c == Nothing)
              return ""

First, we inline the definitions of one and pSat and desugar the do-notation:

    pGetline1 = nl <| char
      where
        nl   = (oneL >>= \c -> if c == Just '\n' || c == Nothing
                               then return c else failure) >> return ""
        char = (oneL >>= maybe failure return) >>= \c -> liftM (c :) pGetline1

We re-associate the bind chains to the right:

    pGetline2 = nl <| char
      where
        nl   = oneL >>= (\c -> if c == Just '\n' || c == Nothing
                               then return c >> return ""
                               else failure >> return "")
        char = oneL >>= (\c -> maybe failure return c >>= \c ->
                               liftM (c :) pGetline2)

distribute bind into case and apply Monad and Zero laws:

    pGetline3 = nl <| char
      where
        nl   = oneL >>= (\c -> if c == Just '\n' || c == Nothing
                               then return ""
                               else failure)
        char = oneL >>= (\c -> case c of
                                 Just c  -> liftM (c :) pGetline3
                                 Nothing -> failure)

and the right-distributivity law:

    pGetline4 = oneL >>= \c -> nl' c <| char' c
      where
        nl' c   = if c == Just '\n' || c == Nothing
                  then return "" else failure
        char' c = case c of
                    Just c  -> liftM (c :) pGetline4
                    Nothing -> failure

We pull out the case analysis on the read character, essentially “narrowing”

    pGetline5 = oneL >>= check
      where
        check (Just '\n') = return "" <| liftM ('\n' :) pGetline5
        check Nothing     = return "" <| failure
        check (Just c)    = failure <| liftM (c :) pGetline5

The facts that failure is the left and the right unit of <|, and return x is its left zero give us pGetline'.

C The plumbing intuition for Iteratee IO

The diagrammatic notation for iteratee programs introduced in this section helps visualize the flow of input data, giving an idea of iteratee processing at a glance. The notation is inspired by the “Piping and Instrumentation Diagram Standard Notation”¹¹ used in Industrial Engineering for a similar purpose. For illustration, we show the diagrams for the examples in §2.

[Figure 3, panels (a) to (g): diagrams not reproduced in this text version.]

Fig. 3. Notation for primitive components and combinators

Figure 3 describes the notation for the Iteratee library components, Figure 1. Enumerator (a) is a pump, pumping data so long as it can flow. When the consumer is saturated and will not accept more data, the pressure rises and the pump shuts off. A general iteratee (b) is a reservoir with an overflow pipe. When the reservoir is filled up (i.e., the iteratee is Done), the further input data flow through the overflow pipe to the next iteratee in chain. The overflow pipe is shut by default. When the reservoir fills up (that is, the iteratee gets all data it needs) and there is no further iteratee, the data stream has nowhere to flow, the pressure rises and the pump (enumerator) shuts off. The iteratee getchar (c) is a small reservoir, which can only hold a single byte. In contrast, count_i (d) is an open reservoir, accepting any amount of input data. The pairing combinator en_pair is the Y-connector (e), splitting the stream in two. Enumeratee is a reactor (f), transforming the incoming (top) stream to the stream of ‘reactor products’. When the bottom flow stops, that is, the iteratee consuming the reactor product finishes, the incoming flow continues through the right (overflow) end. The combinator (.|) (g) terminates that overflow pipe. There are cases (not shown in the paper) when the overflow continues: for example, when processing multi-part MIME messages. Kleisli composition of enumerators connects pumps in sequence:

¹¹ See the example, https://controls.engin.umich.edu/wiki/index.php/PIDStandardNotation

[Diagram not reproduced in this text version.]

We now show the plumbing diagrams for the examples in §2. We start with the simplest pipeline (too simple to mention in §2) that measures the output of a pump: enum_file fname count_i.

[Diagram not reproduced in this text version.]

The white-space gauge:

    countWS_iter = id .| (en_filter isSpace) count_i

[Diagram not reproduced in this text version.]

The gauge for “the”:

    countTHE_iter = id .| enum_words .| en_filter (== "the") count_i

[Diagram not reproduced in this text version.]

The counter of both “the” and the whitespace in the prefix of the input stream

    run_ntermI n fname =
        print =<< run =<< enum_file fname .|
            IterateeM.take n (countWS_iter `en_pair` countTHE_iter)

is drawn as follows.

[Diagram not reproduced in this text version; its surviving labels are countTHE, countWS, filter, filter, words.]

