The Vanishing Pattern: from iterators to generators in Python

Post on 10-May-2015




The core of the talk is refactoring a simple iterable class from the classic Iterator design pattern (as implemented in the GoF book) to compatible but less verbose implementations using generators. This provides a meaningful context to understand the value of generators. Along the way the behavior of the iter function, the Sequence protocol and the Iterable interface are presented. The motivating examples of this talk are database applications.


The Vanishing Pattern: from iterators to generators in Python

Luciano Ramalho


Demo: laziness in the Django Shell


>>> from django.db import connection
>>> q = connection.queries
>>> q
[]
>>> from municipios.models import *
>>> res = Municipio.objects.all()[:5]
>>> q
[]
>>> for m in res: print m.uf, m.nome
...
GO Abadia de Goiás
MG Abadia dos Dourados
GO Abadiânia
MG Abaeté
PA Abaetetuba
>>> q
[{'time': '0.000', 'sql': u'SELECT "municipios_municipio"."id", "municipios_municipio"."uf", "municipios_municipio"."nome", "municipios_municipio"."nome_ascii", "municipios_municipio"."meso_regiao_id", "municipios_municipio"."capital", "municipios_municipio"."latitude", "municipios_municipio"."longitude", "municipios_municipio"."geohash" FROM "municipios_municipio" ORDER BY "municipios_municipio"."nome_ascii" ASC LIMIT 5'}]

this expression makes a Django QuerySet

QuerySets are “lazy”: no database access so far

the query is made only when we iterate over the results


QuerySet is a lazy iterable



Here “lazy” is a technical term:




• Avoids unnecessary work, by postponing it as long as possible

• The opposite of eager


In Computer Science, being “lazy” is often a good thing!
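The contrast between lazy and eager can be sketched in plain Python 3, without Django. This is only an illustration: fetch_row is a hypothetical stand-in for an expensive database access, and the log list exists just to show when the work happens. The lazy version uses a generator function, a construct the talk introduces later.

```python
log = []  # records every simulated database access


def fetch_row(n):
    """Hypothetical stand-in for an expensive database read."""
    log.append(n)
    return 'row %d' % n


def eager_rows(count):
    # Eager: all the work happens immediately, before anyone asks for a row.
    return [fetch_row(n) for n in range(count)]


def lazy_rows(count):
    # Lazy: nothing runs until the caller starts iterating.
    for n in range(count):
        yield fetch_row(n)


rows = lazy_rows(3)
print(log)          # [] : no access so far
first = next(rows)
print(first, log)   # row 0 [0] : only the requested row was fetched
```

As with the QuerySet in the demo, building the lazy object costs nothing; the cost is paid item by item during iteration.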


Now, back to basics...



Iteration: C and Python

#include <stdio.h>

int main(int argc, char *argv[]) {
    int i;
    for(i = 0; i < argc; i++)
        printf("%s\n", argv[i]);
    return 0;
}

import sys

for arg in sys.argv:
    print arg


Iteration: Java (classic)

class Arguments {
    public static void main(String[] args) {
        for (int i=0; i < args.length; i++)
            System.out.println(args[i]);
    }
}

$ java Arguments alfa bravo charlie
alfa
bravo
charlie


Iteration: Java ≥1.5

• Enhanced for (a.k.a. foreach)

class Arguments2 {
    public static void main(String[] args) {
        for (String arg : args)
            System.out.println(arg);
    }
}

$ java Arguments2 alfa bravo charlie
alfa
bravo
charlie



import sys

for arg in sys.argv:
    print arg



You can iterate over many Python objects

• strings

• files

• XML: ElementTree nodes

• not limited to built-in types:

• Django QuerySet

• etc.



So, what is an iterable?

• Informal, recursive definition:

• iterable: fit to be iterated

• just as: edible: fit to be eaten



The for loop statement is not the only construct that handles iterables...


List comprehension

• An expression that builds a list from any iterable

• Example: use every element of L:

    L2 = [n*10 for n in L]

>>> s = 'abracadabra'
>>> l = [ord(c) for c in s]
>>> l
[97, 98, 114, 97, 99, 97, 100, 97, 98, 114, 97]

input: any iterable object

output: a list (always)


Set comprehension

• An expression that builds a set from any iterable

>>> s = 'abracadabra'
>>> set(s)
{'b', 'r', 'a', 'd', 'c'}
>>> {ord(c) for c in s}
{97, 98, 99, 100, 114}



Dict comprehensions

• An expression that builds a dict from any iterable

>>> s = 'abracadabra'
>>> {c:ord(c) for c in s}
{'a': 97, 'r': 114, 'b': 98, 'c': 99, 'd': 100}



Syntactic support for iterables

• Tuple unpacking, parallel assignment

>>> a, b, c = 'XYZ'
>>> a
'X'
>>> b
'Y'
>>> c
'Z'

>>> l = [(c, ord(c)) for c in 'XYZ']
>>> l
[('X', 88), ('Y', 89), ('Z', 90)]
>>> for char, code in l:
...     print char, '->', code
...
X -> 88
Y -> 89
Z -> 90


Syntactic support for iterables (2)

• Function calls: exploding arguments with *

>>> import math
>>> def hypotenuse(a, b):
...     return math.sqrt(a*a + b*b)
...
>>> hypotenuse(3, 4)
5.0
>>> sides = (3, 4)
>>> hypotenuse(sides)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: hypotenuse() takes exactly 2 arguments (1 given)
>>> hypotenuse(*sides)
5.0



Built-in iterable types

• basestring

• str

• unicode

• dict

• file

• frozenset

• list

• set

• tuple

• xrange



Built-in functions that take iterable arguments

• all

• any

• filter

• iter

• len

• map

• max

• min

• reduce

• sorted

• sum

• zip (unrelated to compression)


Classic iterables in Python



Iterator is...

• a classic design pattern

Design Patterns, Gamma, Helm, Johnson & Vlissides, Addison-Wesley, ISBN 0-201-63361-2



Head First Design Patterns Poster, O'Reilly, ISBN 0-596-10214-3


“The Iterator Pattern provides a way to access the elements of an aggregate object sequentially without exposing the underlying representation.”

An iterable Train class

>>> train = Train(4)
>>> for car in train:
...     print(car)
car #1
car #2
car #3
car #4
>>>


An iterable Train with iterator

class Train(object):

    def __init__(self, cars):
        self.cars = cars

    def __len__(self):
        return self.cars

    def __iter__(self):
        return TrainIterator(self)

class TrainIterator(object):

    def __init__(self, train):
        self.train = train
        self.current = 0

    def __next__(self):  # Python 3
        if self.current < len(self.train):
            self.current += 1
            return 'car #%s' % (self.current)
        else:
            raise StopIteration()




Iterable ABC

• collections.Iterable abstract base class (collections.abc.Iterable in Python ≥ 3.3)

• A concrete subclass of Iterable must implement .__iter__

• .__iter__ returns an Iterator

• You don’t usually call .__iter__ directly

• when needed, call iter(x)



Iterator ABC

• Iterator provides .next (Python 2) or .__next__ (Python 3)

• .__next__ returns the next item

• You don’t usually call .__next__ directly

• when needed, call next(x) (Python ≥ 2.6)



for car in train:

• calls iter(train) to obtain a TrainIterator

• makes repeated calls to next(aTrainIterator) until it raises StopIteration





>>> train = Train(3)
>>> for car in train:
...     print(car)
car #1
car #2
car #3
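The two steps performed by the for statement can be written out by hand. The sketch below, in Python 3, shows roughly what the loop does behind the scenes; manual_for is a hypothetical helper name, and a plain list stands in for the train.

```python
def manual_for(iterable, action):
    """Roughly what `for item in iterable: action(item)` does."""
    iterator = iter(iterable)        # step 1: obtain an iterator
    while True:
        try:
            item = next(iterator)    # step 2: ask for the next item
        except StopIteration:        # the iterator is exhausted
            break
        action(item)


seen = []
manual_for(['car #1', 'car #2', 'car #3'], seen.append)
print(seen)   # ['car #1', 'car #2', 'car #3']
```

The for statement hides both the iter call and the StopIteration handling, which is why user code rarely touches either.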




Iterable duck-like creatures



Design patterns in dynamic languages

• Dynamic languages: Lisp, Smalltalk, Python, Ruby, PHP, JavaScript...

• Many features not found in C++, where most of the original 23 Design Patterns were identified

• Java is more dynamic than C++, but much more static than Lisp, Python etc.


Gamma, Helm, Johnson, Vlissides, a.k.a. the Gang of Four (GoF)

Peter Norvig: “Design Patterns in Dynamic Languages”


Dynamic types

• No need to declare types or interfaces

• It does not matter what an object claims to be, only what it is capable of doing



Duck typing


“In other words, don't check whether it is-a duck: check whether it quacks-like-a duck, walks-like-a duck, etc, etc, depending on exactly what subset of duck-like behaviour you need to play your language-games with.”

Alex Martelli, comp.lang.python (2000)


A Python iterable is...

• An object from which the iter function can produce an iterator

• The iter(x) call:

• invokes x.__iter__() to obtain an iterator (the Iterable interface)

• but, if x has no __iter__:

• iter makes an iterator which tries to fetch items from x by doing x[0], x[1], x[2]... (the sequence protocol)



Train: a sequence of cars

train = Train(4)

train[0] train[1] train[2] train[3]

>>> train = Train(4)
>>> len(train)
4
>>> train[0]
'car #1'
>>> train[3]
'car #4'
>>> train[-1]
'car #4'
>>> train[4]
Traceback (most recent call last):
  ...
IndexError: no car at 4

>>> for car in train:
...     print(car)
car #1
car #2
car #3
car #4

Train: a sequence of cars

class Train(object):

    def __init__(self, cars):
        self.cars = cars

    def __len__(self):  # needed by len(self) below
        return self.cars

    def __getitem__(self, key):
        index = key if key >= 0 else self.cars + key
        if 0 <= index < len(self):  # index 2 -> car #3
            return 'car #%s' % (index + 1)
        else:
            raise IndexError('no car at %s' % key)

if __getitem__ exists, iteration “just works”


The sequence protocol at work

>>> t = Train(4)
>>> len(t)
4
>>> t[0]
'car #1'
>>> t[3]
'car #4'
>>> t[-1]
'car #4'
>>> for car in t:
...     print(car)
car #1
car #2
car #3
car #4
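The fallback from __iter__ to __getitem__ can be observed in isolation. In this Python 3 sketch, Squares is a hypothetical class that defines __getitem__ only; the point is that iter() still accepts it, even though the Iterable ABC does not recognize it.

```python
from collections.abc import Iterable


class Squares:
    """Hypothetical example: defines __getitem__ only, no __iter__."""

    def __getitem__(self, index):
        if 0 <= index < 4:
            return index * index
        raise IndexError(index)   # tells iter() where the sequence ends


s = Squares()
print(list(s))                    # [0, 1, 4, 9]
print(isinstance(s, Iterable))    # False: the ABC only checks for __iter__
it = iter(s)                      # still works, via the sequence protocol
print(next(it), next(it))         # 0 1
```

This is duck typing at work: what matters is that iter(x) succeeds, not whether x declares itself an Iterable.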






• protocol: a synonym for interface used in dynamic languages like Smalltalk, Python, Ruby, Lisp...

• not declared, and not enforced by static checks


Sequence protocol

class Train(object):

    def __init__(self, cars):
        self.cars = cars

    def __len__(self):
        return self.cars

    def __getitem__(self, key):
        index = key if key >= 0 else self.cars + key
        if 0 <= index < len(self):  # index 2 -> car #3
            return 'car #%s' % (index + 1)
        else:
            raise IndexError('no car at %s' % key)

__len__ and __getitem__ implement the immutable sequence protocol

Sequence ABC

• collections.Sequence abstract base class (Python ≥ 2.6; collections.abc.Sequence in Python ≥ 3.3)

import collections

class Train(collections.Sequence):

    def __init__(self, cars):
        self.cars = cars

    def __len__(self):
        return self.cars

    def __getitem__(self, key):
        index = key if key >= 0 else self.cars + key
        if 0 <= index < len(self):  # index 2 -> car #3
            return 'car #%s' % (index + 1)
        else:
            raise IndexError('no car at %s' % key)

• abstract methods: __len__ and __getitem__

• implement these 2: __len__ and __getitem__

• inherit these 5: __contains__, __iter__, __reversed__, index and count


Sequence ABC

• collections.Sequence abstract base class

>>> train = Train(4)
>>> 'car #2' in train
True
>>> 'car #7' in train
False
>>> for car in reversed(train):
...     print(car)
car #4
car #3
car #2
car #1
>>> train.index('car #3')
2
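The same idea can be tried in Python 3, where the ABC lives in collections.abc. The sketch below uses a hypothetical Countdown class (not part of the talk) that implements only the two abstract methods and gets the other five from Sequence.

```python
from collections.abc import Sequence   # Python 3 location of the ABC


class Countdown(Sequence):
    """Hypothetical immutable sequence: n, n-1, ..., 1."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        if isinstance(index, slice):
            raise TypeError('slices are not supported in this sketch')
        if index < 0:
            index += self.n
        if 0 <= index < self.n:
            return self.n - index
        raise IndexError(index)


c = Countdown(3)
print(list(c))             # [3, 2, 1]   inherited __iter__
print(2 in c)              # True        inherited __contains__
print(list(reversed(c)))   # [1, 2, 3]   inherited __reversed__
print(c.index(1))          # 2           inherited index
print(c.count(3))          # 1           inherited count
```

Leaving out __len__ or __getitem__ would make Countdown(3) fail with a TypeError at instantiation, which is the enforcement a bare protocol does not give you.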







Iteration in C (example 2)

#include <stdio.h>

int main(int argc, char *argv[]) {
    int i;
    for(i = 0; i < argc; i++)
        printf("%d : %s\n", i, argv[i]);
    return 0;
}

$ ./args2 alfa bravo charlie
0 : ./args2
1 : alfa
2 : bravo
3 : charlie


Iteration in Python (ex. 2)

import sys

for i in range(len(sys.argv)):
    print i, ':', sys.argv[i]

$ python args2.py alfa bravo charlie
0 : args2.py
1 : alfa
2 : bravo
3 : charlie

not Pythonic


Iteration in Python (ex. 2)

import sys

for i, arg in enumerate(sys.argv):
    print i, ':', arg

$ python args2.py alfa bravo charlie
0 : args2.py
1 : alfa
2 : bravo
3 : charlie



enumerate(sys.argv) returns a lazy iterable object; that object yields (index, item) tuples on demand, at each iteration



What enumerate does

>>> e = enumerate('Turing')
>>> e
<enumerate object at 0x...>
>>>

enumerate builds an enumerate object



this builds a generator, and that is iterable

>>> e = enumerate('Turing')
>>> e
<enumerate object at 0x...>
>>> for item in e:
...     print item
...
(0, 'T')
(1, 'u')
(2, 'r')
(3, 'i')
(4, 'n')
(5, 'g')
>>>


the enumerate object produces an (index, item) tuple for each next(e) call

>>> e = enumerate('Turing')
>>> e
<enumerate object at 0x...>
>>> next(e)
(0, 'T')
>>> next(e)
(1, 'u')
>>> next(e)
(2, 'r')
>>> next(e)
(3, 'i')
>>> next(e)
(4, 'n')
>>> next(e)
(5, 'g')
>>> next(e)
Traceback (most recent...):
  ...
StopIteration

• The enumerate object is an example of a generator


Iterator x generator

• By definition (in GoF) an iterator retrieves successive items from an existing collection

• A generator implements the iterator interface (next) but produces items not necessarily in a collection

• a generator may iterate over a collection, but return the items decorated in some way, skip some items...

• it may also produce items independently of any existing data source (e.g. Fibonacci sequence generator)
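Both kinds of generator mentioned in the bullets above can be sketched in a few lines of Python 3. The function names fibonacci and decorated are illustrative choices, and both rely on the yield keyword, which the following slides explain.

```python
from itertools import islice


def fibonacci():
    """Boundless generator: no underlying collection exists."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


def decorated(items):
    """Iterates over a collection, skipping and decorating items."""
    for index, item in enumerate(items):
        if item:                       # skip empty items
            yield '%d:%s' % (index, item.upper())


print(list(islice(fibonacci(), 10)))    # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(list(decorated(['a', '', 'b'])))  # ['0:A', '2:B']
```

Because fibonacci never ends, it must be consumed lazily; islice takes the first ten items without ever asking for the eleventh.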


Faraday disc (Wikipedia)


Very simple generators




• Any function that has the yield keyword in its body is a generator function


>>> def gen_123():
...     yield 1
...     yield 2
...     yield 3
...
>>> for i in gen_123(): print(i)
1
2
3
>>>

the keyword gen was considered for defining generator functions, but def prevailed


• When invoked, a generator function returns a generator object



>>> def gen_123():
...     yield 1
...     yield 2
...     yield 3
...
>>> for i in gen_123(): print(i)
1
2
3
>>> g = gen_123()
>>> g
<generator object gen_123 at ...>



>>> def gen_123():
...     yield 1
...     yield 2
...     yield 3
...
>>> g = gen_123()
>>> g
<generator object gen_123 at ...>
>>> next(g)
1
>>> next(g)
2
>>> next(g)
3
>>> next(g)
Traceback (most recent call last):
...
StopIteration

• Generator objects implement the Iterator interface




• Note how the output of the generator function is interleaved with the output of the calling code


>>> def gen_AB():
...     print('START')
...     yield 'A'
...     print('CONTINUE')
...     yield 'B'
...     print('END.')
...
>>> for c in gen_AB():
...     print('--->', c)
...
START
---> A
CONTINUE
---> B
END.
>>>



• The body is executed only when next is called, and it runs only up to the following yield

>>> def gen_AB():
...     print('START')
...     yield 'A'
...     print('CONTINUE')
...     yield 'B'
...     print('END.')
...
>>> g = gen_AB()
>>> next(g)
START
'A'
>>>



• When the body of the function returns, the generator object raises StopIteration

• The for statement catches that for you

>>> def gen_AB():
...     print('START')
...     yield 'A'
...     print('CONTINUE')
...     yield 'B'
...     print('END.')
...
>>> g = gen_AB()
>>> next(g)
START
'A'
>>> next(g)
CONTINUE
'B'
>>> next(g)
END.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Train with generator function

class Train(object):

    def __init__(self, cars):
        self.cars = cars

    def __iter__(self):
        for i in range(self.cars):
            # index 2 is car #3
            yield 'car #%s' % (i+1)

for car in train:

• calls iter(train) to obtain a generator

• makes repeated calls to next(generator) until the function returns, which raises StopIteration

>>> train = Train(3)
>>> for car in train:
...     print(car)
car #1
car #2
car #3


Classic iterator x generator

Classic iterator: 2 classes, 12 lines of code

class Train(object):

    def __init__(self, cars):
        self.cars = cars

    def __len__(self):
        return self.cars

    def __iter__(self):
        return TrainIterator(self)

class TrainIterator(object):

    def __init__(self, train):
        self.train = train
        self.current = 0

    def __next__(self):  # Python 3
        if self.current < len(self.train):
            self.current += 1
            return 'car #%s' % (self.current)
        else:
            raise StopIteration()

Generator: 1 class, 3 lines of code

class Train(object):

    def __init__(self, cars):
        self.cars = cars

    def __iter__(self):
        for i in range(self.cars):
            yield 'car #%s' % (i+1)

The pattern just vanished

“When I see patterns in my programs, I consider it a sign of trouble. The shape of a program should reflect only the problem it needs to solve. Any other regularity in the code is a sign, to me at least, that I'm using abstractions that aren't powerful enough -- often that I'm generating by hand the expansions of some macro that I need to write.”

Paul Graham, Revenge of the Nerds (2002)

Generator expression (genexp)

>>> g = (c for c in 'ABC')
>>> g
<generator object <genexpr> at 0x10045a410>
>>> for l in g:
...     print(l)
...
A
B
C
>>>

• When evaluated, returns a generator object

>>> g = (n for n in [1, 2, 3])
>>> g
<generator object <genexpr> at 0x...>
>>> next(g)
1
>>> next(g)
2
>>> next(g)
3
>>> next(g)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration



Train with generator expression

class Train(object):

    def __init__(self, cars):
        self.cars = cars

    def __iter__(self):
        return ('car #%s' % (i+1) for i in range(self.cars))

for car in train:

• calls iter(train) to obtain a generator

• makes repeated calls to next(generator) until the generator is exhausted, which raises StopIteration

>>> train = Train(3)
>>> for car in train:
...     print(car)
car #1
car #2
car #3

Generator function x genexp

class Train(object):

    def __init__(self, cars):
        self.cars = cars

    def __iter__(self):
        for i in range(self.cars):
            yield 'car #%s' % (i+1)

class Train(object):

    def __init__(self, cars):
        self.cars = cars

    def __iter__(self):
        return ('car #%s' % (i+1) for i in range(self.cars))


Built-in functions that return iterables, iterators or generators

• dict

• enumerate

• frozenset

• list

• reversed

• set

• tuple



The itertools module

• boundless generators

• count(), cycle(), repeat()

• generators which combine several iterables:

• chain(), tee(), izip(), imap(), product(), compress()...

• generators which select or group items:

• compress(), dropwhile(), groupby(), ifilter(), islice()...

• generators producing combinations of items:

• product(), permutations(), combinations()...

Don’t reinvent the wheel, use itertools!

this was not reinvented: ported from Haskell

great for MapReduce
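A few of the functions listed above, shown in action. This is a small Python 3 sampler, not a tour of the module; note that izip, imap and ifilter exist only in Python 2, because the built-in zip, map and filter are already lazy in Python 3. The variable names are arbitrary.

```python
import itertools

# Boundless generator, cut to size with islice
evens = list(itertools.islice(itertools.count(0, 2), 5))
print(evens)      # [0, 2, 4, 6, 8]

# Combining several iterables
chained = list(itertools.chain('AB', [1, 2]))
print(chained)    # ['A', 'B', 1, 2]

# Selecting items
dropped = list(itertools.dropwhile(lambda n: n < 3, [1, 2, 3, 1]))
picked = list(itertools.compress('ABCD', [1, 0, 1, 0]))
print(dropped)    # [3, 1]
print(picked)     # ['A', 'C']

# Grouping consecutive items by a key
groups = [(key, ''.join(group)) for key, group in itertools.groupby('aaabbc')]
print(groups)     # [('a', 'aaa'), ('b', 'bb'), ('c', 'c')]

# Combinations of items
pairs = list(itertools.combinations('ABC', 2))
print(pairs)      # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

Every call returns a lazy object; the list() wrappers are there only so the results can be printed.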


Generators in Python 3

• Several functions and methods of the standard library that used to return lists now return generators and other lazy iterables in Python 3

• dict.keys(), dict.items(), dict.values()...

• range(...)

• like xrange in Python 2.x (more than a generator)

• If you really need a list, just pass the generator to the list constructor. E.g.: list(range(10))
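The behaviour described in the bullets above is easy to check in a Python 3 session. The sketch below uses only built-ins; the variable names are arbitrary.

```python
d = {'a': 1, 'b': 2}
keys = d.keys()              # a view object, not a list
d['c'] = 3
print(sorted(keys))          # ['a', 'b', 'c'] : the view reflects the change

r = range(10**9)             # no billion-item list is built
print(len(r), r[-1])         # 1000000000 999999999
print(500 in r)              # True, computed without iterating

m = map(str.upper, 'abc')    # map is lazy in Python 3
print(next(m))               # A

as_list = list(range(10))    # build a real list only when one is needed
print(as_list)               # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

This also shows why range is "more than a generator": it supports len, indexing and membership tests, and it can be iterated any number of times.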



A practical example using generator functions

• Generator functions to decouple reading and writing logic in a database conversion tool designed to handle large datasets



Main loop writes JSON file


Another loop reads the input records


One implementation: same loop reads/writes


But what if we need to read another format?


Functions in the script

• iterMstRecords*

• iterIsoRecords*

• writeJsonArray

• main

* generator functions


main: read command line arguments

main: determine input format

input generator function is selected based on the input file extension

selected generator function is passed as an argument


writeJsonArray: write JSON records


writeJsonArray: iterates over one of the input generator functions

selected generator function received as an argument...

and called to produce input generator


iterIsoRecords: read records from ISO-2709 format file

generator function!




yields one record, structured as a dict

creates a new dict in each iteration



iterMstRecords: read records from ISIS .MST file

generator function!


yields one record, structured as a dict

creates a new dict in each iteration
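The script itself is not reproduced in this transcript, so the following Python 3 sketch only illustrates the structure described above. Everything in it is an assumption made for illustration: the two toy input formats (CSV and a 'key=value;key=value' line format) replace ISO-2709 and ISIS .MST, and the names iter_csv_records, iter_kv_records, write_json_array, READERS and convert are hypothetical.

```python
import csv
import io
import json
import os


def iter_csv_records(input_file):
    """Generator function: yields one record (a new dict) per CSV row."""
    for row in csv.DictReader(input_file):
        yield dict(row)


def iter_kv_records(input_file):
    """Generator function: yields one dict per 'key=value;key=value' line."""
    for line in input_file:
        line = line.strip()
        if line:
            yield dict(pair.split('=', 1) for pair in line.split(';'))


def write_json_array(iter_records, input_file, output):
    """Receives a generator function, calls it, and writes a JSON array."""
    output.write('[')
    for i, record in enumerate(iter_records(input_file)):
        if i > 0:
            output.write(',\n')
        json.dump(record, output, sort_keys=True)
    output.write(']\n')


# Hypothetical mapping: the reader is selected by input file extension.
READERS = {'.csv': iter_csv_records, '.kv': iter_kv_records}


def convert(file_name, input_file, output):
    extension = os.path.splitext(file_name)[1]
    write_json_array(READERS[extension], input_file, output)


out = io.StringIO()
convert('cities.kv', io.StringIO('uf=GO;nome=Abadia\nuf=MG;nome=Abaete\n'), out)
print(out.getvalue())
```

The writing loop never learns which format it is reading, and only one record is held in memory at a time, which is what makes this arrangement suitable for large datasets.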

Generators at work



We did not cover

• other generator methods:

• gen.close(): causes a GeneratorExit exception to be raised within the generator body, at the point where it is paused

• gen.throw(e): causes any exception e to be raised within the generator body, at the point where it is paused

Mostly useful for long-running processes. Often not needed in batch processing scripts.
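For reference, a minimal Python 3 sketch of both methods. The numbers generator and the events list are hypothetical; the generator chooses to treat ValueError as a request to restart its count, which is just one possible reaction.

```python
events = []   # records what happened inside the generator


def numbers():
    """Hypothetical counter that restarts on ValueError."""
    try:
        n = 0
        while True:
            try:
                yield n
            except ValueError:
                events.append('reset')    # raised here by gen.throw()
                n = 0
            else:
                n += 1
    except GeneratorExit:
        events.append('closed')           # raised here by gen.close()
        raise


g = numbers()
print(next(g), next(g))               # 0 1
print(g.throw(ValueError('reset')))   # 0 : throw() returns the next yielded value
g.close()
print(events)                         # ['reset', 'closed']
```

In both cases the exception appears at the yield where the generator is paused, so ordinary try/except blocks in the body can react to it.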



We did not cover

• generator delegation with yield from (Python ≥ 3.3)

• sending data into a generator function with the gen.send(x) method (instead of next(gen)), and using yield as an expression to get the data sent

• using generator functions as coroutines (not useful in the context of iteration)

“Coroutines are not related to iteration”

David Beazley
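For readers curious about what was left out, here is a brief Python 3 sketch of gen.send(x) and yield from. The averager and chained functions are illustrative examples, not part of the talk.

```python
def averager():
    """Coroutine-style generator: receives numbers via send()."""
    total, count, average = 0.0, 0, None
    while True:
        value = yield average    # yield as an expression gets the sent data
        total += value
        count += 1
        average = total / count


def chained(*iterables):
    """Delegates to each iterable in turn (Python >= 3.3)."""
    for iterable in iterables:
        yield from iterable


avg = averager()
next(avg)                 # prime the coroutine: run up to the first yield
print(avg.send(10))       # 10.0
print(avg.send(20))       # 15.0
print(list(chained('AB', [1, 2])))   # ['A', 'B', 1, 2]
```

In averager the caller pushes data in instead of pulling items out, which is why this usage is treated as a subject separate from iteration.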



How to learn generators

• Forget about .send() and coroutines: that is a completely different subject. Look into that only after mastering and becoming really comfortable using generators for iteration.

• Study and use the itertools module

• Don’t worry about .close() and .throw() initially. You can be productive with generators without using these methods.

• yield from is only available in Python 3.3 and later, and only relevant if you need to use .close() and .throw()