Generators
CptS 355 - Programming Language Design
Washington State University


Generators (or co-routines) are a feature of some languages that allows subroutines to retain state from one invocation to the next. This ability can sometimes result in clearer, more concise code.

We will look at generators in Python using a running example: merging two sorted, sentinel-terminated lists. (Using sentinel-terminated lists simplifies the merge code; null-terminated lists are more common but require more code. The difference between the two is not important for today's discussion.)

The first version of the code is C-ish: it uses subscripting. (As someone pointed out in class, a real C programmer would probably use pointers and the code would be somewhat less ugly.)


inf = 1000000
x = [2, 4, 6, 10, 20, 24, 36, inf]
y = [1, 3, 5, 15, 27, 45, 57, inf]

def mergelists1(l1,l2):
    out = []
    i1 = 0; i2 = 0
    while l1[i1]!=inf or l2[i2]!=inf:
        if l1[i1]<=l2[i2]:
            out.append(l1[i1])
            i1 = i1+1
        else:
            out.append(l2[i2])
            i2 = i2+1
    # We want to process both lists fully. This means that
    # at the end we want
    # l1[i1]==inf and l2[i2]==inf
    # Remember that the proof rule for loops tells us that the
    # loop test is false on exit, so the loop test needs to be
    # l1[i1]!=inf or l2[i2]!=inf
    return out

print(mergelists1(x,y))
In the merge algorithm we only care about one element of each list at any time, and we only care about them in sequence; using the full random-access power of subscripting is unnecessary and somewhat error-prone in such situations. Many languages and libraries, including Python, Java, and the collection libraries for C++, provide the notion of iterators that return the elements of a collection one by one. In Python, iterators are constructed using iter. iter(l) produces an object with a next() method that can be called repeatedly, each time returning the next element of the list.
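As a small illustration of the protocol just described, the sketch below steps through a list by hand. It is written for Python 3, where the method the text calls next() is named __next__ and is normally invoked through the built-in function next(it); in the Python 2 used by the listings on this page the same calls would read it.next(). The names nums and it are made up for the example.

```python
# Python 3 sketch: stepping through a list with an iterator.
nums = [2, 4, 6]          # hypothetical sample data
it = iter(nums)           # ask the list for an iterator

first = next(it)          # 2
second = next(it)         # 4
third = next(it)          # 6

# The iterator is now exhausted; one more call raises StopIteration.
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, third, exhausted)   # prints: 2 4 6 True
```

Note that the iterator never exposes an index: the caller can only ask for "the next one", which is all the merge algorithm needs.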

The code of mergelists2 takes advantage of iterators and is an elegant expression of the merge algorithm.

def mergelists2(l1,l2):
    out = []
    it1 = iter(l1); it2 = iter(l2)
    e1 = it1.next(); e2 = it2.next()
    while e1!=inf or e2!=inf:
        if e1<e2:
            out.append(e1)
            e1 = it1.next()
        else:
            out.append(e2)
            e2 = it2.next()
    return out

You might now ask, "what if I want to build an iterator for my own data structures?" Let's examine the case of lists first and try to reproduce the behavior of iter(). An iterator is an object with a next() method. In order to produce the elements of the list one by one, we need to maintain some state in the object saying which element was produced last.


class iter:
    def __init__(self,l):
        # we choose to represent "none produced yet" with index -1
        self.pos = -1
        self.l = l
    def next(self):
        # choosing -1 for "none produced yet" makes next() easy
        self.pos = self.pos+1
        return self.l[self.pos]
The code for our class iter will work in mergelists2. It isn't a full reproduction of the built-in iter because it doesn't handle running off the end of the list the same way that the built-in does. We can ignore that fact for this discussion.
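The difference at the end of the list can be seen directly. The sketch below is Python 3, so the method is spelled __next__ rather than next; ListIter and error_at_end are hypothetical names introduced only for this example (ListIter is the same idea as class iter above, renamed so that the built-in iter stays available for comparison).

```python
class ListIter:
    """Hand-written list iterator, same idea as class iter above (Python 3 spelling)."""
    def __init__(self, l):
        self.pos = -1          # -1 means "none produced yet"
        self.l = l
    def __next__(self):
        self.pos = self.pos + 1
        return self.l[self.pos]

def error_at_end(it, n):
    """Consume n elements from it, then report which exception the next call raises."""
    for _ in range(n):
        next(it)
    try:
        next(it)
    except Exception as exc:
        return type(exc).__name__
    return None

data = [1, 2, 3]
print(error_at_end(ListIter(data), len(data)))   # IndexError
print(error_at_end(iter(data), len(data)))       # StopIteration
```

The built-in iterator signals the end with StopIteration, which is what a for loop looks for; the hand-written class simply lets the out-of-range subscript fail. With sentinel-terminated lists the merge never reads past the end, which is why the difference does not matter here.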

Notice how in class iter we had to choose a representation for the state of the iteration and manage it by hand. Compare how it is done in iter with how we would naturally write a loop to examine the elements of a list one by one:

   for e in l:
      ... do something with e
The generator technique lets us write iterators using code very similar to what we would normally write to traverse a data structure if we were writing the code in-line in our program.

A Python generator is any function that contains a yield statement. When such a function is called, its body does not start running; instead a new object containing a next() method is created, all the parameters are remembered, and the object is returned to the function's caller. When the next() method is called, the body of the defined function begins to execute and continues up to the first yield statement executed. yield is like return in that the caller sees the value yielded as the value of its call to next(), but within the function the state is again remembered, and execution will resume at that point when next() is called again. Eventually there may be no more values to produce as the result of next(), at which point the function can return.
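The suspend-and-resume behavior is easiest to see by recording when the body actually runs. The following is a minimal Python 3 sketch (so it calls next(gen) where Python 2 would call gen.next()); count_up_to and the trace list are made-up names for the illustration.

```python
def count_up_to(n, trace):
    """Generator yielding 1..n; records in trace when its body actually runs."""
    trace.append("body started")
    i = 1
    while i <= n:
        yield i
        trace.append("resumed after yielding %d" % i)
        i = i + 1
    trace.append("body finished")

trace = []
gen = count_up_to(2, trace)   # nothing runs yet; we only get the generator object
print(trace)                  # []

a = next(gen)                 # runs the body up to the first yield
b = next(gen)                 # resumes after the first yield, runs to the second
try:
    next(gen)                 # resumes, then falls off the end of the function
    finished = False
except StopIteration:
    finished = True

print(a, b, finished)         # 1 2 True
print(trace)
```

The local variable i survives between calls without any explicit state object, and the function returning shows up in the caller as StopIteration.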

We can now return to our list iterator implementation and see how it would be done with a generator.

def iter(l):
    for e in l:
        yield e
That's all. Because the function contains a yield, it is a generator, so when called it returns an object with a next() method. Therefore mergelists2 above will work fine with an iterator produced by this generator.
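To check that claim, here is a self-contained Python 3 sketch of the merge driven by the generator. It is an adaptation, not the listing above: the generator is named list_iter (a hypothetical name, chosen so the built-in iter is not shadowed), the merge function is called merge, and next(it) replaces it.next().

```python
inf = 1000000

def list_iter(l):
    # generator version of the list iterator
    for e in l:
        yield e

def merge(l1, l2):
    out = []
    it1 = list_iter(l1); it2 = list_iter(l2)
    e1 = next(it1); e2 = next(it2)
    # stop only when both lists have reached their sentinel
    while e1 != inf or e2 != inf:
        if e1 <= e2:
            out.append(e1)
            e1 = next(it1)
        else:
            out.append(e2)
            e2 = next(it2)
    return out

x = [2, 4, 6, 10, 20, 24, 36, inf]
y = [1, 3, 5, 15, 27, 45, 57, inf]
merged = merge(x, y)
print(merged)
# [1, 2, 3, 4, 5, 6, 10, 15, 20, 24, 27, 36, 45, 57]
```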

The hand-crafted iterator class for lists wasn't too painful, but consider now what you would have to do to build an iterator for a binary tree class such as this one:



# A binary tree class.
class Tree:

    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right
        
    # __repr__ gives a string representation of an object,
    # used, e.g., to print the object
    def __repr__(self, level=0, indent="    "):
        s = level*indent + repr(self.label)
        if self.left:
            s = s + "\n" + self.left.__repr__(level+1, indent)
        if self.right:
            s = s + "\n" + self.right.__repr__(level+1, indent)
        return s

A sorted Tree can be built from a sorted list as follows:
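One possible way to do this is sketched below; it is a minimal illustration and not necessarily the course's original code. It assumes the builder is a plain function named tree (the name used in the calls tree(x) and tree(y) later on) and repeats a stripped-down Tree class so that the block runs on its own. The inorder helper is hypothetical and is there only to check the result.

```python
class Tree:
    """Minimal stand-in for the Tree class above (label, left, right only)."""
    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right

def tree(sorted_list):
    """Build a balanced binary search tree from an already sorted list.

    The middle element becomes the root; the two halves are built
    recursively into the left and right subtrees.
    """
    if not sorted_list:
        return None
    mid = len(sorted_list) // 2
    return Tree(sorted_list[mid],
                tree(sorted_list[:mid]),
                tree(sorted_list[mid + 1:]))

def inorder(t):
    """Return the labels of t in left-root-right order."""
    if t is None:
        return []
    return inorder(t.left) + [t.label] + inorder(t.right)

inf = 1000000
x = [2, 4, 6, 10, 20, 24, 36, inf]
xt = tree(x)
print(xt.label)          # 20
print(inorder(xt) == x)  # True
```

Because the sentinel inf is the largest element, it ends up as the rightmost node, so an in-order traversal still produces it last, just as in the list.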
Now let's look at how a generator could be used to build an iterator for binary trees (this code is intended to go into the Tree class above):

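    def iter(self):
        # in-order traversal: left subtree, this node, right subtree
        if self.left:
            for x in self.left.iter():
                yield x
        yield self.label
        # self.right.iter() also produces an iterator, because iter
        # is a generator. (If this method were named __iter__, it
        # would be called automagically whenever an iterator is
        # needed for a Tree, e.g. in "for x in self.right".)
        if self.right:
            for x in self.right.iter():
                yield x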
... and now merging two trees is just like merging two lists:
def mergetrees(t1,t2):
    out = []
    it1 = t1.iter()
    it2 = t2.iter()
    e1 = it1.next(); e2 = it2.next()
    while e1!=inf or e2!=inf:
        if e1<e2:
            out.append(e1)
            e1 = it1.next()
        else:
            out.append(e2)
            e2 = it2.next()
    return out

xt = tree(x)
yt = tree(y)
print(mergetrees(xt, yt))
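A variant worth trying: if the generator method is named __iter__ rather than iter, Python calls it automatically whenever an iterator is needed, so a Tree can be used directly in a for loop or passed to list(). The following is a minimal sketch under that assumption, with a stripped-down Tree class and a small hand-built sample tree (both made up for the example).

```python
class Tree:
    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right

    def __iter__(self):
        # In-order traversal: left subtree, this node, right subtree.
        if self.left:
            for x in self.left:      # implicitly calls self.left.__iter__()
                yield x
        yield self.label
        if self.right:
            for x in self.right:
                yield x

# Hypothetical sample tree:   4
#                            / \
#                           2   6
#                              /
#                             5
t = Tree(4, Tree(2), Tree(6, Tree(5)))
labels = list(t)
print(labels)   # [2, 4, 5, 6]
```

With this naming, a merge function could obtain its iterators with iter(t1) and iter(t2), exactly as mergelists2 does for lists.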

                                                                                                                                                                                                                                                                                                                                             
(c) 2003 Curtis Dyreson, (c) 2004 Carl H. Hauser
E-mail questions or comments to Prof. Carl Hauser