Iterators and Generators in Python 3
Recently I needed a way to infinitely loop over a list in Python. Traditionally, this is extremely easy to do with simple indexing if the size of the list is known in advance. For example, an approach could look something like this:
```python
l = [1, 2, 3]
i = 0

while True:
    print(l[i])
    i += 1
    if i == len(l):
        i = 0
```
Eventually I settled on an inbuilt approach using the `itertools` module from the standard library. Consequently, the code became a lot cleaner:
```python
import itertools

l = [1, 2, 3]

for n in itertools.cycle(l):
    print(n)
```
But for the fun of it, I decided to try to re-implement `itertools.cycle` myself, and in order to do that, I first have to understand how generators work in Python 3. I already do, but I will explain them here, after which I will demonstrate the `itertools.cycle` re-implementation.
Both iterators and generators have their own short sections in the official Python 3 tutorial. This post is my own explanation of them.
Iterators
Iterators are objects that define a `__next__` method that is called every time the next value of the iterable is desired. They can be iterated over, yielding their members one by one.
An object is said to be iterable if it defines an `__iter__` method that returns an above-mentioned iterator.
It can be a little confusing. Think of it like this: strings are iterable (able to be iterated over) because their class (`str`) defines an `__iter__` method that returns an iterator object, which has a `__next__` method.
For example, take the following code:

```python
for n in "abc":
    print(n)
```
Behind the scenes, `for` calls `iter()` on `"abc"`, which in turn calls `"abc".__iter__()`. Since `"abc"` is of type `str`, a class that defines the `__iter__` method, that one is called, and an iterator object is returned.
Then, `for` simply calls the `__next__` method on that iterator object, passing its return value to you in the form of a variable, until `__next__` raises a `StopIteration` exception, at which point the loop stops.
It actually calls `next()` on the iterator object, which in turn calls its `__next__` method. The `next()` builtin is handy because it provides an optional second parameter that can be used to specify a value to be returned if the iterator is already exhausted.
Example
We can see this in action:
```python
>>> string = "abc"
>>> iterator = string.__iter__()

>>> print(iterator.__next__())
a
>>> print(iterator.__next__())
b
>>> print(iterator.__next__())
c

>>> print(iterator.__next__())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
```
As we can see, when we exhaust the iterator, a `StopIteration` exception is raised.
Using the built-in `next()` function we have a greater degree of control: we can designate a value to be returned in the case where `__next__` would exhaust the object (and raise an exception):
```python
>>> string = "abc"
>>> iterator = string.__iter__()

>>> print(next(iterator, 3))
a
>>> print(next(iterator, 3))
b
>>> print(next(iterator, 3))
c
>>> print(next(iterator, 3))
3
>>> print(next(iterator, 3))
3
```
Instead of directly calling `object.__iter__()`, you should probably use the built-in `iter()` function, as it has a little added functionality depending on the use case, but ultimately does the same thing (returns an iterator for the object, that is).
Re-implementing `itertools.cycle` using iterators
Knowing all this, we can write a custom class that can infinitely iterate over another iterable:
```python
class InfiniteIterable:
    def __init__(self, original):
        self.original_iterable = original
        self.len = len(original)
        self.i = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.i > (self.len - 1):
            self.i = 0

        ret = self.original_iterable[self.i]
        self.i += 1

        return ret


for n in InfiniteIterable("abc"):
    print(n)
```
Since we never raise a `StopIteration` exception from our `__next__` method, the iterator never halts!
Since our custom class is an iterator in and of itself (i.e. it defines a `__next__` method), `__iter__` just returns `self`! But if we didn't want to implement the iteration logic within this class, we could make `__iter__` return an instance of some other class.
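For illustration, here is a sketch of that split (the names `CycleIterator` and `CyclingIterable` are made up): the iterable only hands out iterators, and each iterator keeps its own position.

```python
class CycleIterator:
    # The iterator: holds the cycling position and defines __next__.
    def __init__(self, iterable):
        self.iterable = iterable
        self.i = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.i > (len(self.iterable) - 1):
            self.i = 0
        ret = self.iterable[self.i]
        self.i += 1
        return ret


class CyclingIterable:
    # The iterable: holds no position, just hands out fresh iterators.
    def __init__(self, original):
        self.original = original

    def __iter__(self):
        return CycleIterator(self.original)
```

A nice side effect of this design: every `for` loop (or `iter()` call) over the same `CyclingIterable` gets its own independent position, instead of sharing one counter.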
Generators
Generator functions are regular Python functions that ease the process of creating iterators. They are written exactly like normal functions, with the added requirement of including at least one `yield` statement.
The return value of a generator function is a generator, which is a kind of iterator.
```python
>>> def g():
...     while True:
...         yield 3

>>> '__iter__' in dir(g()) and '__next__' in dir(g())
True
>>> g()
<generator object g at 0x7fe49a2b35a0>
>>> type(g())
<class 'generator'>
```
When a `yield` statement is encountered inside a generator function, it yields control back to the outside code, 'returning' the value that was the argument to the `yield` statement.
The next time the generator is re-entered (via `next()`, the `.send()` method, or a for-loop iteration), execution resumes after the `yield` statement.
The generator is exhausted when the associated generator function returns, and a `StopIteration` exception is raised, just like with iterators. If the generator function returned with an associated value, it is wrapped by the `StopIteration` exception instance and can subsequently be accessed. This means that a `return value` statement inside a generator function is semantically equivalent to `raise StopIteration(value)` - except that the exception cannot be caught from within the containing generator function.
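A minimal sketch of that behaviour (`greeter` is a made-up generator for illustration):

```python
def greeter():
    yield "hello"
    return "goodbye"  # behaves like raise StopIteration("goodbye")


gen = greeter()
print(next(gen))  # hello

try:
    next(gen)
except StopIteration as exc:
    # The return value rides along on the exception instance.
    print(exc.value)  # goodbye
```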
Implementing `InfiniteIterable` using generators
```python
def InfiniteIterable(original_iterable):
    i = 0
    while True:
        yield original_iterable[i]
        i += 1
        if i > (len(original_iterable) - 1):
            i = 0


for n in InfiniteIterable("abc"):
    print(n)
```
This accomplishes the exact same task as manually writing the iterable does, but in a more readable manner. It is essentially just syntactic sugar.
`generator.send()` - bi-directional communication
Generators also implement a `.send()` method, which allows for bi-directional communication between a 'running' generator and outside code.
Calling `next(generator)` (or `generator.__next__()`) is equivalent to calling `generator.send(None)`. That is to say, both `next()` and `.send()` request the next value from the generator, while `.send()` also passes some data in.
Inside the generator function, the yield expression evaluates to the value that was passed as `.send()`'s argument (or `None` if `next()` was used instead).
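Here is a small sketch of that round trip (`accumulator` is a made-up generator): it yields the running total and adds in whatever is sent back.

```python
def accumulator():
    total = 0
    while True:
        # The yield expression hands total out, and evaluates to
        # whatever value .send() passes back in (None for plain next()).
        value = yield total
        if value is not None:
            total += value


acc = accumulator()
print(next(acc))     # 0  (runs to the first yield, "priming" the generator)
print(acc.send(10))  # 10
print(acc.send(5))   # 15
```

Note the initial `next(acc)`: a fresh generator must first be advanced to its first `yield` before `.send()` can deliver a non-`None` value.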
yield from
The `yield from` syntax, introduced in PEP 380, is used for delegating control to a subgenerator.
That is to say, the `yield from` expression iterates over the requested generator and yields the values back out as they come in.
The `yield from` expression also has an accessible result, which is present if the associated subgenerator returned with a value.
On the surface, it may seem that `yield from gen()` is just a shorthand for the following for loop:
```python
for v in gen():
    yield v
```
While that is true for a number of simple use cases, once the semantics of the generator methods `send`, `throw`, and `close` are introduced, it becomes apparent that the `yield from` syntax performs a much more complicated job under the hood. This becomes evident after taking a look at the formal semantics section of PEP 380.
Most notably, if the outer delegating generator is sent a value from external code, it propagates that value to the subgenerator in the `yield from` expression. Alongside that, it also handles all the possible edge cases related to exception handling inside generators and the `.throw`/`.close` methods.
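Both behaviours - forwarding `.send()` values and capturing the subgenerator's return value - can be seen in a small sketch (`inner` and `outer` are made-up names):

```python
def inner():
    # Subgenerator: yields once; the value sent from outside lands here.
    received = yield "ready"
    return received * 2


def outer():
    # yield from forwards both the yielded "ready" (out) and the sent-in
    # value (in), and evaluates to inner()'s return value.
    result = yield from inner()
    yield result


gen = outer()
print(next(gen))     # ready
print(gen.send(21))  # 42
```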