The Deferred

Advantages and usage

Event driven frameworks are usually provided with a set of classes with predefined events. For example, to model an HTTP client, we expect to have to derive a class and implement a method with a specific name. Something like:

class MyClient(HTTPClient):
    def gotHtml(self, html):
        "here my specific client code parsing the html"

Twisted does provide a similar pattern, but it also introduces a powerful abstraction to represent an event and its pending action: the Deferred, an object which can hold a function. The code creating a request is expected to return a result which is not yet available at this point, so it returns a deferred instead, and expects the user to fill it with a function to process the result. The requester object, which is usually an instance of a subclass of Protocol, also keeps a reference to this deferred and should call the callback as soon as the reactor notifies it that the data has been received. The Twisted documentation calls it a “promise of a result” in several places.
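To make the mechanism concrete, here is a minimal sketch written without Twisted. ToyDeferred and ToyRequester are hypothetical names invented for this illustration; they only mimic the behaviour described above and are not how Twisted implements Deferred or Protocol.

```python
class ToyDeferred(object):
    """Toy stand-in for a deferred: it stores callbacks and runs them when fired."""

    def __init__(self):
        self.callbacks = []

    def addCallback(self, function):
        self.callbacks.append(function)
        return self  # returning self allows chained addCallback() calls

    def callback(self, result):
        # Each callback receives the value returned by the previous one.
        for function in self.callbacks:
            result = function(result)


class ToyRequester(object):
    """Hypothetical requester: hands out a deferred and keeps a reference to it."""

    def request(self):
        self.deferred = ToyDeferred()
        return self.deferred

    def dataReceived(self, data):
        # In this sketch we call dataReceived() by hand; in a real program
        # the event loop would do it when the data arrives.
        self.deferred.callback(data)


received = []
requester = ToyRequester()

# The user fills the returned deferred with a function of their choice.
requester.request().addCallback(received.append)
pending = list(received)  # snapshot taken before any data has arrived

# Later, the requester is told that data arrived and fires the callback.
requester.dataReceived("<html></html>")
```

The user never tells the requester the name of the callback: the only contract between the two sides is the deferred itself.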

Here are a hundred concurrent pending increments on a global variable in Twisted, using deferreds:

>>> from time import time as now
>>> from twisted.internet.defer import Deferred
>>>
>>> def request():
...     return Deferred()
>>>
>>> # incr and no_lock are not defined here: they are assumed to come from an earlier example
>>> counter, start = 0, now()
>>> deferreds = [request().addCallback(incr) for i in range(100)]
>>>
>>> # There are a hundred concurrent pending actions at this point ...
>>>
>>> # ... fire NOW !
>>> for d in deferreds:
...     d.callback(None)
...
>>> elapsed = now() - start
>>> 2 * elapsed < no_lock
True
>>> counter
100000

This code runs more than twice as fast as the code running 100 threads without locks, and it has the noticeable advantage of being correct. Here are three great things about the Deferred:

  • it avoids the requirement to subclass anything to write a callback. No need for object-oriented programming to kick in, good old functions will do just fine.

  • the code making a request does not have to specify, know or care about the name of the callback function, which simplifies the writing of new requesting APIs. The requester calls the callback() method on a deferred when the data is received. It is up to the user to store the callable they see fit in the Deferred returned by the asynchronous function.

    It is the job of the protocol implementer to create a deferred, keep it as an attribute of the protocol instance, and fire it on the desired event, which executes the callback set by the protocol user.

  • the event represented by the deferred, and the pending action it fires, can be manipulated: stored, listed, passed around, chained or cancelled. Given a list of events, it is not difficult to set a callback for when the first event, or all the events, have happened.

Synchronisation

Synchronizing calls means specifying the order in which, and the events at which, actions take place. In a sequential script, the execution schema is implicit and so obvious that it is not even worth mentioning:

  • the network calls are executed along with the successive urlopen() function calls
  • and the program stops when the interpreter reaches the end of the script.

So far so good, but in a Twisted program things go differently: there is no more gravity, and there is a fifth dimension... OK, I am being a bit dramatic, the differences are more subtle. There are two phases:

  1. the first phase is the specification of the execution steps, through the stacking of connection requests to the reactor and the definition of the callback paths. A getPage() call does not actually trigger an HTTP request on the network, but creates a deferred which stacks a step in a callback chain,
  2. the second phase happens inside reactor.run(), which triggers the execution of the callback chains and synchronizes the callbacks depending on when the responses are available.
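The two phases can be mimicked with a few lines of plain Python. ToyReactor below is a hypothetical, much simplified stand-in invented for this sketch: it has no network and no event loop, it only shows that declaring the work and executing it are two separate moments.

```python
class ToyReactor(object):
    """Hypothetical, much simplified reactor: a queue of (work, callback) pairs."""

    def __init__(self):
        self.pending = []

    def request(self, work, callback):
        # Phase 1: only record what should happen; nothing is executed here.
        self.pending.append((work, callback))

    def run(self):
        # Phase 2: execute the recorded work and hand each result to its callback.
        while self.pending:
            work, callback = self.pending.pop(0)
            callback(work())


toy_reactor = ToyReactor()
pages = []

for name in ["a", "b", "c"]:
    # The default argument freezes the current value of name for later use.
    toy_reactor.request(lambda name=name: "page " + name, pages.append)

before_run = list(pages)  # phase 1 is over and nothing has been fetched yet
toy_reactor.run()
after_run = list(pages)   # phase 2 executed the work and fired the callbacks
```

Here before_run is still empty although the three requests have been declared; the callbacks only run once run() is called.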

Just comment out the call that runs the reactor in the concurrent script, and use Wireshark to check that getPage() does not carry out the network call by itself.

In our last problem, the concurrent script did not stop when the 30 calls completed successfully, and required an explicit signal to terminate. Let’s synchronize the end of the script with the completion of the 30 page downloads. In Twisted terms, this translates to: gather the deferreds returned by the requests in a list, and define a callback which will stop the reactor when all the deferreds in the list have completed.

The code should be modified to create a DeferredList from the list of calls to the title function. DeferredList is a Twisted primitive which builds a deferred that fires when all the deferreds in the list have completed. An anonymous function which stops the reactor is attached as a callback to the DeferredList:

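from twisted.internet.defer import DeferredList

# url, getPage, getpage_callback and reactor come from the concurrent script
l = [getPage(url).addCallback(getpage_callback) for i in range(30)]
d = DeferredList(l)
d.addCallback(lambda n: reactor.stop())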

Here, the expression lambda n: reactor.stop() returns a function whose only action is to call reactor.stop(). This new function is required because reactor.stop() does not comply with the callback specification: a callback must accept at least one argument, the result. The anonymous function is created with lambda to present the correct signature.
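The signature mismatch can be reproduced without Twisted at all. In the sketch below, stop() is a hypothetical stand-in for reactor.stop() and fire() is a hypothetical stand-in for what a deferred does when it fires, namely passing the result to the callback.

```python
stopped = []


def stop():
    """Stand-in for reactor.stop(): it takes no argument."""
    stopped.append(True)


def fire(callback, result):
    """Stand-in for a firing deferred: the callback always receives the result."""
    return callback(result)


# Using stop directly as a callback fails, because it is handed one argument.
try:
    fire(stop, "all pages downloaded")
    direct_call_failed = False
except TypeError:
    direct_call_failed = True

# The lambda accepts the result, ignores it, and calls stop() without argument.
fire(lambda n: stop(), "all pages downloaded")
```

The lambda is nothing more than an adapter: a named function taking one ignored argument would do the same job.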

Now that the script terminates gracefully, let’s clarify a common misunderstanding: what does the reactor know about the deferreds that the user manipulates? The answer is: nothing. The interfaces that the reactor knows are the few hardcoded methods from the UDP, TCP and SSL transport protocols, such as connectionMade() and dataReceived(). The reactor maintains a list of transport instances, which are stored as attributes of protocol instances. Each protocol instance holds a Deferred created by the request methods, and its dataReceived() method is expected to fire the callback.

Now that this concurrent version terminates, its performance can be compared to that of the sequential script. It is much faster (on my machine, it is 8 times faster). Note that for a threaded version of the script

~$ time python trivial_sequential.py
real 1m22.945s
~$ time python trivial_concurrent.py
real 0m10.315s

The central mechanisms of Twisted were presented in the previous sections: you are almost there! The last section before the conclusion shows a nicer way to present Twisted code. Its first two subsections are recaps on the standard yield keyword and Python decorators.