A naive, CPU hungry solution
============================


Let's say there are three nodes a, b, c:

::

  a, b, c = nodes = 'abc'
  
They are linked by the following dependencies, a needs b, a needs c,
c needs b.

::

  edges=[(a,b), (a,c), (c,b)]
  
  
Let's build a simple class to hold the nodes and edges as attributes

::

  class Graph(object):
      def __init__(self,nodes=[],edges=[]):
          self.nodes = nodes
          self.edges = edges
  
  G = Graph(nodes, edges)
  
Three primitives: *deps*, *are_before* and *is_winner*
------------------------------------------------------

To solve our problem, three simple primitives are written: *deps()*,
*are_before()* and *is_winner()*. 

**deps()** returns the dependencies for the given graph and node

:: 

  def deps(graph,node):
      return (dep for (n,dep) in graph.edges if node==n)
  
Given *a*, *deps* returns *b* and *c*, given *c*, *deps* returns *b*, etc.

>>> list(deps(G,a))
[b, c]
>>> list(deps(G,c))
[b]
>>> list(deps(G,b))
[]


**are_before()** returns whether the sample is indeed composed of
elements located in the list, before the *pivot*.

::

  def are_before(List, sample, pivot):
      return all(List.index(n) < List.index(pivot) for n in sample)
  

>>> are_before([a,b,c], [a,b], c)
True
>>> are_before([a,b,c], [a,c], b)
False

**is_winner()** checks whether the list satisfies the dependency graph.

::

  def is_winner(graph, List):
      return all(are_before(List,
                            deps(graph, pivot),
                            pivot) 
                 for pivot in L)
  
>>> incorrect_solution = [a, b, c]
>>> is_winner(G, incorrect_solution)
False
   
This solution is not satisfying because a needs b, and on the other
hand, b is located after a (index(a)<index(b)).
   
>>> is_winner( G, [b, c, a] )
True
  
The search() function
---------------------

*search()* combines the previous primitives to process a graph given
as an input and returns the list of the solutions i.e. the list of
permutations of the nodes where the requirements are always listed
before the nodes which requires them

::

  from itertools import permutations
  def search(graph):
      return filter(lambda l:is_winner(graph,l),
                    permutations(graph.nodes))
  
>>> sorted(schedule(G))
[(b, c, a)]
 
Really this algorithm is quite simple and can be expressed in four
lines of Python: let's recap briefly

::

  from itertools import chain, permutations as perm
  deps       = lambda G,node:(req for (n,req) in G.edges if node==n)
  are_before = lambda L,l,p: all(L.index(n) < L.index(p) for n in l)
  is_winner  = lambda G,l: all(are_before(l,deps(G,n),n) for n in l)
  search     = lambda G: filter(lambda l:is_winner(G, l), perm(G.nodes))
  
>>> print search(G)
[b, c, a]

Complexity bites
----------------

Let's try our functions on a real life example, it is not even
really big. There are nine tasks with dependencies::

  >>> from data import deps as data
  >>> from pprint import pprint
  >>> pprint(data)
  {1: [2, 3],
   5: [4],
   6: [1],
   7: [6, 5, 4],
   8: [6, 4],
   9: [6, 5, 8, 7],
   10: [4, 6, 7],
   11: [6, 5, 4, 7],
   12: [6, 7, 11, 10, 8, 1, 9]}

  # Transforming the dictionary into two lists: edges and nodes
  >>> edges = list(
      chain(*[[ (n,k) for n in v ] for k,v in data.iteritems()]))
  >>> nodes = list(set(chain(*data.values())))
  >>> nodes.extend(data.keys())

  >>> print nodes
  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

  >>> print edges
  [(2, 1), (3, 1), (4, 5), (1, 6), (6, 7), (5, 7), (4, 7), (6, 8),
  (4, 8), (6, 9), (5, 9), (8, 9), (7, 9), (4, 10), (6, 10), (7, 10),
  (6, 11), (5, 11), (4, 11), (7, 11), (6, 12), (7, 12), (11, 12),
  (10, 12), (8, 12), (1, 12), (9, 12)]

  >>> G = Graph(nodes, edges)
  >>> print search(G)
  [[4, 5, 3, 2, 1, 6, 8, 7, 11, 10, 9, 12], ...

The search seems to never return and actually took a whole night to
be able to finish.  We hit a *complexity wall*. Let's have a look at
the respective algorithmic complexity of the functions::

  O(deps)       = O(n)
  
This notation means that the computations defined in *deps()* is
directly proportional to the size *n*, of the input data. For
*are_before()*, this is worse: the computations are proportional to
the *square* on the input data::

  O(are_before) = O(l * 2 O(index))
                = O(l * O(n))
                = O(n2)
  
The complexity for *is_winner()* and *search()* are also high::

  O(is_winner)  = O(l * O(are_before) + O(l * O(deps))
                = O(l * O(n2)  + O(l* O(n)))
                = O( O(n3)  + O(n2) )
                = O(n3)
  
  O(search)     = O(is_winner) * O(perm)
                = O(n3) * O(n!)
                = O(n!)

On the first hand, *deps()* and *are_before()* can be enhanced to
reduce their complexity:

#. using a list that needs to be unrolled to get the dependencies of
   a node is suboptimal, instead a dictionary data structure is more
   adapted to access the dependencies of a node directly. Complexity
   would come down from O(n) to O(1).

#. *index()*'s execution time depends on the size of the list. Using
   a dictionary where the position of the list value would be
   accessed in constant time is more efficient. *are_before()* would
   operate in O(n) instead of O(n2)

But these improvements are cosmetic with regards to the size fo the
data to test: if 12 nodes needs to be sorted, as with the data part
of the *deps* module below, then 12! = 479 001 600 (half a billion)
permutations needs to be tested. While actually, entire branches do
not need to be tested: see the next article: :doc:`off_the_shelf`,
for better ways to sort dependencies.