Table Of Contents

Previous topic

Sorting dependencies

Next topic

Topsort in available packages

This Page

A naive, CPU hungry solution

Let’s say there are three nodes a, b, c:

a, b, c = nodes = 'abc'

They are linked by the following dependencies, a needs b, a needs c, c needs b.

edges=[(a,b), (a,c), (c,b)]

Let’s build a simple class to hold the nodes and edges as attributes

class Graph(object):
    def __init__(self,nodes=[],edges=[]):
        self.nodes = nodes
        self.edges = edges

G = Graph(nodes, edges)

Three primitives: deps, are_before and is_winner

To solve our problem, three simple primitives are written: deps(), are_before() and is_winner().

deps() returns the dependencies for the given graph and node

def deps(graph,node):
    return (dep for (n,dep) in graph.edges if node==n)

Given a, deps returns b and c, given c, deps returns b, etc.

>>> list(deps(G,a))
[b, c]
>>> list(deps(G,c))
[b]
>>> list(deps(G,b))
[]

are_before() returns whether the sample is indeed composed of elements located in the list, before the pivot.

def are_before(List, sample, pivot):
    return all(List.index(n) < List.index(pivot) for n in sample)
>>> are_before([a,b,c], [a,b], c)
True
>>> are_before([a,b,c], [a,c], b)
False

is_winner() checks whether the list satisfies the dependency graph.

def is_winner(graph, List):
    return all(are_before(List,
                          deps(graph, pivot),
                          pivot)
               for pivot in L)
>>> incorrect_solution = [a, b, c]
>>> is_winner(G, incorrect_solution)
False

This solution is not satisfying because a needs b, and on the other hand, b is located after a (index(a)<index(b)).

>>> is_winner( G, [b, c, a] )
True

The search() function

search() combines the previous primitives to process a graph given as an input and returns the list of the solutions i.e. the list of permutations of the nodes where the requirements are always listed before the nodes which requires them

from itertools import permutations
def search(graph):
    return filter(lambda l:is_winner(graph,l),
                  permutations(graph.nodes))
>>> sorted(schedule(G))
[(b, c, a)]

Really this algorithm is quite simple and can be expressed in four lines of Python: let’s recap briefly

from itertools import chain, permutations as perm
deps       = lambda G,node:(req for (n,req) in G.edges if node==n)
are_before = lambda L,l,p: all(L.index(n) < L.index(p) for n in l)
is_winner  = lambda G,l: all(are_before(l,deps(G,n),n) for n in l)
search     = lambda G: filter(lambda l:is_winner(G, l), perm(G.nodes))
>>> print search(G)
[b, c, a]

Complexity bites

Let’s try our functions on a real life example, it is not even really big. There are nine tasks with dependencies:

>>> from data import deps as data
>>> from pprint import pprint
>>> pprint(data)
{1: [2, 3],
 5: [4],
 6: [1],
 7: [6, 5, 4],
 8: [6, 4],
 9: [6, 5, 8, 7],
 10: [4, 6, 7],
 11: [6, 5, 4, 7],
 12: [6, 7, 11, 10, 8, 1, 9]}

# Transforming the dictionary into two lists: edges and nodes
>>> edges = list(
    chain(*[[ (n,k) for n in v ] for k,v in data.iteritems()]))
>>> nodes = list(set(chain(*data.values())))
>>> nodes.extend(data.keys())

>>> print nodes
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

>>> print edges
[(2, 1), (3, 1), (4, 5), (1, 6), (6, 7), (5, 7), (4, 7), (6, 8),
(4, 8), (6, 9), (5, 9), (8, 9), (7, 9), (4, 10), (6, 10), (7, 10),
(6, 11), (5, 11), (4, 11), (7, 11), (6, 12), (7, 12), (11, 12),
(10, 12), (8, 12), (1, 12), (9, 12)]

>>> G = Graph(nodes, edges)
>>> print search(G)
[[4, 5, 3, 2, 1, 6, 8, 7, 11, 10, 9, 12], ...

The search seems to never return and actually took a whole night to be able to finish. We hit a complexity wall. Let’s have a look at the respective algorithmic complexity of the functions:

O(deps)       = O(n)

This notation means that the computations defined in deps() is directly proportional to the size n, of the input data. For are_before(), this is worse: the computations are proportional to the square on the input data:

O(are_before) = O(l * 2 O(index))
              = O(l * O(n))
              = O(n2)

The complexity for is_winner() and search() are also high:

O(is_winner)  = O(l * O(are_before) + O(l * O(deps))
              = O(l * O(n2)  + O(l* O(n)))
              = O( O(n3)  + O(n2) )
              = O(n3)

O(search)     = O(is_winner) * O(perm)
              = O(n3) * O(n!)
              = O(n!)

On the first hand, deps() and are_before() can be enhanced to reduce their complexity:

  1. using a list that needs to be unrolled to get the dependencies of a node is suboptimal, instead a dictionary data structure is more adapted to access the dependencies of a node directly. Complexity would come down from O(n) to O(1).
  2. index()‘s execution time depends on the size of the list. Using a dictionary where the position of the list value would be accessed in constant time is more efficient. are_before() would operate in O(n) instead of O(n2)

But these improvements are cosmetic with regards to the size fo the data to test: if 12 nodes needs to be sorted, as with the data part of the deps module below, then 12! = 479 001 600 (half a billion) permutations needs to be tested. While actually, entire branches do not need to be tested: see the next article: Topsort in available packages, for better ways to sort dependencies.