Table Of Contents

Previous topic

A naive, CPU hungry solution

Next topic

Graph traversal

This Page

Topsort in available packages

Topsort is a fast algorithm operating on a set of dependencies, it only returns one of the solutions. This is usually what is actually needed: as for a labyrinth for instance, the problem usually is to find the exit, not exhaustively list every possible way out.

The topsort package

This package, available on Pypi, does not successfully installs at this time (june 2010), but a working simplified version of the algorithm is reproduced here (the graph cycle detection was suppressed : make sure there are no circular dependencies in the input graph).

The function has been written by Tim Peters, author of the super efficient timsort algorithm. The function is in three steps :

  1. the preparation of a small data structure: every nodes is associated to its number of child in a dictionary called num_childs,

  2. the initial condition: the nodes without children (their num_childs‘ value is zero) are stored in the answer list,

  3. the loop iterates on the nodes in the answer list. At each iteration, each current node’s parents see its child count decremented by one. Whenever a child has zero parents, it is appended to the answer list.

    The iteration stops when all nodes in the answer list has been processed.

At each iteration, the num_parents shrinks and the algorithm crunches a smaller graph. The memory use of this algorithm does not increase along the execution. Here is the code of this topsort:

def tims_topsort(deps):

    # 1. data structure preparation
    edges = list(chain(*[[(parent,child) for parent in deps[child]] 
                         for child in deps ]))

    num_childs = dict([ (k,0) for k in chain(*edges) ])
    for parent,_ in edges: 
        num_childs[parent] += 1 

    # 2. initial condition
    answer = filter(lambda x: num_childs[x] == 0, num_childs)

    # 3. traversing the graph
    for child in answer:
        if deps.has_key(child):
            for parent in deps[child]:
                num_childs[parent] -= 1
                if num_childs[parent] == 0:
                    answer.append(parent)

    return list(reversed(answer))

Let’s try it on the small real world data used previously:

>>> from data import deps
>>> print tims_topsort(deps)
[3, 2, 4, 1, 5, 6, 7, 8, 9, 10, 11, 12]

From the Pygraph package

The topsort algorithm from the Pygraph project is imported from the sorting algorithms package:

>>> from pygraph.algorithms.sorting import topological_sorting

This alogrithm requires a digraph as the input. prepare() takes a dictionary of dependencies, detects the list of edges and the set of nodes and returns an adapted digraph:

>>> from pygraph.classes.digraph import digraph
def prepare(deps):
   
    # As the topsort algorithm expects edges in the form (dependency,
    # project) and not (project, dependency), the dependency
    # dictionary entry 1:[2,3] becomes (3, 1), (2, 1), etc..

    edges = list(chain(*[[(parent,child) for parent in deps[child]]
                         for child in deps]))
    nodes = set(chain(*edges))

    G = digraph()
    for n in nodes: 
        G.add_node(n)
    for e in edges: 
        G.add_edge(e)

    return G

Then, it is just a matter of calling topsort on the digraph output:

>>> from data import deps
>>> print topological_sorting(prepare(deps))
[4, 5, 3, 2, 1, 6, 8, 7, 11, 10, 9, 12]

The simplified topsort runs four times faster than the implementation in the pygraph packages, but most importantly it is a million time faster than the naive implementation

>>> from timeit import Timer
>>> print Timer(lambda : tims_topsort(deps)).timeit(number=1000)
0.0732760429382
>>> print Timer(lambda : topological_sorting(prepare(deps))
...             ).timeit(number=1000)
0.333679914474

The unit is the second, a thousand executions of tim(deps) took 7 hundredths of a second that is tim(deps) took 7 microseconds.The topsort from python-grapgh took 33 microseconds.

A big difference between the two algorithm it that the naive solution will check every permutation possible while topsort starts with a small list and only append correct nodes.

Topsort in Erlang

Erlang actually provides a module in the standard library offering several algorithm dealing with graphs. The digraph package offers the primitives to manages graphs, the digraph_utils package offers the algorithm. The code presented below is actually very similar to our script based on the Pygraph project: first the construction of the digraph, then the execution of the algorithm.

-module(topsort_stdlib).
-export([start/0,test/0]).
-import(lists,[map/2,foreach/2,flatmap/2,usort/1]).

prepare() ->

    {ok,[L]}=file:consult("projects.term"),

    Edges=flatmap(fun({C,G}) -> [ {P,C} || P<-G] end,D),
    Vertices=usort(flatmap(fun tuple_to_list/1, Edges)),

    G=digraph:new(),

    foreach(
      fun(Vertex) -> digraph:add_vertex(G,Vertex) end,
      Vertices),

    foreach(
      fun({Parent,Child}) -> digraph:add_edge(G,Parent,Child) end,
      Edges),

    G.


start() ->
    G = ,
    io:format("Here are the projects, sorted by their dependences~n"
	      "~p~n",[digraph_utils:topsort(prepare())]).


test()-> eunit:test(
	   [
	    fun() -> 
		    [3,2,1,4,5,6,8,7,9,11,10,12]
			= topsort_from_deps(load_deps()) end
	    ]).

    

The next part of this article, Graph traversal presents several “by hand” implementations of topsort as graph traversal. The generic method of graph traversal can be adapted to build high performance or distributed implementations.