Table Of Contents

Previous topic

Topsort in available packages

Next topic

Extending Twisted Mail with the IMAP Push

This Page

Graph traversal

Four algorithm are presented here which exhaustively find all the solutions, two depth search first (DFS) and two breadth search first (BFS):

  1. After the classic DFS algorithm, an recursive generator is presented. It does not return the whole list after a while, instead (being a generator), it returns one solution and then yields back execution until called again,
  2. After the classic BFS algorithm, a recursive SQL query is presented. It makes it possible to solve problems directly on the SQL server without going to the trouble of extracting the data to an SQL client, then finding the solution. This solution is very fast.

The algorithms will make use of the following primitives:

prepare(deps) adapts the input dictionary of dependencies into the almost same dictionary: except that 1. the values are turned from list to sets, and 2. the projects with no dependencies are added to the dictionary with an empty set as the value. This function is called once, prior to the algorithm.

def prepare(deps):
    """Returns the set of projects, the input dictionnary is also
    updated with project without dependencies"""
    
    for p in set(chain(*deps.values())) - set(deps.keys()):
        deps[p]=[]

    for p in deps: 
        deps[p]=set(deps[p])

    return set(deps), deps

candidates(projects, deps, path) returns the list of project nodes satisfying the constraint to be added to the path: not being already in the path and having all its dependencies in the path. The constraints are checked using the set operators: a <= b means a is included in b, and a - b means the element of a without the element from b. This function is called in the algorithm, at each recursion.

def candidates(projects, deps, path):
    "Returns project not in the path, but whose dependencies are"
    return filter( lambda p: deps[p] <= path, projects - path)

Recursive generator

Thanks to the yield keyword, there is no need to accumulate the solutions, as with the previous algorithm, which leads to a simpler algorithm:

def idfs(projects, deps, path=[]):
    candids = candidates(projects, deps, set(path))
    if candids:
        for c in candids:
            path.append(c)
            for winner in idfs(projects, deps, path):
                yield winner
            path.pop()
    else:
        yield path[:]

Recursive SQL queries

The recent addition of the with recurse statement in Postgresql makes it possible to delegate complex computing to the database, without requiring the overhead of extracting the data and processing it on the database client. The syntax is a bit idiomatic, and is well explained in the official documentation: here and especially there.

Simply put:

  1. the with recurse statement defines a temporary table called topsort with three columns: project, dependencies, path,,
  2. then, there are two clauses separated by UNION ALL,
    1. the first clause is the initial condition: the empty path,
    2. the second clause is the recursive one: it selects from topsort. This clause augments each record in the topsort table with the project, only the rows statisfying the candidates conditions above are added to the topsort table.
  3. the termination condition is implicit, it is reached when the recursive clause produces no more rows,
  4. as the topsort table keeps the incomplete path from each iteration, the final restriction keeps only the complete path (those were every projects was cited).
-- create temporary table  deps (p int, d int[]);

-- insert into deps values ( 1 , array[2,3]);
-- insert into deps values ( 5 , array[4]);
-- insert into deps values ( 6 , array[1]);
-- insert into deps values ( 7 , array[6, 5, 4]);
-- insert into deps values ( 8 , array[6, 4]);
-- insert into deps values ( 9 , array[6, 5, 8, 7]);
-- insert into deps values ( 11, array[6, 5, 4, 7]);
-- insert into deps values ( 10, array[4, 6, 7]);
-- insert into deps values ( 12, array[6, 7, 11, 10, 8, 1, 9]);
-- insert into deps values ( 2 , array[]::integer[]);
-- insert into deps values ( 3 , array[]::integer[]);
-- insert into deps values ( 4 , array[]::integer[]);

with recursive topsort(p,d,path) as (
     select null::integer, null::integer[], array[]::integer[]
UNION ALL
     select deps.p, deps.d, path || deps.p
     from topsort, deps
     where (deps.d <@ path and not deps.p = any(path)))
select path from topsort where array_length(path, 1) = (select count(*) from deps);

-- # if not redirected, 'more' is spawn and interactively blocks
-- psql -f topsort.sql > toto && cat toto     # is easier to use