Four algorithm are presented here which exhaustively find all the solutions, two depth search first (DFS) and two breadth search first (BFS):
The algorithms will make use of the following primitives:
prepare(deps) adapts the input dictionary of dependencies into the almost same dictionary: except that 1. the values are turned from list to sets, and 2. the projects with no dependencies are added to the dictionary with an empty set as the value. This function is called once, prior to the algorithm.
def prepare(deps):
"""Returns the set of projects, the input dictionnary is also
updated with project without dependencies"""
for p in set(chain(*deps.values())) - set(deps.keys()):
deps[p]=[]
for p in deps:
deps[p]=set(deps[p])
return set(deps), deps
candidates(projects, deps, path) returns the list of project nodes satisfying the constraint to be added to the path: not being already in the path and having all its dependencies in the path. The constraints are checked using the set operators: a <= b means a is included in b, and a - b means the element of a without the element from b. This function is called in the algorithm, at each recursion.
def candidates(projects, deps, path):
"Returns project not in the path, but whose dependencies are"
return filter( lambda p: deps[p] <= path, projects - path)
The candidates are computed for the current path, and for each candidate, the function is called with the path augmented with the candidate.
The initial condition is the empty path for which the candidates will be the nodes without dependencies.
The termination condition is the absence of any more candidates nodes: there is no dead-end path when only the correct candidates nodes are explored. Whenever the termination condition is met, the current path is added to the accumulator, and the function returns.
def dfs(projects, deps):
"Returns a sorted list of the dependencies - depth first traversal"
def _dfs(projects, deps, path, acc):
candids = candidates(projects, deps, set(path))
if candids:
for c in candids:
_dfs(projects, deps, path + [c], acc)
else:
acc.append(path)
acc = []
_dfs(projects, deps, [], acc)
return acc
Thanks to the yield keyword, there is no need to accumulate the solutions, as with the previous algorithm, which leads to a simpler algorithm:
def idfs(projects, deps, path=[]):
candids = candidates(projects, deps, set(path))
if candids:
for c in candids:
path.append(c)
for winner in idfs(projects, deps, path):
yield winner
path.pop()
else:
yield path[:]
In breadth first search, at each iteration, it is not one path, but the list of all possible path which is computed, and there is no backtracking as with the DFS algorithm. A completely new list of augmented path is generated from the input list of incomplete paths: for each incomplete input path, a list of the path augmented with one candidate is built.
The termination condition is when no path from the input list can be augmented.
The initial condition is a list of one empty path.
def bfs(projects, deps, paths=[[]]):
"Returns a sorted list of the dependencies - breadth first traversal"
cands_lists= [ candidates(projects, deps, set(p)) for p in paths]
if any(cands_lists):
newpaths=[]
for p,cands in zip(paths,cands_lists):
newpaths.extend([p+[c] for c in cands])
return bfs(projects, deps, newpaths)
else:
return paths
The recent addition of the with recurse statement in Postgresql makes it possible to delegate complex computing to the database, without requiring the overhead of extracting the data and processing it on the database client. The syntax is a bit idiomatic, and is well explained in the official documentation: here and especially there.
Simply put:
-- create temporary table deps (p int, d int[]);
-- insert into deps values ( 1 , array[2,3]);
-- insert into deps values ( 5 , array[4]);
-- insert into deps values ( 6 , array[1]);
-- insert into deps values ( 7 , array[6, 5, 4]);
-- insert into deps values ( 8 , array[6, 4]);
-- insert into deps values ( 9 , array[6, 5, 8, 7]);
-- insert into deps values ( 11, array[6, 5, 4, 7]);
-- insert into deps values ( 10, array[4, 6, 7]);
-- insert into deps values ( 12, array[6, 7, 11, 10, 8, 1, 9]);
-- insert into deps values ( 2 , array[]::integer[]);
-- insert into deps values ( 3 , array[]::integer[]);
-- insert into deps values ( 4 , array[]::integer[]);
with recursive topsort(p,d,path) as (
select null::integer, null::integer[], array[]::integer[]
UNION ALL
select deps.p, deps.d, path || deps.p
from topsort, deps
where (deps.d <@ path and not deps.p = any(path)))
select path from topsort where array_length(path, 1) = (select count(*) from deps);
-- # if not redirected, 'more' is spawn and interactively blocks
-- psql -f topsort.sql > toto && cat toto # is easier to use