Building Mixed Hierarchies in SQL

A project that has been dear to my heart for some years now involves representing the pages of a web site as a hierarchy. This formalised structure has three advantages for me. First, navigation is built in. On any single page you can see where this page belongs in the overall scheme of things and a basic up/down, left/right navigator is, in effect, part of the package. Second, I can manage user access by assigning roles to sub-trees. And, third, I can imbue the structure with some semantics – for project management, materials handling or whatever.

Of course, with an sql backend I hit the familiar mismatch between the relational and hierarchical views. The standard solution is to use an adjacency list and this is one of many descriptions of how to do that in sql. However, I soon realised that I really want the ability to link any node in my tree to more than one parent. That way I can give different views of the data to different people or for different circumstances. What I end up with is a directed graph (with the added restriction of being non-cyclic, otherwise I get very confused).

The normal way of representing a directed graph is to note the edges. That is, for every node I record the nodes immediately above it. Using this structure I can traverse the graph from any point in it. That is fine for many things, but I want to be able to understand the full path above any given node. If nothing else, I need to understand whether the current user has access to this node, and in general that means looking an arbitrary number of layers back up the graph. Since I would prefer the database server to do most of the work I need some extra information to help it along. As a result I have come up with these three tables:

node (n):
a record defining the data used by the application. A node has a unique identifier.

  • n.node is the node identifier
edge (e):
a record defining the immediate parent/child relationship between two nodes.

  • e.node is the node in question
  • e.parent is the immediate parent
parent manifest (p):
a set of records defining the complete list of nodes that are ancestors of a specific node. This list is not ordered and will include nodes from all the hierarchies that this node belongs to.

  • p.node is the node in question
  • p.parent is an ancestor of p.node

This structure is easy to use in a number of ways:

Search in sub-trees

Find nodes matching some criterion starting from some one or more arbitrary points in the graph.

  SELELCT * FROM n JOIN p ON node 
            WHERE p.parent IN [given list of starting nodes]
            AND [match criterion]

Depth first search

Search for nodes matching some criterion: return those that correspond to a depth first search.

This is achieved in two steps: one finds all the nodes matching the criterion and the other eliminates any nodes in the result that have an ancestor node also in the result.

  SELECT * FROM n JOIN p ON node
           WHERE [match criterion]
           AND NOT EXISTS ( SELECT node FROM n AS nx JOIN p AS px ON node 
                                        WHERE [match criterion] 
                                        AND p.parent = nx.node )

As I mentioned above, user access can be controlled by giving a user relevant roles on one or more sub-trees. I need a role allocation table to link users to nodes:

user role (r):
a record defining the role that a User has for the given node

  • r.node is the node in question
  • r.user is a User reference
  • r.role is a Role reference

From this I need to find either a set of nodes that a given user can access, or the set of roles that are relevant for a given user on a given node.

Identify a User’s sub-tree roots

Each sub-tree starts at the first mention of this User in any path in a depth-first search

  SELECT * FROM n JOIN r ON node
           WHERE r.user = this_user
           AND NOT EXISTS (SELECT node FROM n AS nx JOIN p AS px ON node 
                                       WHERE [match criterion] 
                                       AND n.node = nx.node)
Finding all possible Roles for a User on a given node

The problem is to find all the Roles that a user inherits from the source node and the parents of the source node.

  SELECT r.* FROM n JOIN r ON node
                  JOIN (SELECT px.parent as node FROM p AS px
                                        WHERE px.node = source) AS py ON node
                   WHERE r.node = source

I’m sure you’ve noticed that these patterns do not use the edge table. I still need it, though, on the one hand to identify parents, siblings and children of the node being dislayed, and on the other to build the parent manifest when unlinking nodes from the graph. The parent manifest looses information because a node could derive an ancestor through more than one path through the graph. So I need the edge table as a definition of the graph for those cases where the actual path is important.

These small functional elements may need to be combined to do interesting things. A generic search, for instance, would need to be limited to sub-trees that the user is allowed to see. Of course, in Python, what is really nice is using SQLAlchemy to do the work. Because a query object is generative I can build a few basic query elements to get, for example, the whole tree that a user is permitted to see, and then restrict that by adding further filters according to the need of the moment. SQLAlchemy manfully creates whatever complex structure I need and the database backend does the rest. It all turns out to be surprisingly easy.

Category: Development | Tags: , , One comment »

One Response to “Building Mixed Hierarchies in SQL”

  1. Tags have their uses — Mike de Plume

    […] hierarchy is a good example. In fact, the software I have written uses interwoven hierarchies and, as it turns out, the parent manifest that supports the processing is identical to a set of tags. On the other hand, […]

Back to top