Archive for November 2010

Tags have their uses

November 21st, 2010 — 8:59pm

Some time ago I was looking for software to support the small consultancy I worked at. It needed to include elements from groupware, customer relationship management and project management, and it needed to work over the internet. At the time we did have some complex software development going on and I was hoping there would be something that would help with that. In particular, I was looking for task hierarchies and ways of grouping and structuring associated documentation. This was at least recognised on commercial desk-top applications, but seemed thin on the ground with open source web-based software.

As it happens, I started to write my own system, but it did occur to me that the ‘tag’ feature that most of the software supported might be useful and, maybe, save me some work. This post captures some of my thoughts about tags and shows why they don’t cut it for what I want.

Tags are, primarily, a search tool. I can categorise things by anything I like, and that might include: type of document; indication of some subject covered in the document; the year or month of publication; client; and so on. So, assuming I placed the tags correctly in the first place, I can look for pages that give me specification documents that relate to a specific interface and used for a particular client. This is great. I have to maintain the tags, of course, but, given that discipline, I have a powerful search technique. What I don’t have, necessarily, is a way of looking at the context of the resulting document. I have to use some other facility in the software, or create a new search, to see pages that might relate to what I found in the first search.

What I could do, of course, is add tags that help me relate documents together, and that leads to the idea of adding, in effect, a parent to a set of documents. In other words, I can create a hierarchy by naming groups of things and placing the tags appropriately. These groups are quite flat, of course. The structure is implied and not explicit, but once I have added client, project, documentation, phase and other such things to the set of possible tags I can begin to structure the data in a useful way. This rapidly becomes difficult to maintain. A miss-spelled or forgotten tag can lead to orphaned pages that may be difficult to find (and no-one is going to bother to look if the page does not pop up in a search). Equally, the structure can be difficult to search for someone who has forgotten the exact tag.

The other thing I might be interested in doing is looking for pages that have some dynamic characteristic, such as activities that are at some particular stage in a work flow. As an example I might record system issue reports, and, after triage, they might be classified as ‘won’t fix’, ‘next software release’, and ‘showstopper’. That’s not too difficult, and is certainly flexible, because I can easily add a new status in response to the latest quality initiative. Life gets a bit more difficult if I add in another process, such as software development. Here I might have ‘next in line’, ‘in development’, ‘in testing’, and ‘released’. This also works, but I can’t prevent something going straight from ‘next in line’ to ‘released’ (I know, it happens all the time, but some people might prefer to have some control over the process.) So, it turns out that tags can be used for dynamic attributes, but, because there is no support for semantic content and it is up to the users to maintain the logical consistency of the tag set.

The unsurprising conclusion is that tags store data and can therefore be used for pretty much anything, but there is no natural semantics associated with them that allows the system to do things instead of the user. Creation of a hierarchy is a good example. In fact, the software I have written uses interwoven hierarchies and, as it turns out, the parent manifest that supports the processing is identical to a set of tags. On the other hand, when it comes to dynamic attributes, semantics is everything and the underlying storage mechanism is largely irrelevant. The ultimate conclusion, of course, is that one should design software to suit the requirements rather than trying to force a particular mechanism to do duty where it is not appropriate.

Comments Off on Tags have their uses | Development

Building Mixed Hierarchies in SQL

November 21st, 2010 — 7:14pm

A project that has been dear to my heart for some years now involves representing the pages of a web site as a hierarchy. This formalised structure has three advantages for me. First, navigation is built in. On any single page you can see where this page belongs in the overall scheme of things and a basic up/down, left/right navigator is, in effect, part of the package. Second, I can manage user access by assigning roles to sub-trees. And, third, I can imbue the structure with some semantics – for project management, materials handling or whatever.

Of course, with an sql backend I hit the familiar mismatch between the relational and hierarchical views. The standard solution is to use an adjacency list and this is one of many descriptions of how to do that in sql. However, I soon realised that I really want the ability to link any node in my tree to more than one parent. That way I can give different views of the data to different people or for different circumstances. What I end up with is a directed graph (with the added restriction of being non-cyclic, otherwise I get very confused).

The normal way of representing a directed graph is to note the edges. That is, for every node I record the nodes immediately above it. Using this structure I can traverse the graph from any point in it. That is fine for many things, but I want to be able to understand the full path above any given node. If nothing else, I need to understand whether the current user has access to this node, and in general that means looking an arbitrary number of layers back up the graph. Since I would prefer the database server to do most of the work I need some extra information to help it along. As a result I have come up with these three tables:

node (n):
a record defining the data used by the application. A node has a unique identifier.

  • n.node is the node identifier
edge (e):
a record defining the immediate parent/child relationship between two nodes.

  • e.node is the node in question
  • e.parent is the immediate parent
parent manifest (p):
a set of records defining the complete list of nodes that are ancestors of a specific node. This list is not ordered and will include nodes from all the hierarchies that this node belongs to.

  • p.node is the node in question
  • p.parent is an ancestor of p.node

This structure is easy to use in a number of ways:

Search in sub-trees

Find nodes matching some criterion starting from some one or more arbitrary points in the graph.

  SELELCT * FROM n JOIN p ON node 
            WHERE p.parent IN [given list of starting nodes]
            AND [match criterion]

Depth first search

Search for nodes matching some criterion: return those that correspond to a depth first search.

This is achieved in two steps: one finds all the nodes matching the criterion and the other eliminates any nodes in the result that have an ancestor node also in the result.

  SELECT * FROM n JOIN p ON node
           WHERE [match criterion]
           AND NOT EXISTS ( SELECT node FROM n AS nx JOIN p AS px ON node 
                                        WHERE [match criterion] 
                                        AND p.parent = nx.node )

As I mentioned above, user access can be controlled by giving a user relevant roles on one or more sub-trees. I need a role allocation table to link users to nodes:

user role (r):
a record defining the role that a User has for the given node

  • r.node is the node in question
  • r.user is a User reference
  • r.role is a Role reference

From this I need to find either a set of nodes that a given user can access, or the set of roles that are relevant for a given user on a given node.

Identify a User’s sub-tree roots

Each sub-tree starts at the first mention of this User in any path in a depth-first search

  SELECT * FROM n JOIN r ON node
           WHERE r.user = this_user
           AND NOT EXISTS (SELECT node FROM n AS nx JOIN p AS px ON node 
                                       WHERE [match criterion] 
                                       AND n.node = nx.node)
Finding all possible Roles for a User on a given node

The problem is to find all the Roles that a user inherits from the source node and the parents of the source node.

  SELECT r.* FROM n JOIN r ON node
                  JOIN (SELECT px.parent as node FROM p AS px
                                        WHERE px.node = source) AS py ON node
                   WHERE r.node = source

I’m sure you’ve noticed that these patterns do not use the edge table. I still need it, though, on the one hand to identify parents, siblings and children of the node being dislayed, and on the other to build the parent manifest when unlinking nodes from the graph. The parent manifest looses information because a node could derive an ancestor through more than one path through the graph. So I need the edge table as a definition of the graph for those cases where the actual path is important.

These small functional elements may need to be combined to do interesting things. A generic search, for instance, would need to be limited to sub-trees that the user is allowed to see. Of course, in Python, what is really nice is using SQLAlchemy to do the work. Because a query object is generative I can build a few basic query elements to get, for example, the whole tree that a user is permitted to see, and then restrict that by adding further filters according to the need of the moment. SQLAlchemy manfully creates whatever complex structure I need and the database backend does the rest. It all turns out to be surprisingly easy.

1 comment » | Development

Back to top