Deploying packages using Fabric

May 15th, 2011 — 1:58pm

I’ve started using fabric to help manage my servers. This is a great tool that allows me to build a series of commands in Python and then run them on one or more servers, all from my own desk. It all boils down to some handy layering on top of an ssh library. These are early days for me, and I don’t actually have a lot of servers (well, two at the last count), but I do have some things I do quite often that require a number of steps, so a tool like fabric is a potential time saver.

My main use is for deployment of updated software. This happens in bursts, as and when the focus falls on one package or another, and it is helpful to have a pre-packaged set of commands that I don’t have to remember. Roughly, the things I need to do are:

  1. Identify the latest version of the package and find the distribution file in the development environment.
  2. Copy the distribution file over to the target system.
  3. Install the distribution.
  4. Stop and restart the server.

Step 1 is where fabric starts to be useful. I can identify the version, find the target file and do any checking I need to do all using Python. I’m not afraid of bash, but I am happier in Python.

Step 2 is dead easy. fabric has a file copy command (put), so no further thought is required.

Step 3 is where life gets more interesting, as this happens on the remote server. I use the fabric run command to put together a series of commands that can be executed from an ssh command. The trick here is to make sure that the command has all the environment that it needs. I need two things: one is to be in the correct directory, and the other is to activate the correct virtualenv environment.

My servers all run bash, so I can string commands together using ‘logical and’, &&, thus:

command_1 && command_2 && command_3

The ssh command is executed in a minimal environment, so your .bashrc is not going to be run. That means that useful stuff like virtualenvwrapper is not necessarily going to work. In my case I have to explicitly activate the required environment:

cd my_package_distribution_directory && \
source ~/.virtualenvs/my_target_environment/activate && \
python setup.py install

fabric can automatically add the cd some_directory part if you want. I haven’t found a way of adding any general prefix, but no matter: I can build the prefix easily enough and keep it lying around in the code, so I don’t need anything more in my bash-centric world.

Step 4 introduces another learning point. When restarting the server I need the server to detach from the terminal and run in the background. Normally, I’d simply enter the command terminated with &:

lk_fcgi_server_up.py &

This doesn’t work too well. Doing this remotely using ssh seems to leave ssh just hanging. Using fabric is no better, as that tool appears to terminate forcibly, thus killing the background process. However, bash (version 4.0) comes to the rescue with the coproc command. This runs the target command as a coprocess, complete with connected pipes for I/O. I don’t need to look at standard output from the server, so the simplest invocation seems to do the job:

previous_commands && coproc lk_fcgi_server_up.py
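
Pulling the steps together, a minimal sketch of the sort of fabfile this turns into might look like the following, assuming fabric 1.x. The server name, package name and remote paths are placeholders, I am assuming the distribution is an sdist tarball, and stopping the old server is left out of the sketch.

from fabric.api import env, put, run

env.hosts = ['my.server.example.com']     # placeholder target server

def deploy(version):
    dist = 'my_package-%s.tar.gz' % version
    put('dist/' + dist, '~/distributions/')                      # step 2: copy the file over
    run('cd ~/distributions && tar xzf ' + dist)                 # unpack the sdist
    # Build the environment prefix once and reuse it for each remote command.
    prefix = ('cd ~/distributions/my_package-%s && '
              'source ~/.virtualenvs/my_target_environment/activate' % version)
    run(prefix + ' && python setup.py install')                  # step 3: install into the virtualenv
    run(prefix + ' && coproc lk_fcgi_server_up.py',              # step 4: restart, detached
        pty=False)                                               # pty=False matters here (see the update below)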

And that’s it for my deploy tool. I can use standard Python for command line options, environment validation and whatever else, while fabric manages the remote connections. One other thing I have learnt, however: if the remote command sequence is at all complicated, it might be better to have a command script sitting on the remote server that handles the whole thing from a single parameterised remote invocation. Of course, the remote script has to be maintained, and deployed on all servers …

Update: Just for clarity:

  • Using c1 && c2 && c3 & is not that useful anyway because it puts the entire line into the background and not just the c3 part.
  • When using coproc you need to add pty=False to the fabric run call.

Package distribution using Distutils2 concepts.

May 12th, 2011 — 11:04pm

I have been developing a product I want to publish as open source (it’s called Lokai; more of this later), and, being keen, I popped over to distutils2 (aka packaging) to see what’s going on, with the intention of using it if appropriate. I’m not claiming to be following the leading edge here; it just seemed that the original distutils is seen as needing replacement, and setuptools as good but flawed in some way (package management, for instance, see PEP 376). Distutils2 is also going to be the official packaging tool for Python 3, so it has to be worth the look.

We are warned, pretty much at every turn, that distutils2 is not yet ready for production. Actually, there’s a lot there that is good, but it fell over (version 1.0a3) on package data. The developers already know about this; it happens to be the example that I tripped over.

Let me explain. Lokai is an application, rather than a library, and it is web based. It is quite sizable (for me, anyway) and I have some things I need from a distribution tool:

  • I don’t want to have to spend time maintaining a package list. There are 32 packages at the last count, and, while this is not overwhelming, I prefer not to have to fiddle with this every time I add something or restructure.
  • I use Mercurial and I want to be able to link the distribution to a tag in the repository. If this can be done with some automation, so much the better.
  • Lokai is a web application, so there are CSS files, templates and other bits and pieces that are not Python but are essential. I need those to be carried through to the distribution, and the ultimate installation, without undue hassle.

None of these things is particularly difficult, but they emphasize a fact that distutils2 seems to recognize well: there is a real distinction between actions taken by the developer to create a distribution and actions taken by a user when installing from a distribution. This comes about simply because the developer has access to information about the package (from a repository, for example) that is not available to the user; the user relies solely on whatever information the developer has provided. For distutils2 this distinction seems to center on the setup.cfg file. This file drives the whole process and there is no need for a setup.py file. The developer can use all the information available to generate a setup.cfg file, and that file is then all that the user needs to build and install the package. If you want to get more complex than this, of course you can, but this seems to me to cover a wide range of cases.

So, thinking ahead, I decided to write a program to generate a setup2.cfg file. I’m quite happy to help things along by putting files in reasonable places, so the bulk of this program comes down to the following (there is a rough sketch after the list):

  • I use distutils2.core.find_packages to find all my packages from the top level. It’s easy enough to write such a thing, or copy it from elsewhere, but this does the job.
  • I found hgtools, which provides a way of getting a version number from tags in the Mercurial repository. I had to copy and modify it, because the implied version number structure is based on StrictVersion from distutils. My version uses distutils2.version.NormalizedVersion, which gets me a step forward in future compatibility.
  • The program contains fixed, or slow moving, values, such as package metadata, embedded in the source.
  • The output is usable by distutils2. (Or, at least, the version I was playing with. Expect changes.)
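
As a rough sketch of the idea (and certainly not the real prepare.py), something along these lines covers the main points; a throwaway find_packages stands in for distutils2.core.find_packages, and get_version_from_hg_tags is a hypothetical stand-in for the modified hgtools code.

import os
from ConfigParser import RawConfigParser

from hg_version_helper import get_version_from_hg_tags   # hypothetical stand-in for the modified hgtools code

def find_packages(top='.'):
    """ Treat any directory with an __init__.py as a package.
        (distutils2.core.find_packages does this job for real.)
    """
    packages = []
    for path, dirs, files in os.walk(top):
        if '__init__.py' in files:
            packages.append(os.path.relpath(path, top).replace(os.sep, '.'))
    return packages

METADATA = {
    'name': 'Lokai',
    # ... plus the other fixed, or slow moving, metadata values
}

def write_setup_cfg(target='setup2.cfg'):
    config = RawConfigParser()
    config.add_section('metadata')
    for key, value in METADATA.items():
        config.set('metadata', key, value)
    config.set('metadata', 'version', get_version_from_hg_tags())
    config.add_section('files')
    config.set('files', 'packages', '\n'.join(find_packages()))
    with open(target, 'w') as setup_cfg:
        config.write(setup_cfg)

if __name__ == '__main__':
    write_setup_cfg()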

Of course, I’m not using distutils2, so the next step is to take the output from my prepare.py program (setup2.cfg) and feed it into a setup.py. The setup.py is relatively simple – it reads the prepared configuration file and converts data and data names as appropriate for a call to distutils.core.setup. Both the setup.py and setup2.cfg are included in the distribution.
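
Again as a sketch rather than the real thing, the setup.py side of that conversion looks roughly like this; the section and option names simply mirror the prepare.py sketch above.

from ConfigParser import RawConfigParser
from distutils.core import setup

config = RawConfigParser()
config.read('setup2.cfg')

setup(
    name=config.get('metadata', 'name'),
    version=config.get('metadata', 'version'),
    packages=config.get('files', 'packages').split(),
)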

I have, I know, inserted an extra step into the process, but I can automate my way around that if I want to. What I do have, however, is a set of tools to build the distribution that stay with me. I don’t have to include all the package searching and version calculation stuff in the distribution. I get to keep these in my repository, of course, but the user of my distribution does not need them, and, with this process, never sees them. I think that is ultimately simpler for all concerned. I’m also a step closer to using distutils2 when the time comes.

Tags have their uses

November 21st, 2010 — 8:59pm

Some time ago I was looking for software to support the small consultancy I worked at. It needed to include elements from groupware, customer relationship management and project management, and it needed to work over the internet. At the time we did have some complex software development going on and I was hoping there would be something that would help with that. In particular, I was looking for task hierarchies and ways of grouping and structuring associated documentation. This was at least recognised in commercial desktop applications, but seemed thin on the ground in open source web-based software.

As it happens, I started to write my own system, but it did occur to me that the ‘tag’ feature that most of the software supported might be useful and, maybe, save me some work. This post captures some of my thoughts about tags and shows why they don’t cut it for what I want.

Tags are, primarily, a search tool. I can categorise things by anything I like, and that might include: type of document; indication of some subject covered in the document; the year or month of publication; client; and so on. So, assuming I placed the tags correctly in the first place, I can look for pages that give me specification documents that relate to a specific interface and are used for a particular client. This is great. I have to maintain the tags, of course, but, given that discipline, I have a powerful search technique. What I don’t have, necessarily, is a way of looking at the context of the resulting document. I have to use some other facility in the software, or create a new search, to see pages that might relate to what I found in the first search.

What I could do, of course, is add tags that help me relate documents together, and that leads to the idea of adding, in effect, a parent to a set of documents. In other words, I can create a hierarchy by naming groups of things and placing the tags appropriately. These groups are quite flat, of course. The structure is implied and not explicit, but once I have added client, project, documentation, phase and other such things to the set of possible tags I can begin to structure the data in a useful way. This rapidly becomes difficult to maintain. A misspelled or forgotten tag can lead to orphaned pages that may be difficult to find (and no-one is going to bother to look if the page does not pop up in a search). Equally, the structure can be difficult to search for someone who has forgotten the exact tag.

The other thing I might be interested in doing is looking for pages that have some dynamic characteristic, such as activities that are at some particular stage in a workflow. As an example I might record system issue reports, and, after triage, they might be classified as ‘won’t fix’, ‘next software release’, and ‘showstopper’. That’s not too difficult, and is certainly flexible, because I can easily add a new status in response to the latest quality initiative. Life gets a bit more difficult if I add in another process, such as software development. Here I might have ‘next in line’, ‘in development’, ‘in testing’, and ‘released’. This also works, but I can’t prevent something going straight from ‘next in line’ to ‘released’ (I know, it happens all the time, but some people might prefer to have some control over the process.) So, it turns out that tags can be used for dynamic attributes, but, because there is no support for semantic content, it is up to the users to maintain the logical consistency of the tag set.
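
To make the point concrete, here is a tiny Python sketch, using the status names from the example above, of the sort of semantic support I mean: a workflow with explicit transitions can refuse a move that a free-form tag never would.

# Allowed transitions for the software development process described above.
ALLOWED = {
    'next in line': ['in development'],
    'in development': ['in testing'],
    'in testing': ['in development', 'released'],
    'released': [],
}

def change_status(current, target):
    if target not in ALLOWED[current]:
        raise ValueError('cannot move from %r to %r' % (current, target))
    return target

change_status('next in line', 'in development')   # fine
change_status('next in line', 'released')         # refused: raises ValueError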

The unsurprising conclusion is that tags store data and can therefore be used for pretty much anything, but there is no natural semantics associated with them that allows the system to do things instead of the user. Creation of a hierarchy is a good example. In fact, the software I have written uses interwoven hierarchies and, as it turns out, the parent manifest that supports the processing is identical to a set of tags. On the other hand, when it comes to dynamic attributes, semantics is everything and the underlying storage mechanism is largely irrelevant. The ultimate conclusion, of course, is that one should design software to suit the requirements rather than trying to force a particular mechanism to do duty where it is not appropriate.

Building Mixed Hierarchies in SQL

November 21st, 2010 — 7:14pm

A project that has been dear to my heart for some years now involves representing the pages of a web site as a hierarchy. This formalised structure has three advantages for me. First, navigation is built in. On any single page you can see where this page belongs in the overall scheme of things and a basic up/down, left/right navigator is, in effect, part of the package. Second, I can manage user access by assigning roles to sub-trees. And, third, I can imbue the structure with some semantics – for project management, materials handling or whatever.

Of course, with an sql backend I hit the familiar mismatch between the relational and hierarchical views. The standard solution is to use an adjacency list, and there are many descriptions of how to do that in sql. However, I soon realised that I really want the ability to link any node in my tree to more than one parent. That way I can give different views of the data to different people or for different circumstances. What I end up with is a directed graph (with the added restriction of being non-cyclic, otherwise I get very confused).

The normal way of representing a directed graph is to note the edges. That is, for every node I record the nodes immediately above it. Using this structure I can traverse the graph from any point in it. That is fine for many things, but I want to be able to understand the full path above any given node. If nothing else, I need to understand whether the current user has access to this node, and in general that means looking an arbitrary number of layers back up the graph. Since I would prefer the database server to do most of the work I need some extra information to help it along. As a result I have come up with these three tables:

node (n):
a record defining the data used by the application. A node has a unique identifier.

  • n.node is the node identifier
edge (e):
a record defining the immediate parent/child relationship between two nodes.

  • e.node is the node in question
  • e.parent is the immediate parent
parent manifest (p):
a set of records defining the complete list of nodes that are ancestors of a specific node. This list is not ordered and will include nodes from all the hierarchies that this node belongs to.

  • p.node is the node in question
  • p.parent is an ancestor of p.node

This structure is easy to use in a number of ways:

Search in sub-trees

Find nodes matching some criterion, starting from one or more arbitrary points in the graph.

  SELECT * FROM n JOIN p ON n.node = p.node
            WHERE p.parent IN [given list of starting nodes]
            AND [match criterion]

Depth first search

Search for nodes matching some criterion: return those that correspond to a depth first search.

This is achieved in two steps: one finds all the nodes matching the criterion and the other eliminates any nodes in the result that have an ancestor node also in the result.

  SELECT * FROM n JOIN p ON n.node = p.node
           WHERE [match criterion]
           AND NOT EXISTS ( SELECT nx.node FROM n AS nx JOIN p AS px ON nx.node = px.parent
                                           WHERE [match criterion]
                                           AND px.node = n.node )

As I mentioned above, user access can be controlled by giving a user relevant roles on one or more sub-trees. I need a role allocation table to link users to nodes:

user role (r):
a record defining the role that a User has for the given node

  • r.node is the node in question
  • r.user is a User reference
  • r.role is a Role reference

From this I need to find either a set of nodes that a given user can access, or the set of roles that are relevant for a given user on a given node.

Identify a User’s sub-tree roots

Each sub-tree starts at the first mention of this User in any path in a depth-first search

  SELECT * FROM n JOIN r ON n.node = r.node
           WHERE r.user = this_user
           AND NOT EXISTS (SELECT rx.node FROM r AS rx JOIN p AS px ON rx.node = px.parent
                                          WHERE rx.user = this_user
                                          AND px.node = n.node)

Finding all possible Roles for a User on a given node

The problem is to find all the Roles that a user inherits from the source node and the parents of the source node.

  SELECT r.* FROM n JOIN r ON n.node = r.node
                  JOIN (SELECT px.parent AS node FROM p AS px
                                        WHERE px.node = source) AS py ON n.node = py.node
                  WHERE r.user = this_user
  UNION SELECT r.* FROM n JOIN r ON n.node = r.node
                   WHERE r.node = source
                   AND r.user = this_user

I’m sure you’ve noticed that these patterns do not use the edge table. I still need it, though, on the one hand to identify parents, siblings and children of the node being displayed, and on the other to build the parent manifest when unlinking nodes from the graph. The parent manifest loses information because a node could derive an ancestor through more than one path through the graph. So I need the edge table as a definition of the graph for those cases where the actual path is important.
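
As a small illustration of that rebuilding step, here is a sketch in Python of deriving the parent manifest from the edge table; the edges dict simply stands in for rows of the edge table, mapping each node to its immediate parents.

def build_parent_manifest(edges):
    """ For each node, collect every ancestor reachable through the edges. """
    manifest = {}

    def ancestors(node, seen):
        for parent in edges.get(node, []):
            if parent not in seen:
                seen.add(parent)
                ancestors(parent, seen)
        return seen

    for node in edges:
        manifest[node] = ancestors(node, set())
    return manifest

# A node can reach the same ancestor along more than one path; the set
# collapses the duplicates, which is exactly the path information the manifest loses.
edges = {'leaf': ['view_a', 'view_b'], 'view_a': ['root'], 'view_b': ['root']}
print build_parent_manifest(edges)['leaf']   # set(['view_a', 'view_b', 'root']), in some order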

These small functional elements may need to be combined to do interesting things. A generic search, for instance, would need to be limited to sub-trees that the user is allowed to see. Of course, in Python, what is really nice is using SQLAlchemy to do the work. Because a query object is generative I can build a few basic query elements to get, for example, the whole tree that a user is permitted to see, and then restrict that by adding further filters according to the need of the moment. SQLAlchemy manfully creates whatever complex structure I need and the database backend does the rest. It all turns out to be surprisingly easy.
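
As a sketch of that generative style, assuming SQLAlchemy’s declarative extension and hypothetical mapped classes standing in for the n, p and r tables above, a base query can be built once and then refined:

from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Node(Base):
    __tablename__ = 'node'
    node = Column(Integer, primary_key=True)
    name = Column(String)

class ParentManifest(Base):
    __tablename__ = 'parent_manifest'
    node = Column(Integer, primary_key=True)
    parent = Column(Integer, primary_key=True)

class UserRole(Base):
    __tablename__ = 'user_role'
    node = Column(Integer, primary_key=True)
    user = Column(String, primary_key=True)
    role = Column(String)

def permitted_tree(session, user):
    """ Base query: nodes sitting below any node where the user holds a role. """
    return (session.query(Node)
            .join(ParentManifest, ParentManifest.node == Node.node)
            .join(UserRole, UserRole.node == ParentManifest.parent)
            .filter(UserRole.user == user)
            .distinct())

# The query object is generative, so the need of the moment is just another filter:
# permitted_tree(session, 'some_user').filter(Node.name.like('%report%'))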

Deconstructing URLs

September 19th, 2010 — 8:41pm

As a follow-on to my investigation (if that’s the right word) into URL dispatch in Python frameworks I thought I would look at how an application discovers, calculates or otherwise works out what URL to use to refer to its own objects. The application wants to provide a link to an object edit page, say, so it must somehow know how to formulate such a link and where to find the contextual information that places the application in the particular environment it is running in. Let’s start by deconstructing an example.

I have an application that currently uses a URL something like http://example.com/a/projects/{identifier}/edit. This breaks down into the scheme (http:), the network location (example.com) and a path (/a/projects/{identifier}/edit). The path is an absolute path according to rfc 1808, because it starts with “/”, and it is this path that we want to recreate somehow.

As it happens, this path has three different elements:

  • /a/ – I used this to manage the path that cookies belong to. In principle, any path that did not begin with /a/ could be used for static files, whereas paths that did start with /a/ would be processed by an application using the cookie to manage the session. In practice, it defines a name-space that allows us to put more than one instance of an application environment onto a single net location.
  • projects/ – This points to a particular resource handler used for project management. There could be other resource handlers running in the same environment. In effect, this part of the path could be thought of as narrowing down the selection of objects referenced by the remainder of the path. This creates another name-space that distinguishes available resources. We could, potentially, have a URL that looks like http://example.com/a/other_world/{identifier}/edit where the {identifier} in this case comes from a different set of identifiers than the projects set, and the edit element implies quite different functionality.
  • {identifier}/edit – The final element of the URL that is actually interpreted by some resource handler code to support an identified object.

The important point here is that the first two parts of this URL (/a/projects/) are irrelevant to the resource handler. This is, in effect, the SCRIPT_NAME of the CGI definition, and {identifier}/edit is the PATH_INFO. Clearly the SCRIPT_NAME can be changed to reflect the context, and it can be as long or as short as required. So long as it links to the correct code to interpret PATH_INFO the URL works.

I am, of course, making an assumption here. The URL I have deconstructed is rather old fashioned in the sense that the structure seems to represent the application in some way. I could, in principle, write /a/{identifier}/projects/edit. The /a/ still has to come first, because in my example it is being interpreted by the http client for returning cookies, but projects/ can be anywhere I like. This doesn’t make much difference, except to emphasise two things: there is going to be some part (/a/) that is dependent on the server environment, and some part (projects/) that is going to be dependent on some sort of framework environment. The underlying problem remains the same – how to feed these two parts into the URL generation process without making the resource handler aware of the details.

I need to do two different things. I want to serve more than one resource type from the same environment, and I want to run more than one environment from the same net location. The second I could solve by virtual servers. I (simply?) configure the http server to direct www.example.com to one place and software.example.com to another. That’s fine if I have full control over the server and is probably the ‘best’ solution. The first could also be solved in the http server if the URL is strictly hierarchical, but the problem can’t be avoided by limiting the site to only one resource type. Generally I am at least likely to want to refer to user objects (for access control, capturing addresses, credit cards, whatever) and, say, product objects (for the users to buy). At the very least that means choosing the object names very carefully for any particular site. On different sites, ‘users’, ‘customers’, ‘clients’, ‘patients’, or the same in the singular, may be valid options, but I have to choose just one and stick with it. (A quick read of this style guide is worth it for the reminder.)

In general, there is a three-step sequence of places that might do URL dispatch – http server, application framework, and resource handler. The http server communicates with the applications it serves using CGI. For our purposes here, the SCRIPT_NAME tells us the first part of the path we eventually want to create, so that is what we must use in the next step.

The application framework could be null if all the routing is done in the http server, but we might need to provide something if we don’t have access to the server, or if we want to be reasonably dynamic. This framework will have a URL dispatcher, and this dispatcher may support named routes. The resource handler could delegate all the routing to the framework. This works fine, because the framework extracts the useful parts of the path and presents them to the resource handler as parameters. The resource handler, however, has to ask the framework to do URL generation and this locks the resource handler firmly to the framework.

I rather like the idea of a resource handler that consists of a dispatcher and code combination that is dedicated to handling a single resource type. I can plug this in to a framework, or serve directly from an http server. Of course, if the handler provides a user interface to a browser, then there may be some conventions to follow, or code to share, but that would be a necessary consideration whatever was done. The framework becomes little more than a dispatcher that looks like an http server. It creates an appropriate SCRIPT_NAME to hand down to the resource handler, and the resource handler can handle the remaining parts of the path.
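
As a tiny illustration of that hand-down, assuming a WSGI-style environ and re-using the example URL from earlier (the identifier value is made up), the resource handler can build links to its own objects from nothing more than SCRIPT_NAME:

environ = {
    'wsgi.url_scheme': 'http',
    'HTTP_HOST': 'example.com',
    'SCRIPT_NAME': '/a/projects',        # server and framework dependent part
    'PATH_INFO': '/1234/edit',           # the part the resource handler interprets
}

def object_url(environ, identifier, action):
    """ Build a link to one of our own objects without hard-coding the prefix. """
    return '%s://%s%s/%s/%s' % (environ['wsgi.url_scheme'],
                                environ['HTTP_HOST'],
                                environ['SCRIPT_NAME'],
                                identifier, action)

print object_url(environ, '1234', 'edit')   # http://example.com/a/projects/1234/edit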

I think I’ll work more on this idea.

URL Dispatch in Quixote

July 5th, 2010 — 8:14am

I’ve spent some time recently thinking about implementation approaches for URL dispatch. In a recent post I spoke about some requirements for URL dispatch and promised myself that I would look at available packages. I began to wonder why it seemed that Quixote didn’t fit these requirements, and followed a line of thought that had me writing my own URL dispatcher based on principles derived from Quixote. Of course, in the end it may be best to use an existing package, but along the way I’ve learnt some interesting things about Quixote, and the separation of URL from code.

The route I’m going to take starts with how Quixote separates the URL structure from the code structure. This is a key aspect of RESTful web access, and is, perhaps, the main reason why we look for URL dispatch as a separate tool.

Quixote as URL interpreter

Quixote uses a Directory object to interpret a URL. The URL is passed to a starting Directory that strips off the first (leftmost) element of the URL and finds the matching attribute in the Directory. This attribute may reference a further Directory that looks at the next element in the URL, and so on. At some point in this sequence we reach a piece of code that generates HTML and this is the response to the URL.

For example, we might have a set of Directory type classes that looks roughly like this:

from quixote.directory import Directory
import my_code_somewhere

class WorkStuff(Directory):
    ...

    def joblist(self):
        return my_code_somewhere.joblist_page()
    ...

class PlayStuff(Directory):
    ...

class MainEntry(Directory):
    ...

    work = WorkStuff()
    play = PlayStuff()

There are one or two things missing, but this is pretty much what we need. Given the URL /work/joblist, plus an indication in the publisher that MainEntry is the place to start, the code that underlies Directory identifies the work attribute of MainEntry and then finds joblist in WorkStuff. At the end of this process the function joblist_page provides the response.

There are a number of points to bring out here:

  • The code you see here represents the structure of the URLs that it will interpret. If you change the URL scheme then all you do is edit the set of Directory objects. You do not have to make any changes to joblist_page.
  • The names WorkStuff and PlayStuff are arbitrary and completely insignificant. In many ways it would be fine if we could do anonymous nesting, but we have to provide names to make the link. The point is that the fact that ‘work’ and ‘WorkStuff’ appear to be related is down to convenience for humans. It does not constrain how we build a URL scheme.

The next thing to consider is the use of variables in a URL. What happens if we want to write /work/23/edit? Well, Quixote provides a catch-all method, _q_lookup, that allows us to handle this. Take a look at:

from quixote.directory import Directory
import my_code_somewhere

class ProcessJobStuff(Directory):

    def edit(self):
        return my_code_somewhere.job_edit()

    def display(self):
        return my_code_somewhere.job_display()
        ...

class WorkStuff(Directory):
    ...

    def joblist(self):
        return my_code_somewhere.joblist_page()

    def _q_lookup(self, component):
        """ component contains the URL component that got us here."""
        set_environment('application_job_name', component)
        return ProcessJobStuff()

class PlayStuff(Directory):
    ...

class MainEntry(Directory):
    ...

    work = WorkStuff()
    play = PlayStuff()

In this case, the URL component 23 is not recognised as a method within WorkStuff, so the component is passed to _q_lookup. Here we simply store the component into the request environment and then proceed to ProcessJobStuff to interpret the edit component.

Once again, there is no connection between the URL interpretation structure and the responding code other than the need to provide a code entry point. It would be trivial to rearrange the structure above to interpret /work/edit/23, for example, and still link to the same job_edit responder, without changing the api for job_edit. For MVC fans, we might consider the module my_code_somewhere to be the controller, and joblist, job_edit, job_display (etc.) to be views within that.

Using indirection

In the examples above, the link between the URL endpoint and the code to be executed to build the response is done using import. This is all very Pythonic, but it might cause difficulties if you want to do something dynamic without reloading the application. Not a problem. We simply capture the appropriate information from the URL and pass it all to a ‘get this code’ function. This might be a bit more difficult to read, but it does the job:

from quixote.directory import Directory
from helpers import find_controller_and_view

class ProcessJobStuff(Directory):

    def _q_lookup(self, component):
        """ component contains the URL component that got us here."""
        set_environment('application_view', component)
        return find_controller_and_view()
        ...

class WorkStuff(Directory):
    ...

    def joblist(self):
        set_environment('application_controller', 'job_list')
        return find_controller_and_view()

    def _q_lookup(self, component):
        """ component contains the URL component that got us here."""
        set_environment('application_controller', 'job_list')
        set_environment('application_job_name', component)
        return ProcessJobStuff()

class PlayStuff(Directory):
    ...

class MainEntry(Directory):
    ...

    work = WorkStuff()
    play = PlayStuff()

The code in find_controller_and_view extracts the names of the controller and view from the environment and does whatever it needs to do to find the right stuff to execute, importing it if necessary.

Of course, Python allows us to modify objects on the fly, so we can also do dynamic extensions to the URL scheme by posting new attributes into the appropriate Directory objects. We may not always need many levels of indirection.

Backtracking

We have to be able to distinguish between URLs that contain variable components. We might have, for example, /work/{product}/{part} that needs to be distinguished from /work/{job}. For this we need to be able to backtrack.

Quixote’s Directory object already has a mechanism we can use: the Directory returns None if it cannot find a match. This None value is propagated back through the tree in just the same way as a valid HTML response, so we can test it at key points.

The general case looks something like this:

class SomeDirectory(Directory):
    ...

    def _q_lookup(self, component):
        """ This component can be one of a number of possibilities
            represented by a list of possible sub-Directories
        """
        for variable_name, responder in list_of_options:
            # variable_name and component would be recorded in the request
            # environment here (see the note below).
            response = responder()
            if response is not None:
                return response
        return None

We need something to manage the presence of the variable in the request environment, but I’m sure you get the picture.

Roundup

There are plenty of other things that can be done with a Directory, but I have tried your patience enough. I hope I have shown that the Directory object can be used to separate URL interpretation from the application code. Of course, it is still possible to write a URL schema that represents application structure. In my examples I use /work and /play. This appears to, and may actually, represent some split in the underlying set of applications, but that is a matter of design choice and not a constraint of the mechanism.

I think there are a number of challenges in using Directory objects.

  • The URL dispatch structure looks like (indeed, is) code. It is easy to mistake this for application code. This is something I did myself initially.
  • The process of writing the code to handle any particular dispatcher is clearly more challenging than writing map_url('/work/{job}', my_code_somewhere.job_display)
  • The result is not self documenting and can be difficult to interpret.

In fact, the Directory is really an implementation tool. The programmer is being asked to interpret, or compile, some imagined scheme into code. The set of objects we see is the result of this compilation process. The code, or something like it, would still exist even if we provided a mini language and a compiler to handle it. (How about /work/{job} = job_display if method == GET?) Does this matter? Quite possibly not. After all, URL schemes do not change much, so maybe all we need is a set of patterns to follow and some useful helpers and we are all set to go.

Of course, it still doesn’t satisfy my requirement for URL generation. That will have to be dealt with another time.

In celebration of filepath

June 26th, 2010 — 4:44pm

I was reminded the other day of some of the problems that we (the software industry, and the human race in general) have in thinking of software as some sort of engineering. The term Software Engineering has a nice ring to it, but the reality is disappointing. As Jeff Atwood so nicely pointed out, the advantage of, er, real world engineering is that it has immutable laws of physics. In software the programmer is free to invent his or her own laws as the project progresses. Of course, it is possible to relate physical laws to the software world, but not, I suspect, in the way that I, or Jeff Atwood, have in mind.

The particular issue I’m looking at here is software re-use. This is where project A and project B can both use some component, even though the two projects are very different. The mutable laws of physics thing comes in when the component in question has an API that is tuned to project A and therefore doesn’t cut it for project B. This is a known problem, with a set of causes, discussed by Douglas C. Schmidt in 1999, and one key aspect of software engineering is the disciplined approach that tries to address these causes. Interestingly, Schmidt picks up on two problems that are relevant here. One is that re-usable software has to be deliberately written that way. This means that the author has to understand a wide range of use cases, and have the funding or flexibility to be allowed to do it that way. The other is that re-usable components have to be attractive. Once the component has been written others must be attracted to it. For this, Schmidt uses the concept of a “re-use magnet” and he thought that the open source development process is a good and effective way of creating re-use magnets.

There’s a whole can of worms that opens up when software engineering is discussed, but that is not why I’m writing this. What reminded me of all of this is the recent release of filepath. This is a package that makes file handling that bit easier by hiding the details of the Python os.path module. In that way alone it is a potential re-use magnet. What I like about this release, however, is that filepath used to be part of Twisted. Now, Twisted is a good and useful networking framework, but I don’t happen to use it, and I would not want to install it just for the sake of using filepath. After all, I already have lots of code that uses os.path, so I’m not going to go out of my way to dig a new component out of a bunch of software that I’m not going to use. The advantages are not enough. On the other hand, with filepath as a downloadable component on its own the re-use magnet is no longer shielded and I can feel the attraction. What is more, now that it is out there it can grow (or shrink) to fit the wider world (as a result of more use cases and more discussion).

So, thank you to the guys at Twisted. The software re-use paradigm lives on and Chaos and Old Night have been kept at bay in one small corner of the virtual world.

Parsing Configuration Files Bad for your Health

June 3rd, 2010 — 1:15pm

As part of my angst-driven search for stuff that other people might have done, and published, that would save me maintaining my own code, I took a look at things to handle INI files. Oh dear. They are everywhere, and they all do different things.

Python does provide a mechanism for reading INI files. ConfigParser is the basic in-built module, and it handles a reasonably simple form of config file, with some variable substitution. It is a starting point, but there are issues:

  • Command line programs would like to be able to override entries in the config file with optional command line settings.
  • The main problem here is that optparse has a different API and does not naturally recognise the kind of block structure that INI files allow (a sketch of the kind of glue this leads to follows the list).

  • Setting the configuration file name from the command line does not play well with the idea of overriding settings from an, as yet, unread config file.
  • When building an application made up of multiple packages, each with its own configuration, the config file should be able to integrate the configs for these packages, while keeping them appropriately separate.
  • It would be nice to be able to use different text formats. Conversely, there is no reason why the API should constrain the file format.
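
To make the first of these concrete, here is a rough sketch of the sort of glue that ends up being written; the file, section and option names are made up for the example.

from ConfigParser import SafeConfigParser
from optparse import OptionParser

parser = OptionParser()
parser.add_option('--config', default='app.ini')
parser.add_option('--log-level', dest='log_level')
options, args = parser.parse_args()

config = SafeConfigParser()
config.read(options.config)      # the config file name itself comes from the command line

log_level = options.log_level                        # the command line wins ...
if log_level is None and config.has_option('main', 'log_level'):
    log_level = config.get('main', 'log_level')      # ... then the config file ...
if log_level is None:
    log_level = 'INFO'                               # ... then a built-in default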

There are probably other issues, but these keep coming up. I am relieved, slightly, to find that many people have thought about this, and there has even been discussion in the Python wiki (ConfigParserShootout). I am also depressed. The discussion came to no great conclusion, and the issues are still open.

So I keep the old code, then.

URL Dispatch in Python Frameworks

May 24th, 2010 — 9:48am

I have been using Quixote for some time now as my web framework. So far, I have had very little incentive to change to another framework. The code is lightweight and reliable, to the extent that I generally just forget about it. Recently, though, I have hit some issues. It’s all to do with URL dispatch and, specifically, dynamic URLs that contain data. I want to be able to write URLs like /MyPage/edit, where ‘MyPage’ is the name of a documentation page, for example, or /reports/2009/04 which might bring up a list of reports for April 2009. This looks nicer than /reports?year=2009&month=4, which is what I used to do, and makes it easier for users to bookmark pages.

Quixote does URL dispatch using Python objects based on a Directory class. The URL is processed left to right, and each element of the URL identifies the next Directory object. It is all quite flexible, and, of course, the structure is built in software and bears no particular relationship to folders on disc or how a project is developed. It is remarkably easy to link in work from different projects, and this is a key advantage. Normally a Directory object recognises the text in a URL by direct match with an attribute of the object, but there is a catch-all that gives the option of processing elements that it does not otherwise recognise, and I have used this to support dynamic URLs. This works well enough, but there is no way to support the generation of the URLs in the first place. When I am generating a page with links on it I want to be able to write target_url = make_url(edit_template, target_page=MyPage), or something similar, and end up with a URL that the dispatch mechanism will recognise. Obviously, I can do this by hand, but the relationship between the template and the dispatch tree exists only in my head. So I end up with buggy links, and problems if I want to make changes.
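
To pin down what I mean by make_url, here is a toy sketch of the kind of two-way mapping I am after, where the same template drives both recognition and generation and fields are referenced by name; the names and template syntax are made up and do not belong to any particular framework.

import re

class Route(object):
    def __init__(self, template, handler):
        self.template = template                  # e.g. '/{target_page}/edit'
        self.handler = handler
        pattern = re.sub(r'\{(\w+)\}', r'(?P<\1>[^/]+)', template)
        self.regex = re.compile('^%s$' % pattern)

    def match(self, path):
        """ Dispatch direction: return (handler, named fields) or None. """
        found = self.regex.match(path)
        return (self.handler, found.groupdict()) if found else None

    def make_url(self, **fields):
        """ Generation direction: fill the same template from named fields. """
        return re.sub(r'\{(\w+)\}', lambda m: str(fields[m.group(1)]), self.template)

edit_route = Route('/{target_page}/edit', 'page_edit_handler')
assert edit_route.match('/MyPage/edit') == ('page_edit_handler', {'target_page': 'MyPage'})
assert edit_route.make_url(target_page='MyPage') == '/MyPage/edit'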

All this prompted me to look into URL dispatch mechanisms, to see what I think I need, and to find out if there is anything out there that already does the job. So, in no particular order, this is my shopping list:

  • URL can contain data and process related fields
  • A URL identifies both an object (the data or subject matter) and the action to perform (display, edit). In model/view/controller terms, the URL provides all of the information to identify data and identify the required code components.

  • Flexibility of URL design
  • There should not be any inherent restriction on the order of items in the URL, or on how code or data related items might be placed in the pattern.

  • Ability to distinguish similar URL schemes
  • The URLs

    • /ham/{some date}/spam
    • /ham/{some name}/spam
    • /ham/{some year}/{some month}/spam

    are all similar but may be significantly different in processing. I tried some thought experiments with Quixote Directory interpretation of these forms and came to the conclusion that it might be possible to handle things like this, but there are probably easier ways.

  • Linking to code points does not impose restrictions on code structure
  • I want to build a tool that is flexible and does not enforce any particular approach. One of the claims for Quixote is that the developer simply uses their own knowledge of Python. In this vein, I don’t want to force code to be stored in a ‘controller’ directory, or to insist that a particular model-view-controller structure is used.

    Exactly how code points are identified is almost a subject in itself. For now, I just want to know that the URL dispatch process is not going to be a restriction.

  • Configuration of applications from different projects
  • This is probably another view on the previous requirement. Obviously, all applications that are going to be used in whatever environment I end up with will have to be aware of some features of the environment (such as the URL generator, for example), but I have applications that have been developed over time, and I would like the ability to stitch in new applications in the future, so a reasonable configuration process is needed; one that does not require a whole bunch of rewriting to get things working.

  • Partial path handling
  • By this I mean the ability to identify an action based on the first few fields of a URL and then pass the remaining fields to another dispatcher. Actually, this is effectively what the Quixote Directory object does, working one field at a time. I guess I’m looking for something less fine grained.

  • Fields can be referenced by name
  • This is for convenience and for reducing the possibility of error. It is likely to be easy for a dispatcher to report fields as a list. It is much more useful for me if I can use names.

  • URL generation
  • And now, the main reason for looking at this in the first place, generation of a URL, given a template, or a reference to a template. The implied requirements here are:

    • the template should be directly related to the dispatch process without need for thought or invention on the part of the developer.
    • the names used for the fields to be substituted should be the same as the names used to extract the data when interpreting the URL

Most of that is fairly obvious, and there are people out there who picked up on this years ago. The next part of this saga is to review what is out there and pick an approach.

Using web technology for applications

May 16th, 2010 — 4:07pm

I read Ben Ward the other day. In the wake of all the fuss from Steve Jobs about Flash, and whether or not H264 is good (for example), Ben is worried that we might forget what the web is for. Roughly speaking, he looks to the web as an interconnected set of documents and data. This ability to move freely from one space to another is its main reason for being, and complaints about the inability to provide high grade user interfaces are out of scope. He makes a good point, but the comments on his post show that there are other views, and the issues, perhaps not surprisingly, come down to what is appropriate to the circumstances.

I write applications for businesses, to support business processes, and which use forms so that users can interact with the business process. Generally, a user base for an application is counted in the 10s. Some are in-house, and some are public, in the sense that users come from different organisations. Technically, I have, roughly speaking, a choice between a client side application with remote data access, a server side application using remote terminal technology, and a server side application using a browser. Out of all this, I choose to use a browser. Why do I do that, and what am I expecting to achieve?

These applications change quite frequently as users feed back their requirements and business processes change, and even with a small user base I can be stuck with a range of working environments. That gives me a problem of control. As it happens, and by design, the browser interface gives me:

  • Control over software updates.
  • To be fair, we are well used to automatic updates of client side software nowadays, but, if the application users are all from different organisations, we can run into trouble with security policies. Even for an in-house application, individual users may still spoil the update process somehow.

  • Operating system independence.
  • I don’t have to worry about whether clients use Windows, MacOS or Linux. I’m not dependent on a particular widget set or o/s file handling capability. I certainly don’t have to worry about what happens when a client organisation upgrades all its computers to Vista or Windows 7.

    I don’t even have to worry about the operating system that the server runs. Much. I can write o/s independent code if I want to, and upgrade effects can be minimised. That would not be true if I was using a terminal server approach.

  • Hardware independence
  • At a pinch, a user can still use the application from a mobile phone.

    If I get the css right, it might even be easy to use from a mobile phone.

  • Location independence
  • The client computer does not need any special software, so the application is accessible from the next door office, home, someone else’s home, an internet cafe, the international space station – anywhere.

  • New user facilitation
  • New user? Point them to a browser. No software to install. No work for the IT support team. Sorted.

So far, so obvious, but what about the quality of the UI? Do I need Flash? Javascript? Well, history comes into play here. My early excursions into this field involved users in many different organisations, using who knew what o/s and hardware. We looked at Flash, but there appeared to be version differences that meant we still had to work hard on compatibility, and the complexity of the interface did not warrant it. I was also seriously put off, as a user, by the download times of flash scripts (at the time) and I wanted to give the best impression possible. So we started out with HTML, with frames and a bit of javascript. The javascript was kept to a minimum, because we had no way to guarantee that the user would have it switched on. I think we managed well enough, and most of what I have needed to do has been perfectly well supported by HTML.

There have been times when a UI requirement is difficult in HTML alone. Too much back and forth from the server makes some things slow and clunky. Javascript can do some wonderful things in the right hands and it can be a boon for those occasions. However, the general rule I follow is that the UI must be usable with javascript turned off.

Anything else? Yes – links. Links and browser tabs (or new windows). With HTML you get linking for free, as it were. I like to design applications where data is cross referenced, so there should be links everywhere (I confess, I don’t always manage this, but that doesn’t stop me wanting to.) And if you, the user, open a link in a new window you can keep the information there to help you. Of the various applications I use, one of the most difficult limits me to one screen at a time. If I happen to have forgotten some detail I need I have to navigate to the information, write it on a piece of paper, and navigate back. Links and browser tabs are the answer here.

As it turns out, I seem to be pretty much within Ben Ward’s concept of webbishness. My applications by design provide all the hardware and o/s independence implied by HTTP/HTML, and I can support all the interconnections anyone might need or want, within the bounds of privacy and access controls. I should award myself a pat on the back, but I must remember that this only happens because I want it to happen (for the reasons listed above) and because I believe (based on experience) that the UI my applications provide is perfectly adequate. If I believed something else then I would have to do something different. Would I then be demanding universal Flash? Probably not. With my small communities I can discuss the compromises, face up to them, and use the best tool for the job. As I said above, it comes down to what is appropriate to the circumstances.
