Tag: url dispatch


Deconstructing URLs

September 19th, 2010 — 8:41pm

As a follow-on to my investigation (if that’s the right word) into URL dispatch in Python frameworks, I thought I would look at how an application discovers, calculates or otherwise works out the URL to use when referring to its own objects. If the application wants to provide a link to an object’s edit page, say, it must somehow know how to formulate such a link and where to find the contextual information that places it in the particular environment it is running in. Let’s start by deconstructing an example.

I have an application that currently uses a URL something like http://example.com/a/projects/{identifier}/edit. This breaks down into the scheme (http:), the network location (example.com) and a path (/a/projects/{identifier}/edit). The path is an absolute path according to RFC 1808, because it starts with “/”, and it is this path that we want to recreate somehow.
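Python’s standard library performs the same breakdown with urllib.parse.urlsplit. A minimal sketch, with the arbitrary identifier 42 standing in for {identifier}:

```python
from urllib.parse import urlsplit

# The example URL from the text, with 42 substituted for {identifier}.
url = "http://example.com/a/projects/42/edit"
parts = urlsplit(url)

print(parts.scheme)   # http
print(parts.netloc)   # example.com
print(parts.path)     # /a/projects/42/edit

# The path is absolute because it starts with "/"; splitting it exposes
# the elements discussed below.
segments = parts.path.strip("/").split("/")
print(segments)       # ['a', 'projects', '42', 'edit']
```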

As it happens, this path has three different elements:

  • /a/ – I used this to manage the path that cookies belong to. In principle, any path that did not begin with /a/ could be used for static files, whereas paths that did start with /a/ would be processed by an application that uses the cookie to manage the session. In practice, it defines a namespace that allows us to put more than one instance of an application environment onto a single net location.
  • projects/ – This points to a particular resource handler used for project management. There could be other resource handlers running in the same environment. In effect, this part of the path narrows down the selection of objects referenced by the remainder of the path, creating another namespace that distinguishes the available resources. We could, potentially, have a URL like http://example.com/a/other_world/{identifier}/edit, where {identifier} comes from a different set of identifiers than the projects set, and the edit element implies quite different functionality.
  • {identifier}/edit – The final element of the URL, which is actually interpreted by some resource handler code to support an identified object.

The important point here is that the first two parts of this URL (/a/projects/) are irrelevant to the resource handler. This is, in effect, the SCRIPT_NAME of the CGI definition, and /{identifier}/edit is the PATH_INFO. Clearly SCRIPT_NAME can be changed to reflect the context, and it can be as long or as short as required. So long as it leads to the correct code to interpret PATH_INFO, the URL works.
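The split can be reproduced with Python’s wsgiref.util.shift_path_info, which moves one leading segment from PATH_INFO onto SCRIPT_NAME per call; WSGI carries the same two CGI variables. This is only a sketch: the environ dictionary is hand-built for illustration, and edit_url is a hypothetical helper rather than part of any framework.

```python
from urllib.parse import quote
from wsgiref.util import application_uri, shift_path_info

# Hand-built environ, as a server might present it before any dispatch:
# the whole path is still in PATH_INFO.
environ = {
    "wsgi.url_scheme": "http",
    "HTTP_HOST": "example.com",
    "SCRIPT_NAME": "",
    "PATH_INFO": "/a/projects/42/edit",
}

# Each call moves one leading segment from PATH_INFO onto SCRIPT_NAME.
first = shift_path_info(environ)    # "a"        (server/environment part)
second = shift_path_info(environ)   # "projects" (framework part)

print(environ["SCRIPT_NAME"])       # /a/projects
print(environ["PATH_INFO"])         # /42/edit


def edit_url(environ, identifier):
    # Hypothetical helper: the handler rebuilds its own URL from SCRIPT_NAME
    # without knowing how that prefix was made up.
    return f"{application_uri(environ)}/{quote(identifier, safe='')}/edit"


print(edit_url(environ, "42"))      # http://example.com/a/projects/42/edit
```

Because edit_url looks only at SCRIPT_NAME, moving the handler from /a/projects to any other prefix changes nothing in the handler itself.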

I am, of course, making an assumption here. The URL I have deconstructed is rather old-fashioned in the sense that its structure seems to represent the application in some way. I could, in principle, write /a/{identifier}/projects/edit. The /a/ still has to come first, because in my example it is being interpreted by the http client for returning cookies, but projects/ can be anywhere I like. This doesn’t make much difference, except to emphasise two things: there is going to be some part (/a/) that is dependent on the server environment, and some part (projects/) that is going to be dependent on some sort of framework environment. The underlying problem remains the same – how to feed these two parts into the URL generation process without making the resource handler aware of the details.

I need to do two different things. I want to serve more than one resource type from the same environment, and I want to run more than one environment from the same net location. The second I could solve with virtual servers: I (simply?) configure the http server to direct www.example.com to one place and software.example.com to another. That’s fine if I have full control over the server, and it is probably the ‘best’ solution. The first could also be solved in the http server if the URL is strictly hierarchical, but it cannot be sidestepped by limiting the site to a single resource type: at the very least I am likely to want to refer to user objects (for access control, capturing addresses, credit cards, whatever) and, say, product objects (for the users to buy). That means choosing the object names very carefully for any particular site. On different sites, ‘users’, ‘customers’, ‘clients’, ‘patients’, or the same in the singular, may be valid options, but I have to choose just one and stick with it. (A quick read of this style guide is worth it for the reminder.)

In general, there is a three-step sequence of places that might do URL dispatch: http server, application framework, and resource handler. The http server communicates with the applications it serves using CGI (or a CGI-derived interface such as WSGI). For our purposes here, SCRIPT_NAME tells us the first part of the path we eventually want to create, so that is what we must use in the next step.

The application framework could be null if all the routing is done in the http server, but we might need to provide something if we don’t have access to the server, or if we want to be reasonably dynamic. This framework will have a URL dispatcher, and this dispatcher may support named routes. The resource handler could delegate all the routing to the framework. This works fine, because the framework extracts the useful parts of the path and presents them to the resource handler as parameters. The resource handler, however, has to ask the framework to do URL generation and this locks the resource handler firmly to the framework.

I rather like the idea of a resource handler that consists of a dispatcher and code combination dedicated to handling a single resource type. I can plug this into a framework, or serve it directly from an http server. Of course, if the handler provides a user interface to a browser, then there may be some conventions to follow, or code to share, but that would be a necessary consideration whatever was done. The framework becomes little more than a dispatcher that looks like an http server. It creates an appropriate SCRIPT_NAME to hand down to the resource handler, and the resource handler can handle the remaining parts of the path.
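A rough sketch of that arrangement, written as plain WSGI callables rather than against any particular framework. The names projects_handler and make_mount_dispatcher are hypothetical, and the sketch assumes the http server has already moved /a into SCRIPT_NAME. The dispatcher consumes one segment and hands the rest down; the handler builds its own links from whatever SCRIPT_NAME it receives.

```python
from urllib.parse import quote
from wsgiref.util import shift_path_info


def projects_handler(environ, start_response):
    """Hypothetical resource handler: it only ever sees PATH_INFO such as /42/edit."""
    parts = environ["PATH_INFO"].strip("/").split("/")
    if len(parts) == 2 and parts[1] == "edit":
        identifier = parts[0]
        # Links are built from SCRIPT_NAME, so the handler never needs to know
        # whether it was mounted at /a/projects or somewhere else entirely.
        self_link = f"{environ['SCRIPT_NAME']}/{quote(identifier, safe='')}/edit"
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
        return [f"Editing {identifier}; form posts to {self_link}".encode("utf-8")]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


def make_mount_dispatcher(mounts):
    """Hypothetical 'framework': route on one path segment, then hand the rest down."""
    def dispatcher(environ, start_response):
        name = shift_path_info(environ)   # e.g. moves "projects" onto SCRIPT_NAME
        handler = mounts.get(name)
        if handler is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"not found"]
        return handler(environ, start_response)
    return dispatcher


statuses = []

def start_response(status, headers):
    statuses.append(status)

app = make_mount_dispatcher({"projects": projects_handler})
# Assume the http server has already placed /a in SCRIPT_NAME.
environ = {"SCRIPT_NAME": "/a", "PATH_INFO": "/projects/42/edit"}
body = b"".join(app(environ, start_response))
print(statuses[-1], body.decode())
# 200 OK Editing 42; form posts to /a/projects/42/edit
```

Mounting the same handler under another prefix, or serving it directly from a server that sets SCRIPT_NAME itself, needs no change to projects_handler.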

I think I’ll work more on this idea.


URL Dispatch in Quixote

July 5th, 2010 — 8:14am

I’ve spent some time recently thinking about implementation approaches for URL dispatch. In a recent post I spoke about some requirements for URL dispatch and promised myself that I would look at available packages. I began to wonder why Quixote didn’t seem to fit these requirements, and followed a line of thought that had me writing my own URL dispatcher based on principles derived from Quixote. In the end it may be best to use an existing package, but along the way I’ve learnt some interesting things about Quixote, and about separating URLs from code.

The route I’m going to take starts with how Quixote separates the URL structure from the code structure. This is a key aspect of RESTful web access, and is, perhaps, the main reason why we look for URL dispatch as a separate tool.

Quixote as URL interpreter

Quixote uses a Directory object to interpret a URL. The URL is passed to a starting Directory that strips off the first (leftmost) element of the URL and finds the matching attribute in the Directory. This attribute may reference a further Directory that looks at the next element in the URL, and so on. At some point in this sequence we reach a piece of code that generates HTML and this is the response to the URL.

For example, we might have a set of Directory type classes that looks roughly like this:

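from quixote.directory import Directory
import my_code_somewhere   # the application code that builds the pages

class WorkStuff(Directory):
    _q_exports = ['joblist']   # names that may be reached from a URL
    ...

    def joblist(self):
        return my_code_somewhere.joblist_page()
    ...

class PlayStuff(Directory):
    ...

class MainEntry(Directory):
    _q_exports = ['work', 'play']
    ...

    work = WorkStuff()
    play = PlayStuff()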

There are one or two things missing, but this is pretty much what we need. Given the URL /work/joblist, plus an indication in the publisher that MainEntry is the place to start, the code that underlies Directory identifies the work attribute of MainEntry and then finds joblist in WorkStuff. At the end of this process the function joblist_page provides the response.

There are a number of points to bring out here:

  • The code you see here represents the structure of the URLs that it will interpret. If you change the URL scheme, all you do is edit the set of Directory objects. You do not have to make any changes to joblist_page.
  • The names WorkStuff and PlayStuff are arbitrary and completely insignificant. In many ways it would be fine if we could do anonymous nesting, but we have to provide names to make the link. The point is that the fact that ‘work’ and ‘WorkStuff’ appear to be related is down to convenience for humans. It does not constrain how we build a URL scheme.

The next thing to consider is the use of variables in a URL. What happens if we want to write /work/23/edit? Quixote provides a catch-all method, _q_lookup, that allows us to handle this. Take a look at:

from quixote.directory import Directory
import my_code_somewhere

# set_environment() is assumed to be a helper that stores a value in the
# request environment, where the responding code can find it later.

class ProcessJobStuff(Directory):
    _q_exports = ['edit', 'display']

    def edit(self):
        return my_code_somewhere.job_edit()

    def display(self):
        return my_code_somewhere.job_display()
    ...

class WorkStuff(Directory):
    _q_exports = ['joblist']
    ...

    def joblist(self):
        return my_code_somewhere.joblist_page()

    def _q_lookup(self, component):
        """ component contains the URL component that got us here."""
        set_environment('application_job_name', component)
        return ProcessJobStuff()

class PlayStuff(Directory):
    ...

class MainEntry(Directory):
    _q_exports = ['work', 'play']
    ...

    work = WorkStuff()
    play = PlayStuff()

In this case, the URL component 23 is not recognised as a method within WorkStuff, so the component is passed to _q_lookup. Here we simply store the component into the request environment and then proceed to ProcessJobStuff to interpret the edit component.

Once again, there is no connection between the URL interpretation structure and the responding code other than the need to provide a code entry point. It would be trivial to rearrange the structure above to interpret /work/edit/23, for example, and still link to the same job_edit responder, without changing the API for job_edit. For MVC fans, we might consider the module my_code_somewhere to be the controller, and joblist_page, job_edit, job_display (etc.) to be views within that.
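The traversal can be imitated in a few lines of plain Python, which makes it easier to see that the responder (job_edit here) never learns the shape of the URL. This is a toy sketch, not Quixote’s implementation: MiniDirectory, _exports, _lookup, traverse and the env dictionary are hypothetical stand-ins for Quixote’s Directory, its export list, _q_lookup and the request environment.

```python
class MiniDirectory:
    """Toy imitation of Directory-style traversal (not Quixote's code)."""

    _exports = ()                      # names matched directly against components

    def _lookup(self, component, env):
        return None                    # default: no dynamic components

    def traverse(self, components, env):
        head, rest = components[0], components[1:]
        if head in self._exports:
            target = getattr(self, head)
        else:
            target = self._lookup(head, env)
        if target is None:
            return None                # nothing matched at this level
        if rest:
            if isinstance(target, MiniDirectory):
                return target.traverse(rest, env)
            return None                # path continues past a leaf
        return target(env) if callable(target) else target


def job_edit(env):
    # The responder reads what dispatch stored; it knows nothing of URL layout.
    return f"edit job {env['application_job_name']}"


class ProcessJobStuff(MiniDirectory):
    _exports = ("edit",)

    def edit(self, env):
        return job_edit(env)


class WorkStuff(MiniDirectory):
    _exports = ("joblist",)

    def joblist(self, env):
        return "job list"

    def _lookup(self, component, env):
        # An unrecognised component such as "23" is treated as a job name.
        env["application_job_name"] = component
        return ProcessJobStuff()


class MainEntry(MiniDirectory):
    _exports = ("work",)
    work = WorkStuff()


def dispatch(path):
    env = {}                           # stands in for the request environment
    return MainEntry().traverse(path.strip("/").split("/"), env)


print(dispatch("/work/23/edit"))       # edit job 23
print(dispatch("/work/joblist"))       # job list
```

Interpreting /work/edit/23 instead would mean changing only the directory classes; job_edit stays as it is.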

Using indirection

In the examples above, the link between the URL endpoint and the code to be executed to build the response is done using import. This is all very Pythonic, but it might cause difficulties if you want to do something dynamic without reloading the application. Not a problem. We simply capture the appropriate information from the URL and pass it all to a ‘get this code’ function. This might be a bit more difficult to read, but it does the job:

from quixote.directory import Directory
from helpers import find_controller_and_view

# set_environment() is assumed to store a value in the request environment;
# find_controller_and_view() reads those values back.

class ProcessJobStuff(Directory):

    def _q_lookup(self, component):
        """ component contains the URL component that got us here."""
        set_environment('application_view', component)
        return find_controller_and_view()
    ...

class WorkStuff(Directory):
    _q_exports = ['joblist']
    ...

    def joblist(self):
        set_environment('application_controller', 'job_list')
        return find_controller_and_view()

    def _q_lookup(self, component):
        """ component contains the URL component that got us here."""
        set_environment('application_controller', 'job_list')
        set_environment('application_job_name', component)
        return ProcessJobStuff()

class PlayStuff(Directory):
    ...

class MainEntry(Directory):
    _q_exports = ['work', 'play']
    ...

    work = WorkStuff()
    play = PlayStuff()

The code in find_controller_and_view extracts the names of the controller and view from the environment and does whatever it needs to do to find the right stuff to execute, importing it if necessary.
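One way find_controller_and_view might work, assuming a hypothetical table that maps controller names to importable module names, and treating the view name as a function in that module (defaulting to index, which is also an assumption). Unlike the version above, this one takes the environment as an explicit argument. importlib.import_module imports the module on first use and returns the cached module afterwards. To keep the example self-contained, a stand-in job_controller module is registered in memory instead of living in a file on disk.

```python
import importlib
import sys
import types

# Hypothetical table: controller name (as stored in the environment) -> module name.
CONTROLLERS = {"job_list": "job_controller"}


def find_controller_and_view(env, controllers=CONTROLLERS):
    """Resolve controller and view names from env, import if necessary, call the view."""
    module_name = controllers.get(env.get("application_controller"))
    if module_name is None:
        return None                                    # unknown controller: no match
    module = importlib.import_module(module_name)      # cached in sys.modules afterwards
    view = getattr(module, env.get("application_view", "index"), None)
    return view(env) if view is not None else None


# Stand-in controller module registered in memory so the sketch runs on its own;
# in practice this would be an ordinary module on disk.
job_controller = types.ModuleType("job_controller")
job_controller.index = lambda env: "job list"
job_controller.edit = lambda env: f"edit job {env['application_job_name']}"
sys.modules["job_controller"] = job_controller

print(find_controller_and_view({"application_controller": "job_list"}))
# job list
print(find_controller_and_view({"application_controller": "job_list",
                                "application_view": "edit",
                                "application_job_name": "23"}))
# edit job 23
```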

Of course, Python allows us to modify objects on the fly, so we can also do dynamic extensions to the URL scheme by posting new attributes into the appropriate Directory objects. We may not always need many levels of indirection.
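A small illustration of such a run-time extension, using a plain class rather than a Quixote Directory. The hypothetical _exports set plays the role of Quixote’s _q_exports list (in Quixote a new name would need to be exported as well as attached), and resolve is a hypothetical one-level lookup.

```python
class WorkStuff:
    """Plain-Python stand-in for a Directory: only exported names are reachable."""
    _exports = {"joblist"}

    def joblist(self):
        return "job list"


def resolve(directory, component):
    # Hypothetical one-level lookup: call the exported attribute, else report no match.
    if component in directory._exports:
        return getattr(directory, component)()
    return None


work = WorkStuff()
print(resolve(work, "reports"))   # None

# Extend the URL scheme at run time: attach a new responder to this instance
# and export its name. A fresh set is assigned, so the class-level set is untouched.
def reports():
    return "reports page"

work.reports = reports
work._exports = work._exports | {"reports"}
print(resolve(work, "reports"))   # reports page
```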

Backtracking

We have to be able to distinguish between URLs that contain variable components. We might have, for example, /work/{product}/{part} that needs to be distinguished from /work/{job}. For this we need to be able to backtrack.

Quixote’s Directory object already has a mechanism we can use: the Directory returns None if it cannot find a match. This None value is propagated back through the tree in just the same way as a valid HTML response, so we can test it at key points.

The general case looks something like this:

class SomeDirectory(Directory):
    ...

    def _q_lookup(self, component):
        """ This component can be one of a number of possibilities
            represented by a list of possible sub-Directories
        """
        for variable_name, responder in self.list_of_options:
            # tentatively record the variable under this option's name
            set_environment(variable_name, component)
            response = responder()
            if response is not None:
                return response
        return None

We need something to manage the presence of the variable in the request environment, but I’m sure you get the picture.
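A standalone sketch of the backtracking idea, assuming each option is a function that receives the remaining path components and returns a response or None. Two simplifications are worth flagging: the options see the whole remainder of the path rather than one component at a time, and the rule that a job identifier is numeric is an assumption made purely so that the two patterns can be told apart. Variables are written to a scratch dictionary and copied into the request environment only when an option succeeds, which is one way of handling the environment problem mentioned above.

```python
def job_option(components, env):
    # /work/{job}: one component; treating job names as numeric is an assumption.
    if len(components) == 1 and components[0].isdigit():
        env["job"] = components[0]
        return f"job {env['job']}"
    return None


def product_part_option(components, env):
    # /work/{product}/{part}: exactly two components.
    if len(components) == 2:
        env["product"], env["part"] = components
        return f"product {env['product']} part {env['part']}"
    return None


OPTIONS = [job_option, product_part_option]


def lookup(components, env, options=OPTIONS):
    """Try each option in turn; backtrack (try the next) whenever one returns None."""
    for option in options:
        scratch = {}                      # variables for this attempt only
        response = option(components, scratch)
        if response is not None:
            env.update(scratch)           # keep variables from the branch that matched
            return response
    return None


env = {}
print(lookup(["23"], env), env)           # job 23 {'job': '23'}
print(lookup(["widget", "7"], {}))        # product widget part 7
print(lookup(["widget"], {}))             # None
```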

Roundup

There are plenty of other things that can be done with a Directory, but I have tried your patience enough. I hope I have shown that the Directory object can be used to separate URL interpretation from the application code. Of course, it is still possible to write a URL schema that represents application structure. In my examples I use /work and /play. This appears to, and may actually, represent some split in the underlying set of applications, but that is a matter of design choice and not a constraint of the mechanism.

I think there are a number of challenges in using Directory objects.

  • The URL dispatch structure looks like (indeed, is) code. It is easy to mistake this for application code. This is something I did myself initially.
  • The process of writing the code to handle any particular dispatcher is clearly more challenging than writing map_url('/work/{job}', my_code_somewhere.job_display).
  • The result is not self documenting and can be difficult to interpret.

In fact, the Directory is really an implementation tool. The programmer is being asked to interpret, or compile, some imagined scheme into code. The set of objects we see is the result of this compilation process. The code, or something like it, would still exist even if we provided a mini language and a compiler to handle it. (How about /work/{job} = job_display if method == GET?) Does this matter? Quite possibly not. After all, URL schemes do not change much, so maybe all we need is a set of patterns to follow and some useful helpers and we are all set to go.
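As a sketch of what such a compiler might produce, the following parses rules written in the syntax suggested above into a tree with one level per path segment, much as a set of Directory objects has one level per component. The rule grammar, the Node class and the compile_rules and resolve functions are all hypothetical. The tree prefers literal segments to variables and does not backtrack, and a rule without an if clause accepts any method.

```python
import re

# Hypothetical rule syntax, following the example in the text:
#   /work/{job} = job_display if method == GET
RULE = re.compile(
    r"^\s*(?P<template>\S+)\s*=\s*(?P<view>\w+)"
    r"(?:\s+if\s+method\s*==\s*(?P<method>[A-Z]+))?\s*$"
)


class Node:
    def __init__(self):
        self.children = {}   # literal segment -> Node
        self.var = None      # (variable name, Node) for a {name} segment
        self.views = {}      # HTTP method, or "ANY" -> view name


def compile_rules(text):
    root = Node()
    for line in text.strip().splitlines():
        m = RULE.match(line)
        if m is None:
            raise ValueError(f"cannot parse rule: {line!r}")
        node = root
        for seg in m["template"].strip("/").split("/"):
            if seg.startswith("{") and seg.endswith("}"):
                if node.var is None:          # first rule to use a variable here fixes its name
                    node.var = (seg[1:-1], Node())
                node = node.var[1]
            else:
                node = node.children.setdefault(seg, Node())
        node.views[m["method"] or "ANY"] = m["view"]
    return root


def resolve(root, method, path):
    node, params = root, {}
    for seg in path.strip("/").split("/"):
        if seg in node.children:              # literal segments win
            node = node.children[seg]
        elif node.var is not None:
            name, node = node.var
            params[name] = seg
        else:
            return None
    view = node.views.get(method) or node.views.get("ANY")
    return (view, params) if view else None


rules = """
/work/joblist = joblist_page if method == GET
/work/{job} = job_display if method == GET
/work/{job}/edit = job_edit
"""
root = compile_rules(rules)
print(resolve(root, "GET", "/work/23"))        # ('job_display', {'job': '23'})
print(resolve(root, "POST", "/work/23/edit"))  # ('job_edit', {'job': '23'})
print(resolve(root, "POST", "/work/23"))       # None
```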

Of course, it still doesn’t satisfy my requirement for URL generation. That will have to be dealt with another time.


URL Dispatch in Python Frameworks

May 24th, 2010 — 9:48am

I have been using Quixote for some time now as my web framework. So far, I have had very little incentive to change to another framework. The code is lightweight and reliable, to the extent that I generally just forget about it. Recently, though, I have hit some issues. It’s all to do with URL dispatch and, specifically, dynamic URLs that contain data. I want to be able to write URLs like /MyPage/edit, where ‘MyPage’ is the name of a documentation page, for example, or /reports/2009/04 which might bring up a list of reports for April 2009. This looks nicer than /reports?year=2009&month=4, which is what I used to do, and makes it easier for users to bookmark pages.

Quixote does URL dispatch using python objects based on a Directory class. The URL is processed left to right, and each element of the URL identifies the next Directory object. It is all quite flexible, and, of course, the structure is built in software and bears no particular relationship to folders on disc or how a project is developed. It is remarkably easy to link in work from different projects, and this is a key advantage. Normally a Directory object recognises the text in a URL by direct match with an attribute of the object, but there is a catch-all that gives the option of processing elements that it does not otherwise recognise, and I have used this to support dynamic URLs. This works well enough, but there is no way to support the generation of the URLs in the first place. When I am generating a page with links on it I want to be able to write target_url = make_url(edit_template, target_page=MyPage), or something similar, and end up with a URL that the dispatch mechanism will recognise. Obviously, I can do this by hand, but the relationship between the template and the dispatch tree exists only in my head. So I end up with buggy links, and problems if I want to make changes.

All this prompted me to look into URL dispatch mechanisms, to see what I think I need, and to find out if there is anything out there that already does the job. So, in no particular order, this is my shopping list:

  • URL can contain data and process-related fields
    A URL identifies both an object (the data or subject matter) and the action to perform (display, edit). In model/view/controller terms, the URL provides all of the information needed to identify the data and the required code components.

  • Flexibility of URL design
    There should not be any inherent restriction on the order of items in the URL, or on how code- or data-related items might be placed in the pattern.

  • Ability to distinguish similar URL schemes
    The URLs

      • /ham/{some date}/spam
      • /ham/{some name}/spam
      • /ham/{some year}/{some month}/spam

    are all similar but may be significantly different in processing. I tried some thought experiments with Quixote Directory interpretation of these forms and came to the conclusion that it might be possible to handle things like this, but there are probably easier ways.

  • Linking to code points does not impose restrictions on code structure
    I want to build a tool that is flexible and does not enforce any particular approach. One of the claims for Quixote is that the developer simply uses their own knowledge of Python. In this vein, I don’t want to force code to be stored in a ‘controller’ directory, or to insist that a particular model-view-controller structure is used.

    Exactly how code points are identified is almost a subject in itself. For now, I just want to know that the URL dispatch process is not going to be a restriction.

  • Configuration of applications from different projects
    This is probably another view on the previous requirement. Obviously, all applications used in whatever environment I end up with will have to be aware of some features of that environment (such as the URL generator), but I have applications that have been developed over time, and I would like the ability to stitch in new applications in the future. So a reasonable configuration process is needed; one that does not require a whole bunch of rewriting to get things working.

  • Partial path handling
    By this I mean the ability to identify an action based on the first few fields of a URL and then pass the remaining fields to another dispatcher. Actually, this is effectively what the Quixote Directory object does, working one field at a time. I guess I’m looking for something less fine-grained.

  • Fields can be referenced by name
    This is for convenience and for reducing the possibility of error. It is likely to be easy for a dispatcher to report fields as a list. It is much more useful for me if I can use names.

  • URL generation
    And now, the main reason for looking at this in the first place: generation of a URL, given a template or a reference to a template. The implied requirements here are:

      • the template should be directly related to the dispatch process, without need for thought or invention on the part of the developer.
      • the names used for the fields to be substituted should be the same as the names used to extract the data when interpreting the URL.

Most of that is fairly obvious, and there are people out there who picked up on this years ago. The next part of this saga is to review what is out there and pick an approach.

