Abstract URL System

a layer underneath the ArsDigita Community System by Philip Greenspun and Jon Salz

The Problem

The main engineering ideas behind the ArsDigita Community System are (1) data models, (2) sequences of URLs that lead up to transactions, and (3) the specifications for those transactions.

We need to increase the amount of abstraction in specifying the URLs.

Right now (February 2000), we happen to use AOLserver and one of the following kinds of pages:

Think about it: when the SAP guys started up in 1972 they probably did a lot of things for which they are now sorry. In 30 years we will probably still have some vestiges of our data model and workflow. But the specific languages and systems being used today will likely change. In fact, we've already talked about building versions of the ACS that (a) run inside Oracle using their embedded Java Web server, (b) run with Microsoft Active Server Pages, (c) run inside Apache mod_perl. If a publisher swaps out AOLserver for one of these other systems or if we, in an ACS version upgrade, swap in .spec templating for .tcl, why should the user have to update his bookmarks?

The Solution

We register a procedure that will, given a URL with no extension, dig around in the file system to find the right actual files to deliver/execute. This is analogous to what AOLserver already does when it gets a directory name. There is also an Apache module that does some of this (see http://www.apache.org/docs/content-negotiation.html). Here's an example of the algorithm:
  1. is there a .spec file, indicating usage of the super-whizzy templating system? If so, evaluate it. If not, proceed to next step.
  2. is there a .tcl file, indicating old-style code or code that will look for a .adp template? If so, evaluate it. If not, proceed to next step.
  3. does the user's session indicate that he or she wants WML for a wireless device? If so, try to find a .wml file and serve it. If no session info or no .wml file, proceed to next step.
  4. look for a .html file
  5. look for a .txt file
  6. look for a .jpeg
  7. look for a .gif
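
The seven example steps above boil down to walking a fixed list of extensions and taking the first file that exists. Here is a minimal sketch of that idea in Python rather than in the Tcl an AOLserver filter would actually use. The names `EXAMPLE_ORDER`, `pick_file`, and `wants_wml` are hypothetical, and the sketch only locates the file; deciding whether to evaluate it (.spec, .tcl) or serve it as-is is left out.

```python
from pathlib import Path
from typing import Optional

# Hypothetical precedence list mirroring the seven example steps.
EXAMPLE_ORDER = [".spec", ".tcl", ".wml", ".html", ".txt", ".jpeg", ".gif"]


def pick_file(root: Path, url: str, wants_wml: bool = False) -> Optional[Path]:
    """Return the first existing file for an extensionless URL, or None."""
    base = root / url.lstrip("/")
    for ext in EXAMPLE_ORDER:
        # Step 3: .wml is only a candidate when the session asks for WML.
        if ext == ".wml" and not wants_wml:
            continue
        candidate = Path(str(base) + ext)
        if candidate.is_file():
            return candidate
    return None
```

With `foo.html` and `foo.wml` both on disk, `pick_file(root, "/foo")` would return the .html file, while passing `wants_wml=True` would return the .wml file instead.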
Right now we implement a subset of this. The current algorithm (sure to be enhanced in the near future as we add support for scoping and rethink templates) is as follows:
  1. If the URL specifies a directory but doesn't have a trailing slash, append a slash to the URL and redirect (just like AOLserver would).
  2. If the URL specifies a directory and does have a trailing slash, append "index" to the URL (so we'll search for an index.* file in the filesystem).
  3. If the file corresponding to the requested URL exists (probably because the user provided the extension), just deliver the file.
  4. Find a file in the file system with the provided URL as the root (i.e., some file exists which is the URL plus some extension). Give precedence to extensions specified in the ExtensionPrecedence parameter in the abstract-url configuration section (in the order provided there). If such a file exists, deliver it.
  5. The requested resource doesn't exist - return a 404 Not Found.
We are likely to add some steps at the very beginning of this to perform scoping, e.g., check if the URL begins with a group name (and optional group type), and if so set scope variables in the environment and munge the URL accordingly.
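
The five steps of the current algorithm can be sketched as a single function. This is an illustration in Python under stated assumptions, not the actual Tcl implementation: `resolve` is a hypothetical name, the `precedence` argument stands in for the ExtensionPrecedence parameter (assumed here to be a list of extensions without the leading dot), and ties among extensions not listed there are broken alphabetically, which the text above does not specify.

```python
from pathlib import Path
from typing import List, Tuple


def resolve(root: Path, url: str, precedence: List[str]) -> Tuple[str, str]:
    """Return ("redirect", url), ("serve", path), or ("404", "")."""
    target = root / url.lstrip("/")

    if target.is_dir():
        if not url.endswith("/"):
            return ("redirect", url + "/")      # step 1: add trailing slash
        target = target / "index"               # step 2: look for index.*

    if target.is_file():
        return ("serve", str(target))           # step 3: exact match

    if not target.parent.is_dir():
        return ("404", "")

    # Step 4: collect files named "<target>.<extension>".
    prefix = target.name + "."
    matches = sorted(
        p for p in target.parent.iterdir()
        if p.is_file() and p.name.startswith(prefix)
    )

    def rank(path: Path) -> int:
        # Listed extensions win in the order given; others rank last.
        ext = path.name[len(prefix):]
        return precedence.index(ext) if ext in precedence else len(precedence)

    if matches:
        return ("serve", str(min(matches, key=rank)))

    return ("404", "")                          # step 5: nothing found
```

For instance, with `a.html` and `a.txt` on disk, `resolve(root, "/a", ["html", "txt"])` would choose `a.html`, and reversing the precedence list would choose `a.txt`.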

Note that we perform a lookup even if a URL with an extension is provided. This is so we can eventually perform content negotiation even within the content-type domain, e.g., serve up a document in French (foobar.html.fr) or the King's English (foobar.html.en.uk) as opposed to the default Yankeespeak (foobar.html or foobar.html.en.us) depending on the browser's Accept-Language setting.
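
Since this language negotiation is not implemented yet, the following is only a sketch of how it might work, written in Python. The function names are hypothetical, and so is the mapping from a language tag to a file suffix (lowercase the tag and turn "-" into ".", so that "en-us" looks for foobar.html.en.us). A real implementation would probably need an explicit mapping table, since browsers send British English as en-GB rather than anything that maps to ".en.uk".

```python
from pathlib import Path
from typing import List


def preferred_languages(accept_language: str) -> List[str]:
    """Parse an Accept-Language header into tags, most preferred first."""
    weighted = []
    for position, item in enumerate(accept_language.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag or tag == "*":
            continue
        q = 1.0  # a tag without an explicit q value has full weight
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            # Sort by descending q, then by position in the header.
            weighted.append((-q, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def pick_variant(path: Path, accept_language: str) -> Path:
    """Return a language-specific sibling of path if one exists, else path."""
    for tag in preferred_languages(accept_language):
        variant = Path(str(path) + "." + tag.replace("-", "."))
        if variant.is_file():
            return variant
    return path
```

If only foobar.html and foobar.html.fr exist, a browser sending `en-us, fr;q=0.3` would still get the French file, because no foobar.html.en.us is present; a browser sending `de` would fall back to foobar.html.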

Open questions:

Minor Benefits:
philg@mit.edu
jsalz@mit.edu