Content Repository Requirements

Karl Goldstein (karlg@arsdigita.com)

VI.A Requirements: Data Model

5.0 MIME Types

The content repository must be able to store objects in any format, both text and binary. MIME types provide a standard set of codes for identifying the file format of each content item. For the purpose of data integrity, the repository must have a canonical list of MIME types that may be assigned to content items.

10.0 Content Types

A content type is characterized by a set of attributes that may be associated with a text or binary content object. Attributes are stored separately from their associated content object, and as such may be indexed, searched, sorted and retrieved independently. For example, attributes of a press release may include a title, byline, and publication date.

The data model must support storage of descriptive information for each content type:

10.10 Content types must be associated with unique keyword identifiers, such as press_release, so they can be referenced in data tables, queries and procedural code.

10.20 Content types must be associated with singular and plural descriptive labels, such as Press Release and Press Releases, to simplify user recognition.

10.20 Content types may specify any number of attributes. Attribute values are simple strings or numbers.

10.30 Content types may inherit attributes from any other content type. For example, a regional press release may be a subtype of the press release type. Regional press releases have a region attribute in addition to the characteristics of a regular press release.

10.40 Part of the definition of a content type may include a description of the parent-child relationships allowed for items of this type. For example, a Press Release may contain one or more items of type Image, but it should not contain any items of type Internal Financial Status Report.

10.60 A content type definition may include a list of allowed file MIME types for items of this type.

10.70 A content type definition may include a list of tokens to identify or flag relationships with other items. For example, the content type definition for a chapter of a reference manual may include the tokens next, previous and see_also. Each type of relationship may include a minimum and/or maximum number of relationships of this type that are required for an item to be published.
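The type requirements above (10.10 through 10.30) can be illustrated with a minimal sketch. The class name `ContentType`, its fields and the method `all_attributes` are assumptions made for illustration only; the requirements do not prescribe any particular implementation, and a real repository would keep these definitions in database tables.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContentType:
    """Hypothetical in-memory stand-in for a content type definition."""
    key: str                                  # unique keyword identifier (10.10)
    label: str                                # singular label (10.20)
    plural: str                               # plural label (10.20)
    parent: Optional["ContentType"] = None    # single supertype, if any (10.30)
    attributes: list = field(default_factory=list)

    def all_attributes(self) -> list:
        """Return inherited attributes first, then this type's own."""
        inherited = self.parent.all_attributes() if self.parent else []
        return inherited + self.attributes


press_release = ContentType(
    key="press_release", label="Press Release", plural="Press Releases",
    attributes=["title", "byline", "publication_date"],
)
regional = ContentType(
    key="regional_press_release", label="Regional Press Release",
    plural="Regional Press Releases", parent=press_release,
    attributes=["region"],
)
```

In this sketch a regional press release reports the three press release attributes followed by its own region attribute.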

20.0 Content Items

Items are the fundamental building blocks of the content repository. Each item represents a distinct text or binary content object that is publishable to the web, such as an article, report, message or photograph. An item may also include any number of attributes with more structured data, such as title, source, byline and publication date.

Content items have the following persistent characteristics which the data model must support:

20.10 Content items must have a simple unique identifier so they can be related to other objects in the system.

20.20 Each content item consists of a set of attributes and a single text or binary object.

20.25 All content items are associated with a few basic attributes to facilitate searching and development of browser interfaces to the content repository:

20.30 Each content item must be an instance of a particular content type. The content type defines the attributes associated with the content item, in addition to the basic ones described above.

20.40 A content item must have a unique, persistent URL (Uniform Resource Locator) by which it is publicly accessible, such as /press-releases/products/widget. To facilitate moving of items within the repository, the item itself should only be associated with the "tail" of the URL, such as widget. The absolute URL of the item is determined by its location within the repository (see Organization of the Repository).

20.50 It must be possible to specify the language of each item.

20.60 It must be possible to maintain a revision history for both the attributes and the text or binary object associated with a content item.

20.70. There must be a flexible mechanism for implementing access control on individual items, based on granting of permissions to groups or individual users.

20.80. A content item may be associated with any number of workflows.

20.90. Content items may themselves be "containers" or "parents" for other content items. For example, an Article may contain multiple Sections.

20.95 Each item may be associated with any number of related objects. The type and number of relationships must be constrained by the content type of the item (See 10.70 above).
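One way to read requirements 10.70 and 20.95 together is as a check that runs before an item is published. The sketch below is only an illustration under stated assumptions: the rule table layout, the function name `relation_problems` and the specific minimum and maximum values for the chapter tokens are hypothetical and do not come from the requirements.

```python
from collections import Counter
from typing import Optional

# Hypothetical rule table: token -> (minimum, maximum); None means no upper bound.
RelationRules = dict[str, tuple[int, Optional[int]]]


def relation_problems(rules: RelationRules, relations: list[str]) -> list[str]:
    """List the reasons an item's relationships would block publication."""
    counts = Counter(relations)
    problems = []
    # Tokens that the content type never registered are rejected outright.
    for token in sorted(set(counts) - set(rules)):
        problems.append(f"{token}: not allowed for this content type")
    # Registered tokens are checked against their minimum and maximum counts.
    for token, (minimum, maximum) in rules.items():
        n = counts[token]
        if n < minimum:
            problems.append(f"{token}: needs at least {minimum}, has {n}")
        elif maximum is not None and n > maximum:
            problems.append(f"{token}: allows at most {maximum}, has {n}")
    return problems


# Illustrative limits for a reference manual chapter (values are assumed).
chapter_rules: RelationRules = {
    "next": (0, 1),
    "previous": (0, 1),
    "see_also": (1, None),
}
```

An empty result means the item satisfies every relationship constraint of its content type.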

30.0 Content Revision

As mentioned above, each content item may be associated with any number of revisions. The data model for revisions must support the following:

30.10. A revision consists of the complete state of the item as it existed at a certain point in time. This includes the main text or binary object associated with the item, as well as all attributes.

30.20. The data model must be extensible so that revisions for all content types (with any number of attributes) may be stored and retrieved efficiently.

40.0 Organization of the Repository

40.10. The data model must support the hierarchical organization of content items in a manner similar to a file system.

40.20. The URL of a content item should reflect its location in the hierarchy. For example, a press release with the URL /press-releases/products/new-widget is located at the third level down from the root of the hierarchy.

40.20.5 Content Folder.

A folder is analogous to a folder or directory in a file system. It represents a level in the content item hierarchy. In the previous example, press-releases is a folder under the repository root, and products is a folder within that. The description of a folder may include the following information:

40.20.5.10. A URL-encoded name for building the path to folders and individual items within the folder.

40.20.5.20. A pointer to a content item that may serve as the "index" for the folder (i.e. the item that is served when the URL of the folder itself is accessed).

40.20.5.30. A set of constraints on the number and type of content items that may be stored in the folder.
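The relationship between the stored URL "tail" (20.40) and the absolute URL (40.20) can be sketched as follows. The `Node` class and its `url` method are hypothetical names chosen for this illustration; the point is only that the full path is derived from the hierarchy rather than stored on the item.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical folder or item; only the URL tail is stored on the node."""
    name: str
    parent: Optional["Node"] = None

    def url(self) -> str:
        """Build the absolute URL by walking up to the repository root."""
        parts = []
        node: Optional[Node] = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/" + "/".join(reversed(parts))


press = Node("press-releases")
products = Node("products", parent=press)
widget = Node("new-widget", parent=products)
```

Because each node stores only its own tail, moving an item amounts to changing its parent; the absolute URL follows automatically.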

40.30. It must be possible to create symbolic links or shortcuts to content items, so they may be presented at more than one URL or branch of the hierarchy.

40.30.5 Content Symbolic Link.

A symbolic link is analogous to a symlink, alias or shortcut in a file system. The description of a symbolic link must include the following information:

40.30.5.10. A URL-encoded name for the symbolic link. As for folders and items, this only represents the "tail" of the URL, with the full URL being determined by the folder in which the link is located.

40.30.5.20. A pointer to a target item which the symbolic link references.

40.30.5.30. A title or label, which may be different from the title or label of the target item.
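A rough sketch of how a URL might be resolved when symbolic links are present is shown below. The classes `Item`, `Folder` and `SymLink` and the function `resolve` are assumptions for illustration; the sketch also omits cycle detection, which a real implementation would likely need.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    title: str


@dataclass
class Folder:
    # Maps a URL tail to an Item, Folder or SymLink.
    children: dict = field(default_factory=dict)


@dataclass
class SymLink:
    target: object   # the Item or Folder the link points to
    label: str       # may differ from the target's own title


def resolve(root: Folder, url: str):
    """Walk the URL one tail at a time, following symbolic links."""
    node = root
    for tail in url.strip("/").split("/"):
        if not isinstance(node, Folder):
            raise KeyError(url)       # the path continues below a non-folder
        node = node.children[tail]    # raises KeyError if the tail is unknown
        while isinstance(node, SymLink):
            node = node.target        # no cycle detection in this sketch
    return node


widget = Item("Widget")
products = Folder({"widget": widget})
root = Folder({
    "press-releases": Folder({"products": products}),
    "featured": SymLink(widget, "Featured product"),
    "shortcut": SymLink(products, "Products"),
})
```

Here the same item is reachable at its regular path, through a link to the item itself, and through a link to its containing folder.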

50.0 Content Template.

The content repository should provide a means of storing and managing the templates that are merged with content items to render output in HTML or other formats. Templates are assumed to be text files containing static markup with embedded tags or code to incorporate dynamic content in appropriate places. The data model requirements for templates are a subset of those for content items.

Because templates typically need to reference specific attributes, a template is usually specific to a particular content type and its subtypes.

VI.B Requirements: Stored Procedure API

100.10 MIME Types

Since a MIME type is a required attribute of each content item, the repository must be capable of managing a list of recognized MIME types for ensuring appropriate delivery and storage of content.

100.10.10. Register a MIME type

100.10.20. Set the description of a MIME type

100.10.30. Get the description of a MIME type

100.10.40. Determine whether a MIME type is text or binary

100.10.50. Get a list of registered MIME types

100.10.60. Unregister a MIME type

It is important to note that the role of MIME types in the content repository is simply to describe the general file format of each content item. Neither the data model nor the API supports the full range of parameters allowed for general MIME types such as text/plain.
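The six MIME type operations listed above could be sketched as a small registry. Everything here is hypothetical: the class name `MimeTypeRegistry`, the method names, the choice to drop MIME parameters and the default rule that `text/*` types count as text are assumptions for illustration, not part of the requirements, which call for a stored procedure API.

```python
from typing import Optional


class MimeTypeRegistry:
    """Hypothetical in-memory registry; a real repository would use a table."""

    def __init__(self) -> None:
        self._types: dict[str, dict] = {}

    @staticmethod
    def _key(mime_type: str) -> str:
        # Assumption: parameters such as "; charset=utf-8" are simply dropped.
        return mime_type.split(";")[0].strip().lower()

    def register(self, mime_type: str, description: str = "",
                 is_text: Optional[bool] = None) -> None:
        key = self._key(mime_type)
        if is_text is None:
            is_text = key.startswith("text/")  # assumed default rule
        self._types[key] = {"description": description, "is_text": is_text}

    def set_description(self, mime_type: str, description: str) -> None:
        self._types[self._key(mime_type)]["description"] = description

    def get_description(self, mime_type: str) -> str:
        return self._types[self._key(mime_type)]["description"]

    def is_text(self, mime_type: str) -> bool:
        return self._types[self._key(mime_type)]["is_text"]

    def list_types(self) -> list[str]:
        return sorted(self._types)

    def unregister(self, mime_type: str) -> None:
        del self._types[self._key(mime_type)]
```

A caller would register each recognized type once and then consult `is_text` when deciding how to store or deliver a content item.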

100.20 Locales

The repository must have access to a list of recognized locales for the purpose of publishing content items in multiple languages and character sets.

All content in the repository is stored in UTF-8 to facilitate searching and uniform handling of content. Locales may be specified as user preferences to configure the user interface in the following ways:

Functional requirements for locales include:

100.20.10. Register a locale, including language, territory and character set.

100.20.20. Get the language of a specified locale.

100.20.10. Get the character set code of a specified locale using either Oracle or IETF/ISO/ANSI codes.

100.20.30. Get the number, date and currency format of a specified locale.

100.20.40. Convert a text content item to a specified locale (character set).

100.20.50. Get a list of registered locales.

100.20.60. Unregister a locale.
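The locale requirements can be illustrated with a minimal sketch covering registration, character set lookup under either naming scheme, and conversion of UTF-8 content to a locale's character set. The names `Locale`, `register_locale`, `charset_of` and `convert` are hypothetical, and number, date and currency formats are left out of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locale:
    language: str         # e.g. "fr"
    territory: str        # e.g. "FR"
    iana_charset: str     # IETF/IANA name, e.g. "ISO-8859-1"
    oracle_charset: str   # Oracle name, e.g. "WE8ISO8859P1"


LOCALES: dict[str, Locale] = {}


def register_locale(code: str, locale: Locale) -> None:
    LOCALES[code] = locale


def unregister_locale(code: str) -> None:
    del LOCALES[code]


def charset_of(code: str, oracle: bool = False) -> str:
    """Return the character set name in the requested naming scheme."""
    locale = LOCALES[code]
    return locale.oracle_charset if oracle else locale.iana_charset


def convert(stored_utf8: bytes, code: str) -> bytes:
    """Re-encode UTF-8 content from the repository for the given locale."""
    text = stored_utf8.decode("utf-8")
    return text.encode(charset_of(code))


register_locale("fr_FR", Locale("fr", "FR", "ISO-8859-1", "WE8ISO8859P1"))
```

Characters that cannot be represented in the target character set raise an error in this sketch; a production version would need an explicit policy for that case.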

100.30 Content Types

100.30.10. Create a content type, optionally specifying that it inherits the attributes of another content type. Multiple inheritance is not supported.

100.30.20. Get and set the singular and plural proper names for a content type.

100.30.30. Create an attribute for a content type.

100.30.40. Register a content type as a container for another content type, optionally specifying a minimum and maximum count of live items.

100.30.50. Register a content type as a container for another content type, optionally specifying a minimum and maximum count of live items.

100.30.60. Register a set of tags or tokens for labeling child items of an item of a particular content type.

100.30.70. Register a template for use with a content type, optionally specifying a use context ("intranet", "extranet") in which the template is appropriate.

100.30.80. Register a particular type of workflow to associate with items of this content type by default.

100.30.90. Register a MIME type as valid for a content type. For example, the Image content type may only allow GIF and JPEG file formats.

100.30.95 Register a relationship with another type of object, specifying a token or name for the relationship type as well as a minimum and/or maximum number of relationships of this type that are required for the item to be published.
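Registering one content type as a container for another, with optional count limits, might look like the sketch below. The function names `register_child_type` and `may_add_child` and the rule table are hypothetical. As a simplification, the sketch counts all existing children rather than only live ones, and it checks only the maximum when adding.

```python
from collections import Counter
from typing import Optional

# (container type, child type) -> (minimum, maximum); None means no upper bound.
child_rules: dict[tuple[str, str], tuple[int, Optional[int]]] = {}


def register_child_type(container: str, child: str,
                        minimum: int = 0, maximum: Optional[int] = None) -> None:
    """Record that items of type `container` may hold items of type `child`."""
    child_rules[(container, child)] = (minimum, maximum)


def may_add_child(container: str, existing_children: list[str], child: str) -> bool:
    """Check whether one more child of the given type would be allowed."""
    rule = child_rules.get((container, child))
    if rule is None:
        return False  # this child type was never registered for the container
    _, maximum = rule
    return maximum is None or Counter(existing_children)[child] < maximum


# Illustrative rule: a press release may contain at most two images.
register_child_type("press_release", "image", minimum=0, maximum=2)
```

The minimum count would come into play at publication time rather than when a child is added.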

100.40 Content Items

100.40.10. Create a new item, specifying a parent context or the root of the repository by default.

100.40.15. Rename an item.

100.40.17. Copy an item to another location in the repository.

100.40.20. Move an item to another location in the repository.

100.40.30. Get the full path (ancestry) of an item up to the root.

100.40.35. Get the parent of an item.

100.40.40. Determine whether an item may have a child of a particular content type, based on the existing children of the item and the constraints on the content type.

100.40.45. Label a child item with a tag or token, based on the set of tags registered for the content type of the container item.

100.40.50. Get the children of an item.

100.40.55. Get the children of an item by type or tag.

100.40.60. Establish a generic relationship between any object and a content item, optionally specifying a relationship type.

100.40.70. Create a revision.

100.40.80. Mark a particular revision of an item as "live".

100.40.83. Specify a start and end time when an item should be available.

100.40.85. Clear the live revision attribute of an item, effectively removing it from public view.

100.40.90. Get a list of revisions for an item, including modifying user, date modified and comments.

100.40.95. Revert to an older revision (create a new revision based on an older revision).
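The revision operations (100.40.70 through 100.40.95) can be sketched with a simple in-memory model. The class and method names are assumptions for illustration, and modification dates are omitted to keep the sketch short. Note that reverting appends a new revision copied from the older one instead of discarding history.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Revision:
    content: bytes       # the text or binary object
    attributes: dict     # complete attribute state at this point in time
    author: str
    comment: str = ""


@dataclass
class Item:
    revisions: list = field(default_factory=list)
    live: Optional[int] = None   # index of the live revision, if any

    def add_revision(self, content: bytes, attributes: dict,
                     author: str, comment: str = "") -> int:
        # Copy the attributes so later edits do not alter stored history.
        self.revisions.append(Revision(content, dict(attributes), author, comment))
        return len(self.revisions) - 1

    def set_live(self, index: int) -> None:
        if not 0 <= index < len(self.revisions):
            raise IndexError(index)
        self.live = index

    def clear_live(self) -> None:
        """Remove the item from public view without deleting any revision."""
        self.live = None

    def revert_to(self, index: int, author: str) -> int:
        """Create a new revision based on an older one."""
        old = self.revisions[index]
        return self.add_revision(old.content, old.attributes, author,
                                 f"Reverted to revision {index}")
```

In this model the live revision is unaffected by a revert until the new revision is explicitly marked live.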

100.50 Content Folders

The repository should allow for hierarchical arrangement of content items in a manner similar to a file system. The API to meet this general requirement focuses primarily on content folders:

100.50.10. Create a folder for logical groups of content items and other folders. The folder name becomes part of the distinguished URL of any items it contains. Folders may be created at the "root" or may be nested within other folders.

100.50.20. Set a label and description for a folder.

100.50.30. Get the label and description for a folder.

100.50.40. Get a list of folders contained within a folder.

100.50.50. Move a folder to another folder.

100.50.60. Copy a folder to another folder.

100.50.70. Create a symbolic link to a folder from within another folder. The contents of the folder should be accessible via the symbolic link as well as the regular path.

100.50.80. Tag all live item revisions within a folder with a common version descriptor (e.g. 'Version 1.0' or 'August 1 release'), for the purpose of versioning an entire branch of the site. Folder objects themselves are not eligible for versioning, since they are solely containers and do not have any content other than the items they contain.

100.50.90. Delete a folder if it is empty.

Note that folders are simply a special type of content item, and as such may receive the same object services as items (namely access control and workflow). In addition to the file-system analogy afforded by folders, any type of content item may serve as a container for other content items (see below).

Workflow

The repository must offer integration with a workflow package for managing the content production process.

100.60 Categorization

The repository must support a common hierarchical taxonomy of subject classifications that may be applied to content items.

100.60.10. Create a new subject category.

100.60.20. Create a new subject category as the child of another subject category.

100.60.30. Assign a subject category to a content item.

100.60.40. Remove a subject category from an item.

100.60.50. Get the subject categories assigned to a content item.
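A minimal sketch of the categorization operations follows. The function names and the two dictionaries are hypothetical, as is the optional `include_ancestors` flag, which is an added convenience showing how the hierarchy could be used; the requirements themselves only ask for the categories assigned to an item.

```python
from typing import Optional

parents: dict[str, Optional[str]] = {}    # category -> parent category (None at top)
assignments: dict[str, set[str]] = {}     # item id -> assigned categories


def create_category(name: str, parent: Optional[str] = None) -> None:
    if parent is not None and parent not in parents:
        raise KeyError(parent)            # the parent must already exist
    parents[name] = parent


def assign(item_id: str, category: str) -> None:
    if category not in parents:
        raise KeyError(category)
    assignments.setdefault(item_id, set()).add(category)


def unassign(item_id: str, category: str) -> None:
    assignments.get(item_id, set()).discard(category)


def categories_of(item_id: str, include_ancestors: bool = False) -> set[str]:
    found = set(assignments.get(item_id, set()))
    if include_ancestors:
        # Walk up the taxonomy from each directly assigned category.
        for category in list(found):
            parent = parents[category]
            while parent is not None:
                found.add(parent)
                parent = parents[parent]
    return found


create_category("science")
create_category("physics", parent="science")
```

Storing only the direct assignment and deriving ancestors on demand keeps the assignment data small if the taxonomy is later rearranged.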

Search

The repository must have a standard means of indexing and searching all content.

Access Control

The repository must have a means of restricting access on an item-by-item basis.

VI.C Requirements: Presentation Layer API

The presentation layer must have access to a subset of the stored procedure API in order to search and retrieve content directly from the repository if desired.

Revision History

Author          Date               Description
Karl Goldstein  9 August 2000      Initial draft.
Karl Goldstein  22 August 2000     Added to API section.
Karl Goldstein  19 September 2000  Added data model requirements, revised API requirements, numbered all items.
Karl Goldstein  21 September 2000  Add requirements for relationships among content items and other objects.

Last Modified: $Id: requirements.html,v 1.1 2002/07/09 17:34:56 rmello Exp $