The content repository provides a consistent sitewide interface for searching content. It uses Intermedia to index the content column of cr_revisions as well as all the attribute columns for each content type.
The content column in cr_revisions may contain data in any text or binary format. To accommodate searches across multiple file types, the content repository uses an Intermedia index with the INSO filtering option. The INSO filter automatically detects the file type of a binary object and extracts text from it for indexing. Most common file types are supported, including PDF and Microsoft Word, Excel, and PowerPoint.
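As a rough illustration of what such an index definition might look like, the sketch below builds an Intermedia index on the content column using Oracle's predefined INSO filter preference. The index name is hypothetical, and the repository's own installation scripts may use different preferences or additional parameters.

```sql
-- Sketch only: cr_rev_content_index is a hypothetical name, not taken from the repository.
-- ctxsys.inso_filter is Oracle's predefined preference for the INSO filter.
create index cr_rev_content_index on cr_revisions (content)
  indextype is ctxsys.context
  parameters ('filter ctxsys.inso_filter');
```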
Searching for content requires the same syntax as any text index:
select score(1), revision_id, item_id from cr_revisions r where contains(content, 'company', 1) > 0
The above query may be useful for an administrative interface where you wish to search across all revisions, but in most cases you only want to search live revisions:
select score(1), revision_id, item_id, content_item.get_path(item_id) url, title from cr_revisions where contains(content, 'company', 1) > 0 and revision_id = content_item.get_live_revision(item_id)
The URL and title may be used to construct a hyperlink directly to the item.
You may implement any number of variants on this basic query to place additional constraints on the results, such as publication date, content type, subject heading or a particular attribute (see below).
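One possible variant, shown only as a sketch, limits the live-revision search to a single content type and to recently published items. It assumes that cr_items carries a content_type column and that cr_revisions carries a publish_date column; the content type name 'press_release' is purely hypothetical.

```sql
-- Sketch only: search live revisions of one content type published in the last 30 days.
select score(1), r.revision_id, r.item_id,
       content_item.get_path(r.item_id) url, r.title
from cr_revisions r, cr_items i
where contains(r.content, 'company', 1) > 0
  and i.item_id = r.item_id
  and r.revision_id = content_item.get_live_revision(r.item_id)
  and i.content_type = 'press_release'   -- hypothetical content type; column name assumed
  and r.publish_date > sysdate - 30      -- publish_date column assumed
order by score(1) desc
```

Ordering by score(1) places the most relevant matches first, which is usually what a search results page needs.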
Some limitations of the current implementation include:
This task is primarily handled by two Intermedia indices:
Providing a generic mechanism for searching attributes is complicated by the fact that the attributes for each content type are different. The content repository takes advantage of the XML features in Oracle 8.1.6 to address this:
After creating a new revision and inserting attributes into the storage table for the content type and all its ancestors, you must execute the content_revision.index_attributes procedure. (Note that this cannot be called automatically by content_revision.new, since the attributes in all extended storage tables must be inserted first.)
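The required ordering can be sketched as a PL/SQL block. The assumption here is that index_attributes takes the ID of the new revision as its only argument; the source does not show the signature, so check the package specification before relying on it.

```sql
begin
  -- 1. content_revision.new has already created the revision.
  -- 2. Attribute rows have been inserted into the storage table of the
  --    content type and of each of its ancestors.
  -- 3. Only now generate the XML attribute document for the revision.
  content_revision.index_attributes(:revision_id);  -- argument list is an assumption
end;
/
```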
This procedure creates a row in the cr_revision_attributes table, and writes an XML document including all attributes into this row. A Java stored procedure using the Oracle XML Parser for Java v2 is used to actually generate the XML document.
A special Intermedia index configured to parse XML documents is built on the column containing the XML documents for all revisions.
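A minimal sketch of such an index follows, assuming the XML documents are stored in a column named attributes in cr_revision_attributes. The index name is hypothetical, and the predefined ctxsys.auto_section_group is used here only as one way to make each XML element searchable as a section; the repository may define its own section group instead.

```sql
-- Sketch only: cr_rev_attribute_index is a hypothetical name.
-- ctxsys.null_filter skips format filtering because the column already holds plain XML text.
-- ctxsys.auto_section_group turns every XML tag into a section usable with WITHIN.
create index cr_rev_attribute_index on cr_revision_attributes (attributes)
  indextype is ctxsys.context
  parameters ('filter ctxsys.null_filter section group ctxsys.auto_section_group');
```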
The Intermedia index allows you to use the WITHIN operator to search on individual attributes if desired.
select revision_id, score(1) from cr_revision_attributes where contains(attributes, 'company WITHIN title', 1) > 0
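An attribute search can be combined with the live-revision constraint used for content searches. The sketch below assumes that the XML documents live in the attributes column of cr_revision_attributes, keyed by revision_id, as the description of index_attributes suggests.

```sql
-- Sketch only: find live revisions whose title attribute mentions 'company'.
select score(1), r.revision_id, r.item_id, r.title
from cr_revision_attributes a, cr_revisions r
where contains(a.attributes, 'company WITHIN title', 1) > 0
  and r.revision_id = a.revision_id   -- join key assumed
  and r.revision_id = content_item.get_live_revision(r.item_id)
order by score(1) desc
```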
Some limitations of the current implementation include: