How to make an object type searchable?
by Neophytos Demetriou (k2pts@cytanet.com.cy)
Making an object type searchable involves three steps:
- Choose the object type
- Implement FtsContentProvider
- Add triggers
Choose the object type
In most cases, choosing the object type is straightforward.
However, if your object type uses the content repository (CR), you
should make sure that it is a subclass of the "content_revision"
class. You should also make sure that all content is created using
that subclass, rather than simply being created with the
"content_revision" type.
- Object types that don't use the CR can be specified using
  acs_object_type__create_type, but those that use the CR need to use
  content_type__create_type. content_type__create_type overloads
  acs_object_type__create_type and provides two views for inserting
  and viewing content data, and the CR depends on these views.
- Whenever you call content_item__new, call it with
  'content_revision' as the item_subtype and 'your_content_type' as
  the content_type.
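As a rough illustration of the first point, registering a CR-backed type might look like the sketch below. The type name my_note, the table my_notes and the column note_id are hypothetical, and the seven-argument form of content_type__create_type is an assumption that should be checked against the content repository installed on your system.

```sql
-- Sketch only: register a hypothetical CR content type.
-- Argument order is assumed; verify it against your installation.
select content_type__create_type(
    'my_note',            -- content_type (hypothetical)
    'content_revision',   -- supertype
    'My Note',            -- pretty_name
    'My Notes',           -- pretty_plural
    'my_notes',           -- table_name (hypothetical)
    'note_id',            -- id_column (hypothetical)
    null                  -- name_method
);
```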
Implement FtsContentProvider
FtsContentProvider consists of two abstract operations, namely
datasource and url. The specification for these operations can be
found in packages/search/sql/postgresql/search-sc-create.sql.
You have to implement these operations for your object type by
writing concrete functions that follow the specification. For
example, the implementation of datasource for the object type note
looks like this:
ad_proc notes__datasource {
    object_id
} {
    @author Neophytos Demetriou
} {
    db_0or1row notes_datasource {
        select n.note_id as object_id,
               n.title as title,
               n.body as content,
               'text/plain' as mime,
               '' as keywords,
               'text' as storage_type
        from notes n
        where note_id = :object_id
    } -column_array datasource
    return [array get datasource]
}
When you are done with the implementation of the FtsContentProvider
operations, you should let the system know of your implementation.
This is accomplished by an SQL file which associates the
implementation with a contract name. The implementation of
FtsContentProvider for the object type note looks like:
select acs_sc_impl__new(
'FtsContentProvider', -- impl_contract_name
'note', -- impl_name
'notes' -- impl_owner_name
);
You should adapt this association to reflect your implementation.
That is, change impl_name to your object type and impl_owner_name to
your package key. Next, you have to create associations between the
operations of FtsContentProvider and your concrete functions.
Here's what an association between an operation and a concrete
function looks like:
select acs_sc_impl_alias__new(
'FtsContentProvider', -- impl_contract_name
'note', -- impl_name
'datasource', -- impl_operation_name
'notes__datasource', -- impl_alias
'TCL' -- impl_pl
);
Again, you have to make some changes. Change the impl_name from note
to your object type and the impl_alias from notes__datasource to the
name that you gave to the function that implements the operation
datasource.
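The url operation needs the same kind of association. A minimal sketch, assuming the function that implements url for note is named notes__url (a hypothetical name chosen here to mirror notes__datasource):

```sql
-- Sketch only: associate the url operation with a concrete function.
-- 'notes__url' is an assumed name; use whatever your function is called.
select acs_sc_impl_alias__new(
    'FtsContentProvider',   -- impl_contract_name
    'note',                 -- impl_name
    'url',                  -- impl_operation_name
    'notes__url',           -- impl_alias (hypothetical)
    'TCL'                   -- impl_pl
);
```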
Add triggers
If your object type uses the content repository to store its items,
then you are done. If not, an extra step is required to inform the
search_observer_queue of new content items, updates or deletions.
We do this by adding triggers on the table that stores the content
items of your object type. Here's what that part looks like for
note:
-- Trigger functions must be declared "returns trigger"; the old
-- "opaque" pseudo-type is no longer accepted by current PostgreSQL.
create function notes__itrg ()
returns trigger as $$
begin
    perform search_observer__enqueue(new.note_id,'INSERT');
    return new;
end;
$$ language plpgsql;
create function notes__dtrg ()
returns trigger as $$
begin
    perform search_observer__enqueue(old.note_id,'DELETE');
    return old;
end;
$$ language plpgsql;
create function notes__utrg ()
returns trigger as $$
begin
    perform search_observer__enqueue(old.note_id,'UPDATE');
    return old;
end;
$$ language plpgsql;
create trigger notes__itrg after insert on notes
for each row execute procedure notes__itrg ();
create trigger notes__dtrg after delete on notes
for each row execute procedure notes__dtrg ();
create trigger notes__utrg after update on notes
for each row execute procedure notes__utrg ();
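The three trigger functions differ only in the event name and in whether they read the old or the new row. As an alternative sketch (not the form used above), a single function can branch on PL/pgSQL's TG_OP variable, which holds 'INSERT', 'UPDATE' or 'DELETE'. The name notes__search_trg is hypothetical, and this would replace the three triggers rather than be installed alongside them.

```sql
-- Sketch only: one trigger function covering all three events.
create function notes__search_trg ()
returns trigger as $$
begin
    if TG_OP = 'DELETE' then
        -- Only the old row exists when a row is deleted.
        perform search_observer__enqueue(old.note_id, 'DELETE');
        return old;
    end if;
    -- For INSERT and UPDATE, TG_OP already matches the event name.
    perform search_observer__enqueue(new.note_id, TG_OP);
    return new;
end;
$$ language plpgsql;

create trigger notes__search_trg after insert or update or delete on notes
for each row execute procedure notes__search_trg ();
```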
Questions & Answers
- Q: If content is some binary file (like a pdf file stored in
file storage, for example), will the content still be
indexable/searchable?
A: Each mime type requires a handler that converts it to text. Once
a handler is available, e.g. pdf2txt, it is very easy to incorporate
support for that mime type into the search package. Content items
with unsupported mime types will be ignored by the indexer.
- Q: Can the search package handle lobs and files?
A: Yes, the search package will convert everything into text based
on the content and storage_type attributes. Here is the convention
to use while writing the implementation of datasource:
- Content is a filename when storage_type='file'.
- Content is a lob id when storage_type='lob'.
- Content is text when storage_type='text'.
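To make the convention concrete, here is a sketch of the query a datasource implementation might run for file-backed content. The table my_files and its columns are hypothetical; :object_id is a bind variable, used the same way as in notes__datasource. Only the content and storage_type columns change relative to the note example.

```sql
-- Sketch only: datasource query for content stored as a file.
-- my_files, file_path and mime_type are assumed names.
select f.file_id   as object_id,
       f.title     as title,
       f.file_path as content,       -- a filename, since storage_type is 'file'
       f.mime_type as mime,          -- needs a handler for this mime type
       ''          as keywords,
       'file'      as storage_type
from my_files f
where f.file_id = :object_id
```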