Webmail Design Document
by Erik Bielefeldt (adapted from the original by Jin Choi)
Essentials
User-accessible directory: /webmail/
Site administrator directory: /webmail/admin/
data model: /doc/sql/webmail.sql
procedures: /tcl/webmail-procs.tcl
ASJ Article: /asj/webmail/
Introduction
Email handlers are among the first user-level programs written
for any new operating system, and are one of the few core tools
that almost anyone using a computer will use on a regular basis.
Most recently, we have seen a blossoming of Web-based email systems
such as Hotmail and Yahoo Mail. Why build yet another mail
system?
Some of the desirable traits of a mail system are:
Centralized storage. Users should see the same email history
every time they check email, no matter which computer or email
reader they happen to be using.
Reliability. Email is important. A disk failure or a negligent
sysadmin should not be a cause for losing it. The mail server
should always be running, and well-connected to the internet.
Availability. Email should be readable wherever you are.
Completeness and correctness. An email reader should be able to
receive, display, and send attachments. Any message it sends should
be standards-conforming. Because many other systems are not, it
should be able to handle common deviations from the standard.
The webmail application addresses the first three traits (the last
is a work in progress). These requirements argue for the primary
message store to remain on a well-administered server. These are
the same needs addressed by the designers of IMAP. IMAP solves all
these issues except one, availability: an IMAP client isn't
always installed on every computer with a net connection, whereas a
Web browser almost always is. But a Web browser is a
less-than-ideal interface for reading email when compared to an
all-singing, all-dancing mail client. Thus, the ideal mail solution
is an IMAP server with a web interface that accesses the same
message store as the IMAP client.
Historical Considerations
Mail systems with this architecture already exist. Oracle
provides software that does what the webmail application does, and
probably more. CriticalPath is a company that provides outsourced
email with IMAP and web front ends. These may be better than
webmail. CriticalPath certainly has the advantage that it requires
no effort on the part of the user other than sending them a check
every once in a while. However, Jin Choi reports that when he used
CriticalPath, it was unreachable or unusably slow about half the
time (usually due to network problems to their server). He also ran
out of patience attempting to install Oracle Email Server.
Competitive Analysis
The downside to these other systems is lack of control. It is
difficult to modify the look or extend the features of an email
server without access to the source code. In the case of
CriticalPath, you are stuck with what they provide, and cannot
integrate it to provide web-based email as a seamless service of
your web site. If you are using the ArsDigita Community System,
webmail provides the core of a web-based email system that relies
on proven, reliable systems to do all of the hard work and that is
simple to extend. If you are not using the ACS, then perhaps
studying an implementation of a working system will aid you in
building something suitable for your own needs.
Design Tradeoffs
In reworking Jin Choi's original implementation, we sought to
improve it in a couple of ways. First, Webmail was lacking some basic
functionality that many web-based email services provide, like a
decent folder system, signatures, a paged index view, forwarding
messages, and a customizable interface. The second consideration
was to improve performance, mainly through reworking the data model
and resource-consuming queries. In particular, the delivery of
messages and the main index view are areas which need to be as
efficient as possible. The former restricts the volume of incoming
mail that Webmail may handle, and the latter affects both server
load and usability. The index view is not only the most used page, but
also one of the most expensive in terms of database work.
Where the original data model was changed, the reason for the change
is noted below the relevant definition.
Data Model Discussion
The following section will step through the data model, discussing
important or interesting aspects.
-- Domains for which we receive email.
create table wm_domains (
-- short text key
short_name varchar(100) not null primary key,
-- fully qualified domain name
full_domain_name varchar(100) not null
);
The wm_domains table contains the domains for which we
expect to receive mail. The short_name field stores
the text key used to differentiate mail to different domains; it
forms the prefix of each qmail delivery address (see delivery_address
below). Qmail must be manually configured to handle email for these
domains.
-- Maps email accounts to ACS users.
create table wm_email_user_map (
user_id references users,
email_user_name varchar(100) not null,
delivery_address varchar(200) not null,
domain references wm_domains,
primary key (user_id)
);
wm_email_user_map assigns email addresses to ACS
users. Why not just use the email column in the
users table? This approach permits flexibility in
which address is published to other registered users and provides
an external contact if needed. As a row is inserted into this
table, the appropriate .qmail alias files are created for that
user.
delivery_address contains the full qmail delivery
address: (wm_domains.short_name || '-' || email_user_name ||
'@' || wm_domains.full_domain_name). It is stored for ease and speed of
lookup in the delivery process (otherwise we would have to re-create
it for each user on each delivery). The possibility
of mapping one user to two Webmail accounts has been removed (user_id
is now the primary key) because it was confusing, required extra code,
and hurt performance.
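As a rough illustration of the delivery_address format described above, the concatenation can be sketched as follows. The sketch is in Python purely for illustration (Webmail itself is written in Tcl, PL/SQL, and Java); the function name and the sample values are hypothetical.

```python
def build_delivery_address(short_name: str, email_user_name: str, full_domain_name: str) -> str:
    """Mirror the SQL expression
    short_name || '-' || email_user_name || '@' || full_domain_name.
    """
    return f"{short_name}-{email_user_name}@{full_domain_name}"


# Hypothetical sample values, not taken from a real installation.
address = build_delivery_address("ex", "alice", "example.com")
print(address)
```

Storing this string once per account means the delivery code can match an incoming address with a single equality lookup instead of rebuilding the string for every user.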
-- Maps mailboxes (folders, in more common terminology) to ACS users.
create sequence wm_mailbox_id_sequence;
create table wm_mailboxes (
mailbox_id integer primary key,
name varchar(100) not null,
creation_user references users(user_id),
creation_date date,
uid_validity integer, -- Needed for IMAP
unique(creation_user, name)
);
A "mailbox" is what other systems would term "folders."
create table wm_messages (
msg_id integer primary key,
mailbox_id integer references wm_mailboxes,
body clob,
-- plain text portions of MIME message; empty if
-- entire message is of type text/*.
mime_text clob,
message_id varchar(500), -- RFC822 Message-ID field
msg_size integer,
date_value date,
subject_value varchar(150),
to_value varchar(150),
from_value varchar(150),
seen_p char(1) default 'f' check(seen_p in ('t','f')),
answered_p char(1) default 'f' check(answered_p in ('t','f')),
flagged_p char(1) default 'f' check(flagged_p in ('t','f')),
deleted_p char(1) default 'f' check(deleted_p in ('t','f')),
draft_p char(1) default 'f' check(draft_p in ('t','f')),
recent_p char(1) default 't' check(recent_p in ('t','f'))
);
create index wm_messages_by_message_id on wm_messages(message_id);
This is the primary message table. It stores the body of the
message, a parsed plain-text version with markers for attachments
if this is a multipart MIME message, the mailbox that this message
is currently filed in, a denormalized Message-ID field for easy
reference by Message ID, and yet another ID field for IMAP
bookkeeping. The message_id field is not unique, since the same
message may have been received multiple times.
We also store the four header columns needed for the mailbox
index view (date_value, subject_value, to_value, and from_value) in the
wm_messages table. These were separated from the
other headers, which are stored in a separate table (see below),
because previous Webmail installations experienced problems with
slow index views. The
net effect is immediately visible: while index views of
mailboxes with 500 messages used to be perceptibly slow, they are
now almost instantaneous with over 2000 messages. This is because
we no longer have to join the wm_headers table four
times with the wm_messages table to get all the needed
headers for the index view. This change should vastly improve the
scalability of Webmail.
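The difference between the two query shapes can be sketched with a small, self-contained example. It uses Python's built-in sqlite3 module rather than Oracle, trims the tables to the relevant columns, and uses made-up sample data, so it illustrates the idea only and is not the actual index-view query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table wm_messages (
        msg_id        integer primary key,
        mailbox_id    integer,
        date_value    text,
        subject_value text,
        to_value      text,
        from_value    text
    );
    create table wm_headers (
        msg_id integer not null references wm_messages,
        name   text not null,
        value  text
    );
""")
conn.execute(
    "insert into wm_messages values "
    "(1, 10, '2000-11-01', 'Hello', 'bob@example.com', 'alice@example.com')"
)
conn.executemany(
    "insert into wm_headers values (1, ?, ?)",
    [("Date", "2000-11-01"), ("Subject", "Hello"),
     ("To", "bob@example.com"), ("From", "alice@example.com")],
)

# Old shape: one join against wm_headers for each displayed header.
joined = conn.execute("""
    select m.msg_id, d.value, s.value, t.value, f.value
    from wm_messages m
    join wm_headers d on d.msg_id = m.msg_id and d.name = 'Date'
    join wm_headers s on s.msg_id = m.msg_id and s.name = 'Subject'
    join wm_headers t on t.msg_id = m.msg_id and t.name = 'To'
    join wm_headers f on f.msg_id = m.msg_id and f.name = 'From'
    where m.mailbox_id = ?
""", (10,)).fetchall()

# New shape: the four values are columns of wm_messages, so no join is needed.
denormalized = conn.execute("""
    select msg_id, date_value, subject_value, to_value, from_value
    from wm_messages
    where mailbox_id = ?
""", (10,)).fetchall()

print(joined == denormalized)
```

Both queries return the same rows; the second one simply reads them from a single table.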
The mapping of messages to mailboxes was also changed from the
previous Webmail implementation; instead of using a mapping table,
we use the mailbox_id column in the wm_messages table. This
eliminates an extra join in a good number of queries, although it
gives up the possibility of mapping messages to multiple mailboxes
(which was a feature that was not utilized in the previous Webmail
anyhow). One way to regain this functionality would be to
have a column in wm_messages that references a separate table
containing "common" messages that multiple users could
view as normal messages. Such a feature could save resources in the
case of intra-Webmail spam, and may be implemented in the future if
deemed necessary.
-- Stores attachments for MIME messages.
create table wm_attachments (
msg_id not null references wm_messages,
-- File name associated with attachment.
filename varchar(600) not null,
-- MIME type of attachment.
content_type varchar(100),
data blob,
-- 'text' or 'binary'; set by wm_att_format_tr for the INSO-filtered index
format varchar(10),
primary key (msg_id, filename)
);
This table stores MIME attachments and associated information.
-- Headers for a message.
create table wm_headers (
msg_id integer not null references wm_messages,
-- field name as specified in the email
name varchar(100) not null,
-- lowercase copy of name, used by the wm_headers_by_msg_id_name index
lower_name varchar(100),
value varchar(4000),
-- original order of headers
sort_order integer not null
);
create index wm_headers_by_msg_id_name on wm_headers (msg_id, lower_name);
Headers are stored separately from the message to aid in searching.
The original ordering of the headers is maintained, both so
that we can recreate the header block and because order is
significant for certain fields.
-- Table for recording messages that we failed to parse for whatever reason.
create table wm_parse_errors (
filename varchar(255) primary key not null, -- message queue file
error_message varchar(4000),
first_parse_attempt date default sysdate not null
);
If an error occurs while attempting to parse a message, we store a
record of the error in this log for the administrator to review.
Only the first occurrence of an error is logged for any file, to
prevent hundreds of identical error messages from clogging the log.
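The "first occurrence only" rule follows from filename being the primary key of wm_parse_errors. A minimal sketch of the idea, using Python's built-in sqlite3 module instead of Oracle; the helper name log_parse_error and the sample file name are hypothetical, and "insert or ignore" is SQLite syntax, not what the Java parser actually executes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table wm_parse_errors (
        filename            text primary key not null,
        error_message       text,
        first_parse_attempt text default current_timestamp not null
    )
""")


def log_parse_error(filename: str, error_message: str) -> None:
    # The insert is skipped when a row with this filename already exists,
    # so only the first error for a given queue file is kept.
    conn.execute(
        "insert or ignore into wm_parse_errors (filename, error_message) values (?, ?)",
        (filename, error_message),
    )


log_parse_error("queue/msg-0001", "missing header block")
log_parse_error("queue/msg-0001", "missing header block")  # later retry: ignored
rows = conn.execute("select filename, error_message from wm_parse_errors").fetchall()
print(rows)
```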
-- Used for storing attachments for outgoing messages.
-- Should be cleaned out periodically.
create sequence wm_outgoing_msg_id_sequence;
create table wm_outgoing_messages (
outgoing_msg_id integer not null primary key,
body clob,
composed_message clob,
creation_date date default sysdate not null,
creation_user not null references users
);
create table wm_outgoing_headers (
outgoing_msg_id integer not null references wm_outgoing_messages on delete cascade,
name varchar(100) not null,
value varchar(4000),
sort_order integer not null
);
create unique index wm_outgoing_headers_idx on wm_outgoing_headers (outgoing_msg_id, name);
create sequence wm_outgoing_parts_sequence;
create table wm_outgoing_message_parts (
outgoing_msg_id integer not null references wm_outgoing_messages on delete cascade,
data blob,
filename varchar(600) not null,
content_type varchar(100), -- mime type of data
sort_order integer not null,
primary key (outgoing_msg_id, sort_order)
);
-- Create a job to clean up orphaned outgoing messages every day.
create or replace procedure wm_cleanup_outgoing_msgs as
begin
delete from wm_outgoing_messages
where creation_date < sysdate - 1;
end;
/
declare
job number;
begin
dbms_job.submit(job, 'wm_cleanup_outgoing_msgs;',
interval => 'sysdate + 1');
end;
/
When composing messages for sending, the unsent message and any
attachments are stored in the database. When the message is sent, a
MIME message is composed consisting of the text of the message
followed by any attachments (there is currently no facility to
intersperse attachments with text). Instead of deleting the stored
message as soon as it is sent, we delete old messages daily, giving users
the chance to go back in their browsers and resend a
previously composed message.
Unsent outgoing attachments could as well be stored in the
filesystem, but it is easier to manage them when they are all
contained within the database.
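The "text first, attachments after" layout can be sketched with Python's standard email package. This is not the MessageComposer Java code; the function name, argument shapes, and sample data below are assumptions made for illustration.

```python
from email.message import EmailMessage


def compose_mime_message(headers, body_text, parts):
    """Build a message whose text body comes first, followed by attachments.

    headers: list of (name, value) pairs, already in sort_order
    parts:   list of (filename, content_type, data_bytes), already in sort_order
    """
    msg = EmailMessage()
    for name, value in headers:
        msg[name] = value
    msg.set_content(body_text)
    for filename, content_type, data in parts:
        maintype, _, subtype = content_type.partition("/")
        # Adding an attachment converts the message to multipart/mixed.
        msg.add_attachment(data, maintype=maintype, subtype=subtype, filename=filename)
    return msg


message = compose_mime_message(
    [("From", "alice@example.com"), ("To", "bob@example.com"), ("Subject", "Report")],
    "The report is attached.\n",
    [("report.pdf", "application/pdf", b"%PDF-1.4 placeholder")],
)
print([part.get_content_type() for part in message.iter_parts()])
```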
-- PL/SQL bindings for Java procedures
create or replace procedure wm_process_queue (queuedir IN VARCHAR)
as language java
name 'com.arsdigita.mail.MessageParser.processQueue(java.lang.String)';
/
create or replace procedure wm_compose_message (outgoing_msg_id IN NUMBER)
as language java
name 'com.arsdigita.mail.MessageComposer.composeMimeMessage(int)';
/
These PL/SQL bindings for Java procedures are the heart of the
system. wm_process_queue attempts to parse every file
in the given directory as an email message, deliver it to a webmail
user, and delete the file. wm_compose_message assembles the complete
outgoing message, headers included, into the composed_message column.
wm_process_queue is scheduled with AOLserver to run
every minute rather than with Oracle's dbms_job
package, because bugs in dbms_job made it the less reliable scheduler.
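The queue-processing loop can be sketched as follows. This is a Python illustration only, not the Java MessageParser: the deliver callback is a hypothetical stand-in for the database insert, and leaving a failed file in place for a later retry is an assumption drawn from the wm_parse_errors description.

```python
import os
import tempfile
from email import message_from_bytes


def process_queue(queuedir, deliver):
    """Parse and deliver every file in queuedir; delete the files that succeed.

    Returns a list of (filename, error text) pairs for files that failed.
    """
    failures = []
    for name in sorted(os.listdir(queuedir)):
        path = os.path.join(queuedir, name)
        if not os.path.isfile(path):
            continue
        try:
            with open(path, "rb") as handle:
                message = message_from_bytes(handle.read())
            deliver(message)
        except Exception as exc:
            # Assumption: the file stays in the queue and is retried next run.
            failures.append((name, str(exc)))
            continue
        os.remove(path)
    return failures


delivered = []
with tempfile.TemporaryDirectory() as queuedir:
    with open(os.path.join(queuedir, "msg-0001"), "wb") as handle:
        handle.write(b"From: alice@example.com\r\nSubject: Hi\r\n\r\nHello\r\n")
    failures = process_queue(queuedir, delivered.append)
    remaining = os.listdir(queuedir)
print(len(delivered), failures, remaining)
```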
-- Trigger to delete subsidiary rows when a message is deleted.
create or replace trigger wm_messages_delete_trigger
before delete on wm_messages
for each row
begin
delete from wm_headers where msg_id = :old.msg_id;
delete from wm_attachments where msg_id = :old.msg_id;
end;
/
This trigger makes deleting messages easy; deleting from
wm_messages will also delete the appropriate rows from
any subsidiary tables.
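The effect of the trigger can be tried out in a small sandbox. The sketch below re-creates it in SQLite through Python's sqlite3 module, with the tables trimmed to a few columns and made-up sample rows; SQLite writes old.msg_id where Oracle writes :old.msg_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table wm_messages (msg_id integer primary key);
    create table wm_headers (msg_id integer not null, name text not null, value text);
    create table wm_attachments (msg_id integer not null, filename text not null);

    -- SQLite rendering of wm_messages_delete_trigger
    create trigger wm_messages_delete_trigger
    before delete on wm_messages
    for each row
    begin
        delete from wm_headers where msg_id = old.msg_id;
        delete from wm_attachments where msg_id = old.msg_id;
    end;

    insert into wm_messages values (1);
    insert into wm_headers values (1, 'Subject', 'Hello');
    insert into wm_attachments values (1, 'report.pdf');
""")

# A single delete on the parent table clears the subsidiary rows as well.
conn.execute("delete from wm_messages where msg_id = 1")
remaining = [
    conn.execute(f"select count(*) from {table}").fetchone()[0]
    for table in ("wm_messages", "wm_headers", "wm_attachments")
]
print(remaining)
```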
-- interMedia index on body of message
create index wm_ctx_index on wm_messages (body)
indextype is ctxsys.context parameters ('memory 250M');
-- INSO filtered interMedia index for attachments.
create index wm_att_ctx_index on wm_attachments (data)
indextype is ctxsys.context parameters ('memory 250M filter ctxsys.inso_filter format column format');
-- Trigger to update format column for INSO index.
create or replace trigger wm_att_format_tr before insert on wm_attachments
for each row
declare
content_type varchar(100);
begin
content_type := lower(:new.content_type);
if content_type like 'text/%' or content_type like 'application/msword%' then
:new.format := 'text';
else
:new.format := 'binary';
end if;
end;
/
-- Resync the interMedia index every hour.
declare
job number;
begin
dbms_job.submit(job, 'ctx_ddl.sync_index(''wm_ctx_index'');',
interval => 'sysdate + 1/24');
dbms_job.submit(job, 'ctx_ddl.sync_index(''wm_att_ctx_index'');',
interval => 'sysdate + 1/24');
end;
/
These indices and triggers enable full-text searches over messages.
An INSO filtered index is also created to allow full-text searches
over any attachments which contain text, including formatted
documents.
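The classification performed by wm_att_format_tr can be restated as a small function. This is a Python paraphrase of the trigger's logic for illustration; the function name is hypothetical.

```python
def attachment_format(content_type):
    """Mirror wm_att_format_tr: return 'text' or 'binary' for the format column."""
    # In PL/SQL a NULL content_type fails both LIKE tests and falls through to
    # the else branch, so a missing type is treated as binary here as well.
    normalized = (content_type or "").lower()
    if normalized.startswith("text/") or normalized.startswith("application/msword"):
        return "text"
    return "binary"


for sample in ("TEXT/Plain", "application/msword", "image/gif", None):
    print(sample, attachment_format(sample))
```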
Legal Transactions
/webmail/admin/
The following legal transactions can occur from the Webmail
administration pages located under /webmail/admin/:
domains
Domains may be created and deleted.
The account size limit may be set for the domain.
accounts
Email accounts may be created or deleted.
/webmail/
The following legal transactions can occur from the user
pages located under /webmail/:
messages
Messages may be viewed, re-filed, or deleted.
composing messages
New messages may be composed.
Attachments may be added.
folders
Folders may be created.
Folders may be emptied of messages (their contents deleted).
User-created folders may be renamed and deleted (including contents).
filters
Views may be created, edited, and deleted.
Action filters may be created, edited, and deleted.
preferences
User preferences may be edited.
API
PL/SQL Procedures
wm_process_queue (queuedir IN VARCHAR)
Processes the mail queue directory and inserts messages into the
database (scheduled to run every minute by default).
wm_compose_message (outgoing_msg_id IN NUMBER)
Given an outgoing_msg_id, updates the wm_outgoing_messages table and
sets the composed_message column to a complete message (including mail
headers) that is ready to send.
Tcl Procedures
ad_proc ad_acs_webmail_id_mem {}
Gets the package id so that ad_parameter can be used in this file.
ad_proc wm_add_user { user_id username short_name }
Creates a new webmail account for the given user.
ad_proc wm_header_display { msg_id header_display_style user_id }
Creates a string of "header: value" pairs.
ad_proc wm_quote_message { author msg_text }
Quotes the message with ">" on each line.
ad_proc wm_msg_permission { msg_id user_id }
Does user_id have permission to access msg_id? Returns 0 or 1.
ad_proc wm_mailbox_permission { mailbox_id user_id }
Does user_id have permission to access mailbox_id? Returns 0 or 1.
ad_proc wm_get_mime_part { }
Processes requests for message attachments.
ad_proc wm_return_error { errmsg }
Redirects to the webmail-error page.
ad_proc wm_move_to_next { msg_id }
Redirects to the next message in the "current_messages" client property.
ad_proc wm_get_preference { user_id preference }
Gets the specified preference for the specified user from the
wm_preferences table.
ad_proc select_default_mailbox { user_id }
Sets the default mailbox (INBOX) using ad_set_client_property.
ad_proc wm_format_for_seen_or_deleted { seen_p deleted_p str }
Formats an element differently for read or deleted messages.
ad_proc accumulate_msg_id { msg_id seen_p deleted_p }
Collects message data for navigation in message.tcl.
ad_proc wm_likefy { string }
Escapes % and \ with \ for an Oracle "like" clause.
ad_proc wm_build_view_sql { user_id view_id }
Builds the inner part of the complex index-view.tcl query for the
specified view. If you pass view_id as -1, it will attempt to get
the view from the client's browser properties (used for the "Custom
View").
ad_proc create_read_string { comp_string }
Creates checkboxes for whether a message is read. Helper for
wm_create_filter_form.
ad_proc create_constraint_string { comp_object comp_type comp_string i }
Creates constraint inputs. Helper for wm_create_filter_form.
ad_proc create_age_string { comp_type comp_string }
Creates an age input constraint. Helper for wm_create_filter_form.
ad_proc wm_create_filter_form { user_id edit_filter_id {format long} }
Creates strings for displaying the form for editing a filter view.
Specify 0 for edit_filter_id to get an empty form. Pass format "flat"
to have the mailbox options printed 5 to a row (default is 2).
ad_proc wm_build_index_view { mailbox_id view_id mailbox_name msg_per_page page_num orderby view_sql }
Creates a table of the messages in the current mailbox or view. See
index.tcl and index-view.tcl for usage examples.
ad_proc qmail {to from subject body {extraheaders {}}}
Creates a message and injects it into qmail. This proc was
originally in qmail.tcl, but since qmail.tcl is no longer
distributed with ACS 4.0, it was added here.
ad_proc qmail_send_complete_message {from msg}
Injects a fully formed message into qmail.
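Two of the string helpers listed above are simple enough to sketch. The following Python functions paraphrase the documented behaviour of wm_quote_message and wm_likefy; they are not translations of the Tcl source. The "> " prefix (with a trailing space) is an assumption, and the handling of the author argument is omitted because it is not described here.

```python
def quote_message(msg_text, prefix="> "):
    """Prefix every line of msg_text, as when quoting a message in a reply."""
    return "\n".join(prefix + line for line in msg_text.splitlines())


def likefy(string):
    """Escape backslash and percent with a backslash for use in a LIKE pattern."""
    # Escape the backslash first so the backslashes added for '%' are not doubled.
    return string.replace("\\", "\\\\").replace("%", "\\%")


print(quote_message("Hello,\nsee you at noon."))
print(likefy("50% off"))
```

For the escaping to take effect, a query using such a pattern would also have to name the backslash as the escape character in its "like" clause.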
User Interface
The user interface for webmail includes:
An interface for the user:
Browsing available messages
Reading specific messages
Composing outgoing messages
A side nav-bar for navigating between the different webmail
functions
Preferences for customizing the user interface and
functionality of the application including:
Setting the number of messages displayed at once
Setting the refresh rate of the mailbox
Creating a signature
Setting the from name and reply-to header fields
Choosing to have messages forwarded to another email
address
Automatically saving messages
Creating, editing, deleting, and displaying different
views
Creating, editing, deleting, and displaying different
folders
Creating, editing, deleting, and displaying different filter
actions
An interface for the administrator:
Choosing domains handled by Webmail
Adding and deleting users in those domains
Viewing a list of recent Webmail errors
Setting the account size allowed for users
Configuration/Parameters
Please refer to the installation and configuration documentation. It
covers configuring qmail, loading the data model and the Java files,
and testing and configuring the system.
Future Improvements/Areas of Likely Change
Future improvements may include POP3 and/or IMAP
access, VoiceXML access to messages, and an LDAP interface.
Authors
System creator: Jin Choi
System owner: Erik Bielefeldt
Documentation authors: Jin Choi, Erik Bielefeldt
Erik Bielefeldt