Index: openacs-4/packages/acs-core-docs/www/i18n-requirements.html =================================================================== RCS file: /usr/local/cvsroot/openacs-4/packages/acs-core-docs/www/i18n-requirements.html,v diff -u -r1.20 -r1.20.2.1 --- openacs-4/packages/acs-core-docs/www/i18n-requirements.html 7 Jun 2008 20:28:49 -0000 1.20 +++ openacs-4/packages/acs-core-docs/www/i18n-requirements.html 10 Jun 2009 22:24:07 -0000 1.20.2.1 @@ -1,19 +1,19 @@ - -
Character Sets
50.0 A locale will have a primary associated character set which is used to encode text in the language. When given a locale, we can query the system for the associated character set to use.
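Requirement 50.0 can be pictured as a simple lookup. This is a minimal sketch only: the table `LOCALE_CHARSETS`, the function `charset_for_locale`, and the locale-to-charset pairs are assumptions made for illustration, not part of the OpenACS API.

```python
# Hypothetical locale -> primary charset table; the entries are illustrative.
LOCALE_CHARSETS = {
    "en_US": "iso-8859-1",
    "da_DK": "iso-8859-1",
    "ja_JP": "shift_jis",
    "ru_RU": "koi8-r",
}


def charset_for_locale(locale, default="utf-8"):
    """Return the primary charset for a locale, or the default if unknown."""
    return LOCALE_CHARSETS.get(locale, default)
```

An unknown locale falls back to UTF-8 here, which is an assumption of this sketch rather than a stated requirement.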
The assumption is that we are going to use Unicode in our @@ -177,12 +177,12 @@ browsers and authoring tools, the system must be able to read and write other character sets. In particular, conversions to and from Unicode will need to be explicitly performed at the following -times:
Loading source files (.tcl or .adp) or content files from the -filesystem
Accepting form input data from users
Delivering text output to a browser
Composing an email message
Writing data to the filesystem
Acs-templating does the following.
When the acs-templating package opens an ADP or TCL file, it assumes the file is iso-8859-1. If the output charset (OutputCharset) in the AOLserver config file is set, then acs-templating assumes it's that charset. -Writing Files
When the acs-templating package writes an ADP or +times:
Loading source files (.tcl or .adp) or content files from the +filesystem
Accepting form input data from users
Delivering text output to a browser
Composing an email message
Writing data to the filesystem
Acs-templating does the following.
When the acs-templating package opens an ADP or TCL file, it assumes the file is iso-8859-1. If the output charset (OutputCharset) in the AOLserver config file is set, then acs-templating assumes it's that charset. +Writing Files
When the acs-templating package writes an ADP or TCL file, it assumes the file is iso-8859-1. If the output charset (OutputCharset) in the AOLserver config file is set, - then acs-templating assumes it's that charset.
There are two classes of Tcl files loaded by the system; + then acs-templating assumes it's that charset.
There are two classes of Tcl files loaded by the system; library files loaded at server startup, and page script files, which are run on each page request.
Should we require all Tcl files be stored as UTF8? That seems too much of a burden on developers.
50.10 Tcl library files can be authored @@ -191,31 +191,31 @@ filename.
50.20 Tcl page script files can be authored in any character set. The system must have a way to determine the character set before loading the files, probably from - the filename.
50.30 Data which is submitted with a HTTP request using a GET or POST method may be in any character set. The system must be able to determine the encoding of the form data and convert it to Unicode on demand.
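Requirement 50.30 could look roughly like the following, assuming the charset of the submitted data is already known. The function `decode_form` and the sample field name are hypothetical; `parse_qs` and `quote` are from the Python standard library.

```python
from urllib.parse import parse_qs, quote


def decode_form(query_string, charset="utf-8"):
    """Decode URL-encoded form data to Unicode using the given charset."""
    # errors="strict" makes a wrong charset guess raise UnicodeDecodeError
    # instead of silently inserting replacement characters.
    return parse_qs(query_string, encoding=charset, errors="strict")


# Simulate a browser submitting a Shift_JIS-encoded field.
sample = "name=" + quote("日本".encode("shift_jis"))
form = decode_form(sample, charset="shift_jis")
```

Decoding strictly means a mismatched charset surfaces as an error that the caller can handle, for example by trying another candidate encoding.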
50.35 The developer must be able to override the default system choice of character set when parsing - and validating user form data. INCOMPLETE - form + and validating user form data. INCOMPLETE - form widgets in acs-templating/tcl/date-procs.tcl are not internationalized. Also, acs-templating's UI needs to be internationalized by replacing all user-visible strings with - message keys.
50.30.10 In Japan and some + message keys.
50.30.10 In Japan and some other Asian languages where there are multiple character set encodings in common use, the server may need to attempt to do an auto-detection of the character set, because buggy browsers may - submit form data in an unexpected alternate encoding.
50.40 The output character set for a + submit form data in an unexpected alternate encoding.
50.40 The output character set for a page request will be determined by default by the locale associated with the request (see requirement 20.0).
50.50 It must be possible for a developer to manually override the output character set encoding for a request using an API function. -
60.10 All OpenACS error messages must use the message catalog and the request locale to generate error -message for the appropriate locale. NOT IMPLEMENTED for 5.0.0.
60.20 Web server error messages such as +message for the appropriate locale. NOT IMPLEMENTED for 5.0.0.
60.20 Web server error messages such as 404, 500, etc must also be delivered in the appropriate locale.
60.30 Where files are written or read from disk, their filenames must use a character set and character -values which are safe for the underlying operating system.
70.0 For a given abstract URL, the +values which are safe for the underlying operating system.
70.0 For a given abstract URL, the
designer may create multiple locale-specific template files (one per locale or language)
70.10 For a given page request, the system must be able to select an appropriate locale-specific @@ -226,28 +226,28 @@ current request locale.
70.30 A template file may be created in any character set. The system must have a way to know which character set a template file contains, so it can properly process -it.
70.50 The properties of a datasource column may include a datatype so that the templating system can format the output for the current locale. The datatype is defined by a standard OpenACS datatype plus a format token or format string, for example: a date column might be specified as 'current_date:date LONG,' or 'current_date:date -"YYYY-Mon-DD"'
70.60 The forms API must support +"YYYY-Mon-DD"'
70.60 The forms API must support construction of locale-specific HTML form widgets, such as date entry widgets, and form validation of user input data for locale-specific data, such as dates or numbers. NOT IMPLEMENTED in 5.0.0.
70.70 For forms which allow users to upload files, a standard method for a user to indicate the charset of a text file being uploaded must be provided.
Design note: this presumably applies to uploading -data to the content repository as well
80.10 Support API for correct collation (sorting order) on lists of strings in locale-dependent way.
80.20 For the Tcl API, we will say that locale-dependent sorting will use Oracle SQL operations (i.e., we won't provide a Tcl API for this). We require a Tcl API function to return the correct incantation of NLS_SORT to use for a -given locale with ORDER BY clauses in +given locale with
ORDER BY
clauses in queries. 80.40 The system must handle full-text -search in any supported language.
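The function asked for in requirement 80.20 might be sketched as below. The name `order_by_clause` and the locale table are assumptions for illustration. `NLSSORT` with an `NLS_SORT` parameter is standard Oracle SQL, but which sort name suits which locale should be checked against the Oracle documentation.

```python
import re

# Hypothetical locale -> Oracle linguistic sort name; illustrative entries only.
NLS_SORT_BY_LOCALE = {
    "de_DE": "GERMAN",
    "da_DK": "DANISH",
    "sv_SE": "SWEDISH",
    "es_ES": "SPANISH",
}


def order_by_clause(column, locale):
    """Return an ORDER BY clause that sorts `column` for the given locale."""
    # The column name is spliced into SQL, so accept plain identifiers only.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", column):
        raise ValueError("column must be a plain SQL identifier")
    # Fall back to binary ordering when the locale is not in the table.
    sort_name = NLS_SORT_BY_LOCALE.get(locale, "BINARY")
    return f"ORDER BY NLSSORT({column}, 'NLS_SORT = {sort_name}')"
```

The caller appends the returned clause to its own query, so the sorting itself stays in the database as the requirement describes.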
90.10 Provide API support for specifying a time zone
90.20 Provide an API for computing time and date operations which are aware of timezones. So for example a calendar module can properly synchronize items inserted into a @@ -258,39 +258,29 @@ zone preference should be attached via a session or else UTC should be used to display every date and time.
90.60 The default if we can't determine a time zone is to display all dates and times in some -universal time zone such as GMT.
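The fallback in requirement 90.60 can be sketched as follows, assuming timestamps are stored as timezone-aware UTC values. The function name `to_display_time` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_display_time(utc_dt, user_tz=None):
    """Convert an aware UTC datetime to the user's zone, or keep UTC if unknown."""
    target = user_tz if user_tz is not None else timezone.utc
    return utc_dt.astimezone(target)


# A stored timestamp, and a user whose zone is two hours ahead of UTC.
stored = datetime(2009, 6, 10, 22, 24, 7, tzinfo=timezone.utc)
local = to_display_time(stored, timezone(timedelta(hours=2)))
fallback = to_display_time(stored)
```

A fixed offset stands in for a real time zone here to keep the sketch self-contained. A named zone with daylight-saving rules would need a time zone database.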
100.10 Since UTF8 strings can use up to three (UCS2) or six (UCS4) bytes per character, make sure that column size declarations in the schema are large enough to accommodate required data (such as email addresses in -Japanese). Since 5.0.0, this is covered in the database -install instructions for both PostgreSQL and Oracle.
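The size concern in 100.10 is easy to check by hand. The sketch below assumes a column whose declared size is counted in bytes. The helper `fits_column` and the sample address are made up for illustration.

```python
def fits_column(value, max_bytes):
    """Return True if the UTF-8 encoding of value fits in max_bytes bytes."""
    return len(value.encode("utf-8")) <= max_bytes


# Eight characters, but each of the four kana/kanji takes three bytes in UTF-8.
address = "山田@例え.jp"
char_count = len(address)
byte_count = len(address.encode("utf-8"))
```

The address has 8 characters but needs 16 bytes, so a column sized by character count alone would reject it.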
When sending an email message, just as when delivering the -content in web page over an HTTP connection, it is necessary to be -able to specify what character set encoding to use. -
110.10 The email message sending API -will allow for a character set encoding to be specified.
110.20 The email accepting API will -allow for character set to be parsed correctly (hopefully a well -formatted message will have a MIME character set content type header)
Mail is not internationalized. The following issues -must be addressed.
- Six different functions currently call ns_sendmail. This - means that there are six different end points for sending - mail. This should be brought down to no more than two (one - for acs_mail and one for acs_mail_lite), and ideally just - one. Functions that currently call ns_sendmail directly - should instead call acs_mail_lite. -
- Outgoing email functions (acs_mail and acs_mail_lite) must do - the following: 1) Determine the appropriate language or - languages to use for the message subject and message body. 2) - Encode the subject and body appropriately and set message - headers, in accordance with RFC 3282 - (http://www.ietf.org/rfc/rfc3282.txt) and other RFCs. -
Extreme Use case: Web site has a default language of Danish. A forum is set up for Swedes, so the forum has a package_id and a language setting of Swedish. A poster posts to the forum in Russian (is this possible?). A user is subscribed to the forum and has a language preference of Chinese. What should be in the message body and message subject? -INCOMPLETE - The mail functions in acs_mail and acs_mail_lite -are not internationalized.
Incoming mail should be localized.
When sending an email message, just as when delivering the + content in web page over an HTTP connection, it is necessary to be + able to specify what character set encoding to use, defaulting to UTF-8.
110.10 The email message sending API + will allow for a character set encoding to be specified.
110.20 The email accepting API + allows for character set to be parsed correctly (the message has a MIME + character set content type header)
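Requirements 110.10 and 110.20 can be illustrated with Python's standard `email` package. This is a sketch, not the acs-mail-lite API: `build_message`, `read_body`, and the addresses are hypothetical.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_message(subject, body, charset="utf-8"):
    """Build a text message whose body is encoded in the given charset."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = "forum@example.com"   # placeholder address
    msg["To"] = "user@example.com"      # placeholder address
    # set_content records the charset in the Content-Type header.
    msg.set_content(body, charset=charset)
    return msg


def read_body(raw_bytes):
    """Parse a raw message; return (declared charset, decoded body text)."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    return msg.get_content_charset(), msg.get_content()


raw = build_message("Hej", "Hej på dig", charset="iso-8859-1").as_bytes()
charset, body = read_body(raw)
```

The receiving side relies entirely on the MIME charset parameter, which is why 110.20 depends on the message being well formed.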
Mail is not internationalized. The following issues must be addressed.
+ Many functions still call ns_sendmail. This + means that there are different end points for sending + mail. This should be changed to use the acs-mail-lite API instead. +
+ Consumers of email services must do + the following: Determine the appropriate language or + languages to use for the message subject and message body + and localize them (as in notifications). +
Extreme Use case: Web site has a default language of Danish. A forum is set up for Swedes, so the forum has a package_id and a language setting of Swedish. A poster posts to the forum in Russian (is this possible?). A user is subscribed to the forum and has a language preference of Chinese. What should be in the message body and message subject?
Incoming mail should be localized.
Because globalization touches many different parts of the system, we want to reduce the implementation risk by breaking the implementation into phases. -
Document Revision # | Action Taken, Notes | When? | By Whom?
1 | Updated with results of MIT-sponsored i18n work at Collaboraid. | 14 Aug 2003 | Joel Aufrecht
0.4 | converting from HTML to DocBook and importing the document to the OpenACS
+
|