Index: openacs-4/packages/acs-core-docs/www/i18n-requirements.html
===================================================================
RCS file: /usr/local/cvsroot/openacs-4/packages/acs-core-docs/www/i18n-requirements.html,v
diff -u -r1.13.2.1 -r1.13.2.2
--- openacs-4/packages/acs-core-docs/www/i18n-requirements.html 5 Jul 2004 19:47:30 -0000 1.13.2.1
+++ openacs-4/packages/acs-core-docs/www/i18n-requirements.html 1 Nov 2004 23:39:45 -0000 1.13.2.2
@@ -1,18 +1,18 @@
-OpenACS Internationalization Requirements

OpenACS Internationalization Requirements

by Henry Minsky, +OpenACS Internationalization Requirements

OpenACS Internationalization Requirements

by Henry Minsky, Yon Feldman, Lars Pind, Peter Marklund, Christian Hvid, and others.

OpenACS docs are written by the named authors, and may be edited by OpenACS documentation staff. -

Introduction

+

Introduction

This document describes the requirements for functionality in the OpenACS platform to support globalization of the core and optional modules. The goal is to make it possible to support delivery of applications which work properly in multiple locales with the lowest development and maintenance cost. -

Definitions

internationalization (i18n)

+

Definitions

internationalization (i18n)

The provision within a computer program of the capability of making itself adaptable to the requirements of different native languages, local customs and coded character sets.
@@ -27,7 +27,7 @@
A product development approach which ensures that software products are usable in the worldwide markets through a combination of internationalization and localization.
-

Vision Statement

The Mozilla project suggests keeping two catchy phrases in +

Vision Statement

The Mozilla project suggests keeping two catchy phrases in mind when thinking about globalization:

  • One code base for the world

  • English is just another language

Building an application often involves making a number of assumptions on the part of the developers which depend on their own culture. These include constant strings in the user interface and
@@ -43,7 +43,7 @@
kind of globalization support would be large and ongoing, since without a mechanism to incorporate the locale-specific changes cleanly back into the code base, it would require making a new fork
-of the source code for each locale.

System/Application Overview

A globalized application will perform some or all of the +of the source code for each locale.

System/Application Overview

A globalized application will perform some or all of the following steps to handle a page request for a specific locale:

  1. Decide what the target locale is for an incoming page request

  2. Decide which character set encoding the output should be
@@ -68,7 +68,7 @@
Java which we will want to move to. So the design to meet the requirements will tend to rely on these capabilities, or close approximations to them where possible, in order to make it easier
-to maintain Tcl and Java OpenACS versions.

Use-cases and User-scenarios

Here are the cases that we need to be able to handle +to maintain Tcl and Java OpenACS versions.

Use-cases and User-scenarios

Here are the cases that we need to be able to handle efficiently:

  1. A developer needs to author a web site/application in a language besides English, and possibly a character set besides ISO-8859-1. This includes the operation of the OpenACS itself, i.e.,
@@ -90,9 +90,9 @@
resources such as message catalogs, non-text assets such as graphics, and use of templates which help to separate application logic from presentation.

Competitive -Analysis

Other application servers: ATG Dynamo, Broadvision, Vignette, +Analysis

Other application servers: ATG Dynamo, Broadvision, Vignette, ... ? Anyone know how they deal with i18n?

Related -Links

  • System/Package "coversheet" - where all +Links

Requirements

Because the requirements for globalization affect many areas +Registry of Character Sets

  • Test plan

  • Competitive system(s)

  • Requirements

    Because the requirements for globalization affect many areas of the system, we will break up the requirements into phases, with a base required set of features, and then stages of increasing -functionality.

    Locales

    10.0

    A standard representation of locale will be used throughout +functionality.

    Locales

    10.0

    A standard representation of locale will be used throughout the system. A locale refers to a language and territory, and is uniquely identified by a combination of ISO language and ISO country abbreviations.

    See
@@ -121,15 +121,15 @@
NOT IMPLEMENTED for 5.0.0.

    10.40 Administrators can upgrade their servers to use new locales via the APM. NOT IMPLEMENTED in 5.0.0; current workaround is to get an XML file and load it
-manually.

    Associating a Locale with a Request

    20.0

    The request processor must have a mechanism for associating a +manually.

    Associating a Locale with a Request

    20.0

    The request processor must have a mechanism for associating a locale with each request. This locale is then used to select the appropriate template for a request, and will also be passed as the locale argument to the message catalog or locale-specific formatting functions.

    20.10 The locale for a request should be computed by the following method, in descending order of priority:

    • get locale associated with subsite or package id

    • get locale from user preference

    • get locale from site wide default
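The priority order in 20.10 can be sketched as a simple fallback chain. This is a minimal illustration in Python rather than the OpenACS Tcl API; the function name, its parameters, and the `en_US` default are assumptions made for the example, not names from OpenACS.

```python
from typing import Optional


def resolve_request_locale(package_locale: Optional[str],
                           user_locale: Optional[str],
                           site_default: str = "en_US") -> str:
    """Return the first locale that is set, in the priority order of 20.10.

    All names here are hypothetical: package_locale stands for the locale
    attached to the subsite or package id, user_locale for the user
    preference, and site_default for the site-wide default.
    """
    for candidate in (package_locale, user_locale):
        if candidate:  # skip None and empty strings
            return candidate
    return site_default
```

For instance, a Swedish forum on a Danish site would pass `"sv_SE"` as `package_locale`, and that value wins over both the user preference and the site default.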

      20.20 An API will be provided for getting the current request locale from the -ad_conn structure.

    Resource Bundles / Content Repository

    30.0

    A mechanism must be provided for a developer to group a set +ad_conn structure.

    Resource Bundles / Content Repository

    30.0

    A mechanism must be provided for a developer to group a set of arbitrary content resources together, keyed by a unique identifier and a locale.

    For example, what approaches could be used to implement a localizable nav-bar mechanism for a site? A navigation bar might be
@@ -141,7 +141,7 @@
functionality might include using templates, Java ResourceBundles, content-item containers in the Content Repository, or some convention assigning a common prefix to key strings in the message
-catalog.

    Message Catalog for String Translation

    40.0

    A message catalog facility will provide a database of +catalog.

    Message Catalog for String Translation

    40.0

    A message catalog facility will provide a database of translations for constant strings for multilingual applications. It must support the following:

    40.10 Each message will be referenced via a unique key.

    40.20 The key for a message will have
@@ -166,7 +166,7 @@
is modified, the other translations of that string can be flagged as needing update.

    40.90 The message lookup must be as efficient as possible so as not to slow down the delivery of -pages.

    Character Set Encoding

    Character Sets

    50.0 A locale will have a primary +pages.

    Character Set Encoding

    Character Sets

    50.0 A locale will have a primary associated character set which is used to encode text in the language. When given a locale, we can query the system for the associated character set to use.
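Requirement 50.0 amounts to a lookup from locale to character set. The sketch below shows the idea in Python; the table contents, the UTF-8 fallback, and the function names are illustrative assumptions, not the mapping OpenACS ships with.

```python
# Hypothetical locale-to-charset table; the entries are examples only.
LOCALE_CHARSETS = {
    "en_US": "iso-8859-1",
    "ja_JP": "shift_jis",
    "ru_RU": "koi8-r",
}

# Assumed fallback for locales missing from the table.
DEFAULT_CHARSET = "utf-8"


def charset_for_locale(locale: str) -> str:
    """Return the primary character set associated with a locale."""
    return LOCALE_CHARSETS.get(locale, DEFAULT_CHARSET)


def encode_for_locale(text: str, locale: str) -> bytes:
    """Encode Unicode text in the locale's primary character set.

    Characters the charset cannot represent are replaced with '?'.
    """
    return text.encode(charset_for_locale(locale), errors="replace")
```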

    The assumption is that we are going to use Unicode in our
@@ -181,7 +181,7 @@
Writing Files

  • When the acs-templating package writes an ADP or TCL file, it assumes the file is iso-8859-1. If the output charset (OutputCharset) in the AOLserver config file is set,
- then acs-templating assumes it's that charset.

  • Tcl Source File Character Set

    There are two classes of Tcl files loaded by the system; + then acs-templating assumes it's that charset.

    Tcl Source File Character Set

    There are two classes of Tcl files loaded by the system; library files loaded at server startup, and page script files, which are run on each page request.

    Should we require all Tcl files be stored as UTF8? That seems too much of a burden on developers.

    50.10 Tcl library files can be authored
@@ -190,7 +190,7 @@
filename.

    50.20 Tcl page script files can be authored in any character set. The system must have a way to determine the character set before loading the files, probably from - the filename.

    Submitted Form Data Character Set

    50.30 Data which is submitted with a + the filename.

    Submitted Form Data Character Set

    50.30 Data which is submitted with an HTTP request using a GET or POST method may be in any character set. The system must be able to determine the encoding of the form data and convert it to Unicode on demand.
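One way to read 50.30 is "try the declared encoding first, then fall back". The Python sketch below illustrates that reading only; the function name, the fallback order, and the use of replacement characters as a last resort are assumptions, and real charset auto-detection is more involved than this.

```python
from typing import Optional, Sequence


def decode_form_value(raw: bytes,
                      declared_charset: Optional[str] = None,
                      fallbacks: Sequence[str] = ("utf-8", "iso-8859-1")) -> str:
    """Convert submitted form bytes to a Unicode string.

    The declared charset (if any) is tried first, then each fallback in
    order. Unknown charset names and undecodable bytes both move on to
    the next candidate.
    """
    candidates = ([declared_charset] if declared_charset else []) + list(fallbacks)
    for charset in candidates:
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            continue
    # Last resort: keep what can be decoded and mark the rest.
    return raw.decode("utf-8", errors="replace")
```

Because iso-8859-1 maps every byte to a character, keeping it last in the fallback list guarantees a result, at the cost of possibly wrong characters for data that was really in another encoding.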

    50.35 The developer must be able to
@@ -203,18 +203,18 @@
other Asian languages where there are multiple character set encodings in common use, the server may need to attempt to do an auto-detection of the character set, because buggy browsers may
- submit form data in an unexpected alternate encoding.

    Output Character Set

    50.40 The output character set for a + submit form data in an unexpected alternate encoding.

    Output Character Set

    50.40 The output character set for a page request will be determined by default by the locale associated with the request (see requirement 20.0).

    50.50 It must be possible for a developer to manually override the output character set encoding for a request using an API function. -

    ACS Kernel Issues

    60.10 All OpenACS error messages must use +

    ACS Kernel Issues

    60.10 All OpenACS error messages must use the message catalog and the request locale to generate error messages for the appropriate locale. NOT IMPLEMENTED for 5.0.0.

    60.20 Web server error messages such as 404, 500, etc must also be delivered in the appropriate locale.

    60.30 Where files are written or read from disk, their filenames must use a character set and character -values which are safe for the underlying operating system.

    Templates

    70.0 For a given abstract URL, the +values which are safe for the underlying operating system.

    Templates

    70.0 For a given abstract URL, the designer may create multiple locale-specific template files (one per locale or language).

    70.10 For a given page request, the system must be able to select an appropriate locale-specific
@@ -226,27 +226,27 @@
any character set. The system must have a way to know which character set a template file contains, so it can properly process it.

    Formatting -Datasource Output in Templates

    70.50 The properties of a datasource +Datasource Output in Templates

    70.50 The properties of a datasource column may include a datatype so that the templating system can format the output for the current locale. The datatype is defined by a standard OpenACS datatype plus a format token or format string, for example: a date column might be specified as 'current_date:date LONG,' or 'current_date:date -"YYYY-Mon-DD"'

    Forms

    70.60 The forms API must support +"YYYY-Mon-DD"'

    Forms

    70.60 The forms API must support construction of locale-specific HTML form widgets, such as date entry widgets, and form validation of user input data for locale-specific data, such as dates or numbers. NOT IMPLEMENTED in 5.0.0.

    70.70 For forms which allow users to upload files, a standard method for a user to indicate the charset of a text file being uploaded must be provided.

    Design note: this presumably applies to uploading -data to the content repository as well

    Sorting and Searching

    80.10 Support API for correct collation +data to the content repository as well

    Sorting and Searching

    80.10 Support API for correct collation (sorting order) on lists of strings in a locale-dependent way.

    80.20 For the Tcl API, we will say that locale-dependent sorting will use Oracle SQL operations (i.e., we won't provide a Tcl API for this). We require a Tcl API function to return the correct incantation of NLS_SORT to use for a given locale with ORDER BY clauses in queries.
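The helper described in 80.20 could look roughly like the following. It is written in Python for illustration, not as the Tcl API itself; the function name and the locale-to-sort-name table are assumptions, while `NLSSORT(column, 'NLS_SORT = name')` is the standard Oracle form for a linguistic sort in an ORDER BY clause.

```python
import re

# Hypothetical mapping from locale to an Oracle linguistic sort name.
NLS_SORT_BY_LOCALE = {
    "de_DE": "GERMAN",
    "sv_SE": "SWEDISH",
    "fr_FR": "FRENCH",
}


def nls_order_by(column: str, locale: str) -> str:
    """Build an ORDER BY fragment that sorts a column for the given locale.

    Locales without an entry fall back to BINARY (plain byte order).
    The column name is checked so it can be interpolated safely.
    """
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", column):
        raise ValueError(f"not a simple column name: {column!r}")
    sort_name = NLS_SORT_BY_LOCALE.get(locale, "BINARY")
    return f"ORDER BY NLSSORT({column}, 'NLS_SORT = {sort_name}')"
```

A caller would append the returned fragment to a query such as `select last_name from persons`, keeping the locale-specific part out of the application code.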

    80.40 The system must handle full-text -search in any supported language.

    Time Zones

    90.10 Provide API support for specifying +search in any supported language.

    Time Zones

    90.10 Provide API support for specifying a time zone

    90.20 Provide an API for computing time and date operations which are aware of timezones. So for example a calendar module can properly synchronize items inserted into a
@@ -257,13 +257,13 @@
zone preference should be attached via a session or else UTC should be used to display every date and time.

    90.60 The default if we can't determine a time zone is to display all dates and times in some -universal time zone such as GMT.

    Database

    100.10 Since UTF8 strings can use up to +universal time zone such as GMT.

    Database

    100.10 Since UTF8 strings can use up to three (UCS2) or six (UCS4) bytes per character, make sure that column size declarations in the schema are large enough to accommodate required data (such as email addresses in Japanese). Since 5.0.0, this is covered in the database install instructions for both PostgreSQL and Oracle.
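The sizing concern in 100.10 comes from the gap between character count and encoded byte count. The short Python sketch below makes that gap visible; the helper names are made up for the example, and whether a column limit counts bytes or characters depends on the database and how it is configured.

```python
def utf8_byte_length(text: str) -> int:
    """Number of bytes the text occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_column(text: str, max_bytes: int) -> bool:
    """True if the UTF-8 encoding of text fits a byte-limited column."""
    return utf8_byte_length(text) <= max_bytes


# ASCII text: one byte per character.
ascii_len = utf8_byte_length("mail")
# Japanese katakana: three bytes per character in UTF-8.
kana_len = utf8_byte_length("\u30e1\u30fc\u30eb")
```

Here `ascii_len` is 4 for four characters, while `kana_len` is 9 for three characters, so a column sized for character counts alone can be too small.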

    Email and -Messaging

    When sending an email message, just as when delivering the +Messaging

    When sending an email message, just as when delivering the content of a web page over an HTTP connection, it is necessary to be able to specify what character set encoding to use.

    110.10 The email message sending API
@@ -286,10 +286,10 @@
(http://www.ietf.org/rfc/rfc3282.txt) and other RFCs.

  • Extreme Use case: Web site has a default language of Danish. A forum is set up for Swedes, so the forum has a package_id and a language setting of Swedish. A poster posts to the forum in Russian (is this possible?). A user is subscribed to the forum and has a language preference of Chinese. What should be in the message body and message subject? INCOMPLETE - The mail functions in acs_mail and acs_mail_lite -are not internationalized.

  • Incoming mail should be localized.

  • Implementation Notes

    +are not internationalized.

  • Incoming mail should be localized.

  • Implementation Notes

    Because globalization touches many different parts of the system, we want to reduce the implementation risk by breaking the implementation into phases. -

    Revision History

    Document Revision # | Action Taken, Notes | When? | By Whom?
    1 | Updated with results of MIT-sponsored i18n work at Collaboraid. | 14 Aug 2003 | Joel Aufrecht
    0.4 | converting from HTML to DocBook and importing the document to the OpenACS +

    Revision History

    Document Revision # | Action Taken, Notes | When? | By Whom?
    1 | Updated with results of MIT-sponsored i18n work at Collaboraid. | 14 Aug 2003 | Joel Aufrecht
    0.4 | converting from HTML to DocBook and importing the document to the OpenACS kernel documents. This was done as a part of the internationalization of OpenACS and .LRN for the Heidelberg University in Germany | 12 September 2002 | Peter Marklund
    0.3 | comments from Christian | 1/14/2000 | Henry Minsky
    0.2 | Minor typos fixed, clarifications to wording | 11/14/2000 | Henry Minsky
    0.1 | Creation | 11/08/2000 | Henry Minsky