Index: openacs-4/packages/acs-core-docs/www/xml/kernel/i18n-requirements.xml =================================================================== RCS file: /usr/local/cvsroot/openacs-4/packages/acs-core-docs/www/xml/kernel/i18n-requirements.xml,v diff -u -r1.6 -r1.7 --- openacs-4/packages/acs-core-docs/www/xml/kernel/i18n-requirements.xml 12 Jul 2009 01:08:30 -0000 1.6 +++ openacs-4/packages/acs-core-docs/www/xml/kernel/i18n-requirements.xml 27 Oct 2014 16:39:32 -0000 1.7 @@ -1,803 +1,803 @@ - - -%myvars; -]> - - OpenACS Internationalization Requirements - - - by Henry Minsky, - Yon Feldman, - Lars Pind, - Peter Marklund, - Christian Hvid, - and others. - - - - Introduction - - - This document describes the requirements for functionality in - the OpenACS platform to support globalization of the core and optional - modules. The goal is to make it possible to support delivery of - applications which work properly in multiple locales with the - lowest development and maintenance cost. - - - - - Definitions - - - - internationalization (i18n) - - - The provision within a computer program of the - capability of making itself adaptable to the requirements of different - native languages, local customs and coded character sets. - - - - - - locale - - - The definition of the subset of a user's environment that depends on - language and cultural conventions. - - - - - - localization (L10n) - - - The process of establishing information within a computer system - specific to the operation of particular native languages, local - customs and coded character sets. - - - - - - globalization - - - A product development approach which ensures that software products - are usable in the worldwide markets through a combination of - internationalization and localization. 
- - - - - - - - - - Vision Statement - -The Mozilla project suggests keeping two catchy phrases in -mind when thinking about globalization: - - - -One code base for the world - - - -English is just another language - - - -Building an application often involves making a number of -assumptions on the part of the developers which depend on their own -culture. These include constant strings in the user interface and -system error messages, names of countries, cities, order of given -and family names for people, syntax of numeric and date strings and -collation order of strings. - -The OpenACS should be able to operate in languages and regions -beyond US English. The goal of OpenACS Globalization is to provide a -clean and efficient way to factor out the locale dependent -functionality from our applications, in order to be able to easily -swap in alternate localizations. - -This in turn will reduce redundant, costly, and error prone -rework when targeting the toolkit or applications built with the -toolkit to another locale. - -The cost of porting the OpenACS to another locale without some -kind of globalization support would be large and ongoing, since -without a mechanism to incorporate the locale-specific changes -cleanly back into the code base, it would require making a new fork -of the source code for each locale. - - - - - -System/Application Overview - -A globalized application will perform some or all of the -following steps to handle a page request for a specific -locale: - - - -Decide what the target locale is for an incoming page -request - - - -Decide which character set encoding the output should be -delivered in - - - -If a script file to handle the request needs to be loaded -from disk, determine if a character set conversion needs to be -performed when loading the script - - - -If needed, locale-specific resources are fetched. These can -include text, graphics, or other resources that would vary with the -target locale. 
- - - -If content data is fetched from the database, check for -locale-specific versions of the data (e.g. country names). - - - -Source code should use a message catalog API to translate -constant strings in the code to the target locale - - - -Perform locale-specific linguistic sorting on data if -needed - - - -If the user submitted form input data, decide what character -set encoding conversion if any is needed. Parse locale-specific -quantities if needed (number formats, date formats). - - - -If templating is being used, select correct locale-specific -template to merge with content - - - -Format output data quantities in locale-specific manner -(date, time, numeric, currency). If templating is being used, this -may be done either before and/or after merging the data with a -template. - - - -Since the internationalization APIs may potentially be used -on every page in an application, the overhead for adding -internationalization to a module or application must not cause a -significant time delay in handling page requests. - -In many cases there are facilities in Oracle to perform -various localization functions, and also there are facilities in -Java which we will want to move to. So the design to meet the -requirements will tend to rely on these capabilities, or close -approximations to them where possible, in order to make it easier -to maintain Tcl and Java OpenACS versions. - -Use-cases and User-scenarios - -Here are the cases that we need to be able to handle -efficiently: - - - -A developer needs to author a web site/application in a -language besides English, and possibly a character set besides -ISO-8859-1. This includes the operation of the OpenACS itself, i.e., -navigation, admin pages for modules, error messages, as well as -additional modules or content supplied by the web site -developer. - -What do they need to modify to make this work? Can their -localization work be easily folded in to future releases of -OpenACS? 
- - - -A developer needs to author a web site which operates in -multiple languages simultaneously. For example, www.un.org with -content and navigation in multiple languages. - -The site would have an end-user visible UI to support these -languages, and the content management system must allow articles to -be posted in these languages. In some cases it may be necessary to -make the modules' admin UI's operate in more than one -supported language, while in other cases the backend admin -interface can operate in a single language. - - - -A developer is writing a new module, and wants to make it -easy for someone to localize it. There should be a clear path to -author the module so that future developers can easily add support -for other locales. This would include support for creating -resources such as message catalogs, non-text assets such as -graphics, and use of templates which help to separate application -logic from presentation. - - - -Competitive -Analysis - -Other application servers: ATG Dyanmo, Broadvision, Vignette, -... ? Anyone know how they deal with i18n ? 
- -Related -Links - - - -System/Package "coversheet" - where all -documentation for this software is linked off of - - - - Design document - - - - Developer's guide - - - - User's guide - - - - Other-cool-system-related-to-this-one -document -LI18NUX -2000 Globalization Specification: -http://www.li18nux.net/ - -Mozilla -i18N Guidelines: -http://www.mozilla.org/docs/refList/i18n/l12yGuidelines.html - -ISO -639:1988 Code for the representation of names of languages -http://sunsite.berkeley.edu/amher/iso_639.html - -ISO 3166-1:1997 -Codes for the representation of names of countries and their -subdivisions Part 1: Country codes -http://www.niso.org/3166.html - -IANA -Registry of Character Sets - - - -Test plan - - - -Competitive system(s) - - - -Requirements - -Because the requirements for globalization affect many areas -of the system, we will break up the requirements into phases, with -a base required set of features, and then stages of increasing -functionality. - -Locales - -10.0 -A standard representation of locale will be used throughout -the system. A locale refers to a language and territory, and is -uniquely identified by a combination of ISO language and ISO -country abbreviations. - -
-See -Content -Repository Requirement 100.20 - -10.10 Provide a consistent -representation and API for creating and referencing a locale - -10.20 There will be a Tcl library of -locale-aware formatting and parsing functions for numbers, dates -and times. Note that Java has built-in support for these -already. - -10.30 For each locale there will be -default date, number and currency formats. Currency i18n is -NOT IMPLEMENTED for 5.0.0. - - 10.40 Administrators can upgrade their -servers to use new locales via the APM. NOT IMPLEMENTED in -5.0.0; current workaround is to get an XML file and load it -manually. - - -
- -
Associating a Locale with a Request - -20.0 -The request processor must have a mechanism for associating a -locale with each request. This locale is then used to select the -appropriate template for a request, and will also be passed as the -locale argument to the message catalog or locale-specific -formatting functions. - -
-20.10 The locale for a request should be -computed by the following method, in descending order of -priority: - - - -get locale associated with subsite or package id - - - -get locale from user preference - - - -get locale from site wide default - -20.20 An API will be provided for -getting the current request locale from the -ad_conn structure. - - -
- -
- -Resource Bundles / Content Repository - -30.0 -A mechanism must be provided for a developer to group a set -of arbitrary content resources together, keyed by a unique -identifier and a locale. - -For example, what approaches could be used to implement a -localizable nav-bar mechanism for a site? A navigation bar might be -made up of a set of text strings and graphics, where the graphics -themselves are locale-specific, such as images of English or -Japanese text (as on www.un.org). It should be easy to -specify alternate configurations of text and graphics to lay out -the page for different locales. - -Design note: Alternative mechanisms to implement this -functionality might include using templates, Java ResourceBundles, -content-item containers in the Content Repository, or some -convention assigning a common prefix to key strings in the message -catalog. - -Message Catalog for String Translation - -40.0 -A message catalog facility will provide a database of -translations for constant strings for multilingual applications. It -must support the following: - -
-40.10 Each message will be referenced via -a unique key. - -40.20 The key for a message will have -some hierarchical structure to it, so that sets of messages can be -grouped with respect to a module name or package path. - -40.30 The API for lookup of a message -will take a locale and message key as arguments, and return the -appropriate translation of that message for the specified -locale. - -40.40 The API for lookup of a message -will accept an optional default string which can be used if the -message key is not found in the catalog. This lets the developer -get code working and tested in a single language before having to -initialize or update a message catalog. - -40.50 For use within templates, custom -tags which invoke the message lookup API will be provided. - -40.60 Provide a method for importing and -exporting a flat file of translation strings, in order to make it -as easy as possible to create and modify message translations in -bulk without having to use a web interface. - -40.70 Since translations may be in -different character sets, there must be provision for writing and -reading catalog files in different character sets. A mechanism must -exist for identifying the character set of a catalog file before -reading it. - -40.80 There should be a mechanism for -tracking dependencies in the message catalog, so that if a string -is modified, the other translations of that string can be flagged -as needing update. - -40.90 The message lookup must be as -efficient as possible so as not to slow down the delivery of -pages. - -
- -
 Character Set Encoding - -Character Sets -50.0 A locale will have a primary -associated character set which is used to encode text in the -language. When given a locale, we can query the system for the -associated character set to use. - -The assumption is that we are going to use Unicode in our -database to hold all text data. Our current programming -environments (Tcl/Oracle or Java/Oracle) operate on Unicode data -internally. However, since Unicode is not yet commonly used in -browsers and authoring tools, the system must be able to read and -write other character sets. In particular, conversions to and from -Unicode will need to be explicitly performed at the following -times: - - - -Loading source files (.tcl or .adp) or content files from the -filesystem - - - -Accepting form input data from users - - - -Delivering text output to a browser - - - -Composing an email message - - - -Writing data to the filesystem - - -Acs-templating does the following. - - - When the acs-templating package opens an ADP or TCL file, it assumes the file is iso-8859-1. If the output charset (OutputCharset) in the AOLserver config file is set, then acs-templating assumes it's that charset. -Writing Files - - - When the acs-templating package writes an ADP or - TCL file, it assumes the file is iso-8859-1. If the output - charset (OutputCharset) in the AOLserver config file is set, - then acs-templating assumes it's that charset. - - - - - Tcl Source File Character Set -
- - There are two classes of Tcl files loaded by the system; - library files loaded at server startup, and page script files, - which are run on each page request. - - Should we require all Tcl files be stored as UTF8? - That seems too much of a burden on developers. - - 50.10 Tcl library files can be authored - in any character set. The system must have a way to determine the - character set before loading the files, probably from the - filename. - - 50.20 Tcl page script files can be - authored in any character set. The system must have a way to - determine the character set before loading the files, probably from - the filename. -
-
- - - Submitted Form Data Character Set - - 50.30 Data which is submitted with an - HTTP request using a GET or POST method may be in any character - set. The system must be able to determine the encoding of the form - data and convert it to Unicode on demand. - - 50.35 The developer must be able to - override the default system choice of character set when parsing - and validating user form data. INCOMPLETE - form - widgets in acs-templating/tcl/date-procs.tcl are not - internationalized. Also, acs-templating's UI needs to be - internationalized by replacing all user-visible strings with - message keys. - - - 50.30.10 In Japan and some - other Asian locales where there are multiple character set - encodings in common use, the server may need to attempt - auto-detection of the character set, because buggy browsers may - submit form data in an unexpected alternate encoding. - - - - Output Character Set - -
- 50.40 The output character set for a - page request will be determined by default by the locale associated - with the request (see requirement 20.0). - - 50.50 It must be possible for a - developer to manually override the output character set encoding - for a request using an API function. - - -
- -
-
- -ACS Kernel Issues - -
-60.10 All OpenACS error messages must use -the message catalog and the request locale to generate error -messages for the appropriate locale. NOT IMPLEMENTED for 5.0.0. - -60.20 Web server error messages such as -404, 500, etc. must also be delivered in the appropriate -locale. - -60.30 Where files are written or read -from disk, their filenames must use a character set and character -values which are safe for the underlying operating system. -
- -
Templates - -
-70.0 For a given abstract URL, the -designer may create multiple locale-specific template files -(one per locale or language). - -70.10 For a given page request, the -system must be able to select an appropriate locale-specific -template file to use. The request locale is computed as per -requirement 20.0. - -70.20 A template file may be created for -a partial locale (language only, without a territory), and the -request processor should be able to find the closest match for the -current request locale. - -70.30 A template file may be created in -any character set. The system must have a way to know which -character set a template file contains, so it can properly process -it. -
-Formatting -Datasource Output in Templates - -70.50 The properties of a datasource -column may include a datatype so that the templating system can -format the output for the current locale. The datatype is defined -by a standard OpenACS datatype plus a format token or format string, -for example: a date column might be specified as -'current_date:date LONG,' or 'current_date:date -"YYYY-Mon-DD"' - -Forms - -
-70.60 The forms API must support -construction of locale-specific HTML form widgets, such as date -entry widgets, and form validation of user input data for -locale-specific data, such as dates or numbers. NOT -IMPLEMENTED in 5.0.0. - -70.70 For forms which allow users to -upload files, a standard method for a user to indicate the charset -of a text file being uploaded must be provided. - -Design note: this presumably applies to uploading -data to the content repository as well -
- -
Sorting and Searching - -
-80.10 Support API for correct collation -(sorting order) on lists of strings in locale-dependent way. - -80.20 For the Tcl API, we will say that -locale-dependent sorting will use Oracle SQL operations (i.e., we -won't provide a Tcl API for this). We require a Tcl API -function to return the correct incantation of NLS_SORT to use for a -given locale with ORDER BY clauses in -queries. - -80.40 The system must handle full-text -search in any supported language. -
- -
Time Zones - -
-90.10 Provide API support for specifying -a time zone - -90.20 Provide an API for computing time -and date operations which are aware of time zones. So for example a -calendar module can properly synchronize items inserted into a -calendar from users in different time zones using their own local -times. - -90.30 Store all dates and times in -the universal time zone, UTC. - -90.40 For registered users, a time -zone preference should be stored. - -90.50 For a non-registered user a time -zone preference should be attached via a session or else UTC should -be used to display every date and time. - -90.60 The default if we can't -determine a time zone is to display all dates and times in some -universal time zone such as GMT. -
- -
Database - -
-100.10 Since UTF8 strings can use up to -three (UCS2) or six (UCS4) bytes per character, make sure that -column size declarations in the schema are large enough to -accommodate required data (such as email addresses in -Japanese). Since 5.0.0, this is covered in the database -install instructions for both PostgreSQL and Oracle. -
- -
- - - - Email and Messaging - - When sending an email message, just as when delivering the - content in web page over an HTTP connection, it is necessary to be - able to specify what character set encoding to use, defaulting to UTF-8. - -
- 110.10 The email message sending API - will allow for a character set encoding to be specified. - - 110.20 The email accepting API - allows for character set to be parsed correctly (the message has a MIME - character set content type header) -
- - Mail is not internationalized. The following issues must be addressed. - - - - Many functions still call ns_sendmail. This - means that there are different end points for sending - mail. This should be changed to use the acs-mail-lite API instead. - - - - - Consumers of email services must do - the following: Determine the appropriate language or - languages to use for the message subject and message body - and localize them (as in notifications). - - - - Extreme Use case: Web site has a default language of Danish. A forum is set up for Swedes, so the forum has a package_id and a language setting of Swedish. A poster posts to the forum in Russian (is this possible?). A user is subscribed to the forum and has a language preference of Chinese. What should be in the message body and message subject? - - Incoming mail should be localized. - - - - - - -
- - Implementation Notes - - Because globalization touches many different parts of the system, - we want to reduce the implementation risk by breaking the - implementation into phases. - - - - - Revision History - - - - - - Document Revision # - Action Taken, Notes - When? - By Whom? - - - - 1 - Updated with results of MIT-sponsored i18n work at Collaboraid. - 14 Aug 2003 - Joel Aufrecht - - - - 0.4 - converting from HTML to DocBook and importing the document to the OpenACS - kernel documents. This was done as a part of the internationalization of - OpenACS and .LRN for the Heidelberg University in Germany - 12 September 2002 - Peter Marklund - - - - 0.3 - comments from Christian - 1/14/2000 - Henry Minsky - - - - 0.2 - Minor typos fixed, clarifications to wording - 11/14/2000 - Henry Minsky - - - - 0.1 - Creation - 11/08/2000 - Henry Minsky - - - - - - - - -
- - + + +%myvars; +]> + + OpenACS Internationalization Requirements + + + by Henry Minsky, + Yon Feldman, + Lars Pind, + Peter Marklund, + Christian Hvid, + and others. + + + + Introduction + + + This document describes the requirements for functionality in + the OpenACS platform to support globalization of the core and optional + modules. The goal is to make it possible to support delivery of + applications which work properly in multiple locales with the + lowest development and maintenance cost. + + + + + Definitions + + + + internationalization (i18n) + + + The provision within a computer program of the + capability of making itself adaptable to the requirements of different + native languages, local customs and coded character sets. + + + + + + locale + + + The definition of the subset of a user's environment that depends on + language and cultural conventions. + + + + + + localization (L10n) + + + The process of establishing information within a computer system + specific to the operation of particular native languages, local + customs and coded character sets. + + + + + + globalization + + + A product development approach which ensures that software products + are usable in the worldwide markets through a combination of + internationalization and localization. + + + + + + + + + + Vision Statement + +The Mozilla project suggests keeping two catchy phrases in +mind when thinking about globalization: + + + +One code base for the world + + + +English is just another language + + + +Building an application often involves making a number of +assumptions on the part of the developers which depend on their own +culture. These include constant strings in the user interface and +system error messages, names of countries, cities, order of given +and family names for people, syntax of numeric and date strings and +collation order of strings. + +The OpenACS should be able to operate in languages and regions +beyond US English. 
The goal of OpenACS Globalization is to provide a +clean and efficient way to factor out the locale dependent +functionality from our applications, in order to be able to easily +swap in alternate localizations. + +This in turn will reduce redundant, costly, and error prone +rework when targeting the toolkit or applications built with the +toolkit to another locale. + +The cost of porting the OpenACS to another locale without some +kind of globalization support would be large and ongoing, since +without a mechanism to incorporate the locale-specific changes +cleanly back into the code base, it would require making a new fork +of the source code for each locale. + + + + + +System/Application Overview + +A globalized application will perform some or all of the +following steps to handle a page request for a specific +locale: + + + +Decide what the target locale is for an incoming page +request + + + +Decide which character set encoding the output should be +delivered in + + + +If a script file to handle the request needs to be loaded +from disk, determine if a character set conversion needs to be +performed when loading the script + + + +If needed, locale-specific resources are fetched. These can +include text, graphics, or other resources that would vary with the +target locale. + + + +If content data is fetched from the database, check for +locale-specific versions of the data (e.g. country names). + + + +Source code should use a message catalog API to translate +constant strings in the code to the target locale + + + +Perform locale-specific linguistic sorting on data if +needed + + + +If the user submitted form input data, decide what character +set encoding conversion if any is needed. Parse locale-specific +quantities if needed (number formats, date formats). + + + +If templating is being used, select correct locale-specific +template to merge with content + + + +Format output data quantities in locale-specific manner +(date, time, numeric, currency). 
If templating is being used, this +may be done either before and/or after merging the data with a +template. + + + +Since the internationalization APIs may potentially be used +on every page in an application, the overhead for adding +internationalization to a module or application must not cause a +significant time delay in handling page requests. + +In many cases there are facilities in Oracle to perform +various localization functions, and also there are facilities in +Java which we will want to move to. So the design to meet the +requirements will tend to rely on these capabilities, or close +approximations to them where possible, in order to make it easier +to maintain Tcl and Java OpenACS versions. + +Use-cases and User-scenarios + +Here are the cases that we need to be able to handle +efficiently: + + + +A developer needs to author a web site/application in a +language besides English, and possibly a character set besides +ISO-8859-1. This includes the operation of the OpenACS itself, i.e., +navigation, admin pages for modules, error messages, as well as +additional modules or content supplied by the web site +developer. + +What do they need to modify to make this work? Can their +localization work be easily folded in to future releases of +OpenACS? + + + +A developer needs to author a web site which operates in +multiple languages simultaneously. For example, www.un.org with +content and navigation in multiple languages. + +The site would have an end-user visible UI to support these +languages, and the content management system must allow articles to +be posted in these languages. In some cases it may be necessary to +make the modules' admin UI's operate in more than one +supported language, while in other cases the backend admin +interface can operate in a single language. + + + +A developer is writing a new module, and wants to make it +easy for someone to localize it. 
There should be a clear path to +author the module so that future developers can easily add support +for other locales. This would include support for creating +resources such as message catalogs, non-text assets such as +graphics, and use of templates which help to separate application +logic from presentation. + + + +Competitive +Analysis + +Other application servers: ATG Dyanmo, Broadvision, Vignette, +... ? Anyone know how they deal with i18n ? + +Related +Links + + + +System/Package "coversheet" - where all +documentation for this software is linked off of + + + + Design document + + + + Developer's guide + + + + User's guide + + + + Other-cool-system-related-to-this-one +document +LI18NUX +2000 Globalization Specification: +http://www.li18nux.net/ + +Mozilla +i18N Guidelines: +http://www.mozilla.org/docs/refList/i18n/l12yGuidelines.html + +ISO +639:1988 Code for the representation of names of languages +http://sunsite.berkeley.edu/amher/iso_639.html + +ISO 3166-1:1997 +Codes for the representation of names of countries and their +subdivisions Part 1: Country codes +http://www.niso.org/3166.html + +IANA +Registry of Character Sets + + + +Test plan + + + +Competitive system(s) + + + +Requirements + +Because the requirements for globalization affect many areas +of the system, we will break up the requirements into phases, with +a base required set of features, and then stages of increasing +functionality. + +Locales + +10.0 +A standard representation of locale will be used throughout +the system. A locale refers to a language and territory, and is +uniquely identified by a combination of ISO language and ISO +country abbreviations. + +
+See +Content +Repository Requirement 100.20 + +10.10 Provide a consistent +representation and API for creating and referencing a locale + +10.20 There will be a Tcl library of +locale-aware formatting and parsing functions for numbers, dates +and times. Note that Java has built-in support for these +already. + +10.30 For each locale there will be +default date, number and currency formats. Currency i18n is +NOT IMPLEMENTED for 5.0.0. + + 10.40 Administrators can upgrade their +servers to use new locales via the APM. NOT IMPLEMENTED in +5.0.0; current workaround is to get an XML file and load it +manually. + + +
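Editorial illustration for requirements 10.0 and 10.10: a minimal sketch of a locale value built from a language code and a territory code. The `Locale` class and the `language_TERRITORY` string form (for example `en_US`) are assumptions made for this sketch; the requirements only say that a locale combines ISO language and ISO country abbreviations. Python is used purely for illustration and is not the toolkit's implementation language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locale:
    """Hypothetical locale value: two-letter language plus two-letter territory."""

    language: str
    territory: str

    @classmethod
    def parse(cls, tag: str) -> "Locale":
        # Assumed string form: "<language>_<territory>", e.g. "en_US".
        language, sep, territory = tag.partition("_")
        if not sep or len(language) != 2 or len(territory) != 2:
            raise ValueError(f"not a language_TERRITORY locale: {tag!r}")
        # Normalize case so that equal locales compare equal.
        return cls(language.lower(), territory.upper())

    def __str__(self) -> str:
        return f"{self.language}_{self.territory}"
```

Normalizing case at parse time means the same locale always yields the same key, which matters once the locale is used to look up templates or messages.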
+ +
Associating a Locale with a Request + +20.0 +The request processor must have a mechanism for associating a +locale with each request. This locale is then used to select the +appropriate template for a request, and will also be passed as the +locale argument to the message catalog or locale-specific +formatting functions. + +
+20.10 The locale for a request should be +computed by the following method, in descending order of +priority: + + + +get locale associated with subsite or package id + + + +get locale from user preference + + + +get locale from site wide default + +20.20 An API will be provided for +getting the current request locale from the +ad_conn structure. + + +
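Editorial illustration for requirement 20.10: a minimal sketch of the descending-priority lookup, assuming each source yields either a locale string or nothing. The function name and its parameters are hypothetical; this does not describe the actual request processor.

```python
from typing import Optional


def resolve_request_locale(
    package_locale: Optional[str],
    user_locale: Optional[str],
    site_default: str,
) -> str:
    """Return the first locale that is set, in the order given by 20.10."""
    # 1. locale associated with the subsite or package id
    # 2. locale from the user preference
    for candidate in (package_locale, user_locale):
        if candidate:
            return candidate
    # 3. site-wide default, assumed here to be always configured
    return site_default
```

For example, a forum package configured for `sv_SE` wins over a user preference of `zh_CN`, while a package with no locale setting falls through to the user preference.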
+ +
+ +Resource Bundles / Content Repository + +30.0 +A mechanism must be provided for a developer to group a set +of arbitrary content resources together, keyed by a unique +identifier and a locale. + +For example, what approaches could be used to implement a +localizable nav-bar mechanism for a site? A navigation bar might be +made up of a set of text strings and graphics, where the graphics +themselves are locale-specific, such as images of English or +Japanese text (as on www.un.org). It should be easy to +specify alternate configurations of text and graphics to lay out +the page for different locales. + +Design note: Alternative mechanisms to implement this +functionality might include using templates, Java ResourceBundles, +content-item containers in the Content Repository, or some +convention assigning a common prefix to key strings in the message +catalog. + +Message Catalog for String Translation + +40.0 +A message catalog facility will provide a database of +translations for constant strings for multilingual applications. It +must support the following: + +
+40.10 Each message will be referenced via +a unique key. + +40.20 The key for a message will have +some hierarchical structure to it, so that sets of messages can be +grouped with respect to a module name or package path. + +40.30 The API for lookup of a message +will take a locale and message key as arguments, and return the +appropriate translation of that message for the specified +locale. + +40.40 The API for lookup of a message +will accept an optional default string which can be used if the +message key is not found in the catalog. This lets the developer +get code working and tested in a single language before having to +initialize or update a message catalog. + +40.50 For use within templates, custom +tags which invoke the message lookup API will be provided. + +40.60 Provide a method for importing and +exporting a flat file of translation strings, in order to make it +as easy as possible to create and modify message translations in +bulk without having to use a web interface. + +40.70 Since translations may be in +different character sets, there must be provision for writing and +reading catalog files in different character sets. A mechanism must +exist for identifying the character set of a catalog file before +reading it. + +40.80 There should be a mechanism for +tracking dependencies in the message catalog, so that if a string +is modified, the other translations of that string can be flagged +as needing update. + +40.90 The message lookup must be as +efficient as possible so as not to slow down the delivery of +pages. + +
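Editorial illustration for requirements 40.10 through 40.40: a minimal in-memory sketch of a catalog keyed by locale and message key, with an optional default string. The `MessageCatalog` class, the dotted `package.key` convention, and the choice to raise `KeyError` when no default is given are all assumptions of this sketch, not the documented API.

```python
from typing import Dict, List, Optional, Tuple


class MessageCatalog:
    """Hypothetical catalog mapping (locale, key) to a translated string."""

    def __init__(self) -> None:
        self._messages: Dict[Tuple[str, str], str] = {}

    def register(self, locale: str, key: str, text: str) -> None:
        self._messages[(locale, key)] = text

    def lookup(self, locale: str, key: str, default: Optional[str] = None) -> str:
        # 40.30: look up by locale and key; 40.40: fall back to the default.
        try:
            return self._messages[(locale, key)]
        except KeyError:
            if default is not None:
                return default
            raise  # assumption: a missing key without a default is an error

    def keys_for_package(self, package: str) -> List[str]:
        # 40.20: assumed hierarchy "package.key" lets messages be grouped.
        prefix = package + "."
        return sorted({k for (_, k) in self._messages if k.startswith(prefix)})
```

The default argument is what lets a page be written and tested in one language first: `catalog.lookup("sv_SE", "forums.post_title", default="Title")` returns `"Title"` until a Swedish entry is registered. A dictionary lookup is also cheap, in the spirit of 40.90.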
+ +
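The lookup behaviour in 40.10 through 40.40 can be illustrated with a minimal sketch. It is written in Python for brevity and is not the OpenACS API: the in-memory `CATALOG`, the function name `lookup`, and the key `acs-kernel.welcome` are all hypothetical, chosen only to show a hierarchical key, a locale argument, and an optional default string.

```python
from typing import Optional

# Hypothetical in-memory catalog: (locale, message key) -> translated text.
# Keys are hierarchical ("package.message"), as 40.20 asks for.
CATALOG = {
    ("en_US", "acs-kernel.welcome"): "Welcome",
    ("de_DE", "acs-kernel.welcome"): "Willkommen",
}


def lookup(locale: str, key: str, default: Optional[str] = None) -> str:
    """Return the translation of `key` for `locale` (40.30).

    If the key is missing for that locale, return `default` when one is
    given (40.40); otherwise raise KeyError so the gap is noticed.
    """
    text = CATALOG.get((locale, key))
    if text is not None:
        return text
    if default is not None:
        return default
    raise KeyError(f"no message {key!r} for locale {locale!r}")


print(lookup("de_DE", "acs-kernel.welcome"))
print(lookup("fr_FR", "acs-kernel.welcome", default="Welcome"))
```

A dictionary keyed on the locale and key pair keeps each lookup to a single hash probe, which is one simple way to approach the efficiency goal in 40.90.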
Character Set Encoding + +Character Sets +50.0 A locale will have a primary +associated character set which is used to encode text in the +language. When given a locale, we can query the system for the +associated character set to use. + +The assumption is that we are going to use Unicode in our +database to hold all text data. Our current programming +environments (Tcl/Oracle or Java/Oracle) operate on Unicode data +internally. However, since Unicode is not yet commonly used in +browsers and authoring tools, the system must be able to read and +write other character sets. In particular, conversions to and from +Unicode will need to be explicitly performed at the following +times: + + + +Loading source files (.tcl or .adp) or content files from the +filesystem + + + +Accepting form input data from users + + + +Delivering text output to a browser + + + +Composing an email message + + + +Writing data to the filesystem + + +Acs-templating does the following. + + + When the acs-templating package opens an ADP or Tcl file, it assumes the file is iso-8859-1. If the output charset (OutputCharset) in the AOLserver config file is set, then acs-templating assumes it's that charset. +Writing Files + + + When the acs-templating package writes an ADP or + Tcl file, it assumes the file is iso-8859-1. If the output + charset (OutputCharset) in the AOLserver config file is set, + then acs-templating assumes it's that charset. + + + + + Tcl Source File Character Set +
+ + There are two classes of Tcl files loaded by the system; + library files loaded at server startup, and page script files, + which are run on each page request. + + Should we require all Tcl files be stored as UTF8? + That seems too much of a burden on developers. + + 50.10 Tcl library files can be authored + in any character set. The system must have a way to determine the + character set before loading the files, probably from the + filename. + + 50.20 Tcl page script files can be + authored in any character set. The system must have a way to + determine the character set before loading the files, probably from + the filename. +
+
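Requirements 50.10 and 50.20 leave open how the character set is derived from the filename. The sketch below, in Python, assumes a purely hypothetical convention in which the charset is the second-to-last dotted component of the name (for example `index.euc-jp.tcl`) and falls back to iso-8859-1 otherwise. Neither the convention nor the function name comes from OpenACS.

```python
import codecs
from pathlib import PurePath

DEFAULT_CHARSET = "iso-8859-1"


def charset_from_filename(filename: str, default: str = DEFAULT_CHARSET) -> str:
    """Guess a source file's charset from its name.

    Assumed (hypothetical) convention: "name.<charset>.tcl". Any component
    that is not a codec known to Python is ignored and the default is used.
    """
    suffixes = PurePath(filename).suffixes  # e.g. ['.euc-jp', '.tcl']
    if len(suffixes) >= 2:
        candidate = suffixes[-2].lstrip(".")
        try:
            codecs.lookup(candidate)  # raises LookupError for unknown names
            return candidate
        except LookupError:
            pass
    return default


def decode_source(raw: bytes, filename: str) -> str:
    """Convert the raw bytes of a source file to Unicode text."""
    return raw.decode(charset_from_filename(filename))
```

With this convention a plain `index.tcl` is read as iso-8859-1, while `index.euc-jp.tcl` is decoded as EUC-JP before the interpreter sees it.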
+ + + Submitted Form Data Character Set + + 50.30 Data which is submitted with an + HTTP request using a GET or POST method may be in any character + set. The system must be able to determine the encoding of the form + data and convert it to Unicode on demand. + + 50.35 The developer must be able to + override the default system choice of character set when parsing + and validating user form data. INCOMPLETE - form + widgets in acs-templating/tcl/date-procs.tcl are not + internationalized. Also, acs-templating's UI needs to be + internationalized by replacing all user-visible strings with + message keys. + + + 50.30.10 For Japanese and some + other Asian languages where there are multiple character set + encodings in common use, the server may need to attempt to do an + auto-detection of the character set, because buggy browsers may + submit form data in an unexpected alternate encoding. + + + + Output Character Set + +
+ 50.40 The output character set for a + page request will be determined by default by the locale associated + with the request (see requirement 20.0). + + 50.50 It must be possible for a + developer to manually override the output character set encoding + for a request using an API function. + + +
+ +
+
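The interplay of 50.40 and 50.50 (a locale-derived default that a developer can override per request) can be sketched as follows. The code is Python, and the `LOCALE_CHARSETS` table, the function names, and the utf-8 fallback are illustrative assumptions rather than anything defined by OpenACS.

```python
from typing import Optional, Tuple

# Illustrative mapping from locale to its primary character set (see 50.0).
LOCALE_CHARSETS = {
    "en_US": "iso-8859-1",
    "ja_JP": "shift_jis",
    "ru_RU": "koi8-r",
}


def output_charset(locale: str, override: Optional[str] = None,
                   default: str = "utf-8") -> str:
    """Pick the output charset: explicit override first (50.50),
    then the locale's primary charset (50.40), then a default."""
    if override:
        return override
    return LOCALE_CHARSETS.get(locale, default)


def encode_page(text: str, locale: str,
                override: Optional[str] = None) -> Tuple[str, bytes]:
    """Encode Unicode page text for delivery, returning (charset, bytes).

    Characters the charset cannot represent become numeric character
    references, which browsers render correctly in HTML.
    """
    charset = output_charset(locale, override)
    return charset, text.encode(charset, errors="xmlcharrefreplace")
```

The returned charset is what would be announced in the response's Content-Type header alongside the encoded bytes.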
+ +ACS Kernel Issues + +
+60.10 All OpenACS error messages must use +the message catalog and the request locale to generate error +messages for the appropriate locale. NOT IMPLEMENTED for 5.0.0. + +60.20 Web server error messages such as +404, 500, etc. must also be delivered in the appropriate +locale. + +60.30 Where files are written or read +from disk, their filenames must use a character set and character +values which are safe for the underlying operating system. +
+ +
Templates + +
+70.0 For a given abstract URL, the +designer may create multiple locale-specific template files +(one per locale or language). + +70.10 For a given page request, the +system must be able to select an appropriate locale-specific +template file to use. The request locale is computed as per +requirement 20.0. + +70.20 A template file may be created for +a partial locale (language only, without a territory), and the +request processor should be able to find the closest match for the +current request locale. + +70.30 A template file may be created in +any character set. The system must have a way to know which +character set a template file contains, so it can properly process +it. +
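The closest-match rule in 70.10 and 70.20 can be illustrated with a short Python sketch. The file naming scheme (`index.de_DE.adp`, `index.de.adp`, `index.adp`) and the function names are assumptions made for this example; the requirements do not fix a naming convention.

```python
from typing import Iterable, List, Optional


def candidate_templates(stem: str, locale: str, ext: str = "adp") -> List[str]:
    """List template names from most to least specific:
    full locale, then language only (70.20), then the unlocalized file."""
    language = locale.split("_")[0]
    names = [f"{stem}.{locale}.{ext}"]
    if language != locale:
        names.append(f"{stem}.{language}.{ext}")
    names.append(f"{stem}.{ext}")
    return names


def select_template(stem: str, locale: str,
                    available: Iterable[str]) -> Optional[str]:
    """Return the closest existing template for the request locale (70.10)."""
    existing = set(available)
    for name in candidate_templates(stem, locale):
        if name in existing:
            return name
    return None


files = ["index.adp", "index.de.adp", "index.ja_JP.adp"]
print(select_template("index", "de_AT", files))
print(select_template("index", "fr_FR", files))
```

A request for `de_AT` falls back to the language-only German template, and a locale with no localized file at all falls back to the unlocalized one.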
+Formatting +Datasource Output in Templates + +70.50 The properties of a datasource +column may include a datatype so that the templating system can +format the output for the current locale. The datatype is defined +by a standard OpenACS datatype plus a format token or format string, +for example: a date column might be specified as +'current_date:date LONG,' or 'current_date:date +"YYYY-Mon-DD"' + +Forms + +
+70.60 The forms API must support +construction of locale-specific HTML form widgets, such as date +entry widgets, and form validation of user input data for +locale-specific data, such as dates or numbers. NOT +IMPLEMENTED in 5.0.0. + +70.70 For forms which allow users to +upload files, a standard method for a user to indicate the charset +of a text file being uploaded must be provided. + +Design note: this presumably applies to uploading +data to the content repository as well +
+ +
Sorting and Searching + +
+80.10 Provide API support for correct collation +(sorting order) of lists of strings in a locale-dependent way. + +80.20 For the Tcl API, we will say that +locale-dependent sorting will use Oracle SQL operations (i.e., we +won't provide a Tcl API for this). We require a Tcl API +function to return the correct incantation of NLS_SORT to use for a +given locale with ORDER BY clauses in +queries. + +80.40 The system must handle full-text +search in any supported language. +
+ +
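Requirement 80.20 asks for a function that returns the NLS_SORT setting to use in an ORDER BY clause. The following Python sketch shows the shape such a helper might take; the real requirement is for a Tcl function, and the language-to-sort-name table here is a small illustrative assumption that would need checking against the Oracle release in use. The generated SQL relies on Oracle's `NLSSORT` function.

```python
import re

# Illustrative mapping from language code to an Oracle linguistic sort name.
# Unlisted languages fall back to BINARY ordering.
NLS_SORT_BY_LANGUAGE = {
    "de": "GERMAN",
    "fr": "FRENCH",
    "sv": "SWEDISH",
}


def nls_sort_for(locale: str) -> str:
    """Return the NLS_SORT value to use for a locale such as 'de_DE'."""
    return NLS_SORT_BY_LANGUAGE.get(locale.split("_")[0], "BINARY")


def order_by_clause(column: str, locale: str) -> str:
    """Build a locale-aware ORDER BY clause for one column."""
    # Accept plain identifiers only, so the column name cannot inject SQL.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", column):
        raise ValueError(f"unsafe column name: {column!r}")
    return f"ORDER BY NLSSORT({column}, 'NLS_SORT = {nls_sort_for(locale)}')"


print(order_by_clause("last_name", "de_DE"))
```

Keeping the sorting inside the database, as 80.20 proposes, means the application only has to choose the sort name and never compares strings itself.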
Time Zones + +
+90.10 Provide API support for specifying +a time zone. + +90.20 Provide an API for computing time +and date operations which are aware of time zones. For example, a +calendar module can properly synchronize items inserted into a +calendar from users in different time zones using their own local +times. + +90.30 Store all dates and times in +the universal time zone, UTC. + +90.40 For a registered user, a time +zone preference should be stored. + +90.50 For a non-registered user a time +zone preference should be attached via a session or else UTC should +be used to display every date and time. + +90.60 The default if we can't +determine a time zone is to display all dates and times in some +universal time zone such as GMT. +
+ +
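The store-in-UTC, display-in-local-time pattern from 90.20 through 90.60 can be sketched in Python as below. This is an illustration only: it assumes Python 3.9 or later with a time zone database available, and the function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional
from zoneinfo import ZoneInfo  # Python 3.9+; needs a tz database on the host


def to_utc(local_naive: datetime, tz_name: str) -> datetime:
    """Interpret a user's local wall-clock time and convert it to UTC
    for storage (90.30)."""
    return local_naive.replace(tzinfo=ZoneInfo(tz_name)).astimezone(timezone.utc)


def for_display(stored_utc: datetime, tz_name: Optional[str]) -> datetime:
    """Convert a stored UTC time to the viewer's time zone.

    With no known preference (90.50, 90.60) the time is shown in UTC.
    """
    zone = ZoneInfo(tz_name) if tz_name else timezone.utc
    return stored_utc.astimezone(zone)


# A user in Tokyo enters 09:00 local time; another user views it from Berlin.
stored = to_utc(datetime(2003, 8, 14, 9, 0), "Asia/Tokyo")
print(stored.isoformat())
print(for_display(stored, "Europe/Berlin").isoformat())
print(for_display(stored, None).isoformat())
```

Because every stored value is in UTC, calendar items entered by users in different zones can be compared and ordered directly, and only the display step depends on the viewer's preference.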
Database + +
+100.10 Since UTF8 strings can use up to +three (UCS2) or six (UCS4) bytes per character, make sure that +column size declarations in the schema are large enough to +accommodate required data (such as email addresses in +Japanese). Since 5.0.0, this is covered in the database +install instructions for both PostgreSQL and Oracle. +
+ +
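The sizing concern in 100.10 comes from the gap between character count and byte count. A small Python sketch makes the difference concrete; the helper names are hypothetical, and the check only applies where a column's declared size is measured in bytes.

```python
def utf8_length(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_column(text: str, max_bytes: int) -> bool:
    """True if the text fits a column whose size is declared in bytes."""
    return utf8_length(text) <= max_bytes


ascii_name = "tokyo"
japanese_name = "東京都"

print(len(ascii_name), utf8_length(ascii_name))        # 5 characters, 5 bytes
print(len(japanese_name), utf8_length(japanese_name))  # 3 characters, 9 bytes
print(fits_column(japanese_name, 8))                   # False
```

A column sized for the expected number of characters can therefore overflow on non-ASCII input, which is why the declarations need headroom.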
+ + + + Email and Messaging + + When sending an email message, just as when delivering the + content of a web page over an HTTP connection, it is necessary to be + able to specify what character set encoding to use, defaulting to UTF-8. + +
+ 110.10 The email message sending API + will allow for a character set encoding to be specified. + + 110.20 The email accepting API + will parse the character set correctly (the message has a MIME + Content-Type header that declares the character set) +
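Requirements 110.10 and 110.20 can be illustrated with Python's standard `email` package, used here only as a stand-in for the OpenACS mail API. The function names and addresses are hypothetical. The sending side takes a charset argument that defaults to UTF-8, and the receiving side decodes the body using the charset declared in the MIME Content-Type header.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_message(sender: str, recipient: str, subject: str, body: str,
                  charset: str = "utf-8") -> EmailMessage:
    """Compose a plain-text message in the requested charset (110.10)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # set_content records the charset in the Content-Type header.
    msg.set_content(body, charset=charset)
    return msg


def read_body(raw: bytes) -> str:
    """Parse a received message and return its body as Unicode text,
    decoded according to the declared charset (110.20)."""
    msg = message_from_bytes(raw, policy=policy.default)
    return msg.get_content()


outgoing = build_message("forum@example.org", "user@example.org",
                         "Hej", "Hej, Søren", charset="iso-8859-1")
print(outgoing.get_content_charset())
print(read_body(outgoing.as_bytes()).strip())
```

The round trip works because the charset travels with the message, so the receiver never has to guess the encoding.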
+ + Mail is not internationalized. The following issues must be addressed. + + + + Many functions still call ns_sendmail. This + means that there are different end points for sending + mail. This should be changed to use the acs-mail-lite API instead. + + + + + Consumers of email services must do + the following: Determine the appropriate language or + languages to use for the message subject and message body + and localize them (as in notifications). + + + + Extreme Use case: Web site has a default language of Danish. A forum is set up for Swedes, so the forum has a package_id and a language setting of Swedish. A poster posts to the forum in Russian (is this possible?). A user is subscribed to the forum and has a language preference of Chinese. What should be in the message body and message subject? + + Incoming mail should be localized. + + + + + + +
+ + Implementation Notes + + Because globalization touches many different parts of the system, + we want to reduce the implementation risk by breaking the + implementation into phases. + + + + + Revision History + + + + + + Document Revision # + Action Taken, Notes + When? + By Whom? + + + + 1 + Updated with results of MIT-sponsored i18n work at Collaboraid. + 14 Aug 2003 + Joel Aufrecht + + + + 0.4 + converting from HTML to DocBook and importing the document to the OpenACS + kernel documents. This was done as a part of the internationalization of + OpenACS and .LRN for the Heidelberg University in Germany + 12 September 2002 + Peter Marklund + + + + 0.3 + comments from Christian + 1/14/2000 + Henry Minsky + + + + 0.2 + Minor typos fixed, clarifications to wording + 11/14/2000 + Henry Minsky + + + + 0.1 + Creation + 11/08/2000 + Henry Minsky + + + + + + + + +
+ +