Index: openacs-4/packages/acs-lang/www/doc/i18n-requirements.adp
===================================================================
RCS file: /usr/local/cvsroot/openacs-4/packages/acs-lang/www/doc/i18n-requirements.adp,v
diff -u -r1.1.2.14 -r1.1.2.15
--- openacs-4/packages/acs-lang/www/doc/i18n-requirements.adp 22 Jun 2016 07:45:44 -0000 1.1.2.14
+++ openacs-4/packages/acs-lang/www/doc/i18n-requirements.adp 5 Jul 2016 12:16:50 -0000 1.1.2.15
@@ -15,7 +15,7 @@
internationalization (i18n)
The provision within a computer program of the capability of making itself adaptable to the requirements of different native
-languages, local customs and coded character sets.
locale
The definition of the subset of a user's environment that
+languages, local customs and coded character sets.
locale
The definition of the subset of a user's environment that
depends on language and cultural conventions.
localization (L10n)
The process of establishing information within a computer system specific to the operation of particular native languages, local customs and coded character sets.
globalization
A product development approach which ensures that software
@@ -99,9 +99,9 @@
The site would have an end-user visible UI to support these languages, and the content management system must allow articles to be posted in these languages. In some cases it may be necessary to
-make the modules' admin UI's operate in more than one supported
-language, while in other cases the backend admin interface can
-operate in a single language.
+make the modules' admin UI's operate in more than one
+supported language, while in other cases the backend admin
+interface can operate in a single language.
A developer is writing a new module, and wants to make it easy for someone to localize it. There should be a clear path to author the module so that future developers can easily add support for
@@ -115,17 +115,15 @@
... ? Anyone know how they deal with i18n ?
V. Related Links
-
- System/Package "coversheet" - where all documentation for
-this software is linked off of
- Design document
- Developer's guide
- User's guide
- +
- System/Package "coversheet" - where all
+documentation for this software is linked off of
- Design document
- Developer's guide
- User's guide
- Other-cool-system-related-to-this-one document
LI18NUX 2000 Globalization
-Specification: http://www.li18nux.net/Mozilla
-i18N Guidelines:
+Specification: http://www.li18nux.net/
Mozilla i18N Guidelines: http://www.mozilla.org/docs/refList/i18n/l12yGuidelines.html
+Part 1: Country codes http://www.niso.org/3166.html
- Test plan
- Competitive system(s)
VI Requirements
@@ -136,43 +134,45 @@
functionality.
VI.A Locales
10.0
- A standard representation of locale will be used
-throughout the system. A locale refers to a language and territory,
-and is uniquely identified by a combination of ISO language and ISO
-country abbreviations.
-See
-Content Repository Requirement 100.20
-10.10 Provide a consistent representation and API for
-creating and referencing a locale
-10.20 There will be a Tcl library of locale-aware
-formatting and parsing functions for numbers, dates and times.
-Note that Java has builtin support for these already.
-10.30 For each locale there will be default date, number
-and currency formats.
+ A standard representation of locale will be
+used throughout the system. A locale refers to a language and
+territory, and is uniquely identified by a combination of ISO
+language and ISO country abbreviations.
+See Content Repository Requirement
+100.20
+10.10 Provide a consistent representation and
+API for creating and referencing a locale
+10.20 There will be a Tcl library of
+locale-aware formatting and parsing functions for numbers, dates
+and times. Note that Java has builtin support for these
+already.
+10.30 For each locale there will be default
+date, number and currency formats.
VI.B Associating a Locale with a Request
20.0
- The request processor must have a mechanism for
-associating a locale with each request. This locale is then used to
-select the appropriate template for a request, and will also be
-passed as the locale argument to the message catalog or
+ The request processor must have a mechanism
+for associating a locale with each request. This locale is then
+used to select the appropriate template for a request, and will
+also be passed as the locale argument to the message catalog or
locale-specific formatting functions.
-20.10 The locale for a request should be computed by the
-following method, in descending order of priority:
+20.10 The locale for a request should be
+computed by the following method, in descending order of
+priority:
- get locale associated with subsite or package id
- get locale from user preference
- get locale from site wide default
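As a rough illustration of the priority order in requirement 20.10, the lookup can be sketched as a chain of fallbacks. This is a Python sketch under stated assumptions, not the OpenACS Tcl API: the function name `resolve_request_locale`, its parameters, and the `en_US` default are all hypothetical.

```python
from typing import Optional

# Assumed site-wide default; the requirements do not name one.
SITE_WIDE_DEFAULT = "en_US"


def resolve_request_locale(
    package_locale: Optional[str],
    user_locale: Optional[str],
    site_default: str = SITE_WIDE_DEFAULT,
) -> str:
    """Return the first locale that is set, in the order given by 20.10:
    subsite/package locale, then user preference, then site-wide default."""
    for candidate in (package_locale, user_locale):
        if candidate:
            return candidate
    return site_default
```

A request processor could call such a function once per request and cache the result on the connection, so that templates and the message catalog see the same locale.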
-20.20 An API will be provided for getting the current
-request locale from the
+20.20 An API will be provided for getting the
+current request locale from the
ad_conn
structure.
ad_conn
structure.
VI.C Resource Bundles / Content Repository
30.0
- A mechanism must be provided for a developer to group a
-set of arbitrary content resources together, keyed by a unique
-identifier and a locale.
+ A mechanism must be provided for a developer
+to group a set of arbitrary content resources together, keyed by a
+unique identifier and a locale.
For example, what approaches could be used to implement a localizable nav-bar mechanism for a site? A navigation bar might be made up of a set of text strings and graphics, where the graphics
@@ -187,39 +187,41 @@
catalog.
VI.D Message Catalog for String Translation
40.0
- A message catalog facility will provide a database of
-translations for constant strings for multilingual applications. It
-must support the following:
+ A message catalog facility will provide a
+database of translations for constant strings for multilingual
+applications. It must support the following:
-40.10 Each message will referenced via unique a key.
-40.20 The key for a message will have some hierarchical
-structure to it, so that sets of messages can be grouped with
-respect to a module name or package path.
-40.30 The API for lookup of a message will take a locale
-and message key as arguments, and return the appropriate
+40.10 Each message will referenced via unique a
+key.
+40.20 The key for a message will have some
+hierarchical structure to it, so that sets of messages can be
+grouped with respect to a module name or package path.
+40.30 The API for lookup of a message will take
+a locale and message key as arguments, and return the appropriate
translation of that message for the specifed locale.
-40.40 The API for lookup of a message will accept an
-optional default string which can be used if the message key is not
-found in the catalog. This lets the developer get code working and
-tested in a single language before having to initialize or update a
-message catalog.
-40.50 For use within templates, custom tags which invoke
-the message lookup API will be provided.
-40.60 Provide a method for importing and exporting a flat
-file of translation strings, in order to make it as easy as
-possible to create and modify message translations in bulk without
-having to use a web interface.
-40.70 Since translations may be in different character
-sets, there must be provision for writing and reading catalog files
-in different character sets. A mechanism must exist for identifying
-the character set of a catalog file before reading it.
+40.40 The API for lookup of a message will
+accept an optional default string which can be used if the message
+key is not found in the catalog. This lets the developer get code
+working and tested in a single language before having to initialize
+or update a message catalog.
+40.50 For use within templates, custom tags
+which invoke the message lookup API will be provided.
+40.60 Provide a method for importing and
+exporting a flat file of translation strings, in order to make it
+as easy as possible to create and modify message translations in
+bulk without having to use a web interface.
+40.70 Since translations may be in different
+character sets, there must be provision for writing and reading
+catalog files in different character sets. A mechanism must exist
+for identifying the character set of a catalog file before reading
+it.
40.80 There should be a mechanism for tracking dependencies in the message catalog, so that if a string is modified, the other translations of that string can be flagged as needing update.
-40.90 The message lookup must be as efficient as possible
-so as not to slow down the delivery of pages.
+40.90 The message lookup must be as efficient
+as possible so as not to slow down the delivery of pages.
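The lookup behaviour described in requirements 40.10 through 40.40 can be sketched as follows. This is an illustrative Python sketch, not the acs-lang implementation: the in-memory `CATALOG` dictionary, the `lookup_message` name, and the sample keys are assumptions made for the example.

```python
from typing import Dict, Optional, Tuple

# (locale, key) -> translated string. Keys are hierarchical (40.20),
# here written as "package.message-name". Sample data is hypothetical.
CATALOG: Dict[Tuple[str, str], str] = {
    ("en_US", "acs-lang.welcome"): "Welcome",
    ("fr_FR", "acs-lang.welcome"): "Bienvenue",
}


def lookup_message(locale: str, key: str, default: Optional[str] = None) -> str:
    """Return the translation of `key` for `locale` (40.30).

    If the key is missing and a default string was supplied, return the
    default instead (40.40); otherwise raise KeyError.
    """
    try:
        return CATALOG[(locale, key)]
    except KeyError:
        if default is not None:
            return default
        raise KeyError(f"no message {key!r} for locale {locale!r}") from None
```

Keeping the catalog in a hash table keyed by locale and message key gives constant-time lookups on average, which is one way to approach the efficiency concern in 40.90.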
Design question: Is there any reason to implement the message catalog on top of the content repository as the underlying storage and retrieval service, with a layer of
@@ -230,10 +232,10 @@
VI.E Character Set Encoding
Character Sets
-50.0 A locale will have a primary associated character
-set which is used to encode text in the language. When given a
-locale, we can query the system for the associated character set to
-use.
+50.0 A locale will have a primary associated
+character set which is used to encode text in the language. When
+given a locale, we can query the system for the associated
+character set to use.
The assumption is that we are going to use Unicode in our database to hold all text data. Our current programming environments (Tcl/Oracle or Java/Oracle) operate on Unicode data
@@ -248,19 +250,19 @@
Design question: Do we want to mandate
-that all template files be stored in UTF8? I don't think so,
-because most people don't have Unicode editors, or don't want to be
-bothered with an extra step to convert files to UTF8 and back when
-editing them in their favorite editor.
+that all template files be stored in UTF8? I don't think so,
+because most people don't have Unicode editors, or don't
+want to be bothered with an extra step to convert files to UTF8 and
+back when editing them in their favorite editor.
Same question for script and template files, how do we know what language and character set they are authored in? Should we overload the filename suffix (e.g., '.shiftjis.adp', '.ja_JP.euc.adp')?
-The simplest design is probably just to
-assign a default mapping from each locale to character a set: e.g.
-ja_JP -> ShiftJIS, fr_FR -> ISO-8859-1. +++ (see new ACS/Java
-notes) +++
+The simplest design is probably just
+to assign a default mapping from each locale to character a set:
+e.g. ja_JP -> ShiftJIS, fr_FR -> ISO-8859-1. +++ (see new
+ACS/Java notes) +++
Tcl Source File Character Set
There are two classes of Tcl files loaded by the system; library
@@ -271,129 +273,133 @@
as UTF8? That seems too much of a burden on developers.
-50.10 Tcl library files can be authored in any character
-set. The system must have a way to determine the character set
-before loading the files, probably from the filename.
-50.20 Tcl page script files can be authored in any
+50.10 Tcl library files can be authored in any
character set. The system must have a way to determine the character set before loading the files, probably from the
-filename.
Submitted Form Data Character Set
50.30 Data which is submitted with a HTTP request using a
-GET or POST method may be in any character set. The system must be
-able to determine the encoding of the form data and convert it to
-Unicode on demand.
+filename.
+50.20 Tcl page script files can be authored in
+any character set. The system must have a way to determine the
+character set before loading the files, probably from the
+filename.
Submitted Form Data Character Set
50.30 Data which is submitted with a HTTP request
+using a GET or POST method may be in any character set. The system
+must be able to determine the encoding of the form data and convert
+it to Unicode on demand.
-50.35 The developer must be able to override the default
-system choice of character set when parsing and validating user
-form data.
-50.30.10 Extra hair: In Japan and some other Asian
-languages where there are multiple character set encodings in
+50.35 The developer must be able to override
+the default system choice of character set when parsing and
+validating user form data.
+50.30.10 Extra hair: In Japan and some other
+Asian languages where there are multiple character set encodings in
common use, the server may need to attempt to do an auto-detection of the character set, because buggy browsers may submit form data
-in an unexpected alternate encoding.
Output Character Set
50.40 The output character set for a page request will be
-determined by default by the locale associated with the request
-(see requirement 20.0).
+in an unexpected alternate encoding.
Output Character Set
50.40 The output character set for a page request
+will be determined by default by the locale associated with the
+request (see requirement 20.0).
-50.50 It must be possible for a developer to manually
-override the output character set encoding for a request using an
-API function.
+50.50 It must be possible for a developer to
+manually override the output character set encoding for a request
+using an API function.
VI.F ACS Kernel Issues
-60.10 All ACS error messages must use the
-message catalog and the request locale to generate error message
-for the appropriate locale.
+60.10 All ACS error messages must use
+the message catalog and the request locale to generate error
+message for the appropriate locale.
-60.20 Web server error messages such as 404, 500, etc
-must also be delivered in the appropriate locale.
-60.30 Where files are written or read from disk, their
-filenames must use a character set and character values which are
-safe for the underlying operating system.
+60.20 Web server error messages such as 404,
+500, etc must also be delivered in the appropriate locale.
+60.30 Where files are written or read from
+disk, their filenames must use a character set and character values
+which are safe for the underlying operating system.
VI.G Templates
-70.0 For a given abstract URL, the designer may
-create multiple locale-specific template files may be created (one
-per locale or language)
+70.0 For a given abstract URL, the
+designer may create multiple locale-specific template files may be
+created (one per locale or language)
-70.10 For a given page request, the system must be able
-to select an approprate locale-specific template file to use. The
-request locale is computed as per (see requirement 20.0).
Design note: this would probably be
+70.10 For a given page request, the system must
+be able to select an approprate locale-specific template file to
+use. The request locale is computed as per (see requirement
+20.0).
Design note: this would probably be
implemented by suffixing the locale or a locale abbreviation to the template filename, such as foo.ja.adp or foo.en_GB.adp.
-70.20A template file may be created for a partial locale
-(language only, without a territory), and the request processor
-should be able to find the closest match for the current request
-locale.
-70.30 A template file may be created in any character
-set. The system must have a way to know which character set a
-template file contains, so it can properly process it.
Formatting Datasource Output in Templates
70.50 The properties of a datasource column may include a
-datatype so that the templating system can format the output for
-the current locale. The datatype is defined by a standard ACS
-datatype plus a format token or format string, for example: a date
-column might be specified as 'current_date:date LONG,' or
-'current_date:date "YYYY-Mon-DD"'
+70.20A template file may be created for a
+partial locale (language only, without a territory), and the
+request processor should be able to find the closest match for the
+current request locale.
+70.30 A template file may be created in any
+character set. The system must have a way to know which character
+set a template file contains, so it can properly process it.
Formatting Datasource Output in Templates
70.50 The properties of a datasource column may
+include a datatype so that the templating system can format the
+output for the current locale. The datatype is defined by a
+standard ACS datatype plus a format token or format string, for
+example: a date column might be specified as 'current_date:date
+LONG,' or 'current_date:date "YYYY-Mon-DD"'
Forms
70.60 The forms API must support construction of locale-specific HTML form widgets, such as date entry widgets, and form validation of user input data for locale-specific data, such as dates or numbers.
-70.70 For forms which allow users to upload files, a
-standard method for a user to indicate the charset of a text file
-being uploaded must be provided.
Design note: this presumably applies to
-uploading data to the content repository as well
+70.70 For forms which allow users to upload
+files, a standard method for a user to indicate the charset of a
+text file being uploaded must be provided.
Design note: this presumably applies
+to uploading data to the content repository as well
VI.H Sorting and Searching
-80.10 Support API for correct collation (sorting
-order) on lists of strings in locale-dependent way.
+80.10 Support API for correct
+collation (sorting order) on lists of strings in locale-dependent
+way.
-80.20 For the Tcl API, we will say that locale-dependent
-sorting will use Oracle SQL operations (i.e., we won't provide a
-Tcl API for this). We require a Tcl API function to return the
-correct incantation of NLS_SORT to use for a given locale with
-
ORDER BY
clauses in queries.
-80.40 The system must handle full-text search in any
-supported language.
+80.20 For the Tcl API, we will say that
+locale-dependent sorting will use Oracle SQL operations (i.e., we
+won't provide a Tcl API for this). We require a Tcl API
+function to return the correct incantation of NLS_SORT to use for a
+given locale with
ORDER BY
clauses in queries.
+80.40 The system must handle full-text search
+in any supported language.
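Requirement 80.20 asks for a function that returns the NLS_SORT setting for a locale. A minimal sketch of the idea follows, written in Python rather than Tcl for illustration. The mapping table, the function names, and the `BINARY` fallback are assumptions for the example; the real set of supported locales and sort names would have to come from the Oracle installation in use.

```python
# Illustrative locale -> Oracle linguistic sort name mapping.
# The entries are assumptions chosen for the example.
NLS_SORT_BY_LOCALE = {
    "fr_FR": "FRENCH",
    "de_DE": "GERMAN",
    "ja_JP": "JAPANESE_M",
}


def nls_sort_for_locale(locale: str) -> str:
    """Return the NLS_SORT name for a locale, or BINARY if unmapped."""
    return NLS_SORT_BY_LOCALE.get(locale, "BINARY")


def order_by_clause(column: str, locale: str) -> str:
    """Build an ORDER BY clause that sorts `column` linguistically.

    The column name is checked to be a plain identifier because it is
    interpolated into SQL text; the sort name comes from a fixed table.
    """
    if not column.isidentifier():
        raise ValueError(f"unsafe column name: {column!r}")
    sort_name = nls_sort_for_locale(locale)
    return f"ORDER BY NLSSORT({column}, 'NLS_SORT={sort_name}')"
```

Since the clause is assembled as a string, only values from a fixed whitelist should ever be interpolated into it.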
VI.G Time Zones
-90.10 Provide API support for specifying a time
-zone
+90.10 Provide API support for
+specifying a time zone
-90.20 Provide an API for computing time and date
-operations which are aware of timezones. So for example a calendar
-module can properly synchronize items inserted into a calendar from
-users in different time zones using their own local times.
-90.30 Store all dates and times in universal time zone,
-UTC.
-90.40 For a registered users, a time zone preference
-should be stored.
-90.50 For a non-registered user a time zone preference
-should be attached via a session or else UTC should be used to
-display every date and time.
-90.60 The default if we can't determine a time zone is to
-display all dates and times in some universal time zone such as
-GMT.
+90.20 Provide an API for computing time and
+date operations which are aware of timezones. So for example a
+calendar module can properly synchronize items inserted into a
+calendar from users in different time zones using their own local
+times.
+90.30 Store all dates and times in universal
+time zone, UTC.
+90.40 For a registered users, a time zone
+preference should be stored.
+90.50 For a non-registered user a time zone
+preference should be attached via a session or else UTC should be
+used to display every date and time.
+90.60 The default if we can't determine a
+time zone is to display all dates and times in some universal time
+zone such as GMT.
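The store-in-UTC, display-in-user-zone pattern from requirements 90.30 through 90.60 can be sketched as below. This Python sketch is an illustration only: the function name `to_display_time` and the use of a fixed hour offset as the stored preference are assumptions, and a fixed offset ignores daylight saving rules that a real time zone database would handle.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def to_display_time(
    stored_utc: datetime, user_offset_hours: Optional[float] = None
) -> datetime:
    """Convert a UTC-stored timestamp to the user's zone for display.

    `user_offset_hours` stands in for the stored time zone preference
    (90.40, 90.50). When it is unknown, fall back to UTC (90.60).
    """
    if stored_utc.tzinfo is None:
        # Naive values are treated as UTC, matching the storage rule (90.30).
        stored_utc = stored_utc.replace(tzinfo=timezone.utc)
    if user_offset_hours is None:
        return stored_utc.astimezone(timezone.utc)
    user_zone = timezone(timedelta(hours=user_offset_hours))
    return stored_utc.astimezone(user_zone)
```

Because the converted value refers to the same instant, calendar entries created by users in different zones still compare and sort correctly.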
VI.H Database
+100.10 Since UTF8 strings can use up to three
+(UCS2) or six (UCS4) bytes per character, make sure that column
+size declarations in the schema are large enough to accomodate
+required data (such as email addresses in Japanese).
-100.10 Since UTF8 strings can use up to three (UCS2) or
-six (UCS4) bytes per character, make sure that column size
-declarations in the schema are large enough to accomodate required
-data (such as email addresses in Japanese).
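The sizing concern in requirement 100.10 comes down to the difference between character count and encoded byte count. The Python sketch below shows that difference; the helper names and the sample address are hypothetical. Note that UTF-8 as currently defined (RFC 3629) uses at most four bytes per code point, while the older definition the requirement refers to allowed up to six.

```python
def utf8_byte_length(text: str) -> int:
    """Number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_in_column(text: str, max_bytes: int) -> bool:
    """True if `text` fits a column whose size is declared in bytes."""
    return utf8_byte_length(text) <= max_bytes


# Hypothetical address with a Japanese local part: 13 characters, 17 bytes,
# because each of the two kanji takes three bytes in UTF-8.
sample = "山田@example.jp"
char_count = len(sample)
byte_count = utf8_byte_length(sample)
```

A column declared with a byte limit equal to the expected character count would reject this value, which is the failure the requirement is meant to prevent.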
VI.I Email and Messaging
When sending an email message, just as when delivering the content in web page over an HTTP connection, it is necessary to be able to specify what character set encoding to use.
-110.10 The email message sending API will allow for a
-character set encoding to be specified.
-110.20 The email accepting API will allow for character
-set to be parsed correctly (hopefully a well formatted message will
-have a MIME character set content type header)
+110.10 The email message sending API will allow
+for a character set encoding to be specified.
+110.20 The email accepting API will allow for
+character set to be parsed correctly (hopefully a well formatted
+message will have a MIME character set content type header)
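Requirements 110.10 and 110.20 describe a sending API that accepts a character set and a receiving API that reads it back from the MIME headers. The sketch below shows the round trip using Python's standard `email` package purely as an illustration; the helper names `build_message` and `read_body` and the sample addresses are assumptions, not part of any OpenACS API.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from typing import Tuple


def build_message(
    sender: str, recipient: str, subject: str, body: str, charset: str = "utf-8"
) -> bytes:
    """Serialize a text message whose body is encoded in `charset` (110.10)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # The charset is recorded in the Content-Type header of the message.
    msg.set_content(body, charset=charset)
    return msg.as_bytes()


def read_body(raw: bytes) -> Tuple[str, str]:
    """Return (charset, decoded body) of a received message (110.20).

    Falls back to us-ascii when no charset parameter is declared.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    charset = msg.get_content_charset() or "us-ascii"
    return charset, msg.get_content()
```

For example, a message built with `charset="iso-8859-1"` is read back with that charset reported and the body decoded to the original text.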
Implementation Notes
@@ -414,5 +420,5 @@
-hqm\@arsdigita.com
+hqm\@arsdigita.com
Last modified: $Date$