For internationalization to be effective, it needs to be integrated into every module in the system. Making the overhead as low as possible is therefore a priority; otherwise developers will be reluctant to use it in their code.
Wherever possible, caching in AOLserver shared memory is used to remove the need to touch the database. Precompiling of template files should reduce the overhead to zero in most cases for translation message lookups. The amount of overhead added to the request processor can be reduced by caching filesystem information on matching of template files for locales.
The ACS Tcl I18N APIs should be as close as possible to the ultimate Java APIs. Using the same templates where possible, as well as the same message catalogs and format strings, should therefore be a strong goal.
A set of unit tests is included in the acs-lang package, to allow automatic testing after installation.
We will refer to a Locale by a combination of a language and country. In the Java Locale API there is an optional variant which can be added to a locale, which we will omit in the Tcl API.
The language is a valid ISO Language Code. These codes are the
lower-case two-letter codes as defined by ISO-639. You can find a full
list of these codes at a number of sites, such as:
http://www.ics.uci.edu/pub/ietf/http/related/iso639.txt
The country is a valid ISO Country Code. These codes are the upper-case two-letter codes as defined by ISO-3166. You can find a full list of these codes at a number of sites, such as:
http://www.chemie.fu-berlin.de/diverse/doc/ISO_3166.html
Examples are:
en_US English (United States)
ja_JP Japanese (Japan)
fr_FR French (France)
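The locale format described above can be checked with a few lines of plain Tcl. This is only a sketch: sketch_parse_locale is a hypothetical helper, not part of acs-lang, and it validates the shape of the string (two lower-case letters, an underscore, two upper-case letters) rather than membership in the ISO-639 or ISO-3166 lists.

```tcl
# Hypothetical helper (not part of acs-lang): split "language_country"
# into its two parts and reject strings that do not have that shape.
proc sketch_parse_locale {locale} {
    if {![regexp {^([a-z][a-z])_([A-Z][A-Z])$} $locale match language country]} {
        error "malformed locale \"$locale\": expected language_country, e.g. en_US"
    }
    return [list $language $country]
}

# Example use: unpack the two parts of a locale.
foreach {language country} [sketch_parse_locale fr_FR] break
```

After the last line, language holds "fr" and country holds "FR". The optional Java-style variant is deliberately not accepted, matching the decision to omit it from the Tcl API.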
The i18n module figures out the locale for the current request and makes it accessible via the ad_locale function:
[ad_locale user locale] => fr_FR
[ad_locale subsite locale] => en_US
It has not yet been decided how the user's preferred locale will be initialized. For now, there is a site-wide default package parameter, [ad_parameter DefaultLocale acs-lang "en_US"], and an API for setting the locale, with the preference stored in a session variable. The ad_locale_set function is used to set the user's preferred locale to a desired value. It saves the value in the current session.
ad_locale_set locale "en_US"
# also automatically sets [ad_locale user language] (to "en" in this case)
ad_locale_set timezone "PST"
The request processor should use the ad_locale API to figure out the preferred locale for a request (perhaps combining user preference with subsite defaults in some way). It will make this information accessible via the ad_conn function:
ad_conn locale
Content-Type: text/html; charset=iso-8859-1
You can obtain the preferred character set for a locale via the ad_locale API shown below:
set locale "en_US"
[ad_locale charset $locale] => "iso-8859-1" or "shift_jis"
Returns a case-insensitive name of a MIME character set.
We already have an AOLserver function to convert a MIME charset name to a Tcl encoding name:
[ns_encodingforcharset "iso-8859-1"] => iso8859-1
For presenting data in multiple languages, there are two basic ways to use templates for a given abstract URL. Say we have the URL "foo", for example. We can provide templates for it in the following ways:
Have a copy of each template file in each language you support, e.g., foo.en.adp, foo.fr.adp, foo.de.adp, etc.
We will refer to this style of template pages as language-specific templates.
You write your template to contain references to translation strings either from data sources or using <TRN> tags.
For example, a site might support multiple languages, but use a single file foo.adp which contains no language-specific content, and would only make use of data sources or <TRN> tags which in turn use the message catalog to look up language-specific content.
We will refer to this style of page as a multilingual template.
But for a page which has a very fixed format, such as a data entry form, it would mean a lot less redundant work to use a single template source page to handle all the languages, and to have all language-dependent strings be looked up in a message catalog. We can do this either by creating data sources which call lang_message_lookup, or by using the <TRN> tag to do the same thing from within an ADP file.
Let's say you have a template file "foo.adp" and it contains calls to look up message strings using the TRN tag:
<master>
<trn key=username_prompt>Please enter your username</trn>
<input type=text name=username>
<p>
<trn key=password_prompt>Enter Password:</trn>
<input type=password name=passwd>
If the user requests the page foo, and their ad_locale is "en_US", then the effective locale is "en_US". Message lookups are done using the effective locale. If the user's locale is "fr_FR", then the effective locale will be "fr_FR".
If we evaluate the TRN tags at compile time then we need to associate the effective locale in which the page was evaluated with the cached compiled page code.
The effective locale of a template page that has an explicit locale, such as a file named "foo.en.adp" or "foo.en_US.adp", will be that explicit locale. So for example, even if a user has a preferred locale of "fr_FR", if there is only a page named "foo.en.adp", then that page will be evaluated (and cached) with an effective locale of en_US.
We will use the following convention for naming template files: filename.locale_or_language.adp.
Examples:
foo.en_US.adp
foo.en.adp
foo.fr_FR.adp
foo.fr.adp
foo.ja_JP.adp
foo.ja.adp
The user request has a locale which is of the form language_country. If someone wants English, they will implicitly be choosing a default, such as en_US or en_GB. The default locale for a language can be configured in the system locale tables. So for example the default locale for "en" could be "en_US".
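The naming convention and the per-language default can be combined to compute the effective locale of a template file. The following is a minimal sketch in plain Tcl, not the templating system's actual code: sketch_effective_locale and the ::sketch_default_locale array are hypothetical names, and the array merely stands in for the system locale tables.

```tcl
# Hypothetical stand-in for the system locale tables:
# language -> default locale for that language.
array set ::sketch_default_locale {
    en en_US
    fr fr_FR
    ja ja_JP
}

# Hypothetical helper: effective locale of a template file for a user.
proc sketch_effective_locale {file_name user_locale} {
    # Look for a "locale_or_language" suffix just before ".adp".
    if {![regexp {\.([a-z][a-z])(_[A-Z][A-Z])?\.adp$} $file_name match language country]} {
        # No explicit suffix (a multilingual template such as foo.adp):
        # the effective locale is the user's locale.
        return $user_locale
    }
    if {$country ne ""} {
        # Full locale in the file name, e.g. foo.fr_FR.adp.
        return "$language$country"
    }
    # Language-only suffix, e.g. foo.en.adp: use the configured default.
    if {[info exists ::sketch_default_locale($language)]} {
        return $::sketch_default_locale($language)
    }
    error "no default locale configured for language \"$language\""
}
```

With this table, a request from a fr_FR user that can only be served by foo.en.adp is evaluated with an effective locale of en_US, while foo.adp is evaluated in fr_FR.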
The algorithm for finding the best matching template for a request in a given locale is given below:
For example, if the filename in the URL request is simply foo and [ad_conn locale] returns en_US then look for a file named foo.en_US.adp.
For example, if the URL request name is foo and [ad_conn locale] returns en_US and a file named foo.en_US.adp is not found, then look for all templates matching "en_*" as well as any template which just has the "en" suffix.
So for example if the user's locale is en_GB and the following files exist:
foo.en_US.adp
then use foo.en_US.adp
If however both foo.en_US.adp and foo.en.adp exist, then use foo.en.adp preferentially, i.e., don't switch locales if you can avoid it. The reasoning here is that people can be very touchy about switching locales, so if there is a generic matching language template available for a language, use it rather than using an incorrect locale-specific template.
I.e., if the request is for en_US, and there exists a file foo.en.adp, use that.
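The matching rules above can be sketched as a small pure-Tcl procedure that works on a list of candidate file names. This is an illustration under stated assumptions, not the request processor's implementation: sketch_best_template is a hypothetical name, the choice among several other-country templates (first in sorted order) is an assumption, and the case where no same-language template exists is left to the caller by returning an empty string.

```tcl
# Hypothetical helper: choose the best template for a locale.
#   stem        the name from the URL, e.g. foo (assumed to contain no
#               glob characters such as * ? [ ])
#   locale      the request locale, e.g. en_GB
#   candidates  list of template file names that exist
proc sketch_best_template {stem locale candidates} {
    set language [lindex [split $locale _] 0]

    # 1. Exact match on the full locale, e.g. foo.en_GB.adp.
    set exact "$stem.$locale.adp"
    if {[lsearch -exact $candidates $exact] >= 0} {
        return $exact
    }

    # 2. Generic language template, e.g. foo.en.adp. Preferred over a
    #    template for a different country of the same language.
    set generic "$stem.$language.adp"
    if {[lsearch -exact $candidates $generic] >= 0} {
        return $generic
    }

    # 3. Any other locale of the same language, e.g. foo.en_US.adp.
    #    Assumption: when several exist, take the first in sorted order.
    foreach file [lsort $candidates] {
        if {[string match "$stem.${language}_*.adp" $file]} {
            return $file
        }
    }

    # No same-language template: left to the caller in this sketch.
    return ""
}
```

For a user whose locale is en_GB, a directory containing only foo.en_US.adp yields foo.en_US.adp, while a directory containing both foo.en_US.adp and foo.en.adp yields foo.en.adp.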
Once a template file is found we must decide what character set it is authored in, so that we can correctly load it into Tcl (which converts it to UTF8 internally).
It would be simplest to mandate that all templates are authored in UTF8, but that is just not a practical thing to enforce at this point, I believe. Many designers and other people who actually author the HTML template files will still find it easier to use legacy tools that author in their "native" character sets, such as ShiftJIS in Japan, or BIG5 in China.
So we make the convention that the template file is authored in its effective locale's character set. For multilingual templates, we will load the template in the site default character set as specified by the AOLserver OutputCharset initialization parameter. For now, we will say that authoring generic multilingual adp files can and should be done in ASCII. Eventually we can switch to using UTF8.
A character set corresponding to a locale can be found using the [ad_locale charset $locale] command. The templating system should call this right after it computes the effective locale, so it can set up that charset encoding conversion before reading the template file from disk.
We read the template file using this encoding, and set the default output character set to it as well. Inside either the .adp page or the parent .tcl page, the developer can issue a command to override this default output character set. Currently this is done by putting an explicit content-type header in the AOLserver output headers. For example, to force the output to ISO-8859-1, you would do:
ns_set put [ns_conn outputheaders] "content-type" "text/html; charset=iso-8859-1"
Design question: We should have an API for this. The current hack is that the adp handler adp_parse_ad_conn_file looks at the output headers, and if it sees a content type with an explicit charset, it passes it along to ns_return.
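As an illustration of what such an API might need to do, the sketch below pulls an explicit charset out of a content-type value. It is plain Tcl and does not touch the AOLserver headers; sketch_charset_from_content_type is a hypothetical name, and returning a caller-supplied default when no charset is present is an assumption.

```tcl
# Hypothetical helper: extract the charset parameter from a content-type
# value such as "text/html; charset=iso-8859-1".
proc sketch_charset_from_content_type {content_type {default_charset ""}} {
    # Accept optional spaces and optional double quotes around the value.
    if {[regexp -nocase {;\s*charset\s*=\s*"?([^\s;"]+)"?} $content_type match charset]} {
        # MIME charset names are case-insensitive; normalize to lower case.
        return [string tolower $charset]
    }
    return $default_charset
}
```

A handler could call this on the value it reads from the output headers and only override the default output character set when the result is non-empty.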
The default character set for a template .adp file should be the default system encoding.
This default can be overridden by setting the AOLserver init parameter for the MIME type of .tcl files to include an explicit character set. If an explicit MIME type is not found, ns_encodingfortype will default to the AOLserver init parameter value DefaultCharset if it is set.
Example AOLserver .ini configuration file to set default script file and template file charset to ShiftJIS:
ns_section {ns/mimetypes}
...
ns_param .tcl {text/plain; charset=shift_jis}
ns_param .adp {text/html; charset=shift_jis}

ns_section ns/parameters
...
# charset hacking
ns_param HackContentType 1
ns_param URLCharset shift_jis
ns_param OutputCharset shift_jis
ns_param HttpOpenCharset shift_jis
ns_param DefaultCharset shift_jis
For AOLserver/TCL, to make the message catalog more manageable, we will split it into one message catalog per package, plus one default global message namespace in case we need it. So for example,
Message lookups are done using a combination of a key string and a locale or language, as well as an implicit package prefix on the key string. The API for using the message catalog is as follows:
lang_message_lookup locale key [default_string]
The locale arg can be either a full locale or a simple language abbreviation, such as fr, en, etc. The lookup rules for finding strings based on key and locale are tried in order.
lang_message_lookup is abbreviated by the procedure named "_", which is the convention used by the GNU gettext message catalog package.
[lang_message_lookup $locale notes.title "Title"]
can be abbreviated by
[_ $locale notes.title "Title"]
# message key "title" is implicitly with respect to package key
# "notes", i.e., notes.title
[_ $locale title "Title"]
The string is looked up by the symbolic key notes.title (or title for short), and the constant value "Title" is supplied as documentation and as a default value. Having a default value allows developers to code their application immediately without waiting to populate the message catalog.
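The role of the default string can be illustrated with a small in-memory sketch. It is not the acs-lang implementation: sketch_message_lookup and ::sketch_catalog are hypothetical names, and the fallback order shown (exact locale, then the language part, then the default string) is an assumption made for the example.

```tcl
# Hypothetical in-memory catalog, indexed by "locale,key".
array set ::sketch_catalog {
    fr,notes.title    "Titre"
    fr_CA,notes.title "Titre (Canada)"
}

# Hypothetical lookup with an assumed fallback order.
proc sketch_message_lookup {locale key {default_string ""}} {
    # 1. Exact match on what the caller passed (full locale or language).
    if {[info exists ::sketch_catalog($locale,$key)]} {
        return $::sketch_catalog($locale,$key)
    }
    # 2. Fall back from a full locale to its language part.
    set language [lindex [split $locale _] 0]
    if {[info exists ::sketch_catalog($language,$key)]} {
        return $::sketch_catalog($language,$key)
    }
    # 3. Nothing registered yet: use the developer-supplied default.
    return $default_string
}
```

A call for de_DE finds no entry and returns the default "Title", which is what lets a page work before the catalog has been populated.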
You can override this behavior by either using a fully qualified key such as bboard.title or else by changing the message catalog namespace using the lang_set_package command:
[lang_set_package "bboard"]
So, for example, code that runs in a scheduled proc, where there is not necessarily any concept of a "current package", would either use fully qualified keys to look up messages, or else call lang_set_package before doing a message lookup.
/packages/bboard/catalog/
    bboard.iso-8859-1
    bboard.shift_jis
    bboard.iso-8859-6
A message catalog file consists of Tcl code to define messages in a given language or locale:
_mr en mail_notification "This is an email notification"
_mr fr mail_notification "Le notification du email"
...
In the example above, if the catalog file was loaded from the bboard package, all of the keys would be prefixed automatically with "bboard.".
lang_catalog_load package_key
Is used to load the message catalogs for a package. The catalog files are stored in a package subdirectory called catalog. Their file names have the form *.encoding.cat, where encoding is the name of a MIME charset encoding (not a Tcl charset name as was used in a previous version of this command).
/packages/bboard/catalog
    /main.iso-8859-1.cat
    /main.shift_jis.cat
    /main.iso-8859-6.cat
    /other.iso-8859-1.cat
    /other.shift_jis.cat
    /other.iso-8859-6.cat
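A loader has to recover the charset from each file name before it can read the file. The sketch below shows one way to split a name of the form *.encoding.cat in plain Tcl; sketch_catalog_charset is a hypothetical helper, not the lang_catalog_load implementation.

```tcl
# Hypothetical helper: split a catalog file name of the form
# <base>.<MIME charset>.cat into its base name and charset.
proc sketch_catalog_charset {file_name} {
    set tail [file tail $file_name]
    # The charset is the last dot-separated component before ".cat".
    if {![regexp {^(.+)\.([^.]+)\.cat$} $tail match base charset]} {
        error "not a catalog file name: \"$tail\""
    }
    # MIME charset names are case-insensitive; normalize to lower case.
    return [list $base [string tolower $charset]]
}
```

The returned MIME charset name could then be passed to ns_encodingforcharset to obtain the Tcl encoding name used when reading the file.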
You can add more pseudo-levels of hierarchy in naming the message keys, using any separator character you want, for example
_mr fr alerts.mail_notification "Le notification du email"
which will be stored with the full key of bboard.alerts.mail_notification.
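The automatic package prefix can be sketched as follows. This is an illustration only: sketch_mr, ::sketch_current_package and ::sketch_messages are hypothetical names standing in for the real _mr procedure and its storage, which are not shown in this document.

```tcl
# Hypothetical: the package whose catalog file is currently being loaded.
set ::sketch_current_package "bboard"

# Hypothetical stand-in for _mr: register a message under a key that is
# automatically prefixed with the current package key.
proc sketch_mr {locale key message} {
    set full_key "$::sketch_current_package.$key"
    set ::sketch_messages($locale,$full_key) $message
    return $full_key
}

sketch_mr en mail_notification "This is an email notification"
sketch_mr fr alerts.mail_notification "Le notification du email"
```

The second call stores its message under bboard.alerts.mail_notification; the extra "alerts." level is just part of the key string and needs no special handling.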
<%= [_ $locale bboard.passwordPrompt "Enter Password"]%>
However, this is awkward and ugly to use. We have defined an ADP tag which invokes the message catalog lookup. As explained in the previous section, since our system precompiles adp templates, we can get a performance improvement if we can cache the message lookups at template compile time.
The <TRN> tag is a call to lang_message_lookup that can be used inside of an ADP file. Here is the documentation:
Procedure that gets called when the <trn> tag is encountered on an ADP page. The purpose of the procedure is to register the text string enclosed within a pair of <trn> tags as a message in the catalog, and to display the appropriate translated string. Takes three optional parameters: lang, type and key.
key specifies the key in the message catalog. If it is omitted, this procedure simply returns the text enclosed by the tags.
lang specifies the language of the text string enclosed within the tags. If it is omitted, the value defaults to English.
type specifies the context in which the translation is made. If omitted, type is user, which means that the translation is provided in the user's preferred language.
static specifies that this tag should be translated once at template compile time, rather than dynamically every time the page is run. This will be unnecessary and will be deprecated once we have implemented effective-locale-based caching for templates.
Example 1: Display the text string Hello on an ADP page (i.e. do nothing special):
<trn>Hello</trn>
Example 2: Assign the key hello to the text string Hello and display the translated string in the user's preferred language:
<trn key="hello">Hello</trn>
Example 3: Specify that Bonjour needs to be registered as the French translation for the key hello (in addition to displaying the translation in the user's preferred language):
<trn key="hello" lang="fr">Bonjour</trn>
Example 4: Register the string and display it in the preferred language of the current user. Note that the possible values for the type parameter are determined by what has been implemented in the ad_locale procedure. By default, only the user type is implemented. An example of a type that could be implemented is subsite, for displaying strings in the language of the subsite that owns the current web page.
<trn key="hello" type="user">Hello</trn>
Example 5: Translates the string once at template compile time, using the effective locale of the page.
<trn key="hello" static>Hello</trn>
Tables which are in acs kernel and have user-visible names that may need to be translated in order to create an admin back end in another language:
user groups: group_name
acs_object_types: pretty_name pretty_plural
acs_attributes: pretty_name pretty_plural
acs_attribute_descriptions description (clob)
procedure add_description - add a lang arg?
acs_enum_values ? pretty_name
acs_privileges: pretty_name pretty_plural
apm_package_types pretty_name pretty_plural
apm_package "instance_name"? Maybe a given instance gets instantiated with a name in the desired language?
apm_parameters: parameter_name section_name
One approach is to split a table into two tables, one holding language-independent data, and the other holding language-dependent data. This approach was described in the ASJ Multilingual Site Article.
In that case, it is convenient to create a new view which looks like the original table, with the addition of a language column that you can specify in the queries.
Extra join may slow things down
The extra join of the two
tables may cause queries to slow down, although I am not sure what the
actual performance hit might be. It shouldn't be too large, because
the join is against a fully indexed table.
ad_proc adp_parse_ad_conn_file {} {
    handle a request for an adp and/or tcl file in the template system.
} {
    namespace eval template variable parse_level ""
    #ns_log debug "adp_parse_ad_conn_file => file '[file root [ad_conn file]]'"
    set parsed_template [template::adp_parse [file root [ad_conn file]] {}]
    db_release_unused_handles
    if {![empty_string_p $parsed_template]} {
        set content_type [ns_set iget [ns_conn outputheaders] "content-type"]
        if { $content_type == "" } {
            set content_type [ns_guesstype [ad_conn file]]
        } else {
            ns_set idelkey [ns_conn outputheaders] "content-type"
        }
        ns_return 200 $content_type $parsed_template
    }
}
Document Revision # | Action Taken, Notes | When? | By Whom? |
---|---|---|---|
0.1 | Creation | 12/4/2000 | Henry Minsky |
0.2 | More specific template search algorithm, extended message catalog API to use package keys or other namespace | 12/4/2000 | Henry Minsky |
0.3 | Details on how the <TRN> tag works in templates | 12/4/2000 | Henry Minsky |
0.4 | Definition of effective locale for template caching, documentation of TRN tag | 12/12/2000 | Henry Minsky |