I have investigated the standard internationalization (i18n) tools, because some day users may request to get messages in the language of their choice. These first steps are recorded here, first paragraph is to get a rough mental representation of how it works, second paragraph: how to build such an app.
Here is a super simple internationalized application in Python, called myi18n.py:
import gettext
_ = gettext.gettext
gettext.bindtextdomain( "myi18n", "language.d")
gettext.textdomain("myi18n")
gettext.install("myi18n")
print _("Yozzaaa")
These sources uses the following conventional directory layout:
|-- myi18n.py
`-- language.d
|-- fr
| `-- LC_MESSAGES
| `-- myi18n.mo
`-- it
`-- LC_MESSAGES
`-- myi18n.mo
In the sample application, the bindtextdomain is the most interesting function. It takes two arguments: domain and language_dir:
the domain, (myi18n in our example) is the name of a file containing a compilation of the translated messages, and is located in a directory $language_dir/$LOCALE/LC_MESSAGES. The file always ends with .mo but do not write the suffixe in the parameter,
the language directory (language.d in our example) which defines where to find all the fr/, en/,... directories which contains a mandatory LC_MESSAGES which contains the mo files i.e like that
<language directory>
|-- <language1>
| `-- LC_MESSAGES
| `-- <domain>.mo
`-- <language2>
`-- ...
The gettext system expects:
The code presented here is meant to be copied and pasted in a terminal, at the end, you get a working app.
When writing code, just use the _() shortcut around any strings that you later want to translate even you do not translate the application right away: it is not harmful anyway
cat > myi18n.py << EOF
import gettext
_ = gettext.gettext
gettext.bindtextdomain( "myi18n", "language.d")
gettext.textdomain("myi18n")
gettext.install("myi18n")
print _( "Hello world" )
EOF
Use pygettext on the source file to create a messages.pot template translation file, which lists every messages found in the sources
pygettext myi18n.py
sed -i s/CHARSET/UTF-8/ messages.pot
.pot is for an empty template translation file; .po are for translation source files; .mo are for compiled translation files
Copy and rename the template file to a <language>.po file. One for every language you want to support. Example: fr.po, en.po. Fill the template with the message translations
sub_below () { sed -r "h; N; s/^(.*$1.*\n).*/\1$2/; P; D" $3 ; }
sub_below Hello 'msgstr "Ciao tutti"' messages.pot > it.po
sub_below Hello 'msgstr "Héllo à tous"' messages.pot > fr.po
Create the directories where the application expects the message translation
mkdir -p language.d/{fr,it}/LC_MESSAGES/
Transform the .po in .mo with the msgfmt command, in the lang_dir/fr/LC_MESSAGES/<domain>.mo
msgfmt fr.po -o language.d/fr/LC_MESSAGES/myi18n.mo
msgfmt it.po -o language.d/it/LC_MESSAGES/myi18n.mo
rm {fr,it}.po messages.pot
Test the app with the different language
for i in en fr it; do
export LANGUAGE=$i ; python myi18n.py; done
# Obviously, accents are correctly handled, it is made with python :)
unset LANGUAGE
Next time, you’ll see how to use conversion specifiers in the message strings, as in printf. Also, there are similarities between i18n and audience (audience as in support, dev, admin, grandma), I would like to see how to hack i18n to address different audience.
15 November 2009