July 28, 2010
There are several tutorials explaining how to add i18n (internationalization) support to a pygtk application (a very good, if outdated, one can be found here). Many of them cover only the basics, leaving it up to the reader to work out how to integrate it into the development process. This tutorial aims to fill that gap by showing how to add translation support, how to integrate it into the build process, and how to use Launchpad as a front end for providing translations.
This tutorial was written whilst I was adding translation support to Prioritise. All of the problems I faced are included in this article, along with solutions where available. You can download the code to see the complete implementation.
Translation support is provided by GNU gettext, through the gettext module that comes as standard with Python. This module includes everything you need to translate a pygtk application.
UPDATE 10/08/2010: Code examples now use the wordpress built-in syntax highlighter.
1. The code
A string is translated by passing it to the gettext.gettext function, which returns the correct text for the current domain, language code, and locale directory settings (more on these in a moment). This function is aliased to _ to improve readability (more on how this is achieved later).
print gettext.gettext("translate me")
print _("translate me")
The rest of this tutorial refers to both forms as gettext.
Mark strings for translation
The first task is to go through your code and mark all translatable strings by passing them to the gettext function. Be careful to mark only strings that require translation. For example, Prioritise would crash if the text “main_window” were translated, because it is not user-visible text: it refers to the main_window object in the glade file.
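To make the distinction concrete, here is a small plain-Python sketch (no GTK needed). WIDGETS and get_widget() are hypothetical stand-ins for widgets loaded from a glade file and looked up by ID, and FakeDutch is a made-up catalog that happens to contain a translation for that ID. A real glade lookup fails in its own way, but the effect is the same: the translated ID no longer names any widget.

```python
import gettext


class FakeDutch(gettext.NullTranslations):
    """Hypothetical catalog that (wrongly) contains a translation for a widget ID."""

    TABLE = {"main_window": "hoofdvenster", "Tasks": "Taken"}

    def gettext(self, message):
        return self.TABLE.get(message, message)


_ = FakeDutch().gettext

# Hypothetical stand-in for the widgets loaded from a glade file, keyed by ID.
WIDGETS = {"main_window": {"title": None}}


def get_widget(name):
    # Raises KeyError for any name that is not an ID in the "glade file".
    return WIDGETS[name]


window = get_widget("main_window")  # widget ID: must NOT be marked
window["title"] = _("Tasks")        # user-visible text: marked for translation

try:
    get_widget(_("main_window"))    # marking the ID breaks the lookup
    lookup_failed = False
except KeyError:
    lookup_failed = True
```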
Translatable strings that contain variables
Special attention must be given to translatable strings that contain variable values.
destination = "Paris"
print _("You want to go to ") + destination + _(". Great choice!")
The above print statement contains two translatable strings. This means that the message displayed to the user, which is what we intend to translate, has been broken into two pieces that are translated separately. This is untidy at best, and in almost all situations it results in incorrect translations.
As word order is not the same in all languages, this message becomes impossible to translate correctly. For example, in English the destination goes after the second verb, but in Dutch it is placed before it:
print _("U wilt naar ") + destination + _(" gaan. Goede keuze!")
The most obvious way to solve this is to move the second verb into the second part of the translation. However, the second string then contains a verb that isn't in its English source, so it is no longer a correct translation of “. Great choice!”.
The solution to this problem is to use string formatting, where the translatable text is a single string containing placeholders for variables:
print _("You want to go to %s. Great choice!") % (destination)
print _("U wilt naar %s gaan. Goede keuze!") % (destination)
The placeholder can easily be moved within the sentence when translated, resulting in a correct translation. Moreover, the message is no longer broken into pieces. This method does, however, come with a drawback: if the translated message doesn't contain the same number of placeholders as there are arguments, the % operator raises a TypeError at run time.
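The sketch below illustrates both points, assuming an identity translation (gettext.NullTranslations) so that it runs without any .mo files. bad_translation is a hypothetical mistake a translator might make. The last lines show named placeholders (%(name)s), a standard Python formatting feature worth considering when a message contains more than one variable, since translators can then reorder the values freely.

```python
import gettext

_ = gettext.NullTranslations().gettext  # identity "translation" for this sketch

destination = "Paris"

# One translatable string with a placeholder, formatted after translation.
message = _("You want to go to %s. Great choice!") % destination

# Hypothetical bad translation: two placeholders but only one argument.
bad_translation = "U wilt naar %s gaan, %s!"
error = None
try:
    bad_translation % destination
except TypeError as exc:
    error = str(exc)

# Named placeholders: the mapping supplies values by name, so order is free.
values = {"origin": "London", "destination": destination}
english = _("From %(origin)s to %(destination)s") % values
# Hypothetical translation that puts the destination first.
reordered = "%(destination)s (from %(origin)s)" % values
```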
Commenting translatable text
It’s not always clear from the source text what the context of the translation is. Moreover, the translator doesn’t see the values that replace the placeholders.
In such cases it’s good practice to add a translator comment giving additional information. The comment can be extracted along with the text to be translated, provided the extraction tool supports it (xgettext does, via --add-comments=TRANSLATOR).
# TRANSLATOR: %s refers to the holiday destination
print _("You want to go to %s. Great choice!") % (destination)
Translating text in glade
If you’ve built your pygtk GUI using the glade interface designer, the resulting glade files also need to have their translatable text marked. Most properties in the editor are marked translatable, but some are not (e.g. those of gtk.Action). This can be resolved in one of two ways:
- Editing ‘glade3/plugins/gtk+/gtk+.xml’ to give elements translation support (see this link)
- Editing the resulting glade file by adding the attribute translatable="yes" to each applicable element
The first fix is permanent and, as I understand it, has already been made in the editor’s development branch. The second works, but the glade editor removes the changes every time you save the file.
I also found a second problem with toolbuttons that use a gtk.Action: their labels are still marked for translation. Since the labels are often left at their default value, the translation template fills up with “toolbuttonX” items. This has no adverse effect on the application, but it is messy. The simplest fix is to open the glade file in a text editor, find the offending items, and set translatable="no". In this case, the change persists even when the file is later saved in the glade editor.
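If you’d rather not hand-edit the file after every save (the glade editor strips the translatable="yes" additions, as noted above), both fixes could be scripted. This is a minimal sketch using xml.etree.ElementTree on a hypothetical GtkBuilder-style excerpt; it assumes that every label other than the default “toolbuttonN” names is user-visible. ElementTree drops XML comments by default and may reformat the file, so keep the original under version control before writing the result back.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical excerpt of a glade (GtkBuilder) file.
GLADE = """<interface>
  <object class="GtkAction" id="action_add">
    <property name="label">Add task</property>
  </object>
  <object class="GtkToolButton" id="toolbutton1">
    <property name="label" translatable="yes">toolbutton1</property>
  </object>
</interface>"""

# Default labels the editor gives toolbuttons, e.g. "toolbutton1".
PLACEHOLDER = re.compile(r"toolbutton\d+$")


def fix_label_flags(root):
    """Mark real labels translatable and default 'toolbuttonN' labels not translatable."""
    for prop in root.iter("property"):
        if prop.get("name") != "label":
            continue
        if PLACEHOLDER.match(prop.text or ""):
            prop.set("translatable", "no")
        elif "translatable" not in prop.attrib:
            prop.set("translatable", "yes")
    return root


root = fix_label_flags(ET.fromstring(GLADE))
fixed_xml = ET.tostring(root, encoding="unicode")  # text to write back to the file
```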
Importing and configuring gettext
If you run your application at this point, you should be presented with the following error:
NameError: name '_' is not defined
This is because the application hasn’t yet been set up to use gettext.
At the beginning of this chapter I mentioned that gettext.gettext returns the translated text for a domain, language code, and locale directory. These map onto:
- Locale directory: the root folder of all translations for all languages on the system (e.g. /usr/share/locale)
- Domain: the name of the translation file, without its .mo extension (see the following section)
- Language code: the directory name inside the locale directory (e.g. en_GB)
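Using the examples above, if the locale directory were ‘/usr/share/locale’, the domain ‘prioritise’, and the language code ‘en_GB’, the path to the binary translation file would be /usr/share/locale/en_GB/LC_MESSAGES/prioritise.mo, since gettext looks in the LC_MESSAGES subdirectory of each language directory.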
Being able to control the domain, language code, and locale directory independently of one another allows for powerful options such as changing languages at run-time. However this tutorial will only setup gettext to retrieve translations based on the current system locale.
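The layout can be checked from Python. mo_path() below is a hypothetical helper that builds the path from the three settings; gettext.find() is the standard-library function that searches the same layout. The second part uses an empty placeholder file in a temporary directory (find() only checks that the file exists) and also shows that a request for en_GB falls back to a generic en catalog when no en_GB one exists.

```python
import gettext
import os
import tempfile

LOCALE_DOMAIN = "prioritise"


def mo_path(locale_dir, lang_code, domain):
    # gettext's layout: <locale dir>/<language code>/LC_MESSAGES/<domain>.mo
    return os.path.join(locale_dir, lang_code, "LC_MESSAGES", domain + ".mo")


# On POSIX: /usr/share/locale/en_GB/LC_MESSAGES/prioritise.mo
print(mo_path("/usr/share/locale", "en_GB", LOCALE_DOMAIN))

# Only a generic "en" catalog exists in this throwaway locale directory.
locale_dir = tempfile.mkdtemp()
en_mo = mo_path(locale_dir, "en", LOCALE_DOMAIN)
os.makedirs(os.path.dirname(en_mo))
open(en_mo, "wb").close()  # empty placeholder file

# find() tries en_GB variants first, then falls back to "en".
found = gettext.find(LOCALE_DOMAIN, locale_dir, languages=["en_GB"])
```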
Only a handful of calls are needed to set up gettext. This is best shown by example:
import gettext
import gtk
import gtk.glade
import locale
import os
import sys

# setup translation support
(lang_code, encoding) = locale.getlocale()
LOCALE_DOMAIN = 'prioritise'
LOCALE_DIR = os.path.join(sys.prefix, 'local', 'share', 'locale')
gettext.bindtextdomain(LOCALE_DOMAIN, LOCALE_DIR)    # (1)
gettext.textdomain(LOCALE_DOMAIN)                    # (2)
gettext.install(LOCALE_DOMAIN, LOCALE_DIR)           # (3) install() ignores (1), so pass the directory
gtk.glade.bindtextdomain(LOCALE_DOMAIN, LOCALE_DIR)  # (4)
gtk.glade.textdomain(LOCALE_DOMAIN)                  # (5)
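The important lines in setting up gettext are numbered 1 to 5. Below is an explanation of what each line does:
- Maps the domain to the locale directory, so that gettext can find the locale directory for a given domain.
- Sets the global domain, which is used by the module-level gettext.gettext() function (install() in #3 takes its domain as an argument instead).
- Installs _() into Python’s builtins namespace as the gettext function for the given domain. Note that install() does not use the directory registered in #1: unless the locale directory is passed as its second argument, as in gettext.install(LOCALE_DOMAIN, LOCALE_DIR), it searches Python’s default locale directory (share/locale under sys.prefix).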
- The same as #1 but for glade.
- The same as #2 but for glade.
A lot of tutorials online, and even some Python applications I used for guidance, use Python trickery to make _ globally available. This is not required: line (3) installs _ into the builtins namespace, so it is available in every module without any further imports.
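Lines (1) to (3) can be tried without GTK. In this sketch LOCALE_DIR is assumed to be an empty temporary directory, so no catalog is found and install() falls back to returning messages unchanged. The directory is passed to install() explicitly because install() does not look up the directory registered with bindtextdomain(). The builtins module is called __builtin__ in Python 2.

```python
import builtins  # named __builtin__ in Python 2
import gettext
import tempfile

LOCALE_DOMAIN = "prioritise"
LOCALE_DIR = tempfile.mkdtemp()  # hypothetical, empty locale directory

gettext.bindtextdomain(LOCALE_DOMAIN, LOCALE_DIR)  # (1) used by gettext.gettext()
gettext.textdomain(LOCALE_DOMAIN)                  # (2) global domain for gettext.gettext()
gettext.install(LOCALE_DOMAIN, LOCALE_DIR)         # (3) puts _ into builtins

# No .mo file exists, so the fallback returns the message unchanged.
print(_("translate me"))  # translate me
```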
This section glossed over a lot of the features of gettext. It’s capable of a lot more. I recommend reading through the documentation.
If you run your application now, the error shown previously will be gone and your application will run.
2. The translation files
It’s important to understand how the translation files relate to one another. There are three types:
- Portable Object Template (.pot)
- Portable Object (.po)
- Machine Object (.mo)
The template file is generated from your code (including glade files) and contains all the translatable strings in your application. This is used to generate a .po file for each language your application will support; the .po files contain the actual translations. Finally, each .po file is used to generate a corresponding .mo file. An .mo file is a binary version of the .po file which is used by your program to get the translated text.
For Prioritise, I created scripts for each of the following processes to make the job quicker and less error-prone. They are especially useful for the last one, merging a new template with existing translations. The scripts can be found in the scripts directory in Launchpad or in the downloadable source tarball.
Generating the template (.pot)
The template needs to contain all the translatable text from the Python source files and the glade interface designer files. First we need to extract the glade strings into a compatible format using intltool-extract:
This outputs a .h file for each glade file, which is passed, along with the Python source files, to pygettext, the tool that creates the .pot template:
pygettext -d prioritise -k _ -k N_ -p ../po <source files>
There are several arguments passed to this command:
- -d prioritise: this is the domain we are using, and must match what we put in the code
- -k _ -k N_: these specify keywords which mark where translatable strings can be found in our source files. In our Python files, we use _(), so _ is our keyword. However, intltool-extract wraps the glade strings in N_(), so we have to specify this keyword too.
- -p ../po: this is the output directory where the .pot file will be created
- <source files>: this is a space separated list of .py and .h files
Whenever you make changes to the translatable text in your application, you need to regenerate the template. It doesn’t matter if this overwrites the previous template.
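To see why both keywords are needed, here is a simplified sketch of the pipeline. glade_to_header() is a hypothetical stand-in for intltool-extract (the real tool’s output may be laid out differently), and keyword_strings() is a toy version of what a tokenizer-based extractor does with -k: it collects string literals written as keyword("..."). Because the generated C-style lines can also be read by Python’s tokenizer, one keyword list picks up strings from both the glade files and the .py files.

```python
import ast
import io
import tokenize
import xml.etree.ElementTree as ET

# Hypothetical glade excerpt with two translatable strings.
GLADE = """<interface>
  <object class="GtkWindow" id="main_window">
    <property name="title" translatable="yes">Prioritise</property>
    <child>
      <object class="GtkLabel" id="label1">
        <property name="label" translatable="yes">Add a task</property>
      </object>
    </child>
  </object>
</interface>"""


def glade_to_header(xml_text):
    """Simplified intltool-extract stand-in: one N_() line per translatable string."""
    lines = []
    for prop in ET.fromstring(xml_text).iter("property"):
        if prop.get("translatable") == "yes" and prop.text:
            text = prop.text.replace("\\", "\\\\").replace('"', '\\"')
            lines.append('char *s = N_("%s");' % text)
    return "\n".join(lines) + "\n"


def keyword_strings(source, keywords=("_", "N_")):
    """Return string literals written as keyword("..."): the idea behind -k."""
    skip = (tokenize.NL, tokenize.NEWLINE, tokenize.COMMENT)
    toks = [t for t in tokenize.generate_tokens(io.StringIO(source).readline)
            if t.type not in skip]
    found = []
    for name, paren, literal in zip(toks, toks[1:], toks[2:]):
        if (name.type == tokenize.NAME and name.string in keywords
                and paren.string == "(" and literal.type == tokenize.STRING):
            found.append(ast.literal_eval(literal.string))
    return found


header = glade_to_header(GLADE)
python_source = 'print(_("You want to go to %s. Great choice!") % destination)\n'
messages = keyword_strings(header) + keyword_strings(python_source)
```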
Creating the portable objects (.po)
When you provide support for a language for the first time, you need to create a .po file. This is done using the msginit command:
msginit --no-translator -i <.pot file> -l <language code> -o <.po file>
The --no-translator option prevents msginit from prompting for the translator’s email address, which is useful when you automate the generation, as I have. The other options are self-explanatory.
You only need to create a .po file for a given language once. When changes are made to the template, it should be merged with existing .po files. See ‘Merging updated templates’ below.
Building the machine objects (.mo)
Now that we have the translation files, we need to compile them into the machine object (.mo) files that gettext reads. To do this, we use msgfmt.py, which ships with Python alongside pygettext (in the Tools/i18n directory of the source distribution). For convenience, I distribute it with Prioritise:
Here, src is the .po file and destination is where the .mo file is written. This location must be relative to the LOCALE_DIR specified in your Python code, i.e. <LOCALE_DIR>/<language code>/LC_MESSAGES/<domain>.mo.
However, building the .mo file is typically done at installation time. Your program should be shipped with the .pot and .po files, and during installation the .mo files will be generated. See the section ‘Integrating with Distutils’ below for more information on how to do this.
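For the curious, the sketch below shows roughly what msgfmt produces: a small binary header, two index tables of (length, offset) pairs, and the NUL-terminated original and translated strings. make_mo() is a hypothetical, simplified writer (no plural forms, message contexts or hash table) and is no substitute for msgfmt; the point is that gettext.GNUTranslations, the class Python uses to read .mo files, can load the result.

```python
import gettext
import io
import struct


def make_mo(messages):
    """Serialise {msgid: msgstr} into GNU .mo bytes (little-endian, no hash table)."""
    messages = dict(messages)
    # The empty msgid holds the catalog header; declare UTF-8 so text decodes correctly.
    messages.setdefault("", "Content-Type: text/plain; charset=UTF-8\n")
    keys = sorted(messages)  # msgids are stored sorted, as msgfmt does
    ids = strs = b""
    entries = []
    for key in keys:
        k, v = key.encode("utf-8"), messages[key].encode("utf-8")
        entries.append((len(k), len(ids), len(v), len(strs)))
        ids += k + b"\0"   # strings are NUL-terminated
        strs += v + b"\0"
    n = len(keys)
    ids_start = 7 * 4 + 16 * n  # 7-word header + two n-entry index tables
    strs_start = ids_start + len(ids)
    table = []
    for k_len, k_off, _v_len, _v_off in entries:
        table += [k_len, ids_start + k_off]
    for _k_len, _k_off, v_len, v_off in entries:
        table += [v_len, strs_start + v_off]
    # magic, revision, count, original-table offset, translation-table offset, hash size/offset
    header = struct.pack("<7I", 0x950412DE, 0, n, 7 * 4, 7 * 4 + 8 * n, 0, 0)
    return header + struct.pack("<%dI" % len(table), *table) + ids + strs


mo_bytes = make_mo({"You want to go to %s. Great choice!": "U wilt naar %s gaan. Goede keuze!"})
catalog = gettext.GNUTranslations(io.BytesIO(mo_bytes))
print(catalog.gettext("You want to go to %s. Great choice!") % "Parijs")
# U wilt naar Parijs gaan. Goede keuze!
```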
Merging updated templates
There will come a time when you’ll need to add new strings so they can be translated. If you create .po files from the new templates, old translations will be lost. Instead, you need to merge the new .pot file with the existing .po translations using the msgmerge command:
msgmerge --update <.po file> <.pot file>
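Why merging preserves work can be modelled in a few lines. merge() below is a hypothetical simplification: entries still in the template keep their translations, new entries start untranslated, and entries no longer in the template are set aside. Real msgmerge also does fuzzy matching, flagging near-matches as fuzzy for review, and keeps obsolete entries in the .po file as #~ comments.

```python
def merge(template_ids, existing):
    """Simplified model of msgmerge --update on {msgid: msgstr} dictionaries."""
    merged = {}
    for msgid in template_ids:
        merged[msgid] = existing.get(msgid, "")  # "" means not yet translated
    obsolete = {k: v for k, v in existing.items() if k not in merged}
    return merged, obsolete


# Hypothetical Dutch catalog and an updated template.
existing_nl = {"Tasks": "Taken", "Delete": "Verwijderen"}
new_template = ["Tasks", "Add a task"]  # "Delete" was removed from the code

merged, obsolete = merge(new_template, existing_nl)
```

Creating a fresh .po file with msginit instead would leave every entry empty, losing “Taken”.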
3. Integrating with Distutils
As mentioned in the section above, generating the .mo files is commonly done when your application is built. This can be done by creating a separate build command for translations. I borrowed the code for this from the deluge project. Open up the setup.py file for prioritise and look at the Build and BuildTranslations classes. These are passed to the setup function via the cmdclass parameter. This will cause the .mo files to be compiled from the .po files at build time, but only if the .po files have changed.
The common method to include data files in the installation is to add them to the data_files parameter:
data_files = [(<target>, [<source>])]
The problem with this is that the target differs for each .mo file. For example, the Mexican Spanish translation needs to go to share/locale/es_MX/LC_MESSAGES/prioritise.mo. We could specify the target location manually for each .mo file, but that is a long and tedious process. Instead, we can override distutils’ install_data command and have it add each file to the list programmatically. The InstallData class in the Prioritise setup.py file was copied from PyRoom.
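The sketch below is not the Deluge or PyRoom code; it only illustrates the two ideas under stated assumptions. It assumes .po files are named after their language code (po/es_MX.po) and compiled into build/locale/<language code>/LC_MESSAGES/prioritise.mo. mo_target(), needs_rebuild() and locale_data_files() are hypothetical helpers that custom build and install_data commands could call; the data_files targets are relative, so they end up under the installation prefix.

```python
import os
import tempfile

DOMAIN = "prioritise"  # gettext domain used throughout this tutorial


def mo_target(build_dir, po_path, domain=DOMAIN):
    """po/es_MX.po -> <build_dir>/locale/es_MX/LC_MESSAGES/prioritise.mo"""
    lang = os.path.splitext(os.path.basename(po_path))[0]
    return os.path.join(build_dir, "locale", lang, "LC_MESSAGES", domain + ".mo")


def needs_rebuild(po_path, mo_path):
    """True when the .mo is missing or older than the .po it is compiled from."""
    return (not os.path.exists(mo_path)
            or os.path.getmtime(po_path) > os.path.getmtime(mo_path))


def locale_data_files(build_dir, domain=DOMAIN):
    """One (install directory, [file]) pair per compiled catalog, for data_files."""
    pairs = []
    locale_root = os.path.join(build_dir, "locale")
    for lang in sorted(os.listdir(locale_root)):
        mo = os.path.join(locale_root, lang, "LC_MESSAGES", domain + ".mo")
        if os.path.isfile(mo):
            pairs.append((os.path.join("share", "locale", lang, "LC_MESSAGES"), [mo]))
    return pairs


# Usage with a throwaway project tree.
project = tempfile.mkdtemp()
build_dir = os.path.join(project, "build")
po = os.path.join(project, "po", "es_MX.po")
os.makedirs(os.path.dirname(po))
open(po, "w").close()

mo = mo_target(build_dir, po)
rebuild_before = needs_rebuild(po, mo)  # True: nothing compiled yet
os.makedirs(os.path.dirname(mo))
open(mo, "wb").close()                  # stand-in for running msgfmt.py
rebuild_after = needs_rebuild(po, mo)   # False: .mo is at least as new as .po
data_files = locale_data_files(build_dir)
```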
I had a problem with distutils that I couldn’t resolve. In the documentation it states that data_files are installed relative to sys.prefix:
Each (directory, files) pair in the sequence specifies the installation directory and the files to install there. If directory is a relative path, it is interpreted relative to the installation prefix (Python’s sys.prefix for pure-Python packages, sys.exec_prefix for packages that contain extension modules).
Despite sys.prefix being /usr on my system, the files are installed to /usr/local/. This can be worked around by adding the /local/ portion in your Python code, but that is not a correct fix. If anyone is able to resolve this, please let me know in the comments.
4. Testing translations
Now all the steps have been completed, we can check that everything works. There are two ways to do this:
The first method is to set the system locale prior to setting up the gettext module:
locale.setlocale(locale.LC_ALL, 'es_MX.UTF-8')  # raises locale.Error if the locale is not installed
# setup translation support
(lang_code, encoding) = locale.getlocale()
LOCALE_DOMAIN = 'prioritise'
LOCALE_DIR = os.path.join(sys.prefix, 'local', 'share', 'locale')
gettext.bindtextdomain(LOCALE_DOMAIN, LOCALE_DIR)
...
The downside to this method is that it requires a change to your code. An alternative is to specify the locale in an environment variable when you start your application:
For some reason, this never works for me. All I get is the error:
(process:3513): Gtk-WARNING **: Locale not supported by C library. Using the fallback 'C' locale.
If anyone knows the cause of this, and a solution, please let me know.
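A hedged note on both methods: Python’s gettext module does not consult setlocale(). Unless given an explicit list of languages, it reads the LANGUAGE, LC_ALL, LC_MESSAGES and LANG environment variables, in that order, so _() strings can be tested even without an es_MX locale installed in the C library. The GTK warning above most likely means the C library has no compiled locale for the requested name (locale -a lists those available); that affects GTK and libglade, which translate glade strings through the C library. The sketch uses a hypothetical temporary LOCALE_DIR with an empty placeholder catalog, since gettext.find() only checks that the file exists.

```python
import gettext
import os
import tempfile

LOCALE_DOMAIN = "prioritise"
LOCALE_DIR = tempfile.mkdtemp()  # hypothetical locale dir holding only an es_MX catalog

# Empty placeholder where a compiled es_MX catalog would live.
es_mo = os.path.join(LOCALE_DIR, "es_MX", "LC_MESSAGES", LOCALE_DOMAIN + ".mo")
os.makedirs(os.path.dirname(es_mo))
open(es_mo, "wb").close()

# LANGUAGE is consulted before LC_ALL, LC_MESSAGES and LANG, so it wins here.
os.environ["LANG"] = "en_GB.UTF-8"
os.environ["LANGUAGE"] = "es_MX"
from_env = gettext.find(LOCALE_DOMAIN, LOCALE_DIR)

# An explicit languages list ignores the environment entirely.
os.environ["LANGUAGE"] = "en_GB"
explicit = gettext.find(LOCALE_DOMAIN, LOCALE_DIR, languages=["es_MX"])
```

In the application, the explicit form would be gettext.translation(LOCALE_DOMAIN, LOCALE_DIR, languages=['es_MX']).install(), which raises an error if no matching catalog exists unless fallback=True is passed.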
5. Integrating with Launchpad
Integrating with Launchpad is optional. You could just edit the .po files manually, or use an application such as Poedit. The advantage of using Launchpad is that it opens up translations to a much wider community. The Launchpad documentation covers this subject well but doesn’t offer any specific examples of how to set it up. This section shows how it has been set up for Prioritise.
Importing .pot and .po files
There are two options for importing these files. The first is a one-time import, and the second is an automatic import from a branch hosted in Launchpad. Prioritise uses the latter, as it ensures that translators are working with the most up-to-date information available. I set it up to import from the trunk branch.
Exporting .po files
The results of translation efforts in launchpad are updated .po files. These need to be exported to be used by your application. In prioritise I’ve set up a separate branch just for the exported translations. When I’m about to release, I can merge the translation branch with the trunk branch, thus updating my translations for that release.