Gettext PO/POT format explained#

Gettext is an internationalization (i18n) mechanism and library used not only in many software products and programming languages but also for translating Sphinx documentation. Gettext extracts strings marked “to be localized” from source code (or a document) into a plain text file with a .po or .pot extension. Let’s look at the format.

Before continuing, you might want to read gettext-translation first. Besides the process, that post explains the difference between PO and POT files.

See also the gettext manual for an exhaustive PO file reference.

PO/POT format#

Basically, a PO/POT file is a list of original string and translation pairs with comments.

The original string is on the line starting with msgid.
The translation is on the next line, starting with msgstr.
Comments start with #.
The very first pair, with a blank msgid, is the header.

# Menu translations from English to Czech

msgid ""
msgstr ""
"Project-Id-Version:  Documatt\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2021-10-15 15:32+0200\n"
"PO-Revision-Date: 2021-04-28 14:31+0000\n"
"Last-Translator: Matt <matt@documatt.com>\n"
"Language: cs\n"
"Plural-Forms: nplurals=2; plural=n != 1\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"

# terminated with any result, not necessarily succeeded
msgid "Mark as completed"
msgstr "Vyřídit jako hotovou"

#, fuzzy
msgid "Mark as spam"
msgstr "Označit jako spam"
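To make the structure concrete, here is a minimal Python sketch of a reader for files like the one above. It is not how the gettext tools parse PO files: the function name `parse_po` is made up for this post, comments are simply skipped, keywords such as msgctxt or msgid_plural are ignored, and Python's own string-literal rules (via `ast.literal_eval`) stand in for PO unescaping, which is only an approximation.

```python
import ast


def parse_po(text):
    """Parse simple PO text into a list of (msgid, msgstr) pairs."""
    pairs = []
    entry = {}
    field = None  # keyword that bare "..." continuation lines belong to
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank separators and all comment types are skipped here
        keyword, _, rest = line.partition(" ")
        if keyword == "msgid":
            if "msgid" in entry:  # a new msgid closes the previous pair
                pairs.append((entry["msgid"], entry.get("msgstr", "")))
            entry, field, literal = {}, "msgid", rest
        elif keyword == "msgstr":
            field, literal = "msgstr", rest
        elif line.startswith('"') and field:
            literal = line  # continuation of the current msgid or msgstr
        else:
            field = None  # msgctxt, msgid_plural, ... are out of scope
            continue
        # Shortcut: Python's string-literal syntax stands in for PO unescaping.
        entry[field] = entry.get(field, "") + ast.literal_eval(literal)
    if "msgid" in entry:
        pairs.append((entry["msgid"], entry.get("msgstr", "")))
    return pairs


sample = '''\
# Menu translations from English to Czech

msgid ""
msgstr ""
"Language: cs\\n"
"Content-Type: text/plain; charset=utf-8\\n"

#, fuzzy
msgid "Mark as spam"
msgstr "Označit jako spam"
'''

pairs = parse_po(sample)
# The header is the msgstr of the pair with the blank msgid: "Key: value" lines.
header = dict(line.split(": ", 1) for line in pairs[0][1].splitlines())
```

With this sample, `pairs[0]` is the header pair and `header["Language"]` holds the target language code.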

msgid and msgstr#

Unlike some i18n libraries, gettext uses the actual original string as the ID instead of a key. For example, the ID is the convenient “Do you want to save the file?” instead of “SaveFileQuestionDialog”. msgids are automatically extracted from source code or a Sphinx document.

The actual translation is on the msgstr line. In a template .pot file, it is empty (a blank string).

Original-translation pairs are separated by a blank line:

msgid "Mark as completed"
msgstr "Vyřídit jako hotovou"

msgid "Mark as spam"
msgstr "Označit jako spam"

Long strings may be split across multiple lines, with an empty "" on the keyword line:

msgid ""
"You can see the URL of the page where the visitor is and the "
"information about them is available, such as when the chat started, what "
"project the conversation is about, etc. This corresponds to the editor "
"layout structure into the following parts:"
msgstr ""

However, I highly recommend using one long line. It makes Git diffs much easier to read, and you can easily wrap long lines in your text editor or viewer.

msgid "You can see the URL of the page where the visitor is and the information about them is available, such as when the chat started, what project the conversation is about, etc. This corresponds to the editor layout structure into the following parts:"
msgstr ""
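If you receive a file with wrapped strings, the conversion to the one-line form is mechanical. A small Python sketch, assuming the entry is given as a list of lines and the keyword line carries a quoted string; the helper name `unwrap_po_string` is made up for illustration:

```python
def unwrap_po_string(lines):
    """Merge a keyword line plus its "..." continuation lines into one line."""
    keyword, first = lines[0].strip().split(" ", 1)
    pieces = [first] + [line.strip() for line in lines[1:]]
    # Adjacent literals are simply concatenated, so joining the text between
    # the quotes (escapes left untouched) gives an equivalent single literal.
    body = "".join(piece[1:-1] for piece in pieces)
    return f'{keyword} "{body}"'


wrapped = [
    'msgid ""',
    '"You can see the URL of the page "',
    '"where the visitor is."',
]
single_line = unwrap_po_string(wrapped)
```

Note that the trailing space inside each piece matters: nothing is inserted between the joined parts.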

Escaping#

Both the original and the translation have to be double-quoted strings. The escape character is \. A " becomes \", a newline becomes \n, and a \ becomes \\.

msgid "Hello gettext!\n\nBe careful to escape double quotes \" in PO."
msgstr ""
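The three escapes mentioned above can be sketched as a pair of Python helpers. The names `po_escape` and `po_unescape` are made up for this post, and other escapes that PO files may contain (such as \t) are deliberately left out.

```python
def po_escape(text):
    """Turn a raw string into a double-quoted PO string literal."""
    escaped = (
        text.replace("\\", "\\\\")  # backslash first, so later escapes survive
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def po_unescape(literal):
    """Inverse of po_escape for the three escapes handled above."""
    out = []
    chars = iter(literal[1:-1])  # drop the surrounding double quotes
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")  # the character the backslash applies to
            out.append("\n" if nxt == "n" else nxt)
        else:
            out.append(ch)
    return "".join(out)


raw = 'Hello gettext!\n\nBe careful to escape double quotes " in PO.'
literal = po_escape(raw)
```

Escaping the backslash first is the important design detail: doing it last would double the backslashes that the other two replacements just added.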

Comments#

Comments starting with # play a very important role in PO/POT files.

Comments related to a msgid-msgstr pair must be placed right before it.

There are multiple comment types. Translator comments are the only comments written by hand in the PO file. The remaining comment types are created automatically by the gettext tools.

#  translator-comments
#. extracted-comments
#: reference…
#, flag…
msgid untranslated-string
msgstr translated-string
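Since the type is decided by the character right after #, telling the four comment types apart takes only a few lines. A Python sketch with a made-up helper `comment_kind`; markers not covered in this post are reported as "other" rather than guessed:

```python
COMMENT_KINDS = {
    "#.": "extracted",
    "#:": "reference",
    "#,": "flag",
}


def comment_kind(line):
    """Name the PO comment type of a line, or None if it is not a comment."""
    if not line.startswith("#"):
        return None
    if line == "#" or line.startswith("# "):
        return "translator"  # '#' followed by a space (or nothing at all)
    return COMMENT_KINDS.get(line[:2], "other")


kinds = [
    comment_kind(line)
    for line in (
        "#  translator-comments",
        "#. extracted-comments",
        "#: src/msgcmp.c:128",
        "#, fuzzy",
        'msgid "Mark as spam"',
    )
]
```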

Translator comments (`#`)#

Comments written by the translator start with # followed by a single space (the space is important!).

Translator comments are usually aimed at the translator themselves or at other translators. They might explain the grammar, style, or terms used. Translator comments are often written in the target language.

# comment in target language for other translators
msgid "Mark as completed"
msgstr "Vyřídit jako hotovou"

Extracted comments (`#.`)#

Comments beginning with #. are written by the programmer and aimed at translators. They are called extracted because the gettext tools extract them from the source code.
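On the source side, such a comment is an ordinary code comment placed right before the marked string. A Python sketch, assuming the extraction tool is told to pick up comments with the tag "Translators:" (for example through xgettext's --add-comments option); the tag, the comment text, and the variable name are only illustrative:

```python
import gettext

_ = gettext.gettext  # conventional alias that marks strings for extraction

# Translators: shown on a button that closes a ticket with any result
label = _("Mark as completed")
```

After extraction, the comment text would appear in the PO file on a #. line above the corresponding msgid. Without an installed catalog, the call simply returns the original string.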

Reference comments (`#:`)#

Comments beginning with #: are references to the locations in the sources from which the message was extracted, in the format <path_to_file>:<line_number>. A translator who is unsure can look at the message's surroundings in the source. If the original string appears at multiple locations, the references are separated by spaces.

#: src/msgcmp.c:128
msgid "Mark as spam"
msgstr "Označit jako spam"

#: src/msgcmp.c:338 src/po-lex.c:699
msgid "Mark as completed"
msgstr "Vyřídit jako hotovou"
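Reading the references back is straightforward. A Python sketch, assuming paths contain no spaces; the helper name `parse_references` is made up, and a reference without a line number is kept with `None` in its place:

```python
def parse_references(line):
    """Split a '#:' comment into (path, line_number) tuples."""
    refs = []
    for token in line[2:].split():
        path, sep, number = token.rpartition(":")
        if sep and number.isdigit():
            refs.append((path, int(number)))
        else:
            refs.append((token, None))  # reference without a line number
    return refs


refs = parse_references("#: src/msgcmp.c:338 src/po-lex.c:699")
```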

Flag comments (`#,`)#

A comment beginning with #, is a comma-separated list of flags controlling message processing.

The most common flag is fuzzy. The gettext tools usually set it automatically when an original string is similar (but not identical) to another, already translated message. Fuzzy thus means “not sure about the translation, needs a human check”. The translator fixes the translation if needed and then removes the fuzzy flag.

msgid "Mark as completed"
msgstr "Vyřídit jako hotovou"

#, fuzzy
msgid "Mark with completed"
msgstr "Vyřídit jako hotovou"

Caution

Tools usually warn about fuzzy translations. By default, fuzzy translations are not shown to the end user; the original string is displayed instead.
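The fallback rule can be sketched in a few lines of Python. This is a simplified model, not the code of any gettext tool; the helpers `flags_of` and `displayed_text` are made up, and the comments of an entry are assumed to be available as a list of lines.

```python
def flags_of(comment_lines):
    """Collect flags from the '#,' comments that precede an entry."""
    flags = set()
    for line in comment_lines:
        if line.startswith("#,"):
            flags.update(
                flag.strip() for flag in line[2:].split(",") if flag.strip()
            )
    return flags


def displayed_text(msgid, msgstr, comment_lines):
    """Text the end user sees: fuzzy or empty translations fall back to msgid."""
    if not msgstr or "fuzzy" in flags_of(comment_lines):
        return msgid
    return msgstr


shown_fuzzy = displayed_text("Mark with completed", "Vyřídit jako hotovou", ["#, fuzzy"])
shown_ok = displayed_text("Mark as completed", "Vyřídit jako hotovou", [])
```

Here the fuzzy entry is shown in the original language, while the checked entry is shown translated.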


Tech writer at work blog