
(gettext.info.gz) Preparing Strings

Info Catalog (gettext.info.gz) Triggering (gettext.info.gz) Sources (gettext.info.gz) Mark Keywords
 
 4.3 Preparing Translatable Strings
 ==================================
 
 Before strings can be marked for translation, they sometimes need to
 be adjusted.  Usually preparing a string for translation is done right
 before marking it, during the marking phase which is described in the
 next sections.  What you have to keep in mind while doing that is the
 following.
 
    * Decent English style.
 
    * Entire sentences.
 
    * Split at paragraphs.
 
    * Use format strings instead of string concatenation.
 
    * Avoid unusual markup and unusual control characters.
 
 Let's look at some examples of these guidelines.
 
    Translatable strings should be in good English style.  If slang
 language with abbreviations and shortcuts is used, often translators
 will not understand the message and will produce very inappropriate
 translations.
 
      "%s: is parameter\n"
 
 This is nearly untranslatable: Is the displayed item _a_ parameter or
 _the_ parameter?
 
      "No match"
 
 The ambiguity in this message makes it unintelligible: Is the program
 attempting to set something on fire? Does it mean "The given object does
 not match the template"? Does it mean "The template does not fit for any
 of the objects"?
 
    In both cases, adding more words to the message will help both the
 translator and the English speaking user.
 
    Translatable strings should be entire sentences.  It is often not
 possible to translate single verbs or adjectives in a substitutable way.
 
      printf ("File %s is %s protected", filename, rw ? "write" : "read");
 
 Most translators will not look at the source and will thus only see the
 string `"File %s is %s protected"', which is unintelligible.  Change
 this to
 
      printf (rw ? "File %s is write protected" : "File %s is read protected",
              filename);
 
 This way the translator will not only understand the message, she will
 also be able to find the appropriate grammatical construction.  A French
 translator, for example, translates "write protected" as "protected
 against writing".
 
    Entire sentences are also important because in many languages, the
 declension of some word in a sentence depends on the gender or the
 number (singular/plural) of another part of the sentence.  There are
 usually more interdependencies between words than in English.  The
 consequence is that asking a translator to translate two half-sentences
 and then combining these two half-sentences through dumb string
 concatenation will not work, for many languages, even though it would
 work for English.  That's why translators need to handle entire
 sentences.
 
    Often sentences don't fit into a single line.  If a sentence is
 output using two subsequent `printf' statements, like this
 
      printf ("Locale charset \"%s\" is different from\n", lcharset);
      printf ("input file charset \"%s\".\n", fcharset);
 
 the translator would have to translate two half sentences, but nothing
 in the POT file would tell her that the two half sentences belong
 together.  It is necessary to merge the two `printf' statements so that
 the translator can handle the entire sentence at once and decide at
 which place to insert a line break in the translation (if at all):
 
      printf ("Locale charset \"%s\" is different from\n"
              "input file charset \"%s\".\n", lcharset, fcharset);
 
    You may now ask: how about two or more adjacent sentences? Like in
 this case:
 
      puts ("Apollo 13 scenario: Stack overflow handling failed.");
      puts ("On the next stack overflow we will crash!!!");
 
 Should these two statements be merged into a single one? I would recommend
 merging them if the two sentences are related to each other, because
 then it makes it easier for the translator to understand and translate
 both.  On the other hand, if one of the two messages is a stereotypic
 one, occurring in other places as well, you will do a favour to the
 translator by not merging the two.  (Identical messages occurring in
 several places are combined by xgettext, so the translator has to
 handle them once only.)
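
    As a sketch of the first recommendation, the two related sentences
 above could be returned as one translatable message.  The helper name
 `stack_overflow_warning' is hypothetical, and the example assumes that
 `<libintl.h>' is available, as it is on GNU systems.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: returns both related sentences as ONE translatable
   message, so that they appear together as a single entry in the POT file.
   Adjacent string literals are joined by the compiler (and by xgettext).  */
const char *
stack_overflow_warning (void)
{
  return gettext ("Apollo 13 scenario: Stack overflow handling failed.\n"
                  "On the next stack overflow we will crash!!!");
}
```

 A caller would then write `puts (stack_overflow_warning ());'.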
 
    Translatable strings should be limited to one paragraph; don't let a
 single message be longer than ten lines.  The reason is that when the
 translatable string changes, the translator is faced with the task of
 updating the entire translated string.  Maybe only a single word will
 have changed in the English string, but the translator doesn't see that
 (with the current translation tools), therefore she has to proofread
 the entire message.
 
    Many GNU programs have a `--help' output that extends over several
 screen pages.  It is a courtesy towards the translators to split such a
 message into several ones of five to ten lines each.  While doing that,
 you can also attempt to split the documented options into groups, such
 as the input options, the output options, and the informative output
 options.  This will help every user to find the option he is looking
 for.
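
    The splitting described above might look like the following sketch.
 The program name `mytool', its options, and the function `print_help'
 are all invented for illustration; only the grouping into short,
 separately translatable messages is the point.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical help printer for an imaginary program `mytool'.  Each option
   group is a separate translatable message of a few lines, so a change to
   one group leaves the other translations untouched.
   Returns 0 on success, -1 if writing to OUT failed.  */
int
print_help (FILE *out)
{
  fputs (gettext ("Usage: mytool [OPTION]... [FILE]...\n"), out);
  fputs ("\n", out);
  fputs (gettext ("Input options:\n"
                  "  -i, --input=FILE     read input from FILE\n"
                  "  -D, --directory=DIR  search for input files in DIR\n"),
         out);
  fputs ("\n", out);
  fputs (gettext ("Output options:\n"
                  "  -o, --output=FILE    write output to FILE\n"),
         out);
  fputs ("\n", out);
  fputs (gettext ("Informative output:\n"
                  "  -h, --help           display this help and exit\n"
                  "  -V, --version        output version information and exit\n"),
         out);
  return ferror (out) ? -1 : 0;
}
```

 The blank separator lines are written outside of `gettext', since they
 contain nothing to translate.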
 
    Hardcoded string concatenation is sometimes used to construct English
 strings:
 
      strcpy (s, "Replace ");
      strcat (s, object1);
      strcat (s, " with ");
      strcat (s, object2);
      strcat (s, "?");
 
 In order to present to the translator only entire sentences, and also
 because in some languages the translator might want to swap the order
 of `object1' and `object2', it is necessary to change this to use a
 format string:
 
      sprintf (s, "Replace %s with %s?", object1, object2);
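
    Swapping the two objects is possible because a translated format
 string may use numbered directives such as `%1$s' and `%2$s'.  These
 are a POSIX extension to ISO C `printf', so the sketch below assumes a
 POSIX platform.  The helper `format_replace_prompt' is hypothetical,
 and its FMT argument stands in for the string that `gettext' would
 return at run time.

```c
#include <stdio.h>

/* Hypothetical helper: FMT stands in for the result of
   gettext ("Replace %s with %s?").  A translation may reorder the
   arguments with POSIX numbered directives (%1$s, %2$s).
   Returns the length of the formatted prompt, as snprintf does.  */
int
format_replace_prompt (char *buf, size_t size, const char *fmt,
                       const char *object1, const char *object2)
{
  return snprintf (buf, size, fmt, object1, object2);
}
```

 Called with the original format and the objects "foo" and "bar", this
 yields "Replace foo with bar?".  Called with a reordered format such
 as "Use %2$s in place of %1$s?" (an English stand-in for a real
 translation), the same arguments yield "Use bar in place of foo?".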
 
    A similar case is compile time concatenation of strings.  The ISO C99
 include file `<inttypes.h>' contains a macro `PRId64' that can be
 used as a formatting directive for outputting an `int64_t' integer
 through `printf'.  It expands to a constant string, usually "d" or "ld"
 or "lld" or something like this, depending on the platform.  Assume you
 have code like
 
      printf ("The amount is %0" PRId64 "\n", number);
 
 The `gettext' tools and library have special support for these
 `<inttypes.h>' macros.  You can therefore simply write
 
      printf (gettext ("The amount is %0" PRId64 "\n"), number);
 
 The PO file will contain the string "The amount is %0<PRId64>\n".  The
 translators will provide a translation containing "%0<PRId64>" as well,
 and at runtime the `gettext' function's result will contain the
 appropriate constant string, "d" or "ld" or "lld".
 
    This works only for the predefined `<inttypes.h>' macros.  If you
 have defined your own similar macros, let's say `MYPRId64', that are
 not known to `xgettext', the solution for this problem is to change the
 code like this:
 
      char buf1[100];
      sprintf (buf1, "%0" MYPRId64, number);
      printf (gettext ("The amount is %s\n"), buf1);
 
    This means you put the platform dependent code in one statement,
 and the internationalization code in a different statement.  Note that
 a buffer length of 100 is safe, because all available hardware integer
 types are limited to 128 bits, and to print a 128-bit integer one needs
 at most 54 characters, regardless of whether it is printed in decimal,
 octal or hexadecimal.
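
    The bound can be checked by hand: a 128-bit value needs at most 43
 octal digits (128/3, rounded up), 39 decimal digits plus a sign, or 32
 hexadecimal digits, so 100 bytes leave ample room for the terminating
 null byte.  The following sketch shows the same two-step approach with
 `snprintf', which enforces the bound instead of relying on it.  The
 function `format_amount' is hypothetical, and `MYPRId64' is defined
 here in terms of `PRId64' only to make the example compile.

```c
#include <inttypes.h>
#include <libintl.h>
#include <stdio.h>

/* Stand-in for a project-specific macro that xgettext does not know.  */
#define MYPRId64 PRId64

/* Hypothetical helper: formats NUMBER into OUT.  The platform dependent
   directive stays in the first call; only the string in the second call
   is translatable.  Returns the length of the message, as snprintf does.  */
int
format_amount (char *out, size_t size, int64_t number)
{
  char buf1[100];

  snprintf (buf1, sizeof buf1, "%" MYPRId64, number);
  return snprintf (out, size, gettext ("The amount is %s\n"), buf1);
}
```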
 
    All this applies to other programming languages as well.  For
 example, in Java and C#, string concatenation is very frequently used,
 because it is a compiler built-in operator.  As in C, in Java you
 would change
 
      System.out.println("Replace "+object1+" with "+object2+"?");
 
 into a statement involving a format string:
 
      System.out.println(
          MessageFormat.format("Replace {0} with {1}?",
                               new Object[] { object1, object2 }));
 
 Similarly, in C#, you would change
 
      Console.WriteLine("Replace "+object1+" with "+object2+"?");
 
 into a statement involving a format string:
 
      Console.WriteLine(
          String.Format("Replace {0} with {1}?", object1, object2));
 
    Unusual markup or control characters should not be used in
 translatable strings.  Translators will likely not understand the
 particular meaning of the markup or control characters.
 
    For example, if you have a convention that `|' delimits the
 left-hand and right-hand part of some GUI elements, translators will
 often not understand it without specific comments.  It might be better
 to have the translator translate the left-hand and right-hand part
 separately.
 
    Another example is the `argp' convention to use a single `\v'
 (vertical tab) control character to delimit two sections inside a
 string.  This is flawed.  Some translators may convert it to a simple
 newline, some to blank lines.  With some PO file editors it may not be
 easy to even enter a vertical tab control character.  So, you cannot be
 sure that the translation will contain a `\v' character, at the
 corresponding position.  The solution is, again, to let the translator
 translate two separate strings and combine at run-time the two
 translated strings with the `\v' required by the convention.
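
    A minimal sketch of this run-time combination follows.  The helper
 `build_argp_doc' and both message texts are invented for illustration.
 In a real program the resulting buffer would have to be filled in
 before it is handed to `argp' as the documentation string.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: builds an argp-style documentation string from two
   separately translated parts, inserting the '\v' delimiter at run time so
   that no translation ever has to contain it.
   Returns the length of the combined string, as snprintf does.  */
int
build_argp_doc (char *doc, size_t size)
{
  return snprintf (doc, size, "%s\v%s",
                   gettext ("Frobnicate the input files."),
                   gettext ("Report bugs to the maintainers."));
}
```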
 
    HTML markup, however, is common enough that it's probably OK to use
 in translatable strings.  But please bear in mind that the GNU gettext
 tools don't verify that the translations are well-formed HTML.
 