LaTeX/Internationalization
LaTeX has to be configured and used appropriately when it is used to write documents in languages other than English. This has to address three main areas:
- LaTeX needs to know how to hyphenate the language(s) to be used.
- The user needs to use language-specific typographic rules. In French for example, there is a mandatory space before each colon character (:).
- The input of special characters must be handled, especially for languages that rely on an input system (Arabic, Chinese, Japanese, Korean).
It is convenient to be able to insert language-specific special characters directly from the keyboard instead of using cumbersome coding (for example, by typing ä instead of \"{a}). This can be done by configuring input encoding properly. We will not tackle this issue here: see the Special Characters chapter.
Some languages require special fonts with the proper font encoding set. See Font encoding.
Some of the methods described in this chapter may be useful when dealing with non-English author names in bibliographies.
Here is a collection of suggestions about writing a LaTeX document in a language other than English. If you have experience in a language not listed below, please add some notes about it.
Prerequisites
Most non-English languages need special characters very often. To write them conveniently you will need to set the input encoding and the font encoding properly.
The following configuration works well for many languages (most languages written in the Latin script). Make sure your document is saved using the UTF-8 encoding.
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
For more details check Font encoding and Special Characters.
Babel
The babel package by Johannes Braams and Javier Bezos will take care of everything (with XeTeX and LuaTeX you should consider polyglossia). You can load it in your preamble, passing the name of the language you want to use as an option (usually its English name, but not always):
\usepackage[language]{babel}
You should place it soon after the \documentclass command, so that all the other packages you load afterwards will know the language you are using. Babel will automatically activate the appropriate hyphenation rules for the language you choose. If your LaTeX format does not support hyphenation in the language of your choice, babel will still work but will disable hyphenation, which has quite a negative effect on the appearance of the typeset document. Babel also specifies new commands for some languages, which simplify the input of special characters. See the sections about languages below for more information.
If you call babel with multiple languages:
\usepackage[languageA,languageB]{babel}
then the last language in the option list will be active (i.e. languageB), and you can use the command
\selectlanguage{languageA}
to change the active language. You can also add short pieces of text in another language using the command
\foreignlanguage{languageB}{Text in another language}
Babel also offers various environments for entering larger pieces of text in another language:
\begin{otherlanguage}{languageB}
Text in language B. This environment switches all language-related definitions, like the language specific names for figures, tables etc. to the other language.
\end{otherlanguage}
The starred version of this environment typesets the main text according to the rules of the other language, but keeps the language specific string for ancillary things like figures, in the main language of the document. The environment hyphenrules switches only the hyphenation patterns used; it can also be used to disallow hyphenation by using the language name 'nohyphenation' (but note selectlanguage* is preferred).
The babel manual provides much more information on these and many other options.
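The commands described above can be combined in one small document. The following is only a sketch: the choice of ngerman and english and the sample sentences are assumptions made for illustration.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
% The last option (english) becomes the active language at the start.
\usepackage[ngerman,english]{babel}

\begin{document}
This paragraph uses English hyphenation.
A short insertion: \foreignlanguage{ngerman}{Donaudampfschifffahrt}.

% A longer passage; captions and dates also switch inside it.
\begin{otherlanguage}{ngerman}
Dieser Absatz wird nach deutschen Regeln gesetzt.
\end{otherlanguage}

% From here on German stays active.
\selectlanguage{ngerman}
Ab hier ist Deutsch die aktive Sprache.
\end{document}
```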
Multilingual versions
It is possible in LaTeX to typeset the content of one document in several languages and to choose at compilation time which language to output. This can be convenient for keeping sectioning and formatting consistent across the different languages. It is also useful if you use many proper nouns and other untranslated content. Using the commands above in multilingual documents can be cumbersome, so babel provides a way to define shorter names: the \babeltags command maps a short tag to a language and defines a matching text command and environment.
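The original listing for the shorter names is not preserved here. A minimal sketch, assuming the mechanism meant is babel's \babeltags and using the tag name fr purely for illustration, might look like this:

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[french,english]{babel}
% Define the short tag "fr" for French (tag name chosen for this example).
\babeltags{fr = french}

\begin{document}
The word \textfr{bonjour} is typeset with French rules.

% The environment form is the counterpart for longer passages.
\begin{fr}
Un paragraphe entier en fran\c{c}ais.
\end{fr}
\end{document}
```

Here \textfr behaves like \foreignlanguage{french} and the fr environment like otherlanguage*, according to the babel manual.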
Alternative choice using iflang
The current language can also be tested by using the iflang package by Heiko Oberdiek (the built-in feature from the babel package is not reliable). Here is a simple example:
\IfLanguageName{ngerman}{Hallo}{Hello}
This makes it easy to distinguish between two languages without defining your own commands. The babel language is changed by setting
\selectlanguage{english}
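Put together, the test and the language switch could be used as follows. This is a sketch; \greeting is a hypothetical helper introduced only for this example.

```latex
\documentclass{article}
\usepackage[ngerman,english]{babel}
\usepackage{iflang}

% \greeting is a hypothetical helper, not part of iflang or babel.
\newcommand{\greeting}{\IfLanguageName{ngerman}{Hallo}{Hello}}

\begin{document}
\greeting, world.   % English is the active language here

\selectlanguage{ngerman}
\greeting, Welt.    % German is the active language here
\end{document}
```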
Specific languages
Arabic script
For languages which use the Arabic script, including Arabic, Persian, Urdu, Pashto, Kurdish, Uyghur, etc., load the ArabTeX package (arabtex) in your preamble.
You can input text in either romanized characters or native Arabic script encodings, using the commands and environments that ArabTeX provides.
See the ArabTeX Wikipedia article for further details.
You may also use the Arabi package within Babel to typeset Arabic and Persian.
You may also copy and paste from PDF files produced with Arabi thanks to the support of the cmap package. You may use Arabi with LyX, or with tex4ht to produce HTML.
Armenian
The Armenian script uses its own characters, so you will need a text editor that supports Unicode and lets you enter UTF-8 text, such as Texmaker or WinEdt. The editor should then be configured to compile using XeLaTeX.
Once the text editor is set up to compile with XeLaTeX, the fontspec package can be used to write in Armenian, with a main font such as DejaVu Serif or Sylfaen.
The Sylfaen font lacks italic and bold, but DejaVu Serif supports them.
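A minimal sketch of such a document is shown below. It assumes that fontspec is the package in question, that the DejaVu Serif font is installed on the system, and that the file is compiled with XeLaTeX; the sample text is illustrative.

```latex
% Compile with XeLaTeX.
\documentclass{article}
\usepackage{fontspec}
% DejaVu Serif provides Armenian glyphs together with bold and italic.
\setmainfont{DejaVu Serif}

\begin{document}
Հայերեն տեքստ
\end{document}
```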
See Armenian Wikibooks for further details, especially on how to configure the Unicode supporting text editors to compile with XeLaTeX.
Cyrillic script
Version 3.7h of babel includes support for the T2* encodings and for typesetting Bulgarian, Russian and Ukrainian texts using Cyrillic letters[1]. Support for Cyrillic is based on standard LaTeX mechanisms plus the fontenc and inputenc packages. AMS-LaTeX packages should be loaded before fontenc and babel. If you are going to use Cyrillic letters in math mode, you also need to load the mathtext package before fontenc.
Generally, babel will automatically choose the default font encoding; for the above three languages this is T2A. However, documents are not restricted to a single font encoding. For multilingual documents using Cyrillic and Latin-based languages it makes sense to include the Latin font encoding explicitly. Babel will take care of switching to the appropriate font encoding when a different language is selected within the document.
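A preamble along these lines could look as follows. This is a sketch, assuming UTF-8 input and installed Cyrillic fonts for T2A; the choice of russian and english is illustrative.

```latex
\documentclass{article}
% T2A for Cyrillic, T1 for Latin; the last one listed is the default.
\usepackage[T2A,T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[russian,english]{babel}

\begin{document}
% babel switches to T2A inside the Russian fragment.
English text, then \foreignlanguage{russian}{русский текст}.
\end{document}
```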
On modern operating systems it is beneficial to use Unicode (utf8 or utf8x) instead of KOI8-RU (koi8-ru) as an input encoding for Cyrillic text.
In addition to enabling hyphenation, translating automatically generated text strings, and activating some language-specific typographic rules (like \frenchspacing), babel provides some commands for typesetting according to the standards of the Bulgarian, Russian, or Ukrainian languages.
For all three languages, language-specific punctuation is provided: the Cyrillic dash for the text (it is a little narrower than the Latin dash and surrounded by tiny spaces), a dash for direct speech, quotes, and commands to facilitate hyphenation:
Key combination | Action |
---|---|
"| | No ligature at this position. |
"- | Explicit hyphen sign, allowing hyphenation in the rest of the word. |
"--- | Cyrillic emdash in plain text. |
"--~ | Cyrillic emdash in compound names (surnames). |
"--* | Cyrillic emdash for denoting direct speech. |
"" | Similar to "-, but it produces no hyphen sign (used for compound words with hyphen, e.g. x-""y or some other signs as “disable/enable”). |
"~ | Compound word mark without a breakpoint. |
"= | Compound word mark with a breakpoint, allowing hyphenation in the composing words. |
", | Thinspace for initials with a breakpoint in a following surname. |
"` | German opening double quote („). |
"' | German closing double quote (“). |
"< | French opening double quote («). |
"> | French closing double quote (»). |
The Russian and Ukrainian options of babel define the commands \Asbuk and \asbuk, which act like \Alph and \alph (commands for turning counters into letters, e.g. a, b, c...), but produce capital and small letters of the Russian or Ukrainian alphabets (whichever is the active language of the document).
The Bulgarian option of babel provides the commands \enumBul and \enumLat (\enumEng), which make \Alph and \alph produce letters of either the Bulgarian or the Latin (English) alphabet. The default behaviour of \Alph and \alph for the Bulgarian language option is to produce letters from the Bulgarian alphabet.
See the Bulgarian translation of "The Not So Short Introduction to LaTeX" [2] for a method to type Cyrillic letters directly from the keyboard using a different distribution.
Chinese
One way to support Chinese is the CJK package collection. If you are using a package manager or a portage tree, the CJK collection is usually in a separate package because of its size (mainly due to fonts).
Make sure your document is saved using the UTF-8 character encoding. See Special Characters for more details. Put the parts where you want to write Chinese characters in a CJK environment.
The last argument specifies the font. It must fit the desired language, since fonts are different for Chinese, Japanese and Korean. Possible choices for Chinese include:
- gbsn (简体宋体, simplified Chinese)
- gkai (简体楷体, simplified Chinese)
- bsmi (繁体细上海宋体, traditional Chinese)
- bkai (繁体标楷体, traditional Chinese)
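A short sketch of the environment is given below, assuming the CJKutf8 package from the CJK collection and the gbsn font from the list above are installed; the sample text is illustrative.

```latex
\documentclass{article}
\usepackage{CJKutf8}

\begin{document}
% Arguments: input encoding (UTF8) and font family (gbsn).
\begin{CJK}{UTF8}{gbsn}
你好，世界。
\end{CJK}
\end{document}
```

For traditional Chinese, one of the bsmi or bkai fonts would take the place of gbsn.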
Czech
Czech works with the czech option of babel.
UTF-8 allows you to have „Czech quotation marks“ directly in your text. Otherwise, there are the macros \clqq and \crqq to produce the left and right quote. You can also place quoted text inside \uv.
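As a minimal sketch, assuming UTF-8 input and the czech option of babel (the sample sentence is illustrative):

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[czech]{babel}

\begin{document}
% \uv wraps its argument in Czech quotation marks.
\uv{Text v uvozovkách} a běžný text.
\end{document}
```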
Finnish
Finnish language hyphenation is enabled by loading babel with the finnish option. This will also automatically change the document language (section names, etc.) to Finnish.
French
You can load French language support by passing a French option to babel.
There are multiple options for typesetting French documents, depending on the flavor of French: french, frenchb, and francais for Parisian French, and acadian and canadien for new-world French. If you do not know or do not really care, we would recommend using frenchb.
All enable French hyphenation, if you have configured your LaTeX system accordingly. All of these also change all automatic text into French: \chapter prints Chapitre, \today prints the current date in French and so on. A set of new commands also becomes available, which allows you to write French input files more easily. Check out the following table for inspiration:
input code | rendered output |
---|---|
\og guillemets \fg{} | « guillemets » |
M\up{me}, D\up{r} | Mme, Dr |
1\ier{}, 1\iere{}, 1\ieres{} | 1er, 1re, 1res |
2\ieme{} 4\iemes{} | 2e 4es |
\No 1, \no 2 | N° 1, n° 2 |
20~\degres C, 45\degres | 20 °C, 45° |
M. \bsc{Durand} | M. Durand |
\nombre{1234,56789} | 1 234,567 89 |
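Some of the commands from the table can be tried in a small document. This is a sketch that uses the french option of babel; the sentences are illustrative.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[french]{babel}

\begin{document}
% Guillemets, a superscript abbreviation and an ordinal.
\og Bonjour \fg{} dit M\up{me} Dupont au 1\ier{} rang.

% Another ordinal, a numero sign and a surname in small caps.
Le 2\ieme{} essai, \No 3, par M.~\bsc{Durand}.
\end{document}
```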
You may want to typeset guillemets and other French characters directly if your keyboard has them. When running Xorg (*BSD and GNU/Linux), you may want to use the oss variant, which features some nice shortcuts, like
Key combination | Character |
---|---|
Alt Gr + w | « |
Alt Gr + x | » |
Alt Gr + Shift + é | É |
Alt Gr + Shift + è | È |
Alt Gr + Shift + ç | Ç |
You will need the T1 font encoding for guillemets to print properly.
For the degree character you will get an error like
! Package inputenc Error: Unicode char \u8:° not set up for use with LaTeX.
The textcomp package will fix it for you.
The great advantage of Babel for French is that it will handle some elements of French typography for you, especially non-breaking spaces before all two-part punctuation marks, so you can type these marks without inserting the spaces by hand.
The non-breaking space before the euro symbol is still necessary because currency symbols and other units are not supported in general (that is not specific to French).
You can use the numprint package along with Babel. It will let you print numbers the French way.
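A minimal sketch of number formatting is given below, assuming the package meant here is numprint; the autolanguage option asks it to follow the active babel language, and the number is illustrative.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[french]{babel}
% autolanguage makes numprint follow the active babel language.
\usepackage[autolanguage]{numprint}

\begin{document}
% The number is typed with a decimal dot and formatted by numprint.
La distance est de \numprint{1234567.89} m.
\end{document}
```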
You will also notice that the layout of lists changes when switching to the French language. This is customizable using the \frenchbsetup command. For more information on what the frenchb option of babel does and how you can customize its behavior, run LaTeX on the file frenchb.dtx and read the produced file frenchb.pdf or frenchb.dvi. You can get the PDF version on CTAN.
German
You can load German language support with one of two babel options: german for traditional ("old") German orthography, or ngerman for reformed ("new") German orthography.
This enables German hyphenation, if you have configured your LaTeX system accordingly. It also changes all automatic text into German, e.g. “Chapter” becomes “Kapitel”. A set of new commands also becomes available, which allows you to write German input files more quickly even when you don't use the inputenc package. Check out the table below for inspiration. With inputenc, all this becomes moot, but your text is then also locked into a particular encoding.
input code | rendered output |
---|---|
"A "O "U | Ä Ö Ü |
"a "o "u "s | ä ö ü ß |
"` or \glqq | „ |
"' or \grqq | “ |
\glq \grq | ‚ ‘ |
"< or \flqq | « |
"> or \frqq | » |
\flq \frq | ‹ › |
\dq | " |
In German books you sometimes find French quotation marks («guillemets»). German typesetters, however, use them differently. A quote in a German book would look like »this«. In the German-speaking part of Switzerland, typesetters use «guillemets» the same way the French do. A major problem arises from the use of commands like \flq: if you use the OT1 font encoding (which is the default), the guillemets will look like the math symbol "≪", which turns a typesetter's stomach. T1 encoded fonts, on the other hand, do contain the required symbols. So if you are using this type of quote, make sure you use the T1 encoding.
Decimal numbers usually have to be written like 0{,}5 (not just 0,5). Packages like ziffer enable input like 0,5.
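The shorthands, the guillemets and the decimal comma can be seen together in a short sketch. It assumes the ngerman option and the T1 encoding; the sentences are illustrative.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[ngerman]{babel}

\begin{document}
% Shorthands for quotes, umlauts and sharp s.
"`Sch"one Gr"u"se"' sagte er.

% Guillemets in the German style; they rely on T1.
\frqq Zitat\flqq{} im Buch.

% Braces suppress the extra space after the comma in math mode.
$\pi \approx 3{,}14$
\end{document}
```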
Greek
This is the preamble you need to write in the Greek language. Note the particular input encoding.
This preamble enables hyphenation and changes all automatic text to Greek. A set of new commands also becomes available, which allows you to write Greek input files more easily. In order to temporarily switch to English and vice versa, one can use the commands \textlatin{english text} and \textgreek{greek text}, which both take one argument that is then typeset using the requested font encoding. Otherwise you can use the command \selectlanguage{...} described in a previous section. Use \euro for the Euro symbol.
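The original preamble is not reproduced here. The following sketch is a substitute under stated assumptions: it uses UTF-8 input rather than the legacy Greek input encoding the text alludes to, and it relies on a recent TeX distribution in which babel's greek option accepts UTF-8 Greek text.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
% The last option (greek) is the main language of the document.
\usepackage[english,greek]{babel}

\begin{document}
Ελληνικό κείμενο, \textlatin{English text}, και πάλι ελληνικά.

\selectlanguage{english}
English again, with \textgreek{λίγα ελληνικά}.
\end{document}
```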
Hungarian
Hungarian is supported through the magyar option of babel.
More information is available in Hungarian.
Icelandic and Faroese
The following lines can be added to write Icelandic text:
This changes text like Part into Hluti. It makes additional commands available:
"` or \glqq | „ |
\grqq | “ |
\TH | Þ |
\th | þ |
\DH | Ð |
\dh | ð |
To make special characters such as Þ and Æ available, select the T1 font encoding.
The default LaTeX font encoding is OT1, which contains only 128 characters. The T1 encoding contains letters and punctuation characters for most of the European languages using Latin script.
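A compact sketch combining the language option and the font encoding is given below. It assumes the icelandic option of babel and the T1 encoding; the sentence is illustrative.

```latex
\documentclass{article}
% T1 provides the glyphs behind \TH, \th, \DH and \dh.
\usepackage[T1]{fontenc}
\usepackage[icelandic]{babel}

\begin{document}
\TH{}etta er \glqq texti\grqq{} me\dh{} \th{} og \DH.
\end{document}
```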
Italian
Italian is well supported by LaTeX. Just load babel with the italian option at the beginning of your document and the output of all the commands will be translated properly.
Japanese
There is a variant of TeX intended for Japanese named pTeX, which supports vertical typesetting.
Another way to write in Japanese is to use LuaLaTeX with the luatexja package; its documentation includes usage examples.
You can also use the capabilities of the fontspec and luatexja-fontspec packages to declare the font you want to use in your paper. Use UTF-8 as your encoding. If you do not know how to do this, take a look at Texmaker, a LaTeX editor which uses UTF-8 by default.
Another (but older) way to support Japanese is the CJK package collection. If you are using a package manager or a portage tree, the CJK collection is usually in a separate package because of its size (mainly due to fonts).
Make sure your document is saved using the UTF-8 character encoding. See Special Characters for more details. Put the parts where you want to write Japanese characters in a CJK environment. The last argument specifies the font. It must fit the desired language, since fonts are different for Chinese, Japanese and Korean. min is an example for Japanese.
Korean
The two most widely used encodings for Korean text files are EUC-KR and its upward compatible extension used in Korean MS-Windows, CP949/Windows-949/UHC. In these encodings each US-ASCII character represents its normal ASCII character, similar to other ASCII compatible encodings such as ISO-8859-x, EUC-JP, Big5, or Shift_JIS. On the other hand, Hangul syllables, Hanjas (Chinese characters as used in Korea), Hangul Jamos, Hiraganas, Katakanas, Greek and Cyrillic characters and other symbols and letters drawn from KS X 1001 are represented by two consecutive octets. The first has its MSB set. Until the mid-1990s, it took a considerable amount of time and effort to set up a Korean-capable environment under a non-localized (non-Korean) operating system. You can skim through the now much-outdated http://jshin.net/faq to get a glimpse of what it was like to use Korean under a non-Korean OS in the mid-1990s.
TeX and LaTeX were originally written for scripts with no more than 256 characters in their alphabet. To make them work for languages with considerably more characters such as Korean or Chinese, a subfont mechanism was developed. It divides a single CJK font with thousands or tens of thousands of glyphs into a set of subfonts with 256 glyphs each.
For Korean, there are three widely used packages.
- HLaTeX by UN Koaunghi
- hLaTeXp by CHA Jaechoon
- the CJK package by Werner Lemberg
HLaTeX and hLaTeXp are specific to Korean and provide Korean localization on top of the font support. They both can process Korean input text files encoded in EUC-KR. HLaTeX can even process input files encoded in CP949/Windows-949/UHC and UTF-8 when used along with Λ, Ω.
The CJK package is not specific to Korean. It can process input files in UTF-8 as well as in various CJK encodings including EUC-KR and CP949/Windows-949/UHC, and it can be used to typeset documents with multilingual content (especially Chinese, Japanese and Korean). The CJK package has no Korean localization such as the one offered by HLaTeX and it does not come with as many special Korean fonts as HLaTeX.
The ultimate purpose of using typesetting programs like TeX and LaTeX is to get documents typeset in an aesthetically satisfying way. Arguably the most important element in typesetting is a set of well-designed fonts. The HLaTeX distribution includes UHC PostScript fonts of 10 different families and Munhwabu fonts (TrueType) of 5 different families. The CJK package works with a set of fonts used by earlier versions of HLaTeX and it can use Bitstream's Cyberbit TrueType font.
To use the HLaTeX package for typesetting your Korean text, put the declaration \usepackage{hangul} into the preamble of your document. This command turns the Korean localization on. The headings of chapters, sections, subsections, table of contents and table of figures are all translated into Korean and the formatting of the document is changed to follow Korean conventions. The package also provides automatic particle selection. In Korean, there are pairs of post-fix particles that are grammatically equivalent but different in form. Which of any given pair is correct depends on whether the preceding syllable ends with a vowel or a consonant. (It is a bit more complex than this, but this should give you a good picture.) Native Korean speakers have no problem picking the right particle, but it cannot be determined in advance which particle to use for references and other automatic text that will change while you edit the document. It takes a painstaking effort to place appropriate particles manually every time you add or remove references or simply shuffle parts of your document around. HLaTeX relieves its users of this boring and error-prone process.
In case you don't need Korean localization features but just want to typeset Korean text, you can put the line \usepackage{hfont} in the preamble instead. For more details on typesetting Korean with HLaTeX, refer to the HLaTeX Guide. Check out the web site of the Korean TeX User Group (KTUG).
Persian script
For the Persian language, there is a dedicated package called XePersian, which uses XeLaTeX as the typesetting engine; load it in your preamble.
Moreover, Arabic script can be used to type Persian as illustrated in the corresponding section.
Polish
If you plan to use Polish in your UTF-8 encoded document, load inputenc with the utf8 option and babel with the polish option.
This setup merely allows you to use Polish letters and translates the automatic text to Polish, so that "chapter" becomes "rozdział". There are a few additional things to remember.
Connectives
Polish has many single-letter connectives: "a", "o", "w", "i", "u", "z", etc. Grammar and typography rules don't allow them to end a printed line. To ensure that LaTeX won't set them as the last letter in a line, you have to use a non-breakable space (~):
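For illustration, a sentence with tied connectives might be typed as follows. This is a sketch, assuming UTF-8 input and the polish option of babel; the sentence itself is an example.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[polish]{babel}

\begin{document}
% Each ~ ties a single-letter connective to the following word.
Spotkali się w~domu i~rozmawiali o~pogodzie z~sąsiadem.
\end{document}
```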
Numerals
According to Polish grammar rules, you have to put dots after numerals in chapter, section, subsection, etc. headers.
This is achieved by redefining a few LaTeX macros, starting from \thechapter for books and from \thesection for articles.
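The original redefinitions are not preserved here. One way to obtain the trailing dots in the article class, offered as a sketch rather than as the original listing, is to redefine the counter representations:

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[polish]{babel}

% Append a dot to each level of the section number (article class;
% a book would start from \thechapter instead).
\renewcommand{\thesection}{\arabic{section}.}
\renewcommand{\thesubsection}{\thesection\arabic{subsection}.}
\renewcommand{\thesubsubsection}{\thesubsection\arabic{subsubsection}.}

\begin{document}
\section{Wstęp}
\subsection{Cel pracy}
\end{document}
```

Note that with this approach cross-references produced by \ref carry the trailing dot as well.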
Alternatively you can use dedicated document classes:
- the mwart class instead of article,
- mwbk instead of book
- and mwrep instead of report.
Those classes have much more European typography settings but do not require the use of Polish babel settings or character encoding.
Simple usage consists of naming one of these classes in \documentclass.
Full documentation for those classes is available at http://web.archive.org/web/20040609034031/http://www.ci.pwr.wroc.pl/~pmazur/LaTeX/mwclsdoc.pdf (Polish).
Indentation
It may be customary (depending on the publisher) to indent the first paragraph in sections and chapters; loading the indentfirst package does this.
Hyphenation and typography
Hyphenating a word across a page break is frowned upon much more than in American typesetting.
To adjust the penalty for hyphenation spanning pages, set \brokenpenalty.
To adjust the penalties for leaving widows and orphans (clubs in TeX nomenclature), set \widowpenalty and \clubpenalty.
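These penalties are ordinary TeX integer parameters. The sketch below sets them in the preamble; the value 1000 is purely illustrative, and 10000 would forbid the corresponding break outright.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[polish]{babel}

% Illustrative values; higher numbers discourage the break more strongly.
\brokenpenalty=1000   % page break after a hyphenated line
\clubpenalty=1000     % page break after the first line of a paragraph
\widowpenalty=1000    % page break before the last line of a paragraph

\begin{document}
Tekst dokumentu.
\end{document}
```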
Commas in math
According to Polish typography rules, fractional parts of numbers should be delimited by a comma, not a dot. To make LaTeX not insert additional space in math mode after a comma (unless there is a space after the comma), use the icomma package.
Unfortunately, it is partially incompatible with the dcolumn package. One needs to either use dots in columns with numerical data in the source file and make dcolumn switch them to commas for display, or define the column type accordingly.
The alternative is to use the numprint package, but it is much less convenient.
Further information
Refer to the Słownik Ortograficzny (in Polish) for additional information on Polish grammar and typography rules.
A good extract is available at Zasady Typograficzne Składania Tekstu (in Polish).
Portuguese
Load babel with the portuguese option in your preamble.
For Brazilian Portuguese, choose the brazilian option instead.
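As a minimal sketch, assuming UTF-8 input (the section title and text are illustrative):

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[portuguese]{babel}  % or: brazilian

\begin{document}
\tableofcontents  % the heading is printed in Portuguese
\section{Introdução}
Texto em português.
\end{document}
```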
Slovak
Basic settings are fine when left the same as for Czech, but Slovak needs special signs for 'ď', 'ť', 'ľ'. To be able to type them from the keyboard, adjust the babel and encoding settings accordingly.
Spanish
Include the spanish option when loading Babel.
The trick is that Spanish has several options and commands to control the layout. The options may be loaded either at the call to Babel, or before, by defining the command \spanishoptions. Therefore, passing an option in the call to Babel and defining \spanishoptions before loading it are roughly equivalent.
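The two forms might look as follows. This is a sketch in which mexico serves only as an example option; the second form is shown commented out because only one of the two should be used.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}

% First form: pass the option in the call to babel.
\usepackage[spanish,mexico]{babel}

% Second form (use instead of the line above):
% \def\spanishoptions{mexico}
% \usepackage[spanish]{babel}

\begin{document}
Texto en espa\~nol.
\end{document}
```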
On average, the former syntax should be preferred, as the latter is a deviation from standard Babel behavior, and thus may break other programs (LyX, latex2rtf) interacting with LaTeX.
Spanish also defines shorthands for the dot and << >> so that they are used as logical markup: the former is used as decimal marker in math mode, and the output is typically either a comma or a dot; the latter is used for quoted text, and the output is typically either «» or “”. This allows different typographical conventions with the same input, as preferences may be quite different from, say, Spain and Mexico.
Two particularly useful options are es-noquoting and es-nolists: some packages and classes are known to collide with Spanish in the way they handle active characters, and these options disable the internal workings of Spanish to allow you to overcome these common pitfalls. Moreover, these options may simplify the way LyX customizes some features of the Spanish layout from inside the GUI.
The options mexico and mexico-com provide support for local custom in Mexico: the former uses the decimal dot, as customary, and the latter allows the decimal comma, as required by the Mexican Official Norm (NOM) of the Department of Economy for labels in foods and goods. More localizations are in the making.
The other commands modify the Spanish layout after loading Babel. Two particularly useful commands are \spanishoperators and \spanishdeactivate.
The macro \spanishoperators{<list of operators>} contains a list of Spanish mathematical operators, and may be redefined at will. For instance, redefining it with a single operator defines only that operator, overriding all other definitions; the command \let\spanishoperators\relax disables them all. This macro supports accented or spaced operators: the \acute{<letter>} command puts an accent, and the \, command adds a small space.
For instance, the following operators are defined by default.
Finally, the macro \spanishdeactivate{<list of characters>} disables some active characters, to keep you out of trouble if they are redefined by other packages. The candidates for deactivation are the set {<>."'}. Please beware that some options preempt the availability of some active characters. In particular, you should not combine the es-noquoting option with \spanishdeactivate{<>}, or the es-noshorthands option with \spanishdeactivate{<>."}.
Please check the documentation for Babel or spanish.dtx for further details.
Tibetan
One option to use Tibetan script in LaTeX is to load the ctib package in your preamble and use a slightly modified Wylie transliteration for input. Refer to the excellent package documentation for details. More information can be found on [1]
References
- [1] The Not So Short Introduction to LaTeX, 2.5.6 Support for Cyrillic, Maksym Polyakov
- [2] The Not So Short Introduction to LaTeX, Bulgarian translation