[tex-hyphen] Hyphenation patterns for Belarusian

Discussion:

Maksim Salau

2016-08-28 00:29:36 UTC

Hello,

I recently found hyphenation patterns in the LibreOffice extension [1]. According to discussions on this list, it is possible to use patterns from (Libre|Open)Office, since the hyphenation engine is the same (or almost the same).
The steps seem pretty straightforward:
1. Put patterns to tex/generic/hyph-utf8/patterns/tex/hyph-<lang>.tex
2. Describe the language in source/generic/hyph-utf8/languages.rb
3. Adjust source/generic/hyph-utf8/generate-pattern-loaders.rb if patterns are UTF-8 only
4. Generate the loader, etc.
5. Install and use.

My steps 1-4 can be found on GitHub [2] (last 2 commits).
I am having trouble with step 5 :(

I tried to create a deb file similar to texlive-lang-cyrillic_2014.20141024-1_all.deb:
* /etc/texmf/hyphen.d/10texlive-lang-belarusian.cnf with specification of the language
name=belarusian file=loadhyph-be.tex patterns=hyph-be.pat.txt lefthyphenmin=2 righthyphenmin=2 exceptions=
* empty /etc/texmf/fmt.d/10texlive-lang-belarusian.cnf
* patterns in /usr/share/texlive/texmf-dist/tex/generic/hyph-utf8
* '10texlive-lang-belarusian' in /var/lib/tex-common/hyphen-cnf/texlive-lang-belarusian.list
and /var/lib/tex-common/fmtutil-cnf/texlive-lang-belarusian.list

language.dat is regenerated during installation, but fmtutil-sys is not happy. It complains:

$ sudo fmtutil-sys --all
fmtutil: running `luatex -ini -jobname=luatex -progname=luatex luatex.ini' ...
This is LuaTeX, Version beta-0.79.1 (TeX Live 2015/dev/Debian) (rev 4971) (INITEX)
restricted \write18 enabled.
(/usr/share/texlive/texmf-dist/tex/plain/config/luatex.ini
(/usr/share/texlive/texmf-dist/tex/generic/config/luatexiniconfig.tex)
(/usr/share/texlive/texmf-dist/tex/generic/config/luatex-unicode-letters.tex
loading Unicode properties)
(/usr/share/texlive/texmf-dist/tex/plain/config/pdfetex.ini
(/usr/share/texlive/texmf-dist/tex/generic/config/pdftexconfig.tex
(/var/lib/texmf/tex/generic/config/pdftexconfig-paper.tex))
(/usr/share/texlive/texmf-dist/tex/luatex/hyph-utf8/etex.src
(/usr/share/texlive/texmf-dist/tex/plain/base/plain.tex
Preloading the plain format: codes, registers, parameters, fonts, more fonts,
macros, math definitions, output routines, hyphenation
(/usr/share/texlive/texmf-dist/tex/generic/hyphen/hyphen.tex
[skipping from \patterns to end-of-file...]))
(/usr/share/texlive/texmf-dist/tex/plain/etex/etexdefs.lib
Skipping module "grouptypes"; Loading module "interactionmodes";
Skipping module "nodetypes"; Skipping module "iftypes";)
(/var/lib/texmf/tex/generic/config/language.def
(/usr/share/texlive/texmf-dist/tex/generic/hyphen/hyphen.tex)
(/usr/share/texlive/texmf-dist/tex/generic/hyph-utf8/loadhyph/loadhyph-be.tex
UTF-8 Belarusian hyphenation patterns
(/usr/share/texlive/texmf-dist/tex/generic/hyph-utf8/patterns/tex/hyph-be.tex
! Conflicting pattern ignored.
l.6024 }

?
! Emergency stop.
l.6024 }

! ==> Fatal error occurred, no output PDF file produced!
Transcript written on luatex.log.

Is there any way to make it more verbose? Or debug the issue somehow?
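
One way to narrow this down outside of TeX is sketched below. It rests on an assumption that is not confirmed in this thread: that "Conflicting pattern ignored" is raised when two patterns share the same letters but carry different digits. The log only reports l.6024, the closing brace, so the offending entry has to be found some other way. The function names and the sample patterns are hypothetical.

```python
import re
from collections import defaultdict


def extract_patterns(tex_source):
    """Return the whitespace-separated entries inside \\patterns{...}."""
    # Drop TeX comments (everything from % to the end of the line).
    stripped = re.sub(r"%.*", "", tex_source)
    match = re.search(r"\\patterns\s*\{([^}]*)\}", stripped)
    return match.group(1).split() if match else []


def find_conflicts(patterns):
    """Group patterns by their letters only and keep groups that disagree."""
    groups = defaultdict(set)
    for pattern in patterns:
        # Assumption: a conflict means same letters, different digits.
        letters = re.sub(r"[0-9]", "", pattern)
        groups[letters].add(pattern)
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


# Hypothetical miniature pattern file, not taken from hyph-be.tex.
sample = r"""
% sample patterns
\patterns{
а1б
1ба
а2б
.в2
}
"""

conflicts = find_conflicts(extract_patterns(sample))
for letters, variants in conflicts.items():
    print(letters, variants)
```

To run it against the real file, replace `sample` with the contents of hyph-be.tex read as UTF-8.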

Also, please clarify the usage of quotes for me. There are 3 symbols used in hyph-be.tex: ' ` ’
I suspect this can confuse the engine, since generate-plain-patterns.rb checks only for the first one and converts it to the third one to populate hyph-quote-<lang>.tex
What is the official position on quotes? Should one use only ' and let *TeX do the rest, or are other symbols allowed too?
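
To see how much each of the three symbols is actually used, a quick count can help. This is only a sketch: the function name and the sample entries are hypothetical, and it makes no claim about which symbol the tools expect.

```python
from collections import Counter

# The three quote-like characters mentioned above.
QUOTES = {
    "'": "U+0027 APOSTROPHE",
    "`": "U+0060 GRAVE ACCENT",
    "\u2019": "U+2019 RIGHT SINGLE QUOTATION MARK",
}


def count_quote_patterns(patterns):
    """Count how many patterns contain each quote-like character."""
    counts = Counter()
    for pattern in patterns:
        for char in QUOTES:
            if char in pattern:
                counts[char] += 1
    return counts


# Hypothetical sample entries, not taken from hyph-be.tex.
sample_patterns = ["б'я", "з`е", "п\u2019е", "1ба", "в'ю"]
counts = count_quote_patterns(sample_patterns)
for char, name in QUOTES.items():
    print(f"{name}: {counts[char]}")
```

Feeding it the whitespace-separated entries of the real pattern file would show whether the second and third symbols are rare enough to normalise by hand.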

The third issue with these patterns is the T2A encoding. The U+2019 symbol (the third quote from the list above) makes conversion impossible, since the symbol is not mapped in the converter. I tried to enable it in t2a.dat and regenerate the converter, but it fails with the message: The encoding t2a uses more than two bytes to encode characters.
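
As a small illustration of that message: assuming it refers to the length of a character's UTF-8 encoding (my reading of the error text, not something confirmed here), U+2019 differs from the Cyrillic letters in exactly that respect. The helper names and the sample string below are hypothetical.

```python
def utf8_lengths(text):
    """Map each distinct character in text to its UTF-8 length in bytes."""
    return {char: len(char.encode("utf-8")) for char in sorted(set(text))}


def wider_than_two_bytes(text):
    """Return the characters whose UTF-8 encoding needs more than two bytes."""
    return [c for c, n in utf8_lengths(text).items() if n > 2]


# Hypothetical sample: Cyrillic letters, an ASCII apostrophe and U+2019.
sample = "б'я п\u2019е"
for char in wider_than_two_bytes(sample):
    print(f"U+{ord(char):04X} takes {len(char.encode('utf-8'))} bytes")
```

Cyrillic letters take two bytes in UTF-8 and the ASCII apostrophe takes one, while U+2019 takes three, so it is the only character in the sample that gets reported.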

Thanks in advance,
Maksim Salau.

[1] http://extensions.libreoffice.org/extension-center/belarusian-dictionary-spelling-hyphenation-official-orthography-2008
[2] https://github.com/msalau/hyph-utf8-belarusian/tree/belarusian

Arthur Reutenauer

2016-08-28 14:12:48 UTC