Discussion:
[tex-hyphen] hyphenation for Bulgarian language
Georgi Boshnakov
2017-10-20 08:05:20 UTC
Dear All,

This is to confirm that I support and recommend replacing the TeX hyphenation patterns for the Bulgarian language (provided by me circa 1994) with the ones prepared by Anton Zinoviev (cc-ed). The latter patterns are more complete and reflect changes in the hyphenation rules in recent years.

Kind regards,
Georgi

--
Dr Georgi Boshnakov               tel: (+44) (0)161 306 3684
School of Mathematics             fax: (+44) (0)161 306 3669
Alan Turing Building 1.125
The University of Manchester      email: ***@manchester.ac.uk
Oxford Road
Manchester M13 9PL
UK
Anton Zinoviev
2017-10-21 20:57:22 UTC
[I am sending a CC of this message to Georgi Boshnakov and Stoyan Dimitrov]

Hello everybody,

As far as I understand, in order to accept new hyphenation patterns
you need:

1. to have permission to distribute them under a free license;
2. to have them encoded in UTF-8;
3. since the patterns are generated algorithmically, to have the
script which generates them;
4. to have an analysis of the differences between the new patterns and
the existing patterns by Mr. Boshnakov;
5. to have the opinion of Mr. Boshnakov about the new patterns.

I believe we were able to satisfy all these requirements. You already
got a message from Mr. Boshnakov. At the end of this message you
will find URLs from which you can download a shell script,
`hyph-bg.sh`, with a permissive license. It can be used in
the following ways.

hyph-bg.sh --help

This will print short usage instructions.

hyph-bg.sh --doc-txt

This will generate (on the standard output) a text about the Bulgarian
hyphenation, including an analysis of the differences between the
Bulgarian hyphenation patterns by Mr. Boshnakov and the proposed new
hyphenation patterns.

If the system you use has pandoc installed, then you can also use one
of the following options to get an easier-to-read document:

hyph-bg.sh --doc-html
hyph-bg.sh --doc-latex

In order to generate Bulgarian hyphenation patterns for TeX, the
following options should be used:

hyph-bg.sh --safe-morphology --standalone-tex

Both the left and the right hyphen mins are 2.
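
To illustrate what hyphen mins of 2 mean in practice, here is a minimal
Python sketch. It is not taken from `hyph-bg.sh`; the function names, the
sample letter string, and the candidate break positions are all
hypothetical. It only shows the rule that TeX expresses through
\lefthyphenmin and \righthyphenmin: a break is kept only if at least that
many characters remain on each side of the hyphen.

```python
def apply_hyphen_mins(word, breaks, left_min=2, right_min=2):
    """Keep only break positions that leave at least left_min characters
    before the hyphen and at least right_min characters after it.

    A break at index i splits the word into word[:i] and word[i:].
    """
    return [i for i in breaks if left_min <= i <= len(word) - right_min]


def show(word, breaks):
    """Render the word with a hyphen at each of the given positions."""
    pieces = []
    previous = 0
    for i in sorted(breaks):
        pieces.append(word[previous:i])
        previous = i
    pieces.append(word[previous:])
    return "-".join(pieces)


# Hypothetical input: an arbitrary letter string with a candidate break
# between every pair of letters. These are NOT real pattern results.
word = "абвгде"
candidates = [1, 2, 3, 4, 5]

allowed = apply_hyphen_mins(word, candidates)
print(show(word, allowed))
```

With both mins set to 2, the candidates after the first letter and before
the last letter are dropped, and the remaining ones are kept.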

One important difference between the line-breaking algorithm used by
TeX and the line-breaking algorithm used by most other software is
that the algorithm of TeX is smart and can produce perfect results
even with fewer hyphenation possibilities. Because of this, with TeX
it makes sense to use hyphenation patterns which separate the words
only in the preferred positions. On the other hand, with software
using a dumb line-breaking algorithm, it is perhaps preferable to use
hyphenation patterns which provide more hyphenation possibilities.
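
The difference between the two kinds of line breaking can be sketched in
Python as follows. This is only a toy comparison under stated assumptions:
`greedy_breaks` stands in for the "dumb" first-fit approach, and
`optimal_breaks` is a much-simplified stand-in for a total-fit approach
(it minimises the sum of squared trailing space on every line but the
last). It is not TeX's actual algorithm, it ignores hyphenation
entirely, and it assumes every word fits within the line width. All names
and the sample words are hypothetical.

```python
def greedy_breaks(words, width):
    """First-fit: add each word to the current line if it still fits."""
    lines, current = [], []
    for w in words:
        if current and len(" ".join(current + [w])) > width:
            lines.append(current)
            current = [w]
        else:
            current.append(w)
    if current:
        lines.append(current)
    return lines


def optimal_breaks(words, width):
    """Choose the breaks that minimise the total squared trailing space
    over all lines except the last, looking at the whole paragraph."""
    n = len(words)
    best = [float("inf")] * (n + 1)  # best[i]: minimal cost for words[i:]
    nxt = [n] * (n + 1)              # nxt[i]: where the line from i ends
    best[n] = 0
    for i in range(n - 1, -1, -1):
        length = -1
        for j in range(i + 1, n + 1):
            length += len(words[j - 1]) + 1
            if length > width:
                break
            cost = 0 if j == n else (width - length) ** 2
            if cost + best[j] < best[i]:
                best[i] = cost + best[j]
                nxt[i] = j
    lines, i = [], 0
    while i < n:
        lines.append(words[i:nxt[i]])
        i = nxt[i]
    return lines


def raggedness(lines, width):
    """Sum of squared trailing space on every line but the last."""
    return sum((width - len(" ".join(line))) ** 2 for line in lines[:-1])


words = "aaa bb cc ddddd".split()
for breaker in (greedy_breaks, optimal_breaks):
    lines = breaker(words, 6)
    print(breaker.__name__, lines, raggedness(lines, 6))
```

On this input the whole-paragraph approach finds a more even layout from
the same break opportunities, which is the sense in which a smarter
algorithm can manage with fewer hyphenation points.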

If it is possible to provide two different sets of the Bulgarian
hyphenation patterns, then the other software (not TeX!) should use
patterns produced in the following way:

hyph-bg.sh --no-hyphen-mins

(The option --no-hyphen-mins is needed because the current versions of
Mozilla ignore the hyphen mins in words containing a dash.)

The following are the URLs you can use to download the
script `hyph-bg.sh` and the results produced by it.

The script itself:

http://logic.fmi.uni-sofia.bg/hyphenation/hyph-bg.sh

Documentation about the Bulgarian hyphenation:

http://logic.fmi.uni-sofia.bg/hyphenation/hyph-bg.html

The same in PDF format:

http://logic.fmi.uni-sofia.bg/hyphenation/hyph-bg.pdf

Hyphenation patterns for TeX:

http://logic.fmi.uni-sofia.bg/hyphenation/hyph-bg.tex

Hyphenation patterns for other software:

http://logic.fmi.uni-sofia.bg/hyphenation/hyph-bg.other

Regards,
Anton Zinoviev
Arthur Reutenauer
2017-10-22 21:39:50 UTC
Dear Anton Zinoviev,

Thank you very much for all your efforts; we’ll get back to you very
soon. Unfortunately, I do not have time to look into your work right now.

Best,

Arthur
Mojca Miklavec
2017-10-23 01:31:02 UTC
Dear Anton,

Amazing work! I would suggest publishing the article in TUGboat.

I'm currently travelling without my computer, but will try to include your
patterns ASAP*. I didn't yet check which licence you used, but would MIT be
acceptable (for the patterns at least) if that's not what you used already?

Would users need to switch between different versions (years when certain
rules applied)? If so, we could ship multiple versions, but it would make
sense to register new subtags at IANA.

Mojca

* We are also switching to YAML, so maybe some additional delay may occur,
hopefully not too much.
Anton Zinoviev
2017-10-23 08:08:54 UTC
> I didn't yet check which licence you used, but would MIT be acceptable
> (for the patterns at least) if that's not what you used already?

The licence is similar in its terms to the X11 version of the MIT
licence, but shorter:

This software may be used, modified, copied, distributed, and sold,
both in source and binary form provided that the above copyright
notice and these terms are retained. The name of the author may not
be used to endorse or promote products derived from this software
without prior permission. THIS SOFTWARE IS PROVIDED "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES ARE DISCLAIMED. IN NO EVENT
SHALL THE AUTHOR BE LIABLE FOR ANY DAMAGES ARISING IN ANY WAY OUT
OF THE USE OF THIS SOFTWARE.

> Would users need to switch between different versions (years when certain
> rules applied)?

No. These versions exist for the benefit of future development (not by
me, but by other people who have to learn).

> * We are also switching to YAML, so maybe some additional delay may occur,
> hopefully not too much.

Don't worry about delays.

Anton Zinoviev
Стоян Димитров
2017-11-30 14:07:56 UTC
Hello everyone,

I want to inform you that the script has now moved to its new home: the
repository of the BG Office project [1]. Please use this instead and treat
it as the upstream version.

In this regard, and as discussed in our private conversation, I kindly ask
Mr. Zinoviev to use the BG Office repository for further development of
the script.

Here are the direct links to the script and the resulting TeX file:

* https://sourceforge.net/p/bgoffice/code/HEAD/tree/trunk/hyph-bg/hyph-bg.sh?format=raw
* https://sourceforge.net/p/bgoffice/code/HEAD/tree/trunk/hyph-bg/hyph-bg.tex?format=raw

P.S.
The first change to the script has already landed in the repo, so the
files linked in Anton Zinoviev's original message are already outdated.

___
[1] https://svn.code.sf.net/p/bgoffice/code/
--
С.
Стоян Димитров
2018-09-14 13:05:32 UTC
Hello,

Can I help the migration process somehow?
Arthur Reutenauer
2018-09-14 13:15:33 UTC
Hi Stoyan,

We changed the patterns on CTAN in April or May this year; they are
now in TeX Live (including the TeX Live 2018 DVD). I sent you an email
about that. Is there anything wrong with the patterns currently in TeX
Live?

Best,

Arthur
Arthur Reutenauer
2018-09-14 13:23:43 UTC
Post by Arthur Reutenauer
> I sent you an email
> about that.

I stand corrected, I did *not* send you an email, which I suppose is
the reason for your question right now. Sorry about that :-)

Best,

Arthur
Стоян Димитров
2018-09-14 13:25:10 UTC
Thank you!
Стоян Димитров
2018-09-14 13:27:35 UTC
Is it possible to change the author and the license on site?
Arthur Reutenauer
2018-09-14 13:31:10 UTC
Post by Стоян Димитров
> Is it possible to change the author and the license on site?

Sure, what should we put? If we could have the MIT licence, that
would be very nice :-)

Best,

Arthur
Стоян Димитров
2018-09-14 13:35:33 UTC
The author is Anton Zinoviev; the license is in the file itself. Sorry, I
don't know if it is MIT.
Arthur Reutenauer
2018-09-14 13:46:05 UTC
Sorry, I just realised that by “on site” you meant on the website:

http://tug.org/tex-hyphen/#languages

I’m working on making the website updates automatic but I’m not
completely done; a bit of patience, please :-)

Post by Стоян Димитров
> Sorry, I don't know if it is MIT.

It is almost equivalent except for the additional non-advertising
clause (“the name of the author may not be used to endorse or promote
products derived from this software without prior permission”), which is
a bit of a pain (but so is any custom licence text, to be honest).

Best,

Arthur