Discussion:
[tex-hyphen] [tex-live] hyph-ru.tex is faulty.
Arthur Reutenauer
2017-01-26 12:51:56 UTC
(Moving discussion from the TeX Live list to TeX-hyphen, please reply
there.)
Did you add patterns for all combining accents as I mentioned in one
of the comments?
Not yet, but it’s on our list: https://github.com/hyphenation/tex-hyphen/issues/5
I need to figure out the best way to do it: should we input the full
list of all combining characters for every language, or only those
diacritic signs that are relevant for each language? The former option
may seem like less work, but we need to make sure that the accents don’t
interact with the existing patterns (for all the languages), and ensure
that this remains true in the future. If, for example, someone comes up with a
pattern set for Russian that does take the combining acute accent into
account, having a default list of patterns with accents may be
self-defeating.

Best,

Arthur
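To get a feel for the two options, here is a minimal Python sketch using the standard unicodedata module. It lists nonspacing combining marks (Unicode general category Mn), either across the whole code space (the "full list" option) or from a hand-picked per-language set. The function name and the Russian subset are illustrative assumptions, not anything taken from tex-hyphen. The sketch also ignores spacing (Mc) and enclosing (Me) marks, and it only lists the characters rather than generating any patterns.

```python
import sys
import unicodedata


def combining_marks(candidates=None):
    """Return the nonspacing combining marks (category Mn) among candidates.

    With no argument, scan every code point (the "full list" option);
    otherwise filter a per-language list of candidate characters.
    """
    if candidates is None:
        candidates = (chr(cp) for cp in range(sys.maxunicode + 1))
    return [c for c in candidates if unicodedata.category(c) == "Mn"]


# Hypothetical per-language subset for Russian: the acute used as a stress
# mark, plus the breve and diaeresis found in NFD-decomposed й and ё.
russian = combining_marks(["\u0301", "\u0306", "\u0308"])
everything = combining_marks()

print(len(russian))     # 3
print(len(everything))  # depends on the Unicode version of your Python
```

The size of the full list depends on the Unicode version bundled with the interpreter. That is one concrete form of the maintenance concern above: a default list can change with each Unicode release, and every pattern set would need to stay compatible with it.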
David Carlisle
2017-01-26 13:16:43 UTC
A related issue came up for LaTeX as we set up the formats for the
2017/01/01 release, which defaults to Unicode encoding for the first time:
should we default to NFC normalisation (using the XeTeX primitive and
a Lua callback)? That would go some way towards
avoiding the need to deal with combining accents in the patterns.

We didn't do that this time for fear of clashing with existing code,
but if this issue is going to keep coming up, it might be good to look
at this again...

David
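The following illustrates what NFC itself does, using Python's unicodedata module rather than the XeTeX or Lua mechanisms mentioned in the message. When Unicode defines a precomposed character, NFC merges a letter and its following combining accent into it, so patterns would only need to contain the precomposed form. The helper name is hypothetical.

```python
import unicodedata


def marks_left_after_nfc(word):
    """Return the combining marks (nonzero combining class) that survive NFC."""
    return [c for c in unicodedata.normalize("NFC", word) if unicodedata.combining(c)]


decomposed = "cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))   # 5 4
print(composed == "caf\u00e9")          # True: U+00E9 LATIN SMALL LETTER E WITH ACUTE
print(marks_left_after_nfc(decomposed)) # []
```

The words "some way" matter here. NFC can only compose when a precomposed character exists for that particular letter and accent combination.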


Arthur Reutenauer
2017-01-26 14:08:47 UTC
Post by David Carlisle
A related issue came up for latex as we set up the formats for
2017/01/01 release defaulting to Unicode encoding for the first time,
should we default to NFC normalisation (using the xetex primitive and
some lua callback) which would go some way to
avoiding the need to deal with combining accents in the patterns?
This won’t help in this case :-) There are no precomposed characters
in Unicode for Cyrillic letters with acute accent, which is what we’d
need in this instance (the acute accent is only used in very specific
contexts in Russian to mark stress, for example in some dictionaries,
and texts for teaching Russian as a second language, which was the use
case discussed in November).
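Applying the same check to Russian stress marks shows why normalisation does not help. None of the Russian vowels has a precomposed form with an acute accent, so NFC leaves the combining U+0301 in place, and patterns would still have to deal with it. Here is a small Python sketch, again using only the standard unicodedata module. The example word is illustrative, and the output shows code points rather than Cyrillic text.

```python
import unicodedata

VOWELS = "аеиоуыэюя"  # Cyrillic vowels that can carry a stress mark

# For each vowel + U+0301, NFC still yields two code points: nothing composes.
for v in VOWELS:
    nfc = unicodedata.normalize("NFC", v + "\u0301")
    print(f"U+{ord(v):04X}", len(nfc))  # always 2

stressed = "молоко\u0301"  # "молоко́", stress marked on the final vowel
print(unicodedata.normalize("NFC", stressed) == stressed)  # True

# By contrast, the Latin o with the same accent composes to U+00F3.
print(unicodedata.normalize("NFC", "o\u0301") == "\u00f3")  # True
```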
Post by David Carlisle
we didn't do that this time for fear of clashing with existing code
but if this issue is going to keep coming up it might be good to look
at this again...
I’d really advocate against imposing NFC in the formats. Actually,
in the long run I think NFD makes more sense, but that’s probably
several more years down the line :-)

Best,

Arthur
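A move towards NFD would have a practical consequence that this Python sketch illustrates. Decomposition affects ordinary, unaccented Russian text too, because й and ё have canonical decompositions: и + U+0306 COMBINING BREVE and е + U+0308 COMBINING DIAERESIS. Patterns would presumably have to be stored in NFD as well in order to match. The helper and the pattern string below are hypothetical. Because digits are unaffected by normalisation, only the letters of a pattern change.

```python
import unicodedata

# Canonical decompositions of two common Russian letters under NFD.
for ch in "\u0439\u0451":  # й, ё
    nfd = unicodedata.normalize("NFD", ch)
    print(f"U+{ord(ch):04X}", [f"U+{ord(c):04X}" for c in nfd])
# U+0439 ['U+0438', 'U+0306']
# U+0451 ['U+0435', 'U+0308']


def to_nfd_pattern(pattern):
    """Hypothetical helper: rewrite a hyphenation pattern (letters with
    interleaved digits) in NFD. ASCII digits are unchanged by
    normalisation, so only the letters are decomposed."""
    return unicodedata.normalize("NFD", pattern)


pattern = "\u0430\u0439" + "1"  # hypothetical pattern "ай1"
print(to_nfd_pattern(pattern) == "\u0430\u0438\u0306" + "1")  # True
```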
David Carlisle
2017-01-26 14:18:02 UTC
On 26 January 2017 at 14:08, Arthur Reutenauer
Post by Arthur Reutenauer
Post by David Carlisle
A related issue came up for latex as we set up the formats for
2017/01/01 release defaulting to Unicode encoding for the first time,
should we default to NFC normalisation (using the xetex primitive and
some lua callback) which would go some way to
avoiding the need to deal with combining accents in the patterns?
This won’t help in this case :-) There are no precomposed characters
in Unicode for Cyrillic letters with acute accent,
Which I knew at some point since Christmas, as that was one argument
I used against adding normalisation: that it only avoided the problem
for some subset of languages.
But I missed that just now, sorry :-)
Post by Arthur Reutenauer
which is what we’d
need in this instance (the acute accent is only used in very specific
contexts in Russian to mark stress, for example in some dictionaries,
and texts for teaching Russian as a second language, which was the use
case discussed in November).
Post by David Carlisle
we didn't do that this time for fear of clashing with existing code
but if this issue is going to keep coming up it might be good to look
at this again...
I’d really advocate against imposing NFC in the formats. Actually,
in the long run I think NFD makes more sense, but that’s probably
several more years down the line :-)
Yes, NFD is in a way more consistent, I'd agree. Anyway, thanks for
the confirmation that we are best off not touching normalisation at this point.
Post by Arthur Reutenauer
Best,
Arthur
David
Arthur Reutenauer
2017-01-26 14:52:26 UTC
Post by David Carlisle
Which I knew at some point since Christmas, as that was one argument
I used against adding normalisation: that it only avoided the problem
for some subset of languages.
But I missed that just now, sorry :-)
That’s funny.
Post by David Carlisle
Yes, NFD is in a way more consistent, I'd agree. Anyway, thanks for
the confirmation that we are best off not touching normalisation at this point.
Yes please, don’t go down that route.

Best,

Arthur
