Bug#929923: missing dictionaries.xcu confuses non-US English locales (e.g. en_AU)
Rene Engelhard wrote:
> On Wed, Aug 21, 2019 at 03:44:36PM +1000, Trent W. Buck wrote:
> > I still advocate solving only MY problem, with a simple change:
> >
> > https://bugs.debian.org/cgi-bin/bugreport.cgi?att=2;bug=929923;filename=929923.patch;msg=22
>
> And I still say that it at least for en_GB is wrong.
> As said: color vs. colour.
> You say that Australia is used to both, OK, I believe so - but I don't think so
> for en_GB.
As I hinted before, mythes-en-us already contains "colour",
though admittedly not in all cases:
bash5$ grep -Fc color /usr/share/mythes/th_en_US_v2.dat
960
bash5$ grep -Fc colour /usr/share/mythes/th_en_US_v2.dat
661
A quick analysis of Debian 10's mythes-en-us [1] shows:
* About 4.3% of the words are valid British-only words ((276K - 208K) ÷ 1.6M).
* About 3.8% of the words are valid American-only words ((269K - 208K) ÷ 1.6M).
So according to hunspell (the same spell-checker LibreOffice uses),
th_en_US_v2.dat is actually more British than American :-)
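The arithmetic behind the two percentages can be sketched as below. This is only an illustration: pct_only is a hypothetical helper, not part of any package, and the inputs are the rounded counts (in thousands) from [1], so the results agree with the figures above only to rounding.

```shell
# pct_only ONE BOTH TOTAL (hypothetical helper):
# percentage of TOTAL that is rejected by one dictionary but not by both,
# i.e. words accepted only by the other dictionary.
pct_only() {
    awk -v one="$1" -v both="$2" -v total="$3" \
        'BEGIN { printf "%.2f\n", 100 * (one - both) / total }'
}

# Counts in thousands, taken from [1]; the unit cancels out in the ratio.
pct_only 276 208 1600   # rejected by en_US only => British-only share
pct_only 269 208 1600   # rejected by en_GB only => American-only share
```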
[1]
bash5$ dpkg-query -W mythes-en-us hunspell hunspell-en-us hunspell-en-gb
hunspell 1.7.0-2
hunspell-en-gb 1:6.2.0-1
hunspell-en-us 1:2018.04.16-1
mythes-en-us 1:6.2.0-1
bash5$ wc -w /usr/share/mythes/th_en_US_v2.dat | numfmt --to si # how many words in total?
1.6M /usr/share/mythes/th_en_US_v2.dat
bash5$ hunspell -l -d en_US,en_GB /usr/share/mythes/th_en_US_v2.dat | wc -l | numfmt --to si # how many words misspelt in "both" English varieties (i.e. false positives)?
208K
bash5$ hunspell -l -d en_US /usr/share/mythes/th_en_US_v2.dat | wc -l | numfmt --to si # how many words misspelt in en_US?
276K
bash5$ hunspell -l -d en_GB /usr/share/mythes/th_en_US_v2.dat | wc -l | numfmt --to si # how many words misspelt in en_GB?
269K
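Rather than subtracting counts, the British-only and American-only words could also be listed directly by comparing the two single-dictionary outputs. A minimal sketch, assuming the hunspell -l output has been saved to files first; only_in and the file names in the comments are hypothetical. It compares distinct words, so its totals need not match the per-line counts above.

```shell
# only_in A B (hypothetical helper): print lines present in file A but
# absent from file B. comm needs sorted input, so both files are sorted
# and de-duplicated first, using the same C collation throughout.
only_in() {
    a=$(mktemp) b=$(mktemp)
    LC_ALL=C sort -u "$1" >"$a"
    LC_ALL=C sort -u "$2" >"$b"
    LC_ALL=C comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Possible usage (bad_us.txt and bad_gb.txt are made-up file names):
#   hunspell -l -d en_US /usr/share/mythes/th_en_US_v2.dat >bad_us.txt
#   hunspell -l -d en_GB /usr/share/mythes/th_en_US_v2.dat >bad_gb.txt
#   only_in bad_us.txt bad_gb.txt   # rejected by en_US only (British-only)
#   only_in bad_gb.txt bad_us.txt   # rejected by en_GB only (American-only)
```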
PS: Out of curiosity, I looked up some references re "colour" specifically.
The OED is different enough from en-GB to have its own locale (en-GB-oxendict), but
AFAIK it is nevertheless the primary reference for en-GB spelling.
I don't have a dead-tree version; its online version appears to live here:
https://www.lexico.com/en/definition/color
https://www.lexico.com/en/definition/colour
https://www.lexico.com/en/definition/-our
which simply has rather dogmatic labels "US" and "British",
though it notes that "-our" is merely a "variant spelling".
Fowler's (1e) definition of "colo(u)r" (p. 83) directs me to
"See -OR & -OUR", which says
It is not worth while either to resist such a gradual change or
to fly in the face of national sentiment by trying to hurry it.
The American abolition of -our [...] has probably retarded
rather than quickened English progress in the same direction.
For en-AU, the AGPS Style Manual (5e) in §3.1 through §3.18
(pp. 39-42) simply advises doing whatever Macquarie says.
I don't have a copy of Macquarie handy, and
the online version is paywalled.