Re: [Draft] Writing i18n apps with glibc 2.2
Hello rigel,
On Mon, Oct 16, 2000 at 01:31:18AM -0700, rigel wrote:
> Hi Roger,
> On Mon, Oct 16, 2000 at 04:10:07PM +1100, Roger So wrote:
> > >
> > > No, please don't. You should continue to use isprint to test
> > > whether a byte is printable.
> >
> > I thought so too, but isprint(0xA7) didn't work, however
> > iswprint(0xA7) worked ...? Now I'm confused ...
>
> This is indeed correct. Byte 0xA7 is not a legal character in zh
> locales, so isprint(0xA7) should return 0. Widechar 0xA7 =
> U000000A7 = 0xA1EC (gb2312) = 0xA1B1 (big5), on the other hand, is a
> printable character, so iswprint(0xA7) returns 1.
>
> Also glibc retains more information for widechar (used by iswprint)
> than for multibyte (used by isprint). Internally the binary locale
> file keeps two separate sets of information: multibyte and widechar.
> All the chars present in the locale def file will be put in the
> widechar part, while only those that also exist in the charmap file,
> i.e. legal chars, will be recorded in the multibyte part. For example,
> U00A6 exists in the zh_HK def file, so although it's not a legal
> character in the Big5HKSCS charmap, iswprint(0xA6) will return 1. The
> same call in zh_CN and zh_TW locales will return 0, because U00A6
> does not exist in the zh_CN and zh_TW def files.
Thank you for the clarification -- I stand corrected.
So, given a stream of bytes which might contain multibyte characters,
how would I test whether a byte is, say, printable? Do I need to test
for MB_CUR_MIN to MB_CUR_MAX number of bytes instead of individual
bytes? (seems wildly inefficient ...)
Also, in glibc, are widechars always in Unicode? (UCS-4?)
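One approach I can think of, as an untested sketch rather than anything
authoritative: decode the stream one character at a time with mbrtowc()
and classify the resulting wide character with iswprint(), so individual
bytes are never tested on their own. count_printable is a name I made up
for illustration; it assumes the caller has already called
setlocale(LC_CTYPE, "") and that the buffer ends on a character boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper (not part of glibc): count the printable
 * characters in buf[0..len) according to the current LC_CTYPE locale.
 * Returns (size_t)-1 if the buffer contains an invalid or incomplete
 * multibyte sequence. */
size_t count_printable(const char *buf, size_t len)
{
    mbstate_t st;
    size_t printable = 0;

    assert(buf != NULL);
    memset(&st, 0, sizeof st);          /* start in the initial shift state */

    while (len > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, buf, len, &st);

        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;          /* invalid or truncated sequence */
        if (n == 0)
            n = 1;                      /* embedded NUL consumes one byte */
        if (iswprint((wint_t)wc))
            printable++;
        buf += n;
        len -= n;
    }
    return printable;
}
```

In the default "C" locale, count_printable("abc\n", 4) should give 3,
since the newline is not printable. Each character is decoded only once,
so this should not be as wasteful as trying every length up to
MB_CUR_MAX.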
> > > Just a small thing. A new LC_CTYPE class "hanzi" was added in
> > > glibc 2.2 locale (both zh_CN and zh_TW have it, zh_HK doesn't
> > > though).
> >
> > Hmm ... that's a bug ...
>
> Well, not really a bug. I added this hanzi class in zh_CN. zh_TW's
> CTYPE simply copies zh_CN, while zh_HK copies "i18n".
Then zh_HK should copy "zh_CN" instead ...?
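For what it's worth, an application can check at run time whether the
current locale defines such a class, using the standard wctype() and
iswctype() calls. The following is only a sketch; in_ctype_class is a
name I made up, and it assumes setlocale(LC_CTYPE, "") has already been
called.

```c
#include <assert.h>
#include <stddef.h>
#include <wctype.h>

/* Hypothetical helper (not part of glibc): test whether wc belongs to
 * the LC_CTYPE class `name` in the current locale.
 * Returns 1 if it does, 0 if it does not, and -1 if the locale does
 * not define a class with that name. */
int in_ctype_class(const char *name, wint_t wc)
{
    wctype_t cls;

    assert(name != NULL);
    cls = wctype(name);                 /* 0 means "no such class" */
    if (cls == (wctype_t)0)
        return -1;
    return iswctype(wc, cls) != 0;
}
```

With this, in_ctype_class("hanzi", wc) would return -1 in a locale that
lacks the class, so a program can tell "not a hanzi" apart from "this
locale has no hanzi class".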
BTW several definitions in zh_HK seem to be wrong; when I get the time
I shall have a closer look. Also it seems that an en_HK locale would be
nice for people like me :)
--
Roger So telnet://e-fever.org
spacehunt at e-fever dot org SysOp, e-Fever BBS
GnuPG 1024D/98FAA0AD F2C3 4136 8FB1 7502 0C0C 01B1 0E59 37AC 98FA A0AD