
Bug#1026231: debian-policy: document droppage of support for legacy locales



On Fri, 20 Jan 2023 at 09:54:21 -0700, Anthony Fok wrote:
> supposedly some older Chinese websites are still using "GBK" as
> encoding, probably something like:
> 
>      <meta http-equiv="Content-Type" content="text/html;charset=gbk">
> 
> which has less than 30,000 characters and thus a very limited subset
> of Unicode.  And, presumably not everyone has the know how to convert
> to UTF-8, the Chinese government wants those unable to at least change
> that meta tag to:
> 
>      <meta http-equiv="Content-Type" content="text/html;charset=gb18030">

Sure, but neither of those actually requires us to support GBK or GB
18030 as a system locale, only as something that iconv() (or whatever
browsers actually use, which is probably their own thing) can convert
into their preferred internal representation (which is almost certainly
UTF-8, UTF-16 or UCS-4).

Analogously, we've never supported using Windows-1252 (Microsoft's
legacy Latin-1 variant) as a system locale encoding in some hypothetical
locale like en_US.windows-1252, but HTML documents with
text/html;charset=windows-1252 still work fine.
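To illustrate the distinction, here is a minimal sketch in Python of
transcoding by declared charset, independent of the system locale. The
helper name to_utf8 and the sample byte strings are mine, purely for
illustration; this is not what any particular browser does internally.

```python
def to_utf8(raw: bytes, charset: str) -> bytes:
    """Decode bytes in a declared legacy charset and re-encode as UTF-8.

    Hypothetical helper: the charset comes from the document's own
    metadata (e.g. the meta tag), never from the process locale.
    """
    return raw.decode(charset).encode("utf-8")


# Sample payloads (illustrative only).
gbk_page = b"\xd6\xd0\xce\xc4"    # "中文" encoded as GBK
cp1252_page = b"caf\xe9 \x80"     # "café €" encoded as Windows-1252

utf8_from_gbk = to_utf8(gbk_page, "gbk")
# GB 18030 is a superset of GBK, so relabelling the same bytes still works.
utf8_from_gb18030 = to_utf8(gbk_page, "gb18030")
utf8_from_cp1252 = to_utf8(cp1252_page, "windows-1252")
```

None of this consults LANG or LC_ALL: the codec is chosen from the
charset label alone, so it behaves the same in a UTF-8 system locale.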

> I have the feeling that many tech-savvy Chinese have already switched
> to UTF-8, but then perhaps in some circles there are lots of legacy
> GB2312/GBK documents or systems that made GB18030 a necessity, as an
> intermediate step to Unicode.

That doesn't seem so far away from how in some English-speaking circles
there are lots of legacy ISO-8859-1, ISO-8859-15 or (more likely)
Windows-1252 documents, and we can cope OK with those via transcoding,
even in UTF-8 system locales.

    smcv

