Re: A Great Idea (tm) about reimplementing NLS.

In article <[email protected]> you wrote:
> (implication of utf8 and not utf16 goes here)
> 
> Very few Unicode characters require three bytes, instead of the usual one or 
> two.

Two-byte UTF-8 sequences end at U+07FF, which covers little more than Latin,
Greek, Cyrillic, Hebrew and Arabic.

All CJK Unified Ideographs (U+4E00-) and Extension A (U+3400-) have 3-byte
encodings in UTF-8. Extension B (U+20000-) even needs 4 bytes.
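
If you want to check those boundaries yourself, here is a quick sketch in
Python 3. The helper name utf8_len and the list of sample code points are
my own picks for illustration; it relies only on the built-in str.encode.

```python
# Byte length of the UTF-8 encoding at the range boundaries discussed above.
# The sample code points are chosen for illustration only.
samples = [0x7F, 0x80, 0x07FF, 0x0800, 0x3400, 0x4E00, 0xFFFF, 0x20000]

def utf8_len(cp):
    """Return how many bytes UTF-8 needs for the code point cp."""
    return len(chr(cp).encode("utf-8"))

lengths = {cp: utf8_len(cp) for cp in samples}

for cp, n in lengths.items():
    print(f"U+{cp:04X}: {n} byte(s)")
```

The jump from 2 to 3 bytes happens between U+07FF and U+0800, and the jump
from 3 to 4 bytes between U+FFFF and U+10000.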

> For one byte you just have the byte. 

For ASCII you have one byte.

> For two bytes, you really have three: a control code stating "the following 
> two bytes are a two byte character", and then the two bytes. 

Umm, that's a bit misleading. UTF-8 marks the sequence length with bit
prefixes in the lead byte, not with a separate control byte. Unicode code
points are integers; depending on the encoding, each one is represented as
one or more code units, which in turn are stored as bytes.
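
To make the bit prefixes concrete, here is a minimal hand-rolled encoder in
Python 3. The function name utf8_encode is hypothetical, and the sketch
assumes a valid scalar value: it does no checking for surrogates or for
values above U+10FFFF, so it is an illustration and not a replacement for a
real codec.

```python
def utf8_encode(cp):
    """Encode one code point as UTF-8 bytes (sketch, no validation)."""
    if cp < 0x80:
        # 0xxxxxxx: ASCII, the byte is the code point itself
        return bytes([cp])
    if cp < 0x800:
        # 110xxxxx 10xxxxxx: 11 payload bits
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx: 16 payload bits
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: 21 payload bits
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

for cp in (0x41, 0x07FF, 0x4E00, 0x20000):
    encoded = utf8_encode(cp)
    bits = " ".join(f"{b:08b}" for b in encoded)
    print(f"U+{cp:04X}: {bits}")
```

The lead byte carries the length in its top bits (0, 110, 1110 or 11110) and
every following byte starts with 10, so there is no extra byte of overhead.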

> Unless I've completely misunderstood the Unicode specification, this is what 
> is going on.

You might want to look up Joel's Tutorial or just browse the Unihan Database:
http://www.joelonsoftware.com/articles/Unicode.html
http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=3400
http://www.unicode.org/cgi-bin/UnihanGrid.pl?codepoint=U+07F1&useutf8=false

Greetings
Bernd

References:
- Re: A Great Idea (tm) about reimplementing NLS.
  - From: Patrick McFarland <[email protected]>
