In article <[email protected]> you wrote:
> Every unicode character has exactly one UTF-8 representation.
Every unicode code point has exactly one UTF-8 representation, however there
are for a few glyphs multiple code points. And this is not only a problem
beause of homoglphys which look like/similiar, but also because of combining
characters vs. legacy characters. However thats more an issue of the user
interface (think IDN exploits).
Personally I think the on-disk filesystem format should be required to be
UTF-8, and its an open discussion if the syscalls accept UTF-8 or locale
byte encodings. Currently its a mess. We can learn from Windows here:)
Greetings
Bernd
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]