> fredagen den 17 juni 2005 15.23 skrev Måns Rullgård:
> > Some characters can be encoded in several equally shortest ways.
>
> No they cannot. How to encode characters i explicitly and well defined. If you
> don't follow the rules you are simply not producing UTF-8, but something
> else.
>
> Every unicode character has exactly one UTF-8 representation.
>
> -- robin
You are confused between unicode characters and unicode codepoints.
Every unicode codepoint has exactly one UTF-8 representation.
Unicode characters may use one ore more unicode codepoints.
Some characters have also representation with one codepoint, but not all.
For example
LATIN CAPITAL LETTER A WITH ACUTE
have presentation 0041 0301
That is two unicode codepoints. That character
have also other (compatibility) representation
that is 00C1
But consider (somewhat imaginary) character
LATIN CAPITAL LETTER A WITH GRAVE AND CIRCUMFLEX
that have presentation 0041 0300 0302
but it have also presentation 0041 0302 0300
Both presentations are equal short.
/ Kari Hurtta
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]