On Nov 06, 2007, at 01:33:05, Adrian Bunk wrote:
> Can you limit this to 7bit ASCII and use isascii() somewhere?
> Otherwise I'd expect funny things to happen when you e.g. use
> isspace() on the UTF-8 encoded character à.
Actually, you don't need to. You tell them it expects UTF-8 encoded
strings and be done with it. All US-ASCII characters from 0 through
127 (i.e. high bit clear) are encoded exactly the same in UTF-8, and
every byte of a multi-byte UTF-8 sequence has the high bit set.
Therefore you just assume that anything with the high bit set is part
of a word and you can handle basic UTF-8. (It doesn't work on special
UTF-8 space characters like nonbreaking space and similar, but handling
those is significantly more complicated.)
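
A minimal sketch of that rule, not taken from any patch in this thread:
the names is_ascii_space() and count_words() are made up for
illustration. The whitespace test only ever matches bytes below 0x80, so
every byte of a multi-byte UTF-8 sequence falls through as part of a word.

```c
#include <stddef.h>

/* Nonzero only for the six ASCII whitespace bytes (space, \t, \n, \v,
 * \f, \r). Any byte with the high bit set returns 0. */
static int is_ascii_space(unsigned char c)
{
	return c == ' ' || (c >= '\t' && c <= '\r');
}

/* Hypothetical helper: count whitespace-separated words in a
 * NUL-terminated UTF-8 string, treating high-bit bytes as word bytes. */
static size_t count_words(const char *s)
{
	size_t n = 0;
	int in_word = 0;

	for (; *s; s++) {
		if (is_ascii_space((unsigned char)*s)) {
			in_word = 0;
		} else if (!in_word) {
			/* First byte of a new word. */
			in_word = 1;
			n++;
		}
	}
	return n;
}
```

The à from the question encodes as the two bytes 0xC3 0xA0, and 0xA0 on
its own is the nonbreaking space in Latin-1, which is presumably where a
locale-dependent isspace() could go wrong. With the check above both
bytes stay inside the word. The flip side is the limitation already
mentioned: a real UTF-8 nonbreaking space (0xC2 0xA0) is not treated as
a separator either.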
Cheers,
Kyle Moffett