Re: [PATCH] Smackv10: Smack rules grammar + their stateful parser

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Nov 06, 2007, at 01:33:05, Adrian Bunk wrote:
Can you limit this to 7bit ASCII and use isascii() somewhere?

Otherwise I'd expect funny things to happen when you e.g. use isspace() on the UTF-8 encoded character à.

Actually, you don't need to. You tell them it expects UTF-8 encoded strings and be done with it. All US-ASCII characters from 0 through 127 (IE: high bit clear) are exactly the same in UTF-8, and UTF-8 special characters have the high bit set in all bytes. Therefore you just assume that anything with the high bit set is part of a word and you can handle basic UTF-8. (It doesn't work on special UTF-8 space characters like nonbreaking space and similar, but handling those is significantly more complicated).

Cheers,
Kyle Moffett

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux