Originally isspace() and other similar functions in ctype.h ignored
any character with the high bit set; however this was changed during
the linux 2.1 days to map Latin-1. As following Latin-1 will most
likely break UTF-8 any any *other* encoding that is backwards-
compatible with 7-bit-ASCII, change ctype.c to ignore such characters
completely (the way they were before). Linus seems to think this is
a good thing, and he's the one that wrote the code in the first place.
Signed-off-by: Kyle Moffett <[email protected]>
---
On Nov 06, 2007, at 10:53:08, Linus Torvalds wrote:
On Tue, 6 Nov 2007, Kyle Moffett wrote:
Personally I think that isspace() accepting character 0xA0 is a bug
I think I agree with you. As far as the kernel is concerned,
"isspace()" should just accept the obvious spaces (hardspace, tab,
newline), and *perhaps* the VT/FF kind of things.
You should realize that the kernel <ctype.h> thing is *ancient*.
It's basically there from v0.01, and while the really original one
(I just checked) had all the non-ascii characters not trigger
anything, it was converted to be latin1 in the 2.1.x timeframe.
That's a *loong* time ago. Way before UTF-8 and other things were
really common.
So we should probably just make all the upper 128 bytes go back to
"don't trigger anything in ctype.h" - they'd not be spaces, but
they'd not be control characters or anything else either.
lib/ctype.c | 17 +++++++++++------
1 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/lib/ctype.c b/lib/ctype.c
index d02ace1..ce2807a 100644
--- a/lib/ctype.c
+++ b/lib/ctype.c
@@ -24,13 +24,18 @@ _P,_L|_X,_L|_X,_L|_X,_L|_X,_L|_X,_L|_X,_L, /* 96-103 */
_L,_L,_L,_L,_L,_L,_L,_L, /* 104-111 */
_L,_L,_L,_L,_L,_L,_L,_L, /* 112-119 */
_L,_L,_L,_P,_P,_P,_P,_C, /* 120-127 */
+
+/*
+ * None of these match any type bits to avoid screwing up UTF-8 or any other
+ * 7-bit-ASCII-compatible encoding.
+ */
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, /* 128-143 */
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, /* 144-159 */
-_S|_SP,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P, /* 160-175 */
-_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P, /* 176-191 */
-_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U, /* 192-207 */
-_U,_U,_U,_U,_U,_U,_U,_P,_U,_U,_U,_U,_U,_U,_U,_L, /* 208-223 */
-_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L, /* 224-239 */
-_L,_L,_L,_L,_L,_L,_L,_P,_L,_L,_L,_L,_L,_L,_L,_L}; /* 240-255 */
+0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, /* 160-175 */
+0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, /* 176-191 */
+0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, /* 192-207 */
+0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, /* 208-223 */
+0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, /* 224-239 */
+0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}; /* 240-255 */
EXPORT_SYMBOL(_ctype);
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]