[PATCH] Fix isspace() and other ctype.h functions to ignore chars 128-255

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Originally isspace() and other similar functions in ctype.h ignored any character with the high bit set; however this was changed during the linux 2.1 days to map Latin-1. As following Latin-1 will most likely break UTF-8 any any *other* encoding that is backwards- compatible with 7-bit-ASCII, change ctype.c to ignore such characters completely (the way they were before). Linus seems to think this is a good thing, and he's the one that wrote the code in the first place.

Signed-off-by: Kyle Moffett <[email protected]>

---

On Nov 06, 2007, at 10:53:08, Linus Torvalds wrote:
On Tue, 6 Nov 2007, Kyle Moffett wrote:
Personally I think that isspace() accepting character 0xA0 is a bug

I think I agree with you. As far as the kernel is concerned, "isspace()" should just accept the obvious spaces (hardspace, tab, newline), and *perhaps* the VT/FF kind of things.

You should realize that the kernel <ctype.h> thing is *ancient*. It's basically there from v0.01, and while the really original one (I just checked) had all the non-ascii characters not trigger anything, it was converted to be latin1 in the 2.1.x timeframe.

That's a *loong* time ago. Way before UTF-8 and other things were really common.

So we should probably just make all the upper 128 bytes go back to "don't trigger anything in ctype.h" - they'd not be spaces, but they'd not be control characters or anything else either.

 lib/ctype.c |   17 +++++++++++------
 1 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/lib/ctype.c b/lib/ctype.c
index d02ace1..ce2807a 100644
--- a/lib/ctype.c
+++ b/lib/ctype.c
@@ -24,13 +24,18 @@ _P,_L|_X,_L|_X,_L|_X,_L|_X,_L|_X,_L|_X,_L,	/* 96-103 */
 _L,_L,_L,_L,_L,_L,_L,_L,			/* 104-111 */
 _L,_L,_L,_L,_L,_L,_L,_L,			/* 112-119 */
 _L,_L,_L,_P,_P,_P,_P,_C,			/* 120-127 */
+
+/*
+ * None of these match any type bits to avoid screwing up UTF-8 or any other
+ * 7-bit-ASCII-compatible encoding.
+ */
 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,		/* 128-143 */
 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,		/* 144-159 */
-_S|_SP,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,   /* 160-175 */
-_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,       /* 176-191 */
-_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,       /* 192-207 */
-_U,_U,_U,_U,_U,_U,_U,_P,_U,_U,_U,_U,_U,_U,_U,_L,       /* 208-223 */
-_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,       /* 224-239 */
-_L,_L,_L,_L,_L,_L,_L,_P,_L,_L,_L,_L,_L,_L,_L,_L};      /* 240-255 */
+0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,		/* 160-175 */
+0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,		/* 176-191 */
+0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,		/* 192-207 */
+0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,		/* 208-223 */
+0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,		/* 224-239 */
+0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};		/* 240-255 */
 
 EXPORT_SYMBOL(_ctype);


[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux