[Patch] Support UTF-8 scripts

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This patch adds support for UTF-8 signatures (aka BOM, byte order
mark) to binfmt_script. Files that start with EF BF FF # ! are now
recognized as scripts (in addition to files starting with # !).

With such support, creating scripts that reliably carry non-ASCII
characters is simplified. Editors and the script interpreter can
easily agree on what the encoding of the script is, and the
interpreter can then render strings appropriately. Currently,
Python supports source files that start with the UTF-8 signature;
the approach would naturally extend to Perl to enhance/replace
the "use utf8" pragma. Likewise, Tcl could use the UTF-8 signature
to reliably identify UTF-8 source code (instead of assuming
[encoding system] for source code).

Please find the patch attached below.

Regards,
Martin

Signed-off-by: Martin v. Löwis <[email protected]>

diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
--- a/fs/binfmt_script.c
+++ b/fs/binfmt_script.c
@@ -1,7 +1,7 @@
 /*
  *  linux/fs/binfmt_script.c
  *
- *  Copyright (C) 1996  Martin von Löwis
+ *  Copyright (C) 1996, 2005  Martin von Löwis
  *  original #!-checking implemented by tytso.
  */

@@ -23,7 +23,16 @@ static int load_script(struct linux_binp
        char interp[BINPRM_BUF_SIZE];
        int retval;

-       if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!') ||
(bprm->sh_bang))
+       /* It is a recursive invocation. */
+       if (bprm->sh_bang)
+               return -ENOEXEC;
+
+       /* It starts neither with #!, nor with #! preceded by
+          the UTF-8 signature. */
+       if (!(((bprm->buf[0] == '#') && (bprm->buf[1] == '!'))
+             || ((bprm->buf[0] == '\xef') && (bprm->buf[1] == '\xbb')
+                 && (bprm->buf[2] == '\xbf') && (bprm->buf[3] == '#')
+                 && (bprm->buf[4] == '!'))))
                return -ENOEXEC;
        /*
         * This section does the #! interpretation.
@@ -46,7 +55,8 @@ static int load_script(struct linux_binp
                else
                        break;
        }
-       for (cp = bprm->buf+2; (*cp == ' ') || (*cp == '\t'); cp++);
+       cp = (bprm->buf[0]=='\xef') ? bprm->buf+5 : bprm->buf+2;
+       while ((*cp == ' ') || (*cp == '\t')) cp++;
        if (*cp == '\0')
                return -ENOEXEC; /* No interpreter name found */
        i_name = cp;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]
  Powered by Linux