Re: [Patch] Support UTF-8 scripts

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Bernd Petrovitsch <[email protected]> wrote:
> On Sat, 2005-09-17 at 08:20 +0200, "Martin v. Löwis" wrote:
>> Bernd Petrovitsch wrote:
>> > On Fri, 2005-09-16 at 22:41 +0200, "Martin v. Löwis" wrote:

>> > [ Language-specific examples ]
>> > 
>> > And that's the only working way - the programming languages can actually
>> > do it because it defines the syntax and semantics of the contents
>> > anyways.
>> 
>> It works from the programming language point of view, but it is a mess
>> from the text editor point of view.
> 
> Most of the text editors have ways to markup the source files. Not even
> the various editors are able to agreen on one method for all, so why
> could the (Linux) world agree on one for all text files?

You don't need a marker for all text files, but it's legal to have a marker
for utf-8 text files (see the uniocode standard 4.0.0 section 15.9), and
it's handy to use it until you made everybody in the world convert
everything to utf-8 (but not utf-{16,32}{le,be}).

>> > With this marker you are interferign with (at least) *all* text files.
>> 
>> Hmm. What does that have to do with the patch I'm proposing? This
>> patch does *not* interfere with all text files. It is only relevant
>> for executable files starting with the #! magic.
> 
> It *does* interfere since scripts are also text files in every aspect.
> So every feature you want for "scripts" you also get for text files (and
> vice versa BTW).

If utf-8 encoded text files are text files, and text files are scripts,
and all of them shall have the same features, utf-8 encoded text files
with BOM MUST be recognized as legal scripts, too. Therefore this patch
fixes a kernel bug.

BTW: Implementing the other utf signatures from Table 15.3 is left to the
reader as an exercise.-)

> If you think "script" and "text file" are different, define both of
> them, please, otherwise a discussion is pointless.

If all text files are script files, execute this mail.

>> > And there are always tools out there which simply do not understand the
>> > generic marker and can not ignore it since these bytes are part of the
>> > file.
>> 
>> This conclusion is false. Many tools that don't understand the file
>> structure still can do their job on the files. So the fact that a tool
>> does not understand the structure does not necessarily imply that
>> the tool breaks when the structure changes.
> 
> It *may* break just because of some to-be-ignored inline marking due to
> some questionable feature.

How exactly does it break, and what is it? And why must *it* be prevented
from breaking by ignoring script signatures in valid text files?

> And *when* (not if) it breaks, it is probably cumbersome to find since
> you have pretty unprintable characters.

If your tools can't print utf-8 encoded characters, they are broken for
ISO-8859-*, too. Besides that, it's not a kernel problem.

> Let alone the confusion why the size of a file with `ls -l` is different
> from the size in the editor or a marker-aware `wc -c`.
> So IMHO either you have a clear and visible marker or you none at all.

Like e.g. the "From "-line starting each message in a mbox file? Virtually
no email client will display it. The size of email messages does differ
from it's unencoded content size, too! Off cause nobody can handle this,
and all users contantly try to kill themselfes because of that - NOT.

-- 
Ich danke GMX dafür, die Verwendung meiner Adressen mittels per SPF
verbreiteten Lügen zu sabotieren.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]
  Powered by Linux