Re: [Patch] Support UTF-8 scripts

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, 2005-09-18 at 02:53 +0200, Bodo Eggert wrote:
> Bernd Petrovitsch <[email protected]> wrote:
[...]
> > Most of the text editors have ways to markup the source files. Not even
> > the various editors are able to agreen on one method for all, so why
> > could the (Linux) world agree on one for all text files?
> 
> You don't need a marker for all text files, but it's legal to have a marker
> for utf-8 text files (see the uniocode standard 4.0.0 section 15.9), and
> it's handy to use it until you made everybody in the world convert
> everything to utf-8 (but not utf-{16,32}{le,be}).

Have fun patching almost every text processing tool and concept out
there.
Apart from that the way of that marker is wrong it seems to me that the
UTF-8 body has no other choice than such a insane "rule" or
recommendation).

> >> > With this marker you are interferign with (at least) *all* text files.
> >> 
> >> Hmm. What does that have to do with the patch I'm proposing? This
> >> patch does *not* interfere with all text files. It is only relevant
> >> for executable files starting with the #! magic.
> > 
> > It *does* interfere since scripts are also text files in every aspect.
> > So every feature you want for "scripts" you also get for text files (and
> > vice versa BTW).
> 
> If utf-8 encoded text files are text files, and text files are scripts,

No one said all text files are scripts, instead it is the other way
'round.

[ snipped because of ex falso quod libet ]

> > If you think "script" and "text file" are different, define both of
> > them, please, otherwise a discussion is pointless.
> 
> If all text files are script files, execute this mail.

See above. Obviously you misunderstand some thing.

> >> > And there are always tools out there which simply do not understand the
> >> > generic marker and can not ignore it since these bytes are part of the
> >> > file.
> >> 
> >> This conclusion is false. Many tools that don't understand the file
> >> structure still can do their job on the files. So the fact that a tool
> >> does not understand the structure does not necessarily imply that
> >> the tool breaks when the structure changes.
> > 
> > It *may* break just because of some to-be-ignored inline marking due to
> > some questionable feature.
> 
> How exactly does it break, and what is it? And why must *it* be prevented
> from breaking by ignoring script signatures in valid text files?

The question was: What is if this marker in encountered within a file?
To be ignored (by UTF-8 aware tools)? Some other interpretation?
Illegal/Forbidden?

> > And *when* (not if) it breaks, it is probably cumbersome to find since
> > you have pretty unprintable characters.
> 
> If your tools can't print utf-8 encoded characters, they are broken for
> ISO-8859-*, too. Besides that, it's not a kernel problem.

Which is again not true since lots of tools out there printed ISO-8859-*
correctly before UTF-8 was deployed.

[...]

	Bernd
-- 
Firmix Software GmbH                   http://www.firmix.at/
mobil: +43 664 4416156                 fax: +43 1 7890849-55
          Embedded Linux Development and Services



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]
  Powered by Linux