Re: [Patch] Support UTF-8 scripts

On Sun, 18 Sep 2005 21:23:42 +0200, Bodo Eggert said:
> Bernd Petrovitsch <[email protected]> wrote:
> > Apparently I have to repeat: If you do `cat a.txt b.txt >c.txt` where
> > a.txt and b.txt have this marker, then c.txt has the marker of b.txt
> > somewhere in the middle. Does this make sense in any way?
> > How do I get rid of the marker in the middle transparently?
> 
> The unicode standard defines how to handle them.

For the benefit of those of us who are interested in the problem, but aren't
in the mood to wade through a long standard looking for the answer to a
specific question, can you elaborate?

It isn't as obvious as all that, because of all the nasty corner cases...

> > It is different even if a pure ASCII file is marked as UTF-8.
> 
> No pure ASCII file will be marked, since a marked file will be no
> ASCII file.

Given a file "a.txt" that's pure ASCII, and a file "b.txt" that has the BOM
marker on it, what happens when you do "cat a.txt b.txt > c.txt"?

'cat' doesn't know, and has no way of knowing, that c.txt needs a BOM at the
*front* of the file until it's already written past the point in c.txt where
the BOM has to go.
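
To make the problem concrete, here is a rough sketch in Python. The helper name naive_cat and the file contents are made up for illustration; it only mimics the byte-for-byte concatenation that 'cat' performs, not cat's actual implementation.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def naive_cat(*chunks: bytes) -> bytes:
    """Hypothetical stand-in for cat: join the inputs byte for byte,
    with no knowledge of encodings or markers."""
    return b"".join(chunks)


a = b"plain ASCII\n"                     # stands in for a.txt, no marker
b = BOM + "caf\u00e9\n".encode("utf-8")  # stands in for b.txt, marked as UTF-8

c = naive_cat(a, b)                      # stands in for c.txt

print(c.startswith(BOM))  # False: c.txt is not marked at the front
print(c.find(BOM))        # 12: the marker sits right after a.txt's bytes

# A BOM-aware decoder only strips a marker at the very start, so the
# one in the middle survives as the character U+FEFF.
print("\ufeff" in c.decode("utf-8-sig"))  # True
```

By the time the first byte of b.txt arrives, all of a.txt has already been written, so a streaming tool cannot go back and put the marker at the front.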

What does the Unicode standard say to do in this case?

