On Sun, 18 Sep 2005 21:23:42 +0200, Bodo Eggert said: > Bernd Petrovitsch <[email protected]> wrote: > > Apparently I have to repeat: If you do `cat a.txt b.txt >c.txt` where > > a.txt and b.txt have this marker, then c.txt have the marker of b.txt > > somewhere in the middle. Does this make sense in anyway? > > How do I get rid of the marker in the middle transparently? > > The unicode standard defines how to handle them. For the benefit of those of us who are interested in the problem, but aren't in the mood to wade through a long standard looking for the answer to a specific question, can you elaborate? It isn't as obvious as all that, because of all the nasty corner cases... > > It is different even if a pure ASCII file is marked as UTF-8. > > No pure ASCII file will be marked, since a marked file will be no > ASCII file. Given a file "a.txt" that's pure ASCII, and a file "b.txt" that has the BOM marker on it, what happens when you do "cat a.txt b.txt > c.txt"? 'cat' doesn't know, and has no way of knowing, that c.txt needs a BOM at the *front* of the file until it's already written past the point in c.txt where the BOM has to go. What does the Unicode standard say to do in this case?
Attachment:
pgpFynk0QZkY3.pgp
Description: PGP signature
- References:
- Re: [Patch] Support UTF-8 scripts
- From: Bodo Eggert <[email protected]>
- Re: [Patch] Support UTF-8 scripts
- Prev by Date: Re: [linux-usb-devel] URB_ASYNC_UNLINK b0rkage
- Next by Date: Re: dell's latitude cdburner problem
- Previous by thread: Re: [Patch] Support UTF-8 scripts
- Next by thread: Re: [Patch] Support UTF-8 scripts
- Index(es):