On Mon, 2005-09-19 at 06:54 +0200, "Martin v. Löwis" wrote:
> Bernd Petrovitsch wrote:
> >>The specific feature I get is that when I pass a file starting
> >>with <utf8sig>#! to execve, Linux will execute the file following
> >>the #!. In what way do I get this feature for text in general?
> >>And if I do, why is that a problem?
> >
> > After applying this patch it seems that "Linux" is supporting this
> > marker officially in general - especially if the kernel supports it.
>
> What makes it seem so? That binfmt_script supports a certain convention
> doesn't mean that all other programs also somehow need to support that
> convention - and certainly not in the same way.
We will see how it develops. Actually the marker could be used to detect
endianness of the file if I read below URL correctly ....
> > I suppose the next kernel patch is to support Win-like CR-LF sequences
> > (which is not the case AFAIK).
>
> What makes you suppose that? I have no plans to submit such a patch.
No need to. Other people tried already.
> This reasoning is just flawed: it is like saying to a web browser
> developer: "don't _support_ XHTML, because there are so many tools
> which use HTML 4".
No, the saying was more: "don't support XHTML since it may break HTML
compliant browsers."
For XHTML/HTML we all know that this is not the case, so the comparison
is flawed.
> > Apparently I have to repeat: If you do `cat a.txt b.txt >c.txt` where
> > a.txt and b.txt have this marker, then c.txt have the marker of b.txt
> > somewhere in the middle. Does this make sense in anyway?
>
> Indeed, it does. There is nothing inherently wrong with having
> the marker in the middle.
>
> > How do I get rid of the marker in the middle transparently?
>
> http://www.unicode.org/faq/utf_bom.html#38
Thanks.
---- snip ----
In that case, any U+FEFF occurring in the middle of the file can be
ignored, or treated as an error.
---- snip ----
Well, this doesn't sound like an clear rule stating that it *must* be
ignored.
BTW:
---- snip ----
Q: How I should deal with BOMs?
[...]
3. Some byte oriented protocols expect ASCII characters at the beginning
of a file. If UTF-8 is used with these protocols, use of the BOM as
encoding form signature should be avoided.
---- snip ----
Voila. Avoid the BOM in your scripts and be done.
> > It depends on the definition of "character". There are other standards
> > which define "character" as "byte".
>
> Certainly. However, you specifically talked about 'wc -c', and, in
> wc(1), atleast in the implementation commonly used on Linux, characters
> and bytes are not the same.
Yes, now since multi-byte character sets gets more commonly used.
However, I don't think you get this into the C standard. But we are now
far off the discussion ....
> >>It depends on the editor I use, of course
> >
> > No, more on the OS the editor runs on.
>
> You talked about Windows specifically. On Windows, most editors give you
> the choice of chosing the line ending, and will preserve whatever line
> ending they find when adding new lines to a file.
I belive this vor vim, emacs, etc. but I don't believe ir for the native
ones ...
Bernd
--
Firmix Software GmbH http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
Embedded Linux Development and Services
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
|
|