Re: [Patch] Support UTF-8 scripts

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Martin v. Löwis wrote:

Says who? In UTF-8, it is not used to indicate a byte order; instead,
it is used to indicate the fact that the file is UTF-8, like a magic.
That's why I prefer to call it "UTF-8 signature".

The Unicode consortium thinks that the BOM can be used in UTF-8:

http://www.unicode.org/faq/utf_bom.html#29

The UTF-8 signature is very useful, and I would prefer if it would
be used instead of format-specific encoding declarations.


In Unix, it's a hideously bad idea. The reason is that Unix inherently assumes that text streams can be merged, split, and modified. In other words, unless you can guarantee that EVERY program can handle BOM EVERYWHERE, it's broken.

In other words, it's broken.

	-hpa

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]
  Powered by Linux