H. Peter Anvin wrote:
> You don't have markers (although they're defined, see ISO 2022) for your
> 8-bit encodings, and *THEY'RE THE ONES THAT NEED TO BE DISTINGUISHED.*
> Flagging UTF-8, especially with the BOM (as opposed to the ISO 2022
> signature, <ESC>%G) is pointless in the context, since you still can't
> distinguish your arbitrary number of legacy encodings.
In programming languages that support the notion of source encodings,
you do have markers for 8-bit encodings. For example, in Python, you
can specify
# -*- coding: iso-8859-1 -*-
to denote the source encoding. In Perl, you write
use encoding "latin-1";
(with 'use utf8;' being a special-case shortcut).
In Java, you can specify the encoding through the -encoding argument
to javac. In gcc, you use -finput-charset (with the special case of
-fexec-charset and -fwide-exec-charset potentially being different).
So you *must* use encoding declarations in some languages; the UTF-8
signature is a particularly convenient way of doing so, since it allows
for uniformity across languages, with no need for the text editors to
parse all the different programming languages.
Regards,
Martin
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
|
|