Re: [Patch] Support UTF-8 scripts

H. Peter Anvin wrote:
> You don't have markers (although they're defined, see ISO 2022) for your
> 8-bit encodings, and *THEY'RE THE ONES THAT NEED TO BE DISTINGUISHED.*
> Flagging UTF-8, especially with the BOM (as opposed to the ISO 2022
> signature, <ESC>%G) is pointless in the context, since you still can't
> distinguish your arbitrary number of legacy encodings.

In programming languages that support the notion of source encodings,
you do have markers for 8-bit encodings. For example, in Python, you
can specify

# -*- coding: iso-8859-1 -*-

to denote the source encoding. In Perl, you write

use encoding "latin-1";

(with 'use utf8;' being a special-case shortcut).

In Java, you can specify the encoding through the -encoding argument
to javac. In gcc, you use -finput-charset (with the special case of
-fexec-charset and -fwide-exec-charset potentially being different).

So you *must* use encoding declarations in some languages; the UTF-8
signature is a particularly convenient way of doing so, since it allows
for uniformity across languages, with no need for the text editors to
parse all the different programming languages.

Regards,
Martin
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Follow-Ups:
- Re: [Patch] Support UTF-8 scripts
  - From: Bernd Petrovitsch <bernd@firmix.at>
- Re: [Patch] Support UTF-8 scripts
  - From: "H. Peter Anvin" <hpa@zytor.com>

Prev by Date: Re: [RFC] utterly bogus userland API in infinibad
Next by Date: printk timings stay weird, also waaay after 5 seconds
Previous by thread: Re: [Patch] Support UTF-8 scripts
Next by thread: Re: [Patch] Support UTF-8 scripts
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind]