Re: [Patch] Support UTF-8 scripts

Bernd Petrovitsch wrote:
>>>It depends on the definition of "character". There are other standards
>>>which define "character" as "byte".
>>
>>Certainly. However, you specifically talked about 'wc -c', and, in
>>wc(1), at least in the implementation commonly used on Linux, characters
>>and bytes are not the same.
> 
> 
> Yes, now that multi-byte character sets are getting more commonly used.
> However, I don't think you get this into the C standard. But we are now
> far off the discussion ....

We are indeed, so just one final clarification. wc(1) is not part
of the C standard; ISO 9899 does not talk about command-line utilities
at all. The relevant standard is POSIX; IEEE Std 1003.1, 2004 Edition
says, in

http://www.opengroup.org/onlinepubs/009695399/utilities/wc.html

-c
    Write to the standard output the number of bytes in each input file.
[...]
-m
    Write to the standard output the number of characters in each input
    file.

[...]
RATIONALE
[...]
The -c option stands for "character" count, even though it counts bytes.
This stems from the sometimes erroneous historical view that bytes and
characters are the same size. Due to international requirements, the -m
option (reminiscent of "multi-byte") was added to obtain actual
character counts.
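
To make the difference concrete, here is a minimal Python sketch (my own
illustration, not part of the standard). The function name
count_bytes_and_chars is hypothetical, and it assumes the input is UTF-8
and that one Unicode code point counts as one "character", which is only
an approximation of what wc -m reports in a UTF-8 locale.

```python
def count_bytes_and_chars(text, encoding="utf-8"):
    """Return (byte count, character count) for text in the given encoding.

    The byte count corresponds to what 'wc -c' is specified to report;
    the character count approximates 'wc -m', assuming one code point
    per character (an assumption of this sketch).
    """
    data = text.encode(encoding)            # the bytes as stored in a file
    return len(data), len(data.decode(encoding))


# "ö" occupies two bytes in UTF-8, so the two counts differ by one.
n_bytes, n_chars = count_bytes_and_chars("Löwis\n")
print(n_bytes, n_chars)  # 7 6
```

For pure ASCII input the two numbers coincide, which is presumably where
the historical view mentioned in the rationale comes from.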

Regards,
Martin