Re: [Patch] Support UTF-8 scripts

Bernd Petrovitsch wrote:
>>>It depends on the definition of "character". There are other standards
>>>which define "character" as "byte".
>>
>>Certainly. However, you specifically talked about 'wc -c', and, in
>>wc(1), at least in the implementation commonly used on Linux, characters
>>and bytes are not the same.
> 
> 
> Yes, now that multi-byte character sets are becoming more commonly used.
> However, I don't think you get this into the C standard. But we are now
> far off the discussion ....

We are indeed, so just one final clarification. wc(1) is not part
of the C standard - ISO 9899 does not talk about command line utilities
at all. The relevant standard is POSIX; IEEE Std 1003.1, 2004 Edition
says, in

http://www.opengroup.org/onlinepubs/009695399/utilities/wc.html

-c
    Write to the standard output the number of bytes in each input file.
[...]
-m
    Write to the standard output the number of characters in each input
    file.

[...]
RATIONALE
[...]
The -c option stands for "character" count, even though it counts bytes.
This stems from the sometimes erroneous historical view that bytes and
characters are the same size. Due to international requirements, the -m
option (reminiscent of "multi-byte") was added to obtain actual
character counts.
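
To make the -c / -m distinction concrete, here is a minimal sketch in
Python. It is not part of the standard and not how wc(1) is implemented.
The helper names count_bytes and count_chars are hypothetical, the input
is assumed to be valid UTF-8, and "character" here means a Unicode code
point, which is only an approximation of what wc -m reports under a
UTF-8 locale.

```python
# Sketch only: hypothetical helpers mirroring the "wc -c" vs "wc -m" idea.
# Assumption: the input is valid UTF-8 and a "character" is one code point.

def count_bytes(data: bytes) -> int:
    """Roughly what 'wc -c' reports: the number of bytes."""
    return len(data)


def count_chars(data: bytes, encoding: str = "utf-8") -> int:
    """Roughly what 'wc -m' reports: the number of decoded characters."""
    return len(data.decode(encoding))


# Two of the characters below (U+00EF and U+00E9) take two bytes each in UTF-8.
sample = "na\u00efve caf\u00e9\n".encode("utf-8")

print("bytes:", count_bytes(sample))
print("chars:", count_chars(sample))
```

For pure ASCII input the two counts are equal, which is why the
historical confusion between bytes and characters went unnoticed for so
long.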

Regards,
Martin