Lancashire, Pete wrote:
> how do I convert a file (or output to stdout) with an unknown 16-bit encoding
> into plain ol' ASCII aka 8 BIT ?
>
> Example of files contents
>
> 0 255 254
> 2 60 0
> 4 72 0
> 6 84 0
> 8 77 0
> 10 76 0
> 12 62 0
>
> or ..
>
> 0000000 377 376 < \0 H \0 T \0 M \0 L \0 > \0 \n \0
> 0000020 \0 \0 < \0 B \0 O \0 D \0 Y \0 > \0
> 0000040 \n

This looks like either UCS-2 or UTF-16. Fortunately you don't have to figure out which of those it is, because any UCS-2 text is encoded identically in UTF-16, so you can simply treat it as UTF-16.

On the other hand, UCS-2 can represent all characters that ASCII can represent. If the text is in UTF-16 and contains anything that can't be treated as UCS-2, then it can't be converted to ASCII either, so when converting to ASCII you can just as well treat it as UCS-2.

The first two bytes (255 254, or 377 376 in octal) are a byte order mark that shows that the encoding is little-endian. It's good that the byte order mark is there, but it must be removed in order to convert to ASCII. (ASCII doesn't need byte order marks anyway.)

If it's guaranteed that the text will always be representable in ASCII (7-bit), then "iconv --from-code=UTF-16 --to-code=ASCII" should do the conversion. Iconv seems to strip away the byte order mark automatically from UTF-16 but not from UCS-2.

If any non-ASCII characters may occur, then you probably want to convert to UTF-8 instead. UTF-8 can represent all Unicode characters. If you know exactly which characters can occur, then you may be able to find a suitable 8-bit encoding (preferably one from the ISO 8859 family). Either way, make sure that the receiving program knows which encoding it is. Otherwise the text will probably get garbled.

Björn Persson
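The conversion described in the reply can be tried on a small sample. This is only a sketch, assuming the glibc iconv and a POSIX shell; the file names sample-utf16.txt, sample-ascii.txt and sample-utf8.txt are made up for illustration, and the sample reproduces just the first line of the dump quoted above (byte order mark, then "<HTML>" and a newline, two bytes per character).

```shell
# Build a little-endian UTF-16 sample: byte order mark (octal 377 376),
# then "<HTML>" and a newline, each character followed by a zero byte.
# (Hypothetical file names, chosen only for this example.)
printf '\377\376<\000H\000T\000M\000L\000>\000\n\000' > sample-utf16.txt

# Convert to ASCII. iconv exits with a non-zero status if it meets a
# character that the target encoding cannot represent.
iconv --from-code=UTF-16 --to-code=ASCII sample-utf16.txt > sample-ascii.txt

# Convert to UTF-8 instead, for text that may contain non-ASCII characters.
iconv --from-code=UTF-16 --to-code=UTF-8 sample-utf16.txt > sample-utf8.txt

# Inspect the result byte by byte to confirm that the byte order mark
# and the zero bytes are gone.
od -c sample-ascii.txt
```

For pure ASCII input the two output files should be byte-for-byte identical, since ASCII is a subset of UTF-8. To convert output on stdout rather than a file, iconv can also read from standard input when no file name is given.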
-- fedora-list mailing list fedora-list@xxxxxxxxxx To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list