Nadeem Bitar wrote:
> I'm interested to know why en_US works but en_US.UTF-8 doesn't.

(The context was id3 tags in MP3s etc.)

Computers store letters as binary numbers. The standard way of encoding Latin letters is ASCII. In anything ASCII-based, for example, A is (decimal) 65. ASCII covers the symbols on a standard US keyboard, and uses numbers up to 127.

Historically, Western computers have stored each character in one byte. That gives you up to 256 characters.

Many people want to use other symbols. For example, I might want to use the £ and € signs for currency. Greeks and Russians will want to use their own letters (Ω or Д, say). People speaking French or Spanish will want to use áçcéñts. And you want to properly tag your MP3s.

In fact, there are *way* more symbols than can be encoded in one byte. So a number of "character sets" were invented: some for Greek letters, some for Russian, some for Western European, etc. Usually the first half was ASCII, and the rest was character-set specific.

And the problem is that it isn't always clearly specified which character set you're using. I suspect that's what's happening here: the encoder and the player are using different character sets.

UTF-8 is a way of encoding practically any character, using more than one byte where necessary. If and when it becomes universal, character set problems should go away. But for now it is also just one more character set, so if an encoding program writes symbols in UTF-8 while the readers expect them to be in ISO 8859-1 ("Western Europe"), you'll have trouble.

Now the LANG variable, among other things, sets which character set is in use. en_US uses ISO 8859-1, while en_US.UTF-8 uses UTF-8 (not surprisingly). So using en_US gets your MP3s tagged in the ISO 8859-1 encoding that the MP3 players expect (presumably the encoder takes its character set from LANG, but the players don't...)

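
To make that mismatch concrete, here is a minimal Python sketch. It is only an illustration of the general effect: it does not model any particular tagger or player, and the title string and the read_as helper are made up for the example.

```python
# Hypothetical tag text; any string with a non-ASCII character would do.
title = "Caf\u00e9"  # "Café"

# ASCII letters have the same number in every ASCII-based character set.
assert ord("A") == 65

# The same text becomes different bytes under each character set.
latin1_bytes = title.encode("iso-8859-1")  # é is the single byte 0xE9
utf8_bytes = title.encode("utf-8")         # é is the two bytes 0xC3 0xA9


def read_as(data: bytes, charset: str) -> str:
    """Decode bytes the way a reader that assumes `charset` would."""
    return data.decode(charset, errors="replace")


# Written as UTF-8, read as ISO 8859-1: each byte turns into its own letter.
garbled = read_as(utf8_bytes, "iso-8859-1")

# Written as ISO 8859-1, read as UTF-8: 0xE9 alone is not valid UTF-8,
# so it is replaced by U+FFFD (the "replacement character").
broken = read_as(latin1_bytes, "utf-8")

# ascii() keeps the output readable whatever the terminal's character set is.
print(latin1_bytes, utf8_bytes)  # b'Caf\xe9' b'Caf\xc3\xa9'
print(ascii(garbled))            # 'Caf\xc3\xa9', displayed as "CafÃ©"
print(ascii(broken))             # 'Caf\ufffd'
```

The plain letters survive in both directions because both character sets agree on the ASCII half; only the accented character is damaged.
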
I have not been able to find out whether there is a character set specification in id3 tags that one program or another is ignoring, or whether the standard is simply deficient. With e-mails, for example, there are MIME-Version and Content-Type headers, which specify that this e-mail is using UTF-8 (because that's the only character set that covers everything I've used).

James.

Yes, I know, I've massively simplified in places.

-- 
E-mail address: james  | DON'T be put off by "horror stories" spread by
@westexe.demon.co.uk   | others.  People who talk about death and serious
                       | injury are very rarely the ones who have actually
                       | suffered such things.   -- Adrian Plass