On 08.01.2007 01:38, Willy Tarreau wrote:
> I'm not blaming UTF-8 per se, but people who still believe in encoding
> *whole documents*. Copy-paste, text insertion, git output, etc... everything
> has a good reason not to be in the same encoding as what your MUA believes.
> If major MUAs still have problems with UTF-8 10 years after it was introduced,
> it's clearly the proof of a flaw in the initial design.

Actually it's working amazingly well. In the past 15 years the frequency
of character encoding problems has gone from "frequent, that's just the
way it is, if you want your text to go through unharmed then stick to
7 bit ASCII" to "rare, it normally works, if it doesn't then we've got a
problem to solve". Copy/paste? Works for me! Mail forwarding? Works for me!

The only thing that doesn't and cannot work is automatic conversion of
data with unknown encoding or with incorrect encoding information, and
that has to be solved at the source. If you store differently encoded
text in git without labelling it accordingly, then there is just no way
to retrieve it consistently. There are two ways out of that: a complicated
one, storing encoding information with every piece of text, and a simple
one, declaring a single internal encoding to which everything must be
converted upon entry into the database. Guess which one has a better
chance of working well.

> And I'm not even
> discussing the stupidity which requires that you read a whole text to get
> its number of characters !

Personally I find the requirement to know the number of characters in a
text rather unusual, so I wouldn't base the choice of an encoding on
that. In fact, I cannot remember ever really wanting to know the actual
number of characters in a text. The number of bytes occupied on storage,
ok. The number of letters, of words, of lines, perhaps even the number
of printable characters, all potentially interesting, depending on the
application. But the raw number of characters? I don't know what purpose
that would serve.

HTH
Tilman

-- 
Tilman Schmidt                    E-Mail: [email protected]
Bonn, Germany
This message consists of 100% recycled bits.
Best before, if unopened: (see reverse side)
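
As an illustration of the "single internal encoding" approach described in the message, here is a minimal Python sketch. It is not taken from the thread: the helper name `to_internal`, the choice of UTF-8 as the internal encoding, and the sample string are all assumptions made for the example. It also shows that the byte count on storage and the number of characters are different quantities.

```python
INTERNAL_ENCODING = "utf-8"  # assumption: UTF-8 chosen as the single internal encoding


def to_internal(raw: bytes, declared_encoding: str) -> bytes:
    """Convert labelled input to the internal encoding on entry (hypothetical helper).

    The declared encoding must come from the source of the data; it cannot
    be recovered reliably from the bytes alone.
    """
    text = raw.decode(declared_encoding)  # may raise UnicodeDecodeError on a wrong label
    return text.encode(INTERNAL_ENCODING)


# Sample text "Gruesse" with u-umlaut and sharp s, arriving as ISO-8859-1 bytes.
latin1_input = "Gr\u00fc\u00dfe".encode("iso-8859-1")

stored = to_internal(latin1_input, "iso-8859-1")
text = stored.decode(INTERNAL_ENCODING)

print(len(latin1_input))  # bytes in the ISO-8859-1 input
print(len(stored))        # bytes occupied on storage in UTF-8
print(len(text))          # number of code points

# A wrong label can fail loudly, as here...
try:
    to_internal(latin1_input, "utf-8")
    wrong_label_detected = False
except UnicodeDecodeError:
    wrong_label_detected = True
```

Note that a wrong label is not always detectable: UTF-8 bytes decoded as ISO-8859-1 never raise an error and simply produce garbled text, which is why the labelling has to be correct at the source.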