RE: Login scripts?

Tomas Larsson:
>>> It seems that I'm using UTF8 and so on, however how to set the
>>> character set to 8859-1,

Tim:
>> Are you sure that you need to?  Anything that you can display in
>> iso-8859-1 is also a part of UTF-8 (and in the same location).

Tomas Larsson:
> Yes, but the encoding is quite different, even in the lower 256 bytes, where
> if I'm correct (after some extensive searches) it seems that, if something
> is using ISO8859, it is not possible to change it directly to UTF8, and vice
> versa, without further processing.

Well, my reading of that issue over the years is as follows:

Beginning with ASCII (the real defined one, not others' redefinitions,
like those who like to refer to something that doesn't really exist,
calling it extended ASCII), ISO-8859-1 extends it (it starts the same,
and adds onto the end of it).  Then, UTF-8 does the same (it starts the
*same* as ISO-8859-1 and adds onto the end of it).  The characters are
in the same positions and, until you exceed 255, use the same
codes (character number 255 in ISO-8859-1 is the same as character
number 255 in UTF-8, and the same number is used to represent it).  It's
only when you refer to higher numbers, such as 256, that you need to use
more bits.

If correct, then for data that is ASCII or ISO-8859-1 it's directly
equivalent to UTF-8 (for the same characters).  I know it's certainly
true for ASCII and UTF-8 interchangeability, and can find documentation
detailing it.  I'm 99% sure for the ISO-8859-1 side of things, but only
have hearsay evidence about it to hand at the moment.  The nearest I can
come to documentation about that is that the first 256 code points in
"Unicode" are the same as ISO-8859-1, and extrapolating what I know
about UTF-8 encoding of Unicode supports what I've said (it's a single
byte up until 255, it only starts using more than one byte to represent
characters above 255).
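
One way to check this, rather than rely on hearsay, is to encode the same
characters both ways and compare the bytes.  The following is a minimal
Python sketch (Python and the helper name encoded_bytes are illustrative
choices, not anything from this thread).  As far as I can tell, it shows
that the code points agree up to 255, but the stored bytes only agree up
to 127: from 128 onwards UTF-8 uses two bytes where ISO-8859-1 uses one.

```python
# Compare how the same characters are stored in ISO-8859-1 and in UTF-8.
# The sample characters are arbitrary picks around the 127/128 and 255/256
# boundaries.
samples = ["A", "\u007f", "\u0080", "\u00e9", "\u00ff", "\u0100"]


def encoded_bytes(ch, encoding):
    """Return ch encoded in the given encoding, or None if it cannot be encoded."""
    try:
        return ch.encode(encoding)
    except UnicodeEncodeError:
        return None


for ch in samples:
    latin1 = encoded_bytes(ch, "iso-8859-1")
    utf8 = encoded_bytes(ch, "utf-8")
    # Print the code point, both byte sequences, and whether they match.
    print(f"U+{ord(ch):04X}  latin1={latin1!r}  utf8={utf8!r}  same={latin1 == utf8}")
```

If that holds, data containing only ASCII can be passed between the two
unchanged, while anything from 128 to 255 would need converting.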

This page <http://en.wikipedia.org/wiki/Unicode> is probably one of the
less painful to read about it.  For more authoritative information, I'd
refer you to <http://www.unicode.org/>, but last time I looked through
it, it was a bit of a headache to go through (not to mention that I
can't get the site to load at the moment, to find information in it to
refer to in this e-mail).  :-\

> The problem I had was that a piece of SW used 8859, and the system was using
> UTF8, hence a problem in encoding when data was stored and retrieved from an
> SQL DB.
> After changing in i18n to 8859, everything seems to work ok.

That may indicate some other problem, but not directly what we've been
discussing.  I could expect problems going the other way around, trying
to take UTF-8 data and use it with something that only knew ISO-8859-1
(e.g. it mistaking a two-byte character sequence for being two separate
characters), but not what you've just mentioned.  It should be directly
compatible.
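
Both directions can be tried out in a few lines.  This is a rough Python
sketch with a made-up sample string, not a reproduction of the software or
database in question.  It illustrates the case described above, a two-byte
UTF-8 sequence being read as two ISO-8859-1 characters, and also tries the
reverse, where a strict UTF-8 decoder appears to reject the lone ISO-8859-1
byte outright.

```python
# Hypothetical sample text containing one character above 127.
text = "caf\u00e9"  # "cafe" with an acute accent on the final e

# Direction 1: UTF-8 bytes read by something that only knows ISO-8859-1.
utf8_bytes = text.encode("utf-8")
misread = utf8_bytes.decode("iso-8859-1")
print(repr(misread), len(text), len(misread))

# Direction 2: ISO-8859-1 bytes read by something expecting UTF-8.
latin1_bytes = text.encode("iso-8859-1")
try:
    latin1_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError as err:
    # A strict decoder refuses the byte instead of guessing.
    strict_ok = False
    print("UTF-8 decode failed:", err)
```

Real software may substitute a replacement character or store the bytes
untouched instead of raising an error, so the visible symptom can differ.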

-- 
(Currently running FC4, occasionally trying FC5.)

Don't send private replies to my address, the mailbox is ignored.
I read messages from the public lists.

