Alan Cox wrote:
> For example, there is an umlaut, and that could be transliterated into
> `u', for example. Others may have strange-looking Unicode and I have
That depends on the language. Unicode is just a set of character encoding
rules; you need more context to do transliterations. Not that you should
need to, as you can just install the relevant fonts. DejaVu, for example,
has pretty good European coverage and is one of the standard installed
fonts in current Fedora.
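To illustrate why transliteration depends on the language, here is a minimal Python sketch. The helper names strip_accents and transliterate_german and the GERMAN_MAP table are made up for this example. Decomposing the text and dropping the accent marks turns ü into u, while the usual German convention writes it as ue:

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Naive, language-agnostic transliteration: decompose each character
    (NFKD) and drop the combining marks, e.g. 'ü' -> 'u' + U+0308 -> 'u'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Hypothetical per-language override table: German writes ü as "ue", etc.
GERMAN_MAP = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss",
})

def transliterate_german(text: str) -> str:
    # Apply the language-specific rules first, then fall back to stripping.
    return strip_accents(text.translate(GERMAN_MAP))

print(strip_accents("Müller"))         # Muller
print(transliterate_german("Müller"))  # Mueller
```

A Swedish or Turkish table would look different again, which is the point: the code point alone does not tell you the right transliteration.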
You also have to watch the encodings. It's not uncommon to find mis-coded
information in OGG and similar files, where the track data is encoded in
one of the legacy ISO-8859 code pages rather than UTF-8. That produces
invalid UTF-8 sequences, which will be displayed as the symbol for an
invalid character.
> no idea what it is supposed to be, so I cannot transliterate without knowing
> what it is in the first place. So how do I find out? The umlaut is
> sometimes obvious, but others are not. So is there a way to show this? As I said,
Load the right fonts and they will be rendered correctly.
> I get binary icons, so how do I get the Unicode decimal representation so
> that I can match against the Unicode character table to see what it is?
The box of 'four squares' shown for an unknown symbol should contain one
hex digit per square; together they give you the symbol's code point,
which you can look up on the Unicode web site.
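If you can copy the character itself (for example into a Python prompt), you can also get its code point in hex and decimal, plus its official name, directly. describe and from_hex below are hypothetical helper names built on the standard unicodedata module; from_hex covers the case where you only have the hex digits read off the box:

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the code point in hex (U+XXXX, as in the Unicode charts),
    in decimal, and the character's official Unicode name."""
    name = unicodedata.name(ch, "<unnamed>")
    return f"U+{ord(ch):04X}  dec {ord(ch):>6}  {name}"

def from_hex(code: str) -> str:
    """Turn hex digits (e.g. read off the 'four squares' box) into the character."""
    return chr(int(code, 16))

for ch in "ü€":
    print(describe(ch))
# U+00FC  dec    252  LATIN SMALL LETTER U WITH DIAERESIS
# U+20AC  dec   8364  EURO SIGN

print(describe(from_hex("00FC")))
```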
> Would it be: print \\%d, $1 ?
It's UTF-8, so it's a variable-length encoding of the full Unicode symbol
space. See www.unicode.org if you want the full details, but basically
each symbol is encoded as a series of bytes such that the C string
terminator \0 is never found mid-character, and the ASCII range of symbols
used for American English maps 1:1 onto UTF-8.
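As a sketch of how that variable-length scheme works, here is a minimal single-code-point UTF-8 encoder in Python. utf8_encode is a hypothetical name, and it does not reject surrogate code points (U+D800 to U+DFFF), which a real encoder must. ASCII code points stay a single unchanged byte, and every byte of a multi-byte sequence has its high bit set, so neither \0 nor any other ASCII byte can appear mid-character:

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8 (sketch: surrogates are not rejected)."""
    if cp < 0x80:        # 1 byte:  0xxxxxxx (identical to ASCII)
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:   # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("code point out of Unicode range")

for ch in ["A", "ü", "€", chr(0x1F600)]:
    enc = utf8_encode(ord(ch))
    assert enc == ch.encode("utf-8")         # agrees with Python's own codec
    if len(enc) > 1:
        assert all(b & 0x80 for b in enc)    # no \0 or ASCII byte mid-character
    print(f"U+{ord(ch):04X} -> {enc.hex(' ')}")
# U+0041 -> 41
# U+00FC -> c3 bc
# U+20AC -> e2 82 ac
# U+1F600 -> f0 9f 98 80
```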
Alan
Thanks for the tip! I will review the link you gave me (already started!)
Dan
--
fedora-list mailing list
fedora-list@xxxxxxxxxx