Re: [PATCH] console UTF-8 fixes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Egmont Koblinger wrote:
On Sat, Apr 07, 2007 at 01:00:48PM +0200, Jan Engelhardt wrote:

Hi,

Please, no dot, and no inverse color.
Imagine someone had the following bitmap for <unknown glyph/illegal sequence>:

No dot, I'm already convinced. To clarify the inverse thingy:

This is what the current kernel does:
  1) tries to display the desired symbol
  2) if it fails, tries to display U+FFFD (which usually looks similar to an
     inverted question mark)
  3) if this fails again then displays a normal '?'
     (or a different symbol due to a bug discussed below)

Here's my proposal. This only alters the 3rd step, not the first two:
  1) tries to display the desired symbol
  2) if it fails, tries to display U+FFFD, still with _normal_ attributes
  3) if this fails then display an ascii '?' with inverted attributes

So you won't get "double" inversion. If you do have U+FFFD in your font then
this will introduce no chance. If you don't have U+FFFD, you'll see inverse
question marks instead of normal ones.


This seems fine.


I blame your latin2 unicode map. (See above about 'Û'.)

There's nothing wrong with my latin2 unicode map, and I've located and
changed the part _in the kernel_ that displays a false glyph using the
algorithm I've outlined. It just uses "the glyph at that code position
within the glyph table" as a fallback, which might be okay in 8-bit mode
(and I haven't modified the behavior in that case), but I got rid of this
behavior in UTF-8 mode since it's definitely a fault in the world of
Unicode.

It should perhaps display a regular 'u' if it cannot display 'û',

I rather think it should display U+FFFD but YMMV.

That's a policy decision for the maker of the Unicode map. The kernel cannot by default know that a pre-composed ű is a modified u; obviously, if the ű is send in decomposed form the kernel probably will display it as u? or some such.

but definitely not 'ü' (which is not called a double accent, btw).

This is not the character I've been talking about, I actually _did_ talk
about u with double acute accent (ű - you might not have seen this character
so far, AFAIK it's only used in Hungarian, no other languages). But we agree
that the kernel definitely shouldn't display a character with a different
accent on it. This is one of the bugs my patch addresses.

As far as width handling -- in order to make all the text line up under all circumstances you need more than width handling. The wcwidth() stuff is specific to CJK -- a character set which is totally implausible to display on the builtin console. You also need bidir support (in case you encounter Hebrew or Arabic), you need Indic shape handling (Indic langauges have some *very* odd composing rules), etc, and this is just to know how much space to take up on the screen.

is is ridiculous. It's much better to draw a line in the sand and say that this is beyond the scope of the in-kernel Linux console.

	-hpa

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux