Re: [PATCH] console UTF-8 fixes

Egmont Koblinger wrote:

On Sat, Apr 07, 2007 at 01:00:48PM +0200, Jan Engelhardt wrote:

Hi,

Please, no dot, and no inverse color.
Imagine someone had the following bitmap for <unknown glyph/illegal sequence>:


No dot, I'm already convinced. To clarify the inverse thingy:

This is what the current kernel does:
  1) tries to display the desired symbol
  2) if it fails, tries to display U+FFFD (which usually looks similar to an
     inverted question mark)
  3) if this fails again then displays a normal '?'
     (or a different symbol due to a bug discussed below)

Here's my proposal. This only alters the 3rd step, not the first two:
  1) tries to display the desired symbol
  2) if it fails, tries to display U+FFFD, still with _normal_ attributes
  3) if this fails then display an ascii '?' with inverted attributes

So you won't get "double" inversion. If you do have U+FFFD in your font then
this will introduce no change. If you don't have U+FFFD, you'll see inverse
question marks instead of normal ones.
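
For illustration, the proposed three-step fallback could be sketched roughly
as follows. This is not the code from the patch; struct font, struct cell,
font_has_glyph() and resolve_glyph() are hypothetical names made up for this
example, and the "font" is simply a list of code points that have a glyph.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFDu

/* Hypothetical font: the code points for which a glyph exists. */
struct font {
    const uint32_t *codepoints;
    size_t count;
};

/* What ends up in one screen cell. */
struct cell {
    uint32_t codepoint; /* glyph actually drawn */
    bool inverse;       /* true if the attributes are inverted */
};

static bool font_has_glyph(const struct font *font, uint32_t cp)
{
    for (size_t i = 0; i < font->count; i++)
        if (font->codepoints[i] == cp)
            return true;
    return false;
}

static struct cell resolve_glyph(const struct font *font, uint32_t cp)
{
    assert(font != NULL);

    struct cell out = { cp, false };

    /* 1) the desired symbol, normal attributes */
    if (font_has_glyph(font, cp))
        return out;

    /* 2) U+FFFD, still with normal attributes */
    if (font_has_glyph(font, REPLACEMENT_CHAR)) {
        out.codepoint = REPLACEMENT_CHAR;
        return out;
    }

    /* 3) ASCII '?' with inverted attributes */
    out.codepoint = '?';
    out.inverse = true;
    return out;
}
```

Only the last branch sets the inverse flag, so a font that already contains
U+FFFD never gets an inverted cell.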


This seems fine.

I blame your latin2 unicode map. (See above about 'Û'.)


There's nothing wrong with my latin2 unicode map, and I've located and
changed the part _in the kernel_ that displays a false glyph using the
algorithm I've outlined. It just uses "the glyph at that code position
within the glyph table" as a fallback, which might be okay in 8-bit mode
(and I haven't modified the behavior in that case), but I got rid of this
behavior in UTF-8 mode since it's definitely a fault in the world of
Unicode.

It should perhaps display a regular 'u' if it cannot display 'û',


I rather think it should display U+FFFD but YMMV.

That's a policy decision for the maker of the Unicode map. The kernel cannot by default know that a pre-composed ű is a modified u; obviously, if the ű is sent in decomposed form the kernel probably will display it as u? or some such.
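
To make the pre-composed vs. decomposed difference concrete, here is a small
sketch. It assumes a font that only covers printable ASCII and a console that
draws one cell per code point with a plain '?' for anything missing (the
U+FFFD step is left out for brevity). ascii_font_has() and render_cells() are
invented names for this example, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the font only has glyphs for printable ASCII. */
static int ascii_font_has(uint32_t cp)
{
    return cp >= 0x20 && cp <= 0x7E;
}

/*
 * Draw one cell per code point into out, using '?' for code points
 * the font lacks. The result is NUL-terminated; returns the number
 * of cells written.
 */
static size_t render_cells(const uint32_t *cps, size_t n,
                           char *out, size_t out_size)
{
    size_t cells = 0;

    assert(out != NULL);
    if (out_size == 0)
        return 0;

    for (size_t i = 0; i < n && cells + 1 < out_size; i++)
        out[cells++] = ascii_font_has(cps[i]) ? (char)cps[i] : '?';

    out[cells] = '\0';
    return cells;
}
```

With this, the pre-composed U+0171 takes one cell and comes out as "?", while
the decomposed sequence U+0075 U+030B takes two cells and comes out as "u?".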

but definitely not 'ü' (which is not called a double accent, btw).


This is not the character I've been talking about, I actually _did_ talk
about u with double acute accent (ű - you might not have seen this character
so far, AFAIK it's only used in Hungarian, no other languages). But we agree
that the kernel definitely shouldn't display a character with a different
accent on it. This is one of the bugs my patch addresses.

As far as width handling -- in order to make all the text line up under all circumstances you need more than width handling. The wcwidth() stuff is specific to CJK -- a character set which is totally implausible to display on the builtin console. You also need bidir support (in case you encounter Hebrew or Arabic), you need Indic shape handling (Indic languages have some *very* odd composing rules), etc, and this is just to know how much space to take up on the screen.

This is ridiculous. It's much better to draw a line in the sand and say that this is beyond the scope of the in-kernel Linux console.


	-hpa

