Cyrillic letters problem

Aurelien Jacobs aurel at gnuage.org
Wed Jan 18 23:52:08 UTC 2006


On Wed, 18 Jan 2006 17:14:08 -0600
Rob Landley <rob at landley.net> wrote:

> On Wednesday 18 January 2006 10:35, Mike Frysinger wrote:
> > On Wednesday 18 January 2006 11:01, Rob Landley wrote:
> > > On Wednesday 18 January 2006 05:03, ybrnj80 wrote:
> > > > It seems like support of cyrillic letters
> > > > ( or possibly all !is_alpha() ascii codes )
> > >
> > > We haven't got any internationalization support, we currently treat it
> > > all as 8 bit ascii.
> >
> > 7bit ascii ... the 8th bit should always be ignored right ?
> 
> Nope, we pass it through unmodified.  No reason to strip it out.  (We don't 
> pay any special _attention_ to it, but if you tell us to echo or sed high 
> ascii, we'll do it.  Nothing special about that.  -funsigned-char is your 
> friend. :)
> 
> > > Your C library may or may not have some
> > > internationalization support for us to inherit, and we have plans to
> > > treat everything as UTF-8 when we get around to it.
> >
> > as long as nls can be toggle off i'm happy ;)
> 
> What exactly supporting UTF-8 requires above and beyond being 8-bit clean is 
> something I'm still a little unclear on, hence the TODO item when I have time 
> and inclination to learn about it (or somebody else gets inspired).

Just an example of what needs to be done:
If you feed some UTF-8 strings to the sort command, it can't always simply
compare bytes to do its job. For anything character-aware (case folding with
-f, for instance), it has to decode the UTF-8 into Unicode code points. It
can then work on the code points to do its sort.
There are probably plenty of other things to modify for UTF-8.
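
To make that a bit more concrete, here is a rough sketch of the decode-then-compare
idea. It is not taken from busybox: utf8_decode(), toy_fold() and utf8_cmp_fold()
are names made up for this mail, and toy_fold() only knows about ASCII and the
basic Cyrillic block, which is an assumption chosen to keep the example short. A
real implementation would presumably lean on the C library or proper Unicode
tables instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence starting at s into *cp.
 * Returns the number of bytes consumed (1..4), or 0 on malformed input. */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    size_t len, i;
    uint32_t c, min;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; min = 0x10000; }
    else return 0;

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                 /* also catches the NUL terminator */
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates and out-of-range values. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *cp = c;
    return len;
}

/* Hypothetical, deliberately tiny case folding: ASCII a-z and the
 * Cyrillic lowercase ranges U+0430..U+045F are mapped to uppercase. */
static uint32_t toy_fold(uint32_t cp)
{
    if (cp >= 'a' && cp <= 'z') return cp - 0x20;
    if (cp >= 0x0430 && cp <= 0x044F) return cp - 0x20;
    if (cp >= 0x0450 && cp <= 0x045F) return cp - 0x50;
    return cp;
}

/* Compare two NUL-terminated UTF-8 strings by folded code point.
 * Returns -1, 0 or 1. */
static int utf8_cmp_fold(const char *a, const char *b)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;

    assert(a && b);
    for (;;) {
        uint32_t ca, cb;
        size_t na = utf8_decode(pa, &ca);
        size_t nb = utf8_decode(pb, &cb);

        /* Malformed input: fall back to the raw byte value. */
        if (na == 0) { ca = *pa; na = 1; }
        if (nb == 0) { cb = *pb; nb = 1; }

        ca = toy_fold(ca);
        cb = toy_fold(cb);
        if (ca != cb) return ca < cb ? -1 : 1;
        if (ca == 0) return 0;        /* both strings ended together */
        pa += na;
        pb += nb;
    }
}
```

With this, utf8_cmp_fold("\xd1\x8f", "\xd0\xaf") (a lowercase and an uppercase
Cyrillic "ya") compares equal, which a byte-wise compare cannot do. Note that
for well-formed UTF-8 a plain unsigned byte compare already gives code point
order, so the decoding step really pays off once folding or collation is wanted.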

Aurel


