Cyrillic letters problem
Aurelien Jacobs
aurel at gnuage.org
Wed Jan 18 23:52:08 UTC 2006
On Wed, 18 Jan 2006 17:14:08 -0600
Rob Landley <rob at landley.net> wrote:
> On Wednesday 18 January 2006 10:35, Mike Frysinger wrote:
> > On Wednesday 18 January 2006 11:01, Rob Landley wrote:
> > > On Wednesday 18 January 2006 05:03, ybrnj80 wrote:
> > > > It seems like support of cyrillic letters
> > > > ( or possibly all !is_alpha() ascii codes )
> > >
> > > We haven't got any internationalization support, we currently treat it
> > > all as 8 bit ascii.
> >
> > 7bit ascii ... the 8th bit should always be ignored right ?
>
> Nope, we pass it through unmodified. No reason to strip it out. (We don't
> pay any special _attention_ to it, but if you tell us to echo or sed high
> ascii, we'll do it. Nothing special about that. -funsigned-char is your
> friend. :)
>
> > > Your C library may or may not have some
> > > internationalization support for us to inherit, and we have plans to
> > > treat everything as UTF-8 when we get around to it.
> >
> > as long as nls can be toggled off i'm happy ;)
>
> What exactly supporting UTF-8 requires above and beyond being 8-bit clean is
> something I'm still a little unclear on, hence the TODO item when I have time
> and inclination to learn about it (or somebody else gets inspired).
Just an example of what needs to be done:
If you feed some UTF-8 strings to the sort command, it can't simply compare
bytes to do its job. It has to decode the UTF-8 into Unicode code points.
It can then compare the code points to do its sort.
There are probably plenty of other things to modify for UTF-8.
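
To make the decode-then-compare step concrete, here is a minimal sketch in C. It is not BusyBox code: utf8_decode() and utf8_cmp() are hypothetical helper names made up for this example, and the sketch assumes NUL-terminated input and maps any malformed sequence to U+FFFD.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of BusyBox): decode one UTF-8 sequence
 * starting at s into *cp. Returns the number of bytes consumed, or 0 at
 * the terminating NUL. Malformed input yields U+FFFD and consumes 1 byte. */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    size_t len, i;
    uint32_t c, min;

    if (s[0] == 0) { *cp = 0; return 0; }          /* end of string */
    if (s[0] < 0x80) { *cp = s[0]; return 1; }     /* plain ASCII */

    if ((s[0] & 0xE0) == 0xC0)      { len = 2; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; min = 0x10000; }
    else { *cp = 0xFFFD; return 1; }               /* invalid lead byte */

    for (i = 1; i < len; i++) {
        /* A NUL fails this check too, so we never read past the end. */
        if ((s[i] & 0xC0) != 0x80) { *cp = 0xFFFD; return 1; }
        c = (c << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates, and values above U+10FFFF. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
        *cp = 0xFFFD;
        return 1;
    }
    *cp = c;
    return len;
}

/* Hypothetical helper: compare two UTF-8 strings code point by code point.
 * Returns <0, 0, or >0 like strcmp(). */
static int utf8_cmp(const char *a, const char *b)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;

    for (;;) {
        uint32_t ca, cb;
        size_t na = utf8_decode(pa, &ca);
        size_t nb = utf8_decode(pb, &cb);

        if (ca != cb)
            return ca < cb ? -1 : 1;
        if (na == 0)
            return 0;                              /* both strings ended */
        pa += na;
        pb += nb;
    }
}
```

A comparator like this could be handed to qsort() through a small wrapper. One caveat: for well-formed UTF-8, a plain byte comparison happens to give the same order as a code point comparison, so the decoding step pays off mostly once the comparison has to do more than that, such as case folding or locale-aware collation.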
Aurel