How do I (unconditionally) enable unicode support in busybox?

James Bowlin bitjam at gmail.com
Thu Aug 7 06:34:39 UTC 2014


Thank you for the detailed replies.  My responses are below but I
don't want this to degenerate into a useless "does too!" "does
not!" argument.

Therefore I will follow this up tomorrow with another email that
gives simple instructions and a little script for creating a busybox
chroot environment where "export LANG=..." fails to work as
expected on my machine.  I hope this will help us quickly figure
out either where I'm going wrong or what the problem in busybox
is.  I hope that whatever changes are needed to make it work in
the chroot will also make it work when running as /init in an
initrd.


On Wed, Aug 06, 2014 at 05:19 PM, Harald Becker said:
> Ok I see the problem ... and I know what's the reason:
> 
> setlocale is called at the start of the shell. Changing the value
> afterwards does not affect locale settings in the shell itself. This
> would need a setlocale after assigning a new value to the variable
> LANG. So if you need this handling, process 1 has to be restarted.
> An example of how to do it is given below:
> 
> >  
> > > first init script:
> > >
> > > export LANG=...
> > > exec /bin/sh second_init_script MAY EVEN SET COMMAND LINE
> > > ARGUMENTS
> > >
> > > second init script runs as process 1 with environment set as you
> > > like.  

As I said before, I already tried something like this with:

    [ "$LANG" = utf ] || LANG=utf exec "$0" "$@"

and it did not fix the problem.  Did I miss the boat by exec'ing the
script itself instead of exec'ing /bin/sh explicitly?  I don't think
so, because this does not work either:

    if [ "$LANG" != utf ]; then
        export LANG=utf
        exec /bin/sh "$0" "$@"
    fi
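For comparison, a more defensive version of the same re-exec idea might
look like the sketch below.  It is a minimal sketch, not what I
actually ran: it uses a separate guard variable (the name LANG_REEXEC
is made up) so the exec loop does not depend on the exact LANG value,
and en_US.UTF-8 is only an example locale.  Printing $$ afterwards
confirms that exec kept the same PID while the new shell started with
LANG already in its environment.

```sh
#!/bin/sh
# Re-exec this script once with LANG already in the environment, so the
# new shell process sees it at startup instead of having it changed later.
# LANG_REEXEC is a made-up guard variable that prevents an endless exec loop.
if [ -z "${LANG_REEXEC-}" ]; then
    LANG_REEXEC=1
    LANG=en_US.UTF-8          # example value only
    export LANG_REEXEC LANG
    exec /bin/sh "$0" "$@"
fi

# From here on we are in the re-executed shell; exec keeps the same PID.
echo "pid=$$ LANG=$LANG args=$*"
```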

The following doesn't work either:

    export LANG=utf
    init2

where init2 is a different busybox script that contains the sed call.

> initrd? Do you have one of those systems which have an initial
> system in kernel and then load another initrd/system image.
> Here the problem is YOUR init script is not the first script
> run. There is already a script or program controlling your
> startup.

It is my initrd that runs.  My /init is the first script that
runs.  There is nothing between the bootloader and my /init script
(see below).

> Argh ... this kind of exporting makes trouble with ash (and many
> other Unix shells, not all behave correct as expected). Always set it
> at the beginning of your script:
> 
> export LANG=en_US.utf8
> echo -n "$x" | sed 's/./x/g' | wc -c

I did set it at the beginning of both scripts with an export and it
did not work.  I tried exactly the code above and it does not work.
I keep saying:

    export LANG=...

DOES NOT WORK for changing the unicode behavior.  It seems to be set
once and for all by the value passed in by the user via the bootloader.
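To tell apart the two possible failure points (LANG not reaching the
child process at all, versus the applet ignoring it), something like
the following minimal sketch could be run in each environment.  It
assumes that the sh and sed found first in PATH are the busybox
applets under test; the locale names are only examples, and it does
not assume any particular result.

```sh
#!/bin/sh
# For each candidate LANG, show what a child process actually receives
# and how many bytes sed emits after replacing every character with "x".
x=$(printf '\303\261')   # U+00F1 (n with tilde): 1 character, 2 bytes in UTF-8

for l in C en_US.UTF-8; do
    # Did the value make it into the child's environment?
    seen=$(LANG=$l sh -c 'printf %s "$LANG"')
    # Does sed treat the 2-byte sequence as one character or as two?
    n=$(printf %s "$x" | LANG=$l sed 's/./x/g' | wc -c | tr -d ' ')
    echo "LANG=$l child_sees=$seen sed_output_bytes=$n"
done
```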

> give exact examples on how you call your script, and your scripts.
> You are doing something ill. Busybox ash works pretty good as init
> script and allows to control every aspect.

My script gets called directly from the bootloader.  The entry in
isolinux looks like:

LABEL test_busy_box_unicode
	KERNEL /antiX/vmlinuz
	APPEND quiet
	INITRD /Live/initrd.gz

The entry for legacy grub is similar but slightly different.

The initrd.gz file is a gzip-compressed cpio archive.  It contains
busybox and a few other things.  The kernel unpacks the archive into
an in-memory root filesystem at / and then runs the /init script
stored in the archive as process 1.  This is my script.  It is the
first userspace code that runs when Linux boots with an initrd
(technically it is an initramfs, because it is a cpio archive and not
a filesystem-in-a-file).

All input to the /init script comes from boot parameters that the
*user* gives to the bootloader.  Every boot parameter gets passed
to /init as a command line parameter.  Once you mount /proc these
parameters are also available at /proc/cmdline.  In addition, all
boot parameters of the form:

     name=value

where name is a valid variable name, get set as environment variables,
with the name converted to all uppercase.  So if the user enters

    lang=es

then there will be an environment variable set like this:

    LANG=es
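Since the same parameters are also visible in /proc/cmdline, a script
can read a value from there directly instead of relying on how it was
turned into an environment variable.  A minimal sketch: get_boot_param
is a hypothetical helper (not a busybox applet), it assumes /proc is
already mounted when no command line string is passed in, and it
splits on whitespace only, so it does not handle quoted values that
contain spaces.

```sh
#!/bin/sh
# get_boot_param NAME [CMDLINE]
# Print the value of the first NAME=value word on the kernel command line
# (read from /proc/cmdline unless CMDLINE is given); return 1 if absent.
get_boot_param() {
    _name=$1
    _cmdline=${2-$(cat /proc/cmdline)}
    set -f                      # no globbing while splitting into words
    for _word in $_cmdline; do
        case $_word in
            "$_name"=*)
                set +f
                printf '%s\n' "${_word#*=}"   # strip "NAME=" prefix
                return 0
                ;;
        esac
    done
    set +f
    return 1
}
```

For example, `get_boot_param lang 'quiet lang=en_US.UTF-8 splash'`
prints en_US.UTF-8.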

In my experience, having the user set this boot parameter is the only
reliable way to control the unicode behavior when running inside the
initrd.  Exporting LANG, re-exec'ing, and running subshells don't seem
to break the stranglehold that the initial value of LANG seems to have.
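One thing worth ruling out: under the usual POSIX/libc rules, LC_ALL
overrides LC_CTYPE, which in turn overrides LANG, so an inherited
LC_ALL or LC_CTYPE would make any later change to LANG look
ineffective.  Whether busybox's own unicode detection follows the same
order depends on how it was built, so that part is an assumption to
check.  A small sketch that prints the variables in that order and
reports which one would win:

```sh
#!/bin/sh
# Print the locale variables in precedence order and report the first
# non-empty one, which libc would use for character handling (LC_CTYPE).
effective=
for v in LC_ALL LC_CTYPE LANG; do
    eval "val=\${$v-}"
    if [ -n "$val" ]; then
        echo "$v=$val"
        [ -n "$effective" ] || effective=$v
    else
        echo "$v is unset or empty"
    fi
done
echo "effective: ${effective:-none (implementation default)}"
```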

If I run my scripts from the command line (using busybox sh and
busybox commands), everything seems to work just fine.  The problem
only occurs when it is run as the very first script (or called by
that script) inside the initrd. My guess is there is some interaction
between the way bootloaders call /init and the way busybox works that
causes the strange behavior.

Someone else ran into the same problem of being unable to affect
unicode behavior by exporting LANG:

http://lists.busybox.net/pipermail/busybox/2014-June/081021.html

> Exporting LANG in rcS didnt have an effect.

My scripts work fine when I run them from the command line.  Exporting
LANG changes the unicode behavior exactly as expected.  It is only
when they run in the initrd environment that there is a problem.
It also fails to work inside a simple busybox chroot.

If there is any more information I can provide that will possibly
be of help, I'd be glad to give it to you.  I've posted my busybox
.config in another part of this thread.

Busybox is absolutely fantastic software.  It is so good it is almost
magical.  I will do what I can to help with this problem because this
is a small way I can help give back.  If the problem is on my end, I
don't know what else I can change except the .config, but I'm willing
to try anything you might suggest.  If you don't want to pursue this
further, that is okay with me too because I have my stuff working now
with my tr hack.


Peace, James


More information about the busybox mailing list