Fixing unicode detection

Denys Vlasenko vda.linux at
Sun Jun 30 11:28:59 UTC 2013

On Sunday 30 June 2013 03:01, Rich Felker wrote:
> I just submitted a bug report
> ( and a proposed partial
> fix for busybox's unicode detection.

You forgot to describe what the actual problem is...

I am resorting to guessing here.

You want "LC_ALL=en_US.UTF-8" to work, but it doesn't?

> To elaborate on the issue, UTF-8 
> support will not be enabled unless the LANG environment variable
> contains the name of a locale that's UTF-8-based; the rest of the
> standard locale logic based on the LC_* variables is overridden. For
> example if you leave LANG unset and just set LC_CTYPE or LC_ALL to a
> UTF-8 locale, busybox will ignore them and use the "C" locale.
> I've never used the LANG variable,

I just looked at what Fedora does, and the only sign of Unicode
in the environment is "LANG=en_US.UTF-8"; no LC_* variables are set.

> In the bug report, I noted that the only way to ensure the standard
> locale semantics apply is to pass "" to setlocale, but this cannot
> easily facilitate dynamic locale changes in shells. One possible
> solution that will give _approximately_ correct, but not entirely
> correct on all implementations, semantics is the following:
> char *loc;
> (loc = getenv("LC_ALL")) ||
> (loc = getenv("LC_CTYPE")) ||
> (loc = getenv("LANG")) ||
> (loc = "");
> setlocale(LC_CTYPE, loc);
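
For reference, a minimal sketch of the precedence the quoted snippet approximates (LC_ALL, then LC_CTYPE, then LANG). Unlike the snippet, it also treats an empty value as unset, which is how POSIX defines the lookup. The helper name effective_ctype_name is hypothetical and is not busybox code, and the fallback to "C" when nothing is set is an assumption (the real default is implementation-defined).

```c
#include <stdlib.h>

/* Hypothetical helper, not busybox code: return the locale name that
 * governs LC_CTYPE under the precedence LC_ALL > LC_CTYPE > LANG.
 * A variable that is set but empty is treated as unset.
 * Assumption: fall back to "C" when none of the three is usable. */
static const char *effective_ctype_name(void)
{
	static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };
	size_t i;

	for (i = 0; i < sizeof(vars) / sizeof(vars[0]); i++) {
		const char *val = getenv(vars[i]);
		if (val && val[0] != '\0')
			return val;
	}
	return "C";
}
```

The result could be passed to setlocale(LC_CTYPE, ...) in place of loc, and the lookup can be repeated whenever a shell changes one of the three variables.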

I tend to not depend on localized ctype functions in busybox,
since for the most important locale, UTF-8, they don't work anyway.

I open-code two-way conditionals: we are either in ASCII or in Unicode.
This should cover ~99.99999% of all users.
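
A rough sketch of what such a two-way decision could look like when made purely from the locale name, without consulting the libc ctype tables. This is an illustration under stated assumptions, not the actual busybox code: the helper name locale_name_is_utf8 is hypothetical, and it only recognizes the codeset spellings "UTF-8" and "utf8" (case-insensitively).

```c
#include <string.h>
#include <strings.h>

/* Hypothetical helper, not busybox code: report whether a locale name
 * such as "en_US.UTF-8" or "ru_RU.utf8@modifier" names a UTF-8 codeset.
 * Everything else, including "C", "POSIX" and NULL, counts as ASCII. */
static int locale_name_is_utf8(const char *name)
{
	const char *codeset = name ? strchr(name, '.') : NULL;
	size_t len;

	if (!codeset)
		return 0;
	codeset++;                       /* skip the '.' */
	len = strcspn(codeset, "@");     /* ignore any "@modifier" suffix */

	return (len == 5 && strncasecmp(codeset, "UTF-8", 5) == 0)
	    || (len == 4 && strncasecmp(codeset, "utf8", 4) == 0);
}
```

With this, "en_US.UTF-8" and "C.utf8" select the Unicode branch, while "C" and "en_US.ISO-8859-1" select the ASCII branch.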

Are you concerned that busybox sometimes fails to detect that it's
running in a "Unicoded" environment, or do you want to support
some other setup (non-C and non-Unicode, or a mixed setup with
different LC_* categories)?

> if the variables are unset in the shell but still in the environment,

This never happens in shells AFAIK...

