Fixing unicode detection
vda.linux at googlemail.com
Sun Jun 30 11:28:59 UTC 2013
On Sunday 30 June 2013 03:01, Rich Felker wrote:
> I just submitted a bug report
> (https://bugs.busybox.net/show_bug.cgi?id=6356) and a proposed partial
> fix for busybox's unicode detection.
You forgot to describe what the actual problem is...
I am resorting to guessing here.
You want "LC_ALL=en_US.UTF-8" to work, but it doesn't?
> To elaborate on the issue, UTF-8
> support will not be enabled unless the LANG environment variable
> contains the name of a locale that's UTF-8-based; the rest of the
> standard locale logic based on the LC_* variables is overridden. For
> example if you leave LANG unset and just set LC_CTYPE or LC_ALL to a
> UTF-8 locale, busybox will ignore them and use the "C" locale.
> I've never used the LANG variable,
I just looked at what Fedora does, and the only sign of Unicode
in the environment is "LANG=en_US.UTF-8"; no LC_* variables are set.
> In the bug report, I noted that the only way to ensure the standard
> locale semantics apply is to pass "" to setlocale, but this cannot
> easily facilitate dynamic locale changes in shells. One possible
> solution that will give _approximately_ correct, but not entirely
> correct on all implementations, semantics is the following:
> char *loc;
> (loc = getenv("LC_ALL")) ||
> (loc = getenv("LC_CTYPE")) ||
> (loc = getenv("LANG")) ||
> (loc = "");
> setlocale(LC_CTYPE, loc);
I tend to not depend on localized ctype functions in busybox,
since for the most important locale, UTF-8, they don't work anyway.
I open-code two-way conditionals: we are either in ASCII or in Unicode.
This should cover ~99.99999% of all users.
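
For illustration, a minimal sketch of such a two-way check, combined with the
LC_ALL > LC_CTYPE > LANG precedence from the quoted snippet, might look like
the following. This is not busybox's actual code: the function names
(pick_ctype_locale, locale_name_is_utf8, env_wants_unicode) are hypothetical,
and deciding by the codeset suffix of the locale name is an assumption that
will miss locale aliases which carry no ".UTF-8" suffix.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: choose the value that governs LC_CTYPE.
 * Precedence is LC_ALL, then LC_CTYPE, then LANG. An empty value is
 * treated like an unset one; the fallback is the "C" locale. */
const char *pick_ctype_locale(const char *lc_all, const char *lc_ctype,
                              const char *lang)
{
    if (lc_all && *lc_all) return lc_all;
    if (lc_ctype && *lc_ctype) return lc_ctype;
    if (lang && *lang) return lang;
    return "C";
}

/* ASCII-only, case-insensitive comparison of the first n bytes of s
 * against the literal lit. Deliberately avoids localized ctype calls. */
bool ascii_ieq(const char *s, size_t n, const char *lit)
{
    size_t i;

    if (strlen(lit) != n) return false;
    for (i = 0; i < n; i++) {
        char a = s[i], b = lit[i];
        if (a >= 'A' && a <= 'Z') a += 'a' - 'A';
        if (b >= 'A' && b <= 'Z') b += 'a' - 'A';
        if (a != b) return false;
    }
    return true;
}

/* Hypothetical helper: true if the locale name has a UTF-8 codeset,
 * e.g. "en_US.UTF-8" or "de_DE.utf8@euro". Assumption: the codeset is
 * whatever sits between the '.' and an optional '@modifier'. */
bool locale_name_is_utf8(const char *name)
{
    const char *dot = strchr(name, '.');
    const char *codeset;
    size_t len;

    if (!dot) return false;
    codeset = dot + 1;
    len = strcspn(codeset, "@");
    return ascii_ieq(codeset, len, "UTF-8") || ascii_ieq(codeset, len, "utf8");
}

/* The two-way decision: either Unicode (true) or ASCII (false). */
bool env_wants_unicode(void)
{
    return locale_name_is_utf8(pick_ctype_locale(getenv("LC_ALL"),
                                                 getenv("LC_CTYPE"),
                                                 getenv("LANG")));
}
```

With this sketch, an environment with only LC_CTYPE=en_US.UTF-8 set would be
treated as Unicode, as would one with only LANG=en_US.UTF-8.
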
Are you concerned that sometimes busybox doesn't detect that it's
running in a "Unicoded" environment, or do you want to support
some other setup (non-C and non-Unicode? Mixed setup for different
> if the variables are unset in the shell but still in the environment,
This never happens in shells AFAIK...