Fixing unicode detection
vda.linux at googlemail.com
Tue Jul 2 15:25:28 UTC 2013
On Mon, Jul 1, 2013 at 5:24 AM, Rich Felker <dalias at aerifal.cx> wrote:
> I want any combination of locale environment variables that would lead
> to mbrtowc processing input as UTF-8 after a call to
> setlocale(LC_CTYPE,"") to put busybox into "unicode mode" (UTF-8
> handling). This is required from a conformance standpoint.
I'm going to add a check for $LC_ALL.
What are the chances that someone doesn't set $LANG, $LC_ALL,
but does set $LC_CTYPE?
> Aside from the obvious standard ways one could request (for example)
> en_US.UTF-8 for the CTYPE category (using LANG, LC_CTYPE, or LC_ALL),
> it's also possible (implementation-defined) that even after calling
> setlocale(LC_CTYPE,"") with NO variables set, the ctype encoding is
> UTF-8. Since this behavior is implementation-defined, you can't
> emulate it by processing the variables; you really have to pass "" to
> setlocale to get it.
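
For reference, the libc-driven check described in the quoted text might look roughly like the sketch below. It is only an illustration, not busybox code: the function name libc_ctype_is_utf8 is made up here, and it assumes a libc that provides nl_langinfo(CODESET) and reports the codeset as "UTF-8" or "utf8".

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: ask libc which encoding LC_CTYPE ends up with.
 * Passing "" lets the implementation apply its own rules, including
 * whatever default it uses when no locale variables are set. */
int libc_ctype_is_utf8(void)
{
	const char *codeset;

	if (!setlocale(LC_CTYPE, ""))
		return 0; /* requested locale unavailable, locale unchanged */

	codeset = nl_langinfo(CODESET);
	/* Assumption: the codeset is spelled "UTF-8" or "utf8". */
	return strcmp(codeset, "UTF-8") == 0 || strcmp(codeset, "utf8") == 0;
}
```

Because the answer comes from libc itself, this covers LANG, LC_CTYPE, LC_ALL and the implementation-defined default in one call, but it obviously requires a libc built with locale support.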
Take a look at the code, at the #ifs around that place:
/* Homegrown Unicode support. It knows only C and Unicode locales. */
I want to be able to conditionally *not use setlocale at all*
(for one, I use uclibc configured w/o locale, for size reasons),
and yet, I want Unicode to work.
(To make that possible, I roll my own wcrtomb et al).
Therefore, "how to call setlocale() correctly" is a nonsensical
question in some busybox configs.
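
A setlocale-free check along these lines could be sketched as follows. This is an illustration only, not the actual busybox code: the function names are hypothetical, the precedence LC_ALL, then LC_CTYPE, then LANG follows POSIX, and recognizing only a ".UTF-8" or ".utf8" codeset suffix is an assumption. As the quoted text points out, a check like this cannot see an implementation-defined default that applies when no variable is set.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Compare len bytes of s against a lowercase literal, ignoring case. */
static int eq_nocase(const char *s, size_t len, const char *lit)
{
	size_t i;

	if (strlen(lit) != len)
		return 0;
	for (i = 0; i < len; i++)
		if (tolower((unsigned char)s[i]) != lit[i])
			return 0;
	return 1;
}

/* Hypothetical helper: does a locale name such as "en_US.UTF-8" or
 * "de_DE.utf8@euro" name a UTF-8 codeset? */
int locale_name_is_utf8(const char *name)
{
	const char *codeset, *end;
	size_t len;

	if (!name)
		return 0;
	codeset = strchr(name, '.');
	if (!codeset)
		return 0; /* no codeset part, e.g. "C" or "en_US" */
	codeset++;
	end = strchr(codeset, '@'); /* strip an optional @modifier */
	len = end ? (size_t)(end - codeset) : strlen(codeset);
	return eq_nocase(codeset, len, "utf-8") || eq_nocase(codeset, len, "utf8");
}

/* Hypothetical helper: pick the variable that governs LC_CTYPE.
 * The first one that is set and non-empty wins. */
int env_requests_utf8(void)
{
	static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };
	size_t i;

	for (i = 0; i < sizeof(vars) / sizeof(vars[0]); i++) {
		const char *v = getenv(vars[i]);
		if (v && v[0])
			return locale_name_is_utf8(v);
	}
	return 0; /* nothing set: treat as the C locale */
}
```

Stopping at the first non-empty variable matters: with LC_ALL=C and LANG=en_US.UTF-8 the result should be "not Unicode", which a plain substring search across all three variables would get wrong.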
>> I just looked what Fedora does and the only sign of Unicode
>> in the environment is "LANG=en_US.UTF-8", no LC_* variables are set.
> That's just one way to set it. See:
> I'm not asking for support for other character encodings, just for
> correct detection of whether the user's configured locale is
> UTF-8-based or not.
>> Are you concerned that sometimes busybox doesn't detect that it's
>> running in "Unicoded" environment,
> Precisely. I'm sorry that I was not more clear in stating this.
Does adding the LC_ALL check make your broken case work?