[Buildroot] [PATCH v2 10/10] autobuild-run: set locale to en_US or C

André Erdmann dywi at mailerd.de
Tue Apr 14 19:58:18 UTC 2015


Hi,

2015-04-12 10:31 GMT+02:00 Samuel Martin <s.martin49 at gmail.com>:
> Hi André, Thomas, all,
>
> On Wed, Mar 18, 2015 at 4:50 PM, André Erdmann <dywi at mailerd.de> wrote:
>> some python scripts break if the locale is set to C, try en_US{.UTF-8,} first
> Could you give an example of such a script?
>

Apart from the small snippet I've included below the commit message:
no, I don't have a real-world script at hand that exposes this issue
and is small enough. In general, it occurs when a script relies on the
default system encoding derived from the LANG/LC_* variables and has
to deal with non-ASCII strings, e.g. when it reads a file with open()
instead of io.open()/codecs.open() with an explicit encoding.
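A minimal sketch of that failure mode (Python 3; the helper names below are hypothetical and not taken from any real script). It writes a UTF-8 file and reads it back twice: once without an encoding argument, so the locale's preferred encoding is used, and once with an explicit encoding. On interpreters older than 3.7 running under a plain C locale, the first read is ASCII-only and may raise UnicodeDecodeError. Python 3.7+ coerces the C locale to UTF-8 (PEP 538/540), so the default read may succeed there.

```python
import io
import locale
import os
import tempfile

TEXT = u"Andr\xe9\n"  # contains a non-ASCII character (e-acute)

def write_utf8(path):
    # Explicit encoding: the bytes on disk do not depend on the locale.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(TEXT)

def read_with_locale_default(path):
    # No encoding argument: the text layer uses
    # locale.getpreferredencoding(False), which is ASCII under a plain
    # C locale on older interpreters.
    with io.open(path, "r") as f:
        return f.read()

def read_explicit(path):
    # Explicit encoding: same result under LANG=C and LANG=en_US.UTF-8.
    with io.open(path, "r", encoding="utf-8") as f:
        return f.read()

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "sample.txt")
    write_utf8(path)
    print("preferred encoding:", locale.getpreferredencoding(False))
    print(repr(read_explicit(path)))
    try:
        print(repr(read_with_locale_default(path)))
    except UnicodeDecodeError as exc:
        print("locale-dependent read failed:", exc)
```

Running it as `LANG=C python3 sample.py` versus `LANG=en_US.UTF-8 python3 sample.py` makes the difference visible on an affected interpreter.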

>>
>> Additionally, drop all locale env vars (LC_*, LANG[GUAGE]) when
>> setting the new locale.
>>
>> Signed-off-by: André Erdmann <dywi at mailerd.de>
>> ---
>>
>> Just for reference, a small python example that breaks if LANG is set to C:
>>
>>   $ LANG=C python -c "print(u'Andr\xe9')"
>>   UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 4: ordinal not in range(128)
>>
>> ---
>>  scripts/autobuild-run | 48 ++++++++++++++++++++++++++++++++++++++++++++----
>>  1 file changed, 44 insertions(+), 4 deletions(-)
>>
>> diff --git a/scripts/autobuild-run b/scripts/autobuild-run
>> index e1a3398..a1e947d 100755
>> --- a/scripts/autobuild-run
>> +++ b/scripts/autobuild-run
>> @@ -168,6 +168,12 @@ class SystemInfo:
>>      DEFAULT_NEEDED_PROGS = ["make", "git", "gcc", "timeout"]
>>      DEFAULT_OPTIONAL_PROGS = ["bzr", "java", "javac", "jar"]
>>
>> +    # list of default locales (in lowercase, without "-", descending order)
>> +    #  some python scripts break if the locale is set to C, try en_US first
>> +    TRY_LOCALES = ['en_us.utf8', 'en_us', 'c']
>> +    # list of locale environment variables that should be (re-)set by set_locale()
>> +    LOCALE_KEYS = ['LANG']
>> +
>>      def __init__(self):
>>          self.needed_progs = list(self.__class__.DEFAULT_NEEDED_PROGS)
>>          self.optional_progs = list(self.__class__.DEFAULT_OPTIONAL_PROGS)
>> @@ -231,6 +237,41 @@ class SystemInfo:
>>
>>          return not missing_requirements
>>
>> +    def set_locale(self):
>> +        def is_locale_env_varname(w):
>> +            # w[:4] == 'LANG' and (not w[4:] or w[4:] == 'UAGE') ...
> Hmm... I must say the comment is rather confusing!
>

I'll drop it.

I usually jot down "this could be optimized" notes while coding. This
one was about not matching the same substring more than once: w is
"LANG" or "LANGUAGE" exactly when its first 4 chars are "LANG" and the
remainder is either empty or "UAGE".
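For illustration, the two forms side by side (Python; `is_locale_env_varname_prefix` is a hypothetical name for the "optimized" variant the dropped comment hinted at). They accept exactly the same names; the prefix variant only avoids comparing "LANG" twice, which is not worth the loss in readability.

```python
def is_locale_env_varname(w):
    # Straightforward form, as suggested in the review.
    return w.startswith('LC_') or w in ('LANG', 'LANGUAGE')

def is_locale_env_varname_prefix(w):
    # "LANG" and "LANGUAGE" share the prefix "LANG": check it once,
    # then compare only the remainder against "" or "UAGE".
    return w[:3] == 'LC_' or (w[:4] == 'LANG' and w[4:] in ('', 'UAGE'))
```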

>> +            return w[:3] == 'LC_' or w == 'LANG' or w == 'LANGUAGE'
> or: return w.startswith('LC_') or w in ['LANG', 'LANGUAGE']
>
>> +
>> +        ret, locales_str = self.run_cmd_get_stdout(["locale", "-a"])
>> +        if ret != os.EX_OK:
>> +            return False
>> +
>> +        # create a dict
>> +        #   <locale identifier> (as listed in TRY_LOCALES) => <locale env name>
>> +        locales = dict((
>> +            (k.lower().replace("-", ""), k) for k in locales_str.split(None)
> s/None//
>
>> +        ))
>> +
>> +        for loc_key in filter(lambda x: x in locales, self.TRY_LOCALES):
> Is a for loop really needed here?
> You could just do:
>   try:
>     loc_key = filter(lambda x: x in locales, self.TRY_LOCALES)[0]
>   except (TypeError, IndexError):
>     # set "C" as locale
>     loc_key = self.TRY_LOCALES[-1]
>

I've chosen a for-loop here because it works with both python 2 and
python 3 equally well:

* in python 2: filter() returns a list  =>  "filter(...)[0]" enclosed
by a "try/except IndexError" block
* in python 3: filter() returns an iterator (a filter object)  =>
"next(filter(...))" enclosed by a "try/except StopIteration" block

Neither variant works on the other python version: you can't index an
iterator (filter(...)[0] raises a TypeError in py3), and a list is not
an iterator (next(filter(...)) raises a TypeError in py2).

What would work for both py2/py3 is "next(iter(filter(...)))" enclosed
by a "try/except StopIteration" block, but the for-loop is more
readable.
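A small sketch comparing the for-loop with the portable next(iter(...)) form (Python; the function names are hypothetical, and `available` stands in for the `locales` dict built in the patch). Both return the first entry of TRY_LOCALES that is installed, or None if there is none.

```python
TRY_LOCALES = ['en_us.utf8', 'en_us', 'c']

def first_match_loop(available, candidates=TRY_LOCALES):
    # Works unchanged whether filter() returns a list (py2) or an
    # iterator (py3): the loop body runs at most once.
    for key in filter(lambda x: x in available, candidates):
        return key
    return None

def first_match_next(available, candidates=TRY_LOCALES):
    # iter() turns the py2 list into an iterator and is a no-op on the
    # py3 filter object, so next() works on both.
    try:
        return next(iter(filter(lambda x: x in available, candidates)))
    except StopIteration:
        return None
```

next() also accepts a default value (python 2.6+), so `next(iter(filter(...)), None)` would avoid the try/except entirely.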


>> +            # cannot modify self.env while iterating over it,
>> +            #  create intermediate list
>> +            env_old_locale_keys = [
>> +                k for k in self.env.keys() if is_locale_env_varname(k)
>> +            ]
>> +            for k in env_old_locale_keys:
>> +                del self.env[k]
> Setting the new value will automatically override the old value, so
> deleting it before looks a bit overkill...
>

Deleting and inserting are asymmetrical here:

* drop every env var matching LANG, LANGUAGE or LC_* (LC_MESSAGES et al.)
* set only LANG to the new locale

We could also simply set LC_ALL instead of deleting LC_*, but then any
LC_* variables left in the environment leak back in as soon as LC_ALL
gets unset or emptied:

$ env LC_MESSAGES=en_US.UTF-8 LC_ALL=de_DE.UTF-8 sh -c 'locale | grep LC_MES; export LC_ALL=; locale | grep LC_MES;'

LC_MESSAGES="de_DE.UTF-8"
LC_MESSAGES=en_US.UTF-8
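The delete-then-set approach can be sketched as a pure function over an env dict (Python; `sanitized_env` is a hypothetical helper, not part of autobuild-run). Because every LANG, LANGUAGE and LC_* entry is removed before LANG is set, there is no stale LC_MESSAGES left behind that could resurface the way it does in the shell example above.

```python
def is_locale_env_varname(name):
    return name.startswith('LC_') or name in ('LANG', 'LANGUAGE')

def sanitized_env(env, new_locale):
    # Build a new dict instead of deleting keys while iterating over env;
    # the caller's dict (e.g. a copy of os.environ) is left untouched.
    result = dict((k, v) for k, v in env.items()
                  if not is_locale_env_varname(k))
    # With all LC_* gone, LANG alone determines every locale category.
    result['LANG'] = new_locale
    return result
```

The result could then be passed as `env=` to subprocess.Popen for the build commands.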

>> +
>> +            # set new locale once
>> +            for vname in self.LOCALE_KEYS:
>> +                self.env[vname] = locales[loc_key]
>> +            return True
>> +        # -- end for
>> +        # practically impossible to reach this return if 'c' is in TRY_LOCALES:
>> +        return None
>> +
>> +    def sanitize_env(self):
>> +        self.set_locale()
>> +
>>      def popen(self, cmdv, **kwargs):
>>          kwargs.setdefault('stdin', self.devnull)
>>          kwargs.setdefault('stdout', self.devnull)
>> @@ -789,12 +830,11 @@ def merge(dict_1, dict_2):
>>
>>  def main():
>>
>> -    # Avoid locale settings of autobuilder machine leaking in, for example
>> -    # showing error messages in another language.
>> -    os.environ['LC_ALL'] = 'C'
>> -
>>      check_version()
>>      sysinfo = SystemInfo()
>> +    # Avoid locale settings of autobuilder machine leaking in, for example
>> +    # showing error messages in another language.
>> +    sysinfo.sanitize_env()
>>
>>      args = docopt.docopt(doc, version=VERSION)
>>
>> --
>> 2.3.2
>>


-- 
André
