Bernd Petrovitsch bernd at firmix.at
Wed Aug 31 21:44:21 UTC 2005

On Wed, 2005-08-31 at 17:15 -0400, Paul Fox wrote:
>  > > So it's not a bug, it's a feature.

It cannot be a feature since it is not documented. So it must be a bug.

>  > I hear the words, but I'm afraid I don't understand.
>  > 
>  > Can you please explain how it is possible that '[a-z]*' matches "CVS"
>  > or "Makefile"? What is the flow of logic, and where is the place
>  > where the folding of upper and lower case characters creeps in?

I don't know, but it effectively forces one to set LANG=C (and nothing
else) in the shell.

> i suspect you're only baiting here, because i suspect you know the
> answer.  i also agree with you in spirit -- to break such a construct
> was shortsighted.  but here goes anyways:  i assume this is because
> a-z only including lowercase characters is an accident of ASCII
> ordering.  other character sets order differently.  i.e. ASCII
> has the order
>     ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
> and other character sets may look like:
>     AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz
> so in that character set, [a-z] represents this:
>     aBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz

... thus effectively going from Unix (and C) style case-sensitivity
(unless explicitly defined otherwise) to case-insensitivity (or, even
worse, only almost case-insensitivity) for globbing whenever LANG or a
similar shell variable is set.
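
To make the "only almost" part concrete, here is a rough sketch that
simulates a range test under the two orderings quoted above. It is not
how glibc or any shell actually implements collation; the function names
(in_range, matches_a_to_z_star) and the file names are made up for
illustration, and real locales use far more elaborate collation rules.

```python
# Two collation orders modelled on the strings quoted above.
# These are simplified stand-ins, not real locale definitions.
ASCII_ORDER = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
INTERLEAVED_ORDER = "".join(
    lo.upper() + lo for lo in "abcdefghijklmnopqrstuvwxyz"
)  # "AaBbCc...Zz"


def in_range(ch, start, end, order):
    """True if ch sorts between start and end (inclusive) in 'order'."""
    pos = order.find(ch)
    return pos != -1 and order.index(start) <= pos <= order.index(end)


def matches_a_to_z_star(name, order):
    """Simulate the glob '[a-z]*': only the first character is tested."""
    return bool(name) and in_range(name[0], "a", "z", order)


# Hypothetical directory listing.
for name in ("CVS", "Makefile", "Audit", "main.c"):
    print(
        name,
        "ascii:", matches_a_to_z_star(name, ASCII_ORDER),
        "interleaved:", matches_a_to_z_star(name, INTERLEAVED_ORDER),
    )
```

Under the interleaved order "CVS" and "Makefile" match, but a name
starting with "A" still does not, because "A" sorts before "a". So the
range is neither case-sensitive nor properly case-insensitive.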

As a similar "minor" change, I propose that glibc interpret all numbers
as hexadecimal by default unless one sets LC_NUMERIC to C.

> this is, of course, why the "[:alpha:]" style character classes
> were introduced -- they're independent of actual character collation
> sequence.

And people who want to have that behaviour should use the classes and
that's it.
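
A small sketch of why the classes are the robust choice: a class is a
fixed set of characters, so the answer does not move when the collation
order does. This only models the idea; class_match stands in for
something like [[:lower:]] in the C locale, and all names and file names
below are hypothetical.

```python
import string

# Simplified stand-ins for two collation orders (not real locale data).
ORDERS = {
    "ascii": string.ascii_uppercase + string.ascii_lowercase,
    "interleaved": "".join(c.upper() + c for c in string.ascii_lowercase),
}


def range_match(ch, order):
    """Range test: is ch positioned between 'a' and 'z' in this order?"""
    return ch in order and order.index("a") <= order.index(ch) <= order.index("z")


def class_match(ch):
    """Class test: membership in a fixed set, no ordering involved."""
    return ch in string.ascii_lowercase


def select(names, predicate):
    """Keep the names whose first character satisfies the predicate."""
    return [n for n in names if predicate(n[0])]


names = ["CVS", "Makefile", "main.c", "busybox"]

for label, order in ORDERS.items():
    print(label, "range:", select(names, lambda c: range_match(c, order)))
print("class:", select(names, class_match))
```

The range-based selection changes with the order, while the class-based
one picks the same two names either way.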

> none of which is any help when all of one's scripts start breaking
> because some distribution changed the default LANG setting.  (i don't
> honestly know why the default was changed.  sure seems wrong to me.)

It is wrong by all means (unless you see a very special and small part
of some applications in the Unix world as the whole universe).

Firmix Software GmbH                   http://www.firmix.at/
mobil: +43 664 4416156                 fax: +43 1 7890849-55
          Embedded Linux Development and Services
