Rob Landley rob at landley.net
Wed Aug 31 23:51:06 UTC 2005

On Wednesday 31 August 2005 16:15, Paul Fox wrote:
>  > > So its not a bug its a feature.
>  >
>  > I hear the words, but I'm afraid I don't understand.
>  >
>  > Can you please explain how it is possible that '[a-z]*' matches "CVS"
>  > or "Makefile"? What is the flow of logic,  and  where  is  the  place
>  > where the folding of upper and lower case characters creeps in?
> i suspect you're only baiting here, because i suspect you know the
> answer.  i also agree with you in spirit -- to break such a construct
> was shortsighted.  but here goes anyways:  i assume this is because
> a-z only including lowercase characters is an accident of ASCII
> ordering.  other character sets order differently.  i.e. ASCII
> has the order
>     ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
> and other character sets may look like:
>     AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz

1) I thought the whole point of utf8 was sane compatibility with ascii.

2) The upper and lower case character ranges have been sanely separated in 
Unix systems for the last 35 years.
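
To make the quoted explanation concrete, here is a rough sketch (not what any
libc or shell actually does internally) of how the same range test gives
different answers under the two orderings Paul lists. The names in_range and
starts_in_a_to_z are made up for illustration, and INTERLEAVED_ORDER is just
the hypothetical AaBbCc... ordering from the quote, not a real locale's
collation table.

```python
import string

# ASCII (the "C" locale) order: all uppercase letters, then all lowercase.
ASCII_ORDER = string.ascii_uppercase + string.ascii_lowercase

# Hypothetical interleaved order from the quoted example: AaBbCc...Zz.
INTERLEAVED_ORDER = "".join(
    upper + lower
    for upper, lower in zip(string.ascii_uppercase, string.ascii_lowercase)
)


def in_range(ch, lo, hi, order):
    """True if ch sorts between lo and hi (inclusive) in the given order."""
    return order.index(lo) <= order.index(ch) <= order.index(hi)


def starts_in_a_to_z(name, order):
    """Model the bracket in the glob '[a-z]*': only the first character is tested."""
    return in_range(name[0], "a", "z", order)


for name in ("CVS", "Makefile", "busybox"):
    print(name,
          starts_in_a_to_z(name, ASCII_ORDER),
          starts_in_a_to_z(name, INTERLEAVED_ORDER))
# CVS False True
# Makefile False True
# busybox True True
```

With the interleaved ordering, every letter except 'A' lands between 'a' and
'z', which is how "CVS" and "Makefile" end up matching.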

> none of which is any help when all of one's scripts start breaking
> because some distribution changed the default LANG setting.  (i don't
> honestly know why the default was changed.  sure seems wrong to me.)

Because they bought into the hype that utf8 worked just like ascii when all 
you fed it was standard ascii characters that didn't contain a utf8 escape 
sequence, and that utf8 escape sequences really have no more impact than ANSI 
escape sequences for moving the cursor or changing color.
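
For what it's worth, the encoding half of that claim is easy to check. A
minimal sketch, with helper names (utf8_matches_ascii, has_multibyte_lead)
invented for illustration: pure ascii text encodes to identical bytes either
way, and none of those bytes has the high bit set, so nothing in a name like
"CVS" can start a multibyte sequence. Whatever folds the case is happening in
the ordering, not in the bytes.

```python
def utf8_matches_ascii(text):
    """True if the UTF-8 and ASCII encodings of text are byte-for-byte identical."""
    return text.encode("utf-8") == text.encode("ascii")


def has_multibyte_lead(data):
    """True if any byte has the high bit set, i.e. belongs to a multibyte UTF-8 sequence."""
    return any(byte >= 0x80 for byte in data)


for name in ("CVS", "Makefile"):
    print(name, utf8_matches_ascii(name), has_multibyte_lead(name.encode("utf-8")))
# CVS True False
# Makefile True False
```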

Now either:

A) This was a fallacy perpetuated by people like Linus Torvalds:


Note the "acts as ASCII" assertion...

And vigorously defended by same:


B) This "not acting like ascii in something as fundamental as grouping the 
lower case characters together, which even EBCDIC got right for crying out 
loud" is a blatant, blaring, _BUG_.

This is somewhat bolstered by the fact that Linus considers case insensitivity 
itself a source of bugs:


