rob at landley.net
Wed Aug 31 23:51:06 UTC 2005
On Wednesday 31 August 2005 16:15, Paul Fox wrote:
> > > So it's not a bug, it's a feature.
> > I hear the words, but I'm afraid I don't understand.
> > Can you please explain how it is possible that '[a-z]*' matches "CVS"
> > or "Makefile"? What is the flow of logic, and where is the place
> > where the folding of upper and lower case characters creeps in?
> i suspect you're only baiting here, because i suspect you know the
> answer. i also agree with you in spirit -- to break such a construct
> was shortsighted. but here goes anyways: i assume this is because
> a-z only including lowercase characters is an accident of ASCII
> ordering. other character sets order differently. i.e. ASCII
> has the order
> and other character sets may look like:
1) I thought the whole point of utf8 was sane compatibility with ascii.
2) The upper and lower case character ranges have been sanely separated in
Unix systems for the last 35 years.
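
To make the ordering argument quoted above concrete, here is a minimal sketch
of how membership in a-z changes with the order used to compare characters.
The interleaved order (aAbBcC...zZ) and the helper names are assumptions made
up for illustration; this is not what any particular libc or shell does
internally.

```python
# Sketch: whether a character falls inside the range a-z depends on the
# ordering used to compare characters, not on the characters themselves.
import string

# ASCII (codepoint) order: all upper case first, then all lower case.
ASCII_ORDER = string.ascii_uppercase + string.ascii_lowercase

# Hypothetical dictionary-style order that interleaves the cases: aAbBcC...zZ
INTERLEAVED_ORDER = "".join(
    lo + up for lo, up in zip(string.ascii_lowercase, string.ascii_uppercase)
)


def in_range(ch, first, last, order):
    """Return True if ch sorts between first and last (inclusive) in order."""
    return order.index(first) <= order.index(ch) <= order.index(last)


def first_char_matches(name, order):
    """Mimic the leading [a-z] of the glob '[a-z]*' under a given order."""
    if not name or name[0] not in order:
        return False
    return in_range(name[0], "a", "z", order)


for name in ("CVS", "Makefile", "busybox"):
    print(name,
          first_char_matches(name, ASCII_ORDER),
          first_char_matches(name, INTERLEAVED_ORDER))
```

Under the ASCII order only "busybox" matches. Under the interleaved order
"CVS" and "Makefile" match too, because C and M sort between a and z; only a
leading Z would fall outside the range.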
> none of which is any help when all of one's scripts start breaking
> because some distribution changed the default LANG setting. (i don't
> honestly know why the default was changed. sure seems wrong to me.)
Because they bought into the hype that utf8 worked just like ascii when all
you fed it was standard ascii characters that didn't contain a utf8 escape
sequence, and that utf8 escape sequences really have no more impact than ANSI
escape sequences for moving the cursor or changing color.
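
The byte-level half of that claim is easy to check. A minimal sketch, with
the sample names taken from this thread and the variable names chosen only
for illustration:

```python
# Check: pure-ASCII text has the same bytes in ASCII and UTF-8, and every
# byte of a UTF-8 multi-byte sequence has the high bit set.

ascii_names = ["CVS", "Makefile", "busybox"]

# Identical byte strings under both encodings.
same_bytes = all(n.encode("utf-8") == n.encode("ascii") for n in ascii_names)
print(same_bytes)

# A non-ASCII character (U+00E9, e with acute accent) becomes a
# two-byte sequence in UTF-8.
encoded = "\u00e9".encode("utf-8")
print(encoded)

# None of those bytes can be mistaken for an ASCII character.
all_high = all(b >= 0x80 for b in encoded)
print(all_high)
```

This only covers the bytes. It says nothing about the order in which a locale
sorts them, which is where the '[a-z]*' problem shows up.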
A) This was a fallacy perpetuated by people like Linus Torvalds:
Note the "acts as ASCII" assertion...
And vigorously defended by same:
B) This "not acting like ascii in something as fundamental as grouping the
lower case characters together, which even EBCDIC got right for crying out
loud" is a blatant, blaring, _BUG_.
This is somewhat bolstered by the fact that Linus considers case insensitivity
itself a source of bugs: