ihno at suse.de
Wed Aug 31 22:26:48 UTC 2005
On Wed, Aug 31, 2005 at 05:15:25PM -0400, Paul Fox wrote:
> > > So it's not a bug, it's a feature.
> > I hear the words, but I'm afraid I don't understand.
> > Can you please explain how it is possible that '[a-z]*' matches "CVS"
> > or "Makefile"? What is the flow of logic, and where is the place
> > where the folding of upper and lower case characters creeps in?
> i suspect you're only baiting here, because i suspect you know the
> answer. i also agree with you in spirit -- to break such a construct
> was shortsighted. but here goes anyways: i assume this is because
> a-z only including lowercase characters is an accident of ASCII
> ordering. other character sets order differently. i.e. ASCII
> has the order
> and other character sets may look like:
That's the dictionary sort order (don't confuse it with the
sort order of the German telephone book, which is different).
> so in that character set, [a-z] represents this:
For en_US.UTF-8 it looks like:
ihno at uttenreuth:~/collate> echo $LANG
ihno at uttenreuth:~/collate> echo [a-z]*
abc ABC zig
ihno at uttenreuth:~/collate> echo *
abc ABC zig ZIG
ihno at uttenreuth:~/collate> ls | egrep "[[:lower:]]"
ihno at uttenreuth:~/collate>
so it is
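A minimal sketch of why ABC matches [a-z]* while ZIG does not. It assumes a simplified dictionary-style order in which each lowercase letter is directly followed by its uppercase partner (a A b B ... z Z). The `order` string and the `range_members` function are hypothetical illustrations, not the real en_US.UTF-8 collation table. The script walks that order and prints every character from `a` through `z` inclusive, which is what a collation-based [a-z] would accept.

```sh
#!/bin/sh
# Hypothetical, simplified dictionary-style collation order: each lowercase
# letter is followed directly by its uppercase partner. This is an
# assumption for illustration, not the real en_US.UTF-8 collation table.
order="a A b B c C d D e E f F g G h H i I j J k K l L m M
       n N o O p P q Q r R s S t T u U v V w W x X y Y z Z"

# range_members LO HI: print every character that sorts from LO to HI
# (inclusive) in $order, i.e. what a collation-based [LO-HI] would accept.
range_members() {
    inside=0 out=""
    for c in $order; do
        [ "$c" = "$1" ] && inside=1
        [ "$inside" -eq 1 ] && out="$out$c"
        [ "$c" = "$2" ] && break
    done
    printf '%s\n' "$out"
}

range_members a z    # aAbB...yYz: A..Y are inside the range, Z is not
```

Under this model, the range starts at a and stops at z, just before Z. Names beginning with A through Y therefore match, but a name beginning with Z does not. This agrees with the session, where ABC is listed and ZIG is omitted.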
> this is, of course, why the "[:alpha:]" style character classes
> were introduced -- they're independent of actual character collation
Likewise [:upper:] and [:lower:]. For a complete list, see egrep(1).
> none of which is any help when all of one's scripts start breaking
> because some distribution changed the default LANG setting. (i don't
> honestly know why the default was changed. sure seems wrong to me.)
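A sketch of two common ways to keep a script's globs independent of the caller's LANG. The first runs the range match in the C (POSIX) locale, where [a-z] covers only the ASCII letters a to z. The second uses a character class such as [[:lower:]]. The scratch directory and file names are illustrative only. `mktemp -d` is assumed to be available: GNU coreutils and BusyBox provide it, but POSIX does not specify it.

```sh
#!/bin/sh
# Sketch: keep a glob's meaning independent of the caller's LANG.
# The scratch directory and file names are illustrative only.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
touch abc ABC zig ZIG

# Option 1: evaluate the range in the C (POSIX) locale. LC_ALL overrides
# LANG and every LC_* variable, so [a-z] is just the ASCII bytes a..z.
LC_ALL=C sh -c 'echo [a-z]*'      # abc zig

# Option 2: use a character class; it names the set of lowercase letters
# instead of a span of the collation order.
echo [[:lower:]]*                 # abc zig
```

You can also export LC_ALL=C once at the top of a script to get the same effect for every command it runs. The cost is that sorting and messages also switch to the C locale.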
"Never trust a computer you can lift."
Ihno Krumreich ihno at suse.de
SUSE LINUX Products GmbH Projectmanager S390 & zSeries
Maxfeldstr. 5 +49-911-74053-439
D-90409 Nürnberg http://www.suse.de