Testiness.

Ihno Krumreich ihno at suse.de
Thu Sep 1 12:30:25 UTC 2005


On Wed, Aug 31, 2005 at 07:21:16PM -0500, Rob Landley wrote:
> On Wednesday 31 August 2005 15:54, Wolfgang Denk wrote:
> 
> > > So it's not a bug, it's a feature.
> 
> Isn't he supposed to put quotes around "feature" when he says that?
> 
> Or is he actually implying this fundamental behavior change, without warning 
> or documentation, isn't completely broken?  Despite the fact that the central 
> design idea behind UTF8 is the ability to maintain backwards compatibility 
> with existing software that doesn't use it, and that bundling a change to the 
> sequencing of existing characters with the ability to decode new additions to 
> the character set is moronic at best?
> 

The central design idea behind UTF8 was never broken.

You should not mix up the character set with the collating sequence.
There is no defined collating sequence for UTF8.
There is a defined collating sequence for en_US.

If you do a 

ihno@tammanrasset:/lib> locale -a | grep en_US
en_US
en_US.iso885915
en_US.utf8
ihno@tammanrasset:/lib> 

you will find the same collating sequence for three different character sets.
So maybe just invent a C.utf8.
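
To illustrate the difference, here is a minimal sketch that sorts four ASCII
letters under the C collating sequence. The helper name sort_c is made up for
this example and is not part of busybox or any other tool.

```shell
# Sketch: order ASCII input by the C collating sequence.
# sort_c is a hypothetical helper name used only for illustration.
sort_c() {
    LC_ALL=C sort
}

# The C locale compares byte values: A(65) B(66) a(97) b(98),
# so all upper-case letters come before all lower-case ones.
printf 'b\nA\na\nB\n' | sort_c
```

Running the same pipeline with LC_ALL set to one of the en_US locales listed
above (assuming it is installed) can give a different order, although the
bytes in the input are identical. Only the collating sequence has changed.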


> > I hear the words, but I'm afraid I don't understand.
> >
> > Can you please explain how it is possible that '[a-z]*' matches "CVS"
> > or "Makefile"? What is the flow of logic,  and  where  is  the  place
> > where the folding of upper and lower case characters creeps in?
> 
> Especially since 'a' is still 97, and 'z' is still 122.  I know this because 
> I'm feeding an ascii document into "sort", and invoking sort via a shell 
> script that also consists of ascii characters...
> 

As I said, don't mix up the character set with the collating sequence.
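
A minimal sketch of the pattern case, assuming a shell whose bracket ranges
follow the locale's collating sequence: with LC_ALL=C the range [a-z] covers
only bytes 97 to 122, so "CVS" and "Makefile" do not match. The helper name
starts_lower is made up for this example.

```shell
# Sketch: force the C collating sequence so [a-z] means bytes 97..122 only.
# starts_lower is a hypothetical helper name used only for illustration.
LC_ALL=C
export LC_ALL

starts_lower() {
    case $1 in
        [a-z]*) return 0 ;;   # first character is in the range a..z
        *)      return 1 ;;   # anything else, including upper case
    esac
}

for name in CVS Makefile busybox; do
    if starts_lower "$name"; then
        echo "$name: matches [a-z]*"
    else
        echo "$name: no match"
    fi
done
```

Under an en_US locale the same range may be evaluated against a collating
sequence that interleaves upper and lower case, which is where the apparent
case folding comes from.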

> The really _fun_ part is doing this on an ascii text file that comes with or 
> is maintained by the OS, such as /etc/passwd, man pages, anything 
> under /usr/doc.  They aren't proposing a new storage mechanism for these text 
> files which is incompatible with existing ones.  An mbox file is still an 
> mbox file.
> 
> Now add in the fact that sort (which we were talking about) has the -f option 
> to ignore case.  (It's in the SUSv3 spec.)  I did not supply the -f option.  
> The _bug_ that I'm seeing is that with UTF8, the -f option is forced on.  
> That is a bug.

Ihno

-- 
Best regards/Mit freundlichen Grüßen

Ihno Krumreich

"Never trust a computer you can lift."
--
Ihno Krumreich            ihno at suse.de
SUSE LINUX Products GmbH  Projectmanager S390 & zSeries
Maxfeldstr. 5             +49-911-74053-439
D-90409 Nürnberg          http://www.suse.de


