uClibc-0.9.28.3 bugs

Rob Landley rob at landley.net
Sun Sep 30 03:31:54 UTC 2007


On Friday 28 September 2007 7:23:12 pm Kevin Day wrote:
> That aside, here are my problems, which may or may not be related:
>
> 1) Kernel 2.6.2? compiled with SMP enabled, will systematically
> destroy itself and collapse in a series of kernel panics that flood the
> system until the OS comes to a complete windows style lockup.

Userspace shouldn't cause panics; that's a kernel-internal issue.

While it's possible for userspace to cause a few selected panics, they're 
generally extremely specific.  If init exits, or if root writes to /dev/kmem, 
that's a panic that's not the kernel's fault.  But it's really not something 
userspace should ever be able to do by accident.

That really sounds like a kernel bug, which uClibc is merely triggering.

> My first impression was that there was some serious regression in the
> kernel+hardware I was using or that the kernel may have had an
> isolated corrupt compilation.  Repeated attempts on different hardware
> and the SMP-kernel crash was consistent.

So you had an intermittent kernel panic that went away when you changed 
something in userspace.  That isn't a fix, you're just not _triggering_ the 
problem anymore...

> I later built a uClibc-0.9.28.3 system and I have yet to see the
> SMP-kernel crash reproduce itself on the same hardware.

Whatever the problem is in your kernel, you're no longer triggering it.  
Doesn't mean it's our bug...

> The 0.9.29 crash took no more than a day and I have been running an
> intel dual-processor server for a few weeks now under the SMP kernel
> compiled under a uClibc-0.9.28.3.
> My memory checks out clear and compilation between tests was done
> under different systems just in case.

I had a fun problem once where I suspect the power supply was marginal and 
everything worked fine until the hard drive sucked too much power.  So 
intense disk activity _combined_ with intense CPU activity presented as 
corrupted data read from the disk.

A friend (Garrett the uClibc++ guy) had a problem where the "string move" 
instruction on his CPU went bad.  (This instruction is apparently used by 
very few programs, but one of them was gcc...)

> I have no way of figuring out what/where this is happening,
> considering that the kernel has its own internal libc equivalent code.
>
> That suggests that gcc and/or binutils is somehow becoming corrupt under
> 0.9.29? If so, this may be the cause of the other obscure problems I am
> having..

If gcc or binutils becomes corrupt, and it builds a screwed up binary, that 
screwed up binary will have deterministic behavior.  (Might not happen on a 
_rebuild_, but a given binary should behave reproducibly.)
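
One rough way to check whether a given binary behaves reproducibly is to run 
it several times and compare what comes back.  This is only a sketch in 
Python, not anything from uClibc; the function names fingerprint() and 
distinct_behaviors() are made up for illustration, and it assumes the program 
under test prints the same thing on every healthy run (no timestamps or PIDs 
in its output).

```python
import hashlib
import subprocess
import sys


def fingerprint(cmd, timeout=60):
    """Run cmd once; return (exit status, SHA-256 of combined stdout/stderr).

    On POSIX a negative exit status means the process died from a signal
    (for example -11 for SIGSEGV).
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        timeout=timeout,
    )
    return proc.returncode, hashlib.sha256(proc.stdout).hexdigest()


def distinct_behaviors(cmd, runs=5):
    """Run cmd `runs` times and collect the set of distinct fingerprints."""
    return {fingerprint(cmd) for _ in range(runs)}
```

If distinct_behaviors() returns one entry, the binary did the same thing every 
time, which is what a miscompiled but otherwise stable binary should do.  More 
than one entry for the same binary and the same input points away from the 
toolchain and toward something flaky underneath it (hardware or kernel).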

> 2) Fuse jumps into a (threaded?) deadlock.
> This problem exists in 0.9.28.3, but I have a work-around under
> 0.9.28.3. That work-around no longer works around the problem in
> 0.9.29. I explicitly need fuse and so long as I cannot use fuse, I
> cannot change to 0.9.29.

If we can reproduce a problem, we can probably debug it.  Reproducibility is 
good.  However, combined with the other kernel problems you're having, I'm 
really not sure it's our bug.

> 4) There are more problems with deadlocking as with #2 or random
> crashing as with #1.
> #1 seems to happen mostly with applications that are graphical (xorg
> or gtk based apps..).
> #2 happens to a very small number of apps, such as qingy (in qingy's
> case the crash is a fatal kernel-level deadlock, time to hard-reboot).

Again, userspace shouldn't be able to hard deadlock the kernel.  Can you still 
ping the machine when this happens?
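
If you want to script that check from a second machine, something like the 
following would do.  It's a sketch under assumptions: a Unix-style ping that 
accepts -c for the packet count, and the helper names ping_command() and 
responds_to_ping() are hypothetical.  A machine that still answers pings is 
still servicing network interrupts, so the kernel isn't completely wedged even 
if the console is dead.

```python
import subprocess


def ping_command(host, count=3):
    """Build the argument list for a Unix-style ping (assumes -c is supported)."""
    return ["ping", "-c", str(count), host]


def responds_to_ping(host, count=3):
    """Return True if ping exits with status 0, i.e. at least one reply came back."""
    result = subprocess.run(
        ping_command(host, count),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Call responds_to_ping() with the address of the locked-up box from another 
host on the same network while the hang is in progress.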

Rob
-- 
"One of my most productive days was throwing away 1000 lines of code."
  - Ken Thompson.


