Crash after successfully mounting root filesystem over NFS and starting busybox

Sun Dec 18 21:19:58 UTC 2005

Thanks for your response, Rob.

> -----Original Message-----
> From: Rob Landley [mailto:rob at landley.net]
> Sent: Sunday, December 18, 2005 10:48 PM
> To: busybox at busybox.net
> Cc: Gil Madar
> Subject: Re: Crash after successfully mounting root filesystem over NFS
> and starting busybox
> 
> 
> On Sunday 18 December 2005 08:56, Gil Madar wrote:
> > Hi All,
> >
> > I dumped part of the output, kept in the log_buf.
> >
> > Please note the MPC866 receives an exception during access to the dual port
> > ram...
> 
> Translation: kernel problem.
> 
> > This, of course leads to another exception, and so on.
> > The exception is during early stages of running busybox.
> 
> So it sounds like we're triggering it, but is it our bug? 
> 
> Have you tried init=/bin/sh?  (Using bash or some such?)  See if the bug
> happens without busybox even involved.
> 
> > If I run a
> > while ( 1 ) schedule();
> > instead of loading root filesystem over NFS, the target responds to pings
> > without any problem.
> 
> The target no longer responding to pings means the kernel has curled up into a
> ball and is no longer responding to interrupts.
> 
> > I'm clueless whether this is a setup problem, or a software problem.
> 
> It's a kernel problem.
> 
> > <4>Freeing unused kernel memory: 92k init
> > <6>init() line 782 /bin/busybox
> 
> I'm not finding a busybox error message that looks anything like this in
> busybox/init/*, is it some kind of debug info you compiled in?  Line 782 of
> init.c isn't in a function called "init()" (we haven't got a function called
> that, we've got init_main() though).  It's an "else" in halt_signal(), which
> we'd only reach if you're trying to shut down already, so would not be the
> first problem no matter what was wrong.
I should have mentioned that I added debug code to the kernel's init/main.c;
that line comes from an innocent printk() there, not from busybox.
Also, I am using DENX's 2.6.14 kernel, loaded from U-Boot 1.1.3.

> 
> > <4>Oops: kernel access of bad area, sig: 11 [#1]
> > <4>Oops: kernel access of bad area, sig: 11 [#2]
> 
> Kernel trap, sig 11.  I believe that's unhandled memory exception?
> 
> > <4>Call trace: [00000000]  [c0011148]  [c0011714]  [c001153c]  [c0011338]
> > [c00033d4]  [c0009f38]  [c0002f20]  [c018f964]  [c019d52c]  [c018f9b4]
> > [c01a941c]  [c01ab380]  [c01c6b14]  [c01c6f30]
> 
> You have no debug info so who knows what went wrong (you could try to look
> that up in System.map if you're bored), but the 00000000 looks a bit like a
> jump to a null pointer.
According to the info in log_buf and System.map, the second and all later
exceptions are generated during access to the MPC866 dual port ram, when
trying to write to the serial port, which is ttyCPM0 in my case. Until then,
writing to the serial port worked without any problem.
Is there anything I have to configure in the busybox-generated filesystem,
such as /etc/inittab, in order to use the console on my serial port, which is
the MPC866's SMC1?
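
For anyone following along, the System.map lookup mentioned above can be
sketched roughly as follows. This is only an illustration in Python: the
sample map and its symbol names are hypothetical (made up for the example, not
taken from my real System.map); only the addresses come from the call trace.
Each address is resolved to the nearest symbol at or below it.

```python
import bisect

def parse_system_map(text):
    """Parse System.map-style text ("address type name") into a sorted list."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        addr, _sym_type, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols

def lookup(symbols, addr):
    """Return (name, offset) for the nearest symbol at or below addr, else None."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None  # address lies below every known symbol
    base, name = symbols[i]
    return name, addr - base

# Hypothetical sample; a real run would read the System.map of the booted kernel.
SAMPLE_MAP = """\
c0002000 T hypothetical_entry
c0011000 T hypothetical_driver_write
c0011400 t hypothetical_helper
"""

symbols = parse_system_map(SAMPLE_MAP)
for addr in (0x00000000, 0xc0011148, 0xc001153c):
    print(f"{addr:08x}", lookup(symbols, addr))
```

An address that resolves to nothing, like the leading 00000000, is consistent
with the jump through a null pointer that Rob suspects. Note that the
System.map must match the exact kernel image that produced the trace.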

> 
> > <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
> 
> And it happened in an interrupt handler.  Beautiful.  That has _nothing_ to do
> with us.
> 
> > <4> <0>Rebooting in 180 seconds..Oops: kernel access of bad area, sig: 11
> > [#3]
> 
> Invalid memory access in kernel space.
> 
> We may be triggering it, but it would be kind of difficult for that one to be
> our bug.  (Maybe on a nommu system where kernel space is writeable from user
> space busybox could have _corrupted_ the kernel and then the kernel would
> have gone boing.  But we'd see such an illegal access on other platforms,
> where it was possible to debug it.)
The MPC866 has no RTC at all. Does it make a difference to busybox?

> 
> Rob
> -- 
> Steve Ballmer: Innovation!  Inigo Montoya: You keep using that word.
> I do not think it means what you think it means.
> 
Thanks for your time, Rob!