No subject


Wed Apr 23 14:39:25 UTC 2008


It's failing in a function which starts at 0x0000000008064ed3 (directly
after put()). Which function is that? Do "make procps/nmeter.s" and
"objdump -dr procps/nmeter.o" and post both results please, the function
will be visible there. 

---------------------------------------------------------------------- 
 nuclearcat - 06-28-08 16:04  
---------------------------------------------------------------------- 
I am now using 1.11.0 (it crashes too):
[  218.330465] nmeter[1733]: segfault at 0 ip 0806488d sp bf9d8680 error 4 in busybox[8048000+76000]

0806486e <collect_info>:
0806489a <collect_time>:

Looks like here:
Disassembly of section .text.collect_info:

00000000 <collect_info>:
   0:   53                      push   %ebx
   1:   89 c3                   mov    %eax,%ebx
   3:   83 ec 08                sub    $0x8,%esp
   6:   a1 00 00 00 00          mov    0x0,%eax
                        7: R_386_32     ptr_to_globals
   b:   80 30 01                xorb   $0x1,(%eax)
   e:   eb 14                   jmp    24 <collect_info+0x24>
  10:   8b 43 08                mov    0x8(%ebx),%eax
  13:   e8 fc ff ff ff          call   14 <collect_info+0x14>
                        14: R_386_PC32  .text.put
  18:   83 ec 0c                sub    $0xc,%esp
  1b:   53                      push   %ebx
  1c:   ff 53 04                call   *0x4(%ebx)
---->  1f:   8b 1b                   mov    (%ebx),%ebx
  21:   83 c4 10                add    $0x10,%esp
  24:   85 db                   test   %ebx,%ebx
  26:   75 e8                   jne    10 <collect_info+0x10>
  28:   59                      pop    %ecx
  29:   5b                      pop    %ebx
  2a:   5b                      pop    %ebx
  2b:   c3                      ret

I also managed to run gdb there.
With compiler optimizations disabled everything is fine; it does not crash.

If I enable compiler optimization:

(gdb) run nmeter "CPU %c IO %b MEM %[mf]"
Starting program: /home/root/busybox_unstripped nmeter "CPU %c IO %b MEM %[mf]"

Program received signal SIGSEGV, Segmentation fault.
collect_info (s=0x0) at procps/nmeter.c:753
753     procps/nmeter.c: No such file or directory.
        in procps/nmeter.c

        while (s) {
                put(s->label);
                s->collect(s);
                s = s->next; <<<--- here
        }
}


(gdb) up
#1  0x0806543c in nmeter_main (argc=2, argv=0xbfd6b368) at procps/nmeter.c:861
861     in procps/nmeter.c

        // Generate first samples but do not print them, they're bogus
        collect_info(first); <--- here 861
        reset_outbuf(); 
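
For readers following the disassembly, the loop can be sketched as standalone C. This is only an illustration: the struct name, the `samples` counter and the helper functions are hypothetical and not taken from nmeter.c. The field order is inferred from the offsets in the listing above (`(%ebx)` is next, `0x4(%ebx)` is collect, `0x8(%ebx)` is label on 32-bit x86).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for nmeter's s_stat. Field order follows the
 * offsets seen in the disassembly: next at 0, collect at 4, label at 8
 * (on 32-bit x86). */
typedef struct stat_node {
    struct stat_node *next;
    void (*collect)(struct stat_node *s);
    const char *label;
    int samples;                /* illustrative counter, not in nmeter */
} stat_node;

/* Example collector: just counts how often it was called. */
void count_sample(stat_node *s)
{
    s->samples++;
}

/* Walk the list the same way collect_info() does and return the
 * number of nodes visited. */
int collect_all(stat_node *s)
{
    int visited = 0;
    while (s) {
        assert(s->collect != NULL);
        s->collect(s);          /* call *0x4(%ebx) */
        s = s->next;            /* mov (%ebx),%ebx: the faulting load */
        visited++;
    }
    return visited;
}
```

If s is already NULL when `s = s->next` executes, the load reads address 0, which is consistent with the "segfault at 0" in the kernel log.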

---------------------------------------------------------------------- 
 vda - 06-28-08 16:13  
---------------------------------------------------------------------- 
What does it print when you add this?

        while (s) {
                put(s->label);
                s->collect(s);
bb_error_msg("s:%p s->next:%p", s, s->next);
                s = s->next;
        } 

---------------------------------------------------------------------- 
 nuclearcat - 06-28-08 16:31  
---------------------------------------------------------------------- 
I am not able to trigger the bug with the added line.

Output:
Router-Dora ~ # ./busybox_unstripped nmeter "CPU %c IO %b MEM %[mf]"
nmeter: s:0x80c2078 s->next:0x80c20d8
nmeter: s:0x80c20d8 s->next:0x80c2100
nmeter: s:0x80c2100 s->next:(nil)
nmeter: s:0x80c2078 s->next:0x80c20d8
nmeter: s:0x80c20d8 s->next:0x80c2100
nmeter: s:0x80c2100 s->next:(nil)
CPU ii........ IO    0    0 MEM 1.9g
nmeter: s:0x80c2078 s->next:0x80c20d8
nmeter: s:0x80c20d8 s->next:0x80c2100
nmeter: s:0x80c2100 s->next:(nil)
CPU ii........ IO    0    0 MEM 1.9g
nmeter: s:0x80c2078 s->next:0x80c20d8
nmeter: s:0x80c20d8 s->next:0x80c2100
nmeter: s:0x80c2100 s->next:(nil)
CPU ii........ IO    0    0 MEM 1.9g 

---------------------------------------------------------------------- 
 vda - 06-28-08 16:38  
---------------------------------------------------------------------- 
We might have uninitialized ->next. I replaced xmalloc's with xzalloc's,
please try attached 3.patch 
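
The idea behind the xmalloc to xzalloc change can be sketched in plain C. The helper below is a hypothetical stand-in written with calloc, not busybox's xzalloc itself, and the node type is made up for the example. It relies on all-bits-zero being a null pointer, which holds on mainstream platforms.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct node {
    struct node *next;
    int value;
} node;

/* Hypothetical helper in the spirit of xzalloc: return zero-filled
 * memory or exit on failure. */
void *zalloc_or_die(size_t size)
{
    void *p = calloc(1, size);
    if (!p) {
        fputs("out of memory\n", stderr);
        exit(1);
    }
    return p;
}

/* Append a node through the link that currently ends the list.
 * n->next is never assigned here: it is NULL only because the
 * allocation was zero-filled. With plain malloc it would be garbage. */
node *append(node **tail_link, int value)
{
    node *n;
    assert(tail_link != NULL);
    n = zalloc_or_die(sizeof(*n));
    n->value = value;
    *tail_link = n;
    return n;
}

/* Count nodes; terminates only if the last ->next is really NULL. */
int list_length(const node *n)
{
    int len = 0;
    for (; n; n = n->next)
        len++;
    return len;
}
```

With a non-zeroing allocator, list_length() could walk off the end of the list, which is the failure mode the patch is meant to rule out.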

---------------------------------------------------------------------- 
 nuclearcat - 06-28-08 16:55  
---------------------------------------------------------------------- 
Maybe some gcc optimization is causing this?

Diff of the nmeter assembly, with the debug line added (-) versus the default (+):
--- VAR1        2008-06-29 02:36:05.000000000 +0300
+++ VAR2        2008-06-29 02:37:24.000000000 +0300
@@ -5,26 +5,19 @@
    6:   a1 00 00 00 00          mov    0x0,%eax
                         7: R_386_32     ptr_to_globals
    b:   80 30 01                xorb   $0x1,(%eax)
-   e:   eb 24                   jmp    34 <collect_info+0x34>
+   e:   eb 14                   jmp    24 <collect_info+0x24>
   10:   8b 43 08                mov    0x8(%ebx),%eax
   13:   e8 fc ff ff ff          call   14 <collect_info+0x14>
                         14: R_386_PC32  .text.put
   18:   83 ec 0c                sub    $0xc,%esp
   1b:   53                      push   %ebx
   1c:   ff 53 04                call   *0x4(%ebx)
-  1f:   83 c4 0c                add    $0xc,%esp
-  22:   ff 33                   pushl  (%ebx)
-  24:   53                      push   %ebx
-  25:   68 0a 00 00 00          push   $0xa
-                        26: R_386_32    .rodata.str1.1
-  2a:   e8 fc ff ff ff          call   2b <collect_info+0x2b>
-                        2b: R_386_PC32  bb_error_msg
-  2f:   8b 1b                   mov    (%ebx),%ebx
-  31:   83 c4 10                add    $0x10,%esp
-  34:   85 db                   test   %ebx,%ebx
-  36:   75 d8                   jne    10 <collect_info+0x10>
-  38:   59                      pop    %ecx
-  39:   5b                      pop    %ebx
-  3a:   5b                      pop    %ebx
-  3b:   c3                      ret
+  1f:   8b 1b                   mov    (%ebx),%ebx
+  21:   83 c4 10                add    $0x10,%esp
+  24:   85 db                   test   %ebx,%ebx
+  26:   75 e8                   jne    10 <collect_info+0x10>
+  28:   59                      pop    %ecx
+  29:   5b                      pop    %ebx
+  2a:   5b                      pop    %ebx
+  2b:   c3                      ret
 Disassembly of section .text.collect_time:

If I change -Os to -O0 in Makefile.flags, it works fine:

ifneq ($(CONFIG_DEBUG),y)
CFLAGS += $(call cc-option,-Os,) <<--- here
else

 

---------------------------------------------------------------------- 
 nuclearcat - 06-28-08 16:53  
---------------------------------------------------------------------- 
I tested the patch; it doesn't help.
It crashes in the same place.

  LINK    busybox_unstripped
Trying libraries: crypt m
 Library crypt is needed
 Library m is needed
Final link with: crypt m
sunfire-1 busybox-1.11.0 # scp busybox_unstripped root@XXX.XXX.XXX.XXX:
root@XXX.XXX.XXX.XXX's password:
busybox_unstripped                            100%  565KB 565.3KB/s   00:00
sunfire-1 busybox-1.11.0 # cat procps/nmeter.c|grep xzalloc
        SET_PTR_TO_GLOBALS(xzalloc(sizeof(G))); \
        s_stat *s = xzalloc(sizeof(s_stat));
        cpu_stat *s = xzalloc(sizeof(*s));
        s->bar = xzalloc(sz+1);
        /*s->bar[sz] = '\0'; - xzalloc did it */
        int_stat *s = xzalloc(sizeof(*s));
        ctx_stat *s = xzalloc(sizeof(*s));
        blk_stat *s = xzalloc(sizeof(*s));
        fork_stat *s = xzalloc(sizeof(*s));
        if_stat *s = xzalloc(sizeof(*s));
        mem_stat *s = xzalloc(sizeof(*s));
        swp_stat *s = xzalloc(sizeof(*s));
        fd_stat *s = xzalloc(sizeof(*s));
        time_stat *s = xzalloc(sizeof(*s));
                        /*s->next = NULL; - all initXXX funcs use xzalloc */
                /*s->next = NULL; - all initXXX funcs use xzalloc */

So it is applied; I made sure by rebuilding from scratch.

 

---------------------------------------------------------------------- 
 nuclearcat - 06-28-08 17:08  
---------------------------------------------------------------------- 
Also, if I use gcc-3.4.5-gentoo it is OK too (even with -Os).
The bug triggers when I use 4.1.2-gentoo. I will try rebuilding gcc 4.1.2
with the vanilla flag, but I suspect it is something new that was added in
the gcc 4 optimizations.

(Sorry, going to sleep; I will be online again in 8 hours if required.)

 

---------------------------------------------------------------------- 
 vda - 06-28-08 17:20  
---------------------------------------------------------------------- 
Please produce nmeter.s with gcc 3.x.x and 4.x.x and attach both to the bug.

---------------------------------------------------------------------- 
 nuclearcat - 06-29-08 06:11  
---------------------------------------------------------------------- 
Files attached.
The binary produced by gcc 4.3.1 crashes too.

---------------------------------------------------------------------- 
 vda - 06-29-08 07:31  
---------------------------------------------------------------------- 
Please double check - does this line really make the bug disappear?

        while (s) {
                put(s->label);
                s->collect(s);
bb_error_msg("s:%p s->next:%p", s, s->next);
                s = s->next;
        }

If so, try simpler modifications, like:

        while (s) {
                put(s->label);
                s->collect(s);
write(2, "before\n", 7);
                s = s->next;
write(2, "after\n", 6);
        }

What I'm trying to verify with the above fragment: does it _really_ crash
on the s = s->next line? For now, I am not 100% sure.

 

---------------------------------------------------------------------- 
 nuclearcat - 06-29-08 10:02  
---------------------------------------------------------------------- 
With gcc 4.3.1 it crashes on the first variant too.

Router-Dora ~ # ./busybox_unstripped nmeter "CPU %c MEM %[mf] IO %b"
nmeter: s:0x80c2078 s->next:0x80c20d8
Segmentation fault

[1198888.680599] busybox_unstrip[6483]: segfault at 0 ip 0806500f sp bf839c8c error 4 in busybox_unstripped[8048000+76000]


in busybox_unstripped
08064fed <collect_info>:

means collect_info + offset 0x22
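
The offset is simply the faulting ip from the kernel log minus the start address of collect_info in the linked binary. A small sketch of that arithmetic follows; the function name is made up for illustration, and it assumes the binary is loaded at its link address, as the log line suggests.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of a faulting instruction inside a function, given the
 * absolute ip from the kernel log and the function's start address
 * from the symbol listing of the linked binary. */
uint32_t fault_offset(uint32_t fault_ip, uint32_t func_start)
{
    assert(fault_ip >= func_start);
    return fault_ip - func_start;
}
```

For this crash, fault_offset(0x0806500f, 0x08064fed) gives 0x22. The earlier 1.11.0 crash works out the same way: fault_offset(0x0806488d, 0x0806486e) gives 0x1f, the instruction marked with the arrow in the first listing.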

then in nmeter.o

00000000 <collect_info>:
   0:   53                      push   %ebx
   1:   89 c3                   mov    %eax,%ebx
   3:   83 ec 08                sub    $0x8,%esp
   6:   a1 00 00 00 00          mov    0x0,%eax
                        7: R_386_32     ptr_to_globals
   b:   80 30 01                xorb   $0x1,(%eax)
   e:   eb 24                   jmp    34 <collect_info+0x34>
  10:   8b 43 08                mov    0x8(%ebx),%eax
  13:   e8 fc ff ff ff          call   14 <collect_info+0x14>
                        14: R_386_PC32  .text.put
  18:   83 ec 0c                sub    $0xc,%esp
  1b:   53                      push   %ebx
  1c:   ff 53 04                call   *0x4(%ebx)
  1f:   83 c4 0c                add    $0xc,%esp
  22:   ff 33                   pushl  (%ebx) THIS?
  24:   53                      push   %ebx
  25:   68 83 00 00 00          push   $0x83
                        26: R_386_32    .rodata.str1.1
  2a:   e8 fc ff ff ff          call   2b <collect_info+0x2b>
                        2b: R_386_PC32  bb_error_msg
  2f:   8b 1b                   mov    (%ebx),%ebx
  31:   83 c4 10                add    $0x10,%esp
  34:   85 db                   test   %ebx,%ebx
  36:   75 d8                   jne    10 <collect_info+0x10>
  38:   5b                      pop    %ebx
  39:   58                      pop    %eax
  3a:   5b                      pop    %ebx
  3b:   c3                      ret
Disassembly of section .text.nmeter_main:


What I found by modifying collect_info as follows:
static void collect_info(s_stat *s)
{
        gen ^= 1;
        while (s) {
                put(s->label);
                bb_error_msg("msg1 label %s",s->label);
                bb_error_msg("s:%p s->next:%p", s, s->next);
                s->collect(s);
                bb_error_msg("msg2");
                //bb_error_msg("s:%p s->next:%p", s, s->next);
                bb_error_msg("s:%p", s);
                s = s->next;
        }
}
Router-Dora ~ # ./busybox_unstripped nmeter "CPU %c MEM %[mf] IO %b"
nmeter: msg1 label CPU
nmeter: s:0x80c2078 s->next:0x80c20d8
nmeter: msg2
nmeter: s:0x80c2078
nmeter: msg1 label  MEM
nmeter: s:0x80c20d8 s->next:0x80c20f0
nmeter: msg2
nmeter: s:(nil)
Segmentation fault



So s became NULL while collecting meminfo? Then of course s->next cannot be
read from it.

My meminfo now
Router-Dora ~ # cat /proc/meminfo
MemTotal:      2076508 kB
MemFree:       1900396 kB
Buffers:           620 kB
Cached:          51900 kB
SwapCached:          0 kB
Active:          12264 kB
Inactive:        43976 kB
HighTotal:     1179584 kB
HighFree:      1121704 kB
LowTotal:       896924 kB
LowFree:        778692 kB
SwapTotal:           0 kB
SwapFree:            0 kB
Dirty:               0 kB
Writeback:           0 kB
AnonPages:        3720 kB
Mapped:           3068 kB
Slab:           112572 kB
SReclaimable:     3668 kB
SUnreclaim:     108904 kB
PageTables:        172 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
WritebackTmp:        0 kB
CommitLimit:   1038252 kB
Committed_AS:     8836 kB
VmallocTotal:   114680 kB
VmallocUsed:      2232 kB
VmallocChunk:   112356 kB 

---------------------------------------------------------------------- 
 nuclearcat - 06-29-08 10:21  
---------------------------------------------------------------------- 
I think it is not related to memory, because as I remember it sometimes
happens with other nmeter parameters.

---------------------------------------------------------------------- 
 nuclearcat - 06-29-08 10:47  
---------------------------------------------------------------------- 
I was wrong.
It is related: looking back at my previous tests, the format string always
contained %[mf].

---------------------------------------------------------------------- 
 vda - 06-30-08 00:40  
---------------------------------------------------------------------- 
s is a local variable and cannot be accessed by called functions.
In fact, in your case s is in the %ebx register.

  1b: 53             push %ebx
  1c: ff 53 04       call *0x4(%ebx)    s->collect(s)
  1f: 83 c4 0c       add $0xc,%esp
  22: ff 33          pushl (%ebx)  <=   s->next
  24: 53             push %ebx     <=   s (is in %ebx)
  25: 68 83 00 00 00 push $0x83
                        26: R_386_32 .rodata.str1.1
  2a: e8 fc ff ff ff call 2b <collect_info+0x2b>
                        2b: R_386_PC32 bb_error_msg
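
At the C level, the point can be sketched as follows. The names are made up for illustration: a callee gets its own copy of the pointer, so nothing it does to that copy can change the caller's s.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    struct node *next;
} node;

/* The callee receives a copy of the pointer; assigning to it changes
 * only that copy. */
void callee_resets_its_copy(node *s)
{
    s = NULL;
    (void)s;
}

/* The caller's own s is unchanged after the call. */
node *caller_keeps_its_local(node *s)
{
    node *before = s;
    callee_resets_its_copy(s);
    assert(s == before);
    return s;
}
```

So well-defined C code in s->collect() cannot turn the caller's s into NULL. On x86, a callee that uses %ebx saves it on its stack and restores it before returning, so the value could still come back wrong if that saved slot were overwritten. Whether that is what happens here has not been established in this thread.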


