series of ctrl-c makes ssh session hang

Denys Vlasenko vda.linux at googlemail.com
Thu Feb 2 10:11:40 UTC 2017


On Thu, Feb 2, 2017 at 9:21 AM, Ronny Meeus <ronny.meeus at gmail.com> wrote:
> Hello
>
> I'm seeing an issue that an ssh session hangs after generating a
> series of ctrl-c keystrokes.
>
> Environment description:
> ---------------------------------------------------
> - multiple board chassis, with network connectivity from the
> controller board (non-linux).
> - the slave boards:
>    * are running Linux (3.12.37)
>    * have a PPC CPU (P4080)
>    * busybox version 1.25 (but I also tested 1.26.2)
>    * dropbear version: v2016.74
> - SSH access to the slave board is done by relaying a TCP connection
> from the controller board
>     to the slave. So there is a small application running on the
> controller board, which reads a
>     socket on the IP stack and forwards this TCP stream to the slave
> board (and vice versa).
>
> Test scenario:
> ---------------------------------------------------
> - open an ssh session to the slave board (via the TCP relay).
> - execute "while true; do find /; done". So this command keeps on
> sending a lot of data.
> - Repeat the 2 steps above 2 times so that in total 3 sessions are running.
> - Press control-c continuously in one of the sessions and observe that
> the "sh" process is blocked.
>
> The issue is not reproducible every time, only about 1 out of 3 times, and the
> probability increases with the
> number of sessions opened in parallel.
> If I try the same scenario on another system with direct IP
> connectivity to Linux (so no tcp_relay),
> the issue is not observed. So it might be that the tcp_relay (and the
> delay/buffering it introduces)
> has an impact.


If tcp_relay is "a small application running on the
controller board", then without seeing its sources
it's impossible to diagnose the problem.


> Analysis of the problem:
> ---------------------------------------------------
> Once the session is blocked I execute:
>
> # pstree -p 2066
> dropbear(2066)---sh(2078)
>
> When pressing enter in the ssh session I see for dropbear:
> # strace -p 2066
> strace: Process 2066 attached
> _newselect(8, [3 5 7], [], NULL, {3516, 826061}) = 1 (in [5], left
> {3512, 787390})
> clock_gettime(0x6 /* CLOCK_??? */, {327, 955324187}) = 0
> read(5, ";\235\21\332\365\210T\200X}\230\"\306.\363\221", 16) = 16
> read(5, "\2\345\252\274\24Y\253\21\316>}\266\fU\20259\324\254Tu\3534\0238bMXzV\274\270",
> 32) = 32
> clock_gettime(0x6 /* CLOCK_??? */, {327, 955324187}) = 0
> writev(7, [{iov_base="\r", iov_len=1}], 1) = 1
> clock_gettime(0x6 /* CLOCK_??? */, {327, 956324197}) = 0
> _newselect(8, [3 5 7], [], NULL, {3600, 0}) = 1 (in [7], left {3599, 999987})
> clock_gettime(0x6 /* CLOCK_??? */, {327, 956324197}) = 0
> read(7, "\r\n", 16375)                  = 2
> clock_gettime(0x6 /* CLOCK_??? */, {327, 956324197}) = 0
> writev(5, [{iov_base="\231\310\271\315\354\243\342\271\22,\325Tj\n\356\345\"t\332d\205\317.\213\376\200\274h\201\347$\324"...,
> iov_len=48}], 1) = 48
> clock_gettime(0x6 /* CLOCK_??? */, {327, 957324207}) = 0
> _newselect(8, [3 5 7], [], NULL, {3600, 0}^Cstrace: Process 2066 detached
>
> While the sh process is not printing any additional traces. So this
> process is completely blocked:
> /isam/slot_default/run # strace -p 2078
> strace: Process 2078 attached
> futex(0xffed598, FUTEX_WAIT_PRIVATE, 2, NULL
>
>
> Connecting a debugger to the system (sh pid 2078) shows that the only
> thread the process has is blocked
> on a mutex in the C library.
>
> (gdb) info threads
>   Id   Target Id         Frame
> * 1    Thread 2078       0x1003d0ec in putprompt (s=<optimized out>)
> at shell/ash.c:2455
> (gdb) bt
> #0  0x0ff5c708 in __lll_lock_wait_private (futex=0xffed598
> <main_arena>) at ../nptl/sysdeps/unix/sysv/linux/lowlevellock.c:31
> #1  0x0fef07a8 in *__GI___libc_free (mem=<optimized out>) at malloc.c:3714
> #2  0x1003d0ec in putprompt (s=<optimized out>) at shell/ash.c:2455
> #3  setprompt_if (do_set=<optimized out>, whichprompt=<optimized out>)
> at shell/ash.c:2501
> #4  0x1003d448 in parsecmd (interact=<optimized out>) at shell/ash.c:12074
> #5  0x1004100c in cmdloop (top=<optimized out>) at shell/ash.c:12215
> #6  0x10042730 in ash_main (argc=<optimized out>, argv=<optimized
> out>) at shell/ash.c:13350

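Looks like a signal interrupted malloc or free, and then the
signal handler longjmp'ed (ash does that by design)
without ever returning to the interrupted malloc or free.
The malloc state is now inconsistent (the backtrace shows
free() waiting on the main_arena lock, which presumably was
left held), so free() in putprompt() deadlocks.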

An INT_OFF/INT_ON pair, which guards code that must not be
interrupted like this, is missing somewhere.
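
To illustrate the idea, here is a minimal standalone sketch of the
deferral pattern. It is not the code from shell/ash.c: the names
int_off(), int_on(), on_sigint() and demo_guarded_region() are
hypothetical, and the real INT_OFF/INT_ON macros differ in detail.
The sketch only shows the principle: while a guard is active, the
SIGINT handler records the interrupt instead of jumping, and the
jump happens once the guarded region (here a malloc/free pair) has
finished.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names, loosely modeled on the INT_OFF/INT_ON idea. */
static volatile sig_atomic_t suppress_int; /* nesting depth of guarded regions */
static volatile sig_atomic_t pending_int;  /* a SIGINT arrived while guarded */
static sigjmp_buf main_loop;               /* where an interrupt unwinds to */

static void on_sigint(int signo)
{
    (void)signo;
    if (suppress_int > 0) {
        pending_int = 1;      /* defer: we may be inside malloc/free */
        return;
    }
    /* Unguarded path: only safe if no malloc/free can be in progress. */
    siglongjmp(main_loop, 1);
}

static void int_off(void)
{
    suppress_int++;
}

static void int_on(void)
{
    assert(suppress_int > 0); /* catch unbalanced int_off/int_on pairs */
    if (--suppress_int == 0 && pending_int) {
        pending_int = 0;
        siglongjmp(main_loop, 1); /* deliver the deferred interrupt now */
    }
}

/*
 * Returns 1 if the interrupt was delivered only after the guarded
 * region completed, 0 if it cut the region short, -1 on setup failure
 * or if no interrupt was delivered at all.
 */
static int demo_guarded_region(void)
{
    static volatile int region_done; /* static: value survives siglongjmp */
    struct sigaction sa;
    char *p;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGINT, &sa, NULL) != 0)
        return -1;

    region_done = 0;
    suppress_int = 0;
    pending_int = 0;
    if (sigsetjmp(main_loop, 1) != 0)
        return region_done;  /* reached via siglongjmp */

    int_off();
    p = malloc(64);
    raise(SIGINT);           /* simulated ctrl-c in the middle of the region */
    free(p);
    region_done = 1;
    int_on();                /* jumps back to sigsetjmp above */
    return -1;               /* only reached if no SIGINT was pending */
}
```

Calling demo_guarded_region() from a small main() should return 1,
since the simulated ctrl-c is held back until free() has returned.
Without the int_off()/int_on() pair, the handler would jump away
from wherever the signal happened to land.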

