series of ctrl-c makes ssh session hang

Denys Vlasenko vda.linux at googlemail.com
Fri Feb 3 11:58:51 UTC 2017


On Thu, Feb 2, 2017 at 3:25 PM, Ronny Meeus <ronny.meeus at gmail.com> wrote:
>>> When I press Enter in the ssh session, I see this for dropbear:
>>> # strace -p 2066
>>> strace: Process 2066 attached
>>> _newselect(8, [3 5 7], [], NULL, {3516, 826061}) = 1 (in [5], left {3512, 787390})
>>> clock_gettime(0x6 /* CLOCK_??? */, {327, 955324187}) = 0
>>> read(5, ";\235\21\332\365\210T\200X}\230\"\306.\363\221", 16) = 16
>>> read(5, "\2\345\252\274\24Y\253\21\316>}\266\fU\20259\324\254Tu\3534\0238bMXzV\274\270", 32) = 32
>>> clock_gettime(0x6 /* CLOCK_??? */, {327, 955324187}) = 0
>>> writev(7, [{iov_base="\r", iov_len=1}], 1) = 1
>>> clock_gettime(0x6 /* CLOCK_??? */, {327, 956324197}) = 0
>>> _newselect(8, [3 5 7], [], NULL, {3600, 0}) = 1 (in [7], left {3599, 999987})
>>> clock_gettime(0x6 /* CLOCK_??? */, {327, 956324197}) = 0
>>> read(7, "\r\n", 16375)                  = 2
>>> clock_gettime(0x6 /* CLOCK_??? */, {327, 956324197}) = 0
>>> writev(5, [{iov_base="\231\310\271\315\354\243\342\271\22,\325Tj\n\356\345\"t\332d\205\317.\213\376\200\274h\201\347$\324"..., iov_len=48}], 1) = 48
>>> clock_gettime(0x6 /* CLOCK_??? */, {327, 957324207}) = 0
>>> _newselect(8, [3 5 7], [], NULL, {3600, 0}^Cstrace: Process 2066 detached
>>>
>>> Meanwhile, the sh process prints no additional traces, so it is
>>> completely blocked:
>>> /isam/slot_default/run # strace -p 2078
>>> strace: Process 2078 attached
>>> futex(0xffed598, FUTEX_WAIT_PRIVATE, 2, NULL
>>>
>>>
>>> Connecting a debugger to the system (sh pid 2078) shows that the
>>> process's only thread is blocked on a mutex in the C library.
>>>
>>> (gdb) info threads
>>>   Id   Target Id         Frame
>>> * 1    Thread 2078       0x1003d0ec in putprompt (s=<optimized out>) at shell/ash.c:2455
>>> (gdb) bt
>>> #0  0x0ff5c708 in __lll_lock_wait_private (futex=0xffed598 <main_arena>) at ../nptl/sysdeps/unix/sysv/linux/lowlevellock.c:31
>>> #1  0x0fef07a8 in *__GI___libc_free (mem=<optimized out>) at malloc.c:3714
>>> #2  0x1003d0ec in putprompt (s=<optimized out>) at shell/ash.c:2455
>>> #3  setprompt_if (do_set=<optimized out>, whichprompt=<optimized out>) at shell/ash.c:2501
>>> #4  0x1003d448 in parsecmd (interact=<optimized out>) at shell/ash.c:12074
>>> #5  0x1004100c in cmdloop (top=<optimized out>) at shell/ash.c:12215
>>> #6  0x10042730 in ash_main (argc=<optimized out>, argv=<optimized out>) at shell/ash.c:13350
>>
>> It looks like a signal interrupted malloc() or free(), and the
>> signal handler then longjmp'ed away (ash does this by design)
>> without ever returning to malloc() or free(). The malloc state
>> is now inconsistent (in the backtrace, free() is waiting on the
>> main_arena lock), so the free() in putprompt() deadlocks.
>>
>> Somewhere, an INT_OFF/INT_ON pair guarding code which must not
>> be interrupted like this is missing.
>
> Interesting info, thanks.
>
> How do we go about identifying the place in the code?

I guess by code review and experiments. For example,
try adding "INT_OFF;" and "INT_ON;" around this
code block:


# if ENABLE_FEATURE_TAB_COMPLETION
                line_input_state->path_lookup = pathval();
# endif
                reinit_unicode_for_ash();
                nr = read_line_input(line_input_state, cmdedit_prompt,
                                buf, IBUFSIZ, timeout);
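
For readers unfamiliar with ash, here is a minimal sketch of what an INT_OFF/INT_ON pair is meant to achieve, assuming a single-threaded program and SIGINT only. The names suppress_int, pending_int, raise_interrupt and deferred_interrupt_demo are hypothetical and only loosely modeled on ash's scheme; this is not code from shell/ash.c. While interrupts are off, the SIGINT handler merely records the signal, and the longjmp is postponed until the matching INT_ON, so a malloc() or free() in between always runs to completion. Here raise(SIGINT) stands in for a Ctrl-C arriving mid-section.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical state, loosely modeled on ash's interrupt handling. */
static volatile sig_atomic_t suppress_int; /* INT_OFF nesting depth */
static volatile sig_atomic_t pending_int;  /* SIGINT seen while suppressed */
static sigjmp_buf *interrupt_target;       /* where an interrupt unwinds to */

/* Unwind to the top level, as a shell does on Ctrl-C. */
static void raise_interrupt(void)
{
    assert(interrupt_target != NULL);
    pending_int = 0;
    siglongjmp(*interrupt_target, 1);
}

static void on_sigint(int signo)
{
    (void)signo;
    if (suppress_int)
        pending_int = 1;   /* critical section in progress: defer */
    else
        raise_interrupt(); /* nothing fragile in progress: unwind now */
}

#define INT_OFF do { suppress_int++; } while (0)
#define INT_ON do { \
        if (--suppress_int == 0 && pending_int) \
            raise_interrupt(); \
    } while (0)

/* Returns 1 if a SIGINT raised between INT_OFF and INT_ON was held back
 * until INT_ON, after the heap work had finished; 0 otherwise. */
static int deferred_interrupt_demo(void)
{
    sigjmp_buf top;
    struct sigaction sa, old;
    volatile int heap_work_done = 0;
    int result = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGINT, &sa, &old);
    interrupt_target = &top;

    if (sigsetjmp(top, 1) == 0) {
        INT_OFF;
        char *p = malloc(64);
        raise(SIGINT);            /* simulated Ctrl-C during heap work */
        free(p);
        heap_work_done = 1;
        INT_ON;                   /* the jump happens here, not in raise() */
    } else {
        result = heap_work_done;  /* reached via siglongjmp */
    }

    sigaction(SIGINT, &old, NULL);
    interrupt_target = NULL;
    return result;
}
```

Called once, deferred_interrupt_demo() should return 1: the handler only sets pending_int, free() completes, and the siglongjmp back to sigsetjmp() happens at INT_ON. Without the pair, on_sigint() in this sketch would jump straight out of whatever was running, including the middle of malloc(), which is the failure mode suspected in the backtrace.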


> Does this not mean that before all library calls we need to make sure
> signals are disabled?

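Not all library calls, only some: those that hold locks or leave shared
state half-updated if abandoned midway, such as malloc() and free().
For example, read() or strlen() can be interrupted and longjmp'ed away
with no ill effects.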
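
One hypothetical way to turn that distinction into an experiment is to audit only the calls that need guarding. The sketch below routes malloc()/free() through a debug wrapper that logs any call made outside an INT_OFF/INT_ON pair, and deliberately leaves calls like strlen() alone. The macros MALLOC/FREE, the counter unguarded_heap_calls and the simplified INT_ON (which checks nesting but does not deliver pending interrupts) are assumptions for illustration, not busybox code.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical INT_OFF nesting depth. Pending-interrupt delivery is
 * omitted because this sketch only audits call sites. */
static volatile sig_atomic_t suppress_int;
static unsigned long unguarded_heap_calls;

#define INT_OFF do { suppress_int++; } while (0)
#define INT_ON  do { assert(suppress_int > 0); suppress_int--; } while (0)

/* Count and report a heap call made while SIGINT could still longjmp. */
static void note_if_unguarded(const char *what, const char *file, int line)
{
    if (suppress_int == 0) {
        unguarded_heap_calls++;
        fprintf(stderr, "%s:%d: %s() outside INT_OFF/INT_ON\n",
                file, line, what);
    }
}

static void *debug_malloc(size_t n, const char *file, int line)
{
    note_if_unguarded("malloc", file, line);
    return malloc(n);
}

static void debug_free(void *p, const char *file, int line)
{
    note_if_unguarded("free", file, line);
    free(p);
}

/* Only lock-holding, state-mutating calls are routed through the audit;
 * strlen() or read() are not, since abandoning them is harmless. */
#define MALLOC(n) debug_malloc((n), __FILE__, __LINE__)
#define FREE(p)   debug_free((p), __FILE__, __LINE__)
```

In a test build, the stderr lines would list candidate call sites to review. The limitation is that it only sees allocations routed through these macros, not ones made inside other library functions (stdio buffers, for instance), so it narrows the search rather than proving nothing is missing.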

