Ash + telnetd: telnet client hangs after exit

Mon Oct 15 13:09:14 UTC 2007

On Friday 12 October 2007 03:34, Alexander Kriegisch wrote:
> BB 1.7.2 and many old versions down to 1.4.x, I don't know exactly.
> Platform: mipsel

Are you able to reproduce it on i386?

> On my embedded system there is BB running telnetd. The shell is ash. A
> client logs in and works until he types "exit". After that the session
> is terminated, but the telnet client hangs waiting for - I don't know
> what. I can reproduce it with several telnet clients. It does not happen
> with Dropbear sshd, so I strongly assume it is a BB problem.
> 
> Interestingly, the session exits cleanly if I call "setconsole -r"
> during the telnet session.
>
> I have attached two sets of log files generated like this:
> 
> > strace -p 1186 -p 742 -f -ff -s 100 -o /ash-telnet-ok.log
> 
> PID 742 is telnetd, 1131/1186 are ash, respectively. The ash protocols
> are just there for completeness, they are identical. The telnetd logs
> differ, though.

Let's see.

ash-telnet-ok.log.742

742   _newselect(6, [3 4 5], [], NULL, NULL) = 1 (in [5])
742   read(5, "\r\n", 2022) = 2
742   _newselect(6, [3 4 5], [4], NULL, NULL) = 1 (out [4])
742   write(4, "\r\n", 2) = 2
742   _newselect(6, [3 4 5], [], NULL, NULL) = 1 (in [5])
742   --- SIGCHLD (Child exited) @ 0 (0) ---

ash has exited, and select returns "fd #5 is ready".

742   read(5, 0x4a7822, 2022)           = -1 EIO (Input/output error)

We read it. "OMG. Error. Probably client is dead, drop it."

742   kill(1186, SIGKILL)               = 0
742   wait4(1186, NULL, 0, NULL)        = 1186
742   close(5)                          = 0
742   close(4)                          = 0
742   close(4)                          = -1 EBADF (Bad file descriptor)

Now we have only the listening socket to wait on:

742   _newselect(4, [3], [], NULL, NULL <unfinished ...>

ash-telnet-bad.log.742

742   _newselect(6, [3 4 5], [], NULL, NULL) = 1 (in [5])
742   read(5, "\r\n", 2022) = 2
742   _newselect(6, [3 4 5], [4], NULL, NULL) = 1 (out [4])
742   write(4, "\r\n", 2) = 2
742   _newselect(6, [3 4 5], [], NULL, NULL) = ? ERESTARTNOHAND (To be restarted)
742   --- SIGCHLD (Child exited) @ 0 (0) ---

ash has exited, but select DOES NOT return "fd #5 is ready"!
Probably because it is NOT ready (i.e. the slave pty
is still held open by some children of ash which are still alive,
but they don't write anything to us (yet?)).
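
To illustrate the theory, here is a minimal sketch (Python on a Linux
host, not BusyBox code) of how a pty master behaves while any copy of
the slave fd is still open. The names demo and master_readable are made
up for this example, and os.dup() merely stands in for a child of ash
that inherited the slave.

```python
import errno
import os
import select

def master_readable(fd, timeout=0.2):
    """True if select() reports the pty master as readable within timeout."""
    ready, _, _ = select.select([fd], [], [], timeout)
    return bool(ready)

def demo():
    master, slave = os.openpty()
    extra = os.dup(slave)   # stand-in for a child of ash holding the slave
    os.close(slave)         # "ash" exits and its own copy goes away

    held = master_readable(master)       # a silent holder remains

    os.close(extra)                      # the last holder goes away
    released = master_readable(master)

    try:
        data = os.read(master, 2022)
        result = "data" if data else "EOF"
    except OSError as e:
        # Report the errno name, e.g. "EIO"
        result = errno.errorcode.get(e.errno, str(e.errno))
    os.close(master)
    return held, released, result
```

On Linux I would expect demo() to report the master as not readable
while the extra fd is open, and as readable with read() failing with EIO
once it is closed, which matches the two traces. Other systems may
signal the hangup as a plain EOF instead.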

We continue to wait:

742   _newselect(6, [3 4 5], [], NULL, NULL <unfinished ...>

> Maybe somebody can find out what is wrong here. It might be a similar
> case to "ash endless loop after ssh client is killed" (April 20-29,
> 2007), I don't know.

Can you verify this theory by looking at ps output? These orphaned
children of ash will show up as children of init (they are reparented
to init because their original parent died).
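
If ps on the target is too limited to show parent pids, the same check
can be done by reading /proc directly. This is only a sketch, assuming
a Linux /proc and that Python is available (on an embedded box it may
well not be); parse_stat and children_of_init are names invented here.
The fields after the command name in /proc/<pid>/stat are state, ppid,
pgrp, session, tty_nr and so on.

```python
import os

def parse_stat(line):
    """Return (pid, comm, ppid, tty_nr) from one /proc/<pid>/stat line."""
    pid_part, rest = line.split(" (", 1)
    # comm may itself contain spaces or parentheses, so split at the LAST ") "
    comm, tail = rest.rsplit(") ", 1)
    fields = tail.split()   # state, ppid, pgrp, session, tty_nr, ...
    return int(pid_part), comm, int(fields[1]), int(fields[4])

def children_of_init():
    """List (pid, comm, ppid, tty_nr) for every process whose parent is pid 1."""
    found = []
    for name in os.listdir("/proc"):
        if not name.isdigit():
            continue
        try:
            with open("/proc/%s/stat" % name) as f:
                info = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue        # process vanished or the line was unreadable
        if info[2] == 1:
            found.append(info)
    return found
```

Regular daemons are children of init too, so the interesting entries
are the ones that appear only after the telnet session was started and
that still have a nonzero tty_nr.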

Do you think it may be useful to add a telnetd switch to close connections
when ash dies?
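
A rough sketch of what such a switch could mean, again in Python rather
than in telnetd itself: the session is dropped as soon as the shell has
been reaped, even if the pty master never becomes readable. The function
run_session and its close_on_shell_exit flag are hypothetical, and the
forked "shell" with its sleeping background job only imitates the
suspected situation.

```python
import os
import select
import signal
import time

def run_session(close_on_shell_exit, max_wait=0.5):
    """Return how the simulated session ended (hypothetical helper)."""
    master, slave = os.openpty()
    pid_r, pid_w = os.pipe()     # carries the background job's pid for cleanup
    shell = os.fork()
    if shell == 0:
        try:
            # Stand-in for ash: start a background job, then exit at once.
            os.close(master)
            os.close(pid_r)
            job = os.fork()
            if job == 0:
                os.close(pid_w)
                time.sleep(30)   # silent job that still holds the slave pty
                os._exit(0)
            os.write(pid_w, str(job).encode())
            os._exit(0)
        finally:
            os._exit(1)          # never fall through into the parent's code
    os.close(slave)
    os.close(pid_w)
    job_pid = int(os.read(pid_r, 32))
    os.close(pid_r)

    outcome = "still waiting"
    shell_reaped = False
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        ready, _, _ = select.select([master], [], [], 0.1)
        if ready:
            outcome = "pty closed"           # the usual EIO/EOF path
            break
        if not shell_reaped:
            pid, _ = os.waitpid(shell, os.WNOHANG)
            shell_reaped = (pid == shell)
        if shell_reaped and close_on_shell_exit:
            outcome = "closed on shell exit"  # the proposed behaviour
            break

    os.close(master)
    os.kill(job_pid, signal.SIGKILL)          # clean up the orphaned job
    if not shell_reaped:
        os.waitpid(shell, 0)
    return outcome
```

With the flag off the loop just keeps waiting, as in the bad trace; with
it on, the session ends once the shell is gone. The cost is that output
from still-running background jobs would no longer reach the client,
which is why it would have to be optional.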
--
vda