Ash + telnetd: telnet client hangs after exit
Ralf Friedl
Ralf.Friedl at online.de
Mon Oct 15 16:03:41 UTC 2007
Denys Vlasenko wrote:
> On Monday 15 October 2007 15:15, Alexander Kriegisch wrote:
>
>> BTW, the "ps" output diff looks like this on i386:
>>
>> $ diff -U0 ps1.txt ps2.txt
>> --- ps1.txt 2007-10-15 16:13:06.000000000 +0200
>> +++ ps2.txt 2007-10-15 16:13:25.000000000 +0200
>> @@ -130 +129,0 @@
>> -root 31704 31643 0 16:10 ? 00:00:00 [login] <defunct>
>>
>
> Aha. You have a login which does not exec shell, it spawns it
> as a child. When shell exits, login exits too.
>
> What looks strange to me is that you see a _zombie_ login.
> It means that it exited, but is not waited for yet.
>
> How come telnetd doesn't see EOF from login's fd? It *exited*,
> and that implicitly closes all fds! I'm puzzled.
>
telnetd doesn't see EOF on the pty side because the client pty is open
by a child of the shell, not the shell itself.
The reason why the process is a zombie is that telnetd will only wait
for the child after sending SIGKILL, and it will only send SIGKILL after
either the network connection or the client side of the pty is closed.
But in this case the problem is that the client pty is not closed.
> (1) can you check PPID of zombie login? Is it 1 or <telnetd's PID>?
>
Therefore the PPID should be that of telnetd. Also, if the PPID were 1
(init), that would mean that init is not reaping its children, which is
unlikely.
> (2) is it possible that you start telnetd so that it inherits
> "ignore SIGCHLD" from the parent? Try adding this line
> after signal(SIGPIPE, SIG_IGN):
>
> signal(SIGPIPE, SIG_IGN);
> + signal(SIGCHLD, SIG_DFL);
>
SIGCHLD should already be SIG_DFL. Maybe you meant SIG_IGN, but ignoring
SIGCHLD should change nothing except that no zombie process is left behind.
> Unrelated note: I looked at telnetd source and tightened up
> some loose ends. Can you test this patch? (I don't think
> it will help with this particular problem, though...)
>
The only change in your patch which seems relevant to this problem is
where you removed the SIGKILL before the wait(). I think this is
dangerous in a single-threaded server. The child shell will probably
eventually clean up and exit, but if that takes some seconds, all other
connections will hang for that time.
As I wrote earlier, the problem is that the shell exits but the client
pty remains open. You can reproduce this as follows:
    telnet server
    # login ...
    # on the server:
    sleep 10 & exit
The telnet client will only exit after sleep terminates.
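The same effect can be shown without telnetd. The following is only a
minimal sketch: it uses a plain pipe in place of the pty, and the function
name seconds_until_eof is made up for illustration. It demonstrates just
the principle that end-of-file depends on the last open descriptor, not on
the process that originally held it.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: fork a "shell" that starts a background child
 * holding the write end for hold_secs seconds and then exits at once.
 * Returns the whole seconds that passed until the reader saw EOF,
 * or -1 on error. */
static long seconds_until_eof(unsigned hold_secs)
{
    int fds[2];
    char c;
    ssize_t n;
    pid_t shell;
    time_t start;

    if (pipe(fds) < 0)
        return -1;
    start = time(NULL);

    shell = fork();
    if (shell < 0)
        return -1;
    if (shell == 0) {
        close(fds[0]);
        if (fork() == 0) {          /* like "sleep N &" */
            sleep(hold_secs);
            _exit(0);
        }
        _exit(0);                   /* like "exit": the shell is gone */
    }

    close(fds[1]);                  /* reader keeps only its own end */
    waitpid(shell, NULL, 0);        /* the shell has exited and is reaped */

    /* Still no EOF here: the background child holds the write end. */
    while ((n = read(fds[0], &c, 1)) > 0)
        ;
    close(fds[0]);
    return n == 0 ? (long)(time(NULL) - start) : -1;
}
```

Calling seconds_until_eof(10) mirrors "sleep 10 & exit": the shell is
reaped immediately, but read() only returns 0 once the background child
is gone.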
SIGCHLD currently has its default disposition (SIG_DFL), which means it is
effectively ignored, so the session is only terminated when the client pty
or the network connection is closed.
Now the question is whether this should be considered a bug or a feature.
If it is considered a bug, the solution is to remove the SIGKILL and the
wait(), install a signal handler for SIGCHLD, and, inside that handler,
find the tsession for the pid and close the session.
Regards
Ralf Friedl