[OT] poll() vs. AIO (was: [PATCH] ash: clear NONBLOCK flag from stdin when in foreground)

Laurent Bercot ska-dietlibc at skarnet.org
Mon Aug 22 06:33:42 UTC 2011


> In a way you're right. Since child processes can get killed
> independently, it's almost always a mistake to "detach" or "disown"
> them and rely on them completing their task. But there are alternate
> ways to do this like using the communication channel (pipe/etc.)
> rather than the pid.

 Yes, there are, but if you're completely ignoring pids, relying on an
EOF to detect child termination, and reducing the SIGCHLD handler to
"just reap whatever died", you have no way of obtaining your children's
exit status.
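
 To make that concrete, here is a rough sketch (arbitrary names, error
handling omitted) of a SIGCHLD handler that reaps with waitpid() but
keeps the (pid, status) pairs around for the main program instead of
throwing them away:

#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

#define MAXDEAD 64

static volatile sig_atomic_t ndead ;
static pid_t dead_pid[MAXDEAD] ;
static int dead_status[MAXDEAD] ;

static void sigchld_handler (int sig)
{
  int e = errno ;   /* waitpid() can clobber errno */
  pid_t pid ;
  int wstat ;
  (void)sig ;
  while ((pid = waitpid(-1, &wstat, WNOHANG)) > 0)
    if (ndead < MAXDEAD)   /* record instead of discarding */
    {
      dead_pid[ndead] = pid ;
      dead_status[ndead] = wstat ;
      ndead++ ;
    }
  errno = e ;
}

 The main program can then, outside of signal context, walk dead_pid
and dead_status and match the pids against whatever it recorded when
it forked.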

 I agree with you on two points about fork():

 - It isn't a very satisfactory API, because the termination notification
(SIGCHLD) is too generic, and identification of the event (wait/waitpid)
is messy. It would be much better if the signal could carry a payload
containing both the pid of the dead process and its exit status, and
could be requeued after examination. Or if a simple EOF could also
carry this information.

 - It requires a convention between the main caller and the libraries;
notably, libraries should never waitpid(), and thus should detect their
subprocesses' termination via EOF, which means that they cannot read an
exit status.

 So, yes, it could definitely have been better designed. Yet, the
interface is usable, and what it accomplishes is well worth the effort.
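
 For what it's worth, a handler installed with SA_SIGINFO does receive
a payload: the siginfo_t carries si_pid and si_status for a dead child.
But SIGCHLD is a classic signal, so pending instances coalesce instead
of queueing, and the waitpid() loop remains necessary. A rough sketch
of the installation (names are mine):

#include <signal.h>

static void on_sigchld (int sig, siginfo_t *info, void *ctx)
{
  (void)sig ; (void)ctx ;
  /* info->si_pid and info->si_status describe one dead child, but
     other deaths may have been merged into this single delivery, so
     a waitpid(-1, &status, WNOHANG) loop is still needed here, as in
     the previous sketch. */
  (void)info ;
}

int main (void)
{
  struct sigaction sa ;
  sa.sa_sigaction = on_sigchld ;
  sa.sa_flags = SA_SIGINFO | SA_RESTART ;
  sigemptyset(&sa.sa_mask) ;
  sigaction(SIGCHLD, &sa, 0) ;
  /* fork() children and run the event loop here */
  return 0 ;
}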


> This requires a global list/global state that is inherently
> incompatible with library use, and just fundamentally bad programming.

 Only subprocesses created by the main program should be listed.
Libraries should not even store the pids of the processes they create
anywhere, except to send them signals.
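
 For illustration, here is a hypothetical library-side helper (all
names made up) that follows this convention: the pid is kept only so
the worker can be signalled, termination is detected by EOF on a pipe,
and waitpid() is never called - reaping belongs to the main program:

#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

struct lib_worker
{
  pid_t pid ;   /* kept only so that lib_worker_kill() can signal it */
  int fd ;      /* read end of the pipe; EOF here = worker is gone */
} ;

int lib_worker_spawn (struct lib_worker *w, void (*run)(int outfd))
{
  int p[2] ;
  if (pipe(p) < 0) return -1 ;
  w->pid = fork() ;
  if (w->pid < 0) { close(p[0]) ; close(p[1]) ; return -1 ; }
  if (!w->pid)              /* child */
  {
    close(p[0]) ;
    run(p[1]) ;             /* writes its results to p[1], then returns */
    _exit(0) ;
  }
  close(p[1]) ;             /* parent keeps only the read end */
  w->fd = p[0] ;
  return 0 ;
}

void lib_worker_kill (struct lib_worker const *w, int sig)
{
  kill(w->pid, sig) ;       /* the only thing the pid is used for */
}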


> Unfortunately pclose and system use waitpid, and it's rather common
> practice. If your library code is not going to use waitpid, you need
> to document that the caller is responsible for handling SIGCHLD and
> reaping zombies.

 Absolutely.


> I'm thinking of an application model where the main event loop creates
> a new thread/process for handling events for which it cannot complete
> the processing immediately, and the possibility that those handlers,
> in turn, might have subtasks which themselves cannot be completed
> immediately. With threads this requires almost no code, unless you're
> doing things horribly wrong (which I admit many people do). With
> forking, it requires painful synchronization back with the main
> process's event loop, and artificially inflates the levels of
> interdependence between modules.

 Indeed, if your model is such that every subtask, including those
created by subtasks, should be handled by the main event loop, then
multiprocessing is certainly not the right implementation. Multiprocessing
would be used here if every subtask were independent enough and could
handle its own set of subtasks without having to share much data or
logic with the main loop. You would then have a process tree rooted in
the main loop; that's quite reasonable if subprocesses are exec()ed into
something with a meaningful API.
 If your subtasks are intricately entangled, however, multiprocessing
will not do; threads are one approach. Another approach would be to
view subtask creation and termination as asynchronous events and simply
write a single-threaded process with a fully asynchronous loop.
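
 A rough sketch of the latter (hypothetical and incomplete): a
poll()-based loop in which each subtask is just a readable fd - data
on the fd is an event from the subtask, EOF is its termination:

#include <poll.h>
#include <unistd.h>

#define MAXTASKS 64

/* fds[0..n-1] are the read ends of the subtasks' pipes, n <= MAXTASKS.
   Data on an fd is an event from that subtask; EOF is its termination.
   Nothing in the loop blocks except poll() itself. */

void event_loop (int *fds, unsigned int n)
{
  struct pollfd pfd[MAXTASKS] ;
  while (n)
  {
    unsigned int i ;
    for (i = 0 ; i < n ; i++)
    {
      pfd[i].fd = fds[i] ;
      pfd[i].events = POLLIN ;
    }
    if (poll(pfd, n, -1) < 0) continue ;    /* EINTR: just retry */
    for (i = 0 ; i < n ; i++)
      if (pfd[i].revents & (POLLIN | POLLHUP))
      {
        char buf[256] ;
        ssize_t r = read(fds[i], buf, sizeof buf) ;
        if (r > 0)
        {
          /* handle the subtask's event; this may spawn new subtasks,
             i.e. add new fds to the set */
        }
        else          /* EOF (or error): the subtask has terminated */
        {
          close(fds[i]) ;
          fds[i] = fds[--n] ;   /* drop it; any pending event on the
                                   swapped-in fd is seen next round */
        }
      }
  }
}

 SIGCHLD can be folded into the same fd set with the usual self-pipe
trick, so the process only ever blocks in poll().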

-- 
 Laurent

