[PATCH] crond: reap orphaned grandchildren to prevent zombie buildup
Nadav Tasher
tashernadav at gmail.com
Sat May 31 13:57:54 UTC 2025
Great catch!
However, since the process status is never used (NULL provided to waitpid),
why not just set the signal handler for SIGCHLD to SIG_IGN?
This would significantly reduce the number of waitpid invocations.
Nadav
On Sat, May 31, 2025 at 11:53:05AM +0800, Valentin Lab wrote:
> This patch fixes a long-standing zombie-process leak in crond that
> appears whenever crond itself is PID 1 (typical in minimal BusyBox
> containers).
>
> Reproducer
> ==========
>
> mkdir /tmp/test-root-crontabs -p
> echo "* * * * * sh -c 'sleep 1 &'" > "/tmp/test-root-crontabs/root"
> sudo chown root:root -R /tmp/test-root-crontabs
>
> ## Run busybox crond PID 1
> docker run -d --rm --name busybox \
> -v /tmp/test-root-crontabs/root:/var/spool/cron/crontabs/root:ro \
> busybox:1.37.0-uclibc \
> crond -f -d 0 > /dev/null
>
> watch -n 1 '
> echo "Expecting more zombies in $((60 - 10#$(date +%S)))s (Ctrl-C to quit):"
> ps -ax -o pid,ppid,stat,comm | egrep "\s+Z\s+"
> '
> echo "Cleaning busybox container"
> docker stop busybox
>
>
> You should see one new "sleep" zombie each minute.
>
> If crond is *not* PID 1 (e.g. started via a wrapper shell), the leak
> does not appear because the wrapper—now PID 1—reaps the orphans.
>
>
> Root cause
> ==========
>
> crond only calls `waitpid()` for PIDs it tracks. Background tasks
> (`sleep 5 &`) become orphans; the kernel re-parents them to PID 1.
> When crond *is* PID 1, it inherits these orphans but never waits for
> them, so they persist as zombies.
>
>
> Fix
> ===
>
> Add a tiny reaper loop:
>
> while (waitpid(-1, NULL, WNOHANG) > 0);
>
> Placed at the end of `check_completions()`, it collects any remaining
> children. On systems where crond is **not** PID 1 the loop returns
> `-ECHILD` immediately, so behaviour and overhead are unchanged.
>
>
> Testing
> =======
>
> * Reproduced the leak on `busybox:1.37.0-uclibc` and current git master.
> * Applied the patch, rebuilt BusyBox statically (with only crond),
> bind-mounted the binary into a fresh container; zombie count stays at
> 0 after X min.
>
> Suggested docker commmand for testing with compiled (static) binary in CWD:
>
> docker run --rm --name busybox \
> -v /tmp/test-root-crontabs/root:/var/spool/cron/crontabs/root:ro \
> -v $PWD/busybox:/bin/crond \
> busybox:1.37.0-uclibc \
> crond -f -d 0
>
> * Verified crond still exits cleanly and runs scheduled jobs normally.
>
> Binary size impact (gcc 13.3.0, x86-64, `-Os`, static): +17 bytes.
>
> Thanks for your time and for maintaining BusyBox!
>
> Regards,
> Valentin Lab
> From 6bd536d80448485ebb5c12cb032286e97ddacb2a Mon Sep 17 00:00:00 2001
> From: Valentin Lab <valentin.lab at kalysto.org>
> Date: Sat, 31 May 2025 09:56:09 +0800
> Subject: [PATCH] crond: reap orphaned grandchildren to prevent zombie buildup
>
> If a cron job launches a background task, e.g. `sh -c "sleep 5 &"`,
> the shell exits immediately and the `sleep` process is re-parented to
> PID 1. When BusyBox `crond` itself happens to be PID 1 (common in a
> minimal container), those orphans become direct children of `crond`.
> Because `crond` only calls waitpid() for the PIDs it explicitly tracks,
> these processes remain forever in Z state and the container slowly
> fills with zombies.
>
> Add a small `while (waitpid(-1, NULL, WNOHANG) > 0)` sweep at the end
> of check_completions() so any stray children are reaped. When `crond`
> is not PID 1 the loop returns -ECHILD immediately, so behaviour and
> overhead on a normal system are unchanged.
>
> Size impact: +12 bytes on x86-64 (gcc 13.3.0 -Os, static)
>
> Signed-off-by: Valentin Lab <valentin.lab at kalysto.org>
> ---
> miscutils/crond.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/miscutils/crond.c b/miscutils/crond.c
> index b3762d327..43d83b474 100644
> --- a/miscutils/crond.c
> +++ b/miscutils/crond.c
> @@ -1001,6 +1001,10 @@ static int check_completions(void)
> /* else: r == 0: "process is still running" */
> file->cf_has_running = 1;
> }
> +
> + /* Reap any other children we don't actively track */
> + while (waitpid(-1, NULL, WNOHANG) > 0);
> +
> //FIXME: if !file->cf_has_running && file->deleted: delete it!
> //otherwise deleted entries will stay forever, right?
> num_still_running += file->cf_has_running;
> --
> 2.43.0
>
> _______________________________________________
> busybox mailing list
> busybox at busybox.net
> https://lists.busybox.net/mailman/listinfo/busybox
More information about the busybox
mailing list