suspected bug in timeout command
David Laight
David.Laight at ACULAB.COM
Sat Feb 12 11:08:14 UTC 2022
From: Raffaello D. Di Napoli
> Sent: 12 February 2022 01:33
>
>
> On 2/11/22 16:22, Rob Landley wrote:
> > On 2/9/22 11:12 AM, Baruch Siach wrote:
> >> Hi Sun,
> >>
> >> On Wed, Feb 09 2022, סאן עמר wrote:
> >>> Hi, I've been using busybox for a while now (v1.29.2), and I had an issue with a SIGTERM being
> >>> sent randomly to a process of mine. I debugged it until I found that it came from a timeout
> >>> process that had earlier been assigned to another process with the same pid (I'm using a lot of
> >>> timeouts for a lot of jobs).
> >>> So I looked at the code in the "timeout.c" file, where it sleeps for 1 second in each iteration
> >>> and then checks the timeout status. I suspect that in the meantime the process being monitored
> >>> by timeout terminates and another one with the same pid has already been created, which causes
> >>> an unwanted timeout.
> >>>
> >>> There is a comment in there saying that sleeping for a "HUGE NUM" would probably result in this
> >>> issue, but I can't see why it won't also occur in the current case.
> >>>
> >>> There is no change to this behaviour in the latest master.
> >>> I would appreciate any help, Sun.
> >> Any reference to a PID number is inherently racy.
> > Not between parent and child.
>
> Except in BB’s timeout, the relationship is not parent/child :)
>
> Much to my surprise, I’ll say that. When I read the bug report the other
> day, I thought to myself well, this one ought to be easy to fix. But no,
> there’s no SIGCHLD to be handled, no relationship between processes to
> be leveraged.
>
> I don’t think this bug can be fixed without a near-complete rewrite, or
> without doing a lot of procfs digging to really validate the waited-on
> process, since kill(pid, 0) only validates a pid, not a process.
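For illustration, here is a minimal C sketch of the liveness check that a poll-every-second monitor like the one described in the report depends on. `pid_exists()` and `watch_then_kill()` are hypothetical helpers showing the general shape, not BusyBox's actual code. The check only answers "does some process hold this number right now?", which is exactly the weakness described above.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: true if *some* process currently holds `pid`.
 * Signal 0 performs the existence and permission checks without
 * delivering anything. A zombie still holds its pid, and a recycled pid
 * now owned by an unrelated process also returns true, so this
 * validates a pid, not a process. */
static bool pid_exists(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return true;
    /* EPERM: a process exists, we just may not signal it. */
    return errno == EPERM;
}

/* Hypothetical, simplified shape of a polling monitor. */
static void watch_then_kill(pid_t pid, int seconds, int sig)
{
    while (seconds-- > 0) {
        sleep(1);
        if (!pid_exists(pid))
            return;             /* target gone: nothing to do */
    }
    kill(pid, sig);             /* may hit a different process with the same pid */
}
```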
And Linux uses a strict 'next free pid' algorithm for new processes,
so there is no guard time between a process exiting and its pid being reused.
This problem was 'fixed' inside the kernel by using a small structure (struct pid)
instead of the pid number itself, but that didn't help userspace (or even some drivers).
By comparison NetBSD uses the high bits of the pid as a 'generation number'
and so guarantees that a pid won't be reused for some time (a few thousand forks).
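The generation-number idea can be illustrated with a toy allocator. This is a hypothetical sketch, not NetBSD's code: the table size, the bit split and the lowest-free-slot policy are all assumptions chosen for brevity. The low bits of a pid select a process-table slot and the high bits count how often that slot has been reused, so a freed slot comes back under a different pid, and a stale pid can be recognised as stale.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustration only, NOT NetBSD's implementation. */
#define SLOT_BITS 4u                 /* hypothetical: 16-slot process table */
#define NSLOTS    (1u << SLOT_BITS)
#define GEN_MASK  0x7FFFu            /* hypothetical: 15 generation bits */

struct slot {
    bool used;
    uint32_t gen;                    /* bumped every time the slot is freed */
};

static struct slot table[NSLOTS];

/* pid = generation in the high bits, slot index in the low bits */
static int32_t make_pid(uint32_t s)
{
    return (int32_t)((table[s].gen << SLOT_BITS) | s);
}

/* Returns a new pid, or -1 if the table is full. */
static int32_t alloc_pid(void)
{
    for (uint32_t s = 0; s < NSLOTS; s++) {
        if (!table[s].used) {
            table[s].used = true;
            return make_pid(s);
        }
    }
    return -1;
}

/* Freeing bumps the generation, so the same number is not handed out
 * again until that slot's generation counter wraps. */
static void free_pid(int32_t pid)
{
    uint32_t s = (uint32_t)pid & (NSLOTS - 1);
    table[s].used = false;
    table[s].gen = (table[s].gen + 1) & GEN_MASK;
}

/* A stale pid no longer matches its slot's current generation. */
static bool pid_is_current(int32_t pid)
{
    uint32_t s = (uint32_t)pid & (NSLOTS - 1);
    return table[s].used && make_pid(s) == pid;
}
```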
You can use the process start time (the starttime field, field 22, of /proc/<pid>/stat)
to validate the process just before the kill().
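A sketch of that start-time check, assuming Linux procfs as documented in proc(5): record the target's starttime when monitoring begins, then compare it again right before signalling. `read_starttime()` and `kill_if_same_start()` are hypothetical names. Because the command name (field 2) may itself contain spaces or ')', parsing starts after the last ')'. The value is in clock ticks, so two processes that reuse a pid within the same tick would also not be told apart.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read field 22 (starttime, clock ticks since boot) of /proc/<pid>/stat.
 * Returns 0 on success, -1 if the process is gone or the line is unparsable. */
static int read_starttime(pid_t pid, unsigned long long *out)
{
    char path[64], buf[2048];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* comm (field 2) may contain spaces or ')': start after the LAST ')'. */
    char *p = strrchr(buf, ')');
    if (!p)
        return -1;
    p++;                                   /* now just before field 3 */
    for (int field = 3; field < 22; field++) {
        p += strspn(p, " ");               /* skip separator */
        p += strcspn(p, " ");              /* skip field */
    }
    char *end;
    unsigned long long v = strtoull(p, &end, 10);
    if (end == p)
        return -1;
    *out = v;
    return 0;
}

/* Signal pid only if it still has the start time recorded earlier. */
static int kill_if_same_start(pid_t pid, unsigned long long expected, int sig)
{
    unsigned long long now;
    if (read_starttime(pid, &now) != 0 || now != expected) {
        errno = ESRCH;  /* gone, or the pid now belongs to another process */
        return -1;
    }
    /* Remaining window: the process can still exit, and its pid be
     * reused, between the check above and this kill(). */
    return kill(pid, sig);
}
```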
That leaves a very small timing window that is hard to avoid
without using pidfd.
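With a pidfd, the wait and the kill refer to the same process object rather than to a number, which closes that window. Below is a minimal sketch, assuming Linux 5.3 or later and invoking the system calls through syscall(2). `sys_pidfd_open()`, `sys_pidfd_send_signal()` and `wait_or_signal()` are hypothetical names. It also replaces the one-second polling with a single poll() on the pidfd, which becomes readable when the process exits. The pidfd still has to be opened while the pid is known to refer to the right process. In a timeout-style fork layout, that would presumably mean opening it before forking the monitor so that the monitor inherits it; this is an assumption, not something proposed in this thread.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: fallback numbers from the generic syscall table (x86-64,
 * arm64 and most others) for headers that predate Linux 5.3. */
#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424
#endif
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif

/* Thin hypothetical wrappers around the raw system calls. */
static int sys_pidfd_open(pid_t pid)
{
    return (int)syscall(SYS_pidfd_open, pid, 0);
}

static int sys_pidfd_send_signal(int pidfd, int sig)
{
    return (int)syscall(SYS_pidfd_send_signal, pidfd, sig, NULL, 0);
}

/* Wait up to timeout_ms for the process behind pidfd to exit.
 * Returns 0 if it exited, 1 if it was sent `sig`, -1 on error.
 * The signal can only reach the process the pidfd was opened on,
 * never a later process that reuses the same pid number. */
static int wait_or_signal(int pidfd, int timeout_ms, int sig)
{
    struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
    int n;

    do {    /* note: the full timeout restarts after EINTR */
        n = poll(&pfd, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);

    if (n < 0)
        return -1;
    if (n > 0)
        return 0;                       /* readable: the process has exited */
    if (sys_pidfd_send_signal(pidfd, sig) == 0)
        return 1;
    return errno == ESRCH ? 0 : -1;     /* it exited and was reaped just now */
}
```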
David
More information about the busybox mailing list