[Buildroot] [PATCH] autobuilder: hung package monitor
Matthew Weber
matt at thewebers.ws
Tue Jan 2 20:04:49 UTC 2018
Thomas, Romain,
On Wed, Dec 27, 2017 at 4:13 PM, Thomas Petazzoni
<thomas.petazzoni at free-electrons.com> wrote:
>
> Hello,
>
> On Mon, 25 Dec 2017 11:41:43 -0600, Matt Weber wrote:
> > @@ -143,6 +143,8 @@ import sys
> > from time import localtime, strftime
> > from distutils.version import StrictVersion
> > import platform
> > +from threading import Thread, Event
> > +import time
>
> Where is time being used? We are already some functions from the "time"
> package above.
Removed, I used it early on but it's no longer required in v2.
>
> > +def stop_on_build_hang(build_done_flag, sub_proc, outputdir, log):
> > + secBetweenCheck = 30
>
> Please don't use camel case names for variables, since we don't use
> such a convention in this script. Even your function is mixing camel
> case naming with lower case + underscore naming.
Bad habit, fixed in v2
>
> > + maxNumOfChecks = (HUNG_BUILD_TIMEOUT * 60) / secBetweenCheck
> > + hungTimeout = maxNumOfChecks
>
> This doesn't look very Pythonic to me. What about having a start time
> taken before the loop, calculate the maximum end time, and then do a
> simple comparison ?
Completely reworked in v2. Thanks for the idea about using mtime, it
was a much cleaner approach
> > + # Setup hung build monitoring thread
> > + build_done_flag = Event()
> > + build_monitor = Thread(target=stop_on_build_hang,
> > + args=(build_done_flag, sub, outputdir, log))
> > + build_monitor.daemon = True
>
> I am not 100% sure of the consequences of this. Are you doing this so
> that the monitor threads are automatically killed when the main thread
> of the autobuild-run program terminates ?
Correct, the main thread owns the lifecycle of the monitor thread when
set as daemon.
>
> Did you test that sending a SIGTERM/SIGINT to autobuild-run continues
> to work properly ?
Yep, verified these work correctly
>
> > + build_monitor.start()
> > +
> > kwargs['buildpid'][kwargs['instance']] = sub.pid
> > ret = sub.wait()
> > kwargs['buildpid'][kwargs['instance']] = 0
> >
> > - # 124 is a special error code that indicates we have reached the
> > - # timeout
> > - if ret == 124:
> > - log_write(log, "INFO: build timed out")
> > + # If build failed, monitor thread would have exited at this point
> > + if not build_monitor.isAlive():
>
> I'm not sure about the synchronization here. The monitor thread kills
> the sub-process doing the build, and then terminates. So it could very
> well that you reach this .isAlive() test at a moment where the
> sub-process doing the build has been killed by the monitor thread, but
> the monitor thread itself hasn't terminated. So we probably need a
> better synchronization mechanism here, such as a status written by the
> monitor thread before it kills the sub-process.
I've added a second Event value to pass back the status of the thread
killing the hung process. Now it isn't dependent on the thread
finishing at a specific point and instead uses a value which is set
before the process.wait() in the main thread releases.
Thanks for the review!
Matt
More information about the buildroot
mailing list