[PATCH] measure execution time of each step in shell scripts

Petra Mazdin petra.mazdin1 at gmail.com
Tue Sep 22 16:19:21 UTC 2015


Dear all,

as a project assignment for my summer internship at CERN, I worked on a way
to profile and optimize the boot process of the CernVM virtual machine
(Linux based virtual appliance optimized for high energy physics
experiments). A key part of the project was the development of a
performance profiler for shell scripts as an extension to the busybox ash.
Based on the measurements, costly shell code was replaced by more
efficient, custom C programs.


So, the idea is to take start and end time separately for each executed
step. In order to avoid wrong interpretation between different commands,
each time pid (process ID), start/end time and exact command are taken so
that it was possible to analyze them by the same pid and calculate duration.

All data are then saved in a list whose elements differ by its type: start
pid (long int), end pid (long int), start time (long int), end time (long
int) and command (char *).


A part of busybox ash where start time should be taken is right before
dowait() where parent process starts to wait for forked child process to
finish its execution (all in waitforjob()). When ps->ps_status becomes -1,
meaning that a child is finished, exact time stamp is taken as end time.

Also, we need a special treatment for pipelined processes, meaning that it
can happen that  there exist some processes executing in parallel. What
happens is that final time stamps that are proceeded to further analysis
are the ones taken at the moment the last part of a pipeline is executed.
That is, of course, wrong because parts are executed in parallel and any on
them can finish earlier.

In order to avoid that,  ps->ps_status is checked every time. If a process
is not finished, take the start time and if the process is finished, take
the end time. For everything else, just write '-1' as a time stamp.

This way it makes it easier to analyze data later to take the lowest time
stamps and avoid '-1'-s.

Furthermore, not to lose obtained data, they must be saved. That is why,
each time when exiting a shell, in exitshell(), a new .csv file is created
and saved to a path given by the environment variable ASH_SCORES.

When busybox ash intepreted everything it had to, we have a certain number
of .csv files containing all data that can be analyzed using two python
files (attached) to get final results that include command name, total
execution time because some of them were repeated, number of repetitions
and average execution time.

All these modifications are going to be applied only if it is defined by

#define GET_TIME_CONSUMPTION

in the beginning.

Steps are:

1) export ASH_SCORES=/path/to/directory   (to create an environment
variable to save all results)

2) interprete shell scripts using busybox

3) run main.py that invoked method from collect_data.py to analyze time
stamps; final result is saved to 'sortedAll' in the earlier created
ASH_SCORES

A link to the mentioned patch is:

https://github.com/cernvm/busybox/compare/1_23_stable-cernvm...pmazdin:1_23_stable-cernvm.patch

Considering the results obtained after modifications, we think this might
be of general interest for profiling shell scripts. If you have any
comments or feedback, please let us know.


GitHub link to my repository: https://github.com/pmazdin/busybox



Best regards,

Petra Mazdin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.busybox.net/pipermail/busybox/attachments/20150922/6ab63e3c/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: collect_data.py
Type: text/x-python
Size: 5297 bytes
Desc: not available
URL: <http://lists.busybox.net/pipermail/busybox/attachments/20150922/6ab63e3c/attachment.py>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: main.py
Type: text/x-python
Size: 172 bytes
Desc: not available
URL: <http://lists.busybox.net/pipermail/busybox/attachments/20150922/6ab63e3c/attachment-0001.py>


More information about the busybox mailing list