[PATCH] measure execution time of each step in shell scripts

Bernhard Reutner-Fischer rep.dot.nop at gmail.com
Thu Sep 24 10:27:09 UTC 2015


On September 22, 2015 6:19:21 PM GMT+02:00, Petra Mazdin <petra.mazdin1 at gmail.com> wrote:
>Dear all,

>So, the idea is to take a start and an end time separately for each
>executed step. To avoid mixing up different commands, the pid (process
>ID), the start/end time and the exact command are recorded each time, so
>that the entries can later be matched by pid and the duration calculated.
>
>All data are then saved in a list whose elements differ by type: start
>pid (long int), end pid (long int), start time (long int), end time
>(long int) and command (char *).
>
>
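
A minimal sketch of what one such record could look like (the struct and
field names here are illustrative, not taken from the patch):

    /* One timing record per observed event; the patch keeps these in a
     * list and matches start/end entries by pid during the analysis. */
    struct step_record {
        long  start_pid;   /* pid recorded when the step started       */
        long  end_pid;     /* pid recorded when the step finished      */
        long  start_time;  /* stamp taken before waiting for the child */
        long  end_time;    /* stamp taken once the child has finished  */
        char *command;     /* command text for this step               */
    };
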
>The part of busybox ash where the start time should be taken is right
>before dowait(), where the parent process starts to wait for the forked
>child process to finish its execution (all in waitforjob()). When
>ps->ps_status becomes -1, meaning that a child has finished, an exact
>time stamp is taken as the end time.
>
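
As a rough illustration of taking such stamps around the wait
(record_step() and the printed output are made up here; the patch hooks
into ash's own data structures instead):

    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    /* Hypothetical helper: emit one pid/phase/command/time record.
     * The patch keeps these in memory; printing is just for illustration. */
    static void record_step(pid_t pid, const char *phase, const char *cmd)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        /* microsecond stamp; long long avoids overflow on 32-bit longs */
        long long stamp = (long long)ts.tv_sec * 1000000LL + ts.tv_nsec / 1000;
        printf("%ld,%s,%s,%lld\n", (long)pid, phase, cmd, stamp);
    }

    int main(void)
    {
        record_step(getpid(), "start", "echo hello");  /* before the wait */
        /* ... child runs and is reaped here ... */
        record_step(getpid(), "end", "echo hello");    /* once it finished */
        return 0;
    }
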
>Also, pipelined processes need special treatment, because some processes
>may be executing in parallel. Without it, the time stamps passed on to
>the further analysis would be the ones taken when the last part of a
>pipeline is executed. That is, of course, wrong, because the parts run
>in parallel and any of them can finish earlier.
>
>To avoid that, ps->ps_status is checked every time: if a process is not
>finished, the start time is taken; if the process is finished, the end
>time is taken. For everything else, '-1' is written as the time stamp.
>
>This makes it easier to analyze the data later: take the lowest time
>stamps and skip the '-1' entries.
>
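
To make that analysis rule concrete, a minimal sketch (not from the
patch) of picking the lowest valid stamp while skipping the '-1'
placeholders could look like this:

    /* Return the lowest non-negative stamp among those collected for the
     * parallel parts of a pipeline, or -1 if none is usable yet. */
    static long lowest_valid(const long *stamps, int n)
    {
        long best = -1;
        for (int i = 0; i < n; i++) {
            if (stamps[i] < 0)
                continue;                 /* skip the '-1' placeholders */
            if (best < 0 || stamps[i] < best)
                best = stamps[i];
        }
        return best;
    }
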
>Furthermore, the collected data must be saved so they are not lost. That
>is why, each time a shell exits, in exitshell(), a new .csv file is
>created and written to the path given by the environment variable
>ASH_SCORES.
>
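
A standalone sketch of that dump step (the file name and column layout
here are made up, not the patch's actual format):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        const char *dir = getenv("ASH_SCORES");
        if (!dir)
            return 0;                 /* nothing to do without the variable */

        char path[4096];
        /* one file per shell instance; the pid keeps the names unique */
        snprintf(path, sizeof(path), "%s/ash_scores_%ld.csv", dir, (long)getpid());

        FILE *f = fopen(path, "w");
        if (!f)
            return 1;
        /* illustrative columns matching the kind of data described above */
        fprintf(f, "pid,command,start,end\n");
        fprintf(f, "%ld,%s,%lld,%lld\n", (long)getpid(), "echo hello", 0LL, 42LL);
        fclose(f);
        return 0;
    }
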
>Once busybox ash has interpreted everything it had to, we have a number
>of .csv files containing all the data, which can be analyzed with the
>two Python scripts (attached) to get the final results: command name,
>total execution time (since some commands are repeated), number of
>repetitions and average execution time.
>
>All these modifications are applied only if
>
>#define GET_TIME_CONSUMPTION
>
>is defined at the beginning.
>
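
Just to illustrate the mechanism, a self-contained example of such a
compile-time switch (the guard name is the one above; the body is only a
placeholder):

    #include <stdio.h>

    #define GET_TIME_CONSUMPTION   /* drop this line to compile the timing code out */

    int main(void)
    {
    #ifdef GET_TIME_CONSUMPTION
        puts("timing instrumentation compiled in");
    #else
        puts("plain build, no instrumentation");
    #endif
        return 0;
    }
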
>Steps are:
>
>1) export ASH_SCORES=/path/to/directory (creates the environment variable
>that points to where all results are saved)
>
>2) interpret the shell scripts using busybox
>
>3) run main.py, which invokes a method from collect_data.py to analyze
>the time stamps; the final result is saved to 'sortedAll' in the
>previously created ASH_SCORES directory
>
>A link to the mentioned patch is:
>
>https://github.com/cernvm/busybox/compare/1_23_stable-cernvm...pmazdin:1_23_stable-cernvm.patch
>
>Considering the results obtained after these modifications, we think
>this might be of general interest for profiling shell scripts. If you
>have any comments or feedback, please let us know.

Heh, I once did a quick 30-minute hack along those lines:
http://lists.busybox.net/pipermail/busybox/2005-September/050400.html
and following, so I like the idea.
:)


