[Buildroot] [PATCH v2 2/5] support/scripts/pkg-stats: retrieve packages latest version using processes

Victor Huesca victor.huesca at bootlin.com
Fri Jul 19 14:35:53 UTC 2019


The major bottleneck in pkg-stats is the time spent waiting for answers
from remote servers. Two functions involve such communication:
- 'check_package_urls', which checks that package websites are up. It
  is efficient due to the use of process pools, thanks to Matt Weber.
- 'check_package_latest_version', which fetches the latest package
  version from release-monitoring.org. It uses an HTTP connection pool
  but runs sequentially.

This patch extends the use of process pools to
'check_package_latest_version'. The implementation relies on
apply_async's callback to provide per-package progress feedback. To
simplify producing this feedback, this patch introduces the following
helpers:
- 'apply_async': this function simply wraps the Pool method of the same
  name in order to pass additional arguments to the callback. In
  particular, it is used to print the package name in the feedback
  message.
- 'progress_callback': this class eases the definition of a "progress
  feedback function": it creates a callable that keeps track of how
  many times it has been called and prints a custom message.
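
As an illustration only, the interplay between the two helpers can be
sketched as below. This is not the code of the patch: 'ProgressCallback',
'apply_async_with_cb_args', 'fake_lookup' and 'run' are hypothetical names,
'fake_lookup' stands in for the network request, and a ThreadPool is
swapped in for the process Pool purely to keep the sketch self-contained
(both expose the same apply_async(callback=...) interface, and in both
cases the callback runs in the parent process).

```python
from multiprocessing.pool import ThreadPool


class ProgressCallback(object):
    """Callable that counts its invocations and reports progress."""

    def __init__(self, progress_fn, start=0, end=100):
        self._progress_fn = progress_fn
        self._step = start
        self._end = end

    def __call__(self, *args):
        # Forward the current step, the last step and the callback arguments.
        self._progress_fn(self._step, self._end, *args)
        self._step += 1


def apply_async_with_cb_args(pool, func, args=(), callback=None, cb_args=()):
    """Submit func(*args); call callback(result, *cb_args) on completion."""
    def _cb(result):
        callback(result, *cb_args)
    return pool.apply_async(func, args, callback=_cb)


def fake_lookup(name):
    # Hypothetical stand-in for the request to the remote server.
    return (0, "1.0", len(name))


def run(names):
    messages = []

    def report(i, n, result, name):
        messages.append("[%d/%d] Package %s: %s" % (i, n, name, result[2]))

    cb = ProgressCallback(report, 1, len(names))
    pool = ThreadPool(processes=4)  # assumption: threads instead of processes
    results = [apply_async_with_cb_args(pool, fake_lookup, (name,),
                                        callback=cb, cb_args=(name,))
               for name in names]
    # Collect results in submission order, whatever the completion order was.
    versions = [r.get() for r in results]
    pool.close()
    pool.join()
    return versions, messages
```

Progress messages appear in completion order, while the results are still
collected in submission order, so each result can be matched to its package.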

Also make print a function instead of a statement under Python 2, so
that it can be used in lambdas.

This function now runs in ~3 min, versus ~25 min for the sequential
version. Tested on an i7 7500U (2/4 cores/threads @3.5GHz) with 15ms ping.

Note: there has already been work trying to parallelize this function
using threads, but it failed on some configurations [1]. This
implementation relies on a dedicated module already in use in this
script, so such failures are unlikely with this version.

[1] http://lists.busybox.net/pipermail/buildroot/2018-March/215368.html

Signed-off-by: Victor Huesca <victor.huesca at bootlin.com>
---
 support/scripts/pkg-stats | 64 +++++++++++++++++++++++++++++++--------
 1 file changed, 52 insertions(+), 12 deletions(-)

diff --git a/support/scripts/pkg-stats b/support/scripts/pkg-stats
index 77819c4804..08730b8d43 100755
--- a/support/scripts/pkg-stats
+++ b/support/scripts/pkg-stats
@@ -16,6 +16,7 @@
 # along with this program; if not, write to the Free Software
 # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 
+from __future__ import print_function
 import argparse
 import datetime
 import fnmatch
@@ -159,6 +160,37 @@ class Package:
             (self.name, self.path, self.has_license, self.has_license_files, self.has_hash, self.patch_count)
 
 
+class progress_callback:
+    def __init__(self, progress_fn, start=0, end=100):
+        '''
+        Create a callback 'function' whose purpose is to display a progress message.
+
+        :param progress_fn: must take at least 2 arguments representing the current step
+        and the 'end' step.
+        :param start: First step.
+        :param end: Last step.
+        '''
+        self._progress_fn = progress_fn
+        self._cpt = start
+        self._end = end
+
+    def __call__(self, *args):
+        '''
+        Calls progress_fn.
+        '''
+        self._progress_fn(self._cpt, self._end, *args)
+        self._cpt += 1
+
+
+def apply_async(pool, func, args=(), kwds={}, callback=None, cb_args=(), cb_kwds={}):
+    '''
+    Wrapper around `pool.apply_async()` to allow passing arguments to the callback
+    '''
+    # Pass func directly: a lambda wrapping it cannot be pickled for the workers.
+    _cb = lambda res: callback(res, *cb_args, **cb_kwds)
+    return pool.apply_async(func, args, kwds, callback=_cb)
+
+
 def get_pkglist(npackages, package_list):
     """
     Builds the list of Buildroot packages, returning a list of Package
@@ -345,6 +377,14 @@ def release_monitoring_get_latest_version_by_guess(pool, name):
     return (RM_API_STATUS_NOT_FOUND, None, None)
 
 
+def check_package_latest_version_worker(pool, name):
+    """Wrapper to try by distro first, then by guess"""
+    res = release_monitoring_get_latest_version_by_distro(pool, name)
+    if res[0] == RM_API_STATUS_NOT_FOUND:
+        res = release_monitoring_get_latest_version_by_guess(pool, name)
+    return res
+
+
 def check_package_latest_version(packages):
     """
     Fills in the .latest_version field of all Package objects
@@ -360,18 +400,18 @@ def check_package_latest_version(packages):
     - id: string containing the id of the project corresponding to this
       package, as known by release-monitoring.org
     """
-    pool = HTTPSConnectionPool('release-monitoring.org', port=443,
-                               cert_reqs='CERT_REQUIRED', ca_certs=certifi.where(),
-                               timeout=30)
-    count = 0
-    for pkg in packages:
-        v = release_monitoring_get_latest_version_by_distro(pool, pkg.name)
-        if v[0] == RM_API_STATUS_NOT_FOUND:
-            v = release_monitoring_get_latest_version_by_guess(pool, pkg.name)
-
-        pkg.latest_version = v
-        print("[%d/%d] Package %s" % (count, len(packages), pkg.name))
-        count += 1
+    http_pool = HTTPSConnectionPool('release-monitoring.org', port=443,
+                                    cert_reqs='CERT_REQUIRED', ca_certs=certifi.where(),
+                                    timeout=30)
+    worker_pool = Pool(processes=64)
+    cb = progress_callback(
+        lambda i, n, res, name:
+            print("[%d/%d] (version) Package %s: %s" % (i, n, name, res[2])),
+        1, len(packages))
+    results = [apply_async(worker_pool, check_package_latest_version_worker, (http_pool, pkg.name),
+                           callback=cb, cb_args=(pkg.name,)) for pkg in packages]
+    for pkg, r in zip(packages, results):
+        pkg.latest_version = r.get()
 
 
 def calculate_stats(packages):
-- 
2.21.0


