[Buildroot] [PATCH 1/1] support/scripts/pkg-stats: iterate over CVEs in streaming

Thomas De Schampheleire patrickdepinguin+buildroot at gmail.com
Thu Feb 20 20:11:15 UTC 2020


Hi Titouan,

On Thu, 20 Feb 2020 at 19:27, Titouan Christophe
(<titouan.christophe at railnova.eu>) wrote:
>
> The NVD files that are used to build the list of CVEs affecting
> Buildroot packages are quite large (a few hundred MB of JSON),
> and cause the pkg-stats script to have a huge memory footprint
> (a few GB with Python 2.7).
>
> However, because we only need to iterate over CVE items one by one,
> we can process them as a stream (i.e. decoding one CVE at a time
> from the JSON representation). Because the json module from the
> Python standard library does not support such a mode of operation,
> we switch to the third-party package ijson, which is compatible
> with both Python 2 and Python 3.
>
> To run the script with these modifications, one should install
> the ijson Python package. This can be done with pip:
> `pip install ijson`. On Debian-based distributions, this can
> also be done with the apt package manager:
> `apt install python-ijson`.
>
> Signed-off-by: Titouan Christophe <titouan.christophe at railnova.eu>
> ---
>  support/scripts/pkg-stats | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/support/scripts/pkg-stats b/support/scripts/pkg-stats
> index c113cf9606..7721d98459 100755
> --- a/support/scripts/pkg-stats
> +++ b/support/scripts/pkg-stats
> @@ -25,6 +25,7 @@ import re
>  import subprocess
>  import requests  # URL checking
>  import json
> +import ijson
>  import certifi
>  import distutils.version
>  import time
> @@ -231,11 +232,11 @@ class CVE:
>          for year in range(NVD_START_YEAR, datetime.datetime.now().year + 1):
>              filename = CVE.download_nvd_year(nvd_dir, year)
>              try:
> -                content = json.load(gzip.GzipFile(filename))
> +                content = ijson.items(gzip.GzipFile(filename), 'CVE_Items.item')
>              except:
>                  print("ERROR: cannot read %s. Please remove the file then rerun this script" % filename)
>                  raise
> -            for cve in content["CVE_Items"]:
> +            for cve in content:
>                  yield cls(cve['cve'])
>
>      def each_product(self):
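
To illustrate the change quoted above outside of pkg-stats, here is a
minimal sketch of the same streaming pattern. The helper names
(iter_cve_items, make_sample_feed) and the sample CVE IDs are made up for
the example, and the fallback to json.load when ijson is missing is an
assumption of this sketch, not something the patch does:

```python
import gzip
import io
import json

try:
    import ijson  # third-party: pip install ijson
except ImportError:
    ijson = None


def iter_cve_items(fileobj):
    """Yield each element of the top-level "CVE_Items" array."""
    if ijson is not None:
        # Streaming: ijson decodes one array element at a time, so only
        # the current item has to be held in memory.
        for item in ijson.items(fileobj, 'CVE_Items.item'):
            yield item
    else:
        # Fallback for this sketch only (not in the patch): decodes the
        # whole document at once, like the code being replaced.
        for item in json.load(fileobj)["CVE_Items"]:
            yield item


def make_sample_feed():
    """Build a tiny gzipped feed in memory (IDs are invented)."""
    doc = {"CVE_Items": [
        {"cve": {"CVE_data_meta": {"ID": "CVE-0000-0001"}}},
        {"cve": {"CVE_data_meta": {"ID": "CVE-0000-0002"}}},
    ]}
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        gz.write(json.dumps(doc).encode("utf-8"))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    feed = gzip.GzipFile(fileobj=make_sample_feed())
    for item in iter_cve_items(feed):
        print(item["cve"]["CVE_data_meta"]["ID"])
```

As far as I can tell, ijson.items() is lazy, so a corrupt or truncated
file should raise while iterating rather than at the ijson.items() call
itself.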

This is _way_ better. In my test run, watching the top output, resident
memory stayed around 50 MB.
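
For a number that is easier to reproduce than watching top, peak resident
memory can also be read from inside Python. A minimal sketch, assuming a
Unix system (the resource module is not available on Windows); the helper
name peak_rss_mb is hypothetical, and note that ru_maxrss is reported in
kilobytes on Linux but in bytes on macOS:

```python
import resource
import sys


def peak_rss_mb():
    """Return this process's peak resident set size in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports kilobytes, macOS reports bytes.
    divisor = 1024.0 * 1024.0 if sys.platform == "darwin" else 1024.0
    return peak / divisor


if __name__ == "__main__":
    print("peak RSS: %.1f MB" % peak_rss_mb())
```

This gives the peak over the life of the process, not the current value
that top displays, so it is an upper bound on what top would show.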

Reviewed-by: Thomas De Schampheleire <thomas.de_schampheleire at nokia.com>
Tested-by: Thomas De Schampheleire <thomas.de_schampheleire at nokia.com>

