[Buildroot] [PATCH v2 1/2] support/scripts/pkg-stats: add support for CVE reporting

Titouan Christophe titouan.christophe at railnova.eu
Tue Feb 11 11:15:29 UTC 2020


Hello Thomas,

Thanks for your review !

On 2/11/20 11:02 AM, Thomas De Schampheleire wrote:
> Hi Titouan, Thomas,
> 
> El sáb., 8 feb. 2020 a las 22:58, Titouan Christophe
> (<titouan.christophe at railnova.eu>) escribió:
>>
>> From: Thomas Petazzoni <thomas.petazzoni at bootlin.com>
>>
>> This commit extends the pkg-stats script to grab information about the
>> CVEs affecting the Buildroot packages.
>>
> 
> While I was testing and playing around, I found that the when the
> json.gz file is removed, it is not redownloaded.
> This is because here, the json.gz path is returned without checking
> that it exists.
> (I have some patch for this and next suggestions at the end of this mail).

Good catch, that probably happened because I flipped around the conditions.

> 
> During testing I initially got an error in the 2017 file:
> 
> Traceback (most recent call last):
>    File "support/scripts/pkg-stats", line 917, in <module>
>      __main__()
>    File "support/scripts/pkg-stats", line 906, in __main__
>      check_package_cves(args.nvd_path, {p.name: p for p in packages})
>    File "support/scripts/pkg-stats", line 484, in check_package_cves
>      for cve in CVE.read_nvd_dir(nvd_path):
>    File "support/scripts/pkg-stats", line 231, in read_nvd_dir
>      content = json.load(gzip.GzipFile(filename))
>    File "/usr/lib64/python2.7/json/__init__.py", line 287, in load
>      return loads(fp.read(),
>    File "/usr/lib64/python2.7/gzip.py", line 260, in read
>      self._read(readsize)
>    File "/usr/lib64/python2.7/gzip.py", line 314, in _read
>      self._read_eof()
>    File "/usr/lib64/python2.7/gzip.py", line 353, in _read_eof
>      hex(self.crc)))
> IOError: CRC check failed 0x9b7b00d7 != 0x790947a4L

I'm curious to understand how that happened :o

> 
> I'm not sure how this happened, but after removing the file and
> redownloading it got fixed.
> I then tried reproducing such problem by manually messing with the
> json.gz file, and this created another error:
> 
> Traceback (most recent call last):
>    File "support/scripts/pkg-stats", line 919, in <module>
>      __main__()
>    File "support/scripts/pkg-stats", line 908, in __main__
>      check_package_cves(args.nvd_path, {p.name: p for p in packages})
>    File "support/scripts/pkg-stats", line 486, in check_package_cves
>      for cve in CVE.read_nvd_dir(nvd_path):
>    File "support/scripts/pkg-stats", line 233, in read_nvd_dir
>      content = json.load(gzip.GzipFile(filename))
>    File "/usr/lib64/python2.7/json/__init__.py", line 287, in load
>      return loads(fp.read(),
>    File "/usr/lib64/python2.7/gzip.py", line 260, in read
>      self._read(readsize)
>    File "/usr/lib64/python2.7/gzip.py", line 318, in _read
>      uncompress = self.decompress.decompress(buf)
> zlib.error: Error -3 while decompressing: invalid distance too far back
> 
> I suggest a better error handling for such case (see below).

[SNIP]

> If the the passed nvd path does not exist, the script will fail. I
> suggest creating the dir automatically.
> 
> 
> The below changes implement the suggestions above:
> 
> 
> diff --git a/support/scripts/pkg-stats b/support/scripts/pkg-stats
> index 2784a43d05..e8f708537e 100755
> --- a/support/scripts/pkg-stats
> +++ b/support/scripts/pkg-stats
> @@ -27,8 +27,10 @@ import requests  # URL checking
>   import json
>   import certifi
>   import distutils.version
> +import sys
>   import time
>   import gzip
> +import zlib
>   from urllib3 import HTTPSConnectionPool
>   from urllib3.exceptions import HTTPError
>   from multiprocessing import Pool
> @@ -201,10 +203,12 @@ class CVE:
>           print("Getting %s" % url)
>           page_meta = requests.get(url)
>           page_meta.raise_for_status()
> -        if os.path.exists(path_metaf):
> -            # If the meta file already existed, we compare the existing
> -            # one with the data newly downloaded. If they are different,
> -            # we need to re-download the database.
> +        # If the meta file already existed, we compare the existing
> +        # one with the data newly downloaded. If they are different,
> +        # we need to re-download the database.
> +        # If the database does not exist locally, we need to redownload it in
> +        # any case.
> +        if os.path.exists(path_metaf) and os.path.exists(path_jsonf_gz):
>               meta_known = open(path_metaf, "r").read()
>               if page_meta.text == meta_known:
>                   return path_jsonf_gz

=> agreed

> @@ -227,7 +231,13 @@ class CVE:
>           """
>           for year in range(NVD_START_YEAR, datetime.datetime.now().year + 1):
>               filename = CVE.download_nvd_year(nvd_dir, year)
> -            content = json.load(gzip.GzipFile(filename))
> +            try:
> +                content = json.load(gzip.GzipFile(filename))
> +            except (zlib.error, IOError) as e:
> +                print('ERROR: problem reading %s, please remove the
> file and rerun this script.' % filename)
> +                print(e)
> +                sys.exit(1)
> +

I don't find it pythonic to catch the exception then exit. Unless 
there's a really good reason to do so, I'd prefer to let the exception 
bubble up, as it helps diagnosing the issue. Maybe something like:

try:
     content = json.load(gzip.GzipFile(filename))
except:
     # Display an informative message about the problematic file
     print("ERROR: cannot read %s. Please remove the file then rerun 
this script" % filename)
     # Then bubble up the exception
     raise

(Also, exit() is included in the global namespace, so no need to import 
the sys module for that)

>               for cve in content["CVE_Items"]:
>                   yield cls(cve['cve'])
> 
> @@ -892,6 +902,8 @@ def __main__():
>       check_package_latest_version(packages)
>       if args.nvd_path:
>           print("Checking packages CVEs")
> +        if not os.path.exists(args.nvd_path):
> +            os.makedirs(args.nvd_path)

I thought that not creating the directory was a design choice, but this 
indeed makes much more sense.

>           check_package_cves(args.nvd_path, {p.name: p for p in packages})
>       print("Calculate stats")
>       stats = calculate_stats(packages)
> 
> 
> Best regards,
> Thomas
> 

Regards,

Titouan


More information about the buildroot mailing list