[Buildroot] [PATCH 10/19] support: rewrite check-bin-arch in python

Thomas De Schampheleire patrickdepinguin at gmail.com
Tue Jan 8 14:56:49 UTC 2019


El lun., 7 ene. 2019 a las 23:05, Yann E. MORIN
(<yann.morin.1998 at free.fr>) escribió:
>
> The existing check-bin-arch script is written in shell, so it can't make
> use of all the helpers we have in python, especially the parser for the
> package-files lists.
>
> Although this script is relatively clean as it is, it is not totally
> fool-proof either, especially against weird filenames (e.g.
> specially-crafted filenames with \n, or \b, etc...).
>
> Also, shell scripts are often frowned upon but for the most mundane
> processing, and this script is definitely not mundane.
>
> Finally, shell scripts are slow, as all the processing the have to do is
> more often than not done by spawning programs, and that is relatively
> expensive.
>
> Rewrite that script in python.
>
> This allows to do cleaner processing and reuse the package-files list
> parsuer.
>
> There's however a drawback: the script grows substantially, in part
> because of spawning the actual readelf call (there is no ELF parser in
> the standard python library), and because we want to keep backward
> compatible with old pythons that lack proper abstractions like
> subprocess.DEVNULL et al.
>
> Signed-off-by: "Yann E. MORIN" <yann.morin.1998 at free.fr>
> Cc: Thomas De Schampheleire <patrickdepinguin at gmail.com>
> ---
>  support/scripts/check-bin-arch | 205 +++++++++++++++++++++++------------------
>  1 file changed, 113 insertions(+), 92 deletions(-)
>
> diff --git a/support/scripts/check-bin-arch b/support/scripts/check-bin-arch
> index 66b8d89932..d4902163e7 100755
> --- a/support/scripts/check-bin-arch
> +++ b/support/scripts/check-bin-arch
> @@ -1,92 +1,113 @@
> -#!/usr/bin/env bash
> -
> -# List of hardcoded paths that should be ignored, as they may
> -# contain binaries for an architecture different from the
> -# architecture of the target.
> -declare -a IGNORES=(
> -       # Skip firmware files, they could be ELF files for other
> -       # architectures
> -       "/lib/firmware"
> -       "/usr/lib/firmware"
> -
> -       # Skip kernel modules
> -       # When building a 32-bit userland on 64-bit architectures, the kernel
> -       # and its modules may still be 64-bit. To keep the basic
> -       # check-bin-arch logic simple, just skip this directory.
> -       "/lib/modules"
> -       "/usr/lib/modules"
> -
> -       # Skip files in /usr/share, several packages (qemu,
> -       # pru-software-support) legitimately install ELF binaries that
> -       # are not for the target architecture
> -       "/usr/share"
> -
> -       # Skip files in /lib/grub, since it is possible to have it
> -       # for a different architecture (e.g. i386 grub on x86_64).
> -       "/lib/grub"
> -)
> -
> -while getopts p:l:r:a:i: OPT ; do
> -       case "${OPT}" in
> -       p) package="${OPTARG}";;
> -       l) pkg_list="${OPTARG}";;
> -       r) readelf="${OPTARG}";;
> -       a) arch_name="${OPTARG}";;
> -       i)
> -               # Ensure we do have single '/' as separators,
> -               # and that we have a leading and a trailing one.
> -               pattern="$(sed -r -e 's:/+:/:g; s:^/*:/:; s:/*$:/:;' <<<"${OPTARG}")"
> -               IGNORES+=("${pattern}")
> -               ;;
> -       :) error "option '%s' expects a mandatory argument\n" "${OPTARG}";;
> -       \?) error "unknown option '%s'\n" "${OPTARG}";;
> -       esac
> -done
> -
> -if test -z "${package}" -o -z "${pkg_list}" -o -z "${readelf}" -o -z "${arch_name}" ; then
> -       echo "Usage: $0 -p <pkg> -l <pkg-file-list> -r <readelf> -a <arch name> [-i PATH ...]"
> -       exit 1
> -fi
> -
> -exitcode=0
> -
> -# Only split on new lines, for filenames-with-spaces
> -IFS="
> -"
> -
> -while read f; do
> -       for ignore in "${IGNORES[@]}"; do
> -               if [[ "${f}" =~ ^"${ignore}" ]]; then
> -                       continue 2
> -               fi
> -       done
> -
> -       # Skip symlinks. Some symlinks may have absolute paths as
> -       # target, pointing to host binaries while we're building.
> -       if [[ -L "${TARGET_DIR}/${f}" ]]; then
> -               continue
> -       fi
> -
> -       # Get architecture using readelf. We pipe through 'head -1' so
> -       # that when the file is a static library (.a), we only take
> -       # into account the architecture of the first object file.
> -       arch=$(LC_ALL=C ${readelf} -h "${TARGET_DIR}/${f}" 2>&1 | \
> -                      sed -r -e '/^  Machine: +(.+)/!d; s//\1/;' | head -1)
> -
> -       # If no architecture found, assume it was not an ELF file
> -       if test "${arch}" = "" ; then
> -               continue
> -       fi
> -
> -       # Architecture is correct
> -       if test "${arch}" = "${arch_name}" ; then
> -               continue
> -       fi
> -
> -       printf 'ERROR: architecture for "%s" is "%s", should be "%s"\n' \
> -              "${f}" "${arch}" "${arch_name}"
> -
> -       exitcode=1
> -done < <( sed -r -e "/^${package},\.(.+)$/!d; s//\1/;" ${pkg_list} )
> -
> -exit ${exitcode}
> +#!/usr/bin/env python
> +
> +import argparse
> +import os
> +import re
> +import subprocess
> +import sys
> +from brpkgutil import parse_pkg_file_list as parse_pkg_file_list
> +
> +
> +# List of hardcoded paths that should be ignored, as they may contain
> +# binaries for an architecture different from the architecture of the
> +# target
> +IGNORES = {
> +    # Skip firmware files
> +    # They could be ELF files for other architectures.
> +    '/lib/firmware',
> +    '/usr/lib/firmware',
> +
> +    # Skip kernel modules
> +    # When building a 32-bit userland on 64-bit architectures, the
> +    # kernel and its modules may still be 64-bit. To keep the basic
> +    # check-bin-arch logic simple, just skip these directories
> +    '/lib/modules',
> +    '/usr/lib/modules',
> +
> +    # Skip files in /usr/share
> +    # Several packages (qemu, pru-software-support) legitimately install
> +    # ELF binaries that are not for the target architecture
> +    '/usr/share',
> +
> +    # Skip files in /lib/grub
> +    # It is possible to have it for a different architecture (e.g. i386
> +    # grub on x86_64)
> +    '/lib/grub',
> +}
> +
> +
> +ERROR = 'ERROR: architecture for "%s" is "%s", should be "%s"\n'
> +
> +
> +def main():
> +    parser = argparse.ArgumentParser()
> +    parser.add_argument('--package', '-p', metavar='PACKAGE', required=True)
> +    parser.add_argument('--pkg-list', '-l', metavar='PKG_LIST', required=True)
> +    parser.add_argument('--readelf', '-r', metavar='READELF', required=True)
> +    parser.add_argument('--arch', '-a', metavar='ARCH', required=True)
> +    parser.add_argument('--ignore', '-i', metavar='PATH', action='append')

I think the above 'metavar' options are not necessary, as they are the
default value.

> +    args = parser.parse_args()
> +
> +    if args.ignore is not None:
> +        # Ensure we do have single '/' as separators, and that we have a
> +        # leading and a trailing one, then append to the global list.
> +        for pattern in args.ignore:
> +            IGNORES.add(re.sub('/+', '/', '/{0}/'.format(pattern)))
> +

Note that the global list itself does not have trailing slashes. This
seems inconsistent to me.

> +    ignores_re = set()
> +    for i in IGNORES:
> +        ignores_re.add(re.compile(i))
> +
> +    arch_re = re.compile('^  Machine: +(.+)')
> +
> +    target_dir = os.environ['TARGET_DIR']
> +
> +    exit_code = 0
> +    for record in parse_pkg_file_list(args.pkg_list):
> +        if record['pkg'] != args.package:
> +            continue
> +
> +        fpath = record['file']
> +
> +        ignored = False
> +        for i in ignores_re:
> +            if i.match(fpath):
> +                ignored = True
> +                break
> +        if ignored:
> +            continue

We can never get into this if, right?, because there is already a
break whenever ignored is set to True.

Note that I think that performance will be better with a list
comprehension instead of explicit for's. Something like (untested):

valid_records = [ record for record in parse_pkg_file_list(args.pkg_list)
            if record['pkg'] == args.package
            and not any(ignore_re.match(record['file']) for ignore_re
in ignores_re) ]
for record in valid_records:
    ...


You can normally swap the [ ] for ( ) to reduce memory consumption
(generator iso list comprehension, yielding one entry at a time).

You can argue about readability, but for me this is not less readable
than the expanded version.

/Thomas


More information about the buildroot mailing list