[Buildroot] [PATCH 11/19] support: introduce new format for packages-file-list files
Thomas De Schampheleire
patrickdepinguin at gmail.com
Tue Jan 8 15:07:31 UTC 2019
On Mon, 7 Jan 2019 at 23:06, Yann E. MORIN
(<yann.morin.1998 at free.fr>) wrote:
>
> The existing format for the packages-file-list files has two drawbacks:
>
> - it is not very resilient against filenames with weird characters,
> like \n or \b or spaces...
>
> - it is not easily expandable, partly because of the above.
>
> As such, introduce a new format for those files that solves both
> issues.
>
> First, we must find a way to unambiguously separate fields. There is
> exactly one byte that can never occur in filenames or package names:
> the NUL character (\0). So, we use that as the field separator.
>
> Second, we must find a way to unambiguously separate records. Except for
> \0, any byte may occur in a filename, so no single non-NUL byte can
> serve as a record separator. The only other field we currently have is
> the package name, which we know does not contain any weird byte (it is
> basically limited to [[:alnum:]_-]), so it never starts with \n. A
> solution is thus to use \0\n as the record separator.
>
> Third, we must ensure that filenames never interfere with our
> separators. By making the filename the first field, we can be sure that
> it is properly terminated by a field separator, and that any leading \n
> does not combine with a preceding field separator to form a spurious
> record separator.
>
> So, the new format is now (without spaces):
>
> filename \0 package-name \0\n
>
> Update the parser accordingly.
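To make the format described above concrete, here is a minimal Python 3 sketch of a round trip. It is not the code from this patch: `encode_record` and `decode_records` are hypothetical names, and the sample filenames are made up to exercise spaces, commas and newlines.

```python
FIELD_SEP = b"\x00"      # NUL can never appear in a filename or package name
RECORD_SEP = b"\x00\n"   # field separator immediately followed by a newline


def encode_record(filename, pkg):
    # Serialise one record as: filename, NUL, package name, NUL, newline.
    if FIELD_SEP in filename or FIELD_SEP in pkg:
        raise ValueError("fields must not contain NUL bytes")
    return filename + FIELD_SEP + pkg + RECORD_SEP


def decode_records(data):
    # Every record ends with RECORD_SEP, so the last split element is empty
    # and is dropped here.
    for rec in data.split(RECORD_SEP)[:-1]:
        filename, pkg = rec.split(FIELD_SEP)
        yield filename, pkg


# Hypothetical records, including filenames that broke the old format.
records = [
    (b"./usr/bin/foo", b"foo"),
    (b"./etc/name with spaces", b"bar"),
    (b"\n./leading-newline", b"baz"),
    (b"./embedded\nnewline,and-comma", b"host-qux"),
]
blob = b"".join(encode_record(f, p) for f, p in records)
assert list(decode_records(blob)) == records
```

The round trip only holds because the package name never starts with \n; a field that could start with a newline would have to come with its own escaping.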
>
> Signed-off-by: "Yann E. MORIN" <yann.morin.1998 at free.fr>
> ---
> package/pkg-generic.mk | 8 ++++++--
> support/scripts/brpkgutil.py | 28 ++++++++++++++++++++++++----
> 2 files changed, 30 insertions(+), 6 deletions(-)
>
> diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
> index 7daea190a6..d261b5bf76 100644
> --- a/package/pkg-generic.mk
> +++ b/package/pkg-generic.mk
> @@ -59,6 +59,10 @@ GLOBAL_INSTRUMENTATION_HOOKS += step_time
>
> # The suffix is typically empty for the target variant, for legacy backward
> # compatibility.
> +# Files are record-formatted, with \0\n as record separator, and \0 as
> +# field separator. A record is made of these fields:
> +# - file path
> +# - package name
> # $(1): package name
> # $(2): base directory to search in
> # $(3): suffix of file (optional)
> @@ -66,8 +70,8 @@ define step_pkg_size_inner
> cd $(2); \
> find . \( -type f -o -type l \) \
> -newer $@_before \
> - -exec printf '$(1),%s\n' {} + \
> - >> $(BUILD_DIR)/packages-file-list$(3).txt
> + |sed -r -e 's/$$/\x00$(1)\x00/' \
> + >> $(BUILD_DIR)/packages-file-list$(3).txt
> endef
>
> define step_pkg_size
> diff --git a/support/scripts/brpkgutil.py b/support/scripts/brpkgutil.py
> index d15b18845b..f6ef4b3dca 100644
> --- a/support/scripts/brpkgutil.py
> +++ b/support/scripts/brpkgutil.py
> @@ -5,6 +5,26 @@ import sys
> import subprocess
>
>
> +# Read the binary-opened file object f with \0\n separated records (aka lines).
> +# Highly inspired by:
> +# https://stackoverflow.com/questions/19600475/how-to-read-records-terminated-by-custom-separator-from-file-in-python
> +def _readlines0n(f):
> +    buf = b''
> +    while True:
> +        newbuf = f.read(1048576)
I would find 1024 * 1024 more readable.
> +        if not newbuf:
> +            if buf:
> +                yield buf
> +            return
> +        if buf is None:
> +            buf = b''
> +        buf += newbuf
> +        lines = buf.split(b'\x00\n')
> +        for line in lines[:-1]:
> +            yield line
> +        buf = lines[-1]
> +
> +
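For what it's worth, the chunked reader can be checked against separators that straddle a chunk boundary. The sketch below is a Python 3 variant under my own assumptions, not the function from the patch: `read_records` and its `chunk_size` parameter are hypothetical, with the default written as 1024 * 1024.

```python
import io


def read_records(f, chunk_size=1024 * 1024):
    # Yield the \0\n-terminated records of the binary file object f.
    buf = b""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            # End of file: flush an unterminated trailing record, if any.
            if buf:
                yield buf
            return
        buf += chunk
        # All elements but the last are complete records; the last one is
        # carried over, since its separator may still be in the next chunk.
        *complete, buf = buf.split(b"\x00\n")
        yield from complete


# Hypothetical data: two records, the second filename starting with \n.
data = b"./bin/sh\x00busybox\x00\n" + b"\n./odd name\x00foo\x00\n"
expected = [b"./bin/sh\x00busybox", b"\n./odd name\x00foo"]

# Tiny chunk sizes force the \0\n separator to be split across reads.
for size in range(1, len(data) + 1):
    assert list(read_records(io.BytesIO(data), size)) == expected
```

Making the chunk size a parameter is only there so the boundary cases can be exercised with small buffers.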
> # Iterate on all records of the packages-file-list file passed as filename
> # Returns an iterator over a list of dictionaries. Each dictionary contains
> # these keys (others maybe added in the future):
> @@ -12,11 +32,11 @@ import subprocess
> # 'pkg': the last package that installed that file
>  def parse_pkg_file_list(path):
>      with open(path, 'rb') as f:
I now understand why you read as binary.
> -        for rec in f.readlines():
> -            l = rec.split(',0')
I still think this ',0' was wrong.
> +        for rec in _readlines0n(f):
> +            srec = rec.split(b'\x00')
>              d = {
> -                'file': l[0],
> -                'pkg': l[1],
> +                'file': srec[0],
> +                'pkg': srec[1],
and I now see how the swap in a previous commit could go unnoticed in
your testing :-)
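
To illustrate the ',0' remark: the snippet below assumes the old records look like `pkg,path\n`, as written by the removed printf '$(1),%s\n' line, and uses plain strings and made-up sample values for readability.

```python
# Old-style record, as the removed printf line would have written it.
old_record = "busybox,./bin/sh\n"

# ',0' is a two-character separator (comma, then the digit zero), so a
# normal record is not split at all.
no_split = old_record.split(",0")
assert no_split == ["busybox,./bin/sh\n"]

# Splitting on the first comma is what separates the two fields.
on_comma = old_record.rstrip("\n").split(",", 1)
assert on_comma == ["busybox", "./bin/sh"]

# A path that happens to contain ",0" gets cut in the wrong place.
tricky = "busybox,./etc/file,0.conf\n"
wrong_split = tricky.split(",0")
assert wrong_split == ["busybox,./etc/file", ".conf\n"]
```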
/Thomas