[Buildroot] [PATCH 13/15] Makefile: Rsync global {TARGET, HOST}_DIR using exclusion file list

Herve Codina herve.codina at bootlin.com
Fri Jun 25 11:59:51 UTC 2021


Hi,

On Thu, 24 Jun 2021 22:34:10 +0200
"Yann E. MORIN" <yann.morin.1998 at free.fr> wrote:

> Hervé, All,
> 
> Additional revview I forgot previously...
> 
> On 2021-06-24 22:20 +0200, Yann E. MORIN spake thusly:
> > On 2021-06-21 16:11 +0200, Herve Codina spake thusly:  
> > > On a per-package build, rsync final {TARGET,HOST}_DIR using
> > > exclusion file list computed for each packages.
> > > As we rsync only files generated by each packages, we need to
> > > to do this rsync recursively on each dependencies to collect
> > > all needed files. This is done based on existing
> > > <PKG>_FINAL_RECURSIVE_DEPENDENCIES  
> > I am not sure I understand why this is needed...
> > 
> > One thing that comes to mind is speedup. Indeed, now that we can ensure
> > that packages only install new files and don;t change existing files, we
> > can speedup the final aggregation by just handling the new files.
> > 
> > But since we are using hardlinks, the aggregation is rather fast...
> > 
> > Can you expand on the rationale in the commit log when you respin,
> > please?  
> 
> Now I have seen the next patch, it makes sense. Yet, it should be
> explained here nonetheless. ;-)

The next patch just breaks the hardlinks had nothing to do with
the use of <PKG>_FINAL_RECURSIVE_DEPENDENCIES here.

Previously files were collected using a simple rsync and iterating on
$(PACKAGES). The rsync collected all files from current package host or target
so it collected files generated by the current package and the packages it
depends on.

Now, we collect only files generated by the current package and we discard
files generated by the packages it depends on.
Iterating on $(PACKAGES) is no longer enough.

Indeed $(PACKAGES) is the list of packages that are enabled in .config file (ie
selected during 'make menuconfig' or defconfig file)
But some packages (sure for host packages) can have some "hidden" dependencies
by hidden I means not seen or selected by config files.
This "hidden" dependencies are created at Makefile level.

HOST_FOO_DEPENDENCIES = host-bar

We need to collect files from this kind on dependency to generate the final
HOST_DIR.

<PKG>_FINAL_RECURSIVE_DEPENDENCIES contains all makefile level dependencies
a given package depends on.

Finally, we do:
for pkg selected in .config ($(PACKAGES))
        # Add current pkg and packages it depends on at makefile level
        FULL_PACKAGES_AND_DEPENDS += pkg + $(<PKG>_FINAL_RECURSIVE_DEPENDENCIES)
And then we call sort $(FULL_PACKAGES_AND_DEPENDS) to remove duplicates


I can add in the commit log message:
---- 8< ----
Using only packages present in $(PACKAGES) is no longer enough.
Indeed, we collect only files generated for a given package
and $(PACKAGES) contains the list of packages selected in .config file
Some dependencies exists at Makefile level that are no present at .config
file level (at least for host packages).
HOST_FOO_DEPENDENCIES += host-bar
To take into account these dependencies, <PKG>_FINAL_RECURSIVE_DEPENDENCIES
is used in addition to $(PACKAGES).
---- 8< ----

Is that ok for you ?

> 
> [--SNIP--]
> > > +	$(foreach pkg,$(1),\
> > > +		rsync -a --link-dest=$(PER_PACKAGE_DIR)/$(pkg)/$(strip $(2))/ \
> > > +		--filter='merge $($(call UPPERCASE,$(pkg))_DIR)/$(strip $(4))' \
> > > +		$(PER_PACKAGE_DIR)/$(pkg)/$(strip $(2))/ \
> > > +		$(3)$(sep))  
> [--SNIP--]
> > Furthermore, this rsync filter is non-obvious. It would be nice to
> > explain the commit log that the list is already a proper filer, not just
> > a list.
> > Additionally, in the commit that generates that list, the commit log
> > should also mention that a list of filters is created, not a list of
> > filenames.  
> 
> And to begin with, why are we generating an _exclude_ list, rather than
> an _include_ list?
> 
> I.e. rather than rsync everything except what we don't want, why not
> just rsync just what we want, and generate a list of filters about
> exactly just the files installed by a package?
> 
> I.e. in previous patch, crea create '+' filters rather than '-' filters.
> Also, I think using the full-length filter 'include' is nicer than the
> shorter '+'.
> 
> Probably something like:
> 
>     LC_ALL=C comm -1 \
>         $($(PKG)_DIR)/.files-final-rsync$(2).before \
>         $($(PKG)_DIR)/.files-final-rsync$(2).after \
>         | sed -r -e 's/^[^,]+,./include /' \
>         > $($(PKG)_DIR)/.files-final-rsync$(2).exclude_rsync  
> 
> (Totally untested, slippery code, excersise caution.)
> 

Inclusion filter is not so simple ...

In rsync man, we can read:
---- 8< ----
Note  that,  when  using the --recursive (-r) option (which is implied by -a),
every subdir component of every path is visited left to right, with each
directory having a chance for exclusion before its content.  In this way
include/exclude patterns are applied recursively to the pathname of each node in
the filesystem's tree (those  inside  the  transfer).
The exclude patterns short-circuit the directory traversal
stage as rsync finds the files to send.

For  instance,  to include "/foo/bar/baz", the directories "/foo" and
"/foo/bar" must not be excluded.  Excluding one of those parent directories
prevents the examination of its content, cutting off rsync's recursion into
those paths and rendering the include for "/foo/bar/baz" ineffectual (since rsync
can't match something it never sees in the cut-off section of the directory
hierarchy).

The concept path exclusion is particularly important when using a trailing '*' rule.
For instance, this won't work:

           + /some/path/this-file-will-not-be-found
           + /file-is-included
           - *

This  fails because the parent directory "some" is excluded by the '*' rule, so
rsync never visits any of the files in the "some" or "some/path" directories.
One solution is to ask for all directories in the hierarchy to be included by
using a single rule: "+ */" (put it somewhere before the "- *" rule), and perhaps
use the  --prune-empty-dirs  option.
Another solution is to add specific include rules for all the parent dirs that
need to be visited.  For instance, this set of rules works fine:

           + /some/
           + /some/path/
           + /some/path/this-file-is-found
           + /file-also-included
           - *

---- 8< ----


I feel tricky and error prone to use include filter and really simpler to take the
problem on the other side.
Instead of including what we want, exclude what we don't want.

The filter file passed to rsync is as simple as:
  - /foo/bar/not_thisone
  - /foo/not_thisone_too
  - /not_thisone

With this filter file, rsync will include all files that are not mentioned without
any more needs.


Regards,
Hervé

-- 
Hervé Codina, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com



More information about the buildroot mailing list