[patch] Touching tar with tweezers...

Glenn L. McGrath bug1 at iinet.net.au
Tue Dec 13 23:48:29 UTC 2005


On Mon, 29 Aug 2005 23:40:29 -0500
Rob Landley <rob at landley.net> wrote:

> Code removal.  Always fun.
> 
> Does the attached "svn diff" look sane?  It makes filter_accept_reject_list() do all the
> filtering, which means we can shoot archival/libunarchive/filter_accept_all.c and
> filter_accept_list.c in the head.  (This patch doesn't delete them because
> that's a svn action, but with it applied those two files are not used.)
> 
> Along the way, I cleaned up filter_accept_reject_list() itself, and untangled
> the return values.  Returning FALSE to indicate success and TRUE to indicate
> failure is just wrong, but since EXIT_SUCCESS is 0 and EXIT_FAILURE
> is nonzero (because they're program exit codes, not truth values), that's
> exactly what it was doing...
> 
> I left filter_accept_list_reassign() alone because I don't know what to do
> with it (it has nothing to do with filtering), but it's not currently
> connected up to anything.  As a result, this patch breaks dpkg (and dpkg_deb),
> or at least the dynamic filetype detection bit.  The references were just
> commented out with no attempt to make that actually work because I neither use
> it nor understand it.  (Anybody else want to take a swing at this?)
> 
> This is probably the start of more widespread cleanups to the archive stuff
> because, well, I made the mistake of reading some of it...

It's complicated: the filtering function is tied into a function pointer
that is shared by all the unarchiving code.
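A minimal sketch of what "one function pointer shared by all the unarchiving code" means, assuming heavily simplified stand-ins for the busybox structures. The names sketch_handle_t, sketch_accept_all and sketch_process_entry are hypothetical, not busybox's. The filter follows the convention discussed above: EXIT_SUCCESS (0) means accept, EXIT_FAILURE means skip.

```c
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for the busybox structures;
 * the real archive handle carries much more state. */
typedef struct sketch_header {
	const char *name;
} sketch_header_t;

typedef struct sketch_handle sketch_handle_t;

/* Filter callback: EXIT_SUCCESS (0) means "accept this entry",
 * EXIT_FAILURE (nonzero) means "skip it". */
typedef int (*sketch_filter_fn)(sketch_handle_t *handle);

struct sketch_handle {
	sketch_header_t *file_header;
	sketch_filter_fn filter;	/* one pointer, shared by every format */
};

/* Modeled on filter_accept_all: accept any entry that has a name. */
static int sketch_accept_all(sketch_handle_t *handle)
{
	return handle->file_header->name ? EXIT_SUCCESS : EXIT_FAILURE;
}

/* Format-independent step that a tar, ar or cpio header parser could call
 * once per decoded entry. Returns 1 if the entry would be extracted. */
static int sketch_process_entry(sketch_handle_t *handle)
{
	if (handle->filter(handle) == EXIT_SUCCESS) {
		/* ... extract the entry's data here ... */
		return 1;
	}
	/* ... seek past the entry's data here ... */
	return 0;
}
```

Because every format calls through the same pointer, changing the filter for all of them is a single assignment, which is why the return convention has to be the same everywhere.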

tar is by far the most complex filtering, and anything that can work
with tar can be made to fit everywhere else.

Here are some ground rules:
 - If a pattern is specified on the command line, only extract
filenames that match the pattern.
 - If -T was specified, extract files that match the patterns in that
file.
 - If -X was specified, don't extract files that match the patterns in
that file.
 - If --exclude was specified, don't extract files that match the
pattern given as its argument.
 - If a file matches both an accept and a reject pattern, reject it.
 - If any accept pattern doesn't get a match, report it as an error.
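The ground rules above can be sketched as two lists plus a final check, assuming command-line and -T patterns have already been collected into an accept list and -X and --exclude patterns into a reject list. The types and functions here (pattern_set_t, should_extract, report_unmatched) are hypothetical; only fnmatch() is a real POSIX call. Whether an accept pattern counts as "matched" when the same name is also rejected is an assumption of this sketch.

```c
#include <fnmatch.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical accept pattern: command-line and -T patterns in this sketch. */
typedef struct accept_pattern {
	const char *glob;
	int matched;	/* set once any filename has matched this pattern */
} accept_pattern_t;

/* Hypothetical pattern set: -X and --exclude patterns go in reject. */
typedef struct pattern_set {
	accept_pattern_t *accept;
	size_t n_accept;
	const char *const *reject;
	size_t n_reject;
} pattern_set_t;

/* Return 1 if name should be extracted, 0 if not. */
static int should_extract(pattern_set_t *set, const char *name)
{
	int accepted = (set->n_accept == 0);	/* no accept patterns: take all */

	/* Mark every accept pattern that matches. Assumption: a match counts
	 * even if a reject pattern later excludes the same name. */
	for (size_t i = 0; i < set->n_accept; i++) {
		if (fnmatch(set->accept[i].glob, name, 0) == 0) {
			set->accept[i].matched = 1;
			accepted = 1;
		}
	}
	/* Matching both an accept and a reject pattern means reject. */
	for (size_t i = 0; i < set->n_reject; i++)
		if (fnmatch(set->reject[i], name, 0) == 0)
			return 0;
	return accepted;
}

/* Called once the whole archive has been read: report accept patterns
 * that never matched. Returns how many there were (nonzero = error). */
static size_t report_unmatched(const pattern_set_t *set)
{
	size_t missing = 0;
	for (size_t i = 0; i < set->n_accept; i++) {
		if (!set->accept[i].matched) {
			fprintf(stderr, "%s: not found in archive\n", set->accept[i].glob);
			missing++;
		}
	}
	return missing;
}
```

The unmatched-pattern check can only happen after the last header, so it has to live outside the per-entry filter; the fnmatch() flags are left at 0 for simplicity, which lets `*` match `/` as well.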

It may be worth considering performance: filter_accept_reject_list does
two checks for every file in the archive, filter_accept_list does one,
and filter_accept_all just checks that there is a filename.
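To make the cost difference concrete, here is a minimal sketch of the three filter shapes with a hypothetical counter of fnmatch() calls. The functions are simplified illustrations modeled on the three busybox filters, not their actual code; they take plain arrays instead of the real archive handle.

```c
#include <fnmatch.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical counter so the relative cost of each filter is visible. */
static unsigned long match_calls;

/* One list lookup: walk the list, one fnmatch() per pattern tried. */
static int in_list(const char *const *list, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++) {
		match_calls++;
		if (fnmatch(list[i], name, 0) == 0)
			return 1;
	}
	return 0;
}

/* Like filter_accept_all: no pattern matching, just a name check. */
static int accept_all(const char *name)
{
	return name ? EXIT_SUCCESS : EXIT_FAILURE;
}

/* Like filter_accept_list: one lookup per entry. */
static int accept_list(const char *const *acc, size_t n_acc, const char *name)
{
	return in_list(acc, n_acc, name) ? EXIT_SUCCESS : EXIT_FAILURE;
}

/* Like filter_accept_reject_list: a reject lookup, then an accept lookup. */
static int accept_reject_list(const char *const *acc, size_t n_acc,
			      const char *const *rej, size_t n_rej,
			      const char *name)
{
	if (in_list(rej, n_rej, name))
		return EXIT_FAILURE;
	if (n_acc > 0 && !in_list(acc, n_acc, name))
		return EXIT_FAILURE;
	return EXIT_SUCCESS;
}
```

With one pattern in each list, an accepted entry costs zero, one and two fnmatch() calls respectively; with longer lists the cost grows with the number of patterns tried per entry.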

With a tar.gz or tar.bz2 we apply the filtering to the lowest layer,
tar, and skip the compression layer.

A .deb file is an ar archive containing three files: debian-binary,
control.tar.gz and data.tar.gz.

An example of the filtering code in debs is dpkg-deb -e: it parses
the ar layer to find control.tar.gz, then parses control.tar.gz to
filter out just the ./control file. So it goes through an extra stage
compared to a tar.gz.
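The extra stage can be sketched as two nested lookups, assuming a hypothetical in-memory model of a .deb in which each ar member already lists its tar entry names. Nothing is actually parsed or decompressed here; member_t and two_stage_find are illustrative names only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory model of a .deb: ar members, some of which are
 * tarballs whose entry names are listed directly. No real parsing. */
typedef struct member {
	const char *name;			/* ar member name */
	const char *const *tar_entries;		/* entries inside, if a tarball */
	size_t n_tar_entries;
} member_t;

/* Stage 1 filters the ar layer, stage 2 filters the tar layer inside the
 * chosen member. Returns the matching tar entry name, or NULL. */
static const char *two_stage_find(const member_t *members, size_t n,
				  const char *ar_wanted, const char *tar_wanted)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(members[i].name, ar_wanted) != 0)
			continue;	/* ar-layer filter rejects this member */
		/* ... a real implementation would gunzip the member here ... */
		for (size_t j = 0; j < members[i].n_tar_entries; j++)
			if (strcmp(members[i].tar_entries[j], tar_wanted) == 0)
				return members[i].tar_entries[j];
	}
	return NULL;
}
```

For a plain tar.gz only the inner loop would exist; the outer ar-level loop is the extra stage.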

I extended busybox dpkg/dpkg-deb so it could process control.tar.bz2 or
data.tar.bz2 as well.

It is enabled at the bottom of the archiving section:
 --- Common options for dpkg and dpkg_deb
	[*]   gzip debian packages (normal)
	[*]   bzip2 debian packages

In this case the filtering code is used to decide what action to take
on the file being filtered,
e.g. if the file matches data.tar.gz then use the tar.gz code; if
it's data.tar.bz2, use the tar.bz2 code.
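A minimal sketch of a filter that picks an action rather than just saying yes or no, in the spirit of filter_accept_list_reassign but not its actual code. The unpacker functions are hypothetical stubs that return a tag so the choice can be checked; unpack_fn and choose_unpacker are also illustrative names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the per-format unpackers; they just return
 * a tag identifying which one was chosen. */
typedef int (*unpack_fn)(const char *member);
static int unpack_tar_gz_sketch(const char *member)  { (void)member; return 1; }
static int unpack_tar_bz2_sketch(const char *member) { (void)member; return 2; }

/* Pick the unpacker from the member's name; NULL means "skip it".
 * stem is the wanted member without compression suffix, e.g. "data.tar". */
static unpack_fn choose_unpacker(const char *member, const char *stem)
{
	size_t n = strlen(stem);

	if (strncmp(member, stem, n) != 0)
		return NULL;		/* not the member we want at all */
	if (strcmp(member + n, ".gz") == 0)
		return unpack_tar_gz_sketch;
	if (strcmp(member + n, ".bz2") == 0)
		return unpack_tar_bz2_sketch;
	return NULL;			/* unsupported compression */
}
```

This is why such a filter doesn't fit the plain accept/reject model: its result is a function to call, not a yes/no answer.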

In theory it's a good improvement to save space, but I think I'm the only
one who ever used the feature (I repackaged a .deb by hand), so I guess it
was a waste of effort and helped nobody...

I did (and still do) have the idea of having an unarchiving applet that
will extract any archive format automatically, but that's a story for
another day...


Back to the question:

It might work, but it's very hairy.

The easiest way to test things is to modify init_handle.c as follows:
-       archive_handle->filter = filter_accept_all;
+       archive_handle->filter = filter_accept_reject_list;

Do the same with all references to filter_accept_list. If code needs to
be modified beyond that to make the archiving code work, then it's a very
slippery slope.
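One way to reason about that swap is to check that, with both lists empty, the accept/reject filter agrees with the accept-all filter. A minimal sketch, assuming simplified hypothetical versions of both filters (not the busybox code): in this sketch they agree on every named entry and differ only when the name is NULL, which accept_all rejects and the empty-list accept_reject accepts.

```c
#include <fnmatch.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical simplified filters; both return EXIT_SUCCESS to accept. */
static int accept_all(const char *name)
{
	return name ? EXIT_SUCCESS : EXIT_FAILURE;
}

static int accept_reject(const char *const *acc, size_t n_acc,
			 const char *const *rej, size_t n_rej,
			 const char *name)
{
	for (size_t i = 0; i < n_rej; i++)
		if (fnmatch(rej[i], name, 0) == 0)
			return EXIT_FAILURE;
	if (n_acc == 0)
		return EXIT_SUCCESS;	/* empty accept list: take everything */
	for (size_t i = 0; i < n_acc; i++)
		if (fnmatch(acc[i], name, 0) == 0)
			return EXIT_SUCCESS;
	return EXIT_FAILURE;
}

/* Count names on which the two filters disagree when both lists are empty,
 * i.e. when accept_reject is dropped in where accept_all used to be. */
static size_t count_disagreements(const char *const *names, size_t n)
{
	size_t diff = 0;
	for (size_t i = 0; i < n; i++)
		if (accept_all(names[i]) != accept_reject(NULL, 0, NULL, 0, names[i]))
			diff++;
	return diff;
}
```

If a drop-in replacement is meant to be exact, the NULL-name check would have to be carried over into the combined filter.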


Glenn


