tar and the semantics of "filenames"

Robert P. J. Day rpjday at mindspring.com
Wed Apr 19 11:10:03 UTC 2006


  to follow up on natanael's posting from yesterday, this is what
appears to be happening with respect to "tar" and the format of
filenames and why we need to clarify the semantics *here* as well.

  as natanael pointed out, if you look at get_header_tar.c, near the
bottom, you see that routine stripping the possible trailing "/" from
directory names *as they come out of the tarball*:

  { /* Strip trailing '/' in directories */
    /* Must be done after mode is set as '/' is used to check if its a directory */
      char *tmp = last_char_is(file_header->name, '/');
      if (tmp) {
           *tmp = '\0';
      }
  }

before assigning the final filename to the "file_header" structure.

  personally, i had never realized that older versions of tar stored a
literal "/" in the filename area of the header.  the following code
definitely implies that the trailing slash is an older-version convention:

  #ifdef CONFIG_FEATURE_TAR_OLDGNU_COMPATIBILITY
      if (last_char_is(file_header->name, '/')) {
             file_header->mode |= S_IFDIR;
      } else
  #endif

  so, two observations.  first, i was a little surprised that, even
today, GNU tar is adding the trailing slash in the header.  doesn't
hurt, of course -- i guess that's to make sure newly-created tarballs
are still compatible with much older versions of tar.  no problem.

  second, i think it's correct behaviour that get_header_tar strips
that trailing slash before storing the filename internally.  after
all, that trailing slash isn't really part of the filename so i
completely agree with that behaviour.

  however, because of that, when it comes to printing the listing of a
tar archive, BB tar (unlike GNU tar) *doesn't* print that trailing
slash for directories, which i think it *should*, so that its output
matches GNU tar's.  fixing this would mean changing both
header_list.c and header_verbose_list.c somehow.  and i think this
might be important since it's possible people have written shell
utility scripts that process tarballs by *using* that trailing slash
as an indicator.  yes, it sounds incredibly trivial, but i'm convinced
BB tar should emulate GNU tar here.  how to do that is open for
discussion.  but that's not all.

  the format of that internal filename depends on how *exactly* you
create the tarball.  there's a difference between:

  $ tar cf r.tar r.tif
  $ tar cf r.tar ./r.tif

in the second case, the internal filename will start with that "./"
prefix.  so what should happen there?  when you use GNU tar for
explicit extraction, it *requires* you to add that prefix if that's
how the tarball was created.

  $ tar xf r.tar r.tif
  tar: r.tif: Not found in archive
  tar: Error exit delayed from previous errors
  $ tar xf r.tar ./r.tif 	(OK)
  $
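
  to spell out the two choices, here is a small sketch comparing an
exact name match (what the GNU tar transcript above shows) with a more
lenient match that ignores a leading "./".  all three function names
are hypothetical, and i'm not claiming either tar implements matching
this way; it only illustrates the semantics that need deciding.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: skip any leading "./" components of a name. */
static const char *skip_dot_slash(const char *name)
{
    assert(name != NULL);
    while (name[0] == '.' && name[1] == '/') {
        name += 2;
        while (*name == '/')  /* also skip repeated slashes, as in ".//x" */
            name++;
    }
    return name;
}

/* Exact match: "./r.tif" and "r.tif" are different names. */
static int names_match_exact(const char *member, const char *wanted)
{
    return strcmp(member, wanted) == 0;
}

/* Lenient match: a leading "./" on either side is ignored. */
static int names_match_lenient(const char *member, const char *wanted)
{
    return strcmp(skip_dot_slash(member), skip_dot_slash(wanted)) == 0;
}
```

  under the exact rule, asking for "r.tif" does not find a member stored
as "./r.tif"; under the lenient rule it does, in either direction.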

in any event, time to think about how to resolve this stuff?

rday


