[RFC] Reorganizing the download directory.

Rob Landley rob at landley.net
Sun Feb 21 23:35:50 UTC 2010


Huh, found this lying around unsent.  (Thought it had gone out way back when.)

Quite possibly a dead horse, but I might as well hit "send". :)

(Yes, still poking at a zlib rewrite, but got rather distracted debugging a 
sparc issue first...)

Rob

On Monday 08 February 2010 17:27:14 Denys Vlasenko wrote:
> On Monday 08 February 2010 07:55, Rob Landley wrote:
> > > > I'd like to do a couple things to http://busybox.net/downloads/
> > > >
> > > > 1) Move the actual source tarballs into a "sources" subdirectory
> > > > (there's a lot of them now, makes it hard to see anything else in
> > > > downloads).
> > > >
> > > > I note that this change is potentially disruptive to people who
> > > > download via script, but it should only have to be done once.  (And
> > > > we already did it for the "legacy" directory with the pre-1.0
> > > > versions back when clutter was getting overwhelming.)
> > >
> > > In order to not break the script, I propose just moving "sufficiently
> > > old" releases to legacy/
> >
> > I thought about that.  The question is how to define "sufficiently old". 
> > It also means that all download scripts eventually break as things get
> > moved in future.
>
> You convinced me. I propose leaving all .tar.bz2 files in
> downloads/, move old -pre/-rc and .gz files to legacy/,
> and stop generating new .gz files. .bz2 is ~15 years old.
> Anyone who did not switch to bz2 yet can do it trivially.

*shrug*  Not what I would have done, but it's a reasonable approach.

I suspect the releases predating 1.0 can probably go in legacy even if they're 
bzip files...

> > Possibly if the download directory is going to accumulate enough stuff
> > other than the project's source code release tarballs, then the release
> > tarballs should be in their own "sources" subdirectory?  (I suggest
> > "sources" plural because downloads is plural, which I always had a hard
> > time remembering when "download.html" was singular.)
>
> I don't want to have too many (or too deep) subdirs.

Understandable.

> Why do you want to make download/ as small as possible?
> Do you frequently work with it interactively?

Yes actually, but it's more that I try to keep the newbie perspective in mind.  
What would somebody who's never seen this directory before get out of it, and 
what are they going to _miss_?  Somebody who doesn't already know what's 
available, and thus can easily miss stuff because it didn't occur to them to 
look.

However, a README that describes what's available is another way of dealing 
with that.  And just moving "fixes" out of the top level cuts down over 1/3 of 
the entries in there, and also means "there's only one big long run of 
alphabetized busybox-* files, and then everything else is either before or 
after that", which is less cognitive load for a newbie to parse.

> > > I actually went ahead and moved those to legacy/, does it look ok now?
> >
> > It looks fine for now, but it's a short term fix.  The oldest source
> > tarball at the top level is currently march 2008.  So any download script
> > that's a full 2 years old (February 2008) is already broken.
>
> Moved all .bz2 back to top level.

By the way, I'm not suggesting there _is_ a perfect solution.  Just "hmmm, 
this seems like it could be improved"... :)

> > It really depends whether you want to break it once in a big way and be
> > done with it, or break it regularly in small ways that don't
> > inconvenience as many people but which never ends...
>
> I prefer to not break it at all.
>
> > > Another thing: we can stop generating .gz files, I think.
> > > It's year 2010 after all.
> >
> > Except that raises the question "should we produce lzma files"?
>
> lzma is maybe 5 years in more-or-less wide use,
> and jumped file formats a few times.

True, but so did bzip->bzip2, and the kernel was producing .bz2 tarballs for 
the 2.0 releases circa 1996.  (Although bzip2 was maintained on kernel.org and 
that helped drive adoption of the format, so that's not exactly unbiased.)

For reference, lzma was installed by default on my crufty old Ubuntu 9.04 
laptop, and ftp://ftp.gnu.org/gnu/gzip/gzip-1.3.13.tar.xz showed up in 
October, and ftp://ftp.gnu.org/gnu/glibc/glibc-2.11.tar.xz in November.  That 
said, the most recent gcc, binutils, and gdb releases are still "gz+bz2" with 
no xz (yet).

I'm not saying there's a need to jump on the bandwagon immediately.  The first 
busybox bz2 tarball seems to have been in 2002, when bzip2 was 7 years old, so 
it's still a little bit early by our own history.  (Although we've had support 
for the format in busybox itself for a few years now.)

I'm mostly just questioning the importance of the bz2 format.  Long term, it 
seems like it's going to get squished between "fast and cheap, good for 
streaming" gzip and "best available compression" lzma.  Also, determining 
which releases are worth keeping at the top level based on which archive 
format they're in strikes me as a questionable criterion...
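
To put rough numbers on that gzip/bzip2/lzma trade-off, here is a minimal
sketch using the compressors in Python's standard library.  The function name
and the sample data are made up for illustration; real results depend heavily
on the input, so treat this as a way to measure rather than as a benchmark.

```python
import bz2
import gzip
import lzma


def compressed_sizes(data: bytes) -> dict:
    """Return the compressed size in bytes of `data` under each format."""
    return {
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "bzip2": len(bz2.compress(data, compresslevel=9)),
        "xz/lzma": len(lzma.compress(data, preset=9)),
    }


if __name__ == "__main__":
    # Stand-in input (assumption): repetitive text, not a real source tarball.
    sample = b"int main(void) { return 0; }\n" * 5000
    for name, size in compressed_sizes(sample).items():
        print(f"{name:8s} {size:8d} bytes (from {len(sample)})")
```

Pointing it at the bytes of an uncompressed release tarball instead of the
stand-in sample would give figures that actually bear on the decision.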

> > I personally use the bz2 files, but other than ubiquitous availability
> > the format hasn't really got anything to recommend it.  It's neither as
> > efficiently encoded/decoded as gzip (which is why streaming protocols
> > like ssh -C will continue to use gzip), or as space efficient as lzma
> > (which does not yet have universal availability, and asking people to
> > download the busybox source in a file format they need busybox to
> > decompress would be silly).
>
> The point is, lets stick either to .gz or .bz2, they both are widely known.
> Since .bz2 is smaller, I picked .bz2. No other reason.

*shrug*  I see your point.

My perspective is that I'm revisiting the downloads directory after it's had 
several years to accumulate cruft, and suggesting a bit of tidying.  I'd like 
to tidy it in such a way that it _stays_ tidy and won't have to be revisited 
in another 5-7 years, since moving stuff around in there breaks automated 
downloaders and people like me who _do_ use it interactively are a bit weird. 
:)
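
For what it's worth, a download script can be made less brittle by trying more
than one location.  The following is only a sketch: the "busybox-VERSION.tar.bz2"
naming pattern and both function names are assumptions for illustration, and
the existence check is passed in as a callback so nothing here touches the
network.

```python
from typing import Callable, List, Optional

BASE = "http://busybox.net/downloads/"
# Top level first, then the directory old releases tend to get moved into.
SUBDIRS = ("", "legacy/")


def candidate_urls(version: str, ext: str = "tar.bz2") -> List[str]:
    """List the places a given release might live, most likely first."""
    name = f"busybox-{version}.{ext}"  # assumed naming pattern
    return [f"{BASE}{sub}{name}" for sub in SUBDIRS]


def first_available(urls: List[str],
                    exists: Callable[[str], bool]) -> Optional[str]:
    """Return the first URL for which `exists` is true, or None."""
    for url in urls:
        if exists(url):
            return url
    return None
```

A real script would supply an `exists` that issues an HTTP request; it still
breaks if a file moves somewhere not in SUBDIRS, which is the underlying
problem with reshuffling the directory at all.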

Too bad there's no rss feed for source downloads...

> > In general, I tended to take my cues from the linux kernel, and they're
> > still producing gz files.  But it's your call.  Back in 2003 the kernel
> > needed ftp://www.kernel.org/pub/README_ABOUT_BZ2_FILES but now the main
> > http://kernel.org/ page's "full source" links are straight to the bz2
> > files and nobody blinks...
>
> Exactly.
>
> > > > 4) Is anybody still using the "qemu" directory (the one with a Red
> > > > Hat 9 image in it for regression testing)?  As far as I know it still
> > > > works fine, but I dunno what our minimum acceptable development
> > > > environment is these days.
> > >
> > > Didn't try.
> >
> > I know the image still works, I mean as far as I know the current busybox
> > source still builds under it.
>
> Tested, found one fixable warning and one incompatibility (O_NOATIME isn't
> defined)

Cool.

It's your call whether that environment is still supported, but it's nice to 
know _what_ our oldest supported Linux environment is.  (If it works in RH9, 
it has no excuse _not_ to work in slackware-10 or some such.  It would mean 
that distro introduced a regression, rather than that we've suddenly started 
depending on a debian-ism or something.)

> > > > 5) Move "README" to "README-busybox" (what updates this, by the way? 
> > > > Is there some kind of commit hook?  Does it match what's currently in
> > > > source control?),
> > >
> > > It seems redundant and obsolete, ok to delete?
> >
> > Look at the nav bar on the main page, under documentation, there's a link
> > to that README.  It's also in the source tarball.
> >
> > It's not a bad thing to have (the linux kernel source has a README too),
> > but it could use a little love...
>
> If the thing was never looked at for years, perhaps it is not needed.
> FAQ and in-tarball doc covers everything README currently contains.
>
> Removed the link. Did not remove README itself for now.

"Never looked at by the developers" and "never looked at by newbies 
encountering busybox for the first time" are two different things... :)

I find that newbies are often the least likely to speak up, partly because they 
know that the answer's probably in something they haven't read yet and don't 
want to get flamed, and partly because if they can't find what they're looking 
for they're as likely to leave as to ask.  (Lots of projects are decidedly 
_unfriendly_ to newbies.  I try to go out of my way in the other 
direction, which isn't easy to get right...)

I used to be the one to update the README.  To _me_ the most useful thing in 
it was the list of external source packages busybox could completely replace.  
The one in downloads says:

  bzip2, coreutils, file, findutils, gawk, grep, inetutils, modutils,
  net-tools, procps, sed, shadow, sysklogd, sysvinit, tar, util-linux, and vim

The one in git says:

  bzip2, coreutils, dhcp, diffutils, e2fsprogs, file, findutils, gawk, grep,
  inetutils, less, modutils, net-tools, procps, sed, shadow, sysklogd,
  sysvinit, tar, util-linux, and vim.

And even the git version is a bit out of date by now.  (No mention of lzma, 
for one thing... And then there's the partial replacements, such as rpm and 
dpkg, which are their own list...)
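
The drift between the two lists is easy to check mechanically.  A minimal
sketch, with the two lists pasted in from the READMEs quoted above (the helper
name is made up for illustration):

```python
def parse_packages(text: str) -> set:
    """Split a comma-separated prose list ("a, b, and c.") into a set."""
    names = set()
    for item in text.replace("\n", " ").split(","):
        item = item.strip().rstrip(".")
        if item.startswith("and "):
            item = item[len("and "):]
        if item:
            names.add(item)
    return names


DOWNLOADS_README = """bzip2, coreutils, file, findutils, gawk, grep, inetutils,
modutils, net-tools, procps, sed, shadow, sysklogd, sysvinit, tar,
util-linux, and vim"""

GIT_README = """bzip2, coreutils, dhcp, diffutils, e2fsprogs, file, findutils,
gawk, grep, inetutils, less, modutils, net-tools, procps, sed, shadow,
sysklogd, sysvinit, tar, util-linux, and vim."""

downloads = parse_packages(DOWNLOADS_README)
git = parse_packages(GIT_README)

print("only in git:      ", sorted(git - downloads))
print("only in downloads:", sorted(downloads - git))
```

The git copy adds dhcp, diffutils, e2fsprogs, and less; nothing in the
downloads copy is missing from git.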

*shrug*  There should be a place for this info.  Web page or in the source, 
either way.  I admit that the README is a bit easy to overlook (since the rest 
of our docs are on the web or in docs/) but when a newbie downloads a source 
tarball it _is_ the first thing they look for.  Maybe it should be a symlink to 
docs/README. :)

> > > > 6) Add a "binaries/$VERSION" directory with prebuilt "not quite
> > > > allyesconfig" busybox binaries for a bunch of different targets,
> > > > based on the current version of the source and statically linked
> > > > against uClibc. See
> > > > http://busybox.net/~landley/binaries/1.16.0 for what I'd put there.
> > >
> > > Wow, I like it.
>
> Copied to:
>
> http://busybox.net/downloads/binaries/1.16.0/

Woot.

I'm working on 32-bit powerpc and powerpc-440 builds now.

(I can do them by hand just fine, it's getting the automation to do it from a 
cron job that's fiddly.  My setup has the read-only build environment in 
/dev/hda, a writeable home directory in /dev/hdb, and then the build script 
and source tarballs in /dev/hdc.  The hda init script hands off control to the 
hdc build script if it exists, otherwise you get an interactive shell prompt.  
Unfortunately, the qemu "g3beige" board is darn fiddly about supplying a third 
drive, and last night's attempts to make it work actually managed to SEGFAULT 
THE EMULATOR.  Reliably, in the most recent release version.  Yeah, I know: I 
break stuff.  Off to try current qemu-git, and then bother the qemu people about 
it if it's still happening in current source control...)
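
The handoff logic itself is simple.  A rough sketch of the decision, written in
Python purely for illustration (the real thing is an init script; the function
name and the mount-point path in the usage line are hypothetical):

```python
import os
from typing import List


def pick_command(build_script: str, shell: str = "/bin/sh") -> List[str]:
    """Choose what init should run next.

    If the build script is present and executable, hand control to it;
    otherwise fall back to an interactive shell.
    """
    if os.path.isfile(build_script) and os.access(build_script, os.X_OK):
        return [build_script]
    return [shell]


if __name__ == "__main__":
    # Hypothetical location where the third drive would be mounted.
    print(pick_command("/mnt/build.sh"))
```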

Rob
-- 
Latency is more important than throughput. It's that simple. - Linus Torvalds

