bunzip2 fails to decompress pbzip2-compressed files
rob at landley.net
Sat Nov 6 23:40:22 UTC 2010
On Friday 05 November 2010 20:27:47 Denys Vlasenko wrote:
> On Wed, Nov 3, 2010 at 7:09 PM, Rob Landley <rob at landley.net> wrote:
> >> > I thought it was inherent in the mandate of the project, but
> >> > apparently not. The focus these days is on features, adding more and
> >> > more, always making the project bigger and more complicated.
> >> >
> >> > I look around and everywhere see things that aren't that hard to clean
> >> > up,
> >> Which ones (except those mentioned in TODO)?
> > It's sort of a constant background thing.
> > If you want a specific example, there's bound to be a way to simplify
> > editors/vi.c. Or miscutils/less.c.
> Ohh, I *gladly* would take patches which simplify these.
> Or patches which fix them wrt Unicode. Or both.
My point was that finding this stuff is easy. Dealing with it is the part that
requires a lot of time and careful thought.
Making any change to busybox requires reading through the code that's there to
gain a broad enough understanding of it that you're not making it worse. And
I can't do that anymore without coming across buckets of tangents that need
doing, and I tend to lose track of my original goal.
Last time I seriously engaged with BusyBox it took over my life for a couple
years. Which meant the rest of my Linux work essentially rolled to a stop for
a while. Now I'm back getting a minimal native development environment to
boot and run on over a dozen different hardware architectures that QEMU
emulates, and getting existing architectures to _keep_ working is a heck of a
Red Queen's race:
And so on. Not counting the perl removal patches I still haven't gotten
upstreamed into the kernel, or the pending uClibc NPTL mess, or my supposed
goal of bootstrapping Linux From Scratch, Gentoo, Fedora, and Ubuntu to
build natively under the resulting system.
Or other things I _want_ to do like learn Lua and reimplement toybox in it,
turn tinycc into qcc by ripping the back-end off and replacing it with QEMU's
TCG, or testing out all the new device tree stuff that's going into the kernel,
or helping out with the new llvm/clang work to come up with a viable
replacement compiler for the political morass GCC has become, or do a "hello
world" kernel for each target stripped down enough that with a bit of kexec
magic you could use Linux as its own bootloader, or reinstalling my laptop
with Gentoo instead of an Ubuntu version so stale the update manager gave me a
"no more updates, upgrade already" pop-up last week...
(Or the whole "get a new day job" thing since my contract with Qualcomm ran
out on the 31st and the department's new budget won't be approved before
January at the earliest so they couldn't renew it. But I'm used to having
time between contracts, that's when I get the bulk of my open source
programming done. :)
It's not that I don't want to work on busybox, it's that the scope of the
problem is beyond the time commitment I can offer. The project is pervasively
messy, and continuing to get messier, to the point where just poking at it a
couple days a month can't hope to keep up with the continuing influx of mess.
I keep bookmarking things like this:
Which was a 30 second fix: all those #ifdefs could be if() statements. I
realize you don't see this as a problem, but I do. Henry Spencer nicely sums
up why here:
And Greg Kroah-Hartman covered it in his kernel coding style talk (this slide
and the next two, and page 6 of the corresponding paper):
But by the time I read that message on the mailing list you'd already applied
it, and by the time I sat down to deal with the resulting code it had changed
again to an even denser forest of #ifdefs, and if I have to argue about _why_
removing them is a good thing that takes even more time...
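
A minimal sketch of the #ifdef-to-if() conversion mentioned above. It
assumes a hypothetical ENABLE_FEATURE_SUFFIX switch that is always defined
to 0 or 1 (never left undefined); the switch and both function names are
illustrative only and are not taken from the patch under discussion.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical feature switch, always defined to 0 or 1 so that it can
 * be tested with a plain if() as well as with #if. */
#ifndef ENABLE_FEATURE_SUFFIX
#define ENABLE_FEATURE_SUFFIX 0
#endif

/* Preprocessor version: when the feature is off, the compiler never
 * sees the guarded line, so errors in it stay hidden until someone
 * builds with the feature enabled. */
static void label_ifdef(char *buf, size_t len, const char *name)
{
	snprintf(buf, len, "%s", name);
#if ENABLE_FEATURE_SUFFIX
	strncat(buf, ".bak", len - strlen(buf) - 1);
#endif
}

/* if() version: the guarded line is parsed and type-checked on every
 * build. The condition is a compile-time constant, so an optimizing
 * compiler can discard the dead branch. */
static void label_if(char *buf, size_t len, const char *name)
{
	snprintf(buf, len, "%s", name);
	if (ENABLE_FEATURE_SUFFIX)
		strncat(buf, ".bak", len - strlen(buf) - 1);
}
```

Both functions produce the same string for a given setting of the switch;
the difference is only in how much of the code the compiler gets to check
in each configuration.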
And when people ask where the mess needing cleanup is as if they can't see it,
or act like #ifdef removal is black magic it takes special talent to do, or
when you say you wonder how I came up with such a small sha1sum
implementation... I find that really depressing.
I am not a very good coder. By my standards, I suck at this. I really do. I
just don't let sucking at it stop me from trying to figure out how to make it
suck _less_. The fact that I can't always manage doesn't make the goal any
less worthwhile, and I would LOVE if other people could do a better job at
this so I didn't have to.
You don't see the code I throw away, or all the time I spend _thinking_
before coding. One of the reasons I tend to have three or so open source
projects ongoing at once (when I'm not just banging out some barely functional
schlock to make a deadline and HOPE they throw it away afterwards) is that I
get writer's block. Not because I can't figure out how to make it work but
because I can't figure out how to do it RIGHT. Because I haven't yet convinced
myself I've minimized the suck. I haven't got the DESIGN right, which means
I'm not thinking about the problem the right way yet, and it's far easier to
tell I've got it wrong than to figure out what right is.
You'd think the BusyBox project would be the right place for computational
Dorodango if anywhere was, especially five years after its 1.0 release where
it's supposedly code complete and presumably implementing all of the Single
Unix Specification's command line stuff it cares to. You'd think the focus
would switch to doing what it already does better.
But no, the focus is on adding more to the project. New commands, new
features, new complexity... *shrug*
> All these ideas seem like good ones to me.
Again, see "belling the cat". Ideas are not the limiting factor.
> >> I wouldn't say 'nobody'.
> > It is no longer the majority opinion.
> I actually look at code size VERY closely. See my other recent mail
> where I show that there is, on average, net reduction in size
> since 1.00 on the same config.
Great, but there's a pitfall here. You know how pointy haired managers love
quoting the phrase "You can't manage what you can't measure"?
To which the rebuttal is Einstein's quote, "Not everything that counts can be
counted, and not everything that can be counted counts":
The failure mode is "managing what you can measure". Focus on what you can
measure and consider it more important than what you can't.
BusyBox doesn't have a metric for simplicity, but it does have a metric for
size (and to a lesser extent functionality: you can enumerate new features and
even add regression tests for them). Thus size/features are what you think
about, what you constantly check, it becomes more important over time, and
comes to eclipse simplicity.
This gets you in big trouble when the metric you've got is only a proxy for
the thing you really want, because people game the system. When IBM focused
on KLOCS (thousands of lines of code) as a programmer productivity metric,
their employees cut and pasted their way to productivity bonuses and the
actual code quality suffered tremendously until they stopped incentivizing that.
This isn't just a programming thing, it's the same failure mode which leads
corporations to take away the free towels in the gym, because the cost of
providing the service can be easily measured but the morale boost it gives the
employees can't, therefore one is "real" and the other isn't.
Even when the metric is real and important, it tends to lead to the things you
can't as easily measure getting ignored, because they're harder to think
about. You have to make an effort to see them. It's an easy trap to fall into
and a hard problem to solve, especially when the things you can measure are
good things and important to get right, so spending time on them isn't
necessarily _bad_... They're just not the whole story.
That's part of the reason I personally valued simplicity _above_ the other
two. Precisely because it's harder to measure.
> I'm just doing it not in Rob's way ("rewrite this crap!"),
> but in "let's simplify this crap!" way.
> I invite Rob to rewrite any part he likes. Then I will
> try to simplify his rewrite. It's a win-win situation.
Been there, done that.
> >> > I've come to the conclusion I'm not helping here.
> >> From my point of view, you _are_ helping. In your own way 8).
> > No, I'm telling _myself_ to "shut up and show me the code".
> > I just don't see it making a difference here with the amount of time I
> > have to put into it. It's like trying to mop up a river, the new
> > arrivals bury any small gains I could make.
> New arrivals do help in one area: they reduce dependencies
> in LFS-type systems. You know it yourself since that's precisely
> the reason you use busybox in Aboriginal Linux:
> you want to have fewer packages.
> And when busybox acquires a new feature _you_ need, you see it
> as a win. (For example, 1.18.x will have brace expansion in hush).
I'm not saying more features is bad. I'm saying the loss of simplicity is
bad. All three things trade off for each other, but one of them has taken it
on the chin in favor of the other two. It makes the project uncomfortable for
me to work on, and I don't believe the amount of time/energy I have available
to put in would even keep up with the ongoing degradation of the quality.
And being the only one who sees the imbalance as an actual _problem_ is
discouraging enough that I'd rather just not watch, thanks.
> Which makes it more useful. Which is good.
> Why you fail to extrapolate your feeling of a win when
> *others* submit stuff, and it is accepted?
I love it when other people solve my problems. But having to solve a second
set of problems other people created while solving the first set of problems
isn't always a net win.
I'm already running a red queen's race over on the kernel and qemu side, and
LLVM will be another, and when I start caring about the unreleased uClibc code
that will be another, and when I get distros natively bootstrapped the
bootstrapping logic will bit rot too.
Mostly I want to push this stuff upstream. If nothing else, automate the bug
reporting and git bisect to the commit that broke test case X. (That's half
of what the cron job stuff is for. Alas impactlinux.com went away this past
week and it's all back on landley.net now which hasn't got the bandwidth for
heavy use... Yet another todo item I hadn't planned on...)
Cleaning up busybox is only an issue if I'm going to be developing on busybox.
It's just my weird aesthetic sensibility that obviously isn't important to the
metrics the project uses to measure itself, and I have enough self-imposed
tasks for the moment, thanks.
> They are in exactly
> the same position as you: they don't want to cross-compile
> a $HUGE_BLOATED_PACKAGE, so they reimplement or port
> part of it to busybox.
> Same thing.
Actually in my case I want to keep the set of environmental dependencies as
simple as possible. The thing that really appeals to me about busybox is I
can build it on a wide range of host systems without having to worry about
whether that system has lex and bison and autoconf and automake and perl and
python and zlib and internationalization support and
It lets me worry about what I'm cross compiling _to_ and not have to worry
about what I'm cross compiling _from_. And that's valuable.
But it's the _simplicity_ that appeals to me, not the size or speed. I'm
building busybox "defconfig" because that makes my build scripts conceptually
very simple, even though that literally enables a hundred more apps than my
build actually needs. Micro-managing busybox's config to strip it down would
make it smaller, and make the _result_ simpler, but it would make my build
more complicated, harder to maintain, harder for new people to learn, more
likely to bit-rot...
I'm glad busybox does more things for more people, but I'd already implemented
most of the feature set I personally needed to do this back in 1.2.2, and the
remaining bits I've added since (oneit, patch, nbd-client, ccwrap, etc.) could
live as individual files in sources/toys/*.c if necessary. (You'll note I
haven't pushed oneit into busybox, even though an init program designed to
launch a single executable with a proper controlling TTY and signal handling,
reap zombies until that executable exits, and then shut the system down...
once upon a time that simplicity would have been in busybox's purview. But it
would have been a unique facility of busybox, not a copy of an existing
program somebody else already wrote and maintains externally, and thus not
part of busybox's _current_ mandate. I never saw busybox as a shadow of other
projects, but I was weird. I went to a Weird Al concert last night where I
bought the tour T-shirt I'm wearing now and shouted the civic motto "Keep
Austin Weird" at him. And the only two songs that were new to me were the
"Polka face" medley at the beginning and the one about cellphones. That's how
weird we're talking here.)
I'm happy that busybox is well maintained, and I'm happy that if I post a bug
report here it tends to get addressed promptly. But my goals and busybox's
goals have drifted apart over the years, and I'd rather spend the majority of
my time elsewhere now.
GPLv3: as worthy a successor as The Phantom Menace, as timely as Duke Nukem
Forever, and as welcome as New Coke.