What on earth happened to platform.h?

Sun Oct 24 09:02:52 UTC 2010

On Saturday 23 October 2010 20:20:01 Denys Vlasenko wrote:
> I wanted to post a very similar rant about uclibc. But did not. Ranting
> usually doesn't help. Doing something about things you want to see improved
> is the way forward.

For uClibc, I tended to mail cakes to people.  It seems to work, for some 
reason.

Alas at this point, it's hard to consider the uClibc project anything other 
than a zombie.  NTPL for uClibc has been in development for _five_years_ and 
still hasn't shipped.  It took the original NPTL developers what, six months 
to get a working implementation in glibc?  And they had to do the kernel-side 
infrastructure while they were at it.  The uClibc web page hasn't been updated 
in over 6 months...

I know, I should download the git snapshots and regression test and go engage 
with that project more...  It's on the todo list.

> You are good at simplifying stuff by rewriting stuff. Thinking positively
> about it, bbox is a huge field of work where you can take a small, isolated
> chunk of code and rewrite it. Most other projects do not have such
> structure, they are more monolithic.
>
> *And* you have write access to git. You do not need to wait for asshole
> maintainer to wake up and reply to your patches.

I'm not blaming you for this.  You're doing a better job of keeping up with 
the patch submission flood than I ever did...

> IOW: bbox is a nearly ideal project for you to participate, do
> what you like to do, and enjoy the process.
>
> So why are you so grumpy? :D

Eh, I'm usually grumpy.  I yell at my own code all the time.  (That's why I 
keep rewriting it so much.)  In an average open source development project, I 
throw away ten times as much code as I wind up with.  (Day job code I'm paid 
to write, I tend to keep my first or second draft because they just usually 
want it to work once, by deadline.  Open source code, I keep going until I've 
got something I can be proud of.)

In this case, I'm grumpy because last time I got seriously involved in busybox 
it sort of ate my life, and I've got other projects (like aboriginal linux) 
that I'm trying to get to a good stopping point before getting _too_ involved 
with other stuff.  I still haven't gotten the kernel perl removal patches 
pushed upstream.  The kconfig stuff is finally getting something _like_ 
miniconfig, but not actually miniconfig.  And SOMEBODY needs to turn glue qemu's 
TCG to the old tinycc front end and make a real cross-platform qcc compiler 
capable of building the Linux kernel.  Plus the "hello world" kernel project, 
and about 30 other things...

Once upon a time I spent months each rewriting individual busybox apps, over 
and over, until I was pleased with the results.  And umount was one of them, 
so it seemed safe to go in and make a specific change to busybox umount, to add 
a small and simple feature that should have been maybe a three line patch.

Instead I got sidetracked into wondering why dietlibc #ifdefs and hurd patches 
were splattered all over the code since the last time I looked at it.  "git 
show fbedacfc8caa1" is a huge mess replacing realpath(), which is a 
posix.1-2001 function.  We weren't even using the NULL extension we were doing 
our own malloc, and it got replaced with:

+               char *resolved_path = xmalloc_realpath(*argv);
+               if (resolved_path != NULL) {
                        puts(resolved_path);
+                       free(resolved_path);

So we have an xmalloc function and we're checking the null return.  Huh?  
Either it's badly named, or the caller is wasting bytes.

-                       safe_strncpy(path, m->dir, PATH_MAX);
+                       path = xstrdup(m->dir);

What was wrong with xstrdup() here?  And what's "safe" about randomly 
truncating a string (which isn't necessary on Linux), this arbitrarily breaks 
umount on really long paths when it might otherwise work fine.  If the syscall 
has a limit, it's the syscall's job to truncate it.  (Or error out, which 
isn't going to be worse than truncating the path _before_ calling the syscall, 
which can't possibly work.)

-                       realpath(zapit, path);
-                       for (m = mtl; m; m = m->next)
-                               if (!strcmp(path, m->dir) || !strcmp(path, m-
>device))
-                                       break;
+                       path = xmalloc_realpath(zapit);
+                       if (path) {
+                               for (m = mtl; m; m = m->next)
+                                       if (strcmp(path, m->dir) == 0 || 
strcmp(path, m->device) == 0)
+                                               break;
+                       }

Posix 2008 says we can rely on realpath() doing a malloc with NULL now, so 
wrapping it seems like something we'd want to _undo_ now.  As for the rest of 
it, the previous code was better.  The new code is pointlessly verbose to 
perform the same function.  And I have no idea why "path" was renamed "buf" in 
lots of places, since "path" describes what the variable holds and "buf" might 
as well be "variable" for all it tells us about the function.

This sort of thing makes me grumpy, because when I see "hurd support" as the 
topic of a patch I go "is this really a worthwile goal?", and then I see that 
the patch makes the code significantly uglier, which is a cost.  And a lot of 
the patch has nothing to do with hurd but is about adding G.varname prefixes to 
things and other unrelated churn, so I have to read it all to figure out which 
parts are good and which parts aren't.  And each individual quibble seems kind 
of petty by itself, but they add up, all together they add up to the code 
getting significantly uglier in pursuit of a goal I honestly don't see any 
point in.

The hurd is slightly _less_ interesting than Plan 9, haiku-os, netbsd, the 
post-illumos Solaris still maintained by Oracle, QNX Neutrino (which is open 
source these days), and so on.  The only reason Hurd still exists at all is 
that some people worship Richard Stallman to the exclusion of all reason, and 
insist that his 20-year old fossilized ego trip will somehow do _something_ 
any day now.  (Yeah, and maybe so will ReactOS, but who _cares_?)

This doesn't mean a small and simple patch to tweak something that was 
bothering the hurd isn't worth doing, but pouring buckets of ugly over the 
codebase in the name of the hurd does not strike me as worth it.

Which means if I go in and start fiddling with umount, the first thing I'm 
likely do is break Hurd support.  I haven't got a regression test environment, 
and I actively fail to care about that target, so I pretty much expect it.  A 
significant part of my frustration is not knowing whether or not I'm _allowed_ 
to clean out that crap to get us back to simple, or whether maintaining hurd 
support (which I can't test) is important.

I don't understand what the requirements are for this code.  Which means my 
approach would have to be "make a small localized change and don't change 
anything else because it's brittle black magic"... which takes all the fun out 
of a project like busybox.

(And you're right, uClibc needs a serious cleanup too.  Earlier this week I 
took a close look at bits/mman.h across multiple architectures.  Between 
0.9.30 and 0.9.30, PROT_READ and PROT_EXEC swapped values in "common", and 
MAP_FIXED changed its value.  MAP_FIXED is still inconsistent accross 
architectures, and I need to figure out if that's intentional or not...  I just 
haven't had time to do that so far this week.)

> > When did simplicity stop being a goal?
>
> It did not. For example, sha1/256/512 and md5 hashes were recently
> simplified.

/me looks:

Um...

#if 0
        remaining = 128 - bufpos;

        /* Hash whole blocks */
        while (len >= remaining) {
                memcpy(ctx->wbuffer + bufpos, buffer, remaining);
                buffer = (const char *)buffer + remaining;
                len -= remaining;
                remaining = 128;
                bufpos = 0;
                sha512_process_block128(ctx);
        }

        /* Save last, partial blosk */
        memcpy(ctx->wbuffer + bufpos, buffer, len);
#else
        yet more code...

Uh-huh.

Attached is the sha1sum.c I wrote for toybox a few years back.  (Never did 
quite get around to cleaning it up, extending it to support the various other 
shaXXXsum sizes, or implementing md5sum.  My todo list runneth over.)  But 
what I was going for was _simple_, and I confirmed it produced the right output 
on all the data I threw at it.  It's 185 lines including the actual applet 
implementation and help text and everything.  The busybox one is 900 lines for 
the engine.

Busybox has three or four different implementations of each piece of 
infrastructure in the sha1sum/md5sum stuff, depending on the "size vs speed" 
knob.  I focused on doing it once in a way that was easiest to read and to 
understand.

I.E. my implementation was simple first, worrying about small and fast second.  
The code you were referring to (current as of October 19th, git says) has a 
size vs speed config entry leading to a heavily redundant implementation, at 
the expense of simplicity, with rather a lot of what it's doing hidden in 
various macros.

This honestly seems to me like a question of differing design goals.

Rob
-- 
GPLv3: as worthy a successor as The Phantom Menace, as timely as Duke Nukem 
Forever, and as welcome as New Coke.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sha1sum.c
Type: text/x-csrc
Size: 4478 bytes
Desc: not available
URL: <http://lists.busybox.net/pipermail/busybox/attachments/20101024/06ea8485/attachment.c>