cross compile issues

Sun Mar 7 03:59:55 UTC 2010

On Saturday 06 March 2010 13:53:44 Denys Vlasenko wrote:
> > > The only change here is that __s64 and __u64 are typedef'ed in some
> > > cases. I did it because a user reported it did not work for him until
> > > he added it.
> > >
> > > Do you think it's wrong?
> >
> > Yes, I think it's wrong.  The toolchain that had a problem with that is
> > clearly broken, and cluttering up busybox's code for a brittle workaround
> > for a specific obviously broken toolchain isn't an improvement.
> >
> > The __s64 and __u64 types are kernel internal types.  Either they should
> > be cleaned out of the kernel headers by whatever's replacing make
> > headers_install for your toolchain/distro, or they should be #defined by
> > those kernel headers ala this chunk from 2.6.32's loop.h:
> >
> >   #include <asm/posix_types.h>    /* for __kernel_old_dev_t */
> >   #include <linux/types.h>        /* for __u64 */
> >
> >   /* Backwards compatibility version */
> >   struct loop_info {
> >           int                lo_number;           /* ioctl r/o */
> >           __kernel_old_dev_t lo_device;           /* ioctl r/o */
> >
> > Code that #includes linux/types.h shouldn't have to manually #define
> > __u64.  If it does, its headers are broken.
>
> Everything is broken to some extent ("every nontrivial program has bugs").

And they should fix it.  We are small and simple.  That's the PURPOSE of 
BusyBox.  Introducing bugs and brittleness into our code when the problem is 
easy to fix in _theirs_ defeats the entire purpose of BusyBox.

For example, there are 8 gazillion broken cross compiler build environments 
out there.  Every one of them is broken in a slightly different way.  It's 
IMPOSSIBLE to work around all of those, because you get to the point where 
your workarounds start breaking other things.

Simplicity of implementation has always been at _least_ as big a design goal for 
BusyBox as small size.  Way back when they used it to run the display at NORAD 
not because they were short on space or on CPU but because they could audit 
every line of it and understand what it was _doing_.  That's a big advantage, 
and not one to discard lightly.

This mess drew my attention because something broke, and I had to wade through 
this unnecessary complexity to try to understand whether or not it was his 
toolchain or our code that's causing the problem.  To be honest, I still don't 
know, because he hasn't gotten back to me yet and I don't have his toolchain.

The Free Software Foundation got huge unreadable crappy code in part by adding 
#ifdefs with extensions and workarounds for every strange buggy non-posix 
system in the world, rather than relying on standards and telling broken 
systems to fix their stuff.  The end result was such horrible bloat that from-
scratch reimplementations like busybox had enough appeal to get a significant 
userbase.  Repeating the FSF mistakes of allowing workarounds for other 
people's bugs to metastasize through our code is not an improvement.

BusyBox is all about having clear limits and knowing what we DON'T do.  That's 
the only way to rip out crap and get to the small and simple.  You're adding a 
bug workaround for maybe three users ever in the entire history of the 
project, and not only making everybody who ever reads that code work out why 
it's there and what it does, but probably have the extra complexity break the 
build of more users than will ever actually benefit from it.

> It makes sense to help some old toolchain to limp along.

Old?  It claims to support a 2.6 kernel.  We have an _old_ path, it's the 2.4 
support.  Tell FreeBSD to stop pretending to support 2.6 kernel APIs when it 
can't competently do so.

Notice that we already _have_ a test.  Look at the code:

  #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,0)
  blah blah blah
  #else
  blah blah blah
  #endif

Note that it's not an #else if.  There is no further test.  It's a fallback 
for "does not support the new 64-bit API".
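
A rough sketch of what that single test buys, assuming properly sanitized 
Linux headers are installed.  This is not the actual loop.c code, and the 
demo_* names are made up for illustration.  The kernel version picks one 
struct and one ioctl, and neither branch needs further tests nested inside it:

```c
#include <linux/version.h>   /* LINUX_VERSION_CODE, KERNEL_VERSION() */
#include <linux/loop.h>      /* struct loop_info, struct loop_info64, LOOP_* */

#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,0)
/* New 64-bit loop API. */
typedef struct loop_info64 demo_loop_info;
# define DEMO_LOOP_GET_STATUS LOOP_GET_STATUS64
# define DEMO_HAVE_LOOP64 1
#else
/* Fallback: headers that do not support the 64-bit API. */
typedef struct loop_info demo_loop_info;
# define DEMO_LOOP_GET_STATUS LOOP_GET_STATUS
# define DEMO_HAVE_LOOP64 0
#endif

/* Hypothetical helper: reports which branch was compiled in. */
int demo_have_loop64(void)
{
	return DEMO_HAVE_LOOP64;
}
```

Code after the test would use demo_loop_info and DEMO_LOOP_GET_STATUS and 
never need to know which branch was taken.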

Clearly, FreeBSD does not actually support the new 64 bit API.  It CLAIMS to, 
but if it _did_ we wouldn't need an #ifdef salad for it _within_ the (clean 
and simple) 64 bit API block.  FreeBSD is _lying_ about whether or not it 
properly supports this API.  You're adding code to accept this lie.

Show me an actual _Linux_ system that can't #include linux/loop.h.  The API 
has "linux" in the name, and it's in a test explicitly checking for the 
existence of a 2.6 _LINUX_ kernel.  If FreeBSD is pretending to have a Linux 
kernel API then the onus is on FreeBSD to _actually_ have the Linux kernel 
API.  If they can't competently emulate Linux, then they can't use the Linux 
functionality provided by that Linux API, and they either have to disable some 
applets or stop lying about what they support.  Why on earth is this _our_ 
problem?

Are you suggesting that reporting this bug to BSD would not help matters?  
That FreeBSD is not maintained?  That bugs cannot possibly be fixed in their 
upstream because Apple hired all of their developers away in 1999?  This may 
be true, but I still don't see how it's our problem...

> > And if their linux/loop.h isn't #including
> > something to #define __u64, their linux/loop.h is broken.  Variants of
> > the same conclusion.
> >
> > In any case, if a horrible workaround like that was worth doing (which
> > this one isn't; they should fix their toolchain) it would belong in
> > platform.h. Making loop.c aware of the existence of specific compiler
> > versions is kind of evil.  (It's bad enough making it aware of kernel
> > versions, but that's really an API test.  Do we have the 64-bit API or
> > not.  It's possible that a cleaner way to do that would be a "support old
> > 2.4 kernel APIs" in menuconfig, but it seems silly to ask people to
> > manually select something we can autodetect at compile time, which is why
> > it was how it was, as the least ugly solution.)
>
> How about this?

Oh dear.

These types are defined in an existing kernel header file:

include/asm-generic/int-l64.h:
  typedef signed int s32;

But that typedef is inside an #ifdef __KERNEL__ so if you ever wind up seeing 
that from a userspace #include of a linux/* header, it means your kernel 
headers were improperly sanitized and thus your toolchain is broken.  That's 
_why_ they have both s32 and __s32, users of the second may be exported into 
userspace (where the double underscore prevents it from conflicting with the 
namespace of normal userspace symbols).  If the first type ever winds up being 
used in userspace, it means your kernel headers were improperly sanitized and 
you've just hit a bug.

I.E. the REASON for that markup is to CATCH THAT STUFF LEAKING INTO USERSPACE, 
and force the improperly sanitized kernel headers to do a build break so they 
catch and fix their mistake.
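
To illustrate the split, here is a minimal userspace sketch, again assuming 
properly sanitized Linux headers (demo_sectors_to_bytes is a made-up name).  
The double-underscore types come straight from linux/types.h with no typedef 
of our own, while the bare kernel-internal spellings should not be visible:

```c
#include <linux/types.h>   /* exports __u64, __s32, ... to userspace */

/* With sanitized headers the exported types have their advertised sizes. */
_Static_assert(sizeof(__u64) == 8, "__u64 should be 64 bits");
_Static_assert(sizeof(__s32) == 4, "__s32 should be 32 bits");

/* The kernel-internal spellings must not leak into userspace.  Uncommenting
 * the declaration below should be a build break, which is the point of the
 * markup:
 *
 *   s32 leaked;
 */

/* Hypothetical helper: convert a count of 512-byte sectors to bytes. */
__u64 demo_sectors_to_bytes(__u64 sectors)
{
	return sectors * 512;
}
```

If the _Static_assert lines or the #include fail, that points at the headers, 
not at the code using them.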

It's a bit like going "#define #error //", if such a thing actually worked in 
your compiler.  You've essentially #ifdefed out an assert() because it was 
triggering for you, rather than fixing the underlying problem.  You've added an 
entire header to _break_ the kernel's explicit checking to distinguish 
internal from external symbols.

Of course this wouldn't have come up if you weren't adding unnecessary 
complexity, bloating the code and making it more brittle, to work around clear 
bugs in other people's build environments which are not our problem.

Kernel headers are a can of worms, and there's years of history behind this 
can of worms, and as the movie Wargames said, "the only winning move is not to 
play".  There was a particularly large flamewar about this on linux-kernel in 
November 2004, shortly before make headers_install went in.

The _first_ thing to know about kernel headers is that if there's a libc way to 
do it, then you should use the libc wrapper and not the kernel #include 
directly.  The only time you should ever need to #include a Linux file is when 
there _is_ no libc wrapper for it, because a portable way to access this 
functionality does not exist, generally because Linux invented it (or came up 
with its own unique way of doing it).
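
As a small illustration of "use the libc wrapper" (a sketch, not code from 
BusyBox; demo_kernel_release is a made-up name), asking which kernel is 
running needs nothing from linux/* because POSIX already provides uname():

```c
#include <string.h>        /* strncpy, size_t */
#include <sys/utsname.h>   /* uname(), struct utsname: the libc interface */

/* Hypothetical helper: copy the running kernel's release string into buf.
 * Returns 0 on success, -1 on failure.  No linux/ header is needed. */
int demo_kernel_release(char *buf, size_t len)
{
	struct utsname u;

	if (len == 0 || uname(&u) != 0)
		return -1;
	strncpy(buf, u.release, len - 1);
	buf[len - 1] = '\0';   /* strncpy does not always terminate */
	return 0;
}
```

The same source builds on any system with a POSIX libc, which is exactly what 
a direct kernel #include cannot promise.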

For example, under solaris (which Oracle seems to have killed off) there was no 
"losetup" command.  There was lofiadm instead.  They had their own completely 
unrelated mechanism using a different API.  You couldn't use the Solaris API 
under Linux, and you couldn't use the Linux API under Solaris.

Meaning _when_ you're using linux headers, you're using Linux-specific 
functionality exported by the Linux kernel, and the standard is set by the 
Linux kernel.  Linus Torvalds was very clear on this, calling it a "one way 
street":

  http://lkml.indiana.edu/hypermail/linux/kernel/0411.3/1356.html

The __x## sizes are kernel internal types, which actually predate _both_ of 
those standards.  Here's Linus Torvalds personally explaining that:

  http://lkml.indiana.edu/hypermail/linux/kernel/0411.3/1099.html

If your kernel headers are _not_ what the Linux kernel developers consider 
properly sanitized kernel headers, then it's the same as trying to build with 
a C compiler that doesn't properly implement C99, or build against a C library 
that doesn't properly implement Posix.  There is something specific broken 
which is external to our project, it violates the clearly documented 
expectations of busybox, and we can point at it and tell them to fix it.

Note that the "can of worms" you're re-opening used to be a real problem.  
Back before "make headers_install", broken kernel headers were an epidemic.  
Under the 2.4 kernel you could use the kernel headers directly, just chop out 
the #ifdef __KERNEL__ bits and the rest was usable by userspace.  But in 2.6 
that didn't work anymore, so for a while it was the distro's job to come up 
with properly sanitized kernel headers, but they all did it a slightly 
different way and embedded developers had to grab obsolete headers from Linux 
From Scratch or gentoo or the versions Mariusz Mazur was doing.  (Note: this 
wasn't _raw_ kernel headers.  You had to modify them extensively for use in 
userspace at all.  It's just that the modifications tended to be imperfect and 
brittle.)

Accepting and trying to work around broken headers is one of the big reasons 
headers got so crappy to begin with.  For a few years there people were block 
copying structures out of the headers into their code.  Insane as this sounds, 
that was literally considered best practice.  And then the distros and the 
kernel guys got together and said "no, we will have kernel headers that work, 
exported by the kernel's build infrastructure itself, and we'll rip out the 
horrible workarounds in the 8 gazillion packages and actually REPORT bugs in 
the kernel headers so they get FIXED".  And they did "make headers_install" 
five years ago so there was ONE CONSISTENT set of exported headers, and 
everybody started using it.

Note: the layers of workarounds DID NOT WORK.  BusyBox was not the only thing 
that broke, and they all broke in _different_ways_.  You CANNOT use improperly 
sanitized kernel headers to build userspace packages reliably, the workarounds 
never end.  Thus the modern stance that if your kernel headers are not 
properly sanitized, either your toolchain is _broken_ or it's NOT FOR LINUX.

For us to try to #define our own __u64 is just as silly as for us to try to 
#define our own gcc or glibc internal symbols to work around something that 
pretends to have gcc extensions or pretends to have glibc extensions.  Either 
they have them or they don't.  We can test for them and not use them, but it's 
NOT OUR JOB to try to fix broken attempts to provide these extensions.
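
"Test for them and not use them" can be as small as this sketch (the DEMO_ 
and demo_ names are invented for illustration; __builtin_expect is a real gcc 
builtin).  The extension is used when the compiler advertises it and skipped 
otherwise, and nothing here tries to reimplement it:

```c
/* Use the gcc extension only when the compiler says it has it. */
#if defined(__GNUC__)
# define DEMO_LIKELY(x) __builtin_expect(!!(x), 1)
#else
# define DEMO_LIKELY(x) (x)   /* plain C fallback: just evaluate the test */
#endif

/* Hypothetical helper: clamp negative values to zero. */
int demo_clamp_nonnegative(int v)
{
	if (DEMO_LIKELY(v >= 0))
		return v;
	return 0;
}
```

Either branch gives the same result; the extension only changes how the 
compiler lays out the code.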

The type "__u64" remains a kernel internal type that is supposed to be #defined 
by the kernel headers themselves.  If their kernel headers are broken it's not 
our job to work around that bug, because that just encourages more breakage.

A new Linux kernel comes out every three months, we're allowed to blacklist 
clearly buggy broken versions and report the bug upstream where it WILL be 
rapidly fixed.  There's a whole bugfix-only mechanism for the Linux kernel:

  http://lwn.net/Articles/370236/

Your argument is that FreeBSD does _not_ have such a mechanism, that FreeBSD 
is irretrievably buggy, and yet FreeBSD is explicitly trying (and explicitly 
failing) to emulate a Linux kernel API that even has Linux in the header name.  
They can't have it both ways.

Rob
-- 
Latency is more important than throughput. It's that simple. - Linus Torvalds