What on earth happened to platform.h?
Rob Landley
rob at landley.net
Thu Oct 28 03:55:44 UTC 2010
On Wednesday 27 October 2010 18:36:16 Denys Vlasenko wrote:
> > > Then, do not bother creating insanely parallelized gzip (and insanely
> > > parallelized image analysis software, and insanely parallelized
> > > database...); instead, process in parallel *insane number of photos*
> > > using run of the mill, simple, *single-threaded* tools.
> >
> > If you have a task that breaks down that way, sure.
>
> Google did not have a choice. They *had to* scale everything they do,
> scale into tens of thousands of subtasks, and they could not afford
> waiting thirty years while gzip and gazillion other tools got
> parallelized to death. They succeeded, using the above approach.
Yes, and most web transactions are essentially read-only batch processing, and
thus extremely amenable to parallelizing across a cluster, as a category. (I had
a class on how to schedule that kind of stuff at Rutgers in 1993; research on
that class of problems dates back to the '50s.)
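A rough sketch of the "many copies of a simple single-threaded tool" approach
on one machine, assuming a POSIX system. The names run_each() and reap_one()
are made up for illustration, and a real cluster scheduler obviously does far
more than this:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reap one finished child; return 1 if it failed, 0 if it exited cleanly. */
static int reap_one(void)
{
    int status;

    if (wait(&status) < 0) return 0;   /* nothing left to reap */
    return !(WIFEXITED(status) && WEXITSTATUS(status) == 0);
}

/* Run "tool item" once per item, at most max_jobs at a time.
   Returns how many of the runs failed. */
int run_each(const char *tool, char *const items[], int nitems, int max_jobs)
{
    int running = 0, failed = 0, i;

    if (max_jobs < 1) max_jobs = 1;
    for (i = 0; i < nitems; i++) {
        pid_t pid;

        if (running >= max_jobs) {     /* wait for a free slot */
            failed += reap_one();
            running--;
        }
        pid = fork();
        if (pid < 0) {                 /* could not start this one */
            failed++;
            continue;
        }
        if (pid == 0) {                /* child: become the tool */
            execlp(tool, tool, items[i], (char *)NULL);
            _exit(127);                /* exec itself failed */
        }
        running++;
    }
    while (running-- > 0) failed += reap_one();
    return failed;
}
```

Something like run_each("gzip", files, nfiles, 8) would then compress a pile of
files eight at a time without gzip itself knowing anything about parallelism.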
> > A lot of signal processing issues are like that. You can chop the signal
> > at each keyframe and distribute across a cluster that way, but if your
> > keyframes are every 2 seconds then you guarantee that much latency, which
> > sucks for videoconferencing and other types of live broadcasts.
> >
> > That's one program domain out of hundreds for which "redefine the problem
> > into something I feel like solving" turns out to be hard.
>
> Sure, there _are_ programs which must be threaded. Linux kernel
> is another example.
There's nothing that _must_ be threaded. There are things that benefit from
threading.
Heck, you can set up shared memory via mmap() of a file (then delete the file to
tell the OS to stop updating the on-disk copy), implement mutex and event
semaphores via signal files or pipes (the O_CREAT|O_EXCL trick is a mutex in a
nutshell, and filesystem timeouts give you timeout behavior; we're using this in
the passwd command, I believe, and the dentry cache means it never _has_ to
touch disk), and do pretty much everything threading can without relying on
pthread primitives.
It's just not necessarily an improvement. :)
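For the curious, a minimal sketch of that combination, assuming a POSIX system
with a writable /tmp. The function names and the /tmp/demo-* paths are invented
for illustration, and the lock is a plain poll-and-retry loop with no timeout
handling:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Map one long backed by a freshly created file, then unlink the name.
   The mapping stays usable (and shared across fork) after the unlink. */
static long *shared_counter(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    void *mem;

    if (fd < 0) return NULL;
    if (ftruncate(fd, sizeof(long)) < 0) {   /* new bytes read as zero */
        close(fd);
        unlink(path);
        return NULL;
    }
    mem = mmap(NULL, sizeof(long), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    unlink(path);
    return mem == MAP_FAILED ? NULL : mem;
}

/* Lock file as mutex: O_CREAT|O_EXCL succeeds for exactly one caller. */
static int lock_acquire(const char *path)
{
    struct timespec nap = { 0, 1000000 };    /* 1 ms between attempts */
    int fd;

    while ((fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600)) < 0) {
        if (errno != EEXIST) return -1;
        nanosleep(&nap, NULL);
    }
    close(fd);
    return 0;
}

/* Fork nproc children that each bump the shared counter iters times
   under the lock. Returns the final count, or -1 on setup failure. */
long count_with_processes(int nproc, int iters)
{
    char shm_path[64], lock_path[64];
    long *counter, total;
    int i, j;

    snprintf(shm_path, sizeof shm_path, "/tmp/demo-shm.%ld", (long)getpid());
    snprintf(lock_path, sizeof lock_path, "/tmp/demo-lock.%ld", (long)getpid());
    unlink(lock_path);                       /* clear any stale lock */
    if (!(counter = shared_counter(shm_path))) return -1;

    for (i = 0; i < nproc; i++) {
        pid_t pid = fork();

        if (pid < 0) break;
        if (pid == 0) {
            for (j = 0; j < iters; j++) {
                if (lock_acquire(lock_path) < 0) _exit(1);
                ++*counter;
                unlink(lock_path);           /* release the lock */
            }
            _exit(0);
        }
    }
    while (wait(NULL) > 0) {}                /* reap every child */
    total = *counter;
    munmap(counter, sizeof(long));
    return total;
}
```

If the lock does its job, count_with_processes(4, 50) comes back as 200. Pull
the lock_acquire() call out and lost updates become possible, same as with
unlocked threads.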
> > > if ([!]x) is definitely ok for bool and pointers. For ints,
> > > it is sometimes good to write if (x != 0), especially if the code
> > > around that place is complex-ish and it's hard to figure out
> > > the type of x. Not a hard rule.
> >
> > *blink* *blink*
>
> My failure in English. I meant "It's not a hard rule, though", as in
> "not a must". But I suspect it came through as "easy to follow rule".
>
> > You see a significant difference between scalar and pointer types?
>
> This is not the point. The point is:
>
> ...
> ...
> ...
> const re_dfa_t *const dfa = mctx->dfa;
> reg_errcode_t err;
> int match = 0;
> int match_last = -1;
> int cur_str_idx = re_string_cur_idx (&mctx->input);
> re_dfastate_t *cur_state;
> int at_init_state = p_match_first != NULL;
> int next_start_idx = cur_str_idx;
>
> err = REG_NOERROR;
> cur_state = acquire_init_state_context (&err, mctx, cur_str_idx);
> /* An initial state must not be NULL (invalid). */
> if (BE (cur_state == NULL, 0))
> {
> assert (err == REG_ESPACE);
> return -2;
> }
>
> if (mctx->state_log != NULL)
> {
> mctx->state_log[cur_str_idx] = cur_state;
>
> /* Check OP_OPEN_SUBEXP in the initial state in case that we use them
> later. E.g. Processing back references. */
> if (BE (dfa->nbackref, 0))
> {
> at_init_state = 0;
> err = check_subexp_matching_top (mctx, &cur_state->nodes, 0);
> if (BE (err != REG_NOERROR, 0))
> return err;
>
> if (cur_state->has_backref)
> {
> err = transit_state_bkref (mctx, &cur_state->nodes);
> if (BE (err != REG_NOERROR, 0))
> return err;
> }
> }
> }
>
> /* If the RE accepts NULL string. */
> if (BE (cur_state->halt, 0))
> {
> if (!cur_state->has_constraint
> || check_halt_state_context (mctx, cur_state, cur_str_idx))
> {
> if (!fl_longest_match)
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> return cur_str_idx;
> else
> {
> match_last = cur_str_idx;
> match = 1;
> }
> }
> }
>
> Rob, can you figure out whether fl_longest_match is a number? or bool?
> or maybe a string (that is, char*) in the underlined if() statement?
A) I do python development, which means I've learned when I don't have to
_care_. :)
B) I stopped reading at about REG_NOERROR, with the mental note "there is no
way turning a !x into an x == 0 will make any significant difference to the
readability of this code".
C) ==0 doesn't tell you if it's a char, short, int, long, signed, unsigned,
and doesn't rule _out_ it being a pointer.
> The problem is that in code like this (and this is not the worst example -
> it's only what I found with quick grep in uclibc) you need to grep
> for the declaration, it's not easily available. A hint on the type
> may be useful.
x==0 isn't any more of a hint on the type than !x is. Both imply "probably
not floating point, and not a struct". That's about it.
Code that is unclear can be fixed or commented. This trick doesn't strike me
as either.
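To make the "no type hint" point concrete, here is a small sketch (the function
name is made up for illustration). Both spellings compile and behave the same
for every integer width tried and for a pointer, so neither narrows down the
type:

```c
#include <stddef.h>

/* Returns 1 if "!x" and "x == 0" both accept zero values of every
   type tried here, which is the sense in which neither is a type hint. */
int both_spellings_accept_all(void)
{
    char c = 0;
    short s = 0;
    int i = 0;
    long l = 0;
    unsigned u = 0;
    char *p = NULL;

    /* Each test yields 1 when the value is zero/null. */
    int bang = !c + !s + !i + !l + !u + !p;
    int eq = (c == 0) + (s == 0) + (i == 0) + (l == 0) + (u == 0) + (p == 0);

    return bang == 6 && eq == 6;
}
```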
> > A pointer is a scalar type. You can do math on the suckers (admittedly
> > in base sizeof(*)).
> >
> > char *fred="123";
> >
> > printf("%s", 2+fred);
> >
> > Prints 3. Nothing special about "fred+2" or "&fred[2]"...
>
> You think 2+"abc" is a weird expression?
Not really, no.
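A small sketch of the equivalence, with illustrative function names that are
not from any real code base. The second function shows the "base sizeof(*)"
part: adding 1 to a pointer moves it by the size of the pointed-to type.

```c
#include <stddef.h>

/* 1 if fred+2, 2+fred and &fred[2] all name the same address. */
int same_tail(const char *fred)
{
    return fred + 2 == 2 + fred && 2 + fred == &fred[2];
}

/* Byte distance covered by "pointer + 1" for an int pointer. */
size_t int_step(void)
{
    int pair[2] = { 0, 0 };

    return (size_t)((char *)(pair + 1) - (char *)pair);
}
```

same_tail() holds for any string at least two characters long, and int_step()
returns sizeof(int) rather than 1.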
> How about this? -
>
> #include <stdio.h>
> int main() { puts(&3["qwerty"]); return 0; }
>
> Hehe ]:)
I've read the first several years of obfuscated C code contest entries, thanks.
That's actually how I first got involved with Tinycc:
http://bellard.org/otcc/
The contest had a size limit of 2048 bytes for entries, and Fabrice Bellard
submitted a compiler for a subset of C, capable of rebuilding itself from
source code.
He won "best abuse of the rules", and then untangled his entry and expanded it
into a full c99 compiler.
Its main advantage was speed: it was so fast he added a "-run" mode that let
you set the executable bit on a C file, start it with "#!/usr/bin/tcc -run"
(plus -lpthreads or whatever else you needed), and use C as a scripting
language. Yes, with X11 and everything, if you liked.
The main disadvantage was that it produced x86 machine code directly so was
hard to retarget for different processors (or even x86-64), and while
refactoring it to have a front end and back end he could decouple from each
other (and thus have multiple code generator back ends for various
architectures), he started to wonder about what kind of _front_ ends he could
swap out with. Specifically, he wanted something that would parse x86 machine
code a page at a time, translate it to native code on the fly, and run x86
binaries (specifically wine) on non-x86 platforms.
The new project this led to was called "qemu", and it sucked away so much of
his time tinycc stalled. It's been revived by windows guys, but all they do
is windows development and they don't really care about linux or non-x86
targets.
> > Personally I find:
> >
> > if (strcmp(walrus) == 0);
> > {
> > thingy()
> > }
> >
> > Harder to spot than:
> >
> > if (!strcmp(walrus));
> > thingy();
>
> Me too, but because of {}s, not because of == 0.
I.E. the effect of this "clarification" on code readability is negligible at
best.
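For anyone who missed what the two snippets have in common: the stray ";" after
the if() is the entire body of the if, so what follows always runs. A sketch
of the difference, using a two-argument strcmp() and made-up function names
(expect a compiler warning on the first one, which is rather the point):

```c
#include <string.h>

/* Deliberately buggy: the ';' ends the if statement, so the increment
   below is not conditional no matter how it is indented. */
int calls_with_stray_semicolon(const char *a, const char *b)
{
    int calls = 0;

    if (!strcmp(a, b));
        calls++;
    return calls;
}

/* What was presumably meant: increment only when the strings match. */
int calls_as_intended(const char *a, const char *b)
{
    int calls = 0;

    if (!strcmp(a, b))
        calls++;
    return calls;
}
```

With "a" and "b" as arguments the first returns 1 and the second returns 0,
and writing the test as "== 0" instead of "!" changes neither.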
> > I've seldom found adding extra characters helps me parse code. But then
> > I didn't even put lot of spaces in each line until Erik complained. (And
> > that wasn't because I thought it was better, that was just for
> > consistency.)
>
> Iwasusingjammedtogether for(i=0;i<nregs;++i) style for quite some time.
> But I eventually changed my mind.
I find that the easiest code to read is _what_you_are_used_to_. (That's the
dominant factor by an order of magnitude, and the reason coding style
documents for projects exist.)
The next easiest to read is "the most code fits on the screen", I.E. doesn't
wrap off the right edge or require you to scroll up and down to see functional
units.
Adding whitespace can help delineate functional units, both size of
indentation (2 space, 4 space, 8 space) to make blocks stand out, and blank
lines so you can see start-of-next-thought ("ok, we stopped doing the previous
thing and started doing something else"), which lets you easily pick out
starting points of stuff when hunting around in the code. It's essentially
paragraph breaks so you can find your place in the wall of text. Curly brackets
on their own line do just as well, but there aren't enough of those kinds of
breaks for complicated linear code.
I prefer "!x" to "x==0" because it takes up less horizontal space (less eye
movement, less likelihood of wrapping off the edge of the screen), because the
! always has to go at the beginning and you can have x == 0 and 0 == x as
synonyms, because "x=0" means something different but looks the same and is
easy to miss (hence some gurus recommend "4 == x" programming style with the
constant on the left so you CAN'T typo up an assignment, but algebra leads
math people to phrase it the other way, so you see both in the wild), because
"x == 0" and "x == NULL" and "x == EXIT_SUCCESS" and "x == STDIN_FILENO" and
so on mean exactly the same thing but look gratuitously different...
But mostly, I'll admit, it's because it's what I'm used to. :)
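A quick sketch of two of those points, assuming a POSIX-ish system where
EXIT_SUCCESS and STDIN_FILENO are both 0 (the function names are invented, and
NULL is left out only because x is an int here):

```c
#include <stdlib.h>
#include <unistd.h>

/* How many of the different-looking spellings accept x.
   On the assumed platform they are all the same test. */
int spellings_matched(int x)
{
    return (x == 0) + (x == EXIT_SUCCESS) + (x == STDIN_FILENO) + !x;
}

/* The "x=0" typo: this assigns 0 to x and then tests the result, so the
   branch is never taken, whatever x was on entry. */
int assignment_typo_taken(int x)
{
    int taken = 0;

    if ((x = 0))    /* meant x == 0; extra parens only quiet the compiler */
        taken = 1;
    return taken;
}
```

spellings_matched() gives 4 for zero and 0 for anything else, while
assignment_typo_taken() gives 0 for every input. Written constant-first,
"0 = x" would not compile at all.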
Rob
--
GPLv3: as worthy a successor as The Phantom Menace, as timely as Duke Nukem
Forever, and as welcome as New Coke.