What on earth happened to platform.h?

Thu Oct 28 03:55:44 UTC 2010

On Wednesday 27 October 2010 18:36:16 Denys Vlasenko wrote:
> > > Then, do not bother creating insanely parallelized gzip (and insanely
> > > parallelized image analysis software, and insanely parallelized
> > > database...); instead, process in parallel *insane number of photos*
> > > using run of the mill, simple, *single-threaded* tools.
> >
> > If you have a task that breaks down that way, sure.
>
> Google did not have a choice. They *had to* scale everything they do,
> scale into tens of thousands of subtasks, and they could not afford
> waiting thirty years while gzip and gazillion other tools got
> parallelized to death. They succeeded, using the above approach.

Yes, and most web transactions are essentially read-only batch processing, and 
thus extremely amenable to parallelizing across a cluster, as a category.  (I 
had a class on how to schedule that kind of stuff at Rutgers in 1993; research 
on that class of problems dates back to the 1950s.)

> > A lot of signal processing issues are like that.  You can chop the signal
> > at each keyframe and distribute across a cluster that way, but if your
> > keyframes are every 2 seconds then you guarantee that much latency, which
> > sucks for videoconferencing and other types of live broadcasts.
> >
> > That's one program domain out of hundreds for which "redefine the problem
> > into something I feel like solving" turns out to be hard.
>
> Sure, there _are_ programs which must be threaded. Linux kernel
> is another example.

There's nothing that _must_ be threaded.  There are things that benefit from 
threading.

Heck, you can set up shared memory via mmap() of a file (then delete the file to 
tell the OS to stop updating the on-disk copy), implement mutex and event 
semaphores via signal files or pipes (the O_CREAT|O_EXCL trick is a mutex in a 
nutshell, and filesystem timeouts give you timeout behavior; we're using this 
in the passwd command, I believe.  And the dentry cache means it never _has_ to 
touch disk), and do pretty much everything threading can without relying on 
pthread primitives.

It's just not necessarily an improvement. :)

> > > if ([!]x) is definitely ok for bool and pointers. For ints,
> > > it is sometimes good to write if (x != 0), especially if the code
> > > around that place is complex-ish and it's hard to figure out
> > > the type of x. Not a hard rule.
> >
> > *blink* *blink*
>
> My failure in English. I meant "It's not a hard rule, though", as in
> "not a must". But I suspect it come through as "easy to follow rule".
>
> > You see a significant difference between scalar and pointer types?
>
> This is not the point. The point is:
>
> ...
> ...
> ...
>   const re_dfa_t *const dfa = mctx->dfa;
>   reg_errcode_t err;
>   int match = 0;
>   int match_last = -1;
>   int cur_str_idx = re_string_cur_idx (&mctx->input);
>   re_dfastate_t *cur_state;
>   int at_init_state = p_match_first != NULL;
>   int next_start_idx = cur_str_idx;
>
>   err = REG_NOERROR;
>   cur_state = acquire_init_state_context (&err, mctx, cur_str_idx);
>   /* An initial state must not be NULL (invalid).  */
>   if (BE (cur_state == NULL, 0))
>     {
>       assert (err == REG_ESPACE);
>       return -2;
>     }
>
>   if (mctx->state_log != NULL)
>     {
>       mctx->state_log[cur_str_idx] = cur_state;
>
>       /* Check OP_OPEN_SUBEXP in the initial state in case that we use them
>          later.  E.g. Processing back references.  */
>       if (BE (dfa->nbackref, 0))
>         {
>           at_init_state = 0;
>           err = check_subexp_matching_top (mctx, &cur_state->nodes, 0);
>           if (BE (err != REG_NOERROR, 0))
>             return err;
>
>           if (cur_state->has_backref)
>             {
>               err = transit_state_bkref (mctx, &cur_state->nodes);
>               if (BE (err != REG_NOERROR, 0))
>                 return err;
>             }
>         }
>     }
>
>   /* If the RE accepts NULL string.  */
>   if (BE (cur_state->halt, 0))
>     {
>       if (!cur_state->has_constraint
>           || check_halt_state_context (mctx, cur_state, cur_str_idx))
>         {
>           if (!fl_longest_match)
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>             return cur_str_idx;
>           else
>             {
>               match_last = cur_str_idx;
>               match = 1;
>             }
>         }
>     }
>
> Rob, can you figure out whether fl_longest_match is a number? or bool?
> or maybe a string (that is, char*) in the underlined if() statement?

A) I do python development, which means I've learned when I don't have to 
_care_. :)

B) I stopped reading at about REG_NOERROR, with the mental note "there is no 
way turning a !x into an x == 0 will make any significant difference to the 
readability of this code".

C) ==0 doesn't tell you if it's a char, short, int, long, signed, unsigned, 
and doesn't rule _out_ it being a pointer.

> The problem is that in code like this (and this is not the worst example -
> it's only what I found with quick grep in uclibc) you need to grep
> for the declaration, it's not easily available. A hint on the type
> may be useful.

x==0 isn't any more of a hint on the type than !x is.  Both imply "probably 
not floating point, and not a struct".  That's about it.

Code that is unclear can be fixed or commented.  This trick doesn't strike me 
as either.

> > A pointer is a scalar type.  You can do math on the suckers (admittedly
> > in base sizeof(*)).
> >
> >   char *fred="123";
> >
> >   printf("%s", 2+fred);
> >
> > Prints 3.  Nothing special about "fred+2" or "&fred[2]"...
>
> You think 2+"abc" is a weird expression?

Not really, no.

> How about this? -
>
> #include <stdio.h>
> int main() { puts(&3["qwerty"]); return 0; }
>
> Hehe ]:)

I've read the first several years of obfuscated C code contest entries, thanks.

That's actually how I first got involved with Tinycc:

  http://bellard.org/otcc/

The contest had a size limit of 2048 bytes for entries, and Fabrice Bellard 
submitted a compiler for a subset of C, capable of rebuilding itself from 
source code.

He won "best abuse of the rules", and then untangled his entry and expanded it 
into a full C99 compiler.

Its main advantage was speed: it was so fast he added a "-run" mode that let 
you set the executable bit on a C file, start it with "#!/usr/bin/tcc -run" 
(plus -lpthread or whatever else you needed), and use C as a scripting 
language.  Yes, with X11 and everything, if you liked.

The main disadvantage was that it produced x86 machine code directly, so it 
was hard to retarget for different processors (or even x86-64).  While 
refactoring it into a front end and a back end he could decouple from each 
other (and thus have multiple code generator back ends for various 
architectures), he started to wonder what kind of _front_ ends he could swap 
in.  Specifically, he wanted something that would parse x86 machine code a 
page at a time, translate it to native code on the fly, and run x86 binaries 
(specifically wine) on non-x86 platforms.

The new project this led to was called "qemu", and it sucked away so much of 
his time that tinycc stalled.  It's been revived by Windows guys, but all they 
do is Windows development and they don't really care about Linux or non-x86 
targets.

> > Personally I find:
> >
> >   if (strcmp(walrus) == 0);
> >   {
> >     thingy()
> >   }
> >
> > Harder to spot than:
> >
> >   if (!strcmp(walrus));
> >     thingy();
>
> Me too, but because of {}s, not because of == 0.

I.E. the effect of this "clarification" on code readability is negligible at 
best.

> > I've seldom found adding extra characters helps me parse code.  But then
> > I didn't even put lot of spaces in each line until Erik complained.  (And
> > that wasn't because I thought it was better, that was just for
> > consistency.)
>
> Iwasusingjammedtogether for(i=0;i<nregs;++i) style for quite some time.
> But I eventually changed my mind.

I find that the easiest code to read is _what_you_are_used_to_.  (That's the 
dominant factor by an order of magnitude, and the reason coding style 
documents for projects exist.)

The next easiest to read is "the most code fits on the screen", I.E. doesn't 
wrap off the right edge or require you to scroll up and down to see functional 
units.

Adding whitespace can help delineate functional units: both the size of 
indentation (2 space, 4 space, 8 space) to make blocks stand out, and blank 
lines so you can see start-of-next-thought ("ok, we stopped doing the previous 
thing and started doing something else"), which lets you easily pick out 
starting points of stuff when hunting around in the code.  It's essentially 
paragraph breaks so you can find your place in the wall of text.  Curly 
brackets on their own line do just as well, but there aren't enough of those 
kinds of breaks for complicated linear code.

I prefer "!x" to "x==0" because it takes up less horizontal space (less eye 
movement, less likelihood of wrapping off the edge of the screen), because the 
! always has to go at the beginning and you can have x == 0 and 0 == x as 
synonyms, because "x=0" means something different but looks the same and is 
easy to miss (hence some gurus recommend "4 == x" programming style with the 
constant on the left so you CAN'T typo up an assignment, but algebra leads 
math people to phrase it the other way, so you see both in the wild), because 
"x == 0" and "x == NULL" and "x == EXIT_SUCCESS" and "x == STDIN_FILENO" and 
so on mean exactly the same thing but look gratuitously different...

But mostly, I'll admit, it's because it's what I'm used to. :)

Rob
-- 
GPLv3: as worthy a successor as The Phantom Menace, as timely as Duke Nukem 
Forever, and as welcome as New Coke.