md5sum

Rob Landley rob at landley.net
Thu Jun 18 20:40:59 UTC 2009


On Thursday 18 June 2009 13:15:01 Cathey, Jim wrote:
> >for i in file
> >do
> >  md5sum $i &
> >done
>
> The trouble with this, and your proposed successor, is
> that they fork off N processes, whereas I want there to
> only be two working processes, one for each core we have.
> Any more than that represents a slowdown, not a speedup,
> for a CPU-intensive task like MD5, assuming the embedded
> system could even survive that number of forks.

Ok, given that _additional_ criterion, off the top of my head try:

CPUS=$(echo /sys/devices/system/cpu/cpu[0-9]* | wc -w)

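while read A B
do
  # Wait for a free slot instead of dropping the line we just read.
  while [ "$(jobs | wc -l)" -ge "$CPUS" ]
  do
    sleep .25
  done
  # md5sum -c wants "hash  filename" lines; feed it one on stdin.
  echo "$A  $B" | md5sum -c &
done < filelist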

wait

> >When you say "threaded" here, are you talking -lpthread?  Because you were
> >talking about merely using fork before...
>
> No, just pure fork.  No need for pthreads, the only passed-back
> information is a success/fail status code, which exit() can handle
> just fine.

*shrug*  Sounds doable, but I'm still not convinced it's better than a script 
wrapper.
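
For what it's worth, the "fork, and pass back only a success/fail status" idea can be tried out in a script wrapper too.  This is only a rough sketch: check_one is a made-up stand-in for the real per-file work (it just tests readability), and the two paths are placeholders.

```shell
#!/bin/sh
# Sketch: run each check in a forked child and collect only its exit status.
# check_one is a hypothetical stand-in for the real per-file work.
check_one() {
  [ -r "$1" ]          # succeeds if the path is readable
}

pids=""
for f in / /nonexistent/file
do
  check_one "$f" &     # fork a child; $! is its PID
  pids="$pids $!"
done

failed=0
for p in $pids
do
  # "wait PID" returns that child's exit status
  wait "$p" || failed=$((failed + 1))
done

echo "failed: $failed"
```

Nothing but the exit status comes back from each child, which is all the parent needs to count failures.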

> I tend to use "threaded" in its more generic sense, since I
> spent a lot of time coding that kind of stuff on a platform
> that didn't have any sort of what we now call threads.

I switched from DOS (with Desqview) to OS/2 towards the end of 1992, and 
started doing Java programming in 1996.

To the tune of "my favorite things":

  Deadlocks and livelocks and strange race conditions,
  Thread local resources leaked into siblings,
  conflicting contexts for allocation,
  these are some reasons my code will not run..

  When your mouse clicks,
  don't get handled.
  When repaint won't run.

  Try the thing in a loop 5 million times
  to reproduce the... pro-blem.

Last year I wrote a little threaded hello world program to exercise the basic
pthread infrastructure (as a smoke test for uClibc) and remind me of what the
Linux threading API actually is.  (They don't _call_ it an event semaphore,
that would be too easy, and it has magic unnecessary mutex dropping and
re-acquiring semantics that seem to have come from Mars, but it's usable if you
just humor it.)

  http://landley.net/hg/firmware/file/tip/sources/native/src/thread-hello2.c

But mostly, I try to avoid going there without a good reason.  (The fact 
that's a 90 line hello world program might be a hint _why_.)

> (It
> had fabulous asynchronous I/O, though.  All of stat, open, creat,
> close, read, write, ioctl, wait, nap, were optionally asynchronous.

Back on DOS with Desqview I wrote a multi-line bulletin board system that 
handled 8 phone lines (FOSSIL driver limit) plus the sysop on the keyboard, 
which had exactly one synchronization primitive available to it: file locking.

You'd be amazed what you can do with atomic filesystem operations, though.
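
As one small illustration of an atomic filesystem operation doing the job of a lock (not what the BBS did, just the general idea): mkdir either creates the directory or fails, so only one caller can win.  The lock path and the take_lock/drop_lock names below are invented for the example.

```shell
#!/bin/sh
# Sketch: mkdir either creates the directory or fails, atomically,
# so at most one process can hold the lock at a time.
LOCK="${TMPDIR:-/tmp}/demo.lock.$$"   # hypothetical lock path

take_lock() {
  mkdir "$LOCK" 2>/dev/null
}

drop_lock() {
  rmdir "$LOCK"
}

if take_lock
then
  first=got
else
  first=busy
fi

# A second attempt while the lock is still held must fail.
if take_lock
then
  second=got
else
  second=busy
fi

drop_lock
echo "first=$first second=$second"
```

A real user would retry or give up when take_lock fails, and would want some way to clean up after a holder that died without calling drop_lock.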

Alas, the poor man's shared memory from days of yore (create a file, have two
processes mmap it, then delete the file so its link count goes to zero, although
its inode and associated blocks won't be freed until the open file handles to it
close) no longer works reliably, because not all filesystems take "link count of
zero" as a signal that they don't need to flush dirty pages to disk in the
absence of memory pressure.  (Yes, this has caused real world problems:
http://www.mail-archive.com/user-mode-linux-devel@lists.sourceforge.net/msg02574.html )
Yeah, tmpfs is great, assuming /dev/shm exists and has a tmpfs mounted on it...
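
A script can at least check that assumption before relying on it.  A minimal sketch, assuming a Linux-style mount table in /proc/mounts; the is_tmpfs helper is made up for the example, and it doesn't try to cope with stacked mounts.

```shell
#!/bin/sh
# Sketch: report whether a tmpfs is mounted on a given directory,
# by scanning a mount table in /proc/mounts format.
is_tmpfs() {
  # $1 = mount point, $2 = mount table (defaults to /proc/mounts)
  while read -r dev mnt fstype rest
  do
    if [ "$mnt" = "$1" ] && [ "$fstype" = tmpfs ]
    then
      return 0
    fi
  done < "${2:-/proc/mounts}"
  return 1
}

if is_tmpfs /dev/shm
then
  echo "/dev/shm is a tmpfs"
else
  echo "/dev/shm is missing or not a tmpfs"
fi
```

The optional second argument is only there so the function can be pointed at a saved copy of a mount table.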

> All network activity utilized only the above calls, too.)  The
> most effective coding style was event-driven, with multiple
> threads of activity going on in parallel.

The only _option_ Java gives you is threading, because they never bothered to
implement poll() or select().  (Might be in J2EE, I haven't looked.  It wasn't
there in 1.2, which is about when I stopped reading each new API release cover
to cover.)

Taking advantage of SMP is one reason to use threading, which is part of the 
reason I've gotten back into it a bit.  But it's not the _only_ way to do 
that...

> -- Jim

Rob
-- 
Latency is more important than throughput. It's that simple. - Linus Torvalds

