An example mdev.conf

Rob Landley rob at landley.net
Sat Nov 7 21:40:46 UTC 2009


On Saturday 07 November 2009 10:46:35 Denys Vlasenko wrote:
> On Saturday 07 November 2009 05:56, Rob Landley wrote:
> > > random          root:root 666
> > > urandom         root:root 444
> > > hwrandom        root:root 660
> > >
> > > random and urandom modes seem out of sync.
> >
> > He took the write bit off of urandom, but you can also write to urandom
> > to feed entropy back into the system
>
> But why in that file random is writable, and urandom is not?

Probably because he didn't _know_ you can feed entropy back into the entropy 
pool through /dev/urandom the same way you can through /dev/random?
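
So for the record, since both devices are readable and writable the same way, I'd
just give them matching modes in that mdev.conf, something like:

  random          root:root 666
  urandom         root:root 666
  hwrandom        root:root 660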

<rant>

You can't generate true random numbers in software (proof: run the same OS 
snapshot twice under an emulator, it'll produce the same output), so if you 
want unpredictable numbers for encryption keys and password salt and so on, 
you need to get your randomness from unpredictable aspects of the hardware.  
Cryptographers call this truly random data "entropy".  Think television static 
from the old analog TV sets of yore.

So the kernel collects all the random information it can from the hardware and 
puts it in a buffer called an "entropy pool", which holds a few thousand bits 
(I forget the exact size).  Interrupt timings are a big source: _what_ 
interrupts come in may be predictable, but exactly _when_ they come in, 
measured with a high resolution timer... not so much.  (Keyboard key presses 
and mouse
movements keep the entropy buffer pretty constantly full on desktop systems.  
Humans are great sources of randomness from a computer's perspective when you 
look at the low bits of their I/O timing information.)
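
If you want the actual numbers for your system, the kernel exports the pool 
size and the current fill level (both in bits on 2.6) under 
/proc/sys/kernel/random, so a trivial reader like this untested sketch will 
tell you:

  /* Peek at the entropy pool size and how full it currently is.
   * Both values are reported in bits on 2.6 kernels. */
  #include <stdio.h>

  static void show(const char *path)
  {
      char buf[64];
      FILE *f = fopen(path, "r");

      if (f && fgets(buf, sizeof(buf), f))
          printf("%-42s %s", path, buf);
      if (f)
          fclose(f);
  }

  int main(void)
  {
      show("/proc/sys/kernel/random/poolsize");
      show("/proc/sys/kernel/random/entropy_avail");
      return 0;
  }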

As entropy comes in the kernel mixes it into the pool using a hash function 
(some variant of sha1sum I think) so no matter how small every new chunk of 
random data is, mixing it in perturbs the whole buffer (just to give 
cryptographers headaches).  Then when they read data out of the buffer they 
give it _another_ stir with the hash function so that even if no new entropy 
is mixed in you get a big long stream of pseudo-random data, produced by the 
hash function from the "seed" of the entropy pool.
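
That isn't the kernel's actual code (the real thing uses a proper 
cryptographic hash), but a quick toy version shows the shape of it: one extra 
byte mixed in changes everything you read out afterward.

  /* Toy illustration of the mix-in / stir-on-read idea.  NOT the
   * kernel's algorithm: the stand-in stir() below is not a real hash,
   * it just spreads every input byte across the whole pool. */
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  #define POOL_WORDS 32
  static uint32_t pool[POOL_WORDS];

  static void stir(uint32_t x)            /* stand-in for the hash */
  {
      int i;

      for (i = 0; i < POOL_WORDS; i++) {
          x ^= pool[i];
          x = (x << 7) | (x >> 25);       /* rotate */
          x *= 2654435761u;               /* Knuth's multiplicative constant */
          pool[i] ^= x;
      }
  }

  static void mix_in(const void *buf, size_t len)
  {
      const unsigned char *p = buf;

      while (len--)
          stir(*p++);
  }

  static uint32_t extract(void)           /* output gets its own stir */
  {
      stir(0x9e3779b9u);
      return pool[0] ^ pool[POOL_WORDS - 1];
  }

  int main(void)
  {
      const char *sample = "interrupt at t=12345678";

      mix_in(sample, strlen(sample));
      printf("%08x %08x %08x\n", extract(), extract(), extract());

      mix_in("!", 1);                     /* one more byte... */
      printf("%08x\n", extract());        /* ...whole new output stream */
      return 0;
  }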

But even though the whole entropy buffer looks like random noise, in theory if 
you read enough data out of it without mixing in more entropy, a good 
cryptographer who sniffs all that data and knows your hash function could use 
what they've already read to start predicting what comes next.  (Of course
this assumes they've managed to crack root and freeze all other processes on 
the system so none of them read any data out of the pool (thus giving the 
cryptographer incomplete information to work with) and disabled every piece of 
hardware that might produce fresh entropy (pretty much meaning interrupts 
disabled), and don't have to worry about SMP or hyper-threading or preempt...  
Did I mention cryptographers are professionally paranoid?  See 
http://www.networkworld.com/community/node/21935 for example.)

Anyway, the way the kernel deals with this (largely theoretical) problem is by 
tracking how many reliably random bits of entropy have been mixed into the 
pool; once that many bits have been read back out, further reads block until 
more entropy gets mixed into the pool.  (That way, you can guarantee that what 
you _do_ read is unpredictable, but you can't guarantee your process will ever 
complete.)

Note that some sources of entropy are considered suspect (such as disk 
timings, which _might_ be locally measurable, or network packet interrupt 
timings which somebody with wireshark and a really good clock might be able to 
sniff going across the wire, maybe).  Sometimes what they do is they mix a 
large number of bits they harvested into the pool, but only account for a 
smaller number of bits.  "we collected 5 bits of data but only think there's 
about 2 bits of real randomness in it".  Because of the hash function, mixing 
in extra predictable data still shouldn't yield a predictable result (or else 
you're using a worthless hash function).

So that's what /dev/random does: it blocks when the entropy accounting says 
you've read out as many bits of randomness as it can _guarantee_ are 
unpredictable, and waits for more entropy to come in.  And what /dev/urandom 
does is let the read continue, giving you the pseudo-random data the hash 
function's grinding out, with arbitrary new entropy asynchronously dumped into 
the pool by interrupts even while you're reading.
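
Here's roughly what that looks like from userspace; this untested sketch opens 
/dev/random nonblocking so the demo reports "would block" instead of actually 
hanging:

  /* Quick demo of the difference: /dev/random can run dry (here opened
   * O_NONBLOCK so it reports EAGAIN instead of hanging), /dev/urandom
   * always hands back data. */
  #include <errno.h>
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  static void try_read(const char *dev, int flags)
  {
      unsigned char buf[64];
      int fd = open(dev, O_RDONLY | flags);
      ssize_t n = (fd < 0) ? -1 : read(fd, buf, sizeof(buf));

      if (n < 0 && errno == EAGAIN)
          printf("%s: pool accounting says dry, would block\n", dev);
      else if (n < 0)
          perror(dev);
      else
          printf("%s: got %zd bytes\n", dev, n);
      if (fd >= 0)
          close(fd);
  }

  int main(void)
  {
      try_read("/dev/random", O_NONBLOCK);   /* may report EAGAIN */
      try_read("/dev/urandom", 0);           /* never blocks */
      return 0;
  }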

When the system is freshly booted, the entropy pool is empty.  The kernel 
initializes the entropy pool with all the "unique to this system" information 
it can find: mac addresses, cpuid, current clock time, and so on.  But it sets 
the entropy accounting to zero, because in theory all that data's somewhat 
predictable.  So /dev/urandom should give different information for each system 
(and if you have a battery backed up clock, different information on each 
boot), but it won't necessarily keep the NSA or the russian mafia from 
decrypting your https transactions.  (Generating a new pgp key right after the 
system boots via an automatic script might not be your best move, although 
that's more or less what sshd does when it generates your host keys right 
after the OS install.  Current systems do that _after_ installing all the
OS packages, to let the disk I/O interrupt timings accumulate in the entropy 
pool, if nothing else.  And it's after the system prompts you for language and 
time zone and stuff, so the user provided some entropy too if it's not an 
automated install.)

To work around this problem, the kernel developers made /dev/random writable.  
Any data you write to it gets mixed into the pool with the hash function, 
although I dunno if this increases the entropy accounting (there were flamewars 
over that, I forget the result).
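
There's definitely an RNDADDENTROPY ioctl that explicitly both mixes data in 
and credits the accounting (it needs root), so if you care about the count 
that's the way to go.  An untested sketch; the 256 byte buffer and the "credit 
64 bits" number are made up for illustration:

  /* Feed data into the pool AND credit the entropy accounting via the
   * RNDADDENTROPY ioctl (needs root). */
  #include <fcntl.h>
  #include <linux/random.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <unistd.h>

  int main(void)
  {
      unsigned char seed[256];
      struct rand_pool_info *info;
      int fd;

      memset(seed, 0x42, sizeof(seed));   /* pretend this is real entropy */

      info = malloc(sizeof(*info) + sizeof(seed));
      if (!info)
          return 1;
      info->entropy_count = 64;           /* bits to credit, not bytes */
      info->buf_size = sizeof(seed);
      memcpy(info->buf, seed, sizeof(seed));

      fd = open("/dev/random", O_WRONLY);
      if (fd < 0 || ioctl(fd, RNDADDENTROPY, info) < 0) {
          perror("RNDADDENTROPY");
          return 1;
      }
      close(fd);
      free(info);
      return 0;
  }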

This means that when you reboot the system, the shutdown scripts can read 512 
bytes of random info from /dev/urandom (that's probably the whole pool, and 
guaranteed not to block) and save it to a /tmp file, and then on the next boot 
the init scripts write the contents of that file to /dev/random to perturb the 
heck out of the entropy pool and hopefully make it as unpredictable as it was 
last boot.  (Of course they want to securely delete this /tmp file right after 
using it so your NSA/mafia spook can't get ahold of it and use that to 
reconstruct the entropy pool state.)
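
Here's the same save/restore dance as an untested C sketch, for boxes where 
doing it from the init script isn't convenient; the /tmp/random-seed name is 
just the example above, real distros usually park the seed somewhere that 
survives a reboot:

  /* Save 512 bytes of pool output at shutdown, write them back to
   * /dev/random at the next boot, then delete the seed file. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  #define SEED_FILE "/tmp/random-seed"    /* illustrative path */

  static int copy(const char *from, const char *to)
  {
      char buf[512];
      ssize_t n;
      int in = open(from, O_RDONLY);
      int out = open(to, O_WRONLY | O_CREAT, 0600);

      if (in < 0 || out < 0)
          return -1;
      n = read(in, buf, sizeof(buf));      /* 512 bytes, as above */
      if (n > 0 && write(out, buf, n) != n)
          n = -1;
      close(in);
      close(out);
      return n > 0 ? 0 : -1;
  }

  int main(int argc, char **argv)
  {
      if (argc > 1 && !strcmp(argv[1], "save"))       /* at shutdown */
          return copy("/dev/urandom", SEED_FILE);
      if (argc > 1 && !strcmp(argv[1], "restore")) {  /* at boot */
          int ret = copy(SEED_FILE, "/dev/random");
          unlink(SEED_FILE);               /* don't leave it lying around */
          return ret;
      }
      fprintf(stderr, "usage: %s save|restore\n", argv[0]);
      return 1;
  }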

Embedded systems have a particularly hard time collecting entropy.  They have 
few devices generating interrupts.  There's no keyboard and mouse, their hard
drive may not do anything after boot (and flash != hard drive anyway, the 
timings of solid state hardware are a lot more predictable than moving parts), 
and if you're not counting network packet timings as "unpredictable" you may 
literally have NO sources of entropy in the system.  Meaning /dev/random 
blocks on embedded devices all the time and MAY NEVER UNBLOCK.

This is why lots of modern chipsets have hardware random number generators 
built into 'em, so the entropy pool can be kept constantly full by an insane 
little circuit that does nothing but sit there and gibber electrically.  Its 
JOB is to generate static.

Failing that, a sound card with analog audio input counts as a hardware random 
number generator (if you just listen to the low bits) even when no 
microphone's plugged into it, so that's another thing the kernel can read from 
to constantly keep the entropy pool full.  (Most sound drivers these days are 
aware of that and hook it up to the entropy pool, I think.  And THAT is why so 
many high bandwidth e-commerce servers have a sound card installed, because 
serving 3-4 https transactions per second will otherwise keep the entropy 
pool drained.)

In any case, lots of people use /dev/urandom for everything, because blocking
is a bad failure mode, and the real-world predictability of the urandom output 
is pretty darn low on a system that's receiving _any_ entropy.  Although the 
problem there is you won't catch the ones that _aren't_ receiving any entropy 
(servers and embedded devices), and if you give those a sound card or a 
chipset with a HW random number generator then /dev/random should never block 
either.

Oh, and emulators reopen this can of worms.  A system under qemu is just as 
bad off as an embedded system from a random number generator perspective, and 
all the "virtual server" people who know what they're doing hook up a virtual 
hardware random number generator that reads from the host's /dev/urandom to 
pass through entropy to the emulated system (which _can't_ harvest any 
meaningfully from the hardware, because its hardware is software, and software 
is by its nature deterministic and predictable.)

Example from User Mode Linux:
  http://lkml.indiana.edu/hypermail/linux/kernel/0503.1/0823.html

Alas, neither QEMU nor KVM seem to have this yet.  I guess they use the 
virtual sound cards?

</rant>

Rob
-- 
Latency is more important than throughput. It's that simple. - Linus Torvalds


