[PATCH] getrandom: new applet

Rob Landley rob at landley.net
Thu Jul 7 15:49:27 UTC 2016


On 07/06/2016 11:41 AM, Etienne Champetier wrote:
> Now you really hate the fact that getrandom() is a syscall.

I do not hate the fact that getrandom() is a syscall. I'm asking what
the point is of a new applet to call this syscall. You have suggested
it could block to show when /dev/urandom is producing a higher grade
of randomness than it does before being properly seeded. That is, as
far as I can tell, the only reason for your new applet to exist rather
than upgrading $RANDOM in ash/hush.

> getrandom() was introduced to fight file descriptor exhaustion and
> also added a new behavior, blocking until /dev/urandom is initialized.
> I'm only interested in the new behavior (which was added for the
> embedded world).
> 
> We could say that waiting 6 minutes (using /dev/random) instead of 3
> minutes (using getrandom()) is acceptable, but now what if you have 10
> programs wanting to do that? /dev/random might block again

I was suggesting reading a byte from /dev/random (at boot time, from
your init scripts) to see when /dev/urandom was initialized, and then
reading from /dev/urandom after that. Serializing the start of things
that need "high grade" random numbers (such as key generation) until
after the system has collected some entropy from clock skew. I.E.
presumably the same thing your blocking call to your new applet would
accomplish. (Although if your system is tickless, I'm not sure how much
entropy it's actually collecting when it goes quiescent, so it may block
quite a while if you don't do the "write mac address into /dev/urandom"
workaround to at least get this box to vary from adjacent boxes.)
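
(For concreteness, here's roughly what that boot-time serialization
looks like spelled out in C instead of a one-line read in an init
script. This is only a sketch: the device paths are the conventional
ones and the error handling is minimal.)

  /* Block on one byte from /dev/random at boot, after which
     /dev/urandom should be usable by anything that wants "high grade"
     randomness (key generation and friends). An init script would do
     the same thing with dd or similar rather than a dedicated binary;
     the point is the blocking read, not where it lives. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
      unsigned char byte;
      int fd = open("/dev/random", O_RDONLY);

      if (fd < 0)
          return 1;

      /* Blocks until the kernel is willing to hand out one byte,
         i.e. until it thinks the pool has collected some entropy. */
      if (read(fd, &byte, 1) != 1)
          return 1;
      close(fd);

      printf("pool initialized, safe to read /dev/urandom now\n");
      return 0;
  }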

> I will quote man getrandom(2):
> By default, getrandom() draws entropy from the /dev/urandom pool.
> [...]
> If the pool has not yet been initialized, then the call blocks.
> [...]
> When a sufficient number of random bits has been collected, the
> /dev/urandom entropy pool is considered to be initialized.

Did you miss the part where I read the man page and wrote an example
program to test what it was doing last time around?
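
(Something along these lines does that test; a sketch, not necessarily
the exact program I posted. It asks getrandom() for a few bytes with
flags=0, which per the man page draws from the urandom pool and blocks
until that pool counts as initialized, and reports how long the call
took. It goes through syscall() since glibc has no getrandom() wrapper
yet, and assumes kernel headers new enough to define SYS_getrandom.)

  #include <stdio.h>
  #include <time.h>
  #include <unistd.h>
  #include <sys/syscall.h>

  int main(void)
  {
      unsigned char buf[16];
      struct timespec start, stop;

      clock_gettime(CLOCK_MONOTONIC, &start);
      /* flags=0: urandom pool, block until it's initialized. */
      long len = syscall(SYS_getrandom, buf, sizeof(buf), 0);
      clock_gettime(CLOCK_MONOTONIC, &stop);

      printf("getrandom() returned %ld after %ld seconds\n",
             len, (long)(stop.tv_sec - start.tv_sec));
      return 0;
  }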

> It doesn't state that it's 128 bits because it can change in the
> future; the API just blocks until it's safe, it's clearly defined.
> getrandom() is not a revolution but it's better than /dev/random
> Is it worth 207K (uncompressed)? That's Denys' call

The bit about having two pools, and reading a byte from /dev/random
potentially slowing the proper initialization of /dev/urandom (even if
you write it back), sounds strange and potentially like a kernel bug.
Adding a syscall to work around a kernel bug seems odd, especially
since the stated justification was so that code derived from OpenBSD
could be more easily supported by Linux without being rewritten.

I had arguments with the OpenBSD guys back in 2001 that openssh should
not _hang_ when you do "sleep 999 & exit". The context is that when
ifconfig brought up an rtl8139 device it launched a kernel thread, and
that kernel thread had the same stdin/stdout/stderr as the process
that launched it, which meant the last user of the pty hadn't closed
it yet and
thus openssh wouldn't exit even though the shell it had launched had
exited... Oh here, just read the thread:

https://lkml.org/lkml/2001/5/1/30

The point is that the official position of OpenBSD with regards to
openssl and openssh has always been that any deviation in behavior
between OpenBSD and Linux means, by definition, that Linux is wrong.
That the openssl/openssh ports to Linux were very much second class
citizens, and if Linux didn't adapt to what they did hell would freeze
over before they changed their code.

>> Ok, this is a new assertion. How does that work? How would /dev/urandom
>> get random data to seed it without /dev/random having enough entropy for
>> one byte of output?
> 
> not really, see my mail from 29 June 2016 at 17:04
> also answered at the beginning of this email, I think

You keep pointing me back at what you've already said, despite the fact
I read what you said.

>> How? /dev/urandom and /dev/random pull from the same pool. If it's
>> ignoring the entropy accounting, your definition of "cryptographically
>> secure" differs from the conventional one.
> 
> As said at the beginning, it's not the same pool.

So you're saying it should actually be:

  dd bs=1 count=1 if=/dev/random of=/dev/urandom

Or possibly that writing it back doesn't matter...

> Many cryptographers think that the blocking of /dev/random is useless
> or bad, including DJB (Daniel J. Bernstein)

You're proposing an app whose sole purpose is to block. The above is an
alternate way to block using an existing app through an existing API. If
it takes significantly longer for that API to give 8 bits of randomness
than it takes another API to give 128 bits of randomness, somebody
should ask the kernel guys why.

> Still a quote from DJB

So /dev/urandom doesn't need fresh entropy the same way qmail doesn't
need a license (or the ability of anyone but him to distribute updated
versions of it)?

>> Pretty much by definition crypto exploits are something the developer
>> didn't think of, so cryptographers tend to be paranoid.
> 
> I'm just trying to use a well defined API here,

A) So am I.

B) To accomplish _what_? You seem to want to use the new API because it
exists.

> not to review kernel code
> If the kernel is broken, it will be fixed,

http://landley.net/notes-2013.html#04-03-2013 links to old 2006 lkml
messages where I first raised an issue I was fixing at the time, because
nobody else had in the 7 years in between.

I periodically submitted approximately the same perl removal patches for
6 years (here's 2009, https://lwn.net/Articles/353830/ and here's 2012
http://lkml.iu.edu/hypermail/linux/kernel/1201.2/02849.html). I believe
they finally went in in 2013.

Here's the squashfs guy explaining his 5 year struggle to get that
upstream: https://lwn.net/Articles/563578/

Linux-kernel development circled the wagons against outsiders quite some
time ago. (My guess is about a year into the SCO trial.)

Hands up everybody who thinks that if I don't bother to resubmit
http://lkml.iu.edu/hypermail/linux/kernel/1606.2/05686.html (which I
probably won't, because my patch Works For Me) the issue will be
fixed this year.

> but the API will still exist, and will then keep its initial promise

Yes, /dev/random and /dev/urandom should still be there and still be
usable for the foreseeable future. Just like they have been for the past
20 years.

>>>> which is what
>>>> /dev/urandom is for. You've implied that your new function call is
>>>> somehow superior to /dev/urandom because you're magically generating
>>>> entropy when the entropy pool is empty, in userspace, somehow.
>>>
>>> no no and no, this syscall blocks until /dev/urandom is initialized
>>> with 128 bits of entropy
>>
>> dd if=/dev/random of=/dev/random bs=16 count=1
>>
>> There, you've blocked until there were 128 bits of entropy and you
>> haven't actually consumed any entropy because you fed it right back.
>>
>> If the entropy pool contains 128 bits but /dev/random won't give it to
>> userspace, file a bug with the kernel guys.
> 
> As explained at the beginning, the first 128 bits go to /dev/urandom,
> so you wait at least for the first 256 bits (or even more, I don't
> know), and writing to /dev/random doesn't increase the entropy
> estimation, so you don't consume entropy, but the kernel entropy
> estimation goes down by 128 bits, and /dev/random will eventually
> block again

Which is why further reads would be from the non-blocking /dev/urandom.
In the example the point of the read from /dev/random is to block (at
boot time) until you can read from an initialized /dev/urandom.

("Here is way you can block." "But if you do that again it might block
again!" "Um... yeah?")

>> So to save a file descriptor, you're making an applet.
> 
> no, I'm only interested in the blocking behavior of getrandom()
...
> again, I'm only interested in the blocking behavior, not the fd
> exhaustion prevention
> (which explains why it's a syscall and not a character device)

So if urandom and random have different pools, and urandom gets the
first 128 bits of entropy, reading one byte from /dev/random at init
time to wait until the pool is initialized requires the system to
collect 136 total bits of entropy. Your mechanism would return after it
had collected 128 bits of entropy. And the difference between 128 and
136 justifies a new applet, one which apparently renders userspace
mitigation strategies for the problem category (a la the "write /proc
stuff into the /dev node") irrelevant despite the remaining 3-minute
wait in my test even with your new API.

I'm going to stop here, thanks. Good luck with your thing, I've lost
interest in the outcome.

Rob

