RFD: Rework/extending functionality of mdev

Wed Mar 18 23:16:31 UTC 2015

Hi Laurent !

> On 18/03/2015 18:08, Didier Kryn wrote:
>> No, you must write to the pipe to detect it is broken. And you won't
>> try to write before you've got an event from the netlink. This event
>> will be lost.

On 18.03.2015 18:41, Laurent Bercot wrote:
>   I skim over that discussion (because I don't agree with the design)

Why?

Did you notice my last two alternatives, accidentally both named #3?
... and specifically the last one, "Netlink the Unix way"?

- uses private pipe for netlink and named pipe for hotplug helper
   (with maximum of code sharing)

- should most likely follow the flow of operation you suggested
   (as far as I understood you)

- except that I split off the pipe watcher / on-demand startup code for
the conf parser / device operation into its own thread (process), so
that it is generally usable as a separate applet for starting a pipe
consumer on demand
(you had that function as an integral part of your netlink reader)

- and I'm currently going to split off that one-shot "xdev init" feature
from xdev, creating a separate applet / command for it, as you suggested
(extending the functionality for even more general usage, as suggested
by Isaac, independent of the device management, and maybe with its
operation modifiable by changing functions in a shell script)

So why do you still doubt the design? ... because I moved some
code into its own (small) helper thread?

> I can't make any substantial comments, but here's a nitpick: if you
> use an asynchronous event loop, your selector triggers - POLLHUP for
> poll(), not sure if it's writability or exception for select()- as
> soon as a pipe is broken.

This is what I expected, but the problem is that the question came up,
and I can't find where this is documented.
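
For reference, the Linux poll(2) manual page notes under POLLERR that
the bit is also set for the write end of a pipe once the read end has
been closed; other systems may report POLLHUP instead. A minimal check,
sketched in Python purely for illustration (the helper name
pipe_is_broken is hypothetical), that detects the broken pipe without
writing to it:

```python
import os
import select


def pipe_is_broken(write_fd):
    """Return True if poll() reports that the pipe's reader is gone.

    The descriptor is registered with an empty event mask: error
    conditions (POLLERR, POLLHUP) are reported regardless of the mask,
    so nothing has to be written to find out.
    """
    poller = select.poll()
    poller.register(write_fd, 0)
    return any(revents & (select.POLLERR | select.POLLHUP)
               for _fd, revents in poller.poll(0))


if __name__ == "__main__":
    r, w = os.pipe()
    print("reader open:  ", pipe_is_broken(w))
    os.close(r)
    print("reader closed:", pipe_is_broken(w))
```

Accepting either POLLERR or POLLHUP keeps the check independent of
which of the two bits a given kernel chooses to set.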

>   Note that events can still be lost, because the pipe can be broken
> while you're reading a message from the netlink, before you come
> back to the selector; so the message you just read cannot be sent.
> But that is a risk you have to take everytime you perform buffered IO,
> there's no way around it.

Ok, what would you then do? Unbuffered I/O on the pipe, and then what?

... if that one additional dropped message matters, on top of the
others not yet read from the netlink buffer (which are lost on close),
then we should indeed use unbuffered I/O on the pipe, and only read a
message when there is room for one more message in the pipe:

   set non blocking I/O on stdout
   establish netlink socket
loop:
   poll for write on stdout possible, until available
   (may set an upper timeout limit, failure on timeout)
   poll for netlink read and still for write on stdout
   if write ability drops
     we are in serious trouble, failure
   if netlink read possible
     gather message from netlink
     write message to stdout (should never block)
     on EAGAIN, EINTR: do 3 write retries, then failure
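
The loop above could be sketched roughly as follows, in Python for
brevity. This is only an illustration under assumptions: the names pump
and wait_writable, the timeout and the buffer size are hypothetical, the
source is any readable descriptor standing in for the netlink socket,
and while waiting for input the pipe is polled with an empty event mask
(so only error conditions are reported) rather than for writability,
which would otherwise make poll() return immediately. Python retries
EINTR by itself, so only EAGAIN is handled explicitly.

```python
import os
import select

BROKEN = select.POLLERR | select.POLLHUP | select.POLLNVAL


def wait_writable(fd, timeout_ms):
    """Wait until fd accepts data; fail if it is broken or never drains."""
    poller = select.poll()
    poller.register(fd, select.POLLOUT)
    ready = poller.poll(timeout_ms)
    if not ready:
        raise TimeoutError("consumer did not drain the pipe in time")
    if ready[0][1] & BROKEN:
        raise BrokenPipeError("consumer closed the pipe")


def pump(src_fd, out_fd, timeout_ms=1000, retries=3, bufsize=8192):
    """Forward messages from src_fd to out_fd; return how many were sent.

    Returns when src_fd reports end of file (a netlink socket never does).
    """
    os.set_blocking(out_fd, False)
    sent = 0
    while True:
        # Do not touch the source until the pipe can take a message.
        wait_writable(out_fd, timeout_ms)

        # Wait for input while still watching the pipe for breakage.
        poller = select.poll()
        poller.register(src_fd, select.POLLIN)
        poller.register(out_fd, 0)  # empty mask: error conditions only
        for fd, revents in poller.poll():
            if fd == out_fd and revents & BROKEN:
                raise BrokenPipeError("consumer closed the pipe")

        msg = os.read(src_fd, bufsize)
        if not msg:
            return sent

        # Write the message; retry a bounded number of times on EAGAIN.
        attempts = 0
        while msg:
            try:
                msg = msg[os.write(out_fd, msg):]
            except BlockingIOError:
                attempts += 1
                if attempts > retries:
                    raise
                wait_writable(out_fd, timeout_ms)
        sent += 1
```

If the consumer is already gone, the failure is raised before anything
is read, so the pending message stays in the source's buffer.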

... does that fit better? I don't think it makes a big difference,
but I can live with the slightly bigger code.

My problem is not the detection of the failing pipe write, but the
reaction to it. When that happens, the downstream end of the pipe most
likely needs more than just a restart. That is, it should only happen on
a serious failure in the conf file or the device operations (-> manual
action required). So I expect to lose more event messages than just that
single message you were grumbling about. Hence on hotplug restart
we need to re-trigger the plug events anyway!
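
One common way to re-trigger the plug events is to write "add" to the
uevent file of each device in sysfs, which asks the kernel to emit a
synthetic event for it. A rough sketch in Python, assuming the default
root /sys/devices; the function name retrigger is hypothetical and this
is not meant as the actual xdev implementation:

```python
import os


def retrigger(root="/sys/devices", action="add"):
    """Write `action` to every uevent file below root.

    Returns the number of uevent files written. Devices that vanish
    or refuse the write are skipped.
    """
    count = 0
    for dirpath, _dirs, files in os.walk(root):
        if "uevent" not in files:
            continue
        try:
            with open(os.path.join(dirpath, "uevent"), "w") as f:
                f.write(action)
            count += 1
        except OSError:
            pass  # device went away or is not writable
    return count
```

On a real system this needs root privileges, and it should only run
once the netlink reader is listening again, otherwise the synthetic
events are lost as well.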

--
Harald