RFD: Rework/extending functionality of mdev

Harald Becker ralda at gmx.de
Sun Mar 15 00:45:05 UTC 2015


On 14.03.2015 03:40, Laurent Bercot wrote:
>   Hm, after checking, you're right: the guarantee of atomicity
> applies with nonblocking IO too, i.e. there are no short writes.
> Which is a good thing, as long as you know that no message will
> exceed PIPE_BUF - and that is, for now, the case with uevents, but
> I still don't like to rely on it.

Named pipes are a proven IPC concept, not only in the Unix world. They
are pipes and behave exactly like them, including non-blocking I/O,
poll loop programming, and failure handling. The only difference is how
you get access to the pipe file descriptors (by calling pipe() or
open()).
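
Just to illustrate the point (the fifo path here is made up, nothing
from the proposal): once the fifo exists, reading it is ordinary pipe
handling; only the way the file descriptor is obtained differs:

   # hypothetical illustration: a fifo is handled like any pipe; the only
   # difference is mkfifo + open instead of inheriting a pipe() descriptor
   mkfifo -m 0600 /dev/xdev.fifo 2>/dev/null   # create once, ignore "exists"
   while read -r line
   do
     printf 'event: %s\n' "$line"   # a real parser would act on the event
   done < /dev/xdev.fifo            # open() instead of an inherited pipe fd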


>   I call "pipe" an anonymous pipe. And an anonymous pipe, created by
> the netlink listener when it forks the event handler, is clearly the
> right solution, because it is private to those two processes. With
> a fifo, aka named pipe, any process with the appropriate file system
> access may connect to the pipe, and that is a problem:

Right, any process with root access may write to this pipe, but don't
you think such processes can just as well do other things, like changing
the device node entries in the device file system directly?

Can processes with root access produce confusion on the pipe?

Yes, but aren't such processes able to produce any kind of confusion
they like anyway?

We could have (at some slight extra cost):

- create the fifo with devparser:pipegroup 0620

- run the hotplug helper (if used) suid/sgid as hotplug:pipegroup
   (or drop privileges to that)

- drop the netlink reader to the same user:group after socket creation

- run the fifo supervisor as devparser:parsergroup

- but then we need to run the parser suid root: it needs to access the
device file system and do some operations which require root (as far as
I remember). Any suggestion how to avoid that suid root? (A rough shell
sketch of this setup follows below.)
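
Roughly, in shell, such a setup could look like this (user and group
names as in the list above; the fifo path and helper name are made up,
and error handling is omitted):

   # fifo: owned by the parser user, writable by the pipe group only (0620)
   mkfifo -m 0620 /dev/xdev.fifo
   chown devparser:pipegroup /dev/xdev.fifo
   # hotplug helper (if used): owned hotplug:pipegroup with suid/sgid set,
   # so the kernel-spawned helper has just enough rights to write the fifo
   chown hotplug:pipegroup /sbin/xdev-hotplug
   chmod 6711 /sbin/xdev-hotplug
   # the netlink reader and the fifo supervisor are then started as (or
   # drop to) the matching user:group, as listed above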


>   - for writing: why would you need another process to write into the
> pipe ? You have *one* authoritative source of information, which is
> the netlink listener, any other source is noise, or worse, malevolent.

You are stuck on netlink usage and overlook that you are forcing others
to do it your way. No doubt about the reasons for using netlink, but why
force it on those who dislike it? How would this forcing be different
from forcing others to rely on e.g. systemd? (a provocation, not
expected to be answered)

Whereas I'm trying to give the user (or rather the system maintainer)
the ability to choose the mechanism he likes, with the chance to flip
the mechanism by just modifying one or two parameters or commands.
Flipping the mechanism is even possible in a running system, without
disturbance and without changing the configuration.

So why is this approach worse than forcing others to do things in a
specific way? Apart from the known arguments why netlink is the better
solution, on which we absolutely agree.


>   - for reading: having several readers on the same pipe is the land
> of undefined behaviour. You definitely don't want that.

Is anyone here trying to have more than one "reader" on the pipe? The
only reader of the pipe is the parser, and precisely because we are
using fifos the parser shouldn't bet on the format and content of
incoming messages. It shall sanity-check them before use (and here we
hit the point where I expect some overhead, though not much, due to
other changes). Isn't it good practice to do this for other pipes too
(even if only out of paranoia)? And all of this with the benefit of
avoiding re-parsing the conf for every incoming event, and an expected
overall speed improvement. Not to mention the possibility to choose/flip
the mechanism as the user likes.

This even opens up extra possibilities, e.g. for debugging and watching
purposes. With a simple redirection of the pipe you may add event
logging and/or a live display of all event messages (possibly filtered
by a formatting script / program). All without extra cost or impact for
normal usage, and without creating special debug versions of the event
handler system.
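
For example (all names here are hypothetical; "xdev-parser ..." simply
stands for however the parser is normally started):

   # log every event and/or watch live, without a special debug build:
   # insert tee between the fifo and the parser
   tee -a /var/log/xdev-events.log < /dev/xdev.fifo | xdev-parser /etc/xdev.conf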

I'm just trying to make it modular, not monolithic.


>   - generally speaking, fifos are tricky beasts, with weird and
> unclear semantics around what happens when the last writer closes,
> different behaviours wrt kernel and even *kernel version*, and more
> than their fair share of bugs. Programming a poll() loop around a
> fifo is a lot more complicated, counter-intuitive, and brittle, than
> it should be (whereas anonymous pipes are easy as pie. Mmm... pie.)

See my statement about fifos above. I don't know what you fear about
fifos, but their usage and functionality are more proven in the Unix
world than you expect. Sure, you need to watch your step, but that also
holds when using pipes (even if only out of paranoia, e.g. checking
incoming data before use instead of relying on it blindly).

And maybe there are internal differences in pipe / fifo handling between
kernels, but they are likely internal only and don't change the expected
usage behavior, or else they would risk breaking other pipe
functionality too.

In detail:

Close on last writer: What's unclear about this? What should be
different here than with other pipes? The trick is not to let it happen
at all ... that's the job of the fifo supervisor daemon: hold the named
pipe open and available for use (the kernel only assigns buffer space to
the pipe when there is data), just as tcpsvd holds a network socket open
to accept incoming connections.
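
A minimal sketch of that trick in shell (fifo path hypothetical; relies
on Linux allowing a fifo to be opened read/write): the supervisor keeps
one read/write descriptor on the fifo, so there is always a writer and
the reader never sees end-of-file between events.

   # hold the fifo open for the lifetime of the supervisor; the reader
   # then never hits EOF when the last real event source goes away
   exec 3<> /dev/xdev.fifo
   # ... spawn and supervise the parser, which reads from /dev/xdev.fifo ...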

Poll loop: Fifos are pipes and need exactly the same handling as pipes.
What should be more "brittle" there than with any other pipe usage?


>   I'm talking about the netlink because it's a natural source of streamed
> uevents, which is exactly what you want to give to a long-lived uevent
> handler such as mdev -i.

ACK, but do you like forcing others to do things in a specific way? Do
you like being forced by others?

Why not open up to other possibilities, with a slightly different concept?


> ... write-only mode ...

Definitely NO. I'm trying to read very carefully and understand the
intention behind what is said, not only the words.

I even understand the fears of e.g. Isaac, but I see little possibility
to stay exactly at the old mdev behavior without blocking any kind of
innovation here (not talking about writing a wrapper around the old,
suffering operation to add in netlink usage - shriek).


>   Please take a look at my s6-uevent-listener/s6-uevent-spawner, or at
> Natanael's nldev/nldev-handler. The long-lived uevent handler idea is
> a *solved problem*.

I know how that works, and that is the problem. I see limitations in
this approach, which I try to overcome.

1) using the netlink mechanism only -> no problem

2) using the kernel hotplug helper mechanism -> fails to work, or still
suffers from re-parsing the conf for each event.

3) opening up my mind and accepting that the next one to come around may
have a brand new plug mechanism in his bag -> may be difficult to do
without changing code.

So why not allow going one step further and split the known functional
blocks into separate threads, giving a modular system which may be used
and set up to the maintainer's liking?

So who is in pure write-only mode here?

>   Your original plan was to:
>   - write mdev -i: I think it's a good idea.

That's a different part, with no (notable) influence on the rest.

... but as I'm not in write-only mode, I slightly modified my initial 
intention:

Let the parser look for specially formatted lines (not much cost), and
on a match write the line out to stdout (in a shell-friendly format) if
a flag (-i) is set. Otherwise (normal device operation) those lines are
just ignored, like comments. On start of "xdev -i" it is checked whether
stdout is a tty, and redirected to /dev/null in that case (don't clobber
the console with those lines).

So it's possible to do

   xdev -i /etc/xdev-init.conf | sh xdev-init-script

with xdev-init-script:

while read cmd line
   do
     set -- $line
     case "$cmd" in
       TYPE_OF_LINE ) ... react to the line as you like ;;
     esac
   done

No further init operation in xdev -i than this, just the possibility to
re-use the parser to pick out some init-related lines from a file in the
mdev.conf format, without needing a not-so-trivial sed script, etc.

The sense of "xdev -i CONF_FILE >/dev/null" is to check the conf file
for errors, print error messages on stderr, and exit non-zero when
errors are detected: to check the CONF_FILE before the device file
system is started, and allow falling back to sane defaults.
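
A small sketch of that use (file names as in the example above; what the
"sane defaults" are is left open):

   # validate the conf before the device file system is brought up
   if xdev -i /etc/xdev-init.conf >/dev/null
   then
     xdev -i /etc/xdev-init.conf | sh xdev-init-script
   else
     echo "xdev-init.conf has errors, falling back to defaults" >&2
     # hypothetical: apply whatever sane defaults the system provides here
   fi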


We may even go one step further (just thinking out loud): check the
command line for extended syntax (e.g. "xdev parser SCRIPT_NAME ARGS"),
spawn the given script (once per parser startup), and then forward the
matching lines for incoming events to that script.

   parse the conf file into a memory table
   if a script file is given
     spawn a pipe to the given script
   while read next message (with timeout)
     search for a matching line in the memory table
     if we spawned a script
       write the line information out to the pipe
     else
       do the device operation of the matching entry
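
In shell terms that could look roughly like this (the "xdev parser"
syntax is just the hypothetical extension named above; the handler path
and its contents are made up):

   # hypothetical: spawn the handler once, then stream the matching line
   # information for every incoming event to it
   xdev parser /sbin/xdev-handler

   # /sbin/xdev-handler: long-lived, reads one forwarded line per event
   while read -r cmd line
   do
     set -- $line
     case "$cmd" in
       TYPE_OF_LINE ) : ;;   # react to the forwarded line as you like
     esac
   done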


>   - modify mdev -s to add more functionality than just triggering a
> coldplug: I don't think it's good design but I don't care as long as I
> can configure it out. Other people have also answered. The answers may
> not have been the ones you were looking for, but you wanted feedback,
> you got feedback.

So you think it is not worth having some improved symlink handling for
new device nodes?

currently:

=path  - allows moving the new device to e.g. a subdirectory
 >path  - moves the device and adds a symlink pointing to the device

... but what about leaving the device at its original name and creating
a symlink pointing to that device (e.g. /dev/cdrom -> /dev/sr0, not
/dev/sr0 -> /dev/cdrom)?

One intended extension:

<path  - shall create a symlink to the device if the symlink does not exist

... and what about combining the move and symlink operations? Move the
new device to a different location *and* add a symlink at a different
location?

=symlink_path >new_device_path  - overwrite existing symlink
=new_device_path <symlink_path  - don't overwrite existing symlink

... and possibly the functionality to include other files in the conf
file, with the possibility to specify a directory, resulting in all
files from that directory being included.

What's wrong with my wish to get some extended functionality for more
simplicity? Is that not of public interest?

My major intention with the extensions is improved symlink handling,
mount point creation with permission setup (device file system related),
and mounting of file systems (most likely virtual) inside the device
file system (e.g. devpts).

Sorry for my going on about proc and sys; this could be done with the
above extensions without extra cost, but it is clearly a personal
preference.

*No fixed logic* for what gets set up on any system! Just the ability to
focus the device system setup more on "what to set up" (a table with a
list of required symlinks, etc.) instead of describing "how to set it
up" (calling commands in a shell script).


>   I'm not interested in providing 15 APIs to do the same thing. Users
> don't like the netlink ? Tough, if they want serialized uevents. What
> would you do if your kid wanted to drive a car but said he didn't like
> steering wheels, would you build him a car with a joystick ?

To stay with your example: What if you saw the possibility to build a
base car which has a steering wheel module that is replaceable with a
joystick module? Now everybody is happy, as each person can plug in the
steering module he likes. What is wrong with this? At least with the
base idea, not looking at the cost, which needs to stay within
acceptable tolerance.


>   I provided you with clear designs and working code. So did other
> people. (You said code is premature at this point. I'm sorry, but no,
> code is not premature when the problem is solved, and if you're not
> convinced, please simply study the code, which is extremely short.)

I don't need to study your code further to understand your approach. I
see its caveats and try to overcome them.

... but I can't overcome them if you insist on forcing others to do it
your way :(


My clear and *final* statement on this topic is: I want to give every
maintainer the possibility to use the plug mechanism he likes, but still
benefit from increased speed and reduced memory consumption during event
bursts. ... *not* replacing one mechanism with another or adding a
wrapper for some new technology, leaving those who dislike it behind,
still suffering from the known problems.


> Now it's all up to you. I would like to see a mdev -i, can you work
> on it ?

Sure ... as I said: planning, then code hacking; but see the above as a
possible alternative before I start any hacking.


> If you prefer to keep beating around the bush and smoking crack
> about fifo superservers, it's fine too, but I'm just not interested.

... (oops, I said *final statement*) ...

... but let me extend the above final statement with the following, for
more clarity:

I know I buried my head in the sand in frustration, due to some comments
here (most likely from Denys), but I'm at a point where I have three
possible solutions:

1) get some extended functionality into Busybox
2) fork the project and run a "MyBusybox" project (private or public)
3) drop my development interests (that means all of them) forever

As I strongly reject #3, and do not like the #2 way (due to all sorts of
known problems, as I have already gone that way once, privately), I
prefer (and, at least for now, insist) on #1 ... but I have no choice
other than to say "good bye" and hop to #2 if it's impossible to find a
working solution for #1 :(

It's not me who is forcing others to set up their system in a specific
way, or to choose a specific mechanism, or to be left out and stuck with
known suffering code.

It's up to you ... you all ...

... but new automobiles are dangerous, don't use them ...

(final: I promise! except -> private)

--
Harald


