Standalone network block device client now working. :)

Rob Landley rob at landley.net
Wed Sep 22 19:09:56 UTC 2010


On Tuesday 21 September 2010 19:19:17 Denys Vlasenko wrote:
> Outer for(;;) loop looks wrong. For one, on the second iteration
> nbd file descriptor will be closed. I have doubts ioctls can work
> on closed fds. Repeatedly daemonizing also looks strange.
>
> I imagine the loop should just be removed, body should execute just once.

The problem is the ioctl _can_ exit.  I'm not sure of the circumstances under 
which it does so: it could be a certain number of dropped packets, or it could 
be the server rebooting.  (I haven't rewritten the server side yet, no 
immediate call for it here.)

In theory you could call a non-daemon version of NBD from an upstart variant 
that respawns it when it goes away, and thus bump this retry up to an external 
level.  However, the retry doesn't re-login, it only retries with the same 
login credentials, and if it gets an EBADR return it goes "oh, login 
credentials are stale, exit", so it doesn't try to swap in a _different_ block 
device under an existing active mount (which would be bad).

So without the retry, it's less reliable, but an externally imposed retry may 
increase the likelihood of data corruption.  (I also dunno under what 
circumstances a mount on top of the network block device fails and unmounts 
itself.)
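
To be concrete about what the outer loop is _trying_ to do, here's a sketch of 
the intended shape (reconnect_to_server() and resend_negotiation() are 
hypothetical stand-ins for the reconnect and re-login steps, not functions in 
the actual code):

  for (;;) {
    int sock = reconnect_to_server(host, port);          // hypothetical
    if (sock < 0 || resend_negotiation(nbd, sock)) break;

    // Blocks here handling requests; returns when the connection drops.
    if (ioctl(nbd, NBD_DO_IT) >= 0) break;  // clean disconnect, we're done
    if (errno == EBADR) break;              // stale login: bail out rather
                                            // than swap a different device
                                            // under a live mount
    close(sock);                            // transient failure: reconnect
  }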

If the loop isn't working right now, removing it is reasonable.  We can always 
add it back in if people start complaining.  Really what I should do is ask 
Pavel Machek how the code he wrote back in 1997 works at a design level.  (Or 
read through the nbd-server and kernel code.)

(I vaguely recall that at one point, Linas Vepstas wanted to write an NBD-
server based on bittorrent to speed up cluster boots. :)

>                 // Open the device to force reread of the partition table.
>                 if (!fork()) {
>                         char *s = strrchr(device, '/');
>                         sprintf(data, "/sys/block/%.32s/pid", s ? s+1 : device);
>                         // Is it up yet?
>                         for (;;) {
>                                 temp = open(data, O_RDONLY);
>                                 if (temp == -1) sleep(1);
>                                 else {
>                                         close(temp);
>                                         break;
>                                 }
>                         }
>                         close(open(device, O_RDONLY));
>                         exit(0);
>                 }
>
> Why do you do it in a child? nbd-client's parent might actually
> prefer to know that this step is complete before nbd-client
> exits. As it stands, now reread happens "somewhere in the future,
> maybe in 2050, who knows".

A) They can't happen in the same process: ioctl(NBD_DO_IT) blocks, and the open 
won't succeed until some process is in that ioctl() handling packets.

B) The NBD daemon doesn't _use_ the partition information, and in its normal 
use case it daemonizes so you can't tell when it's done anyway.  The 
information propagates through udev/mdev and you can trigger on new devices 
showing up that way, so it's already an external hotplug callback.

The ioctl(NBD_DO_IT) call sits there and processes NBD requests; it doesn't 
return until you disconnect the NBD from the server.  That ioctl turns the 
process that calls it into a dedicated NBD request handler.  The partition 
table doesn't get re-read until the first open() of the NBD _after_ the request 
handler gets set up (due to some race in the kernel that makes it hard to do 
in-kernel).

So the process that calls ioctl(NBD_DO_IT) can't be the one that calls the 
open() to force the partition table re-read, because the open() won't succeed 
until there's already a request handler running, and the request handler 
doesn't return while the NBD is connected.

>
>                 // Process NBD requests until further notice.
>
>                 if (ioctl(nbd, NBD_DO_IT)>=0 || errno==EBADR) break;
>
> Need more comments here.
> Does this ioctl block for a long time, processing many requests?

Yes.  It's the in-kernel request handler, which requires a process context.  
It's where the daemon spends the rest of its runtime after doing the setup.
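
For context, the setup leading into that call looks roughly like this (a 
sketch from memory of the <linux/nbd.h> ioctl interface, not the exact busybox 
code; nbd is the open /dev/nbd0 fd, sock is a TCP connection to the server, 
and size came from the handshake, which I've left out):

  #include <sys/ioctl.h>
  #include <linux/nbd.h>

  ioctl(nbd, NBD_SET_BLKSIZE, 4096);
  ioctl(nbd, NBD_SET_SIZE_BLOCKS, size/4096);
  ioctl(nbd, NBD_CLEAR_SOCK);
  ioctl(nbd, NBD_SET_SOCK, sock);

  // Daemonize, fork the child that waits for /sys/block/nbd0/pid and open()s
  // the device to reread the partition table (the code quoted above), then:
  ioctl(nbd, NBD_DO_IT);  // doesn't return until the connection goes away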

> On what condition does it return? Etc.

Well, "nbd-client -d" sends a disconnect request, but I didn't implement that.  
It's basically:

  int fd = open("/dev/nbd0", O_RDWR);
  ioctl(fd, NBD_CLEAR_QUE);   // throw away any queued requests
  ioctl(fd, NBD_DISCONNECT);  // ask the kernel to send the disconnect request
  ioctl(fd, NBD_CLEAR_SOCK);  // detach the socket from the device

Generally I just sync and unmount the filesystem and then go "killall nbd-
client", which seems to accomplish the same thing. :)

Rob
-- 
GPLv3: as worthy a successor as The Phantom Menace, as timely as Duke Nukem 
Forever, and as welcome as New Coke.

