My brain hurts. (Messing with mount.)

Mon Mar 6 19:55:37 UTC 2006

On Monday 06 March 2006 3:31 am, Denis Vlasenko wrote:
> > It is possible to --bind mount a directory onto itself.  (Oddly, the
> > system survived.  It seems to be a NOP, I could still list the contents
> > of the directory, but it shows up in /proc/mounts.)
>
> With upcoming kernel support, this will allow making subtrees RO
> (or noatime, noexec etc) if you use corresponding mount option.

Ah.  Cool.

Mount's always handled "mount blockdev directory" and my version handles 
"mount file directory" with an automatic losetup behind the scenes.  I'm 
pondering making the "mount directory directory" do an automatic --bind 
mount.

I've held off because that could also be a --move mount.  But you can --bind
mount any directory, while you can only --move an actual mount point, so bind
is the safer default.  You can specify --move when you want --move...
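Picking a strategy from what the first argument turns out to be can be sketched with stat(2).  This is a minimal sketch, not BusyBox's code: `classify_source()` and the `KIND_*` names are hypothetical, it doesn't call mount(2) itself, and treating a directory as an automatic --bind is the behavior being pondered above, not something mount already does.

```c
#include <sys/stat.h>

/* Hypothetical strategy codes for "mount SOURCE DIR". */
enum mount_kind {
    KIND_OTHER,   /* not a local filesystem object: pass through verbatim */
    KIND_BLOCK,   /* block device: plain mount */
    KIND_LOOP,    /* regular file: losetup behind the scenes, then mount */
    KIND_BIND     /* directory: candidate for an automatic --bind */
};

enum mount_kind classify_source(const char *source)
{
    struct stat st;

    /* Network addresses and similar identifiers won't stat(). */
    if (stat(source, &st) != 0)
        return KIND_OTHER;
    if (S_ISBLK(st.st_mode))
        return KIND_BLOCK;
    if (S_ISREG(st.st_mode))
        return KIND_LOOP;
    if (S_ISDIR(st.st_mode))
        return KIND_BIND;
    return KIND_OTHER;   /* character devices, sockets, fifos... */
}
```

Note this leans on stat() succeeding, which is the same "does it exist?" test that can give false positives for nodev filesystems, so it's only trustworthy once you know the filesystem type wants a real path.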

> > For iterating through /etc/fstab, I need to check for matches against
> > _both_, because /etc/fstab doesn't just have /dev/hda1, it can also have
> > something like:
> >   walrus.img    walrus   ext2 loop,ro
> >
> > Which can be mounted via:
> >   mount walrus.img
> > But not via:
> >   mount ./walrus.img  # Still expects "walrus" in the current
> > directory...
>
> Bizarre

No, it's matching against an identifier that isn't necessarily a path, in 
which case it only does an exact match.  I understand the behavior here now.  
If you tell it "mount www.server.com:/path smbfs defaults 0 0" and it tried 
to prepend $PWD to www.server.com:/path, you'd rightfully object.  As one of 
the jffs2 guys pointed out to me eight months ago, not everything we mount is 
a filesystem object.

> > Oddly, after mounting that, I can umount ./walrus.img and it works.
>
> I think that "umount something" basically shall simply call
> oldumount("something") syscall, literally passing "something" as an
> argument.

A) Not when we maintain an /etc/mtab.
B) We sometimes need to losetup -d the loop device.

But in general, umount is simpler than mount, yes.

> Should umount do "ulosetup" as well?

It already does.  (Ours does, anyway.)

> Maybe something like this: 
> 1) scan /etc/mtab (or /proc/mounts), if entry exists and has "loop" option,
>    remember that.

/proc/mounts doesn't store a "loop" option.  Life would be much easier if it 
did.  Right now we speculatively call the ioctl for losetup -d on every block 
device we umount, since it's a NOP everywhere else.
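The speculative detach can be sketched like this, assuming Linux's LOOP_CLR_FD ioctl from <linux/loop.h> (the request losetup -d issues).  `maybe_free_loop()` is a hypothetical name; on anything that isn't a loop device the ioctl simply fails, and that failure is ignored.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/loop.h>

/* After umounting a block device, try to detach it as a loop device.
 * Non-loop devices reject the ioctl, so errors are deliberately ignored.
 * Returns 1 if a loop device was actually detached, 0 otherwise. */
int maybe_free_loop(const char *dev)
{
    int fd = open(dev, O_RDONLY);
    int freed;

    if (fd < 0)
        return 0;
    freed = (ioctl(fd, LOOP_CLR_FD, 0) == 0);
    close(fd);
    return freed;
}
```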

> 2) call oldumount, exit if not successful
> 3) if there was a "loop" entry: rescan /etc/mtab (or /proc/mounts),
>    if entry is gone, then delete loop device.

And what if they say "umount filename"?  That's what they mounted, after all;
the fact that we did a losetup for them behind their back shouldn't matter.

The current umount seems to work ok.  The main problem with mount (requiring 
the extensive ripping apart) is that sections of code that need to be 
reentrant for -a aren't, and the code is fragile because I never properly 
documented the problem it was trying to solve.  (There's 8 gazillion implicit 
constraints I've had to recreate.  Fixing one breaks another unless you get 
them all together.)  Oh, and the memory allocation _sucked_.  No lifetime 
rules for anything.  That sucks much less snow...

If umount can still be simplified, we can revisit this issue later.

> > When you supply two arguments, mount doesn't check fstab at all.  With
> > the above fstab entry:
> >   mount walrus.img walrus
> >
> >   The mount is rw, not ro, and I have to say -t ext2 -o loop with the gnu
> >   mount.
>
> Makes sense
>
> > "mount -o remount,rw" does require at least one argument.
>
> Makes sense too. More to it, passing _two_ arguments would be rather
> confusing.

But you can.  Probably doesn't do what you expect... :)

> > Using the above fstab entry, and doing this:
> >   mount walrus.img walrus
> >   mount -o remount,rw walrus
> > Complains that block device walrus.img is write protected, and it's
> > mounting read only.  (Understandable, since it did a read-only
> > association on the loopback device.)
>
> Probably best to try rw losetup, but ro mount. Allows for remount,rw.

Our version may already do something like this, I don't remember.  (If we try 
a rw losetup it'll fall back to ro before returning failure.)
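The "try rw, fall back to ro" half of that can be sketched at the point where the backing file is opened.  A minimal sketch assuming the fallback is triggered by EROFS, EACCES or EPERM; `open_backing_file()` is a hypothetical helper, and the caller would then set up the loop device read-only when `*readonly` is set.

```c
#include <errno.h>
#include <fcntl.h>

/* Open a loop backing file read-write if possible, falling back to
 * read-only; *readonly reports which mode we ended up with. */
int open_backing_file(const char *path, int *readonly)
{
    int fd = open(path, O_RDWR);

    *readonly = 0;
    if (fd < 0 && (errno == EROFS || errno == EACCES || errno == EPERM)) {
        fd = open(path, O_RDONLY);
        *readonly = 1;
    }
    return fd;  /* -1 if even the read-only open failed */
}
```

Getting a rw loop device whenever possible is what lets a later "mount -o remount,rw" succeed instead of complaining about a write-protected block device.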

> > Following up with
> >   mount -o remount,rw /dev/does-not-exist walrus
> > Complains that /dev/does-not-exist is mounted read only.  Amusing. :)
>
> Two-argument remount makes little sense for me. There must be one argument
> - a mountpoint.

Yup.

Remember, I'm trying to derive a spec based on man pages and observation of 
the existing implementation.  It doesn't return an error, but doesn't provide
useful behavior, either.

> Can we just print an error message and exit? (OTOH I fear there are
> init scripts from hell which will be upset...)

That's broken enough that I'm not worried about breaking them.  No, what I'm 
wondering about is if there's anybody crazy enough to have "remount" as an 
option in their fstab.  (Since remount means we should have been
parsing /proc/mounts instead of fstab, supporting it would mean restarting the
lookup, and I refuse to support or worry about it.)

> > Ooh, if I add an entry:
> >   /path/to/walrus.img potato ext2 loop,rw 1 2
> >
> > Then _that_ always resolves the paths, and can be mounted via a relative
> > path. (In fact if _both_ are in there, that one wins even if it's after
> > the first one.  Strange.)  Right, my brain hurts.
> >
> > I consider "mount -t proc /dev /proc" and then trying to umount /dev to
> > be a "doctor, it hurts when I do this" case.
> >
> > Apparently, when you mount a symlink on another symlink, the kernel
> > dereferences both for you, which makes sense I suppose.  (Of course this
> > is another case where /etc/mtab can get confused...)
>
> I'm glad someone is trying to sanitize this mess.
> Thanks Rob!

I'm sorry it's taking so long.

The code's shaping up decently.  I'm probing into all these corner cases to 
try to understand the problem so I can figure out what we _should_ do, and 
thus find the simplest implementation.  Right now, it looks like this:

First off, forget all the flag parsing into variables; just keep everything as
strings as long as possible, and append them together.  -w is a synonym for
"-o rw", -r is a synonym for "-o ro".  When you have fstab options, the command
line options get glued on the end (and thus override fstab).
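The "glue strings together, last one wins" rule can be sketched with a small append helper.  `append_opts()` is a hypothetical name; the assumption is that whatever parses the final string later walks it left to right, so a command-line "rw" after an fstab "ro" wins.

```c
#include <stdlib.h>
#include <string.h>

/* Append one comma-separated option chunk to a growing option string.
 * fstab options go in first, command-line options (including -r -> "ro"
 * and -w -> "rw") are appended after them, so they override fstab.
 * Returns the new string, or NULL on allocation failure. */
char *append_opts(char *opts, const char *more)
{
    size_t old = opts ? strlen(opts) : 0;
    size_t add = strlen(more);
    char *s;

    if (!add)
        return opts;
    s = realloc(opts, old + add + 2);  /* room for ',' separator + NUL */
    if (!s) {
        free(opts);
        return NULL;
    }
    if (old)
        s[old++] = ',';
    memcpy(s + old, more, add + 1);    /* copies the NUL too */
    return s;
}
```

Usage: `opts = append_opts(opts, "loop,ro");` for the fstab entry, then `opts = append_opts(opts, "rw");` for -w, giving "loop,ro,rw".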

Mount can have zero, one, or two non-option arguments.

The -a flag only matters in the zero argument case.  With no arguments and no 
-a, we show all mount points.  With no arguments and -a, we mount all.  We 
never need to check -a anywhere else (just the fact we have zero arguments 
means "mount all", we wouldn't have gotten there otherwise), and supplying -a 
with one or two arguments is ignored.
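The top-level dispatch then only looks at the argument count, and consults -a in exactly one place.  A minimal sketch; `pick_action()` and the action names are hypothetical.

```c
/* Hypothetical top-level dispatch on the number of non-option arguments. */
enum mount_action {
    SHOW_MOUNTS,       /* no arguments, no -a */
    MOUNT_ALL,         /* no arguments, -a */
    MOUNT_ONE,         /* one argument: look it up in fstab (mtab if remount) */
    MOUNT_EXPLICIT,    /* two arguments: fstab is never read */
    BAD_USAGE
};

enum mount_action pick_action(int nargs, int all_flag)
{
    switch (nargs) {
    case 0:
        /* The only place -a matters. */
        return all_flag ? MOUNT_ALL : SHOW_MOUNTS;
    case 1:
        return MOUNT_ONE;        /* -a ignored */
    case 2:
        return MOUNT_EXPLICIT;   /* -a ignored */
    default:
        return BAD_USAGE;
    }
}
```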

With two arguments, we skip reading fstab.  It doesn't matter what flags that 
might have supplied, they don't apply to this mount.  We can also 
unconditionally turn the second argument into an absolute path, because it 
_must_ be a directory and we don't care what fstab says.

Zero arguments, and two arguments, are now simple.  (They weren't before, but 
I understand 'em better now.)

Another thing that can be simple is the "remount" flag.  In addition to being
passed on to the kernel, it means we should use mtab rather than fstab, but
everything else goes normally.  With two arguments, we just pass it on to the
kernel.  (That's why that test produced a funky error message, but behaved
properly.)  With no arguments, iterate through everything in mtab (which means
"mount -a -o remount,ro" should remount everything read-only).  With one
argument, we find the existing mount and remount it, but we must find the
_last_ hit, because if there are two things mounted on the same mount point,
the most recent one is the active mount.  Solution?  When iterating through
with one argument, _always_ use the last hit.

Now we get to the poorly-defined parts: whether or not the first argument 
needs an absolute or relative path, and where.  I'm still working that out a 
bit.

With exactly one argument, matching against fstab seems to use the absolute 
path when the fstab entry starts with "/", and the actual argument when it 
doesn't.  Fine, I can use that.

When mounting a block device backed filesystem, we want to supply absolute 
paths rather than relative paths to the mount syscall, so that absolute path 
shows up in /proc/mounts (which makes life easier for umount).  The thing is, 
how do you know whether the first argument is a path, or a network address, 
or some other kind of identifier?

The current (6-month-old) code in svn tests whether the path exists, and makes 
it absolute if so.  But that's not reliable.  Who knows what kind of trash is 
in the current directory?  False positives are easy.  I learned about this 
when it broke a jffs2 user who was trying to mount a memory technology device 
called "mtd" and was in "/dev" at the time, and there _was_ a /dev/mtd.  
(Although I think I'm calling that one pilot error.)  But in general, that
approach has the potential for false positives, creating absolute paths for 
nodev mounts.

The distinguishing factor is "what does the filesystem want"?  When we're 
mounting a filesystem that _isn't_ block device backed, what we're passing in 
should be verbatim whatever was fed in on the command line.  How do we tell 
the difference?  We can look in /etc/filesystems and /proc/filesystems.  We 
already do that in order to autodetect the filesystem type for auto mounts 
(basically try them all until one sticks), and I can factor that code out and 
re-use it.
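In /proc/filesystems, filesystems that don't need a block device are tagged with "nodev" at the start of their line, so the lookup can be sketched as below.  `fs_needs_blockdev()` is hypothetical, and it only understands the /proc/filesystems line format; /etc/filesystems would need its own handling.

```c
#include <stdio.h>
#include <string.h>

/* Look fstype up in a /proc/filesystems-style table, where each line is
 * either "nodev\t<name>" or "\t<name>".  Returns 1 if block-backed (feed it
 * an absolute path), 0 if nodev (pass the argument verbatim), or -1 if the
 * type isn't listed, e.g. because its module isn't loaded yet. */
int fs_needs_blockdev(const char *table, const char *fstype)
{
    char line[256];
    int result = -1;
    FILE *f = fopen(table, "r");

    if (!f)
        return -1;
    while (result < 0 && fgets(line, sizeof line, f)) {
        char *name = strchr(line, '\t');
        int nodev = strncmp(line, "nodev", 5) == 0;

        if (!name)
            continue;
        name++;
        name[strcspn(name, "\n")] = '\0';   /* strip trailing newline */
        if (strcmp(name, fstype) == 0)
            result = !nodev;
    }
    fclose(f);
    return result;
}
```

The -1 case is exactly the gap described next: an unloaded module shows up nowhere.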

This is not a perfect solution, specifically if they try to mount a filesystem 
type where the filesystem driver is in a module that's not currently loaded, 
we won't notice it's a block device backed filesystem and thus won't feed it 
the absolute path.  But we have that same problem autodetecting the 
filesystem type for those, and that's why /etc/filesystems exists: to list 
modular filesystem types.  (And to list ext3 before ext2, but personally I 
call that a kernel bug.)

It doesn't entirely boil down to "simple, clean, reliable behavior", but it's 
an improvement.

Now to go implement it...

> --
> vda

Rob
-- 
Never bet against the cheap plastic solution.