[BusyBox] cp.c SUSv3 compliance
Rob Landley
rob at landley.net
Sun Dec 12 07:01:35 UTC 2004
On Sunday 12 December 2004 01:02 am, Glenn McGrath wrote:
> On Sat, 11 Dec 2004 19:24:03 -0500
> Rob Landley <rob at landley.net> wrote:
> > I remember thinking that it would be nice if cp and tar and such
> > understood about holes in files, but that's been about the limit of my
> > involvement in it.
>
> Do you know of any references describing holes in files? I've heard
> about them, but don't understand them.
It's basically a chunk of the file that's never been written to. Open a file,
seek past the end of it, and write data. The bit in between that you never
wrote to reads as zeroes, but has no blocks allocated to it. (This saves
disk space.)
A perennial complaint on linux-kernel is that there's no way to punch a hole
in an existing file. You can't say "deallocate from here to here". You can
write zeroes over it, but you can't actually make it retroactively become a hole that
doesn't take up space on disk. (Last I checked, anyway. Maybe they added a
new syscall or something since 2.6.0 shipped.)
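For what it's worth, later kernels did add this. Since Linux 2.6.38, fallocate(2) accepts FALLOC_FL_PUNCH_HOLE, which must be ORed with FALLOC_FL_KEEP_SIZE. Here is a minimal sketch, assuming Linux with glibc. punch_hole() and punch_demo() are hypothetical names. Filesystems that can't punch holes fail with EOPNOTSUPP.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: deallocate the byte range [offset, offset+len).
 * FALLOC_FL_PUNCH_HOLE must be paired with FALLOC_FL_KEEP_SIZE, so the
 * file length never changes; the range simply reads back as zeroes.
 * Returns 0 on success, or -1 with errno set (EOPNOTSUPP if the
 * filesystem can't punch holes). */
int punch_hole(int fd, off_t offset, off_t len)
{
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                     offset, len);
}

/* Demo: write 8 KiB of 'x', punch out the first 4 KiB, then check it.
 * Returns 1 if the punched range reads as zeroes and the size is
 * unchanged, 0 if the filesystem doesn't support punching, and -1 on
 * any other failure. */
int punch_demo(void)
{
    char path[] = "/tmp/holeXXXXXX";
    char buf[8192];
    int fd = mkstemp(path), result = -1;

    if (fd < 0)
        return -1;
    unlink(path);                       /* file disappears on close */
    memset(buf, 'x', sizeof buf);
    if (write(fd, buf, sizeof buf) != (ssize_t)sizeof buf)
        goto out;
    if (punch_hole(fd, 0, 4096) != 0) {
        result = (errno == EOPNOTSUPP) ? 0 : -1;
        goto out;
    }
    if (pread(fd, buf, 4096, 0) != 4096)
        goto out;
    result = 1;
    for (int i = 0; i < 4096; i++)
        if (buf[i] != 0)
            result = -1;
    if (lseek(fd, 0, SEEK_END) != (off_t)sizeof buf)
        result = -1;
out:
    close(fd);
    return result;
}
```

This only changes allocation. Readers of the file see exactly what they would have seen if you had written zeroes over the range.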
Try doing df, then do something along the lines of:
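int fd = open("blah", O_RDWR | O_CREAT, 0644);  /* O_CREAT needs a mode */
lseek(fd, 1000000000, SEEK_SET);                /* ~1 GB past the start */
write(fd, "woot", 4);                           /* 4 real bytes; the rest is a hole */
close(fd);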
And now do an "ls -l" and another df.
Viola: a stringed instrument.
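If you'd rather have a program do the ls -l / df comparison, fstat() reports both numbers. st_size is the apparent length, and st_blocks is what's actually allocated. Here is a minimal sketch. make_sparse() is a hypothetical helper, and it assumes Linux, where st_blocks counts 512-byte units (POSIX leaves the unit unspecified).

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: redo the experiment above on `path` and report
 * the apparent size (what ls -l shows) and the space actually allocated
 * (what df notices). Assumes st_blocks is in 512-byte units, as on
 * Linux. Returns 0 on success, -1 on error. */
int make_sparse(const char *path, off_t *apparent, off_t *allocated)
{
    struct stat st;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;
    /* Seek ~1 GB in and write 4 bytes; everything before them is a hole. */
    if (lseek(fd, 1000000000, SEEK_SET) < 0 || write(fd, "woot", 4) != 4
        || fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    close(fd);
    *apparent = st.st_size;
    *allocated = (off_t)st.st_blocks * 512;
    return 0;
}
```

On a filesystem that supports holes, the allocated figure should come out at a few KB rather than a gigabyte.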
(P.S. There are all sorts of fun corner cases when you start abusing this. A
file with a big hole in it that's then mmapped writable can exhaust disk
space by writing to memory. The general feeling is that something like
SIGBUS is the appropriate thing to do here; it's sort of like the OOM killer
taking the sucker out, only very specific to the program that just had a
problem. I believe swapon also checks to make sure that any swap files you
try to add contain no holes, and I wouldn't be at all surprised if the actual
error message in that case was "what is WRONG with you?" :)
But in general, it's pretty straightforward.
> It seems a bit strange to me, how do you tell the difference between a
> hole in a file and a deliberate string of 0's such as at the end of a
> tar entry... or are strings of 0's automatically considered to be holes
> whether 0's were written or not ?
Well, functionally any consecutive string of null bytes should be considered a
potential (if not actual) hole. If you write to the intervening space later,
it'll retroactively allocate sectors to it. (Even if you write zeroes to it.
If you get into filesystem implementation there's crud about "spans", but you
get that when it can't allocate enough space for your whole file in one go
anyway. You wind up with a fragmented file if you didn't write the bits in
order. Woo. Tragedy.)
But other than that it's pretty transparent.
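To make "transparent" concrete: read() can't tell a hole from zeroes that were really written. The sketch below builds one file each way and compares the bytes. build(), zeros_vs_hole() and the 64 KB GAP are all made up for illustration.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define GAP 65536  /* size of the zero region; an arbitrary choice */

/* Write GAP zero bytes followed by "end", either explicitly (sparse == 0)
 * or by seeking over the gap and leaving a hole (sparse != 0).
 * Returns an fd to an already-unlinked temp file, or -1 on error. */
static int build(int sparse)
{
    char path[] = "/tmp/zeroXXXXXX";
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);
    if (sparse) {
        if (lseek(fd, GAP, SEEK_SET) != GAP)
            goto fail;
    } else {
        static const char zeros[GAP];   /* static storage is zero-filled */
        if (write(fd, zeros, GAP) != GAP)
            goto fail;
    }
    if (write(fd, "end", 3) != 3)
        goto fail;
    return fd;
fail:
    close(fd);
    return -1;
}

/* Returns 1 if the two files read back byte-for-byte identical,
 * 0 if they differ, -1 on error. */
int zeros_vs_hole(void)
{
    static char a[GAP + 3], b[GAP + 3];
    int fa = build(0), fb = build(1), result = -1;

    if (fa >= 0 && fb >= 0
        && pread(fa, a, sizeof a, 0) == (ssize_t)sizeof a
        && pread(fb, b, sizeof b, 0) == (ssize_t)sizeof b)
        result = memcmp(a, b, sizeof a) == 0;
    if (fa >= 0)
        close(fa);
    if (fb >= 0)
        close(fb);
    return result;
}
```

Only the allocation differs, which is why an archiver has to treat every long run of zeroes as a candidate hole.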
There are magic ways to see which sectors are allocated to a given chunk of a
file, and lilo's "geometry.c" does that if you're really in the mood to throw
up at some unintelligible source code. (I just added a "length=" option to
lilo, so I can append loopback mountable root partitions to a kernel that I
can mount with an offset via "losetup -o", but only have lilo load the kernel
part. This was a three line change. Figuring out how to make this three
line change took an entire evening.)
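Later Linux kernels (3.1 and up) added a much less nauseating interface for the common question "where are the holes?": lseek() with SEEK_DATA and SEEK_HOLE. It reports byte ranges of data and holes rather than which sectors hold them, so it doesn't replace what lilo needs. Here is a sketch, assuming Linux with _GNU_SOURCE. next_data_extent() and seek_hole_demo() are hypothetical. Filesystems without native support report the whole file as data, with one implicit hole at EOF.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: find the first data region at or after `from`.
 * Stores its bounds as [*start, *end) and returns 0; returns -1 if there
 * is no more data (errno ENXIO) or on any other error. */
int next_data_extent(int fd, off_t from, off_t *start, off_t *end)
{
    off_t s = lseek(fd, from, SEEK_DATA);

    if (s < 0)
        return -1;
    off_t e = lseek(fd, s, SEEK_HOLE);  /* EOF always counts as a hole */
    if (e < 0)
        return -1;
    *start = s;
    *end = e;
    return 0;
}

/* Demo: a 1 MiB hole followed by 4 bytes of data. Returns 1 if the
 * reported extent contains the written bytes, 0 otherwise. */
int seek_hole_demo(void)
{
    char path[] = "/tmp/seekXXXXXX";
    const off_t at = 1 << 20;
    off_t s, e;
    int fd = mkstemp(path), ok = 0;

    if (fd < 0)
        return 0;
    unlink(path);
    if (pwrite(fd, "woot", 4, at) == 4 && fsync(fd) == 0
        && next_data_extent(fd, 0, &s, &e) == 0)
        /* With native support s lands at (or just below) 1 MiB; without
         * it s is 0. Either way [s, e) must cover the 4 written bytes. */
        ok = s <= at && e >= at + 4;
    close(fd);
    return ok;
}
```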
But for something like tar or cp, you don't need it. When you're writing
data, test for consecutive \0 bytes, and every time you find enough of them,
seek past them.
You probably want to make a new thingy in libbb to do this, some kind of
holey_writes(FILE *batman, void *data, int length) function. That way cp
could use it, tar could use it, cpio could use it...
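Here is a sketch of that helper that keeps the proposed signature. Everything in it is hypothetical, including holey_finish(), holey_demo(), and the MIN_HOLE threshold. Zero runs shorter than MIN_HOLE are written as ordinary data, so a few stray nulls don't fragment the file.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MIN_HOLE 512  /* shortest zero run worth skipping; arbitrary */

/* Length of the run of zero bytes at the start of p[0..n). */
static int zero_run(const char *p, int n)
{
    int k = 0;

    while (k < n && p[k] == 0)
        k++;
    return k;
}

/* Hypothetical libbb helper: write `length` bytes of `data` to `batman`,
 * but fseek() over every run of at least MIN_HOLE zero bytes instead of
 * writing it. Returns 0 on success, -1 on error. */
int holey_writes(FILE *batman, void *data, int length)
{
    const char *p = data;
    int i = 0;

    while (i < length) {
        int z = zero_run(p + i, length - i);

        if (z >= MIN_HOLE) {
            /* Long zero run: seek over it, leaving a potential hole. */
            if (fseek(batman, z, SEEK_CUR) != 0)
                return -1;
            i += z;
            continue;
        }
        /* Find where the next long zero run (or the end) starts and
         * write everything before it in one go. */
        int j = i + z;
        while (j < length) {
            int r = zero_run(p + j, length - j);

            if (r >= MIN_HOLE)
                break;
            j += r ? r : 1;
        }
        if (fwrite(p + i, 1, (size_t)(j - i), batman) != (size_t)(j - i))
            return -1;
        i = j;
    }
    return 0;
}

/* Call once after the last holey_writes(): a skipped run at the very
 * end doesn't extend the file, so fix the length at the current offset. */
int holey_finish(FILE *batman)
{
    long pos;

    if (fflush(batman) != 0 || (pos = ftell(batman)) < 0)
        return -1;
    return ftruncate(fileno(batman), pos);
}

/* Demo: 'A', 99 zeroes (too short to skip), 'B', then a long zero tail.
 * Returns 1 if the file reads back identically with the right size. */
int holey_demo(void)
{
    static char buf[8192], back[8192];
    FILE *f = tmpfile();
    int ok = 0;

    if (!f)
        return 0;
    buf[0] = 'A';
    buf[100] = 'B';
    if (holey_writes(f, buf, sizeof buf) == 0 && holey_finish(f) == 0
        && fseek(f, 0, SEEK_END) == 0 && ftell(f) == (long)sizeof buf) {
        rewind(f);
        ok = fread(back, 1, sizeof back, f) == sizeof back
             && memcmp(buf, back, sizeof buf) == 0;
    }
    fclose(f);
    return ok;
}
```

A skipped run at the very end of the data doesn't extend the file by itself, which is why the separate holey_finish() call exists.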
P.S. Why on EARTH do we generally use ascii-bloat FILE * and not the int fd
stuff of the low-level Unix I/O functions? I very vaguely remember asking
this at some point in the past, and it led to my discovering dprintf(),
which busybox already uses in a number of places. And we even
implement our own vdprintf in libbb just in case we're linked against
libc5...
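For reference, dprintf() is the fd counterpart of fprintf(). It was a GNU extension at the time and was later standardized in POSIX.1-2008. Here is a minimal sketch. report() and dprintf_demo() are hypothetical, and the pipe is just a convenient place to read the output back.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: format a message straight onto fd with dprintf();
 * no FILE * or stdio buffer is involved. Returns the byte count written,
 * or a negative value on error, like fprintf(). */
int report(int fd, const char *applet, int blocks)
{
    return dprintf(fd, "%s: %d blocks\n", applet, blocks);
}

/* Demo: send one line through a pipe and read it back.
 * Returns 1 if the bytes match the expected formatted text. */
int dprintf_demo(void)
{
    const char *want = "cp: 42 blocks\n";
    char buf[64];
    int p[2], ok = 0;

    if (pipe(p) != 0)
        return 0;
    int n = report(p[1], "cp", 42);
    if (n == (int)strlen(want) && read(p[0], buf, sizeof buf) == n)
        ok = memcmp(buf, want, (size_t)n) == 0;
    close(p[0]);
    close(p[1]);
    return ok;
}
```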
> Glenn
Rob