[PATCH] diff: rewrite V2. -1005 bytes

Matheus Izvekov mizvekov at gmail.com
Fri Jan 15 02:41:32 UTC 2010


On 03:23 Fri 15 Jan     , Denys Vlasenko wrote:
> On Thursday 14 January 2010 18:20, Matheus Izvekov wrote:
> > > takes less than one second with current bbox diff:
> > > 
> > > # PATH="$PATH" time ./mkdiff
> > > Command exited with non-zero status 1
> > > real    0m 0.94s
> > > user    0m 0.54s
> > > sys     0m 0.40s
> > > 
> > > but new one takes 14 times longer:
> > > 
> > > # PATH=".:$PATH" time ./mkdiff
> > > Command exited with non-zero status 1
> > > real    0m 14.32s
> > > user    0m 8.45s
> > > sys     0m 5.83s
> > 
> > After a round of profiling, what I can see is that almost all of that
> > performance hit comes from reusing the diff machinery to compare binary
> > files. After adding back the old code which skipped those, that
> > performance is regained.
> > 
> > Maybe that was not a good idea after all.
> > 
> > Maybe the code cmp uses to compare files could be moved to libbb, and I
> > could reuse that?
> 
> I looked at cmp code. It's too big and too slow (since it needs to handle
> -s and -l). I guess it's better to write up separate code.
> It should be trivial - two largish reads from both files + memcmp, right?
> --
> vda
Yeah, I am going to do that. It turns out most of our performance hit
comes from not doing an initial byte-by-byte comparison, which is
actually very fast and worth it. If I add that, plus your much
faster rewritten read_token, we end up beating the old diff by 1/3.
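
For illustration only, here is a minimal sketch of the kind of initial
byte-by-byte check discussed above: read a largish block from each file
and memcmp the two. This is not the code from the patch. The name
files_differ and the 16 KiB buffer size are assumptions made for this
example, and it uses plain stdio rather than any libbb helper.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not BusyBox code.
 * Returns 0 if both streams hold identical bytes, 1 if they differ,
 * and -1 on a read error. The buffer size is an arbitrary choice. */
int files_differ(FILE *a, FILE *b)
{
	char buf_a[16 * 1024];
	char buf_b[16 * 1024];

	for (;;) {
		size_t na = fread(buf_a, 1, sizeof(buf_a), a);
		size_t nb = fread(buf_b, 1, sizeof(buf_b), b);

		if (ferror(a) || ferror(b))
			return -1;
		/* A short read on one side only means the lengths differ. */
		if (na != nb || memcmp(buf_a, buf_b, na) != 0)
			return 1;
		/* Equal short reads: both streams ended at the same offset. */
		if (na < sizeof(buf_a))
			return 0;
	}
}
```

If this returns 0, the files are identical and the line-based diff
machinery can be skipped entirely. If it returns 1, the caller still has
to decide whether the files are text or binary before running the real
diff.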

