[PATCH] diff: rewrite V2. -1005 bytes

Matheus Izvekov mizvekov at gmail.com
Thu Jan 14 17:20:00 UTC 2010


On 01:34 Thu 14 Jan     , Denys Vlasenko wrote:
> On Wednesday 13 January 2010 17:11, Matheus Izvekov wrote:
> > >From a7a2d7c904ce0ad44017195744b5eea785ae27f5 Mon Sep 17 00:00:00 2001
> > From: Matheus Izvekov <mizvekov at gmail.com>
> > Date: Wed, 13 Jan 2010 13:49:27 -0200
> > Subject: [PATCH] diff: rewrite V2. -1005 bytes
> 
>         llist_t *L_arg = NULL;
> 
>         INIT_G();
> 
>         /* exactly 2 params; collect multiple -L <label>; -U N */
>         opt_complementary = "=2:L::U+";
>         getopt32(argv, "abdiL:NqrsS:tTU:wu"
>                         "p" /* ignored (for compatibility) */,
>                         &L_arg, &s_start, &opt_U_context);
>         argv += optind;
>         while (L_arg) {
>                 if (label[0] && label[1])
>                         bb_show_usage();
>                 if (label[0]) { /* then label[1] is NULL */
> ====>                   free(label[1]);
>                         label[1] = label[0];
>                 }
>                 label[0] = llist_pop(&L_arg);
>         }
> 
> Bug? You are freeing one of argv elements. They are not malloc'ed.

Yes, sorry, that was definitely a bug.

> Performance problem. With ten subdirs named busybox.[123456789a]
> this script:
> 
> #!/bin/sh
> d=`echo *.1`; test -e "$d" && diff -d -urpN *.0 *.1 >1.patch
> d=`echo *.2`; test -e "$d" && diff -d -urpN *.1 *.2 >2.patch
> d=`echo *.3`; test -e "$d" && diff -d -urpN *.2 *.3 >3.patch
> d=`echo *.4`; test -e "$d" && diff -d -urpN *.3 *.4 >4.patch
> d=`echo *.5`; test -e "$d" && diff -d -urpN *.4 *.5 >5.patch
> d=`echo *.6`; test -e "$d" && diff -d -urpN *.5 *.6 >6.patch
> d=`echo *.7`; test -e "$d" && diff -d -urpN *.6 *.7 >7.patch
> d=`echo *.8`; test -e "$d" && diff -d -urpN *.7 *.8 >8.patch
> d=`echo *.9`; test -e "$d" && diff -d -urpN *.8 *.9 >9.patch
> d=`echo *.a`; test -e "$d" && diff -d -urpN *.9 *.a >a.patch
> d=`echo *.b`; test -e "$d" && diff -d -urpN *.a *.b >b.patch
> 
> takes less than one second with current bbox diff:
> 
> # PATH="$PATH" time ./mkdiff
> Command exited with non-zero status 1
> real    0m 0.94s
> user    0m 0.54s
> sys     0m 0.40s
> 
> but new one takes 14 times longer:
> 
> # PATH=".:$PATH" time ./mkdiff
> Command exited with non-zero status 1
> real    0m 14.32s
> user    0m 8.45s
> sys     0m 5.83s

After a round of profiling, what I can see is that almost all of that
performance hit comes from reusing the diff machinery to compare binary
files. After adding back the old code which skipped those, that
performance is regained.

Maybe that was not a good idea after all.

Maybe the code cmp uses to compare files could be moved to libbb, and I
could reuse that?


More information about the busybox mailing list