[PATCH] diff: rewrite V2. -1005 bytes
Matheus Izvekov
mizvekov at gmail.com
Thu Jan 14 17:20:00 UTC 2010
On 01:34 Thu 14 Jan , Denys Vlasenko wrote:
> On Wednesday 13 January 2010 17:11, Matheus Izvekov wrote:
> > >From a7a2d7c904ce0ad44017195744b5eea785ae27f5 Mon Sep 17 00:00:00 2001
> > From: Matheus Izvekov <mizvekov at gmail.com>
> > Date: Wed, 13 Jan 2010 13:49:27 -0200
> > Subject: [PATCH] diff: rewrite V2. -1005 bytes
>
> llist_t *L_arg = NULL;
>
> INIT_G();
>
> /* exactly 2 params; collect multiple -L <label>; -U N */
> opt_complementary = "=2:L::U+";
> getopt32(argv, "abdiL:NqrsS:tTU:wu"
> "p" /* ignored (for compatibility) */,
> &L_arg, &s_start, &opt_U_context);
> argv += optind;
> while (L_arg) {
> if (label[0] && label[1])
> bb_show_usage();
> if (label[0]) { /* then label[1] is NULL */
> ====> free(label[1]);
> label[1] = label[0];
> }
> label[0] = llist_pop(&L_arg);
> }
>
> Bug? You are freeing one of argv elements. They are not malloc'ed.
Yes, sorry, that was definitely a bug.
> Performance problem. With ten subdirs named busybox.[123456789a]
> this script:
>
> #!/bin/sh
> d=`echo *.1`; test -e "$d" && diff -d -urpN *.0 *.1 >1.patch
> d=`echo *.2`; test -e "$d" && diff -d -urpN *.1 *.2 >2.patch
> d=`echo *.3`; test -e "$d" && diff -d -urpN *.2 *.3 >3.patch
> d=`echo *.4`; test -e "$d" && diff -d -urpN *.3 *.4 >4.patch
> d=`echo *.5`; test -e "$d" && diff -d -urpN *.4 *.5 >5.patch
> d=`echo *.6`; test -e "$d" && diff -d -urpN *.5 *.6 >6.patch
> d=`echo *.7`; test -e "$d" && diff -d -urpN *.6 *.7 >7.patch
> d=`echo *.8`; test -e "$d" && diff -d -urpN *.7 *.8 >8.patch
> d=`echo *.9`; test -e "$d" && diff -d -urpN *.8 *.9 >9.patch
> d=`echo *.a`; test -e "$d" && diff -d -urpN *.9 *.a >a.patch
> d=`echo *.b`; test -e "$d" && diff -d -urpN *.a *.b >b.patch
>
> takes less than one second with current bbox diff:
>
> # PATH="$PATH" time ./mkdiff
> Command exited with non-zero status 1
> real 0m 0.94s
> user 0m 0.54s
> sys 0m 0.40s
>
> but new one takes 14 times longer:
>
> # PATH=".:$PATH" time ./mkdiff
> Command exited with non-zero status 1
> real 0m 14.32s
> user 0m 8.45s
> sys 0m 5.83s
After a round of profiling, what I can see is that almost all of that
performance hit comes from reusing the diff machinery to compare binary
files. After adding back the old code which skipped those, that
performance is regained.
Maybe that was not a good idea after all.
Maybe the code cmp uses to compare files could be moved to libbb, and I
could reuse that?
More information about the busybox
mailing list