[PATCH] diff: rewrite V2. -1005 bytes

Matheus Izvekov mizvekov at gmail.com
Thu Jan 14 14:25:46 UTC 2010


On 04:14 Thu 14 Jan     , Denys Vlasenko wrote:
> On Thursday 14 January 2010 03:01, Matheus Izvekov wrote:
> > On 01:34 Thu 14 Jan     , Denys Vlasenko wrote:
> > > Attached patch cuts off ~250 more bytes:
> > > 
> > > function                                             old     new   delta
> > > read_token                                           176     183      +7
> > > diff_main                                           1265    1198     -67
> > > diff                                                3033    2847    -186
> > > ------------------------------------------------------------------------------
> > > (add/remove: 0/0 grow/shrink: 1/2 up/down: 7/-253)           Total: -246 bytes
> > 
> > Actually, you did notice EOF was a token, but then you reused the eof
> > flag in the 8th bit position to represent it.
> > Problem is, with flag -b set, the user does not see that token,
> > because it is considered a space, and it should treat all kinds
> > of spaces as if they were the same.
> > 
> > To sum it up, all flags should be moved up 1 bit and 0x1ff should be
> > used instead of 0xff as the mask.
> 
> Can you point out where exactly is the bug:
> 
> typedef int token_t;
> enum {
>         SHIFT_EOF = (sizeof(token_t)*8 - 8) - 1,
          remove this line
>         TOK_EOF   = 1 << 8,
          TOK_EOF   = 1 << 9,
>         TOK_EOL   = 1 << 9,
          TOK_EOL   = 1 << 10,
>         TOK_EMPTY = 1 << 10,
          TOK_EMPTY = 1 << 11,
>         TOK_SPACE = 1 << 11,
          TOK_SPACE = 1 << 12,
>         TOK_PRINT = 1 << 12,
          TOK_PRINT = 1 << 13,
          TOK_MASK = TOK_EOF - 1,
> };
#define TOK2CHAR(t) ((t) & TOK_MASK)
> static int read_token(FILE *fp, token_t tok)
> {
>         tok |= TOK_EMPTY;
>         while (!(tok & TOK_EOL)) {
>                 bool is_space;
>                 int t = fgetc(fp);
> 
>                 is_space = t == EOF || isspace(t);
> 
>                 /* Printable? */
>                 tok &= ~(TOK_PRINT + 0xff);
>                 tok |= ((is_space || t >= 0x20) * TOK_PRINT);
> 
>                 /* Set char value (low 8 bits);
>                  * if t == EOF (-1), set both TOK_EOF and TOK_EOL bits */
>                 tok |= (t & (TOK_EOF + TOK_EOL + 0x1ff));
                  tok |= (t & (TOK_EOF + TOK_EOL + TOK_MASK));
> 
>                 if (t == '\n')
>                         tok |= TOK_EOL;
> 
>                 if ((option_mask32 & FLAG(w)) && is_space)
>                         continue;
> 
>                 if (option_mask32 & FLAG(b)) {
>                         if (tok & TOK_SPACE) {
>                                 if (is_space)
>                                         continue;
>                                 tok &= ~TOK_SPACE;
>                         } else if (is_space) {
>                                 tok &= ~0xff;
                                  tok &= ~TOK_MASK;
>                                 tok |= TOK_SPACE + ' ';
>                         }
>                 }
>                 tok &= ~TOK_EMPTY;
>                 break;
>         }
>         return tok;
> }
> 
> 
> --
> vda

Something like this.

By the way, I did the profiling and most of the time is spent on
read_token and hash_line, as expected.
Will cook something up to improve this.


More information about the busybox mailing list