[PATCH] diff: rewrite V2. -1005 bytes
Matheus Izvekov
mizvekov at gmail.com
Thu Jan 14 14:25:46 UTC 2010
On 04:14 Thu 14 Jan , Denys Vlasenko wrote:
> On Thursday 14 January 2010 03:01, Matheus Izvekov wrote:
> > On 01:34 Thu 14 Jan , Denys Vlasenko wrote:
> > > Attached patch cuts off ~250 more bytes:
> > >
> > > function old new delta
> > > read_token 176 183 +7
> > > diff_main 1265 1198 -67
> > > diff 3033 2847 -186
> > > ------------------------------------------------------------------------------
> > > (add/remove: 0/0 grow/shrink: 1/2 up/down: 7/-253) Total: -246 bytes
> >
> > Actually, you did notice EOF was a token, but then you reused the eof
> > flag in the 8th bit position to represent it.
> > Problem is, with flag -b set, the user does not see that token,
> > because it is considered a space, and it should treat all kinds
> > of spaces as if they were the same.
> >
> > To sum it up, all flags should be moved up 1 bit and 0x1ff should be
> > used instead of 0xff as the mask.
>
> Can you point out where exactly is the bug:
>
> typedef int token_t;
> enum {
> SHIFT_EOF = (sizeof(token_t)*8 - 8) - 1,
remove this line
> TOK_EOF = 1 << 8,
TOK_EOF = 1 << 9,
> TOK_EOL = 1 << 9,
TOK_EOL = 1 << 10,
> TOK_EMPTY = 1 << 10,
TOK_EMPTY = 1 << 11,
> TOK_SPACE = 1 << 11,
TOK_SPACE = 1 << 12,
> TOK_PRINT = 1 << 12,
TOK_PRINT = 1 << 13,
TOK_MASK = TOK_EOF - 1,
> };
#define TOK2CHAR(t) ((t) & TOK_MASK)
> static int read_token(FILE *fp, token_t tok)
> {
> tok |= TOK_EMPTY;
> while (!(tok & TOK_EOL)) {
> bool is_space;
> int t = fgetc(fp);
>
> is_space = t == EOF || isspace(t);
>
> /* Printable? */
> tok &= ~(TOK_PRINT + 0xff);
> tok |= ((is_space || t >= 0x20) * TOK_PRINT);
>
> /* Set char value (low 8 bits);
> * if t == EOF (-1), set both TOK_EOF and TOK_EOL bits */
> tok |= (t & (TOK_EOF + TOK_EOL + 0x1ff));
tok |= (t & (TOK_EOF + TOK_EOL + TOK_MASK));
>
> if (t == '\n')
> tok |= TOK_EOL;
>
> if ((option_mask32 & FLAG(w)) && is_space)
> continue;
>
> if (option_mask32 & FLAG(b)) {
> if (tok & TOK_SPACE) {
> if (is_space)
> continue;
> tok &= ~TOK_SPACE;
> } else if (is_space) {
> tok &= ~0xff;
tok &= ~TOK_MASK;
> tok |= TOK_SPACE + ' ';
> }
> }
> tok &= ~TOK_EMPTY;
> break;
> }
> return tok;
> }
>
>
> --
> vda
Something like this.
By the way, I did the profiling and most of the time is spent on
read_token and hash_line, as expected.
Will cook something up to improve this.
More information about the busybox
mailing list