[PATCH] unify itoa

Denis Vlasenko vda.linux at googlemail.com
Wed Jul 12 14:08:15 UTC 2006


On Tuesday 11 July 2006 21:42, Rob Landley wrote:
> > I still dislike the speed, i/=10 eats too much CPU...
> 
> It's a tight L1 cache-hot loop.  In a pipeline like hexdump | grep it's 
> unlikely to be even 1% of the total CPU overhead.

I'm sure that there are tons of things to improve in hexdump and/or
tcpdump. Last I looked, tcpdump was using printf _far too much_,
not even coalescing adjacent calls into single one with bigger
format string... oh well...

> What platform have you  
> profiled this on?

i386

> I don't believe integer division has boiled down to a  
> subtraction loop on any interesting processor for a quarter-century (I 
> remember that multiplication could be done in something like 6 clock cycles 
> without being _that_ clever, although this is a vague recollection from my 
> undergraduate days circa 1993), but then I'm not entirely certain how some of 
> the really low-transistor-budget embedded processors (like ARM) are 
> implemented.  I doubt they're _stupid_, though.  Many of them have floating 
> point units, there's no reason for them to stint on integer arithmetic 
> operations...
> 
> Could I have some benchmark numbers on the system you're worried about, 
> please?  I already mentioned I tried the following here:
> 
> #include <stdio.h>
> 
> // 0x4f bytes
> void utoa_to_buf(unsigned n, char *buf, int buflen)
> {
>         int i, out = 0;
>         for (i=1000000000; i; i/=10) {
>                 int res = n/i;
> 
>                 if (res || out || i == 1) {
>                         out++;
>                         n -= res*i;
>                         *buf++ = '0' + res;
>                 }
>         }
>         *buf = 0;
> }
> 
> int main(int argc, char *argv[])
> {
>   int i;
>   char blah[16];
> 
>   for(i=0;i<1000000;i++) {
>     utoa_to_buf(i, blah, 16);
>   }
> }
> 
> Ran that under "time", and got 0.707 seconds.  1000000/.707 is 1.4 million 

Got 0.944 seconds here.

Let's use the below one which is 1.5 faster on i386, has bound checking
and truncates most significant digits if buffer is too small, not least
significant ones:

// 0x52 bytes
// time: user 0m0.572s
void utoa_to_buf_test(unsigned n, char *buf, int buflen)
{
    unsigned i, out = 0;
    if (buflen > 0) {
        buflen -= 11;
        for (i=1000000000; i; i = (i*429496730ULL)>>32 /* divide by 10, fast */) {
            int res = n/i;
                n %= i; // gcc reuses remainder produced in division
                if (res || out || i == 1) {
                    out++;
                    if (buflen >= 0)
                        *buf++ = '0' + res;
                }
            buflen++;
        }
        *buf = 0;
    }
}

Correctness check for it:

sprintf='12345678' count=27599
utoa_to_buf='12345678' count=62667
utoa='12345678' count=59227
utoa_to_buf_test='12345678' count=41303
utoa_to_buf_test(0)='0'
utoa_to_buf_test(123)='123'
utoa_to_buf_test(100223)='100223'
utoa_to_buf_test(100223,3)='23'
utoa_to_buf_test(100223,2)='3'
utoa_to_buf_test(100223,1)=''

Please apply the attached patch.
--
vda
-------------- next part --------------
A non-text attachment was scrubbed...
Name: itoa.diff
Type: text/x-diff
Size: 930 bytes
Desc: not available
Url : http://lists.busybox.net/pipermail/busybox/attachments/20060712/15c57894/attachment.bin 


More information about the busybox mailing list