[PATCHv2] networking: consolidate the IP checksum code

Baruch Siach baruch at tkos.co.il
Thu Sep 8 05:06:27 UTC 2011


Hi Joakim,

On Wed, Sep 07, 2011 at 10:09:12AM +0200, Joakim Tjernlund wrote:
> Baruch Siach <baruch at tkos.co.il> wrote on 2011/09/07 09:46:54:
> > On Wed, Sep 07, 2011 at 09:05:36AM +0200, Joakim Tjernlund wrote:
> > > Baruch Siach <baruch at tkos.co.il> wrote on 2011/09/07 06:29:32:
> > > > Hi Joakim,
> > > > On Mon, Sep 05, 2011 at 10:09:14PM +0200, Joakim Tjernlund wrote:
> > > > > > From: Baruch Siach <baruch at tkos.co.il>
> > > > > >
> > > > > > Use a single IP checksum routine for ping, traceroute and udhcp.
> > > > > >
> > > > > > Signed-off-by: Baruch Siach <baruch at tkos.co.il>
> > > > > > ---
> > > > > >
> > > > > > Changes from v1:
> > > > > >    Fix inet_cksum() with odd length on big endian systems
> > > > > >    Remove declaration of removed udhcp_checksum()
> > > > >
> > > > > This looks like it is from quagga. I cleaned that up/optimized it
> > > > > some years ago into this(you might want to fix the types):
> > > >
> > > > What is the advantage of this implementation over the one I suggested (which I
> > > > took from the traceroute code)?

[snip]

> > > Optimizes better(at least on RISC like archs).
> >
> > I get the following 'readelf -s' results for different toolchains with '-Os':
> >
> > x86_64 on current Debian testing (gcc 4.6.1):
> >      8: 0000000000000000    78 FUNC    GLOBAL DEFAULT    1 in_cksum
> >      9: 000000000000004e    66 FUNC    GLOBAL DEFAULT    1 inet_cksum
> >
> > ARM with the CodeSourcery 2010q1 toolchain (gcc 4.4.1):
> >     12: 00000000   100 FUNC    GLOBAL DEFAULT    1 in_cksum
> >     14: 00000064   104 FUNC    GLOBAL DEFAULT    1 inet_cksum
> >
> > PowerPC with the CodeSourcery 2010.09 toolchain (gcc 4.5.1):
> >      8: 00000000    96 FUNC    GLOBAL DEFAULT    1 in_cksum
> >      9: 00000060   124 FUNC    GLOBAL DEFAULT    1 inet_cksum
> >
> > I don't see a definite winner here.
> 
> I do. Both ARM and PPC is smaller(PPC is quite much smaller).
> 
> You could try change it to
> 		 count = nbytes >> 1; /* div by 2 */
> 		 for(; count; --count)
> 		   sum += *ptr++;

OK.

> 		 if (nbytes & 1) /* Odd */
> 		   sum += *(u_char *)ptr;   /* one byte only */

This breaks big endian systems like PowerPC. The assignment via pointer dance 
of the original implementation is required, because we need the last byte to 
be considered as MSB of a 16bit word, as if we had one more '\0' byte in 
buffer.  Note that the generated checksum is the same on big and little endian 
machines in terms of memory representation.  They differ however in their 
numeric representation.  So, for example the checksum of

    uint8_t buf[] = {0x12, 0x34, 0x56};

is 0xcb97 on little endian machines, and 0x97cb on big endian machines. This 
is OK since we use the memory representation of the network packet.

Here are the update results with only the first change applied 
(new_inet_cksum), against the original code:

x86_64 on current Debian testing (gcc 4.6.1):
    49: 0000000000400514    78 FUNC    GLOBAL DEFAULT   14 in_cksum
    50: 0000000000400562    66 FUNC    GLOBAL DEFAULT   14 inet_cksum
    60: 00000000004005a4    86 FUNC    GLOBAL DEFAULT   14 new_inet_cksum

ARM with the CodeSourcery 2010q1 toolchain (gcc 4.4.1):
   105: 0000842c   100 FUNC    GLOBAL DEFAULT   12 in_cksum
   102: 00008490   104 FUNC    GLOBAL DEFAULT   12 inet_cksum
    90: 000084f8   104 FUNC    GLOBAL DEFAULT   12 new_inet_cksum

PowerPC with the CodeSourcery 2010.09 toolchain (gcc 4.5.1):
    71: 1000043c    96 FUNC    GLOBAL DEFAULT   11 in_cksum <-- Broken
    69: 1000049c   124 FUNC    GLOBAL DEFAULT   11 inet_cksum
    60: 10000518   120 FUNC    GLOBAL DEFAULT   11 new_inet_cksum

Overall this change seems to make the generated code significantly worse on 
x86_64, while having almost no impact on ARM and PowerPC.

Here is the complete new_inet_cksum() routine for reference:

uint16_t new_inet_cksum(uint16_t *addr, int len)
{
    int wcount = len >> 1; /* div by 2 */
    uint16_t *w = addr;
    uint16_t answer;
    int sum = 0;

    /*
     * Our algorithm is simple, using a 32 bit accumulator (sum),
     * we add sequential 16 bit words to it, and at the end, fold
     * back all the carry bits from the top 16 bits into the lower
     * 16 bits.
     */
    for (; wcount; wcount--)
        sum += *w++;

    /* mop up an odd byte, if necessary */
    if (len & 1) {
        /* Make sure that the left-over byte is added correctly both
         * with little and big endian hosts */
        uint16_t tmp = 0;
        *(uint8_t*)&tmp = *(uint8_t*)w;
        sum += tmp;
    }

    /* add back carry outs from top 16 bits to low 16 bits */
    sum = (sum >> 16) + (sum & 0xffff);     /* add hi 16 to low 16 */
    sum += (sum >> 16);                     /* add carry */
    answer = ~sum;                          /* truncate to 16 bits */
    return answer;
}

baruch

-- 
                                                     ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch at tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il -


More information about the busybox mailing list