[PATCHv2] networking: consolidate the IP checksum code
Baruch Siach
baruch at tkos.co.il
Thu Sep 8 05:06:27 UTC 2011
Hi Joakim,
On Wed, Sep 07, 2011 at 10:09:12AM +0200, Joakim Tjernlund wrote:
> Baruch Siach <baruch at tkos.co.il> wrote on 2011/09/07 09:46:54:
> > On Wed, Sep 07, 2011 at 09:05:36AM +0200, Joakim Tjernlund wrote:
> > > Baruch Siach <baruch at tkos.co.il> wrote on 2011/09/07 06:29:32:
> > > > Hi Joakim,
> > > > On Mon, Sep 05, 2011 at 10:09:14PM +0200, Joakim Tjernlund wrote:
> > > > > > From: Baruch Siach <baruch at tkos.co.il>
> > > > > >
> > > > > > Use a single IP checksum routine for ping, traceroute and udhcp.
> > > > > >
> > > > > > Signed-off-by: Baruch Siach <baruch at tkos.co.il>
> > > > > > ---
> > > > > >
> > > > > > Changes from v1:
> > > > > > Fix inet_cksum() with odd length on big endian systems
> > > > > > Remove declaration of removed udhcp_checksum()
> > > > >
> > > > > This looks like it is from quagga. I cleaned that up/optimized it
> > > > > some years ago into this(you might want to fix the types):
> > > >
> > > > What is the advantage of this implementation over the one I suggested (which I
> > > > took from the traceroute code)?
[snip]
> > > Optimizes better(at least on RISC like archs).
> >
> > I get the following 'readelf -s' results for different toolchains with '-Os':
> >
> > x86_64 on current Debian testing (gcc 4.6.1):
> > 8: 0000000000000000 78 FUNC GLOBAL DEFAULT 1 in_cksum
> > 9: 000000000000004e 66 FUNC GLOBAL DEFAULT 1 inet_cksum
> >
> > ARM with the CodeSourcery 2010q1 toolchain (gcc 4.4.1):
> > 12: 00000000 100 FUNC GLOBAL DEFAULT 1 in_cksum
> > 14: 00000064 104 FUNC GLOBAL DEFAULT 1 inet_cksum
> >
> > PowerPC with the CodeSourcery 2010.09 toolchain (gcc 4.5.1):
> > 8: 00000000 96 FUNC GLOBAL DEFAULT 1 in_cksum
> > 9: 00000060 124 FUNC GLOBAL DEFAULT 1 inet_cksum
> >
> > I don't see a definite winner here.
>
> I do. Both ARM and PPC is smaller(PPC is quite much smaller).
>
> You could try change it to
> count = nbytes >> 1; /* div by 2 */
> for(; count; --count)
> sum += *ptr++;
OK.
> if (nbytes & 1) /* Odd */
> sum += *(u_char *)ptr; /* one byte only */
This breaks big endian systems like PowerPC. The assignment via pointer dance
of the original implementation is required, because we need the last byte to
be considered as MSB of a 16bit word, as if we had one more '\0' byte in
buffer. Note that the generated checksum is the same on big and little endian
machines in terms of memory representation. They differ however in their
numeric representation. So, for example the checksum of
uint8_t buf[] = {0x12, 0x34, 0x56};
is 0xcb97 on little endian machines, and 0x97cb on big endian machines. This
is OK since we use the memory representation of the network packet.
Here are the update results with only the first change applied
(new_inet_cksum), against the original code:
x86_64 on current Debian testing (gcc 4.6.1):
49: 0000000000400514 78 FUNC GLOBAL DEFAULT 14 in_cksum
50: 0000000000400562 66 FUNC GLOBAL DEFAULT 14 inet_cksum
60: 00000000004005a4 86 FUNC GLOBAL DEFAULT 14 new_inet_cksum
ARM with the CodeSourcery 2010q1 toolchain (gcc 4.4.1):
105: 0000842c 100 FUNC GLOBAL DEFAULT 12 in_cksum
102: 00008490 104 FUNC GLOBAL DEFAULT 12 inet_cksum
90: 000084f8 104 FUNC GLOBAL DEFAULT 12 new_inet_cksum
PowerPC with the CodeSourcery 2010.09 toolchain (gcc 4.5.1):
71: 1000043c 96 FUNC GLOBAL DEFAULT 11 in_cksum <-- Broken
69: 1000049c 124 FUNC GLOBAL DEFAULT 11 inet_cksum
60: 10000518 120 FUNC GLOBAL DEFAULT 11 new_inet_cksum
Overall this change seems to make the generated code significantly worse on
x86_64, while having almost no impact on ARM and PowerPC.
Here is the complete new_inet_cksum() routine for reference:
uint16_t new_inet_cksum(uint16_t *addr, int len)
{
int wcount = len >> 1; /* div by 2 */
uint16_t *w = addr;
uint16_t answer;
int sum = 0;
/*
* Our algorithm is simple, using a 32 bit accumulator (sum),
* we add sequential 16 bit words to it, and at the end, fold
* back all the carry bits from the top 16 bits into the lower
* 16 bits.
*/
for (; wcount; wcount--)
sum += *w++;
/* mop up an odd byte, if necessary */
if (len & 1) {
/* Make sure that the left-over byte is added correctly both
* with little and big endian hosts */
uint16_t tmp = 0;
*(uint8_t*)&tmp = *(uint8_t*)w;
sum += tmp;
}
/* add back carry outs from top 16 bits to low 16 bits */
sum = (sum >> 16) + (sum & 0xffff); /* add hi 16 to low 16 */
sum += (sum >> 16); /* add carry */
answer = ~sum; /* truncate to 16 bits */
return answer;
}
baruch
--
~. .~ Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
- baruch at tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il -
More information about the busybox
mailing list