[PATCHv2] networking: consolidate the IP checksum code

Joakim Tjernlund joakim.tjernlund at transmode.se
Thu Sep 8 15:09:23 UTC 2011


Baruch Siach <baruch at tkos.co.il> wrote on 2011/09/08 14:06:59:
>
> Hi Joakim,
>
> On Thu, Sep 08, 2011 at 01:31:31PM +0200, Joakim Tjernlund wrote:
> > > Baruch Siach <baruch at tkos.co.il> wrote on 2011/09/08 07:06:27:
> > > ...
> > >
> > > > >        if (nbytes & 1) /* Odd */
> > > > >          sum += *(u_char *)ptr;   /* one byte only */
> > > >
> > > > This breaks big endian systems like PowerPC. The assignment via pointer dance
> > > > of the original implementation is required, because we need the last byte to
> > > > be considered as MSB of a 16bit word, as if we had one more '\0' byte in
> > > > buffer.  Note that the generated checksum is the same on big and little endian
> > > > machines in terms of memory representation.  They differ however in their
> > > > numeric representation.  So, for example the checksum of
> > > >
> > > >     uint8_t buf[] = {0x12, 0x34, 0x56};
> > > >
> > > > is 0xcb97 on little endian machines, and 0x97cb on big endian machines. This
> > > > is OK since we use the memory representation of the network packet.
> > >
> > > Ouch, even RFC 1071 got it wrong then.
> >
> > I just checked and all this ptr casting confuses gcc and generates
> > bad code. Using an union is better, both for ppc and x86.
> >
> > This is only compile tested:
> >
> > if (len & 1) {
> >        union uu {
> >           uint16_t val;
> >           uint8_t b[2];
> >        } tmp = {0};
> >         /* Make sure that the left-over byte is added correctly both
> >          * with little and big endian hosts */
> >         tmp.b[0] = *(uint8_t*)w;
> >         sum += tmp.val;
> >     }
>
> Run tested on x86_64 and PowerPC (with odd sized DHCP packets). However, I see
> no significant code size improvement:
>
> x86_64 : -4  bytes
> ARM    : +4  bytes
> PowerPC: +16 bytes

The wonders of gcc :) It keeps getting worse with every new major version.
For me the powerpc improves from:
	andi. 11,4,1
	beq- 0,.L5
	lbz 9,0(9)
	li 11,0
	sth 11,8(1)
	stb 9,8(1)
	lhz 9,8(1)
	add 0,0,9
to
	andi. 11,4,1
	beq- 0,.L13
	lbz 9,0(9)
	rlwinm 9,9,8,16,23
	add 0,0,9
with gcc 3.4.6 and gcc 4.4.4 using -O2


x86 has a similar improvement
Before(type casting)
.L3:
	testb	$1, 12(%ebp)
	je	.L5
	movw	$0, -14(%ebp)
	movzbl	(%eax), %eax
	movb	%al, -14(%ebp)
	movzwl	-14(%ebp), %eax
	addl	%eax, %ecx

After (union)
	testb	$1, 12(%ebp)
	je	.L13
	movzbl	(%eax), %eax
	addl	%eax, %ecx
with gcc 4.4.5 using -O2

 Jocke



More information about the busybox mailing list