[PATCHv2] networking: consolidate the IP checksum code
Joakim Tjernlund
joakim.tjernlund at transmode.se
Thu Sep 8 15:09:23 UTC 2011
Baruch Siach <baruch at tkos.co.il> wrote on 2011/09/08 14:06:59:
>
> Hi Joakim,
>
> On Thu, Sep 08, 2011 at 01:31:31PM +0200, Joakim Tjernlund wrote:
> > > Baruch Siach <baruch at tkos.co.il> wrote on 2011/09/08 07:06:27:
> > > ...
> > >
> > > > > if (nbytes & 1) /* Odd */
> > > > > sum += *(u_char *)ptr; /* one byte only */
> > > >
> > > > This breaks big endian systems like PowerPC. The assignment via pointer dance
> > > > of the original implementation is required, because we need the last byte to
> > > > be considered as MSB of a 16bit word, as if we had one more '\0' byte in
> > > > buffer. Note that the generated checksum is the same on big and little endian
> > > > machines in terms of memory representation. They differ however in their
> > > > numeric representation. So, for example the checksum of
> > > >
> > > > uint8_t buf[] = {0x12, 0x34, 0x56};
> > > >
> > > > is 0xcb97 on little endian machines, and 0x97cb on big endian machines. This
> > > > is OK since we use the memory representation of the network packet.
> > >
> > > Ouch, even RFC 1071 got it wrong then.
> >
> > I just checked and all this ptr casting confuses gcc and generates
> > bad code. Using an union is better, both for ppc and x86.
> >
> > This is only compile tested:
> >
> > if (len & 1) {
> > union uu {
> > uint16_t val;
> > uint8_t b[2];
> > } tmp = {0};
> > /* Make sure that the left-over byte is added correctly both
> > * with little and big endian hosts */
> > tmp.b[0] = *(uint8_t*)w;
> > sum += tmp.val;
> > }
>
> Run tested on x86_64 and PowerPC (with odd sized DHCP packets). However, I see
> no significant code size improvement:
>
> x86_64 : -4 bytes
> ARM : +4 bytes
> PowerPC: +16 bytes
The wonders of gcc :) It keeps getting worse with every new major version.
For me the powerpc improves from:
andi. 11,4,1
beq- 0,.L5
lbz 9,0(9)
li 11,0
sth 11,8(1)
stb 9,8(1)
lhz 9,8(1)
add 0,0,9
to
andi. 11,4,1
beq- 0,.L13
lbz 9,0(9)
rlwinm 9,9,8,16,23
add 0,0,9
with gcc 3.4.6 and gcc 4.4.4 using -O2
x86 has a similar improvement
Before(type casting)
.L3:
testb $1, 12(%ebp)
je .L5
movw $0, -14(%ebp)
movzbl (%eax), %eax
movb %al, -14(%ebp)
movzwl -14(%ebp), %eax
addl %eax, %ecx
After (union)
testb $1, 12(%ebp)
je .L13
movzbl (%eax), %eax
addl %eax, %ecx
with gcc 4.4.5 using -O2
Jocke
More information about the busybox
mailing list