[BusyBox 0004774]: Bitwise operations in awk applet are done with the default signedness of longs, which varies with compilation options/platforms

bugs at busybox.net bugs at busybox.net
Thu Sep 4 20:46:58 UTC 2008


A NOTE has been added to this issue. 
====================================================================== 
http://busybox.net/bugs/view.php?id=4774 
====================================================================== 
Reported By:                benoar
Assigned To:                BusyBox
====================================================================== 
Project:                    BusyBox
Issue ID:                   4774
Category:                   Standards Compliance
Reproducibility:            always
Severity:                   minor
Priority:                   normal
Status:                     feedback
====================================================================== 
Date Submitted:             08-27-2008 21:33 PDT
Last Modified:              09-04-2008 13:46 PDT
====================================================================== 
Summary:                    Bitwise operations in awk applet are done with the
default signedness of longs, which varies with compilation options/platforms
Description: 
When using bitwise operations in the awk applet, the signedness of the
value operated on depends on how busybox is compiled, because these
operations are defined to work on "long". This can lead to "strange"
results when long is signed by default and the values are higher than
2^31-1, e.g.:
echo|awk '{ print and(0x80000000,1) }'
gives 1 when compiled with signed long, whereas it gives 0 when compiled
with unsigned long by default.

I don't know if there is a standard for bitwise operations in awk
(http://www.opengroup.org/onlinepubs/009695399/utilities/awk.html doesn't
give me a clue), but most of the platforms I have tested behave as if long
is unsigned by default.

Furthermore, working on signed values when doing bitwise operations is
awkward.

This may be a gcc bug, where gcc uses the wrong default, I don't know. The
fact is that this bug affects openwrt on big-endian platforms, which use
signed long by default (I filed
https://dev.openwrt.org/cgi-bin/trac.fcgi/ticket/3946), contrary to
little-endian platforms, which use unsigned long.

I think the correct solution is to explicitly state that these bitwise
operations operate on unsigned long. A patch is attached to correct this
behavior. It only affects platforms that use signed long by default.
====================================================================== 

---------------------------------------------------------------------- 
 vda - 08-28-08 13:53  
---------------------------------------------------------------------- 
long is always signed in C. Only char has unspecified signedness.

Please give a concrete example of incorrect awk behavior. If GNU awk
produces a result which is different from busybox's awk, that is a good
indication of a bug. 

---------------------------------------------------------------------- 
 vda - 08-28-08 16:16  
---------------------------------------------------------------------- 
Applied to svn, thanks! Also see:

http://busybox.net/downloads/fixes-1.12.0/busybox-1.12.0-awk.patch 

---------------------------------------------------------------------- 
 benoar - 08-31-08 20:41  
---------------------------------------------------------------------- 
I am reopening this bug after investigating the root cause of the problem
further, so that busybox developers can decide what the right thing to do
is.

First, vda, sorry for the signedness confusion : you are right, long is
always signed, there is no such thing as "default signedness" for it, and
the problem doesn't come from there. This is where I was confused.

Actually, the "problem" comes from the cast from double (the internal bb
awk representation of numbers) to (unsigned) long. The bug I saw in
openwrt was in fact that different architectures cast values > 2^31-1 to
different integral values.

But I think the reason for this behavior is that there is no rule on how
to cast a value _not_ in the destination's type range, as 0x80000000 >
LONG_MAX ! So, the behavior was undefined, and I got different results on
different archs. This is only a supposition, please correct me if I am
wrong.
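
To make the supposition above concrete: C99 (6.3.1.4) leaves a conversion
from a floating type to an integer type undefined when the truncated value
cannot be represented in the destination type. Below is a minimal sketch of
a range check. The helper name fits_in_long is hypothetical and is not part
of BusyBox; the check is deliberately conservative at the lower bound.

```c
#include <limits.h>

/* Hypothetical helper (not BusyBox code): returns 1 when converting d to
 * long is certainly well defined, 0 otherwise (including for NaN). */
static int fits_in_long(double d)
{
    /* -(double)LONG_MIN is a power of two, so it is exact as a double.
     * (double)LONG_MAX may round up on 64-bit longs, so it is not used. */
    const double limit = -(double)LONG_MIN;

    /* Conservative: values strictly between LONG_MIN - 1 and LONG_MIN
     * would also truncate safely, but are rejected here. */
    return d >= -limit && d < limit;
}
```

With a 32-bit long, limit is 2147483648.0, so 0x80000000 read as a double
fails the check, which matches the out-of-range case described above.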

So, the patch I sent obviously matches what suits me, but I am not sure it
pleases everybody : now, negative values in bitwise operations get cast to
something undefined. I get 0 on an armeb for example.

To me, using bitwise ops on values >0 and <ULONG_MAX looks good, as
opposed to values >LONG_MIN and <LONG_MAX. But not everybody may agree. If
someone can find some reference on awk's behavior at integer limits
regarding bitwise ops, I'd be grateful. 

---------------------------------------------------------------------- 
 vda - 09-02-08 01:59  
---------------------------------------------------------------------- 
Try attached 5.patch, it should fix it 

---------------------------------------------------------------------- 
 benoar - 09-02-08 09:41  
---------------------------------------------------------------------- 
What? It's not by doing some magic casting that you will solve it. I get
the same result as before with your patch, as it eventually gets cast to
unsigned long (the return type of your getvar_i_int function). Furthermore,
this looks far uglier than the previous solution.

If you really want to "extend" the range of bitwise functions, you will
have to create a macro or something to do operations on both types (signed
and unsigned longs). But then, what do you do when the two operands aren't
of the same type ?...

I think the best solution is to choose whether we use signed or unsigned
longs, not to extend the range.

BTW, your octal test fails on my ppc (native) and armeb (cross-compiled),
both giving 1235 as a result. 

---------------------------------------------------------------------- 
 vda - 09-03-08 10:36  
---------------------------------------------------------------------- 
> It's not by doing some magic casting that you will solve it.

+       /* Casting doubles to longs is undefined for values outside
+        * of target type range. Try to widen it as much as possible */
+       if (d >= 0)
+               return (unsigned long)d;

Above will correctly handle any value in [0.0, ULONG_MAX]

+       return - (long) (unsigned long) (-d);

and if d < 0, we convert (-inf, 0.0) to (0.0, inf) by inverting the sign,
then if it is in (0.0, ULONG_MAX], we convert it to unsigned long. Then we
invert it again to compensate for (-d).
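
The two steps described above can be sketched as a standalone function.
This is an illustration under stated assumptions, not the contents of
5.patch: the names double_to_ulong_wide and awk_and are hypothetical, and
the final negation is done on the unsigned value rather than through a
(long) cast, because unsigned negation wraps modulo ULONG_MAX + 1 and is
always defined.

```c
#include <limits.h>

/* Hypothetical sketch of the widening conversion. The double-to-integer
 * step is only defined when the magnitude of d, after truncation, is at
 * most ULONG_MAX. */
static unsigned long double_to_ulong_wide(double d)
{
    if (d >= 0)
        return (unsigned long)d;

    /* Convert the positive magnitude, then negate it back. Negating an
     * unsigned long wraps around, so -2.0 becomes ULONG_MAX - 1. */
    return -(unsigned long)(-d);
}

/* Hypothetical stand-in for awk's and(x, y) built on the conversion. */
static unsigned long awk_and(double x, double y)
{
    return double_to_ulong_wide(x) & double_to_ulong_wide(y);
}
```

Under this sketch, the example from the original report, and(0x80000000,1),
evaluates to 0 for both 32-bit and 64-bit longs.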

Which step does not work for you? Can you add

bb_error_msg("1 step: %f", -d);
bb_error_msg("2 step: %lu", (unsigned long) -d);

etc? 

---------------------------------------------------------------------- 
 benoar - 09-04-08 13:00  
---------------------------------------------------------------------- 
> + /* Casting doubles to longs is undefined for values outside
> + * of target type range. Try to widen it as much as possible */
> + if (d >= 0)
> + return (unsigned long)d;
>
> Above will correctly handle any value in [0.0, ULONG_MAX]

Correct.

> + return - (long) (unsigned long) (-d);
>
> and if d < 0, we convert (-inf, 0.0) to (0.0, inf) by inverting the
> sign, then if it is in (0.0, ULONG_MAX], we convert it to unsigned long.
> Then we invert it again to compensate for (-d).

You just forget one last step: the result gets implicitly cast to the
return type of your function, which is unsigned long; and the value
you're casting is out of its range! (You negate a positive value, so it
will be < 0.) Furthermore, casting an integral value like that doesn't
change anything: the representation of a signed/unsigned long doesn't
change when it is cast to the same kind of type (here, a 32-bit integral
value -- on a 32-bit arch, that is). But here, as you're casting a long to
an integral type of the "same kind", the result is not "undefined", it's
just left "as is", which is what we want with bitwise operations. Please
note that none of what I said is verified; I am not a compiler internals
guru, but it looks fine to me.

So, your patch effectively extends the range, and the useful bit in it is
that the double gets cast to something meaningful for the destination
type: here, some positive number for unsigned long. But we could do it
more simply: just cast the value to a long (where the FP to integral type
conversion takes place), and then let it be cast to unsigned long (the
base type of our bitwise operations). No need for a double negation. A
patch is attached (diff against the current svn).

Sorry for the too quick look at first on your patch. 

---------------------------------------------------------------------- 
 vda - 09-04-08 13:46  
---------------------------------------------------------------------- 
> the result get implicitly cast to the return type of your function, which
> is unsigned long ; and the value you're casting is out of its range ! (you
> negate a positive value, so it will be < 0).

But for long -> unsigned long, unlike double -> long, the cast is
basically a no-op: the bit image of the value is not changed. For example,
-2L can be cast to unsigned long and it will have a value of 0xfff...fffe.
The "out of range" clause is not applicable, unlike the case where you
cast double -> long.
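
As a small illustration of the point above: C99 (6.3.1.3) defines the
conversion of any integer value to an unsigned type as reduction modulo
one more than the maximum of that type, so no "out of range" case exists.
The helper name long_to_ulong is hypothetical and only wraps the cast so
that it can be checked.

```c
#include <limits.h>

/* Hypothetical helper: long -> unsigned long is defined for every input;
 * the result is v reduced modulo ULONG_MAX + 1. On two's complement
 * machines the bit pattern is left unchanged. */
static unsigned long long_to_ulong(long v)
{
    return (unsigned long)v;
}
```

For example, long_to_ulong(-2L) yields ULONG_MAX - 1, which is 0xfffffffe
with 32-bit longs.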

> Furthermore, casting integral value like that doesn't change anything :
> signed/unsigned long's representation doesn't get changed when casted to
> the same kind of type (here, 32bits integral value -- on 32bits arch, that
> is).

Exactly. That's what I wanted. My (long) cast there is just to avoid
having a unary minus applied to an unsigned value, which feels fishy. Then
it is cast back to unsigned long (implicitly).

> But we could do simpler : just cast the value to a long (where the FP to
> integral type conversion takes place)

This will fail for large values. For 32-bit longs, 0xffffffff will become
a double value of 4294967295.0 and then will be cast to long -> out of
range -> undefined! Not good. (It may work on some arches, but purely by
chance. I want a more robust solution.)

If 5.patch doesn't work for you, can you find out why? Can you add

bb_error_msg("1 step: %f", -d);
bb_error_msg("2 step: %lu", (unsigned long) -d);
bb_error_msg("3 step: %ld", (long)(unsigned long) -d);
bb_error_msg("4 step: %lu", (unsigned long)(long)(unsigned long) -d);

etc? 

Issue History 
Date Modified   Username       Field                    Change               
====================================================================== 
08-27-08 21:33  benoar         New Issue                                    
08-27-08 21:33  benoar         Status                   new => assigned     
08-27-08 21:33  benoar         Assigned To               => BusyBox         
08-27-08 21:33  benoar         File Added: bitwise_ops_on_unsigned_long.patch   
                
08-28-08 13:53  vda            Note Added: 0010854                          
08-28-08 16:16  vda            Status                   assigned => closed  
08-28-08 16:16  vda            Note Added: 0010864                          
08-28-08 16:16  vda            Resolution               open => fixed       
08-28-08 16:16  vda            Fixed in Version          => svn             
08-31-08 20:41  benoar         Status                   closed => feedback  
08-31-08 20:41  benoar         Resolution               fixed => reopened   
08-31-08 20:41  benoar         Note Added: 0010914                          
09-02-08 01:58  vda            File Added: 5.patch                          
09-02-08 01:59  vda            Note Added: 0010924                          
09-02-08 09:41  benoar         Note Added: 0010934                          
09-03-08 10:36  vda            Note Added: 0010944                          
09-04-08 13:00  benoar         Note Added: 0011024                          
09-04-08 13:00  benoar         File Added: negative_bitwise_cast_to_long.patch  
                 
09-04-08 13:46  vda            Note Added: 0011034                          
======================================================================



