[Bug 14336] New: busybox sed differs from GNU sed with respect to NUL (0x00)

bugzilla at busybox.net bugzilla at busybox.net
Sun Nov 7 23:44:35 UTC 2021


https://bugs.busybox.net/show_bug.cgi?id=14336

            Bug ID: 14336
           Summary: busybox sed differs from GNU sed with respect to NUL
                    (0x00)
           Product: Busybox
           Version: 1.30.x
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P5
         Component: Other
          Assignee: unassigned at busybox.net
          Reporter: calestyo at scientia.org
                CC: busybox-cvs at busybox.net
  Target Milestone: ---

Hey.

Not sure whether this is a "bug" or just something not defined by POSIX (I'm
not really sure whether POSIX says anything with respect to sed and NUL). At
least it doesn't seem to be a configure option this time.

I've noticed differing behaviour between busybox sed and GNU sed with respect
to 0x00:

It seems that GNU sed leaves any 0x00 (as well as other "binary" characters)
in the current line and respects it when matching.
busybox sed, OTOH, doesn't do this but seems to terminate the string at the
first 0x00.


Example Files:
$ hd test-with-0x00
00000000  66 6f 6f 0a 62 61 72 0a  7a 65 72 00 0a 62 61 7a  |foo.bar.zer..baz|
00000010  0a 7a 65 72 00 0a 65 6e  64 0a                    |.zer..end.|
0000001a
$ hd test-with-lone-0x00
00000000  66 6f 6f 0a 62 61 72 0a  00 0a 62 61 7a 0a 7a 65  |foo.bar...baz.ze|
00000010  72 00 0a 65 6e 64 0a                              |r..end.|
00000017
$ hd test-with-0x02-and-0x00
00000000  66 6f 6f 0a 62 61 72 0a  7a 65 02 00 0a 62 61 7a  |foo.bar.ze...baz|
00000010  0a 7a 65 72 00 0a 65 6e  64 0a                    |.zer..end.|
0000001a
$ hd test-with-0x00-followed-by-alpha
00000000  66 6f 6f 0a 62 61 72 0a  7a 65 72 00 6f 6f 0a 62  |foo.bar.zer.oo.b|
00000010  61 7a 0a 7a 65 72 00 74  74 0a 65 6e 64 0a        |az.zer.tt.end.|
0000001e
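
For anyone who wants to reproduce this, the four files can probably be
recreated from the hex dumps above along these lines. This is only a sketch: it
assumes a printf that understands three-digit octal escapes in its format
string (the POSIX one should), and it writes into the current directory.

```shell
# Recreate the four example files byte-for-byte from the hex dumps above.
# \000 is NUL and \002 is STX, written as octal escapes for printf.
printf 'foo\nbar\nzer\000\nbaz\nzer\000\nend\n'     > test-with-0x00
printf 'foo\nbar\n\000\nbaz\nzer\000\nend\n'        > test-with-lone-0x00
printf 'foo\nbar\nze\002\000\nbaz\nzer\000\nend\n'  > test-with-0x02-and-0x00
printf 'foo\nbar\nzer\000oo\nbaz\nzer\000tt\nend\n' > test-with-0x00-followed-by-alpha
```

Running hd (or od -c) on the results should give the same dumps as shown above.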


GNU sed:
$ sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x00 | hd
00000000  7a 65 72 00 0a                                    |zer..|
00000005
$ sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-lone-0x00 | hd
00000000  00 0a                                             |..|
00000002
$ sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x02-and-0x00 | hd
00000000  7a 65 02 00 0a                                    |ze...|
00000005
$ sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x00-followed-by-alpha | hd
00000000  7a 65 72 00 6f 6f 0a                              |zer.oo.|
00000007

(Note that GNU sed's -z option is NOT used.)


busybox sed:
$ busybox sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x00 | hd
$ busybox sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-lone-0x00 | hd
$ busybox sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x02-and-0x00 | hd
$ busybox sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x00-followed-by-alpha | hd
$


So it seems that busybox sed only matches up to the 0x00 (which is perhaps
used as a string terminator), while GNU sed matches all the way to the end of
the line (\n).
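
A smaller probe for the same suspicion might look like the following. It is
only a sketch: probe is a hypothetical helper name, and it merely checks
whether a pattern that can only match after the 0x00 still selects the line.
If the observation above holds, GNU sed should report a match and busybox sed
should not, but I have not verified that beyond the runs shown above.

```shell
# probe CMD...: feed one line with a NUL in front of the letter "b" to the
# given sed command and report whether /b/ still selected that line.
probe() {
    if [ $(printf 'a\000b\n' | "$@" -n '/b/p' | wc -c) -gt 0 ]; then
        echo "matched past NUL"
    else
        echo "no match past NUL"
    fi
}

# Whatever "sed" is on this system.
probe sed

# busybox sed, but only if a busybox binary is installed.
if command -v busybox >/dev/null 2>&1; then
    probe busybox sed
fi
```

The byte count from wc -c is used instead of capturing the output in a
variable, because shell variables cannot hold a 0x00 themselves.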


Thought it was worth bringing this to your attention.


Cheers,
Chris.


