[Bug 14336] New: busybox sed differs from GNU sed with respect to NUL (0x00)
bugzilla at busybox.net
Sun Nov 7 23:44:35 UTC 2021
https://bugs.busybox.net/show_bug.cgi?id=14336
Bug ID: 14336
Summary: busybox sed differs from GNU sed with respect to NUL (0x00)
Product: Busybox
Version: 1.30.x
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P5
Component: Other
Assignee: unassigned at busybox.net
Reporter: calestyo at scientia.org
CC: busybox-cvs at busybox.net
Target Milestone: ---
Hey.
I'm not sure whether this is a "bug" or just behaviour that POSIX leaves
undefined (I don't know whether POSIX says anything about sed and NUL bytes),
but at least this time it doesn't seem to be a configure option.
I've noticed that busybox's sed and GNU sed behave differently with respect
to 0x00:
GNU sed seems to keep any 0x00 (as well as other "binary" characters) in the
current line and takes it into account when matching.
busybox's sed, OTOH, doesn't do this; it seems to terminate the string at the
0x00.
Example files:
$ hd test-with-0x00
00000000 66 6f 6f 0a 62 61 72 0a 7a 65 72 00 0a 62 61 7a |foo.bar.zer..baz|
00000010 0a 7a 65 72 00 0a 65 6e 64 0a |.zer..end.|
0000001a
$ hd test-with-lone-0x00
00000000 66 6f 6f 0a 62 61 72 0a 00 0a 62 61 7a 0a 7a 65 |foo.bar...baz.ze|
00000010 72 00 0a 65 6e 64 0a |r..end.|
00000017
$ hd test-with-0x02-and-0x00
00000000 66 6f 6f 0a 62 61 72 0a 7a 65 02 00 0a 62 61 7a |foo.bar.ze...baz|
00000010 0a 7a 65 72 00 0a 65 6e 64 0a |.zer..end.|
0000001a
$ hd test-with-0x00-followed-by-alpha
00000000 66 6f 6f 0a 62 61 72 0a 7a 65 72 00 6f 6f 0a 62 |foo.bar.zer.oo.b|
00000010 61 7a 0a 7a 65 72 00 74 74 0a 65 6e 64 0a |az.zer.tt.end.|
0000001e
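For anyone wanting to reproduce this, the four example files can be regenerated byte-for-byte with plain `printf`. This is a sketch that assumes a `printf` honouring three-digit octal escapes in its format string (`\000` is NUL, `\002` is 0x02); the result can be checked against the `hd` dumps above.

```shell
# Recreate the example files from the hex dumps above.
# \000 is a NUL byte, \002 is the byte 0x02 (octal escapes in the format).
printf 'foo\nbar\nzer\000\nbaz\nzer\000\nend\n' > test-with-0x00
printf 'foo\nbar\n\000\nbaz\nzer\000\nend\n' > test-with-lone-0x00
printf 'foo\nbar\nze\002\000\nbaz\nzer\000\nend\n' > test-with-0x02-and-0x00
printf 'foo\nbar\nzer\000oo\nbaz\nzer\000tt\nend\n' > test-with-0x00-followed-by-alpha
```

The resulting sizes should be 0x1a, 0x17, 0x1a and 0x1e bytes, matching the final offsets in the dumps.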
GNU sed:
$ sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x00 | hd
00000000 7a 65 72 00 0a |zer..|
00000005
$ sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-lone-0x00 | hd
00000000 00 0a |..|
00000002
$ sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x02-and-0x00 | hd
00000000 7a 65 02 00 0a |ze...|
00000005
$ sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x00-followed-by-alpha | hd
00000000 7a 65 72 00 6f 6f 0a |zer.oo.|
00000007
(Note that GNU sed's -z option is NOT used.)
busybox's sed:
$ busybox sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x00 | hd
$ busybox sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-lone-0x00 | hd
$ busybox sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x02-and-0x00 | hd
$ busybox sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x00-followed-by-alpha | hd
$
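The eight invocations above can be scripted so that any difference is flagged automatically. This is a minimal sketch, not part of the original report: `compare_seds` is a hypothetical helper, it assumes `cmp` and a `mktemp` command are available, and it only reports whether the output bytes differ, not why.

```shell
# The sed script used in the transcripts above.
script='0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}'

# compare_seds CMD_A CMD_B FILE...  (hypothetical helper)
# Runs "$script" with both sed commands on each file and compares the output.
# CMD_A/CMD_B are word-split on purpose so 'busybox sed' becomes two words.
compare_seds() {
    a=$1; b=$2; shift 2
    status=0
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 2
    for f in "$@"; do
        $a -n "$script" "$f" > "$tmp_a"
        $b -n "$script" "$f" > "$tmp_b"
        if cmp -s "$tmp_a" "$tmp_b"; then
            echo "same:   $f"
        else
            echo "DIFFER: $f"
            status=1
        fi
    done
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}
```

Usage would be along the lines of `compare_seds sed 'busybox sed' test-with-*`; the exit status is non-zero if any file produced different output.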
So it seems that busybox's sed only matches up to the 0x00 (which is perhaps
treated as a string terminator), while GNU sed matches all the way to the end
of the line (\n).
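The string-terminator hypothesis can be checked with a single input line, without the range address: the only match for `/oo/` lies after the NUL. A minimal sketch assuming only POSIX `printf`, `wc` and `tr`; `nul_probe` is a hypothetical name, and its verdict is only an inference from whether that match was found.

```shell
# nul_probe SED_CMD...  (hypothetical helper)
# Feeds "zer<NUL>oo\n" to the given sed and asks it to print lines matching
# /oo/. The match can only be found if sed looks past the NUL byte.
nul_probe() {
    n=$(printf 'zer\000oo\n' | "$@" -n '/oo/p' | wc -c | tr -d ' ')
    if [ "$n" -gt 0 ]; then
        echo "matches past NUL"
    else
        echo "stops at NUL"
    fi
}
```

It can be run as `nul_probe sed` and `nul_probe busybox sed`.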
Still, I thought it was worth bringing this to your attention.
Cheers,
Chris.