[Bug 8791] sed : in substitution string \1 to \9 or & don't behave as per GNU & POSIX specification

bugzilla at busybox.net
Thu Mar 17 09:45:53 UTC 2016


https://bugs.busybox.net/show_bug.cgi?id=8791

--- Comment #6 from Ron Yorston <rmy at pobox.com> ---
> Now, about an issue on https://github.com/rmyorston/busybox-w32 :
> 1. Isn't it restricting the problem to Windows, when streams with CRLF line
> endings can also be fed to sed under Unix?

Microsoft Windows and Unix treat DOS-format files differently.  On Windows,
CRLF is the line terminator; on Unix, LF is the line terminator and CR is part
of the line.  busybox-w32 should have used the platform convention for line
terminators, and now it does.  Processing DOS-format files with Unix tools may
require the file to be converted to Unix conventions first.  GNU sed and
BusyBox sed are consistent in their handling of DOS-format files:  they both
treat the CR as part of the line.  I think this is the correct way for Unix
tools to behave.
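
As a rough illustration of the point above, the sketch below feeds one
DOS-format line to sed and anchors the pattern at end of line.  It assumes a
Unix build of sed (GNU sed or BusyBox sed); the names dos_line, unconverted and
converted are made up for this example, and tr -d '\r' is just one possible
way to convert to Unix conventions.

```shell
# Assumes a Unix build of sed (GNU sed or BusyBox sed), where only LF ends a line.
# Hypothetical helper: emits one DOS-format line, "abc" followed by CR LF.
dos_line() { printf 'abc\r\n'; }

# sed sees the line as "abc<CR>", so "c$" does not match and nothing changes.
# The trailing tr only strips the CR so the result prints cleanly.
unconverted=$(dos_line | sed 's/c$/X/' | tr -d '\r')

# Converting to Unix conventions first removes the CR, so "c$" now matches.
converted=$(dos_line | tr -d '\r' | sed 's/c$/X/')

echo "unconverted: $unconverted"   # unconverted: abc
echo "converted:   $converted"     # converted:   abX
```

On a sed that follows the Windows convention, such as the fixed busybox-w32,
the first case would be expected to match as well.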

> 2. As my example was simple, it actually wasn't generic at all, since I only
> used the \(.*\) pattern as reference (which alone stands for ^\(.*\)$)
> and, indeed, goes to the end of the line. But the problem also occurs with
> patterns referencing inner parts of the streamed line:
> say, something like xy\(.*\)zt, which is closer to the actual cases where I
> noticed the problem.

Can you provide an example of the problem?  I tried the following with BusyBox
sed, busybox-w32 sed and GNU sed with DOS-format and Unix-format files.  In
each case the result is the same and consistent with what I'd expect.

$ cat myfile
xyazt
xybzt
xyczt
$ sed 's/xy\(.*\)zt/\1\1/' myfile
aa
bb
cc
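
For the DOS-format case, a sketch along the following lines makes the CR
visible by translating it to '#'.  It assumes a Unix build of sed (GNU sed or
BusyBox sed); the helper dos_file and the variables inner and whole are
hypothetical names, and the second command is only a contrast case, not
something taken from the report.

```shell
# Assumes a Unix build of sed, where the CR of a CRLF pair is part of the line.
# Hypothetical helper: the same three lines as myfile, with CRLF endings.
dos_file() { printf 'xyazt\r\nxybzt\r\nxyczt\r\n'; }

# Inner group: the CR sits after "zt", outside both the match and the group,
# so it is left untouched at the end of each output line.
inner=$(dos_file | sed 's/xy\(.*\)zt/\1\1/' | tr '\r' '#')

# Whole-line group: ".*" also captures the CR, so \1\1 duplicates it.
whole=$(dos_file | sed 's/\(.*\)/\1\1/' | tr '\r' '#')

echo "$inner"   # aa#  bb#  cc#  (one per line)
echo "$whole"   # xyazt#xyazt#  xybzt#xybzt#  xyczt#xyczt#  (one per line)
```

With the inner group the visible text matches the Unix-format result.  With
the whole-line group a CR ends up in the middle of each output line, which a
terminal may display oddly.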


