[BusyBox] Outright insanity from gnu sed.
Rob Landley
rob at landley.net
Mon Sep 29 00:45:33 UTC 2003
On Thursday 25 September 2003 08:17, Rob Landley wrote:
> Okay, THIS behavior of gnu sed probably isn't worth replicating unless
> somebody complains, unless you can figure out what's going on...? (I think
> it's a bug. the appended line is showing up before the line it was
> supposed to be appended to. That can't be right...)
Okay, the bit about the append line showing up before the thing it's supposed
to be appended to is right because N flushes the append buffer. So that's
actually right.
The _inconsistency_ is that when N hits EOF, it does NOT flush the append
buffer. And this explicitly violates the text of the spec:
> The text specified for the a command, and the contents of the file specified
> for the r command, shall be written to standard output just before the next
> attempt to fetch a line of input when executing the N or n commands, or when
> reaching the end of the script.
It should flush the append buffer before the next _attempt_, not the next
successful line read. Gnu sed is wrong here, and I'm not implementing this
behavior.
Speaking of which, is there any real world test case where gnu sed's "append a
newline for N on EOF" behavior is actually depended upon? (I.E. do we really
_have_ to implement this? It's amazingly ugly, and those scripts are
depending on an admitted and documented bug.)
P.S. I may actually know why their bug happens:
echo -e "boo" | sed -e "aboing" -e "N" -e "s/\n/kablam/"
echo -n -e "boo" | sed -e "aboing" -e "N" -e "s/\n/kablam/"
Their "add extra EOF at end" logic may be triggering when it shouldn't. Just
a guess, I haven't looked at their code...
In any case, I hope to post a Big Evil Patch of all the work I've done in the
past few days to the list within the next 24 hours. Not guaranteeing it'll
be fully debugged yet, but I've done 8 gazillion things and need to resync
with everybody
Random stuff I did:
Moved \n parsing down in to the areas it's supposed to be. (We can't do the
whole string up front because then sed -e "\nblahns/one/two/" wouldn't work,
and a couple other similar corner cases.)
Made it so "y.abc\.def.ghijkl." should work properly. And \n in there too.
Audited sed command parsing in general and rationalized the heck out of it.
Moved a couple small functions like parse_translate command inline if they
were only called from one place anyway.
"One string to rule them all": there used to be 8 gazillion string fields in
sed_cmd that were never used at the same time. Now every sed command that
needs one string argument is using sed_cmd->string, making the structure
smaller and the free logic simpler. The only one that couldn't be folded in
is filename, because then you couldn't support "s/one/two/w filename",
because "two" is using sed_cmd->string alrady.)
- I added match count support ( s/from/to/3 ). This ate the s///g command,
which is now indicated by match count=0. (The default is 1, for first
match.)
- Rather than allocate and free a regexp buffer on every search, I just made a
global that's the biggest size it can ever be in our code (10 entries,
sizeof(regmatch_t) is 8 bytes, so 80 bytes total. Those 10 entries are the
global match's start and end, plus backrefs 1-9. And that's not going to get
bigger until there are more digit keys on the keyboard or we start allowing
\10 as a backref, which really doesn't seem worth it. Implementation note:
\0 is a synonym for &. I can live with that.) note: This may actually allow
us to rip out all the logic for counting and tracking the number of backrefs
in the sed_cmd (since we no longer need it for memory allocation and can tell
at runtime that the match start is -1), but I haven't actually done that yet.
- I added the first half of s///w support (it now parses the option and stores
the filename in sed_cmd. The logic to actually DO it is missing because I
have to work out when it creates an empty file and when it appends to an
existing file. To-do item...)
- pipeline_putc was using in-band signalling (PIPE_MAGIC). That's just evil.
I fixed it.
- Lots and lots of little memory leaks. parse_translate_command wasn't
freeing match and replace, etc.
- I made append work. There is now an append buffer. It gets flushed by the
next line read, or at end of input.
- moved the line reading and line printing logic into functions which track
whether the last line read had a newline, so that if it prints out that line
it doesn't append a newline onto it when there wasn't one in the source file.
(This is a gnu sed behavior that is a technical violation of the spec, which
says there will always be a newline. I don't care, the gnu behavior is
correct here. Document the deviation.) One idiosyncrasy of this is that if
there are multiple input files (and yes stdin can be one of them, filename
"-"), a newline is inserted BETWEEN those files. It's only the actual last
line of output that can potentially leave the newline off. The fix? If we
left off the newline, add an empty line to the append buffer (this is after
the EOF flush, so it won't get flushed again unless we read another line,
I.E. from the next file.) So the free logic up top that iterated through
sed_cmd now needs to know to free the append_buffer as well (which is a
linked list).
Rob
(P.S. This stuff still makes my brain hurt, but at least the CODE doesn't
anymore. That's something. I _still_ need to finish reading the spec. :)
More information about the busybox
mailing list