[BusyBox] My sed to-do list.
Rob Landley
rob at landley.net
Wed Oct 1 03:19:04 UTC 2003
Just a brain dump of my notes.txt file in case somebody sees something on
there they feel like playing with before I do...
(When looking at my tests, it helps to know that no_newline is a file that
doesn't end in a newline. "echo -n hi > no_newline" or some such...)
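A sketch of building that fixture with printf instead, since "echo -n" handling differs between shells. The file name comes from the note above; the content "hi" is just the example given there.

```shell
# Create a file whose last (and only) line lacks a trailing newline.
printf 'hi' > no_newline

# wc -l counts newline characters, so this should report 0 lines and 2 bytes.
wc -l < no_newline
wc -c < no_newline
```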
Rob
Implemented D and w. Still need to implement l.
Note: washing something through the hold buffer gives it a newline.
echo -e -n "a\nb" | sed -e "1x;1x;2p" - no_newline
echo -e -n "a\nb" | sed -e "1h;1g;2p" - no_newline
The reason a naive scan for \n on the whole string is a bad idea:
echo -e -n "a\nb\nc\n" | sed -e "p;\nbnp"
If the command line has actual newline characters in it, we have to treat
that as a multi-line input file. Yes, you can do this:
$ echo fred | sed -e "i \
> more"
more
fred
$
Problem:
If the last line of one file didn't end with a newline, and the next thing
we output is triggered by the next file but is not the CONTENTS of that file,
the missing newline still has to be emitted first. Example:
echo -n wham | sed -e "/hi/i boom" - no_newline
So _all_ output needs to go through sed_puts.
More stuff:
parse_file_cmd was just completely wrong. Filename continues until EOL.
c should output immediately, it doesn't play with pattern_space. Open Group:
Delete the pattern space. With a 0 or 1 address or at the end of a
2-address range, place text on the output and start the next cycle.
All the bb_error_msg_and_die instances should be factored out into a translation
file, with the ability to not compile in message text to save size...
Todo: s///w filename
Filenames like this should go to EOL, but _NOT_ eat next line if line
ends with \ (which is a valid unix name character).
Done:
Match count support.
One string to rule them all...
pipeline_putc should NOT use in-band signaling!
parse_translate_command wasn't freeing match and replace...
Sed:
Editing command: [blanks;][address[,address]][blanks]function
Address: line#, $ (last line of input), /regexp/
no address=everything, one address=matching line(s)
addr2<addr1, select one line at addr1.
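A quick check of the addr2<addr1 rule, as a sketch that should hold for any POSIX sed. The five-line input is made up for illustration.

```shell
# Range 4,2: the end address is before the start, so only line 4 is selected.
printf '1\n2\n3\n4\n5\n' | sed -n -e '4,2p'
```

This should print just "4".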
What about file boundaries with N;? Get next line, no more in this
file but one in _next_ file with multiple files on command line...?
add support for s/i/j/42 (done, remember to add test)
echo "little whinging" | sed -e "/li/{s/i/j/I3};p"
echo "hello" | sed -e "s ll \ &\ "
[landley at localhost busybox]$ echo "hello" | sed -e "s/l/y/1pp"
sed: -e expression #1, char 9: multiple `p' options to `s' command
[landley at localhost busybox]$ echo "hello" | sed -e "s/l/y/0"
sed: -e expression #1, char 7: number option to `s' command may not be zero
[landley at localhost busybox]$ echo "hello" | sed -e "s/l/y/1p3"
sed: -e expression #1, char 9: multiple number options to `s' command
[landley at localhost busybox]$ echo "hello" | sed -e "1!!s/l/y/"
sed: -e expression #1, char 2: Extra characters after command
sed: -e expression #1, char 3: Multiple `!'s
It's the number of explicit error messages gnu sed has that impresses me...
Note: Gnu sed seems to eat anything up to a semicolon, closing curly bracket,
or newline as an option to the s command:
[landley at localhost busybox]$ echo "hello" | sed -e "s/l/y/ s/fred/george/"
sed: -e expression #1, char 8: Unknown option to `s'
[landley at localhost busybox]$ echo "hello" | sed -e "s/l/y/;s/fred/george/"
heylo
[landley at localhost busybox]$ echo "hello" | sed -e "s/l/y/}s/fred/george/"
sed: -e expression #1, char 7: Unexpected `}'
[landley at localhost busybox]$ echo "hello" | sed -e "{s/l/y/}s/fred/george/"
sed: -e expression #1, char 9: Extra characters after command
[landley at localhost busybox]$ echo "hello" | sed -e "{s/l/y/};s/fred/george/"
heylo
echo -e "boo\n" | sed -e "aboing" -e "N" -e "s/\n/kablam/"
echo -e "boo" | sed -e "aboing" -e "N" -e "s/\n/kablam/"
echo -n -e "boo" | sed -e "aboing" -e "N" -e "s/\n/kablam/"
echo -n | sed -e "s/this/that/" - testing (no blank line for -)
Corner case:
line with no newline prints.
append buffer prints (ending with newline).
Start next file, already has newline, no extra needed.
# Filename can have spaces in it.
echo "hello" | sed -e "s/ll/mm/w what gives"
# Filename that ends with backslash does not eat next line. (no \n either.)
echo "hello" | sed -e "s/ll/mm/w what; gives\\" -e "p"
# Creates files even though errors out before doing anything.
echo "hello" | sed -e "s/a/b/w test1" -e "w test2" -e "pp"
# Error: file couldn't be opened
echo "hello" | sed -e "w does/not/exist"
# file written has no newline, but output only affected by
echo -n hi | sed -n -e "a blat" -e "w gawhonga"
echo -e -n "a\nb\nc" | sed -e "/b/,/c/c\\" -e "fred" - no_newline
Old SED_GNU_COMPATABILITY help text said:
Where GNU sed doesnt follow the posix standard, do as GNU sed does.
Current difference are in
- N command with odd number of lines (see GNU sed info page)
- Blanks before substitution flags eg.
GNU sed interprets 's/a/b/ g' as 's/a/b/g'
Standard says 's/a/b/ g' should be 's/a/b/;g'
- GNU sed allows blanks between a '!' and the function.
The GNU info page mentions all these commands we don't quite do yet:
e [command]
pipe input from a shell command into pattern space
by itself, run pattern space as command.
with command, pipe pattern space into command, use output
works across multiple lines like "c".
'L [N]'
Wordwrap L's output to given length.
'l 0' Length 0 means never wordwrap.
Q [exit-code]
Error return code for Q.
q [exit-code]
Same, doesn't discard current line.
#
Can't have an address, #n must be first 2 chars of input.
R filename
Read one line of filename, append instead of insert.
T label
Branch if _no_ substitutions.
v
NOP. (Fail if gnu sed extensions not supported.)
W filename
Write pattern space to file up to first newline.
Numerical address can be start~step
/regexp/I ignore case
/regexp/M multi-line (^ and $ also match the empty string just after and
just before each newline in the pattern space).
Gnu extension: 0,/regexp/ so that if first line has ending match it
gets matched. (try: echo -e "a\nb\nc\n" | sed -e "1,/a/p" ) They
say that end match starts looking with line _after_ start match.
addr1,+n addr1 plus next n lines.
addr1,~n addr1 to next line that's a multiple of n.
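The address extensions listed above can be tried out like this. It is only a sketch and assumes GNU sed is the sed on the PATH, since none of these forms are in POSIX. The nums helper is a made-up name for generating ten numbered lines.

```shell
# Assumes GNU sed; these address forms are extensions and may fail elsewhere.
nums() { printf '1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n'; }

nums | sed -n -e '1~3p'     # start~step: lines 1, 4, 7, 10
nums | sed -n -e '2,+2p'    # addr1,+n: lines 2, 3, 4
nums | sed -n -e '2,~4p'    # addr1,~n: line 2 up to the next multiple of 4

# 1,/a/ only looks for the end match from line 2 on, so the range never closes
# and all three lines print.
printf 'a\nb\nc\n' | sed -n -e '1,/a/p'

# 0,/a/ lets the end match hit on line 1, so only "a" prints.
printf 'a\nb\nc\n' | sed -n -e '0,/a/p'
```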
s search extensions.
new escape sequences:
\L \U \E Turn replacement string to lower/upper case until
next escape of this type found. (\E is end.)
# plus g: replace matches from # onwards.
e execute match as a command string, pipe result to pattern_space.
Ii Ignore case in regexp.
Mm That multi-line stuff again.
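A few one-liners for poking at these s extensions. This is a sketch that assumes GNU sed (none of this is POSIX), and the input strings are invented for illustration.

```shell
# Assumes GNU sed.

# \U upper-cases the replacement until \E: prints "HELLO world".
echo "hello world" | sed -e 's/\(hello\) \(world\)/\U\1\E \2/'

# \L lower-cases the rest of the replacement: prints "hello world".
echo "Hello World" | sed -e 's/.*/\L&/'

# Number plus g: replace the 2nd match and every one after it: prints "abbb".
echo "aaaa" | sed -e 's/a/b/2g'

# I flag: case-insensitive match: prints "bye".
echo "Hello" | sed -e 's/hello/bye/I'
```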
\a=7, \f=12, \n=10, \r=13, \t=9, \v=11
\cX: toupper(X)^0x40.
\dXXX decimal escape
\oXXX octal escape
\xXX hex escape
In regexps: \s=whitespace, \S not whitespace, \w word (letter, digit,
underscore), \W non-word.
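The \cX rule above, toupper(X)^0x40, worked through in shell arithmetic. The ctrl_code function name is hypothetical; it only illustrates the arithmetic and is not how any sed implements it.

```shell
# Print the byte value that \cX would stand for, using toupper(X) ^ 0x40.
ctrl_code() {
    cc_upper=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    # A leading quote makes printf %d give the character's numeric value.
    cc_value=$(printf '%d' "'$cc_upper")
    echo $(( cc_value ^ 0x40 ))
}

ctrl_code a     # 1  (a -> A = 65, 65 ^ 64 = 1)
ctrl_code M     # 13 (same value as \r)
ctrl_code '['   # 27 (ESC)
```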
Test: Try y/blah/blah2/ with different "blah" sizes. We should have an
error message for this...
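One way to script that y test, as a sketch: it only checks sed's exit status, since the wording of the error differs between implementations. The y_check helper is a made-up name, and the expected results assume GNU sed behaviour.

```shell
# Report whether sed accepts y/$1/$2/ (stdout and stderr are discarded).
y_check() {
    if echo "blah" | sed -e "y/$1/$2/" >/dev/null 2>&1; then
        echo accepted
    else
        echo rejected
    fi
}

y_check blah blah2   # different lengths: expect "rejected"
y_check blah BLAH    # same length: expect "accepted"
```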
Do these show up in the same file ok:
echo "blah" | sed -e "w fred" -e "s/la/la/w fred"