[BusyBox] My sed to-do list.

Wed Oct 1 03:19:04 UTC 2003

Just a brain dump of my notes.txt file in case somebody sees something on 
there they feel like playing with before I do...

(When looking at my tests, it helps to know that no_newline is a file that 
doesn't end in a newline.  "echo -n hi > no_newline" or some such...)

Rob

Implemented D and w.  Still need to implement l.

Note: washing something through the hold buffer gives it a newline.
  echo -e -n "a\nb" | sed -e "1x;1x;2p" - no_newline
  echo -e -n "a\nb" | sed -e "1h;1g;2p" - no_newline

The reason a naive scan for \n on the whole string is a bad idea:
  echo -e -n "a\nb\nc\n" | sed -e "p;\nbnp"
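The catch is that an address can use any delimiter via \cREGEXc (POSIX allows any character except backslash or newline). So in "p;\nbnp" the \n starts an address delimited by n that matches b, followed by the p command. Here is a minimal sketch of a delimiter-aware scan. parse_delimited() and parse_regex_address() are hypothetical names, and only an escaped delimiter gets unescaped:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy the regex body that starts at 'p' (just after
 * the opening delimiter) into 'out'.  "\<delim>" becomes a plain delimiter
 * character; every other backslash pair is copied unchanged.  Returns a
 * pointer just past the closing delimiter, or NULL if the regex is
 * unterminated or 'out' is too small. */
static const char *parse_delimited(const char *p, char delim,
                                   char *out, size_t outlen)
{
    size_t n = 0;

    assert(p != NULL && out != NULL && outlen > 0);
    while (*p != '\0' && *p != delim) {
        if (*p == '\\' && p[1] != '\0') {
            if (p[1] == delim) {
                p++;                    /* drop the backslash */
            } else {
                if (n + 1 >= outlen)
                    return NULL;
                out[n++] = *p++;        /* keep the backslash */
            }
        }
        if (n + 1 >= outlen)
            return NULL;
        out[n++] = *p++;
    }
    if (*p != delim)
        return NULL;                    /* ran off the end of the script */
    out[n] = '\0';
    return p + 1;
}

/* Parse "/re/" or "\cREc" at 'p'.  Returns a pointer past the address. */
static const char *parse_regex_address(const char *p, char *out, size_t outlen)
{
    if (p[0] == '/')
        return parse_delimited(p + 1, '/', out, outlen);
    if (p[0] == '\\' && p[1] != '\0' && p[1] != '\n' && p[1] != '\\')
        return parse_delimited(p + 2, p[1], out, outlen);
    return NULL;
}
```

Given the characters \nbnp, parse_regex_address() leaves "b" in the buffer and returns a pointer to the trailing p, which is then parsed as a command.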

If the command line has actual newline characters in it, we have to treat
that as a multi-line input file.  Yes, you can do this:
  $ echo fred | sed -e "i \
  > more"
  more
  fred
  $
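Here is a minimal sketch of walking one -e argument as a multi-line script. next_script_line() is a hypothetical helper. Joining a trailing-backslash line to the next one is deliberately left to the commands that want it (i, a, c), because a filename ending in \ must not eat the next line:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy the next newline-terminated line of a script
 * argument into 'out' and advance *pos past it.  Returns 0 at the end of
 * the script, 1 otherwise (overlong lines are truncated to fit 'out'). */
static int next_script_line(const char **pos, char *out, size_t outlen)
{
    const char *p;
    size_t len, copy;

    assert(pos != NULL && *pos != NULL && out != NULL && outlen > 0);
    p = *pos;
    if (*p == '\0')
        return 0;
    len = strcspn(p, "\n");
    copy = len < outlen ? len : outlen - 1;
    memcpy(out, p, copy);
    out[copy] = '\0';
    *pos = p[len] == '\n' ? p + len + 1 : p + len;  /* skip the newline */
    return 1;
}
```

For the "i \" example, the parser would see "i \" on one line and "more" on the next, and the i command would then pull "more" in as its text.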

Problem:
	If the last line didn't end with a newline and we then output something
	while processing the next file that isn't the CONTENTS of that file (such
	as text from an i command), the missing newline has to go out first.
	Example:
		echo -n wham | sed -e "/hi/i boom" - no_newline
	So _all_ output needs to go through sed_puts.
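Here is a guess at the bookkeeping such a routine needs. This is a sketch with a hypothetical sed_puts_sketch() and struct sed_out, not the actual sed_puts. It remembers whether the last write left the output without a newline and emits one before anything else is written:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical output state: remembers whether the last write left the
 * output without a trailing newline (e.g. the last line of a file that
 * had none). */
struct sed_out {
    FILE *fp;
    int missing_newline;
};

/* All output funnels through here.  If the previous write ended without a
 * newline, emit one before the new text.  'with_newline' says whether this
 * chunk should itself end in '\n'. */
static void sed_puts_sketch(struct sed_out *out, const char *s, int with_newline)
{
    assert(out != NULL && out->fp != NULL && s != NULL);
    if (out->missing_newline)
        fputc('\n', out->fp);
    fputs(s, out->fp);
    if (with_newline)
        fputc('\n', out->fp);
    out->missing_newline = !with_newline;
}
```

The example above produces three writes: "wham" without a newline, "boom" from i, then "hi" without a newline. Fed those three, this writes wham\nboom\nhi.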

More stuff:
parse_file_cmd was just completely wrong.  Filename continues until EOL.

c should output immediately; it doesn't play with pattern_space.  Open Group:
	Delete the pattern space. With a 0 or 1 address or at the end of a
	2-address range, place text on the output and start the next cycle.
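Here is a minimal sketch of that rule. The hypothetical do_change() stands in for whatever the command loop calls. The text is written out directly, the pattern space is never touched, and the return value just tells the caller to drop the pattern space and start the next cycle:

```c
#include <assert.h>
#include <stdio.h>

/* Sketch of 'c' for one selected line.  'num_addresses' is 0, 1 or 2;
 * 'at_range_end' is nonzero on the last line of a 2-address range.  The
 * text goes straight to 'out'.  Always returns 1, meaning "delete the
 * pattern space and start the next cycle". */
static int do_change(FILE *out, const char *text, int num_addresses,
                     int at_range_end)
{
    assert(out != NULL && text != NULL);
    assert(num_addresses >= 0 && num_addresses <= 2);
    if (num_addresses < 2 || at_range_end) {
        fputs(text, out);
        fputc('\n', out);
    }
    return 1;
}
```

For a /b/,/c/ range over the lines a, b, c, line b prints nothing and line c prints the replacement text once.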

All the bb_error_msg_and_die instances should be factored out into a translation
file, with the ability to not compile in the message text to save size...
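Here is one way to get there, sketched with a hypothetical SED_TERSE_ERRORS build switch and sed_format_error() helper (these are not existing BusyBox names). Every message lives in a single table indexed by an enum, and the table compiles out entirely when size matters. The sample strings are the GNU messages quoted further down:

```c
#include <assert.h>
#include <stdio.h>

enum sed_err { ERR_MULTIPLE_P, ERR_ZERO_NUMBER, ERR_MULTIPLE_NUMBERS };

#ifndef SED_TERSE_ERRORS
/* All message text in one place, indexed by enum sed_err. */
static const char *const sed_err_text[] = {
    "multiple `p' options to `s' command",
    "number option to `s' command may not be zero",
    "multiple number options to `s' command",
};
#endif

/* Format the message for 'e' into 'buf'.  Building with -DSED_TERSE_ERRORS
 * compiles the table out and reports only the numeric code.  Returns what
 * snprintf() returns. */
static int sed_format_error(char *buf, size_t len, enum sed_err e)
{
    assert((int)e >= 0 && (int)e <= ERR_MULTIPLE_NUMBERS);
#ifdef SED_TERSE_ERRORS
    return snprintf(buf, len, "sed: error %d", (int)e);
#else
    return snprintf(buf, len, "sed: %s", sed_err_text[e]);
#endif
}
```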

Todo: 	s///w filename
	Filenames like this should go to EOL, but must _NOT_ eat the next line if
	the line ends with \ (which is a valid unix filename character).
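Here is a minimal sketch using a hypothetical parse_filename(). The name runs to the newline (or the end of the -e argument), leading blanks are skipped, and a trailing backslash is kept as an ordinary character instead of acting as a line continuation:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper for "w file" and "s///w file": skip leading blanks,
 * then take everything up to the newline (or end of the -e argument) as the
 * name.  A trailing backslash stays in the name; it does not join the next
 * line.  Returns a pointer to the character that ended the name. */
static const char *parse_filename(const char *p, char *out, size_t outlen)
{
    size_t len, copy;

    assert(p != NULL && out != NULL && outlen > 0);
    while (*p == ' ' || *p == '\t')
        p++;
    len = strcspn(p, "\n");
    copy = len < outlen ? len : outlen - 1;   /* a real parser would error */
    memcpy(out, p, copy);
    out[copy] = '\0';
    return p + len;
}
```

Given " what; gives\" followed by a newline and "p", it returns the name "what; gives\" and leaves the newline and "p" for the caller.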

Done:
	Match count support.
	One string to rule them all...
	pipeline_putc should NOT use in-band signaling!

parse_translate_command wasn't freeing match and replace...
Sed:

Editing command: [blanks;][address[,address]][blanks]function
Address: line#, $ (last line of input), /regexp/
	 no address=everything, one address=matching line(s)
	 addr2<addr1, select one line at addr1.

What about file boundaries with N?  It wants the next line, but there's none
left in this file and there is one in the _next_ file (multiple files on the
command line)...?

add support for s/i/j/42 (done, remember to add test)
	echo "little whinging" | sed -e "/li/{s/i/j/I3};p"
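Here is a minimal sketch of the match-count flag using POSIX regcomp()/regexec(). subst_nth() is a hypothetical helper. The replacement is taken literally (no & or \1), and combining a number with g is not handled:

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Sketch of s/re/repl/N: replace only the Nth match of 're' in 'in' with
 * the literal text 'repl'.  'icase' mimics the I flag.  Writes the result
 * to 'out'.  Returns 1 if a replacement was made, 0 if there were fewer
 * than N matches, -1 on error. */
static int subst_nth(const char *in, const char *re, const char *repl,
                     int nth, int icase, char *out, size_t outlen)
{
    regex_t rx;
    regmatch_t m;
    const char *p = in;
    int count = 0, eflags = 0;

    assert(in && re && repl && out && nth >= 1);
    if (regcomp(&rx, re, icase ? REG_ICASE : 0) != 0)
        return -1;
    while (regexec(&rx, p, 1, &m, eflags) == 0) {
        if (++count == nth) {
            size_t pre = (size_t)(p - in) + (size_t)m.rm_so;
            size_t rlen = strlen(repl);
            size_t post = strlen(p + m.rm_eo);

            regfree(&rx);
            if (pre + rlen + post + 1 > outlen)
                return -1;
            memcpy(out, in, pre);
            memcpy(out + pre, repl, rlen);
            memcpy(out + pre + rlen, p + m.rm_eo, post + 1);
            return 1;
        }
        /* Resume after this match; step one char past an empty match. */
        if (m.rm_eo == m.rm_so) {
            if (p[m.rm_eo] == '\0')
                break;
            p += m.rm_eo + 1;
        } else {
            p += m.rm_eo;
        }
        eflags = REG_NOTBOL;   /* '^' must not match mid-line */
    }
    regfree(&rx);
    if (strlen(in) + 1 > outlen)
        return -1;
    memcpy(out, in, strlen(in) + 1);
    return 0;
}
```

For the test above, subst_nth("little whinging", "i", "j", 3, 1, ...) gives "little whingjng", because the third i on the line is the second one in "whinging".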

echo "hello" | sed -e "s ll \ &\  "

[landley at localhost busybox]$ echo "hello" | sed -e "s/l/y/1pp"
sed: -e expression #1, char 9: multiple `p' options to `s' command
[landley at localhost busybox]$ echo "hello" | sed -e "s/l/y/0"
sed: -e expression #1, char 7: number option to `s' command may not be zero
[landley at localhost busybox]$ echo "hello" | sed -e "s/l/y/1p3"
sed: -e expression #1, char 9: multiple number options to `s' command
[landley at localhost busybox]$ echo "hello" | sed -e "1!!s/l/y/"
sed: -e expression #1, char 2: Extra characters after command
sed: -e expression #1, char 3: Multiple `!'s

It's the number of explicit error messages gnu sed has that impresses me...

Note: Gnu sed seems to eat anything up to a semicolon, closing curly bracket,
or newline as an option to the s command:

[landley at localhost busybox]$ echo "hello" | sed -e "s/l/y/ s/fred/george/"
sed: -e expression #1, char 8: Unknown option to `s'
[landley at localhost busybox]$ echo "hello" | sed -e "s/l/y/;s/fred/george/"
heylo
[landley at localhost busybox]$ echo "hello" | sed -e "s/l/y/}s/fred/george/"
sed: -e expression #1, char 7: Unexpected `}'
[landley at localhost busybox]$ echo "hello" | sed -e "{s/l/y/}s/fred/george/"
sed: -e expression #1, char 9: Extra characters after command
[landley at localhost busybox]$ echo "hello" | sed -e "{s/l/y/};s/fred/george/"
heylo
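That behavior can be sketched as a flag parser that stops at ';', '}', newline or end of string and leaves the terminator to the caller. Blanks are skipped here. That is an inference from the transcript (GNU complains about the s at char 8, not the blank at char 7), and it matches the 's/a/b/ g' note further down. parse_s_flags() and struct s_flags are hypothetical, and the w, e and M flags are left out:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result of parsing the flags after s/re/repl/. */
struct s_flags {
    int global, print, icase;
    long nth;                       /* 0 means no number was given */
};

/* Parse s flags starting right after the closing delimiter.  Returns a
 * pointer to the terminator, or NULL with *err pointing at a message
 * modeled on the GNU sed diagnostics above. */
static const char *parse_s_flags(const char *p, struct s_flags *f,
                                 const char **err)
{
    assert(p != NULL && f != NULL && err != NULL);
    f->global = f->print = f->icase = 0;
    f->nth = 0;
    *err = NULL;
    for (;; p++) {
        switch (*p) {
        case ' ': case '\t':
            break;
        case 'g':
            f->global = 1;
            break;
        case 'p':
            if (f->print) {
                *err = "multiple `p' options to `s' command";
                return NULL;
            }
            f->print = 1;
            break;
        case 'I': case 'i':
            f->icase = 1;
            break;
        case ';': case '}': case '\n': case '\0':
            return p;
        default:
            if (!isdigit((unsigned char)*p)) {
                *err = "Unknown option to `s'";
                return NULL;
            }
            if (f->nth) {
                *err = "multiple number options to `s' command";
                return NULL;
            }
            while (isdigit((unsigned char)p[1]))
                f->nth = f->nth * 10 + (*p++ - '0');
            f->nth = f->nth * 10 + (*p - '0');   /* last digit; p stays on it */
            if (f->nth == 0) {
                *err = "number option to `s' command may not be zero";
                return NULL;
            }
            break;
        }
    }
}
```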

echo -e "boo\n" | sed -e "aboing" -e "N" -e "s/\n/kablam/"
echo -e "boo" | sed -e "aboing" -e "N" -e "s/\n/kablam/"
echo -n -e "boo" | sed -e "aboing" -e "N" -e "s/\n/kablam/"
echo -n | sed -e "s/this/that/" - testing (no blank line for -)

Corner case:
  line with no newline prints.
  append buffer prints (ending with newline).
  Start next file, already has newline, no extra needed.

# Filename can have spaces in it.
echo "hello" | sed -e "s/ll/mm/w what gives"
# Filename that ends with backslash does not eat next line.  (no \n either.)
echo "hello" | sed -e "s/ll/mm/w what; gives\\" -e "p"
# Creates files even though errors out before doing anything.
echo "hello" | sed -e "s/a/b/w test1" -e "w test2" -e "pp"

# Error: file couldn't be opened
echo "hello" | sed -e "w does/not/exist"

# file written has no newline, but output only affected by 
echo -n hi | sed -n -e "a blat" -e "w gawhonga"

echo -e -n "a\nb\nc" | sed -e "/b/,/c/c\\" -e "fred" - no_newline

Old SED_GNU_COMPATABILITY help text said:
          Where GNU sed doesn't follow the posix standard, do as GNU sed does.
          Current differences are in:
            - N command with odd number of lines (see GNU sed info page)
            - Blanks before substitution flags eg.
                GNU sed interprets 's/a/b/ g' as 's/a/b/g'
                Standard says 's/a/b/ g' should be 's/a/b/;g'
            - GNU sed allows blanks between a '!' and the function.

The GNU info page mentions all these commands we don't quite do yet:
	e [command]
		pipe input from a shell command into pattern space
		by itself, run pattern space as a command and replace pattern
		space with its output.
		with command, run command and send its output to the output stream.
		works across multiple lines like "c".
	'L [N]'
		Wordwrap L's output to given length.
	'l 0'	Length 0 means never wordwrap.
	Q [exit-code]
		Error return code for Q.
	q [exit-code]
		Same, doesn't discard current line.
	#
		Can't have an address; #n must be the first 2 chars of the script.
	R filename
		Read one line of filename, append instead of insert.
	T label
		Branch if _no_ substitutions.
	v
		NOP.  (Fail if gnu sed extensions not supported.)
	W filename
		Write pattern space to file up to first newline.

	Numerical address can be start~step
	/regexp/I ignore case
	/regexp/M multi-line (^ and $ also match the empty string just after
		and just before each embedded newline).
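Here is a sketch of the start~step test, using a hypothetical step_match(). Treating a step of 0 as "just line first" is an assumption about GNU's behavior:

```c
#include <assert.h>

/* GNU-style "first~step" address: selects first, first+step,
 * first+2*step, ...  A step of 0 is treated as selecting just 'first'
 * (assumption). */
static int step_match(long line, long first, long step)
{
    assert(line >= 1 && first >= 0 && step >= 0);
    if (step == 0)
        return line == first;
    return line >= first && (line - first) % step == 0;
}
```

So 0~4 picks lines 4, 8, 12, ... and 1~2 picks the odd lines.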

	Gnu extension: 0,/regexp/ so that an ending match on the first line
	counts.  (try: echo -e "a\nb\nc\n" | sed -e "1,/a/p" )  They say
	that the end match starts looking with the line _after_ the start match.
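Here is a sketch of the difference, using a hypothetical struct re_range and re_range_match(). To keep it short, /end/ is a plain substring test standing in for a regex:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical "N,/end/" range where N is a line number (possibly GNU's 0)
 * and /end/ is approximated by strstr(). */
struct re_range {
    long start;
    const char *end;
    int active;
};

static int re_range_match(struct re_range *r, long line, const char *text)
{
    assert(r != NULL && r->end != NULL && text != NULL && line >= 1);
    if (!r->active) {
        if (r->start == 0 && line == 1) {
            /* 0,/end/: line 1 is in the range and may also close it */
            r->active = strstr(text, r->end) == NULL;
            return 1;
        }
        if (line != r->start)
            return 0;
        r->active = 1;   /* /end/ is only tried from the next line on */
        return 1;
    }
    if (strstr(text, r->end) != NULL)
        r->active = 0;   /* this line closes the range */
    return 1;
}
```

With the four lines that echo produces (a, b, c and an empty line), 1,/a/ selects all four and 0,/a/ selects only the first.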

	addr1,+n   addr1 plus next n lines.
	addr1,~n   addr1 to next line that's a multiple of n.
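Here is a sketch of where those two ranges stop, given the line addr1 matched on. plus_end() and tilde_end() are hypothetical. Two cases are assumptions about GNU's behavior: when addr1 already sits on a multiple of n the range is just that line, and n of 0 means addr1 only:

```c
#include <assert.h>

/* Last line of "addr1,+n": addr1's line plus the next n lines. */
static long plus_end(long start, long n)
{
    assert(start >= 1 && n >= 0);
    return start + n;
}

/* Last line of "addr1,~n": the first multiple of n at or after addr1's
 * line (assumptions as described above for n == 0 and exact multiples). */
static long tilde_end(long start, long n)
{
    assert(start >= 1 && n >= 0);
    if (n == 0)
        return start;
    return (start + n - 1) / n * n;
}
```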

	s search extensions.
		new escape sequences:
		\L \U \E Turn replacement string to upper/lower case until
			next escape of this type found.  (\E is end.)
		# plus g: replace matches from # onwards.
		e execute pattern space as a command (only if a substitution
			was made), pipe result to pattern_space.
		Ii	Ignore case in regexp.
		Mm	That multi-line stuff again.
	\a=7, \f=12, \n=10, \r=13, \t=9, \v=11
	\cX: toupper(X)^0x40.
	\dXXX decimal escape
	\oXXX octal escape
	\xXX  hex escape
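Two pieces of that list can be sketched under assumptions. decode_escape() handles the escape table; the three-digit limit for \d and \o and the two-digit limit for \x are assumptions modeled on GNU sed. apply_case_escapes() applies \U, \L and \E to a replacement string. Both names are hypothetical:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Decode one escape, with 'p' pointing just past the backslash.  Stores
 * the byte in *out and returns how many characters after the backslash
 * were used, or 0 if this isn't one of the escapes listed above. */
static int decode_escape(const char *p, unsigned char *out)
{
    int i = 0, v = 0;

    assert(p != NULL && out != NULL);
    switch (p[0]) {
    case 'a': *out = 7;  return 1;
    case 'f': *out = 12; return 1;
    case 'n': *out = 10; return 1;
    case 'r': *out = 13; return 1;
    case 't': *out = 9;  return 1;
    case 'v': *out = 11; return 1;
    case 'c':                               /* toupper(X) ^ 0x40 */
        if (p[1] == '\0')
            return 0;
        *out = (unsigned char)(toupper((unsigned char)p[1]) ^ 0x40);
        return 2;
    case 'd':
        for (i = 1; i <= 3 && isdigit((unsigned char)p[i]); i++)
            v = v * 10 + (p[i] - '0');
        break;
    case 'o':
        for (i = 1; i <= 3 && p[i] >= '0' && p[i] <= '7'; i++)
            v = v * 8 + (p[i] - '0');
        break;
    case 'x':
        for (i = 1; i <= 2 && isxdigit((unsigned char)p[i]); i++)
            v = v * 16 + (isdigit((unsigned char)p[i])
                          ? p[i] - '0'
                          : tolower((unsigned char)p[i]) - 'a' + 10);
        break;
    default:
        return 0;
    }
    if (i == 1)
        return 0;                  /* letter with no digits after it */
    *out = (unsigned char)v;       /* values over 255 wrap in this sketch */
    return i;
}

/* Copy a replacement string, applying \U (upper), \L (lower) and \E (stop)
 * case conversion.  Other backslash pairs are passed through for a later
 * pass; & and \1..\9 are out of scope. */
static void apply_case_escapes(const char *in, char *out, size_t outlen)
{
    enum { KEEP, UPPER, LOWER } mode = KEEP;
    size_t n = 0;

    assert(in != NULL && out != NULL && outlen > 0);
    while (*in != '\0' && n + 1 < outlen) {
        if (in[0] == '\\' && (in[1] == 'U' || in[1] == 'L' || in[1] == 'E')) {
            mode = in[1] == 'U' ? UPPER : in[1] == 'L' ? LOWER : KEEP;
            in += 2;
        } else if (in[0] == '\\' && in[1] != '\0') {
            if (n + 2 >= outlen)
                break;
            out[n++] = in[0];
            out[n++] = in[1];
            in += 2;
        } else {
            int c = (unsigned char)*in++;
            if (mode == UPPER)
                c = toupper(c);
            else if (mode == LOWER)
                c = tolower(c);
            out[n++] = (char)c;
        }
    }
    out[n] = '\0';
}
```

For example, \d065, \o101 and \x41 all decode to 'A', and "a\Ubc\Ede\LFG" becomes "aBCdefg".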

	In regexps: \s=whitespace, \S not whitespace, \w word (letter, digit,
	underscore), \W non-word.
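These escapes aren't POSIX, so one option is to rewrite them into bracket expressions before handing the regex to regcomp(). Here is a minimal sketch using a hypothetical expand_class_escapes(). It only works outside existing bracket expressions, and it doesn't try to detect them:

```c
#include <assert.h>
#include <string.h>

/* Rewrite \s \S \w \W into POSIX bracket expressions; other backslash
 * pairs are copied unchanged.  Returns 0 on success, -1 if 'out' is too
 * small. */
static int expand_class_escapes(const char *in, char *out, size_t outlen)
{
    size_t n = 0;

    assert(in != NULL && out != NULL && outlen > 0);
    while (*in != '\0') {
        const char *rep = NULL;
        size_t len;

        if (in[0] == '\\') {
            switch (in[1]) {
            case 's': rep = "[[:space:]]";   break;
            case 'S': rep = "[^[:space:]]";  break;
            case 'w': rep = "[_[:alnum:]]";  break;
            case 'W': rep = "[^_[:alnum:]]"; break;
            }
        }
        if (rep != NULL) {
            len = strlen(rep);
            in += 2;
        } else if (in[0] == '\\' && in[1] != '\0') {
            rep = in;                   /* other escape: copy the pair */
            len = 2;
            in += 2;
        } else {
            rep = in;
            len = 1;
            in += 1;
        }
        if (n + len >= outlen)
            return -1;
        memcpy(out + n, rep, len);
        n += len;
    }
    out[n] = '\0';
    return 0;
}
```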

Test: Try y/blah/blah2/ with different "blah" sizes.  We should have an
error message for this...
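Here is a minimal sketch of the check, using a hypothetical build_y_map(). It refuses to build the translation table when the two strings differ in length. It compares raw lengths, so a real version would unescape (\n, \\, the delimiter) first:

```c
#include <assert.h>
#include <string.h>

/* Build the byte map for y/src/dst/.  Returns 0 and fills 'map', or -1 if
 * the strings have different lengths (the caller reports the error). */
static int build_y_map(const char *src, const char *dst, unsigned char map[256])
{
    size_t i, len;

    assert(src != NULL && dst != NULL && map != NULL);
    len = strlen(src);
    if (len != strlen(dst))
        return -1;
    for (i = 0; i < 256; i++)
        map[i] = (unsigned char)i;          /* identity by default */
    for (i = 0; i < len; i++)
        map[(unsigned char)src[i]] = (unsigned char)dst[i];
    return 0;
}
```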

Do these show up in the same file ok:
	echo "blah" | sed -e "w fred" -e "s/la/la/w fred"
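They only will if both commands share one FILE*. Two independent fopen(name, "w") calls would each truncate the file and keep separate buffers, so the writes could overwrite each other. Here is a minimal sketch of a name-keyed registry. It assumes hypothetical get_wfile()/close_wfiles() helpers called at parse time, which fits POSIX's rule that each wfile is created before processing begins (and the "creates files even though errors out" test above):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One entry per distinct output filename. */
struct wfile {
    char *name;
    FILE *fp;
    struct wfile *next;
};

/* Return the FILE* for 'name', opening (and truncating) it only the first
 * time it is seen.  Returns NULL if it can't be opened. */
static FILE *get_wfile(struct wfile **list, const char *name)
{
    struct wfile *w;

    assert(list != NULL && name != NULL);
    for (w = *list; w != NULL; w = w->next)
        if (strcmp(w->name, name) == 0)
            return w->fp;                 /* already open: share it */
    w = malloc(sizeof *w);
    if (w == NULL)
        return NULL;
    w->name = malloc(strlen(name) + 1);
    if (w->name == NULL) {
        free(w);
        return NULL;
    }
    strcpy(w->name, name);
    w->fp = fopen(name, "w");             /* created/truncated once */
    if (w->fp == NULL) {
        free(w->name);
        free(w);
        return NULL;
    }
    w->next = *list;
    *list = w;
    return w->fp;
}

static void close_wfiles(struct wfile **list)
{
    while (*list != NULL) {
        struct wfile *w = *list;
        *list = w->next;
        fclose(w->fp);
        free(w->name);
        free(w);
    }
}
```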