Tuesday, August 14, 2007

Manipulate text with sed

Sed is a handy and powerful little text manipulator. The name is short for “stream editor,” and its job is to filter and transform text. Typically, sed is used in a pipeline: you pipe the output of a command into sed, which modifies and reformats that output and writes the result. You can also run sed on a text file; it sends the transformed text to standard output, which can then be redirected into another file.
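As a minimal sketch of the file-based form, the commands below build a small sample file, run sed over it, and redirect the result into a second file. The file names (fruits.txt, fruits-new.txt), their contents, and the temporary directory are placeholders chosen for illustration.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)

# Sample input (hypothetical content).
printf 'apple pie\napple tart\n' > "$workdir/fruits.txt"

# sed reads fruits.txt and writes the transformed text to standard output,
# which the shell redirects into fruits-new.txt. fruits.txt itself is unchanged.
sed -e 's/apple/cherry/' "$workdir/fruits.txt" > "$workdir/fruits-new.txt"

cat "$workdir/fruits-new.txt"
```

Never redirect the output onto the input file itself: the shell empties that file before sed gets a chance to read it.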

The best way to illustrate the power of sed is to provide a few examples:

$ printf "line one\nline two\n" | sed -e 's/.*/( & )/'

( line one )

( line two )

In this example, printf outputs two lines, and sed wraps each one in parentheses. It does this by matching a pattern and replacing whatever matched. The expression takes the form s/[pattern]/[replacement]/. You can use other characters as delimiters; in this case I used the forward slash (/), but you can also use a comma (,) or pipe (|).
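An alternate delimiter is most useful when the text itself contains slashes. The sketch below rewrites a path prefix; the paths are made up for illustration.

```shell
# The pattern and replacement both contain slashes, so '|' is used as the
# delimiter to avoid escaping every '/' in the paths.
with_pipe=$(printf '/usr/local/bin\n' | sed -e 's|/usr/local|/opt|')
echo "$with_pipe"     # prints /opt/bin

# The same substitution with a comma as the delimiter.
with_comma=$(printf '/usr/local/bin\n' | sed -e 's,/usr/local,/opt,')
echo "$with_comma"    # prints /opt/bin
```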

In the above expression, the pattern to match is “.*” (everything on the line); the replacement uses the ampersand (&) as a placeholder for all of the matched text. In this case that is the entire line, so the result is ( [text] ).
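The ampersand stands for whatever the pattern matched, which need not be the whole line. A small sketch with made-up input:

```shell
# Only the run of digits matches, so & expands to just those digits.
digits=$(printf 'order 42 shipped\n' | sed -e 's/[0-9][0-9]*/<&>/')
echo "$digits"     # prints: order <42> shipped

# To put a literal ampersand in the replacement, escape it as \&.
literal=$(printf 'salt and pepper\n' | sed -e 's/and/\&/')
echo "$literal"    # prints: salt & pepper
```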

You can also use sed to transpose text. Assume you had a file with two words per line, but you wished to have the second word displayed first, then the first, separated by a comma:

$ printf "line one\nline two\n" | sed -e 's/\(.*\) \(.*\)/\2,\1/'

one,line

two,line

Here the line “line one” is transformed into “one,line”. The pattern uses escaped parentheses, \( and \), to create groups. In other words, the expression \(.*\) \(.*\) matches one string, a space, then another string. Both of these strings are saved as numbered groups, which the replacement can refer to as \1 for the first and \2 for the second. The replacement expression then uses these back-references to place the text in the format we want: second string, comma, first string.
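One detail worth seeing in action: .* is greedy, so on a line with more than two words the first group takes everything up to the last space. The sketch below uses a made-up three-word line to show this, then a tighter pattern, and finally the extended-regex form. The last command assumes a sed that accepts the -E option, which current GNU and BSD versions do.

```shell
# Greedy match: the first group grabs "one two", the second gets "three".
swapped_greedy=$(printf 'one two three\n' | sed -e 's/\(.*\) \(.*\)/\2,\1/')
echo "$swapped_greedy"    # prints: three,one two

# [^ ]* matches a single word (no spaces), so only the first two words swap.
swapped_words=$(printf 'one two three\n' | sed -e 's/\([^ ]*\) \([^ ]*\)/\2,\1/')
echo "$swapped_words"     # prints: two,one three

# With -E (extended regular expressions) the parentheses need no backslashes.
swapped_ere=$(printf 'line one\n' | sed -E 's/(.*) (.*)/\2,\1/')
echo "$swapped_ere"       # prints: one,line
```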

You can use sed to do some very interesting things, such as create a command to rename certain files:

$ ls -1 *.txt | sed -e 's/.*/mv & &.old/' >execute; sh execute && rm -f execute

This chain of commands takes the output of ls -1 *.txt and passes it through sed, which turns each file name, such as list.txt, into the command mv list.txt list.txt.old. The result is redirected into a file called execute. Once this is complete, execute is run by the sh shell, which performs the mv command for each listed file; if it completes successfully, the execute file is removed.
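Because the generated text is executed as commands, it is worth previewing it first. The sketch below does that in a throwaway directory and then pipes the same commands straight into sh, which avoids the temporary execute file. The directory and the file names list.txt and notes.txt are placeholders for illustration.

```shell
# Set up a scratch directory with two sample files.
workdir=$(mktemp -d)
touch "$workdir/list.txt" "$workdir/notes.txt"

# Step 1: preview the generated commands without running anything.
preview=$(cd "$workdir" && ls -1 *.txt | sed -e 's/.*/mv & &.old/')
echo "$preview"

# Step 2: pipe the same commands into sh; no temporary file is needed.
(cd "$workdir" && ls -1 *.txt | sed -e 's/.*/mv & &.old/' | sh)

# The directory now holds list.txt.old and notes.txt.old.
ls -1 "$workdir"
```

This approach only suits simple file names. A name containing spaces or shell metacharacters would be split or interpreted by sh, so check the preview before running anything.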

This has just scratched the surface of using sed. It is extremely powerful and has many interesting uses, and is definitely worth a closer look.
