Most programs that use regular expressions (26.4) are able to match a pattern only on a single line of input. This makes it difficult to find or change a phrase, for instance, because it can start near the end of one line and finish near the beginning of the next line. Other patterns might be significant only when repeated on multiple lines.
sed has the ability to load more than one line into the pattern space. This allows you to match (and change) patterns that extend over multiple lines. In this article, we show how to create a multiline pattern space and manipulate its contents.
The multiline Next command, N, creates a multiline pattern space by reading a new line of input and appending it to the contents of the pattern space. The original contents of the pattern space and the new input line are separated by a newline. The embedded newline character can be matched in patterns by the escape sequence \n. In a multiline pattern space, the metacharacter ^ matches only at the very beginning of the pattern space, not after an embedded newline, and $ matches only at the very end, not before an embedded newline. After the Next command is executed, control passes to the subsequent commands in the script.
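One way to check how the anchors behave is to join two lines with N and then mark every place where ^ and $ match. This is a minimal sketch; the input words (one, two) and the variable name are made up for illustration:

```shell
# Join two lines with N, then mark the positions where ^ and $ match.
# The g flag asks for every match, yet only the outer edges of the
# pattern space are marked; the embedded newline is left alone.
anchored=$(printf 'one\ntwo\n' | sed -e N -e 's/^/[/g' -e 's/$/]/g')
printf '%s\n' "$anchored"
```

The bracket pair should enclose both lines as a unit, rather than each line separately.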
The Next command differs from the next command, n, which outputs the contents of the pattern space and then reads a new line of input. The next command does not create a multiline pattern space.
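The difference between the two commands is easy to see on a four-line input. The following is a small sketch, with illustrative input (a through d) and variable names that are not part of the original article:

```shell
# N appends the next line to the pattern space, so a substitution
# can see (and here replace) the embedded newline.
joined=$(printf 'a\nb\nc\nd\n' | sed -e N -e 's/\n/+/')

# n prints the current line and replaces it with the next one, so
# the substitution only ever sees a single line.
stepped=$(printf 'a\nb\nc\nd\n' | sed -e n -e 's/^/> /')

printf '%s\n---\n%s\n' "$joined" "$stepped"
```

With N, each pair of lines is merged into one (a+b, then c+d). With n, the odd-numbered lines pass through untouched and only the even-numbered lines are edited.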
For our first example, let's suppose that we wanted to change "Owner and Operator Guide" to "Installation Guide" but we found that it appears in the file on two lines, splitting between Operator and Guide.
For instance, here are a few lines of sample text:
    Consult Section 3.1 in the Owner and Operator
    Guide for a description of the tape drives
    available on your system.
The following script looks for Operator at the end of a line, reads the next line of input, and then makes the replacement:
    /Operator$/{
    N
    s/Owner and Operator\nGuide/Installation Guide/
    }
In this example, we know where the two lines split and
where to specify the embedded newline.
When the script is run on the sample file, it produces
the two lines of output, one of which combines
the first and second lines and is too long
to show here.
This happens because the substitute command matches
the embedded newline but does not replace it.
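To see the long line for yourself, the script can be run from the shell. This is a sketch only: the exact line breaks in the sample are an assumption based on the description above (the phrase splits between Operator and Guide), and the variable names are illustrative.

```shell
# Sample text; the line breaks are assumed from the description in
# the article (the phrase splits between Operator and Guide).
sample='Consult Section 3.1 in the Owner and Operator
Guide for a description of the tape drives
available on your system.'

# The script from the text: the embedded newline is matched as part
# of the pattern, but nothing in the replacement puts it back.
result=$(printf '%s\n' "$sample" | sed -e '/Operator$/{
N
s/Owner and Operator\nGuide/Installation Guide/
}')

printf '%s\n' "$result"
```

Three input lines come out as two, because the first and second lines have been merged into one.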
Unfortunately, you cannot portably use \n to insert a newline in the replacement string (some versions of sed, such as GNU sed, accept it as an extension). You must either use a backslash to escape a literal newline, as follows:
    s/Owner and Operator\nGuide /Installation Guide\
    /
or use the \(...\) operators (34.10) to keep the newline:
s/Owner and Operator\(\n\)Guide /Installation Guide\1/
This command restores the newline after Installation Guide. It is also necessary to match a blank space following Guide so the new line won't begin with a space.
Now we can show the output:
    Consult Section 3.1 in the Installation Guide
    for a description of the tape drives
    available on your system.
Remember, you don't have to replace the newline, but if you don't, it can make for some long lines.
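The two ways of keeping the newline can be compared side by side. The sketch below assumes the same three-line sample as before (its line breaks are an assumption), and the variable names are illustrative:

```shell
# Sample text with assumed line breaks.
sample='Consult Section 3.1 in the Owner and Operator
Guide for a description of the tape drives
available on your system.'

# Variant 1: a backslash followed by a literal newline in the replacement.
escaped=$(printf '%s\n' "$sample" | sed -e '/Operator$/{
N
s/Owner and Operator\nGuide /Installation Guide\
/
}')

# Variant 2: capture the newline with \( \) and put it back with \1.
captured=$(printf '%s\n' "$sample" | sed -e '/Operator$/{
N
s/Owner and Operator\(\n\)Guide /Installation Guide\1/
}')

printf '%s\n' "$escaped"
[ "$escaped" = "$captured" ] && echo "both variants agree"
```

Either form should leave the text on three lines, with the second line starting at "for" rather than with a stray space.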
What if there are other occurrences of "Owner and Operator Guide" that break over multiple lines in different places? You could change the address to match Owner, the first word in the pattern instead of the last, and then modify the regular expression to look for a space or a newline between words, as shown below:
    /Owner/{
    N
    s/Owner *\n*and *\n*Operator *\n*Guide/Installation Guide/
    }
The asterisk (*) indicates that the space or newline is optional.
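As a quick check, the script can be tried on text that breaks in a different place. The two-line input below, split after "Owner and", is an assumed example, and the variable names are illustrative:

```shell
# Assumed input: the phrase breaks after "Owner and" rather than
# between Operator and Guide.
input='Two manuals are provided, including the Owner and
Operator Guide and the User Guide.'

# Each gap between words may be spaces, newlines, or both.
flexible=$(printf '%s\n' "$input" | sed -e '/Owner/{
N
s/Owner *\n*and *\n*Operator *\n*Guide/Installation Guide/
}')

printf '%s\n' "$flexible"
```

The phrase is replaced, but the newline is consumed by the match, so the two input lines come out as one. Because every separator is optional, the same pattern would also match the words run together with no separator at all.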
This seems like hard work though, and indeed there is a more general way. We can read the next line into the pattern space and then use a substitute command to remove the embedded newline, wherever it is:
    s/Owner and Operator Guide/Installation Guide/
    /Owner/{
    N
    s/ *\n/ /
    s/Owner and Operator Guide */Installation Guide\
    /
    }
The first line of the script matches Owner and Operator Guide when it appears entirely on one line. (See the discussion at the end of the article about why this is necessary.) If we match the string Owner, we read the next line into the pattern space and replace the embedded newline with a space. Then we attempt to match the whole pattern and make the replacement followed by a newline. This script will match Owner and Operator Guide regardless of how it is broken across two lines.
Here's our expanded test file:
    Consult Section 3.1 in the Owner and Operator
    Guide for a description of the tape drives
    available on your system.

    Look in the Owner and Operator Guide shipped with your system.

    Two manuals are provided, including the Owner and
    Operator Guide and the User Guide.

    The Owner and Operator Guide is shipped with your system.
Running the above script on the sample file produces the following result:
    % sed -f sedscr sample
    Consult Section 3.1 in the Installation Guide
    for a description of the tape drives
    available on your system.

    Look in the Installation Guide shipped with your system.

    Two manuals are provided, including the Installation Guide
    and the User Guide.

    The Installation Guide is shipped with your system.
In this sample script, it might seem redundant to have two substitute commands that match the pattern. The first command matches it when the pattern is found already on one line, and the second matches the pattern after two lines have been read into the pattern space. Why the first command is necessary is perhaps best demonstrated by removing that command from the script and running it on the sample file:
    % sed -f sedscr2 sample
    Consult Section 3.1 in the Installation Guide
    for a description of the tape drives
    available on your system.

    Look in the Installation Guide
    shipped with your system.
    Two manuals are provided, including the Installation Guide
    and the User Guide.
Do you see the two problems?
The most obvious problem is that the last line did not print. The last line matches Owner, and when N is executed, there is no further input line to read, so sed quits without even printing the line. Many versions of sed behave this way, although not all (GNU sed, for one, prints the pattern space before exiting). To be safe, use the Next command as follows:
$!N
It excludes the last line ($) from the Next command. As it is, our script avoids the problem: the first command has already replaced Owner and Operator Guide on the last line, so that line no longer matches Owner and N is never applied. However, if the word Owner appeared on the last line without the rest of the phrase, we'd have the same problem unless we used the $!N syntax.
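Here is a sketch of the full script with $!N in place of N. The three-line input is hypothetical, chosen so that the last line contains Owner but not the whole phrase; the variable names are illustrative as well.

```shell
# Hypothetical input: the last line matches /Owner/ but has no
# following line for N to read.
input='Read the Owner and
Operator Guide first.
Contact the Owner'

# $!N runs N on every line except the last, so the final line is
# never lost, whichever way a given sed handles N at end of input.
safe=$(printf '%s\n' "$input" | sed -e 's/Owner and Operator Guide/Installation Guide/' -e '/Owner/{
$!N
s/ *\n/ /
s/Owner and Operator Guide */Installation Guide\
/
}')

printf '%s\n' "$safe"
```

The split phrase on the first two lines is still replaced, and the last line is printed unchanged instead of being dropped.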
The second problem is a little less conspicuous. It has to do with the occurrence of Owner and Operator Guide in the second paragraph. In the input file, the whole phrase is found on a single line:
Look in the Owner and Operator Guide shipped with your system.
In the output shown above, the blank line following "shipped with your system" is missing. The reason for this is that this line matches Owner and the next line, a blank line, is appended to the pattern space. The substitute command removes the embedded newline, and the blank line has in effect vanished. (If the line were not blank, the newline would still be removed but the text would appear on the same line with "shipped with your system".) The best solution seems to be to avoid reading the next line when the pattern can be matched on one line. So, that is why the first instruction attempts to match the case where the string appears all on one line.
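The vanishing blank line can be reproduced in isolation. In the sketch below, the first input line comes from the sample file, while the "Next paragraph." line, the count_blank helper, and the variable names are assumptions added for illustration. $!N is used in place of N so that the result does not depend on how a given sed treats N at end of input.

```shell
# First line is from the sample file; "Next paragraph." is a
# hypothetical stand-in for whatever follows the blank line.
input='Look in the Owner and Operator Guide shipped with your system.

Next paragraph.'

# Hypothetical helper: counts empty lines on standard input.
count_blank() { awk '/^$/ { n++ } END { print n + 0 }'; }

# Without the one-line substitution, N pulls in the blank line and
# the newline that separated it is replaced by a space.
without_first=$(printf '%s\n' "$input" | sed -e '/Owner/{
$!N
s/ *\n/ /
s/Owner and Operator Guide */Installation Guide\
/
}' | count_blank)

# With it, the phrase is replaced first, the line no longer matches
# /Owner/, and the next line is never read into the pattern space.
with_first=$(printf '%s\n' "$input" | sed -e 's/Owner and Operator Guide/Installation Guide/' -e '/Owner/{
$!N
s/ *\n/ /
s/Owner and Operator Guide */Installation Guide\
/
}' | count_blank)

printf 'blank lines without first command: %s\n' "$without_first"
printf 'blank lines with first command: %s\n' "$with_first"
```

The count should drop to zero when the first substitute command is removed and stay at one when it is present.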
- from O'Reilly & Associates' sed & awk, Chapter 6