[Article 27.11 introduced a script called cgrep.sed, a general-purpose, grep-like program built with sed. It allows you to look for one or more words that appear on one line or across several lines. This article explains the sed tricks that are necessary to do this kind of thing. It gets into territory that is essential for any advanced applications of this obscure yet wonderful editor. (Articles 34.13 through 34.16 have background information.) -JP]
Let's review the two examples from article 27.11. The first command below finds all lines containing the word system in the file main.c, and shows 10 additional lines of context above and below each match. The second command finds all occurrences of the word "awk" where it is followed by the word "perl" somewhere within the next 3 lines:
    cgrep -10 system main.c
    cgrep -3 "awk.*perl"
Now the script, followed by an explanation of how it works:
    #!/bin/sh
    # cgrep - multiline context grep using sed
    # Usage: cgrep [-context] pattern [file...]

    n=3
    case $1 in -[1-9]*)
        n=`expr 1 - "$1"`
        shift
    esac
    re=${1?}; shift

    sed -n "
        1b start
        : top
        \~$re~{
            h; n; p; H; g
            b endif
        }
            N
            : start
            //{ =; p; }
        : endif
        $n,\$D
        b top
    " "$@"
The sed script is embedded in a bare-bones shell wrapper (44.14) to parse out the initial arguments because, unlike awk and perl, sed cannot directly access command-line parameters. If the first argument looks like a -context option, variable n is reset to one more than the number of lines specified, using a little trick: the argument is treated as a negative number and subtracted from 1. The pattern argument is then stored in $re, with the ${1?} syntax causing the shell to abort with an error message if no pattern was given. Any remaining arguments are passed as filenames to the sed command.
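
To see the arithmetic in isolation, here is a small sketch of the option-parsing step. The function name parse_context is purely illustrative and is not part of cgrep; it only echoes the value that n would receive for a given first argument.

```shell
#!/bin/sh
# Hypothetical helper (not part of cgrep): echo the value cgrep would
# assign to n for a given first argument.
parse_context() {
    n=3                          # default when no -context option is given
    case $1 in
    -[1-9]*)
        n=`expr 1 - "$1"`        # e.g. 1 - -10 = 11
        ;;
    esac
    echo "$n"
}

parse_context -10       # prints 11
parse_context system    # prints 3

# ${1?} makes the (sub)shell abort when no first argument exists.
( set --; re=${1?} ) 2>/dev/null || echo "aborted: no pattern given"
```

The subshell in the last line keeps the abort from ending the demonstration itself.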
So that the $re and $n parameters can be embedded, the sed script is enclosed in double quotes (8.14).
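
A minimal sketch of what the shell does with two lines of the script before sed ever sees them. The values assigned to n and re here are example values chosen for illustration only.

```shell
#!/bin/sh
n=4               # example value; cgrep derives this from the -context option
re='awk.*perl'    # example pattern

# Inside double quotes the shell substitutes $n and $re, while \$ becomes a
# literal dollar sign, which sed reads as its "last line" address.
printf '%s\n' "\~$re~p"    # prints \~awk.*perl~p
printf '%s\n' "$n,\$D"     # prints 4,$D
```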
We use the -n option because we don't want to print out every line by default, and because we need to use the n command in the script without its side effect of outputting a line.
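
The side effect is easy to observe on a four-line input. This is a standalone illustration and is not taken from cgrep.

```shell
#!/bin/sh
# Without -n, the n command prints the pattern space before reading the next
# line, and the end of each cycle prints the pattern space again.
printf '1\n2\n3\n4\n' | sed 'n;p'       # prints 1 2 2 3 4 4, one per line

# With -n, only the explicit p produces output.
printf '1\n2\n3\n4\n' | sed -n 'n;p'    # prints 2 and 4
```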
The sed script itself looks rather unstructured (it was actually designed using a flowchart), but the basic algorithm is easy enough to understand. We keep a "window" of n lines in the pattern space and scroll this window through the input stream. If an occurrence of the pattern comes into the window, the entire window is printed (providing n lines of previous context), and each subsequent line is printed until the pattern scrolls out of view again (providing n lines of following context). The sed idiom N;D is used to advance the window, with the D not kicking in until the first n lines of input have been accumulated.
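
The same N;D mechanics can be seen in a well-known sed one-liner that emulates tail. It is shown here only as a simpler instance of the sliding window, not as part of cgrep, and the function name last_three is an illustrative choice.

```shell
#!/bin/sh
# Keep a sliding window of three lines and print it at end of input.
#   $q     on the last input line, quit (printing the pattern space)
#   N      append the next input line to the window
#   4,$D   from line 4 on, drop the oldest line and restart the script
#   ba     otherwise loop back while the window is still filling
last_three() {
    sed -e :a -e '$q;N;4,$D;ba'
}

printf '1\n2\n3\n4\n5\n' | last_three    # prints 3, 4 and 5
```

As in cgrep, the address on D is what delays the scrolling until enough lines have been collected.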
The core of the script is basically an if-then-else construct that decides if we are currently "in context." (The regular expression here is delimited by tilde (~) characters because tildes are less likely to occur in the user-supplied pattern than slashes.)
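
A short illustration of the alternate delimiter, using a pattern that contains slashes. The sample input is made up for the example.

```shell
#!/bin/sh
# A context address may use any delimiter if it is introduced by a backslash.
# With ~ as the delimiter, slashes in the pattern need no escaping.
printf '/usr/bin\n/etc/passwd\n' | sed -n '\~/usr/~p'    # prints /usr/bin
```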
If we are still in context, then the next line of input is read and output, temporarily using the hold space to save the window (and effectively doing an N in the process).
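
The h; n; p; H; g sequence can be tried on its own. In this sketch the trailing s and p commands are additions for the demonstration only: they make the rebuilt window visible by replacing its embedded newline with a plus sign.

```shell
#!/bin/sh
# h  save the window (here just "a") in the hold space
# n  replace the pattern space with the next input line ("b")
# p  print that new line
# H  append it to the saved window in the hold space
# g  copy the combined window back into the pattern space
printf 'a\nb\n' | sed -n 'h; n; p; H; g; s/\n/+/; p'
# prints b, then a+b
```

The second line of output shows that the pattern space ends up holding both lines, just as it would after an N.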
Else we append the next input line (N) and search for the pattern again (an empty regular expression means to reuse the last pattern). If it's now found, then the pattern must have just come into view, so we print the current line number followed by the contents of the window. Subsequent iterations will take the "then" branch until the pattern scrolls out of the window.
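
The empty regular expression mentioned above can be seen at work in a simpler setting. This sketch is not taken from cgrep; it reuses the pattern from an address inside a substitution.

```shell
#!/bin/sh
# The empty pattern in s//AN/ stands for the regular expression used last,
# which here is the /an/ address on the same command group.
printf 'apple\nbanana\n' | sed -n '/an/{ s//AN/; p; }'    # prints bANana
```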