You may recall that you can search for lines containing "this" or "that" using the egrep (27.5) | metacharacter:
egrep 'this|that' files
But how do you grep for "this" and "that"? Conventional regular expressions don't support an "and" operator because it breaks the rule that a pattern matches one consecutive string of text. Well, agrep (28.9) is one version of grep that breaks all the rules. If you're lucky enough to have it installed, just use:
agrep 'cat;dog;bird' files
If you don't have agrep, a common technique is to filter the text through several greps so that only lines containing all the keywords make it through the pipeline intact:
grep cat files | grep dog | grep bird
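To see the filtering pipeline in action, here is a minimal sketch run against a small throwaway file. The name pets.txt and its contents are made up for illustration; they are not from the original article.

```shell
# Hypothetical sample data: pets.txt is an illustrative file name.
tmpdir=$(mktemp -d)
cat > "$tmpdir/pets.txt" <<'EOF'
the cat chased the dog past the bird
the bird ignored the cat
dog, bird, and cat in any order
no pets on this line
EOF

# Each grep drops the lines that lack its keyword, so only lines
# containing all three words reach the end of the pipeline.
result=$(grep cat "$tmpdir/pets.txt" | grep dog | grep bird)
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

The first and third lines should survive, which shows that the order of the words within a line does not matter to this technique.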
But can it be done in one command? The closest you can come with grep is this idea:
grep 'cat.*dog.*bird' files
which has two limitations: the words must appear in the given order, and they cannot overlap. (The first limitation can be overcome using egrep 'cat.*dog|dog.*cat', but this trick is not really scalable to more than two terms.)
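The ordering limitation is easy to check for yourself. The sketch below uses a made-up three-line file (order.txt is a hypothetical name) and counts matches with -c; it spells egrep as grep -E, the POSIX equivalent.

```shell
# Hypothetical sample data for illustration only.
tmpdir=$(mktemp -d)
printf '%s\n' 'cat then dog' 'dog then cat' 'only a cat' > "$tmpdir/order.txt"

# Fixed order: matches only the line where "cat" comes before "dog".
ordered=$(grep -c 'cat.*dog' "$tmpdir/order.txt")

# Both orders written out as alternatives.
either=$(grep -E -c 'cat.*dog|dog.*cat' "$tmpdir/order.txt")

echo "fixed order: $ordered, either order: $either"

rm -rf "$tmpdir"
```

Two terms need two alternatives, but three terms already need six (one per ordering), which is why the trick stops being practical so quickly.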
As usual, the problem can also be solved by moving beyond the grep family to the more powerful tools. Here is how to do a line-by-line "and" search using sed, awk, or perl: [2]
sed '/cat/!d; /dog/!d; /bird/!d' files
awk '/cat/ && /dog/ && /bird/' files
perl -ne 'print if /cat/ && /dog/ && /bird/' files
[2] Some versions of nawk require an explicit $0~ in front of each pattern.
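As a quick sanity check that these one-liners select the same lines, the following sketch runs the sed and awk versions on a made-up file and compares their output. The file name and contents are assumptions for illustration; the perl version is left out only to keep the example short.

```shell
# Hypothetical sample data for illustration only.
tmpdir=$(mktemp -d)
cat > "$tmpdir/pets.txt" <<'EOF'
bird, dog, cat
cat and dog only
a cat, a bird, a dog
EOF

# sed: delete any line that fails one of the three tests.
sed_out=$(sed '/cat/!d; /dog/!d; /bird/!d' "$tmpdir/pets.txt")

# awk: print only the lines on which all three patterns match.
awk_out=$(awk '/cat/ && /dog/ && /bird/' "$tmpdir/pets.txt")

printf '%s\n' "$sed_out"
[ "$sed_out" = "$awk_out" ] && echo "sed and awk agree"

rm -rf "$tmpdir"
```

The sed version works by elimination (drop whatever fails a test), while awk states the condition positively; both leave the same lines.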
Okay, but what if you want to find where all the words occur in the same paragraph? Just turn on paragraph mode by setting RS="" in awk or by giving the -00 option to perl:
awk '/cat/ && /dog/ && /bird/ {print $0 ORS}' RS= files
perl -n00e 'print "$_\n" if /cat/ && /dog/ && /bird/' files
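Here is a small sketch of paragraph mode using the awk command above. The file paras.txt and its three paragraphs are invented for the example; only the middle paragraph contains all three words, spread over two lines.

```shell
# Hypothetical sample data for illustration only.
tmpdir=$(mktemp -d)
cat > "$tmpdir/paras.txt" <<'EOF'
The cat sat here.
The dog barked.

A bird flew over the cat
and startled the dog.

Only a bird here.
EOF

# An empty RS puts awk in paragraph mode: records are separated by
# blank lines, so each pattern is tested against a whole paragraph.
matches=$(awk '/cat/ && /dog/ && /bird/ {print $0 ORS}' RS= "$tmpdir/paras.txt")
printf '%s\n' "$matches"

rm -rf "$tmpdir"
```

A line-by-line search would find nothing in this file, because no single line holds all three words.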
And if you just want a list of the files that contain all the words anywhere in them? Well, perl can easily slurp in entire files if you have the memory and you use the -0 option to set the record separator to something that won't occur in the file (like NUL):
perl -ln0e 'print $ARGV if /cat/ && /dog/ && /bird/' files
(Notice that as the problem gets harder, the less powerful commands drop out.)
The grep filter technique shown above also works on this problem. Just add a -l option (15.7) and the xargs command (9.21) to make it pass filenames through the pipeline rather than text lines:
grep -l cat files | xargs grep -l dog | xargs grep -l bird
(xargs is basically glue used when one program produces output that's needed by another program as command-line arguments.)
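The filename pipeline can be tried on a few throwaway files. The names below are made up for the example, and the sketch assumes filenames without whitespace, since plain xargs splits its input on blanks and newlines.

```shell
# Hypothetical sample files for illustration only.
tmpdir=$(mktemp -d)
printf 'cat\ndog\nbird\n' > "$tmpdir/all.txt"
printf 'cat\ndog\n'       > "$tmpdir/two.txt"
printf 'bird\n'           > "$tmpdir/one.txt"

# Each stage prints the names of files containing its keyword;
# xargs hands those names to the next grep as arguments.
found=$(grep -l cat "$tmpdir"/*.txt | xargs grep -l dog | xargs grep -l bird)
echo "$found"

rm -rf "$tmpdir"
```

Each stage narrows the list of candidate files, so the later greps read fewer files than the first one did.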