
Unix Power Tools

13.10. Compound Searches

You may recall that you can search for lines containing "this" or "that" using the egrep (Section 13.4) | metacharacter:

egrep 'this|that' files
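On current systems the same search is usually spelled grep -E, which is the POSIX form; recent GNU grep releases warn that egrep is obsolescent. Here is a small sketch to try it on. The sample file and its contents are made up for illustration:

```shell
# Hypothetical sample file, created in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'this line' 'that line' 'neither one' > "$tmp/sample.txt"

# grep -E enables extended regular expressions, where | means "or".
or_matches=$(grep -E 'this|that' "$tmp/sample.txt")
printf '%s\n' "$or_matches"    # prints the first two lines only

rm -rf "$tmp"
```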

But how do you grep for lines that contain both "this" and "that"? Conventional regular expressions don't support an "and" operator because it breaks the rule that a pattern matches one consecutive string of text. Well, agrep (Section 13.6) is one version of grep that breaks all the rules. If you're lucky enough to have it installed, just use this:

agrep 'cat;dog;bird' files

If you don't have agrep, a common technique is to filter the text through several greps so that only lines containing all the keywords make it through the pipeline intact:

grep cat files | grep dog | grep bird
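To see the pipeline at work, here is a minimal sketch run against a made-up file (pets.txt and its contents are assumptions for illustration). Each grep passes along only the lines that also contain its own keyword, so word order within a line doesn't matter:

```shell
# Hypothetical sample data in a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/pets.txt" <<'EOF'
the cat chased the dog past a bird
a bird sat on the dog
dog, bird, and cat in one line
only a cat here
EOF

# Only lines containing all three words survive all three filters.
all_three=$(grep cat "$tmp/pets.txt" | grep dog | grep bird)
printf '%s\n' "$all_three"

rm -rf "$tmp"
```

Keep in mind that these are substring matches, so "cat" would also match a line containing "concatenate".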

But can it be done in one command? The closest you can come with grep is this idea:

grep 'cat.*dog.*bird' files

which has two limitations -- the words must appear in the given order, and they cannot overlap. (The first limitation can be overcome using egrep 'cat.*dog|dog.*cat', but this trick is not really scalable to more than two terms.)
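The ordering limitation is easy to demonstrate. In this sketch (the two-line sample file is made up), the single pattern finds only the line with the words in the stated order, while the pipeline of greps finds both lines:

```shell
# Hypothetical sample: same three words, two different orders.
tmp=$(mktemp -d)
printf '%s\n' 'cat dog bird' 'bird dog cat' > "$tmp/order.txt"

# grep -c prints the number of matching lines.
ordered=$(grep -c 'cat.*dog.*bird' "$tmp/order.txt")
piped=$(grep cat "$tmp/order.txt" | grep dog | grep -c bird)

echo "single pattern: $ordered, pipeline: $piped"

rm -rf "$tmp"
```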

As usual, the problem can also be solved by moving beyond the grep family to more powerful tools. Here is how to do a line-by-line "and" search using sed, awk, or perl:[44]

[44]Some versions of nawk require an explicit $0~ in front of each pattern.

sed '/cat/!d; /dog/!d; /bird/!d' files
awk '/cat/ && /dog/ && /bird/' files
perl -ne 'print if /cat/ && /dog/ && /bird/' files
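As a quick check that the sed and awk forms agree, here is a sketch that runs both on the same made-up file. It also includes the explicit $0 ~ spelling mentioned in the footnote, which should behave identically in any awk:

```shell
# Hypothetical sample data in a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/pets.txt" <<'EOF'
bird, dog, cat
cat and dog only
cat dog bird
EOF

# sed: delete any line that lacks one of the words.
sed_out=$(sed '/cat/!d; /dog/!d; /bird/!d' "$tmp/pets.txt")

# awk: print lines for which all three patterns match.
awk_out=$(awk '/cat/ && /dog/ && /bird/' "$tmp/pets.txt")

# awk again, with the explicit $0 ~ form from the footnote.
awk_explicit=$(awk '$0 ~ /cat/ && $0 ~ /dog/ && $0 ~ /bird/' "$tmp/pets.txt")

printf '%s\n' "$sed_out"

rm -rf "$tmp"
```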

Okay, but what if you want to find where all the words occur in the same paragraph? Just turn on paragraph mode by setting RS="" in awk or by giving the -00 option to perl:

awk '/cat/ && /dog/ && /bird/ {print $0 ORS}' RS= files
perl -n00e 'print "$_\n" if /cat/ && /dog/ && /bird/' files
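Here is a small sketch of the awk paragraph-mode search on a made-up file with two paragraphs (notes.txt and its text are assumptions for illustration). No single line holds all three words, but the second paragraph does, so it is printed whole:

```shell
# Hypothetical sample: two paragraphs separated by a blank line.
tmp=$(mktemp -d)
cat > "$tmp/notes.txt" <<'EOF'
The cat slept.
The dog barked.

A bird sang
while the dog and
the cat listened.
EOF

# RS= (empty) makes awk treat each blank-line-separated paragraph as one record.
para=$(awk '/cat/ && /dog/ && /bird/ {print $0 ORS}' RS= "$tmp/notes.txt")
printf '%s\n' "$para"

rm -rf "$tmp"
```

The extra ORS in the print statement puts a blank line after each matching paragraph, which keeps them separated when more than one matches.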

And if you just want a list of the files that contain all the words anywhere in them? Well, perl can easily slurp in entire files if you have the memory and you use the -0 option to set the record separator to something that won't occur in the file (like NUL):

perl -ln0e 'print $ARGV if /cat/ && /dog/ && /bird/' files

(Notice that as the problem gets harder, the less powerful commands drop out.)

The grep filter technique shown earlier also works on this problem. Just add a -l option and the xargs command (Section 27.17) to make it pass filenames, rather than text lines, through the pipeline:

grep -l cat files | xargs grep -l dog | xargs grep -l bird

(xargs is basically the glue used when one program produces output needed by another program as command-line arguments.)
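Here is a minimal sketch of the filename pipeline on three made-up files. Only all.txt contains every word (on separate lines, which is fine for this whole-file search), so it is the only name that makes it through:

```shell
# Hypothetical sample files in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'cat' 'dog' 'bird' > "$tmp/all.txt"
printf '%s\n' 'cat and dog' > "$tmp/two.txt"
printf '%s\n' 'bird only' > "$tmp/one.txt"

# grep -l prints names of matching files; xargs turns those names
# into command-line arguments for the next grep.
found=$(grep -l cat "$tmp"/*.txt | xargs grep -l dog | xargs grep -l bird)
printf '%s\n' "$found"

rm -rf "$tmp"
```

Because xargs splits its input on whitespace by default, this sketch assumes the filenames contain no spaces or newlines.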

-- GU




Copyright © 2003 O'Reilly & Associates. All rights reserved.