You want to find the Nth match in a string, not just the first one. For example, you'd like to find the word preceding the third occurrence of "fish":
One fish two fish red fish blue fish
Use the /g modifier in a while loop, keeping count of matches:
    $WANT = 3;
    $count = 0;
    while (/(\w+)\s+fish\b/gi) {
        if (++$count == $WANT) {
            print "The third fish is a $1 one.\n";
            # Warning: don't `last' out of this loop
        }
    }
The third fish is a red one.
Or use a repetition count and repeated pattern like this:
/(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
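As a quick check, the single-pattern form can be run against the sample sentence. This is only a sketch: the names $pond, $n, and $skip are ours, and it assumes each "fish" is preceded by exactly one word with nothing else in between, as in the example.

```perl
use strict;
use warnings;

my $pond = 'One fish two fish red fish blue fish';
my $n    = 3;          # which fish we want (illustrative name)
my $skip = $n - 1;     # number of whole "word fish" pairs to step over

# Step over $skip pairs, then capture the word before the next "fish".
my $nth;
if ($pond =~ /(?:\w+\s+fish\s+){$skip}(\w+)\s+fish/i) {
    $nth = $1;
    print "Fish number $n is a $nth one.\n";
}
```

With the sample sentence this prints "Fish number 3 is a red one."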
As explained in the chapter introduction, using the /g modifier in scalar context creates something of a progressive match, useful in while loops. This is commonly used to count the number of times a pattern matches in a string:
    # simple way with while loop
    $count = 0;
    while ($string =~ /PAT/g) {
        $count++;       # or whatever you'd like to do here
    }

    # same thing with trailing while
    $count = 0;
    $count++ while $string =~ /PAT/g;

    # or with for loop
    for ($count = 0; $string =~ /PAT/g; $count++) { }

    # Similar, but this time count overlapping matches
    $count++ while $string =~ /(?=PAT)/g;
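To make the difference between the plain and the overlapping count concrete, here is a small sketch. The sample string 'aaaa' and the pattern aa are our own choices, standing in for $string and PAT.

```perl
use strict;
use warnings;

my $string = 'aaaa';    # illustrative input

# Plain /g: each match consumes its text, so matches cannot overlap.
my $plain = 0;
$plain++ while $string =~ /aa/g;

# Lookahead: the match is zero-width, so the next attempt starts
# just one character further along and overlapping hits are counted.
my $overlapping = 0;
$overlapping++ while $string =~ /(?=aa)/g;

print "non-overlapping: $plain, overlapping: $overlapping\n";
```

Here the plain loop counts 2 matches and the lookahead loop counts 3.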
To find the Nth match, it's easiest to keep your own counter. When you reach the appropriate N, do whatever you care to. A similar technique could be used to find every Nth match by checking for multiples of N using the modulus operator. For example, (++$count % 3) == 0 would be every third match.
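The modulus test might be used as follows. This is a sketch under our own assumptions: the longer sample sentence and the names $every and @picked are not from the text above.

```perl
use strict;
use warnings;

my $pond  = 'One fish two fish red fish blue fish old fish new fish';
my $every = 3;      # illustrative: act on every third match
my $count = 0;
my @picked;

while ($pond =~ /(\w+)\s+fish\b/gi) {
    # ++$count % $every is 0 on the 3rd, 6th, 9th ... match
    push @picked, $1 if (++$count % $every) == 0;
}

print "Every third fish: @picked\n";
```

With this input the third and sixth matches are kept, so the line printed is "Every third fish: red new".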
If this is too much bother, you can always extract all matches and then hunt for the ones you'd like.
    $pond = 'One fish two fish red fish blue fish';

    # using a temporary
    @colors = ($pond =~ /(\w+)\s+fish\b/gi);    # get all matches
    $color  = $colors[2];                       # then the one we want

    # or without a temporary array
    $color = ( $pond =~ /(\w+)\s+fish\b/gi )[2];  # just grab element 3

    print "The third fish in the pond is $color.\n";
The third fish in the pond is red.
Or finding all even-numbered fish:
    $count = 0;
    $_ = 'One fish two fish red fish blue fish';
    @evens = grep { $count++ % 2 == 1 } /(\w+)\s+fish\b/gi;
    print "Even numbered fish are @evens.\n";
Even numbered fish are two blue.
For substitution, the replacement value should be a code expression that returns the proper string. Make sure to return the original as a replacement string for the cases you aren't interested in changing. Here we fish out the fourth specimen and turn it into a snack:
    $count = 0;
    s{
        \b               # makes next \w more efficient
        ( \w+ )          # this is what we'll be changing
        (
            \s+ fish \b
        )
    }{
        if (++$count == 4) {
            "sushi" . $2;
        } else {
            $1 . $2;
        }
    }gex;
One fish two fish red fish sushi fish
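The same substitution can be wrapped in a small subroutine so the position and the replacement become parameters. The helper name replace_nth and its argument order are hypothetical, chosen only for this sketch.

```perl
use strict;
use warnings;

# replace_nth is a hypothetical helper, not part of the original recipe.
# It replaces the word before the $n-th "fish" and leaves the rest alone.
sub replace_nth {
    my ($text, $n, $replacement) = @_;
    my $count = 0;
    $text =~ s{
        \b ( \w+ )          # the word we may change
        ( \s+ fish \b )     # the fish that follows it
    }{
        # return the original text for every match but the $n-th
        ++$count == $n ? $replacement . $2 : $1 . $2
    }gex;
    return $text;
}

my $result = replace_nth('One fish two fish red fish blue fish', 4, 'sushi');
print "$result\n";
```

Called this way, it prints the same line as the inline version: "One fish two fish red fish sushi fish".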
Picking out the last match instead of the first one is a fairly common task. The easiest way is to skip the beginning part greedily. After /.*\b(\w+)\s+fish\b/, for example, the $1 variable would have the last fish.
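A short run of the greedy approach, as a sketch; the variable names $pond and $last are ours. Note that . does not match a newline by default, so a multiline string would need the /s modifier for .* to reach the end.

```perl
use strict;
use warnings;

my $pond = 'One fish two fish red fish blue fish swim here.';

# The greedy .* first grabs the whole string, then backs off only as far
# as needed for the rest of the pattern to match, which is the last fish.
my $last;
if ($pond =~ /.*\b(\w+)\s+fish\b/) {
    $last = $1;
    print "Last fish is $last.\n";
}
```

For this sentence the captured word is "blue".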
Another way to get arbitrary counts is to make a global match in list context to produce all hits, then extract the desired element of that list:
    $pond = 'One fish two fish red fish blue fish swim here.';
    $color = ( $pond =~ /\b(\w+)\s+fish\b/gi )[-1];
    print "Last fish is $color.\n";
Last fish is blue.
If you need to express this same notion of finding the last match in a single pattern without /g, you can do so with the negative lookahead assertion (?!THING). When you want the last match of arbitrary pattern A, you find an A that is not followed, anywhere through the end of the string, by another A. The general construct is A(?!.*A), used with the /s modifier so that . also matches newlines. It can be broken up for legibility:
    m{
        A           # find some pattern A
        (?!         # mustn't be able to find
            .*      # anything, through the end of the string,
            A       # followed by another A
        )
    }xs
That leaves us with this approach for selecting the last fish:
    $pond = 'One fish two fish red fish blue fish swim here.';
    if ($pond =~ m{
                    \b ( \w+) \s+ fish \b
                    (?! .* \b fish \b )
                 }six )
    {
        print "Last fish is $1.\n";
    } else {
        print "Failed!\n";
    }
Last fish is blue.
This approach has the advantage that it can fit in just one pattern, which makes it suitable for situations like those shown in Recipe 6.17. It has its disadvantages, though. It's obviously much harder to read and understand, although once you learn the formula, it's not too bad. It also runs more slowly: about half as fast on the data set tested above.
The behavior of m//g in scalar context is given in the "Regexp Quote-like Operators" section of perlop(1), and in the "Pattern Matching Operators" section of Chapter 2 of Programming Perl; zero-width positive lookahead assertions are shown in the "Regular Expressions" section of perlre(1), and in the "rules of regular expression matching" section of Chapter 2 of Programming Perl.
Copyright © 2001 O'Reilly & Associates. All rights reserved.